|Table of Contents|
We have defined a module as a representation of an information unit that concentrates on a single concept, to allow for a presentation of information that meets the readers' requirement of a focused account of a specific subject. Furthermore, a module has a single unique characterisation, to allow for complete and precise retrieval. Therefore, the questions are 1) how to group the information into units that satisfy a precise information need, and 2) how to characterise the resulting modules. This raises the more fundamental question: What types of concepts are of interest to the intended readership? These questions have to be answered in the context of a particular domain and a particular genre or, in other words, for a particular intended readership. Hence, a domain- and genre-dependent classification is required for the creation and characterisation of modules.
Readers can consider information from different points of view, such as its subject and its function in the research process. In that case, it is useful to distinguish information units from those different points of view: an information unit of a certain type, which forms a single entity from one point of view, can be further subdivided into units of different types from another point of view. The resulting characterisation of the modules in explicit labels then has to enable readers to locate relevant modules through combined approaches in complex search operations.
Following Bailey's terminology [Bailey, 1994], a classification is considered a general process of grouping entities by similarity. The term `typology' is used for a special case of classification, namely a multidimensional classification, in which the categories (i.e. the types) are distinguished from a conceptual rather than an empirical perspective. Thus, the general definition of a module has to be complemented with a multidimensional typology of information (footnote 3.12).
To allow for a systematic creation and characterisation of modules, the typology has to satisfy the following conditions (footnote 3.13).
Firstly, the typology should be exhaustive, in the sense that it allows for the identification and categorisation of all types of information that are relevant to scientific communication in the specified domain and genre.
Secondly, the different types of information must be clearly defined and, thirdly, homogeneous, so that all modules of the same type indeed represent similar information.
Fourthly, the typology has to be economical, in other words, as simple as possible. And fifthly, the types should be mutually exclusive: the types of modules, and hence the central concepts of the modules, may not overlap. We do not apply this condition so stringently that the modules themselves cannot overlap. Information of the same type can be included in different modules, because that turns out to be necessary for their self-containedness.
In order to visualise the complex characterisation of the information following a multidimensional typology, we consider what we call a `characterisation space': a space spanned by the dimensions of the typology. Then, the information is characterised by its location or the region it occupies in the characterisation space. In social science, a similar notion is used: the dimensions of a multidimensional typology are said to form a `property space' [Barton, 1955].
We model our characterisation spaces on Gärdenfors's idea of a `conceptual space', which we briefly summarise here. In [Gärdenfors, to be published], a framework is introduced for representing information at the conceptual level. Gärdenfors distinguishes three cognitive levels of representation: the most abstract level is the symbolic level, on which the information is represented in terms of symbols that can be manipulated without taking into account their meaning. The least abstract level is the biomechanical level of the subconceptual representation of the information in some configuration of connected neurons. Bridging these two levels is the level of the conceptual representation, in which the concepts are explicitly modelled. As we emphasised in chapter 2, we make a similar distinction between the conceptual and the symbolic level, as we use a conceptual notion of information units and of relations, defining modules and links as representations of these units and relations. Likewise, the characterisation of the information at the conceptual level can be represented in a label assigned to the modules at the symbolic level, in terms of key words, classification codes or other index terms.
|Figure 3.2: Conceptual space of the apple, with domains and quality dimensions. The colour of the particular apple represented here is ``apple green".|
In Gärdenfors's framework of conceptual spaces, concepts are represented in conceptual spaces spanned by `quality dimensions' (footnote 3.14). The quality dimensions represent aspects or `qualities' of the concept. The canonical example, which is illustrated by figure 3.2, is the representation of the concept `apple' in terms of a conceptual space that is defined by the six main aspects of the apple: its colour, shape, texture, taste, pomological characterisation and nutritional value. The concept apple is then represented by the following values: ``green-red-yellow", ``roundish", ``smooth", ``sweet-sour", ``[a particular seed structure]" and ``[a particular sugar and vitamin content]".
There exists a hierarchy in the aspects of a concept: the aspects themselves may in their turn have different subaspects. For example, the colour of the apple can in its turn be represented in a three-dimensional `colour space' spanned by the hue, the brilliance and the saturation, as is illustrated in figure 3.2. Its taste can be represented in a `taste space' spanned by the four basic tastes: sweet, sour, salt, bitter. In this way, the concept `apple' is represented in full detail in a nested conceptual space that has far more dimensions than six. The concept can then be characterised by its co-ordinates, such as its precise hue and sweetness.
The conceptual spaces have a structure. For instance, the space of perceived colour can be modelled as a `spindle', in which the dimension of the hue is given by a circle, the dimension of the saturation by a line starting at grey and ranging to full saturation, and the brightness by a line ranging from black to white. The structure of the space, and thereby the representation of concepts, depends on its purpose. The salience of the colour versus the taste, for instance, depends on the usage of the apple in a decorative basket or in a fruit salad. Thus, the structure of conceptual spaces allows for a sense of similarity that depends on the purpose of the representation of the concepts.
In the present work, we recognise what we call `characterisation spaces': multidimensional spaces spanned by `characterisation dimensions' representing the characterisation of the various aspects of the information at issue. Such a `characterisation space' is similar, but not exactly equal, to a `conceptual space'. The difference is that conceptual spaces are in principle used to represent elementary concepts and we use our characterisation spaces to characterise, from different points of view, information units that focus on a particular concept.
The structure of characterisation spaces also provides a sense of distance and thereby a notion of similarity between information units. The complete characterisation of a particular information unit is given by its location in the characterisation space. In other words, the information unit is represented by a set of n co-ordinates, or an n-dimensional vector, where n is the number of different aspects of the information that are used in the characterisation (footnote 3.15).
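As an illustration only, the notion of distance between information units can be sketched in Python. The text does not define a metric; the weighted mismatch count below, the dimension names and the salience weights are all assumptions chosen for the example.

```python
# Assumption: the distance between two information units is the sum of the
# salience weights of the dimensions on which their co-ordinates differ.
# The source text does not prescribe this (or any) metric.

def weighted_distance(a, b, salience):
    """Sum the weights of all dimensions on which units a and b differ."""
    return sum(weight for dim, weight in salience.items() if a[dim] != b[dim])

# Hypothetical salience weights: a higher weight means a more salient dimension.
salience = {"reaction": 3.0, "projectile": 2.0, "target": 2.0, "energy": 1.0}

# Three hypothetical information units, each given by its co-ordinates.
unit_a = {
    "reaction": "chemi-ionisation",
    "projectile": "sodium atom",
    "target": "iodine atom",
    "energy": "7-10 eV",
}
unit_b = dict(unit_a, energy="10-20 eV")          # differs only in energy
unit_c = dict(unit_a, reaction="charge transfer")  # differs only in reaction

print(weighted_distance(unit_a, unit_b, salience))
print(weighted_distance(unit_a, unit_c, salience))
```

Under these assumed weights, a unit that differs only in a less salient dimension lies closer to the original than one that differs in a more salient dimension. Changing the weights changes the resulting sense of similarity, which mirrors the dependence on purpose described above.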
As in the case of the domains and quality dimensions of conceptual spaces, we allow for a hierarchy in the level of detail. For instance, an information unit about atomic collisions can have a one-dimensional characterisation, which is expressed in an unstructured label, such as the general `atomic collisions', or the specific `differential cross sections of chemi-ionisation in sodium atom-iodine atom collisions between 7 and 10 eV'. The information can also be characterised in a five-dimensional space spanned by the measured quantity, the type of reaction that takes place during the collision, the two particles involved and the energy range (footnote 3.16). In the latter case, the resulting complete characterisation is structured: it consists of five labels that form a particular pattern (measured quantity: `differential cross section'; reaction: `chemi-ionisation'; projectile: `sodium atom'; target: `iodine atom'; energy: `7-10 eV'). The prominence of each dimension in the complete characterisation can vary, as in the case of the conceptual space. For example, the type of reaction is a more salient characteristic than the energy range.
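The contrast between an unstructured label and a structured, five-dimensional characterisation can be sketched as a small Python record. The class name `CollisionLabel`, its field names and the `flat_label` helper are hypothetical; only the five dimensions and their example values come from the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CollisionLabel:
    """Hypothetical structured label: one field per characterisation dimension."""
    measured_quantity: str
    reaction: str
    projectile: str
    target: str
    energy_ev: tuple  # (lower, upper) bounds of the energy range in eV

    def flat_label(self) -> str:
        """Collapse the five co-ordinates into a single unstructured label."""
        low, high = self.energy_ev
        return (
            f"{self.measured_quantity} of {self.reaction} in "
            f"{self.projectile}-{self.target} collisions "
            f"between {low} and {high} eV"
        )


label = CollisionLabel(
    measured_quantity="differential cross section",
    reaction="chemi-ionisation",
    projectile="sodium atom",
    target="iodine atom",
    energy_ev=(7, 10),
)

print(asdict(label))       # the structured, five-dimensional characterisation
print(label.flat_label())  # the equivalent one-dimensional label
```

In this sketch the structured form keeps each co-ordinate separately addressable, whereas the flat label can only be matched as a whole string.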
|Figure 3.3: A complex search operation in a multidimensional characterisation space for a particular type of reaction, particular collision partners and a specified energy range: differential cross section AND iodine atoms AND (sodium atoms OR potassium atoms) AND 7-10 electronvolt.|
The representation in a characterisation space provides us with a complex characterisation of the information allowing for complex search operations, as is illustrated in figure 3.3. Information thus characterised can be located by specifying a conjunction of co-ordinates in separate characterisation dimensions (as in a Boolean AND in the symbolic representation), a disjunction of co-ordinate values in the same dimension or in different dimensions (as in a Boolean OR), or a combination of these. Depending on the geometry of the characterisation space, these values may be discrete (e.g. an author name) or continuous (a time interval).
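A minimal sketch of such a search, assuming each module is stored as a dictionary of co-ordinates: the query takes a conjunction across dimensions and a disjunction among the accepted values within one dimension, with the energy range treated as a continuous interval. The module records, the dimension names and the helpers `overlaps` and `matches` are hypothetical, and disjunctions across different dimensions are not covered here.

```python
def overlaps(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def matches(module, query):
    """AND across dimensions; OR among the accepted values within a dimension."""
    for dimension, accepted in query.items():
        value = module[dimension]
        if dimension == "energy_ev":
            # Continuous dimension: match when the two intervals overlap.
            if not overlaps(value, accepted):
                return False
        elif value not in accepted:
            # Discrete dimension: the value must be one of the accepted ones.
            return False
    return True


# Hypothetical collection of characterised modules.
modules = [
    {"id": "m1", "quantity": "differential cross section",
     "projectile": "sodium atom", "target": "iodine atom", "energy_ev": (7, 10)},
    {"id": "m2", "quantity": "differential cross section",
     "projectile": "potassium atom", "target": "iodine atom", "energy_ev": (8, 9)},
    {"id": "m3", "quantity": "total cross section",
     "projectile": "sodium atom", "target": "iodine atom", "energy_ev": (7, 10)},
    {"id": "m4", "quantity": "differential cross section",
     "projectile": "lithium atom", "target": "iodine atom", "energy_ev": (7, 10)},
    {"id": "m5", "quantity": "differential cross section",
     "projectile": "sodium atom", "target": "iodine atom", "energy_ev": (20, 30)},
]

# differential cross section AND iodine atoms
# AND (sodium atoms OR potassium atoms) AND 7-10 electronvolt
query = {
    "quantity": {"differential cross section"},
    "target": {"iodine atom"},
    "projectile": {"sodium atom", "potassium atom"},
    "energy_ev": (7, 10),
}

hits = [m["id"] for m in modules if matches(m, query)]
print(hits)
```

With these example records, only the modules that satisfy every dimension of the query are returned; a mismatch in any single dimension is enough to exclude a module.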
The purpose of the characterisation determines what the characterisation space looks like: how many dimensions it has, which dimensions can be summarised (footnote 3.17), which are salient, and what is the resulting sense of how similar particular concepts are. In the context of this study, the purpose of the characterisation is to allow members of a particular target audience to locate, in a collection of publications of a particular domain and genre, the information relevant to their needs at that time. Therefore, the specific characterisation space would be determined by the domain in science and the journal in which the articles are published.
In section 4.2, we introduce a general typology for the domain of experimental science, with four main kinds of characterisation, and we briefly sketch the characteristics of these four `characterisation dimensions': 1) a characterisation by the conceptual function of the information, 2) a domain-oriented characterisation, 3) a characterisation by the range of the information, and 4) a characterisation by a specified set of bibliographic data (footnote 3.18).
In order to realise the second component of the modular structure, we require a systematic classification of the links that can be created. The questions now are how to connect the modules in such a way that the reader's need for coherence is satisfied, and how to label the resulting links between the modules. The underlying problem is to determine what classes of relations are relevant in the context of scientific communication via publications in a specified domain and of a particular genre.
Since each relation has only one aspect, the classification of the relevant relations does not have to be multidimensional. The type of a link, however, can be determined by the different, complementary relations it expresses. In that case, the full characterisation of the link yields a complex label. In section 4.3, we present a typology for the links by the relations identified as relevant in communication by means of articles in the domain of experimental sciences.