... giants''1.1
This famous quotation is derived from a letter from Newton to Hooke dated February the 5th 1676 [Turnbull et al., 1959].
... science''.1.2
Crick's statement, made in a discussion on the BBC, is quoted in [Garvey, 1979, p.ix], a book titled Communication: the essence of science.
... media.1.3
The history of the scientific journal is described in more detail in [Meadows, 1974], [Meadows, 1998], [Bazerman, 1988], [Kircz, 1998] and references therein.
... longer.1.4
The growth of the scientific literature is shown, for example, in [Meadows, 1998, p.16].
... 1826:1.5
In [Meadows, 1998], this result of a recent user-survey is mentioned on page 211, and Faraday is quoted on page 19.
... established.1.6
The Philosophical Transactions were first published on the 6th of March 1665, two months after the very first scientific journal appeared: Le Journal des Sçavans, which was published in Paris on the 5th of January of that year.
... periodicals''.1.7
This quotation is given in [Meadows, 1974, p.72].
... archives.1.8
In physics, this cumulative effect is tempered by the fact that information is absorbed relatively quickly into the `common knowledge', so that the original article in which it was published needs to be cited only for a limited time. According to [Meadows, 1998, p.222], half the literature cited in an article was published at most 4.6 years before that article.
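The figure of 4.6 years is a median citation age. A minimal sketch of how such a median can be computed is given below; the function name and the list of cited years are hypothetical and serve only as an illustration, they are not data from [Meadows, 1998].

```python
from statistics import median


def median_citation_age(article_year: int, cited_years: list[int]) -> float:
    """Return the median age (in years) of the references cited by an article."""
    # The age of a reference is the gap between the citing and the cited year.
    return median(article_year - year for year in cited_years)


# Hypothetical reference list of an article published in 1999.
cited_years = [1998, 1997, 1996, 1993, 1990, 1975]
age = median_citation_age(1999, cited_years)
```

Half of the references in this invented list are at most `age` years old; a single very old reference (here 1975) hardly affects the median.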
... p.8].1.9
Another consequence is the library crisis, which is described, for example, in [Butler, 1999]. Because of the number of journals and of the increasing subscription prices, the university libraries can no longer afford to subscribe to all relevant journals. And the more subscriptions are cancelled, the more prices will go up, so that publishers and libraries are caught in a downward spiral.
... information.1.10
A medical metaphor often used to describe this phenomenon is that of an `information infarct': the circulation of information is obstructed, because the circulatory system is overloaded.
... services.1.11
The term `Internet' has been officially defined in the following resolution [FNC, 1995]:
``The Federal Networking Council (FNC) agrees that the following language reflects our definition of the term `Internet'. `Internet' refers to the global information system that -
(i) is logically linked together by a globally unique address space based on the Internet Protocol (IP) or its subsequent extensions/follow-ons;
(ii) is able to support communications using the Transmission Control Protocol/Internet Protocol (TCP/IP) suite or its subsequent extensions/follow-ons, and/or other IP-compatible protocols; and
(iii) provides, uses or makes accessible, either publicly or privately, high level services layered on the communications and related infrastructure described herein.''
A (rather technical) account of the history of the Internet is given in [Leiner et al., 1998] by a group of authors involved in the development of the Internet.
... application.1.12
Basically, sending a message from one computer to the other entails the following steps. Software at the sending computer breaks down the message into little `packets' of information and provides each with an address. The packets are delivered to that address via any suitable and available route. At the destination computer, software mirroring that at the sending computer reassembles the message.
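The steps described above can be sketched as follows. This is a simplified illustration, not an implementation of TCP/IP: the `Packet` class, the sequence numbers used for reassembly, the packet size and the example address are all assumptions introduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    destination: str  # address attached to every packet
    sequence: int     # position of the chunk in the original message (assumed)
    total: int        # number of packets that make up the message (assumed)
    payload: str


def split_message(message: str, destination: str, size: int = 8) -> list[Packet]:
    """Break a message into addressed packets of at most `size` characters."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [""]
    return [Packet(destination, n, len(chunks), chunk)
            for n, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> str:
    """Restore the message at the destination, whatever the order of arrival."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("some packets are missing")
    return "".join(p.payload for p in ordered)


message = "Packets may travel along any suitable and available route."
packets = split_message(message, destination="192.0.2.7")
random.shuffle(packets)  # different routes can deliver packets out of order
received = reassemble(packets)
```

Because every packet carries its own address and position, the route taken by an individual packet does not matter for the result.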
... million.1.13
Internet statistics about the online community are provided at the web site of an on-line marketing company [Global Reach, 1999]; in the version of June 15, 1999, the total online population was estimated to be 204 million. In its domain survey of July 1999, the Internet Software Consortium (ISC) identified 56,218,000 hosts, i.e. computers that act as sources of information [ISC, 1999].
... medicine.1.14
With respect to physics in particular: in June 1999, the Institute of Physics published 33 electronic journals, the American Physical Society 8 and the American Institute of Physics 40, while Elsevier Science listed 169 electronic titles under the heading of physics, which include some journals in materials science and related domains. These numbers of electronic journals have been derived from the Web sites of the various distributors and publishers: [Swets & Zeitlinger, 1999], [IOP, 1999], [APS, 1999], [AIP, 1999] and [Elsevier, 1999].
... scientists.1.15
The archives started in 1991 as a document base for a small community of 200 researchers in theoretical high energy physics. In 1996, the archives were used by 35,000 scientists and contained over 75,000 e-prints [Ginsparg, 1996]. By July 1999, the archives had grown to over 100,000 e-prints, and they continue to grow by over 2,000 e-prints a month. The user statistics are given at the Web site of the archives [Los Alamos, 1999]. The archives consist of 38 sections for particular subjects in the area of physics, as well as 37 in the related areas of mathematics and non-linear sciences. For computer science, the Computing Research Repository was created in 1998, in connection with the Los Alamos archives.
... `universe'2.1
That universe is usually the `real world', but it can also be a simplified `ideal universe' hypothesised for the sake of an argument, or a `fictional universe'. Aspects of that universe can be `static' entities, such as physical objects and abstract entities, but also processes and lines of reasoning involving various relations between simple and complex concepts. These aspects of the universe are the most basic level we take into account. Loosely speaking, this level could be seen as `that which the information is about'.
... transmitted.2.2
Information also has the following characteristics. It is wrong if a particular aspect of the universe has not been represented properly at the conceptual level. It can `exist' without being communicated successfully. It is aimed at communication. In this respect, information differs from knowledge, which we consider primarily as an internal representation reflecting a justified true belief about some aspect of the universe. New information allows the receiver to add to, confirm or modify his beliefs and thereby to increase his knowledge. In this thesis on scientific communication, we deal with information, rather than knowledge. In addition, information can be complex. We do not use the term `information' exclusively for simple, factual information; for that specific type of information we use the term `data'.
... `language'.2.3
The distinction between the conceptual level and the symbolic level in the communication process is also made in a cognitive model of the communication process described in [Conklin, 1987, p.24]:
[in writing, a] loosely structured network of internal ideas and external sources is first organised into an appropriate hierarchy [...], which is then `encoded' into a linear stream of words, sentences, etc. [...]

[reading is] taking the linear stream of text, comprehending it by structuring the concepts hierarchically and absorbing it into the long-term memory as a network.

... him.2.4
Whereas our notion of communication is compatible with the mathematical approach, our notion of information is not. In the mathematical approach, information is a property of a signal, providing a measure for the reduction of the uncertainty of the receiver. Thus, a transmitted signal does not carry any information if the receiver is already fully aware of it. In our approach, we connect the reduction of the uncertainty of a receiver to the adequacy of the different steps of the communication: if the goals of the interactants are not achieved, the communication has not been adequate. Thagard distinguishes two other basic approaches to the notion of information, which intuitively highlight its different aspects [Thagard, 1992]: the `ecological approach' and the `information-processing approach'. In the ecological approach, information is a property of situations: it already exists in the environment and the role of `cognitive agents' is to select and pick up the information that suits them. They must be able to pick up information about one situation from another situation, which leads to the question formulated in [Barwise and Seligman, 1997, p.25] as ``how is it that information about some components of a system carries information about other components of the system?''. We do not consider this philosophical question, as we concentrate on practical communication between people. Furthermore, we concentrate on scientific information, which is not so easy to `pick up' and can turn out to be incorrect. To emphasise that aspect, we consider information as a conceptual representation of an aspect of the universe, so that the sender has to play an active role representing a particular, complex situation. Thus, we follow the information-processing approach, rather than the ecological approach. 
In that approach, information is considered as an object, which can take the shape of a mental object that is operated on in cognitive activities, or a computational object in a computer algorithm.
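The remark that, in the mathematical approach, a signal carries no information if the receiver is already fully aware of it can be made concrete with a small sketch. We assume here, for illustration only, the usual measure of self-information, -log2 p, where p is the probability the receiver assigns to the signal beforehand; the function name is hypothetical.

```python
import math


def self_information(probability: float) -> float:
    """Information content, in bits, of a signal the receiver expects with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in the interval (0, 1]")
    # log2(1/p) equals -log2(p) and yields exactly 0.0 for p = 1.
    return math.log2(1.0 / probability)


certain = self_information(1.0)    # the receiver already knows the signal
coin_flip = self_information(0.5)  # one of two equally likely signals
rare = self_information(0.25)      # a less expected signal
```

The less expected the signal, the more it reduces the uncertainty of the receiver; a fully expected signal reduces it by nothing.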
... receiver.2.5
These models can be visualised with the `conduit metaphor' of sending information packages through a conduit. In [Gärdenfors, 1996], two metaphors are given for communication: the machine-oriented conduit metaphor and the more human-oriented `resonance metaphor', in which information emerges only when the interactants can `resonate' with the material. We shall use an (enhanced) conduit metaphor, because that metaphor allows us to visualise the different aspects and stages of communication.
... detail.2.6
In [Garvey, 1979], an influential model of the scientific communication system is given; a series of models incorporating information technology based on that model is given in [Crawford et al., 1996].
... transmitted.2.7
In [Garvey et al., 1972], the information-exchange process associated with the creation of journal articles is described in detail, including feedback impacting on the research itself, on its representation at the conceptual level and on its representation at the symbolic level.
... first.2.8
We shall get back to these different types of documents in section 2.1.3.
... librarians.2.9
We do not consider these tasks in detail. A study of the roles of publishers and libraries in the scholarly information process and the types of value they can add, including legal support, financial management and marketing activities, is presented in [Scovill, 1995].
... process.2.10
In [Belkin et al., 1993], this activity is considered in terms of Information Seeking Strategies, which have four dimensions or factors: 1) method of interaction (scanning-searching); 2) goal of interaction (learning-selecting); 3) mode of retrieval (recognition-specification); and 4) resource considered (information items-meta-information).
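If each of the four dimensions is treated as a choice between its two poles, which is a simplification made here for illustration, the combinations can be enumerated as in the following sketch. The variable names are hypothetical.

```python
from itertools import product

# The four dimensions and their poles, as listed above.
DIMENSIONS = {
    "method of interaction": ("scanning", "searching"),
    "goal of interaction": ("learning", "selecting"),
    "mode of retrieval": ("recognition", "specification"),
    "resource considered": ("information items", "meta-information"),
}

# Every combination of one pole per dimension describes one strategy.
strategies = [
    dict(zip(DIMENSIONS, poles))
    for poles in product(*DIMENSIONS.values())
]

# Example: the strategies in which scanning is combined with recognition.
scan_and_recognise = [
    s for s in strategies
    if s["method of interaction"] == "scanning"
    and s["mode of retrieval"] == "recognition"
]
```

Under this simplification there are 2 x 2 x 2 x 2 = 16 combinations, four of which pair scanning with retrieval by recognition.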

Scanning is mostly associated with retrieval by recognition and searching with retrieval by specification. Both seeking patterns are common in scientific communication. In a user-survey cited in [Meadows, 1998, p.212], it was found that ``[s]ome two-thirds of the information obtained via this usage of [refereed articles, colleagues, books, on-line databases and abstracts] was deliberately sought. The remaining third of the information was gained unexpectedly.''

... relevant.2.11
In this thesis, we call `browsing' all methods of locating relevant information by navigation through sources, and `searching' all methods of locating relevant information by pin-pointing information by its characterisation. In [Ellis, 1989, p.178], six characteristics of the information seeking patterns of scientists are discussed: 1) Starting: activities characteristic of the initial search for information; 2) Chaining: following chains of citations or other forms of referential connection between material; 3) Browsing: semi-directed searching in an area of potential interest; 4) Differentiating: using differences in sources as filters on the nature and quality of the material examined; 5) Monitoring: maintaining awareness of developments in a field through the monitoring of particular sources; and 6) Extracting: systematically working through a particular source to locate material of interest. Chaining, browsing, monitoring and extracting, as discussed by Ellis, involve locating relevant information by (more or less directed) navigation through selected sources. `Starting' can involve searching for a starting point for browsing. Differentiating is important in both searching and browsing. The scientists whose behaviour is discussed in [Ellis, 1989] are social scientists, but in [Ellis et al., 1993] it is shown that there are no overriding differences between social scientists and physicists in this respect.
... used.2.12
Searching in the contents of the article is free-text searching. The meta-information used in a search can consist of index terms associated with the document in activity (7).
... it.2.13
Here, we use the term `reading' as shorthand for `decoding a symbolic representation of the information (in terms of some natural, pictorial or other `language')'.
... communication.2.14
Of course, we do not pretend to give an exhaustive account of the precise nature of science. As Meadows puts it: ``A picture, or more accurately a series of vignettes, of science built up from conjectures such as Kuhn's and Popper's provides a helpful framework for discussing research and communication in science. But no single picture provides a definitive description of how the system of science works'' [Meadows, 1998, p.53].
....2.15
From a sociological point of view, Merton has formulated four `institutional imperatives' for scientific research: 1) communism, i.e. findings should be common property, because science is a co-operative effort; 2) disinterestedness, i.e. the primary goal is the advancement of science; 3) universalism, i.e. scientific work should be judged on impersonal, objective criteria; and 4) organised scepticism, i.e. everything should be open to critical scrutiny [Merton, 1973].
... community.2.16
The notion of a research programme, with a `hard core' that characterises the programme, is discussed in [Lakatos, 1978].
... world.2.17
In philosophy and sociology of science, the idea that scientific terms refer to some reality existing independently of the observer is debated (see for example [Gross, 1990] and [Albert, 1978]). In science, it is usually assumed (even if `reality' cannot be observed without interference, according to quantum mechanics). The idea that science is concerned with the real world, and thereby with universal laws, implies that findings cannot be personal, and thereby it implies the norm of universalism. It also influences the style of scientific communication. Although scientific communication has different communicative functions (of informing receivers and of justifying standpoints about research), findings tend to be reported in an informative, `objective' style, rather than justified in an argumentative, `subjective' style.
... irrefutable.2.18
As Popper pointed out, a hypothesis or theory can never be proven to be absolutely true by any number of experiments confirming it, but it can be falsified. In `naive falsificationism', a theory is immediately falsified by any observational statement which conflicts with it, but there are more sophisticated ways of dealing with disagreement between theory and experiment (see [Lakatos, 1978]).
... point.2.19
On the one hand, we are less `strict' than philosophers who assume that scientists take the `hypothetico-deductive' approach: formulating a hypothesis about some aspects of the natural world, testing the hypothesis, and comparing the results of the test with the original hypothesis, so that the hypothesis can be falsified (e.g. Popper). In this thesis, we study the structure of actual scientific articles, and in experimental science this approach is not explicitly adhered to. On the other hand, we are more `strict' than sociologists of science such as Gross, who argues that scientific research is neither an inductive process, nor a deductive process, but that formal communication, in which a systematic pattern is made explicit, is ``an a posteriori rationalisation of the real process.'':
[r]eading experimental or descriptive papers in science, we invariably experience an inductive process, a series of laboratory or field events leading to a general statement about natural kinds; in theoretical papers we experience the opposite movement, a series of deductions whose conclusions invoke or imply confirming observations.  [Gross, 1990, p.85]

In this thesis, we indeed consider scientific research as an idealised process, rather than as the every-day work performed in a laboratory that is studied by sociologists of science, because we need a normative description in order to formulate requirements for adequate communication.
... IMRaD.2.20
IMRaD means Introduction, Methods, Results, and Discussion.
... medium).2.21
Thus, a document is not necessarily verbal. It is a message that can be encoded e.g. in printed form or in a computer file. For a general discussion as to what is a document, see for example [Buckland, 1997].
... channel.2.22
By direct communication we mean real-time, interaction-driven communication. In direct communication, the sender has to take into account the background knowledge and beliefs of the receiver, but he does not have to prepare his entire contribution in advance, because he can use feedback to adapt it in the course of the communication session. Examples of direct communication are a face-to-face conversation, a telephone conversation, and Internet `chatting'.

In weakly indirect communication, there is a time lag between the contributions of the interactants, so that the receiver cannot give immediate feedback. Therefore, the sender has to convey the complete message, anticipating the receiver's understanding and acceptance. For example, weakly indirect communication can take place by means of personal letters and by means of lectures in which the discussion is entirely postponed until the end. In strongly indirect communication, the interaction is even more restricted.

... peers.2.23
We focus on this type of article, because it plays a predominant role in present-day scientific communication. The narrow definition excludes preprints, reviews and monographs. The difference between a preprint and an article in the narrow sense of the word is that a preprint has not been subjected to peer review. This feature has no direct relation to the structure of the presentation of the information, which we aim to analyse. The role of reviews and monographs in scientific communication differs from the role played by short articles on original research. Therefore, reviews and monographs may have a different format. In chapter 4, we shall present a modular model for articles in the narrow sense of the word. We take reviews and monographs into account by allowing for modules representing information with a wide range, in addition to the modules that are part of scientific articles. Many remarks we make on articles in the narrow sense also apply, less stringently, to other types of scientific documents.
... it.2.24
These functions are discussed in  [Kircz and Roosendaal, 1996]. We shall discuss them in section 2.2.2 in relation to the requirements of the interactants in the process of communication via scientific articles.
... issues.2.25
In this respect, our notion of a journal is broader than that used in [Crawford et al., 1996], where a communication model based on the article as the unit of distribution is called the `no-journal model'.
... articles.2.26
For an empirically grounded profile of the interactants, a comprehensive user-survey is required.
... p.6].2.27
The notion of rationality, in the context of communication, is discussed in [Van Eemeren and Grootendorst, 1994 i].
... science.2.28
In principle, this definition excludes teaching and the popularisation of science, which are primarily aimed at the advancement of individual knowledge. However, there is no clear borderline separating these genres. Considering communication via published documents, we find communication via reviews and monographs in the grey area between scientific communication (via scientific articles) and instructive communication (via tutorials). The grey area between scientific communication (between scientists) and popularisation (by scientists or journalists for the benefit of the general public) includes, for example, the journal Scientific American, which is, according to the instructions for authors, aimed at ``intelligent members of the general public''.
... theory'.2.29
Speech act theory describes the use of language as an action, namely the performance of `speech acts'; see [Austin, 1962].
... communication.2.30
The four felicity conditions were first introduced in [Searle, 1965] and they are summarised in [Van Eemeren et al., 1993] as follows:
(1) The propositional content condition. The utterance must express propositional content appropriate to its force. For example, promises must refer to future states, while reports of occurrences must not refer to future states. (2) The essential condition. Making the utterance `count' as an expression of a certain objective, within some set of social understandings. (3) The sincerity condition. The sender must actually believe, want, and intend anything represented as believed, wanted, or intended. (4) The preparatory condition. The sender must have adequate justification for undertaking to achieve the underlying objective and must believe that performing the speech act itself will help lead to the achievement of the objective.
... maxims2.31
According to these maxims, the contributions have to be: 1) true (the Quality Maxim), 2) as informative as is required for the goals of the interaction (the Quantity Maxim), 3) relevant to the goals of the interaction (the Relation Maxim), and 4) clear (the Manner Maxim).
... conditions.2.32
Van Eemeren and Grootendorst have formulated these rules in the context of a normative theory for argumentation called `pragma-dialectics' [Van Eemeren and Grootendorst, 1984], [Van Eemeren and Grootendorst, 1992]. The pragma-dialectical model describes argumentation as a methodical exchange of speech acts in a critical discussion aimed at the resolution of a difference of opinion. The pragma-dialectical model is an ideal model that specifies rules for the successive stages of a critical discussion. It can be used for heuristic-analytical, critical-evaluative and instructive purposes.
... acts.2.33
The information is unnecessary if, for example, the receiver is already aware of it. It is pointless if it is clear in advance that the receiver will not be able to understand or accept the information anyway, e.g. because he lacks specialised background knowledge.
... interaction.2.34
For a discussion of the notion of relevance in discourse and argumentation analysis, see [Sperber and Wilson, 1986] and [Van Eemeren and Grootendorst, 1994 ii].
... requirements.2.35
See also [Line, 1992], which discusses the requirements that the system of making scientific and technical articles available must meet for authors, publishers, libraries and consumers, and [Van Rooy, 1995], which is based on a study performed for a publisher and discusses acquisition needs (with respect to the information and the acquisition process) and dissemination needs.
... receiver.2.36
We concentrate on academic research. The conditions of industrial and military research are different: information is not freely and universally available. Researchers at Research & Development departments in these sectors predominantly play the role of receiver.
... science.2.37
In a user-survey [Coles, 1993], most respondents indicated that their first motivation for publication was the dissemination of information. Many indicated, as a secondary motive, improved funding and career prospects.
... (6).2.38
According to [Kircz and Roosendaal, 1996], scientific journals have four main functions: certification, registration, archiving and awareness.  
... receiver.2.39
In [Kircz, 1991], three discrete types of receivers are distinguished: uninformed, partially-informed and informed receivers.
... informed.2.40
In a user-survey [Coles, 1993, p.110], it was found that the main reasons for the most recent use of the information sources/services were 1) writing a paper, 2) background reading for new projects, and 3) generally keeping up with the literature.
... information.2.41
Note that by `browser' we mean the browsing scientist, not the software assisting him in this activity.
... result.2.42
`Recall' is the proportion of relevant documents that are retrieved, and `precision' is the proportion of retrieved documents that are relevant (see [Van Rijsbergen, 1979, p.10]).
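The two measures can be computed as in the following minimal sketch. The function name and the document identifiers are hypothetical; returning 0.0 for an empty set is a convention chosen here, not one taken from [Van Rijsbergen, 1979].

```python
def recall_and_precision(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall, precision) for one search result."""
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: four relevant documents exist, three documents are retrieved.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d7"}
recall, precision = recall_and_precision(retrieved, relevant)
```

In this invented example, two of the four relevant documents are found (recall 2/4) and two of the three retrieved documents are relevant (precision 2/3).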
... content.2.43
In [Kircz, 1991], this is called a `non-reader'.
....2.44
This implies that an electronic publication does not have to be either originally generated or read in electronic form, and that the same article can be published both in electronic form and on paper. We concentrate on on-line publication, because communication via that channel exhibits more characteristics specific to electronic publication than publication by means of e.g. CD-ROM.
... journals.2.45
This is not the case for the electronic versions of prestigious paper journals [Speier et al., 1999].
... (yet).2.46
Of course, the medium-independent aspects of the requirement of completeness, concerning the choices of the scientist representing and then presenting the information, are unaffected by the transition from paper to electronic publication.
... office.2.47
Thus, electronic publishing may lower the cost of the publication, but that is not necessarily the case. The financial aspects of electronic publication are discussed in, for example, [Scovill, 1995] and [Coles, 1993].
... presentation.2.48
For an introduction to hypertext, see [Conklin, 1987]. The word hypertext was introduced in 1965 by Ted Nelson to describe a set of texts and images that are linked in a complex way. Conklin defined hypertext as a synonym of non-linear text. In his definition, a text with notes and references is also a hypertext. Following the present-day usage of the term, we denote by hypertext a non-linear text implemented in an electronic environment, with automated, clickable links.
... facilitated.2.49
On the larger scale, hypertext only makes sense if a large part or all of the information is available in this electronic form.
... purpose.3.1
We have to emphasise that the modules we define represent units of information in the article and not units of functions in the communication process (such as the steps in the process we described in section 2.1.1). Therefore, our notion of modularity differs from the one developed in [Fodor, 1983]. Fodor's `cognitive modules' are functional modules, such as components of a perceptual system and a language production system.
... concepts.3.2
Following Thagard, we mean by a `concept' a largely learned open mental entity. [Thagard, 1992] gives an overview of what can be meant by the notion `concept'.
... 3.3
In our modular model, the distinction of modules is primarily based on their underlying concepts. In section 4.2.1, we shall see that modules are further distinguished by additional features, namely the `range' of the information and a specified set of bibliographic data.
... node.3.4
In present-day electronic articles, presentation units are usually distinguished by the type of the representation of the information: the main text of articles, the figures and the tables are presented in separate hypernodes. Such hypernodes do not fit in our definition of a module. An example of the distinction of information units by the storage format is given in [Murray-Rust, 1997]. These `machine-oriented' scientific information components, such as files in a particular picture format or text-only files, are not modules according to our definition either.
... information.3.5
In section 2.2.2, the profile of more informed and less informed readers has been sketched in general. The specific target audience of a particular journal has to be determined by its editorial board.
....3.6
In computer science, modularity plays a role in the organisation of computer programmes. Its purpose is to allow for modules, containing related data and operations, that can be designed and revised independently. The replacement of such modules by improved versions does not necessitate a revision of the rest of the programme. In the traditional paradigm of `structured programming', modularisation primarily entails a meaningful grouping of subprogrammes. In the object-oriented approach, modularity is more pronounced. In the context of object-oriented design, [Booch, 1994, p.52] defines modularity as follows: ``Modularity is the property of a system that has been decomposed into a set of cohesive and loosely coupled modules. Thus the principles of abstraction, encapsulation and modularity are synergistic.'' Abstraction implies the identification of essential characteristics. Deciding upon the right set of abstractions for a given domain is the central problem in object-oriented design. The abstractions can be organised in a hierarchy, which allows for generalisation and aggregation. The details of modules are hidden, or `encapsulated'; only the essential characteristics that are necessary for the interaction with other modules are visible from the outside, so that the module can function as a `black box'.

In our work, we also have to decide what the essential characteristics are by which different types of information should be distinguished. A hierarchy in these characteristics leads to the composition of modules, which we shall discuss in section 3.1.3. Our modules carry visible labels that allow readers to locate them, and contain details that remain hidden until the module is consulted. The modules can also be created and used in different contexts. However, the interdependence of the information represented in related modules in a scientific article can be quite strong, so that the modules are not always `loosely coupled'. If the function of a module is, for example, to inform readers of the apparatus used in a particular experiment, a new version of that module can describe that apparatus in a clearer way. However, this new version cannot entirely replace the old module, because others may have cited the old text. In addition, the new version has to describe exactly the same experiment, because the module is connected to a module about the results generated in that particular experiment. Rules can be formulated to specify how different types of modules that are connected in different ways to other modules can be designed and revised.
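
The revision rule described above can be illustrated with a minimal Python sketch. The class name `Module` and its methods `revise` and `consult` are hypothetical illustrations and are not part of the modular model itself: a revision adds a new version under the same visible label, while earlier versions remain available because others may have cited them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Module:
    """Hypothetical module: a visible label plus encapsulated versions."""
    label: str                                          # visible from the outside
    versions: List[str] = field(default_factory=list)   # hidden details

    def revise(self, text: str) -> int:
        """Add a new version without discarding older, possibly cited ones."""
        self.versions.append(text)
        return len(self.versions)       # 1-based version number

    def consult(self, version: Optional[int] = None) -> str:
        """Return the requested version, or the latest one by default."""
        if not self.versions:
            raise LookupError("module has no versions yet")
        number = len(self.versions) if version is None else version
        if not 1 <= number <= len(self.versions):
            raise LookupError(f"no version {number} of module {self.label!r}")
        return self.versions[number - 1]


setup = Module("Experimental set-up")
setup.revise("Original description of the apparatus.")
setup.revise("Clearer description of the same apparatus.")
print(setup.consult())     # latest version
print(setup.consult(1))    # the old, still citable version
```

The sketch deliberately does not check that the new version describes the same experiment; in the model that constraint follows from the links to other modules and would have to be enforced by separate rules.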

... `relata'3.7
The term `relata' refers to the objects that are related.
... to.3.8
The difference between these entities, information and the symbolic representation of information is explained in section 2.1.1.
... modules.3.9
The importance of the composition has already been suggested for hypertext nodes by Halasz in his influential paper on hypermedia systems [Halasz, 1988]. In [Smith and Smith, 1977], a similar notion is discussed from the angle of database research, in terms of abstractions: ``An abstraction of some system is a model of that system in which certain details are deliberately omitted. [...] The objective is to allow users to heed details of the system which are relevant to the application and to ignore other details. In some applications a system may have too many relevant details for a single abstraction to be intellectually manageable. Such manageability can be provided by decomposing the model into a hierarchy of abstractions.''[p.105].
... generalisation.3.10
Whereas Halasz focuses in [Halasz, 1988] on aggregation as a mechanism to create composites, in [Smith and Smith, 1977] two types of abstractions are defined: aggregation and generalisation.
... module'.3.11
In the earlier articles [Harmsze et al., 1996] and [Harmsze and Kircz, 1998], we did not distinguish between these different kinds of complex modules and we used the adjective `compound' (in a non-chemical sense) for all kinds of complex modules.
... information.3.12
The modules carry a single characterisation (as specified in definition 3.1.1) that is complex: the components of the characterisation reflect the different dimensions of the typology.
... conditions.3.13
These conditions are based on, albeit less stringent than, the conditions mentioned in [Bailey, 1994] for adequate typologies. Notably, our typology does not have to be closed: types and even dimensions can be added, as long as they are consistent with the basic typology.
... dimensions'.3.14
Concepts are represented as regions and particular instances as points.
... characterisation.3.15
In the domain of Information Retrieval, n-dimensional vectors are typically used for the characterisation of documents. The multidimensional vector spaces in that domain are spanned by all key words or by all words in a document, such that n is the size of the vocabulary, which is a very large number [Van Rijsbergen, 1979]. These vector spaces are designed not to provide an intuitive picture of the characterisation of documents, but to support calculations over large document bases, e.g. of the probability of their relevance. Our characterisation spaces are spanned only by the different types of characterisation, allowing for an intuitive picture of the characterisation.
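
To make the contrast concrete, the following minimal Python sketch builds the kind of vocabulary-sized vectors used in Information Retrieval. The function names and the two example documents are hypothetical, and cosine similarity is used here only as one common way to compare such vectors; it is not the probabilistic relevance calculation referred to above.

```python
import math
from collections import Counter
from typing import List


def build_vocabulary(documents: List[str]) -> List[str]:
    """One dimension per distinct word occurring in the document base."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def term_vector(text: str, vocabulary: List[str]) -> List[int]:
    """Characterise a document by its word counts along every dimension."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(u: List[int], v: List[int]) -> float:
    """Cosine of the angle between two document vectors (0.0 if one is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


documents = ["ion beam collision", "ion pair formation"]   # toy document base
vocabulary = build_vocabulary(documents)
vectors = [term_vector(doc, vocabulary) for doc in documents]
print(len(vocabulary), cosine(vectors[0], vectors[1]))
```

Even in this toy case the number of dimensions equals the number of distinct words, which is why such spaces quickly become too large to picture, in contrast to a characterisation space with only a handful of dimensions.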
... range.3.16
It can even be characterised in an n-dimensional vector space spanned by all key words to reproduce the standard Information Retrieval characterisation.
... summarised3.17
When modules with different characterisations, in different specific characterisation spaces, are involved, the dimensions they have in common are taken into account. For this purpose, subspaces can be summarised into a single dimension (the five-dimensional characterisation of the collision given above, for example, reducing to the one-dimensional characterisation).
... data.3.18
Concrete examples of labels expressing the complex characterisation of particular modules on the subject of experimental molecular dynamics can be found in appendix [*].
... components.3.19
The choice proves to be consistent if it leads to a modular model which can be used in practice to write articles with a modular structure that satisfies the general definitions.
... approval.4.1
An artificial article cannot be used as a benchmark, and we could not simultaneously (co-)author linear and modular versions of new ``real'' articles in the domain of experimental molecular dynamics, because we are not researchers in that field. Therefore we have recast existing ``real'' articles in modular form.
... model.4.2
Of course, a model based solely on a standard, paper-based, format might be restricted to the characteristics of that format. Hence, we have taken into account the requirements for scientific articles in general, as well as the intrinsic new possibilities of the electronic medium, in the development of our modular model.
... research.4.3
We have defined the notion of an article as we use it in section 2.1.3.
... language.4.4
By a language we mean not only natural languages, but also mathematical formulae, figures, tables and other codes for the representation of scientific information.
... 4.5
In chapter 2, we already used insights from the field of speech communication to formulate the requirements of the interactants in scientific communication.
... articles.4.6
This analysis is facilitated by the generous co-operation of the leader of the analysed research program and senior author of the articles, Prof. Los. He gave us direct access to more information than has been represented in the articles, by providing a physical and technical background, and information on the (scientific and other) considerations for the development of the research programme.
... chapter.4.7
Earlier versions have been reported in [Harmsze et al., 1996], [Van der Tol and Harmsze, 1997] and [Harmsze and Kircz, 1998].
... module.4.8
Note that the conceptual function characterises the information by the role it plays in the article at hand. The same information can play a different role in another article. For example, the outcome of one article can be used in the methods of a subsequent article to solve a new problem.
... articles.4.9
Note that these bibliographic data describe the modules at hand. The term `bibliographic data' does not refer to the list of references of the modules.
....4.10
In [Paice, 1991], the rhetorical structure of scientific articles is studied for the purpose of indexing and abstracting on the basis of indicators. Paice identified the following components: Problem (with the Background and a specification of the Problem), Response (including the approach or hypothesis, the methodology, and analytical techniques), and Outcome (Results, Discussion, Conclusions, Practical implications, and Ideas for the future). Note that Paice aims to use the rhetorical structure to improve the retrieval of traditional articles, whereas we propose to make the structure more explicit in new, modular articles.
... detail.4.11
The sequence of the identification codes m2 to m6 follows the sequence of consecutive stages of the problem-solving process, which is also the preferred sequence in which to consult the article as a whole. The identification code m1 has been reserved for the module Meta-information introduced in the previous section. This module is included in the list of main modules, because the information it contains plays an important role in the article, although it does not really play a role in the research process. It even heads the list, because a sequential consultation of the article as a whole begins with meta-information, such as the title, the author names and the abstract.
... m2b.4.12
In the first version of the model in [Harmsze et al., 1996], this module, then called `Goal', was organised in a constituent module m2a called `Problem' and a constituent module m2b called `Embedding'. The present terminology and organisation are closer to the problem-solution pattern. In [Harmsze and Kircz, 1998], we used the term `Situated problem' for the compound module, but the term `Positioning' (which has been suggested by Prof. Los) expresses the meaning of the module in a more intuitive way.
... obtained.4.13
Experimental results depend in the first place on the experimental methods used to generate them. For instance, for electron affinities different values may be obtained using different methods. Therefore, the methods employed to determine the electron affinities are indicated in the reference tables in e.g. the Handbook of Chemistry and Physics [Lide, 1992, p.10, 180].
... discussion.4.14
Data analysis plays an important role in experimental sciences. For example, in the textbook `Statistical and computational methods in data analysis' [Brandt, 1970], it is stated: ``As for laboratory work, we define an experiment to be a strict following of a prescribed procedure, as a consequence of which a quantity or a set of quantities is obtained which constitute the result. These quantities are continuous (temperature, length, current) or discrete (number of particles, birthday of a person, one of three possible colours) in nature. Now, no matter how accurately all conditions and prescriptions are maintained, the result of repetitions of an experiment will generally differ. This is caused either by the intrinsic statistical nature of the phenomenon under investigation or by the finite accuracy of measurement.''

Another example is Braddick's textbook on the physics of experimental methods [Braddick, 1963], which also contains a chapter `Errors and the treatment of experimental results', explaining how systematic errors can be taken into account and how random errors can be calculated. Braddick has also included a chapter about `The natural limits of measurement', in which he discusses noise caused by thermal agitation, the particular nature of matter and electricity and the `uncertainty' limitations of quantum mechanics, which limit the accuracy of physical measurements. Not only the accuracy of the results is thus restricted, but also the domain in which they are valid. This domain is also determined by both these natural causes and the limitations of the instrumentation.

... module.4.15
In a first version of the model given in [Harmsze et al., 1996], a module Discussion was defined, which included the information that is now part of the Interpretation, plus the discussion of the reliability of the results. This original module is closer to the Discussion section of linear articles. The discussion of the reliability of the results has now been transferred to the Results module in order to get more systematic definitions of the modules: the discussion of the reliability of an object is part of the same module as the report on that object (see section 4.2.6). In addition, the Interpretation module, which can be very complex as we shall see in chapter 5, has been somewhat streamlined by the transfer of the discussion of the reliability of the results.
... Retrieval.4.16
A standard overview of the efforts and achievements in Information Retrieval is given in [Salton, 1983]. The Condorcet project [Van Bakel et al., 1998], for instance, is a domain-specific Information Retrieval project aiming firstly at the development of structured concept systems (ontologies) which can be used to define indexing terms, and secondly at the development of a method for semi-automatic assignment of such indexing concepts to machine-readable documents or document descriptions.
... whole4.17
As we stated in section 3.1.3, an article is a special case of a complex module.
...[Van Raan, 1988]).4.18
In a modular environment, citation studies can be specified by module. Two authors who use each other's work in their account of their experimental methods could, for example, be considered to be closer than people who only cite each other's work in a mesoscopic Situation module.
... summary.4.19
The author himself first decides whether a module summary is required; the referees then check if they agree with that decision.
... problem.4.20
An example of such a journal is `Measurement Science & Technology', previously called `Journal of Physics E: Scientific Instruments'.
... structure.4.21
The communicative function will also be used explicitly to characterise links between modules, when the target module has a particular communicative function with respect to the source module.
... environment.4.22
The functions and components of an abstract of a modular article are discussed in detail in [Van der Tol, 1999].
... modules4.23
As mentioned in section 3.1.2, when we speak of a link between modules, we imply that the link can also connect segments of modules, unless explicitly stated differently.
... source.4.24
The modular model allows for the assignment of weight to the links. In general, some types of relations can be stronger than others and, within a type of relations, some instances can be stronger than others. Nevertheless, in our practical rules given in appendix A we have endowed the links in the domain-specific modular structure only with types, not with weights. Following the link back and forth is equivalent to remaining at the original module, i.e. the reverse of a link is its inverse. In that case, a link and its `backtracking' reverse could simply be characterised by a single, oriented label, with two opposite sides. Such a definition of `monolithic' labels in our general modular model, however, may impede a possible later assignment of weight to links. In general, the reverse of the link does not necessarily have to be equal to its inverse, allowing for explicitly `lopsided' linking. For example, a particular link can be required to emphasise the fact that the target module provides arguments supporting a standpoint stated in the source module, without putting the same emphasis on the reverse of that link pointing back to the standpoint. Therefore, we explicitly define types and reverse types.
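
The distinction between a type and a reverse type can be sketched in a few lines of Python. The class `Link`, its field names and the example type labels are hypothetical and serve only as an illustration under the assumptions stated in this footnote: each direction carries its own type, and an optional weight may differ between the two directions, which allows `lopsided' linking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    """Hypothetical typed link between two modules."""
    source: str
    target: str
    link_type: str                   # type seen when going source -> target
    reverse_type: str                # type seen when going target -> source
    weight: Optional[float] = None   # optional emphasis; None means unweighted

    def reverse(self, weight: Optional[float] = None) -> "Link":
        """Build the link pointing back, with its own (possibly different) weight."""
        return Link(self.target, self.source,
                    self.reverse_type, self.link_type, weight)


# A strongly emphasised link from a standpoint to its supporting arguments ...
forward = Link("standpoint-module", "argument-module",
               "is argued for in", "argues for", weight=0.9)
# ... and a weaker link pointing back to the standpoint.
backward = forward.reverse(weight=0.2)
print(backward)
```

With a single `monolithic' label the two directions would have to share one description; keeping two separate `Link` objects leaves room for assigning weights later without changing the types.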
... module.4.25
The various relations are defined in sections 4.3.2 and 4.3.3.
... be:4.26
These different types of relata correspond to the different levels we distinguished in section 2.1.1: the basic level of the universe itself, and three representation levels: the conceptual level of the information, the symbolic level of, for example, the text and the technical level of e.g. the computer file. We do not identify `technical' relations between signals; this type of relations could be added to the modular model.
....4.27
An earlier version of this typology is introduced in [Harmsze and Kircz, 1998].
... article.4.28
Organisational links are also distinguished in the classical paper about hypertext [Conklin, 1987] and in [Baron et al., 1996]. The inclusive links distinguished in [De Rose, 1989] are organisational in nature as well.
... modules.4.29
Except for `representational relations', which will be defined in the following.
... project.4.30
The modular model allows for a more or a less detailed labelling of links representing the proximity-based relation. For a particular domain, the distinction of internal and external relations may suffice, whereas for another domain more elaborate proximity-based relations could be defined. The proximity-based relation depends on the notion of the distance between modules, which can be provided by the geometry of the characterisation space, in particular the bibliographic component of that space: articles by collaborating or competing research groups, who refer to each other's work, can be considered to be closer than articles by unrelated groups, but not as close as articles issued from the same project.
... explicitly.4.31
In [Harmsze and Kircz, 1998] this notion of the difference between information with a larger and a narrower range is included in the content-oriented category. In the model in its present form, we have included this notion in the organisational category, because the range of the information leads to a practical distinction of modules, rather than a distinction of different central scientific concepts. Furthermore, the links representing organisational relations connect only complete modules, rather than segments of modules.
... relations4.32
In [Harmsze and Kircz, 1998], we called the same category `referential relations'. We have renamed them in order to distinguish our relations from the more general referential links used in [Conklin, 1987]. These relations correspond to the relational links in [De Rose, 1989], that also include annotational links. Our scientific discourse relations correspond to a subtype of these referential links. In De Rose's terminology, they are associative [De Rose, 1989], in Trigg's terms `normal' [Trigg, 1983] and in the terminology of [Baron et al., 1996] these relations are content-based.
... relata.4.33
When relata are speech acts, the term `communicative function' refers to their `illocutionary' level, and the term `content' refers to their `propositional' level [Austin, 1962], [Searle, 1969]. Here we use the terms `communicative function' and `content' not only for speech acts (i.e. for textual entities, such as a phrase or a more complex set of sentences), but also for non-textual relata, such as figures, tables and formulae. For example, a figure can serve as a clarification of the text, and the reader can `zoom in' on a figure to obtain more details. An interesting set of relations between text elements is defined in the Rhetorical Structure Theory, in which the organisation of a text is described in terms of relations between its parts [Mann and Thomson, 1988]. In this set of relations, the distinction between these two levels is not taken into account systematically.
... distinguished.4.34
The difference between explanatory and argumentative discourse is discussed in [Houtlosser, 1995] and [Snoeck Henkemans, 1999]. Explanation is aimed at increasing the reader's understanding of how a particular state of affairs has come into being. The explained statements (explananda) must refer to a factual state of affairs and the explaining statements (explanantes) must state a cause of this state of affairs. Argumentation is aimed at increasing the reader's acceptance of a standpoint. Unlike in the case of explanation, there are no restrictions on the propositional content of the statements serving as standpoint and argument.
... articles.4.35
A special case of reasoning is theorem-proving. We do not define this last case in greater detail, because the modular model concentrates on experimental sciences and formal proofs typically are not used in that domain.
... distinction.4.36
Depending on the perspective and the purpose of the classification, argumentation can be classified in different ways [Van Eemeren et al., 1996]. In a scientific context, it may, for instance, be useful to distinguish between rigorous proof and justification. In the justification, the author can argue that his methods were correct and appropriate, by stating the grounds for the choices he made. In the discussion of the internal structure of modules in section 4.2.6, we distinguished different types of argumentation by the type of standpoint, concerning reliability and concerning relevance. The modular model allows for the distinction of subtypes of the argumentation relation reflecting these different types of argumentation, although we shall not do so for the domain of experimental molecular dynamics.
... one).4.37
The aggregation relation and the generalisation relation refer to different aspects of the same two complex, central concepts: the generalisation emphasises the fact that both concepts deal with an experimental set-up, one specific and one more general. The aggregation indicates that between the same two central concepts there is a difference in aggregation level: one is complete and the other one a component.
... corpus5.1
The developments in the field of molecular dynamics leading to the research project at AMOLF are described in greater detail in the mesoscopic  Situation module MESO-m2a included in Appendix [*].
... collisions5.2
See the Theoretical methods module MESO-m3c-mod in Appendix [*] for a detailed account of that model.
... experiments.5.3
See, for example, [Manz and Wöste, 1995].
... R45.4
We have assigned codes to all publications in the corpus, indicating reviews with R and a number. Articles are referred to with A and theses with T. The numbers are given in the bibliography provided in Appendix B. The identification codes are used and presented in greater detail in Appendix [*].
... Netherlands.5.5
Private communication, Prof. Dr J. Los.
... experiment.5.6
The experimental set-up for the molecular dynamics described in the corpus can be reconstructed and used in another laboratory, unlike, for example, the set-up for modern high energy physics.
... readers.5.7
In the hypertext version of the appendix, the modularised articles are realised rudimentarily using the (insufficient) hypertext tools we have at our disposal, and in the printable version we provide them as a simulation of hypertext on paper.
...Discussion.5.8
Please note that this does not imply that all information in these sections is recast in the corresponding module.
...Interpretation.5.9
In the modularised version of A05 in Appendix C, we have indicated typographically which parts of the modularised article have a one-to-one correspondence with parts of the original version, which parts correspond to rephrased parts of the original article and which parts have been added by us, as well as what information is represented in more than one module. We have suppressed these typographical indications in the modularised version of A08 in order to improve the readability.
... difficult.5.10
See, for example, the discussion of the module Theoretical methods in section 5.3.3.
... version.5.11
We aim to get a qualitative idea of the size of the different versions, rather than a full statistical analysis. As an indication of the size, we count the number of words of the text in the sections and in the modules. We take into account neither graphical representations, which in particular form a large part of the results modules, nor the meta-information. We do take into account the text associated with the links, because the links are an integral part of the modular version and their creation is part of the author's task.
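
The counting convention described above can be written down as a short Python sketch. The dictionary layout (`text`, `link_texts`, `is_meta`) and the example contents are hypothetical; the actual counts reported in this work were not necessarily obtained in this way.

```python
from typing import Dict, List


def word_count(text: str) -> int:
    """Number of whitespace-separated words in a piece of text."""
    return len(text.split())


def module_size(module: Dict) -> int:
    """Words of the module text plus the text associated with its links.

    Graphical representations are simply not part of the input, and
    meta-information modules are counted as size 0, as described above.
    """
    if module.get("is_meta", False):
        return 0
    link_words = sum(word_count(t) for t in module.get("link_texts", []))
    return word_count(module["text"]) + link_words


def version_size(modules: List[Dict]) -> int:
    """Total size of a modular version as the sum over its modules."""
    return sum(module_size(m) for m in modules)


modules = [
    {"text": "Title, authors and abstract.", "is_meta": True},
    {"text": "The beam is crossed with a gas jet.",
     "link_texts": ["see the treated results"]},
]
print(version_size(modules))
```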
... module.5.12
The relata of this dependency relation can also be the entities that the information is about.
... module.5.13
According to the definition of a module in section 3.1.1, an adequate module is self-contained, in the sense that an expert can consult it separately.
... view.5.14
This presentation of the details can be compared to footnotes. However, we consider it to be part of the module itself, rather than a separate unit that has to be linked to the main text. Therefore, we have not defined an explicit type to express the relation between the main line and these details, contrary to, for example, [De Rose, 1989] and [Weber, 1995], who distinguish `annotational links'.
....5.15
We have estimated the overlap by counting the number of words in the textual representation of the information in one module and calculating the proportion of the text that it shares with the various other modules. We are interested in the information overlap between modules, rather than the overlap in literal text. However, information as such is difficult to quantify. Therefore, we count the number of shared words, but we consider different representations, e.g. different phrasings, of similar information to be similar.
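
A purely literal version of this estimate can be sketched in Python as follows. The function name is hypothetical, and the sketch captures only shared words; it cannot recognise that different phrasings represent similar information, which in our estimate is a matter of judgement rather than of automatic counting.

```python
from collections import Counter


def shared_word_proportion(module_text: str, other_text: str) -> float:
    """Proportion of the words of one module that also occur in another.

    Words are compared case-insensitively and counted with multiplicity,
    so a word used twice is shared twice only if the other text also
    contains it at least twice.
    """
    words = Counter(module_text.lower().split())
    other = Counter(other_text.lower().split())
    total = sum(words.values())
    if total == 0:
        return 0.0
    shared = sum((words & other).values())   # multiset intersection
    return shared / total


print(shared_word_proportion("the beam hits the target", "the target"))
```

Note that the measure is not symmetric: the proportion is taken with respect to the size of the first module, so a short module can be entirely contained in a long one while the reverse proportion stays small.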
... chosen.5.16
Some possibilities and pitfalls are given in [Weber, 1995]. We discuss the implementation in more detail in section 6.2.2.
... hand.5.17
The methods that are represented in the Methods module are methods that are used in response to a problem that is to be solved. If the goal of the research is to develop a new method, this method forms the outcome of the research, rather than part of the response. This `technical' research is of a slightly different type than experimental research, so that the modular structure should be adapted accordingly.
... ones.5.18
On the level of a sub-problem-solution pattern, the results and the interpretation are not easily separable. For example, in the module Treated results the treatment and presentation techniques (as presented in MESO-m3c-treat) are used in response to the problem of the data analysis, and the outcome of that sub-problem-solving process is a set of carefully analysed and clearly presented treated results. In that sub-problem-solving process, we cannot distinguish between results, interpretation and findings. However, that is not necessary, as the stages of this process are not explicitly labelled.
... consideration.5.19
In the articles we have analysed, the determination of the electron affinity was considered a secondary goal of the research.
... relations.5.20
In this table, we have not taken into account the links starting from the module Meta-information, or any of its constituents, as the source.
... modules.5.21
Organisational relations do play a predominant role in the text of the module Meta-information or any of its constituents, in particular in the Abstract.
... every5.22
The exception is the module Positioning, in which the link to the Acknowledgements, created for the step back along the complete route, also expresses an administrative relation.
... confirmed.6.1
According to [Buxton and Meadows, 1978, p.177], the proportion that the Discussion section takes up in printed articles in social sciences (as opposed to the sections Introduction and Methods) differs from the proportion that its counterpart takes up in natural sciences. Buxton and Meadows explain this difference as follows. In natural sciences, the theoretical model used to interpret experimental results is usually part of the generally accepted paradigm. Therefore, it is clearer what measurements are relevant. However, once the measurements have been made, the authors have to clarify in the Discussion how the specific, new experimental results relate to the general, existing model. In hypothesis-testing research in social sciences, authors must formulate a new model and an explicit hypothesis at the outset. Once they have formulated the hypothesis and performed the empirical work to test it, they only have to point out in the Discussion whether the hypothesis was confirmed by the experimental results, so that this section can be short. Similarly, in a modular environment, the module Interpretation for this hypothesis-testing research may be shorter and less complex than the Interpretation modules that we have created in our work. This need not be the case for explorative research in the social sciences and humanities.
... information.6.2
Many standards exist for the textual representation of information. In an electronic environment, non-textual representations will play a more important role. Therefore, standards must be developed for such representations as well.
... fashion.6.3
Usually, modules will coincide with storage and presentation units, i.e. with files and with hypertext nodes. However, we emphasise that this is not necessarily the case. If a particular module is too large and cumbersome to be rapidly downloaded in its entirety, it can be stored in more than one file.
... rules.6.4
The DTD defines the elements that the article can consist of (e.g. different kinds of complex modules and elementary modules, module labels, overlapping text, figures, link labels) and rules for the relations between elements (e.g. the fact that a complex module has to be connected, directly or indirectly, to at least one elementary module by means of a link expressing a hierarchical relation).
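The connectivity rule mentioned above can be illustrated with a minimal Python sketch. It is not the DTD itself and not an SGML/XML validator: the dictionaries `KINDS` and `HIERARCHICAL_LINKS`, the module names in them, and both function names are hypothetical stand-ins chosen only to show how the rule could be checked once an article has been parsed.

```python
from typing import Dict, List, Optional, Set

# Hypothetical parsed article: module name -> kind of module.
KINDS: Dict[str, str] = {
    "Article": "complex",
    "Results": "complex",
    "Raw data": "elementary",
    "Treated results": "elementary",
    "Empty shell": "complex",
}

# Hypothetical hierarchical links: complex module -> its constituents.
HIERARCHICAL_LINKS: Dict[str, List[str]] = {
    "Article": ["Results", "Empty shell"],
    "Results": ["Raw data", "Treated results"],
    "Empty shell": [],
}


def reaches_elementary(
    module: str,
    kinds: Dict[str, str],
    links: Dict[str, List[str]],
    seen: Optional[Set[str]] = None,
) -> bool:
    """True if `module` is elementary, or reaches an elementary module
    directly or indirectly by following hierarchical links."""
    if seen is None:
        seen = set()
    if module in seen:  # already visited; also guards against cycles
        return False
    seen.add(module)
    if kinds[module] == "elementary":
        return True
    return any(
        reaches_elementary(child, kinds, links, seen)
        for child in links.get(module, [])
    )


def invalid_complex_modules(
    kinds: Dict[str, str], links: Dict[str, List[str]]
) -> List[str]:
    """Names of complex modules that violate the connectivity rule."""
    return sorted(
        name
        for name, kind in kinds.items()
        if kind == "complex" and not reaches_elementary(name, kinds, links)
    )


print(invalid_complex_modules(KINDS, HIERARCHICAL_LINKS))
```

In this made-up example only the module `Empty shell` is reported, because it is complex but has no hierarchical link leading to an elementary module.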
... module.6.5
Using the XML Linking Language [Maler and DeRose, in progress], it will be possible to specify bi-directional links, and to create link databases that allow for filtering, sorting, analysing, and processing link collections.
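What a link database with bi-directional lookup, filtering and sorting might look like can be sketched in a few lines of Python. This is plain Python, not XML Linking Language syntax; the classes `Link` and `LinkDatabase`, their methods, the example links and the relation labels are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Link:
    source: str    # module where the link starts
    target: str    # module where the link ends
    relation: str  # hypothetical label for the type of relation


class LinkDatabase:
    """Links stored apart from the modules, so they can be queried from either end."""

    def __init__(self, links: Iterable[Link]) -> None:
        self._links: List[Link] = list(links)

    def touching(self, module: str, relation: Optional[str] = None) -> List[Link]:
        """Links that start OR end at `module`, optionally filtered by relation."""
        return [
            link
            for link in self._links
            if module in (link.source, link.target)
            and (relation is None or link.relation == relation)
        ]

    def sorted_by_relation(self) -> List[Link]:
        """All links, ordered by relation label, then by source and target."""
        return sorted(
            self._links, key=lambda l: (l.relation, l.source, l.target)
        )


# Made-up example links between modules.
EXAMPLE_LINKS = [
    Link("Methods", "Treated results", "hierarchical"),
    Link("Positioning", "Acknowledgements", "administrative"),
    Link("Treated results", "Interpretation", "sequential"),
]

db = LinkDatabase(EXAMPLE_LINKS)
for link in db.touching("Treated results"):
    print(link.source, "->", link.target, f"({link.relation})")
```

Because the links are kept in a separate collection rather than embedded in the source module, the lookup for `Treated results` returns both the link that ends there and the link that starts there.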
... automatically7.1
Requirements for such authoring tools are given in section 6.2.2.
...chemistry9.1
If the article had reported and discussed biological or medicinal results, the section Results and discussion would have included subsections about those types of results as well.
... isoform9.2
CPT-1 exists in at least two isoforms with different physical and kinetic properties. Liver and skeletal muscle each contain a different isoform, and the heart contains both of them. The compound that the authors have synthesised inhibits the liver isoform.
... introduction.9.3
This project is mentioned in a later version of this article [Van Eemeren et al., 1995], which has been published in a scholarly journal. In that version, the research is not reported explicitly in terms of the testing of hypotheses. Its main sections are titled 1. Introduction, 2. Theoretical background, 3. Research questions, 4. Design, 5. Results, 6. Discussion.
Frederique Harmsze
2000-01-04