Summary of the PhD thesis "A modular structure for scientific articles in an electronic environment" (to be published, 2000), Frédérique Harmsze

Summary

In this thesis we have developed a modular model for scientific articles. We propose to group the different kinds of information into different modules, and to express the coherence of this information both in the composition of modules into complex modules and in explicitly characterised links that describe the various relations between modules.

Chapter 1 Introduction

In chapter 1, we have sketched the context of this thesis. Emerging electronic publication technology could make the overloaded process of scientific communication via articles more effective and efficient. The linear format of the traditional article is tailored to the paper medium but it is not necessarily equally suited to new media. We have aimed to define a structure for electronic scientific articles that fulfils the authors' and readers' needs and that takes full advantage of the specific characteristics of electronic media.

Chapter 2 Effective and efficient communication via scientific articles

In chapter 2, we have set out the main features of scientific communication through articles and the requirements of the scientists involved in that process. Readers require a clear presentation of precisely the information that is relevant to their needs and sufficiently reliable, without being encumbered by redundant information. Authors require the capability to present all their work in a convenient way and, since they wish to communicate their work, they also require readers to be able to locate, retrieve and consult it.

The main feature of new electronic media with respect to the structure of articles is the possibility of hypertext: storing and presenting information in a well-defined, distributed way, enhanced by a linking system. The scientific discourse in a printed article is presented and stored as a linear essay, and can only be retrieved in its entirety. In practice, however, readers often consult only particular parts of the article. In an electronic environment, this reading strategy can be anticipated: an electronic article can represent a network of information units within the network of all scientific information available in published work.

This chapter culminates in a set of 'communication criteria' that the structure of electronic scientific articles must meet in order to allow for effective and efficient communication.

Chapter 3 A modular presentation of information: general definitions

In chapter 3, we have introduced the notion of a modular structure as a pattern of modules and links between them. A module is defined as a uniquely characterised and self-contained representation of a conceptual information unit, aimed at communicating that information. The most basic component of the scientific article is the elementary module. Starting from this elementary level, modules can be synthesised into complex modules.

In the modular structure, the coherence of the information is expressed not only by the composition of modules but also by means of links. A link is defined as a uniquely characterised, explicit, directed connection between entire modules or particular segments of modules, which represents one or more different kinds of relevant relations. In practice, a link is an explicitly labelled hyperlink.
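As an illustration only (the thesis does not prescribe any data format), the definitions of module and link could be encoded as follows. This is a minimal Python sketch; the class names, the `module_id` scheme and the use of character offsets to mark segments are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Module:
    """A uniquely identified, self-contained representation of one information unit."""
    module_id: str                  # unique identifier (hypothetical scheme)
    title: str
    text: str = ""                  # content of an elementary module
    constituents: list[Module] = field(default_factory=list)  # non-empty => complex module

    def is_complex(self) -> bool:
        return len(self.constituents) > 0


@dataclass(frozen=True)
class Link:
    """An explicit, directed, labelled connection from a source to a target."""
    link_id: str
    source: str                     # module_id of the source module
    target: str                     # module_id of the target module
    relations: tuple[str, ...]      # one or more relation types expressed by the link
    source_segment: Optional[tuple[int, int]] = None  # (start, end) offsets; None = whole module
    target_segment: Optional[tuple[int, int]] = None

    def __post_init__(self) -> None:
        # A link must represent at least one relation.
        if not self.relations:
            raise ValueError("a link must express at least one relation")


# Usage: two elementary modules composed into a complex module, plus one link.
situation = Module("pos-1", "Situation", text="Context of the research.")
problem = Module("pos-2", "Central problem", text="Question addressed.")
positioning = Module("pos", "Positioning", constituents=[situation, problem])
link = Link("l-1", source="pos-1", target="pos-2", relations=("elucidation",))
```

Keeping links as separate objects, rather than as references stored inside modules, mirrors the idea that links are characterised in their own right.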

Chapter 4 A modular model for experimental sciences

In chapter 4, we have presented our modular model for experimental science. To determine what can be considered as similar information that is to be represented in a single module and to determine how the resulting module is to be characterised, the general definition of a module has been complemented with a domain-specific typology for the information. Likewise the definition of a link has been complemented with a typology specifying the types of relations that can be expressed in a link.

The first main component of the model is the typology for the information. In the characterisation of the information we have chosen to take into account four different aspects: the conceptual function of the information, its domain-oriented content, its range, and its bibliographic characteristics.

Firstly, we have developed a characterisation of the information by its conceptual function, i.e. by the role that the information plays in the problem-solving process of the research. This forms the core of the model. The main modules of scientific articles represent information units distinguished by this characterisation. On the basis of the conceptual function, we have defined the following modules: Positioning (composed of Situation and Central problem), Methods (which can consist of Theoretical methods, Numerical methods and Experimental methods), Results (which can be composed of Raw data and Treated results), Interpretation, and Outcome (made up of Findings and Leads for further research). (An accompanying figure gives an overview of these modules.)
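The conceptual-function typology can be read as a small hierarchy of module types. The following sketch encodes it as a Python dictionary; the names `CONCEPTUAL_FUNCTION_MODULES`, `parent_of` and `is_valid_composition` are hypothetical, and the sketch does not distinguish obligatory from optional constituents.

```python
from __future__ import annotations

from typing import Iterable, Optional

# Hypothetical encoding of the conceptual-function typology: each main module
# maps to the constituent module types it may be composed of.
CONCEPTUAL_FUNCTION_MODULES: dict[str, tuple[str, ...]] = {
    "Positioning": ("Situation", "Central problem"),
    "Methods": ("Theoretical methods", "Numerical methods", "Experimental methods"),
    "Results": ("Raw data", "Treated results"),
    "Interpretation": (),
    "Outcome": ("Findings", "Leads for further research"),
}


def parent_of(module_type: str) -> Optional[str]:
    """Return the main module a constituent type belongs to (None for main or unknown types)."""
    for parent, children in CONCEPTUAL_FUNCTION_MODULES.items():
        if module_type in children:
            return parent
    return None


def is_valid_composition(parent: str, constituents: Iterable[str]) -> bool:
    """Check that every constituent type is one the parent module may contain."""
    allowed = CONCEPTUAL_FUNCTION_MODULES.get(parent)
    if allowed is None:
        return False
    return all(c in allowed for c in constituents)
```

For example, `is_valid_composition("Methods", ["Experimental methods"])` accepts a Methods module that contains only experimental methods, while a Findings module placed under Results is rejected.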

Secondly, the modules distinguished by the conceptual function are further refined from the domain-oriented point of view. For example, different kinds of results for different types of chemical reactions are represented in different modules. In this case, the module Treated results is a complex module containing constituent modules with the specific results. Characterising information by its domain-oriented content, in terms of key words or other index terms, is standard practice; we have not developed new terms.

Thirdly, we introduce in our model a characterisation of the information by its range. On the basis of the range, we can distinguish between information that is unique to the article itself (microscopic), information that plays a role in the research project from which the article stems (mesoscopic), and information that plays a role in the field in general (macroscopic).

Finally, the modules that represent information units distinguished by these three aspects are labelled with specified bibliographic data, such as the authors' names and the publication date. As with the characterisation by domain-oriented content, bibliographic labelling is already standard practice, so we have not developed a new bibliographic approach.
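The four aspects of the characterisation could be recorded per module roughly as below. This is a hypothetical sketch: the field names, the `InfoRange` enumeration and the example values are illustrative assumptions, and the domain-oriented and bibliographic fields merely stand in for existing indexing and bibliographic standards. The helper `is_reusable` reflects the observation, made in the evaluation, that mesoscopic and macroscopic modules lend themselves to re-use in new publications.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class InfoRange(Enum):
    MICROSCOPIC = "unique to the article itself"
    MESOSCOPIC = "plays a role in the research project"
    MACROSCOPIC = "plays a role in the field in general"


@dataclass(frozen=True)
class Characterisation:
    """The four aspects by which a module is characterised (hypothetical field names)."""
    conceptual_function: str        # e.g. "Experimental methods"
    index_terms: frozenset[str]     # domain-oriented key words from existing vocabularies
    info_range: InfoRange
    authors: tuple[str, ...]        # bibliographic data
    publication_date: str           # bibliographic data

    def is_reusable(self) -> bool:
        """Mesoscopic and macroscopic modules are candidates for re-use in later articles."""
        return self.info_range is not InfoRange.MICROSCOPIC


# Hypothetical example: a description of an experimental set-up shared within a project.
setup = Characterisation(
    conceptual_function="Experimental methods",
    index_terms=frozenset({"molecular dynamics", "experimental set-up"}),
    info_range=InfoRange.MESOSCOPIC,
    authors=("Author, A.",),
    publication_date="2000",
)
```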

In order to express all relevant bibliographic information we have also defined a module Meta-information in addition to the modules representing scientific information. This module serves as a linchpin holding the article together. Important components of this module are the Map of contents, which is an extension of the linear table of contents, and the Abstract.

The second main component of our modular model is a systematic typology for the links. The type of a link is determined by the different kinds of relations it expresses. We have found it both feasible and useful to distinguish two main classes of relations: organisational relations and scientific discourse relations. (See the figures providing an overview of the organisational and the scientific discourse relations we distinguish).

Links representing organisational relations, which can be identified between entire modules, organise the modules into an explicit network. A useful type of organisational relation yields an 'essay-type sequential path', which guides the reader along the important modules in a way that mimics a linear essay. Links expressing 'administrative relations' allow the reader to switch between the modules representing scientific information and modules with meta-information about the article, such as the Map of contents. We also define 'hierarchical relations', 'range-based relations', 'proximity-based relations' and 'representational relations'.
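Following an essay-type sequential path amounts to walking a chain of links of one particular relation type while ignoring all others. A minimal sketch, assuming links are stored as hypothetical `(source, target, relation)` triples and that the sequential relation is labelled `"sequential"`:

```python
from __future__ import annotations

from typing import Iterable


def sequential_path(links: Iterable[tuple[str, str, str]], start: str) -> list[str]:
    """Follow links labelled 'sequential' from `start`; return module ids in reading order.

    Links of any other relation type (e.g. 'administrative') are ignored.
    """
    next_of: dict[str, str] = {}
    for source, target, relation in links:
        if relation != "sequential":
            continue
        if source in next_of:
            raise ValueError(f"{source!r} has more than one sequential successor")
        next_of[source] = target

    path, seen = [start], {start}
    while path[-1] in next_of:
        nxt = next_of[path[-1]]
        if nxt in seen:
            raise ValueError("the sequential path contains a cycle")
        path.append(nxt)
        seen.add(nxt)
    return path


links = [
    ("Positioning", "Methods", "sequential"),
    ("Methods", "Results", "sequential"),
    ("Results", "Interpretation", "sequential"),
    ("Interpretation", "Outcome", "sequential"),
    ("Results", "Map of contents", "administrative"),
]
print(sequential_path(links, "Positioning"))
# ['Positioning', 'Methods', 'Results', 'Interpretation', 'Outcome']
```

Requiring at most one sequential successor per module keeps the path linear, which is what lets navigation along it resemble turning pages.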

The class of scientific discourse relations is very broad. We distinguish two main subclasses of scientific discourse relations. The first is the subclass of 'relations based on the communicative function', which can be identified between entire modules or parts of modules: 'elucidation relations' and 'argumentation relations'. The second is the subclass of 'content relations', which can be identified not only between modules or parts of modules, but also between the information units underlying the modules and between the entities that these information units are about. For instance, specific content relations express the dependency of particular results on the methods used to generate them, while others express the agreement between particular results, or the fact that a module elaborates on a particular part of another module. We furthermore distinguish 'synthesis relations' and 'causality relations'.
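One way to make the link typology operational is to map each relation type to its class and subclass. In the sketch below, the labels `dependency`, `agreement` and `elaboration` are hypothetical names for the content relations described above, and listing synthesis and causality relations directly under scientific discourse relations is our reading of the text rather than a structure taken from the thesis. The helper `may_link_segments` reflects the statement that organisational relations are identified between entire modules only.

```python
from __future__ import annotations

# Hypothetical mapping from relation type to its class path in the link typology.
RELATION_CLASSES: dict[str, tuple[str, ...]] = {
    # Organisational relations: identified between entire modules.
    "sequential": ("organisational",),
    "administrative": ("organisational",),
    "hierarchical": ("organisational",),
    "range-based": ("organisational",),
    "proximity-based": ("organisational",),
    "representational": ("organisational",),
    # Scientific discourse relations based on the communicative function.
    "elucidation": ("scientific discourse", "communicative function"),
    "argumentation": ("scientific discourse", "communicative function"),
    # Content relations (hypothetical labels for the examples in the text).
    "dependency": ("scientific discourse", "content"),
    "agreement": ("scientific discourse", "content"),
    "elaboration": ("scientific discourse", "content"),
    # Further scientific discourse relations (placement is an assumption).
    "synthesis": ("scientific discourse",),
    "causality": ("scientific discourse",),
}


def relation_class(relation: str) -> tuple[str, ...]:
    """Return the class path of a relation type, e.g. ('scientific discourse', 'content')."""
    if relation not in RELATION_CLASSES:
        raise ValueError(f"unknown relation type: {relation!r}")
    return RELATION_CLASSES[relation]


def may_link_segments(relation: str) -> bool:
    """Organisational relations connect entire modules; discourse relations may also connect segments."""
    return relation_class(relation)[0] != "organisational"
```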

To ensure that the model is grounded in scientific practice, we have developed the model in conjunction with an analysis of published articles. These articles concern experimental molecular dynamics, which can be considered a prototypical experimental science. Our sample was a coherent corpus of high-quality articles in this field. We have analysed original articles and recast them into modular form; in other words, we have 'modularised' linear articles.

Modularising linear articles has enabled us to compare the modular versions with their original versions in the light of the authors' and readers' requirements. We have used our experiences in the modularisation process iteratively, as feedback, to improve the modular structure. Thus, the evaluation of modularised articles has allowed us to determine useful specific rules for the creation of adequate modular articles on experimental molecular dynamics. These are given in appendix A. The instructions are in principle intended for authors writing modular articles from scratch, although we have applied them in practice as guidelines for the modularisation of existing articles. Two modularised articles are given in appendix C as examples. Guided by this specific analysis, we have been able to formulate a more general modular model for experimental science. Modular models for other domains and types of publications can be derived from this particular model.

Chapter 5 Evaluation of the modular model

In chapter 5, we have evaluated the feasibility and the usefulness of our modular model by discussing the modularisation process and the resulting modularised articles presented in appendix C.

Modular articles are not completely different from linear articles. The modules distinguished by the conceptual function of the information have been defined analogously to the sections in traditional linear scientific articles published on paper. Therefore, the basic structure of a modular article resembles the basic structure of a traditional article. Also, in modular electronic articles the same systems of domain-oriented index terms and bibliographical labels can be used as for linear, printed articles. However, the modules turn out to differ too much from their corresponding sections to make it practical or even possible to automatically transform linear articles into modular articles. The most efficient way of creating a modular article is to do so directly, instead of recasting a linear article in modular form. In order to be able to write a modular article in practice the author will need appropriate authoring tools.

The key difference between the modular structure and the traditional section structure is that the modular structure is more explicit, systematic and fine-grained. It is thus clearer and more flexible, allowing the reader to follow different paths through the information network and facilitating multiple usage by the author. Furthermore, in the modular case the multidimensional characterisation of the modules and of the links allows for more complex searches. The index terms associated with a particular module can be far more precise than the index terms associated with an entire article, and the traditional labels are complemented with range-based labels and labels associated with the conceptual function. In addition, the characterisation of the modules is complemented with an explicit characterisation of the links, which can be taken into account in complex searches.
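A complex search of this kind combines a module's own characterisation with the characterisation of its links. The sketch below uses hypothetical data, field names and relation labels: it retrieves Treated results modules indexed with a given term that have a dependency link to an Experimental methods module. It illustrates the idea only and is not a proposal for an actual retrieval system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleRecord:
    module_id: str
    conceptual_function: str
    index_terms: frozenset[str]
    info_range: str  # "microscopic", "mesoscopic" or "macroscopic"


def complex_search(modules, links, *, function, term, relation, target_function):
    """Return ids of modules that match on their own characterisation (conceptual
    function and index term) AND have an outgoing link of the given relation type
    to a module with `target_function`. Links are (source, target, relation) triples."""
    by_id = {m.module_id: m for m in modules}
    hits = []
    for m in modules:
        if m.conceptual_function != function or term not in m.index_terms:
            continue
        for source, target, rel in links:
            target_module = by_id.get(target)
            if (source == m.module_id and rel == relation
                    and target_module is not None
                    and target_module.conceptual_function == target_function):
                hits.append(m.module_id)
                break
    return hits


modules = [
    ModuleRecord("r1", "Treated results", frozenset({"reaction A"}), "microscopic"),
    ModuleRecord("r2", "Treated results", frozenset({"reaction B"}), "microscopic"),
    ModuleRecord("m1", "Experimental methods", frozenset({"set-up"}), "mesoscopic"),
]
links = [("r1", "m1", "dependency"), ("r2", "m1", "dependency")]
print(complex_search(modules, links, function="Treated results", term="reaction A",
                     relation="dependency", target_function="Experimental methods"))
# ['r1']
```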

Modular articles are designed for selective reading rather than for reading an entire article linearly. The first consequence is that when a reader wants to consult all components of the article he has to navigate. However, the creation of the sequential path has made this navigation as easy as turning pages. The second consequence is that in order to make each module self-contained we had to introduce different kinds of overlap between the modules. The overlap could be disturbing for readers who consult more than one module, but this problem can be solved in the implementation by using the possibilities of electronic media.

Mesoscopic and macroscopic modules increase the efficiency for the author, since published modules can be integrated into new modular publications by means of explicitly characterised links. In this way, the description of an apparatus or of a theoretical model, the presentation of important results, and other representations of information that are suitable for multiple usage can be re-used in a convenient way that gives full credit to the original authors. These modules increase the efficiency for the reader as well: firstly, they are more complete than the corresponding parts of the individual original articles, and secondly, the reader can easily skip them.

We have found that mesoscopic information can easily be represented in the mesoscopic modules Situation, Central problem (about the context and the central problem of the research project as a whole), Experimental methods and Theoretical methods. The most obvious candidates for macroscopic modules are Experimental methods and Theoretical methods (with established experimental set-ups and theoretical models and theories).

The part of the article that lends itself best to modularisation is the account of the experimental methods, since in the original articles the description of the set-up was already confined to a particular section. In a modular environment, Experimental methods modules can easily be created by the author. The reader can consult or skip them efficiently, depending on whether he needs experimental details.

The modules Positioning and Outcome, and their constituent modules, were less easy to recast. However, defining and determining rules for these modules was quite straightforward. It is probably easier to write these modules directly in modular form. They are quite useful to the reader, the Situation being particularly useful for less informed readers. Particularly when the interpretation of the results is complex, the module Findings plays an important role in clarifying what has been achieved in the article, or in directly informing readers who only wish to consult the findings of the work. Defining and determining concrete rules for the module Results and its constituents was more difficult. Applying the rules to form modules, however, was not difficult, and the resulting modules are likely to be consulted by many readers.

The part of the article that is most difficult to modularise concerns the interpretation of the results. This is probably the most difficult part to write in any representation of scientific research, since the interpretation involves a complex scientific discourse. Making the structure of this discourse explicit in the modular structure does clarify it, so that the modular version of the article turns out to be more readable than the original version.

The coherence of the information is adequately expressed by the composition of the modules and by the characterised links. In the modularised articles, several layers of complex modules were required to make the structure of the information explicit. The 'module summaries' of the constituents of complex modules, and the relations between these constituents, turn out to play an important role in clarifying the article. In the original version, only the abstract provides an overview of the content of the article; in the modular version, the module summaries give the reader additional assistance.

Representing organisational relations in characterised links turns out to be quite simple. With respect to scientific discourse relations, we found that most content relations can also be easily and usefully expressed in links. However, it is often quite difficult to decide precisely which relations based on the communicative function, which synthesis relations and which causality relations should be expressed. Expressing such relations directly in a new modular article will probably be easier than making them explicit during the modularisation process.

Chapter 6 Conclusions

In chapter 6, we conclude that it is indeed possible to formulate a modular model for the structure of scientific articles that can allow for effective and efficient communication. The modular structure enhances the clarity of scientific articles and facilitates multiple use of modules. It also allows the reader to selectively locate and consult individual modules, as well as sets of related modules. Since we have developed multi-dimensional typologies for both the modules and the links, readers can locate a module representing relevant information by means of a well-considered browsing path or by means of a complex search in which they can take into account not only the different aspects by which the information is characterised, but also the module's embedding in its context.

We have developed our modular model for articles in experimental science. By adapting and replacing components of the typologies for modules and links, modular models for other domains can be derived from our model. We illustrate the applicability of our model by considering examples of other types of publication in the light of the modular structure.

In order to determine whether modular articles can improve scientific communication in practice, the model must be implemented and evaluated in a user survey. We have formulated a set of requirements that such an implementation must satisfy: authors and readers need appropriate tools to take advantage of the benefits of the modular structure.







The URL of this page is: http://www.wins.uva.nl/projects/commphys/papers/fhsummary.html
Last modifications on: 1-12 1999