Monday, 6th October 1997
Key Note In his invited lecture Harry Sneed gave an overview of the two problems that the European data processing community is currently dealing with: the year 2000 problem and the introduction of the Euro as a new and uniform currency for the European Economic Community. 

He explained the different ways of impact that both changes have on existing software systems. While the year 2000 problem, which can be classified as adaptive maintenance, affects the software mainly at a very low level, the Euro currency change affects the software at a much higher semantic level. Since the Euro currency change affects both the data model and the program design it has to be classified as enhancement maintenance. The speaker emphasized the great demand from the European data processing community for reverse engineering, reengineering, and regression test tools and technology. 

The panel discussion afterwards concentrated on the impact of the currency problem for software systems and focused on general ideas about Y2K related reverse engineering difficulties.

Tool Evaluation 

chair: Stephen Fitzpatrick

The first paper in this session presented by Berndt Balley reported on a comparison of four different Reverse Engineering Tools (Refine/C, Imagix 4D, Rigi, Sniff+). The main goal of this study has been to investigate the differences between the tools with respect to the generation of graphical reports (call trees, data and control graphs). All tools have been applied to analyze a real-world real-time embedded system (Train Control System) written in C and assembler. A detailed tabular comparison with approximately 80 assesment criteria is given. The authors conclude that it is difficult to analyze embedded systems with the current crop of reverse engineering tools and that especially the graphical views and layouts need considerable improvement and that mixed-language support is a necessity for real-world applications. 

The second presentation, given by Peggy Storey also dealed with a comparison of three different tools (Rigi, SHriMP, SNiFF+). However, in this case the point of perspective is to study the influence of reverse engineering tools on the way that programmers understand programs. The actual experiment described is a "think-aloud" session in which programmers have to answer several questions about a Monopoly program with the aid of one of the tools. The authors conclude that in general the tools enhance the preffered comprehension strategies of the participants. However, in some situations the users had to change their strategies due to the lack of certain features (e.g. effective source code searching) in some tools.

Object and Module Identification 

chair: Mark van den Brand

This session was opened by Hans Bosma who spoke about "Scenarios for the Identification of Objects in Legacy Systems". He motivated the advantages of using object oriented technologies in information systems but argued that a rigorous migration to object orientation is in most cases not a serious option because of issues concerning risk, time and resources. Therefore, he proposed an incremental approach to identify objects in legacy applications when renovating software. Three scenarios are distinguished for the replacement of parts of an application by objects. These scenarios are called function driven, data driven and object driven objectification, respectively. 

The paper "Using Clustering Algorithms in Legacy Systems Remodularization" was presented by Patrick van Bommel due to the untimely death of the author Theo Wiggerts. In this talk a general overview of clustering was presented. Clustering is considered a good starting point for remodularization. The author concluded that some existing modularization techniques use or reinvent cluster algorithms. Hence, the author expects that modularization of software systems could benefit more from the general idea of cluster analysis. 

In his talk, Kostas Kontogiannis argued that cloning of code fragments may result in redundant code, higher maintenance costs and less modular systems. For the detection of such similar code fragments he examined and evaluated five data and control flow related metrics. These metrics can be computed compositionally by using Abstract Syntax Tree techniques. The matching technique discussed in the paper is based on the assumption that if two fragments are clones, then they share a number of structural and data flow characteristics that can be effectively classified by these metrics.

Industry-wide Projects Elliot Chikofsky gave an introductory talk about the Reverse Engineering Demonstration Project. This project is a cooperative study among commercial and non-commercial research groups and product organizations in the reverse engineering and reengineering fields and its goal is to demonstrate the results and value of the different methods and tools in analyzing or converting a common software example.
The goal of the project is to analyze the software of the WELTLAB II Election Tabulation System. This system was originally created in Fortran in the late 1970s. Its evolution has resulted in a system of C programs and command/data files.
The results of the project will be presented at the 6th Reengineering Forum. At the time of WCRE'97 the project has not been as successful as was expected. Factors that possibly had a negative influence include: the lack of time, the language of the system (why not COBOL?), the size of the example (some people considered the example code too small to give a realistic demonstration of their tools), the fact that participants have to pay and conditions that were not acceptable by all potential participants.

Architectural Understanding 

chair: Alex Quilici

Understanding a program's architecture (its major components and their interactions), is one of the most essential aspect of program understanding according to the authors of the paper "Using Visualization for Architectural Localization and Extraction". The paper describes a program understanding technique that combines static and dynamic analyses to extract components and connectors. This process is supported by the tool ISVis (Interaction Scenario Visualizer). Part of the paper describes the successful application of ISVis to a real-world problem of extending the Mosaic web browser. 

In the paper of J.F. Girard, R. Koschke and G. Schied the notion of atomic components (i.e., the smallest components which are useful to build a significant architectural overview of the system) is introduced. The main question in the paper is how to detect such atomic components when facing a language which does not have a mechanism to express them directly or when programmers do break the encapsulation directly in order to access the internal data directly. The paper describes five published techniques which extract atomic components from source code. The results from each approach are compared with the components selected by hand by a group of software engineers. 

The paper of V. Tzerpos and R. C. Holt deals with the question what happens to the subsystem hierarchy of a software system as it evolves. The authors identify two subproblems, namely that of assigning a newly introduced resource to a subsystem (incremental clustering), and that of accommodating structural changes (corrective clustering). Both subproblems are referred to as "the Orphan Adoption problem". The authors propose an algorithm to solve this problem and describe case studies that suggest that the algorithm can deal with it effectively. More case studies have to be conducted in order to further validate the effectiveness of the algorithm.

Tuesday, 7th October 1997
Program Understanding and Scalability 

chair: Hausi Müller

Alysa Reeves gave a talk about her joint work with Judith Schlesinger, concerning Jackal. Jackal is a tool that can apply string and tree matching algorithms to find clichés in an abstract language. 

This talk was followed by a talk about KARE (Knowledge-based Assistant for Reverse Engineering), which is a system designed by Srinivas Palthepu, Jim Greer and Gordon McCalla. With KARE, clichés of C-source can be found, using the human-in-the-loop-principle, which means that a human must guide the system, so the intelligence of the human is combined with the brute-force search of the computer. 

The last talk of the first part of this session was given by Ira Baxter, who claimed with Michael Mehlich that reverse engineering is reverse forward engineering. Their main point was, that if reverse engineering is done without a study of forward engineering models, it can never lead to satisfactory solutions. A discussion concerning the efficiency of reverse engineering algorithms was going on: does it matter if a certain algorithm takes exponential time if one takes in mind that running a program for several months leading to a solution is better than having no solution at all? A remark was raised whether compiler based optimalizations can be used to reverse engineer register transfer languages. Another person remarked that abstract data types should be recovered from code to get rid of the eight bit assumptions. The question whether clichés can be highlighted in an editor was answered with "in the future". 

Alex Quilici presented his joint work with Steven Woods and Yongjun Zhang, concerning constraint-based plan matching. He showed many figures, which strongly suggested that his newest approach takes linear time to find a plan match. 

Arie van Deursen presented a different kind of problems concerning year 2000 and leap year problems in particular. He used techniques of Alex and Steven described in the previous talk to locate these kind of problems. In the discussion, it became apparent that plans are the same thing as clichés. A point was made that humans should control plan recognizing and that idiot programming constructs can never be detected automatically.

Tool Support 

chair: Cordell Green

James Cross presented a tool environment that supports Reverse Engineering in a multi-lingual environment. The emphasis of the GRASP (Graphical Representations of Algorithms, Structures and Processes) tool to this point has been on visualizing program structure by means of automatically generated CSDs (control structure diagrams). CSDs use a small set of abstract symbols for general programming language concepts with which the original source code is annotated graphically. Because CSDs do not depend on the concrete syntax of a language its notation is consistent across various programming languages, which is especially helpful in reverse engineering multi-lingual systems. GRASP also generates CPGs (Complexity Profile Graphs) which is the visualisation of a statement-level complexity metric. The user can navigate through hyperlinks that connect a CPG with the corresponding CSD. GRASP currently supports C, Ada and Java. For more information refer to the GRASP web site

Next, Alex Sellink and Chris Verhoef gave a real-time demonstration of the use of the ASF+SDF system with respect to reverse engineering. ASF+SDF is a modular algebraic specification formalism for the definition of both syntax (SDF) and semantics (ASF) of (programming) languages. It was shown how this system can be used to generate components for software renovation factories. Here, a software renovation factory stands for a three staged process in which source code is translated into an abstract syntax tree (AST), the AST is transformed according to some renovation strategy and finally the AST is translated into text again. 
It is possible to generate the front-end (parser) and back-end (unparser) of a software factory from the context-free grammar of the source and the target language used in the renovation process. With the ASF+SDF system it is also possible to generate generic AST-transformation components, which are in fact different recursive AST-traversals into which the user can plug in custom transformation functions. The demonstration showed these aspects applied to some toy "cats & dogs" languages. The paper also contains an example that shows the usage of these techniques for COBOL programs.

Poster Session 

chair: Alex Quilici

In the poster session, a lot of small demonstrations and talks were given. The chairman did not exactly succeed in having a strongly synchronized session, because people who were interested in a certain poster stayed longer to watch and listen. Although this lead to a little chaotic atmosphere, most people stated that this session was successful. 
Alex has gathered more technical information about the individual posters on a special page.

Domain-Oriented Reengineering 

chair: Kostas Kontogiannis

Melody Moore gave a presentation about the use of domain analysis in the setting of transformational reuse (migration). She showed how MORPH (Model Oriented Reengineering Process for Human-Computer Interface) is used to support the migration process of changing character-oriented user interfaces as found in many legacy systems into graphical user interfaces. Domain knowledge, here about user interface elements like selection lists, was gathered by manually analyzing 22 applications written in various languages and by interviews with domain experts. Next, this information was used to describe the domain knowledge in the CLASSIC knowledge representation language. To test whether a transformation based on this representations was feasible, two GUI toolkits (tcl/tk, Java AWT) were analyzed and captured in the CLASSIC language too. In the next step, the user interface elements that were found were categorized into a hierarchy. Once again it was checked that the two toolkits could also be represented by this hierarchy, which turned out to be possible apart from some small changes. A toolset implementing the MORPH technique is currently under development. Future extensions of the model will be the addition of widget toolkits such as MOTIF and MS-Windows to test the flexibility and coverage of the domain model.

The second presentation of this session was given by Jean-Marc DeBaud who talked about "Domain-Augmented ReEngineering" (DARE). At current, most reengineering approaches focus on syntactical transformations. DARE has been developed to go beyond this limitation by trying to incorporate domain knowledge into the reengineering process in a systematic way. To this end DARE defines a list of six steps that have to be executed. At first a domain model is constructed using conventional methods. Next, the domain information from the first step is used to find a projection into the new executable environment by choosing (new) libraries and tools. The third step is the construction of a dictionary of concept realizations using a cliche/plan specification methodology. The dictionary is used in step four to match the concepts against the actual implementation in order to 'understand' the legacy system. The fifth step takes care of the translation of the recognized realizations into the new executable environment to form a new system. The final (optional) step is to incorporate any corrective or evolutionary changes in the system.

Wednesday, 8th October 1997
Data Base Reverse Engineering 

chair: Peter Aiken

This session was chaired by Peter Aiken, who showed a funny Dilbert story on his Mac referring to possible signs of fatigue among the people in the audience as a result of Tuesday's conference dinner. 

Michael Blaha started with the serious stuff this morning and talked about the different dimensions of database reverse engineering. He stated that the state of the art is good, but that the practice lays far behind, because companies do not know how to judge their databases and software and do not have their own technical demands.  

Nicodeme Mfourga continued with his presentation titled `Extracting entity-relationship schemas from relation databases: a form-driven approach'. His main goal is to make implicit semantics explicitly by analyzing forms that handle the communication between a user and a (Relational)database Management System. 

The last talk of this session was given by Jean-Luc Hainaut, who showed how to extract SDL from a COBOL source. He also stated that data reverse engineering is difficult due to the difference in (programming/specification) languages and dialects, data managers, programming styles and cultures and backgrounds. Data reverse engineering is a sporadic activity, is complex, is non-creative and unattractive for industrial companies and no two data reverse engineering projects are the same. 

In the discussion it became apparent that the gap between an intended design and the implemented system is not always created by accident. Often, intended reports or features are added to speedup the performance. The general attitude of programmers is not to tell the users of extra features that are included during the implementation of the system. However, most participants of the discussion, stated that the users should be told about these extra features. Another discussion point was the difference between data reverse engineering and software reverse engineering. The attitudes of database designers and programmers is different and the target of data reverse engineering and software reverse engineering is different: data reverse engineering tries to find a Logical Entity Relation Model, where software reverse engineering has not such a commonly used `design language'. Despite these differences, it is possible, according to the data reverse engineering experts, to exchange techniques between data reverse engineering and other reverse engineering areas if these techniques are generalized.


chair: James Cross

Harry Sneed presented some results and experiences from an ongoing project in which an ATM booking application written in COBOL and Assembler is to be reengineered. The main objective in the early stages of this project has been to prepare the current software for wrapping. The main problem that was encountered was the fact that the system was implemented according to a functional model with one big central control module accepting booking transactions which are distributed to several functional modules that may again contain subfunctional modules. Due to this structure it is impossible to access the functional and data access modules directly from an outside 'wrapping' shell, because these modules depend on (global) data elements that are pushed down from the top of the hierarchy. Analysis showed that there were five different possible origins for a data element (global variable, local variable in calling module, data base field, system parameter, input arument in the original ATM-booking message). These scattered origins were then gathered into one data structure in order to create a system in which each module has only one parameter. These actions not only prepared the system to be accessed by outside shell programs but also led to an increased program comprehension.

The second talk in this session was given by Elizabeth Burd and dealt with the implications of non-functional requirements for the reengineering of legacy code. Examples for non-functional requirements are: performance, reliability, operational costs and error handling. In previous research that focused on identifying reusable units from legacy systems it was found that the decomposition of software systems into separate units is often hindered by the fact that functional units that each implement different functional requirements nevertheless tend to be tied together by the implementation of non-functional requirements. To be able to achieve a better separation of the functional units it is important to better understand the way in which the non-functional parts of a system tie together the functional ones.
To this end the IDENT method was extended by the possibilty to abstract from non-functional requirements. The modified method was then applied to 6 case studies involving C and COBOL programs with 10,000 to 50,000 lines of code. The results of these case studies were shown with respect to one of the possible non-functional requirements namely: error handling.

Enabling Technologies 

chair: Melody Moore

The first paper in this session was presented by Françoise Balmas. She discussed the process of re-documenting programs by means of outlines. Outlines are represented by frames that are semantically equivalent with the programming constructs they describe. The current focus of research is finding outlines for linear loops in LISP programs. To this end a model has been defined to categorize linear loops.
A linear loop is a loop that traverses a single linear collection, performs no physical modifications and swaps no variables. A loop is based on five computational categories called components: traversal, numerical accumulation, list accumulation, repeated update and action. Each component can be particularized through alterations: modification, resctriction and inverse moded. Components can be combined using two composition rules: juxtaposition and embedding. With this model it is possible to analyze and categorize loops.
The results from this research have been implemented in the PRISME system, which has some similarities with a plan recognition system but offers the possibility to deal with smaller units (components). PRISME is currently used for debugging and reverse specification of programs. Future plans aim at extending the types of loops that can be recognized (recursive or iterative, linear of tree-like), outlining other structures than loops, allow program construction through outlines and offering different levels of outlining (program abstraction).

Alain April closed the conference with a paper that showed how techniques that are emerging within the field of Reverse Engineering can be applied to other related domains. This presentation mainly dealt with Function Point Analysis (FPA), a method to determine the functional size of software. The rules that describe how Function Points (FP) have to be counted are documented in a series of CPM (Counting Practices Manual) publications by the IFPUG (International Function Point Users Group).
Currently most FP counting is performed manually. A study showed that, although most CASE vendors claim that their tools support FPA all of them need some external intervention at some point. To be able to achieve a better foundation for automatic FP counting the authors propose a transformation from the semi-formal definition of the IFPUG counting rules into a more rigid formal model. The authors gave a formal representation of a subset of the IFPUG rules and came up with a new analysis definition called definition propagation, which was the result of being able to evaluate a specific FP counting rule more precisely due to the formal model.
Future work aims at giving more precise and more complete formal definitions of the FP counting rules, selecting the relevant reverse engineering techniques to implement the FP counting rules and finally the construction of an automatic FP counting tool.

some pictures taken at WCRE'97
Royal Cafe de
Kroon (1) The Triple X Files Royal Cafe de
Kroon (2)
Click on one of the thumbnails to see a larger version!

Session Rapporteurs:

Eggie van Buiten
Merijn de Jonge
Gert Veltink