|
|
|
| Monday, 6th October 1997 | ||
| Key Note | In his invited lecture Harry Sneed gave an overview of the two problems
that the European data processing community is currently dealing
with: the
year 2000 problem and the introduction of the Euro as a new and
uniform currency
for the European Economic Community.
He explained the different ways of impact that both changes have on existing software systems. While the year 2000 problem, which can be classified as adaptive maintenance, affects the software mainly at a very low level, the Euro currency change affects the software at a much higher semantic level. Since the Euro currency change affects both the data model and the program design it has to be classified as enhancement maintenance. The speaker emphasized the great demand from the European data processing community for reverse engineering, reengineering, and regression test tools and technology. The panel discussion afterwards concentrated on the impact of the currency problem for software systems and focused on general ideas about Y2K related reverse engineering difficulties. |
|
|
|
||
| Tool Evaluation
chair: Stephen Fitzpatrick |
The first paper in this session presented by Berndt Balley reported on
a comparison of four different Reverse Engineering Tools (Refine/C,
Imagix
4D, Rigi, Sniff+). The main goal of this study has been to
investigate the
differences between the tools with respect to the generation of graphical
reports (call trees, data and control graphs). All tools have been
applied
to analyze a real-world real-time embedded system (Train Control System)
written in C and assembler. A detailed tabular comparison with
approximately
80 assesment criteria is given. The authors conclude that it is difficult
to analyze embedded systems with the current crop of reverse engineering
tools and that especially the graphical views and layouts need
considerable
improvement and that mixed-language support is a necessity for real-world
applications.
The second presentation, given by Peggy Storey also dealed with a comparison of three different tools (Rigi, SHriMP, SNiFF+). However, in this case the point of perspective is to study the influence of reverse engineering tools on the way that programmers understand programs. The actual experiment described is a "think-aloud" session in which programmers have to answer several questions about a Monopoly program with the aid of one of the tools. The authors conclude that in general the tools enhance the preffered comprehension strategies of the participants. However, in some situations the users had to change their strategies due to the lack of certain features (e.g. effective source code searching) in some tools. |
|
|
|
||
| Object and Module Identification
chair: Mark van den Brand |
This session was opened by Hans Bosma who spoke about "Scenarios for
the Identification of Objects in Legacy Systems". He motivated the
advantages
of using object oriented technologies in information systems but
argued that
a rigorous migration to object orientation is in most cases not a serious
option because of issues concerning risk, time and resources. Therefore,
he proposed an incremental approach to identify objects in legacy
applications
when renovating software. Three scenarios are distinguished for the
replacement
of parts of an application by objects. These scenarios are called
function
driven, data driven and object driven objectification,
respectively.
The paper "Using Clustering Algorithms in Legacy Systems Remodularization" was presented by Patrick van Bommel due to the untimely death of the author Theo Wiggerts. In this talk a general overview of clustering was presented. Clustering is considered a good starting point for remodularization. The author concluded that some existing modularization techniques use or reinvent cluster algorithms. Hence, the author expects that modularization of software systems could benefit more from the general idea of cluster analysis. In his talk, Kostas Kontogiannis argued that cloning of code fragments may result in redundant code, higher maintenance costs and less modular systems. For the detection of such similar code fragments he examined and evaluated five data and control flow related metrics. These metrics can be computed compositionally by using Abstract Syntax Tree techniques. The matching technique discussed in the paper is based on the assumption that if two fragments are clones, then they share a number of structural and data flow characteristics that can be effectively classified by these metrics. |
|
|
|
||
| Industry-wide Projects | Elliot Chikofsky gave an introductory talk about the
Reverse Engineering
Demonstration
Project. This project is a cooperative study among commercial and
non-commercial research groups and product organizations in the reverse
engineering and reengineering fields and its goal is to demonstrate
the results
and value of the different methods and tools in analyzing or converting a
common software example. The goal of the project is to analyze the software of the WELTLAB II Election Tabulation System. This system was originally created in Fortran in the late 1970s. Its evolution has resulted in a system of C programs and command/data files. The results of the project will be presented at the 6th Reengineering Forum. At the time of WCRE'97 the project has not been as successful as was expected. Factors that possibly had a negative influence include: the lack of time, the language of the system (why not COBOL?), the size of the example (some people considered the example code too small to give a realistic demonstration of their tools), the fact that participants have to pay and conditions that were not acceptable by all potential participants. |
|
|
|
||
| Architectural Understanding
chair: Alex Quilici |
Understanding a program's architecture (its major components and their
interactions), is one of the most essential aspect of program
understanding
according to the authors of the paper "Using Visualization for
Architectural
Localization and Extraction". The paper describes a program understanding
technique that combines static and dynamic analyses to extract components
and connectors. This process is supported by the tool ISVis (Interaction
Scenario Visualizer). Part of the paper describes the successful
application
of ISVis to a real-world problem of extending the Mosaic web
browser.
In the paper of J.F. Girard, R. Koschke and G. Schied the notion of atomic components (i.e., the smallest components which are useful to build a significant architectural overview of the system) is introduced. The main question in the paper is how to detect such atomic components when facing a language which does not have a mechanism to express them directly or when programmers do break the encapsulation directly in order to access the internal data directly. The paper describes five published techniques which extract atomic components from source code. The results from each approach are compared with the components selected by hand by a group of software engineers. The paper of V. Tzerpos and R. C. Holt deals with the question what happens to the subsystem hierarchy of a software system as it evolves. The authors identify two subproblems, namely that of assigning a newly introduced resource to a subsystem (incremental clustering), and that of accommodating structural changes (corrective clustering). Both subproblems are referred to as "the Orphan Adoption problem". The authors propose an algorithm to solve this problem and describe case studies that suggest that the algorithm can deal with it effectively. More case studies have to be conducted in order to further validate the effectiveness of the algorithm. |
|
| Tuesday, 7th October 1997 | ||
| Program Understanding and
Scalability
chair: Hausi Müller |
Alysa Reeves gave a talk about her joint work with Judith Schlesinger,
concerning Jackal. Jackal is a tool that can apply string and tree
matching
algorithms to find clichés in an abstract language.
This talk was followed by a talk about KARE (Knowledge-based Assistant for Reverse Engineering), which is a system designed by Srinivas Palthepu, Jim Greer and Gordon McCalla. With KARE, clichés of C-source can be found, using the human-in-the-loop-principle, which means that a human must guide the system, so the intelligence of the human is combined with the brute-force search of the computer. The last talk of the first part of this session was given by Ira Baxter, who claimed with Michael Mehlich that reverse engineering is reverse forward engineering. Their main point was, that if reverse engineering is done without a study of forward engineering models, it can never lead to satisfactory solutions. A discussion concerning the efficiency of reverse engineering algorithms was going on: does it matter if a certain algorithm takes exponential time if one takes in mind that running a program for several months leading to a solution is better than having no solution at all? A remark was raised whether compiler based optimalizations can be used to reverse engineer register transfer languages. Another person remarked that abstract data types should be recovered from code to get rid of the eight bit assumptions. The question whether clichés can be highlighted in an editor was answered with "in the future". Alex Quilici presented his joint work with Steven Woods and Yongjun Zhang, concerning constraint-based plan matching. He showed many figures, which strongly suggested that his newest approach takes linear time to find a plan match. Arie van Deursen presented a different kind of problems concerning year 2000 and leap year problems in particular. He used techniques of Alex and Steven described in the previous talk to locate these kind of problems. In the discussion, it became apparent that plans are the same thing as clichés. A point was made that humans should control plan recognizing and that idiot programming constructs can never be detected automatically. |
|
|
|
||
| Tool Support
chair: Cordell Green |
James Cross presented a tool environment that supports Reverse
Engineering
in a multi-lingual environment. The emphasis of the GRASP (Graphical
Representations of Algorithms, Structures and Processes) tool to this
point
has been on visualizing program structure by means of automatically
generated
CSDs (control structure diagrams). CSDs use a small set of abstract
symbols
for general programming language concepts with which the original source
code is annotated graphically. Because CSDs do not depend on the concrete
syntax of a language its notation is consistent across various
programming
languages, which is especially helpful in reverse engineering
multi-lingual
systems. GRASP also generates CPGs (Complexity Profile Graphs) which
is the
visualisation of a statement-level complexity metric. The user can
navigate
through hyperlinks that connect a CPG with the corresponding CSD.
GRASP currently
supports C, Ada and Java. For more information refer to the
GRASP web site.
Next, Alex Sellink and Chris Verhoef gave a real-time demonstration of the use of the
ASF+SDF
system with respect to reverse engineering. ASF+SDF is a modular
algebraic
specification formalism for the definition of both syntax (SDF) and
semantics
(ASF) of (programming) languages. It was shown how this system can be
used
to generate components for software renovation factories. Here, a
software
renovation factory stands for a three staged process in which source code
is translated into an abstract syntax tree (AST), the AST is transformed
according to some renovation strategy and finally the AST is
translated into
text again. |
|
|
|
||
| Poster Session
chair: Alex Quilici |
In the poster session, a lot of small demonstrations and talks were
given.
The chairman did not exactly succeed in having a strongly
synchronized session,
because people who were interested in a certain poster stayed longer
to watch
and listen. Although this lead to a little chaotic atmosphere, most
people
stated that this session was successful. Alex has gathered more technical information about the individual posters on a special page. |
|
|
|
||
| Domain-Oriented
Reengineering
chair: Kostas Kontogiannis |
Melody Moore gave a presentation about the use of domain analysis in
the setting of transformational reuse (migration). She showed how
MORPH (Model
Oriented Reengineering Process for Human-Computer Interface) is used
to support
the migration process of changing character-oriented user interfaces
as found
in many legacy systems into graphical user interfaces. Domain knowledge,
here about user interface elements like selection lists, was gathered by
manually analyzing 22 applications written in various languages and
by interviews
with domain experts. Next, this information was used to describe the
domain
knowledge in the CLASSIC knowledge representation language. To test
whether
a transformation based on this representations was feasible, two GUI
toolkits
(tcl/tk, Java AWT) were analyzed and captured in the CLASSIC language
too.
In the next step, the user interface elements that were found were
categorized
into a hierarchy. Once again it was checked that the two toolkits
could also
be represented by this hierarchy, which turned out to be possible
apart from
some small changes. A toolset implementing the MORPH technique is
currently
under development. Future extensions of the model will be the addition of
widget toolkits such as MOTIF and MS-Windows to test the flexibility and
coverage of the domain model.
The second presentation of this session was given by Jean-Marc DeBaud who talked about "Domain-Augmented ReEngineering" (DARE). At current, most reengineering approaches focus on syntactical transformations. DARE has been developed to go beyond this limitation by trying to incorporate domain knowledge into the reengineering process in a systematic way. To this end DARE defines a list of six steps that have to be executed. At first a domain model is constructed using conventional methods. Next, the domain information from the first step is used to find a projection into the new executable environment by choosing (new) libraries and tools. The third step is the construction of a dictionary of concept realizations using a cliche/plan specification methodology. The dictionary is used in step four to match the concepts against the actual implementation in order to 'understand' the legacy system. The fifth step takes care of the translation of the recognized realizations into the new executable environment to form a new system. The final (optional) step is to incorporate any corrective or evolutionary changes in the system. |
|
| Wednesday, 8th October 1997 | ||
| Data Base Reverse Engineering
chair: Peter Aiken |
This session was chaired by Peter Aiken, who showed a funny Dilbert
story
on his Mac referring to possible signs of fatigue among the
people in
the audience as a result of Tuesday's conference dinner.
Michael Blaha started with the serious stuff this morning and talked about the different dimensions of database reverse engineering. He stated that the state of the art is good, but that the practice lays far behind, because companies do not know how to judge their databases and software and do not have their own technical demands. Nicodeme Mfourga continued with his presentation titled `Extracting entity-relationship schemas from relation databases: a form-driven approach'. His main goal is to make implicit semantics explicitly by analyzing forms that handle the communication between a user and a (Relational)database Management System. The last talk of this session was given by Jean-Luc Hainaut, who showed how to extract SDL from a COBOL source. He also stated that data reverse engineering is difficult due to the difference in (programming/specification) languages and dialects, data managers, programming styles and cultures and backgrounds. Data reverse engineering is a sporadic activity, is complex, is non-creative and unattractive for industrial companies and no two data reverse engineering projects are the same. In the discussion it became apparent that the gap between an intended design and the implemented system is not always created by accident. Often, intended reports or features are added to speedup the performance. The general attitude of programmers is not to tell the users of extra features that are included during the implementation of the system. However, most participants of the discussion, stated that the users should be told about these extra features. Another discussion point was the difference between data reverse engineering and software reverse engineering. The attitudes of database designers and programmers is different and the target of data reverse engineering and software reverse engineering is different: data reverse engineering tries to find a Logical Entity Relation Model, where software reverse engineering has not such a commonly used `design language'. Despite these differences, it is possible, according to the data reverse engineering experts, to exchange techniques between data reverse engineering and other reverse engineering areas if these techniques are generalized. |
|
|
|
||
| Reengineering
chair: James Cross |
Harry Sneed presented some results and experiences from an ongoing
project
in which an ATM booking application written in COBOL and Assembler is to
be reengineered. The main objective in the early stages of this
project has
been to prepare the current software for wrapping. The main problem that
was encountered was the fact that the system was implemented according to
a functional model with one big central control module accepting booking
transactions which are distributed to several functional modules that may
again contain subfunctional modules. Due to this structure it is
impossible
to access the functional and data access modules directly from an outside
'wrapping' shell, because these modules depend on (global) data elements
that are pushed down from the top of the hierarchy. Analysis showed that
there were five different possible origins for a data element (global
variable,
local variable in calling module, data base field, system parameter,
input
arument in the original ATM-booking message). These scattered origins
were
then gathered into one data structure in order to create a system in
which
each module has only one parameter. These actions not only prepared
the system
to be accessed by outside shell programs but also led to an increased
program
comprehension.
The second talk in this session was given by Elizabeth Burd and dealt
with
the implications of non-functional requirements for the reengineering of
legacy code. Examples for non-functional requirements are: performance,
reliability, operational costs and error handling. In previous
research that
focused on identifying reusable units from legacy systems it was
found that
the decomposition of software systems into separate units is often
hindered
by the fact that functional units that each implement different
functional
requirements nevertheless tend to be tied together by the implementation
of non-functional requirements. To be able to achieve a better separation
of the functional units it is important to better understand the way
in which
the non-functional parts of a system tie together the functional
ones. |
|
|
|
||
| Enabling Technologies
chair: Melody Moore |
The first paper in this session was presented by Françoise
Balmas.
She discussed the process of re-documenting programs by means of
outlines.
Outlines are represented by frames that are semantically equivalent with
the programming constructs they describe. The current focus of
research is
finding outlines for linear loops in LISP programs. To this end a
model has
been defined to categorize linear loops. A linear loop is a loop that traverses a single linear collection, performs no physical modifications and swaps no variables. A loop is based on five computational categories called components: traversal, numerical accumulation, list accumulation, repeated update and action. Each component can be particularized through alterations: modification, resctriction and inverse moded. Components can be combined using two composition rules: juxtaposition and embedding. With this model it is possible to analyze and categorize loops. The results from this research have been implemented in the PRISME system, which has some similarities with a plan recognition system but offers the possibility to deal with smaller units (components). PRISME is currently used for debugging and reverse specification of programs. Future plans aim at extending the types of loops that can be recognized (recursive or iterative, linear of tree-like), outlining other structures than loops, allow program construction through outlines and offering different levels of outlining (program abstraction).
Alain April closed the conference with a paper that showed how techniques
that are emerging within the field of Reverse Engineering can be applied
to other related domains. This presentation mainly dealt with
Function Point
Analysis (FPA), a method to determine the functional size of
software. The
rules that describe how Function Points (FP) have to be counted are
documented
in a series of CPM (Counting Practices Manual) publications by the IFPUG
(International Function Point Users Group). |
|
|
|
||
| Gallery some pictures taken at WCRE'97 |
|
|
|
|
||
| Session Rapporteurs:
Eggie van Buiten |
||