1.0 Problems Identified in VSAM-Based Legacy Systems

Most information systems (IS) have to deal with problems such as European currency or millennium adaptations. For the long-living legacy systems among them, the necessary adaptations are even more difficult because they require changes in the data semantics and structure. The problems arise from the fact that the applications of file-based legacy systems not only use but also interpret the data. The data semantics is thus spread over the whole system, so that a homogeneous and parallel change of all applications and data structures concerned is required. A prerequisite for planning such an adaptation process is therefore the deduction of an integrated model of the data semantics.
2.0 Deducing Data Semantics by Reverse Modelling an Object Model Based on Data and Program Information

To regain a complete and integrated model of the data semantics of a legacy IS, a combined approach of data and program reverse engineering techniques is necessary to achieve a correct result. The example IS we examine is based on VSAM data files and uses the 3GL PL/I as application programming language. Our major goal is to improve the system structure by re-organizing the application source code using object-oriented concepts such as encapsulation and clearly defined method interfaces. We therefore use the object-oriented modelling technique OMT as description basis and gather the obtained results in a reverse-engineered object diagram. The methods listed below examine the application programs and the data description sets of the IS; they are used to extract information that identifies semantic units as object-like entities, from which an object-oriented model of the data is built. The methods and the achievable results are listed below:
- Data Flow Analysis (Usage, Dependency, Redundancy) - Detection of Semantic Pertinence Sometimes variables are no longer used in an IS but, for various reasons, are not removed from the source code and the data descriptions. Legacy systems also contain data that is stored redundantly or is functionally dependent on other data. Such variables are detected on the basis of the data flow analysis: variables that are never used, are stored redundantly, or can be derived by computation within the applications can be ignored.
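As a minimal sketch of this first step, assume the data flow analysis has already produced, per program, the set of record fields each program actually reads. Fields never read anywhere are dead and can be dropped from the object model. All names (`find_dead_fields`, the field and program names) are hypothetical illustrations, not taken from the examined IS:

```python
# Hedged sketch: flag record fields that no program ever reads.
# Input shapes are assumptions about what a data-flow pass would emit.

def find_dead_fields(record_fields, program_reads):
    """record_fields: all fields declared in the data descriptions.
    program_reads: {program_name: set of fields that program reads}."""
    read_anywhere = set()
    for reads in program_reads.values():
        read_anywhere |= reads
    # Fields declared but never read are dead-variable candidates.
    return sorted(f for f in record_fields if f not in read_anywhere)

fields = {"CUST_NO", "CUST_NAME", "OLD_DISCOUNT", "BALANCE"}
reads = {
    "INVOICE": {"CUST_NO", "BALANCE"},
    "MAILING": {"CUST_NO", "CUST_NAME"},
}
print(find_dead_fields(fields, reads))  # -> ['OLD_DISCOUNT']
```

Redundant or computable fields would be detected analogously, by comparing definition sites instead of read sites.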
- Neighbourhood Analysis - Forming of Semantically Independent Classes Variables that are mostly used together often form a semantic unit and are therefore candidates for a class of their own.
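The neighbourhood idea can be sketched as a simple co-usage clustering: count how often pairs of variables occur in the same statement or procedure, then group pairs whose co-occurrence reaches a threshold. The function name, threshold, and sample variables are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

def co_usage_groups(statements, threshold=2):
    """statements: list of sets, each the variables one statement uses.
    Returns groups of variables whose pairwise co-occurrence count
    reaches the threshold -- candidates for semantic units (classes)."""
    counts = Counter()
    for used in statements:
        for a, b in combinations(sorted(used), 2):
            counts[(a, b)] += 1

    # Union-find over strongly co-used pairs to form the groups.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for (a, b), n in counts.items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return [g for g in groups.values() if len(g) > 1]

stmts = [
    {"STREET", "CITY", "ZIP"},
    {"STREET", "CITY", "ZIP"},
    {"CUST_NO", "BALANCE"},
    {"CUST_NO", "BALANCE"},
]
for group in co_usage_groups(stmts):
    print(sorted(group))
```

Here the address fields and the account fields fall into two separate groups, suggesting two classes. A real analysis would of course weight usage by procedure scope rather than raw statement counts.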
- If/Case-Statement Analysis - Detection of Specializations and Discriminators Variables used in the condition of if or case statements indicate a specialization of classes, with the tested variable acting as discriminator. In sequential files, semantically different data sets are often stored together; a special variable is used as discriminator to enable the correct interpretation. Two or more specialization classes can thus be identified according to the usage of the variables in the different control flow paths.
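A hedged sketch of discriminator detection: collect the (variable, constant) pairs tested in if/case statements; a variable compared against several distinct constants in the record-processing paths is a discriminator candidate, and each constant suggests one specialization class. The sample condition list and record-type codes are invented for illustration:

```python
from collections import defaultdict

def discriminator_candidates(conditions, min_constants=2):
    """conditions: list of (tested_variable, constant) pairs taken from
    IF/SELECT statements of the programs. A variable tested against
    min_constants or more distinct constants is a discriminator
    candidate; each constant indicates one specialization class."""
    tested = defaultdict(set)
    for var, const in conditions:
        tested[var].add(const)
    return {v: sorted(cs) for v, cs in tested.items()
            if len(cs) >= min_constants}

conds = [
    ("REC_TYPE", "'H'"),   # header record branch
    ("REC_TYPE", "'D'"),   # detail record branch
    ("REC_TYPE", "'T'"),   # trailer record branch
    ("EOF_FLAG", "1"),     # control flag, not a discriminator
]
print(discriminator_candidates(conds))
# -> {'REC_TYPE': ["'D'", "'H'", "'T'"]}
```

REC_TYPE is reported with three constants, pointing to three specialization classes; EOF_FLAG, tested against a single constant, is filtered out.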
- Loop-Statement Analysis - Detection of Aggregations, Cardinality and Part Classes If a set of variables forms a substructure in a data set description and is used in a loop body, this indicates that the set of variables forms a part class of an aggregation and that an aggregation association exists between this part class and the aggregate class; the loop boundaries deliver the information about the aggregation cardinality.
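This last heuristic can be sketched by matching repeated substructures of a record declaration against the loops that traverse them; the loop bound and the declared repeat count together bound the cardinality. The record and substructure names (ORDER, ORDER_ITEM) and the output format are assumptions for illustration only:

```python
# Hedged sketch: derive aggregation associations and cardinalities
# from declared repeated substructures and the loops that touch them.

def infer_aggregations(record, substructures, loops):
    """record: name of the enclosing record (the aggregate class).
    substructures: {substructure_name: declared repeat count}.
    loops: list of (substructure_name, upper_loop_bound) pairs found
    in the program bodies."""
    associations = []
    for sub, bound in loops:
        if sub in substructures:
            # The tighter of loop bound and declaration bounds the cardinality.
            upper = min(bound, substructures[sub])
            associations.append((record, sub, f"1..{upper}"))
    return associations

print(infer_aggregations(
    "ORDER",
    {"ORDER_ITEM": 20},          # e.g. a PL/I array member ITEM(20)
    [("ORDER_ITEM", 20)],        # e.g. DO I = 1 TO 20 over that member
))  # -> [('ORDER', 'ORDER_ITEM', '1..20')]
```

Here ORDER_ITEM becomes a part class aggregated by ORDER with cardinality 1..20; whether the lower bound is 0 or 1 would need additional evidence from the loop entry condition.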
3.0 Conclusions

The deadlock situation in changing either data formats or programs in legacy systems is a severe drawback in re-engineering these IS. Deducing the object model of a legacy system solves this problem, because it provides a complete overview of the semantics buried in the stored data and in the interpretation sequences of the programs. Consequently, an adaptation path can be searched for and performed without endangering the existing data basis or the running applications through uncontrolled and unplanned changes of the underlying semantics.