A reverse engineering process for design level document production from ADA code

May 27, 2017 | Autor: Gerardo Canfora | Categoria: Information Systems, Computer Hardware, Reverse Engineering, Computer Software, Electrical And Electronic Engineering

Share Embed

Denunciar este link

Descrição do Produto

A reverse engineering process for design level document production from ADA code G Canfora, A Cimitile and U De Carlini describe the use of reverse engineering for producing design level documents by static analysis of ADA code

The paper describes a reverse engineering process for producing design level documents by static analysis of ADA code. The produced documents, which we call concurrent data flow diagrams, describe the task structure of a software system and the data flow between tasks, Firstly, concurrent data flow diagrams are defined and discussed and then the main characteristics and features of the reconstruction process are illustrated. The process has been used to support maintenance and reuse activities on existing real-time software and to check consistency between design and code. reverse engineering

design level d o c u m e n l s

A D A code

The availability of ever more powerful microprocessors at lower and lower costs has led to the widespread use of computers as embedded components in large real-time systems. Nowadays computers are extensively used in a wide range of applications from the simple management of machine tools or medical devices to the monitoring and control of much more complex systems, such as telecommunication networks and nuclear power plants, Software for real-time applications differs from traditional software in many respects. A first fundamental difference lies in the fact that software systems for realtime applications are generally more complex and much larger than commercial and business software systems, Despite their intrinsic complexity, embedded software systems have to be reliable as an error might cause DIS - Dipartimentodi Informatica e Sistemistica,Universityof Naples "Federico I1",Via Claudio 21, 80125 Naples,Italy Paper received: 12 January1992

disastrous economic effects or indeed the loss of human life. Consequently software validation is much more important for real-time systems than for commercial applications, and it is this factor that is largely responsible for the high costs of this type of software. A second feature of embedded software systems concerns their working life. As shown in Reference 1, embedded systems generally remain in use for very long periods of time, typically from 15 to 25 years. During this period of time the software's external environment may change significantly and, consequently, the specifications that an embedded software system must satisfy will also change. Moreover, the continual progress of technology may lead to frequent 'migrations' towards more powerful and more reliable computing systems. This entails adapting the software to the new hardware environment by redesigning or recoding all or part of the software system. The result of this is that the largest share of costs for real-time systems can be attributed to the maintenance, re-engineering and reuse of old software rather than the development of new systems. The correct planning and execution of the above mentioned validation, maintenance and reuse activities obviously require forms and documents that are able to represent a software system at higher levels of abstraction than code. Unfortunately, however, experience has shown that these documents, which should normally be generated during the system specification and design phases, are very often not available. Furthermore, even when these documents are available, their degree of consistency with code is typically very low and, in any case, difficult to verify 2. This explains why the activities of reverse engineering (RE),whose contribution to supporting comprehension, maintenance, testing and quality assurance is widely acknowledged as fundamental in the

0141-9331/91/100531-12 © 1991 Butterworth-Heinemann Ltd Vol 15 No 10 December 1991

531

traditional software sector 3-6, are gradually gaining more and more favour in the field of embedded applications, However, the use of RE in the real-time software sector is currently somewhat limited as the available methodologies, techniques and tools refer to traditional software and do not take into account the characteristics and specific needs of software for real-time embedded systems. A particularly significant aspect of these systems is the definition of RE processes for generating from code forms and documents that can represent not only the functional and structural characteristics of a system but also, and primarily, its organization in concurrent processes and the interactions existing between such processes, As real-time systems have to deal with a large number of stimuli coming from an intrinsically parallel world, they are typically arranged as a set of processes evolving in parallel, at least from a logical point of view. Then, following certain internal and/or external events these processes interact to exchange data or synchronization messages. The first problem to be solved in setting up RE activities for the production of embedded system design documents is therefore the identification, or ex novo definition, of forms and documents that can synthesize and make available all the information on the software system's organization in concurrent processes and how these interact. The fundamental requisite of these documents is that it must, of course, be possible to extract them straight from source code using static analysis. It would, furthermore, be desirable to have documents representing a software system at various levels of abstraction. It very often happens that documents produced by reverse engineering are specific and rich in detail because they are derived from code 7. The possibility to have representations at various levels of abstraction would thus allowthe software designer to select the most appropriate level for the specific operation to be carried out and therefore find the information he needs without getting lost in the large amount of details available, The definition of an RE process for embedded systems strictly depends on the system software language, Specifically, the language mechanisms and primitives for the creation and interaction of concurrent processes both affect the identification of documents to be reconstructed and determine the effort needed to generate them. Many of the difficulties encountered in setting up a code static analyser for RE processes are related to the fact that software for embedded systems is very often written using traditional languages extended with call constructs, These constructs allow access to a system kernel that provides all the primitives for concurrent operations. This, of course, complicates the identification of the processes and their interactions by means of source code analysis, This situation is rapidly improving thanks to the ever more widespread use of high-level languages capable of expressing parallelism, primarily ADA °. Designed as a language for embedded applications, ADA not only promotes new and improved methods for real-time software design 9, but also paves the way towards the definition and setting up of RE processes specifically for real-time embedded systems. ADA has concurrency and process communications as inherent features of the language and for which calls to the services of an external kernel are not needed. Furthermore, ADA uses a single mechanism for process synchronization and data exchange. These characteristics make it possible to search for documents for the high level representation of

532

embedded systems that can be reconstructed by reverse engineering from code. In this paper we present the main characteristics and features of an RE process for the reconstruction of high level representations of ADA-coded embedded systems. The process was set up as an instantiation of a general reference paradigm for RE activity development called Goals/Models/Tools 1°. The purpose of this process is (i) to rebuild the system structure identifying the system concurrent processes and their interactions and (ii) to produce documents able to represent this information at various levels of abstraction so as to guide and assist the software designer in software comprehension, maintenance, validation and reuse. Specifically, the next section briefly describes the Goals/Models/Tools paradigm while the rest of the paper illustrates the documents to be produced and the program representations to be extracted from code for document generation. The produced documents, which we will call concurrent data flow diagrams (CDFDs), are an extension and formalization of the well-known data flow diagrams 1~ ~and can represent a program at various levels of abstraction. Finally, the last section illustrates the concurrent data flow diagrams tool (CDT) which is a prototype for the automatic reconstruction of the CDFDs of an ADA system by code static analysis.

REVERSE ENGINEERING PROCESSES A N D THE G O A L S / M O D E L S / T O O L S PARADIGM Reverseengineering can be briefly defined as a collection of theories, methodologies and technologies for (i) the extraction and abstraction of information from existing software, (ii) the integration of this information with human knowledge so as to produce documents supporting maintenance, evolution, reuse and other forward engineering activities TM. At the current state of the art reverse engineering cannot be thought of as a set of tools capable of solving every type of problem. We must rather consider it as a complex process that needs to be specifically tailored according to the goals established. Moreover, as a large number of methodologies and tools have so far been proposed for setting up RE processes and most of these provide only partial solutions, the definitions of an RE process must be founded on a reference model that defines a framework in which the available methodologies and tools are to be used. A general reference paradigm for RE processes has been recently proposed 1°. This paradigm divides the setting up of an RE process into three sequential phases called Goals/Models/Tools. Goals This is the phase in which the reasons and needs for setting up the process are analysed so as to identify the documents that must be produced from code.

Models This is the phase in which the documents identified in the previous step are analysed so as to define program representation models that are able to represent all the information needed for document generation.

Microprocessors and Microsystems

Tools This is the phase for defining, acquiring or setting up" • extractor tools, for the generation from code of data required for instantiatingthe program models defined in the Models phase; • abstractor tools, for the transformation of the program models into the documents identified in the Goals phase, It should be noted that the main characteristics of the Goals/Models/Tools paradigm are such that: • this paradigm makes it possible to incorporate RE methodologies forthe solution of partial problems in a single process capable of meeting the specific needs of the software designer; • the clear distinction between documents and models makes it possible to uncouple internal and external representations of the information produced from code and to make a careful evaluation of the possibility of generating model instantiations automatically from code; • the distinction between extractor and abstractor tools, of which the former are highly language dependent while the latter depend almost solely on the documents, paves the way towards the setting up of Software Engineering Laboratories in which several extractors construct the models starting from systems coded in different languages, while a single abstractor generates documents from the models, As a result of its generality and versatility we decided to set up the RE process described in this paper as an instantiation of the Goals/Models/Tools paradigm,

GOALS: THE CONCURRENT DATA FLOW DIAGRAMS The purpose of the RE process we illustrate is to understand and reliably represent the process structure of an ADA coded embedded system and interactions between such processes so as to guide and support the software designer in maintenance, validation, evolution and reuse operations. This means that the goals of the RE process must be documents capable of representing the following fundamental aspects at various levels of abstraction: • the structuring of a software system into processes; • process parallelism and/or concurrency; • process synchronization; • data exchange between processes, The first problem dealt with was, therefore, a detailed analysis of the best known high-level representations of software systems so as to identify a set of documents and forms that could meet the above mentioned needs and could also be reconstructed from ADA code. During this study, the detailed description of which does not fall within the objectives of this paper, the need emerged for these documents to be defined ex novo as none of the documents currently available in the literature satisfied the above aims. This was essentially because ADA adopts a concurrency management model that is completely new and different compared to the current approaches used for process scheduling, communication and synchron-

Vol 15 No 10 December 1991

ization; hence the need for new documents tailored to ADA. On the other hand, ADA supplies a rich set of building blocks (i.e. packages, tasks, subprograms, generics) from which programs can be built. This led ADA application designers to look for and define documents, typically digraphs, that would take into account the nature of the different blocks and various types of relationships (e.g. nesting, with-ing, generic instantiation, exported entities, subprogram call, task call) that can be established between these blocks. A large number of graphic documents have been proposed which use different symbols to represent the various ADA building blocks and specific composition rules for representingthe different types of relationships between blocks ls-17. The main limitation of these documents lies in the fact that the richness and specificity of the symbols adopted tend to make them very complicated and difficult to interpret for designers who do not have a thorough knowledge of ADA, even if they have considerable experience in the use of traditional diagrams such as data flow diagrams and structure charts 11,12. Moreover, hardly any of these documents are formally defined, thus making them unsuitable as a goal of an RE process. Therefore, we need to identify design level documents that are (i) sufficiently formalized to be produced automatically from code, (ii) powerful and versatile so as give a faithful representation at various levels of detail of the management of concurrency in ADA, and at the same time, (iii) easy to understand for designers that, despite being experts in the use of more traditional diagrams, do not have a thorough knowledge of ADA language. These requirements acted as our guide in defining CDFDs as target documents of the RE process. CDFDs are an extension and formalization of data flow diagrams (DFDs), which are well-known and widely used in software development. The problems concernin~ DFD standardization by means of formal definitions 18 ' l v , and their extension to the representation of real-time software systems 2°-22, have been tackled several times in the literature. Our proposal differs from the others as it takes into account (i) the characteristics of theADA language as far as concurrency is concerned, and (ii) the fact that it must be possible to reconstruct the documents by reverse engineering from ADA code.

The M 1 form A DFD is a graph describing the logical data flow in a software system. DFDs describe a software system in terms of Transforms and Data Flow. Transforms represent functional blocks, while the interactions between functional blocks are represented by Data Flow. Starting from this definition, and with the aim of extending DFDs to the representation of embedded software systems, we defined a first level of CDFDs that describe the system in terms of concurrent processes and their interactions. Since ADA concurrent processes are Tasks and their interactions are Task Calls, we can define this first level, which we will hereafter refer to as M1, as follows: Definition 1 M1 is formally defined by the pair

(T, fTC)

533

where T is a finite set of Tasks; fTC is a function that maps every task t C T in the set TC of the tasks that it calls, In order to bring CDFDs closer to the classic DFDs and thus make them easier to read, we associated every level of CDFD with a graphic representation, A graph G can generally be defined by means of a triple: G = (N, A, fLAB)

G (M1) = (N, A, fLAg) where N coincides with T; A is defined by the set of couples {ti, tj) such that ti C T and t i C fTc(ti); fLAB(ai) = ~ ~'a i CA, where 2 is the null label, Figures l a and l b show the form M1 and the graph G(M1) relative to a simple system in which two producer and four consumer tasks write and read, respectively, an element of type T to/from a circular buffer also defined as atask, CHANNEL, so as to compensate forthe differences in speed between producers and consumers.

messages between tasks, and does not accurately illustrate the ADA task interaction mechanism. A knowledge of the data flow between tasks is fundamental for the maintenance and/or qualification of reusable components, while an accurate description of the task interaction mechanisms is indispensable for the validation of a concurrent system. In light of these considerations, a second and more detailed level of CDFD is clearly needed to support the designer. In order to define this second level it is worthwhile briefly referring to the salient characteristics of ADA tasking. In ADA the tasks are defined using a specification part and a body. The specification part defines the global operations (Entries), visible outside the task. The implementation of each of these operations is contained in the body (Accept). Task communication is synchronous, asymmetricaland extended rendez-vous;taskrendez-vous involves a requesting task asking for the activation of a global operation (Entry) and is expressed by the pair of entry-call/accept statements. Data exchange takes place by means of the association of the formal parameters specified in the accept and the actual parameters defined in the entry-call. At this point we can define a second level of CDFD, M2, as follows:

The M 2 form

Definition 2

where N is a finite set of nodes; A is a finite set of edges; fLAg is a function that maps every edge ai C A in the corresponding label. The graph associated with CDFD M1, which we will indicate with G(M1), is defined as follows:

The CDFD M1 meets the need to have a document that concisely represents concurrent process structuring in a software system and the interactions between the processes. This document is easy to read, even for designers who do not have a thorough knowledge of ADA tasking characteristics, and provides considerable support in the general understanding of a software system, However, the information content of this document is very low, which makes it rather unsuitable for fully supporting the activities of maintenance, validation and reuse. For example, M1 does not describe the flow of

T = {Producer~.Consumerj, CHANNEL}

h'~(Producer i i)={CHANNEL}

JE[I..2] j~ll..41

whereTisasetofTasks;EisasetofEntries;fEDisafunction that maps every task t C T in the set ED of the entries it declares; fEC is a function that maps every task t C T in the set EC of the entries it calls; fOUT is a function that maps everypair(ti, e j ) w i t h t i C T a n d e i C f E c ( t i ) i n t h e l i s t O U T o f the information (messages)that the calling task passes to the entry called when the rendez-vous is established; fin is a function that maps every pair (ti, e i) with ti C T and ej C fEc(ti) in the list IN of the information (messages) that the calling task receives from the entry called when the rendez-vous is closed. M2 can be represented by the graph:

where N coincides with T U E; A is given by A1 U A2, where A1 is defined by all the possible pairs {ti, e i) such that ti C T and ei C fED(ti), A2 is defined by all the possible pairs {ti, e i) such that ti C T and e i C fec(ti); fLAB(ai) = ;~ iff ai CA1; fLAB(ai) = ftN(ti, e i) U fOUT(ti, eli iff (ai =- {ti, ei))

C A2.

Ca)

C0nsumerT t~~1 C0nsumer2

I

C0nsumer3

I

Cons . . . . 4

I

CHANNEL

Producer2 (b)

Figure 1. Example of M1 form and related graph: the producers/consumers system 534

(T, E, fED, fEC, /OUT, fiN)

G(M2) = (N, A, fLAg)

~rc(Consumerj)={CHANNEL} (CHANNEL)=O

Producerl

M2 is formally defined by the sextuple

It is clear that edges belonging to A1 represent declaration relationships while edges belonging to A2 represent call relationships. Let us now consider, for example, the producers/ consumers system introduced above. Let READ and WRITE be the two entries exported from the CHANNEL task for the collection and delivery of an element, respectively (see Listing 1). In this case the form M2 and the graph G (M2) are the ones shown in Figures 2a and 2b. In this and all the other examples we use rectangles and circles to indicate task and entry nodes; solid and dashed lines represent call and declaration relationships respectively.

Microprocessors and Microsystems

task channel is entry

entry

w r i t e (item:

read(item:

in T) ;

out T ) ;

end channel;

task body channel is x : array

( 0 . . 9) of T;

subtype

cunter

top,bottom:

is

integer

range

counter:~x'First;

message_number:

Integer

x'First..x'Last+l;

0..x'Last-x'First+l:=0;

range

begin loop

m e n t of real-time software it is very o k e n necessary to

introduce intermediary tasks in order to decouple

select when

message_number

One very commonly used class of intermediary tasks is relay tasks. These provide a temporary data store, thus making it possible for the processes using this data to operate asynchronously. They accept data by being called (or calling) and provide data by calling (or being called by) another task. Another class of tasks that cannot be classified as strictly calling or strictly called includes tasks that offer complex services whose performance requires other tasks to be called. Let us consider, for instance, the TELLER task (see Listing 2) in a bank simulation system 23. This can be called by the CUSTOMER task to get one of the services DEPOSIT, WITHDRAW or BALANCE. At the same time, the performance of one of these services entails the TELLER

in T) do

(top) :=item;

x e n d write; t o p := (top+l)

Mod

(x'Last+l);

message_number:=message_number+l;

°rwhen message_number>0 = > accept read (item: out T) do item:=x(bottom)

;

end read;

bottom:=(bottom+l)

Mod

(x'Last+l) ;

message_number ;=message_number-I;

or

terminate;

end s e l e c t ;

end loop; end

deliver or collect messages, obtain information on system component status, communicate the status of the controlled device and/or receive commands. Nevertheless, in the development of an embedded system it may become necessary t o introduce tasks that are neither strictly calling tasks nor strictly called tasks. Nielsen and Shumate, for instance, pointed O U t 9 that in the developor

channel;

Listing 1.

The task channel

calling the BANK_DATABASE task so as to obtain, for

The M 3 form The CDFD M2 fully describes the client/server system structure, i.e. the systems in which processes can either request or provide services. Many e m b e d d e d systems are actually structured as client/sewer systems: a set of clients to monitor several senso~ and control several ----~ -~ac[ua[ors, and a set of servers that are called by clients in order to

T = {Producer i. Consumer i, CHANNEL} E = {Write, Read}

~ffEEED(Producer i) E) o (Consumer j) = 0 D(CHANNEL) = {Write, Read}

IfFc

instance, CUSTOMER identification and/or his current account balance update. In this case, therefore, the TELLER task behaves as a server towards the CUSTOMER task and, at the same time, as a client towards the

BAN K_DATABASE task.

task teller is entry deposit (id: cust id; val: money: stat: out status); entry withdraw(id: cust id: val: money: stat: out status); entry balance(id: cust id; val: out money: stat: out status); end teller; task body teller is current balance : money;

ie [ 1_2]

v a l i d : boolean;

begin loop select accept deposit (id: cust id;

jE [1 _4]

(Producer0={Write}

,' ~f~c/r'onsumer I - ,~ J~=~Rea~ , v. -,

IfEC (CHANNEL)=O L_--

val : money; stat : out status) bank_database.verify_cust id (id, valid); if not valid then stat :=bad_cust id;

do

else bank_database.deposit (id, val) ;

~o

UT (Producer i, Write)={lTEM}

i louT (Consumer ], Read)= O

FN

(Producer i, Write) = ~ i fiN (Consumer 1,Read) ={ITEM }

stat ....... ms:

end i f ;

orend deposit; accept withdraw(id: cust_id; val: money; stat: out status) do bank_database.verify_cust id(id, valid) ; if not valid then stat :-bad cust id; else bank_database.balance (id, current_balance) if current balance

c

~j

63

~ oJ

s3

~ c s3

~ ~ 63

I ~>"

U~ I

IN={Iteml

c-

>C_'-

% m6

Figure 5. Example of M4 graphic representation: the producers/consumers system

L 63

~ ~

~ ~

while.

~

~ ~

~ ~-,

~

~ 63 I

c

63

C

rN

v

~3

°

~ 03

I

"

L)

I

""

loop

c

v

rn

--

~ xD

~ n~

~

CHANNEL

63 I

~

.~

end

produce

the

.WRITE

next (ITEM)

ITEM ;

loop;

c

rn

Lihat lebih banyak...

A reverse engineering process for design level document production from ADA code

Descrição do Produto

Comentários