A reverse engineering process for design level document production from ADA code

May 27, 2017 | Autor: A. Cimitile | Categoria: Information Systems, Computer Hardware, Reverse Engineering, Computer Software, Electrical And Electronic Engineering

Share Embed

Denunciar este link

Descrição do Produto

A reverse engineering process for design level document production from ADA code G Canfora, A Cimitile and U De Carlini

The paper describes a reverse engineering process for producing design level documents by static analysis o f ADA code. The produced documents, which we call concurrent data flow diagrams, describe the task structure o f a software system and the data flow between tasks. Firstly, concurrent data flow diagrams are defined and discussed and then the main characteristics andfeatures of the reconstruction process are illustrated. The process has been used to support maintenance and reuse activities on existing real-time software and to check consistency between design and code. reverse engineering, design level documents, ADA code

The availability of ever more powerful microprocessors at lower and lower costs has led to the widespread use of computers as embedded components in large real-time systems. Nowadays computers are extensively used in a wide range of applications from the simple management of machine tools or medical devices to the monitoring and control of much more complex systems, such as telecommunication networks and nuclear power plants. Software for real-time applications differs from traditional software in many respects. A first fundamental difference lies in the fact that software systems for real-time applications are generally more complex and much larger than commercial and business software systems. Despite their intrinsic complexity, embedded software systems have to be reliable, as an error might cause disastrous economic effects or indeed the loss of human life. Consequently software validation is much more important for real-time systems than for commercial applications, and it is this factor that is largely responsible for the high costs of this type of software. A second feature of embedded software systems concerns their working life. As shown in Reference 1, embedded systems generally remain in use for very long periods of time, typically from 15 to 25 years. During this period

DIS Dipartimentodi Informaticae Sistemistica,Universityof Naples 'Federico II', Via Claudio 21, 80125 Naples, Italy This paper was first published in Microprocessors and Microsystems Vol 15 No I0 -

Vol 35 No 1 January 1993

of time the software's external environment may change significantly and, consequently, the specifications that an embedded software system must satisfy will also change. Moreover, the continual progress of technology may lead to frequent 'migrations' towards more powerful and more reliable computing systems. This entails adapting the software to the new hardware environment by redesigning or recoding all or part of the software system. The result of this is that the largest share of costs for real-time systems can be attributed to the maintenance, re-engineering and reuse of old software rather than the development of new systems. The correct planning and execution of the above mentioned validation, maintenance and reuse activities obviously require forms and documents that are able to represent a software system at higher levels of abstraction than code. Unfortunately, however, experience has shown that these documents, which should normally be generated during the system specification and design phases, are very often not available. Furthermore, even when these documents are available, their degree of consistency with code is typically very low and, in any case, difficult to verify2. This explains why the activities of reverse engineering (RE), whose contribution to supporting comprehension, maintenance, testing and quality assurance is widely acknowledged as fundamental in the traditional software sector 3-6, are gradually gaining more and more favour in the field of embedded applications. However, the use of RE in the real-time software sector is currently somewhat limited as the available methodologies, techniques and tools refer to traditional software and do not take into account the characteristics and specific needs of software for real-time embedded systems. A particularly significant aspect of these systems is the definition of RE processes for generating from code forms and documents that can represent not only the functional and structural characteristics of a system but also, and primarily, its organization in concurrent processes and the interactions existing between such processes. As real-time systems have to deal with a large number of stimuli coming from an intrinsically parallel world, they are typically arranged as a set of processes evolving in parallel, at least from a logical point of view. Then, following certain internal and/or external events these

0950--5849/93/010023-12 © 1993 Butterworth-Heinemann Ltd

23

A reverse engineering process for design level document production from ADA code

processes interact to exchange data or synchronization messages. The first problem to be solved in setting up RE activities for the production of embedded system design documents is therefore the identification, or e x novo definition, of forms and documents that can synthesize and make available all the information on the software system's organization in concurrent processes and how these interact. The fundamental requisite of these documents is that it must, of course, be possible to extract them straight from source code using static analysis. It would, furthermore, be desirable to have documents representing a software system at various levels of abstraction. It very often happens that documents produced by reverse engineering are specific and rich in detail because they are derived from code 7. The possibility to have representations at various levels of abstraction would thus allow the software designer to select the most appropriate level for the specific operation to be carried out and therefore find the information he needs without getting lost in the large amount of details available. The definition of an RE process for embedded systems strictly depends on the system software language. Specifically, the language mechanisms and primitives for the creation and interaction of concurrent processes both affect the identification of documents to be reconstructed and determine the effort needed to generate them. Many of the difficulties encountered in setting up a code static analyser for RE processes are related to the fact that software for embedded systems is very often written using traditional languages extended with call constructs. These constructs allow access to a system kernel that provides all the primitives for concurrent operations. This, of course, complicates the identification of the processes and their interactions by means of source code analysis. This situation is rapidly improving thanks to the ever more widespread use of high-level languages capable of expressing parallelism, primarily ADA 8. Designed as a language for embedded applications, ADA not only promotes new and improved methods for real-time software design9, but also paves the way towards the definition and setting up of RE processes specifically for real-time embedded systems. ADA has concurrency and process communications as inherent features of the language and for which calls to the services of an external kernel are not needed. Furthermore, ADA uses a single mechanism for process synchronization and data exchange. These characteristics make it possible to search for documents for the high-level representation of embedded systems that can be reconstructed by reverse engineering from code. In this paper we present the main characteristics and features of an RE process for the reconstruction of high level representations of ADA-coded embedded systems. The process was set up as an instantiation of a general reference paradigm for RE activity develop24

ment called Goals/Models/Toolst°. The purpose of this process is (i) to rebuild the system structure identifying the system concurrent processes and their interactions and (ii) to produce documents able to represent this information at various levels of abstraction so as to guide and assist the software designer in software comprehension, maintenance, validation and reuse. Specifically, the next section briefly describes the Goals/Models/Tools paradigm while the rest of the paper illustrates the documents to be produced and the program representations to be extracted from code for document generation. The produced documents, which we will call concurrent data flow diagrams (CDFDs), are an extension and formalization of the well-known data flow diagrams lt-~3 and can represent a program at various levels of abstraction. Finally, the last section illustrates the concurrent data flow diagrams tool (CDT) which is a prototype for the automatic reconstruction of the CDFDs of an ADA system by code static analysis. REVERSE ENGINEERING PROCESSES AND THE GOALS/MODELS/TOOLS PARADIGM Reverse engineering can be briefly defined as a collection of theories, methodologies and technologies for (i) the extraction and abstraction of information from existing software, (ii) the integration of this information with human knowledge so as to produce documents supporting maintenance, evolution, reuse and other forward engineering activities 14. At the current state of the art reverse engineering cannot be thought of as a set of tools capable of solving every type of problem. We must rather consider it as a complex process that needs to be specifically tailored according to the goals established. Moreover, as a large number of methodologies and tools have so far been proposed for setting up RE processes and most of these provide only partial solutions, the definitions of an RE process must be founded on a reference model that defines a framework in which the available methodologies and tools are to be used. A general reference paradigm for RE processes has been recently proposed l° This paradigm divides the setting up of an RE process into three sequential phases called Goals/Models/Tools.

Goals This is the phase in which the reasons and needs for setting up the process are analysed so as to identify the documents that must be produced from code.

Models This is the phase in which the documents identified in the previous step are analysed so as to define program representation models that are able to represent all the information needed for document generation. Information and Software Technology

G CANFORA, A CIMITILE AND U DE CARLINI

Tools

This is the phase for defining, acquiring or setting up: • extractor tools, for the generation from code of data required for instantiating the program models defined in the Models phase; • abstractor tools, for the transformation of the program models into the documents identified in the Goals phase. It should be noted that the main characteristics of the Goals/Models/Tools paradigm are such that: • this paradigm makes it possible to incorporate RE methodologies for the solution of partial problems in a single process capable of meeting the specific needs of the software designer; • the clear distinction between documents and models makes it possible to uncouple internal and external representations of the information produced from code and to make a careful evaluation of the possibility of generating model instantiations automatically from code; • the distinction between extractor and abstractor tools, of which the former are highly language dependent while the latter depend almost solely on the documents, paves the way towards the setting up of Software Engineering Laboratories in which several extractors construct the models starting from systems coded in different languages, while a single abstractor generates documents from the models. As a result of its generality and versatility we decided to set up the RE process described in this paper as an instantiation of the Goals/Models/Tools paradigm.

GOALS: THE C O N C U R R E N T DATA FLOW D I A G R A M S The purpose of the RE process we illustrate is to understand and reliably represent the process structure of an ADA coded embedded system and interactions between such processes so as to guide and support the software designer in maintenance, validation, evolution and reuse operations. This means that the goals of the RE process must be documents capable of representing the following fundamental aspects at various levels of abstraction: • • • •

the structuring of a software system into processes; process parallelism and/or concurrency; process synchronization; data exchange between processes.

The first problem dealt with was, therefore, a detailed analysis of the best known high-level representations Vol 35 No 1 January 1993

of software systems so as to identify a set of documents and forms that could meet the above mentioned needs and could also be reconstructed from ADA code. During this study, the detailed description of which does not fall within the objectives of this paper, the need emerged for these documents to be defined e x n o v o as none of the documents currently available in the literature satisfied the above aims. This was essentially because ADA adopts a concurrency management model that is completely new and different compared to the current approaches used for process scheduling, communication and synchronization; hence the need for new documents tailored to ADA. On the other hand, ADA supplies a rich set of building blocks (i.e. packages, tasks, subprograms, generics) from which programs can be built. This led ADA application designers to look for and define documents, typically digraphs, that would take into account the nature of the different blocks and various types of relationships (e.g. nesting, with-ing, generic instantiation, exported entities, subprogram call, task call) that can be established between these blocks. A large number of graphic documents have been proposed which use different symbols to represent the various ADA building blocks and specific composition rules for representing the different types of relationships between blocks '5 ,7. The main limitation of these documents lies in the fact that the richness and specificity of the symbols adopted tend to make them very complicated and difficult to interpret for designers who do not have a thorough knowledge of ADA, even if they have considerable experience in the use of traditional diagrams such as data flow diagrams and structure charts t','2. Moreover, hardly any of these documents are formally defined, thus making them unsuitable as a goal of an RE process. Therefore, we need to identify design level documents that are (i) sufficiently formalized to be produced automatically from code, (ii) powerful and versatile so as give a faithful representation at various levels of detail of the management of concurrency in ADA, and at the same time, (iii) easy to understand for designers that, despite being experts in the use of more traditional diagrams, do not have a thorough knowledge of ADA language. These requirements acted as our guide in defining CDFDs as target documents of the RE process. CDFDs are an extension and formalization of data flow diagrams (DFDs), which are well known and widely used in software development. The problems concerning D F D standardization by means of formal definitions '8''9, and their extension to the representation of real-time software systems2°-22, have been tackled several times in the literature. Our proposal differs from the others as it takes into account (i) the characteristics of the ADA language as far as concurrency is concerned, and (ii) the fact that it must be possible to reconstruct the documents by reverse engineering from ADA code.

25

A reverse engineering process for design level document production from ADA code

The M I form A D F D is a graph describing the logical data flow in a software system. DFDs describe a software system in terms of Transforms and Data Flow. Transforms represent functional blocks, while the interactions between functional blocks are represented by Data Flow. Starting from this definition, and with the aim of extending D F D s to the representation of embedded software systems, we defined a first level of C D F D s that describe the system in terms of concurrent processes and their interactions. Since ADA concurrent processes are Tasks and their interactions are Task Calls, we can define this first level, which we will hereafter refer to as M1, as follows: Definition 1 M1 is formally defined by the pair (T, f~c)

where T is a finite set of Tasks; frc is a function that maps every task t e T in the set TC of the tasks that it calls. In order to bring C D F D s closer to the classic D F D s and thus make them easier to read, we associated every level of C D F D with a graphic representation. A graph G can generally be defined by means of a triple: G = (N, A,A^.)

where N is a finite set of nodes; A is a finite set of edges; fLAB is a function that maps every edge ai ~ A in the corresponding label. The graph associated with C D F D MI, which we will indicate with G(M1), is defined as follows:

G(MI) = (N, A,fLAB) where N coincides with T; A is defined by the set of couples (ti, tj) such that ti~T and tjefTc(te); fLAB (ai) = 2 Vai ~ A, where 2 is the null label. Figures la and lb show the form M I and the graph G(M1) relative to a simple system in which two producer and four consumer tasks write and read, respectively, an element of type T to/from a circular buffer also defined as a task, C H A N N E L , so as to compensate for the differences in speed between producers and consumers.

The M 2 form The C D F D M1 meets the need to have a document that concisely represents concurrent process structuring in a software system and the interactions between the processes. This document is easy to read, even for designers who do not have a thorough knowledge of ADA tasking characteristics, and provides considerable support in the general understanding of a software system. However, the information content 26

T = {Producer i, C o n s u m e r I, C H A N N E L }

i¢

[ 1..2]

j~ []_4]

~

c (Producer i)={CHANNEL l c ( C o n s u m e r i)={CHANNEL} c (CHANNEL)=O

(a)

I

Consumer2

II

Consumer3

I

Co,Isomer4

I

Producer I CHANNEL Producer2

Co)

Figure 1. Example of M I form and related graph: the producers~consumers system of this document is very low, which makes it rather unsuitable for fully supporting the activities of maintenance, validation and reuse. For example, M I does not describe the flow of messages between tasks, and does not accurately illustrate the ADA task interaction mechanism. A knowledge of the data flow between tasks is fundamental for the maintenance and/or qualification of reusable components, while an accurate description of the task interaction mechanisms is indispensable for the validation of a concurrent system. In light of these considerations, a second and more detailed level of C D F D is clearly needed to support the designer. In order to define this second level it is worthwhile briefly referring to the salient characteristics of ADA tasking. In ADA the tasks are defined using a specification part and a body. The specification part defines the global operations (Entries), visible outside the task. The implementation of each of these operations is contained in the body (Accept). Task communication is synchronous, asymmetrical and extended rendez-vous; task rendez-vous involves a requesting task asking for the activation of a global operation (Entry) and is expressed by the pair of entry-call/accept statements. Data exchange takes place by means of the association of the formal parameters specified in the accept and the actual parameters defined in the entry-call. At this point we can define a second level of C D F D , M2, as follows: Definition 2 M2 is formally defined by the sextuple

(T, E,Ao,Ac, fouT,f~N) where T is a set of Tasks; E is a set of Entries; fEt, is a function that maps every task t e T in the set ED of the Information and Software Technology

G CANFORA, A CIMITILE AND U DE CARLINI

entries it declares; fEC is a function that maps every task t • T in the set EC of the entries it calls; fOUT is a function that maps every pair (ti, ej) with tie T and ej efEC (ti) in the list OUT of the information (messages) that the calling task passes to the entry called when the rendezvous is established; f~N is a function that maps every pair (t~,ej) with t~eT and ejefvc(ti) in the list IN of the information (messages) that the calling task receives from the entry called when the rendez-vous is closed. M2 can be represented by the graph:

T = {Producer =.Consumer j, CHANNEL} E = (Write, Read}

~

o (Producer i) = O D (Consumer j) = ID D (CHANNEL) = {Write,

Read}

t a s k c h a n n e l is e n t r y w r i t e (item: in T) ; e n t r y r e a d ( i t e m : out T); e n d channel; t a s k b o d y c h a n n e l is x: a r r a y (0..9) of T; s u b t y p e c u n t e r is i n t e g e r r a n g e x ' F i r s t . . x ' L a s t + l ; t o p , b o t t o m : c o u n t e r :=x'First ; message_number: Integer range 0..x'Last-x'First+l:=0; begin loop select w h e n m e s s a g e _ n u m b e r < x ' Last-x' F i r s t + l => a c c e p t w r i t e ( i t e m : in T) do x (top) :-item; e n d write; t o p : = ( t o p + l ) M o d (x'Last+l); mes s a g e _ n u m b e r := m e s s a g e _ n u m b e r + l ; or when message_number>0 => a c c e p t r e a d ( i t e m : out T) do item: =x (bottom) ; e n d read; bottom:=(bottom+l) M o d (x'Last+l); m e s s a g e _ n u m b e r := m e s s a g e _ n u m b e r - l ; or terminate; e n d select; e n d loop; end channel ;

Listing 1. The task channel

The )143 form The C D F D M2 fully describes the client/server system structure, i.e. the systems in which processes Vol 35 No 1 January 1993

c (Producer i)={Write} c (Consumer l)=(Read} c (CHANNEL)=E~

(Producer i, Write) = (Consumer j. Read) =(ITEM }

(a)

Pr

u erl J

J i

~Write

(ai- (ti, ej))eA2. It is clear that edges belonging to A1 represent declaration relationships while edges belonging to A2 represent call relationships. Let us now consider, for example, the producers/ consumers system introduced above. Let READ and WRITE be the two entries exported from the C H A N N E L task for the collection and delivery of an element, respectively (see Listing 1). In this case the form M2 and the graph G(M2) are the ones shown in Figures 2a and 2b. In this and all the other examples we use rectangles and circles to indicate task and entry nodes; solid and dashed lines represent call and declaration relationships respectively.

jE [1-41

[~

(Producer i. Write)=(]TEM} (Consumer j. Read)=~

G(M2) (N, A,fLAB) where N coincides with TLJE; A is given by AI LJA2, where AI is defined by all the possible pairs such that tie T and ejefEo(ti), A2 is defined by all the possible pairs such that t i e T and ejefEc(ti); fLAa(ai) ----k iff a, e A 1;fLAB(ai) ----fin(ti, ej) UfouT(ti, ej) iff

i~ [1_2]

Producer2

x

)

I

( Read ~.~-(,~,m,

r OOT,(Item)

m.O ternNi

(b)

Figure 2. Example of M2 form and related graph: the producers~consumers system can either request or provide services. Many embedded systems are actually structured as client/server systems: a set of clients to monitor several sensors and control several actuators, and a set of servers that are called by clients in order to deliver or collect messages, obtain information on system or component status, communicate the status of the controlled device and/or receive commands. Nevertheless, in the development of an embedded system it may become necessary to introduce tasks that are neither strictly calling tasks nor strictly called tasks. Nielsen and Shumate, for instance, pointed out 9 that in the development of real-time software it is very often necessary to introduce intermediary tasks in order to decouple asynchronous processes. One very commonly used class of intermediary tasks is relay tasks. These provide a temporary data store, thus making it possible for the processes using this data to operate asynchronously. They accept data by being called (or calling) and provide data by calling (or being called by) another task. Another class of tasks that cannot be classified as strictly calling or strictly called includes tasks that offer complex services whose performance requires other tasks to be called. Let us consider, for instance, the TELLER task (see Listing 2) in a bank simulation system23. This can be called by the CUSTOMER task to get one of the services DEPOSIT, W I T H D R A W or BALANCE. At the same time, the performance of one of these services entails the T E L L E R calling the 27

A reverse engineering process for design level document production from ADA code B A N K _ DATABASE task so as to obtain, for instance, C U S T O M E R identification and/or his current account balance update. In this case, therefore, the T E L L E R task behaves as a server towards the C U S T O M E R task and, at the same time, as a client towards the B A N K _ D A T A B A S E task. task teller is entry deposit(id: cust id; val: money; stat: out status); entry withdraw(id: cus[ id; val: money; stat: out status); entry balance(id: cust_id: val: out money; star: out status); end teller; task body teller is current balance: money; valid: boolean; begin loop select accept deposit(id: cust_id; val: money: stat: Out status) do bank_database.verify_cust_id(id, valid); if not valid then stat:-bad_cust_id; else bank_database.deposit(id, val); stat:-success; end if; end deposit; or accept withdraw(id: cust_id: val: money; star: out status) do bank_database.verify_custid(id, valid): if not valid then stat:-bad_cust_id; else bank_database.balance(id, current_balance) if current balance

"~

03

C~ I _~

03 ¢~ ¢'~ ~

tX~ nm aJ 03

--

rn

C~

~

eo

tC3 c~

o~

CX~ -Q ~ ~

"~ cm I

for

each

• F O R M A L P A R A M E T E R S LIST of the formal parameters ej declares, with the related rolls (i.e. IN, OUT, IN OUT); • N E S T E D E N T R Y CALLS LIST of the entries ti calls in a nested rendez-vous through ej.

J

O~

c-

LIST

m

Lihat lebih banyak...

A reverse engineering process for design level document production from ADA code

Descrição do Produto

Comentários