Serializability by commitment ordering

Share Embed


Descrição do Produto

ELSEVIER

Information

Processing

Letters

Information Processing Letters

51 (1994) 257-264

Serializability by commitment ordering Yoav Raz ’ Digital Equipment Corporation, I IO Spit Brook Road (ZKO3-3 /X46), Communicated

by D. Dolev; received

3 August

Nashua, NH 03062, USA

1991; revised 1 March

1994

Abstract A new serializability across

multiple

generalizes

concept,

autonomous

the popular

Commitment

Database

Strong-Strict

Ordering (CO), allows to effectively achieve global serializability Systems, that may use different (any) concurrency control mechanisms. CO Two Phase Locking (S-S2PL) concept.

Key words: Atomic

Multi database

commitment; Concurrency systems; Transactions

control;

1. Introduction

Serializability is the most commonly accepted general criterion for the correctness of concurrent transactions (see e.g. [l]), and supported in most Database Systems (DBSs). When transactions involve more than one DBS this property may be violated in general, unless special measures are taken, or certain conditions exist to guarantee it. This issue is dealt with, for example, in [l-3,5,6,10,1 1,161. Achieving global serializability with reasonable performance, especially across DBSs that implement different concurrency control mechanisms, has been considered a difficult problem (e.g. [14,151X An important goal is reaching global serializability while maintaining DBS autonomy. We define a DBS to be autonomous, if it does not have to share any concurrency control information (e.g. transaction timestamps) with

’ Email:

systems;

Distributed

systems;

Global serializability;

other entities, and is being coordinated with other DBSs (at the non-application level) solely via Atomic Commitment (AC; see [l]) protocols ‘. These protocols are necessary to achieve gfobal atomicity (“all or nothing” semantics). The Two Phase Commitment protocol (2PC; see [7]), is a popular special case of AC. A well-known local (i.e. local to each DBS) concurrency control mechanism that together with AC guarantees global serializability is StrongStrict Two Phase Locking (S-S2PL; “release locks issued on behalf of a transaction only after the transaction has ended”; see [1,4]). This fact has been known for several years, and has been the major correctness foundation for implementations of distributed transactions. The observation that local S-S2PL guarantees global serializability

‘We confine ourselves to atomic transactions thus, the definition of DBS autonomy implies requirement for DBSs’ cooperation.

[email protected].

0020-0190/94/$07.00 0 1994 Elsevier SSDI 0020.0190(94)00093-E

Database

Science

B.V. All rights reserved

only, and a minimal

2%

Y. ~i~a/Information

Processing Letters 51 (19941 257-264

ayipears explicitiy at least in [ll] and [2] j. When taking this approach, all the DBSs involved have to implement S-S2PL based concurrency control, even if other types are preferable for some DBSs, for performance considerations. Other methods for achieving global serializability, referenced above, violate DBS autonomy as defined, by requiring information exchanges among DBSs, in addition to AC messages (e.g., piggy-backing transaction timestamps on messages). Dynamic atomicity defined in [161 is an exception, and can be enforced by autonomous DBSs. However, no general protocol for enforcing it is given in [16]. In this paper we generalize the above observations. We define a history (schedule) property named Commitment Ordering (CO; see also [12] 4), and show that it implies global serializability, if applied locally in each DBS, and if AC is used for DBS coordination. CO implementations can consist of stand-alone serializability mechanisms, as well as combinations with other concurrency control mechanisms. Since CO can be enforced solely by controlling the order of transactions’ commit events, it can be combined with any other concurrency control mechanism without affecting the mechanism’s data access scheduling strategy. This allows selecting and optimizing concurrency control for each DBS according to the nature of the transactions involved. Enforcing CO does not require aborting more transactions than those needed to be aborted for global serializability violation prevention (which is determined exclusively by the data access orders, and independent of the commit orders). Most existing DBSs, which are SS2PL based, provide CO already, since S-S2PL is a special case of CO. CO generalizes the dynamic atomic&y property defined in [161 as well (see

WI).

3s

the term rigorousness for S-S2PL. 4 [2] provides an alternative definition, referred to as strong recouerubility, and uses it to show that applying rigorousness (S-S2PL in our terminology) locally implies global serializability. No algorithm for enforcing CO (beyond S-S2PL) is given there.

In summac’ serializabil;ry of transaction histories across difterent DBS types, that may use different concurrency control mechanisms but provide the CO property, is guaranteed without any global coordination or distributed services except AC.

2. Background The

following

terminology

is used

(see

also

[ll): A transaction is an execution of a set of programs that access DBSs’ shared data items. It is required that a transaction is atomic, i.e., either the transaction completes successfully, and its effects on the shared data items become permanent, or all its effects on the data items are undone. In the first case the transaction is committed. In the latter the transaction is aborted. A transaction is local, if it is confined to accessing data of a single DBS. It is global, if it spans two or more DBSs. The event of ending transaction T, (either committing or aborting it) is denoted e,. The notation ci may be used for this event, if Ti is committed. A transaction is decided, if it is either aborted or committed; otherwise it is undecided. An undecided transaction is ready, if it has completed its processing, and is prepared either to be committed or aborted; otherwise it is active. The diagram in Fig. 1 defines the possible transitions between transaction states.

Y. Raz /Information

Two operations on a data item X, pi[xl, 4j[Xl, of transactions 7;, 7; respectively, are con@ting, if at least one of them is a write operation. Transaction T, is in a conflict with transaction T,, if for respective conflicting operations qJx1, pi[x], pi[x] precedes q2[xl. Notation: pl[xl <

qJx1. The relation defined by the relationships “ < ” is denoted history. It is a partial order over accumulated transactions’ events, that maintains the partial orders enforced by the individual transactions and the systems where the transactions run (e.g., see [11X A history is serializable (SER, or is in SER, the class of serializable histories), if its commit projection (i.e., its restriction on committed transaction) is conflict equivalent (i.e., have the same operation conflicts) to some serial history (a history in which transactions are executed in a serial order). The Serializability Graph of a history H, SG( H), is the following directed graph: SG( H) = (T, C), where T is the set of all unaborted (i.e. committed and undecided) transactions in H, C (a subset of TX T) is a set of edges that represent transaction conflicts: Let T,, T, be any two transactions in T; an edge from T, to T2 exists, if and only if T, is in a conflict with T,. The Committed Transaction Serializability Graph of a history H, CSG(H), is the subgraph of SG(H) with all the committed transactions as nodes and all the respective edges. The Undecided Transaction Serializability Graph of a history H, USG(H), is the subgraph of SG( H) with all the undecided transactions as nodes and all the respective edges. The following theorem provides a criterion for checking serializability: Theorem 1 (The Serializability Theorem). A history H is serializable, if and only if CSG( H) is cycle-free. (For a proof see for example 3. Commitment

259

Processing Letters 51 (1994) 257-264

[l]).

Ordering (CO)

Commitment Ordering (CO) is a property of transaction histories (schedules) that guarantees

serializability. It generalizes S-S2PL. While SS2PL blocks any conflicting operations on a data item accessed by a transaction until the end of the transaction, CO does not necessarily block. Thus, it can be implemented in a nonblocking that guarantees (optimistic; see [93) manner, deadlock-freedom. The price for this, however, is the possibility of cascading aborts when recocerability (see e.g. [l,S]> is applied. The class of CO histories is incomparable with the classes of recoeerable, cascadeless, strict, and 2PL histories 1121. Definition 2. A history is CO (in the class CO) if for any conflicting operations p ,[ xl, q2[ x I of any committed transactions T,, T, respectively, p ,[ x I < qJx] (i.e., T2 is in a conflict with T,) implies e, < e2. We now show that CO implies Theorem 3. SER 1 CO (where containment ).

serializability:

3 denotes a strict

Proof. (i) Let a history H be in CO, and let . . . + 7; - . . . - 7; + . . . be a (directed) path in CSG(H). By the CO definition and an induction by the order on the path above we conclude that ci < cj. (ii) Now suppose that H is not in SER. By Theorem 1 (without loss of generality) there is a cycle T, + T2 + . . . +T,-+T, inCSG(H),n> 2. First, let T, and 7; in (9 be T, and T, above respectively (consider an appropriate prefix of the path representing the cycle above). This implies by (i) that c, < c2. Now let T. and 7; in (i) be T2 and T, above respectively (consider an appropriate suffix of the path above). This implies that c2 < c,. However, c, < c2 and c2 < c, contradict each other since the relation “ < ” is asymmetric. Hence CSG( H) is acyclic, and H is in SER by Theorem 1. Now examine the following serializable, non CO history to conclude that the containment is strict:

260

Y Raz / Information Processing Letters 51 (1994) 257-264

where Y,[X] is a read write operation. 0

operation

and

w2[x] is a

Note that the proof above leaves the interpretation of the commit event open. We choose to interpret it as the commit decision (logging) event. For a local transaction this event usually takes place in the respective DBS. For a global transaction this event is typically associated with the execution of the AC protocol (see Section 5), out of the boundaries of any DBS. S-S2PL can be defined as follows: A history H is S-S2PL (in the S-S2PL history class), if for any conflicting operations pr[x], q&xl of transactions T,, T, respectiveIy in H,

These aborts by lemma 5:

cannot

be compromised,

as stated

Lemma 5. If all the members of ABORT,,(T) are not aborted after T is committed, then CO may be violated (assuming that every transaction is eventually decided ). Proof. Suppose that T is committed. Let T’ be some transaction in ABORT,,(T). Thus T’ is undecided when T is committed. If T’ is later committed, then c 0) by the algorithm. Then also H, is trivially in CO, USG, = USG(H,) (its UT component) includes all the undecided transactions in H,. Assume that the history H, is in CO. Now perform an additional iteration, and commit transaction T, (without loss of generality) in includes all the transactions in H,, USG,. K+, and the new (undecided) transactions that have been generated after completing step iz (and are in USG,,,). Examine the following cases after completing iteration n + 1: Let T2, T3 (without loss of generality) be two committed transactions in H,. If T3 is in a conflict with T, then c2 < c3 since H,* is in CO by the induction hypothesis. Obviously c2 < c1 for every (previously) committed transaction T2 in H,, with which T, is in a conflict. Suppose that a committed transaction T2 is in a conflict with T,. This means that T, is in ABORT&T,), and thus aborted when T2 was committed. A contradiction. The cases above exhaust all possible pairs of conflicting committed transactions in H,, + 1. q Hence H,, 1 is in CO. The CO algorithm enforces Lemma 5, which is necessary

the condition of for guaranteeing

Corollary 9. Histories generated by a scheduler that includes a COCO are serializable. Note

that

aborting

the

transactions

observation is a direct way to show that the algorithm above guarantees serializability (using Theorem 1). If a that does not reside on any cycle exists in the USG, then there exists a transaction T with no edges from any other transaction. T can be committed without aborting any other transaction,

transaction has to be aborted when committing a on the cycle. Thus, only one per cycle has to be aborted to enforce CO, which is also the minimum necessary to guarantee serializability algorithms; e.g., [ 11). The COCO can be incorporated

nonblocking,

and

hence

ensuring

deadlock-free-

cycle-free, or cycles are eliminated by the scheduler aborting transactions. Note that if the scheduler is S-S2PL based, then the USG does not have any edges. This means that no aborts by the COCO are needed, as one can expect, and the entire COCO is unnecessary. This is an extreme case. Other sched-

2h2

uler types may induce respective USGS.

5. Guaranteeing global commitment ordering

Y

other

Rar/ Information Processing Letters 51 11994) 257-264 properties

serializability

of the

by local

It is assumed that an atomic commitment (AC) protocol is applied to guarantee atomicity in the distributed, multi DBS environment. An AC protocol implements the following general scheme each time a global transaction is decided: AC: Each DBS participating in a transaction votes either YES or NO on the transaction (also absence of a vote within a time limit may be considered NO) after its respective local subtransaction ’ has reached the ready state, or votes NO if unable to reach the ready state. The transaction is committed by all DBSs if and only if all have voted YES. Otherwise it is aborted by all the DBSs. Remarks. (9 The YES vote is an obligation to end the local subtransaction (commit or abort) as decided by the AC protocol. After voting YES the DBS cannot affect the decision anymore. (ii) The fact that AC is used allows one to assume that like in the single DBS case, also a multi DBS (committed) transaction has a single commit event (the commit decision event; see remark following the proof of Theorem 3). In a multi DBS environment, the COCO described above typically receives a request via an AC protocol to commit some global transaction T in the USG. If the COCO can commit the transaction, it votes YES via AC, which is an obligation to either commit or abort according to the decision reached by the AC protocol. When the COCO commits T all transactions in ABORT,,(T) need to be aborted (by Algorithm 1). Thus, the COCO has to delay its YES vote on T (or the decision on T if it is local), if it has voted YES on any transaction T’ in ABORT,,(T), until it “knows” (by a notifica-

’ The portion of a transaction within a single DBS. A local subtransaction has states as defined in Section 2.

tion) that T’ is decided. Not delaying the YES vote can result in a contradiction, if the AC protocol decides to commit both. Note that in this case committing (locally) first T’ in ABORT,,(T), and then committing T is not sufficient, since T can be in ABORT,o(T’) in another DBS, and committing both will result in a global serializability violation. By similar considerations the COCO cannot vote YES on T’ in ABORT,,(T) (or decide on T’ if it is local), after it has voted YES on T, until it “knows” that T is decided by AC. If a YES vote on some T is possible, the COCO may either choose to do so immediately upon being requested (the nonblocking without delays approach), or to delay the voting for a given, predetermined amount of time (nonblocking with delays). During the delay the set ABORT,,(T) may become smaller or empty, since its members may be decided and removed from the USG, and since ABORT,,(T) cannot increase after T has reached the ready state. Instead of immediately voting, or delaying the voting for a given amount of time (which may still result in aborts) the COCO can block the voting on T until all transactions in ABORT,,(T) are decided. However, if another DBS in the environment also blocks, this may result in a global deadlock (e.g. if T’ is in ABORT,,(T) for one DBS, and T is in ABORT,o(T’) for another DBS). A common way to resolve global deadlocks is by timeout. Note that aborting transactions by the COCO is necessary only if a local cycle in its USG is not eliminated by some external entity (e.g. a scheduler that generates a cycle-free USG or one that uses aborts to resolve (local) cycles), or if a global cycle (across two or more local USGS) is generated. As for the single DBS case (see end of Section 41, one transaction per cycle needs to be aborted. Definition 10. For any history property X a (global) history is in Local-X if the (local) history of every DBS in the environment is in X. Note that AC provides

Local-SER 1 SER. However, since a single commit-decision event for

Y. Rar /Information

each (committed) transaction, the theorem follows by Definitions 2 and 10: Theorem 11. Local-CO = CO (i.e. a (global) tory is in CO if and only if it is in Local-CO).

Processing Letters 51 (I 994) 257-264

below

his-

Proof. (i) Assume that a history H is in Local-CO. By the arguments given in the discussion above, for each DBS that enforces CO and for any two committed transactions T, and T,, T2 is in a conflict with T1 in the DBS (reflected by the DBS’s operations), if and only if ci < c2 (commit decision events). Thus for all the DBS participating in both T, and T, either T2 is in a conflict with T,, or no conflict exists (T, being in a conflict with T, implies that c2 < c,, a contradiction). Thus, the condition above is globally true, and H is in CO. (ii) If H is in CO, then T2 being in a conflict with T, implies c, < c2. This is true for all the DBS that view this conflict (i.e., T2 is locally in a conflict with T,). Thus every DBS is in CO, and 0 H is in Local-CO. The following theorem Theorems 3 and 11: Theorem history

is

is a consequence

of

(i.e. if a (global) then it is serializable).

12. SER 2 Local-CO

in Local-CO,

Theorem 12 means that if each DBS provides CO (by using an S-S2PL based scheduler or any other scheduler type that includes a COCO), then global serializability is guaranteed.

6. Conclusion The simplicity of the CO algorithm and the lack of any related communication overhead (AC protocols are anyhow utilized for atomicity), suggest that CO provides a practical, fully distributed solution for the global serializability problem in a multi database environment with a high transaction rate. No change in existing AC protocols and related services is required. The DBSs involved may implement any concurrency control mechanism, as long as transaction conflict

263

information is available to enforce CO locally in each DBS. It is shown in [12] that CO is also a necessary condition for guaranteeing global serializability across DBSs that coordinate solely via AC and cannot distinguish between local and global transactions. If the DBSs have the knowledge to distinguish between local and global transactions, a more general property, Extended Commitment Ordering (ECO), which imposes commitment order on global transactions only, is the necessary condition for global serializability

[131.

References V. Hadzilacos and N. Goodman, Concurrency Control and Recovery in Database Systems (Addison-Wesley, Reading, MA, 1987). M. Rusinkiewicz and A. 121Y. Breibart, D. Georgakopoulos, Silberschatz, On rigorous transaction scheduling, IEEE Trans. Software Engineering 17 (9) (1991). and W. Du, A paradigm for concurrency [31 A. Elmagarmid control in heterogeneous distributed database systems, in: Proc. 6th Internat. Con& on Data Engineering, Los Angeles, CA, 1990. J.N. Gray, R.A. Lorie and I.L. Traiger, [41 K.P. Eswaran, The notions of consistency and predicate locks in a database system, Comrn. ACM 19 (11) (1976) 6244633. M. Rusinkiewicz and A. Sheth, On [51 D. Georgakopoulos, serializability of multi database transactions through forced local conflicts, in: Proc. 7th Internat. Conf: on Data Engineering, Kobe, Japan, 1991. Concurrency control 161 V. Gligor and R. Popescu-Zeletin, issues in distributed heterogeneous database management systems, in: F.A. Schreiber and W. Litwin, eds., Distributed Data Sharing Systems (North-Holland, Amsterdam, 1985) 43-56. [71 J.N. Gray, Notes on database operating systems, in: Operating Systems: An Advanced Course, Lecture Notes in Computer Science 60, (Springer, Berlin, 1978) 393-481. A theory of reliability in database systems, 181 V. Hadzilacos, J. ACM 35 (1) (1988) 121-145. [9] H.T. Kung and J.T. Robinson, On optimistic methods for concurrency control, ACM Trans. Database Systems 6 (2), (1981) 213-226. [lo] D. Lomet, Consistent timestamping for transactions in distributed systems, Tech. Rept. CRL 90/3, Digital Equipment Corporation, Cambridge Research Laboratory, 1990. [ll] C. Pu, Transactions across heterogeneous databases: The superdatabase architecture, Tech. Rept. No. CUCS-24386 (revised June 19881, Dept. of Computer Science, Columbia University, New York, NY.

111P. Bernstein,

264

Y. Raz /Information

Processing Letters 51 (1994) 257-264

[12] Y. Raz, The principle of commitment ordering, or guaranteeing serializability in a heterogeneous, multi resource manager environment, DEC-TR 841, Digital Equipment Corporation, 1990; revised April 1992. [13] Y. Raz, Extended commitment ordering, or guaranteeing global serializability by applying commitment order selectively to global transactions, DEC-TR 842, Digital Equipment Corporation, 1991; revised April 1992. [14] A. Sheth and J. Larson, Federated database systems for

managing distributed, heterogeneous, and autonomous databases, ACM Cornput. Surceys 22 (3) (1990) 183-236. [15] A. Silberschatz, M. Stonebraker and J. Ullman, Database systems: Achievements and opportunities, Comm. ACM 34 (10) (1991) [16] W.E. Weihl, Local atomicity properties: Modular concurrency control for abstract data types, ACM Trans. Programming Languages Systems 11 (2) (1989) 249-282.

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.