Overview of IBM System/390 Parallel Sysplex: A Commercial Parallel Processing System


Jeffrey M. Nick
IBM System/390 Division
522 South Road, Poughkeepsie, NY 12601, USA
jeff [email protected]

Jen-Yao Chung, Nicholas S. Bowen
IBM Thomas J. Watson Research Center
P. O. Box 704, Yorktown Heights, NY 10598, USA
jychung,[email protected]

Abstract

Scalability has never been more a part of System/390 than with Parallel Sysplex. The Parallel Sysplex environment permits a mainframe or Parallel Enterprise Server to grow from a single system to a configuration of 32 systems (initially), and appear as a single image to the end user and applications. The IBM S/390 Parallel Sysplex provides capacity for today's largest commercial workloads by enabling a workload to be spread transparently across a collection of S/390 systems with shared access to data. By way of its parallel architecture and MVS operating system support, the S/390 Parallel Sysplex offers near-linear scalability and continuous availability for customers' mission-critical applications. S/390 Parallel Sysplex optimizes responsiveness and reliability by distributing workloads across all of the processors in the Sysplex. Should one or more processors fail, the workload is redistributed across the remaining processors. Because all of the processors have access to all of the data, the Parallel Sysplex provides a computing environment with near-continuous availability.

1 Introduction

Parallel and clustered systems are emerging as common architectures for scalable commercial systems. Once popular in numerically intensive environments, the introduction of advanced coupling technology (e.g., S/390's ESCON [4] and Coupling Facility [7], the SP2 switch [11] and Tandem's ServerNet [2]) is driving the acceptance of these systems in commercial markets. These systems share some common objectives; namely to harness large amounts of processing power while providing availability improvements over single systems. Their architectures span a broad spectrum from traditional parallel processors that were initially focused on high performance for numerically intensive workloads [6] to clustered operating systems that focus on availability [1]. This paper describes a new parallel architecture and a set of related products for IBM's S/390 processors and MVS operating system. The system is clearly unique. From an architectural perspective it contains many novel features that

enable parallelism. However, at the same time that parallelism is enabled, a single system image is preserved. From the end-user's view it appears as a scalable and available system by taking advantage of database managers that have themselves become "parallelized." This paper describes the architecture and design objectives for the S/390 parallel systems (herein called "Parallel Sysplex"). This architecture contains new and innovative parallel data-sharing technology, allowing direct, concurrent read/write access to shared data from all processing nodes in the parallel configuration, without sacrificing performance or data integrity. This in turn enables work requests associated with a single workload to be dynamically distributed for parallel execution on systems in the sysplex based on available processor capacity rather than data-to-system affinity. Through this state-of-the-art parallel technology, the power of multiple MVS/390 systems can be harnessed to work in concert on common workloads, taking the commercial strengths of the MVS/390 platform to new heights in terms of competitive price/performance, scalable growth and continuous availability. The key design objectives guiding this system were:

- Reduced total cost of computing.
- Compatibility with existing systems and programs.
- Dynamic workload balancing.
- Scalability and granular growth.
- Continuous availability of information assets.

The purpose of the paper is to review the S/390 Parallel Sysplex architecture and the MVS operating system services built on the architecture, and to provide an overview of the key application environments that exploit the S/390 Parallel Sysplex technology. This paper is organized as follows. Section 2 presents the objectives of building parallel systems. Section 3 discusses the technology enhancements for data sharing. Section 4 discusses the scalability of the S/390 Parallel Sysplex. Section 5 presents the exploitation product details. Section 6 concludes the paper.

2 Design Objectives

This section describes the basic objectives that drove the design of the system.

2.1 Reduced Cost of Computing

The primary business objective was to reduce the total cost of computing. This meant using S/390 CMOS microprocessors to leverage industry-standard CMOS technology for price/performance advantage, both in terms of reduced base manufacturing cost and significant on-going customer savings in reduced power, cooling and floorspace requirements. Although the use of multiple interconnected microprocessors can provide the aggregation of large amounts of processing power, low cost can only be truly achieved if the processors are efficiently utilized. Therefore, the ability to dynamically and automatically manage system resources is a key objective. A new component, the Workload Manager (WLM), was designed to meet this objective. While the S/390 Parallel Sysplex is physically composed of multiple MVS systems, it has been designed to logically present a single system image to end-users, applications, and the network, and to provide a single point of control to the systems operations staff. Systems management costs do not increase linearly as a function of the number of systems in the sysplex. Rather, total-cost-of-computing efficiencies of scale accrue through centralized control over the integrated multi-system configuration.

2.2 Compatibility

The second key objective was to maintain compatibility with existing systems and programs. Given the huge customer investment in S/390 MVS commercial applications, it was imperative for success that the parallel sysplex technology be introduced in a manner compatible with customers' existing application base. It was a further design objective to enable customers' existing application investments to leverage the parallel sysplex data-sharing technology transparently. With few exceptions, these objectives have been met. The parallel sysplex technology extensions to the S/390 architecture (introducing new cpu instructions, new channel subsystem technology, etc.) are fully compatible with the base S/390 architecture. The IBM subsystem transaction managers (CICS and IMS) and key subsystem database managers (DB2, IMS/DB) have exploited the data-sharing technology while preserving their existing interfaces. This has protected customer investments in OLTP and decision support applications and also provided improvements in terms of reduced cost of computing, scalable growth, and application availability.

2.3 Dynamic Workload Balancing

The ability to dynamically adjust total system resources to best satisfy workload objectives in real-time is a key objective in a commercial parallel processing environment [14]. There are two fundamental approaches to workload distribution and data access used in the commercial

parallel processing systems; data-partitioning (i.e., “shared nothing”) system designs and data-sharing system designs. In a data-partitioning system, the database and the workload are divided among the set of parallel processing nodes so that each system has sole responsibility for workload access and update to a defined portion of the database. The data-partitioning is required in order to enable each system to locally cache data in processor memory with coherency and to eliminate the need for cross-system serialization protocols in providing data-access concurrency control. This is the approach most commercial parallel processing systems have taken. However, limitations are imposed in a commercial processing environment by such a design point [13]. Significant capacity planning skills and cost are required to tune the overall system to match each system node’s processing capacity to the projected workload demand for access to data owned by that given system. While it is possible to achieve an optimized match between system capacity and workload demand for a well-tuned benchmark environment based on careful system monitoring and analysis, real commercial workload applications are not so well-behaved. Significant fluctuations in the demand for system processor resources and access to data occur during real-time workload execution, both within a single workload and across multiple workloads in concurrent execution across the parallel processing system nodes. These real-time spikes and troughs in system capacity demand can result in significant over- or under-utilization of system resources across all of the parallel nodes. This problem is further aggravated by the fact that commercial processing applications are becoming more complex in their nature with respect to the diversity of data that such applications access during execution of business transactions. In a data-partitioning system, it becomes increasingly difficult to insulate a particular business transaction to execution on a single system node without incurring the overhead associated with message passing requests for data owned by other nodes in the parallel configuration. The S/390 Parallel Sysplex environment employs the “data-sharing” strategy. The new high-performance datasharing technology provides the means for MVS and its subsystems to support dynamic workload balancing across the collection of systems in the configuration. Functionally, workload balancing can occur at two levels. Initially, during user logon, session binds can be dynamically distributed to balance the load across the set of systems. Subsequently, work requests submitted by a given user can be executed on any system in the configuration based on available processing capacity, instead of being bound to a specific system due to data-to-processor affinity (which is typically the case with alternative data-partitioning parallel systems). Normally, work will execute on the system on which the request is received, but in cases of over-utilization on a given node, work can be directed to other less-utilized system nodes. Examples of two types of commercial workloads that

lend themselves well to dynamic workload balancing include Online Transaction Processing (OLTP) and decision support. OLTP workloads are comprised of many individual work requests, i.e., transactions, each transaction being relatively atomic in its execution with respect to other transactions in the workload. Thus, it is possible to balance the OLTP workload by distributing individual transactions for execution in parallel across the set of systems in the parallel sysplex. Decision support workloads consist predominantly of query requests, wherein a given query can involve scanning multiple relational database tables. Here, parallelism can be attained by breaking up complex queries into smaller sub-queries, and distributing the component queries across multiple processors (cpu) within a single system or across multiple systems in a parallel sysplex. Once all sub-queries have completed, the original query response can be constructed from the aggregate of the sub-query answers and returned to the requester. For both OLTP and decision support workloads, dynamic workload balancing across systems can be made predominantly transparent to the customer applications or users, which remain unchanged.
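To make the contrast concrete, the short C sketch below (our own illustration; none of the names correspond to an actual MVS or subsystem interface) routes each incoming transaction to the least-utilized system, as a data-sharing configuration can, while a data-partitioning design must send the transaction to whichever system owns its data, regardless of load.

```c
#include <stdio.h>

#define SYSTEMS 4

/* Hypothetical sketch: contrast data-sharing routing (choose the least
 * loaded system) with data-partitioning routing (the data key fixes the
 * owning system, regardless of load). Utilization values are invented. */
static double util[SYSTEMS] = { 0.85, 0.40, 0.95, 0.55 };

/* Data-sharing: every system can reach the data, so route by load. */
static int route_data_sharing(void)
{
    int best = 0;
    for (int i = 1; i < SYSTEMS; i++)
        if (util[i] < util[best])
            best = i;
    return best;
}

/* Data-partitioning: the key alone determines the owning system. */
static int route_data_partitioning(unsigned long data_key)
{
    return (int)(data_key % SYSTEMS);
}

int main(void)
{
    unsigned long keys[] = { 1002UL, 2003UL, 3001UL, 4006UL };
    for (int i = 0; i < 4; i++)
        printf("txn key %lu: data-sharing -> system %d, data-partitioning -> system %d\n",
               keys[i], route_data_sharing(), route_data_partitioning(keys[i]));
    return 0;
}
```

In the data-sharing case every transaction lands on the least-loaded system until utilizations change; in the partitioned case keys 1002 and 4006 are forced onto system 2 even though it is the busiest.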

2.4 Scalability and Granular Growth

In the S/390 Parallel Sysplex environment, processing capacity can be added in granular increments: from the addition of a single processor within an existing system to the introduction of one or more data-sharing systems. New systems can be introduced into the parallel sysplex in a non-disruptive manner. That is, the already-running systems continue to execute work concurrent with the activation of the new system. Once the new system is active, it can become a full participant in dynamic workload balancing. New work requests are naturally driven at an increased rate to that system until its utilization has reached steady-state with respect to the demand for overall processor resources across all system nodes in the parallel sysplex configuration. This capability eliminates the need (and considerable costs) to re-partition the databases and re-tune each system's workload affinity to distribute work evenly after introduction of the new system into the configuration, as is typically required with a data-partitioned parallel processing system. Most significantly, the parallel sysplex data-sharing technology enables systems to be added to the configuration with near-linear scalability and nearly-unlimited capacity. The first S/390 Parallel Sysplex implementation supports up to 32 systems, where each system can be a tightly-coupled multi-processor system with up to 10 cpus. In a parallel sysplex consisting of 32 S/390 CMOS systems, a total processing capacity of several thousand S/390 MIPS is configurable.

2.5 Continuous Availability

With the advent of the S/390 Parallel Sysplex data-sharing technology and exploitation by the MVS system and subsystems, it is possible to construct a parallel processing environment with no single points of failure. Since all systems in the parallel sysplex can have concurrent access to all critical applications and data, the loss of a

system due to either hardware or software failure does not necessitate loss of application availability. Peer instances of a failing subsystem(s) executing on remaining healthy systems can take over recovery responsibility for resources held by the failing instance, or the failing subsystem(s) can be automatically restarted on still-healthy systems by the MVS Automatic Restart Manager (ARM) component to perform recovery for work in progress at the time of the failure. While the failing subsystem instance is unavailable, new work requests can be redirected to other data-sharing instances of the subsystem to provide continuous application availability across the failure and subsequent recovery. The ARM component is fully integrated with the existing parallel structure and provides significantly more functions than a traditional “restart” service. First, it utilizes the shared state support described in Section 3.2 so at any given point in time it is aware of the state of all processes on all processors (i.e., even of processes that “exist” on failed processors). Second, it is tied into the processor heartbeat functions so that it is immediately aware of processor failures. Third, it is integrated with the WLM so that it can provide a target restart system based on the current resource utilization across the available processors. Finally, it contains many features to provide improved restarts such as affinity of related processes, restart sequencing, and recovery when subsequent failures occur. These services are described more fully in [3]. The same availability characteristics associated with handling unscheduled outages are applicable to planned outages as well. A system can be removed from the parallel sysplex for planned hardware or software reconfiguration, maintenance or upgrade. New work can be dynamically re-distributed across the remaining set of active systems. Once the system is ready to be brought back online, it is re-introduced into the sysplex in a non-disruptive manner and participates in dynamic workload balancing and re-distribution as described earlier. In this manner, new releases of MVS and key IBM subsystems supporting the parallel sysplex environment will also support release to release migration co-existence, allowing new software product release levels to be rolled through the parallel sysplex one system at a time, providing continuous application availability across the systematic migration install process. Further, since all systems in the parallel sysplex can be configured to provide concurrent, direct access to common customer applications and data, each individual system only requires 1/N spare system capacity (where N represents the number of fully-configured systems in the sysplex) in order for all remaining systems to continue execution of critical workloads without any observable loss of service in the event of any single system failure.
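The 1/N spare-capacity claim is simple arithmetic: if each of N systems runs at (N-1)/N of its capacity, the N-1 survivors of a single failure have exactly enough headroom to absorb the failed system's share. A small worked check in C (the numbers are illustrative only):

```c
#include <stdio.h>

/* Worked check of the 1/N spare-capacity argument: run each of the N
 * systems at (N-1)/N of its capacity; after one failure the remaining
 * N-1 systems must carry the whole workload. */
int main(void)
{
    for (int n = 2; n <= 32; n *= 2) {
        double per_system_load = (double)(n - 1) / n;   /* utilization before failure */
        double total_work      = n * per_system_load;   /* = n - 1 system-capacities  */
        double survivors       = n - 1;                 /* capacity left after failure */
        printf("N=%2d: run each system at %.1f%%; after a failure %g units of work "
               "fit exactly on %g surviving systems\n",
               n, 100.0 * per_system_load, total_work, survivors);
    }
    return 0;
}
```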

3 S/390 Parallel Systems

This section provides an overview of the technical capabilities of the S/390 Parallel Sysplex. It covers the overall system architecture, the basic operating system support for parallel systems, and the advanced technology introduced to enable efficient coupling of systems.

3.1 System Model

Figure 1 shows the overall structure of the system. It consists of a set of processing nodes (each of which can be a tightly coupled multiprocessor) connected to shared disks. There can be up to 32 processing nodes, where each node can be a tightly coupled multiprocessor containing between 1 and 10 processors. The systems do not have to be homogeneous; that is, mixed configurations supporting both S/390 CMOS processor systems and traditional ES/9000 bipolar systems can be deployed. The basic processor design has a long history of fault-tolerant features [10]. The disks are fully connected to all processors. The I/O architecture has many advanced reliability and performance features (e.g., multiple paths with automatic reconfiguration for availability). The basic I/O architecture is described in [4] and one aspect of the dynamic I/O configuration is described in [5]. The sysplex timer serves as a synchronizing time reference source for systems in the sysplex, so that local processor timestamps can be relied upon for consistency with respect to timestamps obtained on other systems. The Coupling Facility (CF) is a key Parallel Sysplex technology component providing multi-system data-sharing functions and is described in Section 3.3.

Figure 1: System Model. S/390 CMOS and ES/9000 mainframe systems are connected through ESCON channels to shared data on disk, with a Coupling Facility and a Sysplex Timer attached to all systems.

3.2 Base MVS Multi-system Services

There are a set of operating system services that are provided as building blocks for multi-system function. These are described in detail in [12] and here we only briefly cover three of the most relevant aspects. First, a set of group membership services is provided. These allow processes to join or leave groups, signal other group members and be notified of events related to the group. Second, efficient, shared access to operating system resource state data is provided. This data is located on shared disks and many advanced functions are provided, including serialized access to the data (with special time-out logic to handle faulty processors) and duplexing of the disks containing the state data. In addition, there are availability enhancements for planned and unplanned changes to the state repositories (e.g., "hot switching" of the duplexed disks). Third, processor heartbeat monitoring is provided. In addition to standard monitoring of each processor's health, functions are also provided to automatically terminate a failed processor and disconnect the processor from its I/O devices. This enables other multi-system components to be designed with a "fail-stop" strategy (protecting them from processors that appear faulty to the heartbeat function and then resume processing).
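The sketch below is a hypothetical illustration, in C, of the kind of join/leave/notify pattern such group services provide; it is not the MVS cross-system interface, which is documented in [12].

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a group-membership interface: members join a
 * named group and are notified when peers join or leave. None of these
 * names are taken from the real MVS services described in [12]. */

#define MAX_MEMBERS 8

typedef void (*group_event_cb)(const char *group, const char *member, const char *event);

struct group {
    const char *name;
    const char *members[MAX_MEMBERS];
    int count;
    group_event_cb notify;
};

static void notify_all(struct group *g, const char *member, const char *event)
{
    if (g->notify)
        g->notify(g->name, member, event);
}

static int group_join(struct group *g, const char *member)
{
    if (g->count == MAX_MEMBERS)
        return -1;
    g->members[g->count++] = member;
    notify_all(g, member, "joined");
    return 0;
}

static void group_leave(struct group *g, const char *member)
{
    for (int i = 0; i < g->count; i++) {
        if (strcmp(g->members[i], member) == 0) {
            g->members[i] = g->members[--g->count];
            notify_all(g, member, "left");
            return;
        }
    }
}

static void print_event(const char *group, const char *member, const char *event)
{
    printf("group %s: member %s %s\n", group, member, event);
}

int main(void)
{
    struct group g = { .name = "DB2GROUP", .notify = print_event };
    group_join(&g, "SYS1");
    group_join(&g, "SYS2");
    group_leave(&g, "SYS1");   /* peers are told SYS1 has gone away */
    return 0;
}
```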

3.3 Coupling Facility

Given the advantages outlined above of data-sharing in a parallel processing system environment, the obvious question arises as to why the predominant industry structure for parallel processing is a data-partitioning model. The basic answer lies in the fact that, in the past, data-sharing multi-node parallel processing systems have exhibited poor performance and rapidly-diminishing scalability characteristics as the number of nodes in the parallel configuration grows. The data-sharing performance overhead and limited scalability are driven by two fundamental structural attributes. First, significant processing overhead is incurred with respect to the need to provide multi-system concurrency controls for serialized access to shared data. High inter-system communication traffic arises in order to grant/release locks on shared resources. Second, significant processing overhead is incurred in order to provide multi-system buffer coherency controls for shared data cached in each system's processor memory. This function is essential for a data-sharing parallel structure, as it is critical for performance to enable local caching of shared data with full read/write integrity. Here, the overhead is associated with processing to broadcast messages to other nodes to perform buffer invalidation when an update to shared data is made on one system, or to determine whether data cached in local memory is current at the time of use.

The S/390 Parallel Sysplex introduces new architecture, hardware and software technology to address the fundamental performance obstacles which have heretofore precluded implementation of a high-performance, scalable data-sharing parallel-processing system, as shown in Figure 2. At the heart of this inter-system "coupling" technology is the Coupling Facility (CF), a new component providing hardware assists for a rich and diverse set of multi-system functions, including:

- High-performance, finely-grained locking and contention detection
- Global buffer coherency mechanisms for distributed local caches
- Shared intermediate memory for global data caching
- Queueing mechanisms for workload distribution and message-passing

Physically, the Coupling Facility consists of hardware and specialized Coupling Facility microcode supporting the S/390 Parallel Sysplex architecture extensions. The hardware for the CF is also based on the S/390 processor, which provides an additional cost advantage. Coupling Facilities are physically attached to S/390 processors via high-speed coupling links. The coupling links support specialized protocols for highly-optimized transport of commands and responses to/from the CF. The coupling links are fiber-optic channels providing either 50 MegaBytes/second or 100 MB/second data transfer rates. Commands to the CF can be executed synchronously or asynchronously, with cpu-synchronous command completion times measured in micro-seconds, thereby avoiding the asynchronous execution overheads associated with task switching and processor cache disruptions. Multiple CFs can be connected for availability, performance, and capacity reasons. Logically, the CF storage resources can be dynamically partitioned and allocated into CF "structures", subscribing to one of three defined behavior models: lock, cache, and list models. Specific commands are supported by each model and, while allocated, CF structure resources can only be manipulated by commands for that structure type as specified at initial structure allocation. Multiple CF structures of the same or different types can exist concurrently in the same Coupling Facility.

Figure 2: Parallel Sysplex Data-Sharing Architecture. Two S/390 systems each run MVS with a database manager holding local data buffers and locks; through MVS sysplex services they share a Coupling Facility containing lock, cache and list structures (providing multi-system serialization and changed-data coherency) and shared DASD.

3.3.1 Lock structures

The Coupling Facility lock model supports high-performance, finely-grained lock resource management, maximizing concurrency and minimizing communication overhead associated with multi-system serialization protocols. The purpose of this model is to enable a specialized lock manager (e.g., a database lock manager) to be easily extended into a multi-system environment. The CF lock structure provides a hardware-assisted global lock contention detection mechanism for use by distributed lock managers, such as the IMS Resource Lock Manager (IRLM). The

lock structure supports a program-specifiable number of lock table entries used to record shared or exclusive interest in software locks which map via software-hashing to a given CF lock table entry. Interest in each lock table entry is tracked for all peers connected to the CF structure across the systems in the sysplex. Through use of efficient hashing algorithms and granular serialization scope, false lock resource contention is kept to a minimum. This allows the majority of requests for locks to be granted cpu-synchronously to the requesting system, where synchronous execution times are measured in micro-seconds. Only in exception cases involving lock contention is lock negotiation required. In such cases, the CF returns the identity of the system or systems currently holding locks in an incompatible state with the current request, to enable selective cross-system communication for lock negotiation. MVS provides cross-system lock management services to coordinate lock contention negotiation, lock request suspension and completion, and recording of persistent lock information in the Coupling Facility to enable fast lock recovery in the event of an MVS system failure while holding lock resources.
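The following sketch (illustrative C; the structure layout and names are invented here, not CF commands) captures the essence of the hashed lock-table scheme: a software lock name hashes to a lock-table entry, compatible requests are granted immediately, and only incompatible interest on the same entry, whether real or false contention, forces negotiation with the identified holder.

```c
#include <stdio.h>

/* Illustrative model of a CF lock structure: lock names hash to a fixed
 * number of lock-table entries; each entry records an exclusive holder
 * or a set of sharers. Only incompatible interest on the same entry
 * (real or false contention) requires cross-system negotiation. */

#define LOCK_TABLE_ENTRIES 1024
#define NO_SYSTEM (-1)

struct lock_entry {
    int exclusive_holder;      /* system id, or NO_SYSTEM               */
    unsigned int share_mask;   /* one bit per system holding shared use */
};

static struct lock_entry lock_table[LOCK_TABLE_ENTRIES];

static unsigned int hash_name(const char *name)
{
    unsigned int h = 5381;
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h % LOCK_TABLE_ENTRIES;
}

/* Returns NO_SYSTEM if granted synchronously, the exclusive holder's id
 * if it must be contacted, or -2 if sharers block an exclusive request. */
static int request_lock(int system, const char *name, int exclusive)
{
    struct lock_entry *e = &lock_table[hash_name(name)];

    if (e->exclusive_holder != NO_SYSTEM && e->exclusive_holder != system)
        return e->exclusive_holder;            /* conflict with exclusive holder */
    if (exclusive && (e->share_mask & ~(1u << system)))
        return -2;                             /* conflict with one or more sharers */

    if (exclusive)
        e->exclusive_holder = system;
    else
        e->share_mask |= 1u << system;
    return NO_SYSTEM;                          /* granted without negotiation */
}

int main(void)
{
    for (int i = 0; i < LOCK_TABLE_ENTRIES; i++)
        lock_table[i].exclusive_holder = NO_SYSTEM;

    printf("sys0 shared    REC42 -> %d\n", request_lock(0, "REC42", 0));
    printf("sys1 shared    REC42 -> %d\n", request_lock(1, "REC42", 0));
    printf("sys2 exclusive REC42 -> %d (must negotiate)\n", request_lock(2, "REC42", 1));
    return 0;
}
```

With a large number of entries and a reasonable hash, most lock names fall into distinct entries, which is why false contention stays low and most grants complete synchronously.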

3.3.2 Cache structures

The CF cache structure serves as a multi-system shared data cache coherency manager. The purpose of this model is to enable an existing buffer manager (e.g., a database buffer manager) to be easily extended into a multi-system environment. It enables each system to locally cache shared data in processor memory with full data integrity and optimal performance. Additionally, data can be optionally cached globally in the CF cache structure for high-speed local buffer refresh. As a global shared cache, the CF can be viewed as a second-level cache in between local processor memory and DASD in the storage hierarchy. A CF cache structure contains a global buffer directory which tracks multi-system interest in shared data blocks cached in one or more systems’ local buffer pools. A separate directory entry is maintained in the CF structure for each uniquely-named data block. When a database manager, such as IBM’s DB2, first connects to a CF cache structure via MVS system services, MVS allocates a local bit vector in protected processor storage on behalf of the database manager. The local bit vector is used to locally track the coherency of data cached in the local buffer pool. The database manager associates each buffer in the buffer pool with a unique bit position in the local bit vector. When the database manager brings a copy of a shared data block from DASD into a local buffer in processor memory, it first registers its interest in that data with the CF, passing the program-specified data block name and the local bit vector index associated with the local buffer where the data block is being cached. The CF now tracks that system’s interest in the locally cached data. Later, when another instance of the database manager on a different system updates its copy of the shared data block, it issues a command to the CF directing the buffer invalidation of any locally cached copies of the same data

block on other systems. The CF checks its global buffer directory and then sends a cross-invalidate signal via the coupling links in parallel to only those systems having a registered interest in that data block. Specialized coupling link hardware provides processing for multi-system buffer invalidation signals sent by the CF to attached systems. The hardware receives the buffer invalidation signal and updates the CF-specified bit in the data manager’s local bit vector to indicate the local copy is no longer valid. This process does not involve any processor interrupt or software involvement on the target system. Work continues without any disruption. Once the CF has observed completion of all buffer invalidation signals, it then responds to the system which initiated the data update process. Again, this entire process can be performed cpu-instruction-synchronous to the updating system, with completion times measured in micro-seconds. The issuing database manager is then free to release its serialization on the shared data block. When another instance of the database manager attempts to subsequently re-use its local copy of the now-down-level data block, it first checks the coherency of its local buffer copy. This check does not involve a CF access, but rather is achieved through execution of new S/390 cpu instructions which interrogate the state of the specified bit in the local bit vector to determine its buffer coherency. If the buffer is invalid, the database manager can re-register its interest in the data block with the CF, which might also return a current copy of the data if it had been cached there by the system which earlier performed the update. Through exploitation of the cache coherency and global buffer cache management mechanisms described above, it can be seen that the Coupling Facility and related S/390 parallel sysplex processor technology provides the means for high-performance, scalable read/write data sharing across multiple systems, avoiding the message passing overheads typically associated with data-sharing parallel systems.
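A simplified model of this buffer-coherency protocol is sketched below in C; the data structures are invented for illustration, whereas the real mechanism uses CF commands, coupling-link cross-invalidate signals, and S/390 instructions that test the local bit vector.

```c
#include <stdio.h>
#include <stdbool.h>

/* Simplified model of CF cache-structure coherency: each system has a
 * local bit vector (one bit per local buffer); a global directory in
 * the "CF" remembers which system/buffer pairs hold each named block.
 * An update on one system clears the validity bits of every other
 * registered copy. All structures here are invented for illustration. */

#define SYSTEMS 3
#define BUFFERS 8
#define BLOCKS  16

static bool local_valid[SYSTEMS][BUFFERS];          /* local bit vectors   */
static bool registered[BLOCKS][SYSTEMS][BUFFERS];   /* CF global directory */

/* A system reads a block from DASD into a local buffer and registers it. */
static void register_interest(int sys, int buffer, int block)
{
    registered[block][sys][buffer] = true;
    local_valid[sys][buffer] = true;
}

/* A system updates a block: the CF cross-invalidates every other copy. */
static void update_block(int updating_sys, int block)
{
    for (int s = 0; s < SYSTEMS; s++) {
        if (s == updating_sys)
            continue;
        for (int b = 0; b < BUFFERS; b++) {
            if (registered[block][s][b]) {
                local_valid[s][b] = false;          /* cross-invalidate signal */
                registered[block][s][b] = false;
            }
        }
    }
}

/* Before re-using a buffer, a system just tests its local bit vector. */
static bool buffer_is_valid(int sys, int buffer)
{
    return local_valid[sys][buffer];
}

int main(void)
{
    register_interest(0, 2, 7);      /* system 0 caches block 7 in buffer 2 */
    register_interest(1, 5, 7);      /* system 1 caches the same block      */

    update_block(1, 7);              /* system 1 updates block 7            */

    printf("system 0, buffer 2 valid? %s\n", buffer_is_valid(0, 2) ? "yes" : "no");
    printf("system 1, buffer 5 valid? %s\n", buffer_is_valid(1, 5) ? "yes" : "no");
    return 0;
}
```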

3.3.3 List structures

The CF list structure supports general-purpose multi-system queueing constructs which are broadly applicable for a wide range of uses, including workload distribution, inter-system message passing, and maintaining shared control block state information. A list structure includes a program-specified number of list headers. Individual list entries are dynamically created when first written and queued to a designated list header. List entries can optionally have a corresponding data block attached at the time of creation or subsequent list entry update. Existing entries can be read, updated, deleted, or moved between list headers atomically, without the need for explicit software multi-system serialization in order to insert or remove entries from a list. List structures can support queueing of entries in LIFO/FIFO order or in collating sequence by key under program control. Optionally, the list structure can contain a program-specified number of lock entries. A common exploitation of the serialized list structure is to request conditional execution of mainline CF commands as long as a specified lock is not held. Recovery operations requiring a static view of a list or the entire structure can set the lock, causing mainline operations to be rejected. Such a protocol avoids the necessity for mainline processes to explicitly gain or release the lock for every request, but still allows such requests to be suspended or rejected in the presence of long-running recovery operations. Programs can register interest in specific list headers used as shared work queues or in-bound message queues. When an entry is added to the specified list causing it to go from an empty to non-empty state, the CF can send a list-non-empty transition signal to the registered program's system, providing an indication, observed via local system polling, that there is work to be processed on the specified list. As with the cache buffer invalidation signal handling, there is no processor interruption or cache disruption caused as a result of processing the list transition signal.
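The sketch below (illustrative C, with invented names) models the shared work-queue use of a list structure: producers queue entries FIFO under a list header, and a registered system is signalled only on the empty-to-non-empty transition, which is why steady-state queueing generates no per-entry interrupts.

```c
#include <stdio.h>

/* Illustrative model of a CF list structure used as a shared work
 * queue: entries are queued FIFO under a list header, and a registered
 * system gets a "list non-empty" indication only when the list goes
 * from empty to non-empty. Names and layout are invented. */

#define QUEUE_DEPTH 16

struct list_header {
    int entries[QUEUE_DEPTH];
    int head, tail, count;
    int registered_system;     /* system watching this list header */
};

static void signal_non_empty(int system)
{
    /* Stands in for the coupling-link transition signal; the real
     * signal updates a local vector polled by the target system. */
    printf("system %d: list went non-empty, work to process\n", system);
}

static int list_push(struct list_header *l, int work_item)
{
    if (l->count == QUEUE_DEPTH)
        return -1;
    if (l->count == 0)
        signal_non_empty(l->registered_system);    /* empty -> non-empty */
    l->entries[l->tail] = work_item;
    l->tail = (l->tail + 1) % QUEUE_DEPTH;
    l->count++;
    return 0;
}

static int list_pop(struct list_header *l, int *work_item)
{
    if (l->count == 0)
        return -1;
    *work_item = l->entries[l->head];
    l->head = (l->head + 1) % QUEUE_DEPTH;
    l->count--;
    return 0;
}

int main(void)
{
    struct list_header queue = { .registered_system = 2 };
    int item;

    list_push(&queue, 101);    /* triggers the transition signal */
    list_push(&queue, 102);    /* no signal: already non-empty   */
    while (list_pop(&queue, &item) == 0)
        printf("dequeued work item %d\n", item);
    return 0;
}
```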

4 Parallel Sysplex Scalability

Figure 3 depicts effective total system capacity as a function of the number of physically configured cpu’s in a processing system. The IDEAL line shows a 1:1 correspondence between physical capacity and effective capacity. That is, as each cpu is added to the total processing system, the full capacity of each additional processor would be realized in terms of available capacity. Real configurations of course do not exhibit this ideal behavior.

Figure 3: Parallel Sysplex Scalability. Effective capacity as a function of physical capacity for the Ideal case, the Parallel Sysplex, and Tightly Coupled Multiprocessors.

The Tightly-Coupled Multi-Processing (TCMP) line shows the behavior of a TCMP as additional cpus are added to the same single physical system. TCMP systems provide maximum effective throughput at relatively small numbers of engines, but as more cpus are added to the TCMP system, incremental effective capacity begins to diminish rapidly, limiting ultimate scalability. This is attributable to the overheads associated with inter-processor serialization, memory cross-invalidation and communication required in the hardware to support conceptual sequencing

of instructions across cpus, cache coherency, and serialized updates to storage performed atomically to cpu instruction execution. These processes are performed in the hardware without the benefit of knowledge of software serialization that may already be held on storage being manipulated at a much more coarse level. In addition TCMP overheads are incurred in the system software due to software serialization and communication to manage common system resources. The S/390 Parallel Sysplex scalability characteristics are excellent. Physical capacity introduced to the configuration via the addition of more data-sharing systems in the sysplex (where each system can be a TCMP or uni-processor) provides near-linear effective capacity growth as well. Recent performance studies conducted in a parallel sysplex environment consisting of multiple S/390 9672 CMOS systems running a 100% data-sharing CICS/DBCTL workload demonstrated an incremental overhead cost of less than half a percent for each system added to the configuration. In addition, the initial data-sharing cost associated with the transition from a single-system non-data-sharing configuration to a two-system data-sharing configuration was measured at less than 18% [8, 9]. These results testify to the excellence of the S/390 MVS Parallel Sysplex technology in providing near-linear scalability, minimizing overheads previously precluding implementation of a true data-sharing parallel-processing system.
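One possible reading of those measurements is that the move from one to two data-sharing systems costs roughly 18% and that each further system delivers roughly 99.5% of its nominal capacity; the short calculation below applies that reading to configurations up to 32 systems. The model is our own simplification for illustration, not a published formula.

```c
#include <stdio.h>

/* Rough effective-capacity estimate built from the figures quoted
 * above ([8, 9]), under one possible reading: the move from one to two
 * data-sharing systems costs about 18%, and each further system
 * delivers about 99.5% of its nominal capacity. The model itself is a
 * simplification made here for illustration. */
int main(void)
{
    for (int n = 1; n <= 32; n *= 2) {
        double effective;
        if (n == 1)
            effective = 1.0;                           /* no data-sharing cost   */
        else
            effective = 2.0 * (1.0 - 0.18)             /* two-system transition  */
                      + (n - 2) * (1.0 - 0.005);       /* each additional system */
        printf("%2d systems: ~%.1f systems of effective capacity\n", n, effective);
    }
    return 0;
}
```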

5 Exploitation and Product Details

Through exploitation and support of the Parallel Sysplex data-sharing technology, MVS and its major subsystems have combined to provide an industry-leading, fully-integrated commercial parallel processing system.

5.1 Operating System Support

At the base of the software structure, the MVS/ESA Version 5 operating system provides extensive support for Coupling Facility resource management and mainline access services enabling subsystem CF exploitation, as shown in Figure 4. MVS has built these services as extensions to its prior sysplex support, which provided multi-system configuration management, system status monitoring and recovery mechanisms, and inter-system communication facilities. Several MVS base system components including JES2, RACF, and XCF are exploiting the Coupling Facility to facilitate or enhance their respective functions in a parallel sysplex configuration. In addition, the MVS Workload Manager component provides policy-driven system resource management for customer workloads, and is a key component in sysplex-wide workload balancing mechanisms.

5.2 Database

IBM's hierarchical and relational database managers, IMS and DB2 respectively, provide multi-system data-sharing through exploitation of the CF cache and lock structures. DFSMS support for multi-system data-sharing of VSAM files is currently under development and will similarly exploit the Coupling Facility.

Figure 4: Parallel Sysplex Software Structure. VTAM presents a single image to the network; the CICS and IMS TM transaction managers provide dynamic workload balancing; the IMS DB, DB2 and VSAM data managers share data; applications run unchanged above the data-sharing base services of MVS/ESA and the hardware interfaces.

With these database products enabled for sysplex-wide data-sharing, the IBM CICS and IMS Transaction Management subsystems are providing multi-system dynamic workload balancing for customers' OLTP workloads. In conjunction with the CICSPLEX/Systems Manager (CICSPLEX/SM) product, CICS has already delivered its dynamic transaction routing capabilities. IMS is currently developing its dynamic workload balancing functions through exploitation of the Coupling Facility for workload distribution.

5.3 Network

VTAM provides a single system image to the SNA network for the Parallel Sysplex through its "Generic Resource" support, enabling session binds for user logons to be dynamically distributed for workload balancing across the systems in the sysplex. VTAM provides the Generic Resource facilities through exploitation of the CF list structure. CICS and DB2 currently support VTAM's generic resource facilities in the parallel sysplex environment. CICS users, for example, can simply log on to "CICS" without having to specify, or be cognizant of, the system to which their session will be dynamically bound. These and other MVS Parallel Sysplex components and subsystems combine to bring the Parallel Sysplex business advantages of reduced cost, scalable growth, dynamic workload balancing, and continuous availability to customers' existing commercial workloads in a transparent manner. End-users and business applications are insulated from the technology infrastructure through subsystem exploitation and parallelization of the underlying application execution environments they provide.
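As a hypothetical sketch of the generic-resource idea (not VTAM's implementation), the C fragment below maps the generic name "CICS" to a set of real application instances and binds each new logon to the instance with the fewest sessions; in the Parallel Sysplex the instance table is kept in a CF list structure rather than a local array.

```c
#include <stdio.h>

/* Hypothetical sketch of generic-resource session balancing: the
 * generic name "CICS" maps to several real application instances, and
 * each new logon is bound to the instance with the fewest sessions.
 * Instance names and counts are invented for illustration. */

struct instance {
    const char *applid;     /* real application instance name */
    int sessions;           /* current session count          */
};

static struct instance cics_instances[] = {
    { "CICSA", 120 }, { "CICSB", 75 }, { "CICSC", 98 },
};

static const char *bind_session(struct instance *set, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (set[i].sessions < set[best].sessions)
            best = i;
    set[best].sessions++;
    return set[best].applid;
}

int main(void)
{
    for (int logon = 0; logon < 3; logon++)
        printf("logon to generic name CICS -> bound to %s\n",
               bind_session(cics_instances, 3));
    return 0;
}
```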

6 Conclusion

The S/390 MVS Parallel Sysplex is a state-of-the-art commercial parallel-processing system. Through the integration of innovative hardware and software technology,

the S/390 MVS platform supports high-performance, direct, concurrent multi-system read/write data-sharing, enabling the aggregate capacity of multiple MVS systems to work in parallel on shared workloads. Exploitation in the base MVS system and key subsystem middleware provides a single system image for the multi-system parallel configuration, with transparent value for end-users and customers' MVS business applications. Future enhancements are focused on leveraging the Parallel Sysplex data-sharing technology to support new application environments, including distributed applications in a heterogeneous networking environment, single system image for native TCP/IP networks, MVS servers for the World Wide Web, and applications exploiting object-oriented technology. The S/390 MVS Parallel Sysplex leverages parallel technology to business advantage, offering competitive price/performance and state-of-the-art availability, scalability and investment protection characteristics.

References

[1] A. Azagury, D. Dolev, J. Marberg, and J. Satran. Highly available cluster: A case study. In 24th Symp. on Fault-Tolerant Computing, pages 404-413, June 1994.
[2] W.E. Baker, R.W. Horst, D.P. Sonnier, and W.J. Watson. A flexible ServerNet-based fault-tolerant architecture. In 25th Symp. on Fault-Tolerant Computing, pages 2-11, June 1995.
[3] N.S. Bowen, C.A. Polyzois, and R.D. Regan. Restart services for highly available systems. In 7th IEEE Symposium on Parallel and Distributed Processing, October 1995.
[4] S.A. Calta, J.A. deVeer, E. Loizides, and R.N. Strangwayes. Enterprise Systems Connection (ESCON) architecture: System overview. IBM Journal of Research and Development, 36(4):535-552, 1992.
[5] R. Cwiakala, J.D. Haggar, and H.M. Yudenfriend. MVS dynamic reconfiguration management. IBM Journal of Research and Development, 36(4):633-646, 1992.
[6] R. Duncan. A survey of parallel computer architectures. Computer, 23(2):5-16, 1990.
[7] IBM Corporation. MVS/ESA Programming: Sysplex Services Guide, 1994.
[8] IBM Corporation. S/390 MVS Parallel Sysplex Performance, March 1995.
[9] C.L. Rao and C. Taaffe-Hedglin. Parallel sysplex performance. In Proceedings of CMG, pages 3-7, December 1995.
[10] L. Spainhower, J. Isenberg, R. Chillarege, and J. Berding. Design for fault-tolerance in System ES/9000 Model 900. In 22nd Symp. on Fault-Tolerant Computing, pages 38-47, July 1992.
[11] C.B. Stunkel et al. The SP2 high-performance switch. IBM Systems Journal, 34(2):185-204, 1995.
[12] M.D. Swanson and C.P. Vignola. MVS/ESA coupled systems considerations. IBM Journal of Research and Development, 36(4):667-682, 1992.
[13] P.S. Yu and A. Dan. Performance analysis of affinity clustering on transaction processing coupling architecture. IEEE Transactions on Knowledge and Data Engineering, 6(5):764-786, October 1994.
[14] P.S. Yu and A. Dan. Performance evaluation of transaction processing coupling architectures for handling system dynamics. IEEE Transactions on Parallel and Distributed Systems, 5(2):139-153, February 1994.
