Cluster Computing 7, 113–122, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Seamless Access to Decentralized Storage Services in Computational Grids via a Virtual File System

RENATO J. FIGUEIREDO ∗ Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA

NIRAV KAPADIA Capital One Services, Inc., Glen Allen, VA 23060, USA

JOSÉ A.B. FORTES Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA

Abstract. This paper describes a novel technique for establishing a virtual file system that allows data to be transferred user-transparently and on-demand across computing and storage servers of a computational grid. Its implementation is based on extensions to the Network File System (NFS) that are encapsulated in software proxies. A key differentiator between this approach and previous work is the way in which file servers are partitioned: while conventional file systems share a single (logical) server across multiple users, the virtual file system employs multiple proxy servers that are created, customized and terminated dynamically, for the duration of a computing session, on a per-user basis. Furthermore, the solution does not require modifications to standard NFS clients and servers. The described approach has been deployed in the context of the PUNCH network-computing infrastructure, and is unique in its ability to integrate unmodified, interactive applications (even commercial ones) and existing computing infrastructure into a network computing environment. Experimental results show that: (1) the virtual file system performs well in comparison to native NFS in a local-area setup, with mean overheads of 1 and 18%, for the single-client execution of the Andrew benchmark in two representative computing environments, (2) the average overhead for eight clients can be reduced to within 1% of native NFS with the use of concurrent proxies, (3) the wide-area performance is within 1% of the local-area performance for a typical compute-intensive PUNCH application (SimpleScalar), while for the I/O-intensive application Andrew the wide-area performance is 5.5 times worse than the local-area performance.

Keywords: file system, computational grid, network-computing, logical account, proxy

1. Introduction

Network-centric computing promises to revolutionize the way in which computing services are delivered to the end-user. Analogous to the power grids that distribute electricity today, computational grids will distribute and deliver computing services to users anytime, anywhere. Corporations and universities will be able to out-source their computing needs, and individual users will be able to access and use software via Web-based computing portals. A computational grid brings together computing nodes, applications, and data distributed across the network to deliver a network-computing session to an end-user. This paper elaborates on mechanisms by which users, data, and applications can be decoupled from individual computers and administrative domains. The mechanisms, which consist of logical user accounts and a virtual file system, introduce a layer of abstraction between the physical computing infrastructure and the virtual computational grid perceived by users. This abstraction converts compute servers into interchangeable parts, allowing a computational grid to assemble computing systems at run time without being limited by the traditional constraints associated with user accounts, file systems, and administrative domains. Specifically, this paper describes the structure of logical user accounts, and presents a novel implementation of a virtual file system that operates with such logical accounts.

The virtual file system described in this paper allows data to be transferred on-demand between storage and compute servers for the duration of a computing session, while preserving a logical user account abstraction. It builds on an existing, de facto standard available for heterogeneous platforms – the Network File System, NFS. The virtual file system is realized via extensions to existing NFS implementations that allow reuse of unmodified clients and servers of conventional operating systems: the proposed modifications are encapsulated in software proxies that are configured and controlled by the computational grid middleware. The described approach is unique in its ability to integrate unmodified applications (even commercial ones) and existing computing infrastructure into a heterogeneous, wide-area network computing environment. This work was conducted in the context of PUNCH [8,10], a platform for Internet computing that turns the World Wide Web into a distributed computing portal. It is designed to operate in a distributed, limited-trust environment that spans multiple administrative domains.

∗ Corresponding author. E-mail: [email protected]

Users can access and run applications via standard Web browsers. Applications can be installed “as is” in as little as thirty minutes. Machines, data, applications, and other computing services can be located at different sites and managed by different entities. PUNCH has been operational for five years – today, it is routinely used by about 2000 users from two dozen countries. It provides access to more than 70 engineering applications from six vendors, sixteen universities, and four research centers.

The virtual file system of this paper integrates components used in both grid computing and traditional file system domains in a novel manner. It differs from related work in file-staging techniques for grid computing, e.g., Globus [4] and PBS [2,6], in that it supports user-transparent, on-demand transfer of data. It differs from related on-demand grid data-access solutions in that it does not require modifications to applications (e.g., as in Condor [11]) and it does not rely on non-native file system servers (e.g., as in Legion [5,19]). A key differentiator between this paper and previous work on traditional distributed file systems is the way in which file servers are partitioned: while conventional file systems (e.g., NFS-V2/V3 [14], AFS [13,17]) share a single (logical) server across multiple users, the virtual file system employs multiple independent proxy servers that are created, customized and terminated dynamically, on a per-user basis. The advantages of a per-user approach include fine-grain authentication, mapping of user identities without the necessity of global naming, and the possibility of applying user-customized performance optimizations such as caching and prefetching. Previous efforts on the Ufo [1] and Jade [15] systems have considered the advantages of employing per-user agents in the context of a wide-area file system. However, unlike this paper, Jade requires that applications be re-linked to dynamic libraries, while Ufo requires low-level process tracing techniques that are highly O/S-dependent. In addition, both systems require the implementation of full NFS client functionality on the agent. The virtual file system of this paper is thus unique in providing on-demand data access for unmodified applications that work through native clients and servers of existing operating systems.

In summary, this paper makes the following contributions. First, it describes an implementation of a virtual file system based on call-forwarding NFS proxies. This implementation has been in place since the Fall of 2000 and has been extensively exercised during normal use of PUNCH. Second, this paper quantitatively evaluates its performance. The experimental analysis considers two scenarios: virtual file system sessions within and across administrative domains. Experimental results for the same-domain setup show that the performance overhead introduced by the virtual file system is small relative to native NFS: average overheads of 1 and 18% are observed for single-client executions of the Andrew benchmark in two different PUNCH computing environments. Cross-domain (wide-area) results show that the performance of the virtual file system is within 1% of the local-area setup for a typical PUNCH compute-intensive application (SimpleScalar [3]).

However, for an I/O-intensive application (Andrew) the wide-area execution time is 5.5 times larger than the local-area time. These experimental results are important for two reasons: first, they show that the virtual file system is an effective technique that is currently applicable to compute-intensive applications, even across wide-area networks. Second, they motivate future work on performance enhancements for wide-area deployments that exploit two unique characteristics of the virtual file system, namely: (a) middleware-driven migration of logical user accounts to improve data locality at a coarse granularity, and (b) fine-grain locality techniques (e.g., caching, prefetching) customized on a per-user basis. The first direction is enabled by the underlying abstraction of logical user accounts, while the second direction is enabled by the implementation of a per-user proxy solution that is controlled by grid middleware.

The paper is organized as follows. Sections 2 and 3 describe the core concepts behind logical user accounts and virtual file systems, respectively. Section 4 describes the PUNCH implementation of a virtual file system – PVFS. Section 5 presents considerations on the security and scalability of PVFS, and section 6 quantitatively analyzes its performance. Section 7 explains how the new paradigm allows computational grids to dynamically and transparently manage network storage. Section 8 outlines related work and section 9 presents concluding remarks.

2. Decoupling users, data, applications, and hardware

Today’s computing systems tightly couple users, data, and applications to the underlying hardware and administrative domain. For example, users are tied to individual machines by way of user accounts, while data and applications are typically tied to a given administrative domain by way of a local file system. This causes several problems in the context of large computational grids, as outlined below:

• Users need “real” accounts on every single machine to which they have access. This causes logistical problems in terms of controlling user access and managing user accounts and also increases the complexity of resource management solutions that attempt to automatically allocate resources for users.

• Organizations may add new users, remove existing ones, and change users’ access capabilities at any time. In a computational grid, this information must be propagated to a large number of resources distributed across multiple organizations. Doing this in a timely manner is a difficult proposition at best.

• Policies for sharing resources may change over time – or they may be tied to dynamic criteria such as system load. Giving users direct access to resources via “permanent” user accounts makes it difficult to implement and enforce such policies.

• Data and applications are typically accessible to users via a local file system, which is often implicitly tied to a single administrative domain. NFS [14], for example, assumes that a given user has the same identity (e.g., the Unix uid) on all machines, making it difficult to scale across administrative boundaries. Wide-area file systems do exist (e.g., AFS [13,17]), but are not commonly available in standard machine configurations, and hence would be difficult to build upon in grids.

In order to deliver computing as a service in a scalable manner, it is necessary to effect a fundamental change in the manner in which users, data, and applications are associated with computing systems and administrative domains. This change can be brought about by introducing a layer of abstraction between the physical computing infrastructure and the virtual computational grid perceived by users. The abstraction layer can be formed by way of two key components: (1) logical user accounts, and (2) a virtual file system. A network operating system, in conjunction with an appropriate resource management system, can then use these components to build systems of systems at run-time [9]. This abstraction converts compute servers into interchangeable parts, thus allowing a computational grid to broker resources among entities such as end users, application service providers, storage warehouses, and CPU farms. The described approach has been deployed successfully in PUNCH, which employs logical user accounts, a virtual file system service that can access remote data on-demand, a network operating system, and a resource management service that can manage computing resources spread across administrative domains. The components of a logical account are traditional system accounts that are divided into two categories according to their functionality: shadow accounts, which can be dynamically allocated during a computing session, and file accounts, which store user files and directories.

2.1. Shadow accounts

Traditionally, a user account is expressed as a numeric identifier (e.g., the Unix uid) that “belongs” to a given person (i.e., user). Each numeric identifier is permanently assigned to a given person – regardless of whether or not the person is actively making use of any computing resources. A user account could be conceptualized as a more dynamic entity by treating the numeric identifiers associated with user accounts on local operating systems as interchangeable entities that can be recycled among users on demand. With this approach, a user is allocated a numeric identifier when he/she attempts to initiate a run (or session); the identifier will be reclaimed by the system after the session is complete. The user accounts represented by such dynamically recycled numeric identifiers are called shadow accounts. A logical user account, then, is simply a capability that allows a user to “check out” a shadow account on appropriate computing resources via the corresponding resource management systems. Such dynamic capability can be achieved in a way that preserves the functionality of traditional user accounts and achieves high scalability [9].

2.2. File accounts

In today’s computing environments, a user’s files typically reside in accounts that are directly accessible to the user via a login name and password, or indirectly accessible via a shared file system. In a large, dynamic environment where there is a need for transparent replication and migration of data (for reliability or performance reasons), this one-to-one association of a user’s files with a specific “account” introduces several constraints that limit the computing system’s ability to manage data. The virtual file system approach provides an effective mechanism to decouple this association. With this approach, files are stored in one or more file accounts. A given file account typically stores files for more than one user, and the computing system may move files across file accounts as necessary. Access to the files is brokered by the virtual file system; users never directly login to a file account. In the currently deployed PUNCH system, for example, all user files are multiplexed into a single file account. This file account contains one top-level sub-directory for each PUNCH user; the files are associated with users on the basis of their positions in the directory tree (see note 1).
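To make the directory-tree association concrete, the sketch below (illustrative code, not part of PUNCH) derives the owning user of a file from its top-level sub-directory within a file account; the /home/fileA layout and the userX name mirror the example used later in section 4.

```python
from pathlib import PurePosixPath

# Hypothetical illustration: all user files live under one Unix file account
# (e.g., /home/fileA), with one top-level sub-directory per PUNCH user.
FILE_ACCOUNT_ROOT = PurePosixPath("/home/fileA")

def owner_of(path: str) -> str:
    """Return the user that owns 'path', based only on its position in the
    directory tree (the files themselves all belong to the file account's
    single Unix uid)."""
    relative = PurePosixPath(path).relative_to(FILE_ACCOUNT_ROOT)
    parts = [p for p in relative.parts if p != "."]
    if not parts:
        raise ValueError("the file account root itself has no single owner")
    return parts[0]  # the top-level sub-directory names the user

# Example: /home/fileA/userX/projects/run.cfg belongs to PUNCH user "userX".
print(owner_of("/home/fileA/userX/projects/run.cfg"))
```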

3. Virtual file system

A virtual file system establishes a dynamic mapping between a user’s data residing in a file account and the shadow account that has been allocated for that user. It also guarantees that any given user will only be able to access files that he/she is authorized to access. There are different ways in which a virtual file system can be implemented. In the context of PUNCH, several alternatives have been investigated; this section describes previous approaches, highlighting their limitations, and presents a novel virtual file system solution that is currently in use by PUNCH.

3.1. Explicit file transfers

A simple virtual file system could copy all of a user’s files to a shadow account just before initiating a run (or session) and then copy the files back once the run (or session) is complete. This approach has two disadvantages: it is likely to result in large amounts of unnecessary data transfer, and it would require a complex coherency protocol to be developed in order to support multiple, simultaneous runs (or sessions). A variation on this theme is to allow (i.e., require) users to explicitly specify the files that are to be transferred. This approach is commonly referred to as file staging. File staging works around some of the issues of redundant data transfer and coherency problems (both of which must then be manually resolved by the user), but is not suitable for (1) situations in which the user does not know which files will be required a priori (e.g., this is true for many CAD and other session-based applications), or (2) applications that tend to read/write relatively small portions of very large files (e.g., most database-type applications).

3.2. Implicit file transfers

Another possibility is to transfer data on demand. An approach previously deployed on PUNCH relies on system-call tracing mechanisms such as those found in the context of Ufo [1]. Entire files still need to be transferred, but the process is automated. (The transfer is a side effect of an application attempting to open a file.) The disadvantages of this approach are that it is highly O/S-dependent, and it demands extensive programming effort in the development of system-call tracers and customized file system clients.

3.3. Implicit block transfers

The third option is to reuse existing file system capabilities by building on a standard and widely-used file system protocol such as NFS. There are three ways to accomplish this goal. One is to enhance the NFS client and/or server code to work in a computational grid environment. This would require kernel-level changes to each version of every operating system on any platform within the grid. The second approach is to use standard NFS clients in conjunction with custom, user-level NFS servers. This approach is viable, but involves significant software development. The third possibility is to use NFS call forwarding by way of middle-tier proxies. This approach is attractive for two reasons: it works with standard NFS clients and servers; and proxies are relatively simple to implement – they only need to receive, modify, and forward standard remote procedure calls (RPC).
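The following sketch illustrates the call-forwarding idea in its simplest form. It is a schematic example rather than the PUNCH implementation: the addresses, ports, and the empty rewrite step are assumptions, and the XDR decoding needed to inspect real NFS RPC credentials is omitted.

```python
import socket

# Schematic sketch of a call-forwarding proxy: a user-level process listens on
# its own UDP port, receives ONC RPC messages from an NFS client, applies a
# (placeholder) rewrite, relays them to the real server, and returns replies.

PROXY_ADDR = ("0.0.0.0", 10001)        # port assumed for this user's proxy
NFS_SERVER_ADDR = ("127.0.0.1", 2049)  # native NFS server (assumed local)

def rewrite(request: bytes) -> bytes:
    # Placeholder for the proxy's only real work: checking the caller and
    # rewriting uid/gid fields inside the RPC call (see section 4).
    return request

def serve_forever() -> None:
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(PROXY_ADDR)
    upstream = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        request, client = listener.recvfrom(65535)   # RPC call from NFS client
        upstream.sendto(rewrite(request), NFS_SERVER_ADDR)
        reply, _ = upstream.recvfrom(65535)          # reply from native server
        listener.sendto(reply, client)               # relay back unchanged

if __name__ == "__main__":
    serve_forever()
```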

4. The PUNCH Virtual File System

The PUNCH Virtual File System – PVFS – is based on a call-forwarding solution that consists of three components: server-side proxies and file service managers, and client-side mount managers. The proxies control access to data in the various file accounts. The file service managers and mount managers together control the setup and shutdown of virtual file system sessions. Although it is based on a standard protocol, the virtual file system approach differs fundamentally from traditional file systems. For example, with NFS, a file system is established once on behalf of multiple users by system administrators (figure 1(A)). In contrast, the virtual file system creates and terminates dynamic client–server sessions that are managed on a per-user basis by the grid middleware; each session is only accessible by a given user from a specified client, and that too only for the duration of the computing session (figure 1(B)).

The following discussion outlines the sequence of steps involved in the setup of a PVFS session. When a user attempts to initiate a run (or session), a compute server and a shadow account (on the compute server) are allocated for the user by PUNCH’s active yellow pages service [16]. Next, the file service manager starts a proxy daemon in the file account of the server in which the user’s files are stored. This daemon is configured to only accept requests from one user (Unix uid of shadow account) on a given machine (IP address of compute server). Once the daemon is configured, the mount manager employs the standard Unix “mount” command to mount the file system (via the proxy) on the compute server. After the PVFS session is established, all NFS requests originating from the compute server by a given user (i.e., shadow account) are processed by the proxy. For valid NFS requests, the proxy modifies the user and group identifiers of the shadow account to the identifiers of the file account in the arguments of NFS remote-procedure calls; it then forwards the requests to the native NFS server.

Figure 1. Overview of conventional (A) and virtual (B) file systems. There are two clients (C1, C2) and one server (S). In (A), the NFS clients C1, C2 share a single logical server via a static mount point for all users under /home. In (B), the file account resides in /home/fileA, and two grid users (X, Y) access the file system through shadow accounts 1 and 2, respectively. The virtual file system clients connect to two independent (logical) servers and have dynamic mount points for users inside /home/fileA that are valid only for the duration of a computing session.

Figure 2. Example of shared NFS/PVFS setup currently deployed in PUNCH. In the file server “S”, two user-level proxy daemons (listening to ports 10001, 10002) authenticate requests from clients C1, C2. Both daemons map shadow1/shadow2 to the uid of the PUNCH file account “fileA”, and forward RPC requests to the kernel-level server via a privileged proxy (which listens to port 20000).

If a request does not match the appropriate user-id and IP credentials of the shadow account, the request is denied and is not forwarded to the native server. Figure 1(B) shows the client-side mount commands issued by the compute servers “C1” and “C2”, under the assumption that the PUNCH file account has user accounts laid out as sub-directories of /home/fileA in file server “S”. The path exported by the mount proxy ensures that userX cannot access the parent directory of /home/fileA/userX (i.e., this user cannot access files from other users).

4.1. Local-area network setup

A virtual file system server can be configured either as a dedicated PVFS server, or as a shared (non-dedicated) PVFS/NFS server. The current deployment of PVFS is configured to co-exist with a conventional NFS server in a local-area network. In this scenario, the configuration of the native mount daemon is leveraged, and the user-level mount proxy is not used. Furthermore, a second proxy, owned by the file account but with access to a pre-opened privileged socket, is used to forward requests from the user-level proxy to the native NFS daemon via a secure port (see note 2). Figure 2 depicts an example of the server-side configuration of PVFS in a local-area setup. Mount proxies are not currently deployed in PUNCH because the setup is contained within a single administrative domain. Tighter access control can be introduced on the server side by using a proxy for the mount protocol. This proxy negotiates mount requests in the same manner that the NFS proxy negotiates file system transactions. It is possible to support cross-domain PVFS mounts, since authentication is dynamically configured by the grid middleware via unique combinations of IP addresses and shadow-account uids.
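A minimal sketch of the per-user proxy's admission check and identity remapping is given below. It is illustrative only: the RpcRequest structure, field names, and numeric identifiers are assumptions, and a real proxy would read and rewrite these fields inside the AUTH_UNIX credential of each NFS remote procedure call.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RpcRequest:
    uid: int          # caller's Unix uid as seen in the RPC credential
    gid: int          # caller's Unix gid
    client_ip: str    # source address of the request
    payload: bytes    # remainder of the NFS call (opaque here)

class PerUserProxy:
    def __init__(self, shadow_uid: int, allowed_ip: str,
                 file_account_uid: int, file_account_gid: int):
        # One proxy instance is configured per session by the grid middleware.
        self.shadow_uid = shadow_uid
        self.allowed_ip = allowed_ip
        self.file_account_uid = file_account_uid
        self.file_account_gid = file_account_gid

    def handle(self, req: RpcRequest) -> Optional[RpcRequest]:
        # Deny anything that is not this session's (shadow uid, compute server).
        if req.uid != self.shadow_uid or req.client_ip != self.allowed_ip:
            return None  # request is dropped, never reaches the native server
        # Remap identities so the native NFS server sees the file account.
        return RpcRequest(self.file_account_uid, self.file_account_gid,
                          req.client_ip, req.payload)

# Example: shadow account 1001 on compute server 10.0.0.5 is mapped to fileA.
proxy = PerUserProxy(shadow_uid=1001, allowed_ip="10.0.0.5",
                     file_account_uid=500, file_account_gid=500)
forwarded = proxy.handle(RpcRequest(1001, 1001, "10.0.0.5", b"<nfs call>"))
```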

5. Considerations in scalability and security

The scalability of PVFS can be evaluated in terms of its ability to support a growing number of users (or sessions), clients (i.e., compute servers), and file servers. Since PVFS only establishes point-to-point connections over TCP/IP, and because there is no communication between sessions at the PVFS level, the system scales simply by replicating the different components of PVFS (file servers and file accounts) appropriately.

The security implications of PVFS are tied to whether or not it spans multiple administrative domains. When the system is deployed within a single administrative domain, it co-exists with the native NFS services. In a non-dedicated PVFS setup, and if login sessions to the file server are allowed, it is conceivable for a malicious user with access to an NFS file handle to gain access to a file account via the privileged proxy. Such a situation can be avoided by introducing an intra-proxy (i.e., between user-level and privileged proxies) authentication mechanism. It is possible to implement intra-proxy authentication mechanisms that are transparent to the NFS protocol and are managed by the grid middleware. The following is one example: at the beginning of a computing session, when a user-level proxy is spawned, the grid middleware records the port assigned to the proxy and adds the port number to a list maintained by the privileged proxy (see note 3). Then, at the end of the computing session, before killing the user-level proxy, the grid middleware removes the port number from the list. This technique ensures that the privileged proxy only responds to requests from authorized user-level proxies.

When PVFS is deployed across administrative domains, it becomes necessary to preserve the limited-trust relationship between the nodes of the computational grid. For example, when a user belonging to administrative domain ‘A1’ is allocated a compute server ‘C2’ in a different domain ‘A2’, PVFS will map the user’s data from the file server(s) in ‘A1’ to ‘C2’. At this point, ‘C2’ has access to the specific user’s data for the duration of the computing session. To the extent that ‘C2’ has access to user data, and the user has access to ‘C2’, there is a trust relationship between ‘A1’ and ‘A2’. However, this trust between ‘A1’ and ‘A2’ is limited in the sense that ‘A2’ cannot gain access to files outside the user’s account or to computing resources in ‘A1’ via PVFS – even if root on ‘C2/A2’ is compromised. This is a consequence of the fact that ‘A1’ controls the mount point and all NFS transactions via the server-side mount and NFS proxies.

When PVFS sessions need to be established across institutional firewalls, the grid middleware can start user-level proxies on non-privileged ports that are not blocked by the firewall; if necessary, the grid middleware can negotiate ports with the firewall. One issue that arises in this context is that some NFS mount clients do not allow the specification of mount ports. Consequently, it becomes necessary for the mount client to talk to a port-mapper daemon running on a privileged port on a file server. Typically, this port will be blocked by firewalls. In such a situation, it is still possible to deploy the PUNCH virtual file system by relying on a third node – a “file system gateway” – inside the firewall that (1) accepts mount requests from the client via the port-mapper, and (2) forwards NFS requests to the file server across the firewall via user-level ports managed by the grid middleware. Figure 3 shows an example of this configuration.
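The intra-proxy authentication example above can be summarized by the following sketch (hypothetical code, not the PVFS implementation), in which the middleware maintains the set of authorized user-level proxy ports consulted by the privileged proxy; note 3 describes the actual mechanism as a configuration file re-read upon a UNIX signal.

```python
# Sketch of the intra-proxy authentication list: the grid middleware adds and
# removes user-level proxy ports as sessions start and end, and the privileged
# proxy checks the source port of every request before forwarding it.

class PrivilegedProxyAuth:
    def __init__(self):
        self.authorized_ports = set()

    # Called by the middleware when a user-level proxy is spawned / killed.
    def add_session(self, proxy_port: int) -> None:
        self.authorized_ports.add(proxy_port)

    def remove_session(self, proxy_port: int) -> None:
        self.authorized_ports.discard(proxy_port)

    # Called by the privileged proxy for every incoming request.
    def is_authorized(self, source_port: int) -> bool:
        return source_port in self.authorized_ports

auth = PrivilegedProxyAuth()
auth.add_session(10001)               # session starts: proxy on port 10001
assert auth.is_authorized(10001)      # its requests are forwarded
assert not auth.is_authorized(12345)  # anything else is rejected
auth.remove_session(10001)            # session ends
```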

Figure 3. Configuration with PVFS file system gateway. Solid lines represent kernel-to-PVFS connections that may involve access to the privileged port of the gateway’s port-mapper; dashed lines represent PVFS-to-PVFS connections through user-level ports.

Table 1
Configuration of servers (S1, S2, S3, S4) and clients (C1, C2, C3-L, C3-W) used in the performance evaluation. With the exception of C3-W, which is connected to a wide-area network through a cable-modem link, nodes are connected by a local-area switched Ethernet network.

Machine  #CPUs  CPU type             Memory   Network    O/S
S1       2      400 MHz UltraSparc   1 GB     100 Mb/s   Solaris 2.7
S2       1      167 MHz UltraSparc   256 MB   100 Mb/s   Solaris 2.6
S3       1      70 MHz Sparc         96 MB    10 Mb/s    Solaris 2.6
S4       2      933 MHz P-III        512 MB   100 Mb/s   Linux 2.4
C1       4      400 MHz UltraSparc   2 GB     100 Mb/s   Solaris 2.7
C2       4      480 MHz UltraSparc   4 GB     100 Mb/s   Solaris 2.7
C3-L     1      900 MHz P-III        256 MB   100 Mb/s   Linux 2.4
C3-W     1      900 MHz P-III        256 MB   1 Mb/s     Linux 2.4

6. Performance

The relative performance of PVFS can be measured with respect to native NFS in terms of its impact on the number of transactions per second. PVFS introduces a fixed amount of overhead for each file system transaction. This overhead is primarily a function of RPC handling and context switching; the actual operations performed by the proxy are very simple and independent of the type of NFS transaction that is forwarded. The following performance analyses are based on the execution of the Andrew file system benchmark (AB [7]) and the SimpleScalar [3] simulator on directories mounted through PVFS. Andrew consists of a sequence of Unix commands of different types (directory creation, file copying, file searching, and compilation) that models a workload typical of a software development environment. SimpleScalar consists of a cycle-accurate simulator of super-scalar microprocessors; it represents a typical compute-intensive engineering application that interfaces with the file system through standard I/O operations. The experimental setup consists of the set of machines described in table 1. These machines have been chosen to allow the investigation of three scenarios: one that is typical of a local-area configuration of PUNCH (section 6.1), one that employs previous-generation servers for a sensitivity analysis (section 6.2), and one that captures a wide-area PVFS deployment (section 6.3). The main objective of the experiment is to characterize the performance of PVFS in an existing grid environment, rather than to investigate a wide range of possible design points.

6.1. Multiple-client analysis

This first analysis considers the execution of simultaneous instances of AB on each client machine ‘C1’ and ‘C2’ described in table 1. For each combination of client, server, and file system, 200 samples of AB executions were collected at 30-minute intervals over a period of four days. The server ‘S1’ is a dual-processor machine that is currently the main PUNCH file server at Purdue University; the clients ‘C1’, ‘C2’ are quad-processor machines representative of PUNCH compute servers. The experiments are performed in a “live” environment where the file server is accessed by both regular PUNCH users and AB. In the largest experiment, 8 instances of AB are executed concurrently – i.e., one instance in every CPU of the two 4-way multiprocessor clients. The AB benchmark performs both I/O and computation; thus its execution time is dependent on the client’s performance. Since this experiment considers clients with different configurations and speeds, performance results are reported separately for each machine.

The results from the AB experiments are summarized in table 2; NFS refers to the native network file system setup, and PVFS refers to the virtual file system setup with multiple server-side user-level proxies and a single privileged proxy. Both NFS and PVFS use version 3 of the protocol and 32 KByte read/write buffers. The results for a single client (AB = 1) show that the overhead introduced by the PVFS proxy is small: the overheads in mean execution times are 1% (C1) and 18% (C2). The multiple-client results (AB = 2, 4, 8) indicate a degradation in the performance of PVFS: the average relative overhead increases with the number of clients (up to 47%). Furthermore, the standard deviation becomes larger relative to the average (up to 21%), and the ratio between PVFS/NFS maximum execution times increases to up to 2.2. This performance degradation is due to the fact that the current implementation of the PVFS proxy is not multithreaded, causing RPC requests to be serialized by the (single) privileged proxy.

To investigate the impact of the serialization overhead, the 8-client experiment was repeated for a configuration with 8 user-level and 8 privileged proxies. The results from this experiment are shown in table 2 under the label PVFS-8. In summary, the results show that the performance degradation with multiple clients previously observed is indeed due to serialization in the single-threaded privileged proxy. When multiple proxies are employed, the multiple-client performance of PVFS becomes, on average, within 1% of the performance of native NFS. The virtual file system is thus able to support multiple clients with performance comparable to the underlying file system. The setup with a single privileged proxy shown in figure 2 can be extended to one with a fixed number of privileged proxies with little additional complexity to the PUNCH grid middleware; a simple solution is to map dynamic requests onto the privileged proxies in a round-robin fashion. The number of clients supported by the PVFS mechanism can also scale by means of employing multiple servers and file accounts, as described in section 5.
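A possible reading of the round-robin assignment mentioned above is sketched below; the port numbers and pool size are illustrative assumptions, not values from the deployed system.

```python
from itertools import cycle

# Sketch of round-robin assignment of user-level proxies to a fixed pool of
# privileged proxies, as suggested in the text. The middleware simply hands
# each new session the next privileged-proxy port in turn.
PRIVILEGED_PROXY_PORTS = [20000, 20001, 20002, 20003]  # assumed fixed pool
_next_privileged = cycle(PRIVILEGED_PROXY_PORTS)

def assign_privileged_proxy(session_id: str) -> int:
    """Pick the privileged proxy that will forward this session's requests."""
    port = next(_next_privileged)
    print(f"session {session_id}: user-level proxy forwards via port {port}")
    return port

for sid in ["userX-1", "userY-1", "userX-2", "userZ-1", "userW-1"]:
    assign_privileged_proxy(sid)
```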


Table 2
Mean, standard deviation, minimum and maximum execution times (in seconds) of 200 samples of Andrew benchmark runs for native and virtual file systems (NFS, PVFS) measured at clients C1 and C2. The file server is S1 (as defined in table 1). The table shows data for 1, 2, 4, and 8 concurrent executions of AB (#AB). In all cases, the average load of the 4-processor clients, measured prior to the execution of AB, is 0.04.

              Client C1                        Client C2
#AB  FS       Mean    Stdev   Min   Max        Mean    Stdev   Min   Max
1    NFS      18.2 s  1.6 s   16 s  28 s       14.1 s  0.5 s   13 s  17 s
1    PVFS     18.4 s  1.9 s   17 s  27 s       16.7 s  1.4 s   15 s  29 s
2    NFS      24.0 s  2.1 s   20 s  30 s       18.7 s  1.4 s   14 s  24 s
2    PVFS     22.8 s  3.8 s   20 s  42 s       21.4 s  4.2 s   17 s  43 s
4    NFS      29.8 s  2.4 s   24 s  36 s       25.6 s  2.3 s   21 s  34 s
4    PVFS     35.5 s  10.3 s  24 s  85 s       35.0 s  8.3 s   22 s  72 s
8    NFS      43.4 s  4.7 s   31 s  72 s       38.4 s  4.4 s   28 s  53 s
8    PVFS     55.4 s  12.5 s  29 s  94 s       56.5 s  12.1 s  29 s  116 s
8    PVFS-8   42.5 s  4.3 s   32 s  57 s       38.8 s  4.8 s   30 s  65 s

6.2. Sensitivity to server performance

This analysis considers three different server configurations: a fast, dual-processor machine (S1), a medium-speed workstation (S2) and a slow workstation (S3). These configurations have been chosen to study the impact of server performance on the overhead introduced by the virtual file-system proxy. For each combination of client, server and file-system, 200 samples of AB executions have been collected, in 30-minute intervals. Each sample corresponds to the execution of a single instance of AB (i.e., a single NFS client). Unlike the experiment of section 6.1, where clients were exclusively executing (one or more copies of) AB, this experiment considers a scenario where the client is shared between AB and other independent PUNCH jobs that compete for the client’s CPUs. This experiment models a typical situation where a PUNCH session competes for resources with other users in a computing node; i.e., the load at the client machine is high. The recorded average loads show that the 4-processor client machine was neither idle nor overloaded in this experiment (table 3).

The results from the sensitivity analysis are summarized in table 3. For the fast server (S1), the difference between the performance of NFS and PVFS is small – 3.4% on average (see note 4). The performance overhead of PVFS becomes relatively larger in the slower servers: 9.1% (S2) and 34.6% (S3). This is explained by the fact that the overhead introduced by PVFS is only on the CPU-bound RPC-processing phase of an NFS transaction. With a fast server processor, this overhead is small with respect to the network and disk I/O components of NFS.

Table 3
Mean, standard deviation, minimum and maximum execution times (in seconds) of 200 samples of Andrew benchmark runs for file-systems mounted from servers S1, S2, and S3. “Load” is the average load of the 4-processor client C1, measured prior to the execution of the benchmark.

Server     Mean    Stdev  Min   Max   Load
S1, NFS    34.9 s  4.2 s  26 s  48 s  3.9
S1, PVFS   36.1 s  4.8 s  27 s  57 s  3.9
S2, NFS    36.1 s  8.3 s  23 s  52 s  2.9
S2, PVFS   39.4 s  8.5 s  26 s  56 s  2.9
S3, NFS    41.3 s  7.3 s  29 s  58 s  2.8
S3, PVFS   55.6 s  7.5 s  43 s  78 s  2.8

6.3. Wide-area performance

This analysis considers virtual file systems established across wide-area networks. The experimental setup consists of a PVFS client (C3) that connects to a server (S4) via two different networks: a switched-Ethernet LAN (C3-L) and a residential cable-modem WAN (C3-W, table 1; see note 5). Virtual file systems are established on top of native Linux NFS-V2 clients and servers with 8 KB read/write transfer sizes. The experiments consider results from executions of Andrew (100 samples) and SimpleScalar (10 samples), collected at intervals of 30 or more minutes to avoid possible cache-induced interference.

Table 4 summarizes the performance data from this experiment. The results show that the average wide-area performance of a compute-intensive application under PVFS is very close (within 1%) to the local-area performance. However, for the I/O-intensive application Andrew, the local-area setup delivers a 5.5-fold speedup relative to the wide-area setup. The difference in performance can be attributed to a combination of the larger latency and smaller bandwidth of the wide-area setup. About 70% of the RPC requests are latency-sensitive because the amount of data transferred is small (e.g., NFS lookup, getattr), while the remaining 30% are bandwidth-sensitive (e.g., NFS read, write). It is conceivable to exploit unique characteristics of PVFS to reduce the performance degradation due to smaller bandwidth and longer latency. Directions for future research include the use of middleware-driven techniques for latency-hiding (e.g., prefetching and relaxed consistency models) as well as for improving bandwidth (e.g., aggressive proxy-based caching).


Table 4
Mean, standard deviation, minimum and maximum execution times of 100 samples of Andrew benchmark and 10 samples of SimpleScalar runs on local- and wide-area virtual file systems (clients C3-L and C3-W). The file server is S4. SimpleScalar simulates the “test” dataset of the Spec95 benchmark Tomcatv.

                Mean     Stdev   Min     Max
Client C3-L
Andrew          25.5 s   3.6 s   22 s    44 s
SimpleScalar    2155 s   17 s    2136 s  2183 s
Client C3-W
Andrew          141.4 s  7.6 s   132 s   167 s
SimpleScalar    2142 s   20 s    2136 s  2148 s

Figure 4. Histogram of file system transactions-per-second collected across 500 PUNCH user sessions. The total user time across these sessions is 172 hours.

6.4. Characteristics of PUNCH user loads

The choice of the Andrew benchmark for the performance analysis has been motivated by the fact that it is widely used to summarize the user-perceived performance of file systems. While this benchmark characterizes a workload typical of a software-engineering environment, other application environments may have characteristics that are different from this model (e.g., computer architecture simulators such as SimpleScalar). In order to investigate the characteristics of workloads typical of PUNCH users, a logging mechanism has been incorporated into the PVFS proxy. This scheme logs the total number of NFS calls issued by a user during an interactive computing session, and the duration of the session. Figure 4 shows the distribution of transactions per second collected across 500 PUNCH user sessions. In comparison with the Andrew benchmark, PUNCH sessions tend to have less file system activity: the number of AB transactions/second measured in the clients C1 and C2 is in the range 100–130, while figure 4 shows that the majority of PUNCH sessions are not as I/O-intensive, with file system activity of less than 20 transactions/second – in particular, 24% of the logged sessions have file system activity equal to or less than that of the compute-intensive SimpleScalar (0.7 transactions/second).

7. Outlook: self-organizing network storage

Computational grids present a unique opportunity to effect a fundamental change in the manner in which users interact with computing systems. Access to computing resources can be provided via Web portals that interact with an underlying, global network computing infrastructure. Users could access and use applications via their Web browsers – without regard to location- or platform-related constraints. A key issue that arises in this context is the ability of a computational grid to manage data. Once it is possible to access and use compute servers and applications independently of their geographical location, performance and scalability constraints will quickly drive the need for dynamic co-location of data and compute servers. However, current computing systems implicitly tie data to individual user accounts and file servers, making it very difficult for computational grids to take advantage of available network storage to dynamically optimize performance. Virtual file systems and logical user accounts together provide the decoupling necessary to allow computational grids to dynamically broker network storage. The technique presented in this paper enables a middleware-driven model where data placement is driven solely by resource and performance requirements; the distribution of storage across the network is dynamically and transparently re-organized to adapt to changes in the network-computing environment, and is not constrained by naming restrictions.

8. Related work

Employing logical user accounts streamlines and distributes many of the typically centralized tasks associated with creating and maintaining user accounts in a distributed computing environment. It also facilitates access control at a finer granularity than is possible with traditional user accounts. To our knowledge, PUNCH is the first and, to date, the only system to exploit this mechanism. Some Web-based and master-worker applications (e.g., Java applets and SETI@home, respectively) could be construed to run on remote machines without requiring individual user accounts. However, these solutions take advantage of the fact that such applications can run in a very restrictive sand-box, and thus cannot be easily adapted for general-purpose applications. Condor [11] is another system that allows execution on remote machines without requiring users to obtain accounts. It accomplishes this by running all jobs within a single user account. In order to do this, Condor requires that applications be re-linked with Condor-specific I/O libraries, making this approach unsuitable for situations where object code is not available (e.g., as with commercial applications).

Current grid computing solutions typically employ file staging techniques to transfer files between user accounts in the absence of a common file system. Examples of these include Globus [4] and PBS [2,6]. As indicated earlier, file staging approaches require the user to explicitly specify the files that need to be transferred, and are often not suitable for session-based or database-type applications. Some systems (e.g., Condor [11]) utilize remote I/O mechanisms (from special libraries) to allow applications to access remote files. The Kangaroo technique [18] also employs RPC-forwarding agents. However, unlike PVFS, Kangaroo does not provide full support for the file system semantics commonly offered by existing NFS/UNIX deployments (e.g., delete and link operations), and therefore it is not suitable for general-purpose programs that rely on NFS-like file system semantics (e.g., database and CAD applications).

Legion [5,19] employs a modified NFS daemon to provide a virtual file system (see note 6). From an implementation standpoint, this approach is less appealing than call forwarding: the NFS server must be modified and extensively tested for compliance and for reliability. Moreover, current user-level NFS servers (including the one employed by Legion) tend to be based on the older version 2 of the NFS protocol, whereas the call forwarding mechanism described in this paper works with version 3, the current version (see note 7). Finally, user-level NFS servers generally do not perform as well as the kernel servers that are deployed with the native operating system.

The Self-certifying File System (SFS) [12] is another example of a virtual file system that builds on NFS. The primary difference between SFS and the virtual file system described here is that SFS uses a single (logical) proxy to handle all file system users. In contrast, PVFS partitions users across multiple, independent proxies. In addition, SFS introduces extra parameters in the NFS remote procedure call semantics.

Previous efforts on the Ufo [1] and Jade [15] systems have employed per-user file system agents. However, Jade is not an application-transparent solution – it requires that applications be re-linked to dynamic libraries, while Ufo requires low-level process tracing techniques that are complex to implement and highly O/S-dependent. In addition, both systems require the implementation of full NFS client functionality on the file system agent. In contrast, PVFS works with unmodified binaries and does not require the implementation of NFS-client functionality – only RPC-level argument modification and forwarding.

9. Conclusions

This paper proposes a technique for establishing dynamic connections between computing and data servers of computational grids in a manner that is decoupled from the underlying system accounts. This virtual file system is built on top of existing NFS clients and servers, and achieves performance levels comparable to native NFS setups: for the PUNCH server and client machines considered in this paper, the virtual file system introduces an average overhead of 18% or less over the native NFS for a single instance of AB.

A grid environment leverages the computing power of existing networked machines. Grid-oriented solutions therefore must be able to work with standard software solutions that are deployed across its heterogeneous computing nodes. The virtual file system described in this paper is well suited for a grid environment because (1) NFS is a “de-facto” standard for local-area file systems, and (2) PVFS can be deployed on top of existing configurations, with small administrative overheads. Solutions based on existing wide-area file systems (e.g., AFS [13,17]) could be conceived, but would be difficult to build upon in grids – wide-area file systems are not commonly supported in today’s standard machine configurations. The PVFS solution described in this paper has been supported by PUNCH without requiring modifications to the system software available in its existing nodes. It has been extensively exercised during normal use since Fall of 2000; PUNCH has employed logical user accounts since early 1998 and a virtual file system since Fall of 1999. This paper shows that the user-perceived wide-area performance of PVFS is good for a compute-intensive application, but 5.5 times worse than the local-area performance for an I/O-intensive application. Future work will investigate wide-area performance enhancements that exploit two unique characteristics of the virtual file system: coarse-grain locality enhancement via middleware-driven migration of logical user accounts, and fine-grain locality enhancement techniques (e.g., caching, prefetching) customized on a per-user basis.

Acknowledgements

This work was partially funded by the National Science Foundation under grants EEC-9700762, ECS-9809520, EIA-9872516, and EIA-9975275, and by an academic reinvestment grant from Purdue University. Intel, Purdue, SRC, and HP have provided equipment grants for PUNCH compute servers. Renato Figueiredo has also been supported by a CAPES fellowship.

Notes

1. The top-level sub-directories are equivalent to “home directories” on Unix file servers, except for the fact that all files and directories are owned by a single Unix-level account.
2. In a dedicated PVFS setup, a privileged proxy would not be necessary; the underlying NFS server can be configured to accept requests only from the local host (i.e., the PVFS proxies) via non-secure ports.
3. The PVFS daemon allows dynamic reconfiguration by modifying its input file and subsequently notifying the process via a UNIX signal.
4. The absolute AB execution times are larger than the single-AB results of table 2 due to the larger load of client C1.
5. The cable-modem setup delivers around 1 Mbit/s download and 100 Kbit/s upload bandwidths; actual bandwidths vary depending on the number of network users.
6. This approach has also been investigated on PUNCH.
7. The call forwarding mechanism also works with NFS-V2.

References

[1] A.D. Alexandrov, M. Ibel, K.E. Schauser and C.J. Scheiman, Ufo: A personal global file system based on user-level extensions to the operating system, ACM Transactions on Computer Systems 16(3) (August 1998) 207–233.
[2] A. Bayucan, R.L. Henderson, C. Lesiak, B. Mann, T. Proett and D. Tweten, Portable batch system: External reference specification, Technical report, MRJ Technology Solutions (November 1999).
[3] D. Burger and T.M. Austin, The SimpleScalar tool set, version 2.0, Technical report 1342, Computer Sciences Department, University of Wisconsin at Madison (June 1997).
[4] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith and S. Tuecke, A resource management architecture for metacomputing systems, in: Proceedings of the 4th Workshop on Job Scheduling Strategies for Parallel Processing (1998). Held in conjunction with the International Parallel and Distributed Processing Symposium.
[5] A.S. Grimshaw, W.A. Wulf et al., The Legion vision of a worldwide virtual computer, Communications of the ACM 40(1) (1997).
[6] R.L. Henderson and D. Tweten, Portable batch system: Requirement specification, Technical report, NAS Systems Division, NASA Ames Research Center (August 1998).
[7] J.H. Howard, M.L. Kazar, S.G. Menees, D.A. Nichols, M. Satyanarayanan, R.N. Sidebotham and M.J. West, Scale and performance of a distributed file system, ACM Transactions on Computer Systems 6(1) (February 1988) 51–81.
[8] N.H. Kapadia, R.J.O. Figueiredo and J.A.B. Fortes, PUNCH: Web portal for running tools, IEEE Micro (May–June 2000) 38–47.
[9] N.H. Kapadia, R.J.O. Figueiredo and J.A.B. Fortes, Enhancing the scalability and usability of computational grids via logical user accounts and virtual file systems, in: Proceedings of the Heterogeneous Computing Workshop (HCW) at the International Parallel and Distributed Processing Symposium (IPDPS), San Francisco, CA (April 2001).
[10] N.H. Kapadia and J.A.B. Fortes, PUNCH: An architecture for web-enabled wide-area network-computing, Cluster Computing: The Journal of Networks, Software Tools and Applications 2(2), Special Issue on High Performance Distributed Computing (September 1999) 153–164.
[11] M. Litzkow, M. Livny and M.W. Mutka, Condor – a hunter of idle workstations, in: Proceedings of the 8th International Conference on Distributed Computing Systems (June 1988) pp. 104–111.
[12] D. Mazières, M. Kaminsky, M.F. Kaashoek and E. Witchel, Separating key management from file system security, in: Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP), Kiawah Island, SC (December 1999).
[13] J.H. Morris, M. Satyanarayanan, M.H. Conner, J.H. Howard, D.S. Rosenthal and F.D. Smith, Andrew: A distributed personal computing environment, Communications of the ACM 29(3) (1986) 184–201.
[14] B. Pawlowski, C. Juszczak, P. Staubach, C. Smith, D. Lebel and D. Hitz, NFS version 3 design and implementation, in: Proceedings of the USENIX Summer Technical Conference (1994).
[15] H.C. Rao and L.L. Peterson, Accessing files on the internet: The Jade file system, IEEE Transactions on Software Engineering 19(6) (1993) 613–625.
[16] D. Royo, N.H. Kapadia, J.A.B. Fortes and L. Diaz de Cerio, Active yellow pages: A pipelined resource management architecture for wide-area network computing, in: Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC'01), San Francisco, CA (August 2001).
[17] A.Z. Spector and M.L. Kazar, Wide area file service and the AFS experimental system, Unix Review 7(3) (1989).
[18] D. Thain, J. Basney, S.-C. Son and M. Livny, The Kangaroo approach to data movement on the grid, in: Proceedings of the 2001 IEEE International Conference on High-Performance Distributed Computing (HPDC) (August 2001) pp. 325–333.
[19] B.S. White, A.S. Grimshaw and A. Nguyen-Tuong, Grid-based file access: The Legion I/O model, in: Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing (HPDC'00), Pittsburgh, PA (August 2000) pp. 165–173.

Renato J. Figueiredo received the Ph.D. degree in computer engineering from Purdue University in 2001. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Northwestern University. His research interests include computer architecture, high-performance multiprocessors, network-centric grid computing systems, virtual machines and distributed file systems. E-mail: [email protected]

Nirav Kapadia is the Chief Research Scientist at Cantiga Systems. His work focuses on network-centric and grid computing. Prior to that, he was Senior Research Scientist at Purdue University. Kapadia was the primary architect of PUNCH – the Purdue University Network Computing Hubs. He was instrumental in taking the PUNCH technology from initial concept to a production system that has been used by thousands of users. E-mail: [email protected]

José A.B. Fortes received the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles in 1984. From 1984 until 2001 he was on the faculty of the School of Electrical and Computer Engineering of Purdue University at West Lafayette, Indiana. In 2001 he joined both the Department of Electrical and Computer Engineering and the Department of Computer and Information Science and Engineering of the University of Florida as Professor and BellSouth Eminent Scholar. At the University of Florida he is the Founding Director of the Advanced Computing and Information Systems laboratory. He has also served as a Program Director at the National Science Foundation and a Visiting Professor at the Computer Architecture Department of the Universitat Politecnica de Catalunya in Barcelona, Spain. His research interests are in the areas of network computing, advanced computing architecture, biologically-inspired nanocomputing and distributed information processing systems. His research has been funded by the National Science Foundation, AT&T Foundation, IBM, General Electric and the Office of Naval Research. José Fortes is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) professional society and a former Distinguished Visitor of the IEEE Computer Society. E-mail: [email protected]
