Theme Feature
Virtual Reality and Parallel Systems Performance Analysis

Daniel A. Reed, Keith A. Shields, Will H. Scullin, Luis F. Tavera, and Christopher L. Elford
University of Illinois, Urbana-Champaign

0018-9162/95/$4.00 © 1995 IEEE
November 1995

A data-immersive virtual world enables exploration of complex, parallel-system performance data and supports real-time adaptive control of parallel-system behavior. It has been operational for about two years.

Scalable parallel systems are becoming the standard architecture for high-performance computing. However, achieving close to peak performance requires careful attention to a plethora of system details. Not only do hundreds of processors interact on a microsecond time scale, but also the space of possible performance optimizations is large, complex, and highly sensitive to both application behavior and system software. Although no general theory predicts the performance effects of software changes, a cycle of experimentation involving software modifications and performance measurements does permit application performance tuning. Given the complexity of parallel systems and the number of possible performance optimizations, two keys to this tuning are capturing and analyzing dynamic performance data and understanding the performance effects of software changes.

Just as a logic analyzer lets a hardware designer study signal transitions, software event tracing provides the raw performance data needed to understand all possible spatial and temporal interactions of parallel tasks. However, on parallel systems with hundreds of processors, application instrumentation of procedure calls, message passing, and input/output can quickly generate a large amount of performance data. (See Adve et al.1 and other articles in this issue for a discussion of the alternatives to event tracing.) If the event frequency is high and the number of processors is large (in the hundreds), the aggregate data rate can be many megabytes per second. Moreover, for a fixed application problem size, both processor-interaction frequency and performance data volume can grow superlinearly with the number of processors. Finally, the relations of specific performance metrics to application performance can vary widely across applications and parallel architectures.

To understand these relations while managing potentially large volumes of dynamic performance data, we have developed a data-immersive virtual environment, called Avatar, which explores performance data and provides real-time adaptive control of application behavior.

PERFORMANCE-DATA PRESENTATION TECHNIQUES

Much scientific measurement and computational simulation focuses on physical processes that are continuous in space and time. Hence, complementary scientific visualization techniques focus on intuitive renderings of regularly spaced, n-dimensional data sets. In contrast, performance data is irregular in space and time, and other performance presentation techniques (for example, those from statistical graphics2) are more appropriate. Performance measurement generates n metrics for each of the p processors in a parallel system, but measurement times often depend on loosely correlated event transitions in each processor.
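The measurement model in the preceding paragraph can be made concrete with a small data structure. The Python sketch below is illustrative only and is not Avatar's or Pablo's representation. It assumes each measurement arrives as a hypothetical `Sample` record (processor index, timestamp, tuple of n metric values) and groups the records into one time-ordered trajectory per processor. Each processor's timestamps follow its own event transitions, so the trajectories share no common time grid. For that reason, the hypothetical helper `positions_at` takes each processor's most recent sample rather than interpolating.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One measurement: n metric values from one processor at one time.

    Hypothetical record layout used only for this illustration.
    """
    processor: int   # processor index, 0 .. p-1
    time: float      # timestamp in seconds; spacing differs per processor
    metrics: tuple   # the n metric values measured at this time


def trajectories(samples):
    """Group samples into one time-ordered list of (time, metrics) per processor."""
    by_proc = defaultdict(list)
    for s in samples:
        by_proc[s.processor].append((s.time, s.metrics))
    for points in by_proc.values():
        points.sort(key=lambda point: point[0])  # order each trajectory by time
    return dict(by_proc)


def positions_at(traj, t):
    """Most recent metric vector of each processor at or before time t.

    Processors with no measurement yet at time t are omitted.
    """
    current = {}
    for proc, points in traj.items():
        earlier = [metrics for (ts, metrics) in points if ts <= t]
        if earlier:
            current[proc] = earlier[-1]
    return current


samples = [
    Sample(0, 0.10, (1.2, 0.3)),
    Sample(1, 0.07, (0.9, 0.5)),
    Sample(0, 0.02, (1.0, 0.4)),
    Sample(1, 0.31, (1.1, 0.2)),
]
traj = trajectories(samples)
print(traj[0])
print(positions_at(traj, 0.2))
```

The samples are deliberately supplied out of order. Sorting per processor, rather than globally, keeps each processor's behavioral trajectory intact even though the processors report at unrelated times.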
The behaviors of the p processors define p curves in an n-dimensional performance metric space. The measured data defines a series of irregularly spaced points on each processor's behavioral trajectory. Figure 1 shows the behavioral trajectory for two of eight processors executing a simple Jacobi iteration to solve a sparse linear system. The x-axis represents sliding-window averages of procedure-invocation lifetimes, while the y-axis lists the idle time as the processors await message receipt. This behavioral trajectory, called a phase portrait in classical mechanics, shows the relationship between two variables that depend on a third, independent variable (time). Because the code associated with Figure 1 is iterative, the measured processor behaviors define two closed paths in the metric space; for other codes with more irregular behavior, the curves need not be closed. (In Figure 1, differences across iterations are due to measurement variations and changing convergence-verification costs across iterations.) Understanding these trajectories' characteristics and correlations is the key to application and system software tuning.

Figure 1. Processor behavioral curves for two processors. (Horizontal axis: procedure lifetime in milliseconds.)

The simplicity of this behavioral-trajectory analysis problem belies its difficulty. In practice, such problems involve hundreds of processors with 10 or more performance metrics for each, a microsecond time scale for events, and tens or hundreds of megabytes of performance data. Moreover, some of the performance metrics are discrete, while others are continuous, and their dynamic ranges can differ by multiple orders of magnitude. Within this context, we must correlate the movement of hundreds of points in a high-dimensional metric space, identifying those processors and metrics that are the critical performance determinants.

Because human visualization and spatialization skills are geared to recognizing two- and three-dimensional projections, understanding the relations among abstract, multivariate data is difficult. In fact, performance analysts and statisticians face many of the same data analysis and visualization problems: In both cases, the data is high dimensional, irregular, and sparse. Scatterplot matrices, which are used widely in the statistical graphics community, can help alleviate this problem. Avatar, our virtual world for performance analysis and tuning, generalizes these scatterplot matrices to scattercube matrices.

Scatterplot matrices

For N performance metrics, a scatterplot matrix is a generalization of the simple two-dimensional scatterplot that contains N² x-y scatterplots. As illustrated in Figure 2, each component scatterplot shows one of the possible projections from N to two dimensions. The N projections on the diagonal of the scatterplot matrix are degenerate (both variables on the individual scatterplots are the same), so there are N² - N nondegenerate projections. By symmetry, the projections above and below the diagonal are simple transpositions of each other. For example, Figure 2 shows an eight-dimensional scatterplot matrix with performance data from a parallel genome sequencing code involving extensive input/output and interprocessor communication. The diagonal of this scatterplot matrix contains box and whisker plots2 of each metric's minimum, mean, and quartiles.

Figure 2 highlights several important aspects of performance data and the limitations of scatterplot matrices. First, and most important, some performance metric pairs, such as seek and file-read durations, are highly correlated, while others, such as blocking (synchronous) message send delays and procedure-invocation lifetimes, are not. The application code, input data, and underlying system software and hardware determine which metrics are strongly correlated and influence their dynamic ranges; wide variations exist across codes and code executions.

Second, in each projection, the data forms a few behavioral equivalence classes, because all processors typically execute the same code with data-dependent control flow. In practice, a few behaviors dominate, and the data from most processors lies in one of the associated clusters. Processors whose data falls outside these clusters are behavioral outliers, and understanding the reasons for such outliers is often the key to improving performance.

Finally, although scatterplot matrices highlight bivariate correlations, trivariate or higher degree correlations are not obvious. The existence of multiple, bivariate clusters does not imply clustering in three or more dimensions. To redress this constraint, statisticians have introduced graphical brushing, where interactively highlighting a cluster of data points in one scatterplot highlights the same points in all other scatterplots. The cluster dimensionality is determined by the number of scatterplots where the highlighted points are adjacent.

In short, scatterplot matrices are attractive and intuitive. But they show only bivariate relations and do not exploit important aspects of our visual sense, notably our kinematic and spatialization skills.

Scattercubes

Binocular stereo images and the ability to change the viewing perspective greatly simplify understanding a three-dimensional object's shape: through their own head and body movements, users can study the object in a natural way. To exploit this capability, while retaining the attractive features of scatterplot matrices, we use a three-dimensional generalization of scatterplot matrices.
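Before moving to three dimensions, the scatterplot-matrix counting argument given earlier in this section can be checked by enumeration. In this Python sketch, cell (i, j) is assumed to plot metric i against metric j. The function `scatterplot_matrix` and the eight metric names are hypothetical placeholders, not the metrics shown in Figure 2.

```python
from itertools import product


def scatterplot_matrix(metrics):
    """Classify the cells of a scatterplot matrix over N metrics.

    Cell (i, j) plots metrics[i] against metrics[j].  Diagonal cells are
    degenerate; cells (i, j) and (j, i) are transposes of each other.
    """
    n = len(metrics)
    cells = list(product(range(n), repeat=2))              # all N*N cells
    degenerate = [c for c in cells if c[0] == c[1]]         # the N diagonal cells
    nondegenerate = [c for c in cells if c[0] != c[1]]      # N*N - N cells
    distinct = [(i, j) for (i, j) in nondegenerate if i < j]  # one per transposed pair
    return cells, degenerate, nondegenerate, distinct


# Eight hypothetical metric names, standing in for an eight-dimensional matrix.
names = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7"]
cells, degenerate, nondegenerate, distinct = scatterplot_matrix(names)
print(len(cells), len(degenerate), len(nondegenerate), len(distinct))
```

For N = 8 the sketch yields 64 cells, 8 degenerate diagonal cells, 56 nondegenerate cells, and 28 distinct projections once transposed pairs are merged.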
computer
Figure 2. Scatterplot matrix.
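Graphical brushing, described in the discussion of scatterplot matrices, works because every scatterplot in the matrix indexes the same set of data points. Selecting points by index in one projection therefore selects them in all the others. The following minimal Python sketch assumes each data point is a dictionary of metric values. The functions `brush` and `split_projection` and the metric names seek, read, and send are hypothetical.

```python
def brush(points, x, y, lo, hi):
    """Indices of points whose (x, y) projection lies inside a rectangle.

    points: list of dicts, one per data point, mapping metric name -> value.
    lo, hi: (x, y) lower-left and upper-right corners of the brush.
    """
    return {
        k for k, p in enumerate(points)
        if lo[0] <= p[x] <= hi[0] and lo[1] <= p[y] <= hi[1]
    }


def split_projection(points, selected, x, y):
    """Split one scatterplot into highlighted and ordinary (x, y) pairs."""
    hot = [(p[x], p[y]) for k, p in enumerate(points) if k in selected]
    cold = [(p[x], p[y]) for k, p in enumerate(points) if k not in selected]
    return hot, cold


# Hypothetical per-processor metric values.
points = [
    {"seek": 1.0, "read": 2.0, "send": 9.0},
    {"seek": 5.0, "read": 6.0, "send": 1.0},
    {"seek": 1.5, "read": 2.5, "send": 1.0},
]
selected = brush(points, "seek", "read", lo=(0.0, 0.0), hi=(2.0, 3.0))
print(selected)
print(split_projection(points, selected, "read", "send"))
```

Points 0 and 2 are adjacent in the seek-read projection but far apart along send. This bivariate cluster therefore does not extend into a third dimension, which is exactly the limitation that brushing is meant to expose.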
Our three-dimensional generalization of scatterplot matrices, which we call a scattercube matrix, contains N³ three-dimensional scatterplots. Figure 3 shows the case when N = 6. In each cube, the coordinate axes correspond to three of the performance metrics. Like a scatterplot matrix, a scattercube matrix contains both degenerate and nondegenerate cubes. In a scattercube matrix, a diagonal of N cubes in the interior is three-fold degenerate (all three axes of the cubes on the diagonal correspond to the same metric); these are the gray cubes in Figure 3. In addition, three planes of N² cubes are two-fold degenerate; these are the red, blue, and green cubes. Because the three degenerate planes share the same degenerate diagonal, together they contain 3N² - 2N distinct cubes, leaving N³ - 3N² + 2N = N(N - 1)(N - 2) nondegenerate cubes: the violet cubes in Figure 3. In terms of symmetry, a scattercube matrix is similar to, but more complex than, a simple scatterplot matrix. Each coordinate-orthogonal plane is a variation on a scatterplot matrix; the diagonal along each coordinate-orthogonal plane is degenerate, and the individual scatterplots are three- rather than two-dimensional. Moreover, the degenerate planes define a three-fold symmetry.
Figure 3. A 6 × 6 × 6 scattercube.
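The scattercube counts can be verified the same way. This Python sketch assumes that cube (i, j, k) maps metrics i, j, and k to its x, y, and z axes; the function names are hypothetical. It tallies all N³ cubes by degeneracy class. It also counts nondegenerate cubes up to reordering of their axes, which reflects the three-fold symmetry: each set of three distinct metrics appears in 3! = 6 cubes.

```python
from itertools import product


def degeneracy(i, j, k):
    """Classify scattercube (i, j, k), whose x, y, z axes show metrics i, j, k."""
    distinct = len({i, j, k})
    if distinct == 1:
        return "threefold"      # interior diagonal: one metric on all three axes
    if distinct == 2:
        return "twofold"        # on one of the planes i == j, j == k, or i == k
    return "nondegenerate"      # three different metrics


def count_cubes(n):
    """Tally the N**3 cubes of a scattercube matrix by degeneracy class."""
    counts = {"threefold": 0, "twofold": 0, "nondegenerate": 0}
    for i, j, k in product(range(n), repeat=3):
        counts[degeneracy(i, j, k)] += 1
    return counts


def distinct_metric_triples(n):
    """Nondegenerate cubes up to permutation of axes (the three-fold symmetry)."""
    return {
        tuple(sorted(c)) for c in product(range(n), repeat=3)
        if degeneracy(*c) == "nondegenerate"
    }


print(count_cubes(6))
print(len(distinct_metric_triples(6)))
```

For N = 6 the nondegenerate count is 120 = 6³ - 3·6² + 2·6, which matches the formula. These 120 cubes collapse to 20 distinct metric triples.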
Figure 4 illustrates this symmetry when the degenerate cubes are not displayed. Each cube group in Figure 4 reflects the group diagonally opposite.

Figure 4. Scattercube symmetry for a 6 × 6 × 6 cube.

Finally, the phase portraits in Figure 1 can easily be generalized to three dimensions. Figure 5 shows the interior of a single scattercube, with the current value of each processor's performance metrics denoted by the location of the octahedra in the three-dimensional metric space. The axes represent performance metrics from a parallel input/output library called the Portable Parallel File System (PPFS),3 which supports a client-server model. In the PPFS, application clients issue requests to user-level file servers that collectively implement various file-system caching and prefetching policies (see sidebar "Portable Parallel File System").

Figure 5 shows three performance metrics from an execution of an application code with the PPFS library:

- server hits (the number of times client requests were found in the server file caches),
- server service time (the time servers spent satisfying client requests), and
- client service time (the time the user code was blocked on file requests).

These metrics reveal the dynamic behavior of the PPFS and show the efficacy of data caching and prefetching policies.

The current position of each processor in the metric space is denoted by an octahedron. A history ribbon can be associated with an octahedron to show the octahedron's last k positions. The most recent positions are marked by the brightest blue ribbon and the oldest by very dark blue; that is, the ribbon color varies from bright blue to dark blue with age. The yellow octahedron in the upper left corner of Figure 5 has an associated history ribbon. These history ribbons are also visible in Figure 4.
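A history ribbon of the kind shown in Figure 5 needs only a bounded queue of recent positions and an age-based color ramp. The Python sketch below assumes linear interpolation in RGB between two endpoint colors chosen for illustration. The class `HistoryRibbon` and the constants `BRIGHT_BLUE` and `DARK_BLUE` are hypothetical. In practice, the stored positions would be the sliding-window metric averages that place each octahedron.

```python
from collections import deque

# Assumed RGB endpoints; the article specifies only "bright blue" and "very dark blue".
BRIGHT_BLUE = (0.2, 0.4, 1.0)
DARK_BLUE = (0.0, 0.0, 0.2)


class HistoryRibbon:
    """Last k positions of one octahedron, colored by age."""

    def __init__(self, k):
        self.positions = deque(maxlen=k)  # appending beyond k drops the oldest

    def update(self, position):
        """Record the octahedron's newest (x, y, z) position."""
        self.positions.append(tuple(position))

    def segments(self):
        """(position, rgb) pairs from oldest (dark) to newest (bright)."""
        n = len(self.positions)
        result = []
        for rank, pos in enumerate(self.positions):
            t = rank / (n - 1) if n > 1 else 1.0   # 0.0 = oldest, 1.0 = newest
            rgb = tuple(d + t * (b - d) for d, b in zip(DARK_BLUE, BRIGHT_BLUE))
            result.append((pos, rgb))
        return result


ribbon = HistoryRibbon(k=3)
for p in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    ribbon.update(p)
for pos, rgb in ribbon.segments():
    print(pos, [round(c, 2) for c in rgb])
```

A `deque` with `maxlen` makes the "last k positions" rule automatic. After four updates with k = 3, the first position has already been discarded.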
Figure 5. Scattercube phase behavior.

VIRTUAL ENVIRONMENT INFRASTRUCTURE

Application performance is critically sensitive to system and application configuration parameters. Therefore, we've designed Avatar to support both performance data immersion and interactive, real-time adaptive control. Immersion lets users observe, explore, and modify attributes of the scattercube display while "inside" the performance data. Real-time, interactive control lets users modify application and system parameters and immediately see how performance is affected. Successfully integrating performance instrumentation and real-time data extraction, an immersive virtual environment, and adaptive control mechanisms imposes rigid software design and interactive response-time constraints. Below, we describe the software and hardware infrastructure needed to satisfy these constraints along with the most salient aspects of their interaction.

Software design

Figure 6 shows the logical structure of Avatar's software components. A parallel application code, instrumented with the University of Illinois's Pablo software,4,5 generates time-stamped performance data in Pablo's self-describing data format (SDDF). A data interface accepts performance data from the parallel system and buffers it for subsequent rendering by the presentation-metaphor software. (A display metaphor is a schema that captures a particular perspective on system behavior.) Finally, a data manager realizes any needed data transformations, such as scaling, and computes ancillary data, such as data centroids, for the metaphor-rendering software.

Figure 6. Avatar's logical organization. (Diagram labels: Parallel system, Application program, Pablo performance instrumentation, Performance data, Data interface, Data manager, Data presentation metaphor, Data presentation, Immersed user, User controls, Adaptive controls.)

Hardware support

Although Avatar was developed on Silicon Graphics (SGI) systems, its object-oriented implementation lets us port the software to various hardware configurations by simply refining the appropriate hardware-interface classes. Currently supported configurations include a simple workstation monitor display, a six-degree-of-freedom tracker and head-mounted display, and the Cave Automatic Virtual Environment (CAVE) virtual reality theater.6 The CAVE's primary display is a room-sized cube with walls illuminated by high-resolution, rear-projection video displays.

The workstation environment provides an inexpensive, nonimmersive virtual reality, which is adequate for development and testing and effective for simple demonstrations. A user can experience stereo by wearing a pair of LCD shutter glasses that are synchronized with the display of left and right eye views.

The head-mounted display version provides immersion by filling the field of view with synthetic imagery and exploiting special-purpose peripherals that enhance the sense of immersion. Head and hand trackers provide the requisite data to render scenes in response to user movements. Stereo headphones and sound-spatialization hardware create the illusion that sounds originate from particular locations in the virtual environment. Simple speech recognition and synthesis hardware augment the tracked mouse with oral commands and voice acknowledgment.

The CAVE version of the code supports high-resolution imagery without the encumbrance of a head-mounted display; the user needs only LCD shutter glasses, a head tracker, and a tracked mouse to control stereo displays. The CAVE currently supports data sonification but not sound spatialization or voice recognition.

Portable Parallel File System (sidebar)

The PPFS is a user-level, parallel input/output library that lets applications control the placement of file data across multiple storage devices, choose caching and prefetch policies, and specify data-consistency protocols. In the PPFS client-server model, we can dynamically reconfigure file cache sizes at the application (client) and PPFS (server) level. By carefully matching PPFS parameters and data management policies with application access patterns, we can sometimes increase application input/output performance by an order of magnitude over that achievable with a native Unix file system alone.

Intuitively, request-aggregation, write-behind, prefetching, and caching policies better match the application request stream to the underlying file system's capabilities. And, with dynamic file-system reconfiguration, the user can interactively explore many possible input/output optimizations during a single application execution.

Augmenting PPFS with the Pablo instrumentation library lets us capture event traces of internal PPFS state transitions, procedure calls, and input/output events. This data can be transmitted in real time to remote sites via network sockets, along with sliding-window averages of lower level input/output performance (such as queue lengths and delays, service times, and request throughputs). Finally, PPFS can accept dynamic reconfiguration requests from a socket. With Avatar, the user can examine the PPFS performance data in real time, change PPFS file-system policies and policy parameters, and see the resulting changes in performance (see Figure A).

Figure A. PPFS/Avatar interaction mechanism. (Diagram labels: Client, PPFS, Avatar, Control translator, Native messages, TCP control packets, UDP SDDF records, Controls, Cache, Write-back.)

PPFS: Portable Parallel File System. SDDF: Self-describing data format. TCP: Transmission Control Protocol. UDP: User Datagram Protocol.

Pablo performance instrumentation

An instrumented application invokes the Pablo instrumentation software to record salient application events. To minimize the volume of data that must be rendered while still providing details on application dynamics, we exploit Pablo's capability to compute sliding-window averages of performance metrics. By adjusting the window size, we can balance instrumentation detail against data volume. The resulting performance data can be either written to a file for postmortem analysis or sent directly to Avatar through a Unix socket for real-time analysis.

All performance data is described by Pablo's SDDF,4 which shares features of other data metaformats but is designed specifically to describe performance data. Because users interactively select the desired SDDF records, Avatar does not need to make assumptions about the semantics of the records it receives. The same presentation metaphor can be used for many different types of dynamic performance data.

Scattercube metaphor

Relying on the SDDF data metaformat and separating presentation from data management has allowed us to build realizations of the scattercube metaphor for the workstation, head-mounted-display, and CAVE environments. Moreover, this software separation has reduced the effort needed to implement other data-presentation metaphors.

At initialization, the user interactively maps all or part of the performance metrics to the scattercube dimensions. In individual scattercubes, three of these metrics define the position of each data point. Additional attributes of each data point, such as size, color, or sound, can represent more than three performance metrics in a scattercube. Within each scattercube, historical display of each point's movements creates three-dimensional phase portraits.

For real-time performance presentation and adaptive control, data updates from the parallel system must be timely, lest the user base decisions on old, possibly obsolete, data. The scattercube metaphor ties each point's transparency to its age. Thus, each data point fades as the time since the last performance data was received from the associated processor increases. If the interval becomes too great, the point simply disappears.

Data sonification

Despite the efforts of a few data sonification pioneers, data presentation has long been synonymous with graphics and visualization, and only recently have nonvisual data representations become widely accepted. In virtual reality systems, the corresponding technique is three-dimensional audio,7 which uses stereo headphones to create the illusion that a sound emanates from a particular location in space. Such spatialized sound can add realism by mimicking the physical world, and, in a virtual environment, can heighten awareness and increase the number of data presentation options.

Avatar uses sound in three ways: to reinforce the displayed data, to increase the number of effective data-display dimensions by conveying the values of metrics not visually presented, and to aid navigation and interaction in virtual space. To convey the statistical characteristics of the performance metrics within each scattercube, a sound source is placed at the time-varying centroid of the data points within that scattercube. The distance from the scattercube origin to the source defines the pitch of the emitted sound. Hence, low-pitched sounds are emitted when the data centroid is near the origin, and high-pitched sounds are generated when the data centroid is far from the origin. When a user first enters a scattercube, the sound's location helps the user locate most of the data.

Alternatively, users can associate a sound source with an individual point in a scattercube. The attributes (for example, pitch, timbre, or sustain) of this sound source can be fixed or can be associated with other performance metrics. In either case, Avatar can render a point or trajectory in sound, and users can hear the phase-space behavior.
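Two of the per-point mappings described above reduce to short formulas: the age-based transparency of the scattercube metaphor and the centroid-based pitch of the sonification scheme. The text states only that points fade with age and that pitch rises with the centroid's distance from the cube origin. The linear fade, the 110 to 880 Hz range, the exponential pitch curve, and all function names below are assumptions made for this illustration.

```python
import math


def centroid(points):
    """Mean (x, y, z) position of the data points in one scattercube."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def centroid_pitch(points, max_distance, low_hz=110.0, high_hz=880.0):
    """Pitch (Hz) for the sound source placed at the data centroid.

    Distance 0 from the cube origin gives low_hz; max_distance or more gives
    high_hz.  The frequency range and the exponential (equal steps per
    octave) mapping are illustrative assumptions.
    """
    distance = math.dist(centroid(points), (0.0, 0.0, 0.0))
    t = min(distance / max_distance, 1.0)
    return low_hz * (high_hz / low_hz) ** t


def point_alpha(age, fade_time):
    """Opacity of a point, given seconds since its processor last reported.

    Fades linearly from 1.0 toward 0.0 and returns None (point not drawn)
    once the age reaches fade_time.  Linear fading is an assumption.
    """
    if age >= fade_time:
        return None
    return 1.0 - age / fade_time


print(centroid_pitch([(0, 0, 0), (0, 0, 0)], max_distance=5.0))
print(centroid_pitch([(3, 4, 0)], max_distance=5.0))
print(point_alpha(0.5, fade_time=2.0), point_alpha(3.0, fade_time=2.0))
```

An exponential pitch curve is chosen because equal changes in distance then produce equal musical intervals. A linear mapping from distance to frequency would also satisfy the monotonic relationship the text describes.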
Interactive controls Figure 7. Application controls (genome-sequencing code),
--
Computer
Natural, intuitive control is the essence of data immer sion; if the controls are awkward or confusing, the illu-
controls, all metaphor controls are realized
sion of immersion is quickly lost. To lessen the complexity of the virtual environment interface and reduce the learning curve, several less frequently used configuration
n addition
I to the mouse
controls. voice
through menus and control panels. To move about scattercubes, Avatar users can fly large distances using the mouse or
controls are accessible only through the
adivated toggles
move locally via head and body motions.
workstation interface. The remaining con
can control
Movements can be recorded for later replay
trols are accessible through a combination
display of phase
as a fixed flight path through the scatter
of mouse and voice commands.
behavior lines
cubes. Alternatively, user position can be
and scattercube
fixed while the scattercube matrix rotates
charaderistics.
about one or more axes. This lets users see
The mouse and tracker generate button signals along with mouse position and ori entation. Spoken commands are recog
the relations among all the performance
nized and positively acknowledged via
metrics without choosing a flight path that
synthesized voice response. The user can combine mouse
circumscribes all the scattercubes.
and voice commands to choose and configure items from
In addition to the mouse controls, voice-activated tog
a group of windows and menus. For the sake of simplicity
gles can control display of phase-behavior lines and scat
and familiarity, these windows and menus resemble those
tercube characteristics. When the scattercube faces are
of a standard workstation or PC windowing system. For
opaque, users can neither see inside multiple scattercubes
controls such as those in Figure 7, the movement of the
from the outside nor see through the wall of a single scat
tracked mouse is projected into the control panel window,
tercube that they are inside. In our experience, users begin
and the user modifies items by pointing, clicking, and
with an external view, where all scattercube faces are
dragging with the mouse.
translucent (as in Figure 4), then they circumnavigate the
As Figure 6 suggests, Avatar includes interactive con
scattercube matrix, fly into a single scattercube, and raise
trols for both parallel applications and data-presentation
the opacity of that cube to focus attention on the data
metaphors. One set of controls lets users adjust applica
within.
tion behavior in response to observed performance data, while another lets them change display attributes.
EXPERIENCES Avatar has been operational for about two years.
PARALLEL APPLICATION CONTROLS. To support inter
Although development continues, with emphasis on sup
active control of parallel-system behavior, we have devel
port for new presentation metaphors, the general struc
oped a control library that lets application codes accept
ture and functionality have stabilized. On the basis of our
and respond to interactive-control requests. We have used
experiences with real-time performance analysis and
this library to develop a version of the Portable Parallel
interactive control, we can draw some general conclusions
File System (PPFS) that can respond to interactive con
about our design choices.
trols issued from Avatar. Because PPFS supports a rich set of file system policies,
Genome-sequence comparison under the PPFS
choosing the best match of file system policies and appli
To assess the utility of Avatar, we selected a parallel
cation access patterns would typically require several
implementation of a genome-sequence comparison code
cycles of policy selection and testing. However, with inter
that executes under our PPFS on an Intel Paragon XPIS.
active controls, we can immediately change PPFS para
The Paragon XP /S is a two-dimensional mesh of compute,
meters and see their effects. Current controls include both
service, and input/output nodes, each with its own local
the PPFS server and client file-cache sizes as well as the
memory.
cache write-back and prefetch parameters.
Because the synthesis methods currently used to deter
Besides enabling file-system controls, the PPFS and our
mine genetic sequences produce nontrivial numbers of
instrumentation software permit interactive adjustment of
errors, exact string-matching algorithms are inappropri
performance-measurement windows. With large windows,
ate for biological sequences. One approximate sequence
the system reports average performance-metric values over
matching approach is based on a generalization of the
long time intervals, minimizing performance-data volume
Needleman, Wunsch, and Sellers (NWS)B dynamic
and data-extraction overhead. Conversely, small windows
programming algorithm, with a K-tuple heuristic that
provide detailed, high-resolution data and can track rapid
prunes the search space to improve performance. With
changes in performance, albeit with higher data volume
this algorithm, the input sequence is processed against all
and extraction overhead. Varying the window size lets users
entries in the genome database, and the database entry
adjust the performance-data rate as needed to balance
generating the highest score is declared the best match to
instrumentation overhead and detail.
the input sequence. In our parallel implementation of the NWS algorithm,
METAPHOR CONTROLS. Most controls associated with
each Paragon XP /S node independently compares the test
a particular data-presentation metaphor are necessarily
sequence against disjoint portions of the sequence data
metaphor dependent, although some (for example, fly
base. Unfortunately, a simple static partitioning of the
ing) are metaphor independent. For the scattercube
genome database yields poor load balance-comparison
metaphor, Avatar supports control of display attributes
times heavily depend on sequence content and size.
(such as cube wall opacity, data scales, and the mapping
Maximizing performance requires a dynamic approach,
of metrics to scattercube axes) and data attributes (such as
where parallel tasks read groups of new sequences from the
data-point brushing and history lines). Like application
database as needed. If too many sequences are read, load
November 1995
-
the queue of pending read requests, the number of file cache hits, and the time to service application read requests. For these metrics, optimality is at the upper rear corner of the cube-low client service time, small queue length, and high server hit counts. Comparing the top and bottom of Figure 8 shows that increasing the cache size and prefetch amounts dramati cally reduces the length of the queues, increases the cache hit count, and decreases the client (application) read service times. In practice, interactively identifying this combination ofPPFS parameters takes only a few minutes, and the correct parameters reduce application execution time by an order of magnitude on 256 processors of the Intel Paragon XPIS. More generally, because the input/output performance of many large-scale, parallel codes is strongly sensitive to request sizes and patterns, seeing system dynamics enables application scientists to more readily understand temporal performance variations and study the effects of changing application parameters and algorithms.
imbalances result. Conversely, reading too few sequences fails to amortize the cost of input/output operations. Because the PPFS library provides a rich set of file-cache and prefetch policies, interposing the PPFS between the genome-sequence comparison code and the native Paragon XP/S file system lets users tune file-system behavior to meet their application needs. By dynamically changing the size of the PPFS cache and the aggressiveness of the sequence prefetch policy, users can find a configuration that maximizes performance without changing the application source code. The correct parameter choices depend on the characteristics of the sequence database and the test sequence. Because these characteristics vary widely and cannot be predicted, interactive file-policy tuning can substantially reduce comparison times.

The Avatar environment lets us adjust PPFS parameters while monitoring real-time performance data. As Figure 8 shows, the distinction between an effective policy configuration and an undesirable configuration is striking. The three performance-metric axes represent the length of

Figure 8. Portable Parallel File System (PPFS) cache configuration: (top) suboptimal cache configuration; (bottom) optimal cache configuration.

Interaction experiences
Intuitive interaction is undoubtedly the most difficult design problem in the creation of a virtual reality system, particularly one like Avatar that shows abstract data. Without effective navigation techniques, users will not fully explore the data space, and without appropriate cues, those who attempt to do so will quickly become lost.

NAVIGATION. In our experience, the utility of a particular navigation technique depends on the desired displacement from the current location. Walking is the most natural and simplest navigation technique, but it is appropriate only for exploration within a single scattercube or its adjoining cubes. Walking also makes it difficult to sustain the illusion of immersion, because users must stay wary of physical obstacles and cabling constraints, neither of which can be seen when wearing a head-mounted display. Given these constraints, novice users are particularly reluctant to walk while wearing a head-mounted display but feel quite free to move about in the CAVE. For long-distance movement, either in the CAVE or when wearing the head-mounted display, flying is essential. Unfortunately, flying by pointing the mouse in the desired direction is not intuitive. Inertia and thrust can only be represented visually, making it easy to misjudge angle and speed.

In general, giving users complete control to walk or fly is best when they are moving toward and exploring a specific scattercube. For large-scale exploration, user-controlled navigation must be complemented with fixed flight paths and visual reference cues. Fixed paths let users focus on the data rather than mentally balancing navigation and data analysis.

Rotation and orbiting provide a global view of the scattercubes with little chance of disorientation: The flight path is closed, and the users return to their point of origin. Flight path replay permits more general paths, albeit with greater chance of disorientation. This is especially useful for letting users share an earlier exploratory trip or providing inexperienced users with more interesting views than those from simple rotations.
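The closed orbits and replayed flight paths described above can be sketched as follows. This is a minimal Python illustration under stated assumptions and is not Avatar's code. The helpers `orbit_path` and `replay` are hypothetical. The orbit is assumed to be a horizontal circle around a chosen center, such as the middle of the scattercube array. Replay is assumed to interpolate linearly between recorded positions.

```python
import math
from typing import List, Tuple

# A position in the virtual world; y is taken as "up" (an assumption).
Point = Tuple[float, float, float]


def orbit_path(center: Point, radius: float, steps: int) -> List[Point]:
    """Hypothetical helper: a closed circular orbit around `center`.

    Waypoints lie on a horizontal circle; the last waypoint equals the
    first, so the flight ends where it began.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    cx, cy, cz = center
    path = []
    for i in range(steps + 1):  # steps + 1 waypoints; the last closes the loop
        theta = 2.0 * math.pi * i / steps
        path.append((cx + radius * math.cos(theta), cy, cz + radius * math.sin(theta)))
    path[-1] = path[0]  # remove floating-point drift so the loop is exactly closed
    return path


def replay(waypoints: List[Point], t: float) -> Point:
    """Hypothetical helper: position at normalized time t in [0, 1].

    Linearly interpolates between successive recorded waypoints, so a
    previously recorded exploratory trip can be flown again by sweeping t.
    """
    if not waypoints:
        raise ValueError("no waypoints recorded")
    if len(waypoints) == 1 or t <= 0.0:
        return waypoints[0]
    if t >= 1.0:
        return waypoints[-1]
    segments = len(waypoints) - 1
    s = t * segments                 # fractional segment index
    k = min(int(s), segments - 1)    # segment that contains t
    f = s - k                        # fraction of the way through segment k
    (x0, y0, z0), (x1, y1, z1) = waypoints[k], waypoints[k + 1]
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0), z0 + f * (z1 - z0))
```

For example, `orbit_path((0.0, 0.0, 0.0), 10.0, 36)` returns 37 waypoints whose first and last entries are identical. Likewise, `replay([(0, 0, 0), (2, 0, 0)], 0.5)` returns `(1.0, 0.0, 0.0)`. Closing the loop explicitly is what guarantees the return to the point of origin. A real replay facility would presumably also restore view orientation, which this sketch omits.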
Computer
VISUAL AND SONIC CUES. The scattercube metaphor presents the user with a multiplicity of similar, three-dimensional scatterplots. Without distinctive visual and sonic cues, users can lose track of which scattercube they are viewing. Color coding the cubes on the basis of their degeneracy, as in Figure 3, provides rough guidance about the user's current position. However, it does not distinguish cubes of the same class or indicate a specific location within the scattercube array.

We have found that color coding is best used in conjunction with an opacity control for the cube faces. When a user is inside a specific scattercube, opaque faces focus attention on the data in that cube. However, opaque faces make it difficult to navigate among the cubes, because context is lost. Moreover, they make it impossible to view the behavior of any metrics when the user is not inside a cube (for example, while flying). Transparent faces permit observation of data in other cubes, but background clutter makes it difficult to identify data in a specific cube. Translucent cube faces strike a balance between opacity and transparency by affording an overview of many cubes simultaneously (with the nearest cubes the most visible) while still delineating cube boundaries. Adjusting the degree of translucency provides a continuum between local focus and global perspective.

Within a single scattercube, labeling the axes at the origin and adding data scales, as in Figure 5, uniquely identifies each cube. With the resolution of today's head-mounted displays, these labels are readable only from within the cube where the user is currently located. However, with the CAVE's higher resolution projections, axis labels are excellent navigational aids: Users can quickly fly through cubes, searching for a specific projection.

Finally, placing a sound source at the origin or at the data centroid of a single scattercube permits quick orientation relative to the axes or to most of the data. Because users hear sound only when they are inside a scattercube, sound sources provide a complementary navigational aid. Users can readily determine the dispersion of the data in a series of scattercubes simply by choosing a flight path that intersects multiple scattercubes.

Related research
Our performance-data presentation system draws on a long history of performance-analysis software, statistical graphics, and virtual reality research. Heath1 describes a suite of two-dimensional graphics displays for representing dynamic performance data. Cleveland2 presents a cogent summary of the statistics community's techniques for visualizing irregular data, including early experiences with three-dimensional scatterplots. Our work differs in its generalization of scatterplot matrices to encompass three-dimensional scatterplots and its integration of history lines to show phase behavior.

The closest analog to our work within the virtual reality community is Beshers and Feiner's work on multidimensional data spaces for visualization of financial data. AutoVisual3 and its predecessor, n-Vision, use "worlds within worlds" to display N-dimensional data. Both create a hierarchy of three-dimensional displays, where users can recursively nest a group of displays within one display by selecting a point. Our work differs in that it imposes no hierarchy on the data dimensions: All are treated as equals, and users need not assign them an a priori order or importance.

Finally, because Avatar supports real-time adaptive control, it bears some relation to other telepresence projects. Examples include the Monterey Bay Aquarium Research Institute (MBARI) remotely operated vehicle (ROV)4 and the University of North Carolina/University of California, Los Angeles nanomanipulator.5 Avatar differs from both the ROV and the nanomanipulator in its focus on abstract, multivariate data rather than on a physical system. However, like the ROV, our parallel performance instrumentation provides a camera for peering into the murky depths of parallel systems. Like the nanomanipulator, our adaptive controls provide a probe for prodding the mysterious beasts that live there.

References
1. M.T. Heath and J.A. Etheridge, "Visualizing the Performance of Parallel Programs," IEEE Software, Vol. 8, No. 5, Sept. 1991, pp. 29-39.
2. W.S. Cleveland and M.E. McGill, eds., Dynamic Graphics for Statistics, Wadsworth & Brooks/Cole, Pacific Grove, Calif., 1988.
3. C. Beshers and S. Feiner, "AutoVisual: Rule-Based Design of Interactive Multivariate Visualizations," IEEE Computer Graphics & Applications, Vol. 13, No. 4, July 1993, pp. 41-49.
4. B.H. Robinson, "Midwater Research Methods with MBARI's ROV," Marine Tech. Soc. J., Vol. 26, No. 4, Winter 1992, pp. 32-39.
5. R.M. Taylor et al., "The Nanomanipulator: A Virtual Reality Interface for a Scanning Tunneling Microscope," Proc. Siggraph 93, ACM, New York, 1993, pp. 127-134.

November 1995

DATA UNDERSTANDING AND CORRELATION. The scattercube metaphor was designed to study the dynamic behavior of performance metrics and show data clustering in many dimensions. Displaying data-point history is perhaps the most useful mechanism for understanding dynamic behavior, as it extends phase portraits to three dimensions.

First, if enough history is displayed, any cyclic behavior becomes evident. In addition, by observing a data point across multiple cubes (for instance, by flying along a coordinate axis), users can see the movement in four or more dimensions. Second, history lines distinguish the movements of multiple points. Without history, if several points moved at once, it would be difficult to determine which new metric values were associated with each point. Finally, history
lines show the magnitude of metric changes within the sliding-window interval: short lines for small changes and long lines for larger changes.

We have discovered that enabling history for a subset of the points, rather than all of them, provides significant insight. Such a subset might be the points in a bounding volume or a few representatives. In this case, history lines play the role of brushing2 in statistical graphics: If representatives are clustered in all scattercubes, then the data is clustered in all dimensions.

One problem with observing real-time data is temporal accuracy. A data point whose associated metrics have not been recently updated may appear near other, more recently updated points. This can be alleviated via aging, which increases the transparency of a data point on the basis of the time since its last update.

Seeing system dynamics helps us understand the temporal variation in performance and the effects of changing application parameters. Rather than running the entire application several times using test data sets to identify appropriate parameters, we can adjust those parameters interactively to increase performance. This is vital, because the performance of many dynamic codes is strongly sensitive to input data characteristics. That sensitivity makes it extraordinarily difficult to identify a priori a single, globally optimal parameter configuration.

DIRECTION AND FUTURE WORK
Although Avatar has been operational for about two years, like all experimental software projects its implementation has raised more research questions than it has answered. (See sidebar "Related research.") Besides refining system functionality, we are working to add new data presentation metaphors and interaction mechanisms.

The scattercube metaphor abstracts a parallel program's behavior as a group of dynamic performance metrics. Although this lets us study performance-metric correlations, the direct relation to application-code fragments is lost. We have developed a time-line metaphor to represent the processor interactions when executing application code on parallel systems.

In the time-line metaphor, processor icons are equally spaced about the circumference of a cylinder, with the cylinder axis representing time; that is, each processor's time line extends along the cylinder axis. Along each time line, icons represent processor activities. Lines from the initiating processor to the recipient processor represent cross-processor interactions. Users can determine interaction durations and query the system for additional information on selected activities. In the future, critical and near-critical paths will be highlighted, and users will be able to determine how removing a critical path affects performance.

We have also instrumented Avatar to obtain detailed, dynamic data on tracker overhead and lag, rendering rates, data-processing costs, and command processing. With this data, we will be able to analyze the end-to-end performance as a function of user behavior and data complexity.

We are now using Avatar to study two types of high-performance input/output: parallel scientific codes and World Wide Web servers. In the first domain, the goal of our collaborative research effort, and of the nascent scalable input/output initiative (SIO), is to redress the input/output limitations of today's massively parallel systems. The effort is broad-based and includes performance-analysis, operating system, compiler, and application researchers. In the second domain, we are using Avatar to analyze the access patterns to the National Center for Supercomputing Applications' (NCSA) WWW server, using the request logs recorded by that server. Our goal in this effort is to understand the types of requests and access patterns and their implications for future-generation server design.

RECORDING AND ANALYZING the dynamics of application program, system-software, and hardware interactions are the keys to understanding and tuning the performance of massively parallel systems. We have implemented Avatar, a data-immersive virtual world for performance analysis and real-time control of application behavior. Avatar shows all possible three-dimensional projections of a sparsely populated, N-dimensional metric space. Our early experiences with Avatar suggest that its performance-metric correlation, combined with its capability to interactively modify application behavior, provides a powerful mechanism for performance optimization.

Acknowledgments
This work was supported in part by the Advanced Research Projects Agency under ARPA contracts DABT63-91-C-0029 and DABT63-93-C-0040; the National Science Foundation under grants NSF IRI 92-12976, NSF CDA 94-01124, and NSF CDA 87-22836; the National Aeronautics and Space Administration under NASA Contract Number NAG-1-613; and a collaborative research agreement with the Intel Supercomputer Systems Division. We are indebted to Duane Andres for his software contributions to the early development of Avatar and to Stephen Lamm for his recent additions. Ruth Aydt, Roger Noe, Tara Madhyastha, Bradley Schwartz, and Brian Totty contributed to the Pablo performance analysis software and offered valuable advice on the design of Avatar. Finally, we owe special thanks to Phil Roth for Figure 2.

References
1. V.S. Adve et al., "An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs," to appear in Proc. Supercomputing 95, ACM, New York.
2. W.S. Cleveland and M.E. McGill, eds., Dynamic Graphics for Statistics, Wadsworth & Brooks/Cole, Pacific Grove, Calif., 1988.
3. J.V. Huber et al., "PPFS: A High-Performance Portable Parallel File System," Proc. Ninth ACM Int'l Conf. Supercomputing, ACM, New York, 1995, pp. 385-394.
4. D.A. Reed, "Experimental Performance Analysis of Parallel Systems: Techniques and Open Problems," Proc. Seventh Int'l Conf. Modeling Techniques and Tools for Computer Performance Evaluation, Springer-Verlag, Secaucus, N.J., 1994, pp. 25-51.
5. D.A. Reed et al., "Scalable Performance Analysis: The Pablo Performance Analysis Environment," Proc. Scalable Parallel Libraries Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 4890, 1993, pp. 104-113.
6. C. Cruz-Neira, D.J. Sandin, and T. DeFanti, "Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of the CAVE," Proc. Siggraph 93, ACM, New York, 1993, pp. 135-142.
7. E.M. Wenzel, "A Virtual Display System for Conveying Three-Dimensional Acoustic Information," Proc. Human Factors Soc., Human Factors Soc., Santa Monica, Calif., 1988, pp. 86-90.
8. S.B. Needleman and C.D. Wunsch, "An Efficient Method Applicable to the Search for Similarities in the Amino Acid Sequences of Two Proteins," J. Molecular Biology, Vol. 48, No. 1, Feb. 1970, pp. 444-453.

Daniel A. Reed is a professor in the Department of Computer Science at the University of Illinois, Urbana-Champaign, where he holds a joint appointment with the National Center for Supercomputing Applications (NCSA). Reed received a BS degree (summa cum laude) in computer science from the University of Missouri, Rolla, in 1978 and MS and PhD degrees in computer science from Purdue University in 1980 and 1983, respectively. He was a recipient of the 1987 National Science Foundation Presidential Young Investigator Award. Reed serves on the boards of IEEE Transactions on Parallel and Distributed Systems, Concurrency: Practice and Experience, and the International Journal of High-Speed Computing. He is a member of the NASA RIACS Science Council.

Keith A. Shields is a member of the technical staff at the Analytic Sciences Corporation. Shields received a BS degree (magna cum laude) in computer science from the University of South Alabama in 1991 and an MS degree in computer science from the University of Illinois in 1994. Shields is a member of ACM and IEEE.

Will H. Scullin is a graduate student pursuing an MS degree in the Department of Computer Science at the University of Illinois, Urbana-Champaign. He received a BA degree (with distinction) in computer science in 1993 from the University of Minnesota, Morris.

Luis F. Tavera is a PhD candidate in the Department of Computer Science at the University of Illinois, Urbana-Champaign. Tavera received a BS degree in physics engineering in 1988 from the Universidad Iberoamericana in Mexico City, Mexico. He completed an MS degree in computer science at the University of Illinois in 1994.

Christopher L. Elford is a PhD candidate in the Department of Computer Science at the University of Illinois, Urbana-Champaign. Elford received a BS degree (magna cum laude) in computer science from the University of Houston in 1991 and an MS degree in computer science from the University of Illinois in 1994.

Readers can contact the authors at the Department of Computer Science, University of Illinois, Urbana, Illinois 61801; e-mail {reed, shields, scullin, tavera, elford}@cs.uiuc.edu.