Cluster Computing
Thomas Sterling
California Institute of Technology
and NASA Jet Propulsion Laboratory
I. Introduction
II. A Taxonomy of Cluster Computing
III. A Brief History of Cluster Computing
IV. Cluster Hardware Components
V. Cluster Software Components
VI. Summary and Conclusions
GLOSSARY
Cluster  A computing system comprising an ensemble of separate computers (e.g., servers, workstations) integrated by means of an interconnection network cooperating in the coordinated execution of a shared workload.

Commodity cluster  A cluster consisting of computer nodes and network components that are readily available COTS (commercial off-the-shelf) systems and that contain no special-purpose components unique to the system or a given vendor product.

Beowulf-class system  A commodity cluster implemented using mass-market PCs and COTS network technology for low-cost parallel computing.

Constellation  A cluster for which there are fewer SMP nodes than there are processors per node.

Message passing  A model and methodology of parallel processing that organizes a computation in separate concurrent and cooperating tasks coordinated by means of the exchange of data packets.

CLUSTER COMPUTING is a class of parallel computer structure that relies on cooperative ensembles of independent computers integrated by means of interconnection networks to provide a coordinated system capable of processing a single workload. Cluster computing systems achieve high performance through the simultaneous application of multiple computers within the ensemble to a given task, processing the task in a fraction of the time it would ordinarily take a single computer to perform the same work. Cluster computing represents the most rapidly growing field within the domain of parallel computing because of its exceptional price/performance. Unlike other parallel computer system architectures, the core computing elements, referred to as nodes, are not custom designed for high performance and parallel processing but are derived from systems developed for the industrial, commercial, or commodity market sectors and applications. Benefiting from the superior cost effectiveness of the mass production and distribution of their COTS (commercial off-the-shelf) computing nodes, cluster systems exhibit an order-of-magnitude cost advantage with respect to their custom-designed parallel computer counterparts delivering the same sustained performance for a wide range of (but not all) computing tasks.
I. INTRODUCTION

Cluster computing provides a number of advantages with respect to conventional custom-made parallel computers for achieving performance greater than that typical of uniprocessors. As a consequence, the emergence of clusters has greatly extended the availability of high-performance processing to a much broader community and advanced its impact through new opportunities in science, technology, industry, medicine, commerce, finance, defense, and education, among other sectors of computational application. Included among the most significant advantages exhibited by cluster computing are the following:

Performance scalability. Clustering of computer nodes provides the means of assembling larger systems than is practical for custom parallel systems, as these themselves can become nodes of clusters. Many of the entries on the Top 500 list of the world's most powerful computers are clusters, and the most powerful general-purpose computer under construction in the United States (the DOE ASCI system) is a cluster to be completed in 2003.

Performance to cost. Clustering of mass-produced computer systems yields the cost advantage of a market much wider than that limited to the high-performance computing community. An order-of-magnitude price/performance advantage with respect to custom-designed parallel computers is achieved for many applications.

Flexibility of configuration. The organization of cluster systems is determined by the topology of their interconnection networks, which can be determined at time of installation and easily modified. Depending on the requirements of the user applications, various system configurations can be implemented to optimize for data flow bandwidth and latency.

Ease of upgrade. Old components may be replaced or new elements added to an original cluster to incrementally improve system operation while retaining much of the initial investment in hardware and software.

Architecture convergence. Cluster computing offers a single general strategy for the implementation and application of parallel high-performance systems independent of specific hardware vendors and their product decisions. Users of clusters can build software application systems with confidence that such systems will be available to support them in the long term.

Technology tracking. Clusters provide the most rapid path to integrating the latest technology for high-performance computing, because advances in device technology are usually first incorporated in mass-market computers suitable for clustering.

High availability. Clusters provide multiple redundant identical resources that, if managed correctly, can provide continued system operation through graceful degradation even as individual components fail.

Cluster computing systems comprise a hierarchy of hardware and software component subsystems. The cluster hardware is the ensemble of compute nodes responsible for performing the workload processing and the communications network interconnecting the nodes. The support software includes programming tools and system resource management tools. Clusters can be employed in a number of ways. The master–slave methodology employs a number of slaved compute nodes to perform separate tasks or transactions as directed by one or more master nodes. Many workloads in the commercial sector are of this form. But each task is essentially independent, and while the cluster does achieve enhanced throughput over a single-processor system, there is no coordination among the slave nodes, except perhaps in their access to shared secondary storage subsystems. The more interesting aspect of cluster computing is its support of coordinated and interacting tasks, a form of parallel computing in which a single job is partitioned into a number of concurrent tasks that must cooperate among themselves. It is this form of cluster computing, and the necessary hardware and software systems that support it, that is discussed in the remainder of this article.
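As an illustration of this coordinated-task style (a minimal sketch, not an example from the article), the following C program uses MPI, the message-passing standard discussed in Section III, to partition the summation of a large index range across the processes of a cluster and to combine the partial results with a single collective operation:

    /* Illustrative sketch (not from the article): a single job -- summing
       N numbers -- partitioned into cooperating tasks, one per cluster
       process, coordinated with MPI. Build with mpicc, launch with mpirun. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's identity  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of tasks       */

        /* Each task sums its own contiguous share of the index range. */
        long begin = (long)N * rank / size;
        long end   = (long)N * (rank + 1) / size;
        double local = 0.0;
        for (long i = begin; i < end; i++)
            local += (double)i;

        /* Cooperative step: combine the partial sums on task 0. */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %.0f\n", total);

        MPI_Finalize();
        return 0;
    }

Launched with a command such as mpirun -np 16 ./sum, the same program runs on every node; the only interaction among the tasks is the final reduction.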
II. A TAXONOMY OF CLUSTER COMPUTING

Cluster computing is an important class of the broader domain of parallel computer architecture that employs a combination of technology capability and subsystem replication to achieve high performance. Parallel computer architectures partition the total work to be performed into many smaller coordinated and cooperating tasks and distribute these tasks among the available replicated processing resources. The order in which the tasks are performed and the degree of concurrency among them are determined in part by their interrelationships, precedence constraints, type and granularity of parallelism exploited, and the number of computing resources applied to the combined tasks to be conducted in concert. A major division of parallel computer architecture classes, which includes cluster computing, comprises the following primary (but not exhaustive) types, listed in order of their level of internal communication coupling measured in terms of bandwidth (communication throughput) and latency (delay in transfer of data). This taxonomy is illustrated in Figure 1. Such a delineation is, by necessity, somewhat idealized because many actual parallel computers may incorporate multiple forms of parallel structure in their specific architecture. Also, the terminology below reflects current general usage, but the specific terms have varied in their definition over time (e.g., "MPP" originally was applied to fine-grain SIMD computers but now is used to describe large MIMD computers).

[FIGURE 1  Taxonomy of cluster computing. Parallel computing divides into vector, systolic, SIMD, MPP, cluster, and distributed classes; the cluster class subdivides into commodity clusters (including Beowulf and NOW systems), proprietary clusters, workstation farms, super clusters, and constellations.]
1. Vector processing. The basis of the classical supercomputer (e.g., the Cray 1), this fine-grain architecture pipelines memory accesses and numeric operations through one or more multistage arithmetic units supervised by a single controller.

2. Systolic. Usually employed for special-purpose computing (e.g., digital signal and image processing), systolic systems employ a structure of logic units and physical communication channels that reflects the computational organization of the application algorithm's control and data flow paths.

3. SIMD. This single instruction stream, multiple data stream (SIMD) family employs many fine- to medium-grain arithmetic/logic units (potentially tens of thousands), each associated with a given memory block (e.g., MasPar-2, TMC CM-2). Under the management of a single system-wide controller, all units perform the same operation on their independent data each cycle.

4. MPP. This multiple instruction stream, multiple data stream (MIMD) class of parallel computer integrates many (from a few to several thousand) CPUs (central processing units) with independent instruction streams and flow control, coordinating through a high-bandwidth, low-latency internal communication network. The memory blocks associated with each CPU may be independent of the others (e.g., Intel Paragon, TMC CM-5), shared among all CPUs without cache coherency (e.g., CRI T3E), shared in SMPs (symmetric multiprocessors) with uniform access times and cache coherence (e.g., SGI Oracle), or shared in DSMs (distributed shared memory) with nonuniform memory access times (e.g., HP Exemplar, SGI Origin).

5. Cluster computing. Integrates stand-alone computers devised for mainstream processing tasks through local-area (LAN) or system-area (SAN) interconnection networks and employed as a singly administered computing resource (e.g., Beowulf, NOW, Compaq SC, IBM SP-2).

6. Distributed Internet computing. Employs wide-area networks (WANs), including the Internet, to coordinate multiple separate computing systems (possibly thousands of kilometers apart) under independent administrative control in the execution of a single parallel task or workload. Previously known as metacomputing and including the family of grid management methods, this emergent strategy harnesses existing installed computing resources to achieve very high performance and, when exploiting otherwise unused cycles, superior price/performance.
Cluster computing may be divided into a number of subclasses that are differentiated in terms of the source of their computing nodes, interconnection net-
works, and dominant level of parallelism. A partial clas-
sification of the domain of cluster computing includes
commodity clusters (including Beowulf-class systems),
proprietary clusters, open clusters or workstation farms,
super clusters, and constellations. This terminology is
emergent, subjective, open to debate, and in rapid tran-
sition. Nonetheless, it is representative of current usage
and practice in the cluster community.
A definition of commodity clusters, developed by consensus, is borrowed from the recent literature and reflects their most important attribute: that they comprise components that are entirely off-the-shelf, i.e., already developed and available for mainstream computing:

   A commodity cluster is a local computing system comprising a set of independent computers and a network interconnecting them. A cluster is local in that all of its component subsystems are supervised within a single administrative domain, usually residing in a single room and managed as a single computer system. The constituent computer nodes are commercial off-the-shelf, are capable of full independent operation as is, and are of a type ordinarily employed individually for stand-alone mainstream workloads and applications. The nodes may incorporate a single microprocessor or multiple microprocessors in a symmetric multiprocessor (SMP) configuration. The interconnection network employs COTS LAN or SAN technology that may be a hierarchy of, or multiple separate, network structures. A cluster network is dedicated to the integration of the cluster compute nodes and is separate from the cluster's external (worldly) environment. A cluster may be employed in many modes, including but not limited to high capability or sustained performance on a single problem, high capacity or throughput on a job or process workload, high availability through redundancy of nodes, or high bandwidth through multiplicity of disks and disk access or I/O channels.

Of these subclasses, commodity clusters have emerged as the most prevalent and rapidly growing segment of cluster computing systems and are the primary focus of this article.

Beowulf-class systems are commodity clusters employing personal computers (PCs) or small SMPs of PCs as their nodes and using COTS LANs or SANs to provide node interconnection. A Beowulf-class cluster is hosted by an open-source Unix-like operating system such as Linux. A Windows-Beowulf system runs the mass-market, widely distributed Microsoft Windows operating system instead of Unix.

Proprietary clusters incorporate one or more components that are custom designed to give superior system characteristics for product differentiation while employing COTS components for the rest of the cluster system. Most frequently, proprietary clusters have incorporated custom-designed networks for tighter system coupling (e.g., the IBM SP-2). These networks may not be procured separately (unbundled) by customers or by OEMs for inclusion in clusters comprising other than the specific manufacturer's products.

Workstation farms or open clusters are collections of previously installed personal computing stations and group shared servers, loosely coupled by means of one or more LANs for access to common resources, that, although primarily employed for separate and independent operation, are occasionally used in concert to process single coordinated distributed tasks. Workstation farms provide superior performance/price over even other cluster types in that they exploit previously paid-for but otherwise unused computing cycles. Because their interconnection network is shared for other purposes and not optimized for parallel computation, these open clusters are best employed for weakly interacting distributed workloads. Software tools such as Condor facilitate their use while incurring minimum intrusion on normal service.

Super clusters are clusters of clusters. Principally found within academic, laboratory, or industrial organizations that employ multiple clusters for different departments or groups, super clusters are established by means of WANs integrating the disparate clusters into a single, more loosely coupled computing confederation.

Constellations reflect a different balance of parallelism than conventional commodity clusters. Instead of the primary source of parallelism being derived from the number of nodes in the cluster, it is a product of the number of processors in each SMP node. To be precise, a constellation is a cluster in which there are more processors per SMP node than there are nodes in the cluster. While the nodes of a constellation must be COTS, its global interconnection network can be of a custom design.
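The constellation definition above reduces to a simple numeric test on a system's shape. The following sketch is purely illustrative (the node and processor counts are hypothetical, not taken from the article) and applies that test:

    /* Illustrative sketch: classify a system as a constellation or a
       conventional cluster using the definition in the text -- a
       constellation has more processors per SMP node than nodes. */
    #include <stdio.h>

    const char *classify(int nodes, int procs_per_node)
    {
        return (procs_per_node > nodes) ? "constellation" : "cluster";
    }

    int main(void)
    {
        /* Hypothetical example systems. */
        printf("128 nodes x 2 processors: %s\n", classify(128, 2));
        printf("8 nodes x 32 processors:  %s\n", classify(8, 32));
        return 0;
    }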
III. A BRIEF HISTORY OF
CLUSTER COMPUTING
Cluster computing originated within a few years of the in-
auguration of the modern electronic stored-program digi-
tal computer. SAGE was a cluster system built for NORAD
under an Air Force contract by IBM in the 1950s based on
the MIT Whirlwind computer architecture. Using vacuum
tube and core memory technologies, SAGE consisted of
a number of separate stand-alone systems cooperating to
manage early warning detection of hostile airborne intru-
sion of the North American continent. Early commercial
applications of clusters employed paired loosely coupled
computers with one performing user jobs while the other
managed various input/output devices.
Breakthroughs in enabling technologies occurred in the
late 1970s, both in hardware and software, that were to
have a significant long-term effect on future cluster com-
puting. The first generations of microprocessors were de-
signed with the initial development of VLSI technology
and by the end of the decade the first workstations and
personal computers were being marketed. The advent of
Ethernet provided the first widely used LAN technology,
creating an industry standard for a modest cost multidrop
interconnection medium and data transport layer. Also at
this time, the multitasking Unix operating system was cre-
ated at AT&T Bell Labs and extended with virtual memory
and network interfaces at UC Berkeley. Unix was adopted
in its various commercial and public domain forms by the
scientific and technical computing community as the prin-
cipal environment for a wide range of computing system
classes from scientific workstations to supercomputers.
During the decade of the 1980s, increased interest in
the potential of cluster computing was marked by impor-
tant experiments in research and industry. A collection of
160 interconnected Apollo workstations was employed as
a cluster to perform certain computational tasks by the
NSA. Digital Equipment Corporation developed a system
comprising interconnected VAX 11/750s, coining the term
cluster in the process. In the area of software, task man-
agement tools for employing workstation farms were de-
veloped, most notably the Condor software package from
the University of Wisconsin. The computer science re-
search community explored different strategies for paral-
lel processing during this period. From this early work
came the communicating sequential processes model
more commonly referred to as the message-passing model ,
which has come to dominate much of cluster computing
today.
An important milestone in the practical application of
the message passing model was the development of PVM
(parallel virtual machine), a library of linkable functions
that could allow routines running on separate but net-
worked computers to exchange data and coordinate their
operation. PVM, developed by Oak Ridge National Lab-
oratory, Emory University, and University of Tennessee,
was the first major open distributed software system to
be employed across different platforms. By the beginning
of the 1990s, a number of sites were experimenting with
clusters of workstations. At the NASA Lewis Research
Center, a small cluster of IBM workstations was used to
simulate the steady-state behavior of jet aircraft engines
in 1992. The NOW (Network of Workstations) project at
UC Berkeley began operation of the first of several clusters
there in 1993 that led to the first cluster to be entered on the
Top 500 list of the world’s most powerful computers. Also
in 1993, one of the first commercial SANs, Myrinet, was
introduced for commodity clusters, delivering improve-
ments in bandwidth and latency an order of magnitude
better than the Fast Ethernet LAN most widely used for
the purpose at that time.
The first Beowulf-class PC cluster was developed at
NASA’s Goddard Space Flight Center in 1994 using early
releases of the Linux operating system and PVM running
on 16 Intel 100-MHz 80486-based PCs connected by dual
10-Mbps Ethernet LANs. The Beowulf project developed
the necessary Ethernet driver software for Linux and ad-
ditional low-level cluster management tools and demon-
strated the performance and cost effectiveness of Beowulf
systems for real-world scientific applications. That year,
based on experience with many other message-passing
software systems, the parallel computing community set
out to provide a uniform set of message-passing semantics
and syntax and adopted the first MPI standard. MPI has
become the dominant parallel computing programming
standard and is supported by virtually all MPP and clus-
ter system vendors. Workstation clusters running the Sun Microsystems Solaris operating system and NCSA's PC cluster running the Microsoft Windows NT operating system were being used for real-world applications.
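The article gives no MPI code, but the uniform message-passing semantics and syntax that the standard established can be suggested with a minimal, purely illustrative two-process exchange using the standard point-to-point calls:

    /* Minimal illustrative MPI program: process 0 sends an integer to
       process 1, which receives and prints it.  Run with at least two
       processes; the same source compiles and runs unchanged on any
       MPI-supporting cluster or MPP. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;                       /* payload to transmit */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }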
In 1996, the Los Alamos National Laboratory and the California Institute of Technology (with the NASA Jet Propulsion Laboratory) independently demonstrated sustained performance of more than 1 Gflops for Beowulf systems costing under $50,000 and were awarded the Gordon Bell Prize for price/performance for this accomplishment. By 1997, Beowulf-class systems of more than
100 nodes had demonstrated sustained performance of
greater than 10 Gflops with a Los Alamos system making
the Top 500 list. By the end of the decade, 28 clusters
were on the Top 500 list with a best performance of more
than 500 Gflops. In 2000, both DOE and NSF announced awards to Compaq to implement their largest computing facilities, both of them clusters, of 30 and 6 Tflops, respectively.
IV. CLUSTER HARDWARE COMPONENTS
Cluster computing in general and commodity clusters in
particular are made possible by the existence of cost-
effective hardware components developed for mainstream
computing markets. The capability of a cluster is de-
termined to first order by the performance and stor-
age capacity of its processing nodes and the bandwidth
and latency of its interconnection network. Both cluster
node and cluster network technologies evolved during the
1990s and now exhibit gains of more than two orders of
magnitude in performance, memory capacity, disk stor-
age, and network bandwidth and a reduction of better
than a factor of 10 in network latency. During the same period, the performance-to-cost ratio of node technology has improved by approximately a factor of 1000. In this section, the
basic elements of the cluster node hardware and the alter-
natives available for interconnection networks are briefly
described.
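As a rough illustration of how network latency and bandwidth jointly determine communication cost (a simple first-order model, not a formula from the article), the time to deliver a message can be estimated as the fixed per-message latency plus the message size divided by the bandwidth:

    /* Illustrative first-order model of message transfer time on a
       cluster network: t = latency + bytes / bandwidth.  The figures
       below are hypothetical examples, not measurements from the text. */
    #include <stdio.h>

    double transfer_time(double latency_s, double bandwidth_Bps, double bytes)
    {
        return latency_s + bytes / bandwidth_Bps;
    }

    int main(void)
    {
        /* Hypothetical network: 20 microseconds latency, 100 MB/s bandwidth. */
        double lat = 20e-6, bw = 100e6;

        printf("64 B message: %.2e s\n", transfer_time(lat, bw, 64));
        printf("1 MB message: %.2e s\n", transfer_time(lat, bw, 1e6));
        return 0;
    }

Under such a model, short messages are dominated by latency and long messages by bandwidth, which is why both figures matter when evaluating a cluster network.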
A. Cluster Node Hardware
The processing node of a cluster incorporates all of
the facilities and functionality necessary to perform a
complete computation. Nodes are most often structured
either as uniprocessor systems or as SMPs although
some clusters, especially constellations, have incorpo-
rated nodes that were distributed shared memory (DSM)
systems. Nodes are distinguished by the architecture of