Cost-Effective Parallel Computational Electromagnetic Modeling
Daniel S. Katz*, Tom Cwik
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California
ABSTRACT
This paper discusses the use of Beowulf-class computers in solving computational electromagnetic problems. Small Beowulf-class computers may
be purchased in December 1997 for approximately
$1,700 per node, including all computation and
communication components. Two codes are examined,
namely a finite-difference time-domain code, and a finite
element iterative solver. The performance of these codes
on 1 to 128 processors of a Beowulf-class computer is
compared with the performance of similar codes on the
same number of processors of a Cray T3D.
INTRODUCTION
This paper discusses the use of Beowulf-class computers
in solving computational electromagnetic (CEM) problems.
Beowulf-class computers are defined as piles of PCs running Linux, using fully mass-market, commercial off-the-shelf (M2COTS) components.
The Beowulf-class systems used in the work described in
this paper consist of Pentium Pro processors and fast Ethernet
(100Base-T) networking. Small systems can be purchased for
approximately $1,700 per node as of December 1997.
Two codes will be examined in this paper: an MPI implementation of the EMCC finite-difference time-domain (FDTD) code [1], and an iterative solver used in JPL's PHOEBUS coupled finite element-integral equation package [2].
The FDTD code is a fairly straightforward domain decomposition code that uses explicit time-stepping, with the majority of the floating-point operations required by stencils based on second-order differences in space. Communication is required to fill the ghost cells on the edges of each processor's domain, and may also be used to update the edges of the entire domain.
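To make this structure concrete, the sketch below (written for this summary, not taken from the EMCC code) shows a one-dimensional, domain-decomposed Yee-style update in C with MPI; the local cell count NX, the single Ez/Hy field pair, and the update coefficient are illustrative assumptions.

/* Sketch of a 1-D domain-decomposed FDTD update with ghost-cell
 * exchange.  Each processor owns NX cells; one ghost cell on each
 * side holds the neighbor's field value needed by the stencil.  */
#include <mpi.h>
#include <stdlib.h>

#define NX 64  /* local cells per processor (assumed) */

int main(int argc, char **argv)
{
    double *ez = calloc(NX + 2, sizeof(double));  /* indices 0..NX+1 */
    double *hy = calloc(NX + 2, sizeof(double));
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    if (rank == 0) ez[NX / 2] = 1.0;  /* simple initial excitation */

    for (int step = 0; step < 200; step++) {
        /* Ghost-cell exchange: Ez on the right edge of each domain. */
        MPI_Sendrecv(&ez[1],      1, MPI_DOUBLE, left,  0,
                     &ez[NX + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 1; i <= NX; i++)           /* update H from E */
            hy[i] += 0.5 * (ez[i + 1] - ez[i]);

        /* Ghost-cell exchange: Hy on the left edge of each domain. */
        MPI_Sendrecv(&hy[NX], 1, MPI_DOUBLE, right, 1,
                     &hy[0],  1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 1; i <= NX; i++)           /* update E from H */
            ez[i] += 0.5 * (hy[i] - hy[i - 1]);
    }

    MPI_Finalize();
    free(ez);
    free(hy);
    return 0;
}

In two or three dimensions the exchanged ghost cells become faces of each processor's block, so the communication volume grows with the surface area of the local domain while the computation grows with its volume.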
The iterative solver from PHOEBUS requires a sparse matrix A that has been reordered to minimize and equalize row bandwidth. The iterative solution of AX = Y using a QMR algorithm can then be seen primarily as a series of sparse matrix-dense vector multiplies. Some additional work in the form of vector operations (dot products, norms, scales, and sums) is also performed, but the matrix-vector multiply is the time-consuming part. It is also where the majority of the communication occurs, in copying parts of the vector multiplicand from other processors.
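A minimal sketch of that kernel is given below (again written for this summary, not excerpted from PHOEBUS). It assumes a block-row distribution of A in compressed sparse row form with global column indices, and for brevity it gathers the entire multiplicand on every processor, whereas the actual solver copies only the vector segments its rows reference.

/* Sketch of a distributed sparse matrix-dense vector multiply,
 * the kernel inside the QMR iteration.  Each processor owns nloc
 * rows of A (CSR, global column indices) and nloc entries of x.
 * Assumes nglob = nloc * (number of processors).                */
#include <mpi.h>
#include <stdlib.h>

void dist_spmv(int nloc, int nglob,
               const int *rowptr, const int *colind, const double *val,
               double *xloc, double *yloc, MPI_Comm comm)
{
    /* Copy the multiplicand pieces held by the other processors. */
    double *xglob = malloc(nglob * sizeof(double));
    MPI_Allgather(xloc, nloc, MPI_DOUBLE,
                  xglob, nloc, MPI_DOUBLE, comm);

    /* Local CSR matrix-vector product over the owned rows.       */
    for (int i = 0; i < nloc; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * xglob[colind[k]];
        yloc[i] = sum;
    }
    free(xglob);
}

Because the matrix has been reordered to a narrow, balanced row bandwidth, each processor's rows reference only a limited window of x, which is what keeps the per-iteration communication modest.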
PREVIOUS RESULTS
Previous work has demonstrated good performance for FDTD codes and iterative solvers on a 16-node Beowulf-class system (Hyglac, located at JPL) [3]. It was found that both codes could achieve performance within a factor of two or three of that of the same codes on 16 processors of the Cray T3D, at much lower cost.
NEW WORK
This paper will discuss results of running the FDTD code
and the iterative solver on 1 to 128 nodes of a Beowulf-class
system located at Caltech (named Naegling). Preliminary
results show substantially better communications
performance on this system than was observed on Hyglac. In addition, the system software has changed in a way that provides better computation rates, again compared with Hyglac. Naegling has previously provided performance of
over 10 GFLOPS running an n-body gravitational
simulation, and should also provide good performance for
these CEM computations. Naegling run times will be compared with both those of the Cray T3D and those of Hyglac, and the reasons for the differences will be discussed.
REFERENCES
[1] A. C. Woo and K. C. Hill, "The EMCC/ARPA
Massively Parallel Electromagnetic Scattering Project,"
NAS Tech. Report NAS-96-008. NASA Ames Research
Center, 1996.
[2] T. Cwik, D. S. Katz, C. Zuffada, and V. Jamnejad, "The
Application of Scalable Distributed Memory Computers
to the Finite Element Modeling of Electromagnetic
Scattering and Radiation," Int. J. Num. Meth. Eng., in
press.
[3] D. S. Katz, "High-Performance Computational
Electromagnetic Modeling Using Low-Cost Parallel
Computers," IEEE AP-S Symposium/URSI Radio
Science Meeting, Montreal, Canada, July 1997.
* D. S. Katz, e-mail d.katz@ieee.org, phone 818-354-7359, fax 818-393-3134.
The research described in this paper was performed at the Jet
Propulsion Laboratory, under contract to the National Aeronautics and
Space Administration, and at the California Institute of Technology. The
Cray T3D was provided with funding from the NASA offices of Mission to
Planet Earth, Aeronautics, and Space Science. This work was funded by
the NASA High Performance Computing and Communications Earth and
Space Sciences Project.