J. Beahan, L. Edmonds, R. D. Ferraro, A. Johnston, D. S. Katz and R. R. Some
Jet Propulsion Laboratory
California Institute of Technology
4800 Oak Grove Drive
Pasadena, CA 91109-8099
The goal of the NASA HPCC Remote Exploration and Experimentation (REE) Project is to transfer commercial supercomputing technology into space. The project will use state-of-the-art, low-power, non-radiation-hardened, Commercial Off-The-Shelf (COTS) hardware chips and COTS software to the maximum extent possible, and will rely on Software-Implemented Fault Tolerance (SIFT) to provide the required levels of availability and reliability. In this paper, we outline the methodology used to develop a detailed radiation fault model for the REE Testbed architecture. The model addresses the effects of energetic protons and heavy ions, which cause Single Event Upset (SEU) and Single Event Multiple Upset (SEMU) events in digital logic devices and are expected to be the primary fault generation mechanism. Unlike previous modeling efforts, this model will address fault rates and types in computer subsystems at a level of granularity fine enough (i.e., the register level) that specific software and operational errors can be derived. We present the current state of the model, the model verification activities and results to date, and plans for the future. Finally, we explain the methodology by which this model will be used to derive application-level error effects sets. These error effects sets will be used in conjunction with our Testbed fault injection capabilities and our applications' mission scenarios to replicate the predicted fault environment on our suite of onboard applications.