Software Fault Tolerance for Low-to-Moderate Radiation Environments

R. Sengupta, Raytheon ITSS
J. D. Offenberg, Raytheon ITSS
D. J. Fixsen, Raytheon ITSS
D. S. Katz, JPL
P. L. Springer, JPL
H. S. Stockman, STScI
M. A. Nieto-Santisteban, STScI
R. J. Hanish, STScI
J. C. Mather, GSFC

The primary goal of NASA's Remote Exploration and Experimentation (REE) project is to bring scalable, low-power, fault-tolerant, high-performance computing built from commercial off-the-shelf (COTS) components to space. Most faults caused by the radiation environments that REE targets (deep space and low Earth orbit) are transient single event effects (SEEs). Some of these faults can cause errors at different levels of an application. System and application software can potentially detect and correct some, and possibly many, of these errors. Here we discuss software fault tolerance approaches such as replication, voting, and masking, with a focus on algorithm-based fault tolerance (ABFT). We also discuss combined software and hardware approaches such as fault avoidance, redundancy, and reconfiguration. Together, these approaches illustrate the tradeoffs among reliability, power, cost, and computational performance for spacecraft in a low-to-moderate radiation environment.
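
To make the ABFT idea concrete, the following is a minimal sketch of the classic checksum scheme for matrix multiplication. It is an illustration only and is not taken from the REE software: the function names (`with_column_checksum`, `with_row_checksum`, `locate_error`, `correct`) and the tolerance `tol` are assumptions chosen for this example. The sketch handles one corrupted data element in the product; an error in a checksum entry itself, or several simultaneous errors, is reported as uncorrectable.

```python
# Illustrative ABFT sketch (hypothetical helper names, not the REE implementation).

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def with_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]

def locate_error(c_full, tol=1e-9):
    """Return (row, col) of a single corrupted data element, or None if all checksums agree."""
    n = len(c_full) - 1       # number of data rows
    m = len(c_full[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern not correctable by this simple scheme")

def correct(c_full, pos):
    """Rebuild the element at pos from its row checksum, in place."""
    i, j = pos
    m = len(c_full[0]) - 1
    others = sum(c_full[i][k] for k in range(m) if k != j)
    c_full[i][j] = c_full[i][m] - others

# Usage: multiply the checksum-extended matrices, inject a fault, then repair it.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_full = matmul(with_column_checksum(A), with_row_checksum(B))
C_full[0][1] = 99                  # simulate a transient upset in one result element
position = locate_error(C_full)    # intersection of the failing row and column checks
correct(C_full, position)
C = [row[:-1] for row in C_full[:-1]]   # strip checksums to recover the data block
```

The design point is that the checksums are carried through the computation itself, so detection costs one extra row and one extra column rather than a full replicated run, which is the kind of reliability versus computation tradeoff discussed above.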