dc.contributor |
Dr. Eric Rotenberg, Committee Chair |
|
dc.contributor |
Dr. Suleyman Sair, Committee Member |
|
dc.contributor |
Dr. Jun Xu, Committee Member |
|
dc.creator |
Parthasarathy, Sailashri |
|
dc.date |
2010-04-02T18:15:07Z |
|
dc.date |
2005-12-12 |
|
dc.date.accessioned |
2023-02-24T07:32:38Z |
|
dc.date.available |
2023-02-24T07:32:38Z |
|
dc.identifier |
etd-10312005-114614 |
|
dc.identifier |
http://www.lib.ncsu.edu/resolver/1840.16/2548 |
|
dc.identifier.uri |
http://localhost:8080/xmlui/handle/CUHPOERS/258873 |
|
dc.description |
A slipstream processor runs two copies of a program, one slightly ahead of the other, to achieve both higher single-program performance and transient fault tolerance. The leading copy of the program, or the Advanced Stream (A-stream), is accelerated by executing only a key subset of all instructions. The partial A-stream is speculative. Therefore, a second, complete copy of the program, called the Redundant Stream (R-stream), receives and checks all A-stream outcomes. The R-stream is also accelerated in this process. Together, the A-stream and R-stream finish faster than a single program copy would.
The partial redundancy between the A-stream and R-stream enables detection of, and recovery from, transient faults. A transient fault that affects a redundantly executed instruction is easily detected, because its two instances will differ. However, a transient fault that affects a singly executed instruction (an instruction removed from the A-stream) is difficult to detect directly, because there is no redundant counterpart for comparison.
A fault in a singly executed instruction can still be detected indirectly, through a redundantly executed consumer. However, such a fault is unrecoverable because it is attributed to the consumer: recovery is initiated too late, from the consumer instead of from the faulty producer.
We propose a mechanism that conservatively attributes a detected fault not to the redundantly executed instruction that detected it, but to its singly executed producer. Accordingly, recovery is initiated safely from the singly executed producer. Our approach works by forming a forward slice for each singly executed instruction, terminating in its direct or indirect redundantly executed consumers. A consumer can then mark its singly executed producer as faulty when its comparison mismatches.
A singly executed branch does not have a forward slice and thus is not checkable by consumers. However, the branch was removed from the A-stream precisely because its prediction is highly confident and hence very likely correct. This likely-correct prediction is treated as a second execution of the corresponding singly executed branch: different from true execution, but nearly as effective for detecting faults.
In fact, the observation about confident branches extends to all redundantly executed instructions since the A-stream is predictive as a whole. All A-stream instructions are speculative, yet most likely correct in the fault-free case. This reveals an intriguing predictive checking paradigm.
Experiments using the SPEC95 and SPEC2K benchmarks show that coverage improves from 81% for baseline slipstream to 99%, with only a small decrease in speedup. To match the performance of baseline slipstream, we propose a relaxed checking model, which still achieves a much higher coverage of 95%. |
|
dc.rights |
I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. |
|
dc.subject |
transient fault tolerance |
|
dc.subject |
slipstream processors |
|
dc.title |
Improving Transient Fault Tolerance of Slipstream Processors |
|