Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading

Bibliographic record details
Title: Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading
Authors: Sanem Arslan, Osman Unsal
Contributors: Barcelona Supercomputing Center
Source: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
Publisher information: Springer Science and Business Media LLC, 2021.
Publication year: 2021
Subject terms: Programari, Àrees temàtiques de la UPC::Informàtica::Enginyeria del software, Informàtica::Enginyeria del software [Àrees temàtiques de la UPC], Soft error reliability, Fault tolerance, 02 engineering and technology, Software reliability, 01 natural sciences, 0103 physical sciences, Fault tolerance (Engineering), 0202 electrical engineering, electronic engineering, information engineering, Redundant multithreading, Multitasking (Computer science), Soft errors (Computer science)
Description: Redundant multithreading (RMT) is an effective reliability solution that provides thread-level replication; however, it imposes additional overheads in terms of performance loss or energy consumption. Partial-RMT is an alternative solution that provides partial redundancy of an executing thread to reduce such overheads while trading off full coverage from faults. In this study, we propose a software-level RMT approach that offers lightweight replication of partial code regions within the same application process. Our software-level RMT approach is particularly suitable for applications with varying code criticality, where we determine the critical code regions by performing a fault injection campaign in addition to execution time profile analysis. Using the results of the previous step, the application programmer annotates the source code to indicate the specific code regions that should be executed redundantly without re-implementing the application program from scratch. Our lightweight software-level RMT tool improves the average silent data corruption (SDC) rate of 30 applications of the PolyBench benchmark suite by around 7.6× with average performance and energy consumption overheads of 22% and 37%, respectively, compared to the original version of the program.
This work was completed while the first author, Sanem Arslan, was a visiting researcher at Barcelona Supercomputing Center, Barcelona, Spain. Sanem Arslan received financial support from the Scientific and Technological Research Council of Turkey (TUBITAK) under the program BIDEB 2219 during this work.
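Illustrative note: the idea in the description, running only a marked critical region redundantly and comparing the outcomes, can be sketched roughly as follows. This is a minimal sketch in Python and not the authors' tool; the names `run_redundantly`, `SilentDataCorruption`, and `dot` are hypothetical, and Python threads only mimic the control flow, not the low-level fault behaviour a real RMT system targets.

```python
import threading


class SilentDataCorruption(Exception):
    """Raised when the two redundant executions disagree (hypothetical name)."""


def run_redundantly(region, *args):
    """Execute `region` in two threads and compare the two results."""
    results = [None, None]

    def worker(slot):
        # Each thread runs the same region on the same inputs.
        results[slot] = region(*args)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # A mismatch signals that one copy may have been corrupted.
    if results[0] != results[1]:
        raise SilentDataCorruption("redundant executions disagree")
    return results[0]


def dot(xs, ys):
    # Stand-in for a code region that profiling marked as critical.
    return sum(x * y for x, y in zip(xs, ys))


# Only the critical region is replicated; the rest of the program runs once.
value = run_redundantly(dot, [1, 2, 3], [4, 5, 6])
```

Wrapping only selected calls keeps the overhead proportional to the replicated regions, which is the trade-off the description attributes to partial redundancy.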
Document type: Article
Other literature type
File description: application/pdf
Language: English
ISSN: 1573-0484
0920-8542
DOI: 10.1007/s11227-021-03804-6
Access link: https://link.springer.com/article/10.1007/s11227-021-03804-6
https://dblp.uni-trier.de/db/journals/tjs/tjs77.html#ArslanU21
https://avesis.marmara.edu.tr/yayin/af058c23-7df4-475a-911a-5cfd29355fa5/efficient-selective-replication-of-critical-code-regions-for-sdc-mitigation-leveraging-redundant-multithreading
https://upcommons.upc.edu/handle/2117/345961
https://aperta.ulakbim.gov.tr/record/231054
Rights: Springer TDM
CC BY
Accession number: edsair.doi.dedup.....57e6bf0b5076104a287a8c8f280ffadf
Database: OpenAIRE