Conference
Towards studying the effect of compiler optimizations and software randomization on GPU reliability
| Title: | Towards studying the effect of compiler optimizations and software randomization on GPU reliability |
|---|---|
| Authors: | López Castillón, Pau; Caricchio Hernández, Xavier; Kosmidis, Leonidas |
| Contributors: | Pau López Castillón and Xavier Caricchio Hernández and Leonidas Kosmidis |
| Source: | UPCommons. Portal del coneixement obert de la UPC Universitat Politècnica de Catalunya (UPC) |
| Publisher Information: | Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2025. |
| Publication Year: | 2025 |
| Subject Terms: | Reliability, Software randomization, Error rate, Graphics processing units, Àrees temàtiques de la UPC::Informàtica::Enginyeria del software, ddc:004 |
| Description: | The evolution of Graphics Processing Unit (GPU) compilers has facilitated the support for general-purpose programming languages across various architectures. The NVIDIA CUDA Compiler (NVCC) employs multiple compilation levels prior to generating machine code, implementing intricate optimizations to enhance performance. These optimizations influence the manner in which software is mapped to the underlying hardware, which can also impact GPU reliability. TASA is a source-to-source code randomization tool designed to alter the mapping of software onto the underlying hardware. It achieves this by generating random permutations of variable and function declarations and by introducing random padding between declarations of different types, thereby modifying the program memory layout. Because this changes where the declarations reside in memory, it also changes their cache placement, affecting both their execution time (the conflicts between them differ, producing a different number of cache misses in every execution) and their lifetime in the cache. In this work, which is part of the HiPEAC Student Challenge 2025, we first examine the reproducibility of a subset of the data presented in the ACM TACO paper "Assessing the Impact of Compiler Optimizations on GPU Reliability" [Santos et al., 2024], and second we extend it by combining it with our proposal of software randomization. The paper indicates that the -O3 optimization flag facilitates an increased workload before failures occur within the application. By employing TASA, we investigate the impact of GPU randomization on reliability and performance metrics. By reproducing the results of the paper on a different GPU platform, we observe the same trend as reported in the original publication. Moreover, our preliminary results with the application of software randomization show in several cases an improved Mean Work Between Failures (MWBF) compared to the original source code. This work was supported by the ESA-funded project “Open Source Software Randomisation Framework for Probabilistic WCET Prediction and Security on (multicore) CPUs, GPUs and Accelerators” as well as the European Commission’s METASAT Horizon Europe project (grant agreement 101082622). Moreover, it was also partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under the grant IJC2020-045931-I. |
| Document Type: | Conference object, Article |
| File Description: | application/pdf |
| Language: | English |
| DOI: | 10.4230/oasics.parma-ditam.2025.4 |
| Access URL: | https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2025.4 |
| Rights: | CC BY |
| Accession Number: | edsair.dedup.wf.002..cec53899b6e3cc6e48233c0eaef77bd3 |
| Database: | OpenAIRE |
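
The description says that TASA permutes declarations and inserts random padding, which changes where each object sits in memory and therefore which cache sets it occupies. The sketch below illustrates only that idea. It is not TASA's implementation: TASA rewrites source code, whereas this toy model computes offsets directly. The names `Decl`, `randomized_layout` and `cache_set`, the cache geometry (128-byte lines, 64 sets), and the padding bounds are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass

# Assumed cache geometry, for illustration only (not taken from the paper).
LINE_BYTES = 128
NUM_SETS = 64


@dataclass(frozen=True)
class Decl:
    """A hypothetical declaration: a name and its size in bytes."""
    name: str
    size: int


def randomized_layout(decls, seed, max_pad=256, align=4):
    """Return {name: offset} after shuffling declarations and adding padding.

    Each seed yields one layout, mimicking one randomized build.
    """
    rng = random.Random(seed)
    order = list(decls)
    rng.shuffle(order)                      # random permutation of declarations
    layout = {}
    offset = 0
    for d in order:
        # Random, alignment-preserving padding before each declaration.
        offset += rng.randrange(0, max_pad + 1, align)
        layout[d.name] = offset
        # Advance past the object, rounding its size up to the alignment.
        offset += -(-d.size // align) * align
    return layout


def cache_set(offset):
    """Cache set index of the line containing `offset` (simple modulo mapping)."""
    return (offset // LINE_BYTES) % NUM_SETS


decls = [Decl("a", 4096), Decl("b", 4096), Decl("c", 1024), Decl("lut", 256)]

if __name__ == "__main__":
    for seed in range(3):
        layout = randomized_layout(decls, seed)
        print(seed, {n: (off, cache_set(off)) for n, off in sorted(layout.items())})
```

Running the script with different seeds shows the same four objects starting in different cache sets, which is the mechanism the description credits for the change in conflict misses and in how long data stays resident in the cache.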