SCU: a GPU stream compaction unit for graph processing

Bibliographic Details
Title:	SCU: a GPU stream compaction unit for graph processing
Authors:	Segura Salvador, Albert, Arnau Montañés, José María, González Colás, Antonio María
Contributors:	Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
Source:	UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC); Recercat. Dipósit de la Recerca de Catalunya
Publisher Information:	ACM, 2019.
Publication Year:	2019
Subject Terms:	Computers, GPGPU, Graph processing, Stream compaction, Image processing -- Digital techniques, 02 engineering and technology, 0202 electrical engineering, electronic engineering, information engineering, UPC subject areas::Telecommunications engineering::Signal processing::Image and video signal processing, UPC subject areas::Computer science::Hardware
Description:	Graph processing algorithms are key in many emerging applications in areas such as machine learning and data analytics. Although the processing of large-scale graphs exhibits a high degree of parallelism, the memory access patterns tend to be highly irregular, leading to poor GPGPU efficiency due to memory divergence. To ameliorate this issue, GPGPU applications perform a stream compaction operation in each iteration of the algorithm to extract the subset of active nodes/edges, so that subsequent steps work on a compacted dataset. We show that GPGPU architectures are inefficient for stream compaction, and propose to offload this task to a programmable Stream Compaction Unit (SCU) tailored to the requirements of this kernel. The SCU is a small unit tightly integrated in the GPU that efficiently gathers the active nodes/edges into a compacted array in memory. Applications can make use of it through a simple API. The remaining steps of the graph-based algorithm are executed on the GPU cores, taking advantage of the large amount of parallelism in the GPU, but they operate on the SCU-prepared data and achieve better memory coalescing and, hence, much higher efficiency. In addition, the SCU filters out repeated and already visited nodes during the compaction process, significantly reducing GPGPU workload, and writes the compacted nodes/edges in an order that improves memory coalescing by reducing memory divergence. We evaluate the performance of a state-of-the-art GPGPU architecture extended with our SCU for a wide variety of applications. Results show that, for high-performance and low-power GPU systems respectively, the SCU achieves speedups of 1.37x and 2.32x and energy savings of 84.7% and 69%, with an area increase of 3.3% and 4.1%.
Document Type:	Article; Conference object
File Description:	application/pdf
DOI:	10.1145/3307650.3322254
Access URL:	http://hdl.handle.net/2117/176876 https://upcommons.upc.edu/handle/2117/176876 https://dl.acm.org/citation.cfm?id=3322254 https://dblp.uni-trier.de/db/conf/isca/isca2019.html#SeguraA019 https://doi.org/10.1145/3307650.3322254 https://hdl.handle.net/2117/176876
Rights:	URL: https://www.acm.org/publications/policies/copyright_policy#Background
Accession Number:	edsair.doi.dedup.....a94ecd93fa2e81fd091c772cc2ff9644
Database:	OpenAIRE
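
Illustrative note on the Description field: the stream compaction operation mentioned in the abstract can be sketched in software as a flag array, an exclusive prefix sum, and a scatter. The sketch below is a plain sequential Python illustration only, not the SCU hardware design or its API. The function names `compact` and `compact_frontier` and the `visited` array layout are assumptions made for this example, and the coalescing-friendly write order described in the abstract is not modeled.

```python
from itertools import accumulate
from typing import List, Sequence


def compact(items: Sequence[int], active: Sequence[bool]) -> List[int]:
    """Gather the active items into a dense array, preserving their order."""
    flags = [1 if a else 0 for a in active]
    if not flags:
        return []
    # Exclusive prefix sum: offsets[i] is the output slot of item i
    # (only meaningful when flags[i] == 1).
    offsets = [0] + list(accumulate(flags))[:-1]
    out = [0] * sum(flags)
    for i, item in enumerate(items):
        if flags[i]:
            out[offsets[i]] = item  # scatter to the compacted position
    return out


def compact_frontier(candidates: Sequence[int],
                     visited: Sequence[bool]) -> List[int]:
    """Compact candidate node ids, dropping repeats and already visited nodes.

    `visited[n]` is assumed to be True if node n was processed earlier
    (hypothetical layout chosen for this sketch).
    """
    seen = set()
    active = []
    for node in candidates:
        active.append(node not in seen and not visited[node])
        seen.add(node)
    return compact(candidates, active)


if __name__ == "__main__":
    # Node 0 was visited earlier; nodes 3 and 1 appear twice.
    print(compact_frontier([3, 1, 3, 0, 2, 1], [True, False, False, False]))
```

On a GPU the prefix sum and scatter would each run as data-parallel kernels; the abstract's argument is that these steps map poorly to GPGPU cores, which motivates offloading them to a dedicated unit.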
