Academic Journal

MapReduce performance model for Hadoop 2.x

Bibliographic Details
Title:	MapReduce performance model for Hadoop 2.x
Authors:	Glushkova, Daria, Jovanovic, Petar, Abelló Gamazo, Alberto
Contributors:	Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació, Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
Source:	Recercat. Dipòsit de la Recerca de Catalunya; UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
Publisher Information:	Elsevier BV, 2019.
Publication Year:	2019
Subject Terms:	Informàtica::Arquitectura de computadors::Arquitectures distribuïdes [Àrees temàtiques de la UPC], Programari lliure, Informàtica::Sistemes d'informació [Àrees temàtiques de la UPC], Mean value analysis, 02 engineering and technology, Open source software, Cost effectiveness, MapReduce performance models, Hadoop 2.x, 0202 electrical engineering, electronic engineering, information engineering, Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes, Cost-eficàcia, Electronic data processing -- Distributed processing, MapReduce performance model, Queuing theory, Processament distribuït de dades
Description:	MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions is recognized as a challenging problem; at the same time, it may provide reasonably accurate job response time estimates at significantly lower cost than experimental evaluation of real setups. In this paper, we tackle the challenge of defining a MapReduce performance model for Hadoop 2.x. While there are several efficient approaches for modeling the performance of MapReduce workloads in Hadoop 1.x, they cannot be applied to Hadoop 2.x because of its fundamental architectural changes and dynamic resource allocation. The proposed solution therefore builds on an existing performance model for Hadoop 1.x, takes these architectural changes into account, and captures the execution flow of a MapReduce job using a queuing network model. In this way, the cost model reflects the intra-job synchronization constraints that arise from contention at shared resources. The accuracy of our solution is validated by comparing our model estimates against measurements in a real Hadoop 2.x setup.
Document Type:	Article, Conference object
File Description:	application/pdf
Language:	English
ISSN:	0306-4379
DOI:	10.1016/j.is.2017.11.006
Access URL:	https://upcommons.upc.edu/bitstream/2117/124328/1/paper_is.pdf http://hdl.handle.net/2117/124328 https://hdl.handle.net/2117/113535 http://hdl.handle.net/2117/113535 https://upcommons.upc.edu/handle/2117/124328 https://www.sciencedirect.com/science/article/pii/S0306437917304659 https://doi.org/10.1016/j.is.2017.11.006 https://dblp.uni-trier.de/db/journals/is/is79.html#GlushkovaJA19
Rights:	Elsevier TDM CC BY NC ND
Accession Number:	edsair.doi.dedup.....4acfa8c9f5b044c09d00b904bbe5b4d1
Database:	OpenAIRE
