A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one

Bibliographic record details
Title: A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one
Authors: Ramón Cortés, Cristian; Lordan Gomis, Francesc; Ejarque Artigas, Jorge; Badia Sala, Rosa Maria
Contributors: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Barcelona Supercomputing Center; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Source: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
Publication Status: Preprint
Publisher information: Elsevier BV, 2020.
Publication year: 2020
Subject terms: FOS: Computer and information sciences, Macrodades, Task-based workflows, Programming models, Parallel programming (Computer science), 02 engineering and technology, Programació en paral·lel (Informàtica), Streaming, Dataflows, Distributed computing, Big data, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors, Computer Science - Distributed, Parallel, and Cluster Computing, 0202 electrical engineering, electronic engineering, information engineering, Electronic data processing -- Distributed processing, Convergence HPC-Big Data, Distributed, Parallel, and Cluster Computing (cs.DC), Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC], Processament distribuït de dades
Description: This paper tries to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. Also, it provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end.
Accepted in Future Generation Computer Systems (FGCS). Licensed under CC-BY-NC-ND
Document type: Article
File description: application/pdf
Language: English
ISSN: 0167-739X
DOI: 10.1016/j.future.2020.07.007
DOI: 10.48550/arxiv.2007.04939
DOI: 10.13039/501100003329
DOI: 10.13039/100010661
Access link: http://arxiv.org/pdf/2007.04939
http://arxiv.org/abs/2007.04939
https://arxiv.org/abs/2007.04939
https://hdl.handle.net/2117/328850
https://doi.org/10.1016/j.future.2020.07.007
Rights: Elsevier TDM
arXiv Non-Exclusive Distribution
CC BY NC ND
Accession number: edsair.doi.dedup.....da5d1136a3a6f912a3aee6e3d658c5af
Database: OpenAIRE