Academic Journal
A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one
| Title: | A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one |
|---|---|
| Authors: | Ramón Cortés, Cristian; Lordan Gomis, Francesc; Ejarque Artigas, Jorge; Badia Sala, Rosa Maria |
| Contributors: | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Barcelona Supercomputing Center; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
| Source: | UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC) |
| Publication Status: | Preprint |
| Publisher Information: | Elsevier BV, 2020. |
| Publication Year: | 2020 |
| Subject Terms: | FOS: Computer and information sciences, Macrodades, Task-based workflows, Programming models, Parallel programming (Computer science), 02 engineering and technology, Programació en paral·lel (Informàtica), Streaming, Dataflows, Distributed computing, Big data, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors, Computer Science - Distributed, Parallel, and Cluster Computing, 0202 electrical engineering, electronic engineering, information engineering, Electronic data processing -- Distributed processing, Convergence HPC-Big Data, Distributed, Parallel, and Cluster Computing (cs.DC), Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC], Processament distribuït de dades |
| Description: | This paper aims to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (hereafter, Hybrid Workflows) in a single programming model. Developers can therefore build complex Data Science workflows using different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end. Accepted in Future Generation Computer Systems (FGCS). Licensed under CC-BY-NC-ND. |
| Document Type: | Article |
| File Description: | application/pdf |
| Language: | English |
| ISSN: | 0167-739X |
| DOI: | 10.1016/j.future.2020.07.007 |
| DOI: | 10.48550/arxiv.2007.04939 |
| DOI: | 10.13039/501100003329 |
| DOI: | 10.13039/100010661 |
| Access URL: | http://arxiv.org/pdf/2007.04939 http://arxiv.org/abs/2007.04939 https://arxiv.org/abs/2007.04939 https://hdl.handle.net/2117/328850 https://doi.org/10.1016/j.future.2020.07.007 |
| Rights: | Elsevier TDM; arXiv Non-Exclusive Distribution; CC BY NC ND |
| Accession Number: | edsair.doi.dedup.....da5d1136a3a6f912a3aee6e3d658c5af |
| Database: | OpenAIRE |