Academic Journal

Automating Extract, Transform, Load (ETL) Pipelines using Machine Learning Triggered Workflow Optimization

Bibliographic Details
Title: Automating Extract, Transform, Load (ETL) Pipelines using Machine Learning Triggered Workflow Optimization
Authors: Samyukta Rongala
Source: International Journal of Intelligent Systems and Applications in Engineering; Vol. 12 No. 3 (2024); 4427–4434
Publisher Information: International Journal of Intelligent Systems and Applications in Engineering, 2024.
Publication Year: 2024
Subject Terms: Data Integration; Data Engineering Solutions; Data Processing; Extract, Transform, Load (ETL) Pipeline Automation; Machine Learning; Workflow Optimization
Description: The growing data processing demands of contemporary firms underline the need for better methods of automating ETL processes. This paper presents a machine learning framework that automates most of the ETL process, reducing the number of manually performed steps. The framework applies machine learning techniques to improve the efficiency of data extraction, data transformation rules, and data loading across heterogeneous systems. For data quality, it uses anomaly detection models that achieve a 95% anomaly detection rate; for data loss, it uses probabilistic imputation that holds loss to only 1%, an 80% improvement over traditional methodologies. Its algorithms dynamically raise the component recognition rate to about 98%, enabling harmonization of dissimilar datasets. In the performance evaluation, the proposed approach reduced total ETL time by an average of 36.49% and overall transformation time by 40%. Simple scalability tests confirmed a consistent 37%-40% reduction in record processing time on datasets of between 1 million and 10 million records. These results demonstrate the value of the proposed framework for improving development cycles, reducing development costs, and scaling efficiently for data-intensive applications. The research aims to capture the transformative role of machine learning in enhancing ETL operations and to present solutions to current complexities in data engineering.
Document Type: Article
File Description: application/pdf
Language: English
ISSN: 2147-6799
Access URL: https://www.ijisae.org/index.php/IJISAE/article/view/7193
Rights: CC BY-SA
Accession Number: edsair.issn21476799..92a6db94f7531e175357877c448deb37
Database: OpenAIRE