Bibliographic Details
| Title: |
Automating Extract, Transform, Load (ETL) Pipelines using Machine Learning Triggered Workflow Optimization |
| Authors: |
Samyukta Rongala |
| Source: |
International Journal of Intelligent Systems and Applications in Engineering; Vol. 12 No. 3 (2024); 4427–4434 |
| Publisher Information: |
International Journal of Intelligent Systems and Applications in Engineering, 2024. |
| Publication Year: |
2024 |
| Subject Terms: |
Data Integration; Data Engineering Solutions; Data Processing; Extract, Transform, Load (ETL) Pipeline Automation; Machine Learning; Workflow Optimization |
| Description: |
Growing data processing demands in contemporary firms underline the need for better methods of automating ETL processes. This paper presents a machine learning framework that automates most of the ETL process, reducing the number of steps performed manually. The framework applies machine learning techniques to improve the efficiency of data extraction, data transformation rules, and data loading across heterogeneous systems. For data quality, it uses anomaly detection models that reach a 95% anomaly detection rate; for data loss, it uses probabilistic imputation that holds loss to only 1%, an 80% improvement over traditional methodologies. Algorithms dynamically raise the component recognition rate to about 98%, enabling harmonization of dissimilar datasets. In the performance evaluation, the proposed approach yielded an average saving of 36.49% in total ETL time and 40% in overall transformation time. Simple scalability tests confirm a consistent 37%-40% reduction in record processing time on datasets of between 1 million and 10 million records. These results demonstrate the value of the proposed framework for improving development cycles, reducing development costs, and ensuring efficient scaling for data-intensive applications. The research aims to capture the transformative capabilities of machine learning in enhancing ETL operations and to present solutions for current complexities encountered in data engineering. |
| Document Type: |
Article |
| File Description: |
application/pdf |
| Language: |
English |
| ISSN: |
2147-6799 |
| Access URL: |
https://www.ijisae.org/index.php/IJISAE/article/view/7193 |
| Rights: |
CC BY-SA |
| Accession Number: |
edsair.issn21476799..92a6db94f7531e175357877c448deb37 |
| Database: |
OpenAIRE |