Improved Regression Analysis with Ensemble Pipeline Approach for Applications across Multiple Domains

Published: 09 March 2024

Abstract

In this research, we introduce two new machine learning regression methods: the Ensemble Average and the Pipelined Model. Both aim to improve on traditional regression analysis for predictive tasks. We evaluate them on three datasets (Kaggle House Price, Boston House Price, and California Housing) using several performance metrics. The results consistently show that our models outperform existing methods in accuracy, reliability, and robustness on all three datasets. The Pipelined Model, in particular, is notable for its ability to combine predictions from multiple models, which leads to higher accuracy and good scalability. This scalability allows the models to be applied in diverse fields such as technology, finance, and healthcare. The models can also be adapted for real-time and streaming data analysis, which makes them useful for applications such as fraud detection, stock market prediction, and IoT sensor data analysis. Further enhancements make them suitable for big data applications, including large datasets and distributed computing environments. Our models have limitations that matter in practical use: potential data biases, specific modeling assumptions, increased complexity, and reduced interpretability. Nevertheless, these innovations advance predictive modeling, and our evaluation underscores their potential to provide increased accuracy and reliability across a wide range of applications. The source code is available at https://huggingface.co/DebajyotyBanik/Ensemble-Pipelined-Regression/tree/main
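The abstract does not spell out how the Ensemble Average combines its base models, so the following is only a minimal sketch of the general idea of averaging regressors, not the authors' implementation. It assumes an unweighted mean of predictions and uses two toy base learners (a one-feature least-squares line and a k-nearest-neighbour regressor) as stand-ins; the function names `fit_linear`, `fit_knn`, and `ensemble_average`, and the toy data, are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# A fitted regressor is modelled as a function from one feature value to a prediction.
Regressor = Callable[[float], float]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Regressor:
    """Ordinary least squares for a single feature: y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_knn(xs: Sequence[float], ys: Sequence[float], k: int = 2) -> Regressor:
    """k-nearest-neighbour regressor: mean target of the k closest training points."""
    pairs = list(zip(xs, ys))

    def predict(x: float) -> float:
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def ensemble_average(models: Sequence[Regressor], x: float) -> float:
    """Assumed combination rule: unweighted mean of the base models' predictions."""
    return mean(model(x) for model in models)


# Toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

models = [fit_linear(xs, ys), fit_knn(xs, ys, k=2)]
prediction = ensemble_average(models, 2.5)
print(prediction)
```

In practice the base learners would more plausibly be tree ensembles such as those named in the author tags (random forest, light gradient boost, XgBoost); the averaging step itself would stay the same under the assumption made here.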

Cited By

  • (2024) Application of a Statistical Regression Technique for Dynamic Analysis of Submarine Pipelines. Journal of Marine Science and Engineering 12:6 (955). DOI: 10.3390/jmse12060955. Online publication date: 6-Jun-2024
  • (2024) Impact of AI and Dynamic Ensemble Techniques in Enhancing Healthcare Services: Opportunities and Ethical Challenges. IEEE Access 12 (141064-141087). DOI: 10.1109/ACCESS.2024.3443812. Online publication date: 2024

      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 3
      March 2024
      277 pages
      EISSN: 2375-4702
      DOI: 10.1145/3613569

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 March 2024
      Online AM: 08 February 2024
      Accepted: 30 January 2024
      Revised: 01 December 2023
      Received: 08 July 2023
      Published in TALLIP Volume 23, Issue 3

      Author Tags

      1. Ensemble average
      2. pipeline scheme
      3. random forest
      4. light gradient boost
      5. XgBoost

      Qualifiers

      • Research-article

      Funding Sources

      • Government of Gujarat, India

      Article Metrics

      • Downloads (Last 12 months): 140
      • Downloads (Last 6 weeks): 12
      Reflects downloads up to 02 Mar 2025
