DOI: 10.1145/3452383.3452403
short-paper

Re-Imagining data analytics software development

Published: 26 April 2021

ABSTRACT

Creating a data analytics pipeline is a tedious task. The algorithm search space for finding a suitable solution for a given goal on a given constrained infrastructure is generally very large. The exploratory work of choosing the best possible solution is an effort-, time-, and intellect-intensive process. Current industry practice relies largely on domain experts for this work. To improve a domain expert's productivity, we propose a model- and rule-based system that automates the creation of data analytics pipelines. The proposed system provides a mechanism for specifying domain knowledge in the form of an object model and a set of rules defined over it. Based on the problem context, it recommends suitable algorithms for carrying out the various data analytics tasks. On successful creation of the pipeline, the system generates the pipeline code. The system also generates trace data to help in upgrading cognitive knowledge. We discuss the approach using a case study of a sensor data-based health monitoring system and showcase its efficacy and the lessons learnt.
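
The idea of encoding domain knowledge as an object model with rules defined over it can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (ProblemContext, Rule, Recommendation), the two example rules, and the algorithm labels are all hypothetical, chosen only to show how a rule pass over a problem context could yield recommendations, a trace of evaluated rules, and generated pipeline code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProblemContext:
    """Hypothetical object model describing the analytics problem."""
    task: str        # e.g. "anomaly_detection"
    data_kind: str   # e.g. "time_series"
    labeled: bool    # whether labeled examples are available


@dataclass
class Rule:
    """A condition over the context plus the algorithm it recommends."""
    name: str
    condition: Callable[[ProblemContext], bool]
    algorithm: str


@dataclass
class Recommendation:
    algorithms: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)  # record of rule evaluations


# Illustrative rules only (assumptions, not taken from the paper).
RULES = [
    Rule(
        "unlabeled-ts-anomaly",
        lambda c: c.task == "anomaly_detection"
        and c.data_kind == "time_series"
        and not c.labeled,
        "lstm_encoder_decoder",
    ),
    Rule(
        "labeled-classification",
        lambda c: c.task == "classification" and c.labeled,
        "decision_tree",
    ),
]


def recommend(context: ProblemContext, rules: List[Rule] = RULES) -> Recommendation:
    """Evaluate every rule against the context and log the outcome."""
    result = Recommendation()
    for rule in rules:
        if rule.condition(context):
            result.algorithms.append(rule.algorithm)
            result.trace.append(f"fired: {rule.name}")
        else:
            result.trace.append(f"skipped: {rule.name}")
    return result


def generate_pipeline_code(rec: Recommendation) -> str:
    """Emit a stub pipeline as source text, one step per recommended algorithm."""
    lines = ["def pipeline(data):"]
    for algo in rec.algorithms:
        # run_<algo> is a placeholder name; no such function is defined here.
        lines.append(f"    data = run_{algo}(data)")
    lines.append("    return data")
    return "\n".join(lines)
```

Calling recommend(ProblemContext("anomaly_detection", "time_series", False)) and passing the result to generate_pipeline_code would produce a one-step stub pipeline, while the trace list records which rules fired and which were skipped.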

Published in

ISEC '21: Proceedings of the 14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference)
February 2021
185 pages
ISBN: 9781450390460
DOI: 10.1145/3452383

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• short-paper
• Research
• Refereed limited

Acceptance Rates

Overall acceptance rate: 76 of 315 submissions, 24%