DOI: 10.1145/3529399.3529410
research-article

Transferring Learnt Features from Deep Neural Networks trained on Structured Data

Published: 10 June 2022

ABSTRACT

Structured data is a widely used type of data with numerous applications in training machine learning models. However, training deep learning models requires a large amount of data, which may not be available for every use case, and training these models can become expensive as the data grows. Transfer learning can address both problems: it reuses features from models trained on the same or similar tasks, but it has not yet been explored much for structured data. In this paper, an approach is proposed to transfer learnt features from the embedding layers commonly present in deep neural networks for structured data, along with a format for effective portability of these trained embeddings. Experimentally, it is observed that the proposed method results in faster training, and the model parameters start at a better point than those of a randomly initialized model, reducing training costs as well.

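As a rough illustration of the idea, the following is a minimal PyTorch sketch (not the authors' code) of transferring learnt embedding tables for categorical columns: the source model's embeddings are exported together with the category vocabularies they were trained on, and matching rows are copied into a freshly initialized target model so that training starts from the transferred features rather than from random weights. The class and function names (`TabularNet`, `export_embeddings`, `load_embeddings`) and the dictionary-based export layout are assumptions made here for illustration, not the portability format proposed in the paper.

```python
# Hypothetical sketch of embedding transfer for structured data (not the authors' code).
# Assumes the source and target models use the same embedding dimension per column.
import torch
import torch.nn as nn


class TabularNet(nn.Module):
    """A typical deep net for structured data: one embedding table per categorical column."""

    def __init__(self, emb_specs, n_continuous, n_classes):
        # emb_specs: {column_name: (num_categories, embedding_dim)}
        super().__init__()
        self.embeddings = nn.ModuleDict(
            {col: nn.Embedding(card, dim) for col, (card, dim) in emb_specs.items()}
        )
        emb_out = sum(dim for _, dim in emb_specs.values())
        self.mlp = nn.Sequential(
            nn.Linear(emb_out + n_continuous, 64), nn.ReLU(), nn.Linear(64, n_classes)
        )

    def forward(self, x_cat, x_cont):
        # x_cat: {column_name: LongTensor of category indices}, x_cont: FloatTensor
        embs = [self.embeddings[col](x_cat[col]) for col in self.embeddings]
        return self.mlp(torch.cat(embs + [x_cont], dim=1))


def export_embeddings(model, vocabularies):
    """Package trained embeddings with each column's category vocabulary,
    so a target model can match rows by category value rather than by index."""
    return {
        col: {"vocab": list(vocabularies[col]), "weight": emb.weight.detach().cpu().clone()}
        for col, emb in model.embeddings.items()
    }


def load_embeddings(model, exported, vocabularies):
    """Copy pretrained rows into a new model wherever category values overlap;
    categories unseen by the source model keep their random initialization."""
    with torch.no_grad():
        for col, pack in exported.items():
            if col not in model.embeddings:
                continue
            src_index = {cat: i for i, cat in enumerate(pack["vocab"])}
            for j, cat in enumerate(vocabularies[col]):
                if cat in src_index:
                    model.embeddings[col].weight[j] = pack["weight"][src_index[cat]]
```

Under this sketch, the exported dictionary could be saved with torch.save and shipped alongside the column vocabularies, which is roughly what a portable embedding format needs to capture: the mapping from category values to learnt vectors, independent of the index order used during training.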

Published in

ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies
March 2022, 291 pages
ISBN: 9781450395748
DOI: 10.1145/3529399
Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

