Abstract
Features are critical to machine learning tasks, and feature engineering has therefore been widely adopted to obtain effective handcrafted features; this process, however, is labor-intensive and requires expert knowledge. Feature learning with neural networks obviates the need for manual feature engineering and has achieved great success in image and sequential data processing, but its performance on structured data is often unsatisfactory. To tackle this problem and learn good feature representations for structured data, we propose a structured data encoder (SDE) based on Gradient Boosting Decision Trees (GBDT) that learns feature representations from structured data both effectively and efficiently. PCA is then employed to extract the most useful information and to reduce dimensionality for downstream classification or regression tasks. Extensive experimental studies demonstrate the superior performance of the proposed SDE in learning representations of structured data.
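The abstract's pipeline (GBDT encodes structured data into features, PCA compresses them, a downstream model consumes the result) can be sketched as follows. This is a minimal illustration under the common leaf-index interpretation of GBDT encoding, not the paper's actual implementation; the dataset, model choices, and hyperparameters are all placeholders.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Fit a GBDT on the raw structured (tabular) features.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_tr, y_tr)

# 2) Encode each sample by the leaf it falls into in every tree,
#    then one-hot encode the leaf indices. apply() returns
#    (n_samples, n_estimators, 1) for binary classification.
enc = OneHotEncoder(handle_unknown="ignore")
Z_tr = enc.fit_transform(gbdt.apply(X_tr)[:, :, 0])
Z_te = enc.transform(gbdt.apply(X_te)[:, :, 0])

# 3) PCA keeps the most informative directions and reduces dimensionality.
pca = PCA(n_components=32, random_state=0)
F_tr = pca.fit_transform(Z_tr.toarray())
F_te = pca.transform(Z_te.toarray())

# 4) A simple downstream classifier consumes the learned representation.
clf = LogisticRegression(max_iter=1000).fit(F_tr, y_tr)
print(clf.score(F_te, y_te))
```

The one-hot leaf encoding makes the representation a sparse, high-dimensional indicator vector, which is why a dimensionality-reduction step such as PCA is a natural follow-up before the final classifier or regressor.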
Acknowledgments
National Key R&D Program of China (No. 2017YFB1103003)
PKUSER-CRHMS Medical AI Co-Lab
National Science and Technology Major Project for IND (investigational new drug) 2018ZX09201-014
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Hu, W., Liu, X., Huang, Y., Wang, Y., Zhang, M., Zhao, H. (2020). Structured Data Encoder for Neural Networks Based on Gradient Boosting Decision Tree. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science(), vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60238-3
Online ISBN: 978-3-030-60239-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)