eXtreme Gradient Boosting for Identifying Individual Users Across Different Digital Devices

  • Conference paper
Web-Age Information Management (WAIM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9658)

Abstract

With the increasing popularity of tablets, smartphones and other mobile devices, users commonly complete online tasks across several devices. Identifying individual users across different digital devices has therefore become an active research topic. Methods based on names, email addresses and other demographic information have received much attention, but a complete set of such information is often difficult to obtain. In this paper, we take a probabilistic approach to the cross-device identity problem and focus on comparing different algorithms. We conduct an in-depth study of the relationships between attributes and use them to expand the attribute set of the data. Dummy variables are introduced to improve the efficiency of the models. Experimental results on four datasets (released by the ICDM Challenge) show that eXtreme Gradient Boosting consistently and significantly outperforms the other algorithms in both accuracy and F1-score. It also consistently outperforms the methods we used in the ICDM 2015 Challenge, where we achieved a moderate ranking using C4.5 and a BP model, and it achieves a better comprehensive evaluation ranking.
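
As a minimal illustration of two terms used in the abstract, the sketch below expands a categorical attribute into dummy variables and computes accuracy and F1-score for binary predictions. It is not the pipeline used in the paper: the attribute values, the labels, and the helper names (`dummy_encode`, `accuracy`, `f1_score`) are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Sequence


def dummy_encode(values: Sequence[str]) -> List[Dict[str, int]]:
    """Expand one categorical column into 0/1 dummy variables.

    Each distinct category becomes its own indicator column.
    """
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical attribute: the type of device behind each record.
device_types = ["phone", "tablet", "phone", "desktop"]
encoded = dummy_encode(device_types)

# Hypothetical labels: 1 means "this device and cookie belong to the same user".
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Accuracy counts all correct decisions, while F1-score only rewards correctly identified positive pairs, which matters when matching pairs are rare relative to non-matching ones.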

Notes

  1. http://xgboost.readthedocs.org/en/latest/index.html

  2. https://www.kaggle.com/c/icdm-2015-drawbridge-cross-device-connections

Acknowledgments

This work is supported by the Natural Science Foundation of China (No. 61170192) and the National High-tech R&D Program of China (No. 2013AA013801). L. Li is the corresponding author for the paper.

Author information

Correspondence to Li Li.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Song, R., Chen, S., Deng, B., Li, L. (2016). eXtreme Gradient Boosting for Identifying Individual Users Across Different Digital Devices. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9658. Springer, Cham. https://doi.org/10.1007/978-3-319-39937-9_4

  • DOI: https://doi.org/10.1007/978-3-319-39937-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39936-2

  • Online ISBN: 978-3-319-39937-9

  • eBook Packages: Computer Science, Computer Science (R0)
