Tree-Based Metric Learning for Distance Computation in Data Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9313)

Abstract

Distance is an essential measurement in data mining, and a good metric often leads to good performance, so obtaining a proper metric systematically is critical. Distance metric learning is a classic approach to learning distances between instances of data sets with complex distributions. However, most research on distance metric learning is based on the Mahalanobis metric, which is equivalent to a linear transformation of the distance space and is therefore limited on complex data. To solve this problem, we propose a metric learning method based on a non-linear transformation that is suitable for complex data. A tree model can handle data that are not linearly separable by rearranging the input data and representing them in another form, so it implicitly maps the data to a new distance space through a non-linear activation function. A single tree model, however, tends to overfit, which leads to higher generalization error. We therefore design a randomized algorithm that combines different tree models and reduces the generalization error both in theory and in practice. We analyze the algorithm and prove its correctness and effectiveness in theory. Extensive experiments demonstrate that the algorithm is stable and suitable for data mining.
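The general idea of deriving a distance from a randomized ensemble of trees can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract. It is a generic stand-in, in the spirit of random-forest proximity measures, in which each tree partitions the data with random axis-aligned splits and the distance between two points is the fraction of trees that place them in different leaves. All function names and parameters (`build_tree`, `build_forest`, `forest_distance`, `min_leaf`, `max_depth`) are hypothetical.

```python
import random

def build_tree(points, indices, rng, min_leaf=3, max_depth=4, depth=0):
    """Grow one random partition tree: random feature, random threshold."""
    if len(indices) <= min_leaf or depth >= max_depth:
        return {"leaf": True}
    dim = rng.randrange(len(points[0]))
    values = [points[i][dim] for i in indices]
    threshold = rng.uniform(min(values), max(values))
    left = [i for i in indices if points[i][dim] < threshold]
    right = [i for i in indices if points[i][dim] >= threshold]
    if not left or not right:  # degenerate split, stop growing here
        return {"leaf": True}
    return {
        "leaf": False, "dim": dim, "threshold": threshold,
        "left": build_tree(points, left, rng, min_leaf, max_depth, depth + 1),
        "right": build_tree(points, right, rng, min_leaf, max_depth, depth + 1),
    }

def build_forest(points, n_trees, rng):
    """Each tree sees a random subsample, so the trees differ from one another."""
    n = len(points)
    sample_size = min(n, max(2, (2 * n) // 3))
    return [build_tree(points, rng.sample(range(n), sample_size), rng)
            for _ in range(n_trees)]

def find_leaf(node, x):
    """Route a point down the tree and return the leaf node it reaches."""
    while not node["leaf"]:
        go_left = x[node["dim"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node

def forest_distance(forest, x, y):
    """Fraction of trees in which x and y end up in different leaves."""
    apart = sum(find_leaf(tree, x) is not find_leaf(tree, y) for tree in forest)
    return apart / len(forest)

# Toy data (an assumption for illustration): two Gaussian blobs in the plane.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)] + \
       [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(20)]
forest = build_forest(data, n_trees=50, rng=rng)
near = forest_distance(forest, data[0], data[1])   # same blob
far = forest_distance(forest, data[0], data[20])   # different blobs
print(near, far)
```

Averaging over many randomized trees smooths out the arbitrary partition of any single tree, which mirrors the abstract's motivation for combining tree models. Note that the result is a dissimilarity in [0, 1] rather than a strict metric, since two distinct points can share a leaf in every tree.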

This paper was partially supported by NGFR 973 grant 2012CB316200, NSFC grants 61472099 and 61133002, and National Sci-Tech Support Plan 2015BAH10F00.

Corresponding author

Correspondence to Ming Yan.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yan, M., Zhang, Y., Wang, H. (2015). Tree-Based Metric Learning for Distance Computation in Data Mining. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_31

  • DOI: https://doi.org/10.1007/978-3-319-25255-1_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25254-4

  • Online ISBN: 978-3-319-25255-1

  • eBook Packages: Computer Science, Computer Science (R0)
