LearnMet: learning domain-specific distance metrics for plots of scientific functions

Multimedia Tools and Applications

Abstract

Scientific experimental results are often depicted as plots of functions to aid visual analysis and comparison. When these plots are compared computationally, using techniques such as similarity search and clustering, similarity is typically measured by distance. However, it is seldom known which distance metric or metrics best preserve semantics in a given domain. It is thus desirable to learn such domain-specific distance metrics for comparing plots. This paper describes LearnMet, a technique proposed to learn such metrics. The input to LearnMet is a training set of plots grouped into actual clusters. These are iteratively compared with predicted clusters obtained over the same plots using an arbitrary but fixed clustering algorithm. Starting from a guessed initial metric, the metric is adjusted in each epoch based on the error between the predicted and actual clusters, until the error is minimal or falls below a given threshold. The metric giving the lowest error is output as the learned metric. The LearnMet technique and its enhancements are discussed in detail. The primary application of LearnMet is clustering plots in the Heat Treating domain, so it is rigorously evaluated using Heat Treating data. Given distinct test sets, clusters of plots predicted using the learned metrics are compared with the given actual clusters over the same plots. The extent to which the predicted and actual clusters match denotes the accuracy of the learned metrics.
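
The training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two distance components (`d_euclid`, `d_max`), the weighted-sum form of the metric, the average-linkage clustering, the pairwise error measure, and the weight-update rule with its `rate` parameter are all assumptions chosen for illustration. Only the overall structure follows the abstract: cluster with the current metric, measure the error against the actual clusters, adjust the metric, and keep the metric with the lowest error.

```python
import math
from itertools import combinations

# Assumed distance components; the abstract does not specify them.
def d_euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def d_max(p, q):
    return abs(max(p) - max(q))

COMPONENTS = [d_euclid, d_max]

def metric(w, p, q):
    """Candidate metric: weighted sum of the component distances."""
    return sum(wi * d(p, q) for wi, d in zip(w, COMPONENTS))

def cluster(plots, w, k):
    """Fixed clustering algorithm (assumed: average linkage down to k clusters)."""
    groups = [[i] for i in range(len(plots))]
    while len(groups) > k:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            total = sum(metric(w, plots[i], plots[j])
                        for i in groups[a] for j in groups[b])
            dist = total / (len(groups[a]) * len(groups[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        merged = groups.pop(b)      # b > a, so index a is unaffected
        groups[a].extend(merged)
    labels = [0] * len(plots)
    for g, members in enumerate(groups):
        for i in members:
            labels[i] = g
    return labels

def pair_error(pred, actual):
    """Fraction of plot pairs on which predicted and actual clusters disagree."""
    pairs = list(combinations(range(len(actual)), 2))
    wrong = sum((pred[i] == pred[j]) != (actual[i] == actual[j])
                for i, j in pairs)
    return wrong / len(pairs)

def learn_metric(plots, actual, k, w0, max_epochs=20, threshold=0.0, rate=0.5):
    """Adjust the weights each epoch; return the weights with the lowest error."""
    w = list(w0)
    best_w, best_err = list(w), float("inf")
    for _ in range(max_epochs):
        pred = cluster(plots, w, k)
        err = pair_error(pred, actual)
        if err < best_err:
            best_w, best_err = list(w), err
        if err <= threshold:
            break
        # Assumed update heuristic: for each wrongly grouped pair, push the
        # weights up (pair wrongly together) or down (pair wrongly apart),
        # in proportion to each component's share of the pair's distance.
        for i, j in combinations(range(len(plots)), 2):
            same_pred = pred[i] == pred[j]
            same_act = actual[i] == actual[j]
            if same_pred == same_act:
                continue
            sign = 1 if same_pred else -1
            dists = [d(plots[i], plots[j]) for d in COMPONENTS]
            total = sum(dists) or 1.0
            for c, dc in enumerate(dists):
                w[c] = max(0.0, w[c] + sign * rate * dc / total)
    return best_w, best_err

# Toy training set: the actual clusters group plots by peak height.
plots = [[0, 10, 0], [0, 0, 10], [0, 5, 0], [0, 0, 5]]
actual = [0, 0, 1, 1]
weights, error = learn_metric(plots, actual, k=2, w0=[1.0, 0.0])
print(weights, error)
```

On this toy data the initial Euclidean-only weights group the plots by peak position rather than peak height, and the updates shift weight toward the peak-height component until the predicted clusters match the actual ones. The accuracy described in the abstract corresponds here to `1 - pair_error(pred, actual)` on a held-out test set, again under the assumed error measure.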

Author information

Corresponding author

Correspondence to Aparna Varde.

Additional information

This work is supported by the Center for Heat Treating Excellence (CHTE) and by the Department of Energy-Industrial Technology Program (DOE-ITP) Award Number DE-FC-07-01ID14197.

About this article

Cite this article

Varde, A., Rundensteiner, E., Ruiz, C. et al. LearnMet: learning domain-specific distance metrics for plots of scientific functions. Multimed Tools Appl 35, 29–53 (2007). https://doi.org/10.1007/s11042-007-0120-0

  • DOI: https://doi.org/10.1007/s11042-007-0120-0