Abstract
Scientific experimental results are often depicted as plots of functions to aid visual analysis and comparison. When these plots are compared computationally, using techniques such as similarity search and clustering, similarity is typically measured by distance. However, it is seldom known which distance metric or metrics best preserve the semantics of the respective domain. It is therefore desirable to learn such domain-specific distance metrics for comparing plots. This paper describes LearnMet, a technique proposed to learn such metrics. The input to LearnMet is a training set of plots with their actual clusters. These are iteratively compared with clusters predicted over the same plots by an arbitrary but fixed clustering algorithm. Starting from a guessed initial metric, the metric is adjusted in each epoch based on the error between the predicted and actual clusters, until the error is minimal or falls below a given threshold. The metric giving the lowest error is output as the learned metric. The LearnMet technique and its enhancements are discussed in detail. The primary application of LearnMet is clustering plots in the Heat Treating domain, so it is rigorously evaluated using Heat Treating data. Given distinct test sets, clusters of plots predicted using the learned metrics are compared with the given actual clusters over the same plots. The extent to which the predicted and actual clusters match denotes the accuracy of the learned metrics.
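The learning loop outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it assumes the metric is a weighted sum of component distances, and the two components (`euclidean`, `peak_gap`), the farthest-first clustering step, the pairwise error measure, and the weight-update heuristic are all hypothetical choices made only to keep the example short and runnable.

```python
from itertools import combinations

# Hypothetical component distances between two plots (equal-length lists of y-values).
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def peak_gap(p, q):
    return abs(max(p) - max(q))

COMPONENTS = [euclidean, peak_gap]

def distance(p, q, weights):
    """Assumed metric form: weighted sum of the component distances."""
    return sum(w * d(p, q) for w, d in zip(weights, COMPONENTS))

def cluster(plots, weights, k):
    """Stand-in for the 'arbitrary but fixed' clustering algorithm:
    farthest-first seeds, then assignment of each plot to its nearest seed."""
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(
            range(len(plots)),
            key=lambda i: min(distance(plots[i], plots[s], weights) for s in seeds)))
    return [min(range(k), key=lambda c: distance(p, plots[seeds[c]], weights))
            for p in plots]

def pair_error(pred, actual):
    """Fraction of plot pairs on which predicted and actual clusters disagree."""
    pairs = list(combinations(range(len(pred)), 2))
    wrong = [(i, j) for i, j in pairs
             if (pred[i] == pred[j]) != (actual[i] == actual[j])]
    return len(wrong) / len(pairs), wrong

def learn_metric(plots, actual, k, init_weights,
                 max_epochs=50, threshold=0.0, rate=0.1):
    weights = list(init_weights)
    best_w, best_err = list(weights), float("inf")
    for _ in range(max_epochs):
        pred = cluster(plots, weights, k)
        err, wrong = pair_error(pred, actual)
        if err < best_err:                      # remember the lowest-error metric
            best_w, best_err = list(weights), err
        if err <= threshold:
            break
        # Illustrative heuristic (an assumption, not the paper's rule):
        # wrongly merged pairs push a component's weight up,
        # wrongly split pairs push it down.
        for c, comp in enumerate(COMPONENTS):
            dists = [(comp(plots[i], plots[j]), actual[i] != actual[j])
                     for i, j in wrong]
            total = sum(d for d, _ in dists) or 1.0
            push = sum(d if merged else -d for d, merged in dists)
            weights[c] = max(0.0, weights[c] + rate * push / total)
    return best_w, best_err

# Toy training set: actual clusters follow peak height, not peak position.
plots = [[0, 10, 0, 0], [0, 0, 10, 0], [0, 2, 0, 0], [0, 0, 2, 0]]
actual = [0, 0, 1, 1]
learned_w, learned_err = learn_metric(plots, actual, k=2, init_weights=[1.0, 0.0])
print(learned_w, learned_err)
```

On this toy data the initial guess uses only the Euclidean component, which groups plots by peak position rather than peak height. The loop shifts weight toward the peak component until the predicted clusters agree with the actual ones, and it returns the lowest-error weights seen rather than the last ones tried.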
Additional information
This work is supported by the Center for Heat Treating Excellence (CHTE) and by the Department of Energy-Industrial Technology Program (DOE-ITP) Award Number DE-FC-07-01ID14197.
Cite this article
Varde, A., Rundensteiner, E., Ruiz, C. et al. LearnMet: learning domain-specific distance metrics for plots of scientific functions. Multimed Tools Appl 35, 29–53 (2007). https://doi.org/10.1007/s11042-007-0120-0