Detecting and ordering salient regions

Shoemaker, Larry; Banfield, Robert E.; Hall, Lawrence O.; Bowyer, Kevin W.; Kegelmeyer, W. Philip

doi:10.1007/s10618-010-0194-6

Published: 03 August 2010

Volume 22, pages 259–290, (2011)

Data Mining and Knowledge Discovery

Abstract

We describe an ensemble approach to learning salient regions from arbitrarily partitioned data. The partitioning comes from the distributed processing requirements of large-scale simulations. The volume of the data is such that classifiers can train only on data local to a given partition. Since the data partition reflects the needs of the simulation, the class statistics can vary from partition to partition. Some classes will likely be missing from some or even most partitions. We combine a fast ensemble learning algorithm with scaled probabilistic majority voting in order to learn an accurate classifier from such data. Since some simulations are difficult to model without a considerable number of false positive errors, and since we are essentially building a search engine for simulation data, we order predicted regions to increase the likelihood that most of the top-ranked predictions are correct (salient). Results from simulation runs of a canister being torn and from a casing being dropped show that regions of interest are successfully identified in spite of the class imbalance in the individual training sets. Lift curve analysis shows that the use of data-driven ordering methods provides a statistically significant improvement over the use of the default, natural time step ordering. Significant time is saved for the end user by allowing an improved focus on areas of interest without the need to conventionally search all of the data.
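
The ideas named in the abstract (a scaled probabilistic vote across per-partition classifiers, ordering of predicted regions, and lift analysis of that ordering) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how votes are scaled or how regions are ordered, so the per-classifier weights, the function names, and the toy numbers below are all assumptions made for illustration. Cumulative lift is computed in the usual way, as precision in the top d items divided by the overall fraction of positives.

```python
from typing import List, Sequence


def weighted_vote(probs: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Combine per-classifier probabilities that each region is salient.

    probs[k][i] is classifier k's estimate for region i.
    weights[k] is a hypothetical scale factor for classifier k
    (an assumption; the paper's actual scaling is not given here).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_regions = len(probs[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probs)) / total
        for i in range(n_regions)
    ]


def rank_regions(scores: Sequence[float]) -> List[int]:
    """Return region indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def cumulative_lift(order: Sequence[int], labels: Sequence[int]) -> List[float]:
    """Lift at each depth d: precision in the top d over the base rate."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("at least one positive label is required")
    hits = 0
    lift = []
    for depth, idx in enumerate(order, start=1):
        hits += labels[idx]
        lift.append((hits / depth) / base_rate)
    return lift


# Toy data: two classifiers (one per partition), four regions.
toy_probs = [[0.9, 0.2, 0.6, 0.1],
             [0.7, 0.4, 0.8, 0.3]]
toy_weights = [2.0, 1.0]      # hypothetical scale factors
toy_labels = [1, 0, 1, 0]     # 1 = region is truly salient

toy_scores = weighted_vote(toy_probs, toy_weights)
toy_order = rank_regions(toy_scores)
toy_lift = cumulative_lift(toy_order, toy_labels)
```

A lift above 1 at small depths means the ordering places salient regions ahead of where an uninformed ordering would be expected to place them, which is the property the abstract's lift curve analysis evaluates.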

Author information

Authors and Affiliations

Computer Science and Engineering, University of South Florida, Tampa, FL, 33620-5399, USA
Larry Shoemaker, Robert E. Banfield & Lawrence O. Hall
Computer Science and Engineering, University of Notre Dame, South Bend, IN, 46556, USA
Kevin W. Bowyer
Sandia National Labs, Computer and Information Sciences, P.O. Box 969, MS 9951, Livermore, CA, 94551, USA
W. Philip Kegelmeyer

Corresponding author

Correspondence to Larry Shoemaker.

Additional information

Responsible editor: Chih-Jen Lin.

About this article

Cite this article

Shoemaker, L., Banfield, R.E., Hall, L.O. et al. Detecting and ordering salient regions. Data Min Knowl Disc 22, 259–290 (2011). https://doi.org/10.1007/s10618-010-0194-6

Received: 09 July 2009
Accepted: 07 July 2010
Published: 03 August 2010
Issue Date: January 2011
DOI: https://doi.org/10.1007/s10618-010-0194-6
