Efficiently finding unusual shapes in large image databases

Data Mining and Knowledge Discovery

Abstract

Among the visual features of multimedia content, shape is of particular interest because humans can often recognize objects solely on the basis of shape. Over the past three decades, there has been a great deal of research on shape analysis, focusing mostly on shape indexing, clustering, and classification. In this work, we introduce the new problem of finding shape discords, the most unusual shapes in a collection. We motivate the problem by considering the utility of shape discords in diverse domains including zoology, microscopy, anthropology, and medicine. The brute force search algorithm has quadratic time complexity. We avoid this untenable cost by using locality-sensitive hashing to estimate the similarity between shapes, which lets us reorder the search so as to extract the maximum benefit from an admissible pruning strategy that we introduce. An extensive experimental evaluation demonstrates that our approach is empirically linear in time.
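The interplay between search order and pruning described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: a discord is taken to be the item whose distance to its nearest neighbor is largest, shapes are stood in for by plain numeric vectors compared with Euclidean distance, and the pruning rule is simple early abandoning. The names `find_discord`, `outer_order`, and `inner_order` are hypothetical. The locality-sensitive hashing step is not implemented; the sketch only accepts an ordering as input, which is where such an estimate would plug in.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discord(shapes, dist=euclidean, outer_order=None, inner_order=None):
    """Return (index, nearest-neighbor distance, number of distance calls).

    Assumption: the discord is the item with the largest distance to its
    nearest neighbor. outer_order is an optional sequence of indices giving
    the order in which candidates are tried; inner_order is an optional
    function mapping a candidate index to the order in which the other
    items are compared against it. Both are hypothetical hooks for a
    similarity estimate such as one derived from hashing.
    """
    n = len(shapes)
    outer = list(outer_order) if outer_order is not None else list(range(n))
    best_dist, best_idx = -1.0, None
    calls = 0
    for i in outer:
        inner = inner_order(i) if inner_order is not None else range(n)
        nearest = math.inf
        pruned = False
        for j in inner:
            if i == j:
                continue
            calls += 1
            d = dist(shapes[i], shapes[j])
            if d < best_dist:
                # Candidate i has a neighbor closer than the best discord
                # distance found so far, so i cannot be the discord.
                # Abandoning here never discards the true answer.
                pruned = True
                break
            nearest = min(nearest, d)
        if not pruned and nearest > best_dist:
            best_dist, best_idx = nearest, i
    return best_idx, best_dist, calls
```

On the toy input `[[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]`, the default order spends 20 distance calls, the full n(n-1) of brute force, while passing `outer_order=[4, 0, 1, 2, 3]` (trying the true discord first) finds the same answer in 8 calls. The pruning rule is identical in both runs; only the order changes how often it fires, which is why a good similarity estimate matters.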

Author information

Corresponding author

Correspondence to Li Wei.

Additional information

Responsible editor: M. J. Zaki.

A primary version of this work appears in ICDM 2006.

About this article

Cite this article

Wei, L., Keogh, E., Xi, X. et al. Efficiently finding unusual shapes in large image databases. Data Min Knowl Disc 17, 343–376 (2008). https://doi.org/10.1007/s10618-008-0094-1

  • DOI: https://doi.org/10.1007/s10618-008-0094-1

Keywords

Navigation