k-nearest Neighbor Classification on Spatial Data Streams Using P-trees

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

Classification of spatial data streams is crucial, yet the training dataset changes often, and with most techniques building a new classifier each time can be very costly. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to many variations. We propose a new method of KNN classification for spatial data using a new, rich, data-mining-ready structure, the Peano-count-tree (P-tree). Finding the nearest neighbors of a new sample and assigning its class label requires only a few AND/OR operations on P-trees, and our fast, efficient algorithms for these operations reduce classification time significantly. Instead of taking exactly the k nearest neighbors, we form a closed-KNN set. Our experimental results show that closed-KNN yields higher classification accuracy as well as significantly higher speed.
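The abstract does not spell out how AND/OR operations produce a neighbor set. The following Python sketch illustrates one plausible reading under stated assumptions; it is not the authors' implementation. Each attribute is split into bit planes. Each plane is held as an uncompressed bitmap (a Python `int`) rather than as a compressed Peano count tree, and counting the 1-bits of a bitmap stands in for a P-tree root count. The query's neighborhood is assumed to be the set of training points that agree with it on every attribute's high-order bits. It is widened one low-order bit at a time until it holds at least `k` points. Every point in that final neighborhood then votes, so the voting set may exceed `k`, which is how this sketch models a closed-KNN set. The function names (`popcount`, `bit_slices`, `neighborhood`, `closed_knn_classify`) and the 4-bit attribute width are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

BITS = 4  # assumed width of every attribute value (hypothetical)


def popcount(bitmap: int) -> int:
    """Number of 1-bits; stands in for a P-tree root count."""
    return bin(bitmap).count("1")


def bit_slices(points: Sequence[Sequence[int]], bits: int = BITS) -> List[List[int]]:
    """Vertical bit slicing: slices[a][j] has bit r set when bit j
    (j = 0 is the least significant) of attribute a of point r is 1."""
    n_attrs = len(points[0])
    slices = [[0] * bits for _ in range(n_attrs)]
    for r, p in enumerate(points):
        for a, v in enumerate(p):
            for j in range(bits):
                if (v >> j) & 1:
                    slices[a][j] |= 1 << r
    return slices


def neighborhood(slices: List[List[int]], query: Sequence[int],
                 ignore: int, n: int, bits: int = BITS) -> int:
    """Bitmap of points that match the query on every attribute's bits
    from position `ignore` upward (the `ignore` low bits are dropped)."""
    full = (1 << n) - 1
    mask = full
    for a, qv in enumerate(query):
        for j in range(ignore, bits):
            s = slices[a][j]
            # Keep points whose bit equals the query bit: AND with the
            # slice when the query bit is 1, else with its complement.
            mask &= s if (qv >> j) & 1 else (full & ~s)
    return mask


def closed_knn_classify(points: Sequence[Sequence[int]], labels: Sequence[Hashable],
                        query: Sequence[int], k: int,
                        bits: int = BITS) -> Tuple[Hashable, int]:
    """Return (predicted label, size of the voting set)."""
    n = len(points)
    slices = bit_slices(points, bits)
    # Widen the neighborhood one low-order bit at a time until it
    # holds at least k points (ignore == bits matches every point).
    for ignore in range(bits + 1):
        nbr = neighborhood(slices, query, ignore, n, bits)
        if popcount(nbr) >= k:
            break
    # One bitmap per class, built with OR.
    class_maps: Dict[Hashable, int] = {}
    for r, c in enumerate(labels):
        class_maps[c] = class_maps.get(c, 0) | (1 << r)
    # Every point in the neighborhood votes, even if there are more than k.
    votes = {c: popcount(nbr & m) for c, m in class_maps.items()}
    return max(votes, key=votes.get), popcount(nbr)


if __name__ == "__main__":
    train = [(0, 0), (1, 1), (3, 3), (2, 2), (8, 8)]
    labels = ["a", "b", "b", "b", "a"]
    print(closed_knn_classify(train, labels, (1, 0), k=3))  # ('b', 4)
```

In the usage example, matching only the top three bits leaves 2 points. Dropping one more bit yields 4, so all 4 vote even though k = 3. Class votes come from ANDing the neighborhood bitmap with one bitmap per class, so the whole classification uses only AND, OR, complement, and bit counting. A real P-tree implementation would presumably apply the same operations to compressed quadrant trees instead of flat bitmaps.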

Patents are pending on the bSQ and P-tree technology.

This work is partially supported by NSF Grant OSR-9553368, DARPA Grant DAAH04-96-1-0329 and GSA Grant ACT#: K96130308.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Khan, M., Ding, Q., Perrizo, W. (2002). k-nearest Neighbor Classification on Spatial Data Streams Using P-trees. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_51

  • DOI: https://doi.org/10.1007/3-540-47887-6_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43704-8

  • Online ISBN: 978-3-540-47887-4

  • eBook Packages: Springer Book Archive