k-nearest Neighbor Classification on Spatial Data Streams Using P-trees

Khan, Maleq; Ding, Qin; Perrizo, William

doi:10.1007/3-540-47887-6_51

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

Classification of spatial data streams is crucial, since the training dataset changes often. Building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to many variations. We propose a new method of KNN classification for spatial data using a new, rich, data-mining-ready structure, the Peano-count-tree (P-tree). We merely perform some AND/OR operations on P-trees to find the nearest neighbors of a new sample and assign the class label. We have fast and efficient algorithms for the AND/OR operations, which reduce the classification time significantly. Instead of taking exactly the k nearest neighbors, we form a closed-KNN set. Our experimental results show that closed-KNN yields higher classification accuracy as well as significantly higher speed.
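
The abstract does not define the closed-KNN set. The sketch below assumes one common reading: the set holds every training point whose distance to the new sample is no greater than the distance of the k-th nearest point, so ties on the boundary are all kept and the set may hold more than k points. The function names `closed_knn` and `classify`, the Manhattan distance, and the toy data are illustrative assumptions. The sketch uses plain lists and does not reproduce the P-tree AND/OR operations of the paper.

```python
from collections import Counter

def closed_knn(train, query, k):
    """Return (distance, label) pairs for every training point that lies
    within the distance of the k-th nearest point (boundary ties included)."""
    # Manhattan distance is an illustrative choice, not taken from the abstract.
    dists = [(sum(abs(a - b) for a, b in zip(x, query)), label)
             for x, label in train]
    if not dists:
        return []
    k = min(k, len(dists))
    # Distance of the k-th nearest point defines the neighborhood radius.
    radius = sorted(d for d, _ in dists)[k - 1]
    return [(d, label) for d, label in dists if d <= radius]

def classify(train, query, k):
    """Assign the majority label among the closed-KNN set (plain voting)."""
    votes = Counter(label for _, label in closed_knn(train, query, k))
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two points tie at distance 1 from the query.
train = [((0, 0), "a"), ((1, 0), "b"), ((0, 1), "b"), ((5, 5), "a")]
neighbors = closed_knn(train, (0, 0), k=2)
label = classify(train, (0, 0), k=2)
```

With k = 2, an exact-k rule would have to break the tie at distance 1 arbitrarily. Under the assumed definition, both tied points enter the set, so three neighbors vote.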

Patents are pending on the bSQ and Ptree technology.

This work is partially supported by NSF Grant OSR-9553368, DARPA Grant DAAH04-96-1-0329 and GSA Grant ACT#: K96130308.

Author information

Authors and Affiliations

Computer Science Department, North Dakota State University, Fargo, ND, 58105, USA
Maleq Khan, Qin Ding & William Perrizo

Editor information

Editors and Affiliations

EE Department, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, Taiwan, ROC
Ming-Syan Chen
IBM Thomas J. Watson Research Center, 30 Sawmill River Road, Hawthorne, NY, 10532, USA
Philip S. Yu
School of Computing, National University of Singapore, Lower Kent Ridge Road, Singapore, 119260
Bing Liu

About this paper

Cite this paper

Khan, M., Ding, Q., Perrizo, W. (2002). k-nearest Neighbor Classification on Spatial Data Streams Using P-trees. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_51

DOI: https://doi.org/10.1007/3-540-47887-6_51
Published: 29 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive
