Abstract
We consider a spatial data mining method that extracts spatial knowledge by computing geometrical patterns from web pages and from the access log files of HTTP servers. Many web pages contain location information such as addresses, postal codes, and telephone numbers, and such pages can be collected by web-crawling programs. For each page determined to contain location information, we apply geocoding techniques to compute geographic coordinates, such as latitude-longitude pairs. Next, we augment the location information with keyword descriptors extracted from the page contents, and then apply spatial data mining techniques to the augmented location information. In addition, hyperlinks and access log files can be used to find linkage between pages with location information, from which further spatial knowledge can be derived.
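The geocode-and-augment steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the postal-code pattern, the tiny `GAZETTEER` lookup table with placeholder coordinates, the stopword list, and the function names `geocode`, `keywords`, and `augment` are all hypothetical. A real system would rely on a full geocoding service and proper keyword extraction.

```python
import re
from collections import Counter

# Hypothetical gazetteer mapping postal codes to (latitude, longitude).
# The coordinates are illustrative placeholders, not authoritative values.
GAZETTEER = {
    "100-0001": (35.6852, 139.7528),
    "739-8527": (34.4010, 132.7130),
}

# Assumed postal-code format (three digits, hyphen, four digits).
POSTAL_RE = re.compile(r"\b\d{3}-\d{4}\b")
WORD_RE = re.compile(r"[a-z]{3,}")
STOPWORDS = {"the", "and", "for", "our", "near", "with"}


def geocode(page_text):
    """Return coordinates for the first known postal code on the page, or None."""
    for code in POSTAL_RE.findall(page_text):
        if code in GAZETTEER:
            return GAZETTEER[code]
    return None


def keywords(page_text, k=3):
    """Return the k most frequent non-stopword terms as keyword descriptors."""
    words = [w for w in WORD_RE.findall(page_text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def augment(pages):
    """Keep only pages that can be geocoded and attach keyword descriptors."""
    records = []
    for url, text in pages.items():
        coords = geocode(text)
        if coords is not None:
            records.append({
                "url": url,
                "lat": coords[0],
                "lon": coords[1],
                "keywords": keywords(text),
            })
    return records


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "Ramen shop: ramen and noodles, 100-0001 Tokyo",
        "http://example.com/b": "A page with no address at all",
    }
    for record in augment(pages):
        print(record)
```

The resulting records (a point plus a set of descriptors) are the kind of augmented location information that a spatial mining step could then consume.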
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Morimoto, Y. (2007). Geometrical Information Fusion from WWW and Its Related Information. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2007. Lecture Notes in Computer Science, vol 4777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75512-8_2
DOI: https://doi.org/10.1007/978-3-540-75512-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75511-1
Online ISBN: 978-3-540-75512-8
eBook Packages: Computer Science, Computer Science (R0)