Geometrical Information Fusion from WWW and Its Related Information

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4777))

Abstract

We considered a spatial data mining method that extracts spatial knowledge by computing geometrical patterns from web pages and the access log files of HTTP servers. Many web pages contain location information such as addresses, postal codes, and telephone numbers, and such pages can be collected by web-crawling programs. For each page determined to contain location information, we apply geocoding techniques to compute geographic coordinates, such as latitude-longitude pairs. Next, we augment the location information with keyword descriptors extracted from the page contents. We then apply spatial data mining techniques to the augmented location information. In addition, we can use hyperlinks and access log files to find links between pages with location information and thereby derive further spatial knowledge.
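The first steps described in the abstract (detect location information, geocode it, attach keyword descriptors) can be illustrated with a minimal sketch. This is not the paper's implementation: the gazetteer `GAZETTEER`, the stopword list, the five-digit postal code pattern, and the function names `geocode`, `keywords`, and `augment` are all assumptions made for illustration, and a real system would rely on a proper geocoding database.

```python
import re
from collections import Counter

# Hypothetical gazetteer mapping postal codes to (latitude, longitude).
# The entries are illustrative placeholders, not data from the paper.
GAZETTEER = {
    "10001": (40.75, -73.99),
    "94105": (37.79, -122.39),
}

# Assumption: location information appears as a five-digit postal code.
POSTAL_CODE = re.compile(r"\b\d{5}\b")

# Assumption: a tiny stopword list is enough for this toy example.
STOPWORDS = {"the", "a", "an", "and", "at", "in", "of", "on", "is", "here", "no"}


def geocode(text):
    """Return coordinates for the first known postal code in text, else None."""
    for code in POSTAL_CODE.findall(text):
        if code in GAZETTEER:
            return GAZETTEER[code]
    return None


def keywords(text, k=3):
    """Return up to k most frequent non-stopword terms as keyword descriptors."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def augment(pages):
    """Keep only pages that can be geocoded and attach keyword descriptors."""
    records = []
    for url, text in pages.items():
        coords = geocode(text)
        if coords is not None:
            records.append({"url": url, "coords": coords, "keywords": keywords(text)})
    return records


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "Ramen shop at 350 Fifth Avenue, New York, NY 10001. Ramen and gyoza.",
        "http://example.com/b": "No address here.",
    }
    for record in augment(pages):
        print(record)
```

The resulting records (coordinates plus keywords) are the kind of augmented location information on which spatial data mining techniques could then be run; the mining step itself is outside the scope of this sketch.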

Editor information

Subhash Bhalla

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Morimoto, Y. (2007). Geometrical Information Fusion from WWW and Its Related Information. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2007. Lecture Notes in Computer Science, vol 4777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75512-8_2

  • DOI: https://doi.org/10.1007/978-3-540-75512-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75511-1

  • Online ISBN: 978-3-540-75512-8

  • eBook Packages: Computer Science, Computer Science (R0)
