Mining Graph Patterns in Web-based Systems: A Conceptual View

Part of the book series: Text, Speech and Language Technology (TLTB, volume 42)

Abstract

The task of applying Data Mining methods [38] to web-based hypertexts is often referred to as Web Mining [16]. In view of the steadily increasing complexity of web data sources and the huge amount of information available online, Web Mining has been an important and fruitful research topic [16, 46]. Generally, Web Mining can be divided into the following categories:

References

  1. Albert, R., H. Jeong, and A.L. Barabási. 1999. Diameter of the world wide web. Nature 401:130–131.

  2. Baeza-Yates, R., and B. Ribeiro-Neto, eds. 1999. Modern information retrieval. Reading, MA: Addison-Wesley.

  3. Barabási, A.-L., and Z.N. Oltvai. 2004. Network biology: Understanding the cell’s functional organization. Nature Reviews Genetics 5(2):101–113.

  4. Basak, S.C., V.R. Magnuson, G.J. Niemi, and R.R. Regal. 1988. Determining structural similarity of chemicals using graph-theoretic indices. Discrete Applied Mathematics 19:17–44.

  5. Batagelj, V. 1988. Similarity measures between structured objects. In Proceedings of an International Course and Conference on the Interfaces between Mathematics, Chemistry and Computer Sciences. Dubrovnik, Yugoslavia.

  6. Bonchev, D. 1979. Information indices for atoms and molecules. MATCH 7:65–113.

  7. Bonchev, D. 1983. Information theoretic indices for characterization of chemical structures. Chichester: Research Studies Press.

  8. Bornholdt, S., and H.G. Schuster. 2003. Handbook of graphs and networks. From the genome to the Internet. Weinheim: Wiley-VCH.

  9. Brandes, U., and T. Erlebach. 2005. Network analysis. Lecture Notes in Computer Science. Heidelberg: Springer.

  10. Bunke, H. 1983. What is the distance between graphs? Bulletin of the EATCS 20:35–39.

  11. Bunke, H. 2000a. Recent developments in graph matching. In Proceedings of the 15th International Conference on Pattern Recognition 2:117–124.

  12. Bunke, H. 2000b. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of Vision Interface 2000, 82–88. Montreal, Canada.

  13. Buttler, D. 2004. A short survey of document structure similarity algorithms. In International Conference on Internet Computing, 3–9. Las Vegas, Nevada, USA.

  14. Carrière, S.J., and R. Kazman. 1997. Webquery: Searching and visualizing the web through connectivity. Computer Networks and ISDN Systems 29(8–13):1257–1267.

  15. Chakrabarti, S. 2001. Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction. In Proceedings of the 10th International World Wide Web Conference, May 1–5, 211–220. Hong Kong.

  16. Chakrabarti, S. 2002. Mining the web: Discovering knowledge from hypertext data. San Francisco, CA: Morgan Kaufmann.

  17. Cook, D., and L.B. Holder. 2007. Mining graph data. Hoboken, NJ: Wiley-Interscience.

  18. Dehmer, M. 2006. Strukturelle Analyse web-basierter Dokumente. Multimedia und Telekooperation. Wiesbaden: Deutscher Universitäts-Verlag.

  19. Dehmer, M. 2008a. Information-theoretic concepts for the analysis of complex networks. Applied Artificial Intelligence 22(7–8):684–706.

  20. Dehmer, M. 2008b. Information processing in complex networks: Graph entropy and information functionals. Applied Mathematics and Computation 201:82–94.

  21. Dehmer, M., and F. Emmert-Streib. 2007. Structural similarity of directed universal hierarchical graphs: A low computational complexity approach. Applied Mathematics and Computation 194:7–20.

  22. Dehmer, M., and A. Mehler. 2007. A new method of measuring similarity for a special class of directed graphs. Tatra Mountains Mathematical Publications 36:39–59.

  23. Dehmer, M., A. Mehler, and R. Gleim. 2004. Aspekte der Kategorisierung von Webseiten. In Proceedings des Multimediaworkshops der Jahrestagung der Gesellschaft für Informatik, eds. P. Dadam and M. Reichert, Lecture Notes in Computer Science, vol. 2, 39–43. Berlin: Springer.

  24. Dehmer, M., F. Emmert-Streib, and J. Kilian. 2006. A similarity measure for graphs with low computational complexity. Applied Mathematics and Computation 182:447–459.

  25. Dehmer, M., A. Mehler, and F. Emmert-Streib. 2007. Graphtheoretical characterizations of generalized trees. In Proceedings of the International Conference on Machine Learning: Models, Technologies & Applications (MLMTA’07). Las Vegas, NV.

  26. Dehmer, M., F. Emmert-Streib, and T. Gesell. 2008. A comparative analysis of multidimensional features of objects resembling sets of graphs. Applied Mathematics and Computation 196:221–235.

  27. Dehmer, M., F. Emmert-Streib, A. Mehler, and J. Kilian. 2006. Measuring the structural similarity of web-based documents: A novel approach. International Journal of Computational Intelligence 3(1):1–7.

  28. Dimter, M. 1981. Textklassenkonzepte heutiger Alltagssprache. Tübingen: Niemeyer.

  29. Dorogovtsev, S.N., and J.F.F. Mendes. 2003. Evolution of networks. From biological networks to the Internet and WWW. Oxford: Oxford University Press.

  30. Emmert-Streib, F., and M. Dehmer. 2007. Information theoretic measures of UHG graphs with low computational complexity. Applied Mathematics and Computation 190:1783–1794.

  31. Ferber, R. 2003. Information retrieval. Suchmodelle und Data-Mining-Verfahren für Textsammlungen und das Web. Heidelberg: dpunkt.verlag.

  32. Flesca, S., G. Manco, E. Masciari, L. Pontieri, and A. Pugliese. 2002. Detecting structural similarities between XML documents. In Proceedings of the International Workshop on the Web and Databases (WebDB 2002). Madison, Wisconsin, USA.

  33. Foulds, L.R. 1992. Graph theory applications. New York, NY: Springer.

  34. Gibson, D., R. Kumar, K.S. McCurley, and A. Tomkins. 2007. Dense subgraph extraction. In Mining graph data, eds. D. Cook and L.B. Holder, 411–441. Hoboken, NJ: Wiley-Interscience.

  35. Gleim, R. 2004. Integrierte Repräsentation, Kategorisierung und Strukturanalyse Web-basierter Hypertexte. Master’s thesis, Technische Universität Darmstadt, Fachbereich Informatik, Sept 2004.

  36. Gleim, R. 2005. HyGraph: Ein Framework zur Extraktion, Repräsentation und Analyse webbasierter Hypertexte. In Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beiträge zur GLDV-Tagung 2005 in Bonn, eds. B. Fisseni, H.-C. Schmitz, B. Schröder, and P. Wagner, 42–53. Frankfurt a.M.: Lang.

  37. Halin, R. 1989. Graphentheorie. Berlin: Akademie Verlag.

  38. Han, J., and M. Kamber. 2001. Data mining: Concepts and techniques. New York, NY: Morgan Kaufmann Publishers.

  39. Harary, F. 1969. Graph theory. Reading, MA: Addison Wesley Publishing Company.

  40. Huberman, B., and L. Adamic. 1999. Growth dynamics of the world-wide web. Nature 399:130.

  41. Jiang, T., L. Wang, and K. Zhang. 1994. Alignment of trees – an alternative to tree edit. In CPM ’94: Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, 75–86, London: Springer-Verlag.

  42. Joshi, S., N. Agrawal, R. Krishnapuram, and S. Negi. 2003. A bag of paths model for measuring structural similarity in web documents. In KDD ’03: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 577–582, New York, NY.

  43. Kaden, F. 1982. Graphmetriken und Distanzgraphen. ZKI-Informationen, Akademie der Wissenschaften der DDR 2(82):1–63.

  44. Kaden, F. 1986. Graphmetriken und Isometrieprobleme zugehöriger Distanzgraphen. ZKI-Informationen, Akademie der Wissenschaften der DDR 1(P6):1–100.

  45. Kleinberg, J.M. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5):604–632.

  46. Kosala, R., and H. Blockeel. 2000. Web mining research: A survey. SIGKDD explorations: Newsletter of the Special Interest Group (SIG) on knowledge discovery & data mining, ACM 2(1):1–15.

  47. Koschützki, D., K.A. Lehmann, L. Peters, S. Richter, D. Tenfelde-Podehl, and O. Zlotkowski. 2005. Clustering. In Centrality indices, eds. U. Brandes and T. Erlebach, Lecture Notes in Computer Science, 16–61. Berlin: Springer.

  48. Kumar, R., P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. The web as a graph. In PODS ’00: Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 1–10. New York, NY: ACM Press.

  49. Levenshtein, V.I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics – Doklady 10(8):707–710.

  50. Lindemann, C., and L. Littig. 2010. Classification of web sites at super-genre level. In Genres on the web: Computational models and empirical studies, eds. A. Mehler, S. Sharoff, and M. Santini, Text, Speech and Language Technology. Dordrecht: Springer.

  51. Mason, O., and M. Verwoerd. 2007. Graph theory and networks in biology. IET Systems Biology 1(2):89–119.

  52. Mehler, A. 2001. Textbedeutung. Zur prozeduralen Analyse und Repräsentation struktureller Ähnlichkeiten von Texten, volume 5 of Sprache, Sprechen und Computer/Computer Studies in Language and Speech. Frankfurt a. M.: Peter Lang.

  53. Mehler, A. 2004. Textmining. In Texttechnologie. Perspektiven und Anwendungen, eds. H. Lobin and L. Lemnitzer, 83–107. Tübingen: Stauffenburg.

  54. Mehler, A. 2009. Generalized shortest paths trees: A novel graph class applied to semiotic networks. In Analysis of complex networks: From biology to linguistics, eds. M. Dehmer and F. Emmert-Streib, 175–220. Weinheim: Wiley-VCH.

  55. Mehler, A. 2010. Structure formation in the web. Toward a graph-theoretical model of hypertext types. In Linguistic modelling of information and markup languages, eds. A. Witt and D. Metzing, 225–247. Dordrecht: Springer.

  56. Mehler, A., and R. Gleim. 2006. The net for the graphs – towards webgenre representation for corpus linguistic studies. In WaCky! Working papers on the web as corpus, eds. M. Baroni and S. Bernardini, 191–224. Bologna: Gedit.

  57. Mehler, A., M. Dehmer, and R. Gleim. 2004. Towards logical hypertext structure – A graph-theoretic perspective. In Proceedings of the Fourth International Workshop on Innovative Internet Computing Systems (I2CS ’04), eds. T. Böhme and G. Heyer, Lecture Notes in Computer Science, vol. 3473, 136–150, Berlin/New York: Springer.

  58. Mehler, A., R. Gleim, and M. Dehmer. 2005. Towards structure-sensitive hypertext categorization. In Proceedings of the 29th Annual Conference of the German Classification Society, LNCS, Mar 9–11. Universität Magdeburg, Berlin/New York, NY: Springer.

  59. Mehler, A., R. Gleim, and A. Wegner. 2007. Structural uncertainty of hypertext types. An empirical study. In Proceedings of the Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 30 Sept 2007, 13–19, in conjunction with RANLP 2007. Borovets, Bulgaria.

  60. Messmer, B.T., and H. Bunke. 1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(5):493–504.

  61. Noller, S., J. Naumann, and T. Richter. 2001. LOGPAT – Ein webbasiertes Tool zur Analyse von Navigationsverläufen in Hypertexten. http://www.psych.uni-goettingen.de/congress/gor-2001

  62. Power, R., D. Scott, and N. Bouayad-Agha. 2003. Document structure. Computational Linguistics 29(2):211–260.

  63. Raghavan, P. 2000. Graph structure of the web: A survey. In LATIN 2000: Theoretical Informatics. Proceedings of 4th Latin American Symposium, 123–125. Punta del Este, Uruguay.

  64. Rahm, E. 2002. Web usage mining. Datenbank-Spektrum 2(2):75–76.

  65. Rehm, G. 2007. Hypertextsorten. Definition – Struktur – Klassifikation. Norderstedt: Books on Demand.

  66. Richter, T., J. Naumann, and S. Noller. 2003. Logpat: A semi-automatic way to analyze hypertext navigation behavior. Swiss Journal of Psychology 62:113–120.

  67. Schädler, C. 1999. Die Ermittlung struktureller Ähnlichkeit und struktureller Merkmale bei komplexen Objekten: Ein konnektionistischer Ansatz und seine Anwendungen. PhD thesis, Technische Universität Berlin.

  68. Scsibrany, H., K. Karlovits, W. Demuth, F. Müller, and K. Varmuza. 2003. Clustering and similarity of chemical structures represented by binary substructure descriptors. Chemometrics and Intelligent Laboratory Systems 67:95–108.

  69. Selkow, S.M. 1977. The tree-to-tree editing problem. Information Processing Letters 6(6):184–186.

  70. Skorobogatov, V.A., and A.A. Dobrynin. 1988. Metrical analysis of graphs. MATCH 23:105–155.

  71. Sobik, F. 1982. Graphmetriken und Klassifikation strukturierter Objekte. ZKI-Informationen, Akademie der Wissenschaften der DDR 2(82):63–122.

  72. Sobik, F. 1986. Modellierung von Vergleichsprozessen auf der Grundlage von Ähnlichkeitsmaßen für Graphen. ZKI-Informationen, Akademie der Wissenschaften der DDR 4:104–144.

  73. Spiliopoulou, M. 2000. Web usage mining for web site evaluation. Communications of the ACM 43(8):127–134.

  74. Tai, K.C. 1979. The tree-to-tree correction problem. Journal of the ACM 26(3):422–433. ISSN 0004-5411.

  75. Waltinger, U., A. Mehler, and A. Wegner. 2009. A two-level approach to web genre classification. In Proceedings of the 5th International Conference on Web Information Systems and Technologies (WEBIST ’09), 23–26 Mar 2009. Lisboa.

  76. Wasserman, S., and K. Faust. 1994. Social network analysis: Methods and applications, Structural Analysis in the Social Sciences. Cambridge, MA: Cambridge University Press.

  77. Zelinka, B. 1975. On a certain distance between isomorphism classes of graphs. Časopis pro pěstování matematiky 100:371–373.

Acknowledgements

We are thankful to Alexander Mehler for fruitful discussions on this topic.

Author information

Corresponding author

Correspondence to Matthias Dehmer.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Dehmer, M., Emmert-Streib, F. (2010). Mining Graph Patterns in Web-based Systems: A Conceptual View. In: Mehler, A., Sharoff, S., Santini, M. (eds) Genres on the Web. Text, Speech and Language Technology, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9178-9_11

  • DOI: https://doi.org/10.1007/978-90-481-9178-9_11

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-9177-2

  • Online ISBN: 978-90-481-9178-9

  • eBook Packages: Computer Science, Computer Science (R0)
