Mining Graph Patterns in Web-based Systems: A Conceptual View

Dehmer, Matthias; Emmert-Streib, Frank

doi:10.1007/978-90-481-9178-9_11

Chapter
First Online: 01 January 2010

Part of the book series: Text, Speech and Language Technology (TLTB, volume 42)

Abstract

The task of applying Data Mining methods [38] to web-based hypertexts is often referred to as Web Mining [16]. In view of the steadily increasing complexity of web data sources and the huge amount of information available online, Web Mining has been an important and fruitful research topic [16, 46]. Generally, Web Mining can be divided into the following categories:

References

Albert, R., H. Jeong, and A.L. Barabási. 1999. Diameter of the world wide web. Nature 401:130–131.
Baeza-Yates, R., and B. Ribeiro-Neto, eds. 1999. Modern information retrieval. Reading, MA: Addison-Wesley.
Barabási, A.-L., and Z.N. Oltvai. 2004. Network biology: Understanding the cell’s functional organization. Nature Reviews Genetics 5(2):101–113.
Basak, S.C., V.R. Magnuson, G.J. Niemi, and R.R. Regal. 1988. Determining structural similarity of chemicals using graph-theoretic indices. Discrete Applied Mathematics 19:17–44.
Batagelj, V. 1988. Similarity measures between structured objects. In Proceedings of an International Course and Conference on the Interfaces between Mathematics, Chemistry and Computer Sciences. Dubrovnik, Yugoslavia.
Bonchev, D. 1979. Information indices for atoms and molecules. MATCH 7:65–113.
Bonchev, D. 1983. Information theoretic indices for characterization of chemical structures. Chichester: Research Studies Press.
Bornholdt, S., and H.G. Schuster. 2003. Handbook of graphs and networks. From the genome to the Internet. Weinheim: Wiley-VCH.
Brandes, U., and T. Erlebach. 2005. Network analysis. Lecture Notes in Computer Science. Heidelberg: Springer.
Bunke, H. 1983. What is the distance between graphs? Bulletin of the EATCS 20:35–39.
Bunke, H. 2000a. Recent developments in graph matching. In Proceedings of the 15th International Conference on Pattern Recognition 2:117–124.
Bunke, H. 2000b. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of Vision Interface 2000, 82–88. Montreal, Canada.
Buttler, D. 2004. A short survey of document structure similarity algorithms. In International Conference on Internet Computing, 3–9. Las Vegas, Nevada, USA.
Carrière, S.J., and R. Kazman. 1997. Webquery: Searching and visualizing the web through connectivity. Computer Networks and ISDN Systems 29(8–13):1257–1267.
Chakrabarti, S. 2001. Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction. In Proceedings of the 10th International World Wide Web Conference, May 1–5, 211–220. Hong Kong.
Chakrabarti, S. 2002. Mining the web: Discovering knowledge from hypertext data. San Francisco, CA: Morgan Kaufmann.
Cook, D., and L.B. Holder. 2007. Mining graph data. Weinheim: Wiley-Interscience.
Dehmer, M. 2006. Strukturelle Analyse web-basierter Dokumente. Multimedia und Telekooperation. Wiesbaden: Deutscher Universitäts Verlag.
Dehmer, M. 2008a. Information-theoretic concepts for the analysis of complex networks. Applied Artificial Intelligence 22(7–8):684–706.
Dehmer, M. 2008b. Information processing in complex networks: Graph entropy and information functionals. Applied Mathematics and Computation 201:82–94.
Dehmer, M., and F. Emmert-Streib. 2007. Structural similarity of directed universal hierarchical graphs: A low computational complexity approach. Applied Mathematics and Computation 194:7–20.
Dehmer, M., and A. Mehler. 2007. A new method of measuring similarity for a special class of directed graphs. Tatra Mountains Mathematical Publications 36:39–59.
Dehmer, M., A. Mehler, and R. Gleim. 2004. Aspekte der Kategorisierung von Webseiten. In Proceedings des Multimediaworkshops der Jahrestagung der Gesellschaft für Informatik, eds. P. Dadam and M. Reichert, Lecture Notes in Computer Science, vol. 2, 39–43. Berlin: Springer.
Dehmer, M., F. Emmert-Streib, and J. Kilian. 2006. A similarity measure for graphs with low computational complexity. Applied Mathematics and Computation 182:447–459.
Dehmer, M., A. Mehler, and F. Emmert-Streib. 2007. Graphtheoretical characterizations of generalized trees. In Proceedings of the International Conference on Machine Learning: Models, Technologies & Applications (MLMTA’07). Las Vegas, NV.
Dehmer, M., F. Emmert-Streib, and T. Gesell. 2008. A comparative analysis of multidimensional features of objects resembling sets of graphs. Applied Mathematics and Computation 196:221–235.
Dehmer, M., F. Emmert-Streib, A. Mehler, and J. Kilian. 2006. Measuring the structural similarity of web-based documents: A novel approach. International Journal of Computational Intelligence 3(1):1–7.
Dimter, M. 1981. Textklassenkonzepte heutiger Alltagssprache. Tübingen: Niemeyer.
Dorogovtsev, S.N., and J.F.F. Mendes. 2003. Evolution of networks. From biological networks to the internet and WWW. Oxford: Oxford University Press.
Emmert-Streib, F., and M. Dehmer. 2007. Information theoretic measures of UHG graphs with low computational complexity. Applied Mathematics and Computation 190:1783–1794.
Ferber, R. 2003. Information retrieval. Suchmodelle und Data-Mining-Verfahren für Textsammlungen und das Web. Heidelberg: dpunkt.verlag.
Flesca, S., G. Manco, E. Masciari, L. Pontieri, and A. Pugliese. 2002. Detecting structural similarities between XML documents. In Proceedings of the International Workshop on the Web and Databases (WebDB 2002). Madison, Wisconsin, USA.
Foulds, L.R. 1992. Graph theory applications. New York, NY: Springer.
Gibson, D., R. Kumar, K.S. McCurley, and A. Tomkins. 2007. Dense subgraph extraction. In Mining graph data, eds. D. Cook and L.B. Holder, 411–441. Hoboken, NJ: Wiley-Interscience.
Gleim, R. 2004. Integrierte Repräsentation, Kategorisierung und Strukturanalyse Web-basierter Hypertexte. Master’s thesis, Technische Universität Darmstadt, Fachbereich Informatik, Sept 2004.
Gleim, R. 2005. HyGraph: Ein Framework zur Extraktion, Repräsentation und Analyse webbasierter Hypertexte. In Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beiträge zur GLDV-Tagung 2005 in Bonn, eds. B. Fisseni, H.-C. Schmitz, B. Schröder, and P. Wagner, 42–53. Frankfurt a.M.: Lang.
Halin, R. 1989. Graphentheorie. Berlin: Akademie Verlag.
Han, J., and M. Kamber. 2001. Data mining: Concepts and techniques. New York, NY: Morgan Kaufmann Publishers.
Harary, F. 1969. Graph theory. Reading, MA: Addison Wesley Publishing Company.
Huberman, B., and L. Adamic. 1999. Growth dynamics of the world-wide web. Nature 399:130.
Jiang, T., L. Wang, and K. Zhang. 1994. Alignment of trees – an alternative to tree edit. In CPM ’94: Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, 75–86, London: Springer-Verlag.
Joshi, S., N. Agrawal, R. Krishnapuram, and S. Negi. 2003. A bag of paths model for measuring structural similarity in web documents. In KDD ’03: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 577–582, New York, NY.
Kaden, F. 1982. Graphmetriken und Distanzgraphen. ZKI-Informationen, Akademie der Wissenschaften der DDR 2(82):1–63.
Kaden, F. 1986. Graphmetriken und Isometrieprobleme zugehöriger Distanzgraphen. ZKI-Informationen, Akademie der Wissenschaften der DDR 1(P6):1–100.
Kleinberg, J.M. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5):604–632.
Kosala, R., and H. Blockeel. 2000. Web mining research: A survey. SIGKDD explorations: Newsletter of the Special Interest Group (SIG) on knowledge discovery & data mining, ACM 2(1):1–15.
Koschützki, D., K.A. Lehmann, L. Peters, S. Richter, D. Tenfelde-Podehl, and O. Zlotkowski. 2005. Clustering. In Centrality indices, eds. U. Brandes and T. Erlebach, Lecture Notes in Computer Science, 16–61. Berlin: Springer.
Kumar, R., P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. The web as a graph. In PODS ’00: Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 1–10, New York, NY: ACM Press.
Levenshtein, V.I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics – Doklady 10(8):707–710.
Lindemann, C., and L. Littig. 2010. Classification of web sites at super-genre level. In Genres on the web: Computational models and empirical studies, eds. A. Mehler, S. Sharoff, and M. Santini, Text, Speech and Language Technology. Dordrecht: Springer.
Mason, O., and M. Verwoerd. 2007. Graph theory and networks in biology. IET Systems Biology 1(2):89–119.
Mehler, A. 2001. Textbedeutung. Zur prozeduralen Analyse und Repräsentation struktureller Ähnlichkeiten von Texten, volume 5 of Sprache, Sprechen und Computer/Computer Studies in Language and Speech. Frankfurt a. M.: Peter Lang.
Mehler, A. 2004. Textmining. In Texttechnologie. Perspektiven und Anwendungen, eds. H. Lobin and L. Lemnitzer, 83–107. Tübingen: Stauffenburg.
Mehler, A. 2009. Generalized shortest paths trees: A novel graph class applied to semiotic networks. In Analysis of complex networks: From biology to linguistics, eds. M. Dehmer and F. Emmert-Streib, 175–220. Weinheim: Wiley-VCH.
Mehler, A. 2010. Structure formation in the web: Toward a graph-theoretical model of hypertext types. In Linguistic modelling of information and markup languages, eds. A. Witt and D. Metzing, 225–247. Dordrecht: Springer.
Mehler, A., and R. Gleim. 2006. The net for the graphs – towards webgenre representation for corpus linguistic studies. In WaCky! Working papers on the web as corpus, eds. M. Baroni and S. Bernardini, 191–224. Bologna: Gedit.
Mehler, A., M. Dehmer, and R. Gleim. 2004. Towards logical hypertext structure – A graph-theoretic perspective. In Proceedings of the Fourth International Workshop on Innovative Internet Computing Systems (I2CS ’04), eds. T. Böhme and G. Heyer, Lecture Notes in Computer Science, vol. 3473, 136–150, Berlin/New York: Springer.
Mehler, A., R. Gleim, and M. Dehmer. 2005. Towards structure-sensitive hypertext categorization. In Proceedings of the 29th Annual Conference of the German Classification Society, LNCS, Mar 9–11. Universität Magdeburg, Berlin/New York, NY: Springer.
Mehler, A., R. Gleim, and A. Wegner. 2007. Structural uncertainty of hypertext types. An empirical study. In Proceedings of the Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 30 Sept 2007, 13–19, in conjunction with RANLP 2007. Borovets, Bulgaria.
Messmer, B.T., and H. Bunke. 1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(5):493–504.
Noller, S., J. Naumann, and T. Richter. 2001. LOGPAT – Ein webbasiertes Tool zur Analyse von Navigationsverläufen in Hypertexten. http://www.psych.uni-goettingen.de/congress/gor-2001
Power, R., D. Scott, and N. Bouayad-Agha. 2003. Document structure. Computational Linguistics 29(2):211–260.
Raghavan, P. 2000. Graph structure of the web: A survey. In LATIN 2000: Theoretical Informatics. Proceedings of 4th Latin American Symposium, 123–125. Punta del Este, Uruguay.
Rahm, E. 2002. Web usage mining. Datenbank-Spektrum 2(2):75–76.
Rehm, G. 2007. Hypertextsorten. Definition – Struktur – Klassifikation. Norderstedt: Books on Demand.
Richter, T., J. Naumann, and S. Noller. 2003. Logpat: A semi-automatic way to analyze hypertext navigation behavior. Swiss Journal of Psychology 62:113–120.
Schädler, C. 1999. Die Ermittlung struktureller Ähnlichkeit und struktureller Merkmale bei komplexen Objekten: Ein konnektionistischer Ansatz und seine Anwendungen. PhD thesis, Technische Universität Berlin.
Scsibrany, H., K. Karlovits, W. Demuth, F. Müller, and K. Varmuza. 2003. Clustering and similarity of chemical structures represented by binary substructure descriptors. Chemometrics and Intelligent Laboratory Systems 67:95–108.
Selkow, S.M. 1977. The tree-to-tree editing problem. Information Processing Letters 6(6):184–186.
Skorobogatov, V.A., and A.A. Dobrynin. 1988. Metrical analysis of graphs. MATCH 23:105–155.
Sobik, F. 1982. Graphmetriken und Klassifikation strukturierter Objekte. ZKI-Informationen, Akademie der Wissenschaften der DDR 2(82):63–122.
Sobik, F. 1986. Modellierung von Vergleichsprozessen auf der Grundlage von Ähnlichkeitsmaßen für Graphen. ZKI-Informationen, Akademie der Wissenschaften der DDR 4:104–144.
Spiliopoulou, M. 2000. Web usage mining for web site evaluation. Communications of the ACM 43(8):127–134.
Tai, K.C. 1979. The tree-to-tree correction problem. Journal of the ACM 26(3):422–433. ISSN 0004-5411.
Waltinger, U., A. Mehler, and A. Wegner. 2009. A two-level approach to web genre classification. In Proceedings of the 5th International Conference on Web Information Systems and Technologies (WEBIST ’09), 23–26 Mar 2009. Lisboa.
Wasserman, S., and K. Faust. 1994. Social network analysis: Methods and applications, Structural Analysis in the Social Sciences. Cambridge, MA: Cambridge University Press.
Zelinka, B. 1975. On a certain distance between isomorphism classes of graphs. Časopis pro pěstování matematiky 100:371–373.

Acknowledgements

We are thankful to Alexander Mehler for fruitful discussions on this topic.

Author information

Authors and Affiliations

Institute of Discrete Mathematics and Geometry, Vienna University of Technology, Vienna, Austria
Matthias Dehmer
Institute for Bioinformatics and Translational Research, Hall in Tyrol, Austria
Matthias Dehmer
Computational Biology and Machine Learning, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen’s University Belfast, Belfast, UK
Frank Emmert-Streib

Corresponding author

Correspondence to Matthias Dehmer.

Editor information

Editors and Affiliations

Text Technology/Applied Comp. Ling., Bielefeld University, Universitätsstrasse 25, Bielefeld, 33615, Germany
Alexander Mehler
LS2 9JT Leeds, United Kingdom
Serge Sharoff
Varvsgatan 25, Stockholm, 117 29, Sweden
Marina Santini

About this chapter

Cite this chapter

Dehmer, M., Emmert-Streib, F. (2010). Mining Graph Patterns in Web-based Systems: A Conceptual View. In: Mehler, A., Sharoff, S., Santini, M. (eds) Genres on the Web. Text, Speech and Language Technology, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9178-9_11

DOI: https://doi.org/10.1007/978-90-481-9178-9_11
Published: 16 August 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9177-2
Online ISBN: 978-90-481-9178-9
eBook Packages: Computer Science, Computer Science (R0)
