Abstract
In this paper, we introduce a quantitative graph model of social ontologies, exemplified by the category system of Wikipedia. The model serves to contrast structure formation in distributed cognition with classification schemes (exemplified by the DDC and MeSH), formal ontologies (exemplified by OpenCyc and SUMO), and terminological ontologies (exemplified by WordNet). Our basic finding is that social ontologies have a characteristic topology that clearly separates them from these other types of ontologies. In this context, we introduce the notion of Zipfian bipartivity to analyze the relationship between categories and categorized units in distributed cognition.
MSC2000 Primary 05C75; Secondary 05C82, 68T50, 90B15, 91D30, 91F20.
Notes
- 1.
Note that within the social ontologies analyzed here, i.e. Wikipedia category systems, moderation may occur.
- 2.
Such links are anchored within the body of an article, but not at the end of it where categorization links are located.
- 3.
The order of a graph equals the number of its vertices [32].
- 4.
Note that the Wikipedia documentation suggests that the category system is free of cycles and spans DAGs (cf. http://en.wikipedia.org/wiki/Wikipedia:Categorization). We show more precisely the degree to which this holds only approximately.
- 5.
This approach is in line with [13], who develops information-theoretic indices of graphs and their topology. See also [87] for a related approach in the area of quantitative biology. A second impulse comes from [22], who calculates entropies of probability distributions of vertices in complex networks. Despite this commonality, we deal with complex, nearly DAG-like graphs as opposed to complex networks. In any event, it is our conviction that the analysis of graph structures can gain invaluable insights from these two approaches, beyond what has been done so far in complex network theory.
- 6.
Note that the graphs in Fig. 10.5 denote different scenarios only schematically.
- 7.
Note that we use the terms index and measure synonymously (cf. [41]).
- 8.
Of course, every weakly connected component of the [ ]-variant of the SOC has at least one source. This information is already implicitly explored by means of the connected component statistics. Thus, we focus on operating on the LCC of each SOG when calculating the multiplicity index.
- 9.
Note that μ = 0.514 and σ = 0.3621.
- 10.
This also means that if we disregard the multiplicity of sources we may say that SOGs tend to be tree-like.
- 11.
By D − u we denote in the usual way the subdigraph of D induced by V ∖ {u}.
- 12.
- 13.
At this point one might ask why we do not use a simpler notion of cyclicity [38], for example by counting the number of edges that must be deleted in order to make a graph acyclic. The simple reason is that index (10.10) is more informative about the DAG-like structure of a SOG, as it additionally accounts for the impact of multiple sources.
- 14.
- 15.
- 16.
Note that this main category is always unique.
- 17.
An obvious alternative to this approach would be to analyze the distribution of imbalance values in a SOG – this will be one reference point for future work.
- 18.
Botafogo et al. [14] already utilized depth as a reference quantity of imbalance. However, they unnecessarily define it by means of a recursive function and do not demonstrate its empirical significance.
- 19.
Without following this line of research here, it may be interesting to consider this level of maximum order from the point of view of conceptual levels in prototype theory [74].
- 20.
See [54] who have introduced this notion in the area of modeling web genres.
- 21.
- 22.
- 23.
- 24.
Thus, the reader should not confuse ℂ_r with a random uniform distribution of the objects over the target classes.
- 25.
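The graph notions used in Notes 3, 4, 8, and 11 above (order, acyclicity, sources, and the subdigraph D − u) can be made concrete with a minimal sketch. It is not taken from the chapter: the dictionary-based digraph representation, the function names, the edge direction, and the three-vertex toy graph are all assumptions chosen for illustration, and every vertex is assumed to appear as a key.

```python
from typing import Dict, Hashable, Set

# Assumed representation: vertex -> set of successors; every vertex is a key.
Digraph = Dict[Hashable, Set[Hashable]]


def order(d: Digraph) -> int:
    """Order of a graph: the number of its vertices (Note 3)."""
    return len(d)


def sources(d: Digraph) -> Set[Hashable]:
    """Vertices without incoming edges (the sources mentioned in Note 8)."""
    has_incoming = {w for succ in d.values() for w in succ}
    return set(d) - has_incoming


def remove_vertex(d: Digraph, u: Hashable) -> Digraph:
    """D - u: the subdigraph induced by V minus {u} (Note 11)."""
    return {v: succ - {u} for v, succ in d.items() if v != u}


def is_acyclic(d: Digraph) -> bool:
    """Kahn-style test: repeatedly strip sources; only a DAG is fully consumed."""
    indeg = {v: 0 for v in d}
    for succ in d.values():
        for w in succ:
            indeg[w] += 1
    stack = [v for v, k in indeg.items() if k == 0]
    seen = 0
    while stack:
        v = stack.pop()
        seen += 1
        for w in d[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                stack.append(w)
    return seen == len(d)


# Hypothetical toy graph: one source "a" and a 2-cycle between "b" and "c".
toy = {"a": {"b", "c"}, "b": {"c"}, "c": {"b"}}
print(order(toy), sources(toy), is_acyclic(toy))
print(is_acyclic(remove_vertex(toy, "c")))
```

In the toy graph, deleting the vertex "c" removes the only cycle, which illustrates in miniature what "nearly acyclic" means here: a graph that becomes a DAG after a small modification.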
References
Altmann, G.: Semantische Diversifikation. Folia Ling. 19, 177–200 (1985)
Altmann, G., Köhler, R.: “Language forces” and synergetic modelling of language phenomena. In: Glottometrika, vol. 15, pp. 62–76. Brockmeyer, Bochum (1996)
Altmann, G., Lehfeldt, W.: Allgemeine Sprachtypologie. Fink, München (1973)
Baldi, P., Frasconi, P., Smyth, P.: Modeling the Internet and the Web. Wiley, Chichester (2003)
Bales, M.E., Lussier, Y.A., Johnson, S.B.: Topological analysis of large-scale biomedical terminology structures. J. Am. Med. Informat. Assoc. 14(6), 788–797 (2007)
Bang-Jensen, J., Gutin, G.: Digraphs. Theory, Algorithms and Applications. Springer, London/Berlin (2006)
Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
Barrat, A., Barthélemy, M., Vespignani, A.: Dynamical Processes on Complex Networks. Cambridge University Press, Cambridge (2008)
Barthélemy, M.: Betweenness centrality in large complex networks. Eur. Phys. J. B 38, 163–168 (2004)
Berwanger, D., Dawar, A., Hunter, P., Kreutzer, S.: DAG-width and parity games. In: Durand, B., Thomas, W. (eds.) STACS, vol. 3884, Lecture Notes in Computer Science, pp. 524–536. Springer, Berlin (2006)
Bickhard, M.H.: Social ontology as convention. Topoi 27(1-2), 139–149 (2008)
Blohm, S., Kroetzsch, M., Cimiano, P.: Integrating the fast and the numerous – combining machine and community intelligence for semantic annotation and Wikipedia: Folksonomy meets rigorously defined common-sense. In: Proceedings of AAAI 2008 Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy (WikiAI08), Chicago (2008)
Bonchev, D.: Information Theoretic Indices for Characterization of Chemical Structures. Research Studies Press, Chichester (1983)
Botafogo, R.A., Rivlin, E., Shneiderman, B.: Structural analysis of hypertexts: Identifying hierarchies and useful metrics. ACM Trans. Inform. Syst. 10(2), 142–180 (1992)
Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of lexical semantic relatedness. Comput. Ling. 32(1), 13–47 (2006)
Caldarelli, G.: Scale-Free Networks. Complex webs in nature and technology. Oxford University Press, Oxford (2008)
Capocci, A., Caldarelli, G.: Folksonomies and clustering in the collaborative system CiteULike. J. Phys. A Math. Theor. 41, 224016 (2008)
Capocci, A., Rao, F., Caldarelli, G.: Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia. Europhys. Lett. 81, 28006 (2008)
Cattuto, C., Barrat, A., Baldassarri, A., Schehr, G., Loreto, V.: Collective dynamics of social annotation. PNAS 106(26), 10511–10515 (2009)
Cattuto, C., Benz, D., Hotho, A., Stumme, G.: Semantic grounding of tag relatedness in social bookmarking systems. In: The Semantic Web – ISWC 2008, vol. 5318, Lecture Notes in Computer Science, pp. 615–631. Springer, Berlin, Heidelberg (2008)
Chernov, S., Iofciu, T., Nejdl, W., Zhou, X.: Extracting semantic relationships between Wikipedia categories. In: 1st International Workshop: From Wiki to Semantics (SemWiki 2006), co-located with ESWC 2006, Budva, Montenegro, June 12, 2006
Dehmer, M.: Information processing in complex networks: Graph entropy and information functionals. Appl. Math. Comput. 201, 82–94 (2008)
Dehmer, M., Mehler, A.: A new method of measuring the similarity for a special class of directed graphs. Tatra Mountains Math. Publ. 36, 39–59 (2007)
Dehmer, M., Mowshowitz, A.: A history of graph entropy measures. Inform. Sci. 181(1), 57–78 (2011)
Dellschaft, K., Staab, S.: An epistemic dynamic model for tagging systems. In: Hypertext 2008, Proceedings of the 19th ACM Conference on Hypertext and Hypermedia, June 19–21, 2008, Pittsburgh, Pennsylvania, USA (2008)
Estrada, E.: Protein bipartivity and essentiality in the yeast protein-protein interaction network. J. Proteome Res. 5(9), 2177–2184 (2006)
Estrada, E., Rodríguez-Velázquez, J.A.: Spectral measures of bipartivity in complex networks. Phys. Rev. E 72(4), 046105 (2005)
Fellbaum, C., (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Firth, J.R.: A synopsis of linguistic theory, 1933–1955. In: Firth, J.R., (ed.) Studies in Linguistic Analysis, pp. 1–32. Blackwell, Oxford (1957)
Freyd, J.J.: Shareability: The social psychology of epistemology. Cognit. Sci. 7, 191–210 (1983)
Hammwöhner, R.: Interlingual aspects of Wikipedia’s quality. In: Proceedings of the International Conference on Information Quality (ICIQ 2007) (2007)
Harary, F.: Graph Theory. Addison Wesley, Boston (1969)
Hollan, J., Hutchins, E., Kirsh, D.: Distributed cognition: toward a new foundation for human-computer interaction research. ACM Trans. Comput. Hum. Interact. 7(2), 174–196 (2000)
Holme, P., Liljeros, F., Edling, C.R., Kim, B.J.: On network bipartivity. Phys. Rev. E 68, 056107 (2003)
Hotho, A., Jäschke, R., Schmitz, C., Stumme, G.: BibSonomy: A social bookmark and publication sharing system. In: Proceedings of the Workshop on Tool Interoperability at the International Conference on Conceptual Structures 2006, pp. 87–102 (2006)
Hotho, A., Nürnberger, A., Paaß, G.: A brief survey of text mining. J. Lang. Tech. Comput. Ling. 20(1), 19–62 (2005)
Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Discovering shared conceptualizations in folksonomies. Web Semant. Sci. Serv. Agents World Wide Web 6(1), 38–53 (2008)
Klein, D.J., Ivanciuc, O.: Graph cyclicity, excess conductance, and resistance deficit. J. Math. Chem. 30(3), 271–287 (2001)
Köhler, R.: Systems theoretical linguistics. Theor. Ling. 14(2/3), 241–257 (1987)
Köhler, R.: Syntactic structures, properties and interrelations. J. Quant. Ling. 6, 46–57 (1999)
Koschützki, D., Lehmann, K.A., Peeters, L., Richter, S., Tenfelde-Podehl, D., Zlotowski, O.: Centrality indices. In: Brandes, U., Erlebach, T., (eds.) Network Analysis, vol. 3418, Lecture Notes in Computer Science, pp. 16–61. Springer, Berlin (2004)
Kunze, C., Lemnitzer, L.: GermaNet – representation, visualization, application. In: Rodriguez, M., González, Paz Suárez Araujo, C., (eds.) Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002), pp. 1485–1491. European Language Resources Association, Paris (2002)
Lambiotte, R., Ausloos, M.: Collaborative tagging as a tripartite network. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J., (eds.) International Conference on Computational Science (3), vol. 3993, Lecture Notes in Computer Science, pp. 1114–1117. Springer, Berlin (2006)
Lenat, D.B.: CYC: A large-scale investment in knowledge infrastructure. Comm. ACM 38, 33–38 (1995)
Medelyan, O., Witten, I.H., Milne, D.: Topic indexing with Wikipedia. In: Proceedings of AAAI 2008 Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy (WikiAI08), Chicago (2008)
Mehler, A.: Text linkage in the wiki medium – a comparative study. In: Karlgren, J., (ed.) Proceedings of the EACL Workshop on New Text – Wikis and blogs and other dynamic text sources, pp. 1–8. Trento, Italy (2006)
Mehler, A.: Large text networks as an object of corpus linguistic studies. In: Lüdeling, A., Kytö, M., (eds.) Corpus Linguistics. An International Handbook of the Science of Language and Society, pp. 328–382. De Gruyter, Berlin/New York (2008)
Mehler, A.: On the impact of community structure on self-organizing lexical networks. In: Smith, A.D.M., Smith, K., Ferrer i Cancho, R., (eds.) Proceedings of the 7th Evolution of Language Conference (Evolang7), pp. 227–234. World Scientific, Barcelona (2008)
Mehler, A.: Structural similarities of complex networks: A computational model by example of wiki graphs. Appl. Artif. Intell. 22(7&8), 619–683 (2008)
Mehler, A.: Generalized shortest paths trees: A novel graph class applied to semiotic networks. In: Dehmer, M., Emmert-Streib, F., (eds.) Analysis of Complex Networks: From Biology to Linguistics, pp. 175–220. Wiley-VCH, Weinheim (2009)
Mehler, A.: Minimum spanning Markovian trees: Introducing context-sensitivity into the generation of spanning trees. In: Dehmer, M., (ed.) Structural Analysis of Complex Networks, pp. 381–401. Birkhäuser/Basel (2010)
Mehler, A., Geibel, P., Pustylnikov, O.: Structural classifiers of text types: Towards a novel model of text representation. J. Lang. Tech. Comput. Ling. 22(2), 51–66 (2007)
Mehler, A., Gleim, R., Ernst, A., Waltinger, U.: WikiDB: Building interoperable wiki-based knowledge resources for semantic databases. Sprache und Datenverarbeitung Int. J. Lang. Data Process. 32(1), 47–70 (2008)
Mehler, A., Gleim, R., Wegner, A.: Structural uncertainty of hypertext types. An empirical study. In: Proceedings of the Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, in conjunction with RANLP 2007, pp. 13–19. Borovets, Bulgaria (2007)
Mel’čuk, I.: Dependency Syntax: Theory and Practice. SUNY, Albany (1988)
Mika, P.: Ontologies are us: A unified model of social networks and semantics. J. Web Semant. 5(1), 5–15 (2007)
Mika, P., Gangemi, A.: Descriptions of social relations. In: Proceedings of the 1st Workshop on Friend of a Friend, Social Networking and the (Semantic) Web (2004)
Milne, D., Witten, I.H.: An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In: Proceedings of AAAI 2008 Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy (WikiAI08), Chicago (2008)
Naranan, S., Balasubrahmanyan, V.K.: Models for power law relations in linguistics and information science. J. Quant. Ling. 5(1-2), 35–61 (1998)
Nelson, S.J., Johnston, W.D., Humphreys, B.L.: Relationships in medical subject headings. In: Bean, C.A., Green, R., (eds.) Relationships in the organization of knowledge, pp. 171–184. Kluwer Academic Publishers, New York (2001)
Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003)
Newman, M.E.J.: Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 46, 323–351 (2005)
Newman, M.E.J., Park, J.: The origin of degree correlations in the internet and other networks. Phys. Rev. E 68, 026121 (2003)
Niles, I., Pease, A.: Towards a standard upper ontology. In: Welty, C., Smith, B., (eds.) Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Ogunquit, Maine (2001)
Obdržálek, J.: DAG-width: connectivity measure for directed graphs. In: SODA’06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 814–821. ACM, New York, NY, USA (2006)
OCLC: Dewey decimal classification summaries. A brief introduction to the Dewey Decimal Classification. http://www.oclc.org/dewey/resources/summaries/default.htm [accessed February 15, 2009] (2008)
OpenCyc.org: OpenCyc documentation. http://www.opencyc.org/doc [accessed February 15, 2009] (2008)
Pastor-Satorras, R., Vázquez, A., Vespignani, A.: Dynamical and correlation properties of the internet. Phys. Rev. Lett. 87(25), 268701 (2001)
Ponzetto, S., Strube, M.: Deriving a large scale taxonomy from Wikipedia. In: Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI-07), pp. 1440–1447. Vancouver, B.C., Canada (2007)
Pustylnikov, O., Mehler, A.: Structural differentiae of text types. A quantitative model. In: Proceedings of the 31st Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications (GfKl), pp. 655–662 (2007)
Pustylnikov, O., Mehler, A.: Text classification by means of structural features. What kind of information about texts is captured by their structure? In: Proceedings of RUSSIR’08. Taganrog, Russia (2008)
Abramov, O., Mehler, A.: Automatic language classification by means of syntactic dependency networks. J. Quant. Ling. (2011) (accepted)
Ravasz, E., Barabási, A.-L.: Hierarchical organization in complex networks. Phys. Rev. E 67, 026112 (2003)
Rosch, E.: Principles of categorization. In: Rosch, E., Lloyd, B.B., (eds.) Cognition and Categorization, pp. 27–48. Erlbaum, Hillsdale, N.J. (1978)
Santini, M.: Characterizing genres of web pages: Genre hybridism and individualization. In: Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS’07) (2007)
Saunders, S.: Improved shortest path algorithms for nearly acyclic graphs. Ph.D. thesis, University of Canterbury, Computer Science (2004)
Saunders, S., Takaoka, T.: Improved shortest path algorithms for nearly acyclic graphs. Theor. Comput. Sci. 293(3), 535–556 (2003)
Saunders, S., Takaoka, T.: Solving shortest paths efficiently on nearly acyclic directed graphs. Theor. Comput. Sci. 370(1-3), 94–109 (2007)
Searle, J.R.: Social ontology. Some basic principles. Anthropol. Theor. 6(1), 12–29 (2006)
Skorobogatov, V.A., Dobrynin, A.A.: Metrical analysis of graphs. MATCH 23, 105–155 (1988)
Sowa, J.F.: Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, Pacific Grove (2000)
Steels, L.: Collaborative tagging as distributed cognition. Pragmatics Cognit. 14(2), 287–292 (2006)
Steyvers, M., Tenenbaum, J.: The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognit. Sci. 29(1), 41–78 (2005)
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: WWW’07: Proceedings of the 16th international conference on World Wide Web, pp. 697–706. ACM, New York, NY, USA (2007)
Takaoka, T.: Shortest path algorithms for nearly acyclic directed graphs. Theor. Comput. Sci. 203(1), 143–150 (1998)
Tuldava, J.: Probleme und Methoden der quantitativ-systemischen Lexikologie. Wissenschaftlicher Verlag, Trier (1998)
Ulanowicz, R.E.: Identifying the structure of cycling in ecosystems. Math. Biosci. 65(2), 219–237 (1983)
Voss, J.: Collaborative thesaurus tagging the Wikipedia way. arXiv.org:cs/0604036 (2006)
Waltinger, U., Mehler, A., Heyer, G.: Towards automatic content tagging: Enhanced web services in digital libraries using lexical chaining. In: Cordeiro, J., Filipe, J., Hammoudi, S., (eds.) 4th Int. Conf. on Web Information Systems and Technologies (WEBIST ’08), pp. 231–236. INSTICC Press, Barcelona, Funchal, Portugal (2008)
Watts, D.J.: Six Degrees. The Science of a Connected Age. W. W. Norton & Company, New York/London (2003)
Zelinka, B.: Nearly acyclic digraphs. Czech. Math. J. 33(1), 164–165 (1983)
Zipf, G.K.: Human Behavior and the Principle of Least Effort. An Introduction to Human Ecology. Hafner Publishing Company, New York (1972)
Zlatic, V., Bozicevic, M., Stefancic, H., Domazet, M.: Wikipedias: Collaborative web-based encyclopedias as complex networks. Phys. Rev. E 74, 016115 (2006)
Acknowledgment
Financial support of the German Federal Ministry of Education (BMBF) through the research project Linguistic Networks, of the German Research Foundation (DFG) through the Excellence Cluster 277 Cognitive Interaction Technology (via the Project KnowCIT) and of the SFB 673 Alignment in Communication (via the Project A3 Dialogue Games and Group Dynamics and X1 Multimodal Alignment Corpora: Statistical Modeling and Information Management) is gratefully acknowledged. We also thank Dietmar Esch, Tobias Feith, and Roman Pustylnikov for the download of ontologies as well as Rüdiger Gleim, Olga Abramov, and Paul Warner for their fruitful hints which helped to reduce the number of errors in this chapter.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Mehler, A. (2011). Social Ontologies as Generalized Nearly Acyclic Directed Graphs: A Quantitative Graph Model of Social Tagging. In: Dehmer, M., Emmert-Streib, F., Mehler, A. (eds) Towards an Information Theory of Complex Networks. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-0-8176-4904-3_10
DOI: https://doi.org/10.1007/978-0-8176-4904-3_10
Publisher Name: Birkhäuser, Boston, MA
Print ISBN: 978-0-8176-4903-6
Online ISBN: 978-0-8176-4904-3
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)