Integrating Concepts and Knowledge in Large Content Networks

Rossetti, Marco; Pareschi, Remo; Stella, Fabio; Fontana, Francesca Arcelli

doi:10.1007/s00354-014-0407-4

Published: 27 August 2014

Volume 32, pages 309–330, (2014)

New Generation Computing

Marco Rossetti¹, Remo Pareschi², Fabio Stella¹ & Francesca Arcelli Fontana¹

Abstract

Large content networks like the World Wide Web contain huge amounts of information that could be integrated, because their components fit within common concepts or are connected through hidden, implicit relationships. One attempt at such integration is the program called the “Web of Data,” an evolution of the Semantic Web. It targets semi-structured information sources such as Wikipedia, turns them into fully structured Web-based databases like DBpedia, and then integrates them with other public databases such as GeoNames. However, the vast majority of the information residing on the Web is still totally unstructured, and this is the starting point for our approach, which aims to integrate unstructured information sources. For this purpose, we use techniques from Probabilistic Topic Modeling to cluster Web pages into concepts (topics), which are then related through higher-level concept networks; we also make implicit semantic relationships emerge between individual Web pages. The approach has been tested through a number of case studies, which are described here. While the applied focus of this research is knowledge integration in the specific and relevant case of the WWW, the wider aim is to provide an integration framework generally applicable to all complex content networks where information propagates from multiple sources.
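
The abstract does not say which topic model or linking rule the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: pages are clustered into topics with a small collapsed Gibbs sampler for latent Dirichlet allocation, topics are linked into a concept network when their word distributions are similar, and pages are linked when their topic mixtures are similar. The toy pages, the cosine-similarity rule, the thresholds, and every function name here are hypothetical illustrations, not the paper's method.

```python
import math
import random


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word w in topic k
    topic_total = [0] * num_topics                      # tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed estimates: theta = page-topic mixtures, phi = topic-word distributions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_edges(vectors, threshold):
    """Undirected edges (i, j, score) between vectors whose cosine >= threshold."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                edges.append((i, j, score))
    return edges


# Hypothetical toy "Web pages", already tokenized.
pages = [
    "semantic web linked data ontology".split(),
    "linked data web ontology database".split(),
    "topic model cluster pages probabilistic".split(),
    "probabilistic topic model cluster web".split(),
]
theta, phi, vocab = fit_lda(pages, num_topics=2)
concept_links = similarity_edges(phi, threshold=0.2)   # topic-to-topic network
page_links = similarity_edges(theta, threshold=0.5)    # implicit page-to-page links
```

With `theta` and `phi` in hand, `concept_links` plays the role of a higher-level concept network and `page_links` the role of implicit relationships between pages; on a real corpus the thresholds and the number of topics would need tuning, and a library implementation of the topic model would be the more practical choice.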

Author information

Authors and Affiliations

Department of Informatics, Systems and Communication, University of Milano - Bicocca, Milan, Italy
Marco Rossetti, Fabio Stella & Francesca Arcelli Fontana
Department of Bioscience and Territory, University of Molise, Campobasso, Italy
Remo Pareschi

Corresponding author

Correspondence to Marco Rossetti.

About this article

Cite this article

Rossetti, M., Pareschi, R., Stella, F. et al. Integrating Concepts and Knowledge in Large Content Networks. New Gener. Comput. 32, 309–330 (2014). https://doi.org/10.1007/s00354-014-0407-4

Received: 09 September 2013
Revised: 17 March 2014
Published: 27 August 2014
Issue Date: August 2014
DOI: https://doi.org/10.1007/s00354-014-0407-4
