Ontology Construction Based on Latent Topic Extraction in a Digital Library

  • Conference paper
Digital Libraries: Universal and Ubiquitous Access to Information (ICADL 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5362)

Abstract

This paper discusses automatic ontology construction in a digital library. Traditional automatic ontology construction uses hierarchical clustering to group similar terms, and the resulting hierarchy is often unsatisfactory to human readers. Human-built knowledge networks carry strong semantic features, but building them is labor-intensive and becomes inconsistent at large scale. The method proposed in this paper combines statistical correction with latent topic extraction over the textual data in a digital library, producing a semantically oriented, OWL-based ontology. The experimental document collection is the Chinese Recorder, which served as a link between the various missions that were part of the rise and heyday of the Western effort to Christianize the Far East. The ontology construction process is described, and a final ontology in OWL format is shown in the results.
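As a rough illustration of the last step the abstract mentions (expressing extracted topics as an OWL ontology), the sketch below serializes a topic-to-term mapping as OWL classes in RDF/XML. It is not the authors' procedure: the function name `topics_to_owl`, the `example.org` namespace, the sample topics and terms, and the choice to model each term as a subclass of its topic are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
ONTOLOGY_IRI = "http://example.org/recorder"  # hypothetical namespace
BASE = ONTOLOGY_IRI + "#"


def topics_to_owl(topics):
    """Serialize {topic_label: [term, ...]} as OWL classes in RDF/XML.

    Assumption: each term becomes a subclass of the topic it belongs to.
    """
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    ET.register_namespace("owl", OWL)

    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ONTOLOGY_IRI})

    for topic, terms in topics.items():
        # One class per latent topic.
        ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + topic})
        for term in terms:
            # One class per term, linked to its topic via rdfs:subClassOf.
            cls = ET.SubElement(
                root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + term}
            )
            ET.SubElement(
                cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + topic}
            )

    return ET.tostring(root, encoding="unicode")


# Hypothetical topics and terms, not taken from the paper's results.
example_topics = {"Education": ["School", "College"], "Medicine": ["Hospital"]}
owl_xml = topics_to_owl(example_topics)
print(owl_xml)
```

The output is a flat RDF/XML document that an ontology editor should be able to load; a real pipeline would first have to derive the topics and terms from the collection.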

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yeh, Jh., Yang, N. (2008). Ontology Construction Based on Latent Topic Extraction in a Digital Library. In: Buchanan, G., Masoodian, M., Cunningham, S.J. (eds) Digital Libraries: Universal and Ubiquitous Access to Information. ICADL 2008. Lecture Notes in Computer Science, vol 5362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89533-6_10

  • DOI: https://doi.org/10.1007/978-3-540-89533-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89532-9

  • Online ISBN: 978-3-540-89533-6

  • eBook Packages: Computer Science, Computer Science (R0)