
Text Categorization for Generation of a Historical Shipbuilding Ontology

  • Conference paper
Knowledge Engineering and the Semantic Web (KESW 2014)

Abstract

This paper deals with the task of developing a text corpus for the automatic generation of a historical shipbuilding domain ontology. Standard methods of analysis produce unsatisfactory results because of the limited range of available texts and the lexical evolution of the language. In this work, a parser developed by the authors is used for lemmatization and word-sense disambiguation. The parser is based on an external classifier and establishes an unambiguous relationship between each lexeme and a class. Documents are represented as vectors in a topic space. The experiments show that the proposed categorization method produces results very close to expert opinion while remaining sufficiently robust to historical changes in vocabulary.
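The abstract describes documents represented as vectors in a topic space, built on an unambiguous mapping from each lexeme to a class. The following is a minimal Python sketch of that idea under loudly stated assumptions: the dictionary `LEXEME_CLASS` is a hypothetical stand-in for the authors' parser and external classifier, the classes `hull`/`rigging`/`armament` and the category prototypes are invented for illustration, and cosine similarity is a swapped-in choice for comparing vectors. The paper's actual classes, features and similarity measure are not given here.

```python
from collections import Counter
import math

# HYPOTHETICAL stand-in for the authors' parser + external classifier:
# each lemma maps to exactly one class (an unambiguous lexeme-to-class link).
# "shrowd" is an illustrative variant spelling mapped to the same class as
# "shroud", to show how such a mapping can absorb vocabulary change.
LEXEME_CLASS = {
    "keel": "hull", "frame": "hull", "plank": "hull",
    "mast": "rigging", "sail": "rigging",
    "shroud": "rigging", "shrowd": "rigging",
    "cannon": "armament", "gunport": "armament",
}

# Axes of the (hypothetical) topic space.
CLASSES = ["hull", "rigging", "armament"]

# HYPOTHETICAL category prototypes, e.g. as an expert might describe them.
CATEGORY_PROTOTYPES = {
    "construction": [0.7, 0.3, 0.0],
    "rigging": [0.1, 0.9, 0.0],
    "artillery": [0.1, 0.1, 0.8],
}


def topic_vector(lemmas):
    """Share of each class among the document's mapped lemmas."""
    # Lemmas missing from the mapping are skipped.
    counts = Counter(LEXEME_CLASS[w] for w in lemmas if w in LEXEME_CLASS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CLASSES)
    return [counts[c] / total for c in CLASSES]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def categorize(lemmas):
    """Assign the document to the most similar category prototype."""
    vec = topic_vector(lemmas)
    return max(CATEGORY_PROTOTYPES,
               key=lambda name: cosine(vec, CATEGORY_PROTOTYPES[name]))


if __name__ == "__main__":
    doc = ["keel", "frame", "plank", "mast", "anchor"]
    print(topic_vector(doc))  # [0.75, 0.25, 0.0]
    print(categorize(doc))    # construction
```

Because each lemma resolves to a single class, a variant spelling that the mapping recognizes yields the same topic vector as its modern form. Replacing cosine with another similarity measure, or the hand-written prototypes with categories derived from expert-labelled documents, would keep the overall shape of the sketch.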


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Artemova, G. et al. (2014). Text Categorization for Generation of a Historical Shipbuilding Ontology. In: Klinov, P., Mouromtsev, D. (eds) Knowledge Engineering and the Semantic Web. KESW 2014. Communications in Computer and Information Science, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-319-11716-4_1

  • DOI: https://doi.org/10.1007/978-3-319-11716-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11715-7

  • Online ISBN: 978-3-319-11716-4

  • eBook Packages: Computer Science, Computer Science (R0)
