ABSTRACT
Ontology-based document classification relies on background knowledge captured in ontologies to represent documents. This knowledge is typically embedded in a document using exact matching, which maps a term to a concept only when the concept's label occurs explicitly in the document. Searching only for concept labels limits the ability to capture and exploit the full conceptualization behind user information and content meaning. To address this limitation, we propose a new ontology-based document classification model. The proposed model uses background knowledge derived from ontologies for document representation. It associates a document with a set of concepts not only through exact matching but also by identifying and extracting new terms that are semantically related to the ontology's concepts. In addition, the model employs a new concept weighting technique that computes the weight of a concept from its relevance and its importance. We tested the model in several experiments using a real ontology and a dataset. Experiments with three classification algorithms compared the baseline ontology, the improved concept vector space model with the new concept weighting technique, and the enriched ontology; the results show that the proposed model considerably improves classification performance.
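As a rough illustration of the exact matching step described above, the following Python sketch counts how often each concept label occurs verbatim in a document. It is not the authors' implementation: the function name `exact_match_concepts`, the toy ontology, and the sample sentence are all assumptions made for this example, and the concept weighting technique is not modelled because the abstract does not give its formula.

```python
import re
from collections import Counter


def exact_match_concepts(document, concept_labels):
    """Count verbatim occurrences of each concept label in the document.

    concept_labels maps a (hypothetical) concept name to its label.
    """
    text = document.lower()
    counts = Counter()
    for concept, label in concept_labels.items():
        # Word boundaries keep "bank" from matching inside "banking".
        pattern = r"\b" + re.escape(label.lower()) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[concept] = hits
    return counts


# Toy ontology and document, invented for illustration only.
ontology = {"Loan": "loan", "InterestRate": "interest rate", "Bank": "bank"}
doc = ("The bank raised the interest rate on every loan. "
       "Borrowing costs rose, and lending slowed.")

vector = exact_match_concepts(doc, ontology)
print(dict(vector))
```

In this toy case, "borrowing" and "lending" contribute nothing to the `Loan` concept because neither is a concept label, which is the limitation the proposed model targets by extracting semantically related terms.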
Index Terms
- Supervised Ontology-Based Document Classification Model