Abstract
The rapid growth of the web generates large volumes of heterogeneous information, such as images, text, audio, and video, through various applications. These applications use different layouts to present that information. Tabular layouts in particular are burdened with anomalies, which has prompted intensive research into the semantification of web content and the organization of tabular data for knowledge sharing and acquisition. These anomalies leave tabular data without an adequate semantic representation and pose new challenges for data modeling. In this paper, we discuss the anomalies in tabular data that pertain to ontology learning and population tasks, and we provide a semantification of tabular data. To this end, (1) we list the anomalies that pertain to semantification and provide resolutions for them along with the semantification of tabular data, and (2) we establish an algorithm that interprets the table structure into a formal representation in order to analyze the anomalies and provide their resolution. Furthermore, we compare the proposed approach with existing approaches in terms of ontology elements, the ability to resolve anomalies, and the time complexity of ontology population.
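As a rough illustration of what interpreting a table into a formal representation and checking it for anomalies could look like, the following minimal Python sketch flags a few simple structural problems and then emits subject-predicate-object triples. It is not the algorithm proposed in this paper: the function names `find_anomalies` and `table_to_triples`, the choice of the first column as the subject key, and the three anomaly types shown (missing or duplicate headers, ragged rows, empty cells) are all assumptions made for the example.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def find_anomalies(header: List[str], rows: List[List[str]]) -> List[str]:
    """Report simple structural anomalies (illustrative, hypothetical checks)."""
    anomalies = []
    seen = set()
    for col, name in enumerate(header):
        label = name.strip()
        if not label:
            anomalies.append(f"column {col}: missing header")
        elif label in seen:
            anomalies.append(f"column {col}: duplicate header '{label}'")
        seen.add(label)
    for r, row in enumerate(rows):
        if len(row) != len(header):
            anomalies.append(
                f"row {r}: expected {len(header)} cells, found {len(row)}"
            )
        for c, cell in enumerate(row):
            if not cell.strip():
                anomalies.append(f"row {r}, column {c}: empty cell")
    return anomalies


def table_to_triples(
    header: List[str], rows: List[List[str]], key_col: int = 0
) -> List[Triple]:
    """Turn well-formed rows into (subject, predicate, object) triples.

    Assumption: the cell in key_col names the row's subject, and each
    remaining header acts as the predicate for its column.
    """
    triples = []
    for row in rows:
        # Skip ragged rows and rows without a usable subject.
        if len(row) != len(header) or not row[key_col].strip():
            continue
        subject = row[key_col].strip()
        for c, cell in enumerate(row):
            # Skip the key column, empty cells, and unnamed columns.
            if c == key_col or not cell.strip() or not header[c].strip():
                continue
            triples.append((subject, header[c].strip(), cell.strip()))
    return triples


# Made-up example table containing three anomalies.
header = ["Country", "Capital", ""]
rows = [
    ["India", "New Delhi", "x"],
    ["France", "", "y"],
    ["Japan", "Tokyo"],
]
anomalies = find_anomalies(header, rows)
triples = table_to_triples(header, rows)
```

In this sketch, anomalous cells and rows are simply skipped during triple generation; a real resolution strategy, such as the one the paper proposes, would repair or reinterpret them instead.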
Data Availability
The data are available upon reasonable request to the corresponding authors.
Acknowledgements
The authors would like to thank the National Institute of Technology, Kurukshetra, India, for financially supporting the research work.
Funding
National Institute of Technology, Kurukshetra, India, funded this research under the Institute fellowship.
Author information
Contributions
Both authors discussed and developed the ideas, designed the ontology model, and wrote the paper together.
Ethics declarations
Conflict of interest
The authors have no conflict of interest regarding the publication.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Sharma, S., Jain, S. Anomalies resolution and semantification of tabular data. J Supercomput 80, 18081–18117 (2024). https://doi.org/10.1007/s11227-024-06147-0
DOI: https://doi.org/10.1007/s11227-024-06147-0