Anomalies resolution and semantification of tabular data

The Journal of Supercomputing

Abstract

The rapid growth of the web generates a large amount of heterogeneous information, such as images, text, audio, and video, through various applications. These applications use different layouts to present that information. Tabular layouts in particular are laden with anomalies, which has prompted intensive research into the semantification of web content and the organization of tabular data for knowledge sharing and acquisition. These anomalies deprive tabular data of a semantic representation and pose new challenges in data modeling. In this paper, we discuss the anomalies in tabular data that pertain to ontology learning and population tasks, and we provide a semantification of tabular data. To this end, (1) we list the anomalies that pertain to semantification and provide resolutions for them, along with the semantification of tabular data, and (2) we establish an algorithm that interprets the table structure into a formal representation in order to analyze the anomalies and resolve them. Furthermore, we compare the proposed approach with existing approaches in terms of ontology elements, the ability to resolve anomalies, and the time complexity of ontology population.
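The abstract does not enumerate the anomalies or reproduce the algorithms, so the following is only an illustrative sketch of the general idea of resolving anomalies before semantification, not the authors' method. It assumes three hypothetical anomaly kinds (blank headers, duplicate headers, and empty cells) and a hypothetical `ex:` prefix; the function names `resolve_header_anomalies` and `table_to_triples` are likewise invented for illustration.

```python
from collections import Counter


def resolve_header_anomalies(headers):
    """Give every column a unique, non-empty name (hypothetical resolution rule)."""
    resolved, seen = [], Counter()
    for index, name in enumerate(headers):
        # Blank header: fall back to a positional placeholder name.
        name = (name or "").strip() or f"column_{index + 1}"
        seen[name] += 1
        # Duplicate header: append an occurrence counter to later copies.
        resolved.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return resolved


def table_to_triples(headers, rows, base="ex:"):
    """Turn each non-empty cell into a (subject, predicate, object) triple."""
    columns = resolve_header_anomalies(headers)
    triples = []
    for row_number, row in enumerate(rows, start=1):
        subject = f"{base}row{row_number}"
        for column, cell in zip(columns, row):
            if cell is None or str(cell).strip() == "":
                continue  # empty cell: skip rather than assert a blank value
            triples.append((subject, f"{base}{column}", str(cell).strip()))
    return triples


# Toy table with a blank header, a duplicate header, and two empty cells.
headers = ["name", "", "name"]
rows = [["Alice", "30", "Smith"], ["Bob", "", None]]
triples = table_to_triples(headers, rows)
```

Each row becomes a subject and each resolved column name a predicate, so the output could be loaded into an RDF store after mapping the placeholder names to real ontology properties. The renaming rule is deliberately naive: it does not guard against a generated name such as `name_2` colliding with a header that already exists.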

Data Availability

The data are available upon reasonable request to the corresponding authors.

Acknowledgements

The authors would like to thank the National Institute of Technology, Kurukshetra, India, for financially supporting the research work.

Funding

National Institute of Technology, Kurukshetra, India, funded this research under the Institute fellowship.

Author information

Contributions

Both authors discussed and developed the ideas, designed the ontology model, and wrote the paper together.

Corresponding author

Correspondence to Sumit Sharma.

Ethics declarations

Conflict of interest

The authors have no conflict of interest regarding the publication.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sharma, S., Jain, S. Anomalies resolution and semantification of tabular data. J Supercomput 80, 18081–18117 (2024). https://doi.org/10.1007/s11227-024-06147-0

  • DOI: https://doi.org/10.1007/s11227-024-06147-0
