Discovering meaning on the go in large heterogenous data

Artificial Intelligence Review

Abstract

The world is increasingly full of data. Organisations, governments and individuals are creating increasingly large data sources, and in many cases making them publicly available. This offers massive potential for interaction and mutual collaboration. But using this data often creates problems. Those creating the data will use their own terminology, structure and formats for the data, meaning that data from one source will be incompatible with data from another source. When presented with a large, unknown data source, it is very difficult to ascribe meaning to the terms of that data source, and to understand what is being conveyed. Much effort has been invested in data interpretation prior to run-time, with large data sources being matched against each other off-line. But data is often used dynamically, and so to maximise the value of the data it is necessary to extract meaning from it dynamically. We therefore postulate that an essential component of utilising the world of data in which we increasingly live is the development of the ability to discover meaning on the go in large, heterogenous data. This paper provides an overview of the current state-of-the-art, reviewing the aims and achievements in different fields which can be applied to this problem. We take a brief look at cutting-edge research in this field, summarising four papers published in the special issue of the AI Review on Discovering Meaning on the go in Large Heterogenous Data, and conclude with our thoughts about where research in this field is going, and what our priorities must be to enable us to move closer to achieving this goal.

Notes

  1. See for example http://www.data.gov/ and http://data.gov.uk/.

  2. http://ijcai-11.iiia.csic.es/.

  3. http://dream.inf.ed.ac.uk/events/lhd-11/index.html.

  4. http://schema.org.

  5. These are the papers included in the Special Issue of the AI Review on Discovering Meaning on the go in Large Heterogeneous Data.

  6. http://www.wolframalpha.com/.

  7. http://blog.schema.org/2011/12/building-web-of-objects-at-yahoo.html.

  8. http://dig.csail.mit.edu/breadcrumbs/node/91.

  9. The size of the index of Sindice, the largest Linked Data search engine, as of September 2012. See http://sindice.com/.

  10. JavaScript Object Notation, a simple key-value pair notation, see www.json.org/ for details.

  11. http://www.wikidata.org/.

  12. http://www.schema.org.

  13. Usually called “vocabularies” to emphasize their social nature and lack of use of inference, so as to distinguish them from heavy-weight description logic-based formalisms.

  14. From http://foaf.tv/tellyclub/schema.org/protovis-3.2/ex/den3.html.

  15. Schema.org deploys an HTML5 feature known as “microdata” to put markup into web-pages (Hickson 2012). Microdata is structurally similar to JSON insofar as it consists of markup that lets parts of web-pages be labeled as types of “item” that have key-value pair “item properties.” After much debate, schema.org also adopted a subset of RDFa, a way to embed RDF directly into web-pages (Adida et al. 2008). Although RDFa is much more flexible, it comes at the cost of being more confusing for web-masters.

  16. https://www.youtube.com/watch?v=_-6mhdjE1XE.

  17. http://www.ontologymatching.org/.

  18. This list of techniques is adapted from the figure on p. 65 of Euzenat and Shvaiko (2007).

  19. www.tptp.org.

  20. http://fallabs.com/kyotocabinet/.

  21. https://cassandra.apache.org/.

  22. From http://www.dbpedia.org/resource/Skye.
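
The structural similarity between microdata and JSON described in note 15 can be illustrated with a minimal sketch. It assumes a single flat item with no nesting; the HTML snippet, its property values, the class `FlatMicrodataParser`, the function `microdata_to_dict` and the output keys `type` and `properties` are all hypothetical and chosen for illustration. This is not the conversion algorithm defined in the microdata specification.

```python
import json
from html.parser import HTMLParser

# Hypothetical microdata snippet: one item with two key-value "item properties".
SNIPPET = """
<div itemscope itemtype="http://schema.org/Place">
  <span itemprop="name">Skye</span>
  <span itemprop="description">An island in Scotland</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects a single, non-nested microdata item as a plain dict."""

    def __init__(self):
        super().__init__()
        self.item = {"type": None, "properties": {}}
        self._current_prop = None  # name of the itemprop being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # The element that opens the item carries its type.
            self.item["type"] = attrs.get("itemtype")
        elif "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        # Text inside an itemprop element becomes that property's value.
        if self._current_prop and data.strip():
            self.item["properties"][self._current_prop] = data.strip()

    def handle_endtag(self, tag):
        self._current_prop = None


def microdata_to_dict(html):
    """Return the item found in `html` as a JSON-serialisable dict."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    return parser.item


if __name__ == "__main__":
    print(json.dumps(microdata_to_dict(SNIPPET), indent=2))
```

The item type maps onto a single field and each item property onto a key-value pair, which is the sense in which the two notations are structurally alike.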

References

  • Adida B, Birbeck M, McCarron S, Pemberton S (2008) RDFa in XHTML: Syntax and processing. W3C Recommendation, W3C, http://www.w3.org/TR/rdfa-syntax/

  • Angles R, Gutierrez C (2008) Survey of graph database models. ACM Comput Surv 40(1):1–39

  • Auer S, Bizer C, Lehmann J, Kobilarov G, Cyganiak R, Ives Z (2007) DBpedia: a nucleus for a web of open data. In: Proceedings of the international and Asian semantic web conference (ISWC/ASWC2007), Busan, Korea, pp 718–728

  • Aurnhammer M, Hanappe P, Steels L (2006) Augmenting navigation for collaborative tagging with emergent semantics. In: Proceedings of the 5th international conference on the semantic web, ISWC’06, Springer, Berlin, Heidelberg, pp 58–71

  • Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison Wesley-Longman, New York City

  • Baeza-Yates RA, Ciaramita M, Mika P, Zaragoza H (2008) Towards semantic search. In: Proceedings of conference on applications of natural language to information systems (NLDB), pp 4–11

  • Berners-Lee T (2007) Linked Data. Tech. rep., World Wide Web Consortium, Cambridge, Massachusetts, USA, http://www.w3.org/DesignIssues/LinkedData

  • Bizer C (2004) D2rq—treating non-rdf databases as virtual rdf graphs. In: Proceedings of the 3rd international semantic web conference (ISWC2004)

  • Blanco R, Halpin H, Herzig D, Mika P, Pound J, Thompson H, Duc TT (2011) Entity search evaluation over structured web data. In: Proceedings of the 1st international workshop on entity-oriented search (SIGIR 2011), ACM, New York, NY, USA

  • Brickley D, Guha RV (2004) RDF Vocabulary Description Language 1.0: RDF Schema. Recommendation, W3C, http://www.w3.org/TR/rdf-schema/ (Last accessed on Nov 15th 2008)

  • Buneman P, Chapman A, Cheney J (2006) Provenance management in curated databases. In: Proceedings of the 2006 ACM SIGMOD international conference on management of data, ACM, New York, NY, USA, SIGMOD ’06, pp 539–550, http://doi.acm.org/10.1145/1142473.1142534

  • Choi N, Song IY, Han H (2006) A survey on ontology mapping. SIGMOD Rec 35:34–41

  • Crestani F, Dominich S, Lalmas M, van Rijsbergen CJ (2003) Mathematical, logical and formal methods in information retrieval: an introduction to the special issue. J Am Soc Inf Sci Technol 54(4):281–284

  • Cudré-Mauroux P, Haghani P, Jost M, Aberer K, De Meer H (2009) idmesh: graph-based disambiguation of linked data. In: Proceedings of the 18th international conference on world wide web, ACM, New York, NY, USA, WWW ’09, pp 591–600

  • Euzenat J, Shvaiko P (2007) Ontology Matching. Springer, Berlin

  • Euzenat J, Valtchev P (2004) Similarity-based ontology alignment in owl-lite. In: ECAI, pp 333–337

  • Euzenat J, Meilicke C, Stuckenschmidt H, Shvaiko P, dos Santos CT (2011) Ontology alignment evaluation initiative: six years of experience. J Data Semant 15:158–192

  • Fan W, Li J, Ma S, Tang N, Yu W (2010) Towards certain fixes with editing rules and master data. Proc VLDB Endow 3(1–2):173–184, http://dl.acm.org/citation.cfm?id=1920841.1920867

  • Fensel D (2001) Ontologies: a silver bullet for knowledge management and electronic commerce. Springer, London

  • Franklin MJ, Kossmann D, Kraska T, Ramesh S, Xin R (2011) Crowddb: answering queries with crowdsourcing. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, ACM, New York, NY, USA, SIGMOD ’11, pp 61–72

  • Gruber T (2004) Every ontology is a treaty. SIGSEMIS, Bulletin 1

  • Guha RV, Lenat D (1993) Language, representation and contexts. J Inf Process 15(3):340–349

  • Halpin H (2012) Social semantics: the search for meaning on the web. Springer, London

  • Halpin H, Lavrenko V (2011) Relevance feedback between web search and the semantic web. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), Barcelona, Spain, pp 2250–2255

  • Halpin H, Hayes PJ, McCusker JP, McGuinness DL, Thompson HS (2010) When owl:sameAs isn’t the same: an analysis of identity in linked data. In: Proceedings of the 9th international semantic web conference on the semantic web, vol Part I, Springer, Berlin, Heidelberg, ISWC’10, pp 305–320, http://dl.acm.org/citation.cfm?id=1940281.1940302

  • Halevy A (2005) Why your data won’t mix. ACM Queue 3(8):50–58

  • Hayes P (2004) RDF Semantics. Recommendation, W3C, http://www.w3.org/TR/rdf-mt/ (Last accessed Sept. 21st 2008)

  • Hickson I (2012) HTML Microdata. W3C Working Draft, W3C, http://www.w3.org/TR/2011/WD-microdata-20110525

  • Horrocks I, Patel-Schneider P, van Harmelen F (2003) From SHIQ and RDF to OWL: the making of a web ontology language. J Web Semant 1(1):17–26

  • Horrocks I, Parsia B, Patel-Schneider P, Hendler J (2005) Semantic web architecture: stack or two towers? In: Proceedings of the third international conference on principles and practice of semantic web reasoning, Springer, Berlin, Heidelberg, PPSWR’05, pp 37–41

  • Jones KS (2004) What’s new about the semantic web?: some questions. SIGIR Forum 38(2):18–23

  • Kalfoglou Y, Schorlemmer M (2003) Ontology mapping: the state of the art. Knowl Eng Rev 18(1):1–31

  • Klyne G, Carroll J (2004) Resource description framework (RDF): concepts and abstract syntax. Recommendation, W3C, http://www.w3.org/TR/rdf-concepts/

  • Kwok C, Etzioni O, Weld DS (2001) Scaling question answering to the web. ACM Trans Inf Syst 19(3):242–262

  • Mazzieri M, Dragoni AF, Marche UPD (2005) A fuzzy semantics for semantic web languages. In: Proceedings of workshop on uncertainty reasoning for the semantic web (URSW) at the 4th international semantic web conference (ISWC), pp 12–22

  • McCarthy J, Hayes P (1969) Some philosophical problems from the standpoint of artificial intelligence. In: Meltzer B, Michie D (eds) Machine intelligence, vol 4. Edinburgh University Press, Edinburgh, pp 463–502

  • Mika P (2008) Microsearch: an interface for semantic search. In: Proceedings of semantic search workshop at the European semantic web conference

  • Moraru A, Mladenic D, Vucnik M, Porcius M, Fortuna C, Mohorcic M (2011) Exposing real world information for the web of things. In: Proceedings of the 8th international workshop on information integration on the web: in conjunction with WWW 2011, ACM, New York, NY, USA, IIWeb ’11, pp 6:1–6:6

  • Noy NF (2004) Semantic integration: a survey of ontology-based approaches. SIGMOD Rec 33:65–70

  • Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: Bringing order to the web. Technical report 1999–66, Stanford InfoLab, http://ilpubs.stanford.edu:8090/422/, previous number = SIDL-WP-1999-0120

  • Pentland A (2012) Society’s nervous system: building effective government, energy, and public health systems. IEEE Comput 45(1):31–38

  • Putnam H (1975) The meaning of meaning. In: Gunderson K (ed) Language, mind, and knowledge. University of Minnesota Press, Minneapolis

  • Reiter R (1978) Logic and data bases. In: On closed world data bases. Plenum Publishing, New York City, New York

  • Shvaiko P, Euzenat J (2008) Ten challenges for ontology matching. In: OTM conferences (2), pp 1164–1182

  • Shvaiko P, Euzenat J (2012) Ontology matching: state of the art and future challenges. IEEE Trans Knowl Data Eng (to appear)

  • Silverstein C, Marais H, Henzinger M, Moricz M (1999) Analysis of a very large web search engine query log. SIGIR Forum 33(1):6–12

  • Simperl E, Acosta M, Norton B (2012) A semantically enabled architecture for crowdsourced linked data management. In: CrowdSearch WWW2012 workshop proceedings, pp 9–14

  • Togia T (2010) Automated ontology evolution: semantic matching. MSc thesis, unpublished. Available on line at http://dream.inf.ed.ac.uk/projects/dor/

  • Von Ahn L (2005) Human computation. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, AAI3205378

  • Weinberger D (2007) Everything is miscellaneous: the power of the new digital disorder. Times Books, New York City

  • Wilks Y (2007) Karen Spärck Jones (1935–2007). IEEE Intell Syst 22(3):8–9

  • Wittgenstein L (1953) Philosophical investigations. Blackwell Publishers, London, United Kingdom (republished 2001)

  • Wun A, Petrović M, Jacobsen HA (2007) A system for semantic data fusion in sensor networks. In: Proceedings of the 2007 inaugural international conference on distributed event-based systems, ACM, New York, NY, USA, DEBS ’07, pp 75–79

Author information

Correspondence to Fiona McNeill.

About this article

Cite this article

Halpin, H., McNeill, F. Discovering meaning on the go in large heterogenous data. Artif Intell Rev 40, 107–126 (2013). https://doi.org/10.1007/s10462-012-9377-4

  • DOI: https://doi.org/10.1007/s10462-012-9377-4
