Abstract
The world is increasingly full of data. Organisations, governments and individuals are creating increasingly large data sources, and in many cases making them publicly available. This offers massive potential for interaction and mutual collaboration. But using this data often creates problems. Those creating the data will use their own terminology, structure and formats for the data, meaning that data from one source will be incompatible with data from another source. When presented with a large, unknown data source, it is very difficult to ascribe meaning to the terms of that data source, and to understand what is being conveyed. Much effort has been invested in data interpretation prior to run-time, with large data sources being matched against each other off-line. But data is often used dynamically, and so to maximise the value of the data it is necessary to extract meaning from it dynamically. We therefore postulate that an essential component of utilising the world of data in which we increasingly live is the development of the ability to discover meaning on the go in large, heterogenous data. This paper provides an overview of the current state-of-the-art, reviewing the aims and achievements in different fields which can be applied to this problem. We take a brief look at cutting-edge research in this field, summarising four papers published in the special issue of the AI Review on Discovering Meaning on the go in Large Heterogenous Data, and conclude with our thoughts about where research in this field is going, and what our priorities must be to enable us to move closer to achieving this goal.
Notes
See for example http://www.data.gov/ and http://data.gov.uk/.
These are the papers included in the Special Issue of the AI Review on Discovering Meaning on the go in Large Heterogeneous Data.
The size of the index of Sindice, the largest Linked Data search engine, as of September 2012. See http://sindice.com/.
JavaScript Object Notation, a simple key-value pair notation, see www.json.org/ for details.
Usually called “vocabularies” to emphasize their social nature and lack of use of inference, so as to distinguish them from heavy-weight description logic-based formalisms.
Schema.org deploys an HTML5 feature known as “microdata” to put markup into web-pages (Hickson 2012). Microdata is structurally similar to JSON insofar as it consists of markup that lets parts of web-pages be labeled as types of “item” that have key-value pair “item properties.” After much debate, schema.org also adopted a subset of RDFa, which likewise embeds RDF directly into web-pages (Adida et al. 2008). Although RDFa is much more flexible, it comes at the cost of being more confusing for web-masters.
This list of techniques is adapted from the figure on p. 65 of Euzenat and Shvaiko (2007).
References
Adida B, Birbeck M, McCarron S, Pemberton S (2008) RDFa in XHTML: Syntax and processing. W3C Recommendation, W3C, http://www.w3.org/TR/rdfa-syntax/
Angles R, Gutierrez C (2008) Survey of graph database models. ACM Comput Surv 40(1):1–39
Auer S, Bizer C, Lehmann J, Kobilarov G, Cyganiak R, Ives Z (2007) DBpedia: a nucleus for a web of open data. In: Proceedings of the international and Asian semantic web conference (ISWC/ASWC2007), Busan, Korea, pp 718–728
Aurnhammer M, Hanappe P, Steels L (2006) Augmenting navigation for collaborative tagging with emergent semantics. In: Proceedings of the 5th international conference on the semantic web, ISWC’06, Springer, Berlin, Heidelberg, pp 58–71
Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison Wesley-Longman, New York City
Baeza-Yates RA, Ciaramita M, Mika P, Zaragoza H (2008) Towards semantic search. In: Proceedings of conference on applications of natural language to information systems (NLDB), pp 4–11
Berners-Lee T (2007) Linked Data. Tech. rep., World Wide Web Consortium, Cambridge, Massachusetts, USA, http://www.w3.org/DesignIssues/LinkedData
Bizer C (2004) D2rq—treating non-rdf databases as virtual rdf graphs. In: Proceedings of the 3rd international semantic web conference (ISWC2004)
Blanco R, Halpin H, Herzig D, Mika P, Pound J, Thompson H, Duc TT (2011) Entity search evaluation over structured web data. In: Proceedings of the 1st international workshop on entity-oriented search (SIGIR 2011), ACM, New York, NY, USA
Brickley D, Guha RV (2004) RDF Vocabulary Description Language 1.0: RDF Schema. Recommendation, W3C, http://www.w3.org/TR/rdf-schema/ (Last accessed on Nov 15th 2008)
Buneman P, Chapman A, Cheney J (2006) Provenance management in curated databases. In: Proceedings of the 2006 ACM SIGMOD international conference on management of data, ACM, New York, NY, USA, SIGMOD ’06, pp 539–550, http://doi.acm.org/10.1145/1142473.1142534
Choi N, Song IY, Han H (2006) A survey on ontology mapping. SIGMOD Rec 35:34–41
Crestani F, Dominich S, Lalmas M, van Rijsbergen CJ (2003) Mathematical, logical and formal methods in information retrieval: an introduction to the special issue. J Am Soc Inf Sci Technol 54(4):281–284
Cudré-Mauroux P, Haghani P, Jost M, Aberer K, De Meer H (2009) idmesh: graph-based disambiguation of linked data. In: Proceedings of the 18th international conference on world wide web, ACM, New York, NY, USA, WWW ’09, pp 591–600
Euzenat J, Shvaiko P (2007) Ontology Matching. Springer, Berlin
Euzenat J, Valtchev P (2004) Similarity-based ontology alignment in owl-lite. In: ECAI, pp 333–337
Euzenat J, Meilicke C, Stuckenschmidt H, Shvaiko P, dos Santos CT (2011) Ontology alignment evaluation initiative: six years of experience. J Data Semant 15:158–192
Fan W, Li J, Ma S, Tang N, Yu W (2010) Towards certain fixes with editing rules and master data. Proc VLDB Endow 3(1–2):173–184, http://dl.acm.org/citation.cfm?id=1920841.1920867
Fensel D (2001) Ontologies: a silver bullet for knowledge management and electronic commerce. Springer, London
Franklin MJ, Kossmann D, Kraska T, Ramesh S, Xin R (2011) Crowddb: answering queries with crowdsourcing. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, ACM, New York, NY, USA, SIGMOD ’11, pp 61–72
Gruber T (2004) Every ontology is a treaty. SIGSEMIS, Bulletin 1
Guha RV, Lenat D (1993) Language, representation and contexts. J Inf Process 15(3):340–349
Halpin H (2012) Social semantics: the search for meaning on the web. Springer, London
Halpin H, Lavrenko V (2011) Relevance feedback between web search and the semantic web. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), Barcelona, Spain, pp 2250–2255
Halpin H, Hayes PJ, McCusker JP, McGuinness DL, Thompson HS (2010) When owl:sameAs isn’t the same: an analysis of identity in linked data. In: Proceedings of the 9th international semantic web conference on the semantic web, vol Part I, Springer, Berlin, Heidelberg, ISWC’10, pp 305–320, http://dl.acm.org/citation.cfm?id=1940281.1940302
Halevy A (2005) Why your data won’t mix. ACM Queue 3(8):50–58
Hayes P (2004) RDF Semantics. Recommendation, W3C, http://www.w3.org/TR/rdf-mt/ (Last accessed Sept. 21st 2008)
Hickson I (2012) HTML Microdata. W3C Working Draft, W3C, http://www.w3.org/TR/2011/WD-microdata-20110525
Horrocks I, Patel-Schneider P, van Harmelen F (2003) From SHIQ and RDF to OWL: the making of a web ontology language. J Web Semant 1(1):17–26
Horrocks I, Parsia B, Patel-Schneider P, Hendler J (2005) Semantic web architecture: stack or two towers? In: Proceedings of the third international conference on principles and practice of semantic web reasoning, Springer, Berlin, Heidelberg, PPSWR’05, pp 37–41
Jones KS (2004) What’s new about the semantic web?: some questions. SIGIR Forum 38(2):18–23
Kalfoglou Y, Schorlemmer M (2003) Ontology mapping: the state of the art. Knowl Eng Rev 18(1):1–31
Klyne G, Carroll J (2004) Resource description framework (RDF): concepts and abstract syntax. Recommendation, W3C, http://www.w3.org/TR/rdf-concepts/
Kwok C, Etzioni O, Weld DS (2001) Scaling question answering to the web. ACM Trans Inf Syst 19(3):242–262
Mazzieri M, Dragoni AF, Marche UPD (2005) A fuzzy semantics for semantic web languages. In: Proceedings of workshop on uncertainty reasoning for the semantic web (URSW) at the 4th international semantic web conference (ISWC), pp 12–22
McCarthy J, Hayes P (1969) Some philosophical problems from the standpoint of artificial intelligence. In: Meltzer B, Michie D (eds) Machine intelligence, vol 4. Edinburgh University Press, Edinburgh, pp 463–502
Mika P (2008) Microsearch: an interface for semantic search. In: Proceedings of semantic search workshop at the European semantic web conference
Moraru A, Mladenic D, Vucnik M, Porcius M, Fortuna C, Mohorcic M (2011) Exposing real world information for the web of things. In: Proceedings of the 8th International Workshop on Information Integration on the Web: in conjunction with WWW 2011, ACM, New York, NY, USA, IIWeb ’11, pp 6:1–6:6
Noy NF (2004) Semantic integration: a survey of ontology-based approaches. SIGMOD Rec 33:65–70
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: Bringing order to the web. Technical report 1999–66, Stanford InfoLab, http://ilpubs.stanford.edu:8090/422/, previous number = SIDL-WP-1999-0120
Pentland A (2012) Society’s nervous system: building effective government, energy, and public health systems. IEEE Comput 45(1):31–38
Putnam H (1975) The meaning of meaning. In: Gunderson K (ed) Language, mind, and knowledge. University of Minnesota Press, Minneapolis
Reiter R (1978) On closed world data bases. In: Logic and data bases. Plenum Publishing, New York City, New York
Shvaiko P, Euzenat J (2008) Ten challenges for ontology matching. In: OTM conferences (2), pp 1164–1182
Shvaiko P, Euzenat J (2012) Ontology matching: state of the art and future challenges. IEEE Trans Knowl Data Eng (to appear)
Silverstein C, Marais H, Henzinger M, Moricz M (1999) Analysis of a very large web search engine query log. SIGIR Forum 33(1):6–12
Simperl E, Acosta M, Norton B (2012) A semantically enabled architecture for crowdsourced linked data management. In: CrowdSearch WWW2012 workshop proceedings, pp 9–14
Togia T (2010) Automated ontology evolution: semantic matching. MSc thesis, unpublished. Available on line at http://dream.inf.ed.ac.uk/projects/dor/
Von Ahn L (2005) Human computation. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, aAI3205378
Weinberger D (2007) Everything is miscellaneous: the power of the new digital disorder. Times Books, New York City
Wilks Y (2007) Karen Spärck Jones (1935–2007). IEEE Intell Syst 22(3):8–9
Wittgenstein L (1953) Philosophical investigations. Blackwell Publishers, London, United Kingdom (republished 2001)
Wun A, Petrović M, Jacobsen HA (2007) A system for semantic data fusion in sensor networks. In: Proceedings of the 2007 inaugural international conference on distributed event-based systems, ACM, New York, NY, USA, DEBS ’07, pp 75–79
Cite this article
Halpin, H., McNeill, F. Discovering meaning on the go in large heterogenous data. Artif Intell Rev 40, 107–126 (2013). https://doi.org/10.1007/s10462-012-9377-4
DOI: https://doi.org/10.1007/s10462-012-9377-4