Abstract
The abundance of semantically related information has resulted in semantic heterogeneity. Ontology matching is one of the techniques used to resolve semantic heterogeneity; however, it is a computationally intensive problem and can be time-consuming. Matching medium- to large-scale ontologies can take from hours to days of computation time, depending on how computational resources are used and on the complexity of the matching algorithms. This delay in producing results makes ontology matching unsuitable for interactive and semi-real-time semantic web-based systems. This paper presents SPHeRe, a performance-based initiative that improves ontology matching performance by exploiting parallelism over a multicore cloud platform. Parallelism has been overlooked by ontology matching systems. SPHeRe takes advantage of this opportunity by: (i) creating and caching serialized subsets of candidate ontologies with single-step parallel loading; (ii) keeping these subsets lightweight, matcher-based, and redundancy-free, which results in smaller memory footprints and faster load time; and (iii) implementing data-parallel distribution over the subsets of candidate ontologies, exploiting the multicore distributed hardware of the cloud platform for parallel ontology matching and execution. Performance evaluation of SPHeRe on a tri-node (12-core) private cloud infrastructure has shown up to 3 times faster ontology load time and up to 8 times smaller memory footprint than the Web Ontology Language (OWL) frameworks Jena and OWLAPI. Furthermore, by using computational resources efficiently, SPHeRe provides the best scalability among the ontology matching systems compared, namely GOMMA, LogMap, AROMA, and AgrMaker. On a private cloud instance with 8 cores, SPHeRe outperforms GOMMA, the most performance-efficient of these systems, by 40% in scalability and 4 times in performance.
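The data-parallel distribution described in point (iii) can be illustrated with a minimal sketch. It is not SPHeRe's implementation: the function names (`normalize`, `partition`, `match_chunk`, `parallel_match`), the exact-label matcher, and the toy concept lists are all assumptions made for illustration. A thread pool stands in for the multicore cloud workers; in CPython, CPU-bound matchers would need a process pool or separate nodes to gain real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize(label):
    """Lowercase a label and collapse separators so trivial variants compare equal."""
    return " ".join(label.lower().replace("_", " ").split())


def partition(items, parts):
    """Split items into at most `parts` contiguous chunks of near-equal size."""
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def match_chunk(chunk, target_index):
    """Match one chunk of (id, label) source concepts against the target index."""
    matches = []
    for src_id, label in chunk:
        key = normalize(label)
        if key in target_index:
            matches.append((src_id, target_index[key]))
    return matches


def parallel_match(source, target, workers=4):
    """Hypothetical data-parallel matcher: one source chunk per worker."""
    # The target lookup is built once and only read by the workers.
    target_index = {normalize(label): tgt_id for tgt_id, label in target}
    chunks = partition(source, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so results stay in source order.
        results = list(pool.map(lambda c: match_chunk(c, target_index), chunks))
    return [pair for chunk_result in results for pair in chunk_result]


if __name__ == "__main__":
    source = [("s1", "Heart"), ("s2", "Left_Lung"), ("s3", "Femur")]
    target = [("t1", "heart"), ("t2", "left lung"), ("t3", "kidney")]
    print(parallel_match(source, target, workers=2))
```

Because each chunk is matched independently against a read-only target index, the chunks need no coordination until the final merge, which is the property that lets this kind of distribution scale with the number of cores.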
Notes
The accuracy of SPHeRe is beyond the scope of this paper and is addressed in a separate publication; however, accuracy results, together with performance results, for matching FMA with the NCI ontology are presented in Sect. 5.
Acknowledgements
This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2013-(H0301-13-2001)).
This research was supported by Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2010-0020725) and the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-4006) supervised by the NIPA (National IT Industry Promotion Agency).
Cite this article
Amin, M.B., Batool, R., Khan, W.A. et al. SPHeRe. J Supercomput 68, 274–301 (2014). https://doi.org/10.1007/s11227-013-1037-1
DOI: https://doi.org/10.1007/s11227-013-1037-1