
SPHeRe

A Performance Initiative Towards Ontology Matching by Implementing Parallelism over Cloud Platform

The Journal of Supercomputing

Abstract

The abundance of semantically related information has resulted in semantic heterogeneity. Ontology matching is one of the techniques used to resolve semantic heterogeneity; however, because it is a computationally intensive problem, it can be a time-consuming process. Matching medium to large-scale ontologies can take from hours up to days of computation time, depending on the computational resources used and the complexity of the matching algorithms. This delay in producing results makes ontology matching unsuitable for semantic web-based interactive and semi-real-time systems. This paper presents SPHeRe, a performance-based initiative that improves ontology matching performance by exploiting parallelism over a multicore cloud platform. Parallelism has been overlooked by ontology matching systems. SPHeRe takes advantage of this opportunity and provides a solution by: (i) creating and caching serialized subsets of candidate ontologies with single-step parallel loading; (ii) using lightweight, matcher-based, redundancy-free subsets that result in smaller memory footprints and faster load times; and (iii) implementing data-parallelism-based distribution over subsets of candidate ontologies, exploiting the multicore distributed hardware of the cloud platform for parallel ontology matching and execution. Performance evaluation of SPHeRe on a tri-node (12-core) private cloud infrastructure has shown up to 3 times faster ontology load time with up to 8 times smaller memory footprint than the Web Ontology Language (OWL) frameworks Jena and OWLAPI. Furthermore, by utilizing the computational resources most efficiently, SPHeRe provides the best scalability in contrast with other ontology matching systems, i.e., GOMMA, LogMap, AROMA, and AgrMaker. On a private cloud instance with 8 cores, SPHeRe outperforms the most performance-efficient ontology matching system, GOMMA, by 40% in scalability and 4 times in performance.
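
The data-parallel distribution described in point (iii) can be illustrated with a minimal sketch. It is not SPHeRe's implementation: the function names (`partition`, `match_chunk`, `parallel_match`), the label-only string matcher, and the 0.9 threshold are all assumptions made for illustration. The sketch splits the source ontology's concept labels into near-equal chunks, matches each chunk against the full target label set in a separate worker, and merges the partial results. A thread pool is used only to keep the example self-contained; in CPython it shows the decomposition rather than a real multicore speedup, which would need a process pool or distributed nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def partition(items, parts):
    """Split items into at most `parts` contiguous, near-equal chunks."""
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def match_chunk(source_chunk, target_labels, threshold):
    """Compare every source label in one chunk against all target labels.

    Hypothetical matcher: case-insensitive string similarity only.
    """
    found = []
    for s in source_chunk:
        for t in target_labels:
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score >= threshold:
                found.append((s, t, round(score, 3)))
    return found


def parallel_match(source_labels, target_labels, workers=4, threshold=0.9):
    """Run match_chunk on each chunk in its own worker and merge the results."""
    chunks = partition(source_labels, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(
            pool.map(lambda c: match_chunk(c, target_labels, threshold), chunks)
        )
    # Chunk order is preserved, so the merged list equals a sequential run.
    return [m for part in partial for m in part]


# Toy labels, invented for this example only.
source = ["Heart", "Left Lung", "Kidney", "Liver"]
target = ["heart", "left lung", "Spleen", "liver"]
matches = parallel_match(source, target, workers=2)
print(matches)
```

Because each chunk is matched independently against the same target set, the workers share no mutable state and need no coordination beyond the final merge, which is what makes this kind of decomposition straightforward to spread across cores or nodes.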



Notes

  1. The accuracy of SPHeRe is beyond the scope of this paper and is addressed in a separate publication; however, accuracy and performance results for matching the FMA and NCI ontologies are presented in Sect. 5.


Acknowledgements

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2013-(H0301-13-2001)).

This research was also supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (2010-0020725), and by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-4006) supervised by the NIPA (National IT Industry Promotion Agency).

Author information

Corresponding author

Correspondence to Sungyoung Lee.

About this article

Cite this article

Amin, M.B., Batool, R., Khan, W.A. et al. SPHeRe. J Supercomput 68, 274–301 (2014). https://doi.org/10.1007/s11227-013-1037-1

  • DOI: https://doi.org/10.1007/s11227-013-1037-1
