GEM: The GAAIN Entity Mapper

  • Conference paper
Data Integration in the Life Sciences (DILS 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9162)

Abstract

We present a software system that significantly simplifies the sharing of medical data. The system, called GEM (the GAAIN Entity Mapper), harmonizes medical data, where harmonization is the process of unifying information across multiple disparate datasets so that the data can be shared and aggregated. Specifically, GEM automates the task of finding corresponding elements across independently created medical datasets of related data. We present our overall approach, the detailed technical architecture, and experimental evaluations that demonstrate the effectiveness of the approach.

Notes

  1. http://www.gaain.org

  2. http://schemaspy.sourceforge.net

  3. http://www.altova.com

  4. http://www.talend.com

  5. http://www.informatica.com

  6. http://openii.sourceforge.net

  7. http://www.isi.edu/integration/karma/

  8. http://www.cihr-irsc.gc.ca

Author information

Corresponding author

Correspondence to Naveen Ashish.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ashish, N., Dewan, P., Ambite, JL., Toga, A.W. (2015). GEM: The GAAIN Entity Mapper. In: Ashish, N., Ambite, JL. (eds) Data Integration in the Life Sciences. DILS 2015. Lecture Notes in Computer Science, vol 9162. Springer, Cham. https://doi.org/10.1007/978-3-319-21843-4_2

  • DOI: https://doi.org/10.1007/978-3-319-21843-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21842-7

  • Online ISBN: 978-3-319-21843-4

  • eBook Packages: Computer Science, Computer Science (R0)
