DOI: 10.1145/1167350.1167393
Article

A methodology for semantic integration of metadata in bioinformatics data sources

Published: 18 March 2005

Abstract

Semantic heterogeneity is becoming increasingly prominent in bioinformatics domains that deal with constantly expanding, dynamic, often very large datasets from various distributed sources. Metadata is the key component for effective information integration. Traditional approaches for reconciling semantic heterogeneity use standards or mediation-based methods. These approaches have had limited success in addressing the general semantic heterogeneity problem and by themselves are not likely to succeed in bioinformatics domains, where one faces the additional complexity of keeping pace with the speed at which data and semantic heterogeneity are being generated. This paper presents a methodology for reconciliation of semantic heterogeneity of metadata in bioinformatics data sources. The approach is based on the proposition that by globally monitoring, clustering, and visualizing bioinformatics metadata across disparately created data sources, patterns of practice can be identified. Identifying these patterns can facilitate semantic reconciliation of metadata in current data and mitigate semantic heterogeneity in future data by promoting sharing and reuse of existing metadata. To instantiate the methodology, a research architecture, MicroSEEDS, is presented and its implementation and envisioned uses are discussed.

Published In

ACMSE '05 vol 1: Proceedings of the 43rd annual ACM Southeast Conference - Volume 1
March 2005
408 pages
ISBN:1595930590
DOI:10.1145/1167350

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bioinformatics
  2. clustering
  3. information integration
  4. metadata
  5. semantic heterogeneity

Conference

ACM SE05: ACM Southeast Regional Conference 2005
March 18 - 20, 2005
Kennesaw, Georgia

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%
