DOI: 10.1145/1077501.1077508
Article

Making quality count in biological data sources

Published: 17 June 2005

Abstract

We propose an extension to the semistructured data model that captures and integrates information about the quality of the stored data. Specifically, we describe the main challenges involved in measuring and representing data quality, and how we addressed them. These challenges include extending an existing data model to include quality metadata, identifying useful quality measures, and devising a way to compute and update the value of the quality measures as data is queried and updated. Although our approach can be generalized to various other domains, it is currently aimed at describing the quality of biological data sources. We illustrate the benefits of our model using several examples from biological databases.
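The abstract describes the model only at a high level. The Python below is a minimal, hypothetical sketch and not the authors' model. It attaches quality scores to the nodes of a semistructured record, computes a simple completeness measure, and propagates quality through a path query and an update. Several pieces are illustrative assumptions: the names `Node`, `combine`, `query`, `update` and `completeness`, the dimensions `accuracy` and `completeness`, the [0, 1] score range, and the rule that a query result inherits the minimum score along its path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical quality dimensions (e.g. "accuracy", "completeness"); the abstract
# does not name the paper's actual measures. Scores are assumed to lie in [0, 1].
QualityScores = Dict[str, float]


@dataclass
class Node:
    """A labelled node of a tree-shaped semistructured record, annotated with quality metadata."""
    label: str
    value: Optional[str] = None
    quality: QualityScores = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def combine(a: QualityScores, b: QualityScores) -> QualityScores:
    """Assumed propagation rule: keep the weaker (minimum) score per dimension.
    A dimension missing on one side is treated as 1.0, i.e. imposing no penalty."""
    return {dim: min(a.get(dim, 1.0), b.get(dim, 1.0)) for dim in a.keys() | b.keys()}


def query(root: Node, path: List[str]) -> List[Tuple[Node, QualityScores]]:
    """Follow a label path starting at root and return (node, propagated quality) per match.
    The propagated quality combines the scores of every node along the path."""
    if not path or root.label != path[0]:
        return []
    if len(path) == 1:
        return [(root, dict(root.quality))]
    results = []
    for child in root.children:
        for node, q in query(child, path[1:]):
            results.append((node, combine(root.quality, q)))
    return results


def update(node: Node, new_value: str, new_quality: QualityScores) -> None:
    """Overwrite a stored value together with the quality scores supplied for it."""
    node.value = new_value
    node.quality = dict(new_quality)


def completeness(node: Node, expected_labels: List[str]) -> float:
    """Fraction of expected child labels present under node (1.0 if nothing is expected)."""
    if not expected_labels:
        return 1.0
    present = {child.label for child in node.children}
    return sum(1 for lbl in expected_labels if lbl in present) / len(expected_labels)
```

Suppose an `entry` node has accuracy 0.9 and its `sequence` child has accuracy 0.7. Querying `["entry", "sequence"]` then reports an accuracy of 0.7 for the result. If the sequence is later updated with accuracy 0.95, the same query reports 0.9, because the parent now bounds the result. Taking the minimum is one conservative choice; a weighted or probabilistic combination would be an equally plausible alternative.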




Published In

IQIS '05: Proceedings of the 2nd international workshop on Information quality in information systems
June 2005
116 pages
ISBN: 1595931600
DOI: 10.1145/1077501
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

IQIS05


Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Jan 2025


Cited By

  • (2016) Models for Information Quality. Data and Information Quality, pp. 137-154. DOI: 10.1007/978-3-319-24106-7_6. Online publication date: 24-Mar-2016.
  • (2011) Quality, trust, and utility of scientific data on the web. Proceedings of the 3rd International Web Science Conference, pp. 1-8. DOI: 10.1145/2527031.2527048. Online publication date: 15-Jun-2011.
  • (2010) Quality assessment of MAGE-ML genomic datasets using DescribeX. Proceedings of the 7th International Conference on Data Integration in the Life Sciences, pp. 192-206. DOI: 10.5555/1884477.1884497. Online publication date: 25-Aug-2010.
  • (2010) Quality Assessment of MAGE-ML Genomic Datasets Using DescribeX. Data Integration in the Life Sciences, pp. 192-206. DOI: 10.1007/978-3-642-15120-0_15. Online publication date: 2010.
  • (2009) Incorporating Domain-Specific Information Quality Constraints into Database Queries. Journal of Data and Information Quality, 1(2), pp. 1-31. DOI: 10.1145/1577840.1577846. Online publication date: 1-Sep-2009.
  • (2007) QDex. Proceedings of the 2007 International Conference on Web Information Systems Engineering, pp. 5-16. DOI: 10.5555/1781503.1781506. Online publication date: 3-Dec-2007.
  • (2007) QDex: A Database Profiler for Generic Bio-data Exploration and Quality Aware Integration. Web Information Systems Engineering – WISE 2007 Workshops, pp. 5-16. DOI: 10.1007/978-3-540-77010-7_2. Online publication date: 2007.
  • (2006) Report from the First and Second International Workshops on Information Quality in Information Systems. ACM SIGMOD Record, 35(2), pp. 50-52. DOI: 10.1145/1147376.1147384. Online publication date: 1-Jun-2006.
  • (2006) Tolerant ad hoc data propagation with error quantification. Proceedings of the 2006 International Conference on Current Trends in Database Technology, pp. 22-31. DOI: 10.1007/11896548_3. Online publication date: 26-Mar-2006.
