Enabling Massive XML-Based Biological Data Management in HBase


Abstract:

Publishing biological data in XML formats is attractive for organizations that would like to provide their bioinformatics resources in an extensible and machine-readable format. In the era of big data, managing massive XML-based biological data has emerged as a challenging issue. As XML-based biological data sets continue to grow, traditional declarative query languages often struggle to provide efficient query capabilities in terms of processing speed and scale. In this study, we report a novel platform to store and query massive XML-based biological data collections. A prototype tool for constructing HBase tables from XML-based biological data collections is first developed, and then a formal approach to transform the XML query model into the MapReduce query model is proposed. Finally, an evaluation of the query performance of the proposed approach on existing XML-based biological databases is presented, demonstrating the performance advantages of the proposed solution. The source code of the massive XML-based biological data management platform is freely available at https://github.com/lyotvincent/X2H.
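
The two steps named in the abstract (building a table from XML records, then answering a query in map and reduce phases) can be illustrated with a minimal Python sketch. This is not the X2H implementation and does not reflect its schema or query transformation: the sample XML, the `data:` column family, and the helper names `xml_to_cells`, `build_table`, `map_phase`, and `reduce_phase` are all hypothetical, and an in-memory dictionary stands in for an HBase table so the example runs without a cluster.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; not taken from any real biological database.
SAMPLE_XML = """
<entries>
  <entry id="P1"><name>kinase A</name><organism>human</organism></entry>
  <entry id="P2"><name>kinase B</name><organism>mouse</organism></entry>
  <entry id="P3"><name>ligase C</name><organism>human</organism></entry>
</entries>
"""


def xml_to_cells(xml_text, record_tag="entry", key_attr="id"):
    """Flatten each record element into (row_key, column, value) cells."""
    root = ET.fromstring(xml_text.strip())
    cells = []
    for record in root.iter(record_tag):
        row_key = record.get(key_attr)
        for child in record:
            # Column names imitate HBase's "family:qualifier" form (assumed layout).
            cells.append((row_key, f"data:{child.tag}", (child.text or "").strip()))
    return cells


def build_table(cells):
    """Group cells by row key; a plain dict stands in for an HBase table."""
    table = defaultdict(dict)
    for row_key, column, value in cells:
        # A repeated child tag would overwrite the earlier value in this sketch.
        table[row_key][column] = value
    return dict(table)


def map_phase(table, column, predicate):
    """Emit (value, row_key) pairs for rows whose column satisfies the predicate."""
    for row_key, columns in table.items():
        value = columns.get(column)
        if value is not None and predicate(value):
            yield value, row_key


def reduce_phase(pairs):
    """Collect the row keys emitted for each map output key."""
    grouped = defaultdict(list)
    for key, row_key in pairs:
        grouped[key].append(row_key)
    return {key: sorted(row_keys) for key, row_keys in grouped.items()}


table = build_table(xml_to_cells(SAMPLE_XML))
# Roughly the path query /entries/entry[organism='human'], returning entry ids.
result = reduce_phase(map_phase(table, "data:organism", lambda v: v == "human"))
print(result)
```

In a real deployment the map phase would scan HBase regions in parallel and the reduce phase would merge partial results; the sketch only shows how a path-style selection can be split into those two roles.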
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 17, Issue: 6, 01 Nov.-Dec. 2020)
Page(s): 1994 - 2004
Date of Publication: 10 May 2019

PubMed ID: 31094692
