ABSTRACT
High-throughput genetic sequencing produces the ultimate "big data": a human genome sequence contains more than 3 billion base pairs, and a growing number of characteristics, or annotations, are being recorded at the base-pair level. Locating areas of interest within the genome is a challenge for researchers and limits their investigations. We describe our vision of adapting "big data" ranked search to the problem of searching the genome. Our goal is to make searching for data as easy for scientists as searching the Internet.
Index Terms
- Data Like This: Ranked Search of Genomic Data Vision Paper