ABSTRACT
An increasing number of data sets are being published online, in institutional or government repositories as well as by individual researchers, journalists, and others. These data are often represented as tables of various kinds; however, repositories offer poor search over and inside tables. It is difficult for a user to tell from a repository's portal whether a useful data set is available, and this problem is only likely to get worse.
We describe this problem and demonstrate that the naïve approach of full-text search is not appropriate. We describe an alternative, based on inferring types of data and indexing columns as a unit, and demonstrate some improvements in early success, especially when long captions are not available.
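To make the idea of typed, column-level indexing concrete, the following is a minimal sketch and not the implementation described in the paper. The type rules, the names `infer_type`, `index_columns`, and `search`, and the sample tables are all assumptions chosen for illustration. Each column becomes one index entry carrying its table, header, and a coarse inferred type, so a query can match on the header text and filter on the kind of data the column holds.

```python
import re
from collections import defaultdict


def infer_type(values):
    """Guess a coarse type for a column from its cells (illustrative rules only)."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "empty"
    if all(re.fullmatch(r"\d{4}-\d{2}-\d{2}", c) for c in cells):
        return "date"
    if all(re.fullmatch(r"[-+]?\d+", c) for c in cells):
        return "integer"
    if all(re.fullmatch(r"[-+]?\d*\.?\d+", c) for c in cells):
        return "decimal"
    return "text"


def index_columns(tables):
    """Index each column as a unit: header term -> (table id, header, type)."""
    index = defaultdict(list)
    for table_id, table in tables.items():
        header, *rows = table
        for i, name in enumerate(header):
            column = [row[i] for row in rows if i < len(row)]
            entry = (table_id, name, infer_type(column))
            for term in re.findall(r"\w+", name.lower()):
                index[term].append(entry)
    return index


def search(index, term, wanted_type=None):
    """Return columns whose header contains `term`, optionally filtered by type."""
    hits = index.get(term.lower(), [])
    return [h for h in hits if wanted_type is None or h[2] == wanted_type]


# Hypothetical sample data: two small tables without captions.
tables = {
    "rainfall.csv": [
        ["station", "date", "rainfall mm"],
        ["Cairns", "2014-01-05", "12.5"],
        ["Hobart", "2014-01-05", "0.0"],
    ],
    "staff.csv": [
        ["name", "start date"],
        ["Lee", "March"],
        ["Kim", "April"],
    ],
}

index = index_columns(tables)
typed_hits = search(index, "date", wanted_type="date")
all_hits = search(index, "date")
```

Here a plain term match on "date" returns columns from both tables, while adding the type filter keeps only the column whose cells actually look like dates. A real system would need far richer type inference and ranking than this sketch provides.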