DOI: 10.1145/3366423.3379995
research-article

Ad Hoc Table Retrieval using Intrinsic and Extrinsic Similarities

Published: 20 April 2020

Abstract

Given a keyword query, the ad hoc table retrieval task aims to retrieve a ranked list of the top-k most relevant tables from a given table corpus. Previous work has focused primarily on designing table-centric lexical and semantic features that can be used for learning-to-rank (LTR) over tables. In this work, we make novel use of intrinsic (passage-based) and extrinsic (manifold-based) table similarities for enhanced retrieval. Using the WikiTables benchmark, we study the merits of utilizing such similarities for this task. To this end, we combine both similarity types via a simple yet effective cascade re-ranking approach. Overall, our proposed approach yields significantly better table retrieval quality, surpassing even strong, semantically rich baselines.
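To make the idea of combining the two similarity types more concrete, the following is a minimal sketch and not the paper's actual method. It assumes a toy intrinsic score (the fraction of query terms covered by a table's best passage), a generic manifold-ranking update f <- alpha * S f + (1 - alpha) * y over a hand-made table-to-table similarity matrix, and a two-stage cascade. All function names, parameters (`lam`, `alpha`), and data below are hypothetical and chosen only for illustration.

```python
import math


def best_passage_score(query_terms, passages):
    """Intrinsic signal (assumed): query-term coverage of the best passage."""
    q = set(query_terms)
    if not q or not passages:
        return 0.0
    return max(len(q & set(p.lower().split())) / len(q) for p in passages)


def manifold_rank(W, y, alpha=0.5, iters=100):
    """Extrinsic signal (assumed): iterate f <- alpha * S f + (1 - alpha) * y,
    where S is the symmetrically normalized similarity matrix of W."""
    n = len(y)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j])
          if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def cascade_rerank(query_terms, candidates, W, lam=0.5, alpha=0.5):
    """candidates: list of (table_id, initial_score, passages).
    W: symmetric table-to-table similarities with a zero diagonal."""
    # Stage 1: interpolate the initial score with the best-passage score.
    y = [(1 - lam) * score + lam * best_passage_score(query_terms, passages)
         for _, score, passages in candidates]
    # Stage 2: propagate the stage-1 scores over the similarity graph.
    f = manifold_rank(W, y, alpha=alpha)
    order = sorted(range(len(candidates)), key=lambda i: f[i], reverse=True)
    return [(candidates[i][0], f[i]) for i in order]


# Toy data (hypothetical): three candidate tables from an initial retrieval.
candidates = [
    ("t1", 0.6, ["world cup winners by year", "host country and attendance"]),
    ("t2", 0.5, ["olympic medal table"]),
    ("t3", 0.4, ["world cup top scorers", "goals per tournament"]),
]
W = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.1],
    [0.9, 0.1, 0.0],
]
ranked = cascade_rerank(["world", "cup", "scorers"], candidates, W)
for table_id, score in ranked:
    print(table_id, round(score, 3))
```

In this toy run, t3 starts with the lowest initial score but moves ahead of t1 and t2, because one of its passages covers every query term and it is strongly connected to t1 in the similarity graph. A real system would replace both signals with proper retrieval models and would typically restrict each cascade stage to a shrinking candidate pool.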


Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN:9781450370233
DOI:10.1145/3366423

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '20
WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

  • (2024) Gen-T: Table Reclamation in Data Lakes. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3532-3545. DOI: 10.1109/ICDE60146.2024.00272. Online publication date: 13-May-2024
  • (2024) Metadata-less Dataset Recommendation Leveraging Dataset Embeddings by Pre-trained Tabular Language Models. 2024 IEEE International Conference on Big Data (BigData), 6604-6613. DOI: 10.1109/BigData62323.2024.10825245. Online publication date: 15-Dec-2024
  • (2024) Reliable Information Retrieval Systems Performance Evaluation: A Review. IEEE Access 12, 51740-51751. DOI: 10.1109/ACCESS.2024.3377239. Online publication date: 2024
  • (2023) Enhancing Table Retrieval with Dual Graph Representations. Machine Learning and Knowledge Discovery in Databases: Research Track, 107-123. DOI: 10.1007/978-3-031-43421-1_7. Online publication date: 18-Sep-2023
  • (2022) Matching news articles and wikipedia tables for news augmentation. Knowledge and Information Systems 65:4, 1713-1734. DOI: 10.1007/s10115-022-01815-0. Online publication date: 27-Dec-2022
  • (2021) Semantic Table Retrieval Using Keyword and Table Queries. ACM Transactions on the Web 15:3, 1-33. DOI: 10.1145/3441690. Online publication date: 13-May-2021
  • (2021) Retrieving Complex Tables with Multi-Granular Graph Representation Learning. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1472-1482. DOI: 10.1145/3404835.3462909. Online publication date: 11-Jul-2021
  • (2021) Demo Paper: Ad Hoc Search On Statistical Data Based On Categorization And Metadata Augmentation. 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR), 231-234. DOI: 10.1109/MIPR51284.2021.00043. Online publication date: Sep-2021
  • (2021) ConvTab: A Context-Preserving, Convolutional Model for Ad-Hoc Table Retrieval. 2021 IEEE International Conference on Big Data (Big Data), 5043-5052. DOI: 10.1109/BigData52589.2021.9671828. Online publication date: 15-Dec-2021
  • (2020) Web Table Retrieval using Multimodal Deep Learning. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1399-1408. DOI: 10.1145/3397271.3401120. Online publication date: 25-Jul-2020
