research-article

Ad Hoc Table Retrieval using Intrinsic and Extrinsic Similarities

Authors: Haggai Roitman, Guy Feigenblat, Mustafa Canim

WWW '20: Proceedings of The Web Conference 2020

Pages 2479 - 2485

https://doi.org/10.1145/3366423.3379995

Published: 20 April 2020

Abstract

Given a keyword query, the ad hoc table retrieval task aims to retrieve a ranked list of the top-k most relevant tables from a given table corpus. Previous work has focused primarily on designing table-centric lexical and semantic features that can be used for learning-to-rank (LTR) over tables. In this work, we make novel use of intrinsic (passage-based) and extrinsic (manifold-based) table similarities for enhanced retrieval. Using the WikiTables benchmark, we study the merits of utilizing such similarities for this task. To this end, we combine both similarity types via a simple yet effective cascade re-ranking approach. Overall, our proposed approach yields significantly better table retrieval quality, exceeding even that of strong, semantically rich baselines.
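The abstract does not spell out the scoring functions, so the following is only a minimal sketch of the general idea of cascading an intrinsic score into an extrinsic one, not the authors' method. It assumes a simple term-overlap passage score as the intrinsic signal, Jaccard similarity between tables as the graph affinity, and the standard manifold-ranking iteration f <- alpha * S * f + (1 - alpha) * y as the extrinsic step. All function names, the toy tables, and the parameter values are hypothetical.

```python
import math


def max_passage_score(query_terms, passages):
    """Intrinsic signal (assumed): best query-term overlap over a table's passages."""
    q = set(query_terms)
    if not q or not passages:
        return 0.0
    return max(len(q & set(p)) / len(q) for p in passages)


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def manifold_rank(affinity, prior, alpha=0.5, iterations=50):
    """Iterate f <- alpha * S f + (1 - alpha) * prior, with S = D^-1/2 W D^-1/2."""
    n = len(prior)
    degree = [sum(row) for row in affinity]
    # Symmetric normalization; isolated nodes (degree 0) get all-zero rows.
    norm = [
        [
            affinity[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = list(prior)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * prior[i]
            for i in range(n)
        ]
    return scores


def cascade_rerank(query_terms, tables, alpha=0.5):
    """tables maps a table name to its passages (each passage is a token list)."""
    names = list(tables)
    # Stage 1 (intrinsic): score each table by its best-matching passage.
    prior = [max_passage_score(query_terms, tables[name]) for name in names]
    # Stage 2 (extrinsic): smooth those scores over a table-to-table graph.
    bags = [[tok for passage in tables[name] for tok in passage] for name in names]
    affinity = [
        [0.0 if i == j else jaccard(bags[i], bags[j]) for j in range(len(names))]
        for i in range(len(names))
    ]
    scores = manifold_rank(affinity, prior, alpha=alpha)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Toy corpus, invented purely for illustration.
toy_tables = {
    "olympics": [["olympic", "medal", "count"], ["country", "gold", "silver"]],
    "medals": [["gold", "medal", "winners"], ["country", "year"]],
    "recipes": [["flour", "sugar"], ["oven", "minutes"]],
}
for name, score in cascade_rerank(["olympic", "medal"], toy_tables):
    print(name, round(score, 3))
```

In this toy setup the second stage lets a table that matches the query only partially borrow relevance from a similar, better-matching table, while a table that is unrelated to both the query and its neighbors stays at the bottom.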

Index Terms

Ad Hoc Table Retrieval using Intrinsic and Extrinsic Similarities
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval

Index terms have been assigned to the content through auto-classification.

Published In

WWW '20: Proceedings of The Web Conference 2020

April 2020

3143 pages

ISBN:9781450370233

DOI:10.1145/3366423

Editors: Yennun Huang (Academia Sinica, Taiwan), Irwin King (The Chinese University of Hong Kong, Hong Kong), Tie-Yan Liu (Microsoft Research Asia, China), Maarten van Steen (University of Twente, Netherlands)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 11
Total Downloads: 482
Downloads (Last 12 months): 45
Downloads (Last 6 weeks): 12

Reflects downloads up to 16 Feb 2025

Cited By

Fan G, Shraga R, Miller R (2024). Gen-T: Table Reclamation in Data Lakes. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 3532-3545. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00272
Manabe K, Fujita Y, Kuwahara M, Hayashi T (2024). Metadata-less Dataset Recommendation Leveraging Dataset Embeddings by Pre-trained Tabular Language Models. 2024 IEEE International Conference on Big Data (BigData), pp. 6604-6613. Online publication date: 15-Dec-2024.
https://doi.org/10.1109/BigData62323.2024.10825245
Joseph M, Ravana S (2024). Reliable Information Retrieval Systems Performance Evaluation: A Review. IEEE Access 12, pp. 51740-51751. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3377239
Liu T, Zhang X, Zhang Z, Wang Y, Li Q, Zhang S, Liu T (2023). Enhancing Table Retrieval with Dual Graph Representations. Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 107-123. Online publication date: 18-Sep-2023.
https://doi.org/10.1007/978-3-031-43421-1_7
Silva L, Barbosa L (2022). Matching news articles and wikipedia tables for news augmentation. Knowledge and Information Systems 65:4, pp. 1713-1734. Online publication date: 27-Dec-2022.
https://doi.org/10.1007/s10115-022-01815-0
Zhang S, Balog K (2021). Semantic Table Retrieval Using Keyword and Table Queries. ACM Transactions on the Web 15:3, pp. 1-33. Online publication date: 13-May-2021.
https://dl.acm.org/doi/10.1145/3441690
Wang F, Sun K, Chen M, Pujara J, Szekely P, Diaz F, Shah C, Suel T, Castells P, Jones R, Sakai T (2021). Retrieving Complex Tables with Multi-Granular Graph Representation Learning. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1472-1482. Online publication date: 11-Jul-2021.
https://dl.acm.org/doi/10.1145/3404835.3462909
Okamoto T, Miyamori H (2021). Demo Paper: Ad Hoc Search On Statistical Data Based On Categorization And Metadata Augmentation. 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 231-234. Online publication date: Sep-2021.
https://doi.org/10.1109/MIPR51284.2021.00043
Agarwal V, Bhardwaj A, Rosso P, Cudre-Mauroux P (2021). ConvTab: A Context-Preserving, Convolutional Model for Ad-Hoc Table Retrieval. 2021 IEEE International Conference on Big Data (Big Data), pp. 5043-5052. Online publication date: 15-Dec-2021.
https://doi.org/10.1109/BigData52589.2021.9671828
Shraga R, Roitman H, Feigenblat G, Cannim M, Huang J, Chang Y, Cheng X, Kamps J, Murdock V, Wen J, Liu Y (2020). Web Table Retrieval using Multimodal Deep Learning. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1399-1408. Online publication date: 25-Jul-2020.
https://dl.acm.org/doi/10.1145/3397271.3401120
