Enhancing Table Retrieval with Dual Graph Representations

  • Conference paper
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

Table retrieval aims to rank candidate tables for answering a natural language query, and its most critical problem is how to learn informative representations of structured tables. Most previous methods simply flatten the table and feed it into a sequence encoder, ignoring the structural information of tables and the semantic interaction between table cells and contexts. In this paper, we propose a dual-graph-based method that captures both the semantics and the structure of tables, so as to better support the downstream table retrieval task. Inspired by human cognition, we first decouple a table into a row view and a column view, then build dual graphs from these two views, taking table contexts into account. Afterward, intra-graph and inter-graph interactions are performed iteratively to aggregate and exchange local row- and column-oriented features, respectively, and an adaptive fusion strategy then combines them into the final table representation. In this way, dual-graph modeling accounts for both table structure and semantic information. Consequently, the input query can be matched against candidate tables using these representations, yielding more accurate rankings. Extensive experiments on two table retrieval datasets, WikiTables and WebQueryTable, show that our dual graphs outperform strong baselines. Further analyses confirm the adaptability to row- and column-oriented tables and demonstrate the rationality and generalization of dual graphs. The source code is available at https://github.com/ty33123/DualG.
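The pipeline described in the abstract (row and column views, intra- and inter-graph interaction, adaptive fusion) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the linked repository. Every name below (`build_view_graph`, `aggregate`, `exchange`, `adaptive_fuse`, `encode_table`) is hypothetical, plain mean aggregation stands in for a learned graph layer, a fixed mixing weight stands in for the inter-graph interaction, and a scalar sigmoid gate stands in for the adaptive fusion strategy.

```python
import math
import random


def build_view_graph(n_rows, n_cols, by_row):
    """Adjacency sets over cell nodes plus one context node (the last index).

    Row view: cells in the same row are linked. Column view: cells in the
    same column are linked. Every cell is also linked to the context node.
    """
    ctx = n_rows * n_cols
    adj = {i: {i} for i in range(ctx + 1)}  # self-loops
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            adj[i].add(ctx)
            adj[ctx].add(i)
            if by_row:
                adj[i].update(r * n_cols + c2 for c2 in range(n_cols))
            else:
                adj[i].update(r2 * n_cols + c for r2 in range(n_rows))
    return adj


def aggregate(adj, feats):
    """Intra-graph step (assumed): mean over each node's neighbours."""
    dim = len(feats[0])
    return [
        [sum(feats[j][d] for j in adj[i]) / len(adj[i]) for d in range(dim)]
        for i in range(len(feats))
    ]


def exchange(h_row, h_col, mix=0.5):
    """Inter-graph step (assumed): the same node in both views swaps features."""
    new_row = [[(1 - mix) * x + mix * y for x, y in zip(a, b)]
               for a, b in zip(h_row, h_col)]
    new_col = [[(1 - mix) * y + mix * x for x, y in zip(a, b)]
               for a, b in zip(h_row, h_col)]
    return new_row, new_col


def mean_pool(feats):
    """Average all node vectors into one graph-level vector."""
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]


def adaptive_fuse(row_vec, col_vec, gate_w):
    """Adaptive fusion (assumed): a sigmoid gate weights the two views."""
    z = sum(w * x for w, x in zip(gate_w, row_vec + col_vec))
    gate = 1.0 / (1.0 + math.exp(-z))
    fused = [gate * r + (1.0 - gate) * c for r, c in zip(row_vec, col_vec)]
    return fused, gate


def encode_table(feats, n_rows, n_cols, gate_w, n_layers=2):
    """Run the row-view and column-view graphs side by side, then fuse."""
    row_adj = build_view_graph(n_rows, n_cols, by_row=True)
    col_adj = build_view_graph(n_rows, n_cols, by_row=False)
    h_row, h_col = feats, feats
    for _ in range(n_layers):
        h_row, h_col = aggregate(row_adj, h_row), aggregate(col_adj, h_col)
        h_row, h_col = exchange(h_row, h_col)
    return adaptive_fuse(mean_pool(h_row), mean_pool(h_col), gate_w)


# Toy 2x3 table: six cell vectors plus one context vector, all random.
random.seed(0)
n_rows, n_cols, dim = 2, 3, 4
feats = [[random.random() for _ in range(dim)]
         for _ in range(n_rows * n_cols + 1)]
gate_w = [0.1] * (2 * dim)  # placeholder gate weights, not learned
table_vec, gate = encode_table(feats, n_rows, n_cols, gate_w)
```

In a trained model the gate weights and the aggregation would be learned, and the resulting table vector would be scored against a query representation to produce the ranking. The gate value indicates how much the fused representation leans on the row view versus the column view.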

Notes

  1. https://en.wikipedia.org/

  2. https://github.com/usnistgov/trec_eval

Acknowledgment

This work is supported by the National Key Research and Development Program of China (Grant No. 2021YFB3100600), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDC02040400) and the Youth Innovation Promotion Association of CAS (Grant No. 2021153).

Author information

Corresponding author

Correspondence to Tingwen Liu.

Ethics declarations

Ethics Statement

I understand that using technology can have ethical implications, especially in the collection, processing, and privacy of table retrieval data. I acknowledge the importance of complying with ethical standards and recognize the potential risks.

For data collection and processing, my training data comes from two publicly available tabular search datasets. Although we do not collect or store any sensitive information, user queries should be strictly restricted to ensure that they do not contain any dangerous information.

In addition, when the model is used in police- or military-related applications, special attention should be paid to its use in these areas, which must be conducted in a more responsible manner. To guard against inaccurate search results being provided to police or military personnel, users are responsible for screening search results and for ensuring that they comply with ethical principles, laws, and regulations when using model outputs.

In summary, I strive to ensure that the model outputs search results in an ethical and responsible manner, and I urge my users to do the same. I will continue to adhere to ethical standards and stay abreast of emerging ethical issues in the fields of machine learning and data mining.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, T. et al. (2023). Enhancing Table Retrieval with Dual Graph Representations. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_7

  • DOI: https://doi.org/10.1007/978-3-031-43421-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43420-4

  • Online ISBN: 978-3-031-43421-1

  • eBook Packages: Computer Science, Computer Science (R0)
