ABSTRACT
Tabular data is an invaluable information resource for search, information extraction and question answering about the world. It is critical to understand the semantic concept types for table columns in order to fully exploit the information in tabular data. In this paper, we focus on learning-based approaches for column concept type detection without relying on any metadata or queries to existing knowledge bases. We propose a model that employs both statistical and semantic features of table columns, and uses Star-Transformers to gather and scatter information across the whole table to boost the performance on individual columns. We apply distant supervision to construct a tabular dataset with columns annotated with DBpedia classes. Our experimental results show that our model achieves 93.57% accuracy on the dataset, exceeding that of the state-of-the-art baselines.
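As a rough illustration of the two ideas named in the abstract, the sketch below computes a few toy statistical features per column and then runs one simplified gather-and-scatter step over a star topology. It is not the model from the paper: the feature choices, the function names `column_stats` and `gather_scatter`, the `mix` parameter and the sample table are all assumptions made for illustration, and plain averaging stands in for the attention mechanism that Star-Transformers actually use.

```python
from statistics import mean


def column_stats(cells):
    """Toy statistical features for one column (hypothetical feature set)."""
    lengths = [len(c) for c in cells]
    numeric = [c.replace(".", "", 1).isdigit() for c in cells]
    return [
        mean(lengths),                 # average cell length in characters
        sum(numeric) / len(cells),     # fraction of numeric-looking cells
        len(set(cells)) / len(cells),  # fraction of distinct values
    ]


def gather_scatter(column_vectors, mix=0.5):
    """One simplified star-topology step (averaging, not attention).

    Gather: average all column vectors into a single relay vector.
    Scatter: blend the relay back into every column vector.
    """
    dim = len(column_vectors[0])
    relay = [mean(v[i] for v in column_vectors) for i in range(dim)]
    return [
        [(1 - mix) * v[i] + mix * relay[i] for i in range(dim)]
        for v in column_vectors
    ]


# Hypothetical two-column table; values are kept as strings, as in raw tables.
table = {
    "city": ["Paris", "Rome", "Oslo"],
    "population": ["2161000", "2873000", "634000"],
}

features = [column_stats(cells) for cells in table.values()]
contextual = gather_scatter(features)
```

After the step, each column's vector carries some information about the rest of the table, which is the intuition behind using table-level context to help classify individual columns.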
Index Terms
- Tabular Data Concept Type Detection Using Star-Transformers