Abstract
We consider the problem of word embedding for tables: obtaining distributed representations for the words that appear in table cells. We propose a table word-embedding method that exploits both horizontal and vertical relations between cells to estimate appropriate embeddings for words in tables, with objective functions that use the horizontal and vertical relations both individually and jointly.
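The horizontal and vertical relations can be made concrete by pairing each cell word with the other cells in its row and in its column, yielding skip-gram-style (word, context) pairs. A minimal sketch, assuming one token per cell; function names and the toy table are illustrative, not from the paper:

```python
# Hedged sketch: extract horizontal (same-row) and vertical (same-column)
# context pairs from a table, as inputs to a skip-gram-style objective.

def horizontal_pairs(table):
    """Yield (word, context) pairs for cells sharing a row."""
    for row in table:
        for i, w in enumerate(row):
            for j, c in enumerate(row):
                if i != j:
                    yield (w, c)

def vertical_pairs(table):
    """Yield (word, context) pairs for cells sharing a column."""
    for col in zip(*table):  # transpose rows into columns
        for i, w in enumerate(col):
            for j, c in enumerate(col):
                if i != j:
                    yield (w, c)

# Toy row-wise table: first row holds attribute names, later rows hold values.
table = [
    ["country", "capital"],
    ["France", "Paris"],
    ["Japan", "Tokyo"],
]
print(sorted(horizontal_pairs(table))[:2])
```

Here "France" gets "Paris" as a horizontal context and "country"/"Japan" as vertical contexts, which is the distinction the two objectives build on.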
Notes
- 1.
The total number of tables found in the corpus was 255,039.
- 2.
In our data set, 266 (93.7%) out of 284 randomly sampled tables were row-wise.
- 3.
In this research, we ignore tables that have no attribute names. Although this strategy may introduce noise into the set of attribute vectors, the effect of such noise is small because values come in many types and each value's frequency is relatively low compared with that of the attributes.
- 4.
The original word2vec paper derived this objective by maximizing the probability of a word appearing in the given contexts, but here we set those derivations aside and treat the following objectives merely as score functions for the purpose of obtaining word-embedding vectors.
- 5.
In addition, note that only two of these four terms are used for each (w, z) pair, rendering the SGD implementation for this model nearly the same as that of word2vec.
- 6.
Although two terms (the first and second) are used for the word w and its vertical context word c, we can differentiate each term independently because no vector appears in both terms; thus we can use an iterative method similar to that of word2vec.
- 7.
Note that, as a result, the number of similarity and analogy task queries was reduced to 445 and 5,124, respectively.
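The notes above describe training with word2vec-style stochastic gradient descent, where only the terms relevant to each (word, context) pair are updated. A hedged sketch of one negative-sampling SGD step under that scheme; the shapes, learning rate, and sampled negatives are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

# Hedged sketch: one word2vec-style negative-sampling SGD step for a single
# (word, context) pair, with separate "input" (W) and "output" (C) vectors.

rng = np.random.default_rng(0)
V, D = 100, 8                            # toy vocabulary size and dimension
W = rng.normal(scale=0.1, size=(V, D))   # word ("input") vectors
C = rng.normal(scale=0.1, size=(V, D))   # context ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(w, c, negatives, lr=0.025):
    """Update vectors for the positive pair (w, c) and sampled negatives."""
    grad_w = np.zeros(D)
    # label 1.0 for the observed context, 0.0 for each negative sample
    for z, label in [(c, 1.0)] + [(n, 0.0) for n in negatives]:
        g = (sigmoid(W[w] @ C[z]) - label) * lr
        grad_w += g * C[z]   # accumulate gradient w.r.t. the word vector
        C[z] -= g * W[w]     # update the context vector in place
    W[w] -= grad_w           # apply the accumulated word-vector update

sgd_step(3, 7, negatives=[11, 42])
```

Because each term of the joint objective touches a disjoint set of vectors (note 6), each table-derived pair can be handled by a step of exactly this shape, updating only the vectors that term involves.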
References
Bollegala, D., Alsuhaibani, M., Maehara, T., Kawarabayashi, K.I.: Joint word representation learning using a corpus and a semantic lexicon. In: Proceedings of AAAI 2016, pp. 2690–2696 (2016)
Bollegala, D., Maehara, T., Yoshida, Y., Kawarabayashi, K.I.: Learning word representations from relational graphs. In: Proceedings of AAAI 2015, pp. 2146–2152 (2015)
Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: Webtables: exploring the power of tables on the web. Proc. VLDB Endowment 1(1), 538–549 (2008)
Chakrabarti, S.: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan-Kaufmann Publishers, Burlington (2002)
Embley, D., Hurst, M., Lopresti, D., Nagy, G.: Table-processing paradigms: a research survey. Int. J. Doc. Anal. Recogn. 8(2), 66–86 (2006)
Ji, S., Satish, N., Li, S., Dubey, P.: Parallelizing word2vec in shared and distributed memory. CoRR abs/1604.04661 (2016)
Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. Proc. VLDB Endowment 3(1), 1338–1347 (2010)
Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of AAAI 2015, pp. 2181–2187 (2015)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS 2013, pp. 3111–3119 (2013)
Munoz, E., Hogan, A., Mileo, A.: Triplifying Wikipedia’s tables. In: Proceedings of the ISWC 2013 Workshop on Linked Data for Information Extraction (2013)
Neelakantan, A., Roth, B., McCallum, A.: Compositional vector space models for knowledge base completion. In: Proceedings of ACL 2015, pp. 156–166 (2015)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP 2014, pp. 1532–1543 (2014)
Pimplikar, R., Sarawagi, S.: Answering table queries on the web using column keywords. Proc. VLDB Endowment 5(10), 908–919 (2012)
Recht, B., Re, C., Wright, S.J., Niu, F.: Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Proceedings of NIPS 2011, pp. 693–701 (2011)
Toutanova, K., Chen, D., Pantel, P., Poon, H., Choudhury, P., Gamon, M.: Representing text for joint embedding of text and knowledge bases. In: Proceedings of EMNLP 2015, pp. 1499–1509 (2015)
Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Proceedings of AAAI 2014, pp. 1112–1119 (2014)
Yin, P., Lu, Z., Li, H., Kao, B.: Neural enquirer: learning to query tables in natural language. In: Proceedings of IJCAI 2016, pp. 2308–2314 (2016)
Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition. Int. J. Doc. Anal. Recogn. 7(1), 1–16 (2004)
Acknowledgement
This work was supported by JSPS KAKENHI Grant Numbers JP15K00309, JP15K00425, JP15K16077.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Yoshida, M., Matsumoto, K., Kita, K. (2017). Distributed Representations for Words on Tables. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science(), vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_11
DOI: https://doi.org/10.1007/978-3-319-57454-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57453-0
Online ISBN: 978-3-319-57454-7