Extracting Temporal Patterns from Large-Scale Text Corpus

Liu, Yu; Hua, Wen; Zhou, Xiaofang

doi:10.1007/978-3-030-12079-5_2

Yu Liu, Wen Hua & Xiaofang Zhou

Conference paper
First Online: 23 January 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11393)

Abstract

Knowledge, in practice, is time-variant, and many relations are only valid for a certain period of time. This phenomenon highlights the importance of designing temporal patterns, i.e., indicating phrases and their temporal meanings, for temporal knowledge harvesting. However, pattern design is extremely laborious and time-consuming even for a single relation. Therefore, in this work, we study the problem of temporal pattern extraction by automatically analysing a large-scale text corpus with a small number of seed temporal facts. The problem is challenging given the ambiguous nature of natural language and the large number of documents we need to analyse in order to obtain highly representative temporal patterns. To this end, we introduce various techniques, including corpus annotation, pattern generation, scoring and clustering, to reduce ambiguity in the text corpus and improve both the accuracy and the coverage of the extracted patterns. We conduct extensive experiments on real-world datasets, and the experimental results verify the effectiveness of our proposals.
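
The idea of deriving temporal patterns from seed temporal facts can be illustrated with a minimal sketch. This is not the authors' method: the function names, the year regex, the plain substring matching, the placeholder tokens, the support-ratio score, and the example sentences and seeds are all assumptions made for illustration. The sketch only shows the general shape of seed-based pattern generation followed by a simple scoring step; it omits corpus annotation and clustering entirely.

```python
import re
from collections import defaultdict

# Hypothetical helper: four-digit years between 1000 and 2099.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def generate_patterns(sentences, seeds):
    """Map each generalized pattern to the set of seed facts that produced it.

    A seed is an assumed (subject, object, year) triple. A sentence yields a
    pattern when it mentions the subject, the object, and the year.
    """
    support = defaultdict(set)
    for sentence in sentences:
        years_in_sentence = YEAR.findall(sentence)
        for subj, obj, year in seeds:
            if subj in sentence and obj in sentence and str(year) in years_in_sentence:
                # Replace the concrete arguments with placeholders.
                pattern = sentence.replace(subj, "<SUBJ>").replace(obj, "<OBJ>")
                pattern = re.sub(r"\b%d\b" % year, "<TIME>", pattern)
                support[pattern].add((subj, obj, year))
    return support


def score_patterns(support, seeds):
    """Score a pattern by the fraction of seed facts that support it."""
    return {pattern: len(facts) / len(seeds) for pattern, facts in support.items()}


# Illustrative inputs (not taken from the paper's datasets).
sentences = [
    "Barack Obama married Michelle Obama in 1992.",
    "David Beckham married Victoria Adams in 1999.",
    "Barack Obama was born in 1961.",
]
seeds = [
    ("Barack Obama", "Michelle Obama", 1992),
    ("David Beckham", "Victoria Adams", 1999),
]

scores = score_patterns(generate_patterns(sentences, seeds), seeds)
for pattern, score in scores.items():
    print(pattern, score)
```

Plain substring matching is the weakest part of this sketch: it breaks when one entity name is contained in another, which is one reason a real pipeline would annotate entities and time expressions before generating patterns.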

Notes

1. The datasets can be downloaded from https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/pravda/.

Author information

Authors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
Yu Liu, Wen Hua & Xiaofang Zhou

Corresponding author

Correspondence to Yu Liu.

Editor information

Editors and Affiliations

University of Sydney, Sydney, NSW, Australia
Lijun Chang
University of Melbourne, Parkville, VIC, Australia
Junhao Gan
University of New South Wales, Sydney, NSW, Australia
Xin Cao

About this paper

Cite this paper

Liu, Y., Hua, W., Zhou, X. (2019). Extracting Temporal Patterns from Large-Scale Text Corpus. In: Chang, L., Gan, J., Cao, X. (eds) Databases Theory and Applications. ADC 2019. Lecture Notes in Computer Science, vol 11393. Springer, Cham. https://doi.org/10.1007/978-3-030-12079-5_2

DOI: https://doi.org/10.1007/978-3-030-12079-5_2
Published: 23 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12078-8
Online ISBN: 978-3-030-12079-5
eBook Packages: Computer Science, Computer Science (R0)