research-article

A Time Machine for Information: Looking Back to Look Forward

Published: 28 September 2016

Abstract

Historical data (also called long data) holds the key to understanding when facts are true. It is through long data that one can understand the trends that have developed in the past, form the audit trails needed for justification, and make predictions about the future. There is also increasing interest in developing search capabilities over long data.
In this article, we first motivate the need to develop a time machine for information that will help people "look back" so as to "look forward". We then give an overview of key ideas on three components (extraction, linking, and cleaning) that we believe are central to the development of any time machine for information. Finally, we conclude with what we believe are some interesting open research problems. This article is based on the material presented in a tutorial at VLDB 2015.
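To make "understanding when facts are true" concrete, the following is a minimal sketch, not taken from the article, of how a time machine might store facts together with the period during which each one held, and then answer an "as of" question. The `TemporalFact` class, the `value_as_of` and `history` functions, and the sample entity and values are all hypothetical illustrations. The sketch assumes half-open validity intervals [start, end), where an open end means the fact still holds, and it assumes the facts have already been extracted, linked, and cleaned.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TemporalFact:
    """Hypothetical (entity, attribute, value) triple valid over [start, end)."""
    entity: str
    attribute: str
    value: str
    start: date
    end: Optional[date] = None  # None means the fact is still true

    def holds_at(self, when: date) -> bool:
        # Half-open interval: the fact holds on `start` but not on `end`.
        return self.start <= when and (self.end is None or when < self.end)


def value_as_of(facts: List[TemporalFact], entity: str, attribute: str,
                when: date) -> Optional[str]:
    """Look back: return the value that held on `when`, or None if unknown."""
    for fact in facts:
        if fact.entity == entity and fact.attribute == attribute and fact.holds_at(when):
            return fact.value
    return None


def history(facts: List[TemporalFact], entity: str,
            attribute: str) -> List[Tuple[date, Optional[date], str]]:
    """Return the timeline of values for one attribute, ordered by start date."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    return [(f.start, f.end, f.value) for f in sorted(matching, key=lambda f: f.start)]


# Hypothetical sample data: one entity whose affiliation changed over time.
facts = [
    TemporalFact("alice", "affiliation", "Company B", date(2011, 7, 1)),
    TemporalFact("alice", "affiliation", "University A", date(2005, 9, 1), date(2011, 7, 1)),
]

print(value_as_of(facts, "alice", "affiliation", date(2010, 1, 1)))  # University A
print(value_as_of(facts, "alice", "affiliation", date(2015, 1, 1)))  # Company B
```

The half-open convention is a design choice that avoids two facts being true on the same boundary day. Populating such a store from noisy, conflicting sources is exactly where the extraction, linking, and cleaning components come in; this sketch sidesteps those steps.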



Information

Published In

ACM SIGMOD Record, Volume 45, Issue 2
June 2016
66 pages
ISSN: 0163-5808
DOI: 10.1145/3003665

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 September 2016
Published in SIGMOD Volume 45, Issue 2


Cited By

  • (2024) Efficiently Labeling and Retrieving Temporal Anomalies in Relational Databases. Information Systems Frontiers. DOI: 10.1007/s10796-024-10495-w. Online publication date: 31-May-2024.
  • (2022) Querying Temporal Anomalies in Healthcare Information Systems and Beyond. Advances in Databases and Information Systems, pp. 209-222. DOI: 10.1007/978-3-031-15740-0_16. Online publication date: 5-Sep-2022.
  • (2021) It Is Time for Journalists to Save Journalism. Media, Technology and Education in a Post-Truth Society, pp. 203-221. DOI: 10.1108/978-1-80043-906-120211015. Online publication date: 8-Jul-2021.
  • (2020) Popularity Prediction for Single Tweet based on Heterogeneous Bass Model. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2019.2952856. Online publication date: 2020.
  • (2019) Synthesizing N-ary Relations from Web Tables. Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, pp. 1-12. DOI: 10.1145/3326467.3326480. Online publication date: 26-Jun-2019.
  • (2019) Profiling the semantics of n-ary web table data. Proceedings of the International Workshop on Semantic Big Data, pp. 1-6. DOI: 10.1145/3323878.3325806. Online publication date: 5-Jul-2019.
  • (2019) CurrentClean: Spatio-Temporal Cleaning of Stale Data. 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 172-183. DOI: 10.1109/ICDE.2019.00024. Online publication date: Apr-2019.
  • (2019) Temporal data exchange. Information Systems. DOI: 10.1016/j.is.2019.07.004. Online publication date: Jul-2019.
  • (2018) Exploring change. Proceedings of the VLDB Endowment, 12(2), pp. 85-98. DOI: 10.14778/3282495.3282496. Online publication date: 1-Oct-2018.
