Towards Real-Time Analysis of ID-Associated Data

Jin, Guodong; Wang, Yixuan; Qin, Xiongpai; Chen, Yueguo; Du, Xiaoyong

doi:10.1007/978-3-030-01391-2_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11158)

Included in the following conference series:

International Conference on Conceptual Modeling

Abstract

ID-associated data are sequences of entries in which each entry is semantically associated with a unique ID. Examples of such IDs are user IDs in the user behaviour logs of mobile applications and device IDs in the sensor records of self-driving cars. Many big data applications now generate ID-associated data at high speed, and most queries over them are ID-centric (restricted to specific IDs and time ranges). To generate valuable insights from such data in a timely manner, a system needs to ingest them in high volume with low latency and support efficient real-time analysis over them. In this paper, we introduce a system prototype designed for this goal. The system uses a parallel ingestion pipeline and a lightweight indexing scheme for fast ingestion and efficient analysis. In addition, a fiber partitioning method is used to achieve dynamic scalability. For better integration with the Hadoop ecosystem, the prototype is built on open source projects, including Kafka and Presto.
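To make the notion of an ID-centric query concrete, the following is a minimal illustrative sketch, not the prototype's implementation. It assumes integer IDs, a fixed number of in-memory partitions chosen by `id % num_partitions`, and a per-ID list kept sorted by timestamp; the class and method names (`IdPartitionedStore`, `ingest`, `query`) are hypothetical and do not come from the paper, and the partitioning shown is plain modulo hashing rather than the fiber partitioning method mentioned in the abstract.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class IdPartitionedStore:
    """Toy in-memory store: entries are grouped by ID and sorted by time."""

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # One dict per partition: id -> (sorted timestamps, values in the same order).
        self.partitions = [
            defaultdict(lambda: ([], [])) for _ in range(num_partitions)
        ]

    def _partition_of(self, entry_id):
        # Assumption: simple modulo placement of integer IDs.
        return entry_id % self.num_partitions

    def ingest(self, entry_id, timestamp, value):
        # Insert at the position that keeps this ID's timestamps sorted.
        times, values = self.partitions[self._partition_of(entry_id)][entry_id]
        pos = bisect_right(times, timestamp)
        times.insert(pos, timestamp)
        values.insert(pos, value)

    def query(self, entry_id, t1, t2):
        """Return values for entry_id with t1 < timestamp < t2."""
        part = self.partitions[self._partition_of(entry_id)]
        if entry_id not in part:
            return []
        times, values = part[entry_id]
        lo = bisect_right(times, t1)  # first index with timestamp > t1
        hi = bisect_left(times, t2)   # first index with timestamp >= t2
        return values[lo:hi]
```

A query touches only the one partition that holds the requested ID and then narrows to the time range by binary search, which is the access pattern the abstract calls ID-centric. For example, after ingesting entries for ID 7 at several timestamps, `store.query(7, t1, t2)` returns only that ID's values strictly between `t1` and `t2`.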

This work is supported by the Science and Technology Planning Project of Guangdong under grant No. 2015B010131015, the 863 key project under grant No. 2015AA015307, and the National Science Foundation of China under grants No. 61472426, U1711261, and 61432006.

Notes

1. select sum(quantity), sum(totalprice), min(discount), max(discount), avg(extendedprice), count(*) from test where custkey = k and generation_time > t1 and generation_time < t2 order by linestatus.
2. https://github.com/dbiir/paraflow.

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, China
Guodong Jin, Yixuan Wang, Xiongpai Qin, Yueguo Chen & Xiaoyong Du
DEKE Key Laboratory, Renmin University of China, MOE, Beijing, China
Guodong Jin, Yixuan Wang, Xiongpai Qin, Yueguo Chen & Xiaoyong Du

Corresponding author

Correspondence to Xiongpai Qin.

Editor information

Editors and Affiliations

University of British Columbia, Vancouver, BC, Canada
Carson Woo
University of Helsinki, Helsinki, Finland
Jiaheng Lu
Northwestern Polytechnical University, Xian, China
Zhanhuai Li
Department of Computer Science, National University of Singapore, Singapore, Singapore
Tok Wang Ling
Department of Computer Science and Technology, Tsinghua University, Beijing, Beijing, China
Guoliang Li
National University of Singapore, Singapore, Singapore
Mong Li Lee

About this paper

Cite this paper

Jin, G., Wang, Y., Qin, X., Chen, Y., Du, X. (2018). Towards Real-Time Analysis of ID-Associated Data. In: Woo, C., Lu, J., Li, Z., Ling, T., Li, G., Lee, M. (eds) Advances in Conceptual Modeling. ER 2018. Lecture Notes in Computer Science, vol 11158. Springer, Cham. https://doi.org/10.1007/978-3-030-01391-2_6

DOI: https://doi.org/10.1007/978-3-030-01391-2_6
Published: 13 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01390-5
Online ISBN: 978-3-030-01391-2
eBook Packages: Computer Science, Computer Science (R0)