
Towards Real-Time Analysis of ID-Associated Data

  • Conference paper
Advances in Conceptual Modeling (ER 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11158)


Abstract

ID-associated data are sequences of entries in which each entry is semantically associated with a unique ID. Examples are user IDs in the user behaviour logs of mobile applications and device IDs in the sensor records of self-driving cars. Many big data applications now generate such ID-associated data at high speed, and most queries over them are ID-centric, that is, they target specific IDs and time ranges. To derive valuable insights from such data in a timely manner, a system needs to ingest high volumes of data with low latency and to support efficient real-time analysis over them. In this paper, we introduce a system prototype designed for this goal. The system uses a parallel ingestion pipeline and a lightweight indexing scheme for fast ingestion and efficient analysis. In addition, a fiber partitioning method is used to achieve dynamic scalability. For better integration with the Hadoop ecosystem, the prototype is built on open source projects, including Kafka and Presto.
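To make the notion of an ID-centric query concrete, the following is a minimal sketch and not the prototype's implementation. It assumes that entries are routed to a fixed number of fibers by a plain modulo on the ID and that each ID keeps its entries sorted by timestamp, so a query on one ID and one time range touches a single fiber and needs only two binary searches. The names `fiber_of` and `FiberStore` and the modulo mapping are hypothetical, chosen for illustration only.

```python
from bisect import bisect_left, bisect_right


def fiber_of(entity_id: int, num_fibers: int) -> int:
    """Map an ID to a fiber. Plain modulo is an assumption for illustration."""
    return entity_id % num_fibers


class FiberStore:
    """Toy in-memory store: one dict per fiber, one time-sorted list per ID."""

    def __init__(self, num_fibers: int = 4):
        self.num_fibers = num_fibers
        # fiber -> {id: (timestamps, values)}, both lists ordered by timestamp
        self.fibers = [dict() for _ in range(num_fibers)]

    def ingest(self, entity_id: int, timestamp: int, value) -> None:
        """Insert one entry, keeping the per-ID lists sorted by timestamp."""
        fiber = self.fibers[fiber_of(entity_id, self.num_fibers)]
        times, values = fiber.setdefault(entity_id, ([], []))
        pos = bisect_right(times, timestamp)
        times.insert(pos, timestamp)
        values.insert(pos, value)

    def query(self, entity_id: int, t1: int, t2: int) -> list:
        """Return the values of entity_id with t1 < timestamp < t2."""
        fiber = self.fibers[fiber_of(entity_id, self.num_fibers)]
        times, values = fiber.get(entity_id, ([], []))
        lo = bisect_right(times, t1)  # first index with timestamp > t1
        hi = bisect_left(times, t2)   # first index with timestamp >= t2
        return values[lo:hi]


if __name__ == "__main__":
    store = FiberStore(num_fibers=4)
    store.ingest(7, 10, "a")
    store.ingest(7, 30, "c")
    store.ingest(7, 20, "b")  # out-of-order arrival is placed by timestamp
    print(store.query(7, 5, 35))
```

The strict bounds follow the shape of the benchmark query given in the Notes (`generation_time > t1` and `generation_time < t2`). A real deployment would replace the modulo mapping with whatever partitioning function the system actually uses.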

This work is supported by the Science and Technology Planning Project of Guangdong under grant No. 2015B010131015, the 863 key project under grant No. 2015AA015307, and the National Science Foundation of China under grants No. 61472426, U1711261, and 61432006.


Notes

  1. select sum(quantity), sum(totalprice), min(discount), max(discount), avg(extendedprice), count(*) from test where custkey = k and generation_time > t1 and generation_time < t2 order by linestatus.

  2. https://github.com/dbiir/paraflow.

References

  1. Apache Hive (2011). http://hive.apache.org

  2. Spark SQL: relational data processing in spark (2015). http://spark.apache.org/sql/

  3. Apache ORC (2018). https://orc.apache.org

  4. Apache Parquet (2018). https://parquet.apache.org

  5. Gormley, C., Tong, Z.: ElasticSearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine. O’Reilly Media Inc, Sebastopol (2015)


  6. Karger, D., et al.: Web caching with consistent hashing. Comput. Netw. 31(11–16), 1203–1213 (1999)


  7. Kreps, J., Narkhede, N., Rao, J., et al.: Kafka: a distributed messaging system for log processing. In: Proceedings of the NetDB, pp. 1–7 (2011)


  8. Traverso, M.: Presto: interacting with petabytes of data at facebook (2013). Accessed 4 Feb 2014



Author information


Corresponding author

Correspondence to Xiongpai Qin.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Jin, G., Wang, Y., Qin, X., Chen, Y., Du, X. (2018). Towards Real-Time Analysis of ID-Associated Data. In: Woo, C., Lu, J., Li, Z., Ling, T., Li, G., Lee, M. (eds) Advances in Conceptual Modeling. ER 2018. Lecture Notes in Computer Science, vol 11158. Springer, Cham. https://doi.org/10.1007/978-3-030-01391-2_6


  • DOI: https://doi.org/10.1007/978-3-030-01391-2_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01390-5

  • Online ISBN: 978-3-030-01391-2

  • eBook Packages: Computer Science (R0)
