Abstract
Aadhaar is the world’s largest biometric database, with a billion records, and is being compiled as an identity platform to deliver social services to residents of India. Aadhaar processes streams of biometric data as residents are enrolled and their records updated. Besides \(\sim\)1 million enrollments and updates per day, up to 100 million biometric authentications are expected daily during the delivery of various public services. These are critical Big Data applications with high data volume and velocity. Here, we propose a stream processing workload, based on the Aadhaar enrollment and authentication applications, as a Big Data benchmark for distributed stream processing systems. We describe the composition of these applications and characterize their task latencies and selectivity, and their data rate and size distributions, based on real observations. We also validate this benchmark on Apache Storm using synthetic streams and simulated application logic. This paper offers a unique glimpse into an operational national identity infrastructure, and proposes a benchmark for “fast data” platforms to support such eGovernance applications.
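To give a sense of scale, the daily volumes quoted above can be converted into mean event rates. The following is a minimal sketch, not part of the benchmark itself: it assumes events are spread uniformly over a day, which real traffic will not be, and it treats the 100 million figure as an upper bound. The function names are hypothetical, and `output_rate` uses the conventional definition of selectivity as output messages per input message.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mean_rate_per_second(events_per_day: float) -> float:
    """Average event rate, assuming a uniform spread over the day."""
    return events_per_day / SECONDS_PER_DAY


def output_rate(input_rate: float, selectivity: float) -> float:
    """Rate leaving a task, with selectivity = outputs per input message."""
    return input_rate * selectivity


# Daily volumes taken from the abstract.
enrollment_rate = mean_rate_per_second(1_000_000)
authentication_rate = mean_rate_per_second(100_000_000)

print(f"enrollments and updates: {enrollment_rate:.1f} events/s")
print(f"authentications (upper bound): {authentication_rate:.1f} events/s")
```

Peak rates in an operational system would likely be well above these means, so a benchmark run would need rate distributions rather than a single average.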
Notes
1. India 2011 Census, and live statistics from https://portal.uidai.gov.in.
2. Healthcare.gov: CMS Has Taken Steps to Address Problems, but Needs to Further Implement Systems Development Best Practices, www.gao.gov/products/GAO-15-238.
3. Biometric Attendance Service, http://attendance.gov.in.
4. Code and data generator at https://github.com/dream-lab/bigdata-benchmarks.
5. The violin plot is a generalization of a box and whiskers plot. The minimum, median and maximum values are marked with a dash on the vertical line. The width of the horizontal shaded region around each vertical bar represents the relative frequency of packets having that latency value.
Acknowledgments
We are grateful for inputs provided by Dr. Vivek Raghavan from UIDAI, and UIDAI’s public reports in preparing this article. The views and opinions of authors expressed herein do not necessarily state or reflect those of the Government of India or any agency thereof, the UIDAI, nor any of their employees.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Simmhan, Y., Shukla, A., Verma, A. (2016). Benchmarking Fast-Data Platforms for the Aadhaar Biometric Database. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_2
DOI: https://doi.org/10.1007/978-3-319-49748-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49747-1
Online ISBN: 978-3-319-49748-8
eBook Packages: Computer Science, Computer Science (R0)