Benchmarking Fast-Data Platforms for the Aadhaar Biometric Database

Simmhan, Yogesh; Shukla, Anshu; Verma, Arun

doi:10.1007/978-3-319-49748-8_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10044)

Abstract

Aadhaar is the world’s largest biometric database, with a billion records, and is being compiled as an identity platform to deliver social services to residents of India. Aadhaar processes streams of biometric data as residents are enrolled and their records updated. Besides approximately 1 million enrollments and updates per day, up to 100 million biometric authentications per day are expected during the delivery of various public services. These form critical Big Data applications, with large data volumes and high data velocity. Here, we propose a stream processing workload, based on the Aadhaar Enrollment and Authentication applications, as a Big Data benchmark for distributed stream processing systems. We describe the application composition and characterize their task latencies and selectivities, as well as their data rate and size distributions, based on real observations. We also validate this benchmark on Apache Storm using synthetic streams and simulated application logic. This paper offers a unique glimpse into an operational national identity infrastructure and proposes a benchmark for “fast data” platforms to support such eGovernance applications.
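To put the quoted daily volumes in stream-processing terms, the sketch below converts them into average arrival rates. It assumes, purely for illustration, that traffic is spread evenly over an active window (24 hours by default); real enrollment and authentication traffic is bursty, so peak rates would be higher. The function `mean_rate_per_second` and its `active_hours` parameter are hypothetical names, not part of the benchmark's released code.

```python
SECONDS_PER_HOUR = 3600

# Daily volumes quoted in the abstract.
ENROLLMENTS_AND_UPDATES_PER_DAY = 1_000_000
AUTHENTICATIONS_PER_DAY = 100_000_000


def mean_rate_per_second(events_per_day: float, active_hours: float = 24.0) -> float:
    """Average arrival rate (events/s), assuming the daily volume is spread
    evenly across `active_hours` (a hypothetical simplification)."""
    if active_hours <= 0:
        raise ValueError("active_hours must be positive")
    return events_per_day / (active_hours * SECONDS_PER_HOUR)


if __name__ == "__main__":
    print(f"{mean_rate_per_second(ENROLLMENTS_AND_UPDATES_PER_DAY):.1f}")  # 11.6
    print(f"{mean_rate_per_second(AUTHENTICATIONS_PER_DAY):.1f}")  # 1157.4
    # Hypothetical: all authentications confined to an 8-hour working day.
    print(f"{mean_rate_per_second(AUTHENTICATIONS_PER_DAY, active_hours=8):.1f}")  # 3472.2
```

Under the even-spread assumption this gives roughly 11.6 enrollment/update events and 1157.4 authentication events per second; confining the authentications to a hypothetical 8-hour window triples the latter to about 3472 per second.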

Notes

1. India 2011 Census, and live statistics from https://portal.uidai.gov.in.
2. Healthcare.gov: CMS Has Taken Steps to Address Problems, but Needs to Further Implement Systems Development Best Practices, www.gao.gov/products/GAO-15-238.
3. Biometric Attendance Service, http://attendance.gov.in.
4. Code and data generator at https://github.com/dream-lab/bigdata-benchmarks.
5. The violin plot is a generalization of a box-and-whisker plot. The minimum, median and maximum values are marked with a dash on the vertical line. The width of the horizontal shaded region around each vertical bar represents the relative frequency of packets having that latency value.
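The statistics that Note 5 describes for a violin plot can be computed directly from a list of per-packet latencies. The sketch below is hypothetical (`violin_summary` and `bin_width_ms` are illustrative names) and approximates the shaded width with a fixed-width histogram of relative frequencies; plotting libraries typically use a kernel density estimate instead.

```python
from collections import Counter
from statistics import median


def violin_summary(latencies_ms, bin_width_ms=10.0):
    """Return the min, median and max a violin plot marks, plus the relative
    frequency of samples per latency bin (a histogram stand-in for the
    shaded width). Bins are half-open [k*w, (k+1)*w), keyed by lower edge."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if bin_width_ms <= 0:
        raise ValueError("bin_width_ms must be positive")
    n = len(latencies_ms)
    # Count samples per bin index, then normalise counts to fractions of n.
    counts = Counter(int(x // bin_width_ms) for x in latencies_ms)
    rel_freq = {k * bin_width_ms: c / n for k, c in sorted(counts.items())}
    return {
        "min": min(latencies_ms),
        "median": median(latencies_ms),
        "max": max(latencies_ms),
        "relative_frequency": rel_freq,
    }
```

For the samples `[12, 15, 18, 25, 31, 33, 40]` ms with 10 ms bins, this reports a minimum of 12, a median of 25, a maximum of 40, and relative frequencies 3/7, 1/7, 2/7 and 1/7 for the bins starting at 10, 20, 30 and 40 ms.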

Acknowledgments

We are grateful for inputs provided by Dr. Vivek Raghavan from UIDAI, and UIDAI’s public reports in preparing this article. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the Government of India or any agency thereof, the UIDAI, nor any of their employees.

Author information

Authors and Affiliations

Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
Yogesh Simmhan, Anshu Shukla & Arun Verma

Corresponding author

Correspondence to Yogesh Simmhan.

Editor information

Editors and Affiliations

Technical University of Berlin, Berlin, Germany
Tilmann Rabl
Cisco Systems, Inc., San Jose, California, USA
Raghunath Nambiar
University of California at San Diego, La Jolla, California, USA
Chaitanya Baru
Ampool, Inc., Santa Clara, California, USA
Milind Bhandarkar
Oracle Corporation, Redwood Shores, California, USA
Meikel Poess
Indian Institute of Public Health, Hyderabad, India
Saumyadipta Pyne

About this paper

Cite this paper

Simmhan, Y., Shukla, A., Verma, A. (2016). Benchmarking Fast-Data Platforms for the Aadhaar Biometric Database. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_2

DOI: https://doi.org/10.1007/978-3-319-49748-8_2
Published: 01 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49747-1
Online ISBN: 978-3-319-49748-8
eBook Packages: Computer Science, Computer Science (R0)