Abstract
Research in the field of biometrics depends on the effective management and analysis of many terabytes of digital data. The quality of an experimental result often depends heavily on the sheer amount of data marshalled to support it. However, the current state of the art requires researchers to have a heroic level of expertise in systems software to perform large-scale experiments. To address this, we have designed and implemented BXGrid, a data repository and workflow abstraction for biometrics research. The system is composed of a relational database, an active storage cluster, and a campus computing grid. End users interact with the system through a high-level abstraction of four stages: Select, Transform, AllPairs, and Analyze. A high degree of availability and reliability is achieved through transparent failover, three-phase operations, and independent auditing. BXGrid is currently in daily production use by an active biometrics research group at the University of Notre Dame. We discuss our experience in constructing and using the system and offer lessons learned in conducting collaborative research in e-Science.
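To make the four-stage abstraction concrete, the following is a minimal in-memory sketch of how Select, Transform, AllPairs, and Analyze might compose. It is not BXGrid's interface: the record layout, the function names (`select`, `transform`, `all_pairs`, `analyze`, `hamming`), and the toy bit-vector data are all assumptions made for illustration, and the real system runs these stages against a database, a storage cluster, and a computing grid rather than Python lists.

```python
from itertools import product

# Hypothetical records standing in for repository metadata plus raw data.
RECORDS = [
    {"id": "a", "subject": 1, "kind": "iris", "data": [1, 0, 1, 1]},
    {"id": "b", "subject": 1, "kind": "iris", "data": [1, 0, 1, 0]},
    {"id": "c", "subject": 2, "kind": "iris", "data": [0, 1, 0, 0]},
    {"id": "d", "subject": 2, "kind": "face", "data": [1, 1, 1, 1]},
]


def select(records, **criteria):
    """Select: keep the records whose metadata matches every criterion."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]


def transform(records, fn):
    """Transform: apply fn to each record's data (e.g. feature extraction)."""
    return [{**r, "data": fn(r["data"])} for r in records]


def all_pairs(records, compare):
    """AllPairs: score every ordered pair of records with compare."""
    return {
        (x["id"], y["id"]): compare(x["data"], y["data"])
        for x, y in product(records, repeat=2)
    }


def analyze(records, scores):
    """Analyze: split scores into same-subject and different-subject lists."""
    subject = {r["id"]: r["subject"] for r in records}
    match, nonmatch = [], []
    for (i, j), score in scores.items():
        if i == j:
            continue  # skip self-comparisons
        (match if subject[i] == subject[j] else nonmatch).append(score)
    return match, nonmatch


def hamming(x, y):
    """Toy comparison function: fraction of positions that differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


selected = select(RECORDS, kind="iris")
features = transform(selected, tuple)  # placeholder for a real transform
scores = all_pairs(features, hamming)
match, nonmatch = analyze(features, scores)
```

In this sketch each stage consumes the previous stage's output, which is the property that lets a system of this kind schedule the expensive AllPairs step separately from the cheap metadata query.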
Cite this article
Bui, H., Kelly, M., Lyon, C. et al. Experience with BXGrid: a data repository and computing grid for biometrics research. Cluster Comput 12, 373–386 (2009). https://doi.org/10.1007/s10586-009-0098-7
DOI: https://doi.org/10.1007/s10586-009-0098-7