Abstract
In the last few years, object recognition has become one of the most popular tasks in computer vision, driven in particular by the development of powerful new algorithms for local appearance-based object recognition. At the same time, so-called “smart cameras” with enough processing power for decentralized image processing have become increasingly popular for all kinds of tasks, especially in the field of surveillance. Recognition is a very important tool here, as the robust recognition of suspicious vehicles, persons, or objects is a matter of public safety, which makes the deployment of recognition capabilities on embedded platforms necessary. In our work we investigate the task of object recognition based on state-of-the-art algorithms in the context of a DSP-based embedded system. We implement several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and build a medium-sized object database based on a vocabulary tree, which is suitable for our dedicated hardware setup. We carefully investigate the parameters of the algorithm with respect to its performance on the embedded platform. We show that state-of-the-art object recognition algorithms can be successfully deployed on today's smart cameras, even with strictly limited computational and memory resources.
Additional information
This work was done in the scope of the VM-GPU Project No. 813396, financed by the Austrian Research Promotion Agency (http://www.ffg.at), and has been supported by the Austrian Joint Research Project Cognitive Vision under projects S9103-N04 and S9104-N04.
Appendix
Table 5 lists the IDs of the 250 objects selected from the ALOI database for our experiments. These objects were selected because they yield the highest number of DoG (difference-of-Gaussians) points on the resized ALOI images (352 × 288 pixels). To illustrate this, Fig. 25 depicts the number of DoG points for the top 500 ALOI images.
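The selection rule described here, keeping the objects with the most DoG points, can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function name `select_top_objects`, the dictionary of per-object counts, and the tie-breaking by object ID are hypothetical and not taken from the article, and the counts below are made-up toy values rather than measurements on ALOI.

```python
from typing import Dict, List


def select_top_objects(dog_counts: Dict[int, int], k: int = 250) -> List[int]:
    """Return the IDs of the k objects with the most DoG points.

    dog_counts maps an object ID to the number of DoG points detected
    on its (resized) image. Both the mapping and the tie-break rule are
    assumptions made for this sketch.
    """
    # Sort by descending count; break ties by ascending ID for determinism.
    ranked = sorted(dog_counts.items(), key=lambda item: (-item[1], item[0]))
    return [object_id for object_id, _ in ranked[:k]]


# Toy counts, invented for illustration only.
toy_counts = {1: 120, 2: 340, 3: 95, 4: 340, 5: 210}
print(select_top_objects(toy_counts, k=3))  # prints [2, 4, 5]
```

With real data, `dog_counts` would presumably be filled by running a DoG detector on each resized image and the default `k=250` would reproduce the size of the selection described above.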
Cite this article
Arth, C., Bischof, H. Real-time object recognition using local features on a DSP-based embedded system. J Real-Time Image Proc 3, 233–253 (2008). https://doi.org/10.1007/s11554-008-0083-z
DOI: https://doi.org/10.1007/s11554-008-0083-z