
Real-time object recognition using local features on a DSP-based embedded system

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

In the last few years, object recognition has become one of the most popular tasks in computer vision, driven in particular by the development of powerful new algorithms for local appearance-based object recognition. At the same time, so-called “smart cameras” with enough power for decentralized image processing have become increasingly popular for all kinds of tasks, especially in the field of surveillance. Recognition is a very important tool here, as the robust recognition of suspicious vehicles, persons or objects is a matter of public safety, which makes the deployment of recognition capabilities on embedded platforms necessary. In this work we investigate object recognition based on state-of-the-art algorithms in the context of a DSP-based embedded system. We implement several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and build a medium-sized object database based on a vocabulary tree, which is suitable for our dedicated hardware setup. We carefully investigate the parameters of the algorithm with respect to their effect on performance on the embedded platform. We show that state-of-the-art object recognition algorithms can be successfully deployed on today's smart cameras, even with strictly limited computational and memory resources.
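To make the idea of a vocabulary tree more concrete, the following is a minimal sketch of one common construction, hierarchical k-means over descriptor vectors in the spirit of [30]. It is not the implementation evaluated in this article: the function names, the branching factor, the depth, the deterministic farthest-point seeding, and the toy 2-D "descriptors" are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def assign(points, centers):
    """Group points by their nearest center (ties go to the lower index)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iters=10):
    """Plain k-means; farthest-point seeding is an assumption chosen so the
    sketch is deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = assign(points, centers)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return assign(points, centers)

def build_tree(points, branch, depth):
    """Recursively split the descriptors into `branch` clusters per level."""
    node = {"center": mean(points), "children": []}
    if depth > 0 and len(points) >= branch:
        for cluster in kmeans(points, branch):
            if cluster:
                node["children"].append(build_tree(cluster, branch, depth - 1))
    return node

def quantize(tree, descriptor):
    """Descend to a leaf; the path of child indices acts as the visual word."""
    path, node = [], tree
    while node["children"]:
        kids = node["children"]
        i = min(range(len(kids)), key=lambda j: sq_dist(descriptor, kids[j]["center"]))
        path.append(i)
        node = kids[i]
    return tuple(path)

# Toy 2-D "descriptors" forming two well-separated groups (hypothetical data).
toy = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
tree = build_tree(toy, branch=2, depth=2)
print(quantize(tree, [0.2, 0.1]))    # (0, 0)
print(quantize(tree, [10.8, 10.9]))  # (1, 1)
```

Quantizing a descriptor costs only `branch` distance computations per level, which is one reason tree-structured vocabularies are attractive when computation and memory are tightly limited.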

Notes

  1. The algorithm described has complexity \(\mathcal{O}(N \log \log N)\). A more efficient algorithm based on component tree analysis can be found in [10].

  2. Note that we also presented an application of our approach for vehicle reacquisition in smart camera networks in [2].

References

  1. Arth, C., Bischof, H., Leistner, C.: TRICam: An embedded platform for remote traffic surveillance. In: Embedded Computer Vision Workshop (held in conjunction with CVPR) (2006)

  2. Arth, C., Leistner, C., Bischof, H.: Object reacquisition and tracking in large-scale smart camera networks. In: Proceedings of the IEEE International Conference on Distributed Smart Cameras (ICDSC), pp. 156–163 (2007)

  3. Arth, C., Leistner, C., Bischof, H.: Robust local features and their application in self-calibration and object recognition on embedded systems. In: Embedded Computer Vision Workshop (held in conjunction with CVPR), pp. 1–8 (2007)

  4. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4):509–522 (2002)

  5. Bishnu, A., Bhattacharya, B.B., Kundu, M.K., Murthy, C.A., Acharya, T.: A pipeline architecture for computing the Euler number of a binary image. J. Syst. Archit. 51(8):470–487 (2005)

  6. Bishnu, A., Bhunre, P.K., Bhattacharya, B.B., Kundu, M.K., Murthy, C.A., Acharya, T.: Content based image retrieval: related issues using Euler vector. In: Proc. of the ICIP, vol. 2, pp. II-585–II-588 (2002)

  7. Brown, M., Szeliski, R., Winder, S.: Multi-image matching using multi-scale oriented patches. In: Proc. of the CVPR, vol. 1, pp. 510–517 (2005)

  8. Carson, C., Belongie, S., Greenspan, H., Malik, J.: Region-based image querying. In: Workshop on Content-Based Access of Image and Video Libraries (held in conjunction with CVPR) (1997)

  9. Dey, S., Bhattacharya, B.B., Kundu, M.K., Acharya, T.: A fast algorithm for computing the Euler number of an image and its VLSI implementation. In: Proc. of the 13th International Conference on VLSI Design, pp. 330–335 (2000)

  10. Donoser, M., Bischof, H.: Efficient maximally stable extremal region (MSER) tracking. In: Proc CVPR, vol. 1, pp. 553–560 (2006)

  11. Estevez, L., Kehtarnavaz, N.: A real-time histographic approach to road sign recognition. In: Southwest Symposium on Image Analysis and Interpretation, pp. 95–100 (1996)

  12. Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. IEEE Trans. Pattern Anal. Mach. Intell. 13(9):891–906 (1991)

  13. Geusebroek, J.-M., Burghouts, G.J., Smeulders, A.W.M.: The Amsterdam library of object images. Int. J. Comput. Vis. 61(1):103–112 (2005)

  14. Harris, C., Stephens, M.J.: A combined corner and edge detector. In: Alvey Vision Conference, pp. 147–152 (1988)

  15. Helmbold, D.P., Schapire, R.E.: Predicting nearly as well as the best pruning of a decision tree. In: Computational Learning Theory, pp. 61–68 (1995)

  16. Kadir, T., Zisserman, A., Brady, M.: An affine invariant salient region detector. In: Proc. of the ECCV, vol. 1, pp. 228–241 (2004)

  17. Ke, Y., Sukthankar, R.: PCA-SIFT: a more distinctive representation for local image descriptors. In: Proc. of the CVPR, vol. 2, pp. 506–513 (2004)

  18. Kuo, S.M., Lee, B.H., Tian, W.: Real-Time Digital Signal Processing: Implementations and Applications. Wiley, New York (2006)

  19. Lazebnik, S., Schmid, C., Ponce, J.: A sparse texture representation using local affine regions. IEEE Trans. Pattern Anal. Mach. Intell. 27(8):1265–1278 (2005)

  20. Lepetit, V., Lagger, P., Fua, P.: Randomized trees for real-time keypoint recognition. In: Proc. CVPR, vol. 2, pp. 775–781 (2005)

  21. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2):91–110 (2004)

  22. Mansour, Y.: Pessimistic decision tree pruning based on tree size. In: Proc. of the International Conference on Machine Learning, Morgan Kaufmann, pp. 195–201 (1997)

  23. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Rosin, P.L., Marshall, D. (eds.) Proc. of the BMVC, London, UK, vol. 1, pp. 384–393. BMVA (2002)

  24. Mikolajczyk, K.: Interest Point Detection Invariant to Affine Transformations. Ph.D. thesis, Institut National Polytechnique de Grenoble (2002)

  25. Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. In: Proc. of the ICCV, pp. 525–531 (2001)

  26. Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: Proc. of the ECCV, vol. 1, pp. 128–142 (2002)

  27. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10):1615–1630 (2005)

  28. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L.: A comparison of affine region detectors. Int. J. Comput. Vis. 65(1–2):43–72 (2005)

  29. Munich, M.E., Pirjanian, P., DiBernardo, E., Goncalves, L., Karlsson, N., Lowe, D.: Break-through visual pattern recognition for robotics and automation. In: IEEE International Conference on Robotics and Automation (2005)

  30. Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: Proc. of the CVPR, vol. 2, pp. 2161–2168 (2006)

  31. Obdržálek, S., Matas, J.: Sub-linear indexing for large scale object recognition. In: Proc. of the BMVC, vol. 2 (2005)

  32. Ober, S., Winter, M., Arth, C., Bischof, H.: Dual-layer visual vocabulary tree hypotheses for object recognition. In: Proc. of the ICIP (2007)

  33. Ortmann, V., Eckmiller, R.: Real-time object recognition based on active vision and sequential analysis. In: Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE Comp. Society, Washington, DC, USA, pp. 3325–3328 (1999)

  34. Samet, H.: The quadtree and related hierarchical data structures. ACM Comput. Surv. 16(2):187–260 (1984)

  35. Schaffalitzky, F., Zisserman, A.: Multi-view matching for unordered image sets, or “How Do I Organize My Holiday Snaps?”. In: Proc. of the ECCV, vol. 1, p. 414 (2002)

  36. Schiele, B., Crowley, J.L.: Object recognition using multidimensional receptive field histograms. In: Proc. of the ECCV, vol. 1, pp. 610–619 (1996)

  37. Schiele, B., Crowley, J.L.: Recognition without correspondence using multidimensional receptive field histograms. Int. J. Comput. Vis. 36(1):31–50 (2000)

  38. Schmid, C., Mohr, R.: Local grayvalue invariants for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 19(5):530–535 (1997)

  39. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: Proc. of the ICCV, IEEE Computer Society, Los Alamitos, CA, USA, vol. 2, p. 1470 (2003)

  40. Squire, D., Muller, W., Muller, H., Raki, J.: Content-based query of image databases, inspirations from text retrieval: inverted files, frequency-based weights and relevance feedback. In: Proc. of the Scandinavian Conference on Image Analysis (1999)

  41. Tuytelaars, T., Van Gool, L.J.: Matching widely separated views based on affine invariant regions. Int. J. Comput. Vis. 59(1):61–85 (2004)

  42. Wolf, W., Ozer, B., Lv, T.: Smart cameras as embedded systems. Computer 35(9):48–53 (2002)

  43. Yeh, T., Grauman, K., Tollmar, K., Darrell, T.: A picture is worth a thousand keywords: image-based object search on a mobile platform. In: CHI Extended Abstracts, pp. 2025–2028 (2005)

Author information

Corresponding author

Correspondence to Clemens Arth.

Additional information

This work was done in the scope of the VM-GPU Project No. 813396, financed by the Austrian Research Promotion Agency (http://www.ffg.at), and has been supported by the Austrian Joint Research Project Cognitive Vision under projects S9103-N04 and S9104-N04.

Appendix

Table 5 lists the IDs of the 250 objects selected from the ALOI database [13] for our experiments. These objects were selected because they yield the highest number of difference-of-Gaussians (DoG) points on the resized ALOI images (352 × 288 pixels). To illustrate this, Fig. 25 shows the number of DoG points for the top 500 ALOI objects.

Table 5 Object IDs selected for experiments
Fig. 25 Number of DoG points for the first 500 objects in the ALOI database, sorted in decreasing order
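The selection rule described in this appendix, keeping the objects with the most DoG points, can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_top_objects`, the tie-breaking by smaller object ID, and the per-object counts are hypothetical and are not taken from Table 5 or Fig. 25.

```python
import heapq

def select_top_objects(dog_counts, k):
    """Return the k object IDs with the most DoG points, highest count first.

    dog_counts maps an object ID to its number of detected DoG points.
    Ties are broken in favour of the smaller ID (an assumption of this sketch).
    """
    return heapq.nlargest(
        k, dog_counts, key=lambda obj_id: (dog_counts[obj_id], -obj_id)
    )

# Hypothetical counts for five objects; the experiments would use k = 250.
hypothetical_counts = {1: 120, 2: 45, 3: 300, 4: 120, 5: 10}
print(select_top_objects(hypothetical_counts, 3))  # [3, 1, 4]
```

Sorting the same counts in decreasing order would give a curve of the kind Fig. 25 describes.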

Cite this article

Arth, C., Bischof, H. Real-time object recognition using local features on a DSP-based embedded system. J Real-Time Image Proc 3, 233–253 (2008). https://doi.org/10.1007/s11554-008-0083-z

  • DOI: https://doi.org/10.1007/s11554-008-0083-z
