Abstract
This paper presents a real-time Visual Content Description System (VCDS) based on MPEG-7 descriptors. The system is divided into two parts: the first extracts the descriptors using the VCDS, while the second uses the descriptors' values in a dedicated search algorithm. We propose original solutions for both parts. The proposed architecture can be used for real-time video indexing and retrieval, content summarization, content delivery, surveillance, personalized services, etc. The descriptor extractor IP core, which is part of the VCDS, implements four MPEG-7 visual descriptors and was designed for ASIC implementation in 0.35 μm CMOS, a novel solution to the real-time content description problem. The proposed hardware architecture splits the computational burden across several threads, so that calculations run simultaneously and the system's speed improves. These methods make the hardware implementation of the system's most computationally demanding modules more time- and power-efficient. Four variations of the basic hardware architecture are discussed, and new search algorithms based on the VCDS responses are proposed. Experimental results demonstrate the effectiveness of the hardware architectures and of the new approach to similarity-based searching.
Appendix: Object matching techniques
In this section we describe the three object matching techniques used in our experiments. The inputs to the matching process are always two color images, I ′ and I ′′. The i-th object extracted from image I ′ is denoted \(O_{i}^{\prime}\). Object properties are denoted as follows:
- .Vol—the object's area (volume);
- .SCD—the object's Scalable Color Descriptor (SCD);
- .EHD—the object's Edge Histogram Descriptor (EHD);
- .x, .y—the coordinates of the i-th object's mass center.
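The property list above can be mirrored by a small record type. This is a minimal sketch; the field names `vol`, `scd`, `ehd`, `x`, `y` are ours, chosen to match the notation above, and the descriptor fields are plain histogram bin lists:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MatchedObject:
    """Per-object properties used by the matching techniques (hypothetical
    container mirroring .Vol, .SCD, .EHD, .x, .y from the text)."""
    vol: float                                     # object's area (volume)
    scd: List[int] = field(default_factory=list)   # Scalable Color Descriptor bins
    ehd: List[int] = field(default_factory=list)   # Edge Histogram Descriptor bins
    x: float = 0.0                                 # mass-center x coordinate
    y: float = 0.0                                 # mass-center y coordinate

obj = MatchedObject(vol=25.0, scd=[1, 2], ehd=[3, 0], x=4.0, y=5.0)
print(obj.vol, obj.y)  # 25.0 5.0
```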
1.1 Combined matching technique
The combined matching technique relies on a two-step image similarity measurement. The steps are called griddles because of their specific task: each one filters similar images using a different measurement technique. The first griddle passes only those images that share a sufficient number of similar features.
Assume that n and m are the numbers of objects in images I ′ and I ′′ respectively. Then we have:
where
and i = 1,…,n, j = 1,…,m; s is the number of similar features found in both images.
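The first griddle can be sketched as a pairwise similarity count over the objects of the two images. The paper's exact similarity criterion is not reproduced in this text, so the predicate `is_similar` and the pass `threshold` below are illustrative assumptions:

```python
def first_griddle(objs1, objs2, is_similar, threshold):
    """Count pairs (i, j) of objects from the two images judged similar
    by `is_similar`; the image pair passes when the count s reaches
    `threshold`. Returns (passed, s)."""
    s = sum(1 for o1 in objs1 for o2 in objs2 if is_similar(o1, o2))
    return s >= threshold, s

# Toy example: objects reduced to their areas, "similar" = areas within 10%
close = lambda a, b: abs(a - b) <= 0.1 * max(a, b)
print(first_griddle([100, 40], [95, 300], close, threshold=1))  # (True, 1)
```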
The second griddle is based on the combined distance, which takes into account the SCD, the EHD, and the vertical placement of the objects in the images:
where
is the distance used to rank the images that passed the first griddle.
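A combined distance of this kind can be sketched as a weighted sum of descriptor distances plus a vertical-offset term. The L1 histogram distances and the weights below are illustrative assumptions; the paper's exact formula was an equation image that is not reproduced here:

```python
from types import SimpleNamespace

def combined_distance(o1, o2, w_scd=1.0, w_ehd=1.0, w_y=1.0):
    """Combined distance over SCD, EHD and vertical placement: weighted
    sum of L1 distances between the descriptor histograms plus the
    difference of the mass-center y coordinates."""
    d_scd = sum(abs(a - b) for a, b in zip(o1.scd, o2.scd))
    d_ehd = sum(abs(a - b) for a, b in zip(o1.ehd, o2.ehd))
    d_y = abs(o1.y - o2.y)
    return w_scd * d_scd + w_ehd * d_ehd + w_y * d_y

# Example objects with SCD/EHD histograms and a mass-center y coordinate
a = SimpleNamespace(scd=[3, 1], ehd=[0, 2], y=10.0)
b = SimpleNamespace(scd=[1, 1], ehd=[1, 2], y=12.0)
print(combined_distance(a, b))  # 2 + 1 + 2 = 5.0
```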
1.2 Similar objects matching technique
This method is quite similar to the previous one. We have modified the similarity measure used in the first step: when two objects are matched as similar, the corresponding bits in an additional vector are set, excluding those objects from the remainder of the similarity measurement process.
Assume that n,m are the number of objects in images I ′, I ′′ respectively. We initialize a vector \(M\in\Im^{1\times n}, \ \left\{ \forall i \ : \ M\left(i\right)=0 \right\}\).
The first griddle works as follows:
The distance D is computed in the same manner as in the previous matching method (8). Note that in this technique, the placement of the objects is not taken into account. For further distance calculations we assume
where
is the distance used to rank the images that passed the first griddle.
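The exclusion mechanism can be sketched as follows. For simplicity this sketch marks only the second image's objects in the vector M, which is an assumption on our part, and the similarity predicate again stands in for the paper's criterion:

```python
def first_griddle_excluding(objs1, objs2, is_similar):
    """First-griddle variant with an exclusion vector M: once an object of
    the second image is matched, its bit is set and it is skipped in all
    later comparisons, so each object is counted at most once."""
    M = [0] * len(objs2)      # marking vector, initially all zeros
    s = 0
    for o1 in objs1:
        for j, o2 in enumerate(objs2):
            if M[j] == 0 and is_similar(o1, o2):
                M[j] = 1      # exclude o2 from further matching
                s += 1
                break         # move on to the next object of the first image
    return s, M

# Toy example: objects reduced to areas, "similar" = equal areas
print(first_griddle_excluding([100, 100], [100, 100, 5], lambda a, b: a == b))
# (2, [1, 1, 0]) — the second 100 pairs with the second image's other 100
```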
1.3 The biggest object matching technique
The biggest object matching technique differs considerably from the other techniques presented. It does not involve complex similarity matching methods: the idea is to find the two biggest objects in the reference image and to calculate the distance between those objects and the most similar objects in the second image. This technique requires n, m ≥ 2. Then we have:
where the distance d is computed as follows:
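The technique above can be sketched by selecting the two largest objects by area and matching each to its closest counterpart. The per-object `distance` (e.g. over SCD/EHD) is supplied by the caller, and combining the two matches by summation is our assumption, since the paper's equations are not reproduced in this text:

```python
from types import SimpleNamespace

def biggest_object_distance(objs1, objs2, distance):
    """Pick the two biggest objects (by .vol) of the reference image and,
    for each, take the distance to its most similar (closest) object in
    the second image; return the sum. Requires n, m >= 2."""
    assert len(objs1) >= 2 and len(objs2) >= 2
    biggest = sorted(objs1, key=lambda o: o.vol, reverse=True)[:2]
    return sum(min(distance(b, o) for o in objs2) for b in biggest)

# Toy objects: .vol is the area, .f a single feature value for the distance
a1 = SimpleNamespace(vol=50, f=1)
a2 = SimpleNamespace(vol=90, f=4)
a3 = SimpleNamespace(vol=10, f=9)
b1 = SimpleNamespace(vol=80, f=3)
b2 = SimpleNamespace(vol=40, f=2)
d = lambda p, q: abs(p.f - q.f)   # toy feature distance
print(biggest_object_distance([a1, a2, a3], [b1, b2], d))  # 1 + 1 = 2
```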
Kapela, R., Śniatała, P. & Rybarczyk, A. Real-time visual content description system based on MPEG-7 descriptors. Multimed Tools Appl 53, 119–150 (2011). https://doi.org/10.1007/s11042-010-0493-3