Abstract
This paper presents a real-time Visual Content Description System (VCDS) based on MPEG-7 descriptors. The system is divided into two parts: the first extracts the descriptors using the VCDS, while the second uses the descriptors' values in a dedicated search algorithm. We propose original solutions for both parts. The proposed architecture can be used for real-time video indexing and retrieval, content summarization, content delivery, surveillance, personalized services, etc. The descriptor extractor IP core, which is part of the VCDS, implements four MPEG-7 visual descriptors and was designed for ASIC implementation in 0.35 μm CMOS, a novel solution to the real-time content description problem. The proposed hardware architecture splits the computational burden across several threads, so that calculations run simultaneously and the system's speed improves. These methods make the hardware implementation of the system's most computationally demanding modules more time- and power-efficient. Four variations of the basic hardware architecture are discussed, and new search algorithms based on the VCDS responses are proposed. Experimental results demonstrate the effectiveness of the hardware architectures and of the new approach to similarity-based searching.
Appendix: Object matching techniques
In this section we describe the three object matching techniques used in our experiments. The inputs to the matching process are always two color images, I ′ and I ′′. The i-th object extracted from image I ′ is denoted \(O_{i}^{\prime}\). Object properties are denoted as follows:
- .Vol—the object's area (volume);
- .SCD—the object's Scalable Color Descriptor (SCD);
- .EHD—the object's Edge Histogram Descriptor (EHD);
- .x, .y—the coordinates of the i-th object's mass center.
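The property list above can be mirrored by a small record type. This is a minimal sketch; the field names `vol`, `scd`, `ehd`, `x`, `y` are ours, chosen to match the notation above, and the descriptor fields are plain histogram bin lists:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MatchedObject:
    """Per-object properties used by the matching techniques (hypothetical
    container mirroring .Vol, .SCD, .EHD, .x, .y from the text)."""
    vol: float                                     # object's area (volume)
    scd: List[int] = field(default_factory=list)   # Scalable Color Descriptor bins
    ehd: List[int] = field(default_factory=list)   # Edge Histogram Descriptor bins
    x: float = 0.0                                 # mass-center x coordinate
    y: float = 0.0                                 # mass-center y coordinate

obj = MatchedObject(vol=25.0, scd=[1, 2], ehd=[3, 0], x=4.0, y=5.0)
print(obj.vol, obj.y)  # 25.0 5.0
```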
1.1 Combined matching technique
The combined matching technique relies on a two-step image similarity measurement. The steps are called griddles because of their specific task: each one filters similar images using a different measurement technique. The first griddle passes only those images that share a sufficient number of similar features.
Assume that n and m are the numbers of objects in images I ′ and I ′′ respectively. Then we have:
where
and i = 1,…,n, j = 1,…,m; s is the number of similar features found in both images.
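The first griddle can be sketched as a pairwise similarity count over the objects of the two images. The paper's exact similarity criterion is not reproduced in this text, so the predicate `is_similar` and the pass `threshold` below are illustrative assumptions:

```python
def first_griddle(objs1, objs2, is_similar, threshold):
    """Count pairs (i, j) of objects from the two images judged similar
    by `is_similar`; the image pair passes when the count s reaches
    `threshold`. Returns (passed, s)."""
    s = sum(1 for o1 in objs1 for o2 in objs2 if is_similar(o1, o2))
    return s >= threshold, s

# Toy example: objects reduced to their areas, "similar" = areas within 10%
close = lambda a, b: abs(a - b) <= 0.1 * max(a, b)
print(first_griddle([100, 40], [95, 300], close, threshold=1))  # (True, 1)
```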
The second griddle is based on the combined distance, which takes into account the SCD, the EHD, and the vertical placement of the objects in the images:
where
is the distance used to rank the images that passed the first griddle.
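A combined distance of this kind can be sketched as a weighted sum of descriptor distances plus a vertical-offset term. The L1 histogram distances and the weights below are illustrative assumptions; the paper's exact formula was an equation image that is not reproduced here:

```python
from types import SimpleNamespace

def combined_distance(o1, o2, w_scd=1.0, w_ehd=1.0, w_y=1.0):
    """Combined distance over SCD, EHD and vertical placement: weighted
    sum of L1 distances between the descriptor histograms plus the
    difference of the mass-center y coordinates."""
    d_scd = sum(abs(a - b) for a, b in zip(o1.scd, o2.scd))
    d_ehd = sum(abs(a - b) for a, b in zip(o1.ehd, o2.ehd))
    d_y = abs(o1.y - o2.y)
    return w_scd * d_scd + w_ehd * d_ehd + w_y * d_y

# Example objects with SCD/EHD histograms and a mass-center y coordinate
a = SimpleNamespace(scd=[3, 1], ehd=[0, 2], y=10.0)
b = SimpleNamespace(scd=[1, 1], ehd=[1, 2], y=12.0)
print(combined_distance(a, b))  # 2 + 1 + 2 = 5.0
```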
1.2 Similar objects matching technique
This method is quite similar to the previous one. We have modified the similarity measure used in the first step: when two objects are matched as similar, the corresponding bits in an additional vector are set, excluding those objects from the remainder of the similarity measurement process.
Assume that n,m are the number of objects in images I ′, I ′′ respectively. We initialize a vector \(M\in\Im^{1\times n}, \ \left\{ \forall i \ : \ M\left(i\right)=0 \right\}\).
The first griddle works as follows:
The distance D is computed in the same manner as in the previous matching method (8). Note that in this technique, the placement of the objects is not taken into account. For further distance calculations we assume
where
is the distance used to rank the images that passed the first griddle.
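The exclusion mechanism can be sketched as follows. For simplicity this sketch marks only the second image's objects in the vector M, which is an assumption on our part, and the similarity predicate again stands in for the paper's criterion:

```python
def first_griddle_excluding(objs1, objs2, is_similar):
    """First-griddle variant with an exclusion vector M: once an object of
    the second image is matched, its bit is set and it is skipped in all
    later comparisons, so each object is counted at most once."""
    M = [0] * len(objs2)      # marking vector, initially all zeros
    s = 0
    for o1 in objs1:
        for j, o2 in enumerate(objs2):
            if M[j] == 0 and is_similar(o1, o2):
                M[j] = 1      # exclude o2 from further matching
                s += 1
                break         # move on to the next object of the first image
    return s, M

# Toy example: objects reduced to areas, "similar" = equal areas
print(first_griddle_excluding([100, 100], [100, 100, 5], lambda a, b: a == b))
# (2, [1, 1, 0]) — the second 100 pairs with the second image's other 100
```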
1.3 The biggest object matching technique
The biggest object matching technique differs considerably from the other techniques presented. It does not involve complex similarity matching methods: the idea is to find the two biggest objects in the reference image and to calculate the distance between those objects and the most similar objects in the second image. This technique requires n, m ≥ 2. Then we have:
where the distance d is computed as follows:
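The technique above can be sketched by selecting the two largest objects by area and matching each to its closest counterpart. The per-object `distance` (e.g. over SCD/EHD) is supplied by the caller, and combining the two matches by summation is our assumption, since the paper's equations are not reproduced in this text:

```python
from types import SimpleNamespace

def biggest_object_distance(objs1, objs2, distance):
    """Pick the two biggest objects (by .vol) of the reference image and,
    for each, take the distance to its most similar (closest) object in
    the second image; return the sum. Requires n, m >= 2."""
    assert len(objs1) >= 2 and len(objs2) >= 2
    biggest = sorted(objs1, key=lambda o: o.vol, reverse=True)[:2]
    return sum(min(distance(b, o) for o in objs2) for b in biggest)

# Toy objects: .vol is the area, .f a single feature value for the distance
a1 = SimpleNamespace(vol=50, f=1)
a2 = SimpleNamespace(vol=90, f=4)
a3 = SimpleNamespace(vol=10, f=9)
b1 = SimpleNamespace(vol=80, f=3)
b2 = SimpleNamespace(vol=40, f=2)
d = lambda p, q: abs(p.f - q.f)   # toy feature distance
print(biggest_object_distance([a1, a2, a3], [b1, b2], d))  # 1 + 1 = 2
```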
Kapela, R., Śniatała, P. & Rybarczyk, A. Real-time visual content description system based on MPEG-7 descriptors. Multimed Tools Appl 53, 119–150 (2011). https://doi.org/10.1007/s11042-010-0493-3