Description schemes for video programs, users and devices

https://doi.org/10.1016/S0923-5965(00)00026-6

Abstract

This paper presents a set of description schemes (DS) dealing with video programs, users and devices. Following MPEG-7 terminology, a description of an AV document includes descriptors (termed Ds), which specify the syntax and semantics of a representation entity for a feature of the AV data, and description schemes (termed DSs) which specify the structure and semantics of a set of Ds and DSs. The Program DS is used to describe the physical structure as well as the semantic content of a video program. It focuses on the visual information only. The physical structure is described by the temporal organization of the sequence (segments), the spatial organization of images (regions) as well as the spatio-temporal structure of the video (regions with motion). The semantic description is built around objects and events. Finally, the physical and semantic descriptions are related by a set of links defining where or when instances of specific semantic notions can be found. The User DS is used to describe the personal preferences and usage patterns of a user. It facilitates a smart personalizable device that records and presents to the user audio and video information based upon the user's preferences, prior viewing and listening habits, as well as personal characteristics. Finally, the Device DS keeps a record of the users of the device, available programs, and a description of device capabilities. It allows a device to prepare itself based on the existing users, profiles and available programs. These three types of DSs and the common set of descriptors that they share are designed to support personalization, efficient management of AV information and the expected variability in the capabilities of AV information access devices.

Introduction

Multimedia content provides information and entertainment to a wide range of people, and the amount of content that one can access is increasing at a rapid pace. Beyond professional settings, audiovisual information can today be obtained in many households from multiple sources such as cable television, satellite dish, radio, the world-wide web, CD/DVD/tapes, etc. In addition, users can create multimedia content with their personal cameras and computers. To help users find and retrieve relevant information effectively, and to enable new and better forms of entertainment, advanced technologies need to be developed for browsing, filtering, and searching the vast amount of multimedia content available. There are ongoing efforts towards new advances in hardware and software technologies and in the communication infrastructure. In addition, there are efforts aimed at developing exchangeable formats capable of hosting rich descriptions of (i) the multimedia content, (ii) the users of the content, and (iii) the multimedia devices that access and consume the content, so that effective browsing, filtering, and search may be performed on the basis of this description data. This paper is concerned with such descriptions. We focus on video programs and consider descriptions of their visual content as well as of the users and devices that access and consume these programs.

To define exchangeable formats, MPEG has initiated a new work item, formally called “Multimedia Content Description Interface”, better known as MPEG-7 [11]. In the context of MPEG-7, a description of an AV document includes descriptors (termed Ds), which specify the syntax and semantics of a representation entity for a feature of the AV data, and description schemes (termed DSs), which specify the structure and semantics of a set of Ds and DSs. Descriptions are expressed in a common description definition language (DDL) to allow their exchange and access. In this paper, we present (1) a DS for describing the visual content of a video program – Program DS; (2) a DS for describing a user of audiovisual content – User DS; and (3) a DS for describing an audiovisual device – Device DS. The proposed description schemes support the following functionalities (a schematic sketch of how Ds and DSs compose is given after the list):

  • Effective content-based filtering and searching of audiovisual information;

  • Efficient interactive browsing of audiovisual information;

  • Personalizable browsing, filtering and searching of audiovisual information and the ability to personalize audiovisual systems regardless of their brand name and physical location;

  • Integrated representation of still images and video.
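To make the distinction between descriptors and description schemes more concrete, the following TypeScript-style sketch shows how a DS might be modeled as a structured collection of Ds and nested DSs. The type and field names are assumptions made for this illustration only; they are not the MPEG-7 DDL, whose syntax was still under definition at the time of writing.

```typescript
// Illustrative sketch only: the type and field names are hypothetical and do not
// reproduce the MPEG-7 DDL syntax.

// A descriptor (D) pairs a feature of the AV data with a representation entity.
interface Descriptor<T> {
  feature: string;  // e.g. "program category", "dominant color"
  value: T;         // the representation entity; its syntax/semantics are fixed by the D
}

// A description scheme (DS) specifies the structure and semantics of a set of Ds and DSs.
interface DescriptionScheme {
  id: string;
  descriptors: Descriptor<unknown>[];
  components: DescriptionScheme[];  // nested description schemes
}
```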

The following is a short overview of the description schemes proposed.

A Program DS is used to describe both the physical structure and semantic content of a video program. The physical structure involves the description of the temporal organization of the sequence (segments), the spatial organization of images (regions) as well as the spatio-temporal structure of the video (regions with motion). The semantic description is built around objects and events. Finally, the physical and semantic descriptions are related by a set of links defining where or when instances of specific semantic notions can be found. The Program DS allows filtering and search to be performed based on the content of a video program. It also enables a user to access only a portion of a particular video program that the user is interested in, while skipping the remainder of the program.
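As an informal illustration of how such a description could be organized, the sketch below models the physical (segments, regions, moving regions) and semantic (objects, events) parts of a Program DS together with the links that relate them. All type and field names are hypothetical and do not correspond to the actual Program DS syntax presented in Section 2.

```typescript
// Hypothetical sketch of a Program DS; names are illustrative, not the actual syntax.
interface TimeInterval { startFrame: number; endFrame: number; }

// Physical (structural) description.
interface Segment      { id: string; interval: TimeInterval; children: Segment[]; }  // temporal structure
interface Region       { id: string; frame: number; children: Region[]; }            // spatial structure
interface MovingRegion { id: string; interval: TimeInterval; }                       // spatio-temporal structure

// Semantic description.
interface SemanticObject { id: string; label: string; }  // e.g. "anchor person"
interface SemanticEvent  { id: string; label: string; }  // e.g. "goal"

// A link states where or when an instance of a semantic notion can be found.
interface Link { semanticId: string; structuralId: string; }

interface ProgramDS {
  programId: string;
  physical: { segments: Segment[]; regions: Region[]; movingRegions: MovingRegion[] };
  semantic: { objects: SemanticObject[]; events: SemanticEvent[] };
  links: Link[];
}
```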

A User DS is used to describe the personal preferences and usage patterns of a user. It facilitates a smart personalizable device that records and presents to the user audio and video information based upon the user's preferences, prior viewing and listening habits, as well as personal characteristics. It permits the device to automatically discover and record desirable information and to automatically customize itself to the user. The user information contained in the User DS should be portable and usable by different devices so that other devices may likewise be configured automatically to the particular user's preferences upon receiving the user information regardless of their brand name or physical location.
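A possible shape for such a user description is sketched below. The preference, history and demographic fields are placeholders chosen for the illustration, not the descriptors actually proposed in Section 3.

```typescript
// Hypothetical sketch of a User DS; fields are placeholders, not the proposed descriptors.
interface UserDS {
  userId: string;
  preferences: {
    favoriteCategories: string[];  // e.g. ["news", "sports"]
    favoriteActors: string[];
    preferredLanguages: string[];
  };
  usageHistory: {
    programId: string;
    viewedOn: string;              // date of viewing or listening
    fractionWatched: number;       // 0..1
  }[];
  demographics: { ageGroup?: string; region?: string };
}
```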

A Device DS keeps a record of the users of the device, available programs, and a description of device capabilities. It allows a device to prepare itself based on the existing users’ profiles and available programs. It also allows efficient communication between different devices. For example, a content provider may supply a customized version of the content to a particular device based on a description of its capabilities.
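The sketch below illustrates one possible organization of such a device description around the three kinds of information mentioned above; the field names and units are assumptions made for the example.

```typescript
// Hypothetical sketch of a Device DS; fields are placeholders for illustration.
interface DeviceDS {
  deviceId: string;
  users: string[];                    // IDs of known users, each described by a User DS
  availablePrograms: {                // programs stored on the device or announced for broadcast
    programId: string;
    category: string;                 // descriptor shared with the Program DS
    source: "stored" | "broadcast";
  }[];
  capabilities: {
    maxResolution: { width: number; height: number };
    storageCapacityGB: number;
    supportedFormats: string[];       // e.g. ["MPEG-2", "MPEG-4"]
  };
}
```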

There is a synergistic interrelation amongst the three types of DSs in the following sense. The Program DS and the User DS use a common vocabulary of descriptors, at least partially, so that the potential desirability of a program can be determined by comparing descriptors representative of the same information. For example, a Program DS and a User DS may include the same set of program categories and actors. A Program DS and a Device DS should also include partially overlapping descriptors. With these overlapping descriptors, a Device DS is capable of storing the information contained within a Program DS, e.g., the program category, so that the content-related information is properly indexed. With proper indexing, a device can match such content-related information with the user information, if available, for instance to obtain and record suitable programs. A User DS and a Device DS should also include partially overlapping descriptors. With these overlapping descriptors, a device can capture the desired device-related information, which would otherwise not be recognized as desirable. A Device DS preferably includes a list of users and available programs. Based on the master list of available programs and the associated Program DSs, a device can determine the desired programs for each of its users.
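The following sketch illustrates the kind of matching that the shared vocabulary makes possible: a program summary is scored against a user's preferences by comparing category and actor descriptors that both descriptions carry. The scoring rule and the type names are assumptions introduced for this example only.

```typescript
// Illustrative matching over the shared descriptor vocabulary; the scoring rule is an assumption.
interface ProgramSummary  { programId: string; category: string; actors: string[]; }
interface UserPreferences { favoriteCategories: string[]; favoriteActors: string[]; }

// Score a program against a user's preferences using the descriptors both DSs share.
function desirability(program: ProgramSummary, prefs: UserPreferences): number {
  let score = 0;
  if (prefs.favoriteCategories.includes(program.category)) score += 2;           // shared category descriptor
  score += program.actors.filter(a => prefs.favoriteActors.includes(a)).length;  // shared actor descriptors
  return score;
}

// A device holding Program DS summaries can rank them for each of its known users.
function rankForUser(programs: ProgramSummary[], prefs: UserPreferences): ProgramSummary[] {
  return [...programs].sort((a, b) => desirability(b, prefs) - desirability(a, prefs));
}
```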

In the following section, we present a Program DS for describing the visual content of a video program. In Section 3, we present a User DS for describing the personal preferences and usage history of a user. In Section 4, we present a Device DS for describing the capabilities of a device and keeping a record of its existing users and available programs. Finally, in Section 5, we summarize our work to date.

Table of contents and index

The Program DS is largely inspired by the classical way of describing the content of written documents such as books: the Table of Contents and the Index [18], [21]. The Table of Contents is a hierarchical representation that splits the document into elementary pieces (chapters, sections, subsections, etc.). The order in which the items are presented follows the linear structure of the book itself. As a result, the Table of Contents is a representation of the linear, one-dimensional structure of the document.
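A minimal sketch of the two access structures suggested by this analogy is given below; the types are illustrative only and do not reproduce the Program DS syntax.

```typescript
// Illustrative sketch of the two book-like access structures; not the actual Program DS syntax.

// Table of Contents: a hierarchy that follows the linear structure of the document,
// splitting it into nested elementary pieces.
interface TocEntry {
  title: string;                          // e.g. "Chapter 2", "Section 2.1"
  span: { start: number; end: number };   // location in the document (e.g. a frame range)
  children: TocEntry[];
}

// Index: a flat list of notions, each pointing to the places where it occurs.
interface IndexEntry {
  term: string;                           // e.g. an object or event label
  occurrences: { start: number; end: number }[];
}
```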

User DS

A User DS facilitates personalized access and consumption of audiovisual information. It contains descriptors that describe a user's preferences, usage history and demographics pertaining to audiovisual content. The User DS can be used to filter programs according to user preferences, to make suggestions to the user on the availability of content that fits the user's preferences, and to take actions on behalf of the user according to user preferences, usage history, and demographics.
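As a simple illustration of the first two of these uses, the sketch below filters a program list by preferred categories and derives suggestions from the usage history. The rules and field names are assumptions made for the example, not the descriptors of the proposed User DS.

```typescript
// Illustrative filtering and suggestion based on preferences and usage history;
// the rules and field names are assumptions, not the descriptors of the User DS.
interface ProgramInfo { programId: string; category: string; }
interface HistoryItem { programId: string; category: string; completed: boolean; }

// Filter: keep only programs whose category the user has marked as preferred.
function filterByPreference(programs: ProgramInfo[], preferredCategories: string[]): ProgramInfo[] {
  return programs.filter(p => preferredCategories.includes(p.category));
}

// Suggest: unseen programs in categories the user has previously watched to completion.
function suggest(programs: ProgramInfo[], history: HistoryItem[]): ProgramInfo[] {
  const likedCategories = new Set(history.filter(h => h.completed).map(h => h.category));
  const seen = new Set(history.map(h => h.programId));
  return programs.filter(p => likedCategories.has(p.category) && !seen.has(p.programId));
}
```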

Device DS

The purpose of a Device DS is to describe a device identified by a device ID. By a device we mean, for example, an audiovisual information appliance with browsing, filtering and search capabilities. By a description of a device we mean information about the users of the device, audiovisual content known to the device (including programs that are stored in the device and programs that will be broadcast in the future), and the capabilities of the device. The proposed Device DS includes three corresponding parts: a record of the device's users, a list of the programs known to the device, and a description of the device's capabilities.
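As an illustration of the capability-related part of such a description, the sketch below shows how a content provider (or the device itself) might select a program variant that fits the device's capabilities. The capability fields and the selection rule are assumptions introduced for this example.

```typescript
// Illustrative capability-based selection of a content variant; fields and rule are assumptions.
interface DeviceCapabilities { maxWidth: number; maxHeight: number; supportedFormats: string[]; }
interface ProgramVariant     { uri: string; width: number; height: number; format: string; }

// Pick the highest-resolution variant that the device can decode and display.
function selectVariant(variants: ProgramVariant[], caps: DeviceCapabilities): ProgramVariant | undefined {
  return variants
    .filter(v => v.width <= caps.maxWidth &&
                 v.height <= caps.maxHeight &&
                 caps.supportedFormats.includes(v.format))
    .sort((a, b) => b.width * b.height - a.width * a.height)[0];
}
```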

Conclusions

We have proposed description schemes describing video programs, users of devices that access, store and consume these programs, and the devices themselves. These three types of description schemes do indeed share a common set of descriptors to facilitate a complete solution that simultaneously accounts for the desirability of personalization, efficient management of programs and users that are known to devices, and variations in the capabilities of multimedia access devices.


References (27)

  • G. Yang et al., Human face detection in complex background, Pattern Recognition (1994)
  • P. Bouthemy, F. Ganansia, Video partitioning and camera motion characterization for content-based video indexing, in:...
  • R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (5) (May 1995)...
  • G.C.H. Chuang, C.C. Jay Kuo, Wavelet descriptor of planar curves: theory and applications, IEEE Trans. Image Process. 5...
  • P. Correia, F. Pereira, The role of analysis in content-based video coding and indexing, Signal Processing 66 (2)...
  • J.D. Courtney, Automatic video indexing via object motion analysis, Pattern Recognition 30 (4) (April 1997)...
  • Dublin Core Metadata Initiative, see...
  • H. Freeman, On the coding of arbitrary geometric configurations, IRE Trans. Electron. Comput. EC-10 (June 1961)...
  • L. Garrido, P. Salembier, D. Garcia, Extensive operators in partition lattices for image sequence analysis, Signal...
  • R.C. Gonzalez et al., Digital Image Processing (1992)
  • I.T. Jolliffe, Principal Component Analysis (1986)
  • MPEG Requirements Group, MPEG-7 context, objectives and technical roadmap, Doc. ISO/IEC JTC1/SC29/WG11 N2729, Seoul...
  • S. Paek, C.-S. Li, A. Puri et al., Image description scheme, proposals P480, Document ISO/IEC JTC1/SC29/WG11, P480,...

    Philippe Salembier received a degree from the Ecole Polytechnique, Paris, France, in 1983 and a degree from the Ecole Nationale Superieure des Telecommunications, Paris, France, in 1985. He received the Ph.D. from the Swiss Federal Institute of Technology (EPFL) in 1991. He was a Postdoctoral Fellow at the Harvard Robotics Laboratory, Cambridge, MA, in 1991. From 1985 to 1989 he worked at Laboratoires d'Electronique Philips, Limeil-Brevannes, France, in the fields of digital communications and signal processing for HDTV. In 1989, he joined the Swiss Federal Institute of Technology in Lausanne, Switzerland, to work on image processing. At the end of 1991, after a stay at the Harvard Robotics Lab., he joined the Polytechnic University of Catalonia, Barcelona, Spain, where he is currently an associate professor. He lectures in the area of digital signal and image processing. His current research interests include image and sequence analysis, compression and indexing, image modeling, segmentation problems, texture analysis, mathematical morphology and nonlinear filtering. In terms of current applications, he is particularly interested in video indexing and the MPEG-7 standardization process. He served as an Area Editor of the Journal of Visual Communication and Image Representation (Academic Press) from 1995 until 1998 and is currently an AdCom officer of the European Association for Signal Processing (EURASIP), in charge of editing the Newsletter. He has edited (as guest editor) two special issues of Signal Processing on “mathematical morphology” (1994) and on “video sequence analysis” (1998). He is currently co-editing a special issue of Signal Processing: Image Communication on the MPEG-7 proposals which were recently submitted for evaluation. Finally, he is Deputy Editor of Signal Processing.

    Richard Qian received the B.S. degree in computer science from Tsinghua University, Beijing, China, in 1986. He received the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign in 1992 and 1996, respectively. From 1996 to 1999, he was a researcher at Sharp Labs of America in Camas, Washington, where he worked on video analysis and description. He received a Sharp outstanding R&D award in 1998. Since 1999, he has been a senior researcher at Intel Architecture Labs in Hillsboro, Oregon, where he leads research in the field of multimedia content modeling, analysis and description. His present research interests also include human-computer interfaces, mobile computing and digital infotainment.

    Noel E. O'Connor received his primary degree from Dublin City University, Dublin, Ireland, in October 1992. He received his Ph.D. also from Dublin City University in October 1998. From September 1992 to November 1998 he was a Research Assistant in the Video Coding Group of Teltec Ireland. He is currently a lecturer of digital signal processing in the School of Electronic Engineering of Dublin City University. His current research interests include image and sequence compression, region-based and object-based segmentation, and video sequence analysis for indexing applications. He is one of two Irish representatives to the ISO/IEC MPEG standards body.

    Paulo Lobato Correia graduated as an Engineer and obtained an M.Sc. in electrical and computer engineering from Instituto Superior Técnico (IST), Universidade Técnica de Lisboa, Portugal, in 1989 and 1993, respectively. He is currently working towards a Ph.D. in the area of image analysis for coding and indexing. Since 1990 he has been a Teaching Assistant in the electrical and computer engineering department of IST, and since 1994 a researcher in the Image Communication Group of IST. His current research interests are in the area of video analysis and processing, including content-based video description and representation.

    M. Ibrahim Sezan received the B.S. degrees in Electrical Engineering and Mathematics from Bogazici University, Istanbul, Turkey, in 1980. He received the M.S. degree in Physics from Stevens Institute of Technology, Hoboken, New Jersey, and the Ph.D. degree in Electrical, Computer and Systems Engineering from Rensselaer Polytechnic Institute, Troy, New York, in 1982 and 1984, respectively. He is currently a Senior Manager in the Digital Video Department at Sharp Laboratories of America, Camas, Washington, where he heads a group focusing on algorithm and system development for video resolution enhancement, visual quality optimization, and smart appliances for audiovisual information access, management, and consumption. From 1984 to 1996, he was with Eastman Kodak Company, Rochester, New York, where he headed the Video and Motion Technology Area in the Imaging Research and Advanced Development Laboratories from 1992 to 1996. Dr. Sezan contributed to a number of books on image recovery, image restoration, medical imaging, and video compression. He edited Selected Papers in Digital Image Restoration (SPIE Milestone Series, 1992), and co-edited Motion Analysis and Image Sequence Processing (Kluwer Academic Publishers, 1993). Dr. Sezan is a senior member of IEEE.

    Peter van Beek was born in Amsterdam, the Netherlands, in 1967. He received the M.Sc. Eng. and Ph.D. degrees in Electrical Engineering from the Delft University of Technology, Delft, the Netherlands, in 1990 and 1995, respectively. From March 1996 to September 1998, he was a Research Associate with the Department of Electrical Engineering and Center for Electronic Imaging Systems, University of Rochester, Rochester, New York. In October 1998, he joined Sharp Laboratories of America, Camas, Washington. His research interests include digital image and video analysis, video storage and retrieval and hybrid natural/synthetic media coding.

    1. Now with Intel Architecture Labs in Hillsboro, Oregon, USA.
