Abstract
We live in a world with a huge number of consumers and producers of multimedia content. In this sea of information, finding the right content is like finding a needle in a haystack. Rich annotation of multimedia content at the time of its initial upload to the Web, together with varied ways of framing a search query, can help the user in this regard. In addition to annotation based on the user-provided description, various approaches for annotating and indexing multimedia files based on their embedded contents have been presented in the literature. However, annotating multimedia files by using multiple possible sources simultaneously to generate better annotations needs further exploration. We propose a framework that utilizes multiple sources of information, such as text, audio, and images. The framework generates annotations from the user-entered description, the embedded audio, image analysis, and optical character recognition, and finally by gathering more information from the Web. It provides multiple ways to search for content, including search by image, audio, video, and face, as well as an improved textual search. A system has been implemented based on the proposed framework, and the work has been evaluated.
Acknowledgments
The authors are thankful to the editors and anonymous reviewers for their efforts in reviewing the manuscript. A patent has been filed based on this work. We are thankful to IIT Roorkee for providing a healthy research and academic environment.
Cite this article
Shrivastav, S., Kumar, S. & Kumar, K. Towards an ontology based framework for searching multimedia contents on the web. Multimed Tools Appl 76, 18657–18686 (2017). https://doi.org/10.1007/s11042-017-4350-5
DOI: https://doi.org/10.1007/s11042-017-4350-5