ABSTRACT
In this demo, we present a scalable mobile video recognition system, named "Me-link," based on progressive fusion of lightweight audio-visual features. With our system, users only have to point their mobile camera at the video they are interested in. The system captures the frames and sound, then retrieves relevant information immediately. As users hold the phone longer, the system progressively aggregates the cues over time and returns more accurate results. We also consider noisy real-world environments, in which users may not receive clear visual or audio signals. In the aggregation step, our system automatically detects which audio and visual channels are reliable and uses them for the final ranking. On the server side, users can upload videos with associated information via a website. In addition, we link live streaming signals so that users can receive real-time broadcasts with "Me-link".
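The progressive aggregation and channel-selection steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the margin-based availability check, and the additive score fusion are all assumptions made for clarity.

```python
# Hypothetical sketch (not the paper's code) of progressive audio-visual fusion:
# per-timestep match scores from each channel are accumulated over time, and a
# noisy channel is dropped when its top match does not clearly beat the rest.

def channel_available(scores, threshold=0.2):
    """Treat a channel as reliable when its best match clearly leads (assumed heuristic)."""
    ranked = sorted(scores.values(), reverse=True)
    return len(ranked) > 1 and (ranked[0] - ranked[1]) >= threshold

def progressive_fuse(audio_stream, visual_stream):
    """Aggregate per-timestep scores; longer capture refines the ranking."""
    totals = {}
    for audio_scores, visual_scores in zip(audio_stream, visual_stream):
        usable = []
        if channel_available(audio_scores):
            usable.append(audio_scores)
        if channel_available(visual_scores):
            usable.append(visual_scores)
        for scores in usable:  # fuse only the reliable channels at this step
            for video_id, s in scores.items():
                totals[video_id] = totals.get(video_id, 0.0) + s
    # Final rank: candidates ordered by accumulated evidence.
    return sorted(totals, key=totals.get, reverse=True)
```

For example, if the visual channel is ambiguous at the first timestep (its top two scores are nearly tied), only the audio scores contribute there, and the ranking sharpens as more timesteps arrive.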
Me-link: link me to the media -- fusing audio and visual cues for robust and efficient mobile media interaction