ABSTRACT
In this demo, we present a scalable mobile video recognition system, named "Me-link," based on progressive fusion of lightweight audio-visual features. With our system, users only have to point their mobile camera at the video they are interested in. The system captures the frames and sound, then retrieves relevant information immediately. As users hold the phone longer, the system progressively aggregates the cues over time and returns more accurate results. We also consider noisy real-world environments, in which users may not receive clear visual or audio signals. In the aggregation step, our system automatically detects which audio and visual channels are reliable and uses them for the final ranking. On the server side, users can upload videos with associated information via a website. In addition, we link live streaming signals so that users can receive real-time broadcasts with "Me-link".
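The progressive aggregation and channel-selection steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the margin-based availability check, and the additive score fusion are all assumptions made for clarity.

```python
# Hypothetical sketch (not the paper's code) of progressive audio-visual fusion:
# per-timestep match scores from each channel are accumulated over time, and a
# noisy channel is dropped when its top match does not clearly beat the rest.

def channel_available(scores, threshold=0.2):
    """Treat a channel as reliable when its best match clearly leads (assumed heuristic)."""
    ranked = sorted(scores.values(), reverse=True)
    return len(ranked) > 1 and (ranked[0] - ranked[1]) >= threshold

def progressive_fuse(audio_stream, visual_stream):
    """Aggregate per-timestep scores; longer capture refines the ranking."""
    totals = {}
    for audio_scores, visual_scores in zip(audio_stream, visual_stream):
        usable = []
        if channel_available(audio_scores):
            usable.append(audio_scores)
        if channel_available(visual_scores):
            usable.append(visual_scores)
        for scores in usable:  # fuse only the reliable channels at this step
            for video_id, s in scores.items():
                totals[video_id] = totals.get(video_id, 0.0) + s
    # Final rank: candidates ordered by accumulated evidence.
    return sorted(totals, key=totals.get, reverse=True)
```

For example, if the visual channel is ambiguous at the first timestep (its top two scores are nearly tied), only the audio scores contribute there, and the ranking sharpens as more timesteps arrive.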
Me-link: link me to the media -- fusing audio and visual cues for robust and efficient mobile media interaction