DOI: 10.1145/2483977.2483990
research-article

The jiku mobile video dataset

Published: 28 February 2013

Abstract

Proliferation of mobile devices with video recording capability has led to a tremendous growth in the amount of user-generated mobile videos. Researchers have embarked on developing new interesting applications and enhancement algorithms for mobile video. There is, however, no standard dataset with videos that could represent characteristics of mobile videos captured in realistic scenarios. In this paper, we present our effort to create one such dataset, consisting of videos simultaneously recorded using mobile devices in an unconstrained manner by multiple users attending performance events. Each video is accompanied by concurrent readings from accelerometer and compass sensors. At the time of writing, the dataset contains 473 video clips, with a total length of 30 hours 41 minutes and total size of 122.8 GB. We believe this dataset is useful as a common benchmark dataset for a variety of different research topics on mobile videos, including video analytics, video quality enhancement, and automatic video mashups.
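
For a rough sense of scale, the totals quoted in the abstract can be turned into per-clip averages. The sketch below is only back-of-the-envelope arithmetic on those three figures: the variable names are illustrative, and it assumes 1 GB = 10^9 bytes, which the abstract does not state. The results are averages over the whole dataset and say nothing about how individual clips vary.

```python
# Figures quoted in the abstract.
NUM_CLIPS = 473
TOTAL_HOURS = 30
TOTAL_MINUTES = 41
TOTAL_SIZE_GB = 122.8

# Assumption (not stated in the abstract): decimal gigabytes, 1 GB = 10**9 bytes.
BYTES_PER_GB = 10**9

# Total recorded duration in seconds.
total_seconds = TOTAL_HOURS * 3600 + TOTAL_MINUTES * 60

# Average duration and size of one clip.
mean_clip_seconds = total_seconds / NUM_CLIPS
mean_clip_megabytes = TOTAL_SIZE_GB * 1000 / NUM_CLIPS

# Average bitrate over the whole dataset, in megabits per second.
mean_bitrate_mbps = TOTAL_SIZE_GB * BYTES_PER_GB * 8 / total_seconds / 1e6

print(f"total duration: {total_seconds} s")
print(f"mean clip length: {mean_clip_seconds / 60:.2f} min")
print(f"mean clip size: {mean_clip_megabytes:.1f} MB")
print(f"mean bitrate: {mean_bitrate_mbps:.2f} Mbit/s")
```

Under that assumption, an average clip comes out at a little under four minutes and roughly 260 MB.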

Published In

MMSys '13: Proceedings of the 4th ACM Multimedia Systems Conference
February 2013
304 pages
ISBN: 9781450318945
DOI: 10.1145/2483977
General Chair: Carsten Griwodz
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataset
  2. mobile video
  3. sensor-rich video

Qualifiers

  • Research-article

Conference

MMSys '13: Multimedia Systems Conference 2013
February 28 - March 1, 2013
Oslo, Norway

Acceptance Rates

MMSys '13 Paper Acceptance Rate 15 of 63 submissions, 24%;
Overall Acceptance Rate 176 of 530 submissions, 33%

Cited By

  • (2024) Composition and Transmission of Videos Generated by Multiple Users. From Multimedia Communications to the Future Internet, pages 202-218. DOI: 10.1007/978-3-031-71874-8_14. Online publication date: 13-Sep-2024
  • (2023) A highly robust deep learning technique for overlap detection using audio fingerprinting. Multimedia Tools and Applications 83:10, pages 29119-29137. DOI: 10.1007/s11042-023-16713-y. Online publication date: 11-Sep-2023
  • (2019) Multi-tenant mobile offloading systems for real-time computer vision applications. Proceedings of the 20th International Conference on Distributed Computing and Networking, pages 21-30. DOI: 10.1145/3288599.3288634. Online publication date: 4-Jan-2019
  • (2018) SWAPUGC. Proceedings of the 9th ACM Multimedia Systems Conference, pages 456-459. DOI: 10.1145/3204949.3208142. Online publication date: 12-Jun-2018
  • (2018) The crowd as a cameraman. Multimedia Tools and Applications 77:1, pages 597-629. DOI: 10.1007/s11042-016-4257-6. Online publication date: 1-Jan-2018
  • (2018) Mitigating Multi-tenant Interference in Continuous Mobile Offloading. Cloud Computing – CLOUD 2018, pages 20-36. DOI: 10.1007/978-3-319-94295-7_2. Online publication date: 19-Jun-2018
  • (2018) Automated Video Mashups: Research and Challenges. MediaSync, pages 167-190. DOI: 10.1007/978-3-319-65840-7_6. Online publication date: 27-Mar-2018
  • (2017) Video Liveness for Citizen Journalism: Attacks and Defenses. IEEE Transactions on Mobile Computing 16:11, pages 3250-3263. DOI: 10.1109/TMC.2017.2687922. Online publication date: 1-Nov-2017
  • (2017) Modeling the timing of cuts in automatic editing of concert videos. Multimedia Tools and Applications 76:5, pages 6683-6707. DOI: 10.1007/s11042-016-3304-7. Online publication date: 1-Mar-2017
  • (2016) One Sensor is not Enough. Proceedings of the 24th ACM international conference on Multimedia, pages 626-630. DOI: 10.1145/2964284.2967297. Online publication date: 1-Oct-2016
