Abstract
Mobile sign language video conversations can become unintelligible if high video transmission rates cause network congestion and delayed video. In an effort to understand the perceived lower limits of intelligible sign language video intended for mobile communication, we evaluated sign language video transmitted at four low frame rates (1, 5, 10, and 15 frames per second [fps]) and four low fixed bit rates (15, 30, 60, and 120 kilobits per second [kbps]) at a constant spatial resolution of 320 × 240 pixels. We discovered an “intelligibility ceiling effect,” in which increasing the frame rate above 10 fps did not improve perceived intelligibility, and increasing the bit rate above 60 kbps produced diminishing returns. Given the study parameters, our findings suggest that relaxing the recommended frame rate and bit rate to 10 fps at 60 kbps will provide intelligible video conversations while reducing total bandwidth consumption to 25% of the ITU-T standard (at least 25 fps and 100 kbps). As part of this work, we developed the Human Signal Intelligibility Model, a new conceptual model useful for informing evaluations of video intelligibility and our methodology for creating linguistically accessible web surveys for deaf people. We also conducted a battery-savings experiment quantifying battery drain when sign language video is transmitted at the lower frame rates and bit rates. Results confirmed that increasing the transmission rates monotonically decreased the battery life.
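To make the study conditions concrete, the following minimal Python sketch enumerates the 4 × 4 grid of frame rates and fixed bit rates named in the abstract and derives two simple quantities for each: the average bit budget per frame and the data sent per minute. The function names are hypothetical and chosen only for illustration; the arithmetic assumes a constant bit rate and ignores container and network overhead. It does not attempt to reproduce the 25% bandwidth figure, since the abstract does not state how that figure was derived.

```python
# Study conditions taken from the abstract.
FRAME_RATES_FPS = [1, 5, 10, 15]
BIT_RATES_KBPS = [15, 30, 60, 120]


def kilobits_per_frame(bit_rate_kbps, frame_rate_fps):
    """Average bit budget available to each frame at a fixed bit rate."""
    return bit_rate_kbps / frame_rate_fps


def kilobytes_per_minute(bit_rate_kbps):
    """Data sent in one minute at a fixed bit rate (8 bits per byte)."""
    return bit_rate_kbps * 60 / 8


def condition_grid():
    """Map every (fps, kbps) study condition to its per-frame budget."""
    return {
        (fps, kbps): kilobits_per_frame(kbps, fps)
        for fps in FRAME_RATES_FPS
        for kbps in BIT_RATES_KBPS
    }


if __name__ == "__main__":
    for (fps, kbps), budget in sorted(condition_grid().items()):
        print(f"{fps:>2} fps @ {kbps:>3} kbps: "
              f"{budget:6.2f} kilobits/frame, "
              f"{kilobytes_per_minute(kbps):6.1f} kilobytes/min")
```

At a fixed bit rate, lowering the frame rate leaves more bits for each frame, which is one way to read the trade-off between the two parameters that the study varies.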