research-article

Crowd++: unsupervised speaker count with smartphones

Authors:

Emiliano Miluzzo,

Bernhard Firner

UbiComp '13: Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing

Pages 43 - 52

https://doi.org/10.1145/2493432.2493435

Published: 08 September 2013

Abstract

Smartphones are excellent mobile sensing platforms, with the microphone in particular being exercised in several audio inference applications. We take smartphone audio inference a step further and demonstrate for the first time that it's possible to accurately estimate the number of people talking in a certain place -- with an average error distance of 1.5 speakers -- through unsupervised machine learning analysis on audio segments captured by the smartphones. Inference occurs transparently to the user and no human intervention is needed to derive the classification model. Our results are based on the design, implementation, and evaluation of a system called Crowd++, involving 120 participants in 10 very different environments. We show that no dedicated external hardware or cumbersome supervised learning approaches are needed but only off-the-shelf smartphones used in a transparent manner. We believe our findings have profound implications in many research fields, including social sensing and personal wellbeing assessment.
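
One way to read the "average error distance" figure in the abstract is as the mean absolute difference between the estimated and the actual number of speakers across recordings. The sketch below illustrates that reading only; the definition, the function name `average_error_distance`, and the sample counts are assumptions made for illustration and are not taken from the paper's evaluation.

```python
def average_error_distance(estimated, actual):
    """Mean absolute difference between estimated and true speaker counts.

    Assumed definition for illustration; the paper may define it differently.
    """
    if len(estimated) != len(actual):
        raise ValueError("need one estimate per ground-truth count")
    if not estimated:
        raise ValueError("need at least one sample")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(estimated)


# Hypothetical counts, for illustration only (not data from the paper).
estimated_counts = [3, 5, 2, 8]
actual_counts = [4, 5, 4, 7]
print(average_error_distance(estimated_counts, actual_counts))
```

Under this reading, an average error distance of 1.5 would mean that the estimate is off by one or two speakers in a typical recording.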

Index Terms

Computer systems organization


Published In

UbiComp '13: Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing

September 2013

846 pages

ISBN:9781450317702

DOI:10.1145/2493432

General Chairs: Friedemann Mattern (ETH Zurich, CH), Silvia Santini (TU Darmstadt, DE)

Program Chairs: John F. Canny (UC Berkeley, US), Marc Langheinrich (Università della Svizzera italiana, CH), Jun Rekimoto (University of Tokyo, JP)

Copyright © 2013 ACM.

Sponsors

SIGMOBILE: ACM Special Interest Group on Mobility of Systems, Users, Data and Computing
University of Florida
SIGCHI: ACM Special Interest Group on Computer-Human Interaction

In-Cooperation

SIGSPATIAL: ACM Special Interest Group on Spatial Information

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

UbiComp '13

UbiComp '13: The 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing

September 8 - 12, 2013

Zurich, Switzerland

Acceptance Rates

UbiComp '13 Paper Acceptance Rate 92 of 394 submissions, 23%;

Overall Acceptance Rate 764 of 2,912 submissions, 26%

Bibliometrics

Total Citations: 95
Total Downloads: 1,041
Downloads (Last 12 months): 31
Downloads (Last 6 weeks): 3

Reflects downloads up to 20 Feb 2025

Cited By

Al Hossain F, Tonmoy M, Lover A, Corey G, Alam M, Rahman T, Xu C, Davies N (2024). Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio. Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications, 79-85. Online publication date: 28-Feb-2024.
https://dl.acm.org/doi/10.1145/3638550.3641133

Guo R, Huang B, Hao L, Jia B (2024). Crowd Counting in Large Surveillance Areas by Fusing Audio and WiFi Sniffing Data. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. Online publication date: 30-Jun-2024.
https://doi.org/10.1109/IJCNN60899.2024.10651535

Lyons N, Santra A, Ramanna V, Uln K, Taori R, Pandey A (2024). WIFIACT: Enhancing Human Sensing Through Environment Robust Preprocessing And Bayesian Self-Supervised Learning. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 13391-13395. Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10446566

Lamichhane B, Moukaddam N, Sabharwal A (2024). Mobile sensing-based depression severity assessment in participants with heterogeneous mental health conditions. Scientific Reports 14:1. Online publication date: 13-Aug-2024.
https://doi.org/10.1038/s41598-024-69739-z

Liang D, Zhang A, Thomaz E (2023). Automated Face-To-Face Conversation Detection on a Commodity Smartwatch with Acoustic Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:3, 1-29. Online publication date: 27-Sep-2023.
https://dl.acm.org/doi/10.1145/3610882

Yang W, Chen S, Wang Z, Xu Y, Huang L (2023). AutoProfile: An Intelligent Profile Switching System for Smartphones. IEEE Transactions on Mobile Computing 22:6, 3151-3164. Online publication date: 1-Jun-2023.
https://doi.org/10.1109/TMC.2022.3141205

Cao C, Dong W, Zhang W, Gao Y (2023). WiEdge: Edge Computing for Audio Sensing Applications With Accurate Wireless Link Prediction. IEEE Internet of Things Journal 10:5, 3982-3994. Online publication date: 1-Mar-2023.
https://doi.org/10.1109/JIOT.2022.3173668

Mane A, Maurya V, Mendonca C, Tripathy A (2023). Crowd Detection System & Prediction Analysis. 2023 International Conference on Advanced Computing Technologies and Applications (ICACTA), 1-6. Online publication date: 6-Oct-2023.
https://doi.org/10.1109/ICACTA58201.2023.10393153

Chatterjee S, Singh A, Mitra B, Chakraborty S (2023). Acconotate: Exploiting Acoustic Changes for Automatic Annotation of Inertial Data at the Source. 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), 25-33. Online publication date: Jun-2023.
https://doi.org/10.1109/DCOSS-IoT58021.2023.00013

Yao S, Abdelzaher T (2023). Neural Network Models for Time Series Data. Artificial Intelligence for Edge Computing, 3-25. Online publication date: 4-Aug-2023.
https://doi.org/10.1007/978-3-031-40787-1_1
