DOI: 10.1145/2493432.2493435
research-article

Crowd++: unsupervised speaker count with smartphones

Published: 08 September 2013

Abstract

Smartphones are excellent mobile sensing platforms, and their microphones in particular are already used by several audio inference applications. We take smartphone audio inference a step further and demonstrate, for the first time, that it is possible to accurately estimate the number of people talking in a given place, with an average error distance of 1.5 speakers, through unsupervised machine learning analysis of audio segments captured by smartphones. Inference occurs transparently to the user, and no human intervention is needed to derive the classification model. Our results are based on the design, implementation, and evaluation of a system called Crowd++, involving 120 participants in 10 very different environments. We show that no dedicated external hardware or cumbersome supervised learning approaches are needed, only off-the-shelf smartphones used in a transparent manner. We believe our findings have profound implications for many research fields, including social sensing and personal wellbeing assessment.
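The abstract does not spell out Crowd++'s features or counting rule, so the following is only a toy sketch of what "unsupervised" speaker counting can look like, not the authors' algorithm. It assumes each audio segment has already been reduced to a numeric feature vector (for example averaged MFCCs, a common speech feature); the greedy admission rule, the 0.9 cosine-similarity threshold, and the function names `count_speakers` and `average_error_distance` are hypothetical. The "average error distance" is interpreted here as the mean absolute difference between estimated and true speaker counts.

```python
import math
from typing import List, Sequence

# Hypothetical threshold (an assumption, not a value from the paper):
# a segment whose cosine similarity to every known speaker is below this
# value is treated as a new speaker.
SIMILARITY_THRESHOLD = 0.9


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_speakers(segments: List[Sequence[float]],
                   threshold: float = SIMILARITY_THRESHOLD) -> int:
    """Greedy, label-free speaker count over per-segment feature vectors.

    Each segment is compared with one representative vector per speaker
    found so far; if it is not similar enough to any of them, it opens a
    new speaker. No training labels are used at any point.
    """
    representatives: List[Sequence[float]] = []
    for seg in segments:
        # all() is True for an empty list, so the first segment always
        # becomes the first speaker.
        if all(cosine_similarity(seg, rep) < threshold for rep in representatives):
            representatives.append(seg)
    return len(representatives)


def average_error_distance(estimates: Sequence[int], truths: Sequence[int]) -> float:
    """Mean absolute difference between estimated and actual speaker counts."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(estimates)


if __name__ == "__main__":
    # Toy feature vectors: the second is a near-duplicate of the first voice.
    segments = [
        [1.0, 0.0, 0.0],
        [0.99, 0.05, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    print(count_speakers(segments))                       # 3
    print(average_error_distance([3, 5, 2], [4, 5, 4]))   # 1.0
```

The greedy, single-pass design keeps memory proportional to the number of speakers rather than the number of segments, which suits on-phone processing, but its result depends on segment order and on the threshold; a real system would need features and thresholds validated on recorded speech.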

Cited By

  • (2024) Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio. Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications, pp. 79-85. DOI: 10.1145/3638550.3641133. Online publication date: 28-Feb-2024.
  • (2024) Crowd Counting in Large Surveillance Areas by Fusing Audio and WiFi Sniffing Data. 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. DOI: 10.1109/IJCNN60899.2024.10651535. Online publication date: 30-Jun-2024.
  • (2024) WIFIACT: Enhancing Human Sensing Through Environment Robust Preprocessing And Bayesian Self-Supervised Learning. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 13391-13395. DOI: 10.1109/ICASSP48485.2024.10446566. Online publication date: 14-Apr-2024.

    Information

    Published In

    UbiComp '13: Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing
    September 2013
    846 pages
    ISBN:9781450317702
    DOI:10.1145/2493432
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. audio inference
    2. smartphone sensing
    3. speaker count

    Conference

    UbiComp '13

    Acceptance Rates

    UbiComp '13 Paper Acceptance Rate 92 of 394 submissions, 23%;
    Overall Acceptance Rate 764 of 2,912 submissions, 26%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 31
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 20 Feb 2025.

    Citations

    • (2024) Mobile sensing-based depression severity assessment in participants with heterogeneous mental health conditions. Scientific Reports 14(1). DOI: 10.1038/s41598-024-69739-z. Online publication date: 13-Aug-2024.
    • (2023) Automated Face-To-Face Conversation Detection on a Commodity Smartwatch with Acoustic Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7(3), pp. 1-29. DOI: 10.1145/3610882. Online publication date: 27-Sep-2023.
    • (2023) AutoProfile: An Intelligent Profile Switching System for Smartphones. IEEE Transactions on Mobile Computing 22(6), pp. 3151-3164. DOI: 10.1109/TMC.2022.3141205. Online publication date: 1-Jun-2023.
    • (2023) WiEdge: Edge Computing for Audio Sensing Applications With Accurate Wireless Link Prediction. IEEE Internet of Things Journal 10(5), pp. 3982-3994. DOI: 10.1109/JIOT.2022.3173668. Online publication date: 1-Mar-2023.
    • (2023) Crowd Detection System & Prediction Analysis. 2023 International Conference on Advanced Computing Technologies and Applications (ICACTA), pp. 1-6. DOI: 10.1109/ICACTA58201.2023.10393153. Online publication date: 6-Oct-2023.
    • (2023) Acconotate: Exploiting Acoustic Changes for Automatic Annotation of Inertial Data at the Source. 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), pp. 25-33. DOI: 10.1109/DCOSS-IoT58021.2023.00013. Online publication date: Jun-2023.
    • (2023) Neural Network Models for Time Series Data. Artificial Intelligence for Edge Computing, pp. 3-25. DOI: 10.1007/978-3-031-40787-1_1. Online publication date: 4-Aug-2023.