DOI: 10.1145/3286978.3286995
research-article

ProMETheus: An Intelligent Mobile Voice Meeting Minutes System

Published: 05 November 2018

ABSTRACT

In this paper, we present the design and development of ProMETheus, an intelligent system that generates meeting minutes from audio data. The first task in ProMETheus is to recognize the speakers in noisy audio data: a speaker recognition algorithm automatically identifies who is speaking from the speech in a recording. Speech recognition then transcribes each speaker's audio to text, so that ProMETheus can generate the complete meeting text in chronological order, labeled with the speakers' names. To show the subject of the meeting and the agreed actions, we use a text summarization algorithm that extracts meaningful key phrases and summary sentences from the complete meeting text. In addition, sentiment analysis of each speaker's text makes the agreed actions more humane by calculating a relevance score for each course of action from the sentiment and attitude expressed in the tone of the text. ProMETheus is capable of accurately summarizing a meeting and analyzing the agreed actions. We evaluate the system on a real-world audio meeting dataset that involves multiple speakers in each meeting session.
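
The data flow described in the abstract can be illustrated with a minimal sketch. It assumes that speaker recognition and speech recognition have already produced time-stamped, speaker-labeled text segments; the `Segment` type, its field names, and the sample sentences are hypothetical. The summarizer is a plain word-frequency scorer standing in for the summarization step, not the algorithm used in ProMETheus.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # start time in seconds (hypothetical field)
    speaker: str   # label assumed to come from speaker recognition
    text: str      # text assumed to come from speech recognition


def build_transcript(segments):
    """Order segments by start time and prefix each with its speaker."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [f"{s.speaker}: {s.text}" for s in ordered]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(segments, k=1):
    """Return the k highest-scoring segments in chronological order.

    A segment's score is the mean corpus frequency of its words.
    This is an illustrative stand-in, not the paper's method.
    """
    freq = Counter(w for s in segments for w in tokenize(s.text))

    def score(segment):
        words = tokenize(segment.text)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(segments, key=score, reverse=True)[:k]
    return [f"{s.speaker}: {s.text}" for s in sorted(top, key=lambda s: s.start)]


# Hypothetical sample data, deliberately out of chronological order.
segments = [
    Segment(12.0, "Bob", "I can ship the report on Friday"),
    Segment(3.5, "Alice", "We need the report this week"),
    Segment(20.1, "Alice", "Great, ship the report on Friday then"),
]

transcript = build_transcript(segments)
summary = summarize(segments, k=1)
```

Here `transcript` holds the full meeting text in chronological order with speaker names, and `summary` holds the sentences selected from it. A sentiment score per speaker could be attached to each `Segment` in the same way, but the abstract does not specify how that score is computed.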

    • Published in

      MobiQuitous '18: Proceedings of the 15th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services
      November 2018
      490 pages
      ISBN: 9781450360937
      DOI: 10.1145/3286978

      Copyright © 2018 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 26 of 87 submissions, 30%
