ABSTRACT
In this paper, we present the design and development of ProMETheus, an intelligent system that generates meeting minutes from audio data. The first task in ProMETheus is to recognize the speakers in noisy audio data: a speaker recognition algorithm automatically identifies who is speaking from the speech in each recording. Speech recognition then transcribes each speaker's audio to text, so that ProMETheus can generate the complete meeting text in chronological order, labeled with the speakers' names. To show the subject of the meeting and the agreed actions, we use a text summarization algorithm that extracts meaningful key phrases and summary sentences from the complete meeting text. In addition, sentiment analysis of each speaker's text makes the agreed actions more humane by computing a relevance score for each course of action from the sentiment and attitude expressed in the tone of the text. ProMETheus is capable of accurately summarizing the meeting and analyzing the agreed actions. Our robust system is evaluated on a real-world meeting audio dataset in which each meeting session involves multiple speakers.
- ProMETheus: An Intelligent Mobile Voice Meeting Minutes System