ProMETheus: An Intelligent Mobile Voice Meeting Minutes System

Authors:
Hui Liu (Xidian University, Xi'an, China)
Xin Wang (Xidian University, Xi'an, China)
Yuheng Wei (Xidian University, Xi'an, China)
Wei Shao (RMIT University, Melbourne, Victoria)
Jonathan Liono (RMIT University, Melbourne, Victoria)
Flora D. Salim (RMIT University, Melbourne, Victoria)
Bo Deng (Xidian University, Xi'an, China)
Junzhao Du (Xidian University, Xi'an, China)

MobiQuitous '18: Proceedings of the 15th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, November 2018, Pages 392–401. https://doi.org/10.1145/3286978.3286995

Published: 05 November 2018

ABSTRACT

In this paper, we present the design and development of ProMETheus, an intelligent system that generates meeting minutes from audio data. The first task in ProMETheus is to recognize the speakers in noisy audio data: a speaker recognition algorithm automatically identifies who is speaking from the speech in a recording. Speech recognition then transcribes each speaker's audio to text, so that ProMETheus can generate the complete meeting text in chronological order, labeled with the speakers' names. To show the subject of the meeting and the agreed actions, we use a text summarization algorithm that extracts meaningful key phrases and summary sentences from the complete meeting text. In addition, sentiment analysis of each speaker's text makes the agreed actions more humane by calculating a relevance score for each course of action from the sentiment and attitude expressed in the tone of the text. ProMETheus is capable of accurately summarizing the meeting and analyzing the agreed actions. We evaluate the system on a real-world audio meeting dataset in which each meeting session involves multiple speakers.
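The data flow described in the abstract can be illustrated with a minimal sketch. It assumes that speaker recognition and speech recognition have already produced speaker-labeled, time-stamped utterances, and it substitutes a simple word-frequency score for the summarization step; the abstract does not say which summarization algorithm ProMETheus uses, so this is a stand-in and not the authors' method. The names `Utterance`, `build_minutes`, `summarize`, and `STOPWORDS` are hypothetical, and sentiment analysis is left out.

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass
class Utterance:
    start: float   # seconds from the start of the recording
    speaker: str   # label assumed to come from a speaker recognition step
    text: str      # transcript assumed to come from a speech recognition step


# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "we", "is", "in", "on", "for"}


def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def build_minutes(utterances):
    """Return speaker-attributed transcript lines in chronological order."""
    ordered = sorted(utterances, key=lambda u: u.start)
    return [f"[{u.start:.1f}s] {u.speaker}: {u.text}" for u in ordered]


def summarize(utterances, k=2):
    """Pick the k utterances with the highest average word frequency.

    A frequency-based stand-in for the summarization step.
    """
    freq = Counter(w for u in utterances for w in tokenize(u.text))

    def score(u):
        words = tokenize(u.text)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(utterances, key=score, reverse=True)[:k]
    return sorted(top, key=lambda u: u.start)  # report in meeting order


if __name__ == "__main__":
    meeting = [
        Utterance(12.0, "Bob", "The budget review is due Friday."),
        Utterance(3.5, "Alice", "Let us start with the budget review."),
        Utterance(20.0, "Alice", "Thanks everyone."),
    ]
    print("\n".join(build_minutes(meeting)))
    print([u.text for u in summarize(meeting, k=1)])
```

Keeping the chronological assembly separate from the scoring step means the frequency heuristic could be swapped for a graph-based or learned summarizer without touching the transcript format.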

Index Terms

Computing methodologies → Artificial intelligence

Published in
MobiQuitous '18: Proceedings of the 15th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services
November 2018
490 pages
ISBN:9781450360937
DOI:10.1145/3286978
General Chairs: Henning Schulzrinne (Columbia University, USA), Pan Li (Case Western Reserve University, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
meeting minutes
meeting text
sentiment analysis
speaker recognition
speech recognition
text summarization
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 26 of 87 submissions, 30%
Article Metrics
- Total Citations: 5
- Total Downloads: 192
- Downloads (Last 12 months): 15
- Downloads (Last 6 weeks): 1