Smart Services Using Voice and Images

Iliev, Alexander I.; Stanchev, Peter L.

doi:10.1007/978-3-662-62919-2_6

Alexander I. Iliev & Peter L. Stanchev

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 12630)

Abstract

In this chapter, we highlight some of the most prominent advances in the smart technologies that make up the smart city ecosystem. We also discuss the automation of numerous developments based on the extraction and analysis of digital media, specifically speech and images. At present, many practical systems are used for the personalization and recommendation of different media, and a variety of services in different areas benefit directly from these advances. Most of these were designed around human-machine interaction, in which people had to interact with machines in various ways. In the past, this interaction took place through conventional interfaces such as a mouse and a keyboard: the user typed a response manually, and the machine recorded it for subsequent analysis. To simplify such interactions and improve services such as recommendation and personalization, new methodologies must be studied, discovered, and developed.

Author information

Authors and Affiliations

UC Berkeley Global, Berkeley, CA, USA
Alexander I. Iliev
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria
Alexander I. Iliev & Peter L. Stanchev
Kettering University, Flint, USA
Peter L. Stanchev

Authors

Alexander I. Iliev
Peter L. Stanchev

Corresponding author

Correspondence to Alexander I. Iliev.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
IFS, Technical University of Vienna, Vienna, Austria
A Min Tjoa
University of Pau and the Adour Region, Anglet, France
Richard Chbeir

About this chapter

Cite this chapter

Iliev, A.I., Stanchev, P.L. (2021). Smart Services Using Voice and Images. In: Hameurlain, A., Tjoa, A.M., Chbeir, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII. Lecture Notes in Computer Science, vol 12630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62919-2_6

DOI: https://doi.org/10.1007/978-3-662-62919-2_6
Published: 17 January 2021
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-62918-5
Online ISBN: 978-3-662-62919-2
eBook Packages: Computer Science, Computer Science (R0)