Research Article · DOI: 10.1145/3587716.3587760

Integrated Artificial Intelligence for Making Digital Human

Published: 07 September 2023

Abstract

Artificial intelligence is actively researched in many fields, including image recognition, object detection, speech recognition, natural language processing, and facial expression recognition and generation. To create artificial intelligence in the original sense of the term, these many research results must be integrated into a single system that imitates the functions of the human brain. Commercially, integrated AI systems such as Amelia [2], UneeQ [3], Neon [13], LaMDA [29], and systems built on GPT-3 [9] have already entered the market. In the academic field, however, there is no research that creates an integrated AI as open source. This work is an attempt to construct such an integrated AI as open-source academic research. The system is implemented as multiple processes communicating over socket connections, so its execution can be distributed across several computers. For visual input, object detection is performed with Redmon's YOLO [14]; the system then performs image2text, generating sentences that describe the image [34], so that it recognizes the meaning of its visual input. For speech input, a question-answering task is activated that gives accurate answers to questions posed through a microphone [7], and text generation enables the system to respond to human chatter [5]. In total, this work combines four different sources: visual input, text, audio, and news scraped from the outside world. We believe that attempts like this one will become increasingly common in future AI studies.
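
The abstract describes the architecture only at a high level. As a rough, hypothetical sketch of the multi-process, socket-connected design (not the authors' actual code, which is published at [32]), the following Python fragment shows one way a vision worker could stream object-detection results to a central integration process, possibly running on another machine. The hub address, the message format, and the detect() stub standing in for YOLO [14] are all illustrative assumptions.

import json
import socket
import threading
import time

HUB_ADDR = ("127.0.0.1", 5005)  # hypothetical address of the integration hub

def detect(frame_id):
    # Placeholder for YOLO object detection [14]; a real worker would run
    # the detector on a camera frame and return the resulting boxes.
    return [{"label": "person", "conf": 0.93, "box": [10, 20, 110, 220]}]

def vision_worker(num_frames=3):
    # One process in the integrated system: connect to the hub over TCP
    # and stream newline-delimited JSON messages, one per frame.
    with socket.create_connection(HUB_ADDR) as conn:
        for frame_id in range(num_frames):
            msg = {"source": "vision", "frame": frame_id,
                   "objects": detect(frame_id)}
            conn.sendall((json.dumps(msg) + "\n").encode("utf-8"))

def integration_hub():
    # Central process that would merge events from the vision, audio, text,
    # and scraping workers; here it just prints what the vision worker sends.
    with socket.create_server(HUB_ADDR) as server:
        conn, _ = server.accept()
        with conn, conn.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                event = json.loads(line)
                print("[%s] frame %d: %s"
                      % (event["source"], event["frame"], event["objects"]))

if __name__ == "__main__":
    hub = threading.Thread(target=integration_hub)
    hub.start()
    time.sleep(0.2)  # give the hub a moment to start listening
    vision_worker()  # in the real system this runs on a separate machine
    hub.join()

Run as a single script, this starts the hub in a background thread and prints one line per frame; workers for speech recognition [7], text generation [5], and web scraping would connect to the same hub in the same way.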

References

[1]
Abdulwahhab Alshammari, Raed Almalki, and Riyad Alshammari. 2021. Developing a Predictive Model of Predicting Appointment No-Show by Using Machine Learning Algorithms. Journal of Advances in Information Technology 12, 3 (2021), 234–239.
[2]
Amelia Conversational AI. 2022. https://amelia.ai/conversational-ai.
[3]
UneeQ Digital Humans. 2022. https://www.digitalhumans.com.
[4]
Japanese BERT. 2021. https://github.com/cl-tohoku/bert-japanese/tree/v1.0.
[5]
Sid Black et al. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. In Proceedings of BigScience Episode 5 – Workshop on Challenges and Perspectives in Creating Large Language Models, 95–136.
[6]
Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. CoRR abs/2004.10934 (2020). arXiv:2004.10934. https://arxiv.org/abs/2004.10934.
[7]
Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 1870–1879. https://doi.org/10.18653/v1/P17-1171
[8]
Félix Mikaelian et al. 2021. cdQA: Closed Domain Question Answering.
[9]
Tom B. Brown et al. 2020. Language Models are Few-Shot Learners. In NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems, No. 159, 1877–1901. https://beta.openai.com/docs/models/gpt-3.
[10]
GitHub. 2022 (Dec). https://github.com.
[11]
Andrew G. Howard et al. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861 (2017). arXiv:1704.04861. http://arxiv.org/abs/1704.04861.
[12]
Hussein Mozannar, Elie Maamary, Karl El Hajal, and Hazem Hajj. 2019. Neural Arabic Question Answering. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, Florence, Italy, 108–118.
[13]
NEON: A computationally created virtual being that looks and behaves like us. 2019. https://www.neon.life.
[14]
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788.
[15]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 4171–4186.
[16]
G. Jocher. 2022. https://github.com/ultralytics/yolov5/discussions/8996.
[17]
Kubra Tuncal, Boran Sekeroglu, and Cagri Ozkan. 2020. Lung Cancer Incidence Prediction Using Machine Learning Algorithms. Journal of Advances in Information Technology 11, 2 (2020), 91–96.
[18]
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 1247–1250.
[19]
Wuttipong Kusonkhum, Korb Srinavin, Narong Leungbootnak, Preenithi Aksorn, and Tanayut Chaitongrat. 2022. Government Construction Project Budget Prediction Using Machine Learning. Journal of Advances in Information Technology 13, 1 (2022), 29–35. https://doi.org/10.12720/jait.13.1.29-35.
[20]
Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, Yiduo Li, Bo Zhang, Yufei Liang, Linyuan Zhou, Xiaoming Xu, Xiangxiang Chu, Xiaoming Wei, and Xiaolin Wei. 2022. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv:2209.02976 [cs.CV].
[21]
Gary Marcus, Ernest Davis, and Scott Aaronson. 2022. A very preliminary analysis of DALL-E 2. arXiv:2204.13807 [cs.CV].
[22]
Japanese Pretrained Models. 2022. https://github.com/rinnakk/japanese-pretrained-models.
[23]
NVIDIA. 2018. https://github.com/NVlabs/stylegan.
[24]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125 [cs.CV].
[25]
Optical Character Recognition. 2023. https://ocr2edit.com.
[26]
Optical Character Recognition. 2023. https://ocr.best.
[27]
Online Optical Character Recognition. 2023. https://onlineocr.net.
[28]
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv:2205.11487 [cs.CV].
[29]
Romal Thoppilan et al. 2022. LaMDA: Language Models for Dialog Applications. arXiv:2201.08239 [cs.CL].
[30]
TensorFlow tutorials (tensorflow.org). 2020. Image Captioning with Visual Attention.
[31]
Ultralytics. 2020. https://github.com/ultralytics/yolov5.
[32]
Yoshiyuki Usami. 2022. https://github.com/usami0jp/ai.
[33]
Yoshiyuki Usami. 2022. Making Integrated AI Having Abilities of Hearing, Looking, and Answering. Amazon.com.
[34]
Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and Tell: A Neural Image Caption Generator. CoRR abs/1411.4555 (2014). arXiv:1411.4555. http://arxiv.org/abs/1411.4555.
[35]
Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. 2021. Scaled-YOLOv4: Scaling Cross Stage Partial Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 13029–13038.
[36]
Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. 2022. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv:2207.02696 [cs.CV].
[37]
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, and Lijuan Wang. 2022. GIT: A Generative Image-to-text Transformer for Vision and Language. arXiv:2205.14100 [cs.CV].
[38]
Zhilin Yang et al. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 5753–5763.
[39]
Yang Zhou, Dingzeyu Li, Xintong Han, Evangelos Kalogerakis, Eli Shechtman, and Jose Echevarria. 2020. MakeItTalk: Speaker-Aware Talking Head Animation. CoRR abs/2004.12992 (2020). arXiv:2004.12992. https://arxiv.org/abs/2004.12992.

Published In

ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing
February 2023
619 pages
ISBN:9781450398411
DOI:10.1145/3587716
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. image2text
  2. question and answering
  3. text generation
  4. visual object detection
  5. visual object recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2023
