Research Article · DOI: 10.1145/3587716.3587760

Integrated Artificial Intelligence for Making Digital Human

Published: 07 September 2023

Abstract

Artificial intelligence is actively researched in many fields, including image recognition, object detection, speech recognition, natural language processing, and facial expression recognition and generation. To create artificial intelligence in the original sense of the term, these many research results must be integrated into a single system that imitates the functions of the human brain. Commercially, integrated AI systems such as Amelia [2], UneeQ [3], Neon [13], LaMDA [29], and systems built on GPT-3 [9] have already entered the market. In the academic field, however, there is no research that creates an integrated AI as open source. This work is an attempt to construct such an integrated AI as open-source academic research. The system is implemented as multiple processes communicating over socket connections, so its execution can be distributed across several computers. For visual input, object detection is performed with Redmon's YOLO [14]; the system then performs image2text, generating sentences that describe the image [34], so that it recognizes the meaning of its visual input. For speech input, a question-answering task is activated that gives accurate answers to questions posed through a microphone [7], and text generation enables the system to respond to human chatter [5]. In total, this work combines four different sources: visual input, text, audio, and news scraped from the outside world. We believe that attempts like this one will become increasingly common in future AI studies.
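
The abstract describes the architecture only at a high level. As a rough, hypothetical sketch of the multi-process, socket-connected design (not the authors' actual code, which is published at [32]), the following Python fragment shows one way a vision worker could stream object-detection results to a central integration process, possibly running on another machine. The hub address, the message format, and the detect() stub standing in for YOLO [14] are all illustrative assumptions.

import json
import socket
import threading
import time

HUB_ADDR = ("127.0.0.1", 5005)  # hypothetical address of the integration hub

def detect(frame_id):
    # Placeholder for YOLO object detection [14]; a real worker would run
    # the detector on a camera frame and return the resulting boxes.
    return [{"label": "person", "conf": 0.93, "box": [10, 20, 110, 220]}]

def vision_worker(num_frames=3):
    # One process in the integrated system: connect to the hub over TCP
    # and stream newline-delimited JSON messages, one per frame.
    with socket.create_connection(HUB_ADDR) as conn:
        for frame_id in range(num_frames):
            msg = {"source": "vision", "frame": frame_id,
                   "objects": detect(frame_id)}
            conn.sendall((json.dumps(msg) + "\n").encode("utf-8"))

def integration_hub():
    # Central process that would merge events from the vision, audio, text,
    # and scraping workers; here it just prints what the vision worker sends.
    with socket.create_server(HUB_ADDR) as server:
        conn, _ = server.accept()
        with conn, conn.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                event = json.loads(line)
                print("[%s] frame %d: %s"
                      % (event["source"], event["frame"], event["objects"]))

if __name__ == "__main__":
    hub = threading.Thread(target=integration_hub)
    hub.start()
    time.sleep(0.2)  # give the hub a moment to start listening
    vision_worker()  # in the real system this runs on a separate machine
    hub.join()

Run as a single script, this starts the hub in a background thread and prints one line per frame; workers for speech recognition [7], text generation [5], and web scraping would connect to the same hub in the same way.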

References

[1]
Abdulwahhab Alshammari, Raed Almalki, and Riyad Alshammari. 2021. Developing a Predictive Model of Predicting Appointment No-Show by Using Machine Learning Algorithms. Journal of Advances in Information Technology 12, 3 (2021), 234–239.
[2]
Amelia Conversational AI. 2022. https://amelia.ai/conversational-ai.
[3]
UneeQ Digital Humans. 2022. https://www.digitalhumans.com.
[4]
Japanese BERT. 2021. https://github.com/cl-tohoku/bert-japanese/tree/v1.0.
[5]
Sid Black et al. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. In Proceedings of BigScience Episode 5 – Workshop on Challenges and Perspectives in Creating Large Language Models, 95–136.
[6]
Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. CoRR abs/2004.10934 (2020). arXiv:2004.10934. https://arxiv.org/abs/2004.10934.
[7]
Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 1870–1879. https://doi.org/10.18653/v1/P17-1171
[8]
Félix Mikaelian et al. 2021. cdQA: Closed Domain Question Answering.
[9]
Tom B. Brown et al. 2020. Language Models are Few-Shot Learners. In NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems, No. 159, 1877–1901. https://beta.openai.com/docs/models/gpt-3.
[10]
GitHub. 2022 (Dec). https://github.com.
[11]
Andrew G. Howard et al. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861 (2017). arXiv:1704.04861. http://arxiv.org/abs/1704.04861.
[12]
Hussein Mozannar, Elie Maamary, Karl El Hajal, and Hazem Hajj. 2019. Neural Arabic Question Answering. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, Florence, Italy, 108–118.
[13]
NEON: A computationally created virtual being that looks and behaves like us. 2019. https://www.neon.life.
[14]
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788.
[15]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 4171–4186.
[16]
G. Jocher. 2022. https://github.com/ultralytics/yolov5/discussions/8996.
[17]
Kubra Tuncal, Boran Sekeroglu, and Cagri Ozkan. 2020. Lung Cancer Incidence Prediction Using Machine Learning Algorithms. Journal of Advances in Information Technology 11, 2 (2020), 91–96.
[18]
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 1247–1250.
[19]
Wuttipong Kusonkhum, Korb Srinavin, Narong Leungbootnak, Preenithi Aksorn, and Tanayut Chaitongrat. 2022. Government Construction Project Budget Prediction Using Machine Learning. Journal of Advances in Information Technology 13, 1 (2022), 29–35. https://doi.org/10.12720/jait.13.1.29-35.
[20]
Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, Yiduo Li, Bo Zhang, Yufei Liang, Linyuan Zhou, Xiaoming Xu, Xiangxiang Chu, Xiaoming Wei, and Xiaolin Wei. 2022. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv:2209.02976 [cs.CV].
[21]
Gary Marcus, Ernest Davis, and Scott Aaronson. 2022. A very preliminary analysis of DALL-E 2. arXiv:2204.13807 [cs.CV].
[22]
Japanese Pretrained Models. 2022. https://github.com/rinnakk/japanese-pretrained-models.
[23]
NVIDIA. 2018. https://github.com/NVlabs/stylegan.
[24]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125 [cs.CV].
[25]
Optical Character Recognition. 2023. https://ocr2edit.com.
[26]
Optical Character Recognition. 2023. https://ocr.best.
[27]
Online Optical Character Recognition. 2023. https://onlineocr.net.
[28]
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv:2205.11487 [cs.CV].
[29]
Romal Thoppilan et al. 2022. LaMDA: Language Models for Dialog Applications. arXiv:2201.08239 [cs.CL].
[30]
TensorFlow tutorials (tensorflow.org). 2020. Image Captioning with Visual Attention.
[31]
Ultralytics. 2020. https://github.com/ultralytics/yolov5.
[32]
Yoshiyuki Usami. 2022. https://github.com/usami0jp/ai.
[33]
Yoshiyuki Usami. 2022. Making Integrated AI Having Abilities of Hearing, Looking, and Answering. Amazon.com.
[34]
Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and Tell: A Neural Image Caption Generator. CoRR abs/1411.4555 (2014). arXiv:1411.4555. http://arxiv.org/abs/1411.4555.
[35]
Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. 2021. Scaled-YOLOv4: Scaling Cross Stage Partial Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 13029–13038.
[36]
Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. 2022. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv:2207.02696 [cs.CV].
[37]
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, and Lijuan Wang. 2022. GIT: A Generative Image-to-text Transformer for Vision and Language. arXiv:2205.14100 [cs.CV].
[38]
Zhilin Yang et al. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 5753–5763.
[39]
Yang Zhou, Dingzeyu Li, Xintong Han, Evangelos Kalogerakis, Eli Shechtman, and Jose Echevarria. 2020. MakeItTalk: Speaker-Aware Talking Head Animation. CoRR abs/2004.12992 (2020). arXiv:2004.12992. https://arxiv.org/abs/2004.12992.

Published In

ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing
February 2023
619 pages
ISBN:9781450398411
DOI:10.1145/3587716
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. image2text
  2. question and answering
  3. text generation
  4. visual object detection
  5. visual object recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2023
