DOI: 10.1145/3617733.3617753
research-article

Evolutionary Algorithms Approach For Search Based On Semantic Document Similarity

Published: 31 October 2023

Abstract

Advances in cloud and distributed computing have fostered research in computer science. As a result, researchers have made significant progress in neural networks and in evolutionary computing algorithms such as Genetic Algorithms and Differential Evolution. These algorithms are used to build clustering, recommendation, and question-answering systems with various text representation and similarity measurement techniques. In this paper, the Universal Sentence Encoder (USE) is used to capture the semantic similarity of text, and transfer learning is used to apply Genetic Algorithm (GA) and Differential Evolution (DE) algorithms to search for and retrieve the top N documents relevant to a user query. The proposed approach is applied to user queries over the Stanford Question Answering Dataset (SQuAD). Finally, our experiments show that text documents can be efficiently represented as USE sentence embedding vectors that capture semantic similarity, and a comparison of the Manhattan Distance, GA, and DE results shows that the evolutionary algorithms are better at finding the top N results than the traditional ranking approach.
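The abstract compares a Manhattan-distance ranking baseline with evolutionary search for the top N documents. The following is a minimal Python sketch of that comparison under clearly stated assumptions, not the authors' implementation: the document and query vectors are toy random lists standing in for USE sentence embeddings (USE itself is not called), each GA chromosome is a hypothetical list of N distinct document indices, and the fitness is the negative total Manhattan distance to the query. The helper names (`manhattan`, `rank_top_n`, `fitness`, `ga_top_n`) and the selection, crossover, and mutation operators are illustrative choices; the paper's own encodings, operators, and fitness function are not given in this abstract.

```python
import random


def manhattan(u, v):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_top_n(query, docs, n):
    """Baseline: sort every document by Manhattan distance to the query, keep n."""
    return sorted(range(len(docs)), key=lambda i: manhattan(query, docs[i]))[:n]


def fitness(chromosome, query, docs):
    """Hypothetical fitness: negative total distance of the chosen documents (higher is better)."""
    return -sum(manhattan(query, docs[i]) for i in chromosome)


def ga_top_n(query, docs, n, pop_size=30, generations=100, mutation_rate=0.2, seed=0):
    """Toy GA whose chromosomes are lists of n distinct document indices."""
    rng = random.Random(seed)
    m = len(docs)
    population = [rng.sample(range(m), n) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=lambda c: fitness(c, query, docs), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw n distinct genes from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), n)
            # Mutation: replace one gene with a document not already in the child.
            if m > n and rng.random() < mutation_rate:
                outside = [i for i in range(m) if i not in child]
                child[rng.randrange(n)] = rng.choice(outside)
            children.append(child)
        population = parents + children
    best = max(population, key=lambda c: fitness(c, query, docs))
    # Return the selected documents ordered from closest to farthest.
    return sorted(best, key=lambda i: manhattan(query, docs[i]))


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 8-dimensional vectors standing in for 512-dimensional USE embeddings.
    docs = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(50)]
    query = [rng.uniform(-1, 1) for _ in range(8)]
    print("Manhattan baseline:", rank_top_n(query, docs, 5))
    print("GA selection:      ", ga_top_n(query, docs, 5))
```

With this particular toy fitness, the sorted baseline is already the exact optimum (the N smallest distances minimize their sum), so the sketch only demonstrates the GA mechanics. A DE variant would replace the index-list chromosome with a real-valued vector, since DE operates over continuous spaces.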



    Published In

    ICCCM '23: Proceedings of the 2023 11th International Conference on Computer and Communications Management
    August 2023
    284 pages
    ISBN:9798400707735
    DOI:10.1145/3617733
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. sentence embeddings
    2. deep neural networks
    3. differential evolution
    4. evolutionary algorithms
    5. genetics
    6. ranking
    7. search
    8. semantic similarity
    9. transfer learning
    10. universal sentence encoder

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCCM 2023


    Article Metrics

    • Total Citations: 0
    • Total Downloads: 23
    • Downloads (Last 12 months): 18
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 24 Dec 2024
