DOI: 10.1145/3594315.3594361
research-article

An Improved Quad-Array Trie Algorithm for Website Sensitive Word Detection

Published: 02 August 2023

ABSTRACT

As the Internet and news media develop, information-publishing websites keep emerging, and supervising website content grows steadily more important. Because the Internet makes publishing so convenient, public opinion spreads and escalates extremely quickly, and manual monitoring alone rarely catches problems promptly. Most existing sensitive word detection solutions scan a website for sensitive words only when content is published. The set of sensitive words changes with current events, however, so detection at publication time no longer meets the needs of public opinion monitoring. To improve the efficiency of sensitive word detection, this paper combines the advantages of two deterministic finite automaton (DFA) algorithms, Aho-Corasick (AC) and the double-array trie (DAT), and presents an improved multi-pattern string matching algorithm, IQAT (Improved Quad-Array Trie). Empirical results show that the proposed algorithm substantially improves detection performance and reduces memory consumption compared with existing methods.
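For readers unfamiliar with multi-pattern matching, the following is a minimal sketch of the classic Aho-Corasick automaton, one of the two building blocks the abstract names. It is not the IQAT algorithm, whose details are not given on this page. The function names `build_automaton` and `find_all` are illustrative assumptions, and the per-state dictionaries stand in for the compact array layout that a double-array trie would use.

```python
from collections import deque


def build_automaton(patterns):
    """Build a dict-based Aho-Corasick automaton (illustrative, not IQAT)."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state for the longest proper suffix
    output = [[]]  # output[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)

    # Phase 2: breadth-first traversal to fill in the failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, output = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

Because the automaton is built once and the text is scanned in a single pass, the scan cost does not grow with the number of patterns beyond reporting the matches. In this sketch a change to the word list means rebuilding the automaton.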


Published in

ICCAI '23: Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence
March 2023
824 pages
ISBN: 9781450399029
DOI: 10.1145/3594315

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

research-article, Research, Refereed limited