ABSTRACT
With the growth of the Internet and news media, information-publishing websites continue to emerge, and supervising website content is becoming increasingly important. Because the Internet makes sharing so convenient, public opinion spreads and escalates extremely quickly, and manual monitoring alone can rarely detect problems promptly. Most existing sensitive word detection solutions scan website content only at the time it is published. However, the set of sensitive words changes with current events, so publish-time detection alone can no longer meet the needs of public opinion monitoring. To improve the efficiency of sensitive word detection, this paper combines the advantages of two deterministic finite automaton (DFA) algorithms, Aho-Corasick (AC) and the double-array trie (DAT), and presents an improved multi-pattern string matching algorithm, the Improved Quad-Array Trie (IQAT). Empirical results show that the proposed algorithm substantially improves on existing methods in both detection performance and memory consumption.
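For readers unfamiliar with multi-pattern matching, the following is a minimal sketch of a textbook Aho-Corasick automaton applied to sensitive word detection. It is not the IQAT algorithm described in the paper: it stores transitions in Python dictionaries rather than in the compact arrays that DAT-style structures use, and the names `build_automaton` and `find_sensitive_words` are hypothetical, chosen only for this illustration.

```python
from collections import deque


def build_automaton(words):
    """Build a dictionary-based Aho-Corasick automaton (illustrative only)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> words that end at this state
    # Phase 1: insert every word into a trie.
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_sensitive_words(text, automaton):
    """Scan text once and return (start_index, word) for every match."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we reach the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


if __name__ == "__main__":
    # Hypothetical word list; a real deployment would reload it as sensitive words change.
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(find_sensitive_words("ushers", automaton))
```

Because the text is scanned in a single pass regardless of how many words are in the list, rebuilding the automaton when the word list changes and rescanning already-published content is feasible, which is the monitoring scenario the abstract describes.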
Index Terms
- An Improved Quad-Array Trie Algorithm for Website Sensitive Word Detection