DOI: 10.1145/3512576.3512612
research-article

Stop words detection using a long short term memory recurrent neural network

Published: 11 April 2022

ABSTRACT

Natural language processing (NLP) is a field of computer science concerned with understanding and analyzing textual data in a given language. Analyzing text is tedious, and unnecessary or noisy data in the corpus can lead to erroneous results. Stop words are one such source of noise. English already has predefined stop word lists, but stop words in other languages, such as Cebuano and Filipino, are not yet supported by many NLP APIs. In the Philippines, users post on Facebook in several different languages. In this study, a corpus of Facebook posts was used to detect stop words automatically. A neural network based on Bidirectional Long Short-Term Memory (BiLSTM) was built, and Word2vec was used to derive word embeddings from the corpus. In the experiments, the model achieved 72% accuracy.
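
The abstract does not say which Word2vec variant or window size was used. As a rough illustration only, the sketch below assumes the skip-gram variant and shows how (center, context) training pairs could be drawn from one tokenized post. The function name `skipgram_pairs`, the window size, and the sample post are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the context window to the bounds of the post.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical mixed-language post, lowercased and split on whitespace.
post = "ang ulan is so kusog karon".split()
pairs = skipgram_pairs(post, window=1)
```

Pairs like these are what a skip-gram model trains on. The resulting word vectors would then serve as the input representation for a sequence model such as the BiLSTM named in the abstract.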


  • Published in

    ICIT '21: Proceedings of the 2021 9th International Conference on Information Technology: IoT and Smart City
    December 2021
    584 pages
    ISBN: 9781450384971
    DOI: 10.1145/3512576

    Copyright © 2021 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited