BiGBERT: Classifying Educational Web Resources for Kindergarten-12th Grades

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12657)

Abstract

In this paper, we present BiGBERT, a deep learning model that simultaneously examines URLs and snippets from web resources to determine their alignment with children’s educational standards. Preliminary results from ablation studies and from comparisons with baselines and state-of-the-art counterparts reveal that leveraging domain knowledge to learn domain-aligned contextual nuances from limited input data improves the identification of educational web resources.

Notes

  1. For fine-tuning, we use 2,655 text passages from NGCS, CCSS, and ICS along with 2,725 from the Brown corpus [5, 12].

  2. Due to the Terms of Use for Alexa Top Sites, we are unable to share this dataset.

  3. We explored SVM as an additional baseline, which performed similarly to BoW and is excluded for brevity.

References

  1. Abdessamed, O., Zakaria, E.: Web site classification based on URL and content: Algerian vs. non-Algerian case. In: Proceedings of the 12th International Symposium on Programming and Systems (ISPS), pp. 1–8. IEEE (2015)

  2. Amazon, I.: Alexa top sites (2020). https://www.alexa.com/topsites/category. Accessed 17 Sept 2020

  3. Anuyah, O., Azpiazu, I.M., Pera, M.S.: Using structured knowledge and traditional word embeddings to generate concept representations in the educational domain. In: Companion Proceedings of the World Wide Web Conference, pp. 274–282 (2019)

  4. Bell, C., Bell, M.: Infotopia (2020). https://wwww.infotopia.info. Accessed 17 Aug 2020

  5. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media Inc., Newton (2009)

  6. Chen, W., Cai, F., Chen, H., De Rijke, M.: Personalized query suggestion diversification in information retrieval. Front. Comput. Sci. 14(3), 1–14 (2019). https://doi.org/10.1007/s11704-018-7283-x

  7. Clavié, B., Gal, K.: EduBERT: pretrained deep language models for learning analytics. arXiv preprint arXiv:1912.00690 (2019)

  8. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  9. Eickhoff, C., Serdyukov, P., de Vries, A.P.: Web page classification on child suitability. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1425–1428 (2010)

  10. Ekstrand, M.D., Wright, K.L., Pera, M.S.: Enhancing classroom instruction with online news. Aslib J. Inf. Manag. 72(5), 725–744 (2020)

  11. Elnaggar, A., Gebendorfer, C., Glaser, I., Matthes, F.: Multi-task deep learning for legal document translation, summarization and multi-label classification. In: Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference, pp. 9–15 (2018)

  12. Francis, W.N., Kucera, H.: Brown corpus manual. Lett. Editor 5(2), 7 (1979)

  13. Garbe, W.: SymSpell (2020). https://github.com/wolfgarbe/SymSpell

  14. Geraci, F., Papini, T.: Approximating multi-class text classification via automatic generation of training examples. In: Gelbukh, A. (ed.) CICLing 2017. LNCS, vol. 10762, pp. 585–601. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77116-8_44

  15. Hashemi, M.: Web page classification: a survey of perspectives, gaps, and future directions. Multimedia Tools Appl. 79, 11921–11945 (2020)

  16. Hassan, S., Mihalcea, R.: Learning to identify educational materials. ACM Trans. Speech Lang. Process. (TSLP) 8(2), 1–18 (2008)

  17. Hoppe, A., Holtz, P., Kammerer, Y., Yu, R., Dietze, S., Ewerth, R.: Current challenges for studying search as learning processes. In: Proceedings of Learning and Education with Web Data (2018)

  18. Hughes, M., Li, I., Kotoulas, S., Suzumura, T.: Medical text classification using convolutional neural networks. Stud. Health Technol. Inf. 235, 246–50 (2017)

  19. Initiative, CCSSO: Common core state standards for English language arts & literacy in history/social studies, science, and technical subjects (2020). http://www.corestandards.org/wp-content/uploads/ELA_Standards1.pdf

  20. Kastrati, Z., Imran, A.S., Yayilgan, S.Y.: The impact of deep learning on document classification using semantically rich representations. Inf. Process. Manag. 56(5), 1618–1632 (2019)

  21. Liu, G., Guo, J.: Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337, 325–338 (2019)

  22. Nimmagadda, S.L., Zhu, D., Rudra, A.: Knowledge base smarter articulations for the open directory project in a sustainable digital ecosystem. In: Companion Proceedings of the International Conference on World Wide Web, pp. 1537–1545 (2017)

  23. Nowak, S., Rüger, S.: How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation. In: Proceedings of the International Conference on Multimedia Information Retrieval, pp. 557–566 (2010)

  24. Rajalakshmi, R., Aravindan, C.: A Naive Bayes approach for URL classification with supervised feature selection and rejection framework. Comput. Intell. 34(1), 363–396 (2018)

  25. Rajalakshmi, R., Tiwari, H., Patel, J., Kumar, A., Karthik, R.: Design of kids-specific URL classifier using recurrent convolutional neural network. Procedia Comput. Sci. 167, 2124–2131 (2020)

  26. Rajalakshmi, R., Tiwari, H., Patel, J., Rameshkannan, R., Karthik, R.: Bidirectional GRU-based attention model for kid-specific URL classification. In: Deep Learning Techniques and Optimization Strategies in Big Data Analytics, pp. 78–90. IGI Global (2020)

  27. Shen, D., et al.: Web-page classification through summarization. In: Proceedings of the 27th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 242–249 (2004)

  28. Sreenivasulu, T., Jayakarthik, R., Shobarani, R.: Web content classification techniques based on fuzzy ontology. In: Peng, S.-L., Son, L.H., Suseendran, G., Balaganesh, D. (eds.) Intelligent Computing and Innovation on Data Science. LNNS, vol. 118, pp. 189–197. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3284-9_22

  29. Sun, C., Qiu, X., Xu, Y., Huang, X.: How to fine-tune BERT for text classification? In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds.) CCL 2019. LNCS (LNAI), vol. 11856, pp. 194–206. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32381-3_16

  30. Tieleman, T., Hinton, G.: Lecture 6.5–RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4(2), 26–31 (2012)

  31. Usta, A., Altingovde, I.S., Vidinli, I.B., Ozcan, R., Ulusoy, Ö.: How K-12 students search for learning? Analysis of an educational search engine log. In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 1151–1154 (2014)

  32. Xia, T.: Support vector machine based educational resources classification. Int. J. Inf. Educ. Technol. 6(11), 880 (2016)

  33. Yigit-Sert, S., Altingovde, I.S., Macdonald, C., Ounis, I., Ulusoy, Ö.: Explicit diversification of search results across multiple dimensions for educational search. J. Assoc. Inf. Sci. Technol. (2020). https://doi.org/10.1002/asi.24403

  34. Yilmaz, T., Ozcan, R., Altingovde, I.S., Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification. Inf. Process. Manag. 56(1), 228–246 (2019)

  35. Yu, S., Su, J., Luo, D.: Improving BERT-based text classification with auxiliary sentence and domain knowledge. IEEE Access 7, 176600–176612 (2019)

  36. Zhao, W., Zhang, G., Yuan, G., Liu, J., Shan, H., Zhang, S.: The study on the text classification for financial news based on partial information. IEEE Access 8, 100426–100437 (2020)

Acknowledgments

Work funded by NSF Award # 1763649. The authors would like to thank Dr. Ion Madrazo Azpiazu for his valuable feedback.

Corresponding author

Correspondence to Garrett Allen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Allen, G. et al. (2021). BiGBERT: Classifying Educational Web Resources for Kindergarten-12th Grades. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_13

  • DOI: https://doi.org/10.1007/978-3-030-72240-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72239-5

  • Online ISBN: 978-3-030-72240-1

  • eBook Packages: Computer Science, Computer Science (R0)
