BiGBERT: Classifying Educational Web Resources for Kindergarten-12th Grades

Allen, Garrett; Downs, Brody; Shukla, Aprajita; Kennington, Casey; Fails, Jerry Alan; Wright, Katherine Landau; Pera, Maria Soledad

doi:10.1007/978-3-030-72240-1_13

Conference paper
First Online: 30 March 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12657)

Abstract

In this paper, we present BiGBERT, a deep learning model that simultaneously examines URLs and snippets from web resources to determine their alignment with children’s educational standards. Preliminary results from ablation studies and from comparisons with baselines and state-of-the-art counterparts reveal that leveraging domain knowledge to learn domain-aligned contextual nuances from limited input data leads to improved identification of educational web resources.

Notes

1. For fine-tuning we use 2,655 text passages from NGCS, CCSS, and ICS along with 2,725 from the Brown corpus [5, 12].
2. Due to Terms of Use for Alexa Top Sites, we are unable to share this dataset.
3. We explored SVM as an additional baseline, which performed similarly to BoW and is excluded for brevity.

References

1. Abdessamed, O., Zakaria, E.: Web site classification based on URL and content: algerian vs. non-algerian case. In: Proceedings of the 12th International Symposium on Programming and Systems (ISPS), pp. 1–8. IEEE (2015)
2. Amazon, I.: Alexa top sites (2020). https://www.alexa.com/topsites/category. Accessed 17 Sept 2020
3. Anuyah, O., Azpiazu, I.M., Pera, M.S.: Using structured knowledge and traditional word embeddings to generate concept representations in the educational domain. In: Companion Proceedings of the World Wide Web Conference, pp. 274–282 (2019)
4. Bell, C., Bell, M.: Infotopia (2020). https://wwww.infotopia.info. Accessed 17 Aug 2020
5. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media Inc., Newton (2009)
6. Chen, W., Cai, F., Chen, H., De Rijke, M.: Personalized query suggestion diversification in information retrieval. Front. Comput. Sci. 14(3), 1–14 (2019). https://doi.org/10.1007/s11704-018-7283-x
7. Clavié, B., Gal, K.: Edubert: pretrained deep language models for learning analytics. arXiv preprint arXiv:1912.00690 (2019)
8. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
9. Eickhoff, C., Serdyukov, P., de Vries, A.P.: Web page classification on child suitability. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1425–1428 (2010)
10. Ekstrand, M.D., Wright, K.L., Pera, M.S.: Enhancing classroom instruction with online news. Aslib J. Inf. Manag. 72(5), 725–744 (2020)
11. Elnaggar, A., Gebendorfer, C., Glaser, I., Matthes, F.: Multi-task deep learning for legal document translation, summarization and multi-label classification. In: Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference, pp. 9–15 (2018)
12. Francis, W.N., Kucera, H.: Brown corpus manual. Lett. Editor 5(2), 7 (1979)
13. Garbe, W.: Symspell (2020). https://github.com/wolfgarbe/SymSpell
14. Geraci, F., Papini, T.: Approximating multi-class text classification via automatic generation of training examples. In: Gelbukh, A. (ed.) CICLing 2017. LNCS, vol. 10762, pp. 585–601. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77116-8_44
15. Hashemi, M.: Web page classification: a survey of perspectives, gaps, and future directions. Multimedia Tools Appl. 79, 11921–11945 (2020)
16. Hassan, S., Mihalcea, R.: Learning to identify educational materials. ACM Trans. Speech Lang. Process. (TSLP) 8(2), 1–18 (2008)
17. Hoppe, A., Holtz, P., Kammerer, Y., Yu, R., Dietze, S., Ewerth, R.: Current challenges for studying search as learning processes. In: Proceedings of Learning and Education with Web Data (2018)
18. Hughes, M., Li, I., Kotoulas, S., Suzumura, T.: Medical text classification using convolutional neural networks. Stud. Health Technol. Inf. 235, 246–50 (2017)
19. Initiative, CCSSO: Common core state standards for English language arts & literacy in history/social studies, science, and technical subjects (2020). http://www.corestandards.org/wp-content/uploads/ELA_Standards1.pdf
20. Kastrati, Z., Imran, A.S., Yayilgan, S.Y.: The impact of deep learning on document classification using semantically rich representations. Inf. Process. Manag. 56(5), 1618–1632 (2019)
21. Liu, G., Guo, J.: Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337, 325–338 (2019)
22. Nimmagadda, S.L., Zhu, D., Rudra, A.: Knowledge base smarter articulations for the open directory project in a sustainable digital ecosystem. In: Companion Proceedings of the International Conference on World Wide Web, pp. 1537–1545 (2017)
23. Nowak, S., Rüger, S.: How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation. In: Proceedings of the International Conference on Multimedia Information Retrieval, pp. 557–566 (2010)
24. Rajalakshmi, R., Aravindan, C.: A Naive Bayes approach for URL classification with supervised feature selection and rejection framework. Comput. Intell. 34(1), 363–396 (2018)
25. Rajalakshmi, R., Tiwari, H., Patel, J., Kumar, A., Karthik, R.: Design of kids-specific URL classifier using recurrent convolutional neural network. Procedia Comput. Sci. 167, 2124–2131 (2020)
26. Rajalakshmi, R., Tiwari, H., Patel, J., Rameshkannan, R., Karthik, R.: Bidirectional GRU-based attention model for kid-specific URL classification. In: Deep Learning Techniques and Optimization Strategies in Big Data Analytics, pp. 78–90. IGI Global (2020)
27. Shen, D., et al.: Web-page classification through summarization. In: Proceedings of the 27th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 242–249 (2004)
28. Sreenivasulu, T., Jayakarthik, R., Shobarani, R.: Web content classification techniques based on fuzzy ontology. In: Peng, S.-L., Son, L.H., Suseendran, G., Balaganesh, D. (eds.) Intelligent Computing and Innovation on Data Science. LNNS, vol. 118, pp. 189–197. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3284-9_22
29. Sun, C., Qiu, X., Xu, Y., Huang, X.: How to fine-tune BERT for text classification? In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds.) CCL 2019. LNCS (LNAI), vol. 11856, pp. 194–206. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32381-3_16
30. Tieleman, T., Hinton, G.: Lecture 6.5–RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
31. Usta, A., Altingovde, I.S., Vidinli, I.B., Ozcan, R., Ulusoy, Ö.: How k-12 students search for learning? Analysis of an educational search engine log. In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 1151–1154 (2014)
32. Xia, T.: Support vector machine based educational resources classification. Int. J. Inf. Educ. Technol. 6(11), 880 (2016)
33. Yigit-Sert, S., Altingovde, I.S., Macdonald, C., Ounis, I., Ulusoy, Ö.: Explicit diversification of search results across multiple dimensions for educational search. J. Assoc. Inf. Sci. Technol. (2020). https://doi.org/10.1002/asi.24403
34. Yilmaz, T., Ozcan, R., Altingovde, I.S., Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification. Inf. Process. Manag. 56(1), 228–246 (2019)
35. Yu, S., Su, J., Luo, D.: Improving BERT-based text classification with auxiliary sentence and domain knowledge. IEEE Access 7, 176600–176612 (2019)
36. Zhao, W., Zhang, G., Yuan, G., Liu, J., Shan, H., Zhang, S.: The study on the text classification for financial news based on partial information. IEEE Access 8, 100426–100437 (2020)

Acknowledgments

Work funded by NSF Award # 1763649. The authors would like to thank Dr. Ion Madrazo Azpiazu for his valuable feedback.

Author information

Authors and Affiliations

Department of Computer Science, Boise State University, Boise, ID, USA
Garrett Allen, Brody Downs, Aprajita Shukla, Casey Kennington, Jerry Alan Fails & Maria Soledad Pera
Department of Literacy, Language and Culture, Boise State University, Boise, ID, USA
Katherine Landau Wright

Corresponding author

Correspondence to Garrett Allen.

Editor information

Editors and Affiliations

Radboud University Nijmegen, Nijmegen, The Netherlands
Djoerd Hiemstra
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Toulouse, Toulouse Institute of Computer Science Research, Toulouse, France
Josiane Mothe
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Raffaele Perego
Leipzig University, Leipzig, Germany
Martin Potthast
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Fabrizio Sebastiani

About this paper

Cite this paper

Allen, G. et al. (2021). BiGBERT: Classifying Educational Web Resources for Kindergarten-12th Grades. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_13

DOI: https://doi.org/10.1007/978-3-030-72240-1_13
Published: 30 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72239-5
Online ISBN: 978-3-030-72240-1
eBook Packages: Computer Science, Computer Science (R0)
