ABSTRACT
Computerized Adaptive Testing (CAT) is a promising personalized test mode in online education that aims to reveal students' latent knowledge states by selecting test items adaptively. The item selection strategy is the core component of CAT: at each test step, it searches for the most suitable test item given the student's current estimated ability. However, existing selection strategies work by brute force, so their time complexity is linear in the number of items N in the item pool, i.e., O(N). In practice, search latency therefore becomes the bottleneck for CAT with a large-scale item pool. To this end, we propose a Search-Efficient Computerized Adaptive Testing framework (SECAT), which aims to enhance CAT with an efficient selection strategy. Specifically, SECAT contains two main phases: item pool indexing and item search. In the item pool indexing phase, we apply a student-aware spatial partition method to the item pool to divide the test items into many sub-spaces, taking the adaptability of test items into account. In the item search phase, we optimize the traditional single-round search strategy using asymptotic theory and propose a multi-round search strategy that further improves time efficiency. Compared with existing strategies, SECAT reduces the time complexity from O(N) to O(log N). Across two real-world datasets, SECAT achieves over 200x speedup with negligible accuracy degradation.
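To make the O(N) versus O(log N) contrast concrete, the following is a minimal sketch under strong simplifying assumptions, and it is not the SECAT method itself. It assumes a one-dimensional Rasch (1PL) model, where the Fisher information of an item peaks when its difficulty equals the student's ability, so a sorted list of difficulties with binary search can stand in for the spatial index. All function names and the sample data are hypothetical.

```python
import bisect
import math


def fisher_info(theta, b):
    """Fisher information of a Rasch (1PL) item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def select_brute_force(theta, difficulties):
    """O(N) baseline: score every item, return the index of the most informative one."""
    return max(range(len(difficulties)),
               key=lambda i: fisher_info(theta, difficulties[i]))


def build_index(difficulties):
    """One-off indexing step (O(N log N)): item ids sorted by difficulty."""
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    keys = [difficulties[i] for i in order]
    return keys, order


def select_indexed(theta, keys, order):
    """O(log N) search: binary search for the difficulty nearest to theta.

    Assumes a non-empty pool. Under the Rasch model the nearest difficulty
    is also the item with maximum Fisher information.
    """
    pos = bisect.bisect_left(keys, theta)
    # Only the two neighbours of the insertion point can be nearest.
    candidates = [j for j in (pos - 1, pos) if 0 <= j < len(keys)]
    best = min(candidates, key=lambda j: abs(keys[j] - theta))
    return order[best]


# Hypothetical item pool: one difficulty value per item.
pool = [-2.0, 0.5, 1.3, -0.7, 2.2, 0.1]
keys, order = build_index(pool)
chosen = select_indexed(0.4, keys, order)
```

The index is built once per item pool, after which each selection step costs a single binary search. Multidimensional ability models do not admit this one-dimensional shortcut, which is presumably why the abstract refers to spatial partitioning of the item pool.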
Index Terms
- Search-Efficient Computerized Adaptive Testing