Abstract
The spectrum nature of autism spectrum disorders (ASD) and the heterogeneity within them pose a challenge for treatment. Personalising the syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of the syllabus. We investigate a data-driven approach to disentangling this heterogeneity for syllabus personalisation. Technology and a structured syllabus make it possible to collect data while a child with ASD masters skills. The performance data collected, however, keep growing and contain missing elements that depend on the pace and the course each child takes through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and is extended to incorporate data with missing elements. In this paper we present, for the first time, learning patterns deduced automatically with data mining and machine learning methods from intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which, being parametric, not only require the number of learning patterns to be specified in advance but also lack a principled approach to dealing with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By observing these patterns together with additional knowledge of the syllabus, it may be possible to monitor progress and dynamically modify the syllabus for improved learning.
Appendix
In this section, we derive the posteriors of \(w_{vk}\) and \(f_{kn}\) for the scenario where the dataset has missing entries. X is our data matrix, whose elements \(x_{vn}\), the number of LUs accumulated by child n in task v, are the data points. Our objective is to derive the posteriors of the parameters using only the data points \(x_{vn}\) that are not missing. Inference of the posterior of \(w_{vk}\), for a given v, depends on the values \(x_{vn}\) for all n. Similarly, inference of \(f_{kn}\), for a given n, depends on the values \(x_{vn}\) for all v. Hence, we define two index sets, \(J_{v}\) and \(I_{n}\), one for each inference: \(J_{v}\) contains the indices n of the non-missing values in the row \(x_{v,1:N}\), and \(I_{n}\) contains the indices v of the non-missing values in the column \(x_{1:V,n}\).
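As a minimal sketch of how the two sets of observed entries might be built in practice, the snippet below assumes that missing entries of X are encoded as NaN; that encoding and the helper name `observed_index_sets` are illustrative assumptions, not part of the original model description.

```python
import numpy as np

def observed_index_sets(X):
    """Return (J, I) for a V x N data matrix X with NaN marking missing entries.

    J[v] holds the child indices n for which x_vn is observed;
    I[n] holds the task indices v for which x_vn is observed.
    The NaN encoding is an assumption made for this sketch.
    """
    observed = ~np.isnan(X)
    J = [np.flatnonzero(observed[v, :]) for v in range(X.shape[0])]
    I = [np.flatnonzero(observed[:, n]) for n in range(X.shape[1])]
    return J, I

# Toy matrix: 2 tasks (rows) by 3 children (columns), two entries missing.
X = np.array([[3.0, np.nan, 1.0],
              [np.nan, 2.0, 4.0]])
J, I = observed_index_sets(X)
```

Only the entries indexed by these sets enter the products in the derivations that follow, so missing cells never need to be imputed.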
Let \(\eta _{i}\) denote the sum \(\sum _{i=1}^{K}f_{in}z_{in}w_{vi}\), which includes the parameters for all values \(i=1:K\); let \(\eta _{-i}\) denote \(\sum _{i=1,i\ne k}^{K}f_{in}z_{in}w_{vi}\), which includes the parameters for all values of i except k; and let \(\eta _{k}\) denote \(f_{kn}z_{kn}w_{vk}\), the term for \(i=k\). Thus \(\eta _{i}=\eta _{-i}+\eta _{k}\). All three quantities depend on v and n, which are suppressed in the notation.
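To make the three quantities concrete, the following sketch computes them for one task v, one child n and one feature k. The array shapes (Z and F as K x N, W as V x K) and the function name `eta_terms` are assumptions chosen for illustration.

```python
import numpy as np

def eta_terms(Z, F, W, v, n, k):
    """Return (eta_i, eta_minus_i, eta_k) for task v, child n, feature k.

    Assumed shapes for this sketch: Z and F are K x N, W is V x K.
    """
    contrib = F[:, n] * Z[:, n] * W[v, :]      # f_in * z_in * w_vi for i = 1..K
    eta_k = contrib[k]                         # the single term with i = k
    eta_minus_i = np.delete(contrib, k).sum()  # all terms except i = k
    eta_i = contrib.sum()                      # all K terms
    return eta_i, eta_minus_i, eta_k

Z = np.array([[1, 0], [1, 1]])
F = np.array([[2.0, 1.0], [0.5, 3.0]])
W = np.array([[1.0, 4.0], [2.0, 1.0], [0.0, 2.0]])
eta_i, eta_minus_i, eta_k = eta_terms(Z, F, W, v=0, n=0, k=1)
```

For these toy values the per-feature contributions are 2.0 and 2.0, so the full sum is 4.0 and splits evenly between the two terms.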
The posterior of \(w_{vk}\) is
$$\begin{aligned} p(w_{vk}\mid Z,F,X)\propto & {} p(X\mid Z,F,W)p(w_{vk}\mid \alpha _{0},\beta _{0})\\= & {} \left( \prod _{n\in J_{v}}p(x_{vn}\mid \eta _{i})\right) Gamma (\alpha _{0},\beta _{0})\\= & {} \left( \prod _{n\in J_{v}}\frac{\left( \eta _{i}\right) ^{x_{vn}}e^{-\left( \eta _{i}\right) }}{x_{vn}!}\right) \times \frac{\beta _{0}^{\alpha _{0}}}{\Gamma (\alpha _{0})}w_{vk}^{\alpha _{0}-1}e^{-\beta _{0}w_{vk}}\\\propto & {} w_{vk}^{\alpha _{0}-1}e^{-\beta _{0}w_{vk}}\times \prod _{n\in J_{v}}\left( \left( \eta _{i}\right) ^{x_{vn}}e^{-\left( \eta _{i}\right) }\right) \\= & {} w_{vk}^{\alpha _{0}-1}e^{-\beta _{0}w_{vk}}\prod _{n\in J_{v}}\left( \left( \eta _{-i}+\eta _{k}\right) ^{x_{vn}}e^{-\left( \eta _{-i}+\eta _{k}\right) }\right) \end{aligned}$$
To sample from this posterior, we introduce an auxiliary variable. For a single \(n\in J_{v}\), let \(p^{*}(w_{vk})\) denote the unnormalised factor \(\left( \eta _{-i}+\eta _{k}\right) ^{x_{vn}}\), to which the probability \(p(w_{vk})\) is proportional. It can be expanded by the binomial theorem as follows:
$$\begin{aligned} p^{*}(w_{vk})= & {} \left( \eta _{-i}+\eta _{k}\right) ^{x_{vn}}\\= & {} \sum _{j=0}^{x_{vn}}\binom{x_{vn}}{j}\left( \eta _{k}\right) ^{j}\left( \eta _{-i}\right) ^{x_{vn}-j} \end{aligned}$$
Hence, we have
$$\begin{aligned} p(w_{vk})\propto & {} \left( \eta _{-i}+\eta _{k}\right) ^{x_{vn}} \end{aligned}$$
Now let \(r_{vn}\) be an auxiliary variable. We aim to define a probability \(p(w_{vk},r_{vn})\) proportional to \(p^{*}(w_{vk},r_{vn})\) such that \(\sum _{r_{vn}}p^{*}(w_{vk},r_{vn})=p^{*}(w_{vk})\). So let \(p^{*}(w_{vk},r_{vn})=\binom{x_{vn}}{r_{vn}}\left( \eta _{k}\right) ^{r_{vn}}\left( \eta _{-i}\right) ^{x_{vn}-r_{vn}}\), where \(r_{vn}\in \{0,1,2,\ldots ,x_{vn}\}\). Hence, we have
$$\begin{aligned} \sum _{r_{vn}=0}^{x_{vn}}p^{*}(w_{vk},r_{vn})= & {} \sum _{r_{vn}=0}^{x_{vn}}\binom{x_{vn}}{r_{vn}}\left( \eta _{k}\right) ^{r_{vn}}\left( \eta _{-i}\right) ^{x_{vn}-r_{vn}}\\= & {} p^{*}(w_{vk}) \end{aligned}$$
Additionally, we have
$$\begin{aligned} p(w_{vk}\mid r_{vn})= & {} \frac{p(w_{vk},r_{vn})}{p(r_{vn})}\\\propto & {} p(w_{vk},r_{vn})\\\propto & {} p^{*}(w_{vk},r_{vn})\\= & {} \binom{x_{vn}}{r_{vn}}\left( \eta _{k}\right) ^{r_{vn}}\left( \eta _{-i}\right) ^{x_{vn}-r_{vn}}\\ p(r_{vn}\mid w_{vk})= & {} \frac{p(w_{vk},r_{vn})}{p(w_{vk})}\\\propto & {} \frac{p^{*}(w_{vk},r_{vn})}{p^{*}(w_{vk})}\\= & {} \frac{\binom{x_{vn}}{r_{vn}}\left( \eta _{k}\right) ^{r_{vn}}\left( \eta _{-i}\right) ^{x_{vn}-r_{vn}}}{\left( \eta _{-i}+\eta _{k}\right) ^{x_{vn}}}\\= & {} \binom{x_{vn}}{r_{vn}}\left( \frac{\eta _{k}}{\eta _{-i}+\eta _{k}}\right) ^{r_{vn}}\left( \frac{\eta _{-i}}{\eta _{-i}+\eta _{k}}\right) ^{x_{vn}-r_{vn}} \end{aligned}$$
Hence, the conditional distribution of \(r_{vn}\) given \(w_{vk}\) has the form of a binomial distribution. After substituting back the values of \(\eta _{-i}\) and \(\eta _{k}\), if we sample \(r_{vn}\) from this distribution then, conditional on the sampled value, the binomial expansion reduces to a single term in \(w_{vk}\), as follows:
$$\begin{aligned} R_{vn}\sim & {} Binomial \left( x_{vn},\frac{z_{kn}f_{kn}w_{vk}}{\sum _{i\ne k}z_{in}f_{in}w_{vi}+z_{kn}f_{kn}w_{vk}}\right) ,~~\forall n\in J_{v} \end{aligned}$$
$$\begin{aligned} \left( \sum _{i\ne k}z_{in}f_{in}w_{vi}+z_{kn}f_{kn}w_{vk}\right) ^{x_{vn}}\propto & {} (z_{kn}f_{kn}w_{vk})^{R_{vn}} \end{aligned}$$
Hence, we have
$$\begin{aligned} p(w_{vk}\mid Z,F,X)\propto & {} w_{vk}^{\alpha _{0}-1}e^{-\beta _{0}w_{vk}}\prod _{n\in J_{v}}\left( (f_{kn}z_{kn}w_{vk})^{R_{vn}}e^{-\left( f_{kn}z_{kn}w_{vk}\right) }\right) \\\propto & {} w_{vk}^{\alpha _{0}+\sum _{n\in J_{v}}R_{vn}-1}e^{-(\beta _{0}+\sum _{n\in J_{v}}f_{kn}z_{kn})w_{vk}} \end{aligned}$$
The above expression has the form of a gamma distribution, \(w_{vk}\sim Gamma (\alpha _{0}',\beta _{0}')\), where
$$\begin{aligned} \alpha _{0}'= & {} \alpha _{0}+\sum _{n\in J_{v}}R_{vn}\\ \beta _{0}'= & {} \beta _{0}+\sum _{n\in J_{v}}f_{kn}z_{kn} \end{aligned}$$
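The update for \(w_{vk}\) derived above can be sketched as a single Gibbs step: draw the auxiliary counts \(R_{vn}\) for the observed children, then draw \(w_{vk}\) from the resulting gamma distribution. This is a minimal illustration, not the authors' implementation. It assumes missing entries are NaN, Z and F are K x N arrays, W is V x K, and the function name `gibbs_update_w` is hypothetical. Because the prior density above is written with \(e^{-\beta _{0}w_{vk}}\), \(\beta\) is treated as a rate and converted to NumPy's scale parameter.

```python
import numpy as np

def gibbs_update_w(X, Z, F, W, v, k, alpha0, beta0, rng):
    """One draw of w_vk from its gamma posterior, using observed x_vn only.

    Assumptions for this sketch: NaN marks missing entries of X,
    Z and F are K x N, W is V x K.
    Returns (sample, alpha0_prime, beta0_prime).
    """
    J_v = np.flatnonzero(~np.isnan(X[v, :]))   # children with x_vn observed
    x = X[v, J_v].astype(int)
    zf = Z[:, J_v] * F[:, J_v]                 # z_in * f_in, shape K x |J_v|
    eta_k = zf[k] * W[v, k]                    # z_kn f_kn w_vk
    eta_all = W[v, :] @ zf                     # eta_{-i} + eta_k
    # Binomial success probability; defined as 0 where the total rate is 0.
    p = np.divide(eta_k, eta_all, out=np.zeros_like(eta_all), where=eta_all > 0)
    R = rng.binomial(x, np.clip(p, 0.0, 1.0))  # auxiliary counts R_vn
    shape = alpha0 + R.sum()                   # alpha_0'
    rate = beta0 + zf[k].sum()                 # beta_0'
    return rng.gamma(shape, 1.0 / rate), shape, rate

# Toy case with a single feature, so every R_vn equals x_vn.
rng = np.random.default_rng(0)
X = np.array([[2.0, np.nan, 3.0]])
Z = np.array([[1, 1, 1]])
F = np.array([[1.0, 5.0, 2.0]])
W = np.array([[0.5]])
w_new, a_post, b_post = gibbs_update_w(X, Z, F, W, v=0, k=0,
                                       alpha0=1.0, beta0=1.0, rng=rng)
```

In the toy case the second child is missing, so its \(f_{kn}\) value of 5.0 contributes to neither the shape nor the rate.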
The posterior of \(f_{kn}\) is similarly calculated as:
$$\begin{aligned} p(f_{kn}\mid Z,W,X)\propto & {} p(X\mid Z,F,W)p(f_{kn}\mid \alpha _{1},\beta _{1})\\= & {} \left( \prod _{v\in I_{n}}p(x_{vn}\mid \sum _{i=1}^{K}f_{in}z_{in}w_{vi})\right) Gamma (\alpha _{1},\beta _{1}) \end{aligned}$$
Hence, we have
$$\begin{aligned} p(f_{kn}\mid Z,W,X)\propto & {} f_{kn}^{\alpha _{1}-1}e^{-\beta _{1}f_{kn}}\prod _{v\in I_{n}}\left( (f_{kn}z_{kn}w_{vk})^{T_{vn}}e^{-\left( f_{kn}z_{kn}w_{vk}\right) }\right) \\\propto & {} f_{kn}^{\alpha _{1}+\sum _{v\in I_{n}}T_{vn}-1}e^{-(\beta _{1}+\sum _{v\in I_{n}}z_{kn}w_{vk})f_{kn}} \end{aligned}$$
The above expression has the form of a gamma distribution, \(f_{kn}\sim Gamma (\alpha _{1}',\beta _{1}')\), where
$$\begin{aligned} \alpha _{1}'= & {} \alpha _{1}+\sum _{v\in I_{n}}T_{vn}\\ \beta _{1}'= & {} \beta _{1}+\sum _{v\in I_{n}}z_{kn}w_{vk} \end{aligned}$$
and the auxiliary variable \(T_{vn}\) is sampled, similarly to \(R_{vn}\), from
$$\begin{aligned} T_{vn}\sim & {} Binomial \left( x_{vn},\frac{f_{kn}z_{kn}w_{vk}}{\sum _{i\ne k}f_{in}z_{in}w_{vi}+f_{kn}z_{kn}w_{vk}}\right) ,~~\forall v\in I_{n} \end{aligned}$$
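The corresponding step for \(f_{kn}\) can be sketched in the same way, now running over the observed tasks of child n. As before, this is an illustrative sketch under assumed conventions (NaN for missing entries, Z and F as K x N, W as V x K, \(\beta\) as a rate), and `gibbs_update_f` is a hypothetical name.

```python
import numpy as np

def gibbs_update_f(X, Z, F, W, k, n, alpha1, beta1, rng):
    """One draw of f_kn from its gamma posterior, using observed x_vn only.

    Assumptions for this sketch: NaN marks missing entries of X,
    Z and F are K x N, W is V x K.
    Returns (sample, alpha1_prime, beta1_prime).
    """
    I_n = np.flatnonzero(~np.isnan(X[:, n]))   # tasks with x_vn observed
    x = X[I_n, n].astype(int)
    zf = Z[:, n] * F[:, n]                     # z_in * f_in, length K
    eta_k = zf[k] * W[I_n, k]                  # f_kn z_kn w_vk per observed task
    eta_all = W[I_n, :] @ zf                   # full sum over i = 1..K
    # Binomial success probability; defined as 0 where the total rate is 0.
    p = np.divide(eta_k, eta_all, out=np.zeros_like(eta_all), where=eta_all > 0)
    T = rng.binomial(x, np.clip(p, 0.0, 1.0))  # auxiliary counts T_vn
    shape = alpha1 + T.sum()                   # alpha_1'
    rate = beta1 + Z[k, n] * W[I_n, k].sum()   # beta_1'
    return rng.gamma(shape, 1.0 / rate), shape, rate

rng = np.random.default_rng(1)
X = np.array([[4.0], [np.nan], [1.0]])         # task 1 is missing for this child
W = np.array([[2.0], [7.0], [0.5]])
F = np.array([[1.5]])

# Feature switched on: observed counts and weights update the prior.
f_on, a_on, b_on = gibbs_update_f(X, np.array([[1]]), F, W, k=0, n=0,
                                  alpha1=2.0, beta1=0.5, rng=rng)
# Feature switched off (z_kn = 0): the posterior falls back to the prior.
f_off, a_off, b_off = gibbs_update_f(X, np.array([[0]]), F, W, k=0, n=0,
                                     alpha1=2.0, beta1=0.5, rng=rng)
```

The second call illustrates a property that follows directly from the formulas: when \(z_{kn}=0\), every \(T_{vn}\) is zero and the rate gains nothing, so \(f_{kn}\) is drawn from \(Gamma (\alpha _{1},\beta _{1})\).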
Cite this article
Vellanki, P., Duong, T., Gupta, S. et al. Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data. Knowl Inf Syst 51, 127–157 (2017). https://doi.org/10.1007/s10115-016-0971-7