Abstract
We describe a method for fitting a Markov chain whose state transition matrix depends on a feature vector to data that can include missing values. Our model consists of a separate logistic regression for each row of the transition matrix. We fit the model parameters by maximizing the log-likelihood of the data minus a regularizer. When there are missing values, the log-likelihood becomes intractable, and we resort to the expectation-maximization (EM) heuristic. We illustrate the method on several examples and describe our efficient open-source Python implementation.
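To make the model structure concrete, the following is a minimal NumPy sketch of a feature-dependent transition matrix in which each row is given by its own multinomial logistic regression, together with a regularized log-likelihood for fully observed transitions. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameter layout (`A`, `b`), the sum-of-squares regularizer, and the weight `lam` are all hypothetical choices, and the missing-data (EM) case is not covered.

```python
import numpy as np

def transition_matrix(x, A, b):
    """Return the n-by-n transition matrix P(x) for feature vector x.

    Assumed (hypothetical) parameter layout:
      A has shape (n, n, m) and b has shape (n, n), so row i of P(x)
      is the softmax of the logits A[i] @ x + b[i].
    """
    logits = A @ x + b                                  # shape (n, n)
    logits = logits - logits.max(axis=1, keepdims=True) # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)             # each row sums to one

def objective(A, b, X, S, S_next, lam=0.1):
    """Log-likelihood of observed transitions minus a regularizer.

    X: sequence of feature vectors, S and S_next: current and next states.
    The sum-of-squares penalty on A is an assumption made for illustration.
    """
    ll = 0.0
    for x, i, j in zip(X, S, S_next):
        ll += np.log(transition_matrix(x, A, b)[i, j])
    return ll - lam * np.sum(A ** 2)

# Small synthetic example: n = 3 states, m = 2 features.
rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.normal(size=(n, n, m))
b = rng.normal(size=(n, n))
x = rng.normal(size=m)
P = transition_matrix(x, A, b)

# Three observed transitions (state i to state j) with their features.
X = rng.normal(size=(3, m))
S = [0, 1, 2]
S_next = [1, 2, 0]
value = objective(A, b, X, S, S_next)
```

In this sketch the objective would be handed to a generic optimizer to fit `A` and `b`. With missing states, the likelihood involves sums over the unobserved values, which is where the EM heuristic mentioned above comes in.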
Acknowledgements
The authors gratefully acknowledge conversations and discussions about some of the material in this paper with Trevor Hastie, Emmanuel Candes, Scott Harris, and Paul Bien.
About this article
Cite this article
Barratt, S., Boyd, S. Fitting feature-dependent Markov chains. J Glob Optim 87, 329–346 (2023). https://doi.org/10.1007/s10898-022-01198-0
DOI: https://doi.org/10.1007/s10898-022-01198-0