Abstract
We propose a probabilistic model for traffic speed data. Our model inherits two key features from latent Dirichlet allocation (LDA). First, unlike stock market data, for example, traffic speed data often contain gaps caused by unexpected sensor or network failures. We therefore regard speed data not as a time series but as an unordered multiset, just as LDA regards a document not as a sequence but as a bag of words. This also lets us analyze co-occurrence patterns of speed data regardless of their positions along the time axis. Second, we regard the set of speed data gathered from one sensor over one day as a document and model it not with a single distribution but with a mixture of distributions, as in LDA. Each such distribution is called a topic in LDA; we call it a patch to avoid text-mining connotations, and we name our model Patchy. This approach enables us to model speed co-occurrence patterns effectively. However, speed data are non-negative real values, so we use Gamma distributions in place of multinomial distributions. Thanks to these two features, Patchy can reveal the context dependency of traffic speed data. For example, a speed of 60 mph observed on a Sunday can be assigned to a different patch from a speed of 60 mph observed on a Wednesday. We evaluate this context dependency through a binary classification task in which test data are classified as either weekday data or not. We use real traffic speed data provided by New York City and compare Patchy with a baseline method that applies a simpler data model.
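To make the modelling idea concrete, the following is a minimal sketch of an LDA-style generative process with Gamma-distributed patches, as the abstract describes it. It is not the authors' implementation. The symmetric Dirichlet prior on per-document patch proportions is an assumption carried over from standard LDA, and the function name `sample_patchy_documents` and all parameter values are hypothetical. Inference, which the abstract does not describe, is omitted.

```python
import numpy as np

def sample_patchy_documents(n_docs, n_obs, alpha, shapes, scales, seed=0):
    """Draw synthetic speed 'documents' from an LDA-like Gamma mixture.

    Each document stands for one sensor-day and is an unordered bag of speeds.
    alpha  : symmetric Dirichlet concentration (assumed, by analogy with LDA)
    shapes : Gamma shape parameter of each patch (hypothetical values)
    scales : Gamma scale parameter of each patch (hypothetical values)
    """
    rng = np.random.default_rng(seed)
    shapes = np.asarray(shapes, dtype=float)
    scales = np.asarray(scales, dtype=float)
    n_patches = shapes.shape[0]

    # Per-document mixing proportions over patches.
    theta = rng.dirichlet(np.full(n_patches, alpha), size=n_docs)

    z = np.empty((n_docs, n_obs), dtype=int)
    speeds = np.empty((n_docs, n_obs))
    for d in range(n_docs):
        # Patch assignment for every observation in document d.
        z[d] = rng.choice(n_patches, size=n_obs, p=theta[d])
        # Non-negative speed drawn from the assigned patch's Gamma distribution.
        speeds[d] = rng.gamma(shapes[z[d]], scales[z[d]])
    return theta, z, speeds

# Two hypothetical patches: a slow one (mean 20) and a fast one (mean 60).
theta, z, speeds = sample_patchy_documents(
    n_docs=3, n_obs=50, alpha=0.5, shapes=[20.0, 60.0], scales=[1.0, 1.0]
)
```

Because the observations in a document are exchangeable, the order of the values in each row of `speeds` carries no information. The same speed value can also arise from different patches in different documents, which is the context dependency the abstract refers to.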
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Masada, T., Takasu, A. (2014). A Topic Model for Traffic Speed Data Analysis. In: Ali, M., Pan, JS., Chen, SM., Horng, MF. (eds) Modern Advances in Applied Intelligence. IEA/AIE 2014. Lecture Notes in Computer Science, vol 8482. Springer, Cham. https://doi.org/10.1007/978-3-319-07467-2_8
DOI: https://doi.org/10.1007/978-3-319-07467-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07466-5
Online ISBN: 978-3-319-07467-2
eBook Packages: Computer Science, Computer Science (R0)