1 Introduction

Timely resolution of urban environment and infrastructure problems is an essential component of a well-functioning city. Cities have developed various reporting systems, such as 311 services [1], for the citizens to report, track, and comment on the urban anomalies that they encounter in their daily lives. These systems enable the city government institutions to respond on a timely basis to disturbances or anomalies such as noise, blocked driveway and urban infrastructure malfunctions [14]. However, these systems and resulting actions rely on accurate and timely reporting by the citizens, and are by nature reactive. That leads us to the question: what if we could predict which regions of a city will observe certain categories of anomalies in advance? We posit that this would enhance resource allocation and budget planning, and also result in more timely and efficient resolution of issues, thereby minimizing the disruption to the citizens’ lives [19]. However, the development of such an urban anomaly prediction system faces several challenges:

(1) Anomaly Dynamics: The factors underlying urban anomalies may change over time. For example, anomalies in the winter may stem from winter related severe weather (e.g., snow), and it may be infeasible to train the predictive model by using historical data between spring and fall. (2) Coupled Multi-Dimensional Correlations: There may exist strong signals among the locations, time, and categories of occurred anomalies, that is, certain categories of anomalies occur at specific locations and/or specific time. Traditional time-series based models such as Autoregressive Moving-Average (ARMA) [28] and Gaussian Processing (GP) [8] largely rely on heuristic temporal features to make predictions. However, these approaches fail to capture the spatial-temporal factors that drive different categories of anomalies occur.

Fig. 1.
figure 1

The Urban Anomaly PreDiction (UAPD) Framework.

Contributions. To achieve a comprehensive solution that addresses these challenges and meets the goal of prediction, we develop a unified three-phase Urban Anomaly PreDiction ( UAPD ) framework to predict urban anomalies from spatial-temporal data—urban anomaly reports. As illustrated in Fig. 1, at the first phase, we propose a probabilistic model, whose parameters are inferred via Markov chain Monte Carlo, to detect the change point of the historical anomaly records of a city, such that only most relevant reports are used for the prediction of future anomalies. At the second phase, we model the anomaly data starting from the detected change point of all regions in the city with a three-dimensional tensor, where each of three dimensions stands for the regions, time slots, and categories of occurred anomalies, respectively. We then decompose the tensor to incorporate the underlying relationships between each dimension into their corresponding inherent factors in the tensor. Subsequently, the prediction of anomalies in next time slot is furthered to a time-series prediction problem. At the third phase, we leverage Vector Autoregression to capture the inter-dependencies of inherent factors among multiple time series, generating the prediction results.

We evaluate the performance of Urban Anomaly PreDiction ( UAPD ) by using two real-world datasets collected from the 311 service in New York City and Pittsburgh. The evaluation results with different region scales (i.e., precincts and regions divided by road segments) and different time-frames of training data show that our framework can predict different categories of anomalies more accurately than the state-of-the-art baselines.

In summary, our main contributions are as follows:

  • We formalize the urban anomaly prediction problem from spatial-temporal data and develop a unified UAPD framework to predict future occurrences of different anomalies at different urban areas.

  • In UAPD, we propose a change point detection solution to address the challenge of anomaly dynamics and leverage tensor decomposition to model the interrelations among multiple dimensions of urban anomaly data.

  • We conduct extensive experiments with different scenarios in distinct urban environments to demonstrate the effectiveness of our presented framework.

2 Related Work

Anomaly Detection. Our work is related to urban sensing-based anomaly detection [5,6,7, 18, 23]. For example, Doan et al. presented a set of new clustering and anomaly detection techniques using a large-scale urban pedestrian dataset [6]. Chawla et al. used the Principal Component Analysis (PCA) scheme to discover events which may have caused anomalous behavior to appear in road traffic flow [5]. Pan et al. developed a novel system to detect anomalies according to drivers’ routing behavior [23]. Zheng et al. proposed a probability-based anomaly detection method to discover collective anomalies from multiple spatial-temporal datasets [32]. Le et al. proposed an online algorithm to detect anomaly occurrence probability given the data from various sensors in highly dynamic contexts [18]. However, all the above solutions identified urban anomalies after they happen. In contrast, this paper develops a principled approach to predict urban anomalies in different regions of a city. Although there exists one recent work on anomaly prediction [15] by considering the dependency of anomaly occurrence between regions, two significant limitations exist: (i) it ignored the fact that anomalies of different categories can be correlated; (ii) it assumed that some portion of ground truth data from the predicted time slot are known. However, such assumptions do not always hold in the practical scenario for a couple of reasons. Firstly, dependencies between anomalies with different categories are ubiquitous in urban sensing. Secondly, it is difficult to know the ground truth information from the predicted time slot beforehand. To overcome those limitations, this paper develops a new urban anomaly prediction framework to explicitly explore the inter-category dependencies and model the time-evolving inherent factors with respect to different categories, without requiring any ground truth information from the predicted time slot.

Time Series Prediction. Our work is related to the literature that focus on time series prediction [2, 8, 25, 28]. In particular, Vallis et al. [25] proposed a novel statistical technique to automatically detect long-term anomalies in cloud data. A non-parametric approach Gaussian Processing (GP) has been developed to solve the general time series prediction problem [8]. Additionally, Wiesel et al. [28] proposed a time varying Autoregression Moving Average (ARMA) model for co-variance estimation. Bao et al. [2] proposed a Particle Swarm Optimization (PSO)-based strategy to determine the number of sub-models in a self-adaptive mode with varying prediction horizons. Inspired by the work above, we develop a new scheme to explicitly consider the anomaly distribution dependency between regions and incorporate it into the time series prediction model.

Bayesian Inference. Bayesian inference has been widely used in the data analytics communities [20, 21, 30]. For example, Yang et al. studied the problem of learning social knowledge graphs by developing a multi-modal Bayesian embedding model [30]. Lian et al. proposed a sparse Bayesian content-aware collaborative filtering approach for implicit feedback recommendation [21]. Lee et al. proposed a Bayesian nonparametric model for medical risks prediction by exploring phenotype topics from electronic health records [20]. In this paper, we study the problem of urban anomaly prediction by proposing a Bayesian inference model to capture the evolving relationships of anomaly sequences.

3 Problem Definition

Given the historical sequences of anomalies within a city’s geographical regions, the objective is to predict whether certain categories of anomalies will happen at certain places in the future. We use a three-dimensional tensor \(\mathcal {X} \in \mathbb {R}^{I\times J \times K}\) to represent the anomaly sequences of all regions in a city. IJK denote the number of regions (e.g., geographical areas divided by the city’s road network), time slots, and anomaly categories (e.g., noise, blocked driveway, illegal parking, etc.) in the data, respectively. Each element \(\mathcal {X}_{i,j,k}\) represents the number of the k-category anomalies that happened at region i at time j.

Urban Anomaly Prediction Problem: Given the historical anomaly data \(\mathcal {X} \in \mathbb {R}^{I\times J \times K}\) of a city, the goal is to learn a predictive function \(f(\mathcal {X}) \rightarrow {\mathcal {X}}_{:,J+1,:}\) to infer future occurrences of each category k of anomalies at each region i at time \(J+1\).

The urban anomaly prediction solution needs to incorporate additional factors as noted in Sect. 1. First, there may exist periodic and temporary patterns in urban anomalies. For example, the temporary road construction serves as the inherent factors for noise related anomalies. This pattern makes the model trained using data collected during the construction period ineffective for predictions over the following time-frame. Second, the urban anomaly data covers multiple aspects of coupled information—the location, time, and category of anomalies. It is unclear whether certain types of anomalies occur at correlated locations or time, and if so, how we can model the multidimensional data. The key notations are shown in Table 1.

4 The UAPD Framework

Figure 1 illustrates the UAPD framework, with the following steps: (1) UAPD leverages Bayesian inference to detect the change point in the anomaly sequence; (2) UAPD incorporates the spatial-temporal anomaly data into a three-dimensional tensor, which can be decomposed to learn the latent correlations between all dimensions; (3) UAPD applies the Vector Autoregression model to predict the occurrence of different categories of anomalies in each region of a city. We will explain these three steps in detail in the following subsections.

Table 1. Symbols and definitions

4.1 Change Point Detection

Given an anomaly sequence, conventional prediction methods usually train a learning model by using all available data between time 0 and J to predict the future anomalies at time \(J+1\) [22]. However, the occurrence of urban anomalies can be largely influenced by temporary events and periodic patterns. For example, the road construction in a certain area can result in many noise reports during the construction period. As a result, the anomaly sequence of this area may be dramatically changed after the construction, i.e., a significant decrease in noise reports, leading to an ineffective prediction for post-construction anomalies by training models on the anomaly sequence that covers the construction period. To address this issue, we propose a Bayesian inference based method to detect the change point \(\delta \) in an anomaly sequence between time 0 and J. The sequence follows two different data distributions before and after the change point. Thus, UAPD learns the anomaly predictive function, over the time sequences, by leveraging the most relevant latent factors that caused the anomalies.

Specifically, we aim to detect the change point of the anomaly sequence matrix M with regions that have k categories of anomalies. To do so, we first model \(M_j^k\) that reports the k-th category of anomalies at time j as a Poisson distribution. Before and after the change point \(\delta \), \(M_j^k\) follows Poisson distributions with different parameter configurations, \(\alpha _k\) and \(\beta _k\), respectively. Following the standard Bayesian inference, we set the conjugate prior of the Poisson distribution as the Gamma distribution with \(\varvec{a}\) and \(\varvec{b}\) as its shape and scale parameters, respectively. Finally, the change point \(\delta \) is set to obey a multinomial distribution with parameters \(\varvec{p}=(p_0, \cdots , p_j, \cdots , p_J)\), where \(p_j\) denotes the probability of time j to be the change point. The generative process behind our model can be summarized as follows:

$$\begin{aligned} \delta&\sim \textit{Multinomial} (\delta ; \varvec{p}) \nonumber \\ \alpha _k&\sim \textit{Gamma}(\alpha _k; a^\alpha _k,b^\alpha _k) \nonumber \\ \beta _k&\sim \textit{Gamma}(\beta _k; a^\beta _k,b^\beta _k) \\ M_j^k&\sim \left\{ \begin{aligned} \textit{Poisson}(M_j^k;\alpha _k),&~1 \le j \le \delta \nonumber \\ \textit{Poisson}(M_j^k;\beta _k),&~\delta +1 \le j \le J \\ \end{aligned} \right. \end{aligned}$$
(1)

where \(M_j^k\) is the number of regions which have anomalies of k-th category at time j. The variable on the left side of the semi-colon is assigned a probability under the parameter on the right side. The posterior distribution is formally defined as follow:

$$\begin{aligned} \text {Pr}(\varvec{\alpha }, \varvec{\beta }, \delta | M_{1:J})&\propto \text {Pr}(M_{1:\delta }|\varvec{\alpha }) \text {Pr}(M_{\delta +1:J}|\varvec{\beta }) \text {Pr}(\varvec{\alpha }) \text {Pr}(\varvec{\beta }) \text {Pr}(\delta )\\\nonumber&= \prod _{k=1}^{K} \Big (\prod _{j=1}^{\delta } \text {Pr}(M_j^k | \alpha _k) \prod _{j=\delta +1}^{J} \text {Pr}(M_j^k | \beta _k) \text {Pr}(\alpha _k) \text {Pr}(\beta _k) \Big ) \text {Pr}(\delta ) \end{aligned}$$
(2)

We could estimate the change point \(\delta \) and distribution parameters \(\varvec{\alpha }, \varvec{\beta }\) by maximizing the full joint distribution. However, to avoid fine parameter tuning and make accurate detection on change point, we apply the Markov Chain Monte Carlo (MCMC) method to infer parameter. In addition, since we have multiple random variables in this problem, we utilize the Gibbs sampling algorithm to execute the MCMC method. The basic idea in Gibbs sampling is that we sample the new value of each variable in each iteration accordingly by fixing other variables [17, 24]. Algorithm 1 presents the outline of Gibbs sampling for change point detection. The conditional distribution of each variable and parameter derivations are given in the Appendix.

figure a

4.2 Tensor Decomposition

The inherent factors of anomalies among different regions, time slots and anomaly categories are revealed by tensor decomposition on a three dimensional tensor representing the anomaly records. Since the anomaly distributions of regions are often subject to changes in different time periods, the objective of tensor decomposition is to model the temporal effects of anomalies distribution in each region by learning the inherent factors as well as adopting these factors to different time periods.

We use the CANDECOMP/PARAFAC (CP) decomposition approach [3] to factorize the tensor into three different matrices \(R\in \mathbb {R}^{I\times L}\), \(T\in \mathbb {R}^{J\times L}\) and \(C\in \mathbb {R}^{K\times L}\). Here L is the number of inherent factors and is indexed by l. R, T and C are the inherent factor matrices with respect to I regions, J time slots and K anomaly categories, respectively. We can express the three-way tensor factorization of \(\mathcal {X}\) as:

$$\begin{aligned} \mathcal {X} \approx \sum _{l=1}^L R_{:,l} \circ T_{:,l} \circ C_{:,l} \end{aligned}$$
(3)

where \(R_{:,l}\), \(T_{:,l}\) and \(C_{:,l}\) represent the l-th column of R, T and C. \(\circ \) denotes the vector outer product. Each entry \(\mathcal {X}_{i,j,k}\) in tensor \(\mathcal {X}\) can be computed as the inner-product of three L-dimensional vectors as follows:

$$\begin{aligned} \mathcal {X}_{i,j,k} \approx <R_i, T_j, C_k> \equiv \sum _{l=1}^L R_{i,l} T_{j,l} C_{k,l} \end{aligned}$$
(4)

Although many techniques can be applied to CP decomposition, we utilize the Alternative Least Square (ALS) algorithm [4, 12], which has been shown to outperform other algorithms in terms of solution quality [9].

4.3 Vector Autoregression

Using the tensor decomposition model discussed above, we can learn the inherent factors of anomalies from different time slots. Based on the three matrices generated by CP decomposition, we formulate the anomaly prediction problem as the time series prediction task on \(T\in \mathbb {R}^{J\times L}\), since the inherent factors in other two dimensions remain constant. We use Vector Autoregression (VAR) [11] to capture the linear inter dependencies of inherent factors among multiple time series. In particular, VAR model formulates the evolution of a set of L inherent factors over the same sample period (i.e., time slot \(j=1,2,...,J\)) as a linear function of their past values. We define the order S of VAR to represent the time series in the previous S time slots. Formally, it can be expressed as follows:

$$\begin{aligned} T_{J+1} = d + \sum _{s=1}^S B_s T_{J-s} + \epsilon _j \end{aligned}$$
(5)

where d is a \(L\times 1\) vector of constants, \(B_s\) is a time-invariant \(L\times L\) matrix and \(\epsilon _j\) is a \(L\times 1\) vector of errors. Many techniques have been proposed for VAR order selection, such as Bayesian Information Criterion (BIC) [27], Final Prediction Error (FPE) [26] and Akaike information criteria (AIC) [10]. In this paper, we utilize the lowest AIC value to decide the order of VAR. Hence, the number of anomalies with the k-th category in region \(R_i\) in the next time slot can be derived as:

$$\begin{aligned} \mathcal {X}_{i,J+1,k} = <R_i, T_{J+1}, C_k> \equiv \sum _{l=1}^L R_{i,l} T_{J+1,l} C_{k,l} \end{aligned}$$
(6)

5 Evaluation

In this section, we conduct experiments on two real-world datasets collected from 311 Service in New York City (NYC) and Pittsburgh, respectively. In particular, we answer the following research questions:

  • Q1: How does UAPD perform as compared to the state-of-the-art solutions in predicting different categories of urban anomalies?

  • Q2: How does UAPD perform in anomaly prediction with respect to different geographical region scales?

  • Q3: How does UAPD perform in anomaly prediction with respect to different time-frames (i.e., #time-slots J)?

  • Q4: Can UAPD effectively capture the change point of an anomaly sequence? Is change point detection effective for urban anomaly prediction?

  • Q5: Is UAPD stable with regard to the rank of tensor L (i.e., the number of inherent factors)?

5.1 Experimental Setup

Datasets. We evaluated UAPD on real-world urban anomaly reports datasets collected from New York City (NYC) OpenData portalFootnote 1 and Pittsburgh OpenData portalFootnote 2, respectively. Those datasets are collected from 311 Service that is an urban anomaly report platform, which allows citizens to report complaints about urban anomalies such as blocked driveway by texting, phone call or mobile app. Each complaint record contains the timestamp, coordinates and category of anomaly. Different cities may have different anomaly categories due to different urban properties [29]. For the New York City datasets, we focus on 4 key anomaly categories (e.g., Blocked Driveway, Illegal Parking) which are selected in [15, 31, 32]. For the Pittsburgh datasets, we select the top frequently reported anomaly categories (e.g., Potholes, Refuse Violations) as the target predictive anomaly categories. The New York city data was collected from Jan 2014 to Dec 2014 and the Pittsburgh data was collected from Jan 2016 to Dec 2016.

The statistics of these datasets are summarized in Table 2. Due to space limit, we only show the geographical distribution of different categories of anomalies in NYC in Fig. 2. In those figures, firstly for the same anomaly category, we can observe that different regions have different number of anomalies. Secondly for the same region, we can observe that different anomaly categories have different distributions, which is the main motivation to use tensor decomposition model to identify the inherent factors by considering the correlations between different regions and different categories. Similar geographical distributions can be observed in Pittsburgh datasets.

Table 2. Data statistics
Fig. 2.
figure 2

Geographical distribution of anomalies with different categories.

Baselines and Evaluation Metrics. We compare our proposed scheme to the following state-of-the-art techniques:

  • CUAPS: it predicts the anomaly by exploring the dependency of anomaly occurrence between different regions and use some portion of ground truth data from the predicted time slot [15].

  • Random Forest (RF): it incorporates the spatial-temporal features of anomaly sequence into the classification model.

  • Adaptive Boosting (AdaBoost): it predicts the anomaly occurrence of each region by maintaining a distribution of weights over the training examples.

  • Gaussian Processing (GP): it is a nonparametric approach that predicts the abnormal state of each region by exploring temporal features [8].

  • Autoregressive Moving-Average Model (ARMA): it is a time varying model that predicts the abnormal state of each region by exploring anomaly traces features [28].

  • Logistic Regression (LR): it is a regression model that estimates the abnormal state of each region by exploring temporal features [13].

5.2 Performance Validation

We use the following metrics to evaluate the estimation performance of the UAPD scheme: F1-measure, Precision and Recall.

Effectiveness of UAPD (Q1, Q2, Q3). In this subsection, we present the performance of UAPD on two real-world datasets with different geographical region scales and different time slots J. The source code of UAPD is publicly availableFootnote 3. In our evaluation, we consider two versions of our scheme: (i) UAPD-CP: a simplified version of UAPD framework which does not include the change point detection step; (ii) UAPD: the full version of the proposed scheme. We compare the UAPD scheme and its variant with the state-of-the-art algorithms.

Results on New York City Dataset. To investigate the effect of geographical scale, we partition the New York City into different geographical regions using the following methods:

  • High-Level Region: New York City is divided into 76 geographical areas based on the political and administrative districts informationFootnote 4. We refer to each geographical region as a high-level region.

  • Fine-Grained Region: we use major roads (road segments with levels from \(L_1\) to \(L_5\)) to partition the entire city, which results in 862 geographical area [32]. We refer to each geographical area as a fine-grained region.

Based on the above region partition methods, each anomaly can be mapped into a particular region.

The evaluation results on NYC datasets with high-level region are shown in Tables 3 (June) and 4 (Dec), respectively. The evaluation results on NYC datasets with fine-grained region are shown in Tables 5 (June) and 6 (Dec), respectively. To investigate the effect of the number of time slots J, we study the performance of all compared schemes with different values of J. In Tables 3 and 4, we set J to be 150 (from January to May). In Tables 5 and 6, we set J to 330 (from January to November). The evaluation results are the average across the 10 consecutive days for prediction. From the results, we can observe that UAPD consistently outperforms the state-of-the-art baselines in most cases on different categories of anomalies. For example, UAPD outperforms the best baseline (AdaBoost) for high-level region-Building/Use case by 16.2% on F1-Score and the best baseline (RF) for fine-grained region-Blocked Driveway case by 34.7% on F1-Score. In the occasional case that UAPD misses the best performance, it still generates very competitive results. The results are consistent with different geographical region scales and historical time slots.

Table 3. Prediction results on June with high-level region in NYC
Table 4. Prediction results on Dec with high-level region in NYC
Table 5. Prediction results on June with fine-grained region in NYC
Table 6. Prediction results on Dec with fine-grained region in NYC
Table 7. Prediction results on June with high-level regions in pittsburgh

Results on Pittsburgh Dataset. We repeated the same experiments on Pittsburgh dataset. Considering the space limit, we only present the evaluation results for high-level regions in June. In particular, we partition the Pittsburgh city based on its council districts informationFootnote 5. The evaluation results on different categories of anomalies are shown in Table 7. The reported results are also the average across the 10 consecutive days for prediction. In Table 7, we can observe that our scheme UAPD still outperforms other baselines in most of the evaluation metrics. Additionally, we can observe that UAPD outperforms UAPD-CP (without detecting any change point on anomaly sequence) in most cases. Since there do not exist reports of Snow/Ice removal category of anomaly on June, we do not consider this category in this evaluation.

The performance improvements of UAPD are achieved by (i) carefully considering dynamic causes of anomalies; (ii) explicitly exploiting inherent factors of different categories of anomalies; (iii) carefully handling the evolving relationship among different anomaly categories and regions, with Bayesian inference and tensor decomposition, which are missing from the state-of-the-art solutions.

Fig. 3.
figure 3

Analysis of change point detection.

Analysis of Change Point Detection (Q4). Additionally, we conducted experiments to analyze the results of change point detection (i.e., the first component of the proposed framework) and further visualize the result. The detection results on Pittsburgh datasets are shown in Fig. 3. We kept the value of J the same as the above experiments. In Fig. 3, we can observe that the detected starting point of anomaly sequences is Feb 18, 2016 instead of the actual beginning time (i.e., Jan 1, 2016). Moreover, the supplementary figures in both left and right side show that the anomaly distributions before the detected change point significantly differs from that after the change point, which demonstrates the effectiveness of our change point detection on anomaly sequences.

Fig. 4.
figure 4

Performance w.r.t rank parameter L on Jun. with high-level regions in NYC

Fig. 5.
figure 5

Performance w.r.t rank parameter L on Dec. with high-level regions in NYC

Impact of Rank Parameter (Q5). To investigate the effect of the only parameter (i.e., rank parameter L) in our framework, we studied the performance of our proposed scheme by varying the value of L. Particularly, we vary the value of rank from 2 to 9. The evaluation results on NYC datasets are shown in Figs. 4 and 5. We can observe that the performance of UAPD is stable with the value of rank from 5 to 8 which satisfy the value selection rubric of rank parameter L in tensor decomposition [16]. In our experiments, we set the value of rank to 8. Considering the space limit, we only present the results on NYC datasets. In the results on Pittsburgh datasets, similar results can be observed.

6 Conclusion

In this paper, we develop a Urban Anomaly PreDiction(UAPD) framework to predict urban anomalies from spatial-temporal data. UAPD enables the accurate prediction of the occurrences of different anomalies at each region of a city in the future. UAPD explicitly detects the change point of the anomaly sequences and also explores the time-evolving inherent factors and their relationships with each dimension tensor (i.e., regions and anomaly categories). We evaluate our presented framework on two sets of urban anomaly reports collected from 311 Service in New York City and Pittsburgh, respectively. The results show that UAPD significantly outperforms state-of-the-art baselines, and provides a framework for being predictive about disturbances, enabling the city officials to be more proactive in their preparation and response.

7 Appendix

In this section, we give the specific form of conditional posterior functions used in Algorithm 1. According to the posterior function in Eq. (3), we can obtain the full conditional posterior of \(\alpha \) given \(\beta \), \(\delta \) as follows:

$$\begin{aligned} \text {Pr} (\alpha _k | \delta , \varvec{\beta }, M_{1:J}) = \text {Gamma}(\alpha _k ; {}^*\!a^{\alpha }_k, {}^*\!b^{\alpha }_k)\nonumber \\ {}^*\!a^{\alpha }_k = a^{\alpha }_k + \sum _{j=1}^\delta M_j^k;~~~{}^*\!b^{\alpha }_k = \delta +b^{\alpha }_k \end{aligned}$$
(7)

The update of parameters in Gamma distribution of k-th category anomaly before the change point is given by \({}^*\!a_k\), \({}^*\!b_k\). Since we utilize the conjugate prior for the \(\alpha \), the conditional distribution of \(\alpha \) follows the Gamma distribution. Similarly, the conditional posterior of \(\beta \) given \(\alpha \), \(\delta \) has the same form as:

$$\begin{aligned} \text {Pr} (\beta _k | \delta , \varvec{\alpha }, M_{1:J}) = \text {Gamma}(\beta _k ; {}^*\!a^{\beta }_k, {}^*\!b^{\beta }_k)\nonumber \\ {}^*\!a^{\beta }_k = a^{\beta }_k + \sum _{j=\delta +1}^J M_j^k;~~~{}^*\!b^{\beta }_k = J -\delta +b^{\beta }_k \end{aligned}$$
(8)

Finally, the full conditional posterior distribution of \(\delta \) is a Multinomial distribution:

$$\begin{aligned} \text {Pr} (\delta | \varvec{\alpha },\varvec{\beta }, M_{1:J})&= \text {Multinomial} (\delta ; \varvec{p}) \nonumber \\ p_{\delta '} = \text {exp} \Big (\sum _{k=1}^K \big (\sum _{j=1}^{\delta '} M_j^k \text {log}\alpha _k&+ \sum _{j=\delta '}^J M_j^k \text {log} \beta _k - \delta ' \alpha _k - (J-\delta ') \beta _k \big ) + \text {log}\sigma \Big ) \end{aligned}$$
(9)

where \(\sigma \) is the normalization constant of \(p_{\delta '}\). \(\varvec{p}= (p_{1},...,p_{\delta '},...,p_{J})\).