1 Introduction

Music is widely known as an application domain of machine learning. In the early 21st century, recognition and analysis tasks such as music transcription and genre classification were actively studied. Recently, however, the number of studies devoted to music generation has been increasing (e.g., [1]).

When generating polyphonic music, one must consider consistency in two directions: simultaneity (i.e., vertical, or pitch-axis, consistency) and sequentiality (i.e., horizontal, or time-axis, consistency). Our team has used Bayesian networks to investigate music generation models that consider both simultaneity and sequentiality [2,3,4]. Here, we present our models applied to chord voicing [2], four-part harmonization [3], and real-time chord prediction [4].

2 Assumed Music Structure and Fundamental Model

Suppose that a piece of music contains a chord progression \(C = [c_1, c_2,{\cdots }, c_N]\) (\(c_i\): chord symbol). Each chord \(c_i\) (e.g., Am) is played with a particular voicing \((a^{(1)}_i, a^{(2)}_i,{\cdots }, a^{(K)}_i)\) (\(a^{(k)}_i\): note name, a.k.a. pitch class), e.g., (C, E, A). As noted in the Introduction, the simultaneous notes \((a^{(1)}_i, a^{(2)}_i,{\cdots }, a^{(K)}_i)\) should be harmonically consistent with one another, and each voice sequence \(A^{(k)} = [a^{(k)}_1, a^{(k)}_2,{\cdots }, a^{(k)}_N]\) should be temporally smooth. The piece also contains a melody \(M=[m_{1,1}, m_{1,2},{\cdots }, m_{2,1}, \cdots ]\), where \(m_{i,j}\) represents the note name of the j-th note in the i-th chord region. The sequences of chords, voicings, and melody notes are considered to have temporal dependencies within each sequence and also to depend on one another, as shown in Fig. 1(a). In practice, this fundamental model is difficult to construct because the number of melody notes varies from one chord region to another. We therefore simplify the model by imposing restrictions on the music structure that are designed for each music generation task.
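To make the assumed structure concrete, the following Python sketch (our illustration, not the authors' code; the class and method names are hypothetical) stores a piece as aligned lists: one chord symbol and one K-note voicing per chord region, plus the melody notes of each region. Only the melody regions may differ in length, which is the variation that makes the fundamental model of Fig. 1(a) hard to build directly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Piece:
    """Hypothetical container for the assumed music structure."""
    chords: List[str]                # C = [c_1, ..., c_N], e.g. "Am"
    voicings: List[Tuple[str, ...]]  # one K-tuple of note names per chord
    melody: List[List[str]]          # melody notes m_{i,j}, grouped by chord region i

    def __post_init__(self) -> None:
        n = len(self.chords)
        if len(self.voicings) != n or len(self.melody) != n:
            raise ValueError("chords, voicings, and melody regions must align")
        if len({len(v) for v in self.voicings}) > 1:
            raise ValueError("every voicing must have the same number of voices K")

    def voice(self, k: int) -> List[str]:
        """Return the voice sequence A^(k) (1-indexed, as in the text)."""
        return [v[k - 1] for v in self.voicings]


piece = Piece(
    chords=["Am", "Dm", "G7"],
    voicings=[("C", "E", "A"), ("D", "F", "A"), ("D", "F", "B")],
    melody=[["E", "D", "C", "C"], ["F"], ["D", "B"]],  # region lengths vary
)
print(piece.voice(1))                  # ['C', 'D', 'D']
print([len(r) for r in piece.melody])  # [4, 1, 2]
```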

Fig. 1. Fundamental model and models specialized to each task

3 Chord Voicing

Chord voicing refers to estimating the voicings \((A^{(1)}, A^{(2)},{\cdots }, A^{(K)})\) for a given chord progression C and melody M. Here we assume \(K=4\) for simplicity. To avoid the difficulty caused by the varying number of melody notes within each chord region, we replace the melody notes of the i-th region with a single node \(m'_i = (r_{i,0},{\cdots }, r_{i,11})\) (\(0 \le r_{i,p} \le 1\)), where \(r_{i,p}\) represents the relative length of time for which note name p (p = 0 for C, p = 1 for C♯, and so on up to p = 11 for B) appears in the melody. For example, \(m'_i = (0.5, 0, 0.25, 0, 0.25, 0,{\cdots }, 0)\) is given for a melody [E, D, C, C] (with equal durations). The simplified model is shown in Fig. 1(b).
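The melody node \(m'_i\) can be computed as in the following minimal Python sketch. It assumes that each chord region's melody is available as (note name, duration) pairs, that note names are spelled with sharps only, and that pitch class 0 is C, which matches the example above. The function name `melody_node` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Pitch-class order consistent with the example in the text (0 = C, 2 = D, 4 = E).
# Assumption: sharps only; flats such as "Db" are not handled here.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def melody_node(region: Iterable[Tuple[str, float]]) -> List[float]:
    """Compute m'_i = (r_{i,0}, ..., r_{i,11}) for one chord region.

    Each r_{i,p} is the fraction of the region's total melody duration
    spent on pitch class p.
    """
    totals: Counter = Counter()
    for name, duration in region:
        totals[PITCH_CLASSES.index(name)] += duration
    whole = sum(totals.values())
    if whole == 0:
        return [0.0] * 12  # no melody notes in this region
    return [totals[p] / whole for p in range(12)]


# The example from the text: [E, D, C, C] with equal durations.
print(melody_node([("E", 1), ("D", 1), ("C", 1), ("C", 1)]))
# [0.5, 0.0, 0.25, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Because the vector always has 12 entries, the node has a fixed size no matter how many melody notes fall within the region.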

This model is applied sequentially from the beginning to the end of a given piece. Given \(c_i\), \(m'_i\), and the previous voicing \((a^{(1)}_{i-1},{\cdots }, a^{(K)}_{i-1})\), the i-th voicing \((a^{(1)}_i,{\cdots }, a^{(K)}_i)\) is estimated jointly with the next voicing \((a^{(1)}_{i+1},{\cdots }, a^{(K)}_{i+1})\), because each voicing should connect smoothly to the one that follows it. The estimate of \((a^{(1)}_{i+1},{\cdots }, a^{(K)}_{i+1})\) is only provisional: it is overwritten at the next step, since this procedure is repeated as i is incremented.
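The sequential procedure above amounts to a greedy left-to-right search with a one-step lookahead. The Python sketch below illustrates only that control flow, with three voices for brevity. The scoring is an assumption: `local` stands in for the terms that relate a voicing to its chord and melody node, `smoothness` is a toy penalty on pitch-class motion between corresponding voices, and an exhaustive search over hypothetical candidate lists replaces inference in the trained Bayesian network.

```python
import itertools
from typing import Callable, List, Optional, Sequence, Tuple

Voicing = Tuple[int, ...]  # one pitch class (0 = C, ..., 11 = B) per voice


def smoothness(prev: Optional[Voicing], cur: Voicing) -> float:
    """Toy transition score: minus the total circular pitch-class motion of all voices."""
    if prev is None:
        return 0.0
    return -float(sum(min((a - b) % 12, (b - a) % 12) for a, b in zip(prev, cur)))


def estimate_voicings(candidates: Sequence[List[Voicing]],
                      local: Callable[[int, Voicing], float],
                      start: Optional[Voicing] = None) -> List[Voicing]:
    """Greedy left-to-right search with one-step lookahead.

    At step i, the pair (v_i, v_{i+1}) is chosen jointly given v_{i-1};
    only v_i is kept, and v_{i+1} is re-estimated at step i + 1.
    """
    result: List[Voicing] = []
    prev = start
    n = len(candidates)
    for i in range(n):
        nexts: List[Optional[Voicing]] = list(candidates[i + 1]) if i + 1 < n else [None]

        def pair_score(pair: Tuple[Voicing, Optional[Voicing]]) -> float:
            cur, nxt = pair
            score = local(i, cur) + smoothness(prev, cur)
            if nxt is not None:  # lookahead term for the provisional next voicing
                score += local(i + 1, nxt) + smoothness(cur, nxt)
            return score

        best_cur, _ = max(itertools.product(candidates[i], nexts), key=pair_score)
        result.append(best_cur)  # keep v_i; discard the provisional v_{i+1}
        prev = best_cur
    return result


# Hypothetical candidate voicings (different voice assignments) for C major -> F major.
C_MAJOR = [(0, 4, 7), (4, 7, 0), (7, 0, 4)]
F_MAJOR = [(5, 9, 0), (9, 0, 5), (0, 5, 9)]
print(estimate_voicings([C_MAJOR, F_MAJOR], lambda i, v: 0.0, start=(0, 4, 7)))
# [(0, 4, 7), (0, 5, 9)]
```

With a neutral `local` score, the sketch keeps the starting C-major voicing and then moves to the F-major voicing reachable with the least motion.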

An example of chord voicing is shown in Fig. 2. The model was trained on 30 jazz pieces arranged for the electronic organ. In listening tests conducted by music experts, 94.7% of the generated chord voicings were judged acceptable.

Fig. 2. An example of voicing (excerpted)

4 Four-Part Harmonization

Here, we focus on harmonization. Unlike in chord voicing, the sequence of chord symbols is not given; it has to be estimated. For simplicity, we adopt the “one chord for one melody note” assumption, under which the Bayesian network can be simplified to that shown in Fig. 1(c). Here we assume \(K=3\): the given melody serves as the soprano, and the three remaining voices are estimated. This problem is called four-part harmonization because the harmony consists of four voices (i.e., soprano, alto, tenor, and bass). Furthermore, because chord symbols are sometimes too ambiguous, we also constructed a Bayesian network in which the chord nodes are removed (Fig. 1(d)).
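For the model without chord nodes (Fig. 1(d)), one way to picture the search for a harmonization is a chain over (alto, tenor, bass) triples, each scored against the current soprano note and the previous triple. The Python sketch below finds the best-scoring sequence under that chain assumption with the Viterbi algorithm. The chain structure, the toy scores `toy_emit` and `toy_trans`, and the candidate triples are all assumptions for illustration; the actual network in Fig. 1(d) and its inference procedure may differ.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[int, int, int]  # (alto, tenor, bass) as pitch classes 0..11

CONSONANT = {0, 3, 4, 7, 8, 9}  # toy set of "consonant" intervals, in semitones mod 12


def toy_emit(soprano_note: int, s: Triple) -> float:
    """Toy log-score: count voices whose interval up to the soprano (mod 12) is consonant."""
    return float(sum((soprano_note - v) % 12 in CONSONANT for v in s))


def toy_trans(p: Triple, s: Triple) -> float:
    """Toy log-score: penalize total circular pitch-class motion of the three voices."""
    return -float(sum(min((a - b) % 12, (b - a) % 12) for a, b in zip(p, s)))


def viterbi_harmonize(soprano: Sequence[int], states: Sequence[Triple],
                      emit: Callable[[int, Triple], float],
                      trans: Callable[[Triple, Triple], float]) -> List[Triple]:
    """Return the triple sequence maximizing sum(emit) + sum(trans) over the melody."""
    # best[s]: score of the best partial path ending in state s at the current note
    best: Dict[Triple, float] = {s: emit(soprano[0], s) for s in states}
    back: List[Dict[Triple, Triple]] = []  # back-pointers, one dict per later note
    for note in soprano[1:]:
        new_best: Dict[Triple, float] = {}
        pointers: Dict[Triple, Triple] = {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + trans(p, s))
            new_best[s] = best[prev] + trans(prev, s) + emit(note, s)
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Soprano C then G over two hypothetical candidate triples.
print(viterbi_harmonize([0, 7], [(4, 7, 0), (2, 7, 11)], toy_emit, toy_trans))
# [(4, 7, 0), (4, 7, 0)]
```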

Fig. 3. Example of harmonization (left: model with chord nodes, right: model without chord nodes)

Figure 3 shows an example of harmonization using these two models. An objective quantitative evaluation revealed that the model shown in Fig. 1(d) generates temporally smoother harmonies than the model shown in Fig. 1(c), although harmonizations generated with the former model tend to contain slightly more dissonant sounds.

5 Real-Time Chord Prediction

Finally, we apply our Bayesian network to real-time chord prediction. Music experts can often accurately predict the next chord by listening to the current chord, even if they are not familiar with the piece being played. This ability derives from the fact that chord progressions have strong temporal dependencies, which experts have learned through their musical experience. They are therefore able to play an accompaniment to a melody that they are hearing for the first time. Our goal here is to build a computer system that can play such an accompaniment.

Real-time chord prediction can also be achieved with a simplified version of the fundamental model shown in Fig. 1(a). For simplicity, we estimate only the chord symbols and determine the voicings with a separately designed rule. The model used here is shown in Fig. 1(e). Each time a new melody note is given, the next melody note is predicted; at the same time, the most likely next chord is inferred from the current chord and the predicted next note.
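The two-stage prediction step can be sketched in Python as follows. We assume hypothetical probability tables (a note bigram, a chord bigram, and the probability of a note given a chord) and, for current chord \(c\), candidate next chord \(c'\), and predicted next note \(\hat{m}\), rank candidates by \(P(c' \mid c)\,P(\hat{m} \mid c')\). This is proportional to \(P(c' \mid c, \hat{m})\) if the next note depends on the current chord only through the next chord; that factorization is our simplifying assumption, not necessarily the one encoded in Fig. 1(e).

```python
from typing import Dict, Tuple

# Hypothetical conditional probability tables: outer key = condition, inner key = outcome.
Table = Dict[str, Dict[str, float]]


def predict_next(cur_note: str, cur_chord: str,
                 note_bigram: Table,       # P(next note | current note)
                 chord_bigram: Table,      # P(next chord | current chord)
                 note_given_chord: Table,  # P(note | chord)
                 ) -> Tuple[str, str]:
    # Step 1: predict the most likely next melody note from the current note.
    next_note_probs = note_bigram[cur_note]
    next_note = max(next_note_probs, key=next_note_probs.get)

    # Step 2: score each candidate next chord c' by P(c' | c) * P(next_note | c').
    chord_probs = chord_bigram[cur_chord]

    def score(chord: str) -> float:
        return chord_probs[chord] * note_given_chord.get(chord, {}).get(next_note, 0.0)

    next_chord = max(chord_probs, key=score)
    return next_note, next_chord


# Made-up probabilities, for illustration only.
note_bigram = {"E": {"D": 0.6, "F": 0.4}}
chord_bigram = {"C": {"G7": 0.5, "F": 0.3, "Am": 0.2}}
note_given_chord = {
    "G7": {"D": 0.3, "F": 0.2, "G": 0.3, "B": 0.2},
    "F": {"F": 0.4, "A": 0.3, "C": 0.3},
    "Am": {"A": 0.4, "C": 0.3, "E": 0.3},
}
print(predict_next("E", "C", note_bigram, chord_bigram, note_given_chord))  # ('D', 'G7')
```

In a real-time setting, such a function would be called once per incoming melody note, and the chosen chord would then be passed to the separately designed voicing rule.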

An example of chord prediction is shown in Fig. 4. This figure shows that the model appropriately predicts the chord progression.

Fig. 4. Example of real-time chord prediction results

6 Conclusion

We have presented Bayesian network models for three music generation tasks: chord voicing, four-part harmonization, and real-time chord prediction. Bayesian networks are flexible models that are suitable for constructing a unified music generation model. In the future, we will apply our models to other types of music generation tasks.