1 Introduction

Recently, fatal traffic accidents caused by elderly drivers have increased, as driving ability declines with age. The key issue is that the elderly are often unaware of their compromised driving ability; if this decline is appropriately conveyed to them, several preventive measures can become effective. Thus, the role of third parties in perceiving the decline in driving ability in elderly individuals and conveying it to them is becoming important.

We developed a system that provides information at a suitable time by considering the communication atmosphere. The system can recognize whether human-to-human communication is proceeding smoothly. When the system senses that a conversation is making little progress, it attempts to provide a topic that leads to a smoother discussion. Using a database of conversations between young individuals, we have already confirmed that the fundamental frequency (F0) can be used to estimate smoothness: the standard deviation of F0 (SD-F0) for each utterance in a smooth conversation is greater than that in a non-smooth conversation. When the utterance database includes laughter sounds, the difference between “smoothness” and “non-smoothness” is not significant. Thus, excluding laughter from utterances is necessary for estimating the degree of smoothness [1]. The average value of F0 (Ave-F0) and SD-F0 are useful for determining whether an utterance is laughter or usual speech [2]. However, we have confirmed this effectiveness using only conversations among young individuals.

We would like to use our system to inform the elderly of their decline by estimating their degree of decline from their conversation. For practical use, it is necessary for the system to determine the decline in a person’s ability to perform tasks in daily life in addition to determining this situation from an estimation of conversation smoothness. We think the F0 information would also be helpful to estimate the degree of decline.

Several papers have shown that the acoustic features of the human voice are effective in detecting diseases specific to the elderly. Kato et al. [3] reported that prosodic features are effective in estimating cognitive impairment in the elderly. Taler et al. [4] reported on the relation between Alzheimer’s disease and disorders found in prosody. Some characteristics of speech among the elderly have also been reported: Tanaka et al. [5] found formant frequency shifts in elderly speech. However, most of these analyses did not use free conversations. To notice such impairment in daily life, we should analyze daily conversations.

As an index for evaluating decline, “aging” is an important factor. It is necessary to analyze the relationship between aging and the F0 characteristics. In this paper, we report the F0 characteristics of free dyadic conversations between elderly individuals, and compare these conversations with those of young individuals. Based on the analysis results, we describe the possibility of using F0 information to detect impairment or a health condition.

2 Structure of the Communication Atmosphere Estimation

Figure 1 shows the structure of our conversation smoothness estimation using F0 information. After extracting F0 from input utterances, we detect whether the utterance is “laughter” or “speech” using F0. Using only the “speech” utterances, the conversation smoothness is estimated.

Fig. 1. Structure of conversation atmosphere estimation
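The flow in Fig. 1 can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the `Utterance` type, the `is_laughter` rule, and its threshold values are hypothetical placeholders, and the smoothness score (mean SD-F0 of the speech utterances) is only one plausible reading of the earlier finding that SD-F0 is larger in smooth conversation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional


@dataclass
class Utterance:
    ave_f0: float  # average F0 of the utterance, in Hz
    sd_f0: float   # standard deviation of F0 within the utterance, in Hz


def is_laughter(u: Utterance, ave_threshold: float = 250.0,
                sd_threshold: float = 40.0) -> bool:
    """Hypothetical rule: treat high Ave-F0 together with high SD-F0 as laughter.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return u.ave_f0 > ave_threshold and u.sd_f0 > sd_threshold


def smoothness_score(
    utterances: List[Utterance],
    laughter_rule: Callable[[Utterance], bool] = is_laughter,
) -> Optional[float]:
    """Drop laughter, then score the remaining speech by its mean SD-F0."""
    speech = [u for u in utterances if not laughter_rule(u)]
    if not speech:
        return None  # nothing left to score
    return mean(u.sd_f0 for u in speech)
```

Passing the laughter rule as a parameter keeps the two stages of Fig. 1 separate, so a different detector could be substituted without touching the scoring step.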

3 Conversation Analysis

3.1 Conversation Database

We recorded 12 sets of 3-min free dyadic conversations (between two individuals). Figure 2 shows the conversation recording location. We used two microphones and a video camera for recording. The recording conditions are listed in Table 1. The speakers in each pair had met before, but had never spoken to one another.

Fig. 2. Conversation recording location

Table 1. Conditions of conversation

We analyzed the 12 conversations by comparing those of the elderly and young individuals. We examined the “silent interval length” and the “length of one utterance.” Figure 3 shows the “silent interval length” of each conversation, and Fig. 4 shows the “length of one utterance” of each conversation.

Fig. 3. Ratio of silent interval length

Fig. 4. Length of one utterance

The results are as follows:

  • The differences in “silent interval length” between the elderly and young individuals are not significant, and the lengths are almost the same.

  • The “length of one utterance” of the elderly depends on the speaker. The utterances of some elderly speakers are quite long.
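Both measures can be computed from utterance boundaries. The sketch below assumes that each utterance is given as a hypothetical (start, end) pair in seconds, that utterances do not overlap, and that the silent-interval ratio is the total silence divided by the conversation length; the exact definitions used in this study are not stated.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one utterance, in seconds


def silent_ratio(utterances: List[Interval], total_length: float) -> float:
    """Fraction of the conversation not covered by any utterance."""
    spoken = sum(end - start for start, end in utterances)
    return (total_length - spoken) / total_length


def mean_utterance_length(utterances: List[Interval]) -> float:
    """Average duration of one utterance, in seconds."""
    return sum(end - start for start, end in utterances) / len(utterances)
```

For a 3-min recording, `total_length` would be 180.0.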

3.2 Laughter Utterance Analysis

Based on our previous analysis using young individuals’ conversations, the F0 characteristics are quite different between speech and laughter. We therefore analyzed the speech utterances and the non-language utterances separately.

3.2.1 Ratio of Non-language Utterance

The conversation database includes speech and several non-language utterances: laughter, cough, clicking tongue, etc. Tables 2 and 3 list the rates of non-language utterances for each conversation.

Table 2. Ratio of each utterance (young person)
Table 3. Ratio of each utterance (elderly person)
  • In conversation, both the elderly and the young individuals included many non-language utterances. The average ratios are 22% for young individuals and 19% for elderly individuals.

  • The t-test results revealed that the ratio of non-language utterances does not depend significantly on age; it depends on the speaker.

  • Elderly utterances include many types of non-language utterances. In the young individuals, almost all non-language utterances are “laughter.”

3.2.2 Variable Laughter Utterances

Nishio and Koyama [6] explained that laughter utterances can be classified in general as “pleasantness” or “sociability.” We classified laughter into two types: “pleasantness” and “sociability.” However, several occurrences of laughter included words. We added two more types: “pleasantness with speech” and “sociability with speech.”

We asked two individuals to listen to the conversation database and to classify all laughter utterances in one of four types: “pleasantness,” “sociability,” “pleasantness with speech,” or “sociability with speech.” We extracted the laughter utterances that were classified in the same class by them, and compared the ratio of each type between the elderly and the young (Table 4).

Table 4. Ratio of each type for all laughter utterances [%]
  • Both types of conversation (elderly and young) include many laughter utterances; in particular, the conversations of the elderly include more laughter utterances than those of the young individuals.

  • The t-test results revealed that the ratio of non-language utterances does not depend significantly on age; it depends on the speaker.
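The selection step described above (keeping only the laughter utterances that both listeners placed in the same class, then computing the share of each type) can be sketched as follows. The function name and the list-of-labels input format are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def agreed_type_ratios(labels_a: List[str], labels_b: List[str]) -> Dict[str, float]:
    """Keep utterances both annotators labeled identically; return % per type."""
    agreed = [a for a, b in zip(labels_a, labels_b) if a == b]
    counts = Counter(agreed)
    # An empty `agreed` list yields an empty dict, so no division by zero occurs.
    return {label: 100.0 * n / len(agreed) for label, n in counts.items()}
```

Each list holds one label per laughter utterance, in the same order for both annotators.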

4 Comparison of F0 Between Elderly and Young

4.1 F0 Extracted from Usual Speech Utterances

We extracted the F0 of each utterance using the database of 12 conversations. After removing other noises, voices of non-subject individuals, and non-language utterances, we selected 113 utterances for young individuals and 132 utterances for elderly individuals. We calculated the Ave-F0 and SD-F0 values of each utterance, and compared these values between young and elderly individuals.

Figure 5 shows the distribution of Ave-F0 and SD-F0 values for all speakers. The Ave-F0 values in the distribution are expressed as differences from each speaker’s average over all of that speaker’s speech utterances. The results show the following:

Fig. 5. Distribution of Ave-F0 and SD-F0 of each speech utterance by 12 individuals

  • The range of Ave-F0 is almost the same between the elderly and the young individuals.

  • The SD-F0 values of elderly individuals are higher than those of young individuals. The difference is significant according to a t-test at the 95% confidence level.
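The per-utterance features used in this comparison can be sketched as follows. The sketch assumes that F0 is available as a list of per-frame values in Hz with 0 marking unvoiced frames, and that SD-F0 is the population standard deviation over voiced frames; neither convention is stated in the text, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


def f0_features(f0_frames: List[float]) -> Optional[Tuple[float, float]]:
    """Return (Ave-F0, SD-F0) over voiced frames, or None if fully unvoiced."""
    voiced = [f for f in f0_frames if f > 0.0]  # assumption: 0 marks unvoiced
    if not voiced:
        return None
    return mean(voiced), pstdev(voiced)


def center_by_speaker(ave_f0: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Express each utterance's Ave-F0 as a difference from its speaker's mean."""
    centered = {}
    for speaker, values in ave_f0.items():
        speaker_mean = mean(values)
        centered[speaker] = [v - speaker_mean for v in values]
    return centered
```

Centering by speaker removes individual differences in baseline pitch, so that utterances from different speakers can be plotted on a common Ave-F0 axis.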

4.2 F0 Extracted from Laughter Utterances

As shown in Tables 2 and 3, the utterances of both elderly and young individuals include many laughter utterances. We therefore analyzed the F0 characteristics of laughter utterances.

The comparison of laughter utterances between the elderly and the young indicates that the laughter of the elderly tends to be unvoiced. Table 5 lists the ratios of the laughter utterances where F0 could not be extracted relative to all laughter utterances. The laughter of the elderly is unvoiced more often than that of young individuals. However, this depends on the person.

Table 5. Ratio of utterances where F0 could be extracted relative to all laughter utterances

We extracted the F0 value of each utterance using the database of 12 conversations. After removing other noises, other voices, and unvoiced laughter utterances, we selected 47 laughter utterances for young individuals and 35 laughter utterances for elderly individuals. Figure 6 shows the distribution of the Ave-F0 and SD-F0 values.

Fig. 6. Distribution of Ave-F0 and SD-F0 of laughter utterances

The results in Fig. 6 show the following:

  • The area for the elderly is smaller than that for young individuals. The utterances of the elderly fall almost entirely within the area of the young individuals.

  • When both the Ave-F0 and the SD-F0 values of a laughter utterance are large, this indicates that the utterance is from a young individual.

5 Conversation Atmosphere Estimation for the Elderly

To discuss the effectiveness of F0 in estimating health conditions, we analyzed the effectiveness of F0 in estimating the conversation atmosphere of the elderly. To examine this effectiveness, the following two properties, which were already confirmed for young individuals, should be confirmed for elderly individuals.

  (1) The laughter utterances can be separated from usual speech utterances.

  (2) The differences between smooth utterances and non-smooth utterances are clear.

Figure 7 shows the distributions of the Ave-F0 and the SD-F0 values of utterances. The left panel shows the distribution for six young individuals, and the right panel shows the distribution for six elderly individuals.

Fig. 7. Distribution of Ave-F0 and SD-F0 of laughter and speech

  • For both young individuals and elderly individuals, the Ave-F0 of laughter utterances tends to be higher than that of speech utterances. The differences between “laughter” and “speech” are significant according to a t-test at the 95% confidence level.

  • The distribution for elderly individuals is narrower than that for young individuals. The difference in the elderly SD-F0 values between speech and laughter is smaller than that for young individuals.

  • The laughter utterances can be classified into several types. However, the differences among the laughter types are smaller than the difference between “speech” and “laughter.”

The results suggest that the Ave-F0 and SD-F0 values for the elderly are more difficult to use for estimating the conversation atmosphere than the values for young individuals. To confirm the ability of the Ave-F0 and SD-F0 values to estimate the conversation atmosphere, we asked three individuals to observe the video data and to classify each video scene into one of the following two situations:

  • Smooth conversation (S): The topic for both of the speakers was already chosen, and the speakers spoke smoothly or eagerly.

  • Non-smooth conversation (NS): The topic had not been decided yet. Speakers searched for a topic that interested both of them.

The classification results from the three individuals were very similar: 88% of the parts were classified into the same situation by all three individuals.

Figure 8 shows the distributions of the Ave-F0 and the SD-F0 values of elderly utterances. The SD-F0 values of the utterances during smooth conversation are higher than those for non-smooth conversation. The differences between smooth and non-smooth conversations for elderly individuals are smaller than those confirmed for young individuals in previous experiments [1]. However, a t-test at the 95% confidence level revealed that the difference between smooth and non-smooth conversations in the elderly is significant.

Fig. 8. Distribution of Ave-F0 and SD-F0 of each utterance for six elderly individuals
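The t-test comparisons reported in this section can be outlined as follows. The text does not state which variant of the t-test was used; this sketch assumes Welch’s unequal-variance test applied to two lists of per-utterance SD-F0 values (hypothetical inputs) and returns only the statistic and the approximate degrees of freedom. Significance at the 5% level would then be judged against the two-sided critical value of the t distribution for that number of degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple


def welch_t(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard errors of the two sample means (sample variance / n).
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

Here `x` could hold the SD-F0 values of utterances labeled S and `y` those labeled NS; each list needs at least two values with nonzero variance.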

6 Discussion

We confirmed the differences of the F0 characteristics between elderly and younger individuals through an analysis.

  • For elderly speech utterances, SD-F0 tends to be larger than that of younger individuals.

  • The length of a silent interval is nearly the same for the elderly and younger individuals, but the length of one utterance of the elderly tends to be longer than that of the young individuals.

  • For both the elderly and young individuals, many non-language utterances are included in conversation. Most non-language utterances recorded can be classified as “laughter.” However, the elderly utterances tend to include other non-language utterances, such as coughs and tongue clicks.

  • The laughter utterances of the elderly tend to be unvoiced.

  • With regard to the distributions of Ave-F0 and SD-F0, the area of the elderly laughter is smaller than that of the younger individuals. When Ave-F0 or SD-F0 is extremely large, the laughter utterances are from the younger individuals.

These results indicate that by calculating the Ave-F0 and SD-F0 values of each utterance, we can estimate whether the speaker is young or elderly. The results also indicate that these values would be useful in assessing impairments in the elderly.

  • On the other hand, conversation atmosphere estimation for the elderly is more difficult than for young individuals. However, the t-test results revealed that the difference between speech and laughter is significant, and the difference between smooth and non-smooth conversations is also significant.

  • However, several laughter utterances for the elderly were unvoiced. The ratio of unvoiced laughter utterances depends on the person. A total of 32% of elderly laughter utterances were unvoiced utterances.

These results suggest that conversation atmosphere estimation and health condition estimation are limited when using only the F0 information.

7 Conclusion

We reported on the F0 characteristics of free dyadic conversations between elderly individuals and compared them with conversations between young individuals. We confirmed several differences between the elderly and the young individuals. The elderly utterances tend to include several types of non-language utterances and unvoiced laughter utterances. The dynamic range of SD-F0 for elderly individuals tends to be narrower than that of young individuals. These results show that conversation atmosphere estimation and health condition estimation using F0 characteristics would be more difficult for elderly individuals than for young individuals.

However, the differences between “speech” and “laughter” and between “smooth conversation” and “non-smooth conversation” for elderly individuals were confirmed to be sufficiently large. The results suggest that F0 information is useful for conversation atmosphere estimation, and may also be usable for estimating health conditions.

In the future, first, we will confirm the reliability of our results using a larger quantity of data. In addition, other factors of nonverbal communication, such as gestures, will be analyzed to obtain a more accurate estimate. Next, we would like to confirm its effectiveness for an assessment of health conditions of the subjects.