1 Introduction

Laughter is said to be linked to satisfactory human relationships and to have a positive impact on health, hence it is often related to perceived improvements in the quality of life (QOL) [1].

For this reason, research on laughter has gathered momentum in the human interface area, for instance in the design of systems that promote smiles [2,3,4,5]. In addition, the impact of “laughter yoga”, which triggers laughter spontaneously, has drawn attention as a way to maintain daily health [6]. However, specific indicators that link the effects on people to the frequency of laughing episodes in a day and to the type of laugh that is desirable are lacking.

In addition, wearable devices have in recent years become capable of recording various life logs, but no method has yet been established to record quantifiable indicators related to feelings and laughter. If it were possible to obtain a quantitative laughter life log from daily life, the relationship between long-term laughter and health and well-being could be clarified, and laughter could conceivably be used as a quantitative index of QOL.

In this paper, we focus on the stomach (abdomen) to detect natural laughter resulting from amusement, which is the positive kind of laughter. By measuring the pressure change in the abdomen using a textile sensor, we build a wearable laugh log system capable of detecting and recording laughter.

In our pilot experiment, we induced laughter under controlled settings and examined a deep learning method to detect laughter periods within the measured log. The results demonstrated that laughter can be detected in a controlled environment. We then simulated daily scenes that were likely to trigger laughter, measured them, and examined the detection of laughter through deep learning.

2 Related Work

Although laughter has been studied as an effective way to improve health and promote positive human relationships, quantification requires a large number of samples.

The ILHAIRE project, which produces a conversational agent using natural appearance and laughter, presents databases with large samples of laughter [7]. They categorize three types of laughter: natural expression, induced reaction, and made-up feeling, in connection with laughs measured. Natural expressions are caused by natural emotions as measured in the real world, but this measurement is difficult. By contrast, induced reactions were collected by presenting content such as comedy videos to encourage laughter. In addition, made-up emotions occur when one is directed to laugh; they are most readily measurable, but are not linked to a natural laugh.

This research aims to measure laughter by natural expression. However, the work presented in the above database points to difficulties in using the data collected in a natural environment [8]. Therefore, in this study, we first collect data by measuring laughter induced by external stimuli, such as viewing content that encourages laughter. We then establish a detection method and finally measure and detect laughter in daily scenes.

As a method to log laughs, the usage of face, voice, and skin surface potential in the vicinity of the diaphragm has been proposed. Laugh detection identifies smiling faces by extracting a quantified feature of the face from a camera image, and this functionality is increasingly mounted on cameras that are on the market such as Omron’s smile scan [9].

In addition, with the loud smile meter that measures vocal cord vibration data via a pharyngeal microphone, occurrences of laughter are logged over time from daily life.

With the diaphragm type laughter measuring instrument, the skin surface potential in the vicinity of the diaphragm is monitored by tracking the myoelectric potential reaction of the xiphoid process upper surface muscle at the time of laughter using a surface myoelectric potential measuring device, which measures amount, length, and timing of laughter [10].

Ikeda et al. used the two laugh measurements of a laughing counter and a diaphragm type laughter simultaneously, to classify laughter using the face, throat, and stomach. Reportedly, detection occurs even with smoldering in the face and throat measurement [11]. However, stomach measurements did not display signs of smoldering laughter and instead were only sensitive to true laughter. This suggested that laughter related to natural emotions can only be detected using a diaphragm type measurement device on the stomach. However, with the diaphragm type laughter measuring device, there were complaints from subjects that resting the electrode directly on the skin was annoying. In addition, a diaphragm type laughter measuring method that requires a large electromyograph is not suitable for long term measurement. Therefore, in this research, we aim to develop a stomach type wearable laughter measurement system that can measure true laughter.

3 Pilot Experiment of Laugh Detection

The textile sensor used in this study is woven with yarns composed of a chemical thread twisted with conductive material. The sensor has a two-layer structure: in the first layer, a yarn containing a conductive material is woven in the longitudinal direction, and in the second layer, it is woven in the transverse direction. When pressure is applied, the distance between the conductive yarns changes at the points where the longitudinal and transverse yarns cross, which changes the capacitance. This change in capacitance is used as the pressure value. The pressure can be measured in a 6 × 14 matrix, and the measured values are transmitted wirelessly using Bluetooth. In this study, we designed a PC and Android application that records at a sampling rate of 10 Hz. The system is shown in Fig. 1, and the way it is worn is shown in Fig. 2.

Fig. 1. Textile sensor (left) and wearable laugh log system (right)

Fig. 2. A person wearing a laugh log

To investigate the capabilities of the prototype system in laughter detection, we conducted a measurement experiment under two conditions: “watch a movie” and “talk with a friend”. Two males and eight females, with a mean age of 21.5 years, participated in the experiment. After measuring the resting state of each subject for one minute, we measured the abdominal pressure under the two conditions for five minutes each, in both sitting and standing postures. During the experiment, we recorded the state of the subject and extracted the timing of laughs from the log data. An overview of the experimental conditions is shown in Fig. 3.

Fig. 3. Experimental conditions

3.1 Analysis Using Deep Learning

Measurement results were analyzed by normalizing the average value of the pressure distribution. From previous research [12], we observed that the pressure suddenly decreased when laughter occurred. However, the measured value and its variation differ greatly among individuals, so it was difficult to detect laughter using a threshold value. Therefore, we studied the detection of laughter using deep learning while observing the characteristics of the pressure change. One sample for learning is a fifty-dimensional vector covering a five-second portion of the five-minute experimental data, and samples are taken by shifting this window every 0.1 s. The neural network used for learning is a long short-term memory (LSTM) network, a classification model that takes time series data into account. The same number of correct answers (with the center of the vector included in a laugh section) and incorrect answers (without laughter in the vector) were randomly extracted and learned. The accuracy was 86%, and the recall rate was 88%. A new subject, a 24-year-old female, was then measured, and the laughter detection capability of the learned model was evaluated. Situations conducive to laughter, such as talking with a friend, were set freely, and measurements of five minutes each were taken three times in the sitting posture and three times in the standing posture. The classification results were corrected so that sections classified as laughter in quick succession were merged into one laugh section and those with low confidence were treated as incorrect. As a result, we were able to detect 75% of the 102 laughs in the six measurements. In addition, half of the laughs that could not be detected lasted one second or less, and the undetected laughs of one second or more were, except on three occasions, detected at a position close to the laugh without overlapping it. Hence, this method was shown to be sufficiently effective at detecting laughs. An example of the corrected classification result is shown in Fig. 4.

Fig. 4. Laugh classification result (left: without correction; right: with correction). In the right panel, pink sections are correctly classified as laughing and green sections are incorrectly classified as laughing.
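The correction step described in Sect. 3.1 can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `merge_detections` and the parameters `threshold`, `max_gap`, and `min_conf` are hypothetical, and the text does not state how "appeared frequently" or "low confidence" were quantified. Here, neighbouring laugh runs are merged when the gap between them is short, and a merged section is discarded when its peak confidence is low.

```python
def merge_detections(probs, threshold=0.5, max_gap=5, min_conf=0.7):
    """Turn per-step laugh probabilities into corrected laugh sections.

    probs: classifier output per 0.1 s step (10 Hz, as in the text).
    threshold, max_gap (in steps) and min_conf are hypothetical values.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    # 1. Collect runs of consecutive steps classified as laugh.
    runs, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(probs)))

    # 2. Merge runs that are separated by at most max_gap steps.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # 3. Treat sections whose peak confidence is low as incorrect.
    return [(s, e) for s, e in merged if max(probs[s:e]) >= min_conf]


probs = [0.1, 0.9, 0.8, 0.2, 0.9] + [0.1] * 7 + [0.6, 0.1]
print(merge_detections(probs, max_gap=2))
```

In this toy input, the two confident runs one step apart become a single section, while the isolated weak run near the end is dropped.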

4 Measurement Experiment Simulating Daily Life

As the pilot experiment was conducted under laboratory settings with constrained behavior, it was also necessary to identify laughter in measurement data that include the various behaviors of everyday life. This requires measuring laughter during various behaviors to prepare learning data, but laughter occurs by chance, and it is difficult to capture while a specific action is being performed. Hence, we set up several everyday scenes conducive to laughter and conducted a logging experiment. As signals from various behaviors were expected to be included, we revised the laugh log system to reduce noise.

4.1 Revision of Laugh Log System

In a preliminary measurement for this experiment, we performed actions such as walking, stepping up and down, and having a meal. Noise appeared because the measurement device was shaken in the pocket attached to the lower part of the bellyband interface. Therefore, a belt-type pocket was made from stretchable cloth, and the measuring device was fixed by winding the belt around the waist. We also improved the application so that the behavior during measurement could be estimated: the acceleration and geomagnetic sensors built into the Android device were used to measure the acceleration along three axes and the inclination of the device. A pocket was attached to the back of the belly band to hold the Android device, so that the inclination of the upper body could be estimated from the inclination of the device. An overview of the improved system is shown in Fig. 5. In addition, as this experiment was conducted in winter, wearing the system over a thick knit caused the pressure to be absorbed by the clothes, and the pressure could not be measured accurately. Therefore, during measurement, the laugh log was worn over a thin long-sleeved T-shirt, and the sweater was layered on top.

Fig. 5. Improved laugh log system
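As a rough illustration of how a forward or backward inclination can be derived from accelerometer readings, the sketch below computes a tilt angle from two components of the gravity vector. It is an assumption-laden example rather than the application's actual code: the paper does not give the formula, the axis convention (`ay` along the long axis of the device, `az` normal to the screen) is assumed, and the geomagnetic sensor mentioned in the text is not used here.

```python
import math


def tilt_degrees(ay, az):
    """Forward/backward tilt of the device in degrees, within (-180, 180].

    ay: gravity component along the device's long axis (assumed to point
        up when the wearer stands upright).
    az: gravity component along the axis normal to the screen (assumed).
    """
    return math.degrees(math.atan2(az, ay))


print(tilt_degrees(9.81, 0.0))    # upright device
print(tilt_degrees(9.81, -9.81))  # tilted halfway toward horizontal
```

The resulting range of −180 to 180 degrees matches the inclination range mentioned later in Sect. 5, which is why this particular formulation was chosen for the sketch.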

4.2 Experiment Settings

One subject, a 24-year-old female, was monitored in each of six types of daily scenes conducive to laughter, ten times per scene for twenty minutes each. The experiment was spread over several days, with several consecutive measurements each day. Measurements were made in places such as laboratories, facilities within the university, and nearby stores. During measurement, a pin microphone was attached to the chest to record voice, and the timing of laughs was extracted from the recorded voice for analysis. Based on the assumption that laughter typically occurs while several people are talking, we set the following six scenes, which were expected to include complex movements such as standing up, walking, walking ahead, and shouting. Considering that humidity influences the values measured by the textile sensor, scenes that may lead to sweating, such as sports, were not included this time. To measure natural laughter, we instructed the subject to ignore the measurement as far as possible during the experiment. The details of the scenes are as follows:

  • Going shopping with friends: going shopping on foot and walking around inside the store.

  • Cooking with friends: going to pick up groceries, cutting and cooking in a standing posture.

  • Playing a board game with friends: sitting around a large desk and playing a board game.

  • Doing group work: using a white board or sitting on a nearby chair.

  • Having a meal with friends: eating a meal in a sitting posture, no restriction on meal content.

  • Relaxed talk with friends: sitting in a relaxed state on a couch and chatting.

5 Evaluation of Laugh Detection Using Deep Learning

A total of twenty hours was measured during the experiment: each of the six scenes was measured for twenty minutes and repeated ten times. The duration of laughs for each measurement is reported in Table 1.

Table 1. The duration of laughing time (20 min/ 1 test)

The aggregate laughing time in all measurements was 5,228 s, and the average duration of one laughing episode was about 1.7 s. As in the pilot experiment, learning data was prepared to evaluate the laughter detection capability of deep learning. Because pressure changes accompanying body movement are measured on many occasions in this experiment, two parameters are used in deep learning, namely the transition in the pressure value and the transition in the inclination in the longitudinal direction of the body. The pressure data show an upward trend over time and changes in pressure triggered by changes in posture such as standing up or sitting down. Therefore, before learning, the pressure data was preprocessed according to the procedure shown in Fig. 6 and then normalized to a [0, 1] range. In step one, noise was reduced by calculating the moving average over one second. In step two, the data was adjusted so that the averages of the measured values before and after an inclination change lasting a second or more became the same. In step three, the ten-second moving average was subtracted to eliminate the rising trend. In step four, pressure increases that were not relevant to laugh detection were treated as outliers: the top 1% of the measured values in each twenty-minute measurement were excluded using the Smirnov-Grubbs test, and the rest of the data was normalized to a [0, 1] range. In addition, the measured inclination, which lies within a [−180, 180] degree range, was shifted to a [0, 360] degree range and then normalized to a [0, 1] range. An example of raw measurement data and a part of the processed data are shown in Fig. 7. The top row shows the transition in the pressure value, the bottom row shows the transition in the inclination value, and the shaded region indicates laughter.

Fig. 6. Steps of data processing

Fig. 7. Example of raw measurement data (left) and a part of processed data (right)
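The preprocessing steps in Sect. 5 can be approximated with the following sketch. It is a simplified illustration under stated assumptions, not the authors' pipeline: step two (levelling the signal across posture changes) is omitted, the Smirnov-Grubbs test is replaced by a plain cut at the top 1% of values, and those values are clipped rather than removed so that the time axis stays intact. All function names are hypothetical.

```python
FS = 10  # samples per second (0.1 s step, as in the text)


def moving_average(x, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def min_max(x):
    """Scale values to the [0, 1] range."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


def preprocess_pressure(raw, top_fraction=0.01):
    # Step 1: smooth over roughly one second to reduce noise.
    smooth = moving_average(raw, 1 * FS)
    # Step 2 (posture-change levelling) is omitted in this sketch.
    # Step 3: subtract a ten-second moving average to remove the drift.
    trend = moving_average(smooth, 10 * FS)
    detrended = [s - t for s, t in zip(smooth, trend)]
    # Step 4 (simplified): clip the top 1% of values, then normalize.
    k = int(len(detrended) * top_fraction)
    cutoff = sorted(detrended)[len(detrended) - k - 1]
    clipped = [min(v, cutoff) for v in detrended]
    return min_max(clipped)


def normalize_inclination(degrees):
    """Shift [-180, 180] degrees to [0, 360], then scale to [0, 1]."""
    return [(d + 180.0) / 360.0 for d in degrees]


print(normalize_inclination([-180, 0, 180]))  # [0.0, 0.5, 1.0]
```

A twenty-minute measurement at 10 Hz contains 12,000 samples, so the pure-Python loops above are adequate for a sketch, although a vectorized implementation would be preferable in practice.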

5.1 Creating Learning Data

A five-second portion of data, represented as a fifty-dimensional vector and extracted every 0.1 s from the twenty-minute data of each measurement, is used for learning as one sample. Samples in which the middle value of the vector falls within a laugh are regarded as correct answers, and those for which no such determination could be made are regarded as incorrect answers. In preliminary experiments, extremely short laughter was found to be difficult to detect, so laughter that lasted under 0.8 s, about half the average laughter duration, was not considered laughter and was tagged as incorrect before learning. Measurement data from nine of the ten repetitions of the six scenes was used as learning data, of which 80% was used as training data and 20% as test data. The data from the remaining measurement of each of the six scenes was used as verification data to evaluate whether laughter could be detected from twenty-minute data samples.
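The sample construction described above can be sketched as follows. The window length (50 values), the 0.1 s shift, the labelling by the middle of the window, and the 0.8 s minimum laugh duration come from the text; the function name `make_samples` and the representation of laughs as `(start_s, end_s)` pairs are assumptions made for illustration.

```python
FS = 10            # samples per second (0.1 s step)
WINDOW = 5 * FS    # five seconds -> fifty-dimensional vector


def make_samples(signal, laughs, min_laugh_s=0.8):
    """Cut a 10 Hz signal into labelled fifty-dimensional windows.

    signal: list of preprocessed values, one per 0.1 s.
    laughs: list of (start_s, end_s) laugh intervals in seconds.
    Returns a list of (window, label) pairs, label 1 for laughter.
    """
    # Laughs shorter than min_laugh_s are not treated as laughter.
    kept = [(s, e) for s, e in laughs if e - s >= min_laugh_s]
    samples = []
    for start in range(len(signal) - WINDOW + 1):
        window = signal[start:start + WINDOW]
        center_t = (start + WINDOW // 2) / FS  # time of the middle value
        label = int(any(s <= center_t < e for s, e in kept))
        samples.append((window, label))
    return samples


# Ten seconds of dummy data with one long and one very short laugh.
samples = make_samples([0.0] * 100, [(3.0, 4.0), (6.0, 6.3)])
print(len(samples), sum(label for _, label in samples))
```

With this scheme, a twenty-minute measurement (12,000 values) yields 11,951 overlapping windows, most of which are incorrect answers, which motivates the sampling study in Sect. 5.3.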

5.2 Evaluation of the Learning Method

With deep learning in preliminary experiments, learning was performed by randomly extracting a number of incorrect answers equal to the number of correct answers included in the learning data. However, in this experiment, there are variations in the motion included in the incorrect answer data. Therefore, we examine the required number and variations of incorrect answer data when extracting the incorrect data set.

5.3 Study on the Number of Learning Datapoints

The number of correct answer datapoints included in the learning data was 44,564. First, learning was conducted by randomly extracting as many incorrect answer datapoints as there were correct answer datapoints. The accuracy of the classification for the verification data after learning for 3,000 epochs, at which point learning showed signs of near convergence, is reported in Table 2.

Table 2. Classification result of verification data

Subsequently, four times as many incorrect answer datapoints as correct answer datapoints were randomly extracted and learned. The number of correct answer datapoints was 44,564, and the number of incorrect answer datapoints was 178,256. The accuracy of the classification result for the verification data after learning for 3,000 epochs, where signs of near convergence were observed, is reported in Table 3.

Table 3. Classification result of verification data increasing incorrect answer data set

Increasing the number of incorrect answer datapoints used for learning improved the correct answer rate (accuracy) and the relevance rate (precision). This suggests that variations in incorrect answers were learned to some extent compared with the previous trial. The recall rate decreased, however, which appears to stem from the difficulty of learning correct answer datapoints that are hard to distinguish from incorrect answer datapoints. In addition, as there were many erroneous detections, learning of both correct and incorrect answers appeared insufficient, and hence an improvement of the learning method was necessary. We also attempted learning with twice and with eight times as many incorrect answer datapoints as correct answer datapoints, but the accuracy was low. This appears to be because variations of incorrect answers cannot be learned if there are too few incorrect answer datapoints, while, conversely, if there is an excess of incorrect answers, the learning accuracy is high only to an extent. Therefore, we concluded that an adequate number of incorrect answer datapoints is four times the number of correct answer datapoints.
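The sampling ratios compared in this section can be reproduced with a short sketch. The helper `sample_negatives` and its parameters are hypothetical; only the counts (44,564 correct answers, 178,256 incorrect answers at a ratio of four) are taken from the text.

```python
import random


def sample_negatives(positives, negatives, ratio=4, seed=0):
    """Randomly draw `ratio` times as many negatives as positives.

    Sampling is without replacement; if fewer negatives are available
    than requested, all of them are returned in random order.
    """
    rng = random.Random(seed)
    n = min(len(negatives), ratio * len(positives))
    return rng.sample(negatives, n)


# The count reported in the text for a ratio of four:
print(4 * 44564)  # 178256

# Toy example: 10 positives, 100 candidate negatives.
chosen = sample_negatives(list(range(10)), list(range(100, 200)))
print(len(chosen))  # 40
```

Fixing the random seed keeps the comparison between ratios repeatable, which matters when the differences between trials are small.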

5.4 Study on the Extraction Method for Learning Data

From preliminary experiments, we found that some of the correct and incorrect answers are difficult to distinguish. In the preliminary experiments, a short laugh was regarded as incorrect, so some of the incorrect answers actually included laughter. For that reason, we excluded the portion of incorrect answer datapoints that contained laughter. We then compared the accuracy of two approaches: learning with selectively extracted incorrect answer data that is difficult to distinguish from laughter, and learning with correct answer data limited to samples that can readily be distinguished. “Learning by selectively extracting incorrect answer data” extracts incorrect answer data using the deviation from the average of the measured values in each twenty-minute measurement. When laughter occurs, the pressure value drops sharply, so the deviation from the mean tends to be larger for correct answer data. Therefore, half of the incorrect answer data was extracted from each measurement in descending order of deviation from the average value, and the remaining half was extracted randomly. By contrast, in “learning by selectively extracting correct answer data”, samples with a large deviation from the average in which laughing had continued for more than a second were chosen as the correct answers. This serves to clarify the characteristics of correct answer data by restricting the set of correct answers to those that can be readily distinguished. Incorrect answer data was randomly extracted. Both learning runs appeared to be near convergence after 1,500 epochs. Table 4 reports the accuracy of the classification results for the verification data at that point. The accuracy is high when correct answer data is selected selectively.

Moreover, the relevance rate became low when incorrect answer data was extracted selectively. By selectively extracting incorrect answer data for learning, we expected to distinguish incorrect answers from laughter with high accuracy. However, the relevance rate decreased, which suggests that the difference is difficult to find in the current data set. For learning with selectively extracted correct answers, the accuracy rate likely improved because correct answers that are difficult to distinguish from incorrect ones were removed.

Table 4. Classification result of verification data using selectively extracted data sets
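The extraction of incorrect answer data described in this section, with half taken in descending order of deviation from the average and half taken at random, can be sketched as below. The function name, the use of the middle value of each window to measure deviation, and the absolute difference as the deviation measure are assumptions; the text specifies only that the deviation from the average of each twenty-minute measurement was used.

```python
import random


def select_negatives(negatives, n, session_mean, seed=0):
    """Pick n incorrect-answer windows: half hard, half random.

    negatives: list of windows (lists of floats) from one measurement.
    session_mean: average of the measured values in that measurement.
    """
    rng = random.Random(seed)

    def deviation(window):
        # Assumed measure: distance of the middle value from the mean.
        return abs(window[len(window) // 2] - session_mean)

    ranked = sorted(negatives, key=deviation, reverse=True)
    n_hard = n // 2
    hard = ranked[:n_hard]            # largest deviations first
    rest = ranked[n_hard:]
    easy = rng.sample(rest, min(n - n_hard, len(rest)))
    return hard + easy


windows = [[float(i)] * 3 for i in range(10)]
picked = select_negatives(windows, 4, session_mean=0.0)
print(picked[:2])  # [[9.0, 9.0, 9.0], [8.0, 8.0, 8.0]]
```

Windows with a large deviation resemble the sharp pressure drop of laughter, so including them deliberately exposes the classifier to the incorrect answers that are most easily confused with laughs.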

5.5 Evaluation of Learning Result

Because the correct answer data set was skewed toward large pressure changes, laughter accompanied by only a modest pressure change may not have been detected. Therefore, we use this model to evaluate correct and incorrect detection per laugh rather than the detection accuracy for each datapoint of the classification result. As the preliminary experiment showed that detection results deviate slightly in time and that short laughs cannot be detected, we regard a laugh as detected if a detection falls within 0.5 s before or after the laughing section. In addition, laughter lasting less than a second was labelled as an incorrect answer. Part of the corrected classification results is shown in Fig. 8.

Fig. 8. Part of laugh classification result of 6 scenes
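The per-laugh evaluation in Sect. 5.5 can be illustrated with the sketch below, which counts a laugh as detected when a detected section lies within 0.5 s of it and ignores laughs shorter than one second. The function name and the interval representation are hypothetical, and how the study scored a detection that coincides with a short laugh is not stated; in this sketch such a detection counts as erroneous.

```python
def match_events(laughs, detections, tol=0.5, min_len=1.0):
    """Event-level scoring with a time tolerance.

    laughs, detections: lists of (start_s, end_s) intervals in seconds.
    Returns (hits, misses, false_detections).
    """
    # Laughs shorter than min_len are not counted as laughter.
    truth = [(s, e) for s, e in laughs if e - s >= min_len]

    def close(laugh, det):
        # True if det overlaps the laugh widened by tol on both sides.
        return laugh[0] - tol <= det[1] and det[0] <= laugh[1] + tol

    hits = sum(any(close(l, d) for d in detections) for l in truth)
    false_det = sum(not any(close(l, d) for l in truth) for d in detections)
    return hits, len(truth) - hits, false_det


laughs = [(10.0, 12.0), (20.0, 21.5), (30.0, 30.4)]
detections = [(12.3, 12.8), (25.0, 26.0)]
print(match_events(laughs, detections))  # (1, 1, 1)
```

In the example, the first detection starts 0.3 s after the first laugh ends and therefore counts as a hit, the second laugh is missed, and the second detection matches no laugh.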

5.6 Discussion

Of the 250 laughs included in the six scenes in the verification data, we were able to detect 189 (75.6%). By contrast, the false positive rate was 34.1%, or 86 out of 252 occurrences. The laughter episodes that could not be detected were short and often had a low divergence from the average. This is thought to be due to narrowing down the correct answers to those that can readily be distinguished. However, as laughter that causes large stomach movements can be detected, it can be considered that laughter caused by amusement can, on the whole, be roughly detected. As can be seen in the figure below, many erroneous detections occur during meals and shopping. In the shopping data, it is difficult to distinguish noise due to walking from laughter. In this learning, we estimate the movement of the upper body from the inclination of the Android device, but considering only the inclination may be insufficient to distinguish noise from laughter during walking. Erroneous detection during a meal is thought to stem from insufficient learning of body inclination. In both the board game and meal scenes, subjects are in a forward-leaning posture, but the measured inclination is forward in some cases and backward in others. This is thought to be due to the height of the desk in the measurement environment. As shown in the figure below, when the desk is high, the body is straight, and as the pressure value increases, the Android device also tilts forward. When the desk is low, the back is bent, and as the pressure value increases, the Android device tilts backward. In the meal scene, the inclination in the backward direction was measured only once in the learning data. Therefore, it can be considered that the characteristics of the pressure change due to the forward-leaning posture could not be learned satisfactorily, leading to erroneous detection.
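The two rates quoted at the start of this discussion follow directly from the reported counts, as the short check below shows. The variable names are illustrative, and reading the 252 occurrences as the total number of detected sections is an interpretation of the text.

```python
detected, total_laughs = 189, 250      # laughs detected / laughs present
false_det, total_det = 86, 252         # erroneous detections / occurrences

detection_rate = detected / total_laughs
false_rate = false_det / total_det

print(f"{detection_rate:.1%}")  # 75.6%
print(f"{false_rate:.1%}")      # 34.1%
```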

6 Conclusion

In this study, we presented a wearable pressure measurement system using a textile sensor and conducted an experiment in staged situations where laughter occurs. We then attempted to detect laughter by deep learning. The results showed that laughter could be roughly detected in measurement data that include various movements of daily life. However, the existence of conditions conducive to erroneous detections, such as having a meal or walking, was also confirmed. To properly implement a life log of laughter in the future, erroneous detection must be reduced by improving the measurement method for inclination and strengthening the learning data. By raising the accuracy of laughter detection, we may be in a position to clarify the relationship between the occurrence of laughter and health and well-being. Moreover, if the daily amount of laughter could be assessed objectively, it could be expected to serve as a quantitative index of QOL and as a prompt for reflecting on one's own life.