1 Introduction

Wearable devices have become important tools for data-driven information services such as healthcare, medical care, and education. For example, smart watches (e.g., Fitbit [1], Apple Watch [2]) can continuously sense users’ activities and model their daily behaviors from the sensed data to provide personalized information services. Currently, most wearable devices are used to sense users’ physical status, e.g., body movements and heart rate, and largely ignore the background of those behaviors, namely context information. However, to understand human behaviors more deeply, context information such as the surrounding environment and ongoing activities is essential. If context information can be obtained with wearable devices, new services based on a deeper understanding of behavior become possible.

For example, we have developed a wearable telecare system using smart watches and wireless biosensors [3]. The service enables family members to share the physiological information of cared persons in a peer-to-peer manner. If family members can also obtain the context information of their cared persons, it helps them interpret the meaning of the physiological data.

In this paper, we describe how to extract auditory context information from ambient sound data sensed by smart watches. Although mobile audio sensing using smartphones has been studied [4, 5], studies that use wearable devices to extract context information are few. As research on wearable devices is still immature compared to smartphones, it is necessary to investigate the potential of wearable sensing.

This paper is organized as follows: first, we present the prototype of our wearable ambient sound sensing system based on smart watches. Then, we describe an analysis of the sound data sensed by the system. As the first step of the study, we applied non-negative matrix factorization (NMF) [6] and k-means clustering to the unsupervised segmentation of the time-series data. Finally, based on the implementation and analysis, we discuss issues and future work in sensing and sharing context information with wearable devices.

2 Wearable Ambient Sound Sensing System

We describe the prototype of our wearable ambient sound sensing system based on smart watches.

2.1 Wearable Device

The wearable ambient sound sensing system uses a commercially available smart watch, the Polar M600 [7]. We implemented the system as an application on Wear OS by Google™, the smartwatch OS.

The system provides sensing and analysis functions to extract auditory features from ambient sound data. A graphical user interface (GUI) to start/stop the sensing process is also implemented. The GUI, built with MPAndroidChart [8], shows the recording status of the smart watch, such as the current auditory feature vector. Communication facilities to share the sensed data with other users are not yet implemented.

2.2 Auditory Sensing Data

The analysis pipeline of the prototype system consists of three processes: sensing ambient sound, converting it to auditory feature vectors, and recording them.

In the sensing process, the device’s microphone captures ambient sound in 16 kHz, 16-bit, monaural audio format. In the converting process, the device immediately converts the data to auditory feature vectors: Mel-frequency cepstrum coefficients (MFCC), pitch (F0), and sound pressure level (SPL). The system uses the fast Fourier transform (FFT) to obtain the power spectrum of the signal and calculates F0 and SPL from it. To derive the MFCC, it then applies transformations to the spectrum that reflect human auditory characteristics (e.g., the Mel scale and vocal tract characteristics). We implemented the sensing and converting processes using the open source library TarsosDSP [9]. In the recording process, the system stores the auditory data in internal memory, from which it can be retrieved over a USB connection.
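The core of the converting step can be illustrated with a minimal Python sketch. The prototype itself runs TarsosDSP on the watch; the frame length, Hann windowing, and the full-scale (dBFS) reference for SPL below are illustrative assumptions, and F0 and MFCC extraction are omitted.

```python
import numpy as np

def frame_features(frame, sample_rate=16000):
    """Power spectrum and sound pressure level of one audio frame."""
    # FFT of the Hann-windowed frame gives the power spectrum.
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    # SPL here is the RMS level in dB relative to full scale (assumption).
    rms = np.sqrt(np.mean(frame ** 2))
    spl = 20 * np.log10(rms + 1e-12)
    return spectrum, spl

# Example: a 1 kHz tone sampled at 16 kHz.
t = np.arange(1024) / 16000.0
frame = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
spectrum, spl = frame_features(frame)
peak_hz = np.argmax(spectrum) * 16000.0 / 1024  # spectral peak near 1 kHz
```

The same per-frame features, stacked over time, form the multi-dimensional time series analyzed in the next section.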

3 Extracting Context Information

Using our prototype system, we experimentally obtained auditory data of a user who came home from his office by bicycle. We then analyzed the data offline to investigate the feasibility of extracting context information.

In this study, we assume that auditory contexts are latent, continuous periods characterized by auditory features. For example, if a user stayed at home until noon and then moved to a hospital by car, the home, the car, and the hospital are contexts of the user because each can be characterized by its auditory features. As the auditory data recorded by the system is multi-dimensional time-series data, each context corresponds to a segment of the time series classified by auditory features.

3.1 Segmentation of Multi-dimensional Time-Series Data

As the first step of the study, we applied NMF and k-means clustering to the segmentation of the multi-dimensional time-series data.

Fig. 1. A heat map of feature vectors averaged every 30 s, obtained by the wearable sensing system.

Fig. 2. A heat map of the feature vectors derived by non-negative matrix factorization.

The obtained data is 32-dimensional time-series data representing the MFCC, SPL, and F0, with a length of 48,271 samples. First, we averaged the data over 30 s windows and normalized it so that it contained no negative values. The normalized data is shown as a heat map in Fig. 1, where the x-axis represents each feature of the data and the y-axis represents the time at which the data was recorded. Each cell corresponds to a feature value at one time; darker cells represent larger values and lighter cells smaller values.
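The averaging and normalization can be sketched as follows. This is a minimal illustration on random placeholder data; the number of feature rows per 30 s window is an assumption about the frame rate, and min-max scaling is one way (not necessarily the paper's) to make the data non-negative for NMF.

```python
import numpy as np

def preprocess(X, window):
    """Average a (T, 32) feature matrix over fixed-size windows and
    min-max normalize each feature to [0, 1], so the result satisfies
    NMF's non-negativity requirement."""
    T = (len(X) // window) * window          # drop the ragged tail
    avg = X[:T].reshape(-1, window, X.shape[1]).mean(axis=1)
    mins = avg.min(axis=0)
    rng = avg.max(axis=0) - mins
    return (avg - mins) / np.where(rng == 0.0, 1.0, rng)

# Placeholder standing in for the recorded 48,271 x 32 feature sequence.
X = np.random.default_rng(0).normal(size=(48271, 32))
Z = preprocess(X, window=30)                 # rows per window: an assumption
```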

As the data still contained artifacts caused by body movements, we applied NMF to factorize it, extracting salient features and reducing the artifacts. In this experiment, the dimension of the factorized data is 6. Figure 2 shows a heat map of the factorized data representing the change of the auditory features over time; the intensity of color indicates the degree of correlation with each factor.

NMF factorizes a matrix under non-negativity constraints. Its bases tend to represent local features of the data, namely “parts.” Because such parts correspond better to humans’ intuitive notions, they are useful for visualizing and understanding auditory contexts. The bases produced by other methods, such as principal component analysis and vector quantization, do not always correspond to such intuitive notions [10].
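A parts-based factorization of this kind can be reproduced with scikit-learn's NMF on the normalized matrix. This is a sketch on random placeholder data: the rank of 6 follows the paper, while the initializer and iteration count are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

# V (periods x 32 features) is approximated as W @ H, where the 6 rows
# of H are non-negative basis "parts" and each row of W holds one 30 s
# period's activation of those parts.
V = np.random.default_rng(0).random((1609, 32))   # placeholder data
model = NMF(n_components=6, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)   # (1609, 6) activations, as visualized in Fig. 2
H = model.components_        # (6, 32) basis parts
```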

We then classified each vector of the factorized data using k-means clustering (k = 4), based on the Euclidean distance between the 6 features of each period.
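The clustering step can be sketched as follows; placeholder activations stand in for the factorized data, while k = 4 and the Euclidean metric follow the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

W = np.random.default_rng(0).random((1609, 6))    # placeholder NMF output
# KMeans minimizes within-cluster Euclidean distances, matching the
# distance measure used for the 6 features of each period.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(W)
```

Each entry of `labels` assigns one 30 s period to one of four clusters, i.e., one of the candidate auditory contexts.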

Figure 3 is a line graph plotting the class labels, 0–3, assigned to each period along the time axis. Consecutive periods tend to be classified into the same cluster, which means that we succeeded in extracting temporally continuous acoustic features and segmenting the time-series data into four contextual periods. Moreover, the extracted segments roughly correspond to actual contexts, such as riding a bicycle and eating at home.

Fig. 3. Transitions of the auditory contexts classified by k-means clustering.
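Reading the segmentation out of a label sequence like the one in Fig. 3 amounts to collapsing it into contiguous runs. A small helper (hypothetical, not part of the prototype) illustrates this:

```python
def to_segments(labels):
    """Collapse a per-period label sequence into (label, start, end)
    runs; each run is one candidate auditory context."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[i - 1]:
            segments.append((labels[start], start, i))
            start = i
    return segments

print(to_segments([0, 0, 1, 1, 1, 3]))  # [(0, 0, 2), (1, 2, 5), (3, 5, 6)]
```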

4 Discussion and Future Work

This study is at a preliminary stage, so many issues remain in realizing a system that senses and shares auditory context information using wearable devices.

One issue is when and where context information is extracted from the sensed data and how it is shared. The analysis in this study was performed offline on a personal computer. To share context information directly on wearable devices, we need to design the overall architecture of the service system, including its communication, data processing, and user interactions. Designing an efficient architecture under the limited computational resources of wearable devices is future work.

Considering privacy, the system is not designed to record raw sound data, so it is difficult to validate the correctness of the extracted context information against raw recordings. To scale the prototype up to a practical service, more sophisticated unsupervised (or semi-supervised) approaches will be required.

5 Conclusion

We have shown the prototype of our wearable ambient sound sensing system based on smart watches. We experimentally obtained ambient sound data sensed by the system and analyzed it to extract context information. In the analysis, we formalized context extraction as unsupervised segmentation of multi-dimensional time-series data and applied NMF and k-means clustering to the segmentation. We confirmed that the segmented periods roughly corresponded to actual contexts.