A hierarchical parallel fusion framework for egocentric ADL recognition based on discernment frame partitioning and belief coarsening

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Recently, egocentric activity recognition has become a major research area in pattern recognition and artificial intelligence because of its high significance in potential applications such as medical care, rehabilitation, and smart homes/offices. In this study, we develop a hierarchical parallel multimodal fusion framework for the recognition of egocentric activities of daily living (ADL). This framework is based on the Dezert–Smarandache theory and is constructed around three modalities: location, motion, and vision data from a wearable hybrid sensor system. The reciprocal distance and a trained support vector machine classifier are used to form the basic belief assignments (BBAs) for location and motion. For the vision data, composed of egocentric photo streams, a well-trained convolutional neural network is utilized to produce a set of textual tags, and entropy-based statistics of these tags are used to construct the vision BBA. Discernment frame partitioning and belief coarsening are adopted for the hierarchical fusion of the three BBA functions from different ADL levels. Experiments on real-life multimodal egocentric activity datasets show that the recognition accuracy of the proposed fusion method is significantly higher than that of methods based on a single modality or on modality combinations. In addition, our method achieves better adaptability and scalability.


Data availability

The eButton dataset (including both egocentric photo streams and IMU sensor data) used to support the findings of this study is available from the corresponding author upon request.


Acknowledgments

The authors would like to thank all the participants for their significant contributions to this research study, as well as Clarifai for providing its online service.

Funding

This work was supported in part by the National Institutes of Health (NIH) (Grant nos. R01CA165255, R56DK113819) of the United States; the National Natural Science Foundation of China (Grant nos. 61601156, 61701146, 61871164); the Key Research and Development Program of Zhejiang Province (Grant no. 2020C03098); and the Fundamental Research Funds for the Universities of Zhejiang Province (Grant no. GK199900299012-024).

Author information


Corresponding author

Correspondence to Mingui Sun.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


In this appendix, we show how the vision basic belief assignment (BBA) function \(m_{V} (\bullet)\) is transformed by BBA coarsening to the location and motion levels to obtain the coarsened BBA functions \(m_{V}^{L} (\bullet)\) and \(m_{V}^{M} (\bullet)\).

In Sect. 4.4, we gave the discernment frames based on the most appropriate activity of daily living (ADL) classification mode for each of the three modalities, i.e., location, motion, and vision. We rewrite them here as

$$\left\{ \begin{aligned} \Theta_{L} &= \{\text{"HM"},\ \text{"WP"},\ \text{"OP"}\} \\ \Theta_{M} &= \{\text{"LY"},\ \text{"SD"},\ \text{"ST"},\ \text{"WK"}\} \\ \Theta_{V} &= \{\text{"CN"},\ \text{"CU"},\ \text{"ET"},\ \text{"EM"},\ \text{"MT"},\ \text{"RD"},\ \text{"SP"},\ \text{"SL"},\ \text{"TK"},\ \text{"TU"},\ \text{"TP"},\ \text{"WO"},\ \text{"WU"},\ \text{"TV"},\ \text{"WT"}\} \end{aligned} \right.$$
(13)

where \(\Theta_{L}\), \(\Theta_{M}\), and \(\Theta_{V}\) represent the location discernment frame, motion frame, and vision frame, respectively. In terms of hierarchy, the activities in \(\Theta_{L}\) and \(\Theta_{M}\) are high-level activities, and the activities in \(\Theta_{V}\) can be regarded as low-level refinements of the activities in \(\Theta_{L}\) and \(\Theta_{M}\).
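
For concreteness, the three frames in (13) can also be written as plain Python sets. The following sketch is only an illustration using the activity abbreviations above (e.g., CU for computer use and RD for reading, as explained later in this appendix); the variable names are ours, not from the paper.

```python
# Illustrative sketch of the three discernment frames in Eq. (13).
THETA_L = {"HM", "WP", "OP"}           # location frame (high-level)
THETA_M = {"LY", "SD", "ST", "WK"}     # motion frame (high-level)
THETA_V = {"CN", "CU", "ET", "EM", "MT", "RD", "SP", "SL",
           "TK", "TU", "TP", "WO", "WU", "TV", "WT"}  # vision frame (low-level ADLs)
```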

Because some of the activities in \(\Theta_{V}\) are ambiguous when they are assigned to the high-level frames, they cannot meet the frame partitioning requirement in Eq. (4). For example, “computer use” can occur both at home and in the workplace, and “reading” can be performed both when sitting and standing. Therefore, when \(\Theta_{V}\) is mapped to \(\Theta_{L}\) or \(\Theta_{M}\), the ambiguous activities are further refined. The refinements \(\Omega_{V}^{L}\) and \(\Omega_{V}^{M}\) corresponding to \(\Theta_{L}\) and \(\Theta_{M}\) can be written as

$$\begin{aligned} \Omega_{V}^{L} = \{ &\text{"CN"},\ \text{"CU}_{\text{HM}}\text{"},\ \text{"CU}_{\text{WP}}\text{"},\ \text{"ET}_{\text{HM}}\text{"},\ \text{"ET}_{\text{OP}}\text{"},\ \text{"EM"},\ \text{"MT"},\ \text{"RD}_{\text{HM}}\text{"},\ \text{"RD}_{\text{WP}}\text{"},\ \text{"SP"},\ \text{"SL"}, \\ &\text{"TK}_{\text{HM}}\text{"},\ \text{"TK}_{\text{WP}}\text{"},\ \text{"TU}_{\text{HM}}\text{"},\ \text{"TU}_{\text{WP}}\text{"},\ \text{"TU}_{\text{OP}}\text{"},\ \text{"TP"},\ \text{"WO"},\ \text{"WU}_{\text{HM}}\text{"},\ \text{"WU}_{\text{OP}}\text{"},\ \text{"TV"},\ \text{"WT}_{\text{HM}}\text{"},\ \text{"WT}_{\text{WP}}\text{"} \} \end{aligned}$$
(14)
$$\begin{aligned} \Omega_{V}^{M} = \{ &\text{"CN"},\ \text{"CU"},\ \text{"ET}_{\text{SD}}\text{"},\ \text{"ET}_{\text{ST}}\text{"},\ \text{"EM"},\ \text{"MT"},\ \text{"RD}_{\text{SD}}\text{"},\ \text{"RD}_{\text{ST}}\text{"},\ \text{"SP"},\ \text{"SL"},\ \text{"TK}_{\text{SD}}\text{"}, \\ &\text{"TK}_{\text{ST}}\text{"},\ \text{"TU}_{\text{SD}}\text{"},\ \text{"TU}_{\text{ST}}\text{"},\ \text{"TU}_{\text{LY}}\text{"},\ \text{"TP"},\ \text{"WO"},\ \text{"WU"},\ \text{"TV}_{\text{SD}}\text{"},\ \text{"TV}_{\text{ST}}\text{"},\ \text{"WT"} \} \end{aligned}$$
(15)

where the subscripts denote the refined activity; for example, \(CU_{\text{HM}}\) and \(CU_{\text{WP}}\) denote computer use at home and in the workplace, respectively, and \(RD_{\text{SD}}\) and \(RD_{\text{ST}}\) denote reading while sitting and standing, respectively. After the refinement sets have been acquired, we can define the mappings \(\omega_{L \to V} : D^{\Theta_{L}} \to D^{\Omega_{V}^{L}}\) and \(\omega_{M \to V} : D^{\Theta_{M}} \to D^{\Omega_{V}^{M}}\), which partition the refinement frames according to the high-level frames:

$$\left\{ \begin{aligned} \omega_{L \to V}(\{\text{"HM"}\}) &= \{\text{"CN"},\ \text{"CU}_{\text{HM}}\text{"},\ \text{"ET}_{\text{HM}}\text{"},\ \text{"EM"},\ \text{"RD}_{\text{HM}}\text{"},\ \text{"SL"},\ \text{"TK}_{\text{HM}}\text{"},\ \text{"TU}_{\text{HM}}\text{"},\ \text{"WO}_{\text{HM}}\text{"},\ \text{"WU}_{\text{HM}}\text{"},\ \text{"TV}_{\text{HM}}\text{"}\} \\ \omega_{L \to V}(\{\text{"WP"}\}) &= \{\text{"CU}_{\text{WP}}\text{"},\ \text{"ET}_{\text{WP}}\text{"},\ \text{"MT"},\ \text{"RD}_{\text{WP}}\text{"},\ \text{"TK}_{\text{WP}}\text{"},\ \text{"TU}_{\text{WP}}\text{"},\ \text{"WO}_{\text{WP}}\text{"},\ \text{"WT"}\} \\ \omega_{L \to V}(\{\text{"OP"}\}) &= \{\text{"ET}_{\text{OP}}\text{"},\ \text{"SP"},\ \text{"TK}_{\text{OP}}\text{"},\ \text{"TU}_{\text{OP}}\text{"},\ \text{"TP"},\ \text{"WO}_{\text{OP}}\text{"},\ \text{"WU}_{\text{OP}}\text{"},\ \text{"TV}_{\text{OP}}\text{"}\} \end{aligned} \right.$$
(16)
$$\left\{ \begin{aligned} \omega_{M \to V}(\{\text{"LY"}\}) &= \{\text{"SL"},\ \text{"TU}_{\text{LY}}\text{"}\} \\ \omega_{M \to V}(\{\text{"SD"}\}) &= \{\text{"CU"},\ \text{"ET}_{\text{SD}}\text{"},\ \text{"EM}_{\text{SD}}\text{"},\ \text{"MT"},\ \text{"RD}_{\text{SD}}\text{"},\ \text{"TK}_{\text{SD}}\text{"},\ \text{"TU}_{\text{SD}}\text{"},\ \text{"TP}_{\text{SD}}\text{"},\ \text{"TV"},\ \text{"WT"}\} \\ \omega_{M \to V}(\{\text{"ST"}\}) &= \{\text{"CN"},\ \text{"ET}_{\text{ST}}\text{"},\ \text{"EM}_{\text{ST}}\text{"},\ \text{"RD}_{\text{ST}}\text{"},\ \text{"SP}_{\text{ST}}\text{"},\ \text{"TK}_{\text{ST}}\text{"},\ \text{"TU}_{\text{ST}}\text{"},\ \text{"TP}_{\text{ST}}\text{"},\ \text{"WU"}\} \\ \omega_{M \to V}(\{\text{"WK"}\}) &= \{\text{"SP}_{\text{WK}}\text{"},\ \text{"WO"}\} \end{aligned} \right.$$
(17)
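
The partition mappings (16) and (17) admit the same computational view: each high-level activity maps to its block of (possibly refined) low-level activities. The sketch below is only an illustrative transcription of the two equations, with a refined activity encoded as a "base_subscript" string such as "CU_HM"; the dictionary names OMEGA_L_TO_V and OMEGA_M_TO_V are our own.

```python
# Illustrative transcription of the partition mappings in Eqs. (16) and (17).
OMEGA_L_TO_V = {
    "HM": {"CN", "CU_HM", "ET_HM", "EM", "RD_HM", "SL",
           "TK_HM", "TU_HM", "WO_HM", "WU_HM", "TV_HM"},
    "WP": {"CU_WP", "ET_WP", "MT", "RD_WP", "TK_WP", "TU_WP", "WO_WP", "WT"},
    "OP": {"ET_OP", "SP", "TK_OP", "TU_OP", "TP", "WO_OP", "WU_OP", "TV_OP"},
}
OMEGA_M_TO_V = {
    "LY": {"SL", "TU_LY"},
    "SD": {"CU", "ET_SD", "EM_SD", "MT", "RD_SD",
           "TK_SD", "TU_SD", "TP_SD", "TV", "WT"},
    "ST": {"CN", "ET_ST", "EM_ST", "RD_ST", "SP_ST",
           "TK_ST", "TU_ST", "TP_ST", "WU"},
    "WK": {"SP_WK", "WO"},
}
```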

According to (4), BBA coarsening can be performed on the basis of the frame partition mapping \(\omega\). Let \(m_{V}^{L} (\bullet)\) be a BBA function defined on frame \(\Omega_{V}^{L}\). Taking \(m_{V}^{L} ({\text{"}}HM{\text{"}})\) as an example, we substitute the frame partition mapping \(\omega_{L \to V} (\{ {\text{"HM"}}\} )\) into (4) to obtain

$$\begin{aligned} m_{V}^{L}(\text{"HM"}) &= m_{\Theta_{L}}\bigl(\omega_{L \to V}(\{\text{"HM"}\})\bigr) \\ &= \max\{\, m_{V}^{L}(\beta) \mid \beta \subseteq \omega_{L \to V}(\{\text{"HM"}\})\ \text{and}\ \beta \in \Theta_{L}^{*} \,\} \\ &= \max\{ m_{V}^{L}(\text{"CN"}),\ m_{V}^{L}(\text{"CU}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"ET}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"EM"}),\ m_{V}^{L}(\text{"RD}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"SL"}), \\ &\qquad\quad\; m_{V}^{L}(\text{"TK}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"TU}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"WO}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"WU}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"TV}_{\text{HM}}\text{"}) \} \end{aligned}$$
(18)

As \(\Omega_{V}^{L}\) is obtained by refining some activities in \(\Theta_{V}\), \(\Theta_{V}\) is a coarsening of \(\Omega_{V}^{L}\); therefore, \(\Theta_{V}\) is compatible with \(\Omega_{V}^{L}\). According to the description in Sect. 3, their BBA functions are consistent, and therefore an unrefined activity such as CN satisfies the following relation:

$$m_{V}^{L} ({\text{"CN"}}) = m_{V} ({\text{"CN"}})$$
(19)

Furthermore, for refined activities such as \(CU_{\text{HM}}\) and \(CU_{\text{WP}}\), the coarsening properties of the BBA function mean that the following expression should be satisfied:

$$m_{V}(\text{"CU"}) = m_{V}^{L}(\text{"CU"}) = m_{\Theta_{V}}(\{\text{"CU}_{\text{HM}}\text{"},\ \text{"CU}_{\text{WP}}\text{"}\}) = \max\{ m_{V}^{L}(\text{"CU}_{\text{HM}}\text{"}),\ m_{V}^{L}(\text{"CU}_{\text{WP}}\text{"}) \}$$
(20)

From the discussion in Sect. 4.3, we can see that the proposed vision-based activity recognition method does not distinguish the locations at which activities occur. In other words, the same activity occurring at different locations has the same belief assignment, that is,

$$m_{V}^{L}(\text{"CU}_{\text{HM}}\text{"}) = m_{V}^{L}(\text{"CU}_{\text{WP}}\text{"})$$
(21)

Combining (20) and (21), we obtain

$$m_{V}^{L}(\text{"CU}_{\text{HM}}\text{"}) = m_{V}^{L}(\text{"CU}_{\text{WP}}\text{"}) = m_{V}(\text{"CU"})$$
(22)
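
As a quick numerical illustration with a hypothetical belief value, (19)–(22) state that every refined copy of an activity inherits the belief of its unrefined counterpart:

$$m_{V}(\text{"CU"}) = 0.55\;\;\Longrightarrow\;\;m_{V}^{L}(\text{"CU}_{\text{HM}}\text{"}) = m_{V}^{L}(\text{"CU}_{\text{WP}}\text{"}) = 0.55$$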

By treating the other refined and unrefined activities similarly and substituting relations such as (19) and (22) into (18), we obtain

$$\begin{aligned} m_{V}^{L}(\text{"HM"}) = \max\{ &m_{V}(\text{"CN"}),\ m_{V}(\text{"CU"}),\ m_{V}(\text{"ET"}),\ m_{V}(\text{"EM"}),\ m_{V}(\text{"RD"}),\ m_{V}(\text{"SL"}),\ m_{V}(\text{"TK"}), \\ &m_{V}(\text{"TU"}),\ m_{V}(\text{"WO"}),\ m_{V}(\text{"WU"}),\ m_{V}(\text{"TV"}) \} \end{aligned}$$
(23)

Similarly, \(m_{V}^{L} ({\text{"WP"}})\) and \(m_{V}^{L} ({\text{"OP"}})\) are given by

$$\begin{aligned} m_{V}^{L}(\text{"WP"}) &= \max\{ m_{V}(\text{"CU"}),\ m_{V}(\text{"ET"}),\ m_{V}(\text{"MT"}),\ m_{V}(\text{"RD"}),\ m_{V}(\text{"TK"}),\ m_{V}(\text{"TU"}),\ m_{V}(\text{"WO"}),\ m_{V}(\text{"WT"}) \} \\ m_{V}^{L}(\text{"OP"}) &= \max\{ m_{V}(\text{"ET"}),\ m_{V}(\text{"SP"}),\ m_{V}(\text{"TK"}),\ m_{V}(\text{"TU"}),\ m_{V}(\text{"TP"}),\ m_{V}(\text{"WO"}),\ m_{V}(\text{"WU"}),\ m_{V}(\text{"TV"}) \} \end{aligned}$$
(24)

Equations (23) and (24) give the coarsened BBA \(m_{V}^{L} ( \bullet )\) at the location level. They indicate that, by ignoring the differences among the belief assignments of the same activity at different locations, the belief assigned to each element of \(\Theta_{L}\) equals the maximum belief assigned by \(m_{V} ( \bullet )\) to the elements of the corresponding partition block \(\omega_{L \to V} ( \bullet )\).
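
Because of this property, the whole coarsening step reduces to a few lines of code. The sketch below assumes the dictionary encoding introduced above; the function name coarsen_bba and the default belief of 0.0 for activities absent from the vision BBA are our own choices, not part of the paper.

```python
def coarsen_bba(m_v, partition):
    """Coarsen a vision BBA (dict: base activity -> belief mass) onto a high-level frame.

    `partition` maps each high-level activity to its block of (possibly refined)
    low-level activities, e.g. OMEGA_L_TO_V above. A refined name such as "CU_HM"
    is reduced to its base name "CU" before the lookup, reflecting Eqs. (19)-(22).
    """
    def base(activity):
        # "CU_HM" -> "CU"; unrefined names such as "CN" are returned unchanged.
        return activity.split("_")[0]

    # Eqs. (23)-(25): the coarsened belief is the maximum belief over the block.
    return {high: max(m_v.get(base(a), 0.0) for a in block)
            for high, block in partition.items()}
```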

Let \(m_{V}^{M} ( \bullet )\) be the BBA function defined on frame \(\Omega_{V}^{M}\). By applying almost the same procedure as for the coarsening of \(m_{V}^{L} ( \bullet )\), a similar conclusion can be drawn: the belief assigned to each element of \(\Theta_{M}\) equals the maximum belief assigned by \(m_{V} ( \bullet )\) to the elements of the corresponding partition block \(\omega_{M \to V} ( \bullet )\), that is,

$$\left\{ \begin{aligned} m_{V}^{M}(\text{"LY"}) &= \max\{ m_{V}(\text{"SL"}),\ m_{V}(\text{"TU"}) \} \\ m_{V}^{M}(\text{"SD"}) &= \max\{ m_{V}(\text{"CU"}),\ m_{V}(\text{"ET"}),\ m_{V}(\text{"EM"}),\ m_{V}(\text{"MT"}),\ m_{V}(\text{"RD"}),\ m_{V}(\text{"TK"}),\ m_{V}(\text{"TU"}),\ m_{V}(\text{"TP"}),\ m_{V}(\text{"TV"}),\ m_{V}(\text{"WT"}) \} \\ m_{V}^{M}(\text{"ST"}) &= \max\{ m_{V}(\text{"CN"}),\ m_{V}(\text{"ET"}),\ m_{V}(\text{"EM"}),\ m_{V}(\text{"RD"}),\ m_{V}(\text{"SP"}),\ m_{V}(\text{"TK"}),\ m_{V}(\text{"TU"}),\ m_{V}(\text{"TP"}),\ m_{V}(\text{"WU"}) \} \\ m_{V}^{M}(\text{"WK"}) &= \max\{ m_{V}(\text{"SP"}),\ m_{V}(\text{"WO"}) \} \end{aligned} \right.$$
(25)
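
A usage example with purely hypothetical vision beliefs, reusing the illustrative coarsen_bba, OMEGA_L_TO_V, and OMEGA_M_TO_V sketched above:

```python
# Hypothetical vision beliefs, for illustration only.
m_v = {"CU": 0.55, "RD": 0.20, "SL": 0.10, "WO": 0.05, "TV": 0.10}

m_v_L = coarsen_bba(m_v, OMEGA_L_TO_V)   # location level, cf. Eqs. (23)-(24)
m_v_M = coarsen_bba(m_v, OMEGA_M_TO_V)   # motion level, cf. Eq. (25)

print(m_v_L["HM"])   # 0.55, dominated by "CU"
print(m_v_M["LY"])   # 0.10, the maximum over the block {"SL", "TU_LY"}
```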


About this article


Cite this article

Yu, H., Jia, W., Zhang, L. et al. A hierarchical parallel fusion framework for egocentric ADL recognition based on discernment frame partitioning and belief coarsening. J Ambient Intell Human Comput 12, 1693–1715 (2021). https://doi.org/10.1007/s12652-020-02241-2


  • DOI: https://doi.org/10.1007/s12652-020-02241-2
