Abstract
Open data and data sets are emerging in human activity recognition (HAR) because of their importance in application areas such as improving people's lives, enabling informed care decisions, solving real-world problems, and choosing the best HAR approaches. Curating and sharing open data sets is challenging because the shared data often lacks metadata and a complete description. Properly curated data sets are easier to recognise, obtain and reuse, which helps HAR research make progress. In this paper, we propose a conceptual framework that describes the open data set lifecycle as four phases: construction, sharing, finding, and using. We also explore open issues and challenges related to HAR data sets reported in the published literature. On this basis, we present an approach that automatically extracts metadata through web scraping of HAR data sets and then applies a natural language processing (NLP) pipeline to detect the metadata of each data set. Using the retrieved metadata, we show how comparisons can be performed under different scenarios, which can help evaluate data set quality and identify areas for improvement in data set curation. This work will assist the HAR research community in better understanding the open data set lifecycle and how data set quality can be improved.
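The scrape-then-extract idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: simple regular-expression rules stand in for the NLP step, only the Python standard library is used, and the sample page, the field names (`subjects`, `sampling_rate_hz`, `sensors`), the sensor vocabulary and the helper functions are all hypothetical choices made for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce a scraped page to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical vocabulary; a real pipeline would need a much richer one.
SENSOR_TERMS = ("accelerometer", "gyroscope", "magnetometer")


def extract_metadata(text):
    """Rule-based stand-in for the NLP step: pull a few metadata fields."""
    lowered = text.lower()
    subjects = re.search(r"(\d+)\s+(?:subjects|participants|volunteers)", lowered)
    rate = re.search(r"(\d+(?:\.\d+)?)\s*hz", lowered)
    return {
        "subjects": int(subjects.group(1)) if subjects else None,
        "sampling_rate_hz": float(rate.group(1)) if rate else None,
        "sensors": [s for s in SENSOR_TERMS if s in lowered],
    }


def missing_field(records, field):
    """Names of data sets whose metadata has no (or an empty) value for a field."""
    return sorted(name for name, meta in records.items() if not meta.get(field))


# Hypothetical data set description page, used instead of a live request.
SAMPLE_PAGE = """
<html><head><style>p {color: red}</style></head><body>
<h1>Example HAR data set</h1>
<p>Recorded from 10 subjects wearing an accelerometer and a gyroscope,
sampled at 50 Hz.</p>
</body></html>
"""

if __name__ == "__main__":
    records = {
        "example": extract_metadata(html_to_text(SAMPLE_PAGE)),
        "undocumented": extract_metadata("No description provided."),
    }
    print(records["example"])
    print(missing_field(records, "subjects"))
```

Comparing records by which fields are missing is one simple way such extracted metadata could feed a quality comparison between data sets; the scenarios in the paper itself may differ.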
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Alam, G., McChesney, I., Nicholl, P., Rafferty, J. (2023). An Approach to Extract and Compare Metadata of Human Activity Recognition (HAR) Data Sets. In: Bravo, J., Ochoa, S., Favela, J. (eds) Proceedings of the International Conference on Ubiquitous Computing & Ambient Intelligence (UCAmI 2022). UCAmI 2022. Lecture Notes in Networks and Systems, vol 594. Springer, Cham. https://doi.org/10.1007/978-3-031-21333-5_71
DOI: https://doi.org/10.1007/978-3-031-21333-5_71
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21332-8
Online ISBN: 978-3-031-21333-5
eBook Packages: Intelligent Technologies and Robotics (R0)