research-article
Open access

DeepSQA: Understanding Sensor Data via Question Answering

Published: 18 May 2021

Abstract

The ubiquity of mobile, wearable, and IoT devices augments humans with a network of environmental sensors. These devices capture raw, time-series measurements of scalar physical phenomena. Deep learning methods have enabled high-level interpretations of this opaque raw sensory data, transforming it into human-digestible representations. However, interfacing such models with humans requires the flexibility to support the vast range of human inquiries about sensor data, whereas deep learning models are usually trained to perform fixed tasks, limiting their inference outputs to a predefined set of high-level labels.
To enable flexible inference, we introduce DeepSQA, a generalized Sensory Question Answering (SQA) framework that enables natural language questions about raw sensory data in distributed and heterogeneous IoT networks. Given a sensory data context and a natural language question about the data, the task is to provide an accurate natural language answer. In addition to DeepSQA, we create SQA-Gen, a software framework for generating SQA datasets from labeled source sensory data, and use SQA-Gen to generate OppQA, a dataset for benchmarking different SQA models. We evaluate DeepSQA across several state-of-the-art QA models, laying the foundation for future SQA research and identifying its open challenges. We further provide open-source implementations of the framework and the dataset generation tool, as well as access to the generated dataset, to help facilitate research on the SQA problem.
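To make the SQA task interface concrete, the sketch below shows a minimal sensory question-answering model in PyTorch: a 1-D CNN encodes the raw multichannel time series, a word embedding plus LSTM encodes the question, and the fused representation is classified over a fixed answer vocabulary. The PyTorch backend, all layer sizes, and the late-fusion design are illustrative assumptions and do not reproduce the exact DeepSQA architecture or its open-source implementation.

```python
# Hedged sketch of the SQA task interface: sensory context + question -> answer.
# Assumptions (not from the paper): PyTorch, late fusion, illustrative layer sizes.
import torch
import torch.nn as nn

class SQAModelSketch(nn.Module):
    def __init__(self, n_channels=6, vocab_size=1000, n_answers=20,
                 embed_dim=64, hidden_dim=128):
        super().__init__()
        # 1-D CNN summarizes the raw multichannel sensor time series.
        self.sensor_encoder = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Word embedding + LSTM summarize the natural-language question.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.question_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Concatenated representation is classified over a fixed answer set.
        self.classifier = nn.Linear(64 + hidden_dim, n_answers)

    def forward(self, sensor_data, question_tokens):
        # sensor_data: (batch, channels, time); question_tokens: (batch, n_words)
        s = self.sensor_encoder(sensor_data).squeeze(-1)      # (batch, 64)
        _, (h, _) = self.question_encoder(self.embed(question_tokens))
        q = h[-1]                                             # (batch, hidden_dim)
        return self.classifier(torch.cat([s, q], dim=1))      # answer logits

# Usage example: a 2 s window of 6-axis IMU data at 50 Hz and a 10-token question.
model = SQAModelSketch()
logits = model(torch.randn(1, 6, 100), torch.randint(0, 1000, (1, 10)))
print(logits.shape)  # torch.Size([1, 20])
```

Answering is treated here as classification over a closed answer set, mirroring how most VQA-style baselines pose the problem; open-ended natural language answers would require a decoder in place of the final linear layer.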


Cited By

  • (2024) Large language models for time series. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024), 8335-8343. DOI: 10.24963/ijcai.2024/921. Online publication date: 3 Aug 2024.
  • (2023) Conversational Interfaces in IoT Ecosystems: Where We Are, What Is Still Missing. In Proceedings of the 22nd International Conference on Mobile and Ubiquitous Multimedia, 279-293. DOI: 10.1145/3626705.3627775. Online publication date: 3 Dec 2023.



Published In

IoTDI '21: Proceedings of the International Conference on Internet-of-Things Design and Implementation
May 2021
288 pages
ISBN:9781450383547
DOI:10.1145/3450268
This work is licensed under a Creative Commons Attribution International 4.0 License.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 May 2021


Author Tags

  1. Neural networks
  2. Question answering
  3. Sensor data processing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • U.K. Ministry of Defence under Agreement
  • U.S. Army Research Laboratory

Conference

IoTDI '21


Bibliometrics & Citations


Article Metrics

  • Downloads (Last 12 months): 357
  • Downloads (Last 6 weeks): 40
Reflects downloads up to 22 Feb 2025

