Abstract
Information from social media can be leveraged by social scientists to support effective decision making. However, such data sources are often characterised by high volumes and noisy information; data analysis should therefore always be preceded by a data preparation phase. Designing and testing data preparation pipelines requires considering requirements on the cost, time, and quality of data extraction. In this work, we aim to propose a methodology for modeling crowd-enhanced data analysis pipelines using a goal-oriented approach. The pipelines include both automatic and human-related tasks, and the methodology suggests the kind of components to include, their order, and their parameters, while balancing the trade-off between cost, time, and quality of the results.
Acknowledgements
This work was funded by the European Commission H2020 Project Crowd4SDG, #872944.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cappiello, C., Pernici, B., Vitali, M. (2021). Modeling Adaptive Data Analysis Pipelines for Crowd-Enhanced Processes. In: Ghose, A., Horkoff, J., Silva Souza, V.E., Parsons, J., Evermann, J. (eds) Conceptual Modeling. ER 2021. Lecture Notes in Computer Science, vol 13011. Springer, Cham. https://doi.org/10.1007/978-3-030-89022-3_3
DOI: https://doi.org/10.1007/978-3-030-89022-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89021-6
Online ISBN: 978-3-030-89022-3
eBook Packages: Computer Science, Computer Science (R0)