Modeling Adaptive Data Analysis Pipelines for Crowd-Enhanced Processes

  • Conference paper
Conceptual Modeling (ER 2021)

Abstract

Social scientists can leverage information from social media to support effective decision making. However, such data sources are often characterised by high volumes and noisy information, so data analysis should always be preceded by a data preparation phase. Designing and testing data preparation pipelines requires considering requirements on the cost, time, and quality of data extraction. In this work, we propose a methodology for modeling crowd-enhanced data analysis pipelines using a goal-oriented approach. The pipelines include both automatic and human-related tasks, and the methodology suggests the kind of components to include, their order, and their parameters, while balancing the trade-off between cost, time, and quality of the results.
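The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the methodology of the paper: the component names, the numeric estimates, and the aggregation rules (costs and times add up, each component removes a fraction of the remaining noise) are all assumptions made for illustration, and ordering and parameter tuning are left out for brevity. The sketch enumerates candidate pipelines and keeps the one with the highest estimated quality that respects a cost bound and a time bound.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Component:
    """A pipeline step with hypothetical, illustrative estimates."""
    name: str
    cost: float     # assumed cost of running the step
    time: float     # assumed time needed by the step
    quality: float  # assumed fraction of remaining noise removed, in [0, 1]


def evaluate(pipeline: Sequence[Component]) -> Tuple[float, float, float]:
    """Return (total cost, total time, estimated quality) of a pipeline."""
    cost = sum(c.cost for c in pipeline)
    time = sum(c.time for c in pipeline)
    remaining_noise = 1.0
    for c in pipeline:
        remaining_noise *= 1.0 - c.quality
    return cost, time, 1.0 - remaining_noise


def best_pipeline(
    components: Sequence[Component], max_cost: float, max_time: float
) -> Optional[Tuple[Component, ...]]:
    """Pick the feasible subset of components with the highest quality.

    Components keep the relative order in which they are listed.
    Returns None when no candidate satisfies both bounds.
    """
    best: Optional[Tuple[Component, ...]] = None
    best_quality = -1.0
    for size in range(1, len(components) + 1):
        for candidate in combinations(components, size):
            cost, time, quality = evaluate(candidate)
            if cost <= max_cost and time <= max_time and quality > best_quality:
                best, best_quality = candidate, quality
    return best


# Hypothetical components: two automatic steps and one crowd-based step.
COMPONENTS = [
    Component("keyword_filter", cost=1.0, time=1.0, quality=0.5),
    Component("classifier", cost=2.0, time=2.0, quality=0.5),
    Component("crowd_check", cost=10.0, time=20.0, quality=0.9),
]

if __name__ == "__main__":
    chosen = best_pipeline(COMPONENTS, max_cost=5.0, max_time=5.0)
    print([c.name for c in chosen] if chosen else "no feasible pipeline")
```

With tight bounds the sketch keeps only the automatic steps; relaxing the bounds makes the crowd-based step affordable. Exhaustive enumeration is only practical for a handful of components.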

Notes

  1. https://medium.com/ai2-blog/crowdsourcing-pricing-ethics-and-best-practices-8487fd5c9872

Acknowledgements

This work was funded by the European Commission H2020 Project Crowd4SDG, #872944.

Corresponding author

Correspondence to Monica Vitali.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Cappiello, C., Pernici, B., Vitali, M. (2021). Modeling Adaptive Data Analysis Pipelines for Crowd-Enhanced Processes. In: Ghose, A., Horkoff, J., Silva Souza, V.E., Parsons, J., Evermann, J. (eds) Conceptual Modeling. ER 2021. Lecture Notes in Computer Science, vol 13011. Springer, Cham. https://doi.org/10.1007/978-3-030-89022-3_3

  • DOI: https://doi.org/10.1007/978-3-030-89022-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89021-6

  • Online ISBN: 978-3-030-89022-3

  • eBook Packages: Computer Science, Computer Science (R0)
