Human Learning in Data Science

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1033))

Abstract

As machine learning becomes an increasingly important area of Data Science, bringing with it growing abstractness and complexity, the desire for explainability grows as well. In this work we aim to gain explainability with a focus on correlation clustering, and to pursue the original goal of Data Science tasks: extracting knowledge from data. As well-known tools like Fold-It or GeoTime show, gamification is a powerful approach, and not only for solving tasks that are more difficult for machines than for humans: we can also gain knowledge from how players proceed when trying to solve those difficult tasks. That is why we developed Straighten it up!, a game in which users try to find the best linear correlations in high-dimensional datasets. Finding arbitrarily oriented subspaces in high-dimensional data is an exponentially complex task because of the number of potential subspaces with respect to the number of dimensions. Nevertheless, linearly correlated points form a simple pattern that is easy for the human eye to track. Straighten it up! gives users an overview of two-dimensional projections of a dataset of their choice. Users decide which subspace they want to examine first and can draw arbitrarily many lines fitting the data. For every line, users can independently choose an offset within which points are assigned to that line, and they can switch between projections at any time. We developed a scoring system, not only as an incentive but primarily for further examination, based on the density of each cluster, its minimum spanning tree, the size of the offset, and coverage. By tracking every step a user takes, we are able to detect common mechanisms and examine differences from state-of-the-art correlation and subspace clustering algorithms, resulting in greater comprehensibility.
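
The abstract names the ingredients of the game (a line drawn in a two-dimensional projection, an offset around it, and a score built from quantities such as coverage and the minimum spanning tree) but does not give formulas. The following is a minimal sketch of how those ingredients might be computed, not the authors' implementation. It assumes that the offset is a maximum perpendicular distance to the drawn line, that the projections are axis-parallel pairs of dimensions, and that coverage is the fraction of points assigned to a line. All function names are hypothetical.

```python
import math
from itertools import combinations


def axis_parallel_projections(num_dims):
    """All pairs of dimensions, i.e. candidate 2D views (assumed axis-parallel)."""
    return list(combinations(range(num_dims), 2))


def point_line_distance(p, a, b):
    """Perpendicular distance from 2D point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("a and b must be distinct points")
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def assign_to_line(points, a, b, offset):
    """Indices of points lying within `offset` of the line (assumed assignment rule)."""
    return [i for i, p in enumerate(points)
            if point_line_distance(p, a, b) <= offset]


def coverage(num_assigned, num_points):
    """Fraction of the dataset assigned to a line (assumed definition)."""
    return num_assigned / num_points if num_points else 0.0


def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total
```

How these quantities are weighted into a single score is not stated in the abstract, so the sketch deliberately stops at the individual components.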

Acknowledgement

This work has been funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibility for its content.

Author information

Corresponding authors

Correspondence to Anna Beer, Daniyal Kazempour, Marcel Baur or Thomas Seidl.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Beer, A., Kazempour, D., Baur, M., Seidl, T. (2019). Human Learning in Data Science. In: Stephanidis, C. (eds) HCI International 2019 - Posters. HCII 2019. Communications in Computer and Information Science, vol 1033. Springer, Cham. https://doi.org/10.1007/978-3-030-23528-4_24

  • DOI: https://doi.org/10.1007/978-3-030-23528-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-23527-7

  • Online ISBN: 978-3-030-23528-4

  • eBook Packages: Computer Science (R0)