Connecting Domain Experts and Data: Enriching User-Centric Data Analysis with Neural Network-Aided Data Source Suggestion

  • Conference paper
Enterprise Information Systems (ICEIS 2023)

Abstract

Data analysis is widely used in numerous areas to identify new trends, opportunities, or risks and to improve decision-making. In many cases, however, data analysis is only possible by incorporating specific domain knowledge, which is why domain experts need to be involved. To this end, data mashups are a popular tool for modeling tailored analyses. Yet, with today’s data volumes from heterogeneous source systems, it is very difficult to identify beneficial data sources, in particular for exploratory data analysis. In this paper, we first define requirements for user-centric analytics and then introduce SDRank, a deep-learning-based approach to identify beneficial data sources. In an extensive evaluation with three scenarios, we show that this approach is highly robust with respect to the training data used and can reliably identify beneficial data sources, even for previously unknown domains, i.e., in a transfer learning setting.

Notes

  1. DBpedia Spotlight: https://www.dbpedia-spotlight.org/.

  2. Mockaroo: https://www.mockaroo.com/.

  3. Download Databases: https://database-downloads.com/.

  4. The OpenSky Network: https://opensky-network.org/.

  5. DBpedia: https://dbpedia.org/.

  6. Keras: https://keras.io/.

  7. \(P(\text{``select a correct data source, 1 draw''}) = \frac{\#\text{correct datasets}}{\#\text{all datasets}} = \frac{5}{20} = 0.25\).

  8. \(P(\text{``select at least one correct data source, 4 draws''}) = 1 - P(\text{``select only incorrect data sources''}) = 1 - \left(\frac{15}{20} \cdot \frac{14}{19} \cdot \frac{13}{18} \cdot \frac{12}{17}\right) \approx 0.72\).

  9. \(P(\text{``select a correct data source, 1 draw''}) = \frac{\#\text{correct datasets}}{\#\text{all datasets}} = \frac{5}{25} = 0.20\).

  10. \(P(\text{``select at least one correct data source, 4 draws''}) = 1 - P(\text{``select only incorrect data sources''}) = 1 - \left(\frac{20}{25} \cdot \frac{19}{24} \cdot \frac{18}{23} \cdot \frac{17}{22}\right) \approx 0.62\).
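
The random-selection baselines in notes 7 to 10 can be reproduced with a short sketch. It assumes data sources are drawn uniformly at random without replacement, which is what the products of fractions in the notes imply. The function name `p_at_least_one_correct` and its parameters are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import comb


def p_at_least_one_correct(n_correct: int, n_total: int, n_draws: int) -> Fraction:
    """Probability that at least one of n_draws random picks is a correct data source.

    Assumes uniform sampling without replacement (hypothetical helper,
    not part of SDRank itself).
    """
    n_incorrect = n_total - n_correct
    # Probability that every draw hits an incorrect data source.
    # math.comb returns 0 if n_draws > n_incorrect, so the result is then 1.
    p_only_incorrect = Fraction(comb(n_incorrect, n_draws), comb(n_total, n_draws))
    return 1 - p_only_incorrect


if __name__ == "__main__":
    # Notes 7 and 8: 5 correct data sources among 20.
    print(float(p_at_least_one_correct(5, 20, 1)))            # 0.25
    print(round(float(p_at_least_one_correct(5, 20, 4)), 2))  # 0.72
    # Notes 9 and 10: 5 correct data sources among 25.
    print(float(p_at_least_one_correct(5, 25, 1)))            # 0.2
    print(round(float(p_at_least_one_correct(5, 25, 4)), 2))  # 0.62
```

Using binomial coefficients gives the same value as multiplying the per-draw fractions, and exact `Fraction` arithmetic avoids rounding until the final display.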

Author information

Correspondence to Michael Behringer.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Behringer, M., Treder-Tschechlov, D., Voggesberger, J., Hirmer, P., Mitschang, B. (2024). Connecting Domain Experts and Data: Enriching User-Centric Data Analysis with Neural Network-Aided Data Source Suggestion. In: Filipe, J., Śmiałek, M., Brodsky, A., Hammoudi, S. (eds) Enterprise Information Systems. ICEIS 2023. Lecture Notes in Business Information Processing, vol 518. Springer, Cham. https://doi.org/10.1007/978-3-031-64748-2_14

  • DOI: https://doi.org/10.1007/978-3-031-64748-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64747-5

  • Online ISBN: 978-3-031-64748-2

  • eBook Packages: Computer Science (R0)
