Connecting Domain Experts and Data: Enriching User-Centric Data Analysis with Neural Network-Aided Data Source Suggestion

Behringer, Michael; Treder-Tschechlov, Dennis; Voggesberger, Julius; Hirmer, Pascal; Mitschang, Bernhard

doi:10.1007/978-3-031-64748-2_14

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 518)

Included in the following conference series:

International Conference on Enterprise Information Systems

Abstract

Nowadays, data analysis is widely used in numerous areas to identify new trends, opportunities, or risks and to improve decision-making. In many cases, however, data analysis is only possible if specific domain knowledge is incorporated, which is why domain experts need to be involved. To this end, data mashups are a popular tool for modeling tailored analyses. Yet, given today’s data volumes from heterogeneous source systems, it is very difficult to identify beneficial data sources, in particular for explorative data analysis. In this paper, we first define requirements for user-centric analytics and then introduce SDRank, a deep-learning-based approach to identify beneficial data sources. In an extensive evaluation with three scenarios, we show that this approach is highly robust with respect to the training data used and can reliably identify beneficial data sources, even for previously unknown domains, i.e., through transfer learning.

Notes

1.
DBpedia Spotlight: https://www.dbpedia-spotlight.org/.
2.
Mockaroo: https://www.mockaroo.com/.
3.
Download Databases: https://database-downloads.com/.
4.
The OpenSky Network: https://opensky-network.org/.
5.
DBpedia: https://dbpedia.org/.
6.
Keras: https://keras.io/.
7.
$P(\text {``select a correct data source, 1 draw''}) = \frac{\#\text {correct datasets}}{\#\text {all datasets}} = \frac{5}{20}=0.25$.
8.
$P(\text{``select at least one correct data source, 4 draws''}) = 1 - P(\text{``select only incorrect data sources''}) = 1 - \left(\frac{15}{20} \cdot \frac{14}{19} \cdot \frac{13}{18} \cdot \frac{12}{17}\right) \approx 0.72$.
9.
$P(\text {``select a correct data source, 1 draw''}) = \frac{\#\text {correct datasets}}{\#\text {all datasets}} = \frac{5}{25}=0.20$.
10.
$P(\text{``select at least one correct data source, 4 draws''}) = 1 - P(\text{``select only incorrect data sources''}) = 1 - \left(\frac{20}{25} \cdot \frac{19}{24} \cdot \frac{18}{23} \cdot \frac{17}{22}\right) \approx 0.62$.
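Footnotes 7 to 10 state random-selection baselines: the chance that a uniformly random pick is a correct data source, and the chance that four picks without replacement contain at least one correct source. The following is a minimal sketch that reproduces those numbers, assuming (as the footnotes do) 5 correct sources among 20 or 25 candidates and 4 draws; the function names are hypothetical and are not taken from the paper.

```python
from math import comb


def p_single_draw(correct: int, total: int) -> float:
    """Probability that one uniformly random pick is a correct data source."""
    return correct / total


def p_at_least_one(correct: int, total: int, draws: int) -> float:
    """Probability that `draws` picks without replacement include >= 1 correct source.

    Uses the complement "all picks are incorrect", i.e.
    C(total - correct, draws) / C(total, draws), which equals the product
    of fractions written out in footnotes 8 and 10.
    """
    p_all_incorrect = comb(total - correct, draws) / comb(total, draws)
    return 1.0 - p_all_incorrect


if __name__ == "__main__":
    # Assumed setting from the footnotes: 5 correct sources, 4 draws.
    for total in (20, 25):
        single = p_single_draw(5, total)
        top4 = p_at_least_one(5, total, 4)
        print(f"{total} candidates: 1 draw = {single:.2f}, 4 draws = {top4:.2f}")
```

Running it prints 0.25 and 0.72 for 20 candidates and 0.20 and 0.62 for 25 candidates, matching the footnotes. The binomial-coefficient ratio is simply a compact form of the sequential product; both describe sampling without replacement.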

Author information

Authors and Affiliations

Institute of Parallel and Distributed Systems, University of Stuttgart, Universitätsstr. 38, 70569, Stuttgart, Germany
Michael Behringer, Dennis Treder-Tschechlov, Julius Voggesberger, Pascal Hirmer & Bernhard Mitschang

Corresponding author

Correspondence to Michael Behringer.

Editor information

Editors and Affiliations

Polytechnic Institute of Setúbal, Setúbal, Portugal
Joaquim Filipe
Warsaw University of Technology, Warszawa, Poland
Michał Śmiałek
George Mason University, Fairfax, VA, USA
Alexander Brodsky
ESEO, Angers, France
Slimane Hammoudi

About this paper

Cite this paper

Behringer, M., Treder-Tschechlov, D., Voggesberger, J., Hirmer, P., Mitschang, B. (2024). Connecting Domain Experts and Data: Enriching User-Centric Data Analysis with Neural Network-Aided Data Source Suggestion. In: Filipe, J., Śmiałek, M., Brodsky, A., Hammoudi, S. (eds) Enterprise Information Systems. ICEIS 2023. Lecture Notes in Business Information Processing, vol 518. Springer, Cham. https://doi.org/10.1007/978-3-031-64748-2_14

DOI: https://doi.org/10.1007/978-3-031-64748-2_14
Published: 26 July 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64747-5
Online ISBN: 978-3-031-64748-2
eBook Packages: Computer Science, Computer Science (R0)