A Complete Data Science Work-flow For Insurance Field


Abstract:

In recent years, "Big Data" has become a ubiquitous term. Big Data is transforming science, engineering, medicine, healthcare, finance, business, and ultimately society itself. Learning from Big Data has become a significant challenge and requires the development of new types of algorithms, since most machine learning algorithms cannot easily scale up to Big Data. MapReduce is a simplified programming model for processing large datasets in a distributed and parallel manner. In this paper, we present work carried out in a big data project dedicated to the insurance sector, which allows us to validate our method on real-world insurance data. We present the complete pipeline, or work-flow, going from data collection to visualization and passing through data fusion, data analysis, clustering, and prediction tasks. The insurance dataset is enriched with data collected from heterogeneous sources. A predictive and analysis system is proposed by combining the clustering result with decision trees. We use the topological approach, in particular the self-organizing map (SOM) method, because it can cluster and visualize the data at the same time. We make the source code of our SOM-MapReduce algorithm, written with Spark using the MapReduce paradigm, publicly available.
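
To make the SOM-MapReduce idea concrete, the following is a minimal sketch of one batch SOM iteration written as a map step and a reduce step. It is not the authors' released code: it uses plain Python rather than Spark, and every name in it (`bmu`, `neighborhood`, `map_point`, `reduce_pairs`, `batch_som_step`) as well as the Gaussian neighborhood and the fixed `sigma` are assumptions chosen for illustration. The input data is assumed to be non-empty.

```python
import math
from collections import defaultdict
from functools import reduce


def bmu(x, weights):
    """Index of the best-matching unit (closest prototype) for point x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])),
    )


def neighborhood(j, c, grid, sigma):
    """Gaussian neighborhood between neurons j and c on the 2-D map grid."""
    (r1, c1), (r2, c2) = grid[j], grid[c]
    d2 = (r1 - r2) ** 2 + (c1 - c2) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def map_point(x, weights, grid, sigma):
    """Map step: for one point, emit (neuron, (h * x, h)) for every neuron."""
    c = bmu(x, weights)
    out = []
    for j in range(len(weights)):
        h = neighborhood(j, c, grid, sigma)
        out.append((j, ([h * v for v in x], h)))
    return out


def reduce_pairs(a, b):
    """Reduce step: add partial numerators and partial denominators."""
    return ([p + q for p, q in zip(a[0], b[0])], a[1] + b[1])


def batch_som_step(data, weights, grid, sigma):
    """One batch SOM update: group the mapped pairs by neuron, then reduce."""
    grouped = defaultdict(list)
    for x in data:
        for j, pair in map_point(x, weights, grid, sigma):
            grouped[j].append(pair)
    new_weights = []
    for j in range(len(weights)):
        num, den = reduce(reduce_pairs, grouped[j])
        # New prototype = neighborhood-weighted mean of all points.
        new_weights.append([v / den for v in num])
    return new_weights


# Hypothetical toy run: a 1x2 map and four one-dimensional points.
grid = [(0, 0), (0, 1)]
weights = [[1.0], [9.0]]
data = [[0.0], [0.2], [10.0], [9.8]]
updated = batch_som_step(data, weights, grid, sigma=0.3)
```

In a Spark setting, the per-point emission would roughly correspond to a `flatMap` over the dataset and the summation to a `reduceByKey`, with the current prototypes shared with the workers at each iteration. How the paper's implementation actually organizes these steps is not stated in the abstract.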
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019
Conference Location: Seattle, WA, USA
