Why exploratory data analysis is important


The flexibility to present and process insurance data in a manner that is easy to work with is of vital importance. hyperexponential explains why.

hyperexponential (hx), a SaaS pricing platform for insurers, provides a pricing tool, hx Renew, that facilitates the processing of insurance data.

hyperexponential’s Jonathon Bowden recently demonstrated how exploratory data analysis (EDA) can be used to optimise data processing and build better models.

The best machine learning models are built from clean, high-quality data that has been effectively and skilfully processed. Quite often, Bowden said, this task requires the heaviest lifting and has led to a running joke that most data scientists spend 80% of their time cleaning data and only 20% calibrating models.

Although the core of EDA involves summary statistics, Bowden stressed that there is often more to it.

Understanding the data types is often the first step, and identifying which fields are numerical and which are categorical is the crucial next step. However, Bowden noted that it is still important to keep free-text data, because natural language processing (NLP) techniques may be able to extract valuable insights from it.
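As a sketch of what that first pass might look like in practice, the snippet below uses pandas to split fields by dtype. The policy-level DataFrame and its column names are purely illustrative assumptions, not hx Renew's data model.

```python
import pandas as pd

# Illustrative policy-level data; the column names are hypothetical.
df = pd.DataFrame({
    "sum_insured": [1.2e6, 5.0e5, 2.3e6],
    "premium": [15_000.0, 7_200.0, 31_000.0],
    "industry": ["marine", "property", "marine"],
    "claim_description": ["water damage to cargo", "fire in warehouse", "hull crack repair"],
})

# Split fields into numerical and categorical/text by dtype.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()

print("Numerical fields:", numeric_cols)
print("Categorical/text fields:", text_cols)
# Free-text fields such as claim_description are kept rather than dropped,
# since NLP techniques may later extract useful signal from them.
```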

One of the first steps of EDA is to create summary statistics from numerical fields. Then, Bowden said, null values, means, medians, standard deviations, skews, correlations and other valuable metrics can be noted.
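In pandas, one plausible way to collect those metrics is shown below, reusing the illustrative df from the previous sketch.

```python
# Reusing the illustrative df from the previous sketch.
numeric = df.select_dtypes(include="number")

print(df.isna().sum())     # null counts per field
print(numeric.describe())  # count, mean, std, min, quartiles (incl. median), max
print(numeric.skew())      # skewness of each numerical field
print(numeric.corr())      # pairwise correlations between numerical fields
```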

The next step is visualisation. Here, the goal is to create graphical representations that reveal trends and anomalies and reinforce an understanding of the data.

“We can save the time spent making the graphs look good since our work here is for EDA purposes only. The goal is to make as many graphs relevant to our data as possible. This will give us the best chances to identify trends and anomalies,” Bowden explained.
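A rough-and-ready approach in that spirit is to generate default plots for every numerical field with minimal styling; the example below again assumes the same illustrative df.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Unpolished, exploratory plots: one histogram per numerical field plus a
# scatter matrix, with no effort spent on styling.
numeric = df.select_dtypes(include="number")

numeric.hist(bins=30, figsize=(10, 6))
plt.tight_layout()
plt.show()

pd.plotting.scatter_matrix(numeric, figsize=(8, 8))
plt.show()
```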

Once there is a fairly solid understanding of the patterns in a dataset, additional exploration can be carried out.

Bowden highlighted two more advanced methods: clustering and principal component analysis.

There are instances where clustering can significantly help inform the understanding of our data, Bowden said. Clustering algorithms can take in many features and allocate the data into homogeneous groups, which can then be revisited with summary statistics and graphs. With a simple two-dimensional (X, Y) dataset, humans can easily see groupings of points on a scatter plot, but as soon as the number of features goes beyond three or four, it is challenging to make a useful visual representation.
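The sketch below illustrates the general idea with scikit-learn's KMeans on synthetic data; the feature matrix, cluster count and random seed are arbitrary choices for illustration, not anything Bowden prescribed.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic six-feature dataset standing in for real policy data.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(500, 6)),
    columns=[f"feature_{i}" for i in range(6)],
)

# Scale the features, allocate each row to one of k homogeneous groups,
# then revisit summary statistics group by group.
scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)

print(features.assign(cluster=labels).groupby("cluster").mean())
```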

The second method Bowden pointed to, principal component analysis (PCA), is often used in data processing to “flatten” multi-dimensional data.

Bowden said the same methodology can be used for EDA purposes to gain an “explained variance” output of each feature within the data. If it is found that just two or three features explain significant proportions of the variance, examining these features together in more detail should be considered.
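One way to obtain that kind of output with scikit-learn is sketched below; it reports the share of variance explained by each principal component of the scaled feature matrix from the clustering example, which is again purely illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fit PCA on the scaled feature matrix from the clustering sketch and report
# how much of the variance each principal component explains.
scaled = StandardScaler().fit_transform(features)
pca = PCA().fit(scaled)

for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.1%} of variance explained")
```

If the first two or three components account for most of the variance, their loadings (pca.components_) point to the combinations of features worth examining together in more detail.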

The flexibility to present and process insurance data in a manner that is easy to work with is one of the key features of hyperexponential’s next-generation pricing tool, hx Renew.

hyperexponential recently saw Optio Group, a leading speciality MGA, join a growing list of insurers and MGAs to adopt its hx Renew platform.

