How Earnix enhances GLM models with Smart Grouping tech

July 25, 2025

Insurers and banks increasingly face complex analytical challenges as they strive to build accurate, transparent, and scalable predictive models. From managing high-dimensional data to avoiding overfitting, finding the right balance between model performance and interpretability is no small task, especially when dealing with categorical features.

In response, financial analytics provider Earnix has been developing more effective ways to manage categorical data within Generalised Linear Models (GLMs).

Its latest innovation, Smart Grouping, aims to simplify model structures and improve performance by intelligently merging similar categories, a capability now built into its Auto-GLM solution.

The modern headache of categorical data

Categorical features, such as city names or policy types, pose a significant challenge for model builders. Because mathematical models, including GLMs, cannot interpret text directly, these values must be translated into numerical formats.

The most common method, One-Hot Encoding, translates each category into a separate binary feature, greatly increasing model complexity and risking overfitting.
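To make the growth in features concrete, here is a minimal sketch of One-Hot Encoding in plain Python. The city names and the `one_hot` helper are hypothetical and chosen only for illustration; they are not taken from Earnix's tooling.

```python
def one_hot(values):
    """Return (sorted category list, one row of 0/1 indicators per value)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is switched on per row
        rows.append(row)
    return categories, rows


# Hypothetical policy records, one city per policy.
cities = ["Leeds", "York", "Bath", "Leeds", "Hull"]
categories, encoded = one_hot(cities)

# One original column has become len(categories) binary columns.
print(categories)
print(encoded)
```

With four distinct cities the single column becomes four binary features; a real portfolio with thousands of cities would gain thousands of columns, each needing its own coefficient.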

For instance, two cities with similar insurance claim statistics are each given their own coefficient, estimated from less data than a combined group would offer, which can produce misleading coefficient values. Merging them could simplify the model and improve accuracy, but deciding which categories to group is a complex task.

Why classic regularisation falls short

One approach is regularisation, which helps reduce complexity by penalising certain model coefficients.

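However, traditional GLM techniques using One-Hot Encoding allow merging only with a reference level: each coefficient measures a category's difference from the reference, so shrinking a coefficient to zero folds that category into the reference rather than into a similar neighbour. This creates problems when the chosen reference, such as a large city with unusually high claims, doesn’t represent the average case.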

As a result, merging becomes biased or ineffective, and alternative reference choices offer only limited improvements.
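The reference-level problem can be illustrated with a small sketch. It assumes a Poisson GLM with a log link and a single categorical feature, in which case each fitted dummy coefficient equals the log of the category's claim rate divided by the reference rate. The cities, claim counts and the `dummy_coefficients` helper are all hypothetical.

```python
import math

# Hypothetical claim counts and exposure (policy-years) per city.
claims = {"Metropolis": 300, "Ashford": 52, "Brookside": 50}
exposure = {"Metropolis": 1000, "Ashford": 1000, "Brookside": 1000}


def dummy_coefficients(claims, exposure, reference):
    """Log rate ratio of every non-reference category against the reference."""
    ref_rate = claims[reference] / exposure[reference]
    return {
        city: math.log((claims[city] / exposure[city]) / ref_rate)
        for city in claims
        if city != reference
    }


# Metropolis, the large city with unusually high claims, is the reference.
coefs = dummy_coefficients(claims, exposure, "Metropolis")
for city, beta in coefs.items():
    print(f"{city}: {beta:.3f}")
```

Ashford and Brookside have almost identical coefficients, yet both sit far from zero. A penalty that shrinks each coefficient towards zero can only pull a city towards Metropolis; it has no way to express that the two smaller cities belong together.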

Inside Smart Grouping

Earnix addresses these limitations through Smart Grouping, a feature built into its Auto-GLM tool within the Model Accelerator suite.

Smart Grouping enables clustering of categorical variables using a two-step regularisation process. First, the algorithm ranks categories using a regularised multivariate GLM, then merges adjacent categories using variable fusion, akin to how numeric variables are binned.

By turning the categorical variable into an ordinal feature for merging, and then translating it back to grouped binary categories, this method produces clearer, more accurate models.

For example, in predicting claim frequency based on city and age, Smart Grouping intelligently determines which cities behave similarly and groups them accordingly.
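The rank-then-merge idea can be sketched as follows. This is not Earnix's algorithm: it swaps in two much simpler stand-ins, both assumptions made for illustration. Step one ranks cities by a claim rate shrunk towards the portfolio average (in place of a regularised multivariate GLM, so age and other covariates are ignored here), and step two greedily fuses neighbours in that ranking whose scores differ by less than a tolerance (in place of a variable fusion penalty). All names, data and parameter values are hypothetical.

```python
def shrunken_rates(claims, exposure, prior_weight=100.0):
    """Step 1 stand-in: pull each category's claim rate toward the overall rate."""
    overall = sum(claims.values()) / sum(exposure.values())
    return {
        city: (claims[city] + prior_weight * overall) / (exposure[city] + prior_weight)
        for city in claims
    }


def merge_adjacent(scores, tolerance):
    """Step 2 stand-in: walk categories in score order, fusing close neighbours."""
    ordered = sorted(scores, key=scores.get)
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if scores[current] - scores[previous] <= tolerance:
            groups[-1].append(current)  # close enough: same group
        else:
            groups.append([current])  # gap too large: start a new group
    return groups


# Hypothetical claim counts and exposure (policy-years) per city.
claims = {"Ashford": 52, "Brookside": 50, "Metropolis": 300,
          "Dunmore": 290, "Eastvale": 120}
exposure = {city: 1000 for city in claims}

scores = shrunken_rates(claims, exposure)
groups = merge_adjacent(scores, tolerance=0.01)

# Translate back to a grouped categorical feature: city -> group index.
group_of = {city: i for i, members in enumerate(groups) for city in members}
print(groups)
```

Treating the ranked categories as an ordinal variable is what lets any two similar cities merge, not only a city and the reference level. The resulting `group_of` mapping can then be one-hot encoded again with far fewer columns than the original feature.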

Smart Grouping offers key benefits over traditional methods. It improves interpretability by clearly defining how categories relate to the outcome variable.

It also enhances multivariate compatibility, as groupings are determined with respect to the full set of covariates in the model. In addition, Earnix tackles overfitting through regularised model ranking and validation schemes when forming category groups.

This results in models that are more accurate and more interpretable, key advantages for insurers and banks seeking both compliance and clarity in their analytical tools.

Earnix’s Smart Grouping is now fully operational in Auto-GLM, giving data professionals immediate access to this enhanced functionality.

Read the full blog from Earnix here.
