Using Machine Learning to predict customer behavior for Starbucks campaigns

There are a number of ways to define the problem. I wanted to focus on the behavior of the customers: is it possible to predict whether they will complete an offer? Can we wrangle the data in such a way that a machine learning model can predict the behavior of each customer for a given campaign?

First we will explore the data set and decide whether there is enough data to complete the task. For example, is the data of good quality or broken in any way? How is the data distributed?

Next we clean the data and extract useful features for our ML model.

If we get this far, we train our model on part of the data and test it on an unseen part to see how well it performs.

The final step is to try to improve the model by optimizing its hyperparameters.

We now dig into the data and see what it contains.

Portfolio of offers

The first part of the data contains the different offers, with fields like the offer type, difficulty, reward, etc.

Next, a dataset containing the profiles of the different customers is available. It holds gender, age, income and the date the person became a member. There are a lot of clients with an age of 118 whose gender and income are NaN. These clients probably have not filled in their account details yet and have been given default values.
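In rough pandas terms, this first look at the data could be done something like below. The file names and column names here are illustrative assumptions, not taken from the original notebook.

```python
import pandas as pd

# Assumed file and column names ('portfolio.json', 'profile.json',
# 'offer_type', 'age', 'gender', 'income').
portfolio = pd.read_json('portfolio.json', orient='records', lines=True)
profile = pd.read_json('profile.json', orient='records', lines=True)

print(portfolio.head())

# Profiles with the placeholder age of 118 also lack gender and income.
placeholders = profile[profile['age'] == 118]
print(len(placeholders), 'profiles with age 118')
print(placeholders[['gender', 'income']].isna().mean())  # share of NaN values
```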

Transactions

The final part of the data is the transcript, where each event for a person is recorded. Events like offer received, offer completed and the amounts spent can be found here.

This dataset contains so much information that it is interesting to dig a little further.

Let's start by looking at how many events a typical user triggers.

Events grouped by user

We can also plot it as a histogram.

Distribution of events per client
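In code, counting and plotting the events per user looks roughly like this; the file name and the 'person' and 'event' columns are again illustrative assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file and column names.
transcript = pd.read_json('transcript.json', orient='records', lines=True)

events_per_user = transcript.groupby('person')['event'].count()
print(events_per_user.describe())  # the mean is around 18 in this analysis

events_per_user.hist(bins=50)
plt.xlabel('Number of events')
plt.ylabel('Number of clients')
plt.title('Distribution of events per client')
plt.show()
```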

Now we know that the average number of events per person is about 18. How about the distribution of gender and age within the data?

We can see that there are slightly more males than females in the data.

Further we can see the age distribution.

Distribution of age within the clients.
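Roughly, continuing from the profile data loaded earlier, those two checks could look like this; the 118-year placeholders are excluded from the age plot here, which the original plot may or may not have done.

```python
# Gender counts, then the age distribution without the placeholder ages.
print(profile['gender'].value_counts(dropna=False))

profile.loc[profile['age'] != 118, 'age'].hist(bins=30)
plt.xlabel('Age')
plt.ylabel('Number of clients')
plt.show()
```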

Now we know a bit about the clients and the distributions of age and gender. Let's try to understand what a campaign looks like if we follow the behavior of a single person.

Typical person transaction log

We can identify a few things that happen for this particular client. This woman received an offer at time 0 and viewed it 6 hours later. At time 132 she bought enough coffee to complete the offer. Then a new offer was received and the client responded in turn.
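Pulling out a single client's log is a simple filter on the transcript, roughly like below; the id used here is purely illustrative, not the client from the figure.

```python
# Follow a single client's journey through the transcript.
client_id = transcript['person'].iloc[0]
client_log = transcript[transcript['person'] == client_id].sort_values('time')
print(client_log[['time', 'event', 'value']])
```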

Previously we saw that the profile data contained null values. Let's fix that by filling in the gaps. We can impute the missing values with the median, which is not optimal but better than throwing the data out. The median is a better measure than the mean because variables like income are not normally distributed: a few people are very rich while many earn far less, and using the mean would let the rich tilt the imputed value too much.

Cleaned client data

We have also one-hot encoded the gender and added a column indicating whether the gender was unknown.
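Roughly, the cleaning step could look like this; whether the 118-year placeholder age was also replaced with the median is a guess on my part, and the column names are illustrative.

```python
# Replace the placeholder age and impute missing income with the median.
median_age = profile.loc[profile['age'] != 118, 'age'].median()
profile.loc[profile['age'] == 118, 'age'] = median_age
profile['income'] = profile['income'].fillna(profile['income'].median())

# Flag unknown gender, then one-hot encode the gender column.
profile['gender_unknown'] = profile['gender'].isna().astype(int)
profile = pd.get_dummies(profile, columns=['gender'], prefix='gender')
```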

Is it possible to create a dataset that extracts the information for a user, for each of the offers received? We would like to know if the user even viewed the offer, if it was completed, and how much money was spent during the lifetime of the campaign.

Such a dataset was created by iterating over each client and recording the outcome of each offer. The final result can be viewed here:

Outcomes for each offer and person.

So each campaign can now be followed on a per-user level. Other information was also recorded, like how many offers the person had previously completed, viewed, etc.
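A rough sketch of the extraction loop is shown below, leaving out the offer expiry window and the history features for brevity; the structure of the value field (keys like 'offer id', 'offer_id' and 'amount') is an assumption about the data.

```python
def offer_id_of(value):
    # The value field is a dict whose key names are assumed here.
    return value.get('offer id') or value.get('offer_id')

def outcomes_for_client(client_log):
    rows = []
    received = client_log[client_log['event'] == 'offer received']
    for _, rec in received.iterrows():
        offer_id = offer_id_of(rec['value'])
        later = client_log[client_log['time'] >= rec['time']]
        viewed = ((later['event'] == 'offer viewed') &
                  later['value'].apply(lambda v: offer_id_of(v) == offer_id)).any()
        completed = ((later['event'] == 'offer completed') &
                     later['value'].apply(lambda v: offer_id_of(v) == offer_id)).any()
        spent = (later.loc[later['event'] == 'transaction', 'value']
                      .apply(lambda v: v.get('amount', 0)).sum())
        rows.append({'person': rec['person'], 'offer_id': offer_id,
                     'viewed': viewed, 'completed': completed, 'spent': spent})
    return rows

offer_outcomes = pd.DataFrame(
    [row for _, log in transcript.groupby('person')
     for row in outcomes_for_client(log)])
```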

Here are some statistics about the offerOutcomes dataset:

Offer outcomes by offerType, totals.
Offer outcomes by offerType, averages.

From this we can conclude that offers of the discount type are completed more frequently, with much higher earnings than, for example, BOGO or informational campaigns.
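In rough terms, the aggregation behind those tables is a join with the portfolio followed by a groupby; the column names are illustrative.

```python
# Join the offer type onto the outcomes, then aggregate per type.
by_type = offer_outcomes.merge(portfolio[['id', 'offer_type']],
                               left_on='offer_id', right_on='id')
print(by_type.groupby('offer_type')[['completed', 'spent']].sum())
print(by_type.groupby('offer_type')[['completed', 'spent']].mean())
```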

The next step is to join the personal information to the campaign dataset.

Offers with personal details added.
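The join itself is a simple merge, roughly like this; the join keys are assumptions.

```python
# Add the cleaned profile information to each offer outcome.
data = by_type.merge(profile, left_on='person', right_on='id',
                     how='left', suffixes=('_offer', '_person'))
```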

Is it possible to use this data in order to predict the result of future campaigns on an individual level? If we had this information we could target people more directly instead of giving the same offer to everybody.

Much of the ETL work has been done at this point. The data is cleaned, features are extracted and the different tables are joined together. Now we will feed this into a machine learning model.

We need to split the data into a training set and a test set so that we can evaluate the performance of the model on data it has not seen. Let's do that.
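A minimal sketch of the split, assuming the completed flag is the target and the remaining columns are numeric features (ids and other non-numeric columns dropped beforehand):

```python
from sklearn.model_selection import train_test_split

X = data.drop(columns=['completed'])
y = data['completed']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```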

We create a gradient boosting classifier as our machine learning model, train it on the training set and evaluate it on the unseen test set.

Gradient boosting model
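Roughly, with scikit-learn this step looks as follows; the exact parameters of the original model are not known, so defaults are used here.

```python
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)
print('Test accuracy:', clf.score(X_test, y_test))  # about 0.79 in this analysis
```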

We get a hit rate of about 79% on the test set, which is not bad at all.

A confusion matrix is a great way to visualize how good the predictions are. We can see where the model makes its errors: are false positives or false negatives more common? Let's have a look.

Confusion matrix in numbers.
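The matrix can be computed roughly like this:

```python
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))  # rows: true labels, columns: predictions
```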

We have a high number of true negatives and a good portion of the true positives covered. The errors are not heavily skewed in either direction.

As outlined earlier, the final step is to see whether tuning the hyperparameters with a grid search can improve the model. Let's try that.

Grid searching for parameters
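A sketch of such a grid search is shown below; the parameter values are illustrative, not the grid actually used.

```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 200, 400],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [2, 3, 4],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=3, scoring='accuracy', n_jobs=-1)
search.fit(X_train, y_train)

print(search.best_params_)
print('Test accuracy:', search.best_estimator_.score(X_test, y_test))
```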

We have found a new model with slightly different hyperparameters. How good is it?

Classification report of the improved model

The grid search led to some improvements, even though they were small. A larger number of estimators combined with a lower learning rate did better.

The final model performs quite well, with about an 80% hit rate when predicting whether a client will complete an offer. There is also no particularly large imbalance between false positives and false negatives, as seen in the confusion matrix.

The number of true negatives is higher than the number of true positives. This may be because informational campaigns have no completion goal for the customer.

Tuning the hyperparameters did improve the model, but not by a large amount.

The task was very interesting, going from the raw data all the way to predicting customer behavior at an individual level.

The most difficult problem was indeed to transform the raw data, which was a transaction log of events, into something useful. For example, a customer may complete an offer by chance before even viewing the ad.

Good feature engineering is also quite hard. It is an art form to extract useful metrics as input to an ML model. Should we include the customer's previous behavior when evaluating the probability of a successful campaign? If so, which metrics should be included?

In the end, the model that predicts whether a client will complete an offer was a success. If we can predict how a user will behave with 80% accuracy, we can target a smaller audience while at the same time increasing revenue.

Another improvement could be an ensemble of models that produces the final prediction; this will often outperform any single ML model.

We could also engineer more features as inputs to the model. For example, one could add how active the customer has been during the last week, month or year. Does their spending increase over time?

It would also be interesting to see whether we could predict the increase in spending during informational campaigns.
