Our client is primarily a B2B business with multiple branches across the UK, supplying tools to a wide range of industries.
The project required us to identify customers with a higher revenue potential than their current spending suggests, to allow the sales team to prioritise effectively.
The model needed to be interpretable to allow salespeople to action the insights accordingly.
What we did
✔︎ Explored datasets to understand business rules and data quality.
✔︎ Built a scraping tool to gather data from Companies House.
✔︎ Incorporated information from additional third-party sources to enrich the client’s data.
✔︎ Built an end-to-end pipeline to clean and aggregate customer accounts and score their potential.
✔︎ Added interpretability to the model outputs to help the sales team understand the reasoning behind the scores given.
The sales team now have a scored list of customers to support the prioritisation of follow-ups and sales opportunities. The model incorporates an intelligent feedback loop that allows the user to adjust a customer's score, thereby allowing the model to improve further over time.
Our handover included fully documented pipeline scripts and the accompanying model, along with step-by-step guides explaining how to set up the pipelines and execute the scoring process. Our data-cleaning scripts produced a unified view of the customer base, free from duplicate records.
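Deduplicating customer accounts typically comes down to normalising the key identifying fields before matching. A minimal sketch of the idea in pandas, with illustrative column names (the client's actual schema and matching rules are assumptions here):

```python
import pandas as pd

# Hypothetical customer extract -- column names are illustrative only.
customers = pd.DataFrame({
    "account_name": ["Acme Tools Ltd", "ACME TOOLS LTD ", "Brindle Supplies", "Brindle Supplies"],
    "postcode": ["LS1 4AB", "ls1 4ab", "M2 5GH", "M2 5GH"],
    "annual_spend": [12000, 12000, 8500, 9100],
})

def deduplicate(df: pd.DataFrame) -> pd.DataFrame:
    """Normalise key fields, then keep one row per (name, postcode) pair."""
    out = df.copy()
    out["account_name"] = out["account_name"].str.strip().str.upper()
    out["postcode"] = out["postcode"].str.strip().str.upper()
    # Where duplicates disagree, keep the highest-spend record.
    out = (out.sort_values("annual_spend", ascending=False)
              .drop_duplicates(subset=["account_name", "postcode"], keep="first"))
    return out.reset_index(drop=True)

deduped = deduplicate(customers)
print(len(deduped))  # 2
```

In practice a fuzzy-matching step (e.g. on company name) often follows this exact-match pass, but the normalise-then-drop-duplicates pattern is the core of a unified customer view.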
How we did it
In the initial phase of the project, we worked closely with the client to explore the different data sources available. We then scripted the cleaning pipeline based on learnings from these workshops.
Next, we identified third-party sources that could be used to enrich the in-house data. We built a bespoke scraper for the online Companies House data and incorporated it into the client's data pipelines.
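Companies House also exposes a public REST API, which is one way such enrichment can be sketched (the client's actual scraper may have worked differently; the helper names below are illustrative, while the endpoint and field names follow the published API):

```python
import requests

API_BASE = "https://api.company-information.service.gov.uk"

def fetch_company_profile(company_number: str, api_key: str) -> dict:
    """Fetch a company profile; the API uses HTTP Basic auth with the key as username."""
    resp = requests.get(f"{API_BASE}/company/{company_number}", auth=(api_key, ""))
    resp.raise_for_status()
    return resp.json()

def extract_features(profile: dict) -> dict:
    """Pick out fields useful for enriching a customer record."""
    return {
        "company_status": profile.get("company_status"),
        "incorporation_date": profile.get("date_of_creation"),
        "sic_codes": profile.get("sic_codes", []),
    }
```

Fields such as incorporation date and SIC (industry) codes are useful model features precisely because they say something about a company's potential without being derived from its current spend.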
We then trained a Python LightGBM model to predict spend using only features of each customer that were not directly related to their current spend. We validated the results, engineering new features to generate greater predictive power and removing features that indicated data leakage. To add transparency to the model, we incorporated the SHAP library, which uses Shapley values to quantify each feature's contribution to individual predictions.
Finally, we added a feedback loop to allow users to adjust scores. The model learns from these adjustments and corrects itself during future training and predictions.
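One simple way such a loop can work is to store user adjustments as overrides and blend them back into the training target before the next retrain. This is a minimal sketch under that assumption; the function name, blend weight, and account names are all illustrative:

```python
def apply_feedback(scores: dict, adjustments: dict, weight: float = 0.5) -> dict:
    """Blend model scores with user-adjusted scores ahead of the next retrain."""
    blended = dict(scores)
    for account, adjusted in adjustments.items():
        if account in blended:
            blended[account] = (1 - weight) * blended[account] + weight * adjusted
    return blended

scores = {"ACME TOOLS LTD": 72.0, "BRINDLE SUPPLIES": 55.0}
adjustments = {"BRINDLE SUPPLIES": 80.0}  # a salesperson knows this account better
print(apply_feedback(scores, adjustments))
# {'ACME TOOLS LTD': 72.0, 'BRINDLE SUPPLIES': 67.5}
```

Weighting (rather than overwriting) keeps a single bad adjustment from dominating, while still letting consistent sales-team feedback shift the model over successive retrains.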
The project was delivered by a team of data scientists in under 2 months.
Looking to run a similar project?
If you’re interested in finding out more about our services and how they can transform your business, get in touch and we'd be happy to tell you more.