I’ve consistently noticed in the data space that the most thoroughly thought-out, well-structured solutions are often not what the client or stakeholder wants.
A key to being a successful data scientist is being able to break down the most complex ideas and algorithms into everyday language, so that those working alongside you can come along for the ride.
Everything below will be anonymised due to the sensitivity of the work.
The Scene
I once had to create an AI solution to cluster customers in order to find which customers may have been similarly affected by an incorrect policy our client had. Incredibly interesting work that applied a modern solution to a problem that had arisen years prior.
The outcome or target variable was effectively a gesture of goodwill, so while there was a need to do right by the customer, there was a degree of flexibility around the total figure matched to each customer's name.
After sense-checking many of the clusters, we found that our model performed extremely well and could handle copious amounts of data in a very short period of time.
The client was slightly overwhelmed by the “AI” and opted for a simpler approach:
“Can we not just take the average of what’s been done so far and give that to the rest?”
Well, we can. But we shouldn’t.
Either way, the client comes first. It can be frustrating when your elegant solution isn’t embraced, but elegant solutions can be hard to get your head around, so I understand.
That being said, I HATE using the average.
If Bill Gates, myself and a random office worker were in a room together we would have an AVERAGE wealth of X billion.
If Bill Gates, myself and a random office worker were in a room together we would have a MEDIAN wealth of X thousand.
It is imperative to know the difference.
The client initially had the average value calculated, which left them subject to paying a substantial figure to their customers. They were appalled they would have to pay so much for a gesture of goodwill.
The client had not appreciated the difference between the average and the median, and that difference can greatly skew results.
The average = (sum(values)/n)
The median = middle value of the dataset when sorted ascending
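To make those two definitions concrete, here is a minimal Python sketch. The function names and the figures are my own, made up for illustration, and are not the client's data.

```python
def average(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def middle_value(values):
    """Median: the middle value once the data is sorted ascending.

    With an even number of values, take the mean of the two middle ones.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Made-up figures for illustration only.
values = [10, 20, 30, 40, 1000]

print(average(values))       # 220.0
print(middle_value(values))  # 30
```

Python's built-in `statistics.mean` and `statistics.median` do the same job; the hand-written versions are only here to mirror the definitions.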
When we calculated the median for the client, it came to 25% of the average.
Why?
The average is extremely prone to outliers.
Our dataset contained data points with enormous figures that were not representative of the general population. By taking the average, we were committing to a large goodwill payment, when in fact the median would have satisfied the legal requirements and been more representative of what the customers actually deserved.
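A rough sketch of that effect, using hypothetical payment figures (not the client's data, and not meant to reproduce the 25% ratio): adding a couple of extreme values drags the mean a long way while the median barely moves.

```python
from statistics import mean, median

# Hypothetical goodwill payments, NOT the client's data.
typical = [100, 120, 130, 150, 160, 180, 200]
# The same payments plus two extreme, unrepresentative figures.
with_outliers = typical + [5000, 8000]


def summarise(payments):
    """Return the mean and the median of a list of payments."""
    return {"mean": mean(payments), "median": median(payments)}


before = summarise(typical)
after = summarise(with_outliers)

print(f"mean:   {before['mean']:.2f} -> {after['mean']:.2f}")
print(f"median: {before['median']:.2f} -> {after['median']:.2f}")
```

In this made-up example the mean goes from roughly 149 to 1560, while the median only moves from 150 to 160.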
Lessons Learned
1. It is vital to understand the basic concepts of maths
- Without understanding the difference between the average and the median, the client would have been exposed to a huge financial hit
- This problem of mean vs. median is also very important when replacing missing data in ML
2. There are always ways to deliver value.
- You’re not a data scientist, you’re a problem solver
- It can be disappointing when you deliver an MVP that meets and exceeds the initial project scope, only for the project scope to be changed at the last minute. But don’t let this be the end of you.
- If the client is looking for the simplest solution, give them the best simple solution possible
3. If you’re not moving forward, you’re moving backwards
- The client should have adapted to the more advanced solution
- ML is an incredibly powerful tool that can outperform humans on many tasks. If your company or team does not embrace it, they are likely to be left behind by those that do
- Don’t be afraid to adapt to more progressive technology
- Don’t be afraid to learn about new solutions
- Want to learn about new solutions
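On the earlier point about mean vs. median when replacing data in ML: one common case is filling in missing values. Below is a minimal sketch, assuming missing entries are marked as `None`; the `impute` helper and the column values are hypothetical and only for illustration.

```python
from statistics import mean, median


def impute(values, strategy=median):
    """Replace None entries with a summary statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries and one extreme value.
column = [20, None, 25, 30, None, 900]

print(impute(column, strategy=mean))    # gaps filled with 243.75
print(impute(column, strategy=median))  # gaps filled with 27.5
```

With the extreme value in the column, the mean fills the gaps with a number that looks nothing like the typical entries, whereas the median stays close to them.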
Best of luck!