It’s time to use AI and Machine Learning like bar charts.

Organizations need to deploy AI and machine learning more widely, beyond just data science teams. Yes, sometimes these models will be imperfect, or even lead to the wrong decision. But imperfect ML is no worse than imperfect Excel business analysis. The size and scale of the data now available call for more analysts equipped with an upgraded suite of techniques.

AI and Machine Learning (AI / ML) are treated with too much caution and reverence. Marketed as a chess-playing, Jeopardy-winning black box, AI / ML has garnered both a fear and a respect it does not deserve. As a result, it has become a special tool, reserved for “big projects.”

But machine learning is simply a new way to map the patterns that are already being examined to the outcome variables that are already being targeted.

Previously, within a BI reporting tool, an analyst might drill down into a few demographic variables: bar charts for churn by age, sex, geography, and relationship with the company. From this result, the analyst might eyeball a pattern: men aged 35-50 who have higher balances and loan products are retained at the highest rates. This specific analysis happens nearly every day in nearly every organization, and can be thought of as a “quick human learning model.”

Simple histograms to profile churn by various characteristics
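
As a rough illustration of that “quick human learning model,” the sketch below computes churn rates by segment and prints them as text bar charts. The customer records, the field names (`age_band`, `sex`, `churned`), and both helper functions are invented for this example. They are not taken from any real dataset or BI tool.

```python
from collections import defaultdict

# Hypothetical customer records, invented purely for illustration.
customers = [
    {"age_band": "18-34", "sex": "F", "churned": True},
    {"age_band": "18-34", "sex": "M", "churned": True},
    {"age_band": "18-34", "sex": "F", "churned": False},
    {"age_band": "18-34", "sex": "M", "churned": True},
    {"age_band": "35-50", "sex": "M", "churned": False},
    {"age_band": "35-50", "sex": "M", "churned": False},
    {"age_band": "35-50", "sex": "F", "churned": True},
    {"age_band": "35-50", "sex": "M", "churned": False},
    {"age_band": "51+", "sex": "F", "churned": False},
    {"age_band": "51+", "sex": "M", "churned": True},
]


def churn_rate_by(records, key):
    """Return {group: share of customers in that group who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        churned[record[key]] += int(record["churned"])
    return {group: churned[group] / totals[group] for group in totals}


def print_bar_chart(title, rates, width=20):
    """Print one text bar per group, scaled so a 100% rate fills `width`."""
    print(title)
    for group, rate in sorted(rates.items()):
        print(f"{group:>6} {'#' * round(rate * width)} {rate:.0%}")


# One chart per characteristic, which is the drill-down an analyst does by eye.
for characteristic in ("age_band", "sex"):
    print_bar_chart(f"Churn by {characteristic}",
                    churn_rate_by(customers, characteristic))
```

Each chart looks at one characteristic at a time, so any combined pattern still has to be spotted by the person reading the charts.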

Machine learning accomplishes the exact same task. With recent advancements in data science tools, like AutoML, the creation of ML models has become democratized, and it is now trivial to take a structured dataset and create a good model. Lack of knowledge of Python or R is no longer a barrier to applying ML.

AutoML tools range in complexity; Einblick’s AutoML wizard is an example of a straightforward application to a classification task.

Once the ML run is complete, it is easy to reproduce the insight above. To start, checking the feature importance output will show us a qualitative rank ordering of the most important demographic factors. The model thus becomes a more holistic way to evaluate candidate drivers, and the top drivers can then be explored further (including with bar charts!).

Shapley visualization in Einblick of an XGBoost Regression
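
To make “feature importance” a little more concrete, here is a minimal sketch of one common model-agnostic approach, permutation importance: shuffle one column at a time and measure how much accuracy drops. It is not the Shapley-based ranking shown in the figure, and it is not Einblick's implementation. The rule inside `predict_churn` is a hypothetical stand-in for a trained model, the rows are invented, and the labels are generated from the rule itself, so the baseline accuracy is perfect by construction.

```python
import random


def predict_churn(row):
    """Hypothetical stand-in for a trained model (an invented rule)."""
    return row["balance"] < 1000 and not row["has_loan"]


def accuracy(rows, labels):
    """Share of rows where the stand-in model matches the label."""
    hits = sum(predict_churn(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(rows, labels, features, repeats=30, seed=0):
    """Average accuracy drop when each feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    importances = {}
    for feature in features:
        drops = []
        for _ in range(repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only this one column scrambled.
            shuffled = [{**row, feature: value}
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(shuffled, labels))
        importances[feature] = sum(drops) / repeats
    return importances


# Invented accounts; "region" is deliberately ignored by the stand-in model.
rows = [
    {"balance": 400, "has_loan": False, "region": "north"},
    {"balance": 5200, "has_loan": True, "region": "south"},
    {"balance": 800, "has_loan": True, "region": "north"},
    {"balance": 300, "has_loan": False, "region": "south"},
    {"balance": 7400, "has_loan": False, "region": "east"},
    {"balance": 950, "has_loan": False, "region": "east"},
    {"balance": 2100, "has_loan": True, "region": "north"},
    {"balance": 600, "has_loan": True, "region": "south"},
]
labels = [predict_churn(row) for row in rows]  # labels come from the rule itself

importances = permutation_importance(rows, labels,
                                     ["balance", "has_loan", "region"])
for feature in sorted(importances, key=importances.get, reverse=True):
    print(f"{feature:>10}: {importances[feature]:.3f}")
```

A feature the model never uses, like `region` here, scores exactly zero because shuffling it cannot change any prediction. The features that rank highest are the ones worth exploring further, bar charts included.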

In addition, good ML tools explain the model they create, letting you walk through individual predictions. This becomes a practical and concrete way to show and understand exactly how a customer’s profile leads to a predicted behavior. Most people appreciate tangible examples, even when the machine learning is done at scale.

Understanding how the different variables of one account contribute to a predicted, and actual, churn outcome.
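
For readers curious what sits behind a per-account explanation like this, the sketch below computes exact Shapley values for a single prediction by brute force. It averages each feature's marginal contribution over every order in which the customer's values could replace a baseline profile. Everything here is an assumption made for illustration: the `churn_score` rule, the customer, and the baseline “typical” account are invented. Production tools use much faster algorithms on real trained models, so treat this as a toy version of the idea and not as how Einblick or any particular library does it.

```python
from itertools import permutations


def churn_score(account):
    """Hypothetical scoring rule standing in for a trained model."""
    score = 0.2
    low_balance = account["balance"] < 1000
    no_loan = not account["has_loan"]
    if low_balance:
        score += 0.3
    if no_loan:
        score += 0.2
    if account["age"] < 35:
        score += 0.1
    if low_balance and no_loan:
        score += 0.1  # interaction: the two risk factors compound
    return score


def shapley_values(model, customer, baseline):
    """Exact Shapley values via enumeration of all feature orderings.

    Features not yet "revealed" keep their baseline value. The cost grows
    factorially, so this is only practical for a handful of features.
    """
    features = list(customer)
    totals = {feature: 0.0 for feature in features}
    orderings = list(permutations(features))
    for order in orderings:
        profile = dict(baseline)
        previous = model(profile)
        for feature in order:
            profile[feature] = customer[feature]
            current = model(profile)
            totals[feature] += current - previous
            previous = current
    return {feature: total / len(orderings)
            for feature, total in totals.items()}


# Invented account to explain, and an invented "typical" reference account.
customer = {"age": 29, "balance": 400, "has_loan": False}
baseline = {"age": 45, "balance": 5000, "has_loan": True}

contributions = shapley_values(churn_score, customer, baseline)
print(f"baseline score: {churn_score(baseline):.2f}")
for feature, value in contributions.items():
    print(f"{feature:>10}: {value:+.2f}")
print(f"customer score: {churn_score(customer):.2f}")
```

The contributions add up to the gap between the customer's score and the baseline score. That property is what lets a chart like the one above read as a walk from a typical account to this specific one. In this toy case the interaction bonus is split evenly between `balance` and `has_loan`.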

An ML model doesn’t need to become a productionalized scoring API to have been useful to our analyst. The above visualizations are very fast to produce, and the digestible scope of identifying key drivers and reading straightforward outputs means that analysts do not need much special training. Of course, there is no limitation either; the graphically represented XGBoost-based model is productionalizable as-is, or can be honed with fine-tuning by data scientists.

And while there are pitfalls to using machine learning without understanding the full technical implementation, a business decision supported by imperfect ML is unlikely to be worse than one supported by the existing imperfect analysis produced in spreadsheets alone. In an imperfect world of analysis, the ML outputs above are an incremental improvement over, for instance, searching for drivers one by one based on gut feel.

Ultimately, organizational data science strategy needs to strive for more agility in applying advanced data science, and create an environment that recognizes that it is easy to derive value from ML at multiple levels of application. A pivot table, a pie chart, and machine learning should be treated co-equally in day-to-day analysis, as each has a valuable part to play in creating data-driven inputs to sensible decision making.