For a project I’m working on at work, I’m building a predictive model that sorts something (I can’t tell you what) into two bins. There is a default bin that 95% of the items belong to, and a bin the business cares a lot about, which contains the remaining 5%. Some readers may be familiar with the use of predictive models to identify better sales leads, so that you can target the leads most likely to convert and minimize the effort wasted on people who won’t buy your product. Although my situation has nothing to do with sales leads, I’m going to pretend it does, since it’s a common domain.
My data consists of many thousands of “leads”, and for each one I’ve constructed hundreds of predictive features (mostly binary 1/0 indicators, plus a few numeric ones). I can plug this data into any number of common statistical and machine learning systems, which will crunch the numbers and produce a black box that does a pretty good job of separating more valuable leads from less valuable ones. That’s great, but now I have to communicate what I’ve done, and how valuable it is, to an audience that struggles with relatively simple statistical concepts like correlation. What can I do?