Tuesday, December 18, 2012

How to build predictive models that win competitions.

The predictive models that we produced won head-to-head competitions and were chosen by clients. For example, our model predicting the risk that a new customer will not pay his or her phone bill was used by a top-3 cell phone company. Our debit card fraud detection model is being used by a top-15 bank. Our competitors included one of the three credit bureaus, which had hundreds of statisticians working for it.

We have found that following a number of principles lets us produce good predictive models and quality data analytics work in general.

The first principle is to avoid making mistakes. We have seen many cases where mistakes damaged the reputation and credibility of data analysts. For example, statisticians analyzed data and concluded that compromised credit cards were less likely to show fraudulent activity than normal cards. It was a mistake, of course. One of our competitors loaded a customer's data incorrectly and produced reports in which the numbers did not make sense at all. Mistakes like these cause immediate rejection by clients and permanent damage to the analysts' reputation.

We need to realize that avoiding mistakes should be an inherent part of our processes. A large project runs from data gathering, loading, validating and summarizing through model building, report generation and model deployment, so we may have to take many steps and produce hundreds of data sets. To avoid making mistakes, we need to double-check our results. We have found that it is actually much harder to verify that the results are correct than to simply execute the steps that produce them.

Thus, we always perform two tasks: 1. produce the result; 2. verify the result. It is better to spend more time producing correct results than to quickly deliver something wrong that causes irrecoverable damage to our credibility. We will talk more about avoiding mistakes in the post "The first principle of data analytics is to avoid making mistakes".
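As a rough illustration of the "produce, then verify" habit, here is a minimal Python sketch of one common check: reconciling a loaded data set against control figures (row count and a column total) obtained independently from the data provider. This is not the exact procedure we use on client projects. The function names, the `amount` column and the sample data are all hypothetical and chosen only for the example.

```python
import csv
import io
from decimal import Decimal


def load_bills(text):
    """Task 1: produce the result (parse CSV text into a list of dict rows)."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows, amount_field):
    """Return (row count, exact decimal total of amount_field)."""
    count = 0
    total = Decimal("0")
    for row in rows:
        count += 1
        total += Decimal(row[amount_field])
    return count, total


def verify_load(loaded_rows, expected_count, expected_total, amount_field="amount"):
    """Task 2: verify the result against independently obtained control figures.

    Returns a list of human-readable problems; an empty list means the checks passed.
    """
    count, total = summarize(loaded_rows, amount_field)
    problems = []
    if count != expected_count:
        problems.append(f"row count {count} != expected {expected_count}")
    if total != expected_total:
        problems.append(f"total {total} != expected {expected_total}")
    return problems


# Hypothetical sample data standing in for a client's file.
SOURCE = "customer_id,amount\n1,20.50\n2,35.00\n3,12.25\n"

rows = load_bills(SOURCE)
# The control figures (3 rows, 67.75 total) are assumed to come from the client,
# not to be recomputed from the same loaded data.
problems = verify_load(rows, expected_count=3, expected_total=Decimal("67.75"))
print(problems or "load verified")
```

The important design choice is that the expected figures come from somewhere other than the loading step itself. A check that recomputes the totals from the same loaded rows would repeat any loading error instead of catching it.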

1 comment:

oscarspaz said...

Just found your blog. Very informative to a newbie in data analysis like myself. Looks like the first rule is the same rule I was told by different profs many, many times in my CS classes: garbage in, garbage out.