Oracle analytics functions such as lag() are very useful in building interesting variables for predictive models. For example, to predict if a bank account will be closed by a customer we may want to look at the account balance history. If the account balance drop consecutively for three months, it may indicate that the customer will leave the bank. Using the following table as an example, we will describe how to build a variable indicating balance dropping for three consecutive months.
SQL> select * from tbl_account1 order by dt desc; ACCT_NUM BALANCE DT ---------- ---------- --------- 12345 200 31-MAY-13 12345 1000 30-APR-13 12345 5000 31-MAR-13 12345 8550 28-FEB-13 12345 8500 31-JAN-13One approach is to use lag() function. In the following query, for each record we calculate the previous 1, 2 and 3 months balances. lag(BALANCE,1), lag(BALANCE,2) and lag(BALANCE,3) indicate the balance in prevous 1, 2 and 3 records based on the order of "dt", respectively. "partition by acct_num" means the calculation is done independently by acct_num.
with tbl as( select a.* , lag(BALANCE,1) over(partition by acct_num order by dt) last_1, lag(BALANCE,2) over(partition by acct_num order by dt) last_2, lag(BALANCE,3) over(partition by acct_num order by dt) last_3 from tbl_account1 a) select * from tbl order by dt; ACCT_NUM BALANCE DT LAST_1 LAST_2 LAST_3 ---------- ---------- --------- ---------- ---------- ---------- 12345 8500 31-JAN-13 12345 8550 28-FEB-13 8500 12345 5000 31-MAR-13 8550 8500 12345 1000 30-APR-13 5000 8550 8500 12345 200 31-MAY-13 1000 5000 8550Once we understand how to calculate the privous month balance using lag() function, we can derive the indictor showing consecutively three months drop in balance using the following query.
with tbl as( select a.* , lag(BALANCE,1) over(partition by acct_num order by dt) last_1, lag(BALANCE,2) over(partition by acct_num order by dt) last_2, lag(BALANCE,3) over(partition by acct_num order by dt) last_3 from tbl_account1 a) select ACCT_NUM, balance, dt, case when balance-last_1 <0 and last_1-last_2<0 and last_2-last_3<0 then 1 else 0 end as bal_drop_in_3m from tbl order by dt; ACCT_NUM BALANCE DT BAL_DROP_IN_3M ---------- ---------- --------- -------------- 12345 8500 31-JAN-13 0 12345 8550 28-FEB-13 0 12345 5000 31-MAR-13 0 12345 1000 30-APR-13 0 12345 200 31-MAY-13 1We can also use analytic function sum() over() to calcluate the variable as shown below.
with tbl as( select a.* , decode(sign(balance-lag(BALANCE,1) over(partition by acct_num order by dt)), 0,0, 1,0, -1) is_drop from tbl_account1 a ) select ACCT_NUM, balance, dt, case when sum(is_drop) over(partition by acct_num order by dt rows 2 preceding) = -3 then 1 else 0 end BAL_DROP_IN_3M from tbl a order by dt; ACCT_NUM BALANCE DT BAL_DROP_IN_3M ---------- ---------- --------- -------------- 12345 8500 31-JAN-13 0 12345 8550 28-FEB-13 0 12345 5000 31-MAR-13 0 12345 1000 30-APR-13 0 12345 200 31-MAY-13 1The above variable BAL_DROP_IN_3M can be used as one of the inputs into a predictive model. We can also use the same method to calculate variables such as stock price going up for five consecutive days, etc.
No comments:
Post a Comment