Oracle analytics functions such as lag() are very useful in building interesting variables for predictive models. For example, to predict if a bank account will be closed by a customer we may want to look at the account balance history. If the account balance drop consecutively for three months, it may indicate that the customer will leave the bank. Using the following table as an example, we will describe how to build a variable indicating balance dropping for three consecutive months.
SQL> select * from tbl_account1 order by dt desc;
ACCT_NUM BALANCE DT
---------- ---------- ---------
12345 200 31-MAY-13
12345 1000 30-APR-13
12345 5000 31-MAR-13
12345 8550 28-FEB-13
12345 8500 31-JAN-13
One approach is to use lag() function. In the following query, for each record we calculate the previous 1, 2 and 3 months balances. lag(BALANCE,1), lag(BALANCE,2) and lag(BALANCE,3) indicate the balance in prevous 1, 2 and 3 records based on the order of "dt", respectively. "partition by acct_num" means the calculation is done independently by acct_num.
with
tbl as(
select a.* ,
lag(BALANCE,1) over(partition by acct_num order by dt) last_1,
lag(BALANCE,2) over(partition by acct_num order by dt) last_2,
lag(BALANCE,3) over(partition by acct_num order by dt) last_3
from tbl_account1 a)
select * from tbl order by dt;
ACCT_NUM BALANCE DT LAST_1 LAST_2 LAST_3
---------- ---------- --------- ---------- ---------- ----------
12345 8500 31-JAN-13
12345 8550 28-FEB-13 8500
12345 5000 31-MAR-13 8550 8500
12345 1000 30-APR-13 5000 8550 8500
12345 200 31-MAY-13 1000 5000 8550
Once we understand how to calculate the privous month balance using lag() function, we can derive the indictor showing consecutively three months drop in balance using the following query.
with
tbl as(
select a.* ,
lag(BALANCE,1) over(partition by acct_num order by dt) last_1,
lag(BALANCE,2) over(partition by acct_num order by dt) last_2,
lag(BALANCE,3) over(partition by acct_num order by dt) last_3
from tbl_account1 a)
select ACCT_NUM, balance, dt,
case when balance-last_1 <0
and last_1-last_2<0
and last_2-last_3<0 then 1
else 0 end as bal_drop_in_3m
from tbl order by dt;
ACCT_NUM BALANCE DT BAL_DROP_IN_3M
---------- ---------- --------- --------------
12345 8500 31-JAN-13 0
12345 8550 28-FEB-13 0
12345 5000 31-MAR-13 0
12345 1000 30-APR-13 0
12345 200 31-MAY-13 1
We can also use analytic function sum() over() to calcluate the variable as shown below.
with
tbl as(
select a.* ,
decode(sign(balance-lag(BALANCE,1) over(partition by acct_num order by dt)),
0,0,
1,0,
-1) is_drop
from tbl_account1 a
)
select ACCT_NUM, balance, dt,
case when sum(is_drop) over(partition by acct_num
order by dt rows 2 preceding) = -3 then 1
else 0 end BAL_DROP_IN_3M
from tbl a order by dt;
ACCT_NUM BALANCE DT BAL_DROP_IN_3M
---------- ---------- --------- --------------
12345 8500 31-JAN-13 0
12345 8550 28-FEB-13 0
12345 5000 31-MAR-13 0
12345 1000 30-APR-13 0
12345 200 31-MAY-13 1
The above variable BAL_DROP_IN_3M can be used as one of the inputs into a predictive model. We can also use the same method to calculate variables such as stock price going up for five consecutive days, etc.