Sunday, November 24, 2013

More on Caculating Time Elapsed Between Two Dates

Problem

Time since last transaction, or recency, is an important factor for building predictive models such as fraud detection. How do we calculate the time elapsed between two transactions, i.e., time since last transaction, when the dates/time stamps are located in different rows as shown below?

SQL> select customer_id, TXN_DT, TXN_AMT from TBL_TXN_SMALLER order by CUSTOMER_ID, TXN_DT desc;

CUSTOMER_ID          TXN_DT               TXN_AMT
-------------------- ----------------- ----------
AAAAAAAA00000849171  20111111:16:11:41       5.23
AAAAAAAA00000849171  20111111:12:19:38      13.45
AAAAAAAA00000849171  20111107:12:36:53       12.2
AAAAAAAA00000849171  20111106:20:43:28       9.99
AAAAAAAA00000849171  20111104:23:00:02      19.95
AAAAAAAA00000849171  20111104:23:00:01          0
AAAAAAAA00000849171  20111102:17:08:20      42.77
AAAAAAAA00000849171  20111102:17:05:41      36.33
AAAAAAAA00000849171  20111102:16:00:25      14.07
AAAAAAAA00000849171  20111101:16:41:56      16.32
AAAAAAAA00003868233  20111110:16:07:54      35.16
AAAAAAAA00003868233  20111110:06:12:14      83.69
AAAAAAAA00003868233  20111110:06:12:09          0
AAAAAAAA00003868233  20111109:16:00:38      19.95
AAAAAAAA00003868233  20111104:06:21:50         54
AAAAAAAA00003868233  20111103:06:23:42      29.71
AAAAAAAA00003868233  20111103:04:42:13         79
AAAAAAAA00003868233  20111102:13:17:10        489
AAAAAAAA00003868233  20111102:02:10:28       9.95
AAAAAAAA00003868233  20111031:16:03:42      714.1
AAAAAAAA00004821778  20111002:02:19:17        100

21 rows selected.

Solution

One of the solutions is to use Oracle function lag. In the following query, lag(txn_dt,1) gets the previous row for the customer_id indepedently (partition by CUSTOMER_ID) order by txn_date. Please notice the first transaction from each customer is always null. This makes sense because there is no previous transaction for the first transaction.

SQL> select customer_id, TXN_DT, TXN_AMT, 
lag(TXN_DT,1) over(partition by CUSTOMER_ID order by TXN_DT) prev_dt, 
txn_dt-lag(TXN_DT,1) over(partition by CUSTOMER_ID order by TXN_DT) 
days_since_last from TBL_TXN_SMALLER order by CUSTOMER_ID, TXN_DT desc;

CUSTOMER_ID          TXN_DT               TXN_AMT PREV_DT           DAYS_SINCE_LAST
-------------------- ----------------- ---------- ----------------- ---------------
AAAAAAAA00000849171  20111111:16:11:41       5.23 20111111:12:19:38      .161145833
AAAAAAAA00000849171  20111111:12:19:38      13.45 20111107:12:36:53      3.98802083
AAAAAAAA00000849171  20111107:12:36:53       12.2 20111106:20:43:28      .662094907
AAAAAAAA00000849171  20111106:20:43:28       9.99 20111104:23:00:02      1.90516204
AAAAAAAA00000849171  20111104:23:00:02      19.95 20111104:23:00:01      .000011574
AAAAAAAA00000849171  20111104:23:00:01          0 20111102:17:08:20      2.24422454
AAAAAAAA00000849171  20111102:17:08:20      42.77 20111102:17:05:41      .001840278
AAAAAAAA00000849171  20111102:17:05:41      36.33 20111102:16:00:25      .045324074
AAAAAAAA00000849171  20111102:16:00:25      14.07 20111101:16:41:56      .971168981
AAAAAAAA00000849171  20111101:16:41:56      16.32
AAAAAAAA00003868233  20111110:16:07:54      35.16 20111110:06:12:14      .413657407
AAAAAAAA00003868233  20111110:06:12:14      83.69 20111110:06:12:09       .00005787
AAAAAAAA00003868233  20111110:06:12:09          0 20111109:16:00:38      .591331019
AAAAAAAA00003868233  20111109:16:00:38      19.95 20111104:06:21:50      5.40194444
AAAAAAAA00003868233  20111104:06:21:50         54 20111103:06:23:42      .998703704
AAAAAAAA00003868233  20111103:06:23:42      29.71 20111103:04:42:13      .070474537
AAAAAAAA00003868233  20111103:04:42:13         79 20111102:13:17:10      .642395833
AAAAAAAA00003868233  20111102:13:17:10        489 20111102:02:10:28      .462986111
AAAAAAAA00003868233  20111102:02:10:28       9.95 20111031:16:03:42      1.42136574
AAAAAAAA00003868233  20111031:16:03:42      714.1
AAAAAAAA00004821778  20111002:02:19:17        100

21 rows selected.

Conclusion

Using Oracle analytic function lag, we can calculate the recency factors such as time since last transaction within a single query without complex coding that may involve joining or looping.

No comments: