Problem
Time since last transaction, or recency, is an important factor for building predictive models such as fraud detection. How do we calculate the time elapsed between two transactions, i.e., time since last transaction, when the dates/time stamps are located in different rows as shown below?
SQL> select customer_id, TXN_DT, TXN_AMT from TBL_TXN_SMALLER order by CUSTOMER_ID, TXN_DT desc; CUSTOMER_ID TXN_DT TXN_AMT -------------------- ----------------- ---------- AAAAAAAA00000849171 20111111:16:11:41 5.23 AAAAAAAA00000849171 20111111:12:19:38 13.45 AAAAAAAA00000849171 20111107:12:36:53 12.2 AAAAAAAA00000849171 20111106:20:43:28 9.99 AAAAAAAA00000849171 20111104:23:00:02 19.95 AAAAAAAA00000849171 20111104:23:00:01 0 AAAAAAAA00000849171 20111102:17:08:20 42.77 AAAAAAAA00000849171 20111102:17:05:41 36.33 AAAAAAAA00000849171 20111102:16:00:25 14.07 AAAAAAAA00000849171 20111101:16:41:56 16.32 AAAAAAAA00003868233 20111110:16:07:54 35.16 AAAAAAAA00003868233 20111110:06:12:14 83.69 AAAAAAAA00003868233 20111110:06:12:09 0 AAAAAAAA00003868233 20111109:16:00:38 19.95 AAAAAAAA00003868233 20111104:06:21:50 54 AAAAAAAA00003868233 20111103:06:23:42 29.71 AAAAAAAA00003868233 20111103:04:42:13 79 AAAAAAAA00003868233 20111102:13:17:10 489 AAAAAAAA00003868233 20111102:02:10:28 9.95 AAAAAAAA00003868233 20111031:16:03:42 714.1 AAAAAAAA00004821778 20111002:02:19:17 100 21 rows selected.
Solution
One of the solutions is to use Oracle function lag. In the following query, lag(txn_dt,1) gets the previous row for the customer_id indepedently (partition by CUSTOMER_ID) order by txn_date. Please notice the first transaction from each customer is always null. This makes sense because there is no previous transaction for the first transaction.
SQL> select customer_id, TXN_DT, TXN_AMT, lag(TXN_DT,1) over(partition by CUSTOMER_ID order by TXN_DT) prev_dt, txn_dt-lag(TXN_DT,1) over(partition by CUSTOMER_ID order by TXN_DT) days_since_last from TBL_TXN_SMALLER order by CUSTOMER_ID, TXN_DT desc; CUSTOMER_ID TXN_DT TXN_AMT PREV_DT DAYS_SINCE_LAST -------------------- ----------------- ---------- ----------------- --------------- AAAAAAAA00000849171 20111111:16:11:41 5.23 20111111:12:19:38 .161145833 AAAAAAAA00000849171 20111111:12:19:38 13.45 20111107:12:36:53 3.98802083 AAAAAAAA00000849171 20111107:12:36:53 12.2 20111106:20:43:28 .662094907 AAAAAAAA00000849171 20111106:20:43:28 9.99 20111104:23:00:02 1.90516204 AAAAAAAA00000849171 20111104:23:00:02 19.95 20111104:23:00:01 .000011574 AAAAAAAA00000849171 20111104:23:00:01 0 20111102:17:08:20 2.24422454 AAAAAAAA00000849171 20111102:17:08:20 42.77 20111102:17:05:41 .001840278 AAAAAAAA00000849171 20111102:17:05:41 36.33 20111102:16:00:25 .045324074 AAAAAAAA00000849171 20111102:16:00:25 14.07 20111101:16:41:56 .971168981 AAAAAAAA00000849171 20111101:16:41:56 16.32 AAAAAAAA00003868233 20111110:16:07:54 35.16 20111110:06:12:14 .413657407 AAAAAAAA00003868233 20111110:06:12:14 83.69 20111110:06:12:09 .00005787 AAAAAAAA00003868233 20111110:06:12:09 0 20111109:16:00:38 .591331019 AAAAAAAA00003868233 20111109:16:00:38 19.95 20111104:06:21:50 5.40194444 AAAAAAAA00003868233 20111104:06:21:50 54 20111103:06:23:42 .998703704 AAAAAAAA00003868233 20111103:06:23:42 29.71 20111103:04:42:13 .070474537 AAAAAAAA00003868233 20111103:04:42:13 79 20111102:13:17:10 .642395833 AAAAAAAA00003868233 20111102:13:17:10 489 20111102:02:10:28 .462986111 AAAAAAAA00003868233 20111102:02:10:28 9.95 20111031:16:03:42 1.42136574 AAAAAAAA00003868233 20111031:16:03:42 714.1 AAAAAAAA00004821778 20111002:02:19:17 100 21 rows selected.
Conclusion
Using Oracle analytic function lag, we can calculate the recency factors such as time since last transaction within a single query without complex coding that may involve joining or looping.
No comments:
Post a Comment