Friday, April 25, 2014
Build Best Direct Marketing Predictive Models
A number of years ago, I applied for a PhD statistician position at an online advertising company. As part of the screening process, I was given a project. The following is a simplified description of the problem.
Let's take a look at the following records. ACME sends 10 email campaigns to subscriber 1 over 26 weeks. The records are ordered by WEEK_NUMBER. USER_CAT is a demographic code, and STATE_ID is the home state where the subscriber is located. CAMPAIGN_ID is a number from 1 to 10. A RESPONSE of 1 means the subscriber responded to the campaign.
SQL> select * from TBL_CAMPAIGN where subscriber_id=1 order by week_number;
WEEK_NUMBER SUBSCRIBER_ID USER_CAT STATE_ID GENDER CAMPAIGN_ID RESPONSE
----------- ------------- -------- ---------- ------ ----------- ----------
1 1 B 2 M 1 1
2 1 B 2 M 2 0
3 1 B 2 M 3 0
4 1 B 2 M 4 0
5 1 B 2 M 5 0
6 1 B 2 M 6 1
7 1 B 2 M 7 0
8 1 B 2 M 8 0
9 1 B 2 M 9 0
10 1 B 2 M 10 0
11 1 B 2 M 1 0
12 1 B 2 M 2 0
13 1 B 2 M 3 0
14 1 B 2 M 4 0
15 1 B 2 M 5 0
16 1 B 2 M 6 1
17 1 B 2 M 7 0
18 1 B 2 M 8 0
19 1 B 2 M 9 0
20 1 B 2 M 10 1
21 1 B 2 M 1 0
22 1 B 2 M 2 0
23 1 B 2 M 3 0
24 1 B 2 M 4 1
25 1 B 2 M 5 0
26 1 B 2 M 6 1
26 rows selected.
There are 26 weeks of historical records like the above for each of 100,000 subscribers. Based on the data, can you answer the following questions?
For week 27, suppose ACME sends emails to only 25% of its subscriber base:
(1) Which subscribers would you send email to?
(2) Which campaign(s) would you deliver to them?
(3) What do you expect the response rate to be?
There are several challenges in solving these problems:
1. It is not simply a matter of building predictive models on static variables such as gender or home state. We need variables that capture the dynamic nature of a subscriber's past responses to campaigns (see the sketch after this list). Things to consider include:
a). Is a subscriber who responded recently more likely to respond?
b). Does the sequence of offers affect the response rate? For example, if a subscriber is first sent campaign 3, say a new credit card with a 19% APR, and one week later campaign 4, one with a 6% APR, we expect that he would be more likely to respond to campaign 4.
2. We need to find out which of the ten campaigns a subscriber is most likely to respond to.
3. We need to accurately estimate the response rate. The scores returned by some predictive models are not necessarily probabilities.
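The sketch below shows one way such dynamic variables might be derived directly in SQL with analytic functions. It is a minimal example, assuming the TBL_CAMPAIGN layout shown above; the derived column names (RESP_LAST_4W, WEEKS_SINCE_RESP) are hypothetical, not from the original project.

-- Derived variables per subscriber and week (assumes TBL_CAMPAIGN as shown above)
select subscriber_id,
       week_number,
       campaign_id,
       response,
       -- number of responses in the previous 4 weeks
       nvl(sum(response) over (partition by subscriber_id
                               order by week_number
                               rows between 4 preceding and 1 preceding), 0) resp_last_4w,
       -- weeks elapsed since the most recent response (null if no prior response)
       week_number - max(case when response = 1 then week_number end)
                       over (partition by subscriber_id
                             order by week_number
                             rows between unbounded preceding and 1 preceding) weeks_since_resp
  from TBL_CAMPAIGN;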
I did extensive studies and worked out approaches to solve these problems. There are a number of conclusions that I drew from the studies:
1. Finding the best derived variables is the most important step for building a successful model.
2. All models, from simple logistic regression to sophisticated gradient boosting trees, perform reasonably well.
3. Pre-modeling tasks, e.g., data loading, merging, and calculating derived variables, took more than 85% of the effort.
Monday, April 14, 2014
Drop All Objects In an Oracle Schema
In an earlier post, we talked about how to drop all tables in an Oracle schema. Here we present a script that drops all the database objects in a schema, including tables, views, functions, procedures, mining objects, etc. Please see the following PL/SQL script.
Warning!!!!! The script below will remove all objects
under the current schema!!!
begin
  for i in 1..3 loop
    for r in (select object_name, object_type
                from user_objects
               order by object_type, object_name) loop
      begin
        if (r.object_type = 'MINING MODEL') then
          execute immediate ' begin dbms_data_mining.drop_model('||''''||r.object_name||''''||'); end;';
        elsif (r.object_type = 'TABLE') then
          execute immediate 'drop table "'||r.object_name||'" cascade constraints purge';
        else
          execute immediate 'drop '||r.object_type||' "'||r.object_name||'"';
        end if;
      exception
        when others then null;
      end;
    end loop;
  end loop;
end;
/
Warning!!!!! The script above will remove all objects under the current schema!!!
There are a few things worth mentioning about the script:
1. Mining objects are predictive models. A mining model may have associated tables with names starting with DM$.
2. The top-level loop, for i in 1..3, tries to remove all objects three times. This is because objects may have dependency relationships: an object cannot be dropped while other objects still depend on it. A simple workaround is to attempt to drop all objects several times.
I saved the script as the file this_will_drop_all_user_objects.sql, logged into my schema using sqlplus, and ran it.
$ sqlplus myschema/mypassword
SQL> @this_will_drop_all_user_objects
PL/SQL procedure successfully completed.
All objects under the current schema are now removed. Again, please be careful before running the script! It will remove all objects under the current schema.
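As a quick sanity check (not part of the original post), an emptied schema should report no remaining objects:

SQL> select count(*) from user_objects;
-- expected to return 0 once every object has been dropped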
Wednesday, April 09, 2014
Create Function Based Index on Oracle Table
A function-based index is a very useful tool. I took advantage of it when I was involved in a project to deploy predictive models into a production Oracle database. There were two requirements: 1. I needed to write about two million records of two columns, account_num (varchar2(19)) and a code (varchar2(4)), into a table. 2. In production, we needed to quickly find the code for each account number. There was also one restriction: there was only one column, col_xyz (varchar2(32)), in the table tbl_abc that we could write to. What I did was to concatenate the account_num and code columns and store the result in column col_xyz. I then created a function-based index on the first 19 characters of col_xyz, the account_num. The following are the SQL scripts involved. My original table looks like the following.
SQL> select * from MY_TABLE where rownum <5;

ACCOUNT_NUM                            CODE
-------------------------------------- ----
AAAAAAAA00000000984                    3045
AAAAAAAA00000001421                    3045
AAAAAAAA00000002644                    3045
AAAAAAAA00000004569                    3045

I concatenate account_num and code and put them into tbl_abc as column col_xyz.
SQL> insert into tbl_abc (col_xyz) select account_num||code from MY_TABLE;

SQL> select * from tbl_abc where rownum <5;

COL_XYZ
--------------------------------
AAAAAAAA000000009843045
AAAAAAAA000000014213045
AAAAAAAA000000026443045
AAAAAAAA000000045693045

I create a function-based index on the first 19 characters of col_xyz, the account_num.
SQL> create index tbl_abc_idx on tbl_abc(substr(col_xyz,1,19));

Index created.

Once the function-based index is created on substr(col_xyz,1,19), the following query that finds the code for an account_num is very fast.
SQL> select col_xyz, substr(col_xyz,1,19) account_num, substr(col_xyz, 20,4) code
     from tbl_abc where substr(col_xyz,1,19)='AAAAAAAA00000000984';

COL_XYZ                          ACCOUNT_NUM         CODE
-------------------------------- ------------------- ----
AAAAAAAA000000009843045          AAAAAAAA00000000984 3045

Please also see Table Joining Using Function Based Index.
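Incidentally, one way to confirm that the optimizer actually uses the function-based index is to look at the execution plan. The statements below are a sketch (not from the original post), assuming the tbl_abc table and tbl_abc_idx index created above.

-- Check the plan for the lookup query
explain plan for
  select substr(col_xyz, 20, 4) code
    from tbl_abc
   where substr(col_xyz, 1, 19) = 'AAAAAAAA00000000984';

select * from table(dbms_xplan.display);
-- The plan should show an index range scan on TBL_ABC_IDX rather than a full table scan.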
Monday, April 07, 2014
Calculate Stock Price Consecutive Ups Using Pattern Matching
Pattern matching support in Oracle 12c SQL is very powerful. Let's take a look at the following stock price table. We want to find the days when the price goes up for 4 or more consecutive days.
SQL> select * from STOCK order by dat;

SYM DAT            PRICE
--- --------- ----------
XYZ 31-MAY-11         14
XYZ 01-JUN-11         19
XYZ 02-JUN-11         21
XYZ 03-JUN-11         23
XYZ 04-JUN-11         27
XYZ 05-JUN-11         14
XYZ 06-JUN-11         17
XYZ 07-JUN-11         22
XYZ 08-JUN-11         26
XYZ 09-JUN-11         27
XYZ 10-JUN-11         21
XYZ 11-JUN-11         17
XYZ 12-JUN-11         27
XYZ 13-JUN-11         27
XYZ 14-JUN-11         16
XYZ 15-JUN-11         14
XYZ 16-JUN-11         16
XYZ 17-JUN-11         26
XYZ 18-JUN-11         25
XYZ 19-JUN-11         24

To find patterns where the price goes up for 4 or more consecutive days, we use the following pattern matching query.
SELECT *
  FROM stock MATCH_RECOGNIZE (
         PARTITION BY symbol
         ORDER BY dat
         MEASURES strt.price bal, strt.dat dat,
                  last(up.price) AS last_up_price,
                  last(up.dat)   AS last_up_dat
         ONE ROW PER MATCH
         AFTER MATCH SKIP TO LAST UP
         PATTERN (strt up{4,})
         DEFINE up AS up.price > PREV(up.price)
       ) MR
 ORDER BY symbol, dat;

SYM        BAL DAT       LAST_UP_PRICE LAST_UP_D
--- ---------- --------- ------------- ---------
XYZ         14 31-MAY-11            27 04-JUN-11
XYZ         14 05-JUN-11            27 09-JUN-11

In the above query, the pattern is defined as (strt up{4,}): any day followed by 4 or more up days. The query also returns the last up price and day (last(up.price) and last(up.dat)).
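If you want to see every day inside each run rather than one summary row per match, MATCH_RECOGNIZE also supports ALL ROWS PER MATCH. The query below is a sketch along those lines, assuming the same STOCK table and symbol column; it is not from the original post.

SELECT symbol, dat, price, cls
  FROM stock MATCH_RECOGNIZE (
         PARTITION BY symbol
         ORDER BY dat
         MEASURES classifier() AS cls   -- STRT for the starting day, UP for each up day
         ALL ROWS PER MATCH
         AFTER MATCH SKIP TO LAST UP
         PATTERN (strt up{4,})
         DEFINE up AS up.price > PREV(up.price)
       )
 ORDER BY symbol, dat;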
Sunday, April 06, 2014
More on Calculating Variables That Predict Customer Churn
In the post Calculate Variables That Predict Customer Churn, we used the Oracle analytic function lag() to find accounts whose balances dropped for three consecutive months, which may indicate that the customer is about to leave the bank. The same calculation can be done using pattern matching, MATCH_RECOGNIZE, available in Oracle 12c. We use the following table as an example.
SQL> select * from tbl_account1 order by acct_num, dt;

  ACCT_NUM    BALANCE DT
---------- ---------- ---------
     12345       8500 31-JAN-13
     12345       8550 28-FEB-13
     12345       5000 31-MAR-13
     12345       1000 30-APR-13
     12345        200 31-MAY-13
     37688       8800 31-JAN-13
     37688       7000 28-FEB-13
     37688       5300 31-MAR-13
     37688       1300 30-APR-13
     37688        500 31-MAY-13

To find the accounts with balances dropping for 3 consecutive months, we use the following MATCH_RECOGNIZE query.
SELECT *
  FROM tbl_account1 MATCH_RECOGNIZE (
         PARTITION BY acct_num
         ORDER BY dt
         MEASURES last(DOWN.balance) bal, last(DOWN.dt) AS dt
         ONE ROW PER MATCH
         AFTER MATCH SKIP TO NEXT ROW
         PATTERN (DOWN{3})
         DEFINE DOWN AS DOWN.balance < PREV(DOWN.balance)
       ) MR
 ORDER BY acct_num, dt;

  ACCT_NUM        BAL DT
---------- ---------- ---------
     12345        200 31-MAY-13
     37688       1300 30-APR-13
     37688        500 31-MAY-13

In the above query, "PARTITION BY acct_num" specifies that the pattern matching is performed independently for each acct_num. "DEFINE DOWN AS DOWN.balance < PREV(DOWN.balance)" defines the pattern variable DOWN by comparing the balance of the current row with that of the previous row (based on "order by dt"). PATTERN (DOWN{3}) means finding 3 consecutive downs; the syntax for pattern definitions is similar to regular expressions. last(DOWN.balance) and last(DOWN.dt) keep the last balance and dt of a successful match. Pattern matching queries can be used to extract useful patterns from time series data, and these patterns can then be used as input variables for predictive models. Thus, in Oracle 12c SQL, the combination of pattern matching and predictive models is very powerful.
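As a follow-on sketch (not in the original post), the match output can be turned into a 0/1 input variable for a churn model, for example by flagging every account that had at least one three-month drop. The query below assumes the tbl_account1 table above; the flag name bal_drop_in_3m is just illustrative.

with drops as (
  select acct_num
    from tbl_account1 MATCH_RECOGNIZE (
           PARTITION BY acct_num
           ORDER BY dt
           ONE ROW PER MATCH
           AFTER MATCH SKIP TO NEXT ROW
           PATTERN (DOWN{3})
           DEFINE DOWN AS DOWN.balance < PREV(DOWN.balance)
         )
)
select a.acct_num,
       case when d.acct_num is not null then 1 else 0 end bal_drop_in_3m
  from (select distinct acct_num from tbl_account1) a
  left join (select distinct acct_num from drops) d
    on a.acct_num = d.acct_num;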
Saturday, April 05, 2014
More On Zip Code and Predictive Models - Variable of High Cardinality
Zip Code and Predictive Models
It is obvious that zip codes should be treated as a categorical variable instead of a numeric one. Since zip codes have many distinct values, in some cases they may practically become unique identifiers for data points. We need to handle them carefully when building predictive models; otherwise, the models may perform well on the training data set but not so well on new data due to overfitting. For some predictive models, such as decision trees and tree-based ensemble models, we can use zip codes directly. For other models, we may categorize zip codes into a smaller number of meaningful groups. For example, we can categorize zip codes based on their credit card fraud rates when building a fraud detection model. The following query shows card_id, home_zip and the fraud indicator (1 fraud, 0 normal transaction) of a credit card transaction table.
SQL> select card_id, home_zip, is_fraud from CARD_TXN;

   CARD_ID HOME_ZIP           IS_FRAUD
---------- ---------------- ----------
     12345 012320                    0
     45678 012456                    1
.......................................

Based on the above table, we can categorize zip codes in two steps:
1. Calculate the fraud rate for each zip code in the training data set. For zip codes with a small number of transactions, use the overall fraud rate of the whole population instead.
2. Divide zip codes into groups based on their fraud rates.
We can calculate the fraud rate for each home_zip using the following script.
create or replace view v_zip_fraud_rate as
with tbl_overall_fraud_r as
  (select sum(is_fraud)/count(*) fraud_rate from CARD_TXN),
tbl_zip_fraud_r as
  (select home_zip,
          count(*) tot_num_txns,
          sum(is_fraud) tot_num_fraud,
          sum(is_fraud)/count(*) fraud_rate
     from CARD_TXN
    group by home_zip)
select home_zip,
       tot_num_txns,
       tot_num_fraud,
       case when tot_num_txns < 50 then b.fraud_rate else a.fraud_rate end fraud_rate
  from tbl_zip_fraud_r a, tbl_overall_fraud_r b;

SQL> select * from V_ZIP_FRAUD_RATE;

HOME_ZIP         TOT_NUM_TXNS TOT_NUM_FRAUD FRAUD_RATE
---------------- ------------ ------------- ----------
012320                   2000            10       .005
012456                   1000             8       .008
012345                     23             0       .006
..........................

In the above example, zip code "012345" has no fraud transactions and only 23 transactions in total, so we assign it the overall average fraud rate. In the same fashion, we can categorize zip codes based on click-through rate, response rate, income, etc. We can easily group zip codes based on fraud rate, for example with a "case when" query as shown below. After we convert the zip code into a small number of categories, we can use it in a predictive model (such as a logistic regression model). Again, it is not necessary to categorize zip codes when using decision-tree based models.
select HOME_ZIP,
       case when fraud_rate < 0.001 then 1
            when fraud_rate < 0.003 then 2
            when fraud_rate < 0.005 then 3
            when fraud_rate < 0.007 then 4
            when fraud_rate < 0.008 then 5
            else 6
       end zip_segment
  from v_zip_fraud_rate;
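To actually feed the segment into a model, the grouping can be joined back to the transaction table. The query below is a sketch of that step (not from the original post), assuming the CARD_TXN table and v_zip_fraud_rate view defined above.

-- Attach the zip segment to each transaction as a model input
with zip_seg as (
  select home_zip,
         case when fraud_rate < 0.001 then 1
              when fraud_rate < 0.003 then 2
              when fraud_rate < 0.005 then 3
              when fraud_rate < 0.007 then 4
              when fraud_rate < 0.008 then 5
              else 6
         end zip_segment
    from v_zip_fraud_rate
)
select t.card_id, t.is_fraud, z.zip_segment
  from CARD_TXN t
  join zip_seg z
    on z.home_zip = t.home_zip;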
Friday, April 04, 2014
Empty All Tables In an Oracle Schema
Instead of removing all tables in our schema as shown in the post Drop All Tables In an Oracle Schema, there are situations where we want to keep the tables while emptying all the data stored in them. The following PL/SQL script does this.
Warning!!!!! The script below will empty all tables
under the current schema!!!
begin
  for r in (select table_name from user_tables order by table_name) loop
    begin
      execute immediate 'truncate table '||r.table_name;
    exception
      when others then null;
    end;
  end loop;
end;
/
Warning!!!!! The script above will empty all tables under the current schema!!!
I saved the script as the file this_will_empty_all_user_tables.sql, logged into my schema using sqlplus, and ran it.
$ sqlplus myschema/mypassword
SQL> @this_will_empty_all_user_tables
PL/SQL procedure successfully completed.
All data stored in the tables in the current schema is gone, while the table structures remain unchanged.
Drop All Tables In an Oracle Schema
Sometimes, we want to drop all tables in our schema. To do this, I created the following PL/SQL script file.
Warning!!!!! The script below will remove all tables
under the current schema!!!
begin
  for r in (select table_name from user_tables order by table_name) loop
    begin
      execute immediate 'drop table '||r.table_name||' cascade constraints purge';
    exception
      when others then null;
    end;
  end loop;
end;
/
Warning!!!!! The script above will remove all tables under the current schema!!!
I saved the script as the file this_will_drop_all_user_tables.sql, logged into my schema using sqlplus, and ran it.
$ sqlplus myschema/mypassword
SQL> @this_will_drop_all_user_tables
PL/SQL procedure successfully completed.
All the tables under the current schema are removed. Again, please be careful before running the script! It will remove all tables under the current schema.
Wednesday, April 02, 2014
Calculate Variables That Predict Customer Churn
Oracle analytic functions such as lag() are very useful for building interesting variables for predictive models. For example, to predict whether a bank account will be closed by a customer, we may want to look at the account balance history. If the account balance drops for three consecutive months, it may indicate that the customer will leave the bank. Using the following table as an example, we will describe how to build a variable indicating a balance drop for three consecutive months.
SQL> select * from tbl_account1 order by dt desc;

  ACCT_NUM    BALANCE DT
---------- ---------- ---------
     12345        200 31-MAY-13
     12345       1000 30-APR-13
     12345       5000 31-MAR-13
     12345       8550 28-FEB-13
     12345       8500 31-JAN-13

One approach is to use the lag() function. In the following query, for each record we calculate the balances of the previous 1, 2 and 3 months. lag(BALANCE,1), lag(BALANCE,2) and lag(BALANCE,3) return the balance from the previous 1, 2 and 3 records based on the order of "dt", respectively. "partition by acct_num" means the calculation is done independently for each acct_num.
with tbl as (
  select a.*,
         lag(BALANCE,1) over(partition by acct_num order by dt) last_1,
         lag(BALANCE,2) over(partition by acct_num order by dt) last_2,
         lag(BALANCE,3) over(partition by acct_num order by dt) last_3
    from tbl_account1 a)
select * from tbl order by dt;

  ACCT_NUM    BALANCE DT            LAST_1     LAST_2     LAST_3
---------- ---------- --------- ---------- ---------- ----------
     12345       8500 31-JAN-13
     12345       8550 28-FEB-13       8500
     12345       5000 31-MAR-13       8550       8500
     12345       1000 30-APR-13       5000       8550       8500
     12345        200 31-MAY-13       1000       5000       8550

Once we understand how to calculate the previous months' balances using the lag() function, we can derive the indicator showing a three-month consecutive drop in balance using the following query.
with tbl as (
  select a.*,
         lag(BALANCE,1) over(partition by acct_num order by dt) last_1,
         lag(BALANCE,2) over(partition by acct_num order by dt) last_2,
         lag(BALANCE,3) over(partition by acct_num order by dt) last_3
    from tbl_account1 a)
select ACCT_NUM, balance, dt,
       case when balance-last_1 < 0 and last_1-last_2 < 0 and last_2-last_3 < 0
            then 1 else 0 end as bal_drop_in_3m
  from tbl order by dt;

  ACCT_NUM    BALANCE DT        BAL_DROP_IN_3M
---------- ---------- --------- --------------
     12345       8500 31-JAN-13              0
     12345       8550 28-FEB-13              0
     12345       5000 31-MAR-13              0
     12345       1000 30-APR-13              0
     12345        200 31-MAY-13              1

We can also use the analytic function sum() over() to calculate the variable, as shown below.
with tbl as (
  select a.*,
         decode(sign(balance - lag(BALANCE,1) over(partition by acct_num order by dt)),
                0,0, 1,0, -1) is_drop
    from tbl_account1 a)
select ACCT_NUM, balance, dt,
       case when sum(is_drop) over(partition by acct_num order by dt rows 2 preceding) = -3
            then 1 else 0 end BAL_DROP_IN_3M
  from tbl a order by dt;

  ACCT_NUM    BALANCE DT        BAL_DROP_IN_3M
---------- ---------- --------- --------------
     12345       8500 31-JAN-13              0
     12345       8550 28-FEB-13              0
     12345       5000 31-MAR-13              0
     12345       1000 30-APR-13              0
     12345        200 31-MAY-13              1

The above variable BAL_DROP_IN_3M can be used as one of the inputs to a predictive model. We can also use the same method to calculate variables such as a stock price going up for five consecutive days, etc., as sketched below.
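The sketch below applies the same lag() technique to a stock price table, hypothetically the STOCK table from the pattern matching post with columns symbol, dat and price, to flag days that end a run of five consecutive up days. It is an illustration under those assumptions, not part of the original post.

-- Flag days where the price has gone up for five consecutive days
with t as (
  select s.*,
         lag(price,1) over(partition by symbol order by dat) p1,
         lag(price,2) over(partition by symbol order by dat) p2,
         lag(price,3) over(partition by symbol order by dat) p3,
         lag(price,4) over(partition by symbol order by dat) p4,
         lag(price,5) over(partition by symbol order by dat) p5
    from stock s)
select symbol, dat, price,
       case when price > p1 and p1 > p2 and p2 > p3 and p3 > p4 and p4 > p5
            then 1 else 0 end up_5_days
  from t
 order by symbol, dat;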
Why It Is Sometimes Not a Good Idea to Export Database Tables as Text Files
It is a common practice to export a database table as a text file for the purpose of delivering data to another party. However, by converting a database table into a text file we lose valuable information about the data, and that may cause problems. If a column contains special characters such as a new line (ASCII code 10, or 0A in hex), the records may not be successfully loaded back into a database, as most import utilities will treat the new line as the end of a record. For example, the record with id 3 in the following table has a new line character between "data" and "analytics". This can be shown using the function utl_raw.CAST_TO_RAW().
SQL> select * from tbl_test;

        ID MESSAGE
---------- ----------------------------------------------------------------
         1 Hello
         2 World
         3 data
            analytics

SQL> select message, utl_raw.CAST_TO_RAW(message) ascii_code from tbl_test where id = 3;

MESSAGE                          ASCII_CODE
-------------------------------- ---------------------------------------------------------
data                             64617461200A20616E616C7974696373
 analytics

If we dump this table as a text file, record 3 will not be loaded correctly because the new line normally marks the end of a record. Other characters that make things complicated include the comma (,), single quote (') and double quote ("). Unfortunately, some ETL tools convert database tables into text files and then load them back into another database, and we have experienced problems with these tools. Then how do we deliver our data to another party if we do not use text files? One approach is to use Oracle utilities such as Data Pump (expdp/impdp) or exp/imp. We may also use a database link to fetch the data from a remote database directly into a table, e.g., create table tbl_abc as select * from tbl_abc@remote_db.
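As an illustration of the Data Pump alternative, the commands below are a sketch, not from the original post; the user names, dump file name and the default DATA_PUMP_DIR directory object are assumptions. The first command exports the table on the source database, the second imports the dump file on the destination database, without ever flattening the data to text.

$ expdp user1/password1 tables=TBL_TEST directory=DATA_PUMP_DIR dumpfile=tbl_test.dmp logfile=tbl_test_exp.log
$ impdp user2/password2 tables=TBL_TEST directory=DATA_PUMP_DIR dumpfile=tbl_test.dmp logfile=tbl_test_imp.log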
Compare Oracle Tables in Different Databases: Find an ETL Issue
Problem
There are situations where we need to compare tables or views located in different databases. For example, a few years ago I was involved in a project to help build a data warehouse to archive the bank card transactions in production. The card transactions in the production Oracle database were moved through an ETL process to the data warehouse, another Oracle database. The key questions are: how do we know that the data values are the same after the movement through ETL? How do we compare tables in different databases? (We do not automatically assume that the ETL tool does the data movement job correctly. It turned out that some data values were actually changed by the ETL.)
One way to compare tables across Oracle databases is to use a database link. Before we create a database link using the following command, the user may need to be granted the "create database link" privilege.
SQL> create database link dl_db1 connect to user1 identified by abc123
     using '(description=(address=(protocol=TCP)(host=server1)(port=1521))
            (connect_data=(sid=ORCLID)))';

Once the database link is created, we can compare a table in the current database to that in another database. To access the table in the other database that the database link points to, we specify "@dl_db1" after the table name as shown below. We can query the remote table just like a local table.
SQL> select count(1) from tbl_123 a, tbl_123@dl_db1 b where a.id=b.id;

In the project I mentioned above, we compared the tables before and after they were moved by the ETL and identified that in some cases values changed. Further investigation showed that this was caused by the ETL tool converting tables in the source database into text files and then loading them into the destination database. It is usually not a good idea to dump tables to text files, as some information is lost during the process.
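Beyond a simple join count, a fuller comparison is possible with set operators over the database link. The queries below are a sketch, assuming the two tables have identical column lists; any rows returned by either MINUS indicate values that differ between the databases.

-- Rows in the local table that do not appear identically in the remote table
select * from tbl_123
minus
select * from tbl_123@dl_db1;

-- Rows in the remote table that do not appear identically in the local table
select * from tbl_123@dl_db1
minus
select * from tbl_123;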
Tuesday, April 01, 2014
Data Mining Model Management: Query user_mining_models View
When we build models in an Oracle database, those models are database objects that can be queried using SQL. Thus, we can find critical information about the models. One of the most useful views is user_mining_models, which contains the following information about each model:
MODEL_NAME, MINING_FUNCTION, ALGORITHM, CREATION_DATE, BUILD_DURATION, MODEL_SIZE, COMMENTS

For example, the following query shows the models that I have built in my schema.
SQL> select model_name, MINING_FUNCTION, ALGORITHM from user_mining_models;

MODEL_NAME         MINING_FUNCTION                ALGORITHM
------------------ ------------------------------ ------------------------------
ABCOC1M            CLUSTERING                     O_CLUSTER
NB1021             CLASSIFICATION                 NAIVE_BAYES
SVM1029            CLASSIFICATION                 SUPPORT_VECTOR_MACHINES
AR1029             ASSOCIATION_RULES              APRIORI_ASSOCIATION_RULES
AI0929             ATTRIBUTE_IMPORTANCE           MINIMUM_DESCRIPTION_LENGTH
DT1029             CLASSIFICATION                 DECISION_TREE
GLM1031A           CLASSIFICATION                 DECISION_TREE
GLM1031B           CLASSIFICATION                 DECISION_TREE
GLM1031C           CLASSIFICATION                 DECISION_TREE
GLM1031E           CLASSIFICATION                 GENERALIZED_LINEAR_MODEL
KM1031C            CLUSTERING                     KMEANS
OC_SH_CLUS_SAMPLE  CLUSTERING                     O_CLUSTER
KM1211             CLUSTERING                     KMEANS
KM_MODEL_TRY1      CLUSTERING                     KMEANS
GLM0115            CLASSIFICATION                 GENERALIZED_LINEAR_MODEL
KM_MODEL           CLUSTERING                     KMEANS
SVD0119            FEATURE_EXTRACTION             NONNEGATIVE_MATRIX_FACTOR
TMSVD1             FEATURE_EXTRACTION             SINGULAR_VALUE_DECOMP

The following "group by" query summarizes the types of models that I have built.
SQL> select MINING_FUNCTION, ALGORITHM, count(*) from user_mining_models
     group by MINING_FUNCTION, ALGORITHM order by MINING_FUNCTION, ALGORITHM;

MINING_FUNCTION                ALGORITHM                        COUNT(*)
------------------------------ ------------------------------ ----------
ASSOCIATION_RULES              APRIORI_ASSOCIATION_RULES               1
ATTRIBUTE_IMPORTANCE           MINIMUM_DESCRIPTION_LENGTH              1
CLASSIFICATION                 DECISION_TREE                           4
CLASSIFICATION                 GENERALIZED_LINEAR_MODEL                2
CLASSIFICATION                 NAIVE_BAYES                             1
CLASSIFICATION                 SUPPORT_VECTOR_MACHINES                 1
CLUSTERING                     KMEANS                                  4
CLUSTERING                     O_CLUSTER                               2
FEATURE_EXTRACTION             NONNEGATIVE_MATRIX_FACTOR               1
FEATURE_EXTRACTION             SINGULAR_VALUE_DECOMP                   1

In summary, models are database objects that can be queried using SQL. This provides an efficient way to manage many models in our database. We can retrieve our models by name, mining function (classification, regression, etc.), algorithm, build date, comments, etc.
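For instance, using the CREATION_DATE column listed above, a query along the following lines retrieves recently built classification models. This is a sketch; the 90-day window is an arbitrary example.

select model_name, algorithm, creation_date
  from user_mining_models
 where mining_function = 'CLASSIFICATION'
   and creation_date > sysdate - 90
 order by creation_date desc;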