Deep Data Mining Blog: March 2014

Sunday, March 30, 2014

Oracle On The Fly Model- Clustering

One of the features in Oracle 12c is the ability of building and running models on the fly ( as opposed to the conventional two steps of building a persistent model first and then applying it to the data). On the fly predictive models provides data miners powerful tools for accomplishing sophisticated tasks with easy.
I will use on the fly k-means clustering as an example. Take a look at the following simple consumer data. Cust_ID is the unique identifier for a consumer.

SQL> select * from TBL_CUSTOMER3 where rownum <=10;

   CUST_ID     INCOME  YEARS_RES HOME_VALUE SEX        MARRIED
---------- ---------- ---------- ---------- ---------- ----------
      7422    87499.5          2        117 "M"        "M"
      3356                    15         80 "M"        "U"
      4782    62499.5          3         91 "M"        "M"
      7333    87499.5          7         85 "M"        "M"
       890    42999.5          1         58 "F"        "U"
      6401    87499.5          5        128 "M"        "M"
      2356    87499.5          4         96 "M"        "M"
      1638    87499.5         13        152 "M"        "M"
      6713    62499.5          6         49 "M"        "U"
      3674    87499.5          3        119 "M"        "M"

10 rows selected.

To understand consumers better, usually we want to categorize consumers into small number of groups, instead of dealing with them individually. We can using Oracle predictive function cluster_id() over() to find clusters on the fly as shown below. Here, cluster_id(into 5 using income, YEARS_RES, HOME_VALUE) returns 5 cluster identifiers using the default clustering algorithms (K-means) based on variables income, YEARS_RES, HOME_VALUE. Missing input variables are OK as they will be replaced by the means. Please notice that the numbering of cluster identifier is not important at all,i.e., as long as clusters are different we can number cluster id arbitrarily.


SQL> select * from (select cust_id, 
   cluster_id(into 5 using income, YEARS_RES, HOME_VALUE ) over() cid 
from TBL_CUSTOMER3) where rownum <=10;

   CUST_ID        CID
---------- ----------
      7422          9
      3356          6
      4782          7
      7333          8
       890          5
      6401          9
      2356          9
      1638          8
      6713          7
      3674          9

10 rows selected.

We can calculate some summary information about each group using the following query to gain insight into the data.

SQL> select cid, count(*), avg(income) avg_income, 
avg(years_res) avg_years_res, 
avg(home_value) avg_home_value from 
(select cust_id, income, years_res, home_value, 
cluster_id(into 5 using income, YEARS_RES, HOME_VALUE ) over() cid 
from TBL_CUSTOMER3) group by cid order by count(*) desc;

       CID   COUNT(*)  AVG_INCOME AVG_YEARS_RES AVG_HOME_VALUE
---------- ---------- ----------- ------------- --------------
         7        152    62499.50          3.88          92.89
         9        126    87499.50          2.75         117.98
         5        110    47254.05          2.58          67.37
         8         65    87499.50          9.15         105.58
         6         31    55749.50         13.45          82.39

The clustering models are generated on the fly. When the queries finish, the models are gone. Thus it is very "clean" to run queries involving on the fly models.

Please notice that in the above query, we combine predictive model with standard SQL (group by cid). This is extremely powerful in that we seamlessly raise conventional SQL queries up to a higher level of predictive modeling. We instantly turn a SQL developer into a data miner. Bingo! (If you are interested in this point, please also see my early posts From Oracle SQL Developer to Data Miner and How easy is it for a SQL developer/analyst to become a data miner?

Saturday, March 29, 2014

Oracle Exception NO_DATA_FOUND

Problem

We want to select a column value from a table based on a key. When no record is found for the key, a special message will be displayed.

Solution

Taking the following table as an example.

SQL> select * from TBL_0321;

        ID VAL
---------- --------------------------------
         1 ok

We create a text file find_id.sql that contains the following scripts. When an id is found, it will display "value is:...". When an id is not found, use "exception when NO_DATA_FOUND" kicks in and "Id not found" is displayed.

set serveroutput on;
declare
msg varchar2(32);
begin
 select val into msg from TBL_0321 where id=&1;
 dbms_output.put_line('value is:'||msg);
 exception when NO_DATA_FOUND then
   dbms_output.put_line('Id not found'); 
end;
/

The following are the outputs from actual running of the script.

SQL> @find_id 1
old   4:  select val into msg from TBL_0321 where id=&1;
new   4:  select val into msg from TBL_0321 where id=1;
value is:ok

PL/SQL procedure successfully completed.

SQL> @find_id 2
old   4:  select val into msg from TBL_0321 where id=&1;
new   4:  select val into msg from TBL_0321 where id=2;
Id not found

PL/SQL procedure successfully completed.

Add Comments to Oracle Table

It is a good idea to add comments or descriptions to a table that could be lengthier and more descriptive than the table name. We can retrieve the comments later when needed. This is particularly important for a large project that may involve hundreds of tables.


SQL> comment on table TBL_0321 is 'this is a test';

Comment created.

To retrieve the comments, use the following query.

SQL> select comments from user_tab_comments where table_name='TBL_0321';

COMMENTS
----------------------------------------------------------------------------------
this is a test

Tuesday, March 25, 2014

Using Oracle Dump Function

Oracle dump() function is a useful tool for looking into the "real value", not just the displayed. For example, we may be puzzled to see the results of the query below. Message "data analytics" in Record 3 spans two lines.

SQL> select * from tbl_test;

        ID MESSAGE
---------- --------------------------------
         1 Hello
         2 World
         3 data
            analytics

To find out what is going on, we use dump() function to examine the content of message. On record 3, there is a ASCII character 10 which represents a line new.

SQL> select a.*, dump(id), dump(message) from tbl_test a;

        ID MESSAGE
---------- --------------------------------
DUMP(ID)
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
DUMP(MESSAGE)
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
         1 Hello
Typ=2 Len=2: 193,2
Typ=1 Len=5: 72,101,108,108,111

         2 World
Typ=2 Len=2: 193,3
Typ=1 Len=5: 87,111,114,108,100

         3 data
            analytics
Typ=2 Len=2: 193,4
Typ=1 Len=16: 100,97,116,97,32,10,32,97,110,97,108,121,116,105,99,115

To remove the new line, we use replace() function as shown below. Now the display looks much better.

SQL> select id, replace(message, chr(10), '') from tbl_test a;

        ID REPLACE(MESSAGE,CHR(10),'')
---------- ----------------------------------------------------------------
         1 Hello
         2 World
         3 data  analytics

In addition to dump(), we can also use function utl_raw.CAST_TO_RAW() as described in post Watch out invisible characters.

Make Oracle Table Read Only

It is a good idea to protect a static table from being modified by making it read only. Command "alter table .. read only" does precisely that. The following example shows how it works.

SQL> create table tbl_test (id number, message varchar2(64));

Table created.

SQL> insert into tbl_test values(1,'Hello');

1 row created.

SQL> alter table tbl_test read only;

Table altered.

SQL> insert into tbl_test values(2,'World');
insert into tbl_test values(2,'World')
            *
ERROR at line 1:
ORA-12081: update operation not allowed on table "MYSCHEMA"."TBL_TEST"

As we can see, after we make the table read only, we can not insert data into the table. We can still run the following queries against read only tables.

1. Select from a read only table.

SQL> select * from tbl_test;

        ID MESSAGE
---------- ----------------------------------------------------------------
         1 Hello

2. Create index on a read only table

SQL> create index tbl_test_idx on tbl_test(id);

Index created.

3. Rename a read only table

SQL> rename tbl_test to tbl_test_b;

Table renamed.

4. Drop a read only table

SQL> drop table tbl_test_b;

Table dropped.

Finally, to restore a read only table to a writable table, we use the command "alter table ... read write" as shown below. Once a "ready only" table becomes "read write", we can insert data into it.

SQL> alter table tbl_test read write;

Table altered.

SQL> insert into tbl_test values(2,'World');

1 row created.

Rename an Oracle Table or View

We can use "rename" to rename a table or view.

SQL> rename tbl_test01 to tbl_test01b;

Table renamed.

Alternatively, we can rename a table using "alter table ...rename to...".


SQL> alter table tbl_test01c rename to tbl_test01d;

Table altered.

Sunday, March 09, 2014

Flush Shared Pool and Buffer Cache

In the post Display How Long to Run a Query, we talk about how to display the time elapsed runing a query. When we run a "fresh" or new query, the database needs to do a lot of work to parse the SQL text and decide the execution plan. Parsed SQL query is stored in memory and may be reused in the future if the same SQL text is issued. Thus as we query the databases, information about our past queries are buffered in the memory. If we run the same query again, it may be faster because the database can reuse information in the memory. We can run the following SQL commands to flush the share pool (information about SQL text) and buffer cache(table data buffered in memory). After that, we can more accurately measure how long it takes to run a fresh query .

SQL>alter system flush shared_pool;
SQL>alter system flush buffer_cache;

Saturday, March 01, 2014

Oracle Direct Path Insert

Problem

When there are large amount of data to be inserted into a table, it is better to use direct path insert. To do direct path insert, we can specify hit /*+ append */ as shown below.

SQL>  insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a;

However, if we do another direct path insert into the same table, there will be error, ORA-12838: cannot read/modify an object after modifying it in parallel.

SQL>  insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a;
 insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

To do another direct path insert into the same table, we need to commit the first one first.


SQL> commit;

Commit complete.

SQL>  insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a;

4 rows created.

Conventional insert does not have the issue as shown below.

SQL>  insert   into tblabc select a.id, a.a from TBL_TEST3 a;

4 rows created.

SQL>  insert   into tblabc select a.id, a.a from TBL_TEST3 a;

4 rows created.

SQL>  insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a;

4 rows created.

SQL>  insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a;
 insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


SQL> commit;

Commit complete.

SQL>  insert  /*+ append */ into tblabc select a.id, a.a from TBL_TEST3 a;

4 rows created.

Handle Divided By Zero in Oracle

Problem

Ratio is very useful variable. For example, the ratio between credit card account balance and credit limit is a good measurement of the default risk. However, when calculating ratio, it the denominator is zero, Oracle raises an error,ORA-01476: divisor is equal to zero, and stops the query. This is inconvenient as shown below.

SQL> select * from tbl_account;

CUSTOMER_ID ACCT_BALANCE CREDIT_LIMIT
----------- ------------ ------------
          1         2000         5000
          2            0            0
          3         6000        12000

SQL> select a.*, ACCT_BALANCE/CREDIT_LIMIT r from tbl_account a;
ERROR:
ORA-01476: divisor is equal to zero
no rows selected

Solution

One solution to calculating ratio or division is to build a function that returns NULL when the denominator is zero and normal value when it is not zero. The script below builds such a function div(num1, num2) that calculates num1/num2. It will return NULL when num2 is zero and continue the execution.

create or replace FUNCTION div (
   num1 number,
   num2 number
)
   RETURN NUMBER IS
   v_number   NUMBER;
BEGIN
   BEGIN
      v_number:= num1/num2 ;
   EXCEPTION
      WHEN OTHERS THEN
         RETURN NULL;
   END;

   RETURN v_number;
END div;
/

After we run the above script to build function div(), we can use it to replace division "/".

SQL> select a.*, div(ACCT_BALANCE,CREDIT_LIMIT) r from tbl_account a;

CUSTOMER_ID ACCT_BALANCE CREDIT_LIMIT          R
----------- ------------ ------------ ----------
          1         2000         5000         .4
          2            0            0
          3         6000        12000         .5

The following are other outputs of using div()function.

SQL> select div(0,0) r from dual;

         R
----------


SQL> select div(NULL,0) r from dual;

         R
----------


SQL> select div(NULL,NULL) r from dual;

         R
----------

10 Most Influential People	Text Files and Oracle DB	Predictive Model vs Rule	Build Predictive Model	About Predictive Model Variable	Logistic Regression
Recency Frequency Monetary Analysis	Unique Identifier in Oracle	Materialized View	Database Link	Calculate Percentage Using SQL	Handle NULL Value
Calculate Cumulative Perentage	Find Score Cutoff Value	Remove Duplicates	Calculate Correlation Coefficients	Oracle vs SQL Server	Random Sampling
Table Insert	Read Only Table	Clustering	Ranking	Find Most Frequent	Median Value
Oracle Source Code	Debug PL/SQL	Hide PL/SQL Scripts	Repair Views	Dump Schema	Move Big Files to Amazon

Popular Topics

Popular Topics

Sunday, March 30, 2014

Oracle On The Fly Model- Clustering

Saturday, March 29, 2014

Oracle Exception NO_DATA_FOUND

Problem

Solution

Add Comments to Oracle Table

Tuesday, March 25, 2014

Using Oracle Dump Function

Make Oracle Table Read Only

Rename an Oracle Table or View

Sunday, March 09, 2014

Flush Shared Pool and Buffer Cache

Saturday, March 01, 2014

Oracle Direct Path Insert

Problem

Handle Divided By Zero in Oracle

Problem

Solution