Any table can be converted into the a simple table with only three columns: rec_id (record id), var_id (column_id), val (value). For example,
Original Table
Record_ID Variable_1 Variable_2 Variable3
1 1.2 2.2 3.3
2 2.0 1.5 4.0
The above table can be converted into the following simple table.
Rec_id var_id val
1 1 1.2
1 2 2.2
1 3 3.3
2 1 2.0
2 2 1.5
2 3 4.0
Once the data are stored in the simple, three column table, the top-right correlation matrix can be easily calculated using the following query.
select a.var_id var_id1, b.var_id var_id2, corr(a.val, b.val) correl
from tbl_simple a, tbl_simple b where a.rec_id=b.rec_id
and a.var_id<=b.var_id group by a.var_id, b.var_id ;
This method can handle unlimited number of records and variables. To improve the performance, we should create index on rec_id and var_id. Also see my yesterday's post about "Calculate correlation coefficient between two variables using SQL".
2 comments:
I am new to SQL how do you create the simple table from the original?
Thank you
Please see my post Calculate correlation matrix using SQL (continued). Thanks.
Post a Comment