In the earlier post Calculate correlation matrix using SQL, we showed a simple trick of calculating correlation matrix for unlimited number of rows and columns. We need to convert the data table into a simple table with only three columns: rec_id (record id), var_id (column_id), val (value). I was asked how I did the conversion.
If the original table is already in an Oracle database, we can write PL/SQL scripts to convert the original table into the simple table.
In the past experience, my original data were in text file format. So I wrote awk scripts similar to the following to convert the text file into simple format and then loaded them into Oracle. In the script, I included file name as the first column so that I could put data from multiple files into a single destination file.
Original data: $ cat sample.csv record_id, field1, field2, field3 1001,1,2,3 1002,4,5,6 1003,7,8,9 $ awk -F"," '{for (i=1;i<=NF;i++) print FILENAME","FNR-1","i","$i;}' sample.csv sample.csv,0,1,record_id sample.csv,0,2, field1 sample.csv,0,3, field2 sample.csv,0,4, field3 sample.csv,1,1,1001 sample.csv,1,2,1 sample.csv,1,3,2 sample.csv,1,4,3 sample.csv,2,1,1002 sample.csv,2,2,4 sample.csv,2,3,5 sample.csv,2,4,6 sample.csv,3,1,1003 sample.csv,3,2,7 sample.csv,3,3,8 sample.csv,3,4,9
No comments:
Post a Comment