Problem
In the earlier post Calculate Z-Score using SQL in Oracle, we show that we can normalize a variable using z-score which is defined as (variable value-mean)/Standard Deviation. How do we calculate the Z-scores for employee's salaries for the following table?
SQL> select EMPNO, ENAME, JOB, SAL from emp order by job, sal desc; EMPNO ENAME JOB SAL ---------- ---------- --------- ---------- 7902 FORD ANALYST 3000 7788 SCOTT ANALYST 3000 7934 MILLER CLERK 1300 7876 ADAMS CLERK 1100 7900 JAMES CLERK 950 7369 SMITH CLERK 800 7566 JONES MANAGER 2975 7698 BLAKE MANAGER 2850 7782 CLARK MANAGER 2450 7839 KING PRESIDENT 5000 7499 ALLEN SALESMAN 1600 7844 TURNER SALESMAN 1500 7521 WARD SALESMAN 1250 7654 MARTIN SALESMAN 1250 14 rows selected.
Solution
We can use the following query to calculate Z-score.
SQL> with tbl_mean_std as ( select avg(sal) m, stddev(sal) std from EMP ) select EMPNO, ENAME, JOB, SAL, (sal-m)/std as z_score from EMP, tbl_mean_std order by job, sal desc; EMPNO ENAME JOB SAL Z_SCORE ---------- ---------- --------- ---------- ---------- 7902 FORD ANALYST 3000 .783748996 7788 SCOTT ANALYST 3000 .783748996 7934 MILLER CLERK 1300 -.65387922 7876 ADAMS CLERK 1100 -.82301195 7900 JAMES CLERK 950 -.9498615 7369 SMITH CLERK 800 -1.076711 7566 JONES MANAGER 2975 .762607405 7698 BLAKE MANAGER 2850 .656899448 7782 CLARK MANAGER 2450 .318633985 7839 KING PRESIDENT 5000 2.47507631 7499 ALLEN SALESMAN 1600 -.40018012 7844 TURNER SALESMAN 1500 -.48474649 7521 WARD SALESMAN 1250 -.6961624 7654 MARTIN SALESMAN 1250 -.6961624 14 rows selected.Not surprisingly, the president's salary has a z-score of 2.47 which is almost two and half standard deviation above the mean. Mean always has a z-score of zero. If we want to save the mean and standard deviation derived from this data set, we may create a permanent table to store them. This table can be used to calculate the z-score for any future data points. The following queries create a permanent table containing mean and standard deviation. It is then used to create a view that calculates the z-score.
SQL> create table tbl_mean_std_perm as select avg(sal) m, stddev(sal) std from EMP; SQL> create view v_emp_zscore as select EMPNO, ENAME, JOB, SAL, (sal-m)/std as z_score from EMP, tbl_mean_std_perm;We can take a look at the actual values of mean and standard deviation.
SQL> select * from tbl_mean_std_perm; M STD ---------- ---------- 2073.21429 1182.50322
No comments:
Post a Comment