Monday, November 18, 2013

How to Calculate Percentile

Problem

Sometimes, we are more interested in the percentile than the value itself. For example, parents of new born baby want to find out their baby's percentiles for weight and height. Using the employee salary table below, how do we calculate the percentiles for employee's salaries?

SQL>  select EMPNO, ENAME, JOB, SAL from emp order by job, sal desc;

     EMPNO ENAME      JOB              SAL
---------- ---------- --------- ----------
      7902 FORD       ANALYST         3000
      7788 SCOTT      ANALYST         3000
      7934 MILLER     CLERK           1300
      7876 ADAMS      CLERK           1100
      7900 JAMES      CLERK            950
      7369 SMITH      CLERK            800
      7566 JONES      MANAGER         2975
      7698 BLAKE      MANAGER         2850
      7782 CLARK      MANAGER         2450
      7839 KING       PRESIDENT       5000
      7499 ALLEN      SALESMAN        1600
      7844 TURNER     SALESMAN        1500
      7521 WARD       SALESMAN        1250
      7654 MARTIN     SALESMAN        1250

14 rows selected.

Solution

We can use Oracle analytics function cume_dist for convert the value of salary into the percentile as shown below.

SQL>  select EMPNO, ENAME, JOB, SAL, cume_dist() over(order by sal) percentile from emp order by 5 desc;

     EMPNO ENAME      JOB              SAL PERCENTILE
---------- ---------- --------- ---------- ----------
      7839 KING       PRESIDENT       5000          1
      7788 SCOTT      ANALYST         3000 .928571429
      7902 FORD       ANALYST         3000 .928571429
      7566 JONES      MANAGER         2975 .785714286
      7698 BLAKE      MANAGER         2850 .714285714
      7782 CLARK      MANAGER         2450 .642857143
      7499 ALLEN      SALESMAN        1600 .571428571
      7844 TURNER     SALESMAN        1500         .5
      7934 MILLER     CLERK           1300 .428571429
      7521 WARD       SALESMAN        1250 .357142857
      7654 MARTIN     SALESMAN        1250 .357142857
      7876 ADAMS      CLERK           1100 .214285714
      7900 JAMES      CLERK            950 .142857143
      7369 SMITH      CLERK            800 .071428571

14 rows selected.
In the above output, the salary of 1500 corresponds to percentile of 50%. This means that 50% of employees have salaries less or equal to 1500. We  can verify this by running the following query. The number of records with salaries less or equal to 1500 is 7 which is 50% of the total size 14.
SQL> select count(*) from emp where sal <= 1500;

  COUNT(*)
----------
         7

No comments: