Now, we want to solve a more sophisticate problem. We want to select the top 2 most frequent items in table tbl_items_by_person (shown below) for each person, Emily and John (instead of global top 2 most frequent items).
Table tbl_items_by_person
The first step is to calculate the item frequency "group by" name and item.
SQL> select name, item, count(*) as num from TBL_ITEMS_BY_PERSON group by name, item order by name, count(*) desc;
NAME ITEM NUM
-------------------------------- -------------------------------- ----------
Emily Peach 8
Emily Apple 4
Emily Banana 3
Emily Pear 2
John Orange 5
John Apple 3
John Peach 3
John Banana 2
John Pear 2
The second step is to use row_number() or dense_rank() to generate rank for each name separately. This is done by using the clause over(partition by name order by num desc).
SQL> select name, item, num, row_number() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from TBL_ITEMS_BY_
PERSON group by name, item ) order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
Emily Banana 3 3
Emily Pear 2 4
John Orange 5 1
John Apple 3 2
John Peach 3 3
John Banana 2 4
John Pear 2 5
We can pick the top 2 frequent items for Emily and John using the following query.
SQL> with tbl as (select name, item, num, row_number() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from
TBL_ITEMS_BY_PERSON group by name, item )) select * from tbl where rnk<=2 order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
John Orange 5 1
John Peach 3 2
Or, we can use dense_rank() to generate rank which will assign the same rank for ties.
SQL> select name, item, num, dense_rank() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from TBL_ITEMS_BY_
PERSON group by name, item ) order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
Emily Banana 3 3
Emily Pear 2 4
John Orange 5 1
John Apple 3 2
John Peach 3 2
John Banana 2 3
John Pear 2 3
For John, there are 3 peaches and Apples. Both of them are ranked as 2. Thus they are both selected as shown below.
SQL> with tbl as (select name, item, num, dense_rank() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from
TBL_ITEMS_BY_PERSON group by name, item )) select * from tbl where rnk<=2 order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
John Orange 5 1
John Peach 3 2
John Apple 3 2
1 comment:
HI!
Let's assume that I would like to see the frequency of each fruit bought. So in the table we have information that people buy fruits. My result table would show that Apple is bought 76% of the people, Peach is bought 45% of the people. Normally, ONE person can buy several fruits, not just one.
How can I calculate this?
thx,
Imre
Post a Comment