For continuous variables, we can use the function width_bucket() to divide the data into segments and calculate the frequency in each segment. We can then pick the average value of the data points from the most frequent segment as the mode. We use the 1,000 random numbers described in the post Generate 1000 Normally Distributed Numbers. The following query calculate the frequency and average value for each of the segment defined by width_bucket(variable, min_value, max_value, number_of_buckets). Buckets 0 and 12 are for values lower than -3 and higher than 3, respectively.
SQL> select s, count(1), avg(num) from (select width_bucket(num, -3,3, 11) s, num from TBL_1K_RND) group by s order by s;
S COUNT(1) AVG(NUM)
---------- ---------- ----------
0 2 -3.0538402
1 5 -2.6387896
2 29 -2.1748723
3 58 -1.6073189
4 115 -1.0457551
5 185 -.53460548
6 208 .010066623
7 194 .525553927
8 119 1.05966651
9 47 1.5985678
10 32 2.14458861
11 4 2.67165765
12 2 3.06640121
Again, we use row_number() function to generate rank based on frequency and calculate the mode as follows.
(
select s, count(1) cnt, avg(num) num from (select width_bucket(num, -3,3, 11) s, num from TBL_1K_RND) group by s order by s),
tbl_2 as
(
select a.*, row_number() over(order by cnt desc) rnk from tbl_1 a
)
select num from tbl_2 where rnk=1;
NUM
----------
.010066623
No comments:
Post a Comment