All Forums Database
samit_9999 37 posts Joined 03/06
20 Jun 2007
Sample without duplicates

Hi,

I have a table which has records as follows

EmpNo DepNo
1 10
2 10
3 20
4 30
5 20
6 40

I want to select a random sample of 3 records from the table , however i should not select a department number more than once.
Example
EmpNo DepNo
1 10
2 10
3 20
In the above case i should have selected DeptNo 10 only once and the results should have been
EmpNo DepNo
1 10
3 20
4 30

Sam

dnoeth 684 posts Joined 11/04
21 Jun 2007

select *
from tab
qualify rank() over (partition by deptno order by 0) = 1 -- one random row per dept
sample 3

Dieter

samit_9999 37 posts Joined 03/06
21 Jun 2007

Hi Dieter,

This solution you gave does not work ok.

I tried it and got these results which is not ok

EMPNO DEPNO
4 30
3 20
5 20

Deptno 20 is getting repeated.

Sam

Vinay Bagare 96 posts Joined 12/04
21 Jun 2007

Use this instead

select *
from test
qualify rank() over (partition by deptno order by deptno) = 1 -- one random row per dept
sample 3

Thanks,
Vinay

joedsilva 505 posts Joined 07/05
21 Jun 2007

I think the trouble is because RANK() returns the same value for same deptno (in this case it's always 1).

So I think piggy backing on Dieter's original solution and using ROW_NUMBER instead, things might work.

SELECT *
FROM MYTABLE
QUALIFY ROW_NUMBER() OVER (partition by deptno order by 0) = 1
SAMPLE 3;

dnoeth 684 posts Joined 11/04
22 Jun 2007

Hi Joe,
of course you're right, i tested it with a row_number, but i couldn't sent an email from that PC, so i keyed it in again with a RANK ;-)

Dieter

samit_9999 37 posts Joined 03/06
29 Jun 2007

Thanks a lot guys!That did work perfectly as required.

Sorry for the late reply

Sam

TdMan 87 posts Joined 01/07
10 Jul 2007

Suppose when I run
select *
from test
qualify rank() over (partition by deptno order by deptno) = 1

where my table looks like

deptno x y
10 1 2
10 2 3
10 3 4
20 1 2
20 2 3
30 1 2

I am getting all the records from my table. Is this scenario correct? If yes what should be done to get unique deptno records by using partition and not unique command.

joedsilva 505 posts Joined 07/05
10 Jul 2007

if your partitioning and ordering expressions are the same, then your rank output will always be 1.

you can use row_number() instead of rank as mentioned in the above posts.

TdMan 87 posts Joined 01/07
10 Jul 2007

WOW
It worked Joed. Thanks a lot.

Can you brief me what made it work? ie diff btw ROW_NUMBER and RANK in this case. And in what criteria it selects the records when deptno being the same (as we are partitioning by deptno).

joedsilva 505 posts Joined 07/05
11 Jul 2007

Simple explanation is this.

for rank() function, if two records belong to the same partitioning expression value and also have same values for the order by columns, they get the same rank.

for row_number() , it gives a unique value for each record which has the same value for partitioning expression, so even if the order by column(s) have duplicate values, one will get a value and it will be incremented for the next one... and it's random AFAIK as to which record gets what value (between duplicate records).

You can refer functions & operators manual for more details.

You must sign in to leave a comment.