When you check Explain you should see different confidence levels:
HIGH for #2
NO for #3, this indicates that stats could not be used, so it defaults to 1/3 of the rows due to the BETWEEN
For #2 the optimizer rewrote your condition first to
WHERE DWH_Entry_Date BETWEEN
DATE '2012-05-31' - 150 AND DATE '2012-05-31'
and finally to
WHERE DWH_Entry_Date BETWEEN
DATE '2012-01-02' - 150 AND DATE '2012-05-31'
But this kind of optimization is only done for INTEGERs and DATEs, not for TIMESTAMPs.
You have to rewrite it on your own to remove the calculation on the DWH_Entry_Date column for #3:
WHERE subscription_offering_start_dt
BETWEEN CAST(DATE '2012-05-31' AS TIMESTAMP(0))
AND CAST(DATE '2012-05-31' AS TIMESTAMP(0)) + INTERVAL '23:59:59' HOUR TO SECOND
or
WHERE subscription_offering_start_dt
BETWEEN TIMESTAMP '2012-05-31 00:00:00'
AND TIMESTAMP '2012-05-31 23:59:59'
This should result in HIGH confidence again.
Dieter
What's your TD release?
The high confidence indicates stale stats, could you show the output of HELP STATS for vdb.subscription_offering_hist?
Dieter
Sorry, i wasn't clear. Of course "high confidence" usually indicates that there are stats (and the optimizer trusts them).
But in this special case (EST: 74,069 vs. ACT: 1,956,001) i assumed the stats could be outdated, resulting in wrong estimates.
Your stats are current, so this under-estimation by the factor of 26 must have a different reason:
Is there an even distribution of values or are offerings sent on specific days within a month?
Are the stats sampled?
When you incease the range of dates is the estimate closer to the actual number
Dieter
This is strange,
- increasing the range from 1 day to 31 days: 31x
- actual rows: 6x (-> not evenly ditributed)
- estimated rows: 1300x (74,069 -> 11,057,406)
Might be a problem specific to Timestamps and/or your actual data, did you already inspect the stats using "HELP STATS tdb.subscription_offering_hist COLUMN subscription_offering_start_dt"?
Maybe you should ask TD support if this is a known problem.
Dieter

estimated cardinality for the following predicate towards timestamp column is far too high vs date column
1) Table DDL:
--------------------------------------
CREATE MULTISET TABLE tdb.subscription_offering_hist ,NO FALLBACK ,
xxxx (
Subscription_Id INTEGER,
Offering_Id INTEGER,
Subscription_Offering_Start_Dt TIMESTAMP(0),
......
DWH_Entry_Date DATE FORMAT 'YY/MM/DD')
PRIMARY INDEX ( Subscription_Id );
2) predicate towards date column:
-----------------------------------------
SELECT count(Subscription_Id)
FROM devdb.soh
WHERE CAST( '2012-05-31' AS DATE )
BETWEEN DWH_Entry_Date
AND DWH_Entry_Date + 150
;
Actual rows returned : 162,480
Estimated rows returned: 161,085
3) predicate towards Timestamp column:
-----------------------------------------
SELECT Subscription_Id
FROM devdb.soh
WHERE CAST( CAST( '2012-05-31' AS DATE ) AS TIMESTAMP(0))
BETWEEN subscription_offering_start_dt
AND subscription_offering_start_dt + ((TIME '23:59:59' - TIME '00:00:00') HOUR TO SECOND)
Actual rows returned : 1,724
Estimated rows returned: 96,960
What's strange that i've tried different values for the timestamp predicate and whichever the value use; optimizer always estimates 96,960 row; while it;s not the case if we run the predicate for date column; different date sure return different estimations.
i've collected needed stats; also this is a test table so it;s not a corrupted stats problem since i've dropped & recreated stats on original table and same issue presisted.
Regards