marcmc 112 posts Joined 12/05
28 Jun 2007
Product Join

Please help me eliminate the product join on this query as it's driving me mad. I'd prefer not to change the structure of the query at this stage though. When I remove the lines from the OR statement onwards there is no product join. The statistics have been updated. Should I apply indices? What are your thoughts on indexing? We run a data warehouse load followed by a mammoth number of MIS reports. Any ideas?

SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
  ON (a.Rep_reported_date_id = b.Rep_reported_date_id)
WHERE ((((a.claim_ref_id) IN (SELECT c.claim_ref_id FROM Tmp_Reported_Claims_YTD c))
        AND b.RRtv_year_id in (2007))
    OR (((a.claim_ref_id) IN (SELECT d.claim_ref_id FROM Tmp_Reported_Claims_YTD1 d))
        AND b.RRtv_year_id in (2007)));

The explain plan looks like this:

Explanation
1) First, we lock a distinct PROD_DATA_V."pseudo table" for read on a RowHash to prevent global deadlock for PROD_DATA_V.d.
2) Next, we lock a distinct PROD_DATA_V."pseudo table" for read on a RowHash to prevent global deadlock for PROD_DATA_V.c.
3) We lock a distinct prod_data_t."pseudo table" for read on a RowHash to prevent global deadlock for prod_data_t.fat_bse_cl_pay_cre_trans.
4) We lock a distinct PROD_DATA_T."pseudo table" for read on a RowHash to prevent global deadlock for PROD_DATA_T.TIt_lu_day.
5) We lock PROD_DATA_V.d for read, we lock PROD_DATA_V.c for read, we lock prod_data_t.fat_bse_cl_pay_cre_trans for read, and we lock PROD_DATA_T.TIt_lu_day for read.
6) We execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from PROD_DATA_V.c by way of an all-rows scan with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 4 by row hash and the sort key in spool field1 eliminating duplicate rows. The size of Spool 4 is estimated with high confidence to be 1 row. The estimated time for this step is 0.01 seconds.
   2) We do an all-AMPs RETRIEVE step from PROD_DATA_V.d by way of an all-rows scan with no residual conditions into Spool 5 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 5 by row hash and the sort key in spool field1 eliminating duplicate rows. The size of Spool 5 is estimated with high confidence to be 1 row. The estimated time for this step is 0.01 seconds.
7) We do an all-AMPs RETRIEVE step from PROD_DATA_T.TIt_lu_day by way of an all-rows scan with no residual conditions into Spool 6 (all_amps) fanned out into 3 hash join partitions, which is duplicated on all AMPs. The size of Spool 6 is estimated with high confidence to be 79,226 rows. The estimated time for this step is 0.08 seconds.
8) We do an all-AMPs RETRIEVE step from prod_data_t.fat_bse_cl_pay_cre_trans by way of an all-rows scan with a condition of ("NOT (prod_data_t.fat_bse_cl_pay_cre_trans.Rep_Reported_date_id IS NULL)") into Spool 7 (all_amps) fanned out into 3 hash join partitions, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 7 is estimated with high confidence to be 16,238,400 rows. The estimated time for this step is 45.92 seconds.
9) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an all-rows scan, which is joined to Spool 7 (Last Use) by way of an all-rows scan. Spool 6 and Spool 7 are joined using a hash join of 3 partitions, with a join condition of ("Rep_Reported_date_id = cur_trn_dt"). The result goes into Spool 8 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 8 is estimated with low confidence to be 16,238,400 rows. The estimated time for this step is 7.35 seconds.
10) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 10 (all_amps), which is duplicated on all AMPs. The size of Spool 10 is estimated with high confidence to be 14 rows. The estimated time for this step is 0.02 seconds.
11) We execute the following steps in parallel.
   1) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of an all-rows scan, which is joined to Spool 8 by way of an all-rows scan with a condition of ("year_id = 2007"). Spool 10 and Spool 8 are joined using a product join, with a join condition of ("Claim_ref_id = claim_ref_id"). The result goes into Spool 9 (all_amps), which is redistributed by hash code to all AMPs. The size of Spool 9 is estimated with low confidence to be 88 rows. The estimated time for this step is 2.81 seconds.
   2) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an all-rows scan into Spool 11 (all_amps), which is duplicated on all AMPs. The size of Spool 11 is estimated with high confidence to be 14 rows. The estimated time for this step is 0.02 seconds.
12) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of an all-rows scan, which is joined to Spool 8 (Last Use) by way of an all-rows scan with a condition of ("year_id = 2007"). Spool 11 and Spool 8 are joined using a product join, with a join condition of ("Claim_ref_id = claim_ref_id"). The result goes into Spool 9 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 9 by the sort key in spool field1 eliminating duplicate rows. The size of Spool 9 is estimated with low confidence to be 88 rows. The estimated time for this step is 2.81 seconds.
13) We do an all-AMPs RETRIEVE step from Spool 9 (Last Use) by way of an all-rows scan into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with low confidence to be 132 rows. The estimated time for this step is 0.02 seconds.
14) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, and the grouping identifier in field 2. Aggregate Intermediate Results are computed globally, then placed in Spool 13. The size of Spool 13 is estimated with high confidence to be 1 row. The estimated time for this step is 0.02 seconds.
15) We do an all-AMPs SUM step to aggregate from Spool 13 (Last Use) by way of an all-rows scan. Aggregate Intermediate Results are computed globally, then placed in Spool 15. The size of Spool 15 is estimated with high confidence to be 1 row. The estimated time for this step is 0.02 seconds.
16) We do an all-AMPs RETRIEVE step from Spool 15 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.00 seconds.
17) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 59.05 seconds.
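
A quick way to double-check what the optimizer actually has for the tables involved is HELP STATISTICS (a minimal sketch, using only the table names from the query above; the two Tmp_ tables are the ones the plan estimates at a single row):

HELP STATISTICS fat_bse_cl_pay_cre_trans;
HELP STATISTICS Tmp_Reported_Claims_YTD;
HELP STATISTICS Tmp_Reported_Claims_YTD1;

The output lists the collection date and the unique-value count per column, which makes stale or missing stats easy to spot.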

Barry-1604 176 posts Joined 07/05
28 Jun 2007

I think that this will do the equivalent:

SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
  ON (a.Rep_reported_date_id = b.Rep_reported_date_id)
WHERE (a.claim_ref_id) IN
      (SELECT c.claim_ref_id FROM Tmp_Reported_Claims_YTD c
       UNION
       SELECT d.claim_ref_id FROM Tmp_Reported_Claims_YTD1 d)
  AND b.RRtv_year_id in (2007);

If the claim_ref_id values in YTD and YTD1 are mutually exclusive, change the "UNION" to a "UNION ALL".

Let me know if this works for you.

Regards,
Barry
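
For the mutually-exclusive case, only the set operator in the subquery changes; roughly (a sketch, same tables as above):

SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
  ON (a.Rep_reported_date_id = b.Rep_reported_date_id)
WHERE (a.claim_ref_id) IN
      (SELECT c.claim_ref_id FROM Tmp_Reported_Claims_YTD c
       UNION ALL
       SELECT d.claim_ref_id FROM Tmp_Reported_Claims_YTD1 d)
  AND b.RRtv_year_id in (2007);

UNION ALL skips the duplicate-eliminating sort that UNION performs, and the outer count(distinct ...) still deduplicates the claim IDs either way.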

dnoeth 2958 posts Joined 11/04
29 Jun 2007

Hi marcmc,

"The statistics have been updated." Are you sure about that? According to the explain, both Tmp_Reported_Claims_YTD and Tmp_Reported_Claims_YTD1 got just a single row. Of course the optimizer is then doing a Product Join with that single row.

And although you prefer not to change the query, I would suggest rewriting the WHERE ;-)

SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
JOIN RRtv_lu_day b
  ON (a.Rep_reported_date_id = b.Rep_reported_date_id)
WHERE b.RRtv_year_id in (2007)
AND ( a.claim_ref_id IN (SELECT c.claim_ref_id FROM Tmp_Reported_Claims_YTD c)
   OR a.claim_ref_id IN (SELECT d.claim_ref_id FROM Tmp_Reported_Claims_YTD1 d) )

And are you sure about "count(distinct a.claim_ref_id)"? It would be more efficient without the DISTINCT...

Dieter
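
If those temp tables really are loaded with more than one row, refreshing the stats on the column the subqueries use would look roughly like this (a sketch, assuming claim_ref_id is the relevant column, as in the query):

COLLECT STATISTICS ON Tmp_Reported_Claims_YTD COLUMN (claim_ref_id);
COLLECT STATISTICS ON Tmp_Reported_Claims_YTD1 COLUMN (claim_ref_id);

With realistic row counts on those columns, the optimizer has a better basis for choosing something other than a product join.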


joedsilva 505 posts Joined 07/05
29 Jun 2007

I am also wondering whether the condition

AND b.RRtv_year_id in (2007)

had an impact on performance: it was being evaluated separately for each of the OR conditions, which is overkill compared to having it outside the OR conditions. Ideally I would put it in the join condition itself, like this:

fat_bse_cl_pay_cre_trans a
INNER JOIN RRtv_lu_day b
  ON a.Rep_reported_date_id = b.Rep_reported_date_id
 AND b.RRtv_year_id = 2007

Or, like Dieter has put it, as part of the WHERE clause but independent of the OR, depending on your coding style. I think that would have resulted in only those records being spooled from RRtv_lu_day.

Also turn on diagnostic stats and see if the explain says anything about stats recommendations.
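
That diagnostic is enabled per session; once it is on, the EXPLAIN output ends with a list of suggested COLLECT STATISTICS statements. A sketch, assuming the feature is available on your release (the query below merges the ON-clause placement above with Dieter's OR rewrite):

DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT count(distinct a.claim_ref_id)
FROM fat_bse_cl_pay_cre_trans a
INNER JOIN RRtv_lu_day b
  ON a.Rep_reported_date_id = b.Rep_reported_date_id
 AND b.RRtv_year_id = 2007
WHERE a.claim_ref_id IN (SELECT c.claim_ref_id FROM Tmp_Reported_Claims_YTD c)
   OR a.claim_ref_id IN (SELECT d.claim_ref_id FROM Tmp_Reported_Claims_YTD1 d);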

marcmc 112 posts Joined 12/05
29 Jun 2007

Thanks guys, I will look at both approaches.
