All Forums Database
Poongundrandk 9 posts Joined 08/11
10 Apr 2012
Query Performance - hash join

I am trying to improve the performance for the below query . Can anyone suggest better approach to optimize this.

PSTN_SNPS_TBL big table - will have 40million number of records per day. Have partition on Client ID and EDW_SNPS_DT.And we have 60Days of history data in this table. PI is ACCT_ID,SECR_ID.

TESTBOX.SECR_MSTR FINT  => this table have 4 million of data. PI is SECR_ID.
 

It seems it is joing  for hash join and both  bigger  and smaller table is broken into hash partitions in memory  and do Join.

When I run this query it takes approximately 25 - 30 secs .

SELECT *
FROM
PSTN_SNPS_TBL PSTN
INNER JOIN
TESTBOX.SECR_MSTR FINT
ON PSTN.CUSIP_ID = SECR.SECR_ID
AND SECR.CUR_RCD_IN='Y'
WHERE
PSTN.CLIENT_ID = '123'
AND EDW_SNPS_DT = '2012-04-05'

  1) First, we lock TESTBOX.SECR_MSTR in view
     TESTBOX.SECR_MSTR for access, and we lock
     TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL for access.
  2) Next, we do an all-AMPs RETRIEVE step from
     TESTBOX.SECR_MSTR in view TESTBOX.SECR_MSTR
     by way of an all-rows scan with a condition of (
     "TESTBOX.SECR_MSTR in view
     TESTBOX.SECR_MSTR.CUR_RCD_IN = 'Y'") into Spool 4
     (all_amps) (compressed columns allowed) fanned out into 9 hash
     join partitions, which is built locally on the AMPs.  The size of
     Spool 4 is estimated with high confidence to be 3,625,017 rows (
     1,953,884,163 bytes).  The estimated time for this step is 39.36
     seconds.
  3) We do an all-AMPs RETRIEVE step from a single partition of
     TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL with a condition of (
     "TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL.CLIENT_ID = '123 ' ,
     TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE '2012-04-05'")
     with a residual condition of ("(TESTBOX.PSTN_SNPS_TBL in
     view TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE
     '2012-04-05') AND (TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL.CLIENT_ID = '123 ')") into Spool 5
     (all_amps) (compressed columns allowed) fanned out into 9 hash
     join partitions, which is redistributed by the hash code of (
     TESTBOX.PSTN_SNPS_TBL.SECR_ID) to all AMPs.  The size
     of Spool 5 is estimated with low confidence to be 27,176,952 rows
     (4,620,081,840 bytes).  The estimated time for this step is 12
     minutes and 16 seconds.
  4) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
     all-rows scan, which is joined to Spool 5 (Last Use) by way of an
     all-rows scan.  Spool 4 and Spool 5 are joined using a hash join
     of 9 partitions, with a join condition of ("SECR_ID = SECR_ID").
     The result goes into Spool 3 (group_amps), which is built locally
     on the AMPs.  The size of Spool 3 is estimated with low confidence
     to be 17,590,717 rows (21,636,581,910 bytes).  The estimated time
     for this step is 1 minute and 36 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 3 are sent back to the user as the result of
     statement 1.  The total estimated time is 14 minutes and 31
     seconds.

 

Below is  the stats details

PSTN_SNPS_TBL

12/04/01 18:50:47           10,424,873       ACCT_ID
12/04/01 18:40:41            1,462,897       SECR_ID
12/04/01 17:19:27                1,481          CLIENT_ID
12/04/01 17:34:15                    2             EDW_CUR_RCD_IN
12/04/01 17:56:04                  166            EDW_SNPS_DT
12/04/01 17:15:45          105,974,582       ACCT_ID,SECR_ID
12/04/01 18:33:15        5,929,458,598     ACCT_ID,SECR_ID,EDW_SNPS_DT
12/04/01 19:05:11                9,259         PARTITION

 

TESTBOX.SECR_MSTR FINT

12/04/01 19:00:21            6,843,684   SECR_ID
12/04/01 17:40:29                  181      MNR_PROD_CD
12/04/01 17:06:45                    2       EDW_CUR_RCD_IN
12/04/01 17:40:03                   15     PROD_TY_CD
12/04/01 17:02:06                  184    ASST_TY_CD,ASST_STYP_CD,ASST_SUB_STY
12/04/01 17:43:42                    1    PARTITION
 

Can we do something here   for better performance?

Poongundrandk 9 posts Joined 08/11
10 Apr 2012

Removed * from query.Specified columns.Now there is improvement in plan.But still Joining takes some time.

  1) First, we lock TESTBOX.SECR_MSTR in view
     TESTBOX.SECR_MSTR for access, and we lock
     TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL for access.
  2) Next, we do an all-AMPs RETRIEVE step from a single partition of
     TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL with a condition of (
     "TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL.IBD_ID = '123 ' ,
     TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE '2012-04-05'")
     with a residual condition of ("(TESTBOX.PSTN_SNPS_TBL in
     view TESTBOX.PSTN_SNPS_TBL.EDW_SNPS_DT = DATE
     '2012-04-05') AND (TESTBOX.PSTN_SNPS_TBL in view
     TESTBOX.PSTN_SNPS_TBL.IBD_ID = '123 ')") into Spool 4
     (all_amps) (compressed columns allowed) fanned out into 33 hash
     join partitions, which is built locally on the AMPs.  The size of
     Spool 4 is estimated with low confidence to be 27,176,952 rows (
     2,554,633,488 bytes).  The estimated time for this step is 12.30
     seconds.
  3) We do an all-AMPs RETRIEVE step from TESTBOX.SECR_MSTR
     in view TESTBOX.SECR_MSTR by way of an all-rows scan
     with a condition of ("TESTBOX.SECR_MSTR in view
     TESTBOX.SECR_MSTR.EDW_CUR_RCD_IN = 'Y'") into Spool 5
     (all_amps) (compressed columns allowed) fanned out into 33 hash
     join partitions, which is duplicated on all AMPs.  The size of
     Spool 5 is estimated with high confidence to be 522,002,448 rows (
     15,660,073,440 bytes).  The estimated time for this step is 1
     minute and 52 seconds.
  4) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
     all-rows scan, which is joined to Spool 5 (Last Use) by way of an
     all-rows scan.  Spool 4 and Spool 5 are joined using a hash join
     of 33 partitions, with a join condition of ("CUSIP_ID = CUSIP_ID").
     The result goes into Spool 3 (group_amps), which is built locally
     on the AMPs.  The size of Spool 3 is estimated with low confidence
     to be 17,590,717 rows (2,867,286,871 bytes).  The estimated time
     for this step is 1 minute and 2 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 3 are sent back to the user as the result of
     statement 1.  The total estimated time is 3 minutes and 7 seconds. 

mjasrotia 58 posts Joined 08/11
11 Apr 2012

Try collecting stats on the PARTITION and (PARTITION, PI) Column if not done already.

 

Thanks

dnoeth 1921 posts Joined 11/04
11 Apr 2012

PARTITION stats already exist, but stats on the partitioning columns are missing:

(CLIENT_ID,EDW_SNPS_DT)

You can always do a DIAGNOSTIC HELPSTATS ON FOR SESSION to see if the optimizer might need additional stats.

Dieter

ahtandon 1 post Joined 06/12
01 Jun 2012

What is the partitioning range on the EDW_SNPS_DT in the table?

You must sign in to leave a comment.