All Forums > Database
mandava921 1 post Joined 02/13
21 Mar 2013
How to reduce skew factor on stage tables with NoPI?

Hi,
 
I am new to Teradata. I have several stage tables defined as NoPI.
I notice most stage tables have a skew factor > 50. Is there a way to reduce this?
What is the best way to handle this situation for stage tables with NoPI?
Any ideas? Please advise.
 
Thanks,
VM.

dnoeth 3319 posts Joined 11/04
21 Mar 2013

How large are those tables?
And how are they loaded?
Dieter


jodba 4 posts Joined 08/09
22 Mar 2013

Hi VM,
 
These are probably small tables loaded with a batch-load TD utility.
Even though it is true that your NoPI tables will be loaded in a round-robin manner in this case, the round robin is applied at block level (not row level).
This implies that for small tables that fit into one block (or a couple of blocks), all the data will end up on a single AMP (or a couple of AMPs). That should explain the high skew factor.
 
I would recommend distributing those tables on an appropriate PI column.
You won't get any benefit from FastLoading a small NoPI table.
 
Cheers,
Igor
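A minimal sketch of what Igor suggests (all table and column names below are made up for illustration, not from the thread): define the small stage table with a PI on a well-distributed column instead of NoPI, so each row is hashed to an AMP rather than whole blocks landing on a few AMPs.

```sql
-- NoPI stage table: a small client load fits in a few blocks,
-- so only a few AMPs receive data (hence the high skew factor).
CREATE TABLE stg_orders_nopi (
    order_id  INTEGER,
    order_dt  DATE,
    amount    DECIMAL(12,2)
) NO PRIMARY INDEX;

-- PI stage table: each row is hashed on order_id and sent to its
-- own AMP, so even a small table spreads across the system.
CREATE TABLE stg_orders_pi (
    order_id  INTEGER,
    order_dt  DATE,
    amount    DECIMAL(12,2)
) PRIMARY INDEX (order_id);
```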

Sojib2BD 2 posts Joined 01/13
05 Aug 2013

I have the same problem.
My table is 480 GB in size with a skew factor > 78. It is a 7-column table (using 3 columns as the primary index).
I also created a NoPI table, but the skew factor is still the same.
 
How can I reduce it? Please help.

CarlosAL 357 posts Joined 04/08
05 Aug 2013

If you created the NoPI table from the PI table, the skew will be the same (the copy is AMP-local). You must create the NoPI table EMPTY and load it from scratch. This should guarantee a good distribution (i.e. low skew).
HTH.
Cheers.
Carlos.
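To illustrate Carlos's point (object names are placeholders): copying into a NoPI table from an existing table is done AMP-locally, so the target inherits the source's distribution, while an empty NoPI table loaded from the client receives blocks round-robin.

```sql
-- AMP-local copy: rows are NOT redistributed, so the NoPI target
-- inherits the source table's (possibly skewed) distribution.
CREATE TABLE stg_nopi AS src_table WITH DATA NO PRIMARY INDEX;

-- Instead, create the table empty ...
CREATE TABLE stg_nopi AS src_table WITH NO DATA NO PRIMARY INDEX;
-- ... and then load it from scratch with a client utility such as
-- FastLoad or TPT LOAD, which hands out blocks round-robin across AMPs.
```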

dnoeth 3319 posts Joined 11/04
07 Aug 2013

In TD14 there's also a new option for INSERT/SELECT into NoPI tables, HASH BY RANDOM, to reduce skew.
 
Dieter

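The TD14 option Dieter mentions might look like this (table names are made up for illustration):

```sql
-- Redistribute rows randomly while copying into a NoPI target,
-- instead of the default AMP-local INSERT/SELECT:
INSERT INTO stg_nopi
SELECT *
FROM src_table
HASH BY RANDOM;
```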

manju2399 2 posts Joined 02/14
18 Mar 2014

Hi,
I have a table in PROD that has 3 billion rows. There are 4 primary index columns, partitioning on a date column, and a secondary index. When I checked the distinct primary index values on the combination of the 4 columns, it is about 2.98 billion, so the percentage of duplicates is about 2%.
But the skew of the table is 54. Can you please explain how to reduce the skew factor?
 
Appreciate your help.
 
Thanks,
Manju

Harpreet Singh 93 posts Joined 10/11
18 Mar 2014

Hi Manju,
A secondary index will not impact the skew factor.
The skew factor can be reduced by choosing a more evenly distributed set of columns. If your 2% of duplicates is going to one AMP, you can find out what distinguishes those rows from the others in the remaining columns.
But reducing the skew factor by choosing a more appropriate primary index column set will not help Teradata performance unless that index is actually used in your SQL. The primary index needs to be usable in SQL mostly as a whole (not partially) and be nearly evenly distributed.
Thanks
Harpreet
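As a side note, the per-AMP distribution behind the skew factor can be inspected in the data dictionary. A sketch (database and table names are placeholders, and the exact formula reporting tools use may differ slightly):

```sql
-- Per-AMP permanent space for one table; the skew factor is commonly
-- computed as 100 * (1 - average / maximum) of CurrentPerm.
SELECT DatabaseName,
       TableName,
       100 * (1 - AVG(CAST(CurrentPerm AS FLOAT))
                  / MAX(CurrentPerm)) AS SkewFactor
FROM DBC.TableSizeV
WHERE DatabaseName = 'mydb'
  AND TableName    = 'mytable'
GROUP BY DatabaseName, TableName;
```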

manju2399 2 posts Joined 02/14
20 Mar 2014

Thanks Harpreet ...

raj786 18 posts Joined 04/14
16 Apr 2014

Hi,
can we take a column that is declared as COMPRESS(0) as a PI column?
And what is the maximum number of columns we can combine to find a better PI for skew?

krishaneesh 138 posts Joined 04/13
16 Apr 2014

Tajmul, compressed columns are not allowed in the PI column list. You can have any number of columns defined as the PI, but when choosing a better PI, skew should not be the only factor. Instead, we should look at which columns are most frequently joined, what updates happen on the table and on which columns, and other demographics. Uniqueness only matters for distribution; if the table is huge and you have a PI which is not part of the join, then more resources will be consumed in redistributing the table in the database.

raj786 18 posts Joined 04/14
16 Apr 2014

Thanks Krish for the quick response,
it is very helpful for me.

raj786 18 posts Joined 04/14
16 Apr 2014

Hi,
can you tell me about the use of COLLECT STATISTICS after choosing a new PI?
//taj

Adeel Chaudhry 730 posts Joined 04/08
16 Apr 2014

COLLECT STATISTICS should be run once the data in the underlying tables has changed significantly. It is usually done at the end of the daily/weekly batch run. It gives the system a better basis for choosing join paths, table redistributions, etc.
 
Following is the syntax of the COLLECT STATISTICS command:
 
COLLECT STATISTICS ON tablename COLUMN columnname;

-- If you are stuck at something .... consider it an opportunity to think anew.
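Building on that syntax, a few hedged examples (object names are placeholders). After choosing a new PI, collecting stats on the PI columns is the usual starting point:

```sql
-- Single column:
COLLECT STATISTICS ON mydb.mytable COLUMN order_id;

-- Multi-column stats on the full PI combination:
COLLECT STATISTICS ON mydb.mytable COLUMN (order_id, order_dt);

-- Index form:
COLLECT STATISTICS ON mydb.mytable INDEX (order_id, order_dt);
```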

raj786 18 posts Joined 04/14
17 Apr 2014

Thanks for your response.

sk8s3i 13 posts Joined 06/13
2 weeks ago

Hi All,
 
I have a table with 75 rows and a UPI, yet the skew factor is 77.
What can be the reason for this high skew factor, and how can I possibly reduce it?
 
Thanks,
Shardul

ToddAWalter 154 posts Joined 10/11
2 weeks ago

The skew factor should be ignored for small tables. Until there are enough rows in a table to fill several data blocks per AMP, the probabilities of the hash distribution are just not going to make the data even on every AMP. And the impact of skew on small tables is negligible.
 
If the small tables are used in joins, subqueries, etc., make sure to collect stats on them, because the optimizer will not be able to discover the size and distribution accurately without stats.
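Todd's probability argument can be illustrated outside Teradata. The sketch below is plain Python, using a uniform random pick as a stand-in for Teradata's actual hash function, and computes the skew factor as 100 * (1 - average / maximum) of per-AMP row counts: with 75 rows on 72 AMPs the factor is inevitably high, while with many rows per AMP it approaches zero.

```python
import random

def skew_factor(rows_per_amp):
    """Skew factor as 100 * (1 - average / maximum) of per-AMP row counts."""
    avg = sum(rows_per_amp) / len(rows_per_amp)
    return 100.0 * (1.0 - avg / max(rows_per_amp))

def simulate(n_rows, n_amps=72, seed=1):
    """Stand-in for hash distribution: each row lands on a uniform random AMP."""
    rng = random.Random(seed)
    amps = [0] * n_amps
    for _ in range(n_rows):
        amps[rng.randrange(n_amps)] += 1
    return amps

# 75 rows on 72 AMPs: the busiest AMP holds a few rows while the
# average is ~1, so the skew factor is high.
print(skew_factor(simulate(75)))
# 750,000 rows on 72 AMPs: counts even out, skew approaches 0.
print(skew_factor(simulate(750_000)))
```

With 75 rows some AMP must hold at least 2 while the average is about 1.04, so the skew factor is at least ~48 no matter how perfect the hash is, which matches Shardul's observation.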

sk8s3i 13 posts Joined 06/13
2 weeks ago

Thank you Todd. Appreciate your help.

sk8s3i 13 posts Joined 06/13
2 weeks ago

Todd, I have one doubt though. The number of AMPs we have is 72.
Shouldn't the data be distributed "almost" evenly?
 

ToddAWalter 154 posts Joined 10/11
2 weeks ago

The PI values are put through a hash algorithm, and a part of the hash is taken as the map to the AMP for each row. When you have this small a sample, it is very likely that there will be AMPs with a few rows and others with none. A small change with this few rows will have a big impact on the skew factor. This will even out with a large set of records, but probabilities on such a small set are not going to result in an even distribution, nor should they be expected to.
 
You can check it yourself with SELECT HASHAMP(HASHBUCKET(HASHROW(<pi fields>))) AS ampno, COUNT(*) FROM <tablename> GROUP BY 1 ORDER BY ampno DESC; to see how the rows are distributed.
