All Forums Tools
Gurunathan 9 posts Joined 09/09
25 Sep 2009
Need of Collect Statistics

Hi. What is the need of collect statistics in teradata? Can anyone explain with suitable examples? Also, How can we choose the collect statistics columns?

samp6050 16 posts Joined 06/08
29 Sep 2009

Collect stats is an important concept in teradata, collect stats gives PE to come up with a plan with least cost for an requested query. Collect stats defines the confidence level of PE in estimating "how many rows it is goin to access ? how many unique values does a table have , null values etc and all this info is stored in data dictionary. Once you submit a query in TD the parsing engine checks if the stats are aviable for the requested table , if it has collected stats earlier PE generates a plan with "high confidence" . in absence of collect stats plan will ne with "low confidence" . however teradata's optimizer is very robust and intelligent, even if you do not collect stats on a table,coulmn,indexes PE does an "Dynamic Amp sampling " which means it will select a random amp and this random amp comes up with the info about the table data it has , based upon this PE ( knows datademographics & available system componets) estimates the workload and generates a plan.hope this info helps.

Regards,
Sam

Gurunathan 9 posts Joined 09/09
13 Oct 2009

Thanks Samp for your explanation. Thanks a lot !!!

praveen32 1 post Joined 10/11
19 Oct 2011

Hi Sam,

I want to write TD-12 certification Exams.

Could you please share with me the details about Exam, preparation Material, Question Paper pattern, time, etc....

please share the certification material if you have any, my e-mail: pkumar.td@gmail.com

 

Thanks & Regards

Praveen

 

 

25 Oct 2011

collect stats is very very import concept in teradata. with collect stats the optimizer desides the path for the data retrival(i.e low cost plan) and sends it to the amps.

not only collecting stats once . it should be refresh frequently when ever the data is insert into particular table. it is better to collect stats on columns.

if you collect stats on index level if once the indexs are deleted the stats also will be dropped. then the reterival of the  data is slow.

 

syntax for collect stats is

 

collect stats on emp  index(empid);

vigneshm247 4 posts Joined 06/12
19 Jun 2012

Hi, 

 

There is a mention of "least cost" in this discussion. 

 

How do we calculate the cost of a Query? could it be calculated in numbers? then what is the unit of measurement?

Shrey Takkar 2 posts Joined 06/12
30 Jun 2012

Hi All,

Some doubts in statistics collection using Teradata Statistics Wizard:-

  1. What are stale statistics?
  2. Collection of statistics is advisable to be collected as per recommendation for a particular workload or for all tables which are part of the workload?
  3. If I define a workload using "Create from SQL statement" and then get recommendations for collecting statistics on that particular workload, recommendation would include statistics collection on columns/ indexes for the query mentioned in workload. Here it might happen that statistics are collected on some columns/ indexes which may be not required for other queries. In other words, would the statistics collected for a particular query be stale for other queries?
  4. Is it advisable to collect statistics for all tables that are frequently used regularly?
Adeel Chaudhry 448 posts Joined 04/08
02 Jul 2012

1. Stale stats are out-dated stats .... e.g stats are not collected for a long period of time while data has been changing in the table .... so we have stale stats that needs to be refreshed or recollected.

 

2. Ideally, for all tables which are part of workload.

 

3. No, stats collected for a particular query can't be stale for some queries as there 'stale-ness' depends on the data in the table, not the query.

 

4. Not exactly .... it is advisable to collect stats for all tables that are loaded frequently .... i.e. the tables which have changes in data frequently.

 

HTH!

-- If you are stuck at something .... consider it an opportunity to think anew.

Shrey Takkar 2 posts Joined 06/12
08 Jul 2012

Thank you Adeel for your response...

Few more doubts:-

For example, I have two queries involving complex joins. Query 1 is dependent on tables A, B, C and D whereas query 2 is dependent on tables B, C and E.

Tables B and C both are common. But the columns of joins being used (of tables B and C) in query 1 are different from columns of joins being used (of tables B and C) in query 2.

If we get recommendations using Teradata Statistics Wizard for Query 1, it may be different for query 2. So on executing the recommendations for each query, we end up with statistics being collected on tables B and C on columns which are present in join conditions of Query 1 but not for query 2. In other words staistics collected for query 1 won't be useful for query 2 or vice-versa. Going forward we may have statistics collected on almost every column in the table.

Should we collect statistics for column involved in joins?

Would same affect the performance of the query?

Or we can just collect statistics on all tables in producton that are being loaded frequently, irresepective of the queries being fired in database. Because in one or other way the queries would fetch data from the tables only which would statistics collected on them?

Adeel Chaudhry 448 posts Joined 04/08
2 months ago

Collection of unnessasary stats does have a downside, but considering your scenario, it shouldnt be an issue.
 
Moverover, it is always an evolving thing to come up with the best possible stats to collect. So, its better to analyze for a time-being and then tweak, add or remove stats.
 
HTH!

-- If you are stuck at something .... consider it an opportunity to think anew.

You must sign in to leave a comment.