Hi. What is the need of collect statistics in teradata? Can anyone explain with suitable examples? Also, How can we choose the collect statistics columns?
Collect stats is an important concept in teradata, collect stats gives PE to come up with a plan with least cost for an requested query. Collect stats defines the confidence level of PE in estimating "how many rows it is goin to access ? how many unique values does a table have , null values etc and all this info is stored in data dictionary. Once you submit a query in TD the parsing engine checks if the stats are aviable for the requested table , if it has collected stats earlier PE generates a plan with "high confidence" . in absence of collect stats plan will ne with "low confidence" . however teradata's optimizer is very robust and intelligent, even if you do not collect stats on a table,coulmn,indexes PE does an "Dynamic Amp sampling " which means it will select a random amp and this random amp comes up with the info about the table data it has , based upon this PE ( knows datademographics & available system componets) estimates the workload and generates a plan.hope this info helps.
Thanks Samp for your explanation. Thanks a lot !!!
I want to write TD-12 certification Exams.
Could you please share with me the details about Exam, preparation Material, Question Paper pattern, time, etc....
please share the certification material if you have any, my e-mail: firstname.lastname@example.org
Thanks & Regards
collect stats is very very import concept in teradata. with collect stats the optimizer desides the path for the data retrival(i.e low cost plan) and sends it to the amps.
not only collecting stats once . it should be refresh frequently when ever the data is insert into particular table. it is better to collect stats on columns.
if you collect stats on index level if once the indexs are deleted the stats also will be dropped. then the reterival of the data is slow.
syntax for collect stats is
collect stats on emp index(empid);
There is a mention of "least cost" in this discussion.
How do we calculate the cost of a Query? could it be calculated in numbers? then what is the unit of measurement?
Some doubts in statistics collection using Teradata Statistics Wizard:-
1. Stale stats are out-dated stats .... e.g stats are not collected for a long period of time while data has been changing in the table .... so we have stale stats that needs to be refreshed or recollected.
2. Ideally, for all tables which are part of workload.
3. No, stats collected for a particular query can't be stale for some queries as there 'stale-ness' depends on the data in the table, not the query.
4. Not exactly .... it is advisable to collect stats for all tables that are loaded frequently .... i.e. the tables which have changes in data frequently.
-- If you are stuck at something .... consider it an opportunity to think anew.
Thank you Adeel for your response...
Few more doubts:-
For example, I have two queries involving complex joins. Query 1 is dependent on tables A, B, C and D whereas query 2 is dependent on tables B, C and E.
Tables B and C both are common. But the columns of joins being used (of tables B and C) in query 1 are different from columns of joins being used (of tables B and C) in query 2.
If we get recommendations using Teradata Statistics Wizard for Query 1, it may be different for query 2. So on executing the recommendations for each query, we end up with statistics being collected on tables B and C on columns which are present in join conditions of Query 1 but not for query 2. In other words staistics collected for query 1 won't be useful for query 2 or vice-versa. Going forward we may have statistics collected on almost every column in the table.
Should we collect statistics for column involved in joins?
Would same affect the performance of the query?
Or we can just collect statistics on all tables in producton that are being loaded frequently, irresepective of the queries being fired in database. Because in one or other way the queries would fetch data from the tables only which would statistics collected on them?
Collection of unnessasary stats does have a downside, but considering your scenario, it shouldnt be an issue.
Moverover, it is always an evolving thing to come up with the best possible stats to collect. So, its better to analyze for a time-being and then tweak, add or remove stats.