All Forums Database
03 Aug 2013
PPI or NUSI?

Hi Experts,
Can you pls  give your suggestions for the below requirement-->

Requirement-->

 

There is a table which would be having history of 24 months of data at any point of time.

 

there is a requirement that users would be querying based on a month and year only  for their reports ,not on the PI of the table.

 

Any month would be having appx. 120 GB data (after looking at the volume). 

 

 

 

Solution -->

 

we are planning to have column (yr_mon) in table which would be storing the  month & year so that there is no need to derive the month and year  during the run time in a query.

 

We were planning to make yr_mon as PPI of this new table but as we know that in order to make the queries efficient pi and ppi should be

 

used together in order for the faster retreival however in this case User would only be knowing the month and year for querying hence

 

there would be a redistribution in absence of PI in queries,hence not going for PPI.

 

 

 

Finally we are thinking of making a NUSI on (yr_mon) column for fast retreival on this column alone.

 

 

Your opinions on the above pls.. thanks in advance..

 

cheers!

 

Nishant

 

 

 

 

 

 

 

Tags:
dnoeth 2695 posts Joined 11/04
04 Aug 2013

If you add a yr_mon column you probably keep your date column resulting in doubling the diskspace, 4 (date) -> 8 (date+int) bytes.
If users want to access a year they still have to write a condition like BETWEEN 201201 and 201212. If you don't want your user to deal with some basic calculation like BETWEEN DATE '2012-01-01' and DATE '2012-12-31' you might easily hide that month-begin/end calculation in a SQL-UDF: BETWEEN month_begin(201201) and month_end(201201).

 
We were planning to make yr_mon as PPI of this new table but as we know that in order to make the queries efficient pi and ppi should be used together in order for the faster retreival however in this case User would only be knowing the month and year for querying hence there would be a redistribution in absence of PI in queries,hence not going for PPI.
 

This is a confusing sentence, especially without any punctuation. 
When the partitioning column is used in the WHERE-condition you get partition elimination regardless if it's part of the PI or not.
But PI-access/joins without additionally specifying the partitioning might be slower (much slower when the number of partitions is large).
"redistribution" is totally independant of that.
 
The selectivity of a NUSI on yr_mon is quite low, the optimizer will probably not use it.
 
 
Dieter
 
 

Dieter

05 Aug 2013

thanks Dieter for your suggestions..
Apologies for any confusing sentence. I will reteirate it.
 
I mean that if we keep (date) column as a partitioned column and Primary Index (some other column) then as users has mentioned that only  they would be qyerying on the date column with range of 1 month ..
then in case , access with partitioned column alone (with out PI) will not be that much efficient as it would have been with PI + partioned column.. agree?
so can we do some thing else here for performance tuning..
 
thanks in advance.
cheers!
Regards,
Nishant
 
 
 

dnoeth 2695 posts Joined 11/04
07 Aug 2013

Hi Nishant,

access with partitioned column alone (with out PI) will not be that much efficient as it would have been with PI + partioned column.. agree?

No. When only the partitioning column is used the performance is exacty the same regardless if the partitioning column is also part of the PI or not.
There's a lot of details about PPI in the "Database Design" manual,  Chapter 9: Primary Indexes and NoPI Objects
 
Dieter

Dieter

TD_Raj 41 posts Joined 05/10
07 Aug 2013

Hi Nishant
If you are creating your date column as PPI , then using the column in filter condition will lead to partition elimination and will enhance the performace. You may come across situation when it is not always possible to query on PI column so thats ok. However if you have created PPI on the table then you should attempt to use PPI column in your query for better results.
PPI + PI combination to use all the time in query is not possible. So its ok if user query on some other column which is not PI but actually PPI.
--
Raj

15 Aug 2013

Thanks dieter & Raj,

 

 

 

Just to elaborate more ,mentioning below the example-->

 

 

 

Create table Orders

 

(orderid number NOT NULL,

 

order_yr_mon integer ,

 

order name varchar(100),

 

city varchar(100)

 

)

 

primary index (orderid)

 

PARTITION BY RANGE_N(Order_year_mon BETWEEN DATE '2009-12-06' AND DATE '2020-12-31' EACH INTERVAL '1' month);

 

Now if users query like this

 

Select * from Order where order_year_mon ='201207' (with out order id)

 

so my question is would this query would give the partiton elimination or as efficient if in case users query with orderid and order_year_mon both?

 

 

As users are insisting that they will query with order_year_mon only. . 

 

Your comments on above pls.

 

thanks!

 

cheers!

dnoeth 2695 posts Joined 11/04
15 Aug 2013

Hi Nishant,
why don't you simply try it yourself and explain the query?
You would have noticed that your DDL will fail.
 
Dieter

Dieter

You must sign in to leave a comment.