Logically the JOIN is processed first creating an intermediate result set which is then filtered by the WHERE. Any comparison to NULL results in UNKOWN, thus "WHERE b.status<>'UNKNOWN'" removes the previously created rows with NULLs.
When you read the 2nd Explain you'll notice that the optimizer is smart and removed the Outer join and replaced it by an Inner join.
Depending on the expected result set you probably need to
- move the condition into the ON-clause
- or add "OR b.Status IS NULL"
Do a global search in the manuals for "outer join case study", this chapter is a rewritten from an article in the old Teradata Magazine named "A lesson on Outer Joins. Learned the hard way" :-)
Dieter
There's a three-way logic in SQL: TRUE, FALSE, UNKNOWN.
Any comparison (other than IS NULL) involving a NULL results in UNKNOWN, which is quite similar to FALSE in most cases (but NOT UNKNOWN is still UNKNOWN).
This is not how Teradata works, it's how each and every relational DBMS works, it's one of the principles of the relational model.
Dieter
I have a question on another query.
CREATE TABLE t1
(
x CHAR(7) ,
y CHAR(7) ,
z CHAR(4) );
CREATE TABLE sr.t2
(
y CHAR(7) ,
z CHAR(8)
);
Replace VIEW v_t2
(
y
,DEL_FLG
) AS
SELECT
y
,CASE WHEN z < '20110530' THEN 'D'
ELSE '0' END
FROM sr.t2;
insert sr.t1 ('1234567','1234','1234');
select
A.x, A.y,
B.y, B.z
FROM
sr.t1 AS A
LEFT OUTER JOIN sr.t2 AS B
ON ( A.y=B.y ) where
B.z<>'D';
Can you please let me know whether the query should return any rows.
How will be the order of evaluation here .
first will it evaluate join and then apply where clause or vice versa.
Note that the second table/view does not have any rows.
Can anyone help in this.
Why don't you simply run the query?
The logical order of evaluation is always the same:
JOIN -> WHERE -> GROUP BY -> HAVING -> OLAP -> QUALIFY -> SAMPLE -> ORDER BY
Your select will return no rows.
Intermediate set after JOIN:
'1234567','1234',NULL,NULL
WHERE removes the row as NULL<>'D' results in UNKNOWN.
I don't know what you actually wanted to know, as you create a view, which is not used?
Did you mean to access the view instead of the table?
Then the result is still empty, logically the view's result set is created first and then used (just like a Derived Table).
Dieter
Actually my query returns a row in V2R6 base. In TD12, same query returnsno rows.
Can you please explain the V2R6 scenario.Is it giving incorret results in V2R6.
@Mahs
It's a logical order. As long as the result set is the same the optimizer might apply filters in an earlier step.
@sravani
Which query, the one you posted or the one based on the view?
IIRC there was an issue with Case in Outer Joins in older releases.
Could you compare the explains if they are different?
Dieter
@Mahs
Btw, it's not an Outer Join, the optimizer replaced with an Inner :-)
And for Inner Joins there's no difference between placing a filter in ON or WHERE.
Dieter
Thanks dnoeth.
I am talking about query based on view. Sorry , here is the query :
Yes the explains are different.
on V2R6.
select
A.x, A.y,
B.y, B.DEL_FLG
FROM
sr.t1 AS A
LEFT OUTER JOIN sr.v_t2 AS B
ON ( A.y=B.y ) WHERE
B.DEL_FLG<>'D';
*** Query completed. One row found. 4 columns returned.
*** Total elapsed time was 1 second.
x y y DEL_FLG
------- ------- ------- -------
1234567 1234 ? 0
4) We do an all-AMPs RETRIEVE step from sr.A by way of an all-rows
scan with no residual conditions into Spool 2 (all_amps), which is
redistributed by hash code to all AMPs. Then we do a SORT to
order Spool 2 by row hash. The size of Spool 2 is estimated with
low confidence to be 4 rows. The estimated time for this step is
0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to sr.t2 by way of a RowHash
match scan. Spool 2 and sr.t2 are left outer joined using a merge
join, with a join condition of ("y = sr.t2.y"). The result goes
into Spool 3 (all_amps), which is built locally on the AMPs. The
size of Spool 3 is estimated with index join confidence to be 8
rows. The estimated time for this step is 0.04 seconds.
6) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan with a condition of ("(( CASE WHEN (z <
'20110530') THEN ('D') ELSE ('0') END ))<> 'D'") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with index join confidence to be 8 rows. The
estimated time for this step is 0.07 seconds.
on TD12:
select
A.x, A.y,
B.y, B.DEL_FLG
FROM
sr.t1 AS A
LEFT OUTER JOIN sr.v_t2 AS B
ON ( A.y=B.y ) WHERE
B.DEL_FLG<>'D';
*** Query completed. No rows found.
*** Total elapsed time was 1 second.
4) We do an all-AMPs RETRIEVE step from sr.A by way of an all-rows
scan with a condition of ("NOT (sr.A.y IS NULL)") into Spool 3
(all_amps), which is redistributed by the hash code of (sr.A.y) to
all AMPs. Then we do a SORT to order Spool 3 by row hash. The
size of Spool 3 is estimated with no confidence to be 2 rows (54
bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from sr.t2 in view v_t2 by way of a
RowHash match scan with a condition of ("(( CASE WHEN (sr.t2 in
view v_t2.z < '20110530') THEN ('D') ELSE ('0') END ))<> 'D'"),
which is joined to Spool 3 (Last Use) by way of a RowHash match
scan. sr.t2 and Spool 3 are joined using a merge join, with a
join condition of ("y = sr.t2.y"). The result goes into Spool 2
(group_amps), which is built locally on the AMPs. The size of
Spool 2 is estimated with no confidence to be 2 rows (86 bytes).
The estimated time for this step is 0.05 seconds.
Hi Dieter,
I ran this query using a test table created and try to see whether any inner joins are present in explain plan or not, but I could not find them. Possibly I am reading the explain plan incorrectly. I am fairly new to teradata with out much hands on the SQL off late. can you please let me know where i am missing..
EXPLAIN SELECT a.Emp_no,a.Emp_name,b.Emp_no,b.status FROM
Outer_table a LEFT JOIN Inner_table b ON a.Emp_no=b.Emp_no
WHERE
b.status<>'UNKNOWN'
ORDER BY 1;
1) FIRST, we LOCK a DISTINCT "pseudo table"
FOR READ ON a RowHash TO prevent GLOBAL deadlock FOR
b.
2) NEXT, we LOCK a DISTINCT "pseudo table"
FOR READ ON a RowHash TO prevent GLOBAL deadlock FOR
a.
3) We LOCK b FOR READ, AND we LOCK
a FOR READ.
4) We DO an ALL-AMPs JOIN step FROM b BY
way OF a RowHash match scan WITH a CONDITION OF (
"b.status <> 'UNKNOWN'"), which IS
joined TO a BY way OF a RowHash match
scan WITH NO residual conditions. b
AND a are joined USING a MERGE JOIN,
WITH a JOIN CONDITION OF ("a.EMP_NO =
b.EMP_NO"). The RESULT goes INTO SPOOL
1 (all_amps), which IS built locally ON the AMPs. THEN we DO a
SORT TO ORDER SPOOL 1 BY the sort KEY IN SPOOL field1 (
a.EMP_NO). The SIZE OF SPOOL 1 IS
estimated WITH NO confidence TO be 692 ROWS (36,676 BYTES). The
estimated TIME FOR this step IS 0.04 seconds.
5) Finally, we send OUT an END TRANSACTION step TO ALL AMPs involved
IN processing the REQUEST.
-> The contents OF SPOOL 1 are sent back TO the USER AS the RESULT OF
STATEMENT 1. The total estimated TIME IS 0.04 seconds.
outer joins can have a butterfly effect & unintended consequences.
The Rolf inspired case study that Dieter refernced is a good one and can be found at this link:
http://www.teradataforum.com/teradata_pdf/b035-1101-061a_6.pdf
Thanks to Dieter and Rolf.


Hi,
I have two tables called outer_table & inner_table
outer_table:
select * from outer_table order by 1;
*** Query completed. 4 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
Emp_no Emp_name
----------- --------
1 ram
2 shyam
3 tom
4 dam
Inner_table:
select * from inner_table order by 1;
*** Query completed. 4 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
Emp_no status
----------- ----------
1 ALIVE
2 DEAD
5 ALIVE
6 UNKNOWN
CASE 1:
SELECT a.Emp_no,a.Emp_name,b.Emp_no,b.status FROM
Outer_table a LEFT JOIN Inner_table b ON a.Emp_no=b.Emp_no
ORDER BY 1;
*** Query completed. 4 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
Emp_no Emp_name Emp_no status
----------- -------- ----------- ----------
1 ram 1 ALIVE
2 shyam 2 DEAD
3 tom ? ?
4 dam ? ?
CASE 2:
SELECT a.Emp_no,a.Emp_name,b.Emp_no,b.status FROM
Outer_table a LEFT JOIN Inner_table b ON a.Emp_no=b.Emp_no
WHERE
b.status<>'UNKNOWN'
ORDER BY 1;
*** Query completed. 2 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
Emp_no Emp_name Emp_no status
----------- -------- ----------- ----------
1 ram 1 ALIVE
2 shyam 2 DEAD
I am quite sure how CASE 1 is working but seems to be lost looking at CASE 2.Please explain.(May be i am overlooking some thing)