Hi all,I'm currently working on a data cleaning exercise, consolidating customers who have separate ID's, but are actually duplicates. The data are manually entered and therefore minor variations in company name may occur.I can perform the analysis outside of Teradata, but the estimated time to completion is literally several days.Is there an easy 'sounds-like' phonetic type of function in Teradata (that is supported by Teradata SQL) that could, for example, match "phone-shop" and "fone shop". I'd like to use this type of feature in a join.Maybe a pipe dream...ThanksTim
Yes, Teradata has a function called SOUNDEX does what you're looking for. If two strings have the same soundex value, then they are phonetically equivalent.Good luck!
Barry,What you say about it ?SELECT case when SOUNDEX('phone') = SOUNDEX('fone') then 'both equal' else 'both not equal :-( ' endRegards,M.Adeel Sabir
Could be how 'fone' is perceived to be pronounced by TDSELECT case when SOUNDEX('fawn') = SOUNDEX('fone') then 'both equal' else 'both not equal ' endboth equalTbob
From the manual, it does look like the intent was to be used on surnames....Soundex is a system that codes surnames having the same or similar sounds, but variantspellings. The Soundex system was first used by the National Archives in 1880 to index theUnited States census.Soundex codes begin with the first letter of the surname followed by a three-digit code. Zerosare added to names that do not have enough letters.So, maybe you wouldn't want to apply it to other words.I believe that it's the only function that Teradata has that does something like the phonetic comparison, though.
Thanks everyone, I'll give SOUNDEX a try.Tim
Hi Tim,if you're on V2R5.1+ this could be solved with a UDF implementing a more advanced phonetic algorithm, e.g. metaphone. You just have to find an implentation in C and wrap it into the UDF...Dieter
This is an old post....and we are on TD 14.0 now. Do we have any inbuilt functionality in Teradata that simulates Sybase's difference function? Or the only option is building a custom UDF?
-- Sybase Example 1
select difference("smithers", "smothers")
-- Sybase Example 2
select difference("smothers", "brothers")
It is basically a difference between two soundex values and returns a value from 0 to 4. The best match is 4.
-- Teradata Example 1
WHEN SOUNDEX('smithers') = SOUNDEX('smothers')
-- Returns 4
-- Teradata Example 2
WHEN SOUNDEX('smithers') = SOUNDEX('brothers')
-- Returns 0