Subject Re: [IBDI] Duplictes query
Author John Culleton
Claudio Valderrama C. wrote:

>> -----Original Message-----
>> From: Roger Pullen [mailto:pullen.roger@...]
>> Sent: Martes 5 de Junio de 2001 10:39
>>
>> I need to identify the ID of those records
>> that are duplicated across the name and address
>> fields, possibly with some degree of mismatch (ie
>> a %LIKE% element)
>
>
> Have a look at
> http://www.cvalde.com/document/mysteriousDbKeyIV.htm
> for ideas.
>
> C.
>
>
I face the same problem and plan to solve it outside of
Interbase. I will first set up each name/address item as a
comma delimited single line in a text file. The order will be
firstname-mi,lastname,streetaddress,
secondaddressline, city,state,zipcode.

Then I will use the unix sort utility and sort by selecting the sort
fields in REVERSE ORDER. In other words I will sort on zipcode, within
zipcode on city, within city on streetaddress, within streetaddress on
lastname. If your data does not use zipcodes then country, postalcode
etc. can be substituted. (In the US zipcode implies state so there is no
need to sort on state.)

If all the data is already in Interbase then the sorting can be done
in the "SELECT" statement.

Then I will use an editor like vim and scan the file manually looking
for potential duplicates and deleting the offending items.

The resulting file can then be resorted on lastname, firstname looking
for additional duplicates (e.g., Culleton, John is listed twice at
almost the same address but with different zipcodes.)

No program can do this scan as well as I can. The LIKE command won't
pick up on things like Caty Street mispelled Katy Street.

HTH

John Culleton