Subject | Re: [IBDI] Duplictes query |
---|---|
Author | John Culleton |
Post date | 2001-06-06T13:42:42Z |
Claudio Valderrama C. wrote:
Interbase. I will first set up each name/address item as a
comma delimited single line in a text file. The order will be
firstname-mi,lastname,streetaddress,
secondaddressline, city,state,zipcode.
Then I will use the unix sort utility and sort by selecting the sort
fields in REVERSE ORDER. In other words I will sort on zipcode, within
zipcode on city, within city on streetaddress, within streetaddress on
lastname. If your data does not use zipcodes then country, postalcode
etc. can be substituted. (In the US zipcode implies state so there is no
need to sort on state.)
If all the data is already in Interbase then the sorting can be done
in the "SELECT" statement.
Then I will use an editor like vim and scan the file manually looking
for potential duplicates and deleting the offending items.
The resulting file can then be resorted on lastname, firstname looking
for additional duplicates (e.g., Culleton, John is listed twice at
almost the same address but with different zipcodes.)
No program can do this scan as well as I can. The LIKE command won't
pick up on things like Caty Street mispelled Katy Street.
HTH
John Culleton
>> -----Original Message-----I face the same problem and plan to solve it outside of
>> From: Roger Pullen [mailto:pullen.roger@...]
>> Sent: Martes 5 de Junio de 2001 10:39
>>
>> I need to identify the ID of those records
>> that are duplicated across the name and address
>> fields, possibly with some degree of mismatch (ie
>> a %LIKE% element)
>
>
> Have a look at
> http://www.cvalde.com/document/mysteriousDbKeyIV.htm
> for ideas.
>
> C.
>
>
Interbase. I will first set up each name/address item as a
comma delimited single line in a text file. The order will be
firstname-mi,lastname,streetaddress,
secondaddressline, city,state,zipcode.
Then I will use the unix sort utility and sort by selecting the sort
fields in REVERSE ORDER. In other words I will sort on zipcode, within
zipcode on city, within city on streetaddress, within streetaddress on
lastname. If your data does not use zipcodes then country, postalcode
etc. can be substituted. (In the US zipcode implies state so there is no
need to sort on state.)
If all the data is already in Interbase then the sorting can be done
in the "SELECT" statement.
Then I will use an editor like vim and scan the file manually looking
for potential duplicates and deleting the offending items.
The resulting file can then be resorted on lastname, firstname looking
for additional duplicates (e.g., Culleton, John is listed twice at
almost the same address but with different zipcodes.)
No program can do this scan as well as I can. The LIKE command won't
pick up on things like Caty Street mispelled Katy Street.
HTH
John Culleton