[SGVLUG] How to find files/messages that are "almost" the same

matti mathew_2000 at yahoo.com
Mon Apr 20 11:12:59 PDT 2009



Hmmm... interesting problem

let the world of the command line save you!

lol! ;-)

ok!

this IS how I would try to solve the problem

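use "diff" (with -b, so that changes in the amount of
whitespace, like a trailing space, are ignored), pipe the
output to "wc -l" or something similar, and see which
files are minimally different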
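
In case it helps, here is a minimal Python sketch of the same "diff, then count" idea. It is not what I ran, just an illustration: the function name diff_size is made up, and it uses the standard difflib module instead of the diff binary, so the counts will not match "diff | wc -l" exactly.

```python
import difflib


def diff_size(text_a, text_b, ignore_trailing_space=True):
    """Count changed lines between two texts (a rough stand-in for diff | wc -l)."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    if ignore_trailing_space:
        # Assumption: trailing whitespace is noise, as in the original question.
        lines_a = [line.rstrip() for line in lines_a]
        lines_b = [line.rstrip() for line in lines_b]
    changed = 0
    for line in difflib.ndiff(lines_a, lines_b):
        # ndiff marks removed lines with "- " and added lines with "+ ";
        # "  " (unchanged) and "? " (hint) lines are not counted.
        if line.startswith(("- ", "+ ")):
            changed += 1
    return changed
```

A result of 0 means the two texts are the same apart from trailing whitespace; small non-zero values are the "almost the same" candidates worth a manual look.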

I would also only use diff on files close to the
same size, as obviously files of significantly
different sizes are very different.
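
A rough Python sketch of that size filter, under my own assumptions: the function name size_neighbours and the 64-byte default gap are arbitrary choices for illustration, and the input is a plain dict of name -> size in bytes rather than a real directory listing.

```python
def size_neighbours(sizes, max_gap=64):
    """Return pairs of names whose sizes differ by at most max_gap bytes."""
    # Sort by size so that candidates for comparison end up next to each other.
    ordered = sorted(sizes.items(), key=lambda item: item[1])
    pairs = []
    for i, (name_a, size_a) in enumerate(ordered):
        for name_b, size_b in ordered[i + 1:]:
            if size_b - size_a > max_gap:
                # Everything further along is even larger, so stop early.
                break
            pairs.append((name_a, name_b))
    return pairs
```

Only the pairs this returns would then need to be diffed, which keeps the number of comparisons down when most files differ a lot in size.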

http://en.wikipedia.org/wiki/Diff

hmm, so probably use "ls -lta" and pipe to a file,
then sort the file based on size of the files
(or let "ls -lS" do the sorting by size for you),
extract the nearest files based on size, diff
those -> use "wc -l" to determine how big the difference
is, and then manually look and see if it worked ;)
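
The whole recipe above could be sketched in Python roughly like this. Treat it as an untuned illustration, not a finished tool: rank_near_duplicates, changed_lines and the max_gap default are names and values I made up, it only looks at one directory (no recursion), and it compares whole files, mail headers included.

```python
import difflib
import os


def changed_lines(path_a, path_b):
    """Count differing lines between two files, ignoring trailing whitespace."""
    texts = []
    for path in (path_a, path_b):
        # errors="replace" is an assumption so odd encodings do not abort the run.
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts.append([line.rstrip() for line in handle.read().splitlines()])
    return sum(
        1
        for line in difflib.ndiff(texts[0], texts[1])
        if line.startswith(("- ", "+ "))
    )


def rank_near_duplicates(directory, max_gap=64):
    """Return (changed_line_count, path_a, path_b) tuples, most similar first."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append((os.path.getsize(path), path))
    entries.sort()  # smallest first, like sorting the ls output by size

    results = []
    for i, (size_a, path_a) in enumerate(entries):
        for size_b, path_b in entries[i + 1:]:
            if size_b - size_a > max_gap:
                break  # remaining files are even further apart in size
            results.append((changed_lines(path_a, path_b), path_a, path_b))
    results.sort()
    return results
```

The pairs at the top of the returned list are the ones to check by hand first.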

best
matti

--- On Mon, 4/20/09, Emerson, Tom (*IC) <Tom.Emerson at wbconsultant.com> wrote:

> From: Emerson, Tom (*IC) <Tom.Emerson at wbconsultant.com>
> Subject: [SGVLUG] How to find files/messages that are "almost" the same
> To: "'SGVLUG Discussion List.'" <sgvlug at sgvlug.net>
> Date: Monday, April 20, 2009, 8:50 AM
> One more reason to dislike certain
> email clients: using automation to sort e-mails can end up
> with "duplicates" in multiple folders, however these are
> not-quite-perfect duplicates, so a binary comparison will
> see them as distinct messages when in fact the /content/ is
> the same.
> 
> Does anyone know of a product or program that would ignore
> small differences (such as an extra space at the end of a
> line) when comparing the body/text of a message?