[SGVLUG] Grep "quickie" needed -- searching for hi-bit characters

Christopher Smith x at xman.org
Fri Jan 4 20:59:19 PST 2008


Emerson, Tom (*IC) wrote:
>
> We have some files that were transferred from one machine to another
> [one of which was a PC], and somewhere in the process, it appears that
> some local-language/"multi-byte" characters got translated to
> multiple-ascii-bytes, which in turn buggered up the record length.
> Fortunately, these are easy to detect visually as the new values for
> each "byte" of the character are between 128 and 255 and generally
> look like "line noise" when cat'd to the screen.  Unfortunately, the
> files involved are thousands of lines long, so a pure visual search is
> out of the question.
>
This all sounds suspiciously like UTF-8 encoding, no? If so, most
Unicode libraries have handy routines for this kind of stuff.

--Chris
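
For anyone reading this in the archive: a rough sketch of what the "Unicode library" route might look like, using Python's built-in bytes.decode. This is not from the original thread. The function name find_high_bit_lines and the sample data are made up for illustration, and it assumes records are newline-terminated. It reports every line containing a byte in the 128-255 range and says whether that line is valid UTF-8.

```python
import re

# Any run of bytes with the high bit set (values 128 through 255).
HIGH_BIT = re.compile(rb"[\x80-\xff]+")


def find_high_bit_lines(data: bytes):
    """Yield (line_number, is_valid_utf8, decoded_text_or_None).

    Only lines containing at least one byte >= 128 are reported.
    Hypothetical helper for illustration; assumes b"\\n" line endings.
    """
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        if not HIGH_BIT.search(line):
            continue  # pure 7-bit ASCII line, nothing to report
        try:
            # Strict decoding raises on any malformed UTF-8 sequence.
            yield lineno, True, line.decode("utf-8")
        except UnicodeDecodeError:
            yield lineno, False, None


# Made-up sample: line 2 holds a valid UTF-8 "é", line 3 a stray 0xE9 byte.
sample = b"plain ascii\ncaf\xc3\xa9 ok\nbad \xe9 byte\n"
for lineno, ok, text in find_high_bit_lines(sample):
    print(lineno, "valid UTF-8" if ok else "not UTF-8", repr(text))
```

To run it over a real file, read the file in binary mode (open(path, "rb").read()) and pass the result in. If the flagged lines all come back as valid UTF-8, that supports the guess above; lines that fail to decode point to some other encoding or to genuine corruption.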

