[SGVLUG] Grep "quickie" needed -- searching for hi-bit characters

Claude Felizardo cafelizardo at gmail.com
Fri Jan 4 16:02:55 PST 2008


On Jan 4, 2008 3:57 PM, Emerson, Tom (*IC) <Tom.Emerson at wbconsultant.com> wrote:
>
> I've got an odd one here -- I know how I'd do this on an HP using some
> proprietary tools I've used for the last 15 years, but this is on a *nix
> system so I need to know how to do this using grep.
>
> We have some files that were transferred from one machine to another [one of
> which was a PC], and somewhere in the process, it appears that some
> local-language/"multi-byte" characters got translated to
> multiple-ascii-bytes, which in turn buggered up the record length.
> Fortunately, these are easy to detect visually as the new values for each
> "byte" of the character are between 128 and 255 and generally look like
> "line noise" when cat'd to the screen.  Unfortunately, the files involved
> are thousands of lines long, so a pure visual search is out of the question.
>
> What would I use as a regex to find characters with a byte (ascii) value >
> 127?

Sounds like you should be using sed or perl.
I can't think of the regex right now, but if it's supposed to be regular
text, what about just running the files through strings (which prints
only the runs of printable characters)?
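
For what it's worth, here is a rough sketch of the search being asked
about, written in Python rather than grep, sed or perl. It is only an
illustration, not something tested against the files in question: the
function name find_high_bit_lines is made up for this example, and it
assumes the records are newline-terminated and small enough to read
into memory in one go. It reports every line that contains a byte with
a value of 128 or above, along with the 0-based offsets of those bytes
within the line.

```python
import re
import sys

# Matches any single byte whose value is between 128 and 255.
HIGH_BIT = re.compile(rb"[\x80-\xff]")


def find_high_bit_lines(data):
    """Return a list of (line_number, offsets) for lines with bytes > 127.

    line_number is 1-based; offsets are 0-based byte positions in the line.
    The name and return shape are hypothetical, chosen for this sketch.
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        offsets = [m.start() for m in HIGH_BIT.finditer(line)]
        if offsets:
            hits.append((lineno, offsets))
    return hits


if __name__ == "__main__":
    # Read each file in binary mode so no decoding step alters the bytes.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for lineno, offsets in find_high_bit_lines(f.read()):
                print(f"{path}:{lineno}: high-bit bytes at offsets {offsets}")
```

Saved as a script, it would be run with the file names as arguments.
Opening the files in binary mode is deliberate: the point is to look at
the raw byte values, so nothing should try to interpret them as text
first.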

