[SGVLUG] Grep "quickie" needed -- searching for hi-bit characters

Christopher Smith x at xman.org
Fri Jan 4 20:27:57 PST 2008


Christopher Smith wrote:
> Claude Felizardo wrote:
>   
>> On Jan 4, 2008 3:57 PM, Emerson, Tom (*IC) <Tom.Emerson at wbconsultant.com> wrote:
>>   
>>     
>>> I've got an odd one here -- I know how I'd do this on an HP using some
>>> proprietary tools I've used for the last 15 years, but this is on a *nix
>>> system so I need to know how to do this using grep.
>>>
>>> We have some files that were transferred from one machine to another [one of
>>> which was a PC], and somewhere in the process, it appears that some
>>> local-language/"multi-byte" characters got translated to
>>> multiple-ascii-bytes, which in turn buggered up the record length.
>>> Fortunately, these are easy to detect visually as the new values for each
>>> "byte" of the character are between 128 and 255 and generally look like
>>> "line noise" when cat'd to the screen.  Unfortunately, the files involved
>>> are thousands of lines long, so a pure visual search is out of the question.
>>>
>>> What would I use as a regex to find characters with a byte (ascii) value >
>>> 127?
>>>     
>>>       
>> sounds like you should be using sed or perl.
>> can't think of the regex right now but if it's supposed to be regular
>> text, what about just running the files through strings?
>>   
>>     
> This is simple enough to do in C, let alone perl:
>   
Sorry, I missed making my larger point: this isn't really a regexp
problem. It's a simple boolean check on each byte (is the value greater
than 127?), so there's no need for regexps. Using one potentially slows
the code down, and certainly makes it more complex.
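
To illustrate the kind of check I mean, here is a minimal sketch in C. The
function names (has_high_bit, report_high_bit_lines) are made up for this
example and don't come from any library. It also assumes newline-terminated
records, which may not match the fixed-length files in question.

```c
#include <stddef.h>
#include <stdio.h>

/* Return 1 if any byte in buf[0..len) has its high bit set (value > 127),
 * otherwise 0. Hypothetical helper, named for this example only. */
static int has_high_bit(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] & 0x80)
            return 1;
    }
    return 0;
}

/* Read `in` byte by byte and write to `out` the 1-based number of each
 * line that contains at least one byte > 127. Returns how many lines
 * were flagged. Assumes '\n' ends a line. */
static long report_high_bit_lines(FILE *in, FILE *out)
{
    long line = 1, hits = 0;
    int c, flagged = 0;

    /* getc() returns the byte as an unsigned char converted to int,
     * so comparing against 127 is safe. */
    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            line++;
            flagged = 0;
        } else if (c > 127 && !flagged) {
            fprintf(out, "%ld\n", line);
            flagged = 1;
            hits++;
        }
    }
    return hits;
}
```

Calling report_high_bit_lines(stdin, stdout) from a small main() and
redirecting a file into it would print one line number per affected line.
The whole test is the single comparison c > 127 (or buf[i] & 0x80).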

--Chris


