[SGVLUG] Grep "quickie" needed -- searching for hi-bit characters

Claude Felizardo cafelizardo at gmail.com
Fri Jan 4 17:03:36 PST 2008


On Jan 4, 2008 4:13 PM, Emerson, Tom (*IC) <Tom.Emerson at wbconsultant.com> wrote:
> > -----Original Message----- Of Claude Felizardo
> > On Jan 4, 2008 3:57 PM, Emerson, Tom (*IC)
> > >
> > > What would I use as a regex to find characters with a byte (ascii)
> > > value > 127?
> >
> > sounds like you should be using sed or perl.
> > can't think of the regex right now but if it's supposed to be
> > regular text, what about just running the files through strings?
>
> I need to find the lines that have "odd" characters to edit (remove)
> those characters.  Out of the many-thousand-lines of input, there are
> maybe half a dozen lines where there are/were "bad" characters.  I don't
> think "strings" will help here as the rest of the lines will have "good"
> text in them.  (these are movie titles included as part of the data "for
> the benefit of humans" reviewing the file, however the file /format/ is
> a fixed-field format, so when "whatever" translated the "local"
> characters into multiple-byte values, it shifted the remaining fields,
> causing a loader error)

Do you know what the specific characters are? If so, maybe you can use
the tr command to translate the bad characters into good ones. For
example, this turns every newline (octal 012) into '=':

tr '\012' '=' < file
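
If the specific characters aren't known, tr can also drop every byte
above 127 in one pass. A minimal sketch, assuming the unwanted bytes can
simply be deleted; strip_hibit is a hypothetical helper name and the
sample title is made up for illustration:

```shell
# Hypothetical helper: delete every byte in the range 0x80-0xFF from stdin.
# LC_ALL=C makes tr work on single bytes rather than multi-byte characters.
strip_hibit() {
    LC_ALL=C tr -d '\200-\377'
}

# Made-up sample: "Am", the two UTF-8 bytes for e-acute, then "lie".
printf 'Am\303\251lie\n' | strip_hibit
```

Deleting the bytes shortens the line, so in a fixed-field file the
affected field would probably still need padding back to its proper
width.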

Back to grep: what about searching for characters that aren't normal
printable ones, with something like [^[:print:]]?  This might need
egrep, or you might have to spell the range out.
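
The [^[:print:]] idea could be sketched like this. It assumes the C
locale, where [:print:] covers only the ASCII characters from space to
tilde, so any byte above 127 counts as a match (as do tabs and other
control characters). find_hibit is a hypothetical helper name and the
sample records are made up:

```shell
# Hypothetical helper: print "line-number:line" for every line that holds
# a byte outside printable ASCII. Reads the named files, or stdin if none.
# LC_ALL=C makes grep treat the input as single bytes.
find_hibit() {
    LC_ALL=C grep -n '[^[:print:]]' "$@"
}

# Made-up sample: only the second record contains bytes above 127.
printf 'Amelie    2001\nAm\303\251lie    2001\nBrazil    1985\n' | find_hibit
```

Only the second sample line is reported, prefixed with its line number,
which is enough to jump to it in an editor.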

If you have fixed-length records, can you search for lines that are
longer than n characters?
For example, open the file in vi, go to the end of a line, and scroll
down looking for lines that are longer than they are supposed to be.
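
The length check can also be scripted instead of eyeballed in an editor.
A rough sketch using awk, assuming every good record is exactly n bytes
long; long_lines is a hypothetical helper name and the 14-byte sample
records are made up:

```shell
# Hypothetical helper: print "line-number: byte-length" for every line on
# stdin that is longer than n bytes. Usage: long_lines n < file
# LC_ALL=C makes awk count bytes rather than multi-byte characters.
long_lines() {
    LC_ALL=C awk -v n="$1" 'length($0) > n { print NR ": " length($0) }'
}

# Made-up sample: records should be 14 bytes; the second has a 2-byte char.
printf 'Amelie    2001\nAm\303\251lie    2001\nBrazil    1985\n' | long_lines 14
# prints: 2: 15
```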

How are you going to edit this?  vi?
How many files?
Is this likely to happen again? If so, would it make sense to
create a script to clean things up?

Oh, I see that you have sent another message...

