[SGVLUG] Help with file format

Emerson, Tom Tom.Emerson at wbconsultant.com
Wed Nov 29 10:21:45 PST 2006


> -----Original Message----- On Behalf Of James Neff
> 
> We received a file from a customer and I'm having trouble
> determining what the character set is.
> 
> When I run the "file" utility:
> 
> [root at appserver2 06-11-28]# file customer-file.txt
> customer-file.txt: MPEG ADTS, layer I, v1,  96 kBits, 44.1 kHz, Stereo

It thinks it's music? (audio?)
 
> When I run "less" it thinks it's a binary file and I see garbage if I
> choose to look at it anyway.
> 
> When I run "vi" I can read the file just fine from start to 
> finish but 

Even past line 15103? (the problem line you note below)

> When I run "more" I can read the file just fine from start to finish.

I was going to suggest passing it through ed (or sed), since vi
apparently can read the file and gets past whatever "problem" exists at
line 15103, but maybe more would work better?  [Can more even be used
as a filter?]

(Hmmm, answering my own semi-rhetorical question here: it looks like it
can.  Per the man page I was reading, if stdout is "not a terminal",
the "lines per page is infinite", and in any case you can also use "-n"
to specify how many lines to display, so if you set that higher than
the number of lines you expect, it should pass the entire file through,
right?)

> What started this problem was when we tried to import this into our MS
> SQL database using DTS.  At line 15103 the DTS reported an error saying
> there were extra columns in that record.  When we first opened DTS it
> reported the file is in UNICODE.  How would I go about verifying that?
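
On verifying the UNICODE claim: one quick check is to look at the first
few bytes of the file for a byte-order mark (BOM).  What follows is only
a rough sketch in Python; the names sniff_bom and sniff_file are made up
for illustration, and a file with no BOM can still be Unicode, so a
result of None proves nothing either way.  As an aside, the UTF-16
little-endian BOM is FF FE, which also looks like an MPEG audio frame
sync, so that might be why "file" reports audio.

```python
import codecs

# Known BOMs, longest first, so that UTF-32 LE (FF FE 00 00) is not
# mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head):
    """Return the encoding implied by a leading BOM in `head`, or None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of `path` and check them for a BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))
```

Calling sniff_file("customer-file.txt") would then give either an
encoding name or None.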

Does this database contain any "BLOB" columns?  It would be very
unusual, but not something I would consider "impossible", that some
arbitrary binary data *might* appear to be an
end-of-record/field/whatever when in fact it is not, causing the import
routine to fall over.
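
To see whether line 15103 (or any other record) really carries extra
columns, you could count the fields on every line and list the ones that
differ from the usual count.  A minimal sketch in Python, assuming the
file is UTF-16 and tab-delimited; both are guesses on my part, and the
names odd_records and check_file are hypothetical.  It does not
understand quoted fields, so a delimiter inside quotes is counted as a
field break.

```python
from collections import Counter


def odd_records(text, delimiter="\t"):
    """Return (line_number, field_count) for each record whose field
    count differs from the most common one.  Line numbers start at 1."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start another record
    # Naive count: one more field than there are delimiters on the line.
    counts = [line.rstrip("\r").count(delimiter) + 1 for line in lines]
    if not counts:
        return []
    usual = Counter(counts).most_common(1)[0][0]
    return [(n, c) for n, c in enumerate(counts, start=1) if c != usual]


def check_file(path, encoding="utf-16", delimiter="\t"):
    """Decode `path` (the encoding is an assumption) and report the
    records with an unusual field count."""
    # newline="" keeps the original line endings untranslated.
    with open(path, encoding=encoding, newline="") as fh:
        return odd_records(fh.read(), delimiter)
```

If check_file("customer-file.txt") lists line 15103, the extra column is
in the data itself and not an artifact of the import.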

Outside of that, the only other thoughts I have would be transmission
corruption (yeah, should be pretty rare nowadays...) or that the
original file is hosed.  Can you get a checksum from the client and
compare?  (md5, perhaps...)  Of course, if this file is "cross-linked"
on the customer's system [the first part is the text to be imported, the
remainder might be the audio file that "file" reports it to be], then a
checksum wouldn't do much good (other than to rule out a problem during
transmission or storage on your side).
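
If md5sum is not handy on one side or the other, the checksum can be
computed with a few lines of Python.  This is just a sketch; md5_of is a
name I made up, and the chunk size is arbitrary.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at `path`, reading it in
    chunks so a large file does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Have the customer run the same thing on their copy and compare the two
hex strings; any difference means the copies are not identical.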

