[SGVLUG] Robots.txt (was: Paging Greg Stark...)

Emerson, Tom (*IC) Tom.Emerson at wbconsultant.com
Tue Mar 25 18:34:07 PST 2008


-----Original Message-----
From: Matt Campbell

What's involved in writing a robot to strip out the headers for all the
messages in our archive?  That way it would be less invasive to have
everything available through Google.

It is not a robot on our side, but rather instructions to /Google's/
robot (or Yahoo's, AltaVista's, or any of the gazillion search engines
out there).  Basically, it is a simple text file (robots.txt) that lists
the directories that are "off limits" to web-spiders or "robots".  It is
placed in a known/common location (the top level of the web site), and
all "robots" are /supposed/ to abide by it.  Note that it only asks them
to stay out; it does not strip or hide anything in the pages themselves.
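
For the curious, here is a rough sketch of what such a file looks like
and how a well-behaved robot would interpret it, using Python's standard
urllib.robotparser module.  The rule below, the example.org host, and the
is_allowed() helper are made up for illustration; this is not our site's
actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask every robot ("*") to stay out
# of the /pipermail/ directory. Illustration only, not our real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pipermail/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a compliant robot named `agent` may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.org is a placeholder host.
    for url in ("http://www.example.org/pipermail/2008-March/",
                "http://www.example.org/index.html"):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

The check happens entirely on the robot's side, which is why the file is
a request rather than an enforcement mechanism.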
 
As far as "should our e-mail archive be indexed by the big guys?", I
know there are camps on both sides of this issue, and I'm generally on
the "pro" indexing side of the fence for a simple reason (or two) -- if
someone solves a particularly involved Linux problem on the list, the
next person with that same or similar problem WON'T find the answer if
they aren't a member of our group/list (and even if they are, they have
to THINK about searching our archives in the first place).
 
(The secondary reason is that it increases exposure of our group in
particular -- take your case as a prime example: if you found a suitable
solution to your hard drive problems solely by searching "the net" and
finding our archive AND seeing that we were "local", chances are you
would consider stopping in for a meeting or two, right?)
 
On the "anti" side are folks worried about how they may appear to the
rest of the world should one of their sgvlug posts appear in wider
circulation than just this list (ummm... "shouldn't have posted it in
the first place" is usually the counter argument, but even really good
things can be taken "out of context" and seem rather disparaging...)
 
Then there are a few that actively protect their anonymity while
online, and a global (or even local) index kind of defeats that purpose
(for that, there is the "X-No-Archive" header you can set in your
e-mail client -- instructions for such are on our site -- but that
doesn't stop manual archiving by packrats like me ;) )
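
As a rough sketch of what that header amounts to, here is a message built
with Python's standard email library.  The addresses, the build_post()
helper, and the "yes" value are assumptions for illustration; whether an
archiver honors the header depends on how it is configured, so follow the
instructions on our site for your actual mail client.

```python
from email.message import EmailMessage


def build_post(sender: str, list_address: str, subject: str, body: str) -> EmailMessage:
    """Build a list post that asks archivers not to keep a copy."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = subject
    # The opt-out header; "yes" is the conventional value (assumed here).
    msg["X-No-Archive"] = "yes"
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    # Placeholder addresses, not real ones.
    post = build_post("someone@example.org", "list@example.org",
                      "Test post", "Please do not archive this.\n")
    print(post.as_string())
```

The header is only a request carried along with the message, so anyone
who receives the post can still keep their own copy.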
 
 