[SGVLUG] Robots.txt (was: Paging Greg Stark...)
David Lawyer
dave at lafn.org
Wed Mar 26 16:41:06 PST 2008
On Tue, Mar 25, 2008 at 07:34:07PM -0700, Emerson, Tom (*IC) wrote:
> -----Original Message----- Matt Campbell
>
> What's involved in writing a robot to strip out the headers for all the
> messages in our archive? That way it would be less invasive to have
> everything available through Google.
All messages are currently available via Google. When I'm doing research,
for example checking who else is working on "human energy accounting",
Google returns what I wrote about it in my off-topic post to sgvlug.
The same thing happened when I wanted to recall the name of the person
who set up an interferometer at "Ether Rocks" on Mt. Wilson in the late
1920s and reported positive results for detection of the ether wind.
WARNING, OFF-TOPIC SENTENCE: Just a couple of years ago a physics
professor in Australia reported anisotropic speeds of RF transmission in
coax cables depending on orientation with respect to the fixed stars,
varying sinusoidally with a period equal to the sidereal day, but a lot
of people don't believe it.
So this and every other email posted to sgvlug can be found on Google.
What URL do they come from? sgvlug.org/pipermail/sgvlug/...txt.gz.
Google decompresses these files and puts the full text of sgvlug's
messages into its database, including the email addresses of the
posters. Does robots.txt say not to retrieve them?
> It is not a robot on our side, but rather instructions to /Google's/
> robot (or Yahoo's, AltaVista's, or any of the gazillion search engines
> out there). Basically, it is a simple text file that lists the
> directories that are "off limits" to web spiders or "robots". It is
> placed in a known, common location, and all "robots" are /supposed/ to
> abide by it.
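To make the quoted explanation concrete, here is a minimal sketch, assuming a hypothetical robots.txt (not the actual file on sgvlug.org) that would keep compliant crawlers out of the /pipermail/ archive. Crawlers look for this file at the root of the site (e.g. /robots.txt). The check uses Python's standard urllib.robotparser module to show how a well-behaved robot would interpret the rules; the file name example.txt.gz is made up for illustration. As noted above, obeying the file is voluntary on the robot's part.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all compliant robots out of the list
# archive; this is an assumption, NOT the real sgvlug.org file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pipermail/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robot that obeys robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical archive file name, used only for illustration.
    archive = "http://sgvlug.org/pipermail/sgvlug/example.txt.gz"
    homepage = "http://sgvlug.org/index.html"
    print(is_allowed(ROBOTS_TXT, "Googlebot", archive))   # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", homepage))  # True
```

A "Disallow: /pipermail/" rule blocks everything under that directory by prefix match, while the rest of the site stays crawlable.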
PS: I also found my "Ether Rocks" sgvlug post in a Yahoo search, so I
think it likely that most major search engines have indexed our posts.
[snip]
David Lawyer