[SGVLUG] Polling Web Sites

Christopher Smith cbsmith at gmail.com
Wed Nov 14 17:48:02 PST 2007


I would second the motion for using curl/libcurl to do the job. It
seems like just the right tool. I'd also add that if the site is set
up correctly, you can use curl's "-z" flag to send an If-Modified-Since
HTTP GET request and cheaply determine whether there is new data to
look at.
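
As a rough sketch of that idea (the function name, URL, and file name
below are placeholders, and it assumes the server sends Last-Modified
and honors If-Modified-Since, which dynamically generated search pages
often do not):

```shell
#!/bin/sh
# check_page URL FILE  (hypothetical helper, not from the thread)
# Prints "changed" if a fresh copy was downloaded into FILE, "unchanged"
# if the server answered 304 Not Modified, or an error line otherwise.
check_page() {
    url=$1
    file=$2
    if [ -f "$file" ]; then
        # -z FILE sends FILE's modification time as If-Modified-Since.
        # -R stamps the download with the server's Last-Modified time,
        # so the next comparison uses the server's clock, not ours.
        status=$(curl -s -R -z "$file" -o "$file" -w '%{http_code}' "$url")
    else
        # First run: nothing to compare against, so do a plain GET.
        status=$(curl -s -R -o "$file" -w '%{http_code}' "$url")
    fi
    case $status in
        200) echo "changed" ;;
        304) echo "unchanged" ;;
        *)   echo "error: HTTP status $status" ;;
    esac
}

# Example polling loop (placeholder URL, left commented out):
# while true; do
#     check_page "http://foo.com/search?bar=san_gabriel_valley" page.html
#     sleep 120
# done
```

If the server ignores the conditional header, every poll simply comes
back as a full download, so nothing breaks; you just lose the savings.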

--Chris

On Nov 11, 2007 1:32 AM, Matthew Gallizzi <matthew.gallizzi at gmail.com> wrote:
> I would use curl. If there is a login/password, you can then use curl to
> POST to a URL, create and save a cookie, then grab the page you want to look
> at. If I recall correctly, the man page documents this fairly well.
>
>  Good luck!
>
>
>
> On 11/10/07, bb.odenthal at gmail.com <bb.odenthal at gmail.com> wrote:
> > John,
> >
> > I may be oversimplifying this, but a web "search" is usually just a POST
> > or GET method action on an HTML form.  If you can take a packet trace of
> > the transaction (assuming it's not SSL), then it's easy to discover the
> > URL format and method for the search.  A simple "lynx -dump" of that URL,
> > run under "watch" every 120 seconds, could be helpful (assuming that a
> > text-only version of the web page would be of any use to you):
> >
> >     watch -n 120 "lynx -dump 'http://foo.com/search?bar=san_gabriel_valley'"
> >
> > If the site requires more interaction than that (login, password, click on
> > a few links, fill out a form) or requires cookies, then I suggest using a
> > Perl script, perhaps with WWW::Mechanize for some simple HTML form
> > automation.
> >
> > **I'm putting on my Nomex jacket**
> >
> > Or... just spend $30 on http://www.newdigitalsoft.com/airobot/ or similar
> > and use a Windows box?  It IS an option.
> >
> > -bb
> > -----Original Message-----
> > From: juanslayton at dslextreme.com
> >
> > Date: Sat, 10 Nov 2007 21:37:04
> > To:sgvlug at sgvlug.net
> > Subject: [SGVLUG] Polling Web Sites
> >
> >
> >
> >      Got a little project here that I could use some help on.  El Monte
> > City School District uses a program called Aesop to post daily
> > openings for substitute teachers.  All I have to do is go to their
> > web site and click on the search button and I can see who has
> > currently called in to be absent.  Trouble is, if someone calls in
> > sick just after I've checked, I won't find out about it until the
> > next time I check.  And I have better things to do than sit and click
> > on the search button all evening.
> >      So I began to figure out ways to poll that site automatically.  The
> > current approach works like this:  A timing program (written in C)
> > runs in the background on a virtual terminal and produces a negative
> > pulse on data line 1 of the parallel port every few minutes.  I 'hot
> > wired' the left click switch (high, pull-down side) of a USB mouse to
> > that data line (through a diode to protect the port in case someone
> > physically clicks the mouse).  By leaving the cursor on the search
> > button, the background program electronically clicks that button
> > every few minutes.  All I have to do as I go about my business is
> > glance at the screen every now and then to see if anything new has
> > come up.
> >      But this is over-complicated.  There ought to be a simple way to poll
> > that page programmatically without messing with the hardware.  Say, by
> > using the USB event mechanisms?  Like as not somebody somewhere has
> > already written code to do it.  I'd appreciate anyone who could point
> > me in the right direction.
> >
> > John
> >
> >
> > ***************************************************************************************
> > If the mind is not constrained by walls and fences, where is the need for
> > Windows and Gates?
> >
>
>
>
>



-- 
Chris

