[SGVLUG] process monitor and control

Rae Yip rae.yip at gmail.com
Tue Jun 4 11:57:43 PDT 2013


Monit is okay, but it can be a bit tricky to tune properly. It can get
stuck in restart loops if your processes have a start-up delay or
complicated initialization state, which can also result in a lot of
notification spam. That said, there isn't really any other solution
in exactly that niche.

That's why I tend to keep monitoring and notification separate from
the auto-restart/process watchdog. The latter is relatively simple to
do with a wrapper script or a cron job, and the poll interval is a lot
easier to customize that way. Then you can set the monitoring and
notification to check and warn on longer time intervals, which cuts
down on alert spam.
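
As a rough sketch of the wrapper-script idea (Python here, but any
language would do): start the process, poll it on a tunable interval,
restart it if it dies, and give up after too many quick failures so you
don't end up in a restart loop. The function name, its parameters and
the "exit code 0 means intentional stop" rule are my own assumptions
for illustration, not taken from any particular tool.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, poll_interval=1.0, min_uptime=5.0):
    """Run cmd and restart it when it dies.

    A run shorter than min_uptime seconds counts as a quick failure;
    once consecutive quick failures exceed max_restarts we give up.
    Returns the number of times cmd was started.
    """
    starts = 0
    failures = 0
    while True:
        started = time.monotonic()
        proc = subprocess.Popen(cmd)
        starts += 1
        # Poll instead of blocking so the check interval is tunable.
        while proc.poll() is None:
            time.sleep(poll_interval)
        if proc.returncode == 0:
            return starts  # assumption: clean exit is an intentional stop
        uptime = time.monotonic() - started
        # A long-lived run resets the counter; a quick death increments it.
        failures = failures + 1 if uptime < min_uptime else 0
        if failures > max_restarts:
            return starts  # stop looping; alerting is a separate job
        time.sleep(poll_interval)  # short pause before the restart
```

You would call it with something like supervise(["./start-myapp.sh"],
max_restarts=5, poll_interval=10), where start-myapp.sh is a made-up
name. Raising min_uptime is the knob for processes with a slow
start-up, and notification stays out of the loop entirely so it can
run on its own, longer schedule.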

There are also more heavy-duty "daemon supervisor" systems, some of
which may come by default with your distro:

http://tech.cueup.com/blog/2013/03/08/running-daemons/

There isn't a clear winner yet, and some solutions seem tightly bound
to what language you're using (which IMHO seems wrong).

-Rae.

On 6/3/13, Michael Proctor-Smith <mproctor13 at gmail.com> wrote:
> I use monit, it does what you are asking for in that it supports lots of
> types of monitoring and has configurable monitor interval and you can solve
> your restart problem by calling local command or making a http call.
>
>
> On Mon, Jun 3, 2013 at 1:03 PM, Claude Felizardo
> <cafelizardo at gmail.com> wrote:
>
>> Hey all,
>>
>> I'm looking for a package that will not only monitor processes but also
>> restart them if needed, preferably with configurable check intervals and
>> retry limits.
>>
>> Most of the existing monitoring here has been using Big Brother and they
>> are starting to migrate things to Nagios but I'm not sure if it has a
>> restart service capability.  For the stuff I work on, some processes and
>> log monitoring have recently been added to BB but most are not being
>> monitored.  When I do get a BB page, it's usually an obvious problem like
>> a process has died or a log hasn't been updated in a while but quite often
>> the process is still running, the log still being updated but only upon
>> close examination can you determine that there is a problem.  Sometimes a
>> restart might be overkill, just need to send the appropriate message into
>> the system.  Some restarts need to be coordinated and it's annoying to
>> get an alarm while you are restarting part of the system.
>>
>> Now about half of the programs are C++ but a lot of the newer ones are
>> Java started via a shell script.  I also need to monitor a bunch of
>> ActiveMQ servers, some of which are controlled by another group but I do
>> need to know when they are offline so I can make sure my stuff is okay or
>> restart some of my processes when the remote servers are back.
>>
>> Someone had suggested Monit which from the descriptions sounds like it
>> might do the trick.
>>
>> Has anyone used either Nagios or Monit or can recommend something that
>> does restart?  Needs to run on both Solaris and Linux.
>>
>> thanks,
>> Claude
>>
>>
>


