[SGVLUG] process monitor and control

Rae Yip rae.yip at gmail.com
Sat Jun 8 01:20:22 PDT 2013


Nagios is fine if you don't mind being stuck in Perl world.

I liked Zenoss because it was Python based, but it does require a different
mindset.

Unfortunately neither is ideal if you're dealing with many thousands of
servers. That requires a much different sort of monitoring and notification
system if you want to have a decent night's sleep.

-Rae.


On Fri, Jun 7, 2013 at 11:04 AM, Christopher Hicks <chicks.net at gmail.com> wrote:

> You should probably be running Nagios anyway, or one of its derivatives
> like Icinga.  That gives you a nice framework for keeping your checks and
> restart scripts in one place.
>
>
> On Fri, Jun 7, 2013 at 10:55 AM, Claude Felizardo <cafelizardo at gmail.com> wrote:
>
>> Rae, I started looking at your link but didn't see anything about nagios
>> or monit.
>>
>> Looks like that's two votes for Monit so long as I watch out for the
>> restart timing.
>>
>> Any comments about nagios?
>>
>> Claude
>>
>>
>> On Tue, Jun 4, 2013 at 11:57 AM, Rae Yip <rae.yip at gmail.com> wrote:
>>
>>> Monit is okay, but can be a bit tricky to tune properly. It can get
>>> into some restart loops if your processes have some start-up delay or
>>> complicated initialization state. This can also result in a lot of
>>> notification spam. That said, there's not really any other solution
>>> exactly in that niche.
>>>
>>> That's why I tend to factor monitoring and notification as separate
>>> functions from auto-restart/process watchdog. The latter is relatively
>>> simple to do with a wrapper script or a cron job, and a lot more
>>> customizable for poll intervals. Then you can set the monitoring and
>>> notification to check and warn on longer time intervals, reducing
>>> alert spam.
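
[Inline note] The wrapper-script watchdog described above could look roughly
like the sketch below. It is only an illustration under stated assumptions:
the function name supervise and its parameters (max_restarts, min_uptime,
backoff) are made up for this example and are not taken from Monit, Nagios,
or any other tool in this thread. It restarts a command when it exits with a
non-zero status, and gives up after too many quick failures in a row, which
is one way to avoid the restart loops mentioned earlier.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, min_uptime=5.0, backoff=1.0,
              run=subprocess.call, sleep=time.sleep, clock=time.monotonic):
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Gives up after more than max_restarts consecutive runs that die in
    under min_uptime seconds. Returns the last exit code.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    quick_failures = 0
    while True:
        started = clock()
        code = run(cmd)  # blocks until the child process exits
        if code == 0:
            return code  # clean exit: nothing to restart
        if clock() - started >= min_uptime:
            quick_failures = 0  # it ran long enough; reset the counter
        else:
            quick_failures += 1
        if quick_failures > max_restarts:
            return code  # stop looping; let monitoring page a human
        # Wait longer after each consecutive quick failure.
        sleep(backoff * (2 ** max(quick_failures - 1, 0)))
```

Calling supervise(["/path/to/start.sh"]) (a placeholder path) would keep that
script running. The run, sleep, and clock parameters exist only so the loop
can be exercised without starting real processes. Alerting stays separate:
the monitoring system only needs to notice when the wrapper itself has given
up.
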
>>>
>>> There are also more heavy duty "daemon supervisor" systems, some of
>>> which may come default with your distro:
>>>
>>> http://tech.cueup.com/blog/2013/03/08/running-daemons/
>>>
>>> There isn't a clear winner yet, and some solutions seem tightly bound
>>> to what language you're using (which IMHO seems wrong).
>>>
>>> -Rae.
>>>
>>> On 6/3/13, Michael Proctor-Smith <mproctor13 at gmail.com> wrote:
>>> > I use Monit. It does what you are asking for: it supports lots of
>>> > types of monitoring and has a configurable monitor interval, and you
>>> > can solve your restart problem by calling a local command or making
>>> > an HTTP call.
>>> >
>>> >
>>> > On Mon, Jun 3, 2013 at 1:03 PM, Claude Felizardo
>>> > <cafelizardo at gmail.com> wrote:
>>> >
>>> >> Hey all,
>>> >>
>>> >> I'm looking for a package that will not only monitor processes but
>>> >> also restart them if needed, preferably with configurable check
>>> >> intervals and retry limits.
>>> >>
>>> >> Most of the existing monitoring here has been using Big Brother, and
>>> >> they are starting to migrate things to Nagios, but I'm not sure if it
>>> >> has a restart-service capability.  For the stuff I work on, some
>>> >> processes and log monitoring have recently been added to BB, but most
>>> >> are not being monitored.  When I do get a BB page, it's usually an
>>> >> obvious problem, like a process has died or a log hasn't been updated
>>> >> in a while.  Quite often, though, the process is still running and
>>> >> the log is still being updated, and only upon close examination can
>>> >> you determine that there is a problem.  Sometimes a restart might be
>>> >> overkill; I just need to send the appropriate message into the
>>> >> system.  Some restarts need to be coordinated, and it's annoying to
>>> >> get an alarm while you are restarting part of the system.
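
[Inline note] The "log hasn't been updated in a while" check mentioned above
can be sketched in a few lines of Python. This is a minimal example, not how
Big Brother or Nagios implement it; the function name log_is_stale and the
max_age_seconds threshold are assumptions made for illustration. It covers
only the obvious case: a process that keeps writing to its log while
misbehaving needs a check on the log contents instead.

```python
import os
import time


def log_is_stale(path, max_age_seconds, now=None):
    """Return True if path is missing or was last modified too long ago.

    Hypothetical helper for illustration; now defaults to the current time.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)  # last modification time, in seconds
    except OSError:
        return True  # a missing or unreadable log counts as stale
    return (now - mtime) > max_age_seconds
```

A cron job could call log_is_stale on each log of interest and raise an alert
(or trigger a restart script) when it returns True.
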
>>> >>
>>> >> Now about half of the programs are C++, but a lot of the newer ones
>>> >> are Java started via a shell script.  I also need to monitor a bunch
>>> >> of ActiveMQ servers, some of which are controlled by another group,
>>> >> but I do need to know when they are offline so I can make sure my
>>> >> stuff is okay or restart some of my processes when the remote servers
>>> >> are back.
>>> >>
>>> >> Someone had suggested Monit which from the descriptions sounds like it
>>> >> might do the trick.
>>> >>
>>> >> Has anyone used either Nagios or Monit or can recommend something that
>>> >> does restart?  Needs to run on both Solaris and Linux.
>>> >>
>>> >> thanks,
>>> >> Claude
>>> >>
>>> >>
>>> >
>>>
>>>
>>
>
>
> --
> Christopher Hicks
> http://www.chicks.net/
>