<div dir="ltr">Nagios is fine if you don't mind being stuck in the Perl world. <div><br></div><div>I liked Zenoss because it was Python-based, but it does require a different mindset.</div><div><br></div><div>Unfortunately, neither is ideal if you're dealing with many thousands of servers. That requires a very different sort of monitoring and notification system if you want to have a decent night's sleep.<br>
<div><br></div><div style>-Rae.</div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jun 7, 2013 at 11:04 AM, Christopher Hicks <span dir="ltr"><<a href="mailto:chicks.net@gmail.com" target="_blank">chicks.net@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">You should probably be running Nagios anyway, or one of its derivatives like Icinga. That gives you a nice framework for keeping your checks and restart scripts.</div>
<div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">
On Fri, Jun 7, 2013 at 10:55 AM, Claude Felizardo <span dir="ltr"><<a href="mailto:cafelizardo@gmail.com" target="_blank">cafelizardo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div>Rae, I started looking at your link but didn't see anything about Nagios or Monit. </div><div> </div></div><div>Looks like that's two votes for Monit, so long as I watch out for the restart timing.</div>
<div>
<br></div><div>Any comments about Nagios?</div><span><font color="#888888"><div><br></div><div>Claude</div></font></span><div><div><div><br><div><br><div class="gmail_quote">On Tue, Jun 4, 2013 at 11:57 AM, Rae Yip <span dir="ltr"><<a href="mailto:rae.yip@gmail.com" target="_blank">rae.yip@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Monit is okay, but can be a bit tricky to tune properly. It can get<br>
into some restart loops if your processes have some start-up delay or<br>
complicated initialization state. This can also result in a lot of<br>
notification spam. That said, there's not really any other solution<br>
exactly in that niche.<br>
<br>
That's why I tend to factor monitoring and notification as separate<br>
functions from auto-restart/process watchdog. The latter is relatively<br>
simple to do with a wrapper script or a cron job, and a lot more<br>
customizable for poll intervals. Then you can set the monitoring and<br>
notification to check and warn on longer time intervals, reducing<br>
alert spam.<br>
<br>
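A minimal sketch of the wrapper-script approach described above, assuming<br>
the watched process is started from a single command line. The name<br>
supervise and the parameters max_restarts and poll_interval are<br>
hypothetical and chosen for illustration; they do not come from Monit,<br>
Nagios, or any other tool mentioned in this thread.<br>
<br>
<pre>
```python
import subprocess
import time


def supervise(command, max_restarts=3, poll_interval=5.0):
    """Run command and restart it after a non-zero exit.

    Gives up once max_restarts restarts have been used.
    Returns the number of restarts performed before a clean exit.
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(command)
        # Check on the child at a fixed, configurable interval.
        while proc.poll() is None:
            time.sleep(poll_interval)
        if proc.returncode == 0:
            return restarts  # clean exit, nothing to restart
        if restarts >= max_restarts:
            # Retry limit reached: stop instead of looping forever.
            raise RuntimeError(
                f"{command!r} exited with {proc.returncode} "
                f"after {restarts} restarts"
            )
        restarts += 1
        # Pause before restarting so slow start-up does not cause a tight loop.
        time.sleep(poll_interval)
```
</pre>
With the restart logic kept in a wrapper like this, the monitoring and<br>
notification side only needs to alert when the wrapper itself gives up.<br>
<br>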
There are also more heavy-duty "daemon supervisor" systems, some of<br>
which may come by default with your distro:<br>
<br>
<a href="http://tech.cueup.com/blog/2013/03/08/running-daemons/" target="_blank">http://tech.cueup.com/blog/2013/03/08/running-daemons/</a><br>
<br>
There isn't a clear winner yet, and some solutions seem tightly bound<br>
to what language you're using (which IMHO seems wrong).<br>
<span><font color="#888888"><br>
-Rae.<br>
</font></span><div><div><br>
On 6/3/13, Michael Proctor-Smith <<a href="mailto:mproctor13@gmail.com" target="_blank">mproctor13@gmail.com</a>> wrote:<br>
> I use Monit. It does what you are asking for: it supports lots of types of<br>
> monitoring and has a configurable monitoring interval, and you can solve your<br>
> restart problem by calling a local command or making an HTTP call.<br>
><br>
><br>
> On Mon, Jun 3, 2013 at 1:03 PM, Claude Felizardo<br>
> <<a href="mailto:cafelizardo@gmail.com" target="_blank">cafelizardo@gmail.com</a>> wrote:<br>
><br>
>> Hey all,<br>
>><br>
>> I'm looking for a package that will not only monitor processes but also<br>
>> restart them if needed, preferably with configurable check intervals and<br>
>> retry limits.<br>
>><br>
>> Most of the existing monitoring here has been using Big Brother and they<br>
>> are starting to migrate things to Nagios but I'm not sure if it has a<br>
>> restart service capability. For the stuff I work on, some processes and<br>
>> log monitoring have recently been added to BB but most are not being<br>
>> monitored. When I do get a BB page, it's usually an obvious problem like a<br>
>> process has died or a log hasn't been updated in a while, but quite often<br>
>> the process is still running and the log is still being updated, and only<br>
>> upon close examination can you determine that there is a problem. Sometimes<br>
>> a restart might be overkill and I just need to send the appropriate message<br>
>> into the system. Some restarts need to be coordinated, and it's annoying to<br>
>> get an alarm while you are restarting part of the system.<br>
>><br>
>> Now about half of the programs are C++ but a lot of the newer ones are<br>
>> Java started via a shell script. I also need to monitor a bunch of<br>
>> ActiveMQ servers, some of which are controlled by another group but I do<br>
>> need to know when they are offline so I can make sure my stuff is okay or<br>
>> restart some of my processes when the remote servers are back.<br>
>><br>
>> Someone had suggested Monit which from the descriptions sounds like it<br>
>> might do the trick.<br>
>><br>
>> Has anyone used either Nagios or Monit or can recommend something that<br>
>> does restart? Needs to run on both Solaris and Linux.<br>
>><br>
>> thanks,<br>
>> Claude<br>
>><br>
>><br>
><br>
<br>
</div></div></blockquote></div><br></div></div>
</div></div></blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br>Christopher Hicks<br><a href="http://www.chicks.net/" target="_blank">http://www.chicks.net/</a>
</font></span></div>
</blockquote></div><br></div>