Nagios checks


#1

I am confused about how to make the Nagios check work correctly. I don’t see any exit code set for each metric. Should I compare the output of the Nagios check myself to see whether we are within the threshold, or will the Nagios plugin return the right exit status?

Reading the code, I don’t see a proper exit status (such as STATE_CRITICAL) being returned anywhere except in:

r = citrusleaf.citrusleaf_info(arg_host, arg_port, arg_value)
if r == -1:
    print "request to ", arg_host, ":", arg_port, " returned error"
    # return STATE_CRITICAL
    sys.exit(2)

Does it cover all the metric checks?

so that I can call the NRPE check like this,

sudo -u nrpe /usr/lib64/nagios/plugins/aerospike/citrusleaf_stats.py -s 'client_connections'

and trust that citrusleaf_stats.py will return the right exit status, or do I need to do something more?


#2

I believe there is a newer version of the Nagios client located at http://www.aerospike.com/docs/operations/monitor/nagios/

With this tool the command should be:

sudo -u nrpe /usr/lib64/nagios/plugins/aerospike/aerospike_nagios.py -s 'client_connections' -w 14000 -c 14500
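For context, Nagios plugins conventionally report state through the process exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below shows how -w and -c thresholds could map to those codes. The helper name state_for and the use of >= are assumptions for illustration and are not taken from aerospike_nagios.py.

```python
# Conventional Nagios plugin exit codes.
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2
STATE_UNKNOWN = 3


def state_for(value, warn, crit):
    """Hypothetical helper: map a numeric stat to a Nagios state.

    Assumes higher values are worse and that a value equal to a
    threshold already triggers that state.
    """
    if value >= crit:
        return STATE_CRITICAL
    if value >= warn:
        return STATE_WARNING
    return STATE_OK


# A plugin would finish with something like:
#     sys.exit(state_for(client_connections, 14000, 14500))
```

NRPE passes the exit code back to Nagios, so no comparison of the printed output is needed on the Nagios side.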

The current Nagios plugin does not support detecting low values. If I were monitoring this particular stat in production, I would want to know both whether we are approaching proto-fd-max and whether we are approaching 0 connections, assuming we never expect the latter. This plugin can check the former but not the latter.
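If a low-value check is needed, one option is a small wrapper that applies a floor in addition to the upper thresholds. This is only a sketch of the idea, not a feature of the plugin: state_with_floor and its floor parameter are hypothetical, and the stat value would have to be obtained separately.

```python
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2


def state_with_floor(value, warn, crit, floor):
    """Hypothetical helper: alert on values that are too high or too low.

    CRITICAL if the value is at or below `floor` or at or above `crit`,
    WARNING if it is at or above `warn`, otherwise OK.
    """
    if value <= floor or value >= crit:
        return STATE_CRITICAL
    if value >= warn:
        return STATE_WARNING
    return STATE_OK
```

With floor=0, a node reporting 0 client connections would be CRITICAL, while the upper thresholds behave as before.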


#3

How can I check whether the cluster has stop-writes = false? (I think this is a very important check.)

The Nagios plugin mentioned above can’t check stop-writes because it expects an integer but gets 'false' from citrusleaf.citrusleaf_info(…)


#4

You will have to edit aerospike_nagios.py for this, in the following manner, around line 78…

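# stop-writes is not numeric, so keep the raw string if int() fails.
try:
    num_stat = int(num_stat)
except (TypeError, ValueError):
    pass
# Map the boolean stop-writes stat directly to a Nagios state.
if "stop-writes" in arg_stat:
    if num_stat == 'true':
        RETURN_VAL = STATE_CRITICAL
    elif num_stat == 'false':
        RETURN_VAL = STATE_OK
    else:
        RETURN_VAL = STATE_UNKNOWN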

In the meantime, I will check in this patch in…

You can then call this with -c 0 and -w 0…

 python aerospike_nagios.py -s 'stop-writes' -n test -c 0 -w 0
Aerospike Stats - stop-writes=true
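The mapping used in the patch above can also be written as a standalone function, which makes it easy to test outside the plugin. This is a sketch: stop_writes_state is a hypothetical name, and the whitespace and case normalization is an assumption added here, not part of the patch.

```python
STATE_OK = 0
STATE_CRITICAL = 2
STATE_UNKNOWN = 3


def stop_writes_state(raw):
    """Hypothetical helper: map a stop-writes value to a Nagios state."""
    # Normalizing whitespace and case is an assumption, not in the patch.
    value = str(raw).strip().lower()
    if value == "true":
        return STATE_CRITICAL
    if value == "false":
        return STATE_OK
    # Anything else (empty, missing, unexpected) is reported as UNKNOWN.
    return STATE_UNKNOWN
```

For the output shown above (stop-writes=true), this returns 2, which Nagios treats as CRITICAL.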