Analysing Logs with the Aerospike Admin Tool


#1

Install

To install Aerospike Admin either install the Aerospike-Tools package which is bundled in our Server Packages or clone and install from the Github Repository.

Starting asadm with log analyzer mode.

  1. Run asadm -l to get into log analyzer mode.

Description.

This document is for analysing server logs for asadm 0.1.7 (pre-3.12.0 tools) or earlier. In general, you should use the latest asadm which should work on all versions of server logs.

The syntax has been changed. For latest documentation, refer to our official web page: http://www.aerospike.com/docs/tools/asadm/user_guide/log_analyser_mode_guide

1. Adding server logs from a directory.

Admin> add server node /media/5255/
2 server logs added for server analysis.
Admin> list

*************************** CLUSTER ***************************
*************************** SERVER ***************************
1  : node0	/media/5255/tmp/as_log_1460944686.36/aerospike.log
2  : node1	/media/5255/tmp2/as_log_1460945660.84/aerospike.log

Notice how logs from within 5255/tmp/* are added. Doesn’t have to be complete path. Tool is smart enough to pick only the aerospike logs.

You can add logs from 2 different directories in different paths.

Admin> add server day1 '/media/5255/' log '/media/5285/'
2 server logs added for server analysis.
5 server logs added for server analysis.
Admin> list
*************************** CLUSTER ***************************
*************************** SERVER ***************************
1  : bb930a336c90200	/media/5285/bdclogs-03122016/aerospike.log.s01235bdc
2  : bb96152cb05be24	/media/5285/bdclogs-03122016/aerospike.log.s01238bdc
3  : bb99152cb05be24	/media/5285/bdclogs-03122016/aerospike.log.s01236bdc
4  : bb9b152cb05be24	/media/5285/bdclogs-03122016/aerospike.log.s01237bdc
5  : bb9e152cb05be24	/media/5285/bdclogs-03122016/aerospike.log.s01239bdc
6  : day10	/media/5255/tmp/as_log_1460944686.36/1aerospike.log
7  : day11	/media/5255/tmp2/as_log_1460945660.84/aerospike2.log

For log file of server (version >=3.7.1), asadm fetches node id from log and set it as a display name.

2. Listing added server logs.

Admin> list

*************************** CLUSTER ***************************
*************************** SERVER ***************************
1  : node0	/media/5255/tmp/as_log_1460944686.36/aerospike.log
2  : node1	/media/5255/tmp2/as_log_1460945660.84/aerospike.log

The first column is the log file number.

3. Removing the server logs.

Admin> remove server -a '1,2'
Admin> list

*************************** CLUSTER ***************************
*************************** SERVER ***************************

Removes all server logs from the list.

4. Selecting specific log files to analyse.

Admin> select server 1
Admin> list selected

*************************** CLUSTER ***************************
*************************** SERVER ***************************
1  : node0	/media/5255/tmp/as_log_1460944686.36/1aerospike.log

You can select individual server logs to work upon once added. By default operations will be on all server logs unless selected.

5. Grep for strings.

Admin> loggrep server show -i -s 'warning'

************************** cluster **************************
[node0]->::Apr 23 2016 21:38:16 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 12.34.56.78:3002: timed out
[node0]->::Apr 23 2016 21:38:16 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 12.34.56.78:3002: timed out
[node0]->::Apr 23 2016 21:38:27 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 12.34.56.78:3002: timed out

Notice how grep is only working on node0 and not on node1. This is because node1 is not selected.

Once node1 is also selected.

Admin> loggrep server show -i -s 'warning'

************************** cluster **************************
[node0]->::Apr 23 2016 21:41:58 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 12.34.56.78:3002: timed out
[node1]->::Apr 23 2016 21:41:58 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 12.34.56.78:3002: timed out
[node0]->::Apr 23 2016 21:42:08 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 12.34.56.78:3002: timed out
[node1]->::Apr 23 2016 21:42:08 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 12.34.56.78:3002: timed out
  • Example-1: Grep for multiple strings. (OR operation):

Admin> loggrep server show -i -s 'proto' 'clustersize'

************************** cluster **************************
[node0]->::Apr 24 2016 03:12:11 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 26 ::: objects 45473451 ::: sub_objects 0
[node0]->::Apr 24 2016 03:12:11 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (189, 5995, 5806) : hb (50, 50, 0) : fab (354, 354, 0)
[node1]->::Apr 24 2016 03:12:12 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 1 , 0 ) ::: ClusterSize 26 ::: objects 44224075 ::: sub_objects 0
[node1]->::Apr 24 2016 03:12:12 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (193, 6035, 5842) : hb (50, 50, 0) : fab (354, 354, 0)

This will grep for either of the strings proto OR ClusterSize. You can give multiple arguments like this. The number of strings to grep is unlimited.

  • Example-2: Grep for multiple strings. (AND operation using -a)
Admin>loggrep server show -i -a -s 'trans' 'proto'

************************** cluster **************************
[node1]->::Apr 24 2016 02:59:42 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (188, 5186, 4998) : hb (50, 50, 0) : fab (354, 354, 0)
[node0]->::Apr 24 2016 02:59:51 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (184, 5174, 4990) : hb (50, 50, 0) : fab (354, 354, 0)
[node1]->::Apr 24 2016 02:59:52 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (189, 5199, 5010) : hb (50, 50, 0) : fab (354, 354, 0)
[node0]->::Apr 24 2016 03:00:01 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (183, 5192, 5009) : hb (50, 50, 0) : fab (354, 354, 0)

You can use multiple strings here too.

  • Example-3: Grep for multiple Unique strings. (using -u)
Admin> loggrep server show -i -u -s 'warning' 'stacktrace'

************************** cluster **************************
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: found 7 frames
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_segv+0x59) [0x47edd8]
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: frame 1: /lib64/libc.so.6() [0x36344326a0]
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: frame 2: /lib64/libc.so.6(memcpy+0x15b) [0x36344897cb]
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: frame 3: /usr/bin/asd(cf_queue_pop+0x93) [0x538379]
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: frame 4: /usr/bin/asd(thr_tsvc_slow+0xca) [0x4cdb18]
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: frame 5: /lib64/libpthread.so.0() [0x36348079d1]
[node1]->::Apr 24 2016 00:15:03 GMT: WARNING (as): (signal.c::162) stacktrace: frame 6: /lib64/libc.so.6(clone+0x6d) [0x36344e88fd]
  • Example-4: Counting the number of occurrences for the grepped string.
Admin> loggrep server count -i -s 'nsup start'

************************** cluster **************************
node0 : Count of ['nsup start'] is : 233
node1 : Count of ['nsup start'] is : 241
  • Example-5: Grep strings from a specific time-window.
Admin> loggrep server show -i -s 'migrates' -f 'Apr 17 2016 18:11:10' -d 360

************************** cluster **************************
[node0]->::Apr 17 2016 18:11:10 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 4 ::: objects 43549392 ::: sub_objects 0
[node1]->::Apr 17 2016 18:11:10 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 4 ::: objects 43549392 ::: sub_objects 0
[node0]->::Apr 17 2016 18:11:20 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 4 ::: objects 43534319 ::: sub_objects 0
...
...
...
[node0]->::Apr 17 2016 18:17:10 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 4 ::: objects 43395723 ::: sub_objects 0
[node1]->::Apr 17 2016 18:17:10 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 4 ::: objects 43395723 ::: sub_objects 0

-f & -d can take multiple formats. Use help to know more about it.

This can be used with multiple strings too.

Admin> loggrep server show -i -s 'proto' 'clustersize' -f '-01:00:00' -d 3600

************************** cluster **************************
[node0]->::Apr 24 2016 03:18:11 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 26 ::: objects 45443849 ::: sub_objects 0
[node0]->::Apr 24 2016 03:18:11 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (302, 7132, 6830) : hb (50, 50, 0) : fab (354, 354, 0)
[node1]->::Apr 24 2016 03:18:12 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 26 ::: objects 44195907 ::: sub_objects 0
[node1]->::Apr 24 2016 03:18:12 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (339, 7959, 7620) : hb (50, 50, 0) : fab (354, 354, 0)

6. Getting diff of consecutive occurrences for a value.

  • Example-1: Rate of opening and closing connections.
Admin> loggrep server diff -i -s 'fds - proto' -t 10

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0                           .                         |   node1                           .                               
.                   :   Total                        Diff                         |   Total                        Diff                         |
Apr 17 2016 18:11:10:   [2005, 18505616, 18503611]   [2005, 18505616, 18503611]   |   [2005, 18505616, 18503611]   [2005, 18505616, 18503611]   |
Apr 17 2016 18:11:20:   [2005, 18505616, 18503611]   [0, 0, 0]                    |   [2005, 18505616, 18503611]   [0, 0, 0]                    |
Apr 17 2016 18:11:30:   [2005, 18505616, 18503611]   [0, 0, 0]                    |   [2005, 18505616, 18503611]   [0, 0, 0]                    |
Apr 17 2016 18:12:20:   [2004, 18505616, 18503612]   [-1, 0, 1]                   |   [2004, 18505616, 18503612]   [-1, 0, 1]                   |
  • Example-2: For Objects in the cluster.
Admin> loggrep server diff -i -s 'objects' -t 10

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0        .            |   node1        .            |
.                   :   Total        Diff         |   Total        Diff         |
Apr 17 2016 18:11:10:   [43549392]   [43549392]   |   [43549392]   [43549392]   |
Apr 17 2016 18:11:20:   [43534319]   [-15073]     |   [43534319]   [-15073]     |
Apr 17 2016 18:11:30:   [43533736]   [-583]       |   [43533736]   [-583]       |
...
...

By default slice interval is 60 secs. If not specified, the output will look like below.

Admin> loggrep server diff -i -s 'objects'

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0         .             |   node1         .              |
.                   :   Total         Diff          |   Total         Diff           |
Apr 17 2016 18:11:10:   [261180762]   [261180762]   |   [261180762]   [261180762]    |
Apr 17 2016 18:12:10:   [261027933]   [-152829]     |   [261027933]   [-152829]      |
Apr 17 2016 18:13:10:   [260877103]   [-150830]     |   [260877103]   [-150830]      |
Apr 17 2016 18:14:10:   [260723664]   [-153439]     |   [260723664]   [-153439]      |

This might seem odd because it is adding object count for 60 seconds in this o/p. Please use -t 10 in case you need absolute values. This can further be optimised to view every nth output by using -k option.

Admin> loggrep server diff -i -s 'objects' -t 10 -k 6

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0        .            |   node1        .            |
.                   :   Total        Diff         |   Total        Diff         |
Apr 17 2016 18:11:10:   [43549392]   [43549392]   |   [43549392]   [43549392]   |
Apr 17 2016 18:12:10:   [43522897]   [1834]       |   [43522897]   [1834]       |
Apr 17 2016 18:13:10:   [43498933]   [1853]       |   [43498933]   [1853]       |
Apr 17 2016 18:14:10:   [43471733]   [1857]       |   [43471733]   [1857]       |

This is will give the output from every (tk) interval. In this case 106=60 secs slice.

  • Example:3 Getting diff from a specific time-window.
Admin> loggrep server diff -i -s 'fds - proto' -t 10 -k 6 -f 'Apr 17 2016 18:11:10' -d 360

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0                        .                            |   node1                        .                            | 
                    :   Total                        Diff                         |   Total                        Diff                         |
Apr 17 2016 18:11:10:   [2005, 18505616, 18503611]   [2005, 18505616, 18503611]   |   [2005, 18505616, 18503611]   [2005, 18505616, 18503611]   |
Apr 17 2016 18:12:10:   [2005, 18505616, 18503611]   [0, 0, 0]                    |   [2005, 18505616, 18503611]   [0, 0, 0]                    |
Apr 17 2016 18:13:10:   [2004, 18505617, 18503613]   [0, 0, 0]                    |   [2004, 18505617, 18503613]   [0, 0, 0]                    |
Apr 17 2016 18:14:10:   [2004, 18505621, 18503617]   [0, 2, 2]                    |   [2004, 18505621, 18503617]   [0, 2, 2]                    |
Apr 17 2016 18:15:10:   [2004, 18505622, 18503618]   [0, 1, 1]                    |   [2004, 18505622, 18503618]   [0, 1, 1]                    |
Apr 17 2016 18:16:10:   [2004, 18505623, 18503619]   [0, 0, 0]                    |   [2004, 18505623, 18503619]   [0, 0, 0]                    |

7. Using loglatency on server logs.

  • Example-1: Basic.
Admin> loglatency -h writes_master

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0    .        .         .         |   node1    .        .         .         |
.                   :   % >1ms   % >8ms   % >64ms   ops/sec   |   % >1ms   % >8ms   % >64ms   ops/sec   |
Apr 17 2016 18:11:30:   100.00   0.03     0.00      647.6     |   100.00   0.03     0.00      647.6     |
Apr 17 2016 18:11:40:   100.00   0.03     0.00      675.6     |   100.00   0.03     0.00      675.6     |
Apr 17 2016 18:11:50:   100.00   0.00     0.00      676.6     |   100.00   0.00     0.00      676.6     |
Apr 17 2016 18:12:00:   100.00   0.00     0.00      685.3     |   100.00   0.00     0.00      685.3     |
  • Example-2: Changing the slice interval to every hour.
Admin> loglatency -h writes_master -t 3600

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0    .        .         .         |   node1    .        .         .         |
.                   :   % >1ms   % >8ms   % >64ms   ops/sec   |   % >1ms   % >8ms   % >64ms   ops/sec   |
Apr 17 2016 19:11:20:   100.00   0.01     0.00      480.8     |   100.00   0.01     0.00      480.8     |
Apr 17 2016 20:11:20:   100.00   0.01     0.01      156.3     |   100.00   0.01     0.01      156.3     |
Apr 17 2016 21:11:20:   100.00   0.01     0.00      238.1     |   100.00   0.01     0.00      238.1     |
  • Example-3: Changing the slice interval to every hour starting from a specific time.
Admin> loglatency -h writes_master -t 3600 -f 'Apr 17 2016 18:11:10'

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0    .        .         .         |   node1    .        .         .         |
.                   :   % >1ms   % >8ms   % >64ms   ops/sec   |   % >1ms   % >8ms   % >64ms   ops/sec   |
Apr 17 2016 19:11:20:   100.00   0.01     0.00      480.8     |   100.00   0.01     0.00      480.8     |
Apr 17 2016 20:11:20:   100.00   0.01     0.01      156.3     |   100.00   0.01     0.01      156.3     |
Apr 17 2016 21:11:20:   100.00   0.01     0.00      238.1     |   100.00   0.01     0.00      238.1     |
  • Example-4: Changing the number of buckets -b option. Default is 3.
Admin> loglatency -h writes_master -t 900 -d 3600 -b 4

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0    .        .         .          .         |   node1    .        .         .          .         |
.                   :   % >1ms   % >8ms   % >64ms   % >512ms   ops/sec   |   % >1ms   % >8ms   % >64ms   % >512ms   ops/sec   |
Apr 17 2016 18:26:20:   100.00   0.01     0.00      0.00       629.1     |   100.00   0.01     0.00      0.00       629.1     |
Apr 17 2016 18:41:20:   100.00   0.03     0.00      0.00       522.1     |   100.00   0.03     0.00      0.00       522.1     |
Apr 17 2016 18:56:20:   100.00   0.01     0.00      0.00       421.8     |   100.00   0.01     0.00      0.00       421.8     |
  • Example-4:Show 0-th then every e-th bucket. Using -e option
Admin> loglatency -h writes_master -t 900 -d 3600 -b 4 -e 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                :   node0    .        .        .        .         |   node1    .        .        .        .         |
.                   :   % >1ms   % >2ms   % >4ms   % >8ms   ops/sec   |   % >1ms   % >2ms   % >4ms   % >8ms   ops/sec   |
Apr 17 2016 18:26:20:   100.00   100.00   0.15     0.01     629.1     |   100.00   100.00   0.15     0.01     629.1     |
Apr 17 2016 18:41:20:   100.00   100.00   0.20     0.03     522.1     |   100.00   100.00   0.20     0.03     522.1     |