How to install troubleshooting tools and gather collectinfo and logs
When troubleshooting possible hardware, configuration, or operating system issues, the right diagnostic information must be available. Some packages need to be installed to make gathering it possible.
With the packages in place, the next step is to gather the collectinfo and the required logs.
This article explains how to do both.
Installing required tools
Certain tools, while optional for the successful run of the collectinfo command, greatly help in troubleshooting issues and, in some cases, are required to find the root cause. These tools are:
- ifconfig and iproute2
To install the tools on a RHEL-type system (Red Hat, CentOS and derivatives):
$ yum -y install procps iproute net-tools lsof sysstat arptables
To install the tools on a Debian-type system (Debian, Ubuntu and derivatives):
$ apt-get update
$ apt-get -y install sysstat net-tools iproute2 arp-scan lsof procps
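After installation, it can be worth confirming that the commands are actually on the PATH. The sketch below is one possible check, not an official Aerospike procedure. The function name missing_tools is hypothetical, and the command names in the usage comment (ifconfig, ip, lsof, iostat, sar, ps) are assumed to be the binaries provided by the packages listed above.

```shell
# Print every command from the argument list that cannot be found on PATH.
# An empty result means all of the listed tools are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call (the command list is an assumption, adjust as needed):
#   missing_tools ifconfig ip lsof iostat sar ps
```

If the function prints any names, install the corresponding package before running collectinfo.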
When troubleshooting, the Aerospike support team will need the collectinfo file. The collectinfo command gathers the cluster configuration, statistics and metrics, as well as operating system information, into a tgz file. This allows the support team to see important system and cluster information for troubleshooting purposes. The system information is gathered only for the node on which the command runs, whereas the Aerospike-specific data is gathered for the whole cluster. Unless otherwise advised, you only need to provide one collectinfo file, keeping the following in mind:
- if the issue happens periodically, grab a collectinfo while it is happening if you can. If you cannot, grab it at any time.
- if the issue involves a single node, it’s useful to grab collectinfo from that node and one that is not experiencing the issue. This way we can compare system information between the ‘good’ and ‘bad’ nodes.
- if the issue affects XDR shipping, it is useful to have one collectinfo from the source cluster, and one from the destination cluster.
- in all other cases, just grab a collectinfo from one of the nodes.
To gather the collectinfo, run the following command:
$ sudo asadm -e collectinfo
If you use TLS or authentication, you may need to provide asadm with extra parameters. Refer to the asadm documentation for further details.
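As an illustration of what such extra parameters can look like, the sketch below wraps the collectinfo call for a cluster with authentication enabled. It assumes that your asadm version accepts -h (seed host), -U (user) and -P (prompt for the password); confirm this with asadm --help before relying on it. The function name collectinfo_auth, the ASADM override variable, and the host and user values are hypothetical placeholders. TLS options are not shown.

```shell
# Run collectinfo against a seed host with a user name, prompting for the password.
#   $1 = seed host (placeholder), $2 = user name (placeholder)
# ASADM is a hypothetical override for the asadm binary path; it defaults to "asadm".
collectinfo_auth() {
    "${ASADM:-asadm}" -h "$1" -U "$2" -P -e "collectinfo"
}

# Example call (values are placeholders):
#   collectinfo_auth 10.0.0.1 admin
```

The command shown earlier is run with sudo; if the same privileges are needed in your environment, run the wrapped command from a root shell.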
Once the tool has finished, it prints the name of the collectinfo tgz file. This file is written to /tmp and is named in the format /tmp/collect_info_YYYYMMDD_hhmmss.tgz, for example: collect_info_20190606_110818.tgz. This is the file required for troubleshooting.
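If several collectinfo archives have accumulated in /tmp, the most recent one can be picked out by name, since the timestamp is part of the file name. The following is a minimal sketch that relies on that naming format; the function name latest_collectinfo is hypothetical.

```shell
# Print the newest collectinfo archive in a directory (default: /tmp).
# File names embed YYYYMMDD_hhmmss, so the name-sorted listing from ls
# is also in chronological order and the last entry is the newest.
latest_collectinfo() {
    dir="${1:-/tmp}"
    ls "$dir"/collect_info_*.tgz 2>/dev/null | tail -n 1
}

# Example call:
#   latest_collectinfo /tmp
```

The function prints nothing when no matching archive exists.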
Note that collectinfo does not gather Aerospike server log files. This needs to be done separately.
After getting the collectinfo, you may be advised to gather the logs as well. They are stored in the location specified in your /etc/aerospike/aerospike.conf file.
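To find the log location without reading the whole configuration file, the logging stanza can be parsed directly. The sketch below assumes the usual layout, in which each log destination appears as a "file <path> {" line inside a "logging {" block; the function name aerospike_log_files is hypothetical.

```shell
# Print the log file paths declared in the logging stanza of aerospike.conf.
# Brace depth is tracked so that "file" lines in other stanzas
# (for example namespace storage files) are ignored.
aerospike_log_files() {
    conf="${1:-/etc/aerospike/aerospike.conf}"
    awk '
        $1 == "logging" && $2 == "{" { depth = 1; next }
        depth > 0 {
            if (depth == 1 && $1 == "file") print $2
            depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
        }
    ' "$conf"
}

# Example call:
#   aerospike_log_files /etc/aerospike/aerospike.conf
```

Servers that log only to the console or to the system journal have no file entry, in which case the function prints nothing.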
The logs will need to be gathered from one or more nodes, depending on the issue, as advised. Unless otherwise advised, the logs should cover the time the issue occurs plus at least a few hours before and (if the issue is not ongoing) after, so that changes in behaviour over time can be measured properly.
Sometimes, the INFO logging level may not be enough to troubleshoot an issue, and further logs or benchmarks may be required. If this is the case, a support team member will advise you on how to capture the required information and provide a new set of logs.
Make sure the logging level is at least INFO for all services, as specified in /etc/aerospike/aerospike.conf. If you change the level to WARNING or CRITICAL, the logs will not contain enough information for most troubleshooting needs.
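A quick way to spot contexts configured below INFO is to scan the logging stanza for "context <name> <level>" lines. The sketch below assumes that line format and only inspects the static configuration file, not any level changed at runtime; the function name low_detail_contexts is hypothetical.

```shell
# Print "<context> <level>" for every logging context set to warning or critical.
# No output means no context in the file is configured below INFO.
low_detail_contexts() {
    conf="${1:-/etc/aerospike/aerospike.conf}"
    awk '
        $1 == "logging" && $2 == "{" { depth = 1; next }
        depth > 0 {
            level = tolower($3)
            if ($1 == "context" && (level == "warning" || level == "critical"))
                print $2, $3
            depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
        }
    ' "$conf"
}

# Example call:
#   low_detail_contexts /etc/aerospike/aerospike.conf
```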