How to enable access control for a data center with XDR


Context

This article describes how to enable access control security for a data center (DC) that uses XDR, with minimal downtime per DC.

Caveats

  • Enabling the security feature requires a cluster shutdown.

  • The rollback plan also requires cluster downtime, during which you remove the security configuration.

Pre-requisites

  • Two three-node clusters (“A” and “B”), representing two data centers, running CentOS 6.8 with on-disk storage.

  • An active-active XDR setup exists across the DCs.

Method

This walkthrough uses two DCs (“A” and “B”) in an active-active XDR setup. Each cluster consists of three nodes that use multicast heartbeats. The configuration file on one of the nodes in DC “A” is shown below:

service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address 10.0.0.11
        port 3000
    }

    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918
        address 10.0.0.11

        interval 150
        timeout 10
    }

    fabric {
        address 10.0.0.11
        port 3001
    }

    info {
        port 3003
    }
}

xdr {
    enable-xdr true
    xdr-digestlog-path /opt/aerospike/xdr/digestlog 2G
    datacenter B {
        dc-node-address-port 10.0.0.14 3000
        dc-node-address-port 10.0.0.15 3000
        dc-node-address-port 10.0.0.16 3000
    }
}

namespace ns1 {
    replication-factor 2
    memory-size 100M
    default-ttl 120m

    enable-xdr true
    xdr-remote-datacenter B

    storage-engine device {
        file /opt/aerospike/data/ns1.dat
        filesize 1500M
    }
}

namespace ns2 {
    replication-factor 2
    memory-size 100M
    default-ttl 120m
    storage-engine device {
        file /opt/aerospike/data/ns2.dat
        filesize 1500M
    }
}

The configuration file on one of the nodes in DC “B” is as follows:

service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address 10.0.0.14
        port 3000
    }

    heartbeat {
        mode multicast
        multicast-group 239.1.99.223 # Note that this ends in .223 in DC B
        port 9918
        address 10.0.0.14

        interval 150
        timeout 10
    }

    fabric {
        address 10.0.0.14
        port 3001
    }

    info {
        port 3003
    }
}

xdr {
        enable-xdr true
        xdr-digestlog-path /opt/aerospike/xdr/digestlog 2G
        datacenter A {
                dc-node-address-port 10.0.0.11 3000
                dc-node-address-port 10.0.0.12 3000
                dc-node-address-port 10.0.0.13 3000
        }
}

namespace ns1 {
        replication-factor 2
        memory-size 100M
        default-ttl 120m
        storage-engine device {
                file /opt/aerospike/data/ns1.dat
                filesize 1500M
        }
}

namespace ns2 {
        replication-factor 2
        memory-size 100M
        default-ttl 120m

        enable-xdr true
        xdr-remote-datacenter A

        storage-engine device {
                file /opt/aerospike/data/ns2.dat
                filesize 1500M
        }
}

  1. Using asadm on any one of the nodes in DC “A”, set xdr-shipping-enabled to false as shown below:

     $ asadm
     Admin> asinfo -v "set-config:context=xdr;xdr-shipping-enabled=false"
     10.0.0.11:3000 (10.0.0.11) returned:
     ok
    
     10.0.0.13:3000 (10.0.0.13) returned:
     ok
    
     10.0.0.12:3000 (10.0.0.12) returned:
     ok
    

    The application clients can continue to read from and write to DC “A”. To verify that records are pending shipment over XDR, check for a non-zero value of xdr_ship_outstanding_objects in the asadm output:

     $ asadm
     Admin> show stat xdr like outstanding
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~XDR Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     NODE                        :   10.0.0.11:3000   10.0.0.12:3000   10.0.0.13:3000
     xdr_ship_outstanding_objects:   66513            67426            66061
    

    In the Aerospike server logs, the “dlog-outstanding” and “lg” values also increase as writes arrive at DC “A”:

     $ sudo tailf /var/log/aerospike/aerospike.log | grep -i xdr
    
     May 26 2017 09:23:11 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 0 dlog-delta-per-sec 0.0
     May 26 2017 09:23:11 GMT: INFO (xdr): (xdr.c:2069) detail: sh 34558 ul 0 lg 66522 rlg 0 rlgi 0 rlgo 0 lproc 66522 rproc 37 lkdproc 0 errcl 0 errsrv 0 hkskip 0 hkf 0 flat 0
    
     May 26 2017 09:23:21 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 9349 dlog-delta-per-sec 934.9
     May 26 2017 09:23:21 GMT: INFO (xdr): (xdr.c:2069) detail: sh 34558 ul 57 lg 75871 rlg 0 rlgi 0 rlgo 0 lproc 66522 rproc 37 lkdproc 0 errcl 0 errsrv 0 hkskip 0 hkf 0 flat 0
    
     May 26 2017 09:23:31 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 49949 dlog-delta-per-sec 4060.0
     May 26 2017 09:23:31 GMT: INFO (xdr): (xdr.c:2069) detail: sh 34558 ul 73 lg 116471 rlg 0 rlgi 0 rlgo 0 lproc 66522 rproc 37 lkdproc 0 errcl 0 errsrv 0 hkskip 0 hkf 0 flat 0
    
     May 26 2017 09:23:41 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 66513 dlog-delta-per-sec 1656.4
     May 26 2017 09:23:41 GMT: INFO (xdr): (xdr.c:2069) detail: sh 34558 ul 0 lg 133035 rlg 0 rlgi 0 rlgo 0 lproc 66522 rproc 37 lkdproc 0 errcl 0 errsrv 0 hkskip 0 hkf 0 flat 0
    

    XDR will eventually ship the newly updated records when DC “B” is up and running with security enabled.
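
The XDR summary lines shown in step 1 can also be tracked programmatically. The following is a minimal Python sketch, assuming the log line format matches the excerpts above exactly; the names `parse_xdr_summary` and `outstanding_series` are hypothetical helpers, not part of any Aerospike tool.

```python
import re

# Assumption: the "summary" line layout is the one shown in the log excerpts above.
SUMMARY_RE = re.compile(
    r"summary: throughput (?P<throughput>\d+) inflight (?P<inflight>\d+) "
    r"dlog-outstanding (?P<outstanding>\d+) "
    r"dlog-delta-per-sec (?P<delta>-?\d+(?:\.\d+)?)"
)


def parse_xdr_summary(line):
    """Return the summary fields as a dict, or None if this is not a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {
        "throughput": int(match.group("throughput")),
        "inflight": int(match.group("inflight")),
        "dlog_outstanding": int(match.group("outstanding")),
        "dlog_delta_per_sec": float(match.group("delta")),
    }


def outstanding_series(lines):
    """Collect dlog-outstanding values from every summary line, in order."""
    parsed = (parse_xdr_summary(line) for line in lines)
    return [p["dlog_outstanding"] for p in parsed if p is not None]


# Sample lines copied from the log excerpt above (detail line included to show it is skipped).
sample = [
    "May 26 2017 09:23:11 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 0 dlog-delta-per-sec 0.0",
    "May 26 2017 09:23:11 GMT: INFO (xdr): (xdr.c:2069) detail: sh 34558 ul 0 lg 66522 rlg 0 rlgi 0 rlgo 0 lproc 66522 rproc 37",
    "May 26 2017 09:23:21 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 9349 dlog-delta-per-sec 934.9",
    "May 26 2017 09:23:31 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 49949 dlog-delta-per-sec 4060.0",
]

print(outstanding_series(sample))  # [0, 9349, 49949]
```

A growing series while shipping is disabled is the expected behavior here; the same helper can later confirm that the value drains back to 0 once shipping is re-enabled.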

  2. Stop client writes on DC “B” and shut down the Aerospike server on all nodes in DC “B”:

     $ sudo service aerospike stop
     Stopping aerospike: [ OK ]
    
  3. Security must be enabled in the /etc/aerospike/aerospike.conf file on all nodes in DC “B”. Add the following stanza to the configuration file on each node in DC “B”:

     security {
         enable-security true
     }
    
  4. Start the Aerospike server on all the nodes in DC “B”:

     $ sudo service aerospike start
     Starting and checking aerospike: [ OK ]
    
  5. Use aql to log in as the “admin” user (the password is also “admin”) on one of the nodes in DC “B”. Change the admin password immediately after logging in.

     $ aql -Uadmin -P
     Enter Password:
     Aerospike Query Client
     Version 3.12.1
     C Client Version 4.1.5
     Copyright 2012-2016 Aerospike. All rights reserved.
    
     aql> set password newpasswd for admin
    

    You can now create the ‘ns1-xdr-user’ user and the ‘ns1-xdr-role’ role for XDR. Since security is now enabled, you also need to create the users that client applications will connect as, as shown below:

     aql> create role ns1-xdr-role privileges read-write.ns1
     OK
    
     aql> create role ns2-user privileges read-write.ns2
     OK
    
     aql> show roles
     +------------------+------------------+
     | role             | privileges       |
     +------------------+------------------+
     | "data-admin"     | "data-admin"     |
     | "ns1-xdr-role"   | "read-write.ns1" |
     | "ns2-user"       | "read-write.ns2" |
     | "read"           | "read"           |
     | "read-write"     | "read-write"     |
     | "read-write-udf" | "read-write-udf" |
     | "sys-admin"      | "sys-admin"      |
     | "user-admin"     | "user-admin"     |
     +------------------+------------------+
     7 rows in set (0.002 secs)
    
     aql> create user ns1-xdr-user password krypton role ns1-xdr-role
     OK
    
     aql> create user ns2-user password ns2 role ns2-user,sys-admin
     OK
    
     aql> show users
     +----------------+---------------------------+
     | user           | roles                     |
     +----------------+---------------------------+
     | "admin"        | "user-admin"              |
     | "ns1-xdr-user" | "ns1-xdr-role"            |
     | "ns2-user"     | "ns2-user, sys-admin"     |
     +----------------+---------------------------+
     2 rows in set (0.002 secs)
     OK
    
     aql> exit
    
     $
    

    You can now verify the newly created accounts using AQL as shown below:

     $ aql -U ns1-xdr-user -P
     Enter Password:
     Aerospike Query Client
     Version 3.12.1
     C Client Version 4.1.5
     Copyright 2012-2016 Aerospike. All rights reserved.
     aql> quit
    
     $ aql -U ns2-user -P
     Enter Password:
     Aerospike Query Client
     Version 3.12.1
     C Client Version 4.1.5
     Copyright 2012-2016 Aerospike. All rights reserved.
     aql> quit
    
     $
    
  6. On each node in DC “A”, create the /opt/aerospike/data/security_credentials_B.txt file with the following contents:

     $ cat /opt/aerospike/data/security_credentials_B.txt
     credentials
     {
     username ns1-xdr-user
     password krypton
     }
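
Because the credentials file in step 6 must exist on every node in DC “A”, generating it from a script can help keep the copies identical. The following is a minimal Python sketch that renders the same contents shown above; `render_credentials` and `write_credentials` are hypothetical names, and restricting the file to mode 0600 is an assumption of this sketch (the original procedure does not specify permissions).

```python
import os
import tempfile


def render_credentials(username, password):
    """Render the credentials stanza in the layout shown in step 6."""
    return "credentials\n{\nusername %s\npassword %s\n}\n" % (username, password)


def write_credentials(path, username, password):
    """Write the credentials file at `path`.

    Assumption: mode 0600 (owner read/write only) is applied when the file is
    newly created, since the file holds a plain-text password. The account
    running the Aerospike server must still be able to read it.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_credentials(username, password))


# Demonstration against a temporary directory rather than /opt/aerospike/data.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "security_credentials_B.txt")
write_credentials(demo_path, "ns1-xdr-user", "krypton")

with open(demo_path) as handle:
    print(handle.read(), end="")
```

On a real node the path would be /opt/aerospike/data/security_credentials_B.txt, as in step 6.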
    
  7. On any one node in DC “A”, using asadm, update the security credentials dynamically to allow XDR writes to DC “B” using the following command:

     Admin> asinfo -v "set-config:context=xdr;dc=B;dc-security-config-file=/opt/aerospike/data/security_credentials_B.txt"
     10.0.0.11:3000 (10.0.0.11) returned:
     pending
    
     10.0.0.13:3000 (10.0.0.13) returned:
     pending
    
     10.0.0.12:3000 (10.0.0.12) returned:
     pending
    

    The above command returns “pending”. Check the server logs to ensure that the XDR connection has been established successfully before proceeding to the next step:

     May 26 2017 09:35:49 GMT: INFO (info): (thr_info.c:3484) config-set command : params context=xdr;dc=B;dc-security-config-file=/opt/aerospike/data/security_credentials_B.txt
     May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5524) Reconnecting to remote cluster 'B'
     May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5483) Disconnecting from remote cluster 'B'
     May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5505) Disconnected
     May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5429) Connecting to new remote cluster 'B'
     May 26 2017 09:35:49 GMT: INFO (xdr): (as_cluster.c:106) Add node BB9481178270008 10.0.0.14:3000
     May 26 2017 09:35:49 GMT: INFO (xdr): (as_cluster.c:106) Add node BB9AEA38C270008 10.0.0.15:3000
     May 26 2017 09:35:49 GMT: INFO (xdr): (as_cluster.c:106) Add node BB965F791270008 10.0.0.16:3000
     May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5444) Connected
    

    You should then add the credentials file to the Aerospike configuration file on each node in DC “A” so that the setting persists across restarts, for example when you later repeat this process in reverse to enable security for DC “A”:

     xdr {
         ...
         datacenter B {
             dc-node-address-port 10.0.0.14 3000
             dc-node-address-port 10.0.0.15 3000
             dc-node-address-port 10.0.0.16 3000

             dc-security-config-file /opt/aerospike/data/security_credentials_B.txt
         }
     }
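
The log check in step 7 can be expressed as a small test over the log lines. The following is a minimal Python sketch, assuming the message texts match the excerpt in step 7; `xdr_reconnected` is a hypothetical helper name.

```python
def xdr_reconnected(lines, dc_name):
    """Return True if a 'Connected' line follows the 'Connecting to new remote
    cluster' line for the given DC.

    Assumption: message wording is identical to the log excerpt in step 7.
    """
    connecting = "Connecting to new remote cluster '%s'" % dc_name
    seen_connecting = False
    for line in lines:
        if connecting in line:
            seen_connecting = True
        elif seen_connecting and line.rstrip().endswith(") Connected"):
            return True
    return False


# Lines copied from the log excerpt in step 7.
sample_log = [
    "May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5524) Reconnecting to remote cluster 'B'",
    "May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5483) Disconnecting from remote cluster 'B'",
    "May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5505) Disconnected",
    "May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5429) Connecting to new remote cluster 'B'",
    "May 26 2017 09:35:49 GMT: INFO (xdr): (as_cluster.c:106) Add node BB9481178270008 10.0.0.14:3000",
    "May 26 2017 09:35:49 GMT: INFO (xdr): (xdr.c:5444) Connected",
]

print(xdr_reconnected(sample_log, "B"))  # True
```

The check deliberately ignores the earlier “Disconnected” line, so a log that stops after the disconnect is reported as not yet reconnected.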
    
  8. From any one node in DC “A”, use asadm to set xdr-shipping-enabled back to true:

     Admin> asinfo -v "set-config:context=xdr;xdr-shipping-enabled=true"
     10.0.0.11:3000 (10.0.0.11) returned:
     ok
    
     10.0.0.13:3000 (10.0.0.13) returned:
     ok
    
     10.0.0.12:3000 (10.0.0.12) returned:
     ok
    
  9. Start the client applications that communicate with DC “B” using the new ns2-user credentials.

    Records written to DC “A” while DC “B” was down, as well as any new records written to DC “A”, are now shipped to DC “B” using the security credentials. You can verify this in the Aerospike server logs on DC “A”, where the “sh” and “lproc” values for XDR increase while the “dlog-outstanding” value decreases as records are shipped:

     May 26 2017 09:37:41 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 0 inflight 0 dlog-outstanding 66513 dlog-delta-per-sec 0.0
     May 26 2017 09:37:41 GMT: INFO (xdr): (xdr.c:2069) detail: sh 34558 ul 0 lg 133035 rlg 0 rlgi 0 rlgo 0 lproc 66522 rproc 37 lkdproc 0 errcl 0 errsrv 0 hkskip 0 hkf 0 flat 0
    
     May 26 2017 09:37:44 GMT: INFO (xdr): (xdr_info.c:522) XDR Shipping Enabled
     May 26 2017 09:37:44 GMT: INFO (info): (thr_info.c:3484) config-set command : params context=xdr;xdr-shipping-enabled=true
    
     May 26 2017 09:37:51 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 1381 inflight 24 dlog-outstanding 29313 dlog-delta-per-sec -3720.0
     May 26 2017 09:37:51 GMT: INFO (xdr): (xdr.c:2069) detail: sh 48378 ul 0 lg 133035 rlg 0 rlgi 0 rlgo 0 lproc 103622 rproc 37 lkdproc 0 errcl 0 errsrv 0 hkskip 0 hkf 0 flat 0
    
     May 26 2017 09:38:01 GMT: INFO (xdr): (xdr.c:2060) summary: throughput 2096 inflight 0 dlog-outstanding 0 dlog-delta-per-sec -2931.3
     May 26 2017 09:38:01 GMT: INFO (xdr): (xdr.c:2069) detail: sh 69346 ul 0 lg 133035 rlg 0 rlgi 0 rlgo 0 lproc 133035 rproc 37 lkdproc 0 errcl 0 errsrv 0 hkskip 0 hkf 0 flat 0
    

    You can also watch the Aerospike server logs in DC “B”, where the namespace object count increases as shown below:

     $ sudo tailf /var/log/aerospike/aerospike.log | grep objects
     May 26 2017 09:37:37 GMT: INFO (info): (ticker.c:382) {ns1} objects: all 68039 master 34948 prole 33091
     May 26 2017 09:37:37 GMT: INFO (info): (ticker.c:382) {ns2} objects: all 0 master 0 prole 0
    
     May 26 2017 09:37:47 GMT: INFO (info): (ticker.c:382) {ns1} objects: all 79142 master 40642 prole 38500
     May 26 2017 09:37:47 GMT: INFO (info): (ticker.c:382) {ns2} objects: all 0 master 0 prole 0
    
     May 26 2017 09:37:57 GMT: INFO (info): (ticker.c:382) {ns1} objects: all 132743 master 68075 prole 64668
     May 26 2017 09:37:57 GMT: INFO (info): (ticker.c:382) {ns2} objects: all 0 master 0 prole 0
    
     May 26 2017 09:38:07 GMT: INFO (info): (ticker.c:382) {ns1} objects: all 136464 master 69939 prole 66525
     May 26 2017 09:38:07 GMT: INFO (info): (ticker.c:382) {ns2} objects: all 0 master 0 prole 0
    

    Since you have enabled security for DC “B”, you will need to open a new AMC browser session and log in as ns2-user. The AMC dashboard also shows the increase in the object count, confirming that XDR writes have occurred.

    In similar fashion, repeat steps 1 to 8 to enable security for XDR from DC “B” to DC “A”. This way you take minimal downtime per DC while upgrading each DC to use access control security.
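
The ticker lines from DC “B” shown in step 9 can be summarized per namespace to confirm that the object count is growing. The following is a minimal Python sketch, assuming the ticker format matches the excerpt above; `object_counts` and `is_non_decreasing` are hypothetical helper names.

```python
import re

# Assumption: the ticker "objects" line layout is the one shown in step 9.
OBJECTS_RE = re.compile(
    r"\{(?P<ns>[^}]+)\} objects: all (?P<all>\d+) "
    r"master (?P<master>\d+) prole (?P<prole>\d+)"
)


def object_counts(lines):
    """Map each namespace to its list of 'all' object counts, in log order."""
    counts = {}
    for line in lines:
        match = OBJECTS_RE.search(line)
        if match:
            counts.setdefault(match.group("ns"), []).append(int(match.group("all")))
    return counts


def is_non_decreasing(values):
    """True if no value is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Lines copied from the DC "B" log excerpt in step 9.
ticker = [
    "May 26 2017 09:37:37 GMT: INFO (info): (ticker.c:382) {ns1} objects: all 68039 master 34948 prole 33091",
    "May 26 2017 09:37:37 GMT: INFO (info): (ticker.c:382) {ns2} objects: all 0 master 0 prole 0",
    "May 26 2017 09:37:47 GMT: INFO (info): (ticker.c:382) {ns1} objects: all 79142 master 40642 prole 38500",
    "May 26 2017 09:37:47 GMT: INFO (info): (ticker.c:382) {ns2} objects: all 0 master 0 prole 0",
]

counts = object_counts(ticker)
print(counts["ns1"])                     # [68039, 79142]
print(is_non_decreasing(counts["ns1"]))  # True
```

Note that records expiring under default-ttl can lower the count over longer windows, so a non-decreasing series is only a short-term indicator.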

Notes

  • It is good practice to keep a separate audit trail log for security, in order to maintain “separation of concerns”. For example, an individual who monitors security can be given access to this log file only.

  • Histogram metrics are available in Aerospike, but any kind of client profiling ought to be done through external monitoring tools. The security mechanism provided is for access control.

  • Authentication adds a small overhead to the initial connection, which should be negligible, and it is cached for subsequent transactions. Even so, it is recommended to test security under load in a staging environment for your application use cases.

  • If you are re-configuring a DC that already has security enabled, make sure that the user has the roles required to perform the operations. For example, changing dynamic server configuration variables requires the login user to have the sys-admin role.

Keywords

XDR SECURITY

Timestamp

6/11/2017