Aeropsike async client is making frequent authentication like several per second

We are using aerospike client 6.1.0 and we switched to async client call. Our changes were working fine but we faced issue on server side that there were several authentication logs per second due to which disk was getting filled. Logs example:

2024-10-17 21:53:03 Oct 17 2024 16:23:03.263 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:59225 | authenticated user: myuser | action: login | detail: user=myuser
2024-10-17 21:53:03 Oct 17 2024 16:23:03.263 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:59224 | authenticated user: myuser | action: login | detail: user=myuser
2024-10-17 21:53:03 Oct 17 2024 16:23:03.263 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:59223 | authenticated user: myuser | action: login | detail: user=myuser
2024

Aerospike async client config:

@Provides
  @Singleton
  public AerospikeClient aerospikeClient() {
    ClientPolicy policy = new ClientPolicy();
    EventPolicy eventPolicy = new EventPolicy();
    int eventLoopSize = Runtime.getRuntime().availableProcessors();
    EventLoops eventLoops = new NioEventLoops(eventPolicy, eventLoopSize);
    policy.timeout = aerospikeConfig.getTimeout();
    policy.user = aerospikeConfig.getUser();
    policy.password = aerospikeConfig.getPassword();
    policy.eventLoops = eventLoops;
    //policy.asyncMinConnsPerNode = aerospikeConfig.getMinConnsPerNode();
    return new AerospikeClient(policy, aerospikeConfig.getHost(), aerospikeConfig.getPort());
  }

We tried asyncMinConnsPerNode still it is logging frequently. Other settings are set to default like maxSocketIdle = 0. Server side proto-fd-idle-ms is also default to 0 Could you please provide guidance on how to resolve this issue? Any suggestions or further steps would be greatly appreciated.

Those action: login are new client instantiations. The policy settings around number of connections (min/max ConnsPerNode or their async counterparts) wouldn’t affect those action: login. They would affect the number of action: authentication in your logs. So, it seems like your application framework is aggressively re-instantiating the aerospikeClient itself.

But when I try to print the client address, it’s the same: Client:com.aerospike.client.AerospikeClient@30a34f54 asyncCall:operate: 2024-10-18T18:22:22.507678 Client:com.aerospike.client.AerospikeClient@30a34f54 asyncCall:operate: 2024-10-18T18:22:22.559694 Client:com.aerospike.client.AerospikeClient@1b9437bf asyncCall:operate: 2024-10-18T18:22:22.559883 Client:com.aerospike.client.AerospikeClient@1b9437bf asyncCall:operate: 2024-10-18T18:22:22.559962

It would then be the same client that seems to be (closed and) reopened by the framework. I am not familiar with Flink but it appears something is making it re-instantiate the client. You may want to open a case with Aerospike Support to see how to troubleshoot this further but the symptoms are fairly clear that the code instantiating the client (causing the login) is triggered whereas a client should login once and then authenticate connections as configured and not have to relogin again.

I see one potential reason as to why it was happening, correct my reasoning if something wrong. We were not using maxCommandsInProcess in process due to which it was executing all commands immediately and since async call simply blocks the socket till the execution completes we were seeing each time new login attempt. Also, it is doing login event in the starting:

2024-10-21 11:56:54 Oct 21 2024 06:26:54.575 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:43267 | authenticated user: myuser | action: login | detail: user=myuser

But seems like it is doing authentication login for every first command in that socket: 2024-10-21 12:21:43 Oct 21 2024 06:51:43.144 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:43353 | authenticated user: myuser | action: authentication | detail: user=myuser

Or what else is the difference between authentication & login action.

So, the action: login should only be for the first time a client connects to a node and then every 24 hours by default (session-ttl). For every new connection opened after that (for transactions), you would see a log message with action: authentication.

Check the following article around how the different async policies work (specifically the maxComandsInProcess): How to Troubleshoot Async Delay Queue Full Errors.

My only guess, if you see so many action: login entries in your logs is that there are some conditions (errors returned or some missing feedback) that your application framework is reacting to by re-instantiating the client. I would further recommend reaching out to Aerospike Support as you must be having an Enterprise Edition license (in order to leverage the user/pwd security feature).

No, we are not seeing many login action event, we are seeing those only in the starting as you suggested instead we are seeing authentication action events only. While trying to run on local aerospike container, I could see that sockets are getting closed:

2024-10-21 21:23:44 Oct 21 2024 15:53:44.678 GMT: DEBUG (socket): (socket.c:1640) Adding FD 212 to epoll instance with FD 53, events = 0x40002001
2024-10-21 21:23:44 Oct 21 2024 15:53:44.678 GMT: DEBUG (socket): (socket.c:1640) Adding FD 213 to epoll instance with FD 56, events = 0x40002001
2024-10-21 21:23:44 Oct 21 2024 15:53:44.679 GMT: DEBUG (socket): (socket.c:1640) Adding FD 214 to epoll instance with FD 59, events = 0x40002001
2024-10-21 21:23:44 Oct 21 2024 15:53:44.679 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:45760 | authenticated user: myuser | action: authentication | detail: user=myuser
2024-10-21 21:23:44 Oct 21 2024 15:53:44.679 GMT: DEBUG (socket): (socket.c:1640) Adding FD 215 to epoll instance with FD 62, events = 0x40002001
2024-10-21 21:23:44 Oct 21 2024 15:53:44.679 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:45762 | authenticated user: myuser | action: authentication | detail: user=myuser
2024-10-21 21:23:44 Oct 21 2024 15:53:44.679 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:45764 | authenticated user: myuser | action: authentication | detail: user=myuser
2024-10-21 21:23:44 Oct 21 2024 15:53:44.679 GMT: INFO (audit): (security.c:7584) permitted | client: 192.168.65.1:45763 | authenticated user: myuser | action: authentication | detail: user=myuser
2024-10-21 21:24:46 Oct 21 2024 15:54:46.104 GMT: DEBUG (socket): (socket.c:1435) Closing FD 198
2024-10-21 21:24:46 Oct 21 2024 15:54:46.104 GMT: DEBUG (socket): (socket.c:1435) Closing FD 201
2024-10-21 21:24:46 Oct 21 2024 15:54:46.105 GMT: DEBUG (socket): (socket.c:1435) Closing FD 202
2024-10-21 21:24:46 Oct 21 2024 15:54:46.105 GMT: DEBUG (socket): (socket.c:1435) Closing FD 204
2024-10-21 21:24:46 Oct 21 2024 15:54:46.105 GMT: DEBUG (socket): (socket.c:1435) Closing FD 205
2024-10-21 21:24:46 Oct 21 2024 15:54:46.105 GMT: DEBUG (socket): (socket.c:1435) Closing FD 215
2024-10-21 21:25:17 Oct 21 2024 15:55:17.142 GMT: DEBUG (socket): (socket.c:1435) Closing FD 212
2024-10-21 21:26:44 Oct 21 2024 15:56:44.287 GMT: DEBUG (socket): (socket.c:1640) Adding FD 198 to epoll instance with FD 65, events = 0x40002001
2024-10-21 21:26:45 Oct 21 2024 15:56:45.270 GMT: DEBUG (socket): (socket.c:1435) Closing FD 198
2024-10-21 21:27:16 Oct 21 2024 15:57:16.130 GMT: DEBUG (socket): (socket.c:1640) Adding FD 198 to epoll instance with FD 68, events = 0x40002001
2024-10-21 21:27:16 Oct 21 2024 15:57:16.130 GMT: DEBUG (socket): (socket.c:1640) Adding FD 201 to epoll instance with FD 71, events = 0x40002001
2024-10-21 21:27:16 Oct 21 2024 15:57:16.130 GMT: DEBUG (socket): (socket.c:1640) Adding FD 202 to epoll instance with FD 74, events = 0x40002001
2024-10-21 21:27:16 Oct 21 2024 15:57:16.131 GMT: DEBUG (socket): (socket.c:1640) Adding FD 204 to epoll instance with FD 77, events = 0x40002001

Your initial post (see above) had three action: login from the same IP address within the same millisecond (maybe three different containers or VMs running on the same host then?). My focus had been on the action: login rather than the action: authentication. Configuring the policy (as per the article I sent you) should control the number of connections initiated and closed.

Rather than continuing with partial info, I would suggest opening a case with Aerospike Support and providing all supporting details (specifically client and server logs) in order to facilitate troubleshooting.