The application is a task running on the AWS Spark service. I guess the cause is one of the following:
1. Every thread in the task opens a new connection to Aerospike each time and does not use a connection pool, so there are no long-lived connections. If the seed node is down, a new client cannot reach it and so never learns about the other live nodes.
2. The application code does not catch the error when a node goes down, so the error is thrown to Spark. Spark then treats the task as failed and recreates it, which kills the long-lived connection. The new task cannot reach the seed node (the down node), so it never learns about the other live nodes either.
The developers have replied and confirmed that they do not catch the error.
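To make the two guesses concrete, here is a rough sketch of what I would expect the fix to look like: create the client once per process and reuse it, and catch read errors so they do not reach Spark. This is not the real application code. `StubClient`, `get_client`, `process_partition` and the seed addresses are all made up for illustration, and `StubClient` is only a stand-in, not the Aerospike client API.

```python
import threading


class StubClient:
    """Hypothetical stand-in for a real client; NOT the Aerospike API."""

    instances = 0  # counts how many clients were ever created

    def __init__(self, seeds):
        StubClient.instances += 1
        self.seeds = seeds

    def get(self, key):
        # Simulate a read that fails because a node is unavailable.
        if key == "bad":
            raise ConnectionError("node unavailable")
        return {"key": key}


_client = None
_lock = threading.Lock()


def get_client(seeds):
    """Create the client once per process and reuse it afterwards."""
    global _client
    with _lock:
        if _client is None:
            _client = StubClient(seeds)
        return _client


def process_partition(keys, seeds=("10.0.0.1", "10.0.0.2", "10.0.0.3")):
    """Read every key with the shared client and collect failures
    instead of letting the exception propagate to the caller."""
    client = get_client(seeds)
    results, failed = [], []
    for key in keys:
        try:
            results.append(client.get(key))
        except ConnectionError:
            failed.append(key)  # record the failure and keep going
    return results, failed
```

In a Spark job I assume something like `process_partition` would be called from `mapPartitions` or `foreachPartition`, so that each executor process keeps one long-lived client. Passing several seed hosts rather than one should also let a new client start when a single seed node is down.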
Thanks for your help.