Not able to get the required Throughput time

Hi All,

I am trying to achieve more than 1 million TPS, but I cannot get beyond 100,000 TPS for read operations and 13,000 TPS for write operations.

I generate about 1 million records concurrently from Java code, and each record is inserted into Aerospike as soon as it is generated. I use AerospikeClient.put for writes and AerospikeClient.get for reads.

A single record contains just a numeric key and a double value.

Hardware : Server with 24 cores and 280 GB RAM.

Can you please suggest any improvements?

We are benchmarking BigMemory, GigaSpace and Aerospike against each other.

BigMemory takes just 2 seconds to read or write 1 million records, and GigaSpace takes 6-8 seconds.

The configuration I am using with Aerospike is as follows:

Aerospike database configuration file.

service {
	user root
	group root
	paxos-single-replica-limit 1
	pidfile /var/run/aerospike/asd.pid
	service-threads 4
	transaction-queues 4
	transaction-threads-per-queue 4
	proto-fd-max 15000
}

logging {
	file /var/log/aerospike/aerospike.log {
		context any info
	}
}

network {
	service {
		address any
		port 3000
	}

	heartbeat {
		mode multicast
		address 239.1.99.222
		port 9918
		interval 150
		timeout 10
	}

	fabric {
		port 3001
	}

	info {
		port 3003
	}
}

namespace test {
	replication-factor 2
	memory-size 4G
	default-ttl 30d
}

namespace bar {
	replication-factor 2
	memory-size 4G
	default-ttl 30d
	storage-engine memory
}

Hi Nitin,

You should make the following config changes:

service-threads 24
transaction-queues 24

Also, can you please tell us how many client threads you are using? What is the hardware config for the client? What is the network throughput between the client and the server machine? You should be running the client with 128 threads (or multiple clients with a higher total number of threads). Also, you should be running the client / benchmark process on a different machine, not on the same machine as the server.
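To illustrate the "128 client threads" suggestion, here is a minimal sketch of a multi-threaded load driver in plain Java. The class name LoadGenerator and the run method are made up for this example, and the operation is passed in as a callback so the sketch does not depend on any client library. In a real test the callback would wrap a call such as AerospikeClient.put or AerospikeClient.get; the namespace, set and bin names shown in the comment are only placeholders.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of the Aerospike client.
class LoadGenerator {

    /**
     * Calls op once for every key in [0, totalOps), split across the given
     * number of worker threads, and returns the measured operations per second.
     */
    static double run(int threads, long totalOps, LongConsumer op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            // Each worker takes keys worker, worker + threads, worker + 2*threads, ...
            // so the workers share no state when choosing keys.
            pool.execute(() -> {
                for (long key = worker; key < totalOps; key += threads) {
                    op.accept(key);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("load run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return totalOps / seconds;
    }

    public static void main(String[] args) {
        AtomicLong done = new AtomicLong();
        // Placeholder operation that only counts calls. With the Aerospike Java
        // client it could be something like (names are assumptions):
        //   key -> client.put(null, new Key("test", "demo", key), new Bin("v", (double) key))
        double opsPerSec = run(128, 1_000_000L, key -> done.incrementAndGet());
        System.out.printf("%d ops at %.0f ops/sec%n", done.get(), opsPerSec);
    }
}
```

Running the same driver with different thread counts, from a separate client machine, gives a rough idea of whether the client or the server is the limit.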

Hi Anshu,

Thanks…

I am running a Java client on the same machine as the server, so the hardware configuration is the same. But the Java client is not heavy; it just generates key/value pairs where the key is a long and the value is a double, like this:

Key - Value
1 - 1.0
2 - 2.0
3 - 3.0
…
100,000 - 100,000.0

Hi Nitin,

For such benchmarking purposes, you should not run the client on the same machine as the server. Also, did you update the config with the suggested values?

Hi Anshu,

After changing the code that calculates the key, performance has improved.

Now, with 16 threads in the Java client and this Aerospike configuration:

service-threads 4
transaction-queues 4

I get around 180,000 to 190,000 TPS for 10 million records with 50% writes and 50% reads (I write all the records first, then read them). Writing 10M records takes 55 seconds and reading them takes 60 seconds.

With 128 threads in the Java client and this Aerospike configuration:

service-threads 24
transaction-queues 24

I get around 230,000 TPS for 10 million records with 50% writes and 50% reads. Writing 10M records takes 40 seconds and reading them takes 45 seconds.
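As a sanity check on these figures, throughput is just the record count divided by the elapsed seconds. A small Java sketch (the class and method names are made up for illustration) applied to the timings reported in this thread:

```java
// Hypothetical helper for converting "N records in S seconds" into TPS.
class ThroughputCalc {

    /** Returns transactions per second for the given record count and elapsed time. */
    static double tps(long records, double seconds) {
        return records / seconds;
    }

    public static void main(String[] args) {
        long tenMillion = 10_000_000L;
        // 16 client threads, service-threads 4
        System.out.printf("write: %.0f TPS%n", tps(tenMillion, 55)); // write: 181818 TPS
        System.out.printf("read:  %.0f TPS%n", tps(tenMillion, 60)); // read:  166667 TPS
        // 128 client threads, service-threads 24
        System.out.printf("write: %.0f TPS%n", tps(tenMillion, 40)); // write: 250000 TPS
        System.out.printf("read:  %.0f TPS%n", tps(tenMillion, 45)); // read:  222222 TPS
        // BigMemory figure quoted earlier: 1 million records in 2 seconds
        System.out.printf("other: %.0f TPS%n", tps(1_000_000L, 2));  // other: 500000 TPS
    }
}
```

So the second run works out to roughly 220,000 to 250,000 TPS, which is in line with the figure of about 230,000.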

Can we achieve better TPS than this, such as 1M TPS or 10M TPS?

Thanks,
Nitin

Hi Nitin,

We would need to look at various factors as to what is presently the limiting factor for your setup. It could be CPU bound (or even client bound).

You will have to look at the top output while you are doing the inserts to understand the bottleneck. It might also be possible that the clients are not able to push enough.

Also, did you move the client to a different machine? Please note that if you are still running both on the same box, resources will be heavily constrained. Quoting the 1M TPS blog:

Enough clients to generate 1 M transactions per second.

This is something many people underestimate. It is actually hard to generate 1 M transactions per second on the client side. In general, we find that you will need 4 hosts with specs like the server above to create that much traffic. If you have more basic hosts, you may need 10 or more to generate that much traffic. These client hosts should not be the Aerospike server hosts.

One thing you should also check on is your networking. This is the most common reason for bottlenecks as you try to get to 1 Mtps on a single machine. This is usually not due to bandwidth, but to the number of transactions per second. So you will not be able to easily see this using standard networking tools.

As Anshu mentioned, you should check the output of “top”. When you are running top, try hitting “1” and see what the load distribution is.

  • If you are balanced and using all the cores at a high level, you are CPU bound. This implies there is nothing more you can do on that server.
  • What is more likely is that you are bound on one or two cores in the “si” column. If you see any core with more than 30% si, you have likely maxed out the network. Check whether your NIC has multiple network queues by looking at the file “/proc/interrupts”. Look for the interrupts that include the network device id (like “eth0”) and see if there are interrupts like this: “eth0-fp-1”. If there are many of these for a single network device, you may be able to balance them across CPU cores. Please let us know if this is the situation and we can help you with the balancing. Aerospike did package a script called the afterburner for this, but it is only good for certain situations.
  • If you are bottlenecked on something else (like CPU wait), then this will also be of interest. It should not be occurring with your configuration.
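The “/proc/interrupts” check described above can be sketched in Java as follows. This is only an illustration: the class name NicQueueCheck is made up, the device name defaults to “eth0” as in the example above, and the parsing assumes each interrupt's name appears as a whitespace-separated token on its line, which can vary between drivers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists interrupt names belonging to one network device.
class NicQueueCheck {

    /**
     * Returns the interrupt names that are either exactly the device name
     * (e.g. "eth0") or a per-queue name starting with it (e.g. "eth0-fp-1").
     */
    static List<String> queueIrqs(List<String> lines, String device) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                // Shared interrupts may be listed as "name1, name2"; drop the comma.
                String name = token.endsWith(",") ? token.substring(0, token.length() - 1) : token;
                if (name.equals(device) || name.startsWith(device + "-")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String device = args.length > 0 ? args[0] : "eth0";
        List<String> irqs = queueIrqs(Files.readAllLines(Paths.get("/proc/interrupts")), device);
        System.out.println(device + " interrupt entries: " + irqs.size());
        for (String name : irqs) {
            System.out.println("  " + name);
        }
        // More than one entry suggests a multi-queue NIC whose interrupts
        // could be spread across CPU cores.
    }
}
```

If only a single entry shows up for the device, all network interrupts are being handled through one queue, which fits the pattern of one or two cores showing high si.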

Thanks Anshu and Bayoukingpin.

I am still running the client on the same machine. After running ‘top’ and pressing ‘1’, I found that none of the cores went above 47-48% in the ‘us’ column, and ‘si’ was below 20% for all cores. Since I am running on the same machine, there should not be any network delays or issues related to NIC queues.

We analysed GigaSpace on the same configuration earlier and it gave better numbers: 2.6 seconds for writing and 1.9 seconds for reading 1M records.

I have one question: if there is some limitation on writing the data, why can't reads achieve better numbers?

If you see that there is spare CPU left, it means you are not saturating the Aerospike server. This typically happens when the client becomes the bottleneck. For this reason, we run the load from multiple client machines in our benchmarks. When we reach peak throughput, total CPU utilization is around 90%, including system and si. In other words, idle CPU should be below 10%.

It seems you are running both client and server on the same machine. In this mode you will not get the maximum throughput of Aerospike on that box. To realize the full potential of Aerospike, you should run the clients from a different box.
