I’m using the Python client and was wondering whether my calculations are right, and about performance:
We need “sorted set”-style functionality: we insert values under a key and need to query them sorted by value. (We want to get the top X, and each insert should keep the set sorted by value.)
We went ahead with the “map” type. (The total number of objects in my map can reach 50K–1M.)
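A minimal sketch of that approach, assuming the standard `aerospike` package and its `aerospike_helpers` map operations (the namespace, set, and bin names below are placeholders):

```python
import aerospike
from aerospike_helpers.operations import map_operations as map_ops

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
key = ("test", "demo", "leaderboard")  # placeholder namespace/set/key

# Keep the map (key, value)-ordered so rank queries stay cheap server-side.
map_policy = {"map_order": aerospike.MAP_KEY_VALUE_ORDERED}

# Upsert one user's score; the server keeps the map ordered.
client.operate(key, [map_ops.map_put("scores", "user:1234", 98.5, map_policy)])

# Top X: the X highest-ranked entries live at ranks -X .. -1.
X = 10
_, _, bins = client.operate(
    key,
    [map_ops.map_get_by_rank_range("scores", -X, X, aerospike.MAP_RETURN_KEY_VALUE)],
)
print(bins["scores"])  # list of (userIdentifier, score) pairs

client.close()
```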
I know records in Aerospike have a size limit of 10MB, so I wonder if my calculation is correct:
Each “row” will map my userIdentifier to a score (float).
So: float = 4 bytes (right?) + userIdentifier (string, up to 50 bytes).
So for one map I’m limited to:
10MB (10,485,760 bytes) / (50 + 4) ≈ 194,180
So I’m limited to roughly 194K objects in one map?
If that is the case, then I guess my option is to “bucket” (split across multiple keys). But then I have a performance problem: I need to insert into multiple keys, and Python seems to handle that case very badly (or maybe I’m doing something wrong in my code, even though I’m using asyncio).
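For what it’s worth, a rough sketch of one way the bucketing could look, hashing the user identifier into a fixed number of records and merging the per-bucket top X client-side (`NUM_BUCKETS` and all names here are made up):

```python
import hashlib

import aerospike
from aerospike import exception as aero_ex
from aerospike_helpers.operations import map_operations as map_ops

NUM_BUCKETS = 64  # sized so each bucket stays well under the record limit
MAP_POLICY = {"map_order": aerospike.MAP_KEY_VALUE_ORDERED}

def bucket_key(user_id):
    # Stable hash; the built-in hash() is salted per process, so avoid it.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return ("test", "demo", "leaderboard:%d" % (h % NUM_BUCKETS))

def put_score(client, user_id, score):
    client.operate(bucket_key(user_id),
                   [map_ops.map_put("scores", user_id, score, MAP_POLICY)])

def top_x(client, x):
    # Take the top X of every bucket, then merge client-side.
    candidates = []
    for b in range(NUM_BUCKETS):
        key = ("test", "demo", "leaderboard:%d" % b)
        op = map_ops.map_get_by_rank_range(
            "scores", -x, x, aerospike.MAP_RETURN_KEY_VALUE)
        try:
            _, _, bins = client.operate(key, [op])
            candidates.extend(bins["scores"])
        except aero_ex.RecordNotFound:
            pass  # empty bucket
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)[:x]
```

Note that the write path still touches exactly one record per insert, so bucketing mainly costs you on the read path (one request per bucket, which can be batched or parallelized).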
Regarding the calculation: the max record size would actually be 8MiB (based on the configured write-block-size). Also, don’t forget the extra overhead to account for, as detailed in the Capacity Planning doc. But that shouldn’t matter much here, since you already know you cannot fit all entries in a single record.
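For example, redoing the back-of-the-envelope math with the 8MiB budget (still ignoring the per-entry serialization and index overhead mentioned above):

```python
record_budget = 8 * 1024 * 1024     # 8MiB = 8,388,608 bytes
entry_size = 50 + 4                 # userIdentifier (up to 50 B) + float estimate
print(record_budget // entry_size)  # 155344 entries per record, before overhead
```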
Regarding insertion speed, I am not sure what the baseline for the Python client is, but I would hope it is not horrible. It seems there is an example of using multiple threads in the file below, which may help:
We don’t have specific guidance on how to make asyncio work (better) with the Python client. You may want to experiment with multiple threads (or even multiple processes) in your application to get better throughput; whether and how much that helps will depend on your workload. Just as a very rough reference, the simple workload above yielded tens of thousands of TPS with multiple threads in a quick test on low-end hardware.
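As one starting point, a rough sketch of fanning writes out over a thread pool; it assumes a single client instance shared across threads (the client is thread-safe) and uses placeholder hosts, names, and counts:

```python
from concurrent.futures import ThreadPoolExecutor

import aerospike
from aerospike_helpers.operations import map_operations as map_ops

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
MAP_POLICY = {"map_order": aerospike.MAP_KEY_VALUE_ORDERED}

def write_one(i):
    # One map_put per call; i stands in for a real (user, score) pair.
    key = ("test", "demo", "leaderboard:%d" % (i % 64))
    client.operate(key, [map_ops.map_put("scores", "user:%d" % i,
                                         float(i), MAP_POLICY)])

# Tune max_workers for your hardware; measure rather than guess.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(write_one, range(100000)))

client.close()
```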