I’m using the Python client and was wondering whether my calculations are right, and about performance:
We need “sorted set”-style functionality: we insert values under a key and need to query them sorted by value. (We want to get the top X, and each insert should keep the set sorted by value.)
We went ahead with the “map” type. (The total number of objects in my map can reach 50K–1M.)
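A minimal sketch of that approach, assuming the standard `aerospike` package and its `aerospike_helpers` map operations (the namespace, set, and bin names below are placeholders):

```python
import aerospike
from aerospike_helpers.operations import map_operations as map_ops

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
key = ("test", "demo", "leaderboard")  # placeholder namespace/set/key

# Keep the map (key, value)-ordered so rank queries stay cheap server-side.
map_policy = {"map_order": aerospike.MAP_KEY_VALUE_ORDERED}

# Upsert one user's score; the server keeps the map ordered.
client.operate(key, [map_ops.map_put("scores", "user:1234", 98.5, map_policy)])

# Top X: the X highest-ranked entries live at ranks -X .. -1.
X = 10
_, _, bins = client.operate(
    key,
    [map_ops.map_get_by_rank_range("scores", -X, X, aerospike.MAP_RETURN_KEY_VALUE)],
)
print(bins["scores"])  # list of (userIdentifier, score) pairs

client.close()
```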
I know records in Aerospike have a size limit of 10MB, so I wonder if my calculation is correct:
Each “row” will map my userIdentifier to a score (float).
So: float = 4 bytes (right?) + userIdentifier (string, up to 50 bytes).
So for one map I’m limited to:
10MB (10,485,760 bytes) / (50 + 4) ≈ 194,180
So I’m limited to roughly 194K objects in one map?
If that is the case, then I guess my option is to “bucket” (split across multiple keys). But then I have a performance problem: I need to insert into multiple keys, and Python seems to handle that case very badly (or maybe I’m doing something wrong in my code, even though I’m using asyncio).
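For what it’s worth, a rough sketch of one way the bucketing could look, hashing the user identifier into a fixed number of records and merging the per-bucket top X client-side (`NUM_BUCKETS` and all names here are made up):

```python
import hashlib

import aerospike
from aerospike import exception as aero_ex
from aerospike_helpers.operations import map_operations as map_ops

NUM_BUCKETS = 64  # sized so each bucket stays well under the record limit
MAP_POLICY = {"map_order": aerospike.MAP_KEY_VALUE_ORDERED}

def bucket_key(user_id):
    # Stable hash; the built-in hash() is salted per process, so avoid it.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return ("test", "demo", "leaderboard:%d" % (h % NUM_BUCKETS))

def put_score(client, user_id, score):
    client.operate(bucket_key(user_id),
                   [map_ops.map_put("scores", user_id, score, MAP_POLICY)])

def top_x(client, x):
    # Take the top X of every bucket, then merge client-side.
    candidates = []
    for b in range(NUM_BUCKETS):
        key = ("test", "demo", "leaderboard:%d" % b)
        op = map_ops.map_get_by_rank_range(
            "scores", -x, x, aerospike.MAP_RETURN_KEY_VALUE)
        try:
            _, _, bins = client.operate(key, [op])
            candidates.extend(bins["scores"])
        except aero_ex.RecordNotFound:
            pass  # empty bucket
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)[:x]
```

Note that the write path still touches exactly one record per insert, so bucketing mainly costs you on the read path (one request per bucket, which can be batched or parallelized).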
Regarding the calculation: the max record size would actually be 8MiB (based on the configured write-block-size). Also, don’t forget the extra overhead to account for, as detailed in the Capacity Planning doc. But that shouldn’t matter much here, since you already know you cannot fit all entries in a single record.
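For example, redoing the back-of-the-envelope math with the 8MiB budget (still ignoring the per-entry serialization and index overhead mentioned above):

```python
record_budget = 8 * 1024 * 1024     # 8MiB = 8,388,608 bytes
entry_size = 50 + 4                 # userIdentifier (up to 50 B) + float estimate
print(record_budget // entry_size)  # 155344 entries per record, before overhead
```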
Regarding insertion speed, I am not sure what the baseline for the Python client is, but I would hope it is not horrible. It seems there is an example of using multiple threads in the file below, which may help:
We don’t have specific guidance on how to make asyncio work (better) with the Python client. You may want to experiment with multiple threads (or even multiple processes) in your application to get better throughput; whether and how much that helps will depend on your workload. Just as a very rough reference, the simple workload above yielded tens of thousands of TPS with multiple threads in a quick test on low-end hardware.
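As one starting point, a rough sketch of fanning writes out over a thread pool; it assumes a single client instance shared across threads (the client is thread-safe) and uses placeholder hosts, names, and counts:

```python
from concurrent.futures import ThreadPoolExecutor

import aerospike
from aerospike_helpers.operations import map_operations as map_ops

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
MAP_POLICY = {"map_order": aerospike.MAP_KEY_VALUE_ORDERED}

def write_one(i):
    # One map_put per call; i stands in for a real (user, score) pair.
    key = ("test", "demo", "leaderboard:%d" % (i % 64))
    client.operate(key, [map_ops.map_put("scores", "user:%d" % i,
                                         float(i), MAP_POLICY)])

# Tune max_workers for your hardware; measure rather than guess.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(write_one, range(100000)))

client.close()
```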