Bulk import in Aerospike

by evinet » Sat Jun 22, 2013 3:34 am

Hey, I’d like to perform a “bulk import” from a JSON file into an Aerospike namespace/set. I followed the documentation to do that, but performance is very poor: at most 201 tps, where MongoDB can reach up to 47,000 tps, on a server with 8 CPUs, 64 GB of RAM and an SSD. I know I’m missing something, but as an Aerospike noob I prefer to ask why. The 201 tps figure comes from the aerospike_bench project; with my own code I only get 10 to 20 tps (very bad!). I’d like to know the best practice for a “bulk import”. I saw a compute instruction for object computation, but the documentation on it is very thin. Could someone help me, please?

Thanks a lot.

Emmanuel

by young » Sat Jun 22, 2013 8:37 am

You are correct, 201 tps is very slow. Could you send us your configuration file, “/etc/citrusleaf/citrusleaf.conf”? Which instructions did you follow to load the data?

by evinet » Sun Jun 23, 2013 12:40 am

Hi Young, I have no access to my server config until tomorrow because of an SSH problem I have to solve at the moment. I used the system info example to check the remote server configuration and obtained the following information:

2013-06-23 09:32:42 CEST INFO ServerInfo Begin
Server Configuration
services=null
node=BB98E80A2902500
cluster-generation=0
features=as_msg;replicas-read;replicas-prole;replicas-write;replicas-master;cluster-generation;partition-info;partition-generation
partition-generation=6
build=2.6.7
cluster_size=1
cluster_key=7A122113A5BEF71C
cluster_integrity=true
objects=24543
used-bytes-disk=13501952
total-bytes-disk=85899345920
free-pct-disk=99
used-bytes-memory=5505032
total-bytes-memory=12884901888
free-pct-memory=99
stat_read_reqs=1510
stat_read_success=2
stat_read_errs_notfound=1508
stat_read_errs_other=0
stat_read_latency_gt50=0
stat_read_latency_gt100=0
stat_read_latency_gt250=0
stat_write_reqs=36961
stat_write_reqs_xdr=0
stat_write_success=36961
stat_write_errs=0
stat_write_latency_gt50=0
stat_write_latency_gt100=0
stat_write_latency_gt250=0
stat_rw_timeout=0
stat_proxy_reqs=0
stat_proxy_reqs_xdr=0
stat_proxy_success=0
stat_proxy_errs=0
stat_proxy_latency_gt50=0
stat_proxy_latency_gt100=0
stat_proxy_latency_gt250=0
stat_expired_objects=0
stat_evicted_objects=0
stat_deleted_set_objects=0
stat_evicted_set_objects=0
stat_evicted_objects_time=0
stat_single_bin_records=0
stat_zero_bin_records=0
stat_zero_bin_records_read=0
stat_nsup_deletes_not_shipped=0
err_tsvc_requests=0
err_out_of_space=0
err_duplicate_proxy_request=0
err_rw_request_not_found=6
err_rw_pending_limit=0
err_rw_cant_put_unique=0
err_write_empty_writes=0
err_rcrb_reduce_gt5=0
err_rcrb_reduce_gt50=0
err_rcrb_reduce_gt100=0
err_rcrb_reduce_gt250=0
fabric_msgs_sent=0
fabric_msgs_rcvd=0
migrate_msgs_sent=0
migrate_msgs_recv=0
migrate_progress_send=0
migrate_progress_recv=0
migrate_num_incoming_accepted=0
migrate_num_incoming_refused=0
queue=0
transactions=40433
reaped_fds=0
scan_initiate=0
tscan_initiate=20
scan_pending=0
tscan_pending=0
batch_initiate=0
batch_queue=0
batch_tree_count=0
batch_timeout=0
batch_errors=0
info_queue=0
proxy_initiate=0
proxy_action=0
proxy_retry=0
proxy_retry_q_full=0
proxy_unproxy=0
proxy_retry_same_dest=0
proxy_retry_new_dest=0
write_master=38471
write_prole=0
read_dup_master=0
read_dup_prole=0
rw_err_dup_internal=0
rw_err_dup_cluster_key=0
rw_err_dup_send=0
rw_err_dup_write_internal=0
rw_err_dup_write_cluster_key=0
rw_err_write_internal=0
rw_err_write_cluster_key=0
rw_err_write_send=0
rw_err_ack_internal=0
rw_err_ack_nomatch=0
rw_err_ack_badnode=0
client_connections=2
waiting_transactions=0
tree_count=0
record_refs=24543
record_locks=0
migrate_tx_objs=0
migrate_rx_objs=0
write_reqs=0
storage_queue_full=0
storage_queue_delay=0
partition_actual=12288
partition_replica=0
partition_desync=0
partition_absent=0
partition_object_count=24543
partition_ref_count=12288
system_free_mem_pct=99
system_swapping=false
err_replica_null_node=0
err_replica_non_null_node=0
err_sync_copy_null_node=0
err_sync_copy_null_master=0
storage_defrag_records=0
err_storage_defrag_fd_get=0
storage_defrag_seek=0
storage_defrag_read=0
storage_defrag_bad_magic=0
storage_defrag_sigfail=0
storage_defrag_corrupt_record=0
storage_defrag_wait=0
err_write_fail_prole_unknown=0
err_write_fail_prole_generation=0
err_write_fail_unknown=0
err_write_fail_key_exists=0
err_write_fail_generation=0
err_write_fail_generation_xdr=0
err_write_fail_bin_exists=0
err_write_fail_parameter=0
err_write_fail_noxdr=0
err_write_fail_prole_delete=0
stat_duplicate_operation=0
uptime=923764
stat_write_errs_notfound=0
stat_write_errs_other=0
stat_leaked_wblocks=0
heartbeat_received_self=6142730
heartbeat_received_foreign=0
version=Aerospike 2.0

Namespace Configuration
type=device
objects=4495
expired-objects=0
evicted-objects=0
set-deleted-objects=0
set-evicted-objects=0
memory-size=4294967296
used-bytes-memory=588745
total-bytes-memory=4294967296
free-pct-memory=99
max-void-time=112176417
min-evicted-ttl=0
max-evicted-ttl=0
current-time=109668757
stop-writes=false
lwm-breached=true
hwm-breached=false
repl-factor=1
default-ttl=2592000
max-ttl=0
conflict-resolution-policy=generation
allow_versions=false
single-bin=false
enable-xdr=false
disallow-null-setname=false
available-bin-names=32763
low-water-pct=0
high-water-disk-pct=50
high-water-memory-pct=60
stop-writes-pct=90
evict-tenths-pct=5
cold-start-evict-ttl=0
used-bytes-disk=2316800
total-bytes-disk=42949672960
free-pct-disk=99
defrag-period=120
defrag-max-blocks=4000
defrag-lwm-pct=50
write-smoothing-period=0
defrag-startup-minimum=10
data-in-memory=true
load-at-startup=true
file=/opt/citrusleaf/data/test.data
filesize=42949672960
writethreads=1
writecache=67108864
obj-size-hist-max=100
available_pct=99
2013-06-23 09:32:42 CEST INFO ServerInfo End
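
A dump like the one above is easier to check programmatically than by eye. The following is a minimal Java sketch, not part of the Aerospike tooling: the names `InfoStats`, `parse` and `asLong` are made up for illustration, and it assumes every statistic sits on its own `name=value` line. Names that occur in both sections (for example `objects`) keep the last value seen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InfoStats {
    /** Parses "name=value" lines into an ordered map; lines without '=' are skipped. */
    static Map<String, String> parse(String dump) {
        Map<String, String> stats = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                // Later duplicates overwrite earlier ones.
                stats.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return stats;
    }

    /** Reads one statistic as a number; throws if it is missing or not numeric. */
    static long asLong(Map<String, String> stats, String name) {
        return Long.parseLong(stats.get(name));
    }

    public static void main(String[] args) {
        // A few lines copied from the dump above.
        String dump = String.join("\n",
            "Server Configuration",
            "stat_write_reqs=36961",
            "stat_write_success=36961",
            "stat_write_errs=0",
            "stat_write_latency_gt50=0");
        Map<String, String> stats = parse(dump);
        long failed = asLong(stats, "stat_write_reqs") - asLong(stats, "stat_write_success");
        System.out.println("failed writes: " + failed);
        System.out.println("stat_write_latency_gt50: " + asLong(stats, "stat_write_latency_gt50"));
    }
}
```

In the dump above, stat_write_reqs and stat_write_success are both 36961 and stat_write_errs is 0, so the server itself reports no failed writes.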

Here is the tps info from the benchmark tests:

MBP-EV:benchmarks emmanuelvinet$ ./run_benchmarks -h poc-1.ezakus.net -p 3000 -n test -k 100000000 -l 30 -s 1 -o S:50 -w RU,10 -z 20
Benchmark: poc-1.ezakus.net:3000, namespace: test, num keys: 100000000, threads 20, read-write ratio: 10/90
2013-06-22 10:05:50.301 INFO Thread 1 Add node BB98E80A2902500 176.31.235.209:3000
2013-06-22 10:05:50.941 write(tps=0 fail=0) read(tps=0 fail=0) total(tps=0 fail=0)
2013-06-22 10:05:51.942 write(tps=137 fail=0) read(tps=14 fail=0) total(tps=151 fail=0)
2013-06-22 10:05:52.944 write(tps=211 fail=0) read(tps=10 fail=0) total(tps=221 fail=0)
2013-06-22 10:05:53.945 write(tps=164 fail=0) read(tps=34 fail=0) total(tps=198 fail=0)
2013-06-22 10:05:54.946 write(tps=182 fail=0) read(tps=8 fail=0) total(tps=190 fail=0)
2013-06-22 10:05:55.947 write(tps=174 fail=0) read(tps=50 fail=0) total(tps=224 fail=0)
2013-06-22 10:05:56.949 write(tps=192 fail=0) read(tps=30 fail=0) total(tps=222 fail=0)
2013-06-22 10:05:57.950 write(tps=181 fail=0) read(tps=45 fail=0) total(tps=226 fail=0)
2013-06-22 10:05:58.951 write(tps=182 fail=0) read(tps=17 fail=0) total(tps=199 fail=0)
2013-06-22 10:05:59.953 write(tps=185 fail=0) read(tps=17 fail=0) total(tps=202 fail=0)
2013-06-22 10:06:00.954 write(tps=160 fail=0) read(tps=42 fail=0) total(tps=202 fail=0)
2013-06-22 10:06:01.955 write(tps=189 fail=0) read(tps=22 fail=0) total(tps=211 fail=0)
2013-06-22 10:06:02.956 write(tps=204 fail=0) read(tps=16 fail=0) total(tps=220 fail=0)
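
To turn the per-second lines above into a single figure, the `total(tps=...)` samples can be averaged. This is only a sketch: `BenchTps` and `meanTotalTps` are hypothetical names, the line format is assumed to be exactly the one printed above, and samples of 0 (the first tick, before any work is done) are skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BenchTps {
    // Matches the "total(tps=NNN" part of each benchmark line.
    private static final Pattern TOTAL = Pattern.compile("total\\(tps=(\\d+)");

    /** Mean of all non-zero total(tps=...) samples found in the log; 0.0 if there are none. */
    static double meanTotalTps(String log) {
        Matcher m = TOTAL.matcher(log);
        long sum = 0;
        int samples = 0;
        while (m.find()) {
            long tps = Long.parseLong(m.group(1));
            if (tps > 0) {
                sum += tps;
                samples++;
            }
        }
        return samples == 0 ? 0.0 : (double) sum / samples;
    }

    public static void main(String[] args) {
        // First three lines of the output above.
        String log = String.join("\n",
            "2013-06-22 10:05:50.941 write(tps=0 fail=0) read(tps=0 fail=0) total(tps=0 fail=0)",
            "2013-06-22 10:05:51.942 write(tps=137 fail=0) read(tps=14 fail=0) total(tps=151 fail=0)",
            "2013-06-22 10:05:52.944 write(tps=211 fail=0) read(tps=10 fail=0) total(tps=221 fail=0)");
        System.out.println("mean total tps: " + meanTotalTps(log));
    }
}
```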

I’ll grab some more information as soon as I’m able to reconnect to the server.

Thanks a lot. Emmanuel

by evinet » Sun Jun 23, 2013 12:44 am

In addition to my previous post, this is the part of the code I’m using to insert data into the set:

// Needs java.util.ArrayList, JSONObject (org.json, judging by getNames/getString), and
// AerospikeException, Bin, Key and policy.WritePolicy from com.aerospike.client.
public void insert(JSONObject json){
        // Build one record from the JSON document: "_id" gives the key, every other field becomes a bin.
        ArrayList<Bin> binList = new ArrayList<Bin>();
        Key key = null;
        try {
            if (client != null) {
                if (policy == null){
                    policy = new WritePolicy();
                    policy.timeout = 50; // 50 millisecond timeout.
                }

                String[] fields = JSONObject.getNames(json);

                for (int i = 0; i < fields.length; i++)
                {
                    if (fields[i].equals("_id")) {
                        try {
                            String uuid = getUUIDfromEncoded(json.getString(fields[i]));
                            key = new Key(NAMESPACE, SETNAME, uuid);
                        } catch (AerospikeException e) {
                            e.printStackTrace();
                        }
                    } else {
                        binList.add(new Bin(fields[i], json.getString(fields[i])));
                    }
                }

                // Without a usable "_id" there is no key to write under.
                if (key == null) {
                    System.out.println("INSERT ERROR (no key) : " + json.toString());
                    return;
                }

                // Copy the list into the array form expected by put().
                Bin[] binArray = binList.toArray(new Bin[binList.size()]);

                try {
                    client.put(policy, key, binArray);
                    //Operation operations = null;
                    // TODO : define some operations
                    //client.operate(policy, key, operations);
                } catch (AerospikeException e) {
                    e.printStackTrace();
                    System.out.println("INSERT ERROR : " + json.toString());
                }
            }
        } catch (Exception e) {
            System.out.println("BAD FORMAT : " + json.toString());
            this.exitWithError("Bad JSON format: " + e.getMessage() );
        }
    }
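
Separately from whatever limits the individual write, the method above issues one blocking put per call, so a single caller thread can never exceed one write per round trip. A common way to drive more load is to call it from several threads. The sketch below shows only that pattern and is not taken from the Aerospike documentation: `ParallelLoader` and `load` are hypothetical names, the `writer` argument stands in for a call such as `insert(json)`, and it assumes the client instance is safe to share across threads.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class ParallelLoader {
    /**
     * Calls writer.accept(record) for every record on a fixed pool of worker threads
     * and returns how many calls finished without throwing.
     */
    static <T> long load(List<T> records, Consumer<T> writer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong succeeded = new AtomicLong();
        for (T record : records) {
            pool.execute(() -> {
                try {
                    writer.accept(record);      // e.g. record -> insert(record)
                    succeeded.incrementAndGet();
                } catch (RuntimeException e) {
                    System.err.println("INSERT ERROR : " + record + " : " + e.getMessage());
                }
            });
        }
        pool.shutdown();                        // no new tasks; queued ones still run
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return succeeded.get();
    }

    public static void main(String[] args) {
        // Stand-in writer that only simulates a slow call.
        long ok = load(List.of("a", "b", "c", "d"), r -> {
            try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, 2);
        System.out.println("records written: " + ok);
    }
}
```

All records are queued up front here, which is fine for a sketch but would need a bounded queue or batching for a very large file.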

by young » Tue Jun 25, 2013 5:26 pm

Thank you for the code. We believe the problem may be due to an encoding bug. We are testing a fix now and will let you know when this has been released.