Total Number of records from aql & fetched through query do not match

aql

#1

Hi,

I am fetching all the records from Aerospike using a query on the primary key. I define the query with my namespace and set, and then project the bins using the desired bin names.

I then count the total number of records fetched by that query.

If I run show sets in aql in parallel, I can see n_objects for the corresponding set_name. I am assuming that this is the number of records in that set. Is my assumption correct?

The count of records fetched through the query does not match the n_objects value shown for that set_name in aql.

Am I doing something wrong in my implementation? Please give me any pointers if possible.

Example: we have namespace : test, set: aero.

  1. If I do “show sets” in aql, I see n_objects = 6M
  2. If I do “select * from test.aero” on one of the Aerospike nodes, I get 3M rows returned
  3. If I run the query from the Java Aerospike client with only one node's IP address added as host, I get 9M records

I am not sure how to check the actual number of records.

Note: We are running an Aerospike cluster with 3 nodes and a replication factor of 2, with no sharding.


#2

Hi

Can you post your code?

Peter


#3

Thanks Peter !!

Please find the code below:

 public long postAllListOfRecords(List<XYZ> machinesToPostOnHttp, String nameSpaceName, String setName, String path,
			String... binNames) {
		Statement statement = new Statement();
		statement.setNamespace(nameSpaceName);
		statement.setSetName(setName);
		statement.setBinNames(binNames);
		List<String> recordsList = new ArrayList<>();
		String rec = "";
		long totalCount = 0;
		try {
			RecordSet rs = aerospikeClient.query(null, statement);
			try {
				Record record = null;
				while (rs != null && rs.next()) {
					record = rs.getRecord();
					if (record != null) {
						rec = (String) record.getValue(AerospikeConstants.RECORD_BIN2_NAME);
					}
					if (LOG.isDebugEnabled())
						LOG.debug(
								"Record [{}] has been fetched from Aerospike: namespace [{}] Set-Name [{}] bin names [{}]",
								record, nameSpaceName, setName, Arrays.toString(binNames));
					if (StringUtils.isNotEmpty(rec))
						recordsList.add(rec);
					if (recordsList.size() >= 1000) {
						for (XYZ xyz : machinesToPostOnHttp) {
							postListOfRecords(xyz, recordsList, path);
							if (LOG.isInfoEnabled()) {
								LOG.info("[{}] type record is posted in size of [{}]", setName, 1000);
							}
						}
						totalCount += recordsList.size();
						recordsList = new ArrayList<>();
					}
				}
			} finally {
				if (rs != null)
					rs.close();
			}
		} catch (AerospikeException expected) {
			if (LOG.isErrorEnabled())
				LOG.error(
						"Not able to fetch list of Records: namespace [{}] Set-Name [{}] bin names [{}] in aerospike",
						nameSpaceName, setName, Arrays.toString(binNames), expected);
			throw new InternalServerException("Something bad happened with Aerospike", expected);
		}
		if (recordsList.size() > 0) {
			for (XYZ xyz : machinesToPostOnHttp) {
				postListOfRecords(xyz, recordsList, path);
			}

			totalCount += recordsList.size();

		}

		if (LOG.isInfoEnabled()) {
			LOG.info("[{}] type record is posted in total size of [{}]", setName, totalCount);
		}
		return totalCount;
	}

#4

Not able to get better formatting. So I have shared code here as well.


#5

HI,

Sorry for taking so long to answer you.

When you use AQL and show sets you will see a result like this:

aql> show sets
+-----------+----------------+----------------------+---------+----------+------------+---------------------+
| n_objects | set-enable-xdr | set-stop-write-count | ns_name | set_name | set-delete | set-evict-hwm-count |
+-----------+----------------+----------------------+---------+----------+------------+---------------------+
| 89522     | "use-default"  | 0                    | "test"  | "users"  | "false"    | 0                   |
+-----------+----------------+----------------------+---------+----------+------------+---------------------+
1 row in set (0.001 secs)
OK

The n_objects statistic is a count of the number of master AND replica records in a set. So if you have a replication factor of 2 and 3 million records, you will see 6 million n_objects.
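
To make that arithmetic concrete, here is a minimal sketch. It assumes the cluster is stable (no migrations in progress) and that each record is stored in min(replication factor, node count) copies. The class and method names are made up for illustration and are not part of the Aerospike client.

```java
// Hypothetical helper, not an Aerospike API: converts a cluster-wide
// n_objects total (masters + replicas) into an estimate of unique records.
final class SetObjectCount {

    static long estimateUniqueRecords(long totalObjects, int replicationFactor, int nodeCount) {
        if (replicationFactor < 1 || nodeCount < 1) {
            throw new IllegalArgumentException("replicationFactor and nodeCount must be >= 1");
        }
        // Assumption: a cluster cannot hold more copies of a record than it has nodes.
        int copiesPerRecord = Math.min(replicationFactor, nodeCount);
        return totalObjects / copiesPerRecord;
    }

    public static void main(String[] args) {
        // Numbers from the original question: 6M n_objects, replication factor 2, 3 nodes.
        System.out.println(estimateUniqueRecords(6_000_000L, 2, 3)); // prints 3000000
    }
}
```

With the numbers from the first post this gives 3M, which matches the row count returned by select * in aql.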

I hope this helps

Peter


#6

Thanks Peter !! This information is really helpful. But could you please review my Java implementation as well? Why does this Java implementation return more records? Am I doing something wrong here?


#7

I ran the benchmark on an Aerospike cluster with 3 nodes and a replication factor of 2, but I did not see the 1M records show up as 2M in n_objects. Am I doing something wrong?

Please see the script:

./run_benchmarks -h <ip-address-of-cluster> -p 3000 -n test -k 1000000 -S 1 -b 3 -o S:90 -o S:200 -o S:20 -R -w I -z 20 -latency 7,1 -maxRetries 5 -writeTimeout 2500

#8

Hi

I’ll work up a generic answer for you in Java.

Peter


#9

Hi

Here is Java code that gives you the record count, taking into account the replication count and the number of nodes in the cluster.

private static Logger log = Logger.getLogger(RecordCount.class);

. . .

// Counting records in a set using Info
Node[] nodes = client.getNodes();
int replicationCount = 2; 
int nodeCount = nodes.length;
int n_objects = 0;
for (Node node : nodes){
	// Invoke an info call to each node in the cluster and sum the n_objects value
	// The infoString will contain a result like this:
	// n_objects=100001:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
	String infoString = Info.request(node, "sets/test/users"); 
	String n_objectsString = infoString.substring(infoString.indexOf("=")+1, infoString.indexOf(":"));
	n_objects += Integer.parseInt(n_objectsString);
}
log.info(String.format("Total Master and Replica objects %d", n_objects));
log.info(String.format("Total Master objects %d", (nodeCount > 1) ? n_objects/replicationCount : n_objects));

I hope this helps

Peter


#10

Thanks Peter !! I don’t want to check the count of records in my java program.

My question was:

Count of all the records fetched from a set through aerospike-java-client query API != count of records shown as n_objects in aql > show sets != count of number of rows in aql > select * from dfm.<set_name>

I have shared my java code implementation as well. I am still not able to make any progress to get the answer for the above question.


#11

HI

I’ve simplified your code and included it in the following example:

	public void work() throws Exception {
		// Counting records in a set using Info
		Node[] nodes = client.getNodes();
		int replicationCount = 2; 
		int nodeCount = nodes.length;
		int n_objects = 0;
		for (Node node : nodes){
			// Invoke an info call to each node in the cluster and sum the n_objects value
			// The infoString will contain a result like this:
			// n_objects=100001:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
			String infoString = Info.request(node, "sets/test/users"); 
			String n_objectsString = infoString.substring(infoString.indexOf("=")+1, infoString.indexOf(":"));
			n_objects += Integer.parseInt(n_objectsString);
		}
		log.info(String.format("Total Master and Replica objects %d", n_objects));
		log.info(String.format("Total Master objects %d", (nodeCount > 1) ? n_objects/replicationCount : n_objects));
		
	}
	
	public void queryWork(){
		Statement statement = new Statement();
		statement.setNamespace("test");
		statement.setSetName("users");
		long totalCount = 0;
			RecordSet rs = client.query(null, statement);
			try {
				while (rs != null && rs.next()) {
					totalCount++;
					Record record = rs.getRecord();
				}
			} finally {
				if (rs != null)
					rs.close();
			}
			log.info(String.format(
					"Query Record count [%d]", totalCount));
	}
	
	public void scanWork(){
		final AtomicInteger count = new AtomicInteger(0);
		client.scanAll(null, "test", "users", new ScanCallback() {
			
			@Override
			public void scanCallback(Key key, Record record) throws AerospikeException {
				count.incrementAndGet();
				
			}
		}, "username", "password");
		log.info(String.format(
				"Scan Record count [%d]", count.longValue()));
	}

The method work() uses the info command to obtain the count of records in a set.

The method queryWork() does the same by reading each record as part of a query, and incrementing a simple counter.

The method scanWork() does the same using a scan and counting the records returned.

Here is the output:

Total Master and Replica objects 1001
Total Master objects 1001
Scan Record count [1001]
Query Record count [1001]

And here is the output from aql:

select * from test.users
. . .
1001 rows in set (0.502 secs)

As you can see, the count is the same. Are you sure that your count logic is correct?

Peter


#12

Thanks Peter for your effort !! I also ran an experiment where I loaded 4M records across different sets in an Aerospike cluster with 4 nodes. Then I queried all the records back, so I have the count of data loaded and the count of data fetched. The two counts are the same.

Then I checked the records using aql and the AMC console, but there the count was different. So I concluded that there is no bug in my code.


#13

Please note that in aql, the values are printed separately for each node. So for a 4 node cluster, you’d have to add all 4 rows of data to get the full count for the cluster, master+replicated.
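
As a rough sketch of that per-node addition: the code below sums n_objects over one info string per node. It assumes each string has the format shown in the comment in Peter's example (n_objects=...:set-stop-write-count=...;). The class and method names are hypothetical, and the sample strings in main are invented for the demo.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: sums n_objects over one "sets/<ns>/<set>" info string per node.
final class NObjectsSum {

    // Extracts the n_objects value from a single node's info string.
    static long parseNObjects(String setInfo) {
        for (String field : setInfo.split("[:;]")) {
            String[] kv = field.split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("n_objects")) {
                return Long.parseLong(kv[1].trim());
            }
        }
        throw new IllegalArgumentException("n_objects not found in: " + setInfo);
    }

    // Adds up the per-node values to get the cluster-wide master + replica total.
    static long sumAcrossNodes(List<String> perNodeInfo) {
        long total = 0;
        for (String info : perNodeInfo) {
            total += parseNObjects(info);
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented sample data, one string per node.
        List<String> perNode = Arrays.asList(
                "n_objects=100001:set-stop-write-count=0:set-delete=false;",
                "n_objects=99999:set-stop-write-count=0:set-delete=false;");
        System.out.println(sumAcrossNodes(perNode)); // prints 200000
    }
}
```

Looking the field up by name rather than by position means the parse does not depend on n_objects being the first field in the string.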


#14

Hi,

sorry for reanimating this thread but I’d like to make the object counting code more versatile.

Any idea how to query the actual replicationCount in work() during runtime?


#15

First, perform an info request to get the namespace configuration.

String str = Info.request(node, "get-config:context=namespace;id=<your namespace>"); 

Then, search str for “repl-factor=” and retrieve its value.
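
A minimal sketch of that search step, assuming the response is a semicolon-separated list of key=value pairs that includes a repl-factor entry. The class name, method name and sample string are made up for illustration. Check the actual response from your server version, since the key name may differ.

```java
// Hypothetical helper: pulls repl-factor out of a namespace config info string.
final class ReplFactorParser {

    static int parseReplFactor(String namespaceConfig) {
        for (String field : namespaceConfig.split(";")) {
            String[] kv = field.split("=", 2);
            // Exact key match, so a longer key that merely contains "repl-factor" is skipped.
            if (kv.length == 2 && kv[0].trim().equals("repl-factor")) {
                return Integer.parseInt(kv[1].trim());
            }
        }
        throw new IllegalArgumentException("repl-factor not found in namespace config");
    }

    public static void main(String[] args) {
        // Invented sample response, shortened for the demo.
        String str = "memory-size=4294967296;repl-factor=2;default-ttl=0";
        System.out.println(parseReplFactor(str)); // prints 2
    }
}
```

The returned value could then replace the hard-coded replicationCount in work().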