Map.merge doesn't work with nested maps [Resolved]


#1

It doesn’t seem that map.merge is working with nested maps.

How should we do the merge when our maps look like this:

{
  "kpiMap": {
    "KPI613": {
      "RT1167": 0,
      "CH1010280": 81200911,
      "CH1024141": 339799
    },
    "KPI608": {
      "RT1167": 0,
      "CH1010280": 1613843,
      "CH1024141": 20000
    }
  }
}

#2

I assume you mean merging two maps that look somewhat like that?


#3

For this example I am setting up a record with two bins (a, b), each containing a map similar to the one you’ve shown. Only one of the fields exists in both (KPI613):

aql> select * from test.demo where PK ='1'
+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| a                                                                                                                | b                                                                                                                |
+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| {"KPI613":{"RT1167":1, "CH1010280":10, "CH1024141":100}, "KPI608":{"RT1167":2, "CH1010280":20, "CH1024141":200}} | {"KPI613":{"RT1167":4, "CH1010280":40, "CH1024141":400}, "KPI715":{"RT1167":8, "CH1010280":80, "CH1024141":800}} |
+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+

I created a record UDF module called go.lua:


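-- Record UDF: merge the maps in bins 'a' and 'b', summing values of shared keys.
function try(rec)
    local a = rec['a']
    local b = rec['b']

    -- The callback decides how to combine values for keys present in both maps.
    return map.merge(a, b, function (v1, v2)
        if (type(v1) == "number" and type(v2) == "number") then
            -- Plain numbers: add them.
            return v1 + v2
        elseif (getmetatable(v1) == getmetatable(map()) and
                getmetatable(v2) == getmetatable(v1)) then
            -- Both values are maps: merge one level down, summing the numbers.
            return map.merge(v1, v2, function (n1, n2)
                debug("%d + %d = %d", n1, n2, (n1+n2))
                return n1 + n2
            end)
        end
        -- Any other type combination falls through and returns nil.
    end)
end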

Then I called it from a simple Python script:

from __future__ import print_function
import aerospike

config = {'hosts': [('192.168.119.3',3000)]}

client = aerospike.client(config).connect()
res = client.apply(('test', 'demo', '1'), 'go', 'try', [])
print(res)
client.close()

The result is as expected:

{'KPI608': {'CH1010280': 20, 'CH1024141': 200, 'RT1167': 2}, 'KPI715': {'CH1010280': 80, 'CH1024141': 800, 'RT1167': 8}, 'KPI613': {'CH1010280': 50, 'CH1024141': 500, 'RT1167': 5}}

As you can see, you need to think about the types, because Aerospike is schema-free. By default, map.merge() handles merging two maps with similarly named fields of the type number. If you want to handle other types, you’ll need to provide your own function. You know your data, so it’s up to you to write the appropriate code.

In my example I’m only checking for numeric (integer) and Map types. If your data is messy, you’ll also need to check for and handle strings, lists, etc.
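To make the type handling easier to follow outside the server, here is a rough plain-Python sketch of the same idea. It does not use the Aerospike API: `merge_maps`, `add_numbers` and `combine_values` are hypothetical helpers that stand in for map.merge() and the Lua callback, and the sketch assumes the callback is only consulted for keys present in both maps.

```python
def merge_maps(a, b, combine):
    """Union of two dicts; keys found in both are resolved by combine(v1, v2)."""
    merged = dict(a)
    for key, value in b.items():
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged


def add_numbers(n1, n2):
    return n1 + n2


def combine_values(v1, v2):
    # Numbers are summed; nested maps are merged one level down.
    if isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
        return v1 + v2
    if isinstance(v1, dict) and isinstance(v2, dict):
        return merge_maps(v1, v2, add_numbers)
    # Strings, lists, mixed types: decide explicitly instead of guessing.
    raise TypeError("unhandled value types: %r, %r" % (type(v1), type(v2)))


# Same sample values as bins a and b in the record above.
bin_a = {"KPI613": {"RT1167": 1, "CH1010280": 10, "CH1024141": 100},
         "KPI608": {"RT1167": 2, "CH1010280": 20, "CH1024141": 200}}
bin_b = {"KPI613": {"RT1167": 4, "CH1010280": 40, "CH1024141": 400},
         "KPI715": {"RT1167": 8, "CH1010280": 80, "CH1024141": 800}}

result = merge_maps(bin_a, bin_b, combine_values)
print(result)
```

Only KPI613 exists in both inputs, so it is the only key that reaches `combine_values`; KPI608 and KPI715 are copied through unchanged.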


#4

Thanks a lot Ronen for the provided solution.

I changed my code and put the map.merge into a separate function which I call recursively when the type of v1 and v2 are equal to the metatable for type map (same as your code). This seems to be working, at least I don’t get an error message any longer.
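The actual Lua for this recursive version was not posted in the thread, so the following is only a plain-Python sketch of the approach described here. `merge_recursive` is a hypothetical name, and the sample data is made up to resemble the kpiMap structure from the first post.

```python
def merge_recursive(v1, v2):
    """Merge two values: dicts are merged key by key, numbers are summed."""
    if isinstance(v1, dict) and isinstance(v2, dict):
        merged = dict(v1)
        for key, value in v2.items():
            if key in merged:
                # Same key on both sides: recurse, whatever the nesting depth.
                merged[key] = merge_recursive(merged[key], value)
            else:
                merged[key] = value
        return merged
    if isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
        return v1 + v2
    raise TypeError("unhandled value types: %r, %r" % (type(v1), type(v2)))


# Hypothetical sample data, nested three levels deep.
left = {"kpiMap": {"KPI613": {"RT1167": 1, "CH1010280": 10}}}
right = {"kpiMap": {"KPI613": {"RT1167": 4}, "KPI715": {"RT1167": 8}}}

merged = merge_recursive(left, right)
print(merged)
```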

My only problem now is that for some reason only part of the data is being merged. The provided filter should return 111 results, and in the aggregate function I do see these 111 results in the server log.

But in the reduce function I only see 15 results in the client log and hence the total sum is not correct.

Any idea what could be the reason for that?

BTW, if I add a count member and add

result["count"] = result["count"] + 1

in the aggregate function, then count is 111 as well at the end.


#5

In general, the purpose of your mapper should be to transform data from what you have in the bins of the record to a smaller data set, for example a map. Get rid of the bin data you’re not interested in, and transform anything else. The purpose of your reducer is to combine similar data coming out of the mapper into the same field.

This is why in my example above, the ‘reduction’ only has special handling for the intersection of those maps. I hope that clarifies how these work.
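As a rough illustration of that division of labor, here is a plain-Python sketch. It does not use the Aerospike stream API; the record layout, the bin names and the `mapper` and `reducer` functions are all hypothetical.

```python
from functools import reduce

# Hypothetical records: each has bins we do not care about plus a 'kpi' map.
records = [
    {"name": "r1", "region": "EU", "kpi": {"KPI613": 10, "KPI608": 1}},
    {"name": "r2", "region": "US", "kpi": {"KPI613": 20}},
    {"name": "r3", "region": "EU", "kpi": {"KPI715": 5}},
]


def mapper(record):
    # Input is a record; output is a small map with only the data of interest.
    return dict(record["kpi"])


def reducer(m1, m2):
    # Input is two maps (never records); values under the same key are summed.
    merged = dict(m1)
    for key, value in m2.items():
        merged[key] = merged.get(key, 0) + value
    return merged


total = reduce(reducer, map(mapper, records))
print(total)
```

Keys that appear in only one map pass through untouched; only keys shared between two maps need combining logic.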

Not much can be done to debug your problem without code and sample data, so please give an example for reproducing your issue.


#6

I also shared with you a reproducible example via Dropbox.

You can use it with the same data dump that I sent previously.

Just register the additional UDF calculateOthers.lua and execute the aggregation of the enclosed aql command.

Also included is a JSON dump of the corresponding data (SG3.json)

As shown in aql-result.txt, the sum for KPI613 and RT1167 is 121071943900, but summing up the corresponding individual values in SG3.json gives a total of 346648348799.

So for some reason the merge does not sum up all values.

Let me know if you need anything else.

TIA


#7

I’ve now replaced the map.merge with a custom routine to loop through the entire map and sum up values manually. The result is actually exactly the same as for the map.merge.

I figured out that for some reason only half of the data makes it into the reduce method at all. So in my example the filter results in 8 matching records, but only 4 of them are going into the reduce method.

What could be the reason for such behavior? Any help here would be highly appreciated.

Thanks


#8

Depending on your cluster size and the hardware of each node, you probably have multiple mappers running in parallel, each taking on a portion of the records matched by the query. The reducers don’t get records, they get maps which are the output of the mapper. Only filters and mappers have a record as their input.

Are you actually losing data or seeing unexpected results? Please provide a small set of data that can be used to reproduce the problem, and your Lua. I can try to debug it when I have some time.
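The data flow described above can be sketched in plain Python. This only illustrates why a reducer may be handed fewer items than there are records; it is not a diagnosis of the issue in this thread. The values, the chunking into groups of two and the function names are all made up.

```python
from functools import reduce

# Eight hypothetical records, each reduced to a single KPI value.
values = [3, 5, 7, 11, 13, 17, 19, 23]
records = [{"KPI613": v} for v in values]


def aggregate(acc, record):
    # Runs once per record and folds it into a partial result map.
    acc["count"] = acc.get("count", 0) + 1
    acc["KPI613"] = acc.get("KPI613", 0) + record["KPI613"]
    return acc


def reducer(m1, m2):
    # Runs on partial result maps, not on records.
    return {key: m1.get(key, 0) + m2.get(key, 0) for key in set(m1) | set(m2)}


# Assumption: the records are processed in four independent chunks.
chunks = [records[i:i + 2] for i in range(0, len(records), 2)]
partials = [reduce(aggregate, chunk, {}) for chunk in chunks]

total = reduce(reducer, partials)
print(len(records), len(partials), total)
```

Here `aggregate` sees all eight records, while `reducer` only ever sees the four partial maps. The totals still cover all eight records as long as the reducer combines every field of the partial maps.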


#9

It’s a single instance that I’m using for development, but nevertheless I would expect that the reducer sees the same amount of data as the mapper.

The result that I’m getting matches the four results I do see in the reducer, hence I assume that the other data is lost somehow.

I did provide data and sample code for you to reproduce the issue and shared it with you via Dropbox, here again:

You can use the data from the other example that I prepared for my other issue, "Commonly used library of functions":

Thanks


#10

Ronen, were you able to download the data?

Thanks


#11

Sorry about the delay, I had to push out the new Python 1.0.53 release. I’ll get to this soon.


#12

Ok, thanks a lot Ronen :smile:


#13

Were you able to look into this already? I’m really stuck and cannot understand why data is missing in the reduce step.


#14

No need to look into this any longer. In the meantime I’ve figured out the source of the problem myself.