Using large maps/tables in stream


#1

I’m trying to aggregate data by website and user id to come up with total visits and unique visits.

My idea was to keep a map/table of user_ids for each website, but that isn't producing the right counts.

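-- Called once per record: add a visit for the record's domain and count the
-- user as unique the first time their user_id is seen for that domain.
local function countData(mapRecord, rec)
  local domain = rec["domain"]
  local userid = rec["user_id"]
  local website = mapRecord[domain]
  if website == nil then
    website = map {visits = 0, uniques = 0, users = map{}}
  end

  if website.users[userid] == nil then
    website.users[userid] = 1
    website.uniques = website.uniques + 1
  end

  website.visits = website.visits + 1

  mapRecord[domain] = website
  return mapRecord
end

-- Combines two per-website entries for the same domain. Only the counters
-- are added together; the users maps are not combined.
local function mergeData(a, b)
  a.uniques = a.uniques + b.uniques
  a.visits = a.visits + b.visits
  return a
end

-- Merges two partial per-domain results, calling mergeData for domains
-- present in both.
local function reducer(a, b)
  return map.merge(a, b, mergeData)
end

function countImpressions(stream)
  return stream : aggregate(map(), countData) : reduce(reducer)
end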

As you can see, we get the same number back for visits and uniques:

{"demowebsite.com":{"uniques":4406, "visits":4406}, "mysite.com":{"uniques":6100, "events":6100}}
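To see what the counts should look like, the aggregate and reduce steps can be modeled outside the server. The sketch below is plain Lua, with ordinary tables standing in for Aerospike's `map` type (an assumption made so it runs under a stock interpreter); the sample records and the `size` helper are hypothetical. One design choice differs from the UDF above: the merge step unions the per-website `users` sets and recomputes `uniques` from the union instead of adding the two `uniques` counts, so a user seen in two partial results is counted once.

```lua
-- Plain-Lua sketch: tables stand in for Aerospike map values (assumption).

-- Number of entries in a table used as a set.
local function size(set)
  local n = 0
  for _ in pairs(set) do n = n + 1 end
  return n
end

-- Per-record step: add a visit and remember the user id. Records without a
-- user_id count as visits but not as unique users.
local function countData(acc, rec)
  local domain, userid = rec.domain, rec.user_id
  local website = acc[domain]
  if website == nil then
    website = { visits = 0, uniques = 0, users = {} }
    acc[domain] = website
  end
  if userid ~= nil and website.users[userid] == nil then
    website.users[userid] = true
    website.uniques = website.uniques + 1
  end
  website.visits = website.visits + 1
  return acc
end

-- Merge two partial results for one website: add visits, union the users,
-- then recompute uniques from the combined set.
local function mergeData(a, b)
  for userid in pairs(b.users) do a.users[userid] = true end
  a.visits = a.visits + b.visits
  a.uniques = size(a.users)
  return a
end

-- Merge two partial per-domain results (the role map.merge plays in the UDF).
local function reducer(a, b)
  for domain, website in pairs(b) do
    if a[domain] == nil then
      a[domain] = website
    else
      a[domain] = mergeData(a[domain], website)
    end
  end
  return a
end

-- Two hypothetical partial results, as if aggregated on two nodes.
local node1, node2 = {}, {}
for _, rec in ipairs({
  { domain = "mysite.com", user_id = "u1" },
  { domain = "mysite.com", user_id = "u1" },
  { domain = "mysite.com", user_id = "u2" },
}) do countData(node1, rec) end
for _, rec in ipairs({
  { domain = "mysite.com", user_id = "u2" },
  { domain = "demowebsite.com", user_id = "u3" },
}) do countData(node2, rec) end

local result = reducer(node1, node2)
print(result["mysite.com"].visits, result["mysite.com"].uniques)  -- 4 visits, 2 uniques
```

With these inputs `mysite.com` has 4 visits from 2 distinct users; simply adding the two partial `uniques` counts would report 3, because `u2` appears in both halves.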

#2

Nothing obvious stands out as the issue here. Adding some extra logging and running against a small data set may give more clues.
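A minimal sketch of that kind of tracing, assuming the counting step is pulled out and fed a handful of records locally. In a deployed UDF the log calls would go to the server log through the UDF logging functions (for example `info()`, assuming UDF logging is enabled at that level); here a hypothetical `log` helper prints and collects lines so the sketch runs under a stock Lua interpreter. `countDataTraced` and the sample records are also hypothetical, and plain tables stand in for Aerospike maps.

```lua
-- Collected log lines; in a real UDF you would call info() instead (assumption).
local lines = {}
local function log(fmt, ...)
  local line = string.format(fmt, ...)
  lines[#lines + 1] = line
  print(line)
end

-- Same counting logic as countData, with one log line per record showing the
-- key fields and the running counts for that domain.
local function countDataTraced(acc, rec)
  local domain, userid = rec["domain"], rec["user_id"]
  local website = acc[domain]
  if website == nil then
    website = { visits = 0, uniques = 0, users = {} }
    acc[domain] = website
  end
  if userid == nil then
    log("domain=%s: user_id is nil", tostring(domain))
  elseif website.users[userid] == nil then
    website.users[userid] = 1
    website.uniques = website.uniques + 1
  end
  website.visits = website.visits + 1
  log("domain=%s user_id=%s visits=%d uniques=%d",
      tostring(domain), tostring(userid), website.visits, website.uniques)
  return acc
end

-- Hypothetical data set: the same user twice, then a record whose id is
-- stored under a different field name.
local sample = {
  { domain = "mysite.com", user_id = "u1" },
  { domain = "mysite.com", user_id = "u1" },
  { domain = "mysite.com", userid = "u2" },
}
local acc = {}
for _, rec in ipairs(sample) do countDataTraced(acc, rec) end
```

With per-record lines like these, a repeated user_id should show `uniques` staying flat while `visits` grows; if both climb together on every record, the logged user_id values are the first thing to check.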