Why do ad tech companies prefer Aerospike?


#1

I read somewhere that a number of the top 24 ad targeting companies use Aerospike. Can anyone shed light on exactly why Aerospike is suitable for this domain? It’s a simple matching of user profiles with ads, and it could be done with other in-memory databases like MongoDB, Redis, HANA, etc.

Thanks.


#2

Key-value storage makes up the majority of the use cases and, like you said, there are other database systems that can accomplish this. Here are some advantages of Aerospike:

  • Aerospike is actually a key-value store first, not a document store or relational system or something else as the foundation.

  • It’s the fastest. Mongo does not have anywhere near the same performance, especially with writes. The only systems at the same speed are in-memory ones like Redis.

  • Aerospike can read/write to SSDs at basically the same speed as in-memory. Obviously there’s still some latency and it will be bottlenecked by the storage performance, but most server-class SSDs (even in the cloud) can do plenty of TPS. This means way more storage space with fewer servers.

  • The LDTs (Large Data Types) and UDFs (user-defined functions) allow for some advanced operations like aggregations, maintaining lists, etc. Each database has its own special abilities. Redis is probably the most versatile, but for basic key-value it’s all the same.

  • The BEST part of Aerospike is how easy it is to run, and keep running. The clustering is pretty simple and it takes about 10 minutes to set up a two-node cluster on AWS, which gives you scalability and availability. I’ve yet to come across another database that works as well as this.

Just to give you more details, our cluster is in AWS and every 1-2 weeks every single server is replaced. No downtime, no rebalancing, no need for lots of manual config or tuning. We barely even look at the status console since everything just keeps working. That’s a lot of peace of mind which matters when you need 100% uptime and top performance.


#3

Manigandham,

Thanks for your detailed response.

I have a case in my company where we are trying to choose between HANA and Aerospike. Aside from being an in-memory RDBMS, HANA has another advantage: it provides a native, built-in Predictive Analytics Library with tight integration with R. You can write stored procedures intermingled with R code to run predictive analytics and statistical computations, which will be useful for intelligent ad targeting.

How do you or other ad targeting companies achieve Predictive capabilities through Aerospike?

From the limited knowledge I have gathered about Aerospike, I know that it provides secondary indexes and also LDTs, which kind of serve the purpose of denormalizing the table structures of traditional RDBMS systems. These features, together with the tabular, bin/record-based functionality, make me feel that Aerospike can actually replace RDBMS systems like HANA, which markets itself as an in-memory relational database. What are your thoughts on this?

Am I right in my assumption that Aerospike could potentially replace RDBMS systems entirely? Yes, Aerospike does NOT provide features like joins across many tables, which are sometimes useful for reporting purposes. But could there be a way around that too by using UDFs?

I like Aerospike a lot because it provides automatic failover, clustering, and redundancy, and it can handle a very high number of requests per second. I am not sure yet whether HANA can handle that many TPS or how seamlessly it handles failover and rebalancing. It seems that it requires significantly more manual configuration and tuning.

Thanks a lot for your reply.

Regards, Samar


#4

Always use the best tool for the job.

There is no replacement for a relational database. If that’s what you need for your scenario and data, then that’s what you should use. Aerospike is key-value, which is completely different, so you’ll have major work to do to even come close to what a relational database can do for queries and analytical processing. With UDFs and LDTs you can do some complicated stuff with Aerospike, but it’s going to be very different from HANA. Aerospike also has far better/easier clustering than HANA, which like most RDBMSs works on a standby master-slave approach.

I can’t give you advice beyond this; it’s really up to you to outline what you’re trying to do and what kind of data model, access, indexing, computation, querying and database management you want.

  • Aerospike - key/value, UDF/LDT, some stream processing, aggregations, SSD storage that performs like RAM, fast access, easy clustering
  • HANA - relational data, column-oriented, lots of indexing and query abilities, fancy analytics/statistical processing, in-memory RAM requirements, complicated HA and clustering setup

If you need both, you should use both to get the job done instead of forcing it into a single solution.

Our ad network analytics are proprietary but the basic approach is the same for everyone: there’s no time to do complicated analytics on every request, so all of that is done offline or in the background (using Aerospike and/or other databases). We then condense and compile the information into items that can be easily fetched by various keys, which is a perfect fit for Aerospike. This way we trade the computation cost for some delay on fresh information and increased data size, but it makes things very fast when we just need to get some info and respond to a request.
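
As a rough illustration of that precompute-then-lookup pattern, here is a minimal sketch. It is not anyone's production code: a plain Python dict stands in for the key-value store, and the event fields (`user_id`, `category`), the `profile:` key prefix, and the function names are all made up for the example.

```python
from collections import Counter, defaultdict

# Stand-in for a key-value store such as Aerospike; just a dict here (assumption).
kv_store = {}


def build_profiles(events):
    """Offline/background job: condense raw events into one small record per user."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event["user_id"]][event["category"]] += 1
    for user_id, counter in counts.items():
        # Keep only the top 3 interests so the record stays small and cheap to fetch.
        top = [category for category, _ in counter.most_common(3)]
        kv_store[f"profile:{user_id}"] = {"top_interests": top}


def pick_ad(user_id, ads_by_category, default_ad="generic"):
    """Request path: one key lookup and a tiny loop, no analytics."""
    profile = kv_store.get(f"profile:{user_id}")
    if profile is None:
        return default_ad
    for category in profile["top_interests"]:
        if category in ads_by_category:
            return ads_by_category[category]
    return default_ad
```

The expensive counting happens in `build_profiles`, which can run as often or as rarely as freshness requires; `pick_ad` only ever reads a precomputed record by key.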


#5

Thanks again, Manigandham, for your detailed response. It has been very helpful.

It’s interesting that you say your ad network analytics are proprietary. So we could build our own custom analytics framework that pulls information from Aerospike, runs analytics operations on it, and stores the data back into Aerospike. Now, HANA would have an advantage here, as it has a Predictive Analytics Library tightly built into it. So why wouldn’t ad companies prefer using HANA? I do not see any of the ad targeting companies using HANA. I am curious why that’s the case.

Aerospike must be solving a specific need for the ad targeting companies since so many ad companies are using it. But it seems that HANA would do the job equally well. When I google “HANA ads targeting” I get no results. When I google “aerospike ads targeting” I am flooded with results.

Thanks again. Look forward to your thoughts on this.

Thanks


#6

I’m not sure what exactly you’re asking. I think you might be confusing several different components and layers.

  1. There are many different storage systems out there - relational databases, key-value stores, document stores, graph databases, wide-column, log-based file systems, and more. Relational and key-value have been around for a long time but they are very different, and which one you use depends on your data and requirements. They are not replacements for each other.

  2. Adtech companies aren’t special, they have the same requirements as any other tech company and use all kinds of databases like relational, document-stores, graph databases, and key-value.

  3. Ad companies have a big use for key-value stores because of the requirements in serving ads. Lots of information has to be looked up very quickly to serve a request, and usually it’s isolated information (like user profiles), so key-value systems are a perfect fit. Aerospike is one of the best but you’ll see Redis, Riak, Couchbase, and others being used as well.

  4. Analytics and other heavy algorithms that run on the data are a separate problem. For scalability, most companies will do this offline instead of on each request, so it doesn’t really matter what database you’re using. Ad companies generate lots of data, so the problem isn’t the analytics but the data storage. We can write algorithms in any language but need a storage system that can grow while being stable and reliable (but this is outside the scope of this forum topic).

  5. Relational databases are hard to scale up. Column-oriented architecture will save space but doesn’t solve the problem. HANA is column-oriented but also in-memory, so you need to have enough memory to hold all of your data. HANA also gives you analytics tools, as do products from other vendors like SQL Server and Oracle, so if that’s what you need the most then it might be a good choice for you.

First, decide whether you need a key-value system or a relational system for what you’re doing, then compare the different options in each category to see if they fit your requirements.


#7

Thank you Manigandham for your detailed and helpful response.


#8

Hi manigandham, we’re also considering using Aerospike on AWS. Could you please tell us how you maintain the host IP addresses when every single server is replaced every 1-2 weeks? If you know of any scripts that can, for example, swap the Elastic IPs over to the replacement nodes, could you please share them?


#9

You don’t need to maintain the host IP addresses. We didn’t use Elastic IPs on AWS.

When adding a new node to a cluster, you only need to give it one existing node address that it can use as the seed node. All the nodes will then share all the addresses for every node in the cluster automatically.

The client drivers are also smart enough to automatically learn all the node addresses, so you only need a single address for them as well when first connecting. It’s not much work to make sure the code is using at least one current IP address.

The only issue is that if your nodes restart, they might be pointing to old IP addresses for the seed nodes, so you should edit the config then or have some automation for that. In our case we didn’t need it.
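
To make the seed-node idea concrete, here is a toy model in Python. It is only a sketch of the general principle (join through one known address, then membership spreads) and is not Aerospike's actual heartbeat protocol; the `Node` class and `gossip` function are invented for illustration.

```python
class Node:
    """Toy cluster node that tracks the addresses of the peers it knows about."""

    def __init__(self, address):
        self.address = address
        self.known = {address}

    def join(self, seed):
        """Join via a single seed: learn what the seed knows and announce ourselves."""
        self.known |= seed.known
        seed.known.add(self.address)


def gossip(nodes):
    """Let nodes exchange membership lists until nobody learns anything new."""
    by_address = {node.address: node for node in nodes}
    changed = True
    while changed:
        changed = False
        for node in nodes:
            for peer_address in list(node.known):
                peer = by_address.get(peer_address)
                if peer is None:
                    continue  # address of a node that is no longer reachable
                merged = node.known | peer.known
                if merged != node.known or merged != peer.known:
                    node.known = set(merged)
                    peer.known = set(merged)
                    changed = True
```

Each new node is given exactly one existing address, yet after the exchange every node holds the full address list. A client that connects to any single live node can be handed the same list.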


#10

Thank you very much.

For the automation, I’m thinking of some options:

  • a script to automatically assign one EIP to another node in the VPC in case one node goes down
  • or manually assign an EIP to the newly created node, to avoid pointing to the old IP address
  • or before the client connects to the cluster, use the aws-sdk- library to read the IPs of instances inside the VPC and use them.

So far option 3 seems the best in terms of stability. Has anyone else done something similar?
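
A minimal sketch of option 3, under a few assumptions: the input is a dict shaped like an EC2 DescribeInstances response (`Reservations` → `Instances` → `PrivateIpAddress`), only running instances are wanted, and port 3000 is used as the client port. The function names are hypothetical.

```python
def private_ips(response):
    """Collect private IPs of running instances from a DescribeInstances-style dict."""
    ips = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            ip = instance.get("PrivateIpAddress")
            if state == "running" and ip:
                ips.append(ip)
    return sorted(ips)


def seed_hosts(response, port=3000):
    """Turn the IPs into (host, port) pairs to hand to the client as seed addresses."""
    # Port 3000 is an assumption here; use whatever port your cluster listens on.
    return [(ip, port) for ip in private_ips(response)]
```

With boto3, the response could come from something like `boto3.client("ec2").describe_instances(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])`, where `vpc_id` is your own VPC ID. In practice you would probably also filter by a tag so that only the database nodes are returned, and handle pagination for large accounts.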