For which parts of a LinkedIn-type social network would you suggest using Aerospike (if at all)? Examples: user auth info, user short profiles, user full profiles, posts, system messages/alerts, user messages, articles, audio info, image info, video info, friends, various stats, etc.
As you can see, there are roughly three groups of data with different read/write ratios and caching options:
users + auth
content
posts + messages
The MVP is built using MySQL for everything except relations, which use OrientDB for traversals like “friends-of-friends”. Sure, in the far and bright future the system will be much more sophisticated. Right now, I would just like to build a small but solid foundation, with tools correctly selected for the current needs: first users, first scaling, lean budget, and all the other startup niceties.
Any advice is highly appreciated!
Thank you,
Dennis
This is a very open-ended question, and it is hard to give a specific answer. It depends a lot on your requirements. But let me try.
Before going into the details: one of our engineers has built a Twitter-like app (called tweetaspike). You should take a look at it; it gives a good idea of how you can use Aerospike for social apps. https://github.com/aerospike/tweetaspike
Aerospike is good at handling both reads and writes with speed at scale. In terms of features, it's a very efficient key-value store with support for secondary indexes and user-defined functions to extend the base functionality. Wherever you can model the data in a denormalized (non- or less-relational) way as key-value pairs, Aerospike can handle it very well.
Aerospike does not offer functionality like joins, multiple WHERE clauses (yet), or graph search, which may be needed in some parts of your system. These are the things you should avoid doing in Aerospike.
Different databases do different kinds of things well. For the social graph I would use an actual graph database. For anything that fits a classic RESTful API, basically GET/POST/PUT/DELETE /api/:resource/:id kind of work, I would use Aerospike. In a social network context, that would include retrieving the user object, posts, comments, likes, or any other kind of discrete resource with an ID, a range of IDs, many-to-one relationships, etc. It also covers any kind of metadata for such objects.
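To make the REST mapping above concrete, here is a minimal sketch of how /api/:resource/:id requests can reduce to single key-value operations. It uses an in-memory stand-in, not the Aerospike client: `KVStore`, `handle`, and the `"social"` namespace are hypothetical names chosen for illustration. Only the (namespace, set, key) shape of the key is borrowed from how Aerospike addresses records.

```python
from typing import Any, Dict, Optional, Tuple

# (namespace, set, user key): the same three-part shape Aerospike uses for keys.
Key = Tuple[str, str, str]


class KVStore:
    """In-memory stand-in for a key-value store (assumption: not a real client)."""

    def __init__(self) -> None:
        self._records: Dict[Key, Dict[str, Any]] = {}

    def put(self, key: Key, fields: Dict[str, Any]) -> None:
        # Create the record if missing, otherwise merge the new fields into it.
        self._records.setdefault(key, {}).update(fields)

    def get(self, key: Key) -> Optional[Dict[str, Any]]:
        record = self._records.get(key)
        return dict(record) if record is not None else None

    def remove(self, key: Key) -> bool:
        return self._records.pop(key, None) is not None


def handle(store: KVStore, method: str, path: str,
           body: Optional[Dict[str, Any]] = None) -> Any:
    """Translate one REST call on /api/<resource>/<id> into one key-value call."""
    _, _api, resource, resource_id = path.split("/")
    key: Key = ("social", resource, resource_id)  # hypothetical namespace name
    if method == "GET":
        return store.get(key)
    if method in ("POST", "PUT"):
        store.put(key, body or {})
        return store.get(key)
    if method == "DELETE":
        return store.remove(key)
    raise ValueError(f"unsupported method: {method}")
```

Each request touches exactly one record by its key, which is the access pattern a key-value store is built for. No joins or multi-table lookups are involved.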
Aerospike does key-value operations (both reads and writes) extremely well, and queries of the type I described, too. When it comes to fitting an RDBMS like MySQL into a REST context, developers end up denormalizing their schema until it is effectively a key-value store. If you have a key-value situation, an actual key-value store such as Aerospike will handle the same operations at speeds that are usually a hundred times faster, at scale.
You can use AS (or any other kind of document-store-like DB) for nearly anything, with two exceptions:
Graph relations, where you want to query for e.g. the friend of a friend of a friend (that’s what graph DBs were created for)
Transactions because AS does not support them (only within 1 record, which is not really what transactions are about - in most cases you want multi-record transactions and AS can’t handle those).
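A small sketch of why the first exception hurts: in a key-value model each user's friend list is one record, so a friends-of-friends query costs one lookup per direct friend, and every extra hop multiplies that. The data, the `FriendStore` class, and its lookup counter are all made up for illustration; a plain dict stands in for the database.

```python
from typing import Dict, Set


class FriendStore:
    """Dict-backed stand-in: one record per user holding that user's friend set."""

    def __init__(self, friends: Dict[str, Set[str]]) -> None:
        self._friends = friends
        self.lookups = 0  # counts key-value reads, i.e. round trips to the store

    def get_friends(self, user: str) -> Set[str]:
        self.lookups += 1
        return set(self._friends.get(user, set()))


def friends_of_friends(store: FriendStore, user: str) -> Set[str]:
    """Two-hop traversal: 1 read for the user plus 1 read per direct friend."""
    direct = store.get_friends(user)
    second_hop: Set[str] = set()
    for friend in direct:
        second_hop |= store.get_friends(friend)
    # Exclude the user and people who are already direct friends.
    return second_hop - direct - {user}


store = FriendStore({
    "dennis": {"ann", "bob"},
    "ann": {"dennis", "carl"},
    "bob": {"dennis", "dana"},
    "carl": {"ann"},
    "dana": {"bob"},
})
result = friends_of_friends(store, "dennis")
print(sorted(result), store.lookups)
```

With two direct friends this takes three reads. With a few hundred friends per user and a third hop, the read count grows into the tens of thousands, which is the kind of traversal a graph database is designed to handle for you.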
It all boils down to designing a good data schema that takes advantage of AS's strengths and avoids its foibles. This process is kinda different from the relational database world, because with NoSQL you are fine with denormalized (duplicate) data or anything else as long as it helps to fulfill your query needs. When designing a NoSQL schema, I always favor the perspective of how to query something as fast as possible (reads) before considering how to still update it in real time (creates/updates). You’ll end up having to write to a few places on every change, but that gives you realtime (max 50ms) reads on nearly anything… There are many blog posts and recorded talks on the internet about NoSQL data modeling.
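The "write to a few places, read from one" idea can be sketched as a fan-out on write. This is only an illustration with a plain dict as the store; `publish_post`, `read_timeline`, and the `"posts"`/`"timelines"` record layout are hypothetical and not taken from any real schema.

```python
from typing import Any, Dict, List, Tuple

Store = Dict[Tuple[str, str], Dict[str, Any]]  # (set name, key) -> record


def publish_post(store: Store, author: str, post_id: str, text: str,
                 followers: List[str]) -> None:
    """Write the post once, then copy a denormalized entry into each follower's timeline."""
    store[("posts", post_id)] = {"author": author, "text": text}
    for follower in followers:
        timeline = store.setdefault(("timelines", follower), {"items": []})
        # Duplicate the data on purpose so a timeline read needs no second lookup.
        timeline["items"].insert(0, {"post_id": post_id, "author": author, "text": text})


def read_timeline(store: Store, user: str, limit: int = 10) -> List[Dict[str, Any]]:
    """One key lookup returns ready-to-render entries, newest first."""
    record = store.get(("timelines", user), {"items": []})
    return record["items"][:limit]


store: Store = {}
publish_post(store, "ann", "p1", "hello", followers=["dennis", "bob"])
publish_post(store, "carl", "p2", "second post", followers=["dennis"])
print(read_timeline(store, "dennis"))
```

The write path does more work (one write per follower), and an edit to a post has to be propagated to every copy. In exchange, the read path is a single lookup by key, which is the trade-off described above.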