Lightweight Electronic Medical Records (EMR) on a national scale - Concerns on Durability vs Availability

Prasad_Reddy · May 14, 2020, 7:00pm

I am working on a "Lightweight EMR (Electronic Medical Record)" system to monitor COVID patients whose symptoms are relatively mild, so that they can be monitored cost-effectively at home. In the planned system, IoT devices would frequently send each patient's vital signs (heart rate, blood pressure, oxygen level) to a cloud database on a national scale (potentially millions of patients and devices).

For this system durability is very important, but availability and latency are not a concern, because outpatients can be treated based on vital signs that are 5-10 minutes old. I have gone through the Aerospike documentation. In very simple terms, it feels to me like a "massive implementation of RAID 10 in the cloud, with Paxos and the gossip protocol acting as a software-based RAID controller".

When I hear that most companies run with a replication factor of 2 even at petabyte scale, I think I must be missing something. What if a second node fails during the replication/repair/transfer that follows the first node failure? To me it looks like a replication factor of 3 or more is needed for critical data.

Am I correct in my assumption? How long does it take to re-replicate the data after one node fails? (Assumptions: data on SSD, indexes in DRAM, and low availability is acceptable.) How long does replication take if an entire data center goes down? Are there other methods of improving data durability besides increasing the replication factor?
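One way to put rough numbers on the "second failure during repair" concern is a simple probability sketch. It assumes (hypothetically, not from Aerospike documentation) that nodes fail independently at a constant rate, and pessimistically that any `rf - 1` further failures among the survivors, before re-replication finishes, lose every copy of at least one partition. The function names and inputs are illustrative only.

```python
import math


def p_node_fails(rate_per_hour: float, window_hours: float) -> float:
    """Probability that a single node fails within the window,
    assuming a constant (exponential) failure rate."""
    return 1.0 - math.exp(-rate_per_hour * window_hours)


def p_data_loss(nodes: int, rf: int, rate_per_hour: float, window_hours: float) -> float:
    """Pessimistic probability of losing data after one node has failed.

    Hypothetical model: partitions are spread over all nodes, so we assume
    that any rf - 1 further failures among the survivors, before
    re-replication finishes, destroy every copy of at least one partition.
    """
    survivors = nodes - 1
    needed = rf - 1  # extra failures that would remove the last copy
    p = p_node_fails(rate_per_hour, window_hours)
    # P(fewer than `needed` survivors fail), from the binomial distribution
    p_safe = sum(
        math.comb(survivors, i) * p**i * (1 - p) ** (survivors - i)
        for i in range(needed)
    )
    return 1.0 - p_safe


if __name__ == "__main__":
    # Hypothetical inputs: 20 nodes, each failing about once in 5 years,
    # and a 6-hour window to finish re-replicating the failed node's data.
    rate = 1 / (5 * 365 * 24)
    for rf in (2, 3):
        print(f"replication-factor {rf}: {p_data_loss(20, rf, rate, 6.0):.2e}")
```

Under this model, the risk depends heavily on the length of the repair window, which is why how quickly a failed node's data is re-replicated matters as much as the replication factor itself.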

kporter · May 15, 2020, 12:00am

What you have described would seem to require Strong Consistency, which prioritizes consistency over availability.

Yes, the replication-factor allows you to adjust the risk to suit your needs. Many customers run replication-factor 3 and some run higher, but the majority use 2. More replicas aren't free: there are storage, memory, CPU, and bandwidth costs associated with them. With replication-factor, you can tune this risk to what makes sense for your use case.
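To make the cost side concrete, here is a minimal sizing sketch. The 64-byte primary-index entry per record copy is Aerospike's documented index overhead; the data model (one record per reading), the record size, and the helper names are hypothetical assumptions for illustration.

```python
INDEX_BYTES_PER_COPY = 64  # Aerospike primary-index entry size per record copy


def readings_retained(patients: int, readings_per_hour: int, retention_days: int) -> int:
    """Hypothetical data model: one record per reading, kept for retention_days."""
    return patients * readings_per_hour * 24 * retention_days


def cluster_footprint(records: int, avg_record_bytes: int, rf: int) -> dict:
    """Approximate cluster-wide index RAM and device space for one namespace.

    Both grow linearly with replication-factor because every copy of a record
    has its own index entry and its own on-device storage.
    """
    copies = records * rf
    return {
        "index_ram_bytes": copies * INDEX_BYTES_PER_COPY,
        "device_bytes": copies * avg_record_bytes,
    }


if __name__ == "__main__":
    # Hypothetical: 1 million patients, one reading per minute, 14 days kept,
    # roughly 200 bytes per record on disk.
    records = readings_retained(1_000_000, 60, 14)
    for rf in (2, 3):
        fp = cluster_footprint(records, 200, rf)
        print(rf, fp["index_ram_bytes"] / 1e12, "TB index RAM,",
              fp["device_bytes"] / 1e12, "TB device")
```

Going from replication-factor 2 to 3 multiplies both figures by 1.5. Grouping several readings into one record per patient would reduce the record count, and with it the index RAM, but that is a data-modelling choice beyond what this thread covers.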

Yes, Strong Consistency also offers the commit-to-device configuration, which only responds to the client once the data has been committed to the storage layer. This means that even if the process is not shut down cleanly (crash, SIGKILL, etc.), all committed data will already be safely on disk and will be recovered when the process is restarted.
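In Aerospike this is enabled with `strong-consistency true` in the namespace and `commit-to-device true` inside its `storage-engine device` block. The sketch below is only a generic POSIX-style illustration of the underlying idea, "acknowledge a write only after it is flushed to stable storage", using Python's `os.fsync`; it is not how Aerospike implements the feature, and `durable_append` is a hypothetical helper.

```python
import os


def durable_append(path: str, payload: bytes) -> int:
    """Append payload to path and return only after it has been fsync'd.

    Generic illustration of "acknowledge after commit to device"; this is
    not how Aerospike implements commit-to-device internally.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(payload)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # block until the OS reports the data is on stable storage
    finally:
        os.close(fd)
    # Only at this point would a server acknowledge the write to the client.
    return len(payload)
```

The trade-off is that every acknowledged write waits for a device flush, which raises write latency; for this use case, where readings 5-10 minutes old are acceptable, that cost is likely tolerable.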

The time to re-replicate a failed node's data depends on the amount of data and on how you have tuned migrations.
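A back-of-envelope estimate follows directly from that statement: data to move divided by the aggregate migration throughput. The per-node throughput is a hypothetical input; in practice it depends on migration settings (for example `migrate-threads`), the network, the disks, and concurrent client load. The helper name is illustrative only.

```python
def migration_hours(data_per_node_gb: float, surviving_nodes: int,
                    per_node_mb_s: float) -> float:
    """Rough time to re-create the copies held by one failed node.

    Assumes the failed node's partitions are spread evenly over the
    survivors and that each survivor sustains per_node_mb_s of migration
    traffic (a hypothetical figure that depends on migration tuning,
    network, disks and concurrent client load).
    """
    if surviving_nodes <= 0 or per_node_mb_s <= 0:
        raise ValueError("need at least one survivor and a positive rate")
    total_mb = data_per_node_gb * 1024
    aggregate_mb_s = surviving_nodes * per_node_mb_s
    return total_mb / aggregate_mb_s / 3600


if __name__ == "__main__":
    # Hypothetical: 2 TB per node, 19 survivors, 20 MB/s of migration each.
    print(f"{migration_hours(2048, 19, 20):.1f} hours")
```

For the data-center question, the same arithmetic applies with the total data held by the lost rack or AZ in place of one node's data, keeping in mind that fewer survivors share the work.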

TimF · May 15, 2020, 4:37am

In addition to the above, Aerospike also supports rack awareness, which allows the different replicas of a single piece of data to be placed in different availability zones (AZs). Whilst most people run with replication-factor 2, it is not unusual for people to run in the cloud with replication-factor 3 across 3 different AZs for critical data. This means you would need to lose 3 nodes, one in each of 3 different AZs, before the data could be re-replicated to another node, which is a very unlikely scenario.
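The property rack awareness provides can be sketched as a placement rule: every partition's replicas sit in distinct racks, so losing a whole AZ never removes all copies. Aerospike divides a namespace into 4096 partitions and assigns the rack with the `rack-id` setting, but the rotation scheme below is a hypothetical simplification, not Aerospike's actual partition-table algorithm.

```python
from typing import Dict, List, Tuple

Placement = Dict[int, List[Tuple[int, str]]]  # partition -> [(rack, node), ...]


def place_replicas(racks: List[List[str]], rf: int, n_partitions: int = 4096) -> Placement:
    """Assign rf replicas per partition, each in a different rack (AZ)."""
    if rf > len(racks):
        raise ValueError("this sketch needs at least rf racks")
    placement: Placement = {}
    for pid in range(n_partitions):
        replicas = []
        for i in range(rf):
            rack_id = (pid + i) % len(racks)  # rotate the starting rack per partition
            nodes = racks[rack_id]
            replicas.append((rack_id, nodes[pid % len(nodes)]))
        placement[pid] = replicas
    return placement


def survives_rack_loss(placement: Placement, lost_rack: int) -> bool:
    """True if every partition keeps at least one replica outside lost_rack."""
    return all(any(rack != lost_rack for rack, _ in reps) for reps in placement.values())


if __name__ == "__main__":
    azs = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]  # hypothetical nodes per AZ
    table = place_replicas(azs, rf=3)
    print(all(survives_rack_loss(table, az) for az in range(len(azs))))
```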

Additionally, cloud providers often offer network-based storage, such as EBS on AWS. Whilst these devices are not recommended as a primary device due to latency and throughput concerns, Aerospike offers a shadow device configuration where all reads come from the ephemeral drive, and all writes go to both the ephemeral drive and the network drive. So even if you lose all nodes that hold a copy of the data, new instances can be launched and attached to the same network volumes, and Aerospike will recover the data from the network volume to the ephemeral device, ensuring no data is lost.
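In `aerospike.conf`, a shadow device is declared by naming it after the primary on the same `device` line, for example `device /dev/nvme0n1 /dev/xvdf` (device names hypothetical). The toy model below only illustrates the read/write/recovery flow described above; it writes to both stores synchronously and does not model when the real server performs the shadow write.

```python
from typing import Dict, Optional


class ShadowedStore:
    """Toy model: fast ephemeral primary plus a durable network 'shadow'."""

    def __init__(self, shadow: Optional[Dict[str, bytes]] = None) -> None:
        # A new instance attached to an existing volume starts from its contents.
        self.shadow: Dict[str, bytes] = shadow if shadow is not None else {}
        self.primary: Dict[str, bytes] = dict(self.shadow)

    def write(self, key: str, value: bytes) -> None:
        self.primary[key] = value  # write to the ephemeral drive
        self.shadow[key] = value   # and to the network volume

    def read(self, key: str) -> bytes:
        return self.primary[key]   # reads are served only from the primary


if __name__ == "__main__":
    node = ShadowedStore()
    node.write("patient:42:hr", b"72")
    volume = node.shadow           # the instance dies; its network volume survives
    replacement = ShadowedStore(volume)
    print(replacement.read("patient:42:hr"))
```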

Whilst Aerospike is typically known for its very high throughput, low latency and high scalability, it is also used in some use cases where latency is not critical, such as real-time digital payments. The typical reasons for selecting Aerospike over other databases include:

Strong Consistency
Shared-nothing architecture and self-healing clusters, which allow for 24x7 uptime
Ease of maintenance
Globally distributed transactions (at the record level)

Please let me know if you have any follow up questions.
