What are the consistency models for modern databases offered by AWS?

This recipe explains the consistency models for modern databases offered by AWS.

What are the consistency models for modern databases offered by AWS?

Database consistency is defined by a set of rules that every data point in the database system must satisfy in order to be read and accepted properly. If any data enters the database that violates these preconditions, the dataset experiences consistency errors. Consistency is achieved by establishing rules: any transaction written to the database may only change the affected data in the ways permitted by the constraints, triggers, variables, cascades, and so on defined by the database's developers.

Assume you work for the National Transportation Safety Institute (NTSI). You have been assigned the task of compiling a database of new California driver's licenses. California's population has exploded in the last ten years, necessitating a new alphanumeric format for all first-time driver's license holders. Your team has determined that the new set value in your database for a California driver's license is: 1 alphabetic character + 7 numeric characters. This rule is now mandatory for all entries. An entry with the string "C08846024" would result in an error, because the entered value is 1 alpha + 8 numeric, which is inconsistent data.
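Such a rule can be expressed as a simple validation check. The following is a minimal Python sketch (the pattern and function name are hypothetical, not part of any NTSI system) that accepts 1 alphabetic + 7 numeric values and rejects strings such as "C08846024":

import re

# Hypothetical rule: exactly 1 alphabetic character followed by 7 digits.
LICENSE_PATTERN = re.compile(r"[A-Za-z][0-9]{7}")

def is_consistent_license(value: str) -> bool:
    """Return True only if the value matches the agreed 1 Alpha + 7 Numeric format."""
    return LICENSE_PATTERN.fullmatch(value) is not None

print(is_consistent_license("C0884602"))   # True  -> 1 alpha + 7 numeric
print(is_consistent_license("C08846024"))  # False -> 1 alpha + 8 numeric, inconsistent data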


Consistency also implies that any data changes to a single object in one table must be reflected in all other tables where that object appears. Continuing with the driver's license example, if the new driver's home address changes, that change must be reflected in all tables where that prior address existed. If one table has the old address and the others have the new address, this is an example of data inconsistency.
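As a hedged, local illustration of keeping an object consistent across tables, the sketch below uses Python's built-in sqlite3 module (the table and column names are invented for this example) to apply the address change everywhere it appears inside a single transaction, so no reader ever sees the old address in one table and the new address in another:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers  (license TEXT PRIMARY KEY, home_address TEXT);
    CREATE TABLE vehicles (plate TEXT PRIMARY KEY, owner_license TEXT, owner_address TEXT);
    INSERT INTO drivers  VALUES ('C0884602', '12 Old St');
    INSERT INTO vehicles VALUES ('7ABC123', 'C0884602', '12 Old St');
""")

# Update the address in every table that stores it, as one atomic unit.
with conn:  # the connection context manager commits on success or rolls back on error
    conn.execute("UPDATE drivers  SET home_address = ? WHERE license = ?",
                 ("34 New Ave", "C0884602"))
    conn.execute("UPDATE vehicles SET owner_address = ? WHERE owner_license = ?",
                 ("34 New Ave", "C0884602"))

print(conn.execute("SELECT home_address FROM drivers").fetchone())
print(conn.execute("SELECT owner_address FROM vehicles").fetchone())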

Data-Centric Consistency Models

Tanenbaum and Maarten van Steen, two experts in this field, define a consistency model as a contract between the software (processes) and the memory implementation (data store): if the software follows certain rules, the memory will work correctly. Because it is difficult to determine which write was the last one in a system without a global clock, constraints are placed on the values that a read operation is allowed to return.

Client-Centric Consistency Models

A client-centric consistency model emphasizes how data is perceived by the clients. If data replication has not yet completed, different clients may see different data. Because fast data access is the primary goal, we may choose a less strict consistency model, such as eventual consistency.

Eventual Consistency

In this approach, the system guarantees that if no new updates are made to a given piece of data, all reads of that item will eventually return the most recently updated value. The replica that receives an update sends update messages to all the other replicas. Until propagation completes, different replicas may return different values when queried, but all replicas eventually receive the update and become consistent. This model suits applications with hundreds of thousands of concurrent reads and writes per second, such as Twitter updates, Instagram photo uploads, Facebook status pages, and messaging systems, where immediate read-after-write integrity is not a primary concern.
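Purely as an illustration of that behaviour, the toy Python sketch below (not tied to any AWS service) updates one replica immediately and lets the others converge after a delay, so reads can disagree until propagation finishes:

import time
import threading

# Three toy replicas of the same record; only replica 0 receives the write immediately.
replicas = [{"status": "old"} for _ in range(3)]

def propagate(new_value, delay=0.5):
    """Simulate asynchronous replication: the other replicas converge after a delay."""
    time.sleep(delay)
    for replica in replicas[1:]:
        replica["status"] = new_value

replicas[0]["status"] = "new"                      # write accepted by one replica
threading.Thread(target=propagate, args=("new",)).start()

print([r["status"] for r in replicas])             # likely ['new', 'old', 'old'] right away
time.sleep(1)
print([r["status"] for r in replicas])             # eventually ['new', 'new', 'new']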

Read-Your-Writes Consistency

Read-Your-Writes (RYW) consistency is achieved when the system guarantees that any attempt by a client to read a record after that client has updated it will return the updated value. An RDBMS typically provides read-your-writes consistency.

Read-after-Write Consistency

Read-after-Write (RAW) consistency is stricter than eventual consistency: all clients see a newly inserted data item or record right away. Keep in mind that it applies only to new data; this model says nothing about updates or deletions.

Amazon S3 Consistency Models

In all regions, Amazon S3 provides read-after-write consistency for PUTs of new objects in your S3 bucket, and eventual consistency for overwrite PUTs and DELETEs. As a result, if you add a new object to your bucket, you and your clients see it immediately. However, if you overwrite an object, it may take some time for its replicas to be updated, which is why the eventual consistency model applies. Amazon S3 achieves high availability by replicating data across multiple servers and Availability Zones, so whenever a record is added, updated, or deleted, data integrity must still be maintained. The scenarios for these cases are listed below, followed by a short code sketch.

• A new PUT request is submitted. A GET for the new object returns it right away (read-after-write consistency), but the object may not appear in a listing of the bucket until the change has propagated to all servers and Availability Zones.

• An overwrite PUT (update) request is submitted. Because the eventual consistency model is used for overwrites, a query for the object may return the outdated value until the replicas are updated.

• A DELETE request is issued. Because the eventual consistency model is used for DELETEs, a query to list or read the object may still return the deleted object for a short time.
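Below is a minimal boto3 sketch of these cases (a hedged illustration: the bucket name and object key are placeholders, and it assumes valid AWS credentials and an existing bucket you own):

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"   # placeholder: substitute a bucket you own

# PUT a brand-new object; per the read-after-write model, an immediate GET returns it.
s3.put_object(Bucket=bucket, Key="licenses/C0884602.json", Body=b'{"status": "issued"}')
body = s3.get_object(Bucket=bucket, Key="licenses/C0884602.json")["Body"].read()
print(body)

# DELETEs were described as eventually consistent, so a listing issued right after
# the delete may still show the key until the change has propagated.
s3.delete_object(Bucket=bucket, Key="licenses/C0884602.json")
listing = s3.list_objects_v2(Bucket=bucket, Prefix="licenses/")
print([obj["Key"] for obj in listing.get("Contents", [])])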

Amazon DynamoDB Consistency Models

Amazon DynamoDB is a popular NoSQL service provided by AWS. NoSQL storage is designed to be distributed, and DynamoDB stores three geographically distributed replicas of each table to ensure high availability and data durability. A write is acknowledged once it is durable, and the update then propagates to the remaining replicas, so replication follows eventual consistency. A DynamoDB table read operation (GetItem, BatchGetItem, Query, or Scan) is an eventually consistent read by default; if you need the most recent data, you can request a strongly consistent read instead. Note that a strongly consistent read consumes twice as many read capacity units as an eventually consistent read. In general, eventually consistent reads are recommended because DynamoDB's change propagation is very fast (DynamoDB uses SSDs for low latency), so you usually get the same result at half the cost of a strongly consistent read.
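The sketch below shows both read modes with boto3 (a hedged illustration: the DriverLicenses table and its license partition key are placeholders you would replace with your own schema, and valid AWS credentials are assumed):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("DriverLicenses")   # placeholder table with partition key 'license'

table.put_item(Item={"license": "C0884602", "home_address": "34 New Ave"})

# Default read: eventually consistent, may briefly return stale data, costs half as much.
eventual = table.get_item(Key={"license": "C0884602"})

# Strongly consistent read: returns the latest committed value, costs twice the read units.
strong = table.get_item(Key={"license": "C0884602"}, ConsistentRead=True)

print(eventual.get("Item"))
print(strong.get("Item"))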

