The Best Data Architectures Newsletter
Data Architecture 101
How do you design a system for the Cloud?
Welcome to my first blog post!
I am passionate about data and the critical role it plays in all of our lives in today’s digital economy. Being a data and technology geek, one of the first things I wanted to learn about the cloud was how to design a system from the ground up. Here is my attempt, based on the numerous blogs, videos and websites I have consumed.
Let’s say you are a small or medium business looking to build your IT systems for the cloud. There are three basic components you need: compute, networking and storage.
Basic Building Blocks (Taken from Arista’s website)
Think of the compute component as the cloud servers you rent to get the processing power your applications and workloads need.
These servers can be:
“Bare-metal” single-tenant servers: These servers are dedicated to running your business’s workloads, and you get full root access. Typical benefits include better performance, because your workloads can consume all of the server’s resources (avoiding the overhead of running a hypervisor or other virtualization software), and the freedom to fully customize the operating system. On the flip side, scaling horizontally means adding whole dedicated servers, so costs can climb steeply as your business grows and workloads increase.
Public cloud/shared multi-tenant servers running on top of a virtualization layer: In this case, you share the underlying resources of these servers with other customers on a “public cloud” available from all the major cloud vendors. Typical benefits include manageable costs as you scale your compute to support growing workloads, because you can add capacity in smaller increments.
Amazon EC2 (Elastic Compute Cloud) on AWS is one of the best-known examples of a compute service.
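To make the scaling trade-off more concrete, here is a minimal Python sketch of capacity planning in whole units. The sizes (a 64-vCPU dedicated server and a 4-vCPU shared instance) are hypothetical placeholders, not real vendor offerings. The sketch counts capacity only, not price, because actual costs depend on each provider’s pricing.

```python
import math

# Hypothetical unit sizes for illustration only (not real vendor offerings).
DEDICATED_SERVER_VCPUS = 64  # one whole bare-metal server
SHARED_INSTANCE_VCPUS = 4    # one small virtualized instance


def units_needed(required_vcpus: int, vcpus_per_unit: int) -> int:
    """Whole units (servers or instances) needed to cover the load."""
    return math.ceil(required_vcpus / vcpus_per_unit)


def idle_vcpus(required_vcpus: int, vcpus_per_unit: int) -> int:
    """Provisioned capacity left unused because units come in whole sizes."""
    return units_needed(required_vcpus, vcpus_per_unit) * vcpus_per_unit - required_vcpus


for load in (10, 70, 130):
    print(
        f"load={load} vCPUs | "
        f"dedicated: {units_needed(load, DEDICATED_SERVER_VCPUS)} servers, "
        f"{idle_vcpus(load, DEDICATED_SERVER_VCPUS)} idle vCPUs | "
        f"shared: {units_needed(load, SHARED_INSTANCE_VCPUS)} instances, "
        f"{idle_vcpus(load, SHARED_INSTANCE_VCPUS)} idle vCPUs"
    )
```

The coarser the unit you scale in, the more idle capacity you may end up paying for. This is one reason smaller virtualized instances tend to be easier to match to a growing workload.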
Networking is the ability to connect to every IT system, whether it sits in your on-premises data center (on a server or mainframe) or is a source or target system in the cloud. The major cloud vendors (AWS, Azure, Google Cloud Platform and IBM Cloud) provide networking through a Virtual Private Cloud (VPC; Azure’s equivalent is called a Virtual Network), which is logically isolated from other customers’ VPCs on the public cloud. Some vendors, such as AWS, create a default VPC for a new account. In other words, it’s like having a “VIP” area for your select crowd inside a crowded club or bar, where many customers, or “tenants”, share the space.
A VPC typically supports internal load balancing (refer to my blog about containers and Kubernetes here for more details on load balancers), VPN connectivity to interface with your on-premises data center, and routes that allow access from the internet to the selected services in your workloads that are external facing.
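Planning a VPC usually starts with its private address range. The minimal Python sketch below uses the standard-library `ipaddress` module. The CIDR blocks and the public/private split are hypothetical choices made for illustration, not defaults from any vendor. The sketch splits the VPC into subnets and checks that the VPC range does not collide with an assumed on-premises range, which matters once the two are linked over VPN.

```python
import ipaddress

# Hypothetical address plan: in most clouds you choose the VPC's private CIDR block.
vpc = ipaddress.ip_network("10.0.0.0/16")
on_prem = ipaddress.ip_network("192.168.0.0/16")  # assumed on-premises range

# Split the VPC into /24 subnets (256 addresses each; providers reserve a few
# addresses per subnet) and use one for external-facing services, one internally.
subnets = list(vpc.subnets(new_prefix=24))
layout = {
    "public": subnets[0],   # external-facing services, reachable via routes from the internet
    "private": subnets[1],  # internal workloads, no direct internet route
}

for name, net in layout.items():
    print(f"{name:<8} {net} ({net.num_addresses} addresses)")

# When the VPC is connected to the data center over VPN, the two address ranges
# should not overlap, or traffic cannot be routed unambiguously.
print("VPC overlaps on-prem range:", vpc.overlaps(on_prem))
```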
Today, most applications are typically designed as three blocks (often called tiers):
Web server/Ingress - This fields requests from external clients (for example, in a customer-facing application) and delivers data to external users.
Database or repository - This stores the data the application uses.
Engine - This holds the intelligence, or business logic, and is where most of the processing is done.
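Here is a minimal, single-process Python sketch of the three blocks described above. An in-memory SQLite database stands in for a cloud database. The product table, the 10% bulk-discount rule and the request/response dictionaries are all hypothetical and exist only for this example. In a real deployment, each tier would typically run on its own compute and talk to the others over the network.

```python
import sqlite3
from typing import Optional


class ProductRepository:
    """Database tier: stores the data (in-memory SQLite stands in for a cloud database)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE products (name TEXT PRIMARY KEY, price REAL)")

    def add(self, name: str, price: float) -> None:
        self.conn.execute("INSERT INTO products VALUES (?, ?)", (name, price))

    def get_price(self, name: str) -> Optional[float]:
        row = self.conn.execute(
            "SELECT price FROM products WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


class PricingEngine:
    """Engine tier: business logic, using a hypothetical bulk-discount rule."""

    def __init__(self, repo: ProductRepository) -> None:
        self.repo = repo

    def quote(self, name: str, quantity: int) -> Optional[float]:
        price = self.repo.get_price(name)
        if price is None:
            return None
        total = price * quantity
        if quantity >= 10:  # hypothetical rule: 10% off orders of 10 or more items
            total *= 0.9
        return round(total, 2)


def handle_request(engine: PricingEngine, params: dict) -> dict:
    """Web tier: turns an external request into a response for the client."""
    total = engine.quote(params["product"], int(params["qty"]))
    if total is None:
        return {"status": 404, "body": "unknown product"}
    return {"status": 200, "body": {"product": params["product"], "total": total}}


repo = ProductRepository()
repo.add("widget", 2.50)
engine = PricingEngine(repo)
print(handle_request(engine, {"product": "widget", "qty": "12"}))
print(handle_request(engine, {"product": "gadget", "qty": "1"}))
```

Keeping each tier behind a small interface makes it possible to scale or replace one tier (for example, swapping SQLite for a managed database) without touching the others.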
Storage is the ability to store the vast volumes of structured or unstructured data (for example, product or customer data) that your organization generates. Most organizations store this data in a database or, better yet, in a data warehouse so they can run analytics and reports on it.
Cloud storage typically comes in different types:
Object-based storage: This type works well for storing vast numbers of backups or user files. A common use case is serving website content: many websites keep their content and media files in object storage and deliver them to users through a content delivery network (CDN). Amazon Simple Storage Service (S3) is an example of this type of storage. The main advantage is cost-efficient storage, and use cases include backup, restoration and disaster recovery, where fast access to the data is less of a concern.
Block storage: This delivers highly available, persistent storage volumes that are attached to a compute instance (for example, an EC2 instance). Amazon Elastic Block Store (EBS) is an example. The main advantages include low-latency performance, fast I/O and the ability to quickly grow a volume or change its performance settings as needs change.
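The difference in access patterns between the two storage types can be illustrated with a toy, in-memory Python sketch. `ObjectStore` and `BlockVolume` are hypothetical classes written for this example. They are not the S3 or EBS APIs, and they ignore durability, replication and networking entirely. The point is the shape of each interface: object storage reads and writes whole objects by key, while block storage exposes numbered, fixed-size blocks that can be overwritten individually.

```python
class ObjectStore:
    """Whole objects addressed by a key, each stored with its metadata."""

    def __init__(self) -> None:
        self._objects = {}  # key -> (data, metadata)

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Objects are written whole; changing one means uploading a new copy.
        self._objects[key] = (bytes(data), dict(metadata))

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def metadata(self, key: str) -> dict:
        return dict(self._objects[key][1])


class BlockVolume:
    """A fixed number of equal-size blocks that an attached machine can
    read and overwrite in place, much like a disk."""

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self._blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) > self.block_size:
            raise ValueError("data is larger than one block")
        # Pad with zero bytes so every block keeps the same size.
        self._blocks[index] = bytes(data).ljust(self.block_size, b"\x00")

    def read_block(self, index: int) -> bytes:
        return self._blocks[index]


store = ObjectStore()
store.put("backups/db-backup.tar.gz", b"archive bytes", content_type="application/gzip")
print(store.get("backups/db-backup.tar.gz"), store.metadata("backups/db-backup.tar.gz"))

volume = BlockVolume(num_blocks=8, block_size=16)
volume.write_block(3, b"hello")  # update one small block without rewriting the rest
print(volume.read_block(3))
```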