Deploying Scalable and Resilient Relational Databases Using Terraform

Introduction

In today’s cloud-centric landscape, having a robust and scalable database infrastructure is crucial for ensuring the availability and performance of your applications. Amazon Web Services (AWS) provides a powerful managed database service called Amazon RDS (Relational Database Service), which simplifies the deployment, scaling, and management of relational databases. To make the provisioning process even more efficient and repeatable, we can leverage Terraform, a popular Infrastructure as Code (IaC) tool.

In this blog, we will guide you through the process of provisioning a multi-AZ (Availability Zone) RDS instance with Terraform. Multi-AZ deployment offers high availability and durability by automatically replicating your database to a standby instance in a different AZ. This setup ensures that your applications can continue running seamlessly even in the event of a failure in one availability zone.

We will also make the storage of our RDS instance scalable using RDS storage autoscaling, which lets the database grow its allocated storage automatically as data volumes increase, up to a limit that we define. This keeps the database from running out of disk space as the workload grows, without forcing us to over-provision storage up front.

By the end of this blog, you will have a clear understanding of the steps involved in using Terraform to provision a scalable and resilient RDS instance. You will gain insights into the configuration options and best practices for designing a highly available database architecture on AWS.

So, let’s dive in and explore how to leverage Terraform’s power to effortlessly provision an RDS multi-AZ and scalable infrastructure, ensuring a reliable and efficient database foundation for your applications.

What is Amazon RDS?

Amazon RDS (Relational Database Service) is a managed database service by AWS that makes it easier to set up, operate, and scale relational databases in the cloud. RDS supports popular database engines such as MySQL, PostgreSQL, Oracle, and SQL Server, offering a reliable and highly available solution for storing and managing structured data.

With AWS RDS, users can offload the burden of database administration tasks, such as provisioning, patching, and backups, to AWS, allowing them to focus on developing applications. AWS RDS also provides high availability through multi-AZ (Availability Zone) deployments, where data is automatically replicated across multiple geographically separate AZs to provide failover support. This ensures that databases remain accessible even in the event of an infrastructure failure. RDS also offers automated backups, enabling point-in-time recovery and eliminating the need for manual backups.

RDS offers built-in security features such as encryption at rest and in transit, network isolation through Amazon VPC, and integration with AWS Identity and Access Management (IAM) for fine-grained access control.

With this brief introduction to RDS out of the way, we will now provision an RDS database using Terraform. For this article, we will deploy an RDS database with the PostgreSQL engine in a Multi-AZ configuration.

What is Terraform?

Terraform is an open-source Infrastructure as Code (IaC) tool developed by HashiCorp. It provides a declarative language for defining and managing infrastructure resources across various cloud platforms. With Terraform, infrastructure configurations are expressed as code, allowing for version control, collaboration, and automation. Terraform follows a “write, plan, and apply” workflow, where users define the desired state of their infrastructure in Terraform configuration files, and Terraform takes care of provisioning and managing the resources to match that desired state. It offers a wide range of providers, including AWS, Azure, Google Cloud, and more, making it a versatile tool for managing infrastructure across different cloud platforms.
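
The snippets in this article reference var.project and var.environment, and they assume that an AWS provider is already configured. A minimal sketch of that setup might look like the following. The provider version constraint, the aws-region variable, and its default value are assumptions made for illustration; they are not part of the original configuration.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed; pin to the version you have tested
    }
  }
}

# Hypothetical variable: the region is not specified in this article.
variable "aws-region" {
  type    = string
  default = "us-east-1"
}

provider "aws" {
  region = var.aws-region
}

# Used in resource names throughout the article.
variable "project" {
  type = string
}

variable "environment" {
  type = string
}
```

With this in place, the write, plan, and apply workflow corresponds to running terraform init, terraform plan, and terraform apply from the directory that contains the configuration files.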

Provisioning Multi-AZ RDS PostgreSQL with Terraform

Variables:

First, we declare the variables required for our RDS database. Add the following to variables.tf:

## RDS variables
variable "rds-postgres-db-username" {
  type = string
}
variable "rds-postgres-db-password" {
  type = string
  # Keep the password out of plan and apply output
  sensitive = true
}
variable "rds-postgres-db-name" {
  type = string
}
variable "rds-postgres-db-port" {
  type = number
}
variable "rds-postgres-db-maintenance-window" {
  type = string
}

Then add the following to terraform.tfvars:

## RDS Variables
rds-postgres-db-username           = "dbusername"
rds-postgres-db-password           = "dbpassword"
rds-postgres-db-port               = 5432
rds-postgres-db-name               = "dbname"
rds-postgres-db-maintenance-window = "Sun:00:00-Sun:03:00"

Here we have set the database credentials in plain text to keep the example short. A more secure approach is to manage your RDS secrets and credentials with AWS Secrets Manager, which we will demonstrate in an upcoming article.
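
As a rough preview of that approach, the password could be generated by Terraform and stored in Secrets Manager instead of being written to terraform.tfvars. The sketch below assumes the hashicorp/random provider is available; the resource names and the secret name are hypothetical. Keep in mind that the generated value still ends up in the Terraform state, so the state itself must be stored securely.

```hcl
# Generate a random master password instead of hard-coding one.
resource "random_password" "rds-postgres-master" {
  length  = 32
  special = true
  # RDS does not allow '/', '@', '"', or spaces in the master password.
  override_special = "!#$%&*()-_=+[]{}<>:?"
}

# Hypothetical secret that holds the database credentials.
resource "aws_secretsmanager_secret" "rds-postgres-credentials" {
  name = "${var.project}-rds-postgres-credentials-${var.environment}"
}

# Store the username and generated password as a JSON document.
resource "aws_secretsmanager_secret_version" "rds-postgres-credentials" {
  secret_id = aws_secretsmanager_secret.rds-postgres-credentials.id
  secret_string = jsonencode({
    username = var.rds-postgres-db-username
    password = random_password.rds-postgres-master.result
  })
}
```

With this sketch, the database resource would set its password from random_password.rds-postgres-master.result rather than from a variable.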

Database Subnet Group:

We start by provisioning a database (DB) subnet group. A DB subnet group is a logical grouping of subnets within a virtual private cloud (VPC) that is designated for hosting database instances. Its purpose is to define the subnets in which the RDS instance can be deployed; RDS provisions the DB instance only within the specified subnets.

DB subnet groups enable high availability and fault tolerance for RDS instances. Because the selected subnets span multiple availability zones (AZs), RDS can place the primary and standby instances in different AZs, so the database remains accessible even if one AZ fails. This helps to minimize downtime and maintain data durability.

resource "aws_db_subnet_group" "db-subnet-group" {
  name = "${var.project}-db-subnet-group-${var.environment}"
  subnet_ids = [
    aws_subnet.vpc-private-subnet-1.id,
    aws_subnet.vpc-private-subnet-2.id,
    aws_subnet.vpc-private-subnet-3.id
  ]
  tags = {
    Name = "${var.project}-db-subnet-group-${var.environment}"
  }
}

Here we have created the DB subnet group and provided the subnet IDs of our private subnets.
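
The subnet group refers to private subnets that are defined elsewhere in the project. For readers following along without that code, a minimal sketch of two of them is shown below. The CIDR blocks and the availability zone lookup are assumptions; only the resource names and the aws_vpc.kodetronix-vpc reference come from the snippets in this article. A DB subnet group needs subnets in at least two AZs, which is why each subnet is pinned to a different zone.

```hcl
# Look up the AZs that are currently available in the configured region.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "vpc-private-subnet-1" {
  vpc_id            = aws_vpc.kodetronix-vpc.id
  cidr_block        = "10.0.11.0/24" # assumed; must fall inside the VPC range
  availability_zone = data.aws_availability_zones.available.names[0]
  tags = {
    Name = "${var.project}-private-subnet-1-${var.environment}"
  }
}

resource "aws_subnet" "vpc-private-subnet-2" {
  vpc_id            = aws_vpc.kodetronix-vpc.id
  cidr_block        = "10.0.12.0/24" # assumed; must fall inside the VPC range
  availability_zone = data.aws_availability_zones.available.names[1]
  tags = {
    Name = "${var.project}-private-subnet-2-${var.environment}"
  }
}
```

The third subnet would follow the same pattern with its own CIDR block and names[2].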

Security Groups Configurations:

Next, we will add some security measures for our RDS instance. First, we create a security group for it.

## Security Group for RDS Postgres
resource "aws_security_group" "rds-postgres-sg" {
  name        = "${var.project}-rds-postgres-sg-${var.environment}"
  description = "Security group for RDS Postgres"
  vpc_id      = aws_vpc.kodetronix-vpc.id
  ingress {
    description     = "Postgres port"
    from_port       = var.rds-postgres-db-port
    to_port         = var.rds-postgres-db-port
    protocol        = "tcp"
    security_groups = [aws_security_group.ec2-bastion-sg.id]
  }
}

As you can see, for now we can connect to the RDS database only from our EC2 bastion host, because ingress is allowed only from the bastion host's security group.
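
The bastion host's security group is referenced above but not shown in this article. A sketch of what it might look like follows. The bastion-allowed-cidr variable, the rule descriptions, and the choice to allow SSH on port 22 are assumptions made for illustration.

```hcl
# Hypothetical variable: the CIDR range allowed to SSH into the bastion host.
variable "bastion-allowed-cidr" {
  type = string
}

resource "aws_security_group" "ec2-bastion-sg" {
  name        = "${var.project}-ec2-bastion-sg-${var.environment}"
  description = "Security group for EC2 bastion host"
  vpc_id      = aws_vpc.kodetronix-vpc.id

  ingress {
    description = "SSH from trusted range"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.bastion-allowed-cidr]
  }

  # Terraform removes the default allow-all egress rule, so outbound
  # traffic (including the connection to Postgres) must be allowed explicitly.
  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Referencing a security group instead of an IP range in the RDS ingress rule means that access follows the bastion host even if its private IP address changes.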

Next, we create a KMS key that will be used to encrypt the database storage and the Performance Insights data, and a parameter group for PostgreSQL that allows only SSL connections to the database.

## KMS Key for RDS Postgres
resource "aws_kms_key" "db_key" {
  description         = "KMS key for RDS Postgres"
  key_usage           = "ENCRYPT_DECRYPT"
  enable_key_rotation = true
}
## Parameter Group for RDS Postgres
resource "aws_db_parameter_group" "rds-postgres-pg" {
  name        = "${var.project}-rds-postgres-pg-${var.environment}"
  family      = "postgres14"
  description = "Custom Parameter Group for RDS Postgres"
  parameter {
    name         = "rds.force_ssl"
    value        = "1"
    apply_method = "immediate"
  }
  tags = {
    Name = "${var.project}-rds-postgres-pg-${var.environment}"
  }
}

Create RDS Database Instance:

Now we will provision our PostgreSQL RDS Database Instance:

resource "aws_db_instance" "rds-postgres" {
  identifier                            = "${var.project}-rds-postgres-db-${var.environment}"
  engine                                = "postgres"
  engine_version                        = "14.6"
  instance_class                        = "db.t3.micro"
  allocated_storage                     = 20
  max_allocated_storage                 = 100
  storage_type                          = "gp2"
  storage_encrypted                     = true
  kms_key_id                            = aws_kms_key.db_key.arn
  db_name                               = var.rds-postgres-db-name
  username                              = var.rds-postgres-db-username
  password                              = var.rds-postgres-db-password
  port                                  = var.rds-postgres-db-port
  multi_az                              = true
  network_type                          = "IPV4"
  db_subnet_group_name                  = aws_db_subnet_group.db-subnet-group.name
  vpc_security_group_ids                = [aws_security_group.rds-postgres-sg.id]
  deletion_protection                   = false
  allow_major_version_upgrade           = false
  auto_minor_version_upgrade            = true
  apply_immediately                     = false
  backup_retention_period               = 30
  backup_window                         = "21:00-23:00"
  maintenance_window                    = var.rds-postgres-db-maintenance-window
  copy_tags_to_snapshot                 = true
  delete_automated_backups              = true
  enabled_cloudwatch_logs_exports       = ["postgresql", "upgrade"]
  performance_insights_enabled          = true
  performance_insights_kms_key_id       = aws_kms_key.db_key.arn
  performance_insights_retention_period = 7
  publicly_accessible                   = false
  parameter_group_name                  = aws_db_parameter_group.rds-postgres-pg.name
  skip_final_snapshot                   = false
  final_snapshot_identifier             = "${var.project}-rds-postgres-db-final-snapshot-${var.environment}"
}

For our RDS database, we select PostgreSQL engine version 14.6. For the DB instance class, we use db.t3.micro, a small burstable-performance class that is eligible for the free tier, because this is only an example.

In our configuration, we assign 20 GB of storage to the RDS instance and enable storage autoscaling, which lets RDS grow its storage capacity as needed, up to a maximum of 100 GB. Additionally, we enable Multi-AZ, which ensures that the database is synchronously replicated to a standby instance located in a different availability zone (AZ).

This replication provides fault tolerance and high availability, as the system can automatically failover to an instance in another AZ in the event of infrastructure failures or other outages. By utilizing these features, we enhance the scalability and resilience of our RDS database, ensuring data durability and minimizing downtime.

Caution! Here we have set deletion_protection to false. This is only for demo purposes, so that we can tear down the infrastructure easily. In a production setting, you should enable deletion protection.
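
One way to avoid forgetting this setting is to derive it from the environment instead of hard-coding it. The fragment below is only a sketch: it assumes that production is identified by var.environment being set to "prod", which this article does not specify, and the local value name is hypothetical.

```hcl
locals {
  # Assumption: production deployments use environment = "prod".
  is_production = var.environment == "prod"
}

# Inside the aws_db_instance resource, the hard-coded value would become:
#   deletion_protection = local.is_production
```

With this in place, demo environments can still be destroyed freely, while a production database has to have deletion protection turned off explicitly before it can be removed.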

For PostgreSQL engine version upgrades, we have restricted it to minor versions only, while major version upgrades are disabled. As for automated backups, they are retained for a period of 30 days. It is crucial to note that the backup window and maintenance window should never overlap.

In our setup, backups are scheduled between 9 PM and 11 PM UTC, while the maintenance window is set for every Sunday between 12 AM and 3 AM UTC. We have also enabled Performance Insights, which provides detailed performance analysis at the query and user levels. With these configurations in place, we have met the minimum requirements for deploying our RDS database.
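
Once the instance exists, the connection details are usually needed elsewhere, for example on the bastion host. A small sketch of output values that expose them is shown below; the output names are hypothetical, while endpoint, address, and port are attributes exported by the aws_db_instance resource.

```hcl
output "rds-postgres-endpoint" {
  description = "Connection endpoint in address:port format"
  value       = aws_db_instance.rds-postgres.endpoint
}

output "rds-postgres-address" {
  description = "Hostname of the RDS instance"
  value       = aws_db_instance.rds-postgres.address
}

output "rds-postgres-port" {
  description = "Port the database listens on"
  value       = aws_db_instance.rds-postgres.port
}
```

After terraform apply, these values can be read with terraform output. Because the parameter group sets rds.force_ssl, clients have to connect over SSL, for example by using sslmode=require in a libpq-style connection string.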

Database Observability and Telemetry:

Having fulfilled the basic prerequisites for provisioning our RDS database, it’s time to enhance its observability and telemetry. Our next step involves setting up CloudWatch alarms to monitor the CPU, Memory, and Storage utilization of our RDS database. Additionally, we’ll create an SNS topic and set up a subscription for that topic.

Whenever any of the mentioned metrics exceed the alarm threshold, the SNS topic will promptly send an email notification to our subscribed email address, enabling us to take appropriate actions in a timely manner. This enhanced monitoring and notification system ensures that we stay informed about the health and performance of our RDS database and empowers us to respond effectively whenever necessary.

resource "aws_sns_topic" "databases-sns-topic" {
  name = "${var.project}-databases-sns-topic-${var.environment}"
  tags = {
    Name = "${var.project}-databases-sns-topic-${var.environment}"
  }
}
resource "aws_sns_topic_subscription" "db-alerts-sns-subscription" {
  topic_arn = aws_sns_topic.databases-sns-topic.arn
  protocol  = "email"
  endpoint  = "alerts@yourdomain.com"
}

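In the above code snippet, we first create an SNS topic and then an SNS topic subscription that sends the email notifications. Note that the recipient must confirm the subscription from the confirmation email before any notifications are delivered.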

## Cloudwatch alarm for RDS CPU utilization
resource "aws_cloudwatch_metric_alarm" "rds-postgres-cpu-utilization-alarm" {
  alarm_name          = "${var.project}-rds-postgres-cpu-utilization-alarm-${var.environment}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "1"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "This metric monitors RDS Postgres CPU utilization"
  alarm_actions       = [aws_sns_topic.databases-sns-topic.arn]
  ok_actions          = [aws_sns_topic.databases-sns-topic.arn]
  dimensions = {
    DBInstanceIdentifier = aws_db_instance.rds-postgres.id
  }
}
## Cloudwatch alarm for RDS Memory utilization
resource "aws_cloudwatch_metric_alarm" "rds-postgres-memory-utilization-alarm" {
  alarm_name          = "${var.project}-rds-postgres-memory-utilization-alarm-${var.environment}"
  # FreeableMemory reports free bytes, so we alarm when it drops too low
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = "1"
  metric_name         = "FreeableMemory"
  namespace           = "AWS/RDS"
  period              = "300"
  statistic           = "Average"
  # About 0.2 GiB free, i.e. roughly 80% of the 1 GiB on a db.t3.micro in use
  threshold           = "214748364"
  alarm_description   = "This metric monitors RDS Postgres memory utilization"
  alarm_actions       = [aws_sns_topic.databases-sns-topic.arn]
  ok_actions          = [aws_sns_topic.databases-sns-topic.arn]
  dimensions = {
    DBInstanceIdentifier = aws_db_instance.rds-postgres.id
  }
}
## Cloudwatch alarm for RDS Storage utilization
resource "aws_cloudwatch_metric_alarm" "rds-postgres-storage-utilization-alarm" {
  alarm_name          = "${var.project}-rds-postgres-storage-utilization-alarm-${var.environment}"
  # FreeStorageSpace reports free bytes, so we alarm when it drops too low
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = "1"
  metric_name         = "FreeStorageSpace"
  namespace           = "AWS/RDS"
  period              = "3600"
  statistic           = "Average"
  # 4 GiB free, i.e. 16 GiB (80%) of the 20 GiB initially allocated in use
  threshold           = "4294967296"
  alarm_description   = "This metric monitors RDS Postgres storage utilization"
  alarm_actions       = [aws_sns_topic.databases-sns-topic.arn]
  ok_actions          = [aws_sns_topic.databases-sns-topic.arn]
  dimensions = {
    DBInstanceIdentifier = aws_db_instance.rds-postgres.id
  }
}

We alarm at roughly 80 percent utilization for CPU, memory, and storage. The CPU alarm is triggered when the average CPU utilization over a 5-minute period is at or above 80 percent. RDS does not publish memory or storage usage as a percentage; it reports FreeableMemory and FreeStorageSpace in bytes, so the other two alarms fire when the free amount drops to or below a threshold. The memory alarm is triggered when the average freeable memory over 5 minutes falls to about 0.2 GiB, which corresponds to roughly 80 percent of the 1 GiB on a db.t3.micro being in use. The storage alarm is triggered when the average free storage over one hour falls to 4 GiB, which means that 16 GiB of the initially allocated 20 GiB is used. Because storage autoscaling can increase the allocated storage, this fixed threshold represents a smaller percentage once the volume has grown. By setting these thresholds and alarms, we can proactively monitor the performance of our system and receive timely notifications when any of these metrics reach critical levels, allowing us to take prompt action.
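
Because RDS reports FreeableMemory and FreeStorageSpace in bytes, a percentage-style target has to be converted into a byte count before it can be used as an alarm threshold. The sketch below does that conversion with local values instead of hand-computed numbers. All of the local names are hypothetical, and the sizing figures assume the 20 GiB allocation used in this article and the 1 GiB of memory on a db.t3.micro.

```hcl
locals {
  bytes_per_gib = pow(1024, 3)

  # Assumed sizing for this example deployment.
  allocated_storage_gib = 20
  instance_memory_gib   = 1

  # Alarm once less than this percentage of the resource remains free.
  min_free_percent = 20

  # Free-space thresholds in bytes, rounded down to whole bytes.
  free_storage_threshold_bytes = floor(
    local.allocated_storage_gib * local.bytes_per_gib * local.min_free_percent / 100
  )
  freeable_memory_threshold_bytes = floor(
    local.instance_memory_gib * local.bytes_per_gib * local.min_free_percent / 100
  )
}
```

An alarm on a free-space metric would then use threshold = local.free_storage_threshold_bytes together with comparison_operator = "LessThanOrEqualToThreshold", so that it fires when free space runs low rather than when it is plentiful.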

Conclusion

In conclusion, we have explored the concept of a 3-tier architecture, understanding its significance in building scalable and modular applications. By separating our application into distinct layers, we achieve improved maintainability and scalability. Leveraging Terraform, we provisioned an Amazon RDS instance, configuring its storage, backups, and other parameters. With features like storage autoscaling, Multi-AZ deployment, and performance insights, we enhanced the resilience, availability, and performance of our database. By combining these principles and tools, we have established a solid foundation for building robust and scalable applications, ready to meet the demands of modern systems.
