The AWS Well-Architected Framework provides a consistent approach for customers to evaluate architectures and implement designs that will scale over time. Regular Well-Architected Framework Reviews (WAFRs) are crucial for ensuring that workloads remain secure, reliable, performant, cost-optimized, operationally excellent, and sustainable. However, conducting these reviews manually across numerous accounts and complex workloads is time-consuming, resource-intensive, and prone to human error and inconsistency.
This article explores how to leverage the power of generative AI, specifically Amazon Bedrock, to significantly streamline and enhance the AWS Well-Architected Review process. By automating data ingestion, analysis, and recommendation generation, organizations can achieve faster, more consistent, and scalable reviews, freeing up cloud architects and DevOps engineers to focus on higher-value tasks.
Architecture Overview
Automating Well-Architected Reviews with generative AI involves a structured workflow that integrates various AWS services. The following architecture diagram illustrates an end-to-end system for accelerating WAFR reviews using Amazon Bedrock.
Diagram Description:
- AWS Environment: The source of all review data, comprising various AWS accounts and resources.
- Data Sources:
- AWS Trusted Advisor: Provides checks across cost optimization, security, fault tolerance, performance, and service limits.
- AWS Config: Offers a detailed inventory of AWS resources, their configurations, and configuration history, allowing for rule-based compliance checks.
- AWS Well-Architected Tool (WAT) APIs: Programmatic access to existing workload definitions, answers, and improvement plans within the WAT.
- Pre-processing Logic (AWS Lambda): A serverless function responsible for:
- Invoking AWS SDK (boto3) to extract raw data from the specified data sources.
- Normalizing and structuring the extracted data into a format suitable for LLM consumption.
- Aggregating relevant insights for a specific workload or pillar review.
- Generative AI (Amazon Bedrock): The core of the intelligent review process.
- Receives pre-processed data and expertly crafted prompts.
- Leverages various Foundation Models (FMs) like Anthropic Claude or Amazon Titan to analyze the input.
- Identifies deviations from Well-Architected best practices, potential risks, and areas for improvement.
- Generates human-readable recommendations, often with reasoning and severity levels.
- Result Storage and Visualization:
- Amazon S3: Stores the structured LLM outputs, recommendations, and review reports for historical analysis and audit.
- Amazon QuickSight: Connects to S3 data to create interactive dashboards, providing a visual overview of review progress, identified risks, and recommended actions.
- Integration with Governance Tools:
- Jira/ServiceNow: Automated creation of tickets for identified issues and recommendations, streamlining the remediation workflow.
- Amazon SNS: Sends email or SMS notifications for critical findings or review completion.
Ingesting Review Inputs from AWS Services
The first step in automating Well-Architected Reviews is to programmatically extract relevant data from various AWS services. This data provides the contextual input for the generative AI models to perform their analysis.
Extracting Data with boto3
The AWS SDK for Python, boto3, is ideal for interacting with AWS services to gather review inputs. Below are examples of how to extract data from Trusted Advisor, AWS Config, and the AWS Well-Architected Tool.
```python
import boto3
import json

# Initialize AWS clients.
# The AWS Support API (Trusted Advisor) is only served from us-east-1 and
# requires a Business, Enterprise On-Ramp, or Enterprise Support plan.
trusted_advisor_client = boto3.client('support', region_name='us-east-1')
config_client = boto3.client('config')
well_architected_client = boto3.client('wellarchitected')

def get_trusted_advisor_checks():
    """
    Retrieves a summary of all Trusted Advisor checks.
    """
    try:
        response = trusted_advisor_client.describe_trusted_advisor_checks(language='en')
        check_summaries = []
        for check in response['checks']:
            # For each check, get its status and resource summary
            status_response = trusted_advisor_client.describe_trusted_advisor_check_summaries(
                checkIds=[check['id']]
            )
            summary = status_response['summaries'][0]
            # Resource counts live under the 'resourcesSummary' key
            resources = summary.get('resourcesSummary', {})
            check_summaries.append({
                'name': check['name'],
                'category': check['category'],
                'status': summary['status'],
                'resources_flagged': resources.get('resourcesFlagged', 0),
                'resources_suppressed': resources.get('resourcesSuppressed', 0),
                'resources_processed': resources.get('resourcesProcessed', 0),
                'resources_ignored': resources.get('resourcesIgnored', 0)
            })
        return check_summaries
    except Exception as e:
        print(f"Error getting Trusted Advisor checks: {e}")
        return []

def get_aws_config_compliance(rule_name=None):
    """
    Retrieves compliance status for AWS Config rules.
    Optionally filters by a specific rule name.
    """
    try:
        if rule_name:
            response = config_client.describe_config_rules(ConfigRuleNames=[rule_name])
        else:
            response = config_client.describe_config_rules()
        compliance_details = []
        for rule in response['ConfigRules']:
            compliance_response = config_client.get_compliance_details_by_config_rule(
                ConfigRuleName=rule['ConfigRuleName']
            )
            evaluation_results = compliance_response.get('EvaluationResults', [])
            compliance_details.append({
                'rule_name': rule['ConfigRuleName'],
                'compliance_status': evaluation_results[0]['ComplianceType'] if evaluation_results else 'NOT_EVALUATED',
                'resource_details': [
                    {'resource_type': er['EvaluationResultIdentifier']['EvaluationResultQualifier']['ResourceType'],
                     'resource_id': er['EvaluationResultIdentifier']['EvaluationResultQualifier']['ResourceId'],
                     'compliance_type': er['ComplianceType'],
                     'annotation': er.get('Annotation')}
                    for er in evaluation_results
                ]
            })
        return compliance_details
    except Exception as e:
        print(f"Error getting AWS Config compliance: {e}")
        return []

def get_well_architected_workload_info(workload_id):
    """
    Retrieves details for a specific workload from the AWS Well-Architected Tool.
    """
    try:
        response = well_architected_client.get_workload(WorkloadId=workload_id)
        workload_info = response['Workload']
        # You can further fetch answers for specific pillars if needed:
        # list_answers_response = well_architected_client.list_answers(WorkloadId=workload_id, PillarId='security')
        # print(list_answers_response)
        return workload_info
    except Exception as e:
        print(f"Error getting Well-Architected workload info: {e}")
        return None

# Example usage:
if __name__ == "__main__":
    print("--- Trusted Advisor Checks ---")
    ta_checks = get_trusted_advisor_checks()
    print(json.dumps(ta_checks, indent=2))

    print("\n--- AWS Config Compliance (Example: s3-bucket-public-read-prohibited) ---")
    config_compliance = get_aws_config_compliance(rule_name='s3-bucket-public-read-prohibited')
    print(json.dumps(config_compliance, indent=2))

    # Replace with your actual Well-Architected Workload ID
    # print("\n--- AWS Well-Architected Workload Info (Example Workload ID) ---")
    # example_workload_id = "your-well-architected-workload-id"
    # wa_workload_info = get_well_architected_workload_info(example_workload_id)
    # print(json.dumps(wa_workload_info, indent=2))
```
This code provides a foundation for gathering inputs. In a real-world scenario, you would aggregate these inputs for a specific workload or a set of resources, then structure them into a comprehensive JSON or text format to be passed to the LLM.
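As a sketch of that aggregation step, the helper below merges the gathered outputs into a single, LLM-ready JSON document. The field names are illustrative choices, not part of any AWS API:

```python
import json

def build_review_payload(ta_checks, config_findings, workload_info=None):
    """Merge raw findings into one structured document for the LLM.

    All keys here are illustrative; choose names that match your own
    prompt template.
    """
    payload = {
        "trusted_advisor_checks": ta_checks,
        "config_compliance": config_findings,
    }
    if workload_info:
        # Keep only the fields the model actually needs as context.
        payload["workload"] = {
            "name": workload_info.get("WorkloadName"),
            "environment": workload_info.get("Environment"),
            "description": workload_info.get("Description"),
        }
    return json.dumps(payload, indent=2)

# Example with stubbed findings (no AWS calls needed):
doc = build_review_payload(
    ta_checks=[{"name": "Security Groups - Unrestricted Access", "status": "warning"}],
    config_findings=[{"rule_name": "s3-bucket-public-read-prohibited",
                      "compliance_status": "NON_COMPLIANT"}],
)
```

Keeping this step as a pure function makes it easy to unit-test the payload shape before any Bedrock calls are involved.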
Prompt Engineering and LLM Evaluation
The quality of recommendations from a generative AI model heavily depends on the clarity and specificity of the input prompt. Prompt engineering for Well-Architected Reviews involves crafting prompts that guide the LLM to assess workload alignment with the six pillars, identify violations, and generate actionable, human-readable recommendations.
Designing Effective Prompts
When designing prompts for Well-Architected Reviews, consider the following:
- Contextual Information: Provide all relevant data about the workload, including its purpose, components, existing configurations, and any identified issues (e.g., from Trusted Advisor or AWS Config).
- Pillar Focus: Clearly state which Well-Architected Pillar the review is focusing on (e.g., Security, Cost Optimization).
- Desired Output Format: Specify the desired format for the recommendations (e.g., JSON, markdown list, severity levels).
- Call to Action: Explicitly ask the LLM to identify deviations, suggest improvements, and provide reasoning.
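The four guidelines above can be encoded in a small prompt-builder. This is a sketch; the template wording is an assumption, not a fixed Bedrock requirement:

```python
import json

def build_pillar_prompt(pillar: str, findings: dict) -> str:
    """Assemble a review prompt covering the four elements listed above:
    context, pillar focus, output format, and an explicit call to action."""
    return (
        f"As an AWS Well-Architected expert focusing on the {pillar} Pillar, "
        "analyze the workload findings below.\n\n"
        "Findings:\n"
        "```json\n" + json.dumps(findings, indent=2) + "\n```\n\n"
        "Identify deviations from best practices and return a JSON array of "
        "recommendations, each with 'title', 'description', 'reasoning', "
        "'severity' (High, Medium, Low), and 'remediation_steps'."
    )

prompt = build_pillar_prompt("Security", {"rule_name": "s3-bucket-public-read-prohibited"})
```

Centralizing the template this way lets you tune wording in one place as you iterate on output quality.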
Example Prompts and Expected Responses using Bedrock
Here are examples of prompts using different Amazon Bedrock foundation models and their expected responses. We’ll focus on the Security Pillar for demonstration.
Scenario: We have a web application running on EC2 instances behind an ALB, using RDS for the database. AWS Config identified an S3 bucket used by the application that allows public read access.
Prompt for Anthropic Claude (via Bedrock):
{
"anthropic_version": "bedrock-2023-05-31",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "As an AWS Well-Architected expert focusing on the Security Pillar, analyze the following workload details and identified issues. Provide specific, actionable recommendations to improve the security posture, including the reasoning for each recommendation and a severity level (High, Medium, Low).\n\nWorkload Description:\n- Application: E-commerce website\n- Architecture: EC2 instances (Auto Scaling Group), Application Load Balancer (ALB), RDS PostgreSQL database, S3 for static assets and user uploads.\n\nIdentified Security Issue (from AWS Config):\n```json\n{\n \"rule_name\": \"s3-bucket-public-read-prohibited\",\n \"compliance_status\": \"NON_COMPLIANT\",\n \"resource_details\": [\n {\n \"resource_type\": \"AWS::S3::Bucket\",\n \"resource_id\": \"my-ecommerce-static-assets\",\n \"compliance_type\": \"NON_COMPLIANT\",\n \"annotation\": \"Bucket 'my-ecommerce-static-assets' has public read access.\"\n }\n ]\n}\n```\n\nBased on the AWS Well-Architected Security Pillar best practices, generate a list of recommendations in JSON format. Each recommendation should include:\n- `title`: A concise title for the recommendation.\n- `description`: Detailed explanation of the recommendation and its benefits.\n- `reasoning`: Why this recommendation aligns with WAFR Security Pillar.\n- `severity`: High, Medium, or Low.\n- `remediation_steps`: A clear, step-by-step guide to implement the recommendation."
}
]
}
],
"max_tokens": 2000,
"temperature": 0.2
}
Expected Response (Truncated for brevity, actual response will be more comprehensive):
{
"content": [
{
"type": "text",
"text": "```json\n[\n {\n \"title\": \"Restrict Public Read Access on S3 Bucket\",\n \"description\": \"The S3 bucket 'my-ecommerce-static-assets' is publicly readable, which can expose sensitive data or allow unauthorized access to static assets meant for internal use. Even if it's for static website hosting, it's critical to ensure only intended content is public and to restrict any potentially sensitive information.\",\n \"reasoning\": \"This directly violates the Security Pillar's principle of 'Implement a strong identity foundation' and 'Protect data in transit and at rest'. Public S3 buckets, unless explicitly designed for public distribution of non-sensitive content, represent a significant security risk for data exposure and unauthorized access.\",\n \"severity\": \"High\",\n \"remediation_steps\": [\n \"1. Identify if 'my-ecommerce-static-assets' is intended for static website hosting. If so, ensure no sensitive data is stored there and use a CloudFront distribution with Origin Access Control (OAC) for tighter control.\",\n \"2. If not for public website hosting, modify the S3 bucket policy and ACLs to block public access.\",\n \"3. Use S3 Block Public Access settings at the account or bucket level.\",\n \"4. Implement least privilege access: grant only necessary IAM permissions to applications or users accessing the bucket.\"\n ]\n },\n {\n \"title\": \"Implement Principle of Least Privilege for IAM Roles\",\n \"description\": \"Ensure all IAM roles associated with EC2 instances, Lambda functions, or other services interacting with the S3 bucket have only the minimum necessary permissions (e.g., s3:GetObject for reading static assets, s3:PutObject for uploads, but not s3:PutObjectAcl or s3:*).\",\n \"reasoning\": \"A core principle of the Security Pillar is to implement the principle of least privilege, reducing the attack surface by limiting what an entity can do.\",\n \"severity\": \"Medium\",\n \"remediation_steps\": [\n \"1. 
Review IAM roles attached to EC2 instances and other services interacting with S3.\",\n \"2. Audit existing IAM policies for excessive permissions.\",\n \"3. Refine policies to grant only specific actions on specific resources, utilizing resource-level permissions where possible.\"\n ]\n }\n]\n```"
}
]
}
Prompt for Amazon Titan Text Express (via Bedrock):
{
"inputText": "As an AWS Well-Architected reviewer, assess the security posture of an e-commerce web application with EC2, ALB, RDS, and S3. A critical finding from AWS Config is that the S3 bucket 'my-ecommerce-static-assets' allows public read access. Recommend remediation steps with clear reasoning and a severity level (High, Medium, Low) following Well-Architected Security Pillar best practices. Provide output in markdown format.",
"textGenerationConfig": {
"maxTokenCount": 1000,
"temperature": 0.5,
"topP": 0.9
}
}
Expected Response (Markdown formatted):
### Security Recommendation: Restrict Public Access on S3 Bucket
**Severity:** High
**Description:** The S3 bucket 'my-ecommerce-static-assets' is currently configured for public read access. This poses a significant security risk as it could expose sensitive data, allow unauthorized access to internal assets, or lead to data leakage if not strictly managed.
**Reasoning:** This directly contravenes the AWS Well-Architected Security Pillar's guidance on 'Protect data in transit and at rest' and 'Implement a strong identity foundation'. Unrestricted public access to S3 buckets, unless explicitly required for public distribution of non-sensitive content and controlled via a CDN, is a common vulnerability leading to data breaches.
**Remediation Steps:**
1. **Determine Intent:** Verify if the bucket is intended for public static website hosting.
* If **YES**: Ensure no sensitive data is stored. Implement Amazon CloudFront with Origin Access Control (OAC) to restrict direct S3 bucket access, and only allow CloudFront to retrieve objects. This provides a CDN for performance and an additional layer of security.
* If **NO**: Immediately disable public access.
* Navigate to the S3 console, select the bucket, go to 'Permissions', and ensure 'Block public access' settings are enabled.
* Review and remove any bucket policies or Access Control Lists (ACLs) that grant `public-read` or similar permissions.
2. **Least Privilege:** Ensure that any IAM roles or users that require access to this bucket have only the minimum necessary permissions (e.g., `s3:GetObject` for read-only access by the application). Avoid using wildcard permissions (`s3:*`).
3. **Regular Audits:** Establish a routine for auditing S3 bucket permissions using AWS Config rules or custom scripts to detect and remediate public access violations proactively.
---
### Security Recommendation: Enforce Least Privilege for Application IAM Roles
**Severity:** Medium
**Description:** While the S3 public access is a critical issue, it's also important to ensure that the IAM roles used by your EC2 instances and other application components adhere to the principle of least privilege when interacting with S3 and other AWS services. This minimizes the blast radius in case of a compromise.
**Reasoning:** The Security Pillar emphasizes 'Implementing a strong identity foundation' by granting only the permissions required to perform a task. Over-privileged roles can be exploited to gain unauthorized access to resources.
**Remediation Steps:**
1. **Review IAM Policies:** Inspect the IAM policies attached to the EC2 instance roles, Lambda function roles, and any other services interacting with the S3 bucket or other critical resources.
2. **Granular Permissions:** Refine policies to grant specific actions on specific resources where possible. For instance, instead of `s3:*` on all buckets, use `s3:GetObject` on `arn:aws:s3:::my-ecommerce-static-assets/*`.
3. **IAM Access Analyzer:** Utilize IAM Access Analyzer to identify unintended external access to your resources.
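Both example responses recommend enabling S3 Block Public Access as the primary remediation. That step can itself be scripted with boto3. In this sketch the client is passed in as a parameter so the helper can be exercised without live AWS credentials; the bucket name is the example one from the scenario:

```python
def block_all_public_access(s3_client, bucket_name):
    """Enable all four S3 Block Public Access settings on a bucket.

    Mirrors the remediation above: these settings override any public
    ACLs or bucket policies without deleting them.
    """
    config = {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration=config,
    )
    return config

# Usage with a real client (requires credentials):
# import boto3
# block_all_public_access(boto3.client("s3"), "my-ecommerce-static-assets")
```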
Automating the Review Workflow
Automating the Well-Architected Review workflow using AWS Lambda, EventBridge, and Amazon Bedrock enables continuous assessment and proactive identification of issues.
Lambda-based Automation
The core of the automation is an AWS Lambda function that orchestrates the review process.
Workflow Steps:
- Trigger: An Amazon EventBridge rule (e.g., scheduled, or reacting to AWS Config `ComplianceChange` events).
- Data Fetch: The Lambda function uses boto3 to fetch relevant data from Trusted Advisor, AWS Config, and potentially the AWS Well-Architected Tool.
- Contextualization: The fetched data is aggregated and formatted into a structured input for the LLM.
- LLM Invocation: The Lambda function invokes Amazon Bedrock with the prepared prompt and input data.
- Result Processing: The LLM’s response (e.g., JSON recommendations) is parsed.
- Storage and Reporting: The structured recommendations are stored in Amazon S3. Optionally, an SNS topic is published for alerts, or QuickSight dashboards are updated.
Code Snippets
1. AWS Lambda Function (Python):
````python
import boto3
import json
import os
import datetime

# Initialize clients.
# The AWS Support API (Trusted Advisor) is only served from us-east-1.
ta_client = boto3.client('support', region_name='us-east-1')
config_client = boto3.client('config')
bedrock_runtime = boto3.client('bedrock-runtime')
s3_client = boto3.client('s3')
sns_client = boto3.client('sns')

# Environment variables
S3_BUCKET_NAME = os.environ.get('S3_BUCKET_NAME')
SNS_TOPIC_ARN = os.environ.get('SNS_TOPIC_ARN')
BEDROCK_MODEL_ID = os.environ.get('BEDROCK_MODEL_ID', 'anthropic.claude-3-sonnet-20240229-v1:0')  # Example model

def get_trusted_advisor_security_checks():
    """Fetches key security-related Trusted Advisor checks."""
    try:
        response = ta_client.describe_trusted_advisor_checks(language='en')
        security_checks = []
        for check in response['checks']:
            if check['category'] == 'security':
                summary_response = ta_client.describe_trusted_advisor_check_summaries(checkIds=[check['id']])
                if summary_response['summaries']:
                    summary = summary_response['summaries'][0]
                    # Resource counts live under the 'resourcesSummary' key
                    resources = summary.get('resourcesSummary', {})
                    security_checks.append({
                        'name': check['name'],
                        'status': summary['status'],
                        'resources_flagged': resources.get('resourcesFlagged', 0),
                        'resources_suppressed': resources.get('resourcesSuppressed', 0)
                    })
        return security_checks
    except Exception as e:
        print(f"Error getting Trusted Advisor security checks: {e}")
        return []

def get_aws_config_non_compliant_rules():
    """Fetches non-compliant AWS Config rules and their resources."""
    try:
        response = config_client.describe_compliance_by_config_rule(
            ComplianceTypes=['NON_COMPLIANT']
        )
        non_compliant_rules = []
        for item in response['ComplianceByConfigRules']:
            rule_name = item['ConfigRuleName']
            details_response = config_client.get_compliance_details_by_config_rule(
                ConfigRuleName=rule_name,
                ComplianceTypes=['NON_COMPLIANT']
            )
            resources = []
            for er in details_response['EvaluationResults']:
                qualifier = er['EvaluationResultIdentifier']['EvaluationResultQualifier']
                resources.append({
                    'resource_type': qualifier['ResourceType'],
                    'resource_id': qualifier['ResourceId'],
                    'annotation': er.get('Annotation')
                })
            non_compliant_rules.append({
                'rule_name': rule_name,
                'compliance_status': 'NON_COMPLIANT',
                'resources': resources
            })
        return non_compliant_rules
    except Exception as e:
        print(f"Error getting AWS Config non-compliant rules: {e}")
        return []

def invoke_bedrock_llm(prompt_text, model_id=BEDROCK_MODEL_ID):
    """Invokes the specified Bedrock LLM with the given prompt."""
    try:
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": prompt_text
                        }
                    ]
                }
            ],
            "max_tokens": 2000,
            "temperature": 0.2
        })
        response = bedrock_runtime.invoke_model(
            body=body,
            contentType='application/json',
            accept='application/json',
            modelId=model_id
        )
        response_body = json.loads(response.get('body').read())
        return response_body['content'][0]['text']
    except Exception as e:
        print(f"Error invoking Bedrock LLM: {e}")
        return f"LLM invocation failed: {e}"

def lambda_handler(event, context):
    print(f"Received event: {json.dumps(event)}")

    # 1. Gather input data
    ta_findings = get_trusted_advisor_security_checks()
    config_findings = get_aws_config_non_compliant_rules()
    review_input = {
        "trusted_advisor_security_findings": ta_findings,
        "aws_config_non_compliant_rules": config_findings
    }
    print(f"Aggregated Review Input: {json.dumps(review_input, indent=2)}")

    # 2. Construct the LLM prompt
    prompt = f"""
As an AWS Well-Architected expert, analyze the following security findings from an AWS account.
Identify potential security risks and deviations from the Security Pillar best practices.
Generate specific, actionable recommendations in JSON format, including a 'title', 'description', 'reasoning', 'severity' (High, Medium, Low), and 'remediation_steps' (a list of strings).
Focus on practical advice that can be implemented by an AWS engineer.

Security Findings:
```json
{json.dumps(review_input, indent=2)}
```

Please provide the recommendations in a JSON array.
"""

    # 3. Invoke the LLM via Bedrock
    print("Invoking Bedrock LLM...")
    llm_raw_response = invoke_bedrock_llm(prompt)
    print(f"LLM Raw Response: {llm_raw_response}")

    # 4. Parse and process LLM response
    recommendations = []
    try:
        # The LLM might return JSON wrapped in a markdown code block
        if llm_raw_response.strip().startswith('```json'):
            json_str = llm_raw_response.strip()[7:-3].strip()
        else:
            json_str = llm_raw_response.strip()
        recommendations = json.loads(json_str)
        print(f"Parsed Recommendations: {json.dumps(recommendations, indent=2)}")
    except json.JSONDecodeError as e:
        print(f"Failed to parse LLM response as JSON: {e}")
        print(f"Raw LLM response was: {llm_raw_response}")
        # Handle cases where the LLM doesn't return perfect JSON
        recommendations = [{"title": "LLM Parsing Error", "description": "Could not parse LLM output. Manual review required.", "severity": "High"}]
    except Exception as e:
        print(f"An unexpected error occurred during parsing: {e}")
        recommendations = [{"title": "Unexpected Error", "description": str(e), "severity": "High"}]

    # 5. Store results in S3
    s3_key = None  # Defined up front so later steps can reference it safely
    if S3_BUCKET_NAME:
        timestamp = datetime.datetime.now().isoformat()
        s3_key = f"well-architected-reviews/security-pillar/{timestamp}.json"
        try:
            s3_client.put_object(
                Bucket=S3_BUCKET_NAME,
                Key=s3_key,
                Body=json.dumps(recommendations, indent=2),
                ContentType='application/json'
            )
            print(f"Recommendations saved to s3://{S3_BUCKET_NAME}/{s3_key}")
        except Exception as e:
            print(f"Error saving to S3: {e}")

    report_location = f"s3://{S3_BUCKET_NAME}/{s3_key}" if s3_key else "N/A"

    # 6. Optional: Send SNS notification for high-severity findings
    if SNS_TOPIC_ARN:
        high_severity_findings = [r for r in recommendations if r.get('severity') == 'High']
        if high_severity_findings:
            sns_message = f"AWS Well-Architected Security Review completed with HIGH severity findings. Review the report for details: {report_location}\n\nHigh Severity Recommendations:\n"
            for hs in high_severity_findings:
                sns_message += f"- {hs.get('title')}: {hs.get('description')}\n"
            try:
                sns_client.publish(
                    TopicArn=SNS_TOPIC_ARN,
                    Subject="Urgent: AWS Well-Architected Security Findings",
                    Message=sns_message
                )
                print(f"SNS notification sent to {SNS_TOPIC_ARN}")
            except Exception as e:
                print(f"Error sending SNS notification: {e}")

    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Well-Architected review processed successfully',
            'recommendations_count': len(recommendations),
            's3_location': report_location
        })
    }
````
2. Event Trigger (EventBridge):
You can configure an EventBridge rule to trigger the Lambda function.
- Scheduled Trigger (e.g., daily at 05:00 UTC): scheduled rules are defined with a schedule expression rather than an event pattern, for example:

```
cron(0 5 * * ? *)
```

- AWS Config `ComplianceChange` event (for reactive reviews), defined with an event pattern:

```json
{
  "source": ["aws.config"],
  "detail-type": ["Config Rules Compliance Change"],
  "detail": {
    "messageType": ["ComplianceChangeNotification"],
    "newEvaluationResult": {
      "complianceType": ["NON_COMPLIANT"]
    }
  }
}
```
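The scheduled rule and its Lambda target can also be created programmatically. A hedged sketch with boto3 (rule and function names are placeholders, and the client is a parameter so the helper can be tested offline):

```python
def create_daily_review_rule(events_client, lambda_arn,
                             rule_name="WellArchitectedReviewDaily"):
    """Create an EventBridge rule firing daily at 05:00 UTC and point it
    at the review Lambda function."""
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="cron(0 5 * * ? *)",
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "wafr-review-lambda", "Arn": lambda_arn}],
    )
    return rule

# Usage (requires credentials; remember to also grant EventBridge
# lambda:InvokeFunction permission on the function via add_permission):
# import boto3
# create_daily_review_rule(boto3.client("events"),
#                          "arn:aws:lambda:us-east-1:111122223333:function:wafr-review")
```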
3. Security Setup (IAM Role for Lambda):
The Lambda function requires an IAM role with permissions to:
- Invoke the model: `bedrock:InvokeModel`
- Read Trusted Advisor data: `support:DescribeTrustedAdvisorChecks` and `support:DescribeTrustedAdvisorCheckSummaries`
- Read AWS Config data: `config:DescribeConfigRules` and `config:GetComplianceDetailsByConfigRule`
- Write reports: `s3:PutObject` on the designated S3 bucket
- Publish alerts: `sns:Publish` on the designated SNS topic (if used)
- Basic Lambda execution permissions: `logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents`
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "support:DescribeTrustedAdvisorChecks",
        "support:DescribeTrustedAdvisorCheckSummaries"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "config:DescribeConfigRules",
        "config:GetComplianceDetailsByConfigRule"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:*:*:foundation-model/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::your-well-architected-reports-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sns:Publish"
      ],
      "Resource": "arn:aws:sns:*:*:your-well-architected-alerts-topic"
    }
  ]
}
```
Integration with Reporting and Governance Tools
The generated Well-Architected recommendations are most valuable when they are actionable and integrated into existing operational workflows.
Storing Results and Reporting
Recommendations should be stored in a durable and queryable format. Amazon S3 is an excellent choice for this.
- Amazon S3: The Lambda function stores the JSON output of the LLM in an S3 bucket, typically with a logical folder structure (e.g., `s3://well-architected-reports/security-pillar/YYYY-MM-DD/report.json`).
- Athena + QuickSight:
- AWS Glue Crawler: Configure an AWS Glue Crawler to crawl the S3 bucket where your JSON reports are stored. This crawler automatically infers the schema of your JSON data and creates a table in the AWS Glue Data Catalog.
- Amazon Athena: Use Athena to query the Glue Data Catalog table using standard SQL. This allows you to run analytical queries across all your historical Well-Architected review data.
- Amazon QuickSight: Connect QuickSight to your Athena data source. You can then build interactive dashboards to visualize:
- Trends in recommendation severity over time.
- Common issues identified across different workloads or accounts.
- Progress in addressing recommendations.
- Pillar-specific compliance dashboards.
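Once the Glue crawler has catalogued the reports, a query like the following can drive those dashboards. The sketch builds the SQL and submits it via Athena; the database, table, and column names are assumptions based on the report schema described earlier:

```python
def severity_trend_query(database="wafr_reports", table="security_pillar"):
    """SQL counting recommendations by severity; assumes the Glue crawler
    exposed 'severity' as a column of the report table."""
    return (
        f'SELECT severity, COUNT(*) AS findings '
        f'FROM "{database}"."{table}" '
        f'GROUP BY severity ORDER BY findings DESC'
    )

def run_athena_query(athena_client, sql, output_s3):
    """Submit the query; results land in the given S3 output location."""
    resp = athena_client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]

# Usage (requires credentials and an existing Glue table):
# import boto3
# run_athena_query(boto3.client("athena"), severity_trend_query(),
#                  "s3://your-athena-results-bucket/wafr/")
```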
Integration with Ticket Management and Alerting
- Jira or ServiceNow: For identified high-severity recommendations, the Lambda function can be extended to directly create tickets in external IT Service Management (ITSM) systems. This typically involves:
  - Using an SDK or API client for the ITSM system (e.g., the Python `requests` library for REST APIs).
  - Mapping LLM-generated fields (title, description, severity, remediation steps) to ITSM ticket fields.
  - Including a link back to the detailed report in S3.
- Amazon SNS: For immediate alerts on critical findings, the Lambda function can publish messages to an SNS topic. This topic can then send email notifications to relevant teams, trigger other Lambda functions, or integrate with chat tools.
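For Jira specifically, the field-mapping step might look like this sketch. A helper translates one LLM recommendation into a payload for Jira's issue-creation REST endpoint; the project key and severity-to-priority mapping are assumptions for your own instance:

```python
def recommendation_to_jira_issue(rec, project_key="WAFR", report_url=None):
    """Map an LLM recommendation dict to a Jira issue-creation payload."""
    # Assumed mapping from the LLM's severity levels to Jira priorities
    priority = {"High": "Highest", "Medium": "Medium", "Low": "Low"}.get(
        rec.get("severity"), "Medium")
    description = rec.get("description", "")
    steps = rec.get("remediation_steps") or []
    if steps:
        description += "\n\nRemediation steps:\n" + "\n".join(f"- {s}" for s in steps)
    if report_url:
        # Link back to the detailed report in S3
        description += f"\n\nFull report: {report_url}"
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Task"},
            "summary": rec.get("title", "Well-Architected finding"),
            "description": description,
            "priority": {"name": priority},
        }
    }

# The payload would then be POSTed with requests, e.g.:
# requests.post(f"{jira_base}/rest/api/2/issue", json=payload, auth=(user, token))
```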
Tagging and Versioning Outputs
For traceability and auditability, it’s crucial to apply proper tagging and versioning to your S3 outputs:
- S3 Versioning: Enable versioning on your S3 bucket to keep a historical record of all review reports. This allows you to revert to previous versions or track changes over time.
- S3 Object Tagging: Apply S3 object tags to your generated reports. Examples include `workload-id`, `pillar-id`, `review-date`, and `account-id`. These tags enable easier filtering, cost allocation, and organization of your review data.
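With boto3, tags can be applied at upload time: `put_object` accepts a URL-encoded `Tagging` string. A sketch, with tag keys mirroring the examples above and the client passed in so the helper can be tested offline:

```python
from urllib.parse import urlencode

def upload_tagged_report(s3_client, bucket, key, body, tags):
    """Upload a report with S3 object tags applied atomically.

    `tags` is a plain dict, e.g. {"pillar-id": "security", "review-date": "..."};
    put_object expects them URL-encoded as a query string.
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
        Tagging=urlencode(tags),
    )

# Usage (requires credentials):
# import boto3
# upload_tagged_report(boto3.client("s3"), "your-well-architected-reports-bucket",
#                      "security-pillar/report.json", b"{}",
#                      {"pillar-id": "security", "review-date": "2024-05-01"})
```

Tagging at upload time, rather than in a follow-up `put_object_tagging` call, guarantees no report ever exists untagged.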
Benefits and Best Practices
Leveraging generative AI for AWS Well-Architected Reviews offers significant advantages but also requires careful consideration of best practices.
Key Benefits
- Improved Consistency: LLMs apply the same logic and framework understanding across all reviews, minimizing human bias and ensuring consistent application of WAFR principles.
- Accelerated Reviews: Automation drastically reduces the time required to conduct comprehensive reviews, from days or weeks to hours or even minutes.
- Scalability: The automated process can be scaled to review hundreds or thousands of workloads across multiple accounts without a linear increase in human effort.
- Proactive Issue Identification: Integrating with real-time events (e.g., Config compliance changes) allows for near real-time identification of deviations from best practices.
- Enhanced Recommendations: LLMs can synthesize vast amounts of information and generate highly detailed, actionable recommendations, often with reasoning, that can be difficult for humans to consistently produce.
- Reduced Human Burden: Frees up experienced cloud architects and DevOps engineers from repetitive data gathering and initial analysis, allowing them to focus on complex problem-solving, strategic planning, and validating AI-generated insights.
Best Practices
- Augment, Don’t Replace: Generative AI should augment, not replace, human architects and compliance leads. AI provides a powerful first pass, but human oversight is crucial for validating recommendations, understanding business context, and making final decisions.
- Regular Prompt Tuning: Continuously refine your LLM prompts based on the quality of generated recommendations. Experiment with different phrasings, model parameters (e.g., temperature, top-p), and input formatting to achieve optimal results.
- Validate AI Suggestions with SMEs: Always have subject matter experts (SMEs) review AI-generated recommendations, especially for high-severity findings, to ensure accuracy, feasibility, and alignment with organizational policies.
- Implement Multi-layered Review Loops (AI + Human + Compliance):
- AI Layer: Automated data collection and initial analysis by LLM.
- Human Review: Cloud architects or workload owners review AI outputs, provide context, and approve/modify recommendations.
- Compliance/Audit Layer: Compliance teams can use the structured reports for audit purposes and to track adherence to best practices.
- Start Small, Iterate: Begin with automating reviews for a single pillar or a subset of workloads. Gather feedback, refine the process, and then expand.
- Monitor LLM Performance: Implement metrics to track the quality of LLM outputs (e.g., relevance, actionability, adherence to WAFR).
- Handle Sensitive Data Carefully: Ensure that any sensitive data passed to the LLM is handled securely and in compliance with data governance policies. Amazon Bedrock processes data within the AWS network and doesn’t use customer data to train the models.
Conclusion
The convergence of the AWS Well-Architected Framework and generative AI through Amazon Bedrock offers a transformative approach to cloud governance and optimization. By automating the laborious aspects of Well-Architected Reviews—from data ingestion and analysis to recommendation generation—organizations can achieve faster, more scalable, and consistently intelligent assessments. This paradigm shift empowers cloud teams to maintain higher standards of security, reliability, performance, cost efficiency, operational excellence, and sustainability across their evolving AWS landscapes.
Adopting this approach is particularly beneficial for organizations managing large and complex AWS environments, or those operating under strict regulatory compliance requirements. The ability to quickly identify and address architectural deviations becomes a competitive advantage.
Looking ahead, further advancements could include multilingual assessments for global teams, deeper integration with cost anomaly detection leveraging machine learning, and the ability for LLMs to simulate remediation actions to predict their impact. The journey towards fully autonomous, intelligent cloud governance is just beginning, and generative AI is a pivotal enabler.