
Architecting Multi-Agent AI Systems on AWS for Autonomous Enterprise Workflows

Introduction

The advent of large language models (LLMs) has revolutionized how we approach complex computational problems. While single LLMs excel at specific tasks, truly autonomous enterprise workflows often demand a higher level of intelligence, adaptability, and resilience. This is where multi-agent AI systems shine. By decomposing a complex problem into smaller, manageable subtasks and assigning them to specialized, collaborative agents, we can achieve more robust, scalable, and sophisticated automation.

This article delves into architecting such multi-agent systems on Amazon Web Services (AWS), focusing on a blackboard-style architecture. We’ll explore the design principles, core AWS services for implementation, and strategies for secure action invocation, evaluation, and monitoring.

Understanding Multi-Agent Systems and the Blackboard Architecture

A multi-agent system comprises multiple intelligent agents that interact and collaborate to achieve a common goal. Each agent possesses specific capabilities, knowledge, and a defined role. The blackboard architecture is a classic AI paradigm particularly well-suited for multi-agent collaboration, especially in scenarios requiring opportunistic problem-solving.

In a blackboard system:

  • Blackboard: A global data store accessible to all agents, where problems, partial solutions, and relevant data are posted.
  • Knowledge Sources (Agents): Independent agents that monitor the blackboard, react to changes, and contribute their expertise by reading from and writing to the blackboard.
  • Control Component: Manages the overall problem-solving process, determining which agent to activate next based on the blackboard’s state.
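
To make this concrete, a single blackboard entry for one task might look like the following. The shape is illustrative (it mirrors the DynamoDB item structure used in the code examples later in this article): each agent writes its status and payload under its own key.

{
  "TaskId": "onboarding-001",
  "LastUpdated": 1716550260000,
  "PlannerAgent": {
    "Status": "PlanGenerated",
    "Payload": {
      "action_plan": [
        {"step": 1, "agent_role": "Executor", "subtask": "Create HR Record", "status": "pending"}
      ]
    },
    "Timestamp": 1716550200000
  },
  "ExecutorAgent": {
    "Status": "SubtaskCompleted",
    "Payload": {"result": "HR record created successfully"},
    "Timestamp": 1716550260000
  }
}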

Role Design for Specialized Agents

Effective multi-agent systems rely on well-defined roles that foster collaboration and minimize redundancy. For autonomous enterprise workflows, key agent roles often include the following (a minimal Planner sketch follows the list):

  1. Planner Agent:
    • Responsibility: Decomposing complex, high-level goals into a sequence of actionable subtasks. It defines the overall strategy and workflow.
    • Input: High-level problem statement, current state from the blackboard.
    • Output: A detailed plan (sequence of subtasks, dependencies) posted to the blackboard.
    • Example: For an “Onboard New Employee” workflow, the Planner might generate subtasks like “Create HR Record,” “Provision Laptop,” “Grant Access to Systems.”
  2. Executor Agent:
    • Responsibility: Executing specific subtasks defined by the Planner. This often involves interacting with external systems or invoking specific tools/APIs.
    • Input: A specific subtask from the blackboard, necessary parameters.
    • Output: The result of the execution, status updates, or errors posted to the blackboard.
    • Example: An Executor might call an HR system API to create a new employee record or trigger an IT provisioning service.
  3. Verifier Agent:
    • Responsibility: Validating the output or state changes produced by Executor agents. Ensures correctness, adherence to business rules, and compliance.
    • Input: Output of an Executor agent, relevant business rules or criteria from the blackboard.
    • Output: Validation status (success/failure), identified discrepancies, or suggestions for remediation posted to the blackboard.
    • Example: A Verifier might check if all required fields in the HR record are populated correctly or if system access grants match the employee’s role.
  4. Retriever Agent:
    • Responsibility: Fetching relevant information from various knowledge sources (databases, documents, APIs, RAG systems) to inform other agents.
    • Input: A query or context requiring additional information.
    • Output: Retrieved data, documents, or context posted to the blackboard.
    • Example: A Retriever might query an internal knowledge base for “laptop provisioning steps” or fetch employee details from a directory service.
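
Here is that minimal Planner sketch, implemented as a Lambda handler. It is intentionally simplified: the plan is hard-coded, whereas a production Planner would typically call an LLM (for example, via Amazon Bedrock) to decompose the goal, and write_to_blackboard is the helper function defined in the next section.

import json

def planner_agent_handler(event, context):
    """
    Minimal Planner sketch: decomposes a high-level goal into subtasks
    and posts the resulting plan to the blackboard.
    """
    task_id = event['task_id']
    goal = event.get('goal', 'Onboard New Employee')

    # In practice, an LLM call would generate this plan from the goal.
    plan = [
        {"step": 1, "agent_role": "Executor", "subtask": "Create HR Record", "status": "pending"},
        {"step": 2, "agent_role": "Executor", "subtask": "Provision Laptop", "status": "pending"},
        {"step": 3, "agent_role": "Verifier", "subtask": "Verify HR Record", "status": "pending"}
    ]

    # write_to_blackboard is defined in the shared-memory section below.
    write_to_blackboard(task_id, "PlannerAgent", "PlanGenerated", {"goal": goal, "action_plan": plan})
    return {"statusCode": 200, "body": json.dumps({"plan_steps": len(plan)})}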

Shared Memory: The Blackboard Implementation

The blackboard is critical for inter-agent communication and state management. AWS offers robust options for implementing this shared memory:

Amazon DynamoDB

DynamoDB is a fast, flexible NoSQL database service that delivers single-digit millisecond performance at any scale. It’s ideal for structured and semi-structured data, making it suitable for storing agent states, task queues, and intermediate results on the blackboard.

  • Pros: High performance, fully managed, scalable, fine-grained access control.
  • Cons: Less suitable for complex search queries or unstructured data.

Amazon OpenSearch Service

OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters. It’s excellent for full-text search, analytics, and storing less structured data, making it suitable for a knowledge base or a more complex blackboard where agents need to search for relevant information.

  • Pros: Powerful search capabilities, analytics, scalable, good for unstructured data.
  • Cons: More complex to manage than DynamoDB for simple key-value lookups.

For simplicity in the code examples, we will focus on DynamoDB as the primary blackboard implementation.

Python Code Snippets: Reading/Writing Agent Memory to DynamoDB

import boto3
import json
import os
import time

# Initialize DynamoDB client
dynamodb = boto3.resource('dynamodb')
# Get table name from environment variable (best practice)
BLACKBOARD_TABLE_NAME = os.environ.get('BLACKBOARD_TABLE_NAME', 'MultiAgentBlackboard')
blackboard_table = dynamodb.Table(BLACKBOARD_TABLE_NAME)

def write_to_blackboard(task_id: str, agent_id: str, status: str, payload: dict):
    """
    Writes or updates an entry on the blackboard.
    Each task has a unique ID, and agent updates are nested within it.
    """
    try:
        response = blackboard_table.update_item(
            Key={'TaskId': task_id},
            UpdateExpression="SET #agent_id = :agent_data, LastUpdated = :timestamp",
            ExpressionAttributeNames={'#agent_id': agent_id},
            ExpressionAttributeValues={
                ':agent_data': {
                    'Status': status,
                    'Payload': payload,
                    'Timestamp': int(time.time() * 1000)  # epoch milliseconds
                },
                ':timestamp': int(time.time() * 1000)
            },
            ReturnValues="UPDATED_NEW"
        )
        print(f"Successfully wrote to blackboard for TaskId: {task_id}, Agent: {agent_id}")
        return response
    except Exception as e:
        print(f"Error writing to blackboard: {e}")
        raise

def read_from_blackboard(task_id: str):
    """
    Reads a specific task's state from the blackboard.
    """
    try:
        response = blackboard_table.get_item(Key={'TaskId': task_id})
        item = response.get('Item')
        if item:
            print(f"Successfully read from blackboard for TaskId: {task_id}")
            return item
        else:
            print(f"No item found for TaskId: {task_id}")
            return None
    except Exception as e:
        print(f"Error reading from blackboard: {e}")
        raise

# Example Usage (assuming a Lambda function context or local execution with credentials)
if __name__ == "__main__":
    # Create a dummy table for local testing if it doesn't exist
    try:
        table = dynamodb.create_table(
            TableName=BLACKBOARD_TABLE_NAME,
            KeySchema=[{'AttributeName': 'TaskId', 'KeyType': 'HASH'}],
            AttributeDefinitions=[{'AttributeName': 'TaskId', 'AttributeType': 'S'}],
            ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
        )
        # Table creation is asynchronous; block until it is ACTIVE before writing
        table.wait_until_exists()
        print(f"Table '{BLACKBOARD_TABLE_NAME}' created.")
    except dynamodb.meta.client.exceptions.ResourceInUseException:
        print(f"Table '{BLACKBOARD_TABLE_NAME}' already exists.")
    except Exception as e:
        print(f"Error creating table: {e}")

    # Simulate a Planner writing to the blackboard
    task_id_1 = "onboarding-001"
    planner_payload = {
        "employee_name": "Alice Smith",
        "action_plan": [
            {"step": 1, "agent_role": "Executor", "subtask": "Create HR Record", "status": "pending"},
            {"step": 2, "agent_role": "Executor", "subtask": "Provision Laptop", "status": "pending"},
            {"step": 3, "agent_role": "Verifier", "subtask": "Verify HR Record", "status": "pending"}
        ]
    }
    write_to_blackboard(task_id_1, "PlannerAgent", "PlanGenerated", planner_payload)

    # Simulate an Executor reading and then updating the blackboard
    current_task_state = read_from_blackboard(task_id_1)
    if current_task_state:
        print(f"\nInitial state for {task_id_1}:\n{json.dumps(current_task_state, indent=2)}")

        # Update the plan after Executor completes a subtask
        updated_plan = current_task_state['PlannerAgent']['Payload']['action_plan']
        updated_plan[0]['status'] = 'completed'
        executor_payload = {
            "subtask_completed": "Create HR Record",
            "result": "HR record created successfully",
            "updated_plan": updated_plan
        }
        write_to_blackboard(task_id_1, "ExecutorAgent", "SubtaskCompleted", executor_payload)

    # Read the updated state
    updated_task_state = read_from_blackboard(task_id_1)
    if updated_task_state:
        print(f"\nUpdated state for {task_id_1}:\n{json.dumps(updated_task_state, indent=2)}")

Message-Passing for Inter-Agent Communication

Agents communicate primarily by reading from and writing to the blackboard, but they also trigger one another and signal events. AWS provides robust services for asynchronous message passing:

  • AWS Lambda: The serverless compute service where agent logic resides. Each agent role (Planner, Executor, Verifier, Retriever) can be implemented as one or more Lambda functions.
  • Amazon SQS (Simple Queue Service): A fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. SQS is ideal for point-to-point communication where one agent needs to send a task to another (see the sketch after this list).
  • Amazon SNS (Simple Notification Service): A highly available, durable, secure, fully managed pub/sub messaging service. SNS is perfect for broadcasting events or state changes to multiple interested agents simultaneously.
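
Here is that SQS sketch: one agent enqueues a subtask, and the consuming agent’s Lambda, attached to the queue via an event source mapping, receives it in the Records batch shown below. The queue URL is a placeholder.

import boto3
import json
import os

sqs = boto3.client('sqs')
# Placeholder queue URL; supply your own via configuration.
EXECUTOR_QUEUE_URL = os.environ.get('EXECUTOR_QUEUE_URL', 'https://sqs.us-east-1.amazonaws.com/123456789012/executor-tasks')

def send_task_to_executor(task_id: str, subtask: str, parameters: dict):
    """Enqueue a subtask message for the Executor agent."""
    return sqs.send_message(
        QueueUrl=EXECUTOR_QUEUE_URL,
        MessageBody=json.dumps({'task_id': task_id, 'subtask': subtask, 'parameters': parameters})
    )

def executor_queue_handler(event, context):
    """Lambda handler attached to the queue via an event source mapping."""
    for record in event['Records']:  # SQS delivers a batch of records
        message = json.loads(record['body'])
        print(f"Dequeued subtask '{message['subtask']}' for task {message['task_id']}")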

Python Code Snippet: Triggering Lambda-based Agent Execution with Context Payloads

Agents are typically triggered by events (e.g., a new entry on the blackboard, a message in an SQS queue, or a Step Functions state transition). Here’s how one Lambda function (e.g., the orchestrator or another agent) can trigger another agent’s Lambda function:

import boto3
import json

# Initialize Lambda client
lambda_client = boto3.client('lambda')

def invoke_agent_lambda(agent_lambda_name: str, payload: dict):
    """
    Invokes a Lambda function representing an agent.
    Payload contains the context for the agent's execution.
    """
    try:
        response = lambda_client.invoke(
            FunctionName=agent_lambda_name,
            InvocationType='Event',  # Asynchronous invocation
            Payload=json.dumps(payload)
        )
        # Check for errors in the invocation response
        if response.get('StatusCode') == 202: # 202 Accepted for async invocation
            print(f"Successfully invoked Lambda function '{agent_lambda_name}' asynchronously.")
            return True
        else:
            print(f"Failed to invoke Lambda function '{agent_lambda_name}'. Status code: {response.get('StatusCode')}")
            print(f"Error details: {response.get('FunctionError')}")
            return False
    except Exception as e:
        print(f"Error invoking Lambda function '{agent_lambda_name}': {e}")
        raise

# Example Lambda function for an Executor Agent (executor_agent_lambda.py)
# This would be the code deployed to the 'MyExecutorAgentLambda' Lambda function.
def executor_agent_handler(event, context):
    """
    Lambda handler for the Executor Agent.
    It reads its task from the event payload and performs an action.
    """
    print(f"Executor Agent received event: {json.dumps(event, indent=2)}")
    task_id = event.get('task_id')
    subtask = event.get('subtask')
    parameters = event.get('parameters', {})

    if not task_id or not subtask:
        print("Error: Missing 'task_id' or 'subtask' in payload.")
        return {"statusCode": 400, "body": "Missing required parameters"}

    print(f"Executor Agent processing Task ID: {task_id}, Subtask: {subtask}")

    try:
        # Simulate performing the subtask (e.g., calling an external API)
        # In a real scenario, this would involve actual business logic.
        if subtask == "Create HR Record":
            print(f"Creating HR record for employee: {parameters.get('employee_name')}")
            # Simulate success
            result_message = "HR record created."
            status = "completed"
        elif subtask == "Provision Laptop":
            print(f"Provisioning laptop for employee: {parameters.get('employee_name')}")
            # Simulate success
            result_message = "Laptop provisioned."
            status = "completed"
        else:
            result_message = f"Unknown subtask: {subtask}"
            status = "failed"

        # Update the blackboard (using the function defined earlier)
        # Note: In a real distributed system, this Lambda would import and use
        # the blackboard functions. For this example, assume it's available.
        # For demonstration, we'll just print the blackboard update.
        blackboard_update_payload = {
            "subtask": subtask,
            "result": result_message,
            "agent_output": {"status": status}
        }
        # In a real scenario, call write_to_blackboard(task_id, "ExecutorAgent", status, blackboard_update_payload)
        print(f"Simulating blackboard update for TaskId {task_id} by ExecutorAgent with status '{status}' and payload: {blackboard_update_payload}")

        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Subtask '{subtask}' completed by Executor Agent."})
        }
    except Exception as e:
        print(f"Executor Agent failed for subtask '{subtask}': {e}")
        # Update blackboard with failure
        blackboard_update_payload = {
            "subtask": subtask,
            "error": str(e),
            "agent_output": {"status": "failed"}
        }
        # In a real scenario, call write_to_blackboard(task_id, "ExecutorAgent", "Failed", blackboard_update_payload)
        print(f"Simulating blackboard update for TaskId {task_id} by ExecutorAgent with status 'Failed' and payload: {blackboard_update_payload}")
        return {
            "statusCode": 500,
            "body": json.dumps({"message": f"Subtask '{subtask}' failed."})
        }

# Example of how an orchestrator might trigger the Executor Agent
if __name__ == "__main__":
    # This part would typically run in a Step Functions state or another Lambda
    # For local testing, ensure your AWS credentials are configured.
    executor_agent_lambda_name = "MyExecutorAgentLambda" # Replace with your actual Lambda function name
    sample_payload = {
        "task_id": "onboarding-001",
        "subtask": "Create HR Record",
        "parameters": {
            "employee_name": "Alice Smith",
            "department": "Engineering"
        }
    }
    # invoke_agent_lambda(executor_agent_lambda_name, sample_payload)
    print("\nTo run the above invocation, uncomment the 'invoke_agent_lambda' line and ensure 'MyExecutorAgentLambda' exists.")
    print("The 'executor_agent_handler' function above demonstrates what the invoked Lambda would do.")

Orchestration with AWS Step Functions

AWS Step Functions is a serverless workflow service that lets you combine AWS Lambda functions and other AWS services to build business-critical applications. It’s an ideal control component for a blackboard-style multi-agent architecture, enabling robust, stateful orchestration.

Step Functions can:

  • Define sequential, parallel, branching, and error handling logic.
  • Poll the blackboard (DynamoDB) for state changes.
  • Trigger specific agent Lambda functions based on blackboard content.
  • Manage timeouts and retries.

Step Functions Orchestration Flow (YAML)

Here’s a simplified AWS CloudFormation template (YAML) whose DefinitionString embeds the Amazon States Language (JSON) definition of a Step Functions workflow orchestrating a multi-agent system for a hypothetical “Employee Onboarding” process. This workflow demonstrates how Step Functions can act as the control component, reacting to blackboard state and triggering agents.

AWSTemplateFormatVersion: '2010-09-09'
Description: State machine for multi-agent employee onboarding workflow.

Resources:
  MultiAgentOnboardingStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: MultiAgentOnboardingWorkflow
      DefinitionString: |-
        {
          "Comment": "Multi-Agent Employee Onboarding Workflow orchestrated by Step Functions",
          "StartAt": "InitializeBlackboard",
          "States": {
            "InitializeBlackboard": {
              "Type": "Task",
              "Resource": "arn:aws:states:::lambda:invoke",
              "Parameters": {
                "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:InitializeBlackboardLambda",
                "Payload": {
                  "taskId.$": "$.taskId",
                  "initialState": {
                    "status": "New",
                    "plan": []
                  }
                }
              },
              "ResultPath": null,
              "Retry": [{
                "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
                "IntervalSeconds": 2,
                "MaxAttempts": 6,
                "BackoffRate": 2
              }],
              "Next": "InvokePlannerAgent"
            },
            "InvokePlannerAgent": {
              "Type": "Task",
              "Resource": "arn:aws:states:::lambda:invoke",
              "Parameters": {
                "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:PlannerAgentLambda",
                "Payload": {
                  "taskId.$": "$.taskId",
                  "inputData.$": "$.inputData"
                }
              },
              "ResultPath": null,
              "Retry": [{
                "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
                "IntervalSeconds": 2,
                "MaxAttempts": 6,
                "BackoffRate": 2
              }],
              "Next": "WaitForPlannerCompletion"
            },
            "WaitForPlannerCompletion": {
              "Type": "Wait",
              "Seconds": 10,
              "Next": "CheckBlackboardForPlan"
            },
            "CheckBlackboardForPlan": {
              "Type": "Task",
              "Resource": "arn:aws:states:::lambda:invoke",
              "Parameters": {
                "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ReadBlackboardLambda",
                "Payload": {
                  "taskId.$": "$.taskId"
                }
              },
              "ResultPath": "$.blackboardState",
              "Retry": [{
                "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
                "IntervalSeconds": 2,
                "MaxAttempts": 6,
                "BackoffRate": 2
              }],
              "Next": "IsPlanReady"
            },
            "IsPlanReady": {
              "Type": "Choice",
              "Choices": [
                {
                  "Variable": "$.blackboardState.PlannerAgent.Status",
                  "StringEquals": "PlanGenerated",
                  "Next": "IterateThroughPlan"
                }
              ],
              "Default": "WaitForPlannerCompletion" # Loop back if plan not ready
            },
            "IterateThroughPlan": {
              "Type": "Map",
              "InputPath": "$.blackboardState.PlannerAgent.Payload.action_plan",
              "ItemsPath": "$",
              "MaxConcurrency": 1, # Process subtasks sequentially
              "Iterator": {
                "StartAt": "ExecuteSubtask",
                "States": {
                  "ExecuteSubtask": {
                    "Type": "Choice",
                    "Choices": [
                      {
                        "Variable": "$.agent_role",
                        "StringEquals": "Executor",
                        "Next": "InvokeExecutorAgent"
                      },
                      {
                        "Variable": "$.agent_role",
                        "StringEquals": "Verifier",
                        "Next": "InvokeVerifierAgent"
                      },
                      {
                        "Variable": "$.agent_role",
                        "StringEquals": "Retriever",
                        "Next": "InvokeRetrieverAgent"
                      }
                    ],
                    "Default": "HandleUnknownAgentRole"
                  },
                  "InvokeExecutorAgent": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                      "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ExecutorAgentLambda",
                      "Payload": {
                        "taskId.$": "$$.Execution.Input.taskId",
                        "subtask.$": "$.subtask",
                        "parameters.$": "$.parameters"
                      }
                    },
                    "Retry": [{
                      "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
                      "IntervalSeconds": 2,
                      "MaxAttempts": 6,
                      "BackoffRate": 2
                    }],
                    "Next": "WaitForSubtaskCompletion"
                  },
                  "InvokeVerifierAgent": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                      "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:VerifierAgentLambda",
                      "Payload": {
                        "taskId.$": "$$.Execution.Input.taskId",
                        "subtask.$": "$.subtask",
                        "parameters.$": "$.parameters"
                      }
                    },
                    "Retry": [{
                      "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
                      "IntervalSeconds": 2,
                      "MaxAttempts": 6,
                      "BackoffRate": 2
                    }],
                    "Next": "WaitForSubtaskCompletion"
                  },
                  "InvokeRetrieverAgent": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                      "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:RetrieverAgentLambda",
                      "Payload": {
                        "taskId.$": "$$.Execution.Input.taskId",
                        "query.$": "$.query",
                        "parameters.$": "$.parameters"
                      }
                    },
                    "Retry": [{
                      "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
                      "IntervalSeconds": 2,
                      "MaxAttempts": 6,
                      "BackoffRate": 2
                    }],
                    "Next": "WaitForSubtaskCompletion"
                  },
                  "WaitForSubtaskCompletion": {
                    "Type": "Wait",
                    "Seconds": 5,
                    "Next": "CheckBlackboardForSubtaskStatus"
                  },
                  "CheckBlackboardForSubtaskStatus": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                      "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ReadBlackboardLambda",
                      "Payload": {
                        "taskId.$": "$$.Execution.Input.taskId"
                      }
                    },
                    "ResultPath": "$.currentBlackboardState",
                    "Retry": [{
                      "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
                      "IntervalSeconds": 2,
                      "MaxAttempts": 6,
                      "BackoffRate": 2
                    }],
                    "Next": "IsSubtaskCompleted"
                  },
                  "IsSubtaskCompleted": {
                    "Type": "Choice",
                    "Choices": [
                      {
                        "Variable": "$.currentBlackboardState.ExecutorAgent.Status", # Check status from the relevant agent
                        "StringEquals": "SubtaskCompleted",
                        "Next": "SubtaskSuccess"
                      },
                      {
                        "Variable": "$.currentBlackboardState.VerifierAgent.Status",
                        "StringEquals": "ValidationSuccess",
                        "Next": "SubtaskSuccess"
                      },
                      {
                        "Variable": "$.currentBlackboardState.RetrieverAgent.Status",
                        "StringEquals": "DataRetrieved",
                        "Next": "SubtaskSuccess"
                      },
                      {
                        "Variable": "$.currentBlackboardState.ExecutorAgent.Status",
                        "StringEquals": "Failed",
                        "Next": "SubtaskFailed"
                      }
                    ],
                    "Default": "WaitForSubtaskCompletion" # Loop back if not completed
                  },
                  "SubtaskSuccess": {
                    "Type": "Succeed"
                  },
                  "SubtaskFailed": {
                    "Type": "Fail",
                    "Cause": "Subtask failed execution or validation.",
                    "Error": "SubtaskFailedError"
                  },
                  "HandleUnknownAgentRole": {
                    "Type": "Fail",
                    "Cause": "Unknown agent role encountered in plan.",
                    "Error": "UnknownAgentRoleError"
                  }
                }
              },
              "Next": "FinalizeWorkflow"
            },
            "FinalizeWorkflow": {
              "Type": "Task",
              "Resource": "arn:aws:states:::lambda:invoke",
              "Parameters": {
                "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FinalizeWorkflowLambda",
                "Payload": {
                  "taskId.$": "$.taskId",
                  "finalState.$": "$.blackboardState"
                }
              },
              "End": true
            }
          }
        }
      RoleArn: arn:aws:iam::ACCOUNT_ID:role/StepFunctionsExecutionRole # Replace with your Step Functions IAM Role ARN

Note: Replace REGION and ACCOUNT_ID with your actual AWS region and account ID. The Lambda function ARNs (InitializeBlackboardLambda, PlannerAgentLambda, etc.) should point to your deployed Lambda functions. The ReadBlackboardLambda would be a simple Lambda that calls the read_from_blackboard function shown earlier and returns the item directly. Two Amazon States Language details are easy to trip over here: the lambda:invoke integration wraps each function’s result in a Payload field (hence paths like $.blackboardState.Payload.PlannerAgent.Status), and a state without a ResultPath replaces its entire input with its result (hence "ResultPath": null wherever the original input must be preserved).
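
One refinement worth noting: the workflow above polls the blackboard with Wait/Choice loops, which is simple but adds latency and billed state transitions. Step Functions also supports a callback pattern via the waitForTaskToken integration, in which the agent resumes the workflow explicitly instead of being polled. A minimal sketch of both sides (the state name and ARNs are placeholders):

"InvokeExecutorAgentWithCallback": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
  "Parameters": {
    "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ExecutorAgentLambda",
    "Payload": {
      "taskId.$": "$$.Execution.Input.taskId",
      "subtask.$": "$.subtask",
      "taskToken.$": "$$.Task.Token"
    }
  },
  "Next": "NextState"
}

The agent then reports completion from its own code:

import boto3
import json

sfn = boto3.client('stepfunctions')

def report_subtask_result(task_token: str, result: dict, succeeded: bool):
    """Agent-side callback: resume the Step Functions state waiting on this token."""
    if succeeded:
        sfn.send_task_success(taskToken=task_token, output=json.dumps(result))
    else:
        sfn.send_task_failure(taskToken=task_token, error='SubtaskFailedError', cause=json.dumps(result))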

Secure Action Invocation through Bedrock Agents with IAM Policy Control

When agents need to interact with external systems or sensitive internal APIs, security is paramount. Amazon Bedrock Agents provide a secure and controlled way for LLM-powered agents to invoke tools (actions).

Bedrock Agents allow you to:

  • Define actions (tools) as Lambda functions or API schemas.
  • Associate IAM policies with the Bedrock Agent execution role to control exactly what resources the agent can access. This implements the principle of least privilege (an illustrative policy follows this list).
  • Provide guardrails to ensure agent behavior aligns with safety policies and business rules.
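
As an illustration of that least-privilege control, the identity policy below allows an agent’s role to invoke only one specific action Lambda and one approved foundation model. The ARNs are placeholders, and the exact permission wiring varies (for example, Bedrock Agents typically authorize their action-group Lambdas through a resource-based policy on the function itself), so treat this as an illustrative shape rather than a drop-in policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "InvokeOnlyTheHrActionLambda",
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:HrActionGroupLambda"
    },
    {
      "Sid": "InvokeOnlyTheApprovedModel",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:REGION::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
    }
  ]
}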

Python Code Snippet: Guardrail Enforcement in Bedrock Agent Invocation

Guardrail enforcement itself is configured in the Bedrock Agent definition and its associated policies. The Python code below shows how a caller invokes a Bedrock Agent to perform an action, relying on those pre-configured guardrails and IAM policies for security: the LLM driving the agent makes a tool-use request, and Bedrock mediates that request.

import boto3
import json
import os

# Initialize Bedrock Agent Runtime client
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

# Configuration for the Bedrock Agent
BEDROCK_AGENT_ID = os.environ.get('BEDROCK_AGENT_ID', 'YOUR_BEDROCK_AGENT_ID')
BEDROCK_AGENT_ALIAS_ID = os.environ.get('BEDROCK_AGENT_ALIAS_ID', 'YOUR_BEDROCK_AGENT_ALIAS_ID')

def invoke_bedrock_agent_action(session_id: str, prompt: str, enable_trace: bool = False):
    """
    Invokes a Bedrock Agent to perform an action based on a prompt.
    The Bedrock Agent's configuration (including its associated IAM policies and guardrails)
    will govern what actions it can take and with what permissions.
    """
    try:
        response = bedrock_agent_runtime_client.invoke_agent(
            agentId=BEDROCK_AGENT_ID,
            agentAliasId=BEDROCK_AGENT_ALIAS_ID,
            sessionId=session_id,
            inputText=prompt,
            enableTrace=enable_trace # Set to True for debugging and understanding agent's thought process
        )

        # Process the streaming response
        completion = ""
        for chunk in response['completion']:
            if 'chunk' in chunk:
                completion += chunk['chunk']['bytes'].decode('utf-8')
            elif 'trace' in chunk:
                # If enableTrace is True, you can inspect the trace for debugging
                # This trace contains details about the agent's reasoning, tool use, etc.
                print(f"Bedrock Agent Trace: {json.dumps(chunk['trace'], indent=2)}")
        
        print(f"Bedrock Agent Response: {completion}")
        return completion

    except bedrock_agent_runtime_client.exceptions.AccessDeniedException as e:
        print(f"Access Denied: The Bedrock Agent does not have permission to perform this action. {e}")
        raise
    except bedrock_agent_runtime_client.exceptions.ValidationException as e:
        print(f"Validation Error: The request to Bedrock Agent was invalid. This could be due to guardrails. {e}")
        raise
    except Exception as e:
        print(f"Error invoking Bedrock Agent: {e}")
        raise

# Example Usage
if __name__ == "__main__":
    # Ensure YOUR_BEDROCK_AGENT_ID and YOUR_BEDROCK_AGENT_ALIAS_ID are set as environment variables
    # or replaced directly in the code for testing.
    # This prompt would trigger a tool defined in Bedrock Agent, e.g., "create_user"
    sample_prompt = "Create a new user account for 'John Doe' in the HR system with email 'john.doe@example.com'."
    session_id = "user-onboarding-session-123"

    print(f"Attempting to invoke Bedrock Agent with prompt: '{sample_prompt}'")
    try:
        # invoke_bedrock_agent_action(session_id, sample_prompt, enable_trace=True)
        print("\nTo run the above invocation, uncomment the 'invoke_bedrock_agent_action' line and configure your Bedrock Agent IDs.")
        print("Ensure your Bedrock Agent has appropriate tools defined and IAM permissions.")
    except Exception as e:
        print(f"Invocation failed: {e}")

Guardrail Enforcement Note: The actual enforcement of guardrails (e.g., preventing sensitive data leakage, ensuring ethical responses) happens within the Bedrock Agent’s configuration and its underlying LLM and safety controls. The Python code above demonstrates how you invoke the agent, and any guardrail violations would typically manifest as ValidationException or filtered responses from the Bedrock service itself.
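
Guardrails can also be evaluated on demand, independently of an agent invocation, via the ApplyGuardrail API on the bedrock-runtime client. Below is a minimal sketch, assuming a standalone guardrail has already been created (the guardrail ID and version are placeholders):

import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

def passes_guardrail(text: str) -> bool:
    """Returns True if the guardrail lets the text through unmodified."""
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier='YOUR_GUARDRAIL_ID',  # placeholder
        guardrailVersion='1',                     # placeholder
        source='INPUT',  # use 'OUTPUT' to screen model responses instead
        content=[{'text': {'text': text}}]
    )
    # 'GUARDRAIL_INTERVENED' means content was blocked or masked.
    return response['action'] == 'NONE'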

Evaluation, Logging, and Monitoring

For autonomous systems, continuous evaluation, comprehensive logging, and proactive monitoring are essential for performance, reliability, and debugging.

Evaluation with Amazon SageMaker FMEval

FMEval is AWS’s open-source library for evaluating foundation models (FMs); it also underpins the foundation model evaluation capability in Amazon SageMaker Clarify. For multi-agent systems, FMEval can be used to:

  • Evaluate individual agent performance: Assess how well a Planner generates plans, an Executor completes tasks, or a Verifier identifies errors.
  • Evaluate end-to-end workflow performance: Measure the success rate, latency, and correctness of the entire multi-agent system in achieving complex goals.
  • Compare different agent strategies: A/B test different LLM prompts, agent roles, or orchestration logic.

FMEval integrates with SageMaker processing jobs, allowing you to define evaluation datasets and metrics.
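
Because FMEval consumes datasets rather than live traffic, a common pattern is to persist each agent interaction as a JSON Lines record in Amazon S3 and run evaluations in batch. A minimal sketch (the bucket name and field names are illustrative; align the fields with the evaluation algorithm you configure):

import boto3
import json
import uuid

s3 = boto3.client('s3')
EVAL_BUCKET = 'my-agent-eval-records'  # placeholder bucket name

def persist_evaluation_record(prompt: str, agent_response: str, target_output: str):
    """Write one JSON Lines record for a later batch evaluation job."""
    record = {
        "model_input": prompt,
        "model_output": agent_response,
        "target_output": target_output
    }
    key = f"eval-records/{uuid.uuid4()}.jsonl"
    s3.put_object(Bucket=EVAL_BUCKET, Key=key, Body=json.dumps(record) + "\n")
    return key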

Logging and Monitoring with Amazon CloudWatch

Amazon CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. It provides a unified view of AWS resources, applications, and services.

  • CloudWatch Logs: All AWS Lambda functions automatically send their logs to CloudWatch Logs. This is crucial for debugging agent logic. Step Functions also integrates with CloudWatch Logs for workflow execution details.
  • CloudWatch Metrics: You can create custom metrics from your agent Lambda functions (e.g., AgentInvocationCount, SubtaskCompletionTime, VerificationFailureRate). A publishing sketch follows this list.
  • CloudWatch Alarms: Set up alarms on these metrics to notify operators of anomalies (e.g., a sudden increase in verification failures).
  • CloudWatch Dashboards: Create custom dashboards to visualize the health and performance of your multi-agent system.
  • CloudWatch Logs Insights: A powerful query language to interactively search and analyze your log data.
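
Here is that publishing sketch; the namespace and dimension names are our own illustrative choices, not an AWS convention:

import boto3

cloudwatch = boto3.client('cloudwatch')

def emit_agent_metric(metric_name: str, value: float, agent_role: str, unit: str = 'Count'):
    """Publish a custom metric such as AgentInvocationCount or SubtaskCompletionTime."""
    cloudwatch.put_metric_data(
        Namespace='MultiAgentWorkflow',  # illustrative namespace
        MetricData=[{
            'MetricName': metric_name,
            'Dimensions': [{'Name': 'AgentRole', 'Value': agent_role}],
            'Value': value,
            'Unit': unit
        }]
    )

# Example: emit_agent_metric('SubtaskCompletionTime', 1000, 'Executor', unit='Milliseconds')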

Python Code Snippet: CloudWatch Logs Integration and Agent Evaluation Output Sample

Standard Python logging integrates seamlessly with CloudWatch Logs when running in Lambda.

import logging
import json
import time

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def example_agent_logic(event, context):
    """
    An example Lambda function demonstrating logging and
    simulating an agent's output for potential evaluation.
    """
    task_id = event.get('task_id', 'N/A')
    agent_role = event.get('agent_role', 'Unknown')
    subtask = event.get('subtask', 'N/A')

    logger.info(f"[{agent_role} Agent] Starting execution for Task ID: {task_id}, Subtask: {subtask}")

    try:
        # Simulate some work
        time.sleep(1) # Simulate processing time

        # Simulate a decision or action
        if "sensitive_data" in str(event): # Simple guardrail check example
            logger.warning(f"[{agent_role} Agent] Detected potential sensitive data in input for Task ID: {task_id}")
            # In a real scenario, this might trigger a Bedrock Guardrail or a custom error.
            raise ValueError("Sensitive data detected, action blocked by internal policy.")

        # Simulate output that might be used for evaluation
        agent_output = {
            "task_id": task_id,
            "agent_role": agent_role,
            "subtask": subtask,
            "status": "SUCCESS",
            "result_summary": f"Successfully processed subtask '{subtask}'.",
            "latency_ms": 1000 # Example metric
        }
        logger.info(f"[{agent_role} Agent] Completed execution for Task ID: {task_id}. Output: {json.dumps(agent_output)}")

        # For FMEval, you would typically store these outputs in S3 or a database
        # which FMEval can then consume for batch evaluation.
        # Example of data structure for FMEval input (simplified)
        evaluation_record = {
            "prompt": event.get('original_prompt'), # Original prompt given to Planner
            "agent_response": agent_output['result_summary'],
            "ground_truth_status": "SUCCESS", # For supervised evaluation
            "metrics": {
                "latency": agent_output['latency_ms'],
                "correctness_score": 0.95 # Example score from internal validation
            }
        }
        logger.info(f"[{agent_role} Agent] Evaluation record prepared: {json.dumps(evaluation_record)}")

        return {
            "statusCode": 200,
            "body": json.dumps(agent_output)
        }

    except ValueError as ve:
        logger.error(f"[{agent_role} Agent] Policy violation for Task ID: {task_id}: {ve}")
        return {
            "statusCode": 403,
            "body": json.dumps({"error": str(ve), "status": "POLICY_VIOLATION"})
        }
    except Exception as e:
        logger.error(f"[{agent_role} Agent] Error during execution for Task ID: {task_id}: {e}", exc_info=True)
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e), "status": "FAILED"})
        }

# Example of how this Lambda would be invoked (e.g., by Step Functions)
if __name__ == "__main__":
    test_event = {
        "task_id": "onboarding-002",
        "agent_role": "Executor",
        "subtask": "Process Payroll",
        "original_prompt": "Onboard new employee Alice Smith and process her first payroll.",
        "employee_data": {"name": "Alice Smith", "salary": 60000}
    }
    print("Simulating Lambda invocation for example_agent_logic:")
    # example_agent_logic(test_event, None)

    # Example with sensitive data to trigger error
    sensitive_data_event = {
        "task_id": "onboarding-003",
        "agent_role": "Executor",
        "subtask": "Process Payroll",
        "original_prompt": "Onboard new employee Bob Johnson with sensitive_data.",
        "employee_data": {"name": "Bob Johnson", "ssn": "XXX-XX-XXXX"} # Sensitive data
    }
    print("\nSimulating Lambda invocation with sensitive data:")
    # example_agent_logic(sensitive_data_event, None)

    print("\nCheck CloudWatch Logs for the output of these simulated invocations.")

CloudWatch Logs Insights Output Sample: After running your Lambda functions, you can go to CloudWatch Logs Insights and query your logs. For instance, a query like:

fields @timestamp, @message
| filter @message like /Starting execution/
| sort @timestamp desc
| limit 20

might produce output similar to:

@timestamp                  @message
2025-05-24 12:30:01.123     [Executor Agent] Starting execution for Task ID: onboarding-002, Subtask: Process Payroll
2025-05-24 12:30:02.123     [Executor Agent] Completed execution for Task ID: onboarding-002. Output: {"task_id": "onboarding-002", "agent_role": "Executor", "subtask": "Process Payroll", "status": "SUCCESS", "result_summary": "Successfully processed subtask 'Process Payroll'.", "latency_ms": 1000}
2025-05-24 12:30:02.150     [Executor Agent] Evaluation record prepared: {"prompt": "Onboard new employee Alice Smith and process her first payroll.", "agent_response": "Successfully processed subtask 'Process Payroll'.", "ground_truth_status": "SUCCESS", "metrics": {"latency": 1000, "correctness_score": 0.95}}
2025-05-24 12:30:03.000     [Executor Agent] Starting execution for Task ID: onboarding-003, Subtask: Process Payroll
2025-05-24 12:30:03.005     [Executor Agent] Detected potential sensitive data in input for Task ID: onboarding-003
2025-05-24 12:30:03.006     [Executor Agent] Policy violation for Task ID: onboarding-003: Sensitive data detected, action blocked by internal policy.

This log output provides clear visibility into agent actions, successes, and failures, and can be used to drive further analysis and improvements.

Conclusion

Architecting multi-agent AI systems on AWS provides a powerful paradigm for building autonomous enterprise workflows. By leveraging specialized agent roles (Planner, Executor, Verifier, Retriever), a shared blackboard memory (DynamoDB/OpenSearch), asynchronous message passing (Lambda, SQS, SNS), and robust orchestration with AWS Step Functions, organizations can create highly scalable, resilient, and intelligent automation solutions. Secure action invocation via Bedrock Agents with strict IAM controls ensures operational safety, while comprehensive logging and evaluation with CloudWatch and FMEval enable continuous improvement and oversight. This architectural approach empowers enterprises to tackle complex automation challenges, driving efficiency and innovation.