---
name: bedrock
description: AWS Bedrock foundation models for generative AI. Use when invoking foundation models, building AI applications, creating embeddings, configuring model access, or implementing RAG patterns.
last_updated: "2026-01-07"
doc_source: https://docs.aws.amazon.com/bedrock/latest/userguide/
---
# AWS Bedrock
Amazon Bedrock provides access to foundation models (FMs) from AI companies through a unified API. Build generative AI applications with text generation, embeddings, and image generation capabilities.
## Table of Contents

- [Core Concepts](#core-concepts)
- [Common Patterns](#common-patterns)
- [CLI Reference](#cli-reference)
- [Best Practices](#best-practices)
- [IAM Permissions](#iam-permissions)
- [Troubleshooting](#troubleshooting)
## Core Concepts
### Foundation Models
Pre-trained models available through Bedrock:
- Claude (Anthropic): Text generation, analysis, coding
- Titan (Amazon): Text, embeddings, image generation
- Llama (Meta): Open-weight text generation
- Mistral: Efficient text generation
- Stable Diffusion (Stability AI): Image generation
### Model Access
Models must be enabled in your account before use:
- Request access in Bedrock console
- Some models require acceptance of EULAs
- Access is region-specific
### Inference Types

| Type | Use Case | Pricing |
|------|----------|---------|
| On-Demand | Variable workloads | Per token |
| Provisioned Throughput | Consistent high-volume | Hourly commitment |
| Batch Inference | Async large-scale | Discounted per token |
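Batch inference differs from the other two in shape: it is submitted as an asynchronous job rather than a synchronous call. A hedged boto3 sketch (the job name, role ARN, and S3 paths are placeholders; input records must already be staged in S3 in the JSONL format the model expects):

```python
import boto3

# Batch jobs go through the control-plane client, not bedrock-runtime
bedrock = boto3.client('bedrock')

response = bedrock.create_model_invocation_job(
    jobName='example-batch-job',                       # placeholder
    roleArn='arn:aws:iam::123456789012:role/my-role',  # placeholder: role with S3 + Bedrock access
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    inputDataConfig={'s3InputDataConfig': {'s3Uri': 's3://my-bucket/input/'}},
    outputDataConfig={'s3OutputDataConfig': {'s3Uri': 's3://my-bucket/output/'}},
)
print(response['jobArn'])
```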
## Common Patterns
### Invoke Model (Text Generation)
AWS CLI:
```bash
# Invoke Claude (AWS CLI v2 needs --cli-binary-format to accept a raw JSON body)
aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-sonnet-20240229-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain AWS Lambda in 3 sentences."}
    ]
  }' \
  response.json

jq -r '.content[0].text' response.json
```
boto3:
```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def invoke_claude(prompt, max_tokens=1024):
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': max_tokens,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )
    result = json.loads(response['body'].read())
    return result['content'][0]['text']

# Usage
response = invoke_claude('What is Amazon S3?')
print(response)
```
### Streaming Response
```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def stream_claude(prompt):
    response = bedrock.invoke_model_with_response_stream(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )
    # Each stream event carries a JSON chunk; text arrives in content_block_delta events
    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Usage
for text in stream_claude('Write a haiku about cloud computing.'):
    print(text, end='', flush=True)
```
### Generate Embeddings
```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def get_embedding(text):
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'inputText': text,
            'dimensions': 1024,
            'normalize': True
        })
    )
    result = json.loads(response['body'].read())
    return result['embedding']

# Usage
embedding = get_embedding('AWS Lambda is a serverless compute service.')
print(f'Embedding dimension: {len(embedding)}')
```
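Embeddings feed the RAG patterns mentioned in the description; the typical next step is similarity search. A minimal in-memory sketch built on the get_embedding helper above (the toy corpus and query are invented for illustration; since Titan v2 with normalize=True returns unit vectors, a plain dot product would also work):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus; a real RAG setup would pull chunks from a document store
docs = [
    'AWS Lambda is a serverless compute service.',
    'Amazon S3 stores objects in buckets.',
]
doc_embeddings = [(doc, get_embedding(doc)) for doc in docs]

query_embedding = get_embedding('Where do I store files?')
best_doc, _ = max(doc_embeddings, key=lambda pair: cosine_similarity(query_embedding, pair[1]))
print(best_doc)
```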
### Conversation with History
```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

class Conversation:
    def __init__(self, system_prompt=None):
        self.messages = []
        self.system = system_prompt

    def chat(self, user_message):
        self.messages.append({
            'role': 'user',
            'content': user_message
        })
        body = {
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': self.messages
        }
        if self.system:
            body['system'] = self.system
        response = bedrock.invoke_model(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )
        result = json.loads(response['body'].read())
        assistant_message = result['content'][0]['text']
        self.messages.append({
            'role': 'assistant',
            'content': assistant_message
        })
        return assistant_message

# Usage
conv = Conversation(system_prompt='You are an AWS solutions architect.')
print(conv.chat('What database should I use for a chat application?'))
print(conv.chat('What about for time-series data?'))
```
### List Available Models
```bash
# List all foundation models
aws bedrock list-foundation-models \
  --query 'modelSummaries[*].[modelId,modelName,providerName]' \
  --output table

# Filter by provider
aws bedrock list-foundation-models \
  --by-provider anthropic \
  --query 'modelSummaries[*].modelId'

# Get model details
aws bedrock get-foundation-model \
  --model-identifier anthropic.claude-3-sonnet-20240229-v1:0
```
### Request Model Access
```bash
# List agreement offers for a model (access itself is granted in the
# Bedrock console under "Model access")
aws bedrock list-foundation-model-agreement-offers \
  --model-id anthropic.claude-3-sonnet-20240229-v1:0
```
## CLI Reference
### Bedrock (Control Plane)
| Command | Description |
|---------|-------------|
| aws bedrock list-foundation-models | List available models |
| aws bedrock get-foundation-model | Get model details |
| aws bedrock list-custom-models | List fine-tuned models |
| aws bedrock create-model-customization-job | Start fine-tuning |
| aws bedrock list-provisioned-model-throughputs | List provisioned capacity |
### Bedrock Runtime (Data Plane)
| Command | Description |
|---------|-------------|
| aws bedrock-runtime invoke-model | Invoke model synchronously |
| aws bedrock-runtime invoke-model-with-response-stream | Invoke with streaming |
| aws bedrock-runtime converse | Multi-turn conversation API |
| aws bedrock-runtime converse-stream | Streaming conversation |
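The converse commands map to a model-agnostic API in boto3, which frees you from each provider's native request body. A minimal sketch, reusing the Claude model ID from the examples above:

```python
import boto3

bedrock = boto3.client('bedrock-runtime')

# Converse uses one request shape across providers: content is a list of blocks
response = bedrock.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {'role': 'user', 'content': [{'text': 'What is Amazon EBS?'}]}
    ],
    inferenceConfig={'maxTokens': 512, 'temperature': 0.5},
)
print(response['output']['message']['content'][0]['text'])
```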
### Bedrock Agent Runtime
| Command | Description |
|---------|-------------|
| aws bedrock-agent-runtime invoke-agent | Invoke a Bedrock agent |
| aws bedrock-agent-runtime retrieve | Query knowledge base |
| aws bedrock-agent-runtime retrieve-and-generate | RAG query |
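On the SDK side these commands live on a separate boto3 client. A hedged sketch of a RAG query via retrieve_and_generate (the knowledge base ID is a placeholder; you need an existing knowledge base and a model ARN your role can invoke):

```python
import boto3

agent_runtime = boto3.client('bedrock-agent-runtime')

response = agent_runtime.retrieve_and_generate(
    input={'text': 'How do I rotate access keys?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB1234567890',  # placeholder: your knowledge base ID
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
        },
    },
)
print(response['output']['text'])
```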
## Best Practices
### Cost Optimization
- Use appropriate models: Smaller models for simple tasks
- Set max_tokens: Limit output length when possible
- Cache responses: For repeated identical queries (see the sketch after this list)
- Batch when possible: Use batch inference for bulk processing
- Monitor usage: Set up CloudWatch alarms for cost
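A minimal sketch of the caching item, assuming the invoke_claude helper defined earlier and exact-match prompts; production code would likely add TTL eviction and a size bound:

```python
import hashlib

_cache = {}

def cached_invoke(prompt, max_tokens=1024):
    # Key on the exact prompt plus parameters, so identical queries hit the cache
    key = hashlib.sha256(f'{max_tokens}:{prompt}'.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = invoke_claude(prompt, max_tokens)
    return _cache[key]
```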
### Performance
- Use streaming: For better user experience with long outputs
- Connection pooling: Reuse boto3 clients
- Regional deployment: Use closest region to reduce latency
- Provisioned throughput: For consistent high-volume workloads
### Security
- Least privilege IAM: Only grant needed model access
- VPC endpoints: Keep traffic private
- Guardrails: Implement content filtering (see the sketch after this list)
- Audit with CloudTrail: Track model invocations
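Guardrails are attached per invocation by ID and version. A sketch assuming a guardrail already exists in your account (the identifier and version below are placeholders):

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

response = bedrock.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    guardrailIdentifier='gr-placeholder-id',  # placeholder: your guardrail ID
    guardrailVersion='1',                     # placeholder: your guardrail version
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 512,
        'messages': [{'role': 'user', 'content': 'Summarize our refund policy.'}],
    }),
)
```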
## IAM Permissions
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream" ], "Resource": [ "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0", "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0" ] } ] }
## Troubleshooting
### AccessDeniedException
Causes:
- Model access not enabled in console
- IAM policy missing bedrock:InvokeModel permission
- Wrong model ID or region
Debug:
```bash
# Check model access status
aws bedrock list-foundation-models \
  --query 'modelSummaries[?modelId==`anthropic.claude-3-sonnet-20240229-v1:0`]'

# Test IAM permissions
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/my-role \
  --action-names bedrock:InvokeModel \
  --resource-arns "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
```
### ModelNotReadyException
Cause: Model is still being provisioned or temporarily unavailable.
Solution: Implement retry with exponential backoff:
```python
import json
import time

from botocore.exceptions import ClientError

def invoke_with_retry(bedrock, body, max_retries=3):
    for attempt in range(max_retries):
        try:
            return bedrock.invoke_model(
                modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                body=json.dumps(body)
            )
        except ClientError as e:
            if e.response['Error']['Code'] == 'ModelNotReadyException':
                # Exponential backoff: 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            else:
                raise
    raise Exception('Max retries exceeded')
```
### ThrottlingException
Causes:
- Exceeded on-demand quota
- Too many concurrent requests
Solutions:
- Request quota increase
- Implement exponential backoff (or let boto3 retry for you; see the sketch after this list)
- Consider provisioned throughput
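Instead of hand-rolling backoff, boto3's built-in retry modes can absorb throttling. A sketch using botocore's Config (adaptive mode also applies client-side rate limiting):

```python
import boto3
from botocore.config import Config

# Retries with backoff are handled inside the client itself
bedrock = boto3.client(
    'bedrock-runtime',
    config=Config(retries={'max_attempts': 5, 'mode': 'adaptive'}),
)
```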
### ValidationException
Common issues:
- Invalid model ID
- Malformed request body
- max_tokens exceeds model limit
Debug:
```bash
# Check model-specific requirements
aws bedrock get-foundation-model \
  --model-identifier anthropic.claude-3-sonnet-20240229-v1:0 \
  --query 'modelDetails.inferenceTypesSupported'
```
