mirror of
https://github.com/open-webui/open-webui.git
synced 2025-12-14 13:25:20 +00:00
| .. | ||
| main.tf | ||
| outputs.tf | ||
| README.md | ||
| variables.tf | ||
| versions.tf | ||
Grafana OTEL Module
This Terraform module deploys a standalone Grafana OTEL LGTM (Logs, Grafana, Tempo, Mimir) stack on AWS ECS Fargate for OpenTelemetry monitoring and observability.
Features
- Complete OTEL Stack: Grafana + Prometheus + Tempo + Loki in a single container
- ECS Fargate Deployment: Serverless, scalable container deployment
- Service Discovery: Automatic DNS registration for easy service connectivity
- Security: Configurable security groups and network access controls
- Auto Scaling: Optional ECS autoscaling based on CPU utilization
- CloudWatch Integration: Structured logging with configurable retention
Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Applications │───▶│ OTLP Endpoints │───▶│ Grafana UI │
│ │ │ (4317/4318) │ │ (3000) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
▼
┌──────────────────┐
│ ECS Fargate │
│ - Grafana │
│ - Prometheus │
│ - Tempo │
│ - Loki │
└──────────────────┘
Usage
Basic Usage
module "grafana_monitoring" {
source = "./modules/grafana-otel"
# Core Infrastructure
vpc_id = "vpc-12345678"
private_subnet_ids = ["subnet-12345678", "subnet-87654321"]
cluster_name = "my-ecs-cluster"
# Network Access
allowed_cidr_blocks = ["10.0.0.0/8", "192.168.0.0/16"]
# Optional: OpenTelemetry Sources
otlp_sources_security_group_ids = ["sg-app1", "sg-app2"]
tags = {
Environment = "production"
Project = "monitoring"
}
}
Advanced Usage with Existing Service Discovery
module "grafana_monitoring" {
source = "./modules/grafana-otel"
# Core Infrastructure
vpc_id = "vpc-12345678"
private_subnet_ids = ["subnet-12345678", "subnet-87654321"]
cluster_name = "my-ecs-cluster"
# Use existing service discovery namespace
service_discovery_namespace_id = "ns-12345678"
service_name = "monitoring"
# Custom configuration
environment = "staging"
cpu = 2048
memory = 4096
desired_count = 2
enable_autoscaling = true
max_capacity = 3
# Custom Grafana credentials
grafana_admin_user = "monitoring-admin"
grafana_admin_password = "secure-password-123"
tags = {
Environment = "staging"
Project = "monitoring"
}
}
Requirements
| Name | Version |
|---|---|
| terraform | >= 1.0 |
| aws | >= 5.0 |
Providers
| Name | Version |
|---|---|
| aws | >= 5.0 |
Resources Created
- ECS Service & Task Definition: Fargate-based Grafana OTEL LGTM container
- Service Discovery: DNS service registration for easy connectivity
- Security Groups: Network access controls for Grafana UI and OTLP endpoints
- IAM Roles: Execution role with necessary permissions
- CloudWatch Log Group: Centralized logging with configurable retention
- Auto Scaling (optional): CPU-based scaling for high availability
Inputs
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| vpc_id | VPC ID where Grafana will be deployed | string |
n/a | yes |
| private_subnet_ids | Private subnet IDs for Grafana ECS tasks | list(string) |
n/a | yes |
| cluster_name | ECS cluster name where Grafana will be deployed | string |
n/a | yes |
| aws_region | AWS region for deployment | string |
"us-east-1" |
no |
| allowed_cidr_blocks | CIDR blocks allowed to access Grafana UI | list(string) |
["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"] |
no |
| otlp_sources_security_group_ids | Security group IDs that should be allowed to send OTLP data | list(string) |
[] |
no |
| grafana_admin_user | Grafana admin username | string |
"admin" |
no |
| grafana_admin_password | Grafana admin password | string |
"openwebui_monitoring_2024" |
no |
| cpu | CPU units for Grafana task | number |
1024 |
no |
| memory | Memory (MB) for Grafana task | number |
2048 |
no |
| enable_autoscaling | Enable ECS autoscaling for Grafana | bool |
true |
no |
See variables.tf for complete list of inputs.
Outputs
| Name | Description |
|---|---|
| grafana_dashboard_url | Grafana dashboard URL |
| grafana_admin_credentials | Grafana admin login credentials (sensitive) |
| otlp_endpoints | OpenTelemetry OTLP endpoints (gRPC and HTTP) |
| security_group_id | Security group ID for Grafana tasks |
| setup_instructions | Complete setup and integration instructions |
See outputs.tf for complete list of outputs.
Integration with Applications
To send telemetry data from your applications to this Grafana instance:
1. Add Application Security Groups
module "grafana_monitoring" {
source = "./modules/grafana-otel"
# ... other configuration
otlp_sources_security_group_ids = [
aws_security_group.my_app.id,
aws_security_group.another_app.id
]
}
2. Configure Application Environment Variables
# In your application environment
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-monitor.my-namespace:4317
OTEL_EXPORTER_OTLP_INSECURE=true
OTEL_SERVICE_NAME=my-application
3. Verify Integration
# Check service discovery
nslookup otel-monitor.my-namespace
# Test OTLP endpoint
curl http://otel-monitor.my-namespace:4317
# Access Grafana UI
curl http://otel-monitor.my-namespace:3000
Monitoring and Troubleshooting
Access Grafana Dashboard
- Connect to your VPC (via VPN or bastion host)
- Navigate to the Grafana URL from module outputs
- Login with the admin credentials
- Explore pre-configured data sources:
- Prometheus: Metrics and monitoring
- Tempo: Distributed tracing
- Loki: Log aggregation
Common Issues
- Connection refused: Check security group rules and CIDR blocks
- Service not starting: Check CloudWatch logs and ECS service events
- No telemetry data: Verify OTLP source security groups and endpoints
Useful Commands
# Check ECS service status
aws ecs describe-services --cluster my-cluster --services grafana-otel
# View logs
aws logs tail /ecs/grafana-otel --follow
# Check service discovery
aws servicediscovery list-services --filters Name=NAMESPACE_ID,Values=ns-12345678
Security Considerations
- Grafana admin password is configurable but stored in Terraform state
- Consider using AWS Secrets Manager for production passwords
- Network access is controlled via security groups and CIDR blocks
- ECS tasks run with least privilege IAM permissions
Cost Optimization
- Default configuration uses 1 vCPU and 2GB RAM (estimated $35-50/month)
- Enable autoscaling to handle traffic spikes efficiently
- Adjust log retention period to control CloudWatch costs
- Consider using Spot instances for non-production environments
License
This module is part of the OpenWebUI infrastructure project.