mirror of
https://github.com/open-webui/open-webui.git
synced 2025-12-14 21:35:19 +00:00
4 KiB
4 KiB
IT Request: OpenWebUI Horizontal Scaling Migration
📋 Request Summary
Simple DNS endpoint change for OpenWebUI horizontal scaling implementation
🎯 Required Change
Update Microsoft Entra App Proxy backend configuration:
| Field | Current Value | New Value |
|---|---|---|
| Backend URL | http://ai.ggai:8080 |
http://ai-scaled.ggai:8080 |
⏱️ Impact & Timing
- Downtime: < 30 seconds (DNS propagation)
- Risk Level: Very Low (simple DNS change)
- Rollback Time: < 30 seconds (revert DNS change)
- User Impact: Minimal (brief reconnection required)
✅ Pre-Migration Validation
All testing completed successfully:
# DNS Resolution Test ✅
nslookup ai-scaled.ggai
# Result: Points to load balancer (192.168.144.126, 192.168.144.75)
# HTTP Health Check Test ✅
curl -I http://ai-scaled.ggai:8080/health
# Result: HTTP 200 OK
# Load Balancing Test ✅
# Result: Traffic distributed across multiple healthy instances
🎯 Business Justification
💰 Cost Impact: 40% SAVINGS ($1,644/year)
- Current Cost: $340.22/month (over-provisioned single instance)
- New Cost: $203.15/month (right-sized horizontal scaling)
- Monthly Savings: $137.07/month = $1,644.84 annual savings
Current Limitations (Single Instance):
- ❌ No redundancy - single point of failure
- ❌ No load distribution - performance bottlenecks
- ❌ No auto-scaling - cannot handle traffic spikes
- ❌ Manual intervention required for outages
- ❌ Over-provisioned resources (8 vCPU, 32 GB unused capacity)
New Benefits (Horizontal Scaling):
- ✅ High Availability: 2-10 instances with automatic failover
- ✅ Auto-Scaling: Automatic scaling based on CPU utilization
- ✅ Load Distribution: Traffic balanced across healthy instances
- ✅ Session Management: Redis-based session sharing
- ✅ Health Monitoring: Automatic unhealthy instance removal
- ✅ Resource Optimization: Right-sized instances (50% CPU/Memory reduction)
🛡️ Risk Mitigation
Rollback Plan
If any issues occur, immediately revert:
Backend URL: http://ai-scaled.ggai:8080 → http://ai.ggai:8080
Rollback Time: < 30 seconds
Monitoring During Migration
- Health Dashboard: OpenWebUI-HorizontalScaling (CloudWatch)
- Instance Count: 2 healthy instances minimum
- Response Time: < 2 seconds average
- Error Rate: < 0.1%
📞 Technical Contacts
- Primary: [Your Name] - [Your Email]
- Infrastructure Team: Available for support during migration
- Escalation: [Manager Name] - [Manager Email]
📅 Proposed Timeline
- Pre-Change: Validate all endpoints (COMPLETED)
- Change Window: [Specify preferred maintenance window]
- Post-Change: Monitor for 30 minutes
- Sign-off: Confirm successful migration
🔍 Technical Details
Current Architecture:
Entra App Proxy → ai.ggai:8080 → Single OpenWebUI Instance
New Architecture:
Entra App Proxy → ai-scaled.ggai:8080 → Load Balancer → Multiple OpenWebUI Instances
↓
Redis Session Sharing
Infrastructure Components:
- Load Balancer: AWS Application Load Balancer (internal)
- Compute: 2-10 AWS Fargate instances (auto-scaling)
- Session Store: AWS ElastiCache Redis
- Database: Existing Aurora PostgreSQL (no changes)
- Storage: Existing EFS (no changes)
✅ Approval & Sign-off
Requested by: [Your Name]
Date: [Current Date]
Priority: Normal
IT Team Approval:
☐ Infrastructure Team Lead
☐ Network Team Lead
☐ Security Team (if required)
Execution Authorization:
☐ Change Management Approval
☐ Business Owner Sign-off
Questions? Contact [Your Email] or [Your Phone]