open-webui/iac/IT_REQUEST.md
2025-12-08 11:27:59 +07:00

138 lines
No EOL
4 KiB
Markdown

# IT Request: OpenWebUI Horizontal Scaling Migration
## 📋 **Request Summary**
**Simple DNS endpoint change for OpenWebUI horizontal scaling implementation**
---
## 🎯 **Required Change**
**Update Microsoft Entra App Proxy backend configuration:**
| Field | Current Value | New Value |
|-------|---------------|-----------|
| Backend URL | `http://ai.ggai:8080` | `http://ai-scaled.ggai:8080` |
---
## ⏱️ **Impact & Timing**
- **Downtime**: < 30 seconds (DNS propagation)
- **Risk Level**: **Very Low** (simple DNS change)
- **Rollback Time**: < 30 seconds (revert DNS change)
- **User Impact**: Minimal (brief reconnection required)
---
## ✅ **Pre-Migration Validation**
**All testing completed successfully:**
```bash
# DNS Resolution Test ✅
nslookup ai-scaled.ggai
# Result: Points to load balancer (192.168.144.126, 192.168.144.75)
# HTTP Health Check Test ✅
curl -I http://ai-scaled.ggai:8080/health
# Result: HTTP 200 OK
# Load Balancing Test ✅
# Result: Traffic distributed across multiple healthy instances
```
---
## 🎯 **Business Justification**
### **💰 Cost Impact: 40% SAVINGS ($1,644/year)**
- **Current Cost:** $340.22/month (over-provisioned single instance)
- **New Cost:** $203.15/month (right-sized horizontal scaling)
- **Monthly Savings:** $137.07/month = **$1,644.84 annual savings**
### **Current Limitations (Single Instance):**
- No redundancy - single point of failure
- No load distribution - performance bottlenecks
- No auto-scaling - cannot handle traffic spikes
- Manual intervention required for outages
- Over-provisioned resources (8 vCPU, 32 GB unused capacity)
### **New Benefits (Horizontal Scaling):**
- **High Availability**: 2-10 instances with automatic failover
- **Auto-Scaling**: Automatic scaling based on CPU utilization
- **Load Distribution**: Traffic balanced across healthy instances
- **Session Management**: Redis-based session sharing
- **Health Monitoring**: Automatic unhealthy instance removal
- **Resource Optimization**: Right-sized instances (50% CPU/Memory reduction)
---
## 🛡️ **Risk Mitigation**
### **Rollback Plan**
**If any issues occur, immediately revert:**
```
Backend URL: http://ai-scaled.ggai:8080 → http://ai.ggai:8080
Rollback Time: < 30 seconds
```
### **Monitoring During Migration**
- **Health Dashboard**: OpenWebUI-HorizontalScaling (CloudWatch)
- **Instance Count**: 2 healthy instances minimum
- **Response Time**: < 2 seconds average
- **Error Rate**: < 0.1%
---
## 📞 **Technical Contacts**
- **Primary**: [Your Name] - [Your Email]
- **Infrastructure Team**: Available for support during migration
- **Escalation**: [Manager Name] - [Manager Email]
---
## 📅 **Proposed Timeline**
1. **Pre-Change**: Validate all endpoints (COMPLETED)
2. **Change Window**: [Specify preferred maintenance window]
3. **Post-Change**: Monitor for 30 minutes
4. **Sign-off**: Confirm successful migration
---
## 🔍 **Technical Details**
### **Current Architecture:**
```
Entra App Proxy → ai.ggai:8080 → Single OpenWebUI Instance
```
### **New Architecture:**
```
Entra App Proxy → ai-scaled.ggai:8080 → Load Balancer → Multiple OpenWebUI Instances
Redis Session Sharing
```
### **Infrastructure Components:**
- **Load Balancer**: AWS Application Load Balancer (internal)
- **Compute**: 2-10 AWS Fargate instances (auto-scaling)
- **Session Store**: AWS ElastiCache Redis
- **Database**: Existing Aurora PostgreSQL (no changes)
- **Storage**: Existing EFS (no changes)
---
## ✅ **Approval & Sign-off**
**Requested by:** [Your Name]
**Date:** [Current Date]
**Priority:** Normal
**IT Team Approval:**
Infrastructure Team Lead
Network Team Lead
Security Team (if required)
**Execution Authorization:**
Change Management Approval
Business Owner Sign-off
---
**Questions? Contact [Your Email] or [Your Phone]**