open-webui/iac/IT_REQUEST.md

138 lines
4 KiB
Markdown
Raw Normal View History

2025-07-24 17:39:52 +00:00
# IT Request: OpenWebUI Horizontal Scaling Migration
## 📋 **Request Summary**
**Simple DNS endpoint change for OpenWebUI horizontal scaling implementation**
---
## 🎯 **Required Change**
**Update Microsoft Entra App Proxy backend configuration:**
| Field | Current Value | New Value |
|-------|---------------|-----------|
| Backend URL | `http://ai.ggai:8080` | `http://ai-scaled.ggai:8080` |
---
## ⏱️ **Impact & Timing**
- **Downtime**: < 30 seconds (DNS propagation)
- **Risk Level**: **Very Low** (simple DNS change)
- **Rollback Time**: < 30 seconds (revert DNS change)
- **User Impact**: Minimal (brief reconnection required)
---
## ✅ **Pre-Migration Validation**
**All testing completed successfully:**
```bash
# DNS Resolution Test ✅
nslookup ai-scaled.ggai
# Result: Points to load balancer (192.168.144.126, 192.168.144.75)
# HTTP Health Check Test ✅
curl -I http://ai-scaled.ggai:8080/health
# Result: HTTP 200 OK
# Load Balancing Test ✅
# Result: Traffic distributed across multiple healthy instances
```
---
## 🎯 **Business Justification**
### **💰 Cost Impact: 40% SAVINGS ($1,644/year)**
- **Current Cost:** $340.22/month (over-provisioned single instance)
- **New Cost:** $203.15/month (right-sized horizontal scaling)
- **Monthly Savings:** $137.07/month = **$1,644.84 annual savings**
### **Current Limitations (Single Instance):**
- ❌ No redundancy - single point of failure
- ❌ No load distribution - performance bottlenecks
- ❌ No auto-scaling - cannot handle traffic spikes
- ❌ Manual intervention required for outages
- ❌ Over-provisioned resources (8 vCPU, 32 GB unused capacity)
### **New Benefits (Horizontal Scaling):**
-**High Availability**: 2-10 instances with automatic failover
-**Auto-Scaling**: Automatic scaling based on CPU utilization
-**Load Distribution**: Traffic balanced across healthy instances
-**Session Management**: Redis-based session sharing
-**Health Monitoring**: Automatic unhealthy instance removal
-**Resource Optimization**: Right-sized instances (50% CPU/Memory reduction)
---
## 🛡️ **Risk Mitigation**
### **Rollback Plan**
**If any issues occur, immediately revert:**
```
Backend URL: http://ai-scaled.ggai:8080 → http://ai.ggai:8080
Rollback Time: < 30 seconds
```
### **Monitoring During Migration**
- **Health Dashboard**: OpenWebUI-HorizontalScaling (CloudWatch)
- **Instance Count**: 2 healthy instances minimum
- **Response Time**: < 2 seconds average
- **Error Rate**: < 0.1%
---
## 📞 **Technical Contacts**
- **Primary**: [Your Name] - [Your Email]
- **Infrastructure Team**: Available for support during migration
- **Escalation**: [Manager Name] - [Manager Email]
---
## 📅 **Proposed Timeline**
1. **Pre-Change**: Validate all endpoints (COMPLETED)
2. **Change Window**: [Specify preferred maintenance window]
3. **Post-Change**: Monitor for 30 minutes
4. **Sign-off**: Confirm successful migration
---
## 🔍 **Technical Details**
### **Current Architecture:**
```
Entra App Proxy → ai.ggai:8080 → Single OpenWebUI Instance
```
### **New Architecture:**
```
Entra App Proxy → ai-scaled.ggai:8080 → Load Balancer → Multiple OpenWebUI Instances
Redis Session Sharing
```
### **Infrastructure Components:**
- **Load Balancer**: AWS Application Load Balancer (internal)
- **Compute**: 2-10 AWS Fargate instances (auto-scaling)
- **Session Store**: AWS ElastiCache Redis
- **Database**: Existing Aurora PostgreSQL (no changes)
- **Storage**: Existing EFS (no changes)
---
## ✅ **Approval & Sign-off**
**Requested by:** [Your Name]
**Date:** [Current Date]
**Priority:** Normal
**IT Team Approval:**
☐ Infrastructure Team Lead
☐ Network Team Lead
☐ Security Team (if required)
**Execution Authorization:**
☐ Change Management Approval
☐ Business Owner Sign-off
---
**Questions? Contact [Your Email] or [Your Phone]**