Self-Hosted AI Inference Stack & Agent Team
A production-grade, self-hosted AI inference platform with local-first LLM routing, Redis caching, cloud fallback, and a team of specialized AI agents — all running on homelab hardware.
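The routing described above (cache first, then the local model, then the cloud fallback) can be sketched as follows. The class and method names are illustrative assumptions, not the project's actual API, and a plain dict stands in for the Redis cache:

```python
# Minimal sketch of local-first routing with caching and cloud fallback.
# InferenceRouter, local_backend, and cloud_backend are hypothetical names;
# the real stack would use a Redis client where this uses an in-memory dict.

class InferenceRouter:
    def __init__(self, local_backend, cloud_backend, cache=None):
        self.local = local_backend    # callable: prompt -> completion
        self.cloud = cloud_backend    # callable: prompt -> completion
        self.cache = cache if cache is not None else {}

    def complete(self, prompt: str) -> str:
        # 1. Serve repeated prompts from the cache (Redis in the real stack).
        if prompt in self.cache:
            return self.cache[prompt]
        # 2. Prefer the local model; fall back to the cloud API on failure.
        try:
            answer = self.local(prompt)
        except Exception:
            answer = self.cloud(prompt)
        self.cache[prompt] = answer
        return answer
```

A production version would also bound cache entries with a TTL and distinguish transient local errors (retry) from capacity errors (fall back), but the control flow is the same.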
Built a specialized sub-agent system for automated homelab infrastructure validation, security auditing, and documentation synchronization using Claude Code’s Task API.
Overview

Implemented enterprise-grade resource management and monitoring across a multi-host Docker infrastructure: fixed critical Prometheus alerting issues and applied memory limits to 20 containers across two servers, improving system stability and observability.

Problem: Container memory alerts reported +Inf% instead of actual percentages, and no resource limits enforced isolation between services.

Solution: Audited container resource usage, applied appropriate memory limits to each service, and rewrote the Prometheus alert rules to handle both limited and unlimited containers. ...
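The +Inf% symptom typically comes from dividing memory usage by a zero memory limit for containers that have none. A hedged sketch of the fix, assuming standard cAdvisor metric names (thresholds, group names, and labels here are illustrative assumptions, not the project's actual rules):

```yaml
# Sketch of an alert rule that only fires for containers with a limit set.
# Filtering on container_spec_memory_limit_bytes > 0 drops unlimited
# containers from the division, so the ratio can never be +Inf.
groups:
  - name: container-memory
    rules:
      - alert: ContainerHighMemoryUsage
        expr: |
          container_memory_usage_bytes{name!=""}
            / (container_spec_memory_limit_bytes{name!=""} > 0)
          > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} is above 90% of its memory limit"
```

The limits themselves can be enforced per service in Compose; the service name and value below are placeholders:

```yaml
services:
  example-service:
    image: example/image
    mem_limit: 512m   # hard ceiling; adjust per the usage audit
```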