# Stronghold - Local LLM Server

- **IP:** 192.168.1.XXX
- **Hostname:** stronghold
- **Role:** OpenClaw AI server with local LLM hosting
- **Alias:** `stronghold` (SSH shortcut)

## Overview
Dedicated server running OpenClaw platform with 5 local Large Language Models via Ollama. Provides self-hosted AI inference for development, testing, and personal use without cloud dependencies.
## Hardware Specifications
| Component | Details |
|---|---|
| CPU | Intel Core i5-14400 (14th Gen) |
| Cores | 10 (6 P-cores + 4 E-cores) |
| Threads | 16 (Hyper-Threading on P-cores only) |
| Base Clock | 2.5 GHz |
| Boost Clock | Up to 4.7 GHz |
| Memory | 32 GB DDR4 |
| Storage | 100 GB SSD (53 GB used, 41 GB free) |
| GPU | Intel UHD Graphics 730 (integrated) |
**Current Resource Usage:**
- RAM: 2.9 GB / 32 GB (9% used)
- Swap: 719 MB / 8 GB
- Disk: 53 GB / 98 GB (57% used)
- Uptime: 6+ days
## Software Environment
| Software | Version/Details |
|---|---|
| Operating System | Ubuntu 24.04.3 LTS (Noble Numbat) |
| Kernel | Linux 6.8.0-100-generic |
| Container Runtime | Docker (2.3 GB storage used) |
| LLM Server | Ollama (localhost:11434) |
| API Gateway | OpenClaw Gateway |
| API Proxy | LiteLLM (port 4000) |
## Local LLMs (via Ollama)
| Model | Size | Parameters | Quantization | Use Case |
|---|---|---|---|---|
| Qwen 2.5 Instruct | 8.9 GB | 14.8B | Q4_K_M | General purpose, high capability |
| Gemma 2 | 5.4 GB | 9.2B | Q4_0 | Google’s efficient model |
| Mistral | 4.4 GB | 7.2B | Q4_K_M | Fast, balanced performance |
| CodeLlama | 3.8 GB | 7B | Q4_0 | Code generation and analysis |
| Llama 3.2 | 2.0 GB | 3.2B | Q4_K_M | Lightweight, fast inference |
**Total Model Storage:** ~24 GB
**API Access:**

```bash
# Via Ollama API (local only)
curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Hello"}'

# Via LiteLLM proxy (requires auth)
curl http://stronghold:4000/v1/models
```

## Running Services
| Service | Port | Protocol | Access | Purpose |
|---|---|---|---|---|
| Ollama | 11434 | HTTP | localhost | LLM inference server |
| LiteLLM Proxy | 4000 | HTTP | 0.0.0.0 | OpenAI-compatible API proxy |
| ChromaDB | 8000 | HTTP | 192.168.1.XXX | Vector database (unhealthy) |
| Skip Dashboard | 80 | HTTP | 0.0.0.0 | Web frontend (nginx) |
| OpenClaw Gateway | 18789, 18792 | TCP | localhost | AI gateway service |
| Workspace HTTP | 8080 | HTTP | 0.0.0.0 | OpenClaw workspace server |
| Glances | 61209 | HTTP | localhost | System monitoring |
| Node Service | 3081 | HTTP | 0.0.0.0 | Unknown Node.js service |
| SSH | 22 | SSH | 0.0.0.0 | Remote management |
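
For requests beyond listing models, LiteLLM speaks the OpenAI wire format. A hedged sketch of an authenticated chat completion (the `LITELLM_API_KEY` environment variable name is a placeholder, not the server's actual configuration):

```bash
# OpenAI-compatible chat completion via the LiteLLM proxy on port 4000.
# $LITELLM_API_KEY stands in for whatever key the proxy is configured with.
curl http://stronghold:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral", "messages": [{"role": "user", "content": "Hello"}]}'
```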
## Docker Containers
| Container | Image | Status | Purpose |
|---|---|---|---|
| skip-dashboard-frontend | nginx:alpine | Up 9 hours | Web UI frontend |
| litellm-proxy | ghcr.io/berriai/litellm:main-latest | Up 29 hours | API proxy layer |
| chromadb | chromadb/chroma:latest | Up 6 days (unhealthy) | Vector database for embeddings |
## Access

### SSH Access

```bash
# From workstation
stronghold                 # Alias configured in ~/.zshrc

# Or directly
ssh root@192.168.1.XXX
```

**Authentication:** SSH key (Ed25519) from workstation
### Service URLs

```
http://192.168.1.XXX        # Skip Dashboard
http://192.168.1.XXX:4000   # LiteLLM API (requires auth)
http://192.168.1.XXX:8000   # ChromaDB
http://192.168.1.XXX:8080   # Workspace server
```

## Performance Characteristics
**Strengths:**
- 32 GB RAM allows multiple models loaded simultaneously
- 10-core i5-14400 provides strong CPU inference performance
- Integrated GPU (Intel UHD 730) for basic acceleration
**Limitations:**
- No discrete GPU (no CUDA/ROCm acceleration)
- CPU-only inference is slower than GPU inference
- Quantized models (Q4) trade accuracy for performance
**Typical Inference Speed:**
- Small models (3B-7B): ~20-40 tokens/sec
- Medium models (9B-14B): ~10-20 tokens/sec
- Large models (>14B): Limited by CPU performance
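
The throughput figures above can be measured directly: Ollama's `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds), so tokens/sec is a one-line calculation. A minimal sketch (the sample numbers are illustrative, not a benchmark of this server):

```python
# Convert Ollama's eval_count / eval_duration response fields to tokens/sec.
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """eval_duration is reported by Ollama in nanoseconds."""
    return eval_count / (eval_duration_ns / 1_000_000_000)

# Example: 100 tokens generated in 5 seconds of eval time
print(tokens_per_second(100, 5_000_000_000))  # → 20.0
```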
## Use Cases

1. **Local AI Development**
   - Test LLM integrations without cloud costs
   - Develop AI-powered applications privately
   - Experiment with different models
2. **Code Assistance**
   - CodeLlama for code generation and refactoring
   - Offline coding assistant
3. **General AI Tasks**
   - Text generation, summarization, analysis
   - Question answering
   - Content creation
4. **Vector Search**
   - ChromaDB for semantic search and embeddings
   - RAG (Retrieval-Augmented Generation) applications
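
The vector-search use case can be exercised with the `chromadb` Python client (`pip install chromadb`). A minimal sketch, assuming the ChromaDB container is reachable and healthy; the collection name and document are illustrative, and the host keeps this page's placeholder IP:

```python
import chromadb

# Connect to the ChromaDB container (substitute the real host IP).
client = chromadb.HttpClient(host="192.168.1.XXX", port=8000)

# Store a document, then run a semantic query against it.
collection = client.get_or_create_collection(name="notes")
collection.add(ids=["doc1"], documents=["Stronghold hosts five local LLMs via Ollama."])
results = collection.query(query_texts=["which server runs Ollama?"], n_results=1)
print(results["documents"])
```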
## OpenClaw Platform

**Components:**
- OpenClaw Gateway - routing and orchestration
- LiteLLM Proxy - OpenAI-compatible API interface
- Skip Dashboard - web-based management UI
- Workspace Server - file management and scripts
**Configuration:**
- LiteLLM config: `/app/config.yaml`
- Workspace: `/root/.openclaw/workspace`
- Backend protector script: running
## Monitoring

**Glances** (port 61209) - real-time system monitoring:
- CPU usage per core
- Memory usage and swap
- Disk I/O
- Network traffic
- Process list
Access via:

```bash
ssh root@192.168.1.XXX -L 61209:localhost:61209
# Then open http://localhost:61209 in a browser
```

## Storage Breakdown
| Location | Usage | Purpose |
|---|---|---|
| /var/lib/docker | 2.3 GB | Container images and volumes |
| Ollama models | ~24 GB | LLM model weights |
| System | ~27 GB | OS and applications |
| Free space | 41 GB | Available for new models |
**Note:** Storage is at 57% capacity; it can accommodate 1-2 more medium-sized models.
## Maintenance

### Model Management

```bash
# List installed models
ssh root@192.168.1.XXX "curl -s http://localhost:11434/api/tags | python3 -m json.tool"

# Pull a new model
ssh root@192.168.1.XXX "ollama pull llama3:70b"

# Remove a model
ssh root@192.168.1.XXX "ollama rm codellama:7b"
```

### Service Management
```bash
# Restart Ollama (assumes the systemd unit from the official installer; the
# original pkill-and-background approach dies when the SSH session exits)
ssh root@192.168.1.XXX "systemctl restart ollama"

# View logs
ssh root@192.168.1.XXX "docker logs litellm-proxy --tail 50"

# Check resource usage
ssh root@192.168.1.XXX "free -h && df -h"
```

## Known Issues
### ChromaDB Unhealthy
- Container status shows “unhealthy”
- May need investigation or restart
- Vector database functionality may be limited
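
A triage sketch for the unhealthy container; the `docker` commands are standard, while the heartbeat path is an assumption that varies by Chroma version (`/api/v1/heartbeat` shown):

```bash
# Inspect the failing health check and recent logs.
ssh root@192.168.1.XXX "docker inspect --format '{{json .State.Health}}' chromadb | python3 -m json.tool"
ssh root@192.168.1.XXX "docker logs chromadb --tail 50"

# Restart and re-check the heartbeat endpoint.
ssh root@192.168.1.XXX "docker restart chromadb"
curl http://192.168.1.XXX:8000/api/v1/heartbeat
```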
### No GPU Acceleration
- CPU-only inference limits performance for large models
- Consider adding discrete GPU (NVIDIA RTX 3060+) for 5-10x speed improvement
## Security Considerations
- LiteLLM proxy requires authentication (API key)
- Most services bound to localhost (not exposed to network)
- SSH key authentication only (no password auth)
- Consider adding UFW firewall rules to restrict port access
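
The firewall suggestion could start from something like this minimal UFW sketch, restricting the externally bound ports from the services table to the local subnet (the `/24` subnet is an assumption; run on the server itself):

```bash
# Deny inbound by default, then allow LAN-only access to the exposed services.
ufw default deny incoming
ufw default allow outgoing
ufw allow from 192.168.1.0/24 to any port 22 proto tcp    # SSH
ufw allow from 192.168.1.0/24 to any port 80 proto tcp    # Skip Dashboard
ufw allow from 192.168.1.0/24 to any port 3081 proto tcp  # Node service
ufw allow from 192.168.1.0/24 to any port 4000 proto tcp  # LiteLLM proxy
ufw allow from 192.168.1.0/24 to any port 8080 proto tcp  # Workspace HTTP
ufw enable
```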
## Future Enhancements
- Fix ChromaDB health status
- Add GPU (NVIDIA RTX 4060/4070 for CUDA acceleration)
- Implement automatic model warm-up on boot
- Set up Prometheus metrics exporter for Ollama
- Add to _Monitoring-Stack for centralized monitoring
- Configure UFW firewall rules
- Document LiteLLM API authentication
- Investigate unknown Node.js service on port 3081
- Set up automated Ollama model updates
## Related Pages
- Network-Topology
- Workstation: stronghold SSH alias
- Mac Studio M1 Max (also runs Ollama)
- Claude Memory System integration
**Last Updated:** 2026-02-16
**Added to Infrastructure:** 2026-02-16