# Stronghold - Local LLM Server

- **IP:** 192.168.1.XXX
- **Hostname:** stronghold
- **Role:** OpenClaw AI server with local LLM hosting
- **Alias:** `stronghold` (SSH shortcut)

## Overview
Dedicated server running OpenClaw platform with 5 local Large Language Models via Ollama. Provides self-hosted AI inference for development, testing, and personal use without cloud dependencies.
## Hardware Specifications
| Component | Details |
|---|---|
| CPU | Intel Core i5-14400 (14th Gen) |
| Cores | 10 (6 P-cores + 4 E-cores) |
| Threads | 16 (Hyper-Threading on P-cores only) |
| Base Clock | 2.5 GHz |
| Boost Clock | Up to 4.7 GHz |
| Memory | 32 GB DDR4 |
| Storage | 100 GB SSD (53 GB used, 41 GB free) |
| GPU | Intel UHD Graphics 730 (integrated) |
**Current Resource Usage:**
- RAM: 2.9 GB / 32 GB (9% used)
- Swap: 719 MB / 8 GB
- Disk: 53 GB / 98 GB (57% used)
- Uptime: 6+ days
## Software Environment
| Software | Version/Details |
|---|---|
| Operating System | Ubuntu 24.04.3 LTS (Noble Numbat) |
| Kernel | Linux 6.8.0-100-generic |
| Container Runtime | Docker (2.3 GB storage used) |
| LLM Server | Ollama (localhost:11434) |
| API Gateway | OpenClaw Gateway |
| API Proxy | LiteLLM (port 4000) |
## Local LLMs (via Ollama)
| Model | Size | Parameters | Quantization | Use Case |
|---|---|---|---|---|
| Qwen 2.5 Instruct | 8.9 GB | 14.8B | Q4_K_M | General purpose, high capability |
| Gemma 2 | 5.4 GB | 9.2B | Q4_0 | Google’s efficient model |
| Mistral | 4.4 GB | 7.2B | Q4_K_M | Fast, balanced performance |
| CodeLlama | 3.8 GB | 7B | Q4_0 | Code generation and analysis |
| Llama 3.2 | 2.0 GB | 3.2B | Q4_K_M | Lightweight, fast inference |
**Total Model Storage:** ~24 GB
**API Access:**

```bash
# Via Ollama API (local only)
curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Hello"}'

# Via LiteLLM proxy (requires auth)
curl http://stronghold:4000/v1/models
```

## Running Services
| Service | Port | Protocol | Access | Purpose |
|---|---|---|---|---|
| Ollama | 11434 | HTTP | localhost | LLM inference server |
| LiteLLM Proxy | 4000 | HTTP | 0.0.0.0 | OpenAI-compatible API proxy |
| ChromaDB | 8000 | HTTP | 192.168.1.XXX | Vector database (unhealthy) |
| Skip Dashboard | 80 | HTTP | 0.0.0.0 | Web frontend (nginx) |
| OpenClaw Gateway | 18789, 18792 | TCP | localhost | AI gateway service |
| Workspace HTTP | 8080 | HTTP | 0.0.0.0 | OpenClaw workspace server |
| Glances | 61209 | HTTP | localhost | System monitoring |
| Node Service | 3081 | HTTP | 0.0.0.0 | Unknown Node.js service |
| SSH | 22 | SSH | 0.0.0.0 | Remote management |
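
For requests beyond listing models, LiteLLM speaks the OpenAI wire format. A hedged sketch of an authenticated chat completion (the `LITELLM_API_KEY` environment variable name is a placeholder, not the server's actual configuration):

```bash
# OpenAI-compatible chat completion via the LiteLLM proxy on port 4000.
# $LITELLM_API_KEY stands in for whatever key the proxy is configured with.
curl http://stronghold:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral", "messages": [{"role": "user", "content": "Hello"}]}'
```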
## Docker Containers
| Container | Image | Status | Purpose |
|---|---|---|---|
| skip-dashboard-frontend | nginx:alpine | Up 9 hours | Web UI frontend |
| litellm-proxy | ghcr.io/berriai/litellm:main-latest | Up 29 hours | API proxy layer |
| chromadb | chromadb/chroma:latest | Up 6 days (unhealthy) | Vector database for embeddings |
## Access

### SSH Access

```bash
# From workstation
stronghold                 # Alias configured in ~/.zshrc

# Or directly
ssh root@192.168.1.XXX
```

**Authentication:** SSH key (Ed25519) from workstation
### Service URLs

```
http://192.168.1.XXX        # Skip Dashboard
http://192.168.1.XXX:4000   # LiteLLM API (requires auth)
http://192.168.1.XXX:8000   # ChromaDB
http://192.168.1.XXX:8080   # Workspace server
```

## Performance Characteristics
**Strengths:**
- 32 GB RAM allows multiple models loaded simultaneously
- 10-core i5-14400 provides strong CPU inference performance
- Integrated GPU (Intel UHD 730) for basic acceleration
**Limitations:**
- No discrete GPU (no CUDA/ROCm acceleration)
- CPU-only inference is slower than GPU inference
- Quantized models (Q4) trade accuracy for performance
**Typical Inference Speed:**
- Small models (3B-7B): ~20-40 tokens/sec
- Medium models (9B-14B): ~10-20 tokens/sec
- Large models (>14B): Limited by CPU performance
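
The throughput figures above can be measured directly: Ollama's `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds), so tokens/sec is a one-line calculation. A minimal sketch (the sample numbers are illustrative, not a benchmark of this server):

```python
# Convert Ollama's eval_count / eval_duration response fields to tokens/sec.
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """eval_duration is reported by Ollama in nanoseconds."""
    return eval_count / (eval_duration_ns / 1_000_000_000)

# Example: 100 tokens generated in 5 seconds of eval time
print(tokens_per_second(100, 5_000_000_000))  # → 20.0
```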
## Use Cases

1. **Local AI Development**
   - Test LLM integrations without cloud costs
   - Develop AI-powered applications privately
   - Experiment with different models
2. **Code Assistance**
   - CodeLlama for code generation and refactoring
   - Offline coding assistant
3. **General AI Tasks**
   - Text generation, summarization, analysis
   - Question answering
   - Content creation
4. **Vector Search**
   - ChromaDB for semantic search and embeddings
   - RAG (Retrieval-Augmented Generation) applications
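
The vector-search use case can be exercised with the `chromadb` Python client (`pip install chromadb`). A minimal sketch, assuming the ChromaDB container is reachable and healthy; the collection name and document are illustrative, and the host keeps this page's placeholder IP:

```python
import chromadb

# Connect to the ChromaDB container (substitute the real host IP).
client = chromadb.HttpClient(host="192.168.1.XXX", port=8000)

# Store a document, then run a semantic query against it.
collection = client.get_or_create_collection(name="notes")
collection.add(ids=["doc1"], documents=["Stronghold hosts five local LLMs via Ollama."])
results = collection.query(query_texts=["which server runs Ollama?"], n_results=1)
print(results["documents"])
```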
## OpenClaw Platform

**Components:**
- OpenClaw Gateway - routing and orchestration
- LiteLLM Proxy - OpenAI-compatible API interface
- Skip Dashboard - web-based management UI
- Workspace Server - file management and scripts
**Configuration:**
- LiteLLM config: `/app/config.yaml`
- Workspace: `/root/.openclaw/workspace`
- Backend protector script: running
## Monitoring

**Glances** (port 61209) - real-time system monitoring:
- CPU usage per core
- Memory usage and swap
- Disk I/O
- Network traffic
- Process list
Access via:

```bash
ssh root@192.168.1.XXX -L 61209:localhost:61209
# Then open http://localhost:61209 in a browser
```

## Storage Breakdown
| Location | Usage | Purpose |
|---|---|---|
| /var/lib/docker | 2.3 GB | Container images and volumes |
| Ollama models | ~24 GB | LLM model weights |
| System | ~27 GB | OS and applications |
| Free space | 41 GB | Available for new models |
**Note:** Storage is at 57% capacity; it can accommodate 1-2 more medium-sized models.
## Maintenance

### Model Management

```bash
# List installed models
ssh root@192.168.1.XXX "curl -s http://localhost:11434/api/tags | python3 -m json.tool"

# Pull a new model
ssh root@192.168.1.XXX "ollama pull llama3:70b"

# Remove a model
ssh root@192.168.1.XXX "ollama rm codellama:7b"
```

### Service Management
```bash
# Restart Ollama (assumes the systemd unit from the official installer; the
# original pkill-and-background approach dies when the SSH session exits)
ssh root@192.168.1.XXX "systemctl restart ollama"

# View logs
ssh root@192.168.1.XXX "docker logs litellm-proxy --tail 50"

# Check resource usage
ssh root@192.168.1.XXX "free -h && df -h"
```

## Known Issues
### ChromaDB Unhealthy
- Container status shows “unhealthy”
- May need investigation or restart
- Vector database functionality may be limited
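
A triage sketch for the unhealthy container; the `docker` commands are standard, while the heartbeat path is an assumption that varies by Chroma version (`/api/v1/heartbeat` shown):

```bash
# Inspect the failing health check and recent logs.
ssh root@192.168.1.XXX "docker inspect --format '{{json .State.Health}}' chromadb | python3 -m json.tool"
ssh root@192.168.1.XXX "docker logs chromadb --tail 50"

# Restart and re-check the heartbeat endpoint.
ssh root@192.168.1.XXX "docker restart chromadb"
curl http://192.168.1.XXX:8000/api/v1/heartbeat
```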
### No GPU Acceleration
- CPU-only inference limits performance for large models
- Consider adding discrete GPU (NVIDIA RTX 3060+) for 5-10x speed improvement
## Security Considerations
- LiteLLM proxy requires authentication (API key)
- Most services bound to localhost (not exposed to network)
- SSH key authentication only (no password auth)
- Consider adding UFW firewall rules to restrict port access
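
The firewall suggestion could start from something like this minimal UFW sketch, restricting the externally bound ports from the services table to the local subnet (the `/24` subnet is an assumption; run on the server itself):

```bash
# Deny inbound by default, then allow LAN-only access to the exposed services.
ufw default deny incoming
ufw default allow outgoing
ufw allow from 192.168.1.0/24 to any port 22 proto tcp    # SSH
ufw allow from 192.168.1.0/24 to any port 80 proto tcp    # Skip Dashboard
ufw allow from 192.168.1.0/24 to any port 3081 proto tcp  # Node service
ufw allow from 192.168.1.0/24 to any port 4000 proto tcp  # LiteLLM proxy
ufw allow from 192.168.1.0/24 to any port 8080 proto tcp  # Workspace HTTP
ufw enable
```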
## Future Enhancements
- Fix ChromaDB health status
- Add GPU (NVIDIA RTX 4060/4070 for CUDA acceleration)
- Implement automatic model warm-up on boot
- Set up Prometheus metrics exporter for Ollama
- Add to _Monitoring-Stack for centralized monitoring
- Configure UFW firewall rules
- Document LiteLLM API authentication
- Investigate unknown Node.js service on port 3081
- Set up automated Ollama model updates
## Related Pages
- Network-Topology
- Workstation: stronghold SSH alias
- Mac Studio M1 Max (also runs Ollama)
- Claude Memory System integration
**Last Updated:** 2026-02-16
**Added to Infrastructure:** 2026-02-16