Stronghold - Local LLM Server

IP: 192.168.1.XXX
Hostname: stronghold
Role: OpenClaw AI server with local LLM hosting
Alias: stronghold (SSH shortcut)


Overview

Dedicated server running OpenClaw platform with 5 local Large Language Models via Ollama. Provides self-hosted AI inference for development, testing, and personal use without cloud dependencies.


Hardware Specifications

| Component | Details |
| --- | --- |
| CPU | Intel Core i5-14400 (14th Gen) |
| Cores | 10 physical cores (6 P-cores + 4 E-cores) |
| Threads | 16 threads (Hyper-Threading on the P-cores) |
| Base Clock | 2.5 GHz |
| Boost Clock | Up to 4.7 GHz |
| Memory | 32 GB DDR4 |
| Storage | 100 GB SSD (53 GB used, 41 GB free) |
| GPU | Intel UHD Graphics 730 (integrated) |

Current Resource Usage:

  • RAM: 2.9 GB / 32 GB (9% used)
  • Swap: 719 MB / 8 GB
  • Disk: 53 GB / 98 GB (57% used)
  • Uptime: 6+ days

Software Environment

| Software | Version/Details |
| --- | --- |
| Operating System | Ubuntu 24.04.3 LTS (Noble Numbat) |
| Kernel | Linux 6.8.0-100-generic |
| Container Runtime | Docker (2.3 GB storage used) |
| LLM Server | Ollama (localhost:11434) |
| API Gateway | OpenClaw Gateway |
| API Proxy | LiteLLM (port 4000) |

Local LLMs (via Ollama)

| Model | Size | Parameters | Quantization | Use Case |
| --- | --- | --- | --- | --- |
| Qwen 2.5 Instruct | 8.9 GB | 14.8B | Q4_K_M | General purpose, high capability |
| Gemma 2 | 5.4 GB | 9.2B | Q4_0 | Google’s efficient model |
| Mistral | 4.4 GB | 7.2B | Q4_K_M | Fast, balanced performance |
| CodeLlama | 3.8 GB | 7B | Q4_0 | Code generation and analysis |
| Llama 3.2 | 2.0 GB | 3.2B | Q4_K_M | Lightweight, fast inference |

Total Model Storage: ~24 GB

API Access:

# Via Ollama API (local only)
curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Hello"}'
 
# Via LiteLLM Proxy (requires auth)
curl http://stronghold:4000/v1/models
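
The local-only Ollama endpoint can also be called from Python using just the standard library. This is a minimal sketch: `build_payload` and `generate` are illustrative helper names, and `"stream": False` makes Ollama return a single JSON object instead of a token stream.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # local-only, as noted above

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate; stream=False yields one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("mistral", "Hello"))
```

Swap the model string for any tag from the table above (e.g. `codellama` or `llama3.2`).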

Running Services

| Service | Port | Protocol | Access | Purpose |
| --- | --- | --- | --- | --- |
| Ollama | 11434 | HTTP | localhost | LLM inference server |
| LiteLLM Proxy | 4000 | HTTP | 0.0.0.0 | OpenAI-compatible API proxy |
| ChromaDB | 8000 | HTTP | 192.168.1.XXX | Vector database (unhealthy) |
| Skip Dashboard | 80 | HTTP | 0.0.0.0 | Web frontend (nginx) |
| OpenClaw Gateway | 18789, 18792 | TCP | localhost | AI gateway service |
| Workspace HTTP | 8080 | HTTP | 0.0.0.0 | OpenClaw workspace server |
| Glances | 61209 | HTTP | localhost | System monitoring |
| Node Service | 3081 | HTTP | 0.0.0.0 | Unknown Node.js service |
| SSH | 22 | SSH | 0.0.0.0 | Remote management |

Docker Containers

| Container | Image | Status | Purpose |
| --- | --- | --- | --- |
| skip-dashboard-frontend | nginx:alpine | Up 9 hours | Web UI frontend |
| litellm-proxy | ghcr.io/berriai/litellm:main-latest | Up 29 hours | API proxy layer |
| chromadb | chromadb/chroma:latest | Up 6 days (unhealthy) | Vector database for embeddings |

Access

SSH Access

# From workstation
stronghold   # Alias configured in ~/.zshrc
 
# Or directly
ssh root@192.168.1.XXX

Authentication: SSH key (Ed25519) from workstation

Service URLs

http://192.168.1.XXX          # Skip Dashboard
http://192.168.1.XXX:4000     # LiteLLM API (requires auth)
http://192.168.1.XXX:8000     # ChromaDB
http://192.168.1.XXX:8080     # Workspace server

Performance Characteristics

Strengths:

  • 32 GB RAM allows multiple models loaded simultaneously
  • 10-core i5-14400 provides strong CPU inference performance
  • Integrated GPU (Intel UHD 730) for basic acceleration

Limitations:

  • No discrete GPU (no CUDA/ROCm acceleration)
  • CPU-only inference is slower than GPU inference
  • Quantized models (Q4) trade accuracy for performance

Typical Inference Speed:

  • Small models (3B-7B): ~20-40 tokens/sec
  • Medium models (9B-14B): ~10-20 tokens/sec
  • Large models (>14B): Limited by CPU performance
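
Measured speed for a given model can be computed from the metrics Ollama returns with each non-streaming `/api/generate` response: `eval_count` is the number of tokens generated and `eval_duration` is the generation time in nanoseconds. The sample figures below are invented for illustration.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's /api/generate metrics:
    eval_count = tokens produced, eval_duration = time in nanoseconds."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)

# Illustrative response fragment: 128 tokens generated in 4 seconds
sample = {"eval_count": 128, "eval_duration": 4_000_000_000}
print(tokens_per_second(sample))  # → 32.0
```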

Use Cases

  1. Local AI Development

    • Test LLM integrations without cloud costs
    • Develop AI-powered applications privately
    • Experiment with different models
  2. Code Assistance

    • CodeLlama for code generation and refactoring
    • Offline coding assistant
  3. General AI Tasks

    • Text generation, summarization, analysis
    • Question answering
    • Content creation
  4. Vector Search

    • ChromaDB for semantic search and embeddings
    • RAG (Retrieval Augmented Generation) applications
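
The RAG use case above can be sketched against the ChromaDB container with its Python client (`pip install chromadb`). The chunking parameters, collection name, and sample text are all illustrative, and the host placeholder must be replaced with the real IP.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding (sizes are illustrative)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

if __name__ == "__main__":
    import chromadb  # client for the chromadb container on port 8000

    client = chromadb.HttpClient(host="192.168.1.XXX", port=8000)
    docs = chunk_text("Stronghold is a local LLM server running Ollama behind LiteLLM.")
    col = client.get_or_create_collection("notes")  # hypothetical collection name
    col.add(documents=docs, ids=[f"chunk-{i}" for i in range(len(docs))])
    hits = col.query(query_texts=["What is stronghold?"], n_results=3)
    print(hits["documents"][0])
```

Retrieved chunks would then be prepended to a prompt sent to one of the local models.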

OpenClaw Platform

Components:

  • OpenClaw Gateway - Routing and orchestration
  • LiteLLM Proxy - OpenAI-compatible API interface
  • Skip Dashboard - Web-based management UI
  • Workspace Server - File management and scripts

Configuration:

  • Config: /app/config.yaml (LiteLLM)
  • Workspace: /root/.openclaw/workspace
  • Backend protector script: Running

Monitoring

Glances (port 61209) - Real-time system monitoring:

  • CPU usage per core
  • Memory usage and swap
  • Disk I/O
  • Network traffic
  • Process list

Access via:

ssh root@192.168.1.XXX -L 61209:localhost:61209
# Then open http://localhost:61209 in browser

Storage Breakdown

| Location | Usage | Purpose |
| --- | --- | --- |
| /var/lib/docker | 2.3 GB | Container images and volumes |
| Ollama models | ~24 GB | LLM model weights |
| System | ~27 GB | OS and applications |
| Free space | 41 GB | Available for new models |

Note: Storage is at 57% capacity; the remaining 41 GB can accommodate one or two more medium-sized (5-10 GB) models.


Maintenance

Model Management

# List installed models
ssh root@192.168.1.XXX "curl -s http://localhost:11434/api/tags | python3 -m json.tool"
 
# Pull new model
ssh root@192.168.1.XXX "ollama pull llama3:70b"
 
# Remove model
ssh root@192.168.1.XXX "ollama rm codellama:7b"
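
Instead of eyeballing the raw `/api/tags` JSON, a short script can list each model with its size and the total. The `models` array with per-model `name` and `size` (bytes) fields is part of Ollama's tags response; the formatting is just a sketch.

```python
import json
import urllib.request

def summarize_models(tags: dict) -> list[tuple[str, float]]:
    """(name, size in GB) pairs from an Ollama /api/tags response."""
    return [(m["name"], m["size"] / 1e9) for m in tags.get("models", [])]

if __name__ == "__main__":
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        tags = json.load(resp)
    total = 0.0
    for name, gb in summarize_models(tags):
        total += gb
        print(f"{name:30s} {gb:5.1f} GB")
    print(f"{'total':30s} {total:5.1f} GB")
```

Run it on the server itself (or through the SSH tunnel), since Ollama only listens on localhost.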

Service Management

# Restart Ollama (assumes the standard systemd service; a bare
# "pkill ollama && ollama serve &" over SSH dies with the session)
ssh root@192.168.1.XXX "systemctl restart ollama"
 
# View logs
ssh root@192.168.1.XXX "docker logs litellm-proxy --tail 50"
 
# Check resource usage
ssh root@192.168.1.XXX "free -h && df -h"

Known Issues

ChromaDB Unhealthy

  • Container status shows “unhealthy”
  • May need investigation or restart
  • Vector database functionality may be limited

No GPU Acceleration

  • CPU-only inference limits performance for large models
  • Consider adding discrete GPU (NVIDIA RTX 3060+) for 5-10x speed improvement

Security Considerations

  • LiteLLM proxy requires authentication (API key)
  • Most services bound to localhost (not exposed to network)
  • SSH key authentication only (no password auth)
  • Consider adding UFW firewall rules to restrict port access

Future Enhancements

  • Fix ChromaDB health status
  • Add GPU (NVIDIA RTX 4060/4070 for CUDA acceleration)
  • Implement automatic model warm-up on boot
  • Set up Prometheus metrics exporter for Ollama
  • Add to _Monitoring-Stack for centralized monitoring
  • Configure UFW firewall rules
  • Document LiteLLM API authentication
  • Investigate unknown Node.js service on port 3081
  • Set up automated Ollama model updates


Last Updated: 2026-02-16
Added to Infrastructure: 2026-02-16