The problem
Your main OpenClaw instance goes down. Maybe it crashed, maybe the container restarted, maybe you pushed a bad config. You're on your phone. You don't want to SSH in, dig through logs, and run CLI commands from a tiny terminal.
What if you could just text something like "restart my main instance"?
The pattern
Run a second OpenClaw instance — the watcher — on the same machine. Connect it to the same Discord server (or Telegram, or whatever you use). Give it its own channel.
When your main instance dies, you open the watcher's channel and type something like:
"Main claw is down. Restart it."
The watcher has access to the same machine. It can run docker restart, systemctl restart, check logs, report back what happened — all through chat. You never touch a terminal.
That's it. That's the core pattern. Everything else is optional.
Setting it up
OpenClaw's --profile flag isolates everything — config, sessions, workspace — under ~/.openclaw-<name>. The watcher gets its own profile so it's completely independent from your main instance.
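Each profile keeps its state in its own directory, so you can sanity-check the separation directly. The default profile's directory name below is an assumption; adjust it to wherever your main instance keeps its state.

# Main instance state (assuming the default profile lives at ~/.openclaw)
ls ~/.openclaw

# Watcher state under its own profile directory, fully separate
ls ~/.openclaw-watcher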
# Give the watcher its own port so it doesn't conflict with the main instance
openclaw --profile watcher config set gateway.port 18791 --strict-json

# Use a cheap model — the watcher doesn't need to be smart,
# it needs to run shell commands and report back
openclaw --profile watcher config set ai.model "openrouter/meta-llama/llama-3.3-70b-instruct:free"
The main instance runs on the default port (18789). The watcher runs on 18791. They're completely separate processes with separate state.
Create an AGENTS.md in the watcher's workspace so it knows what it's for:
# File: ~/.openclaw-watcher/workspace/AGENTS.md

# Watcher

You are the backup instance. Your job is to monitor and recover the main OpenClaw instance when asked.

The main instance runs as: docker container "openclaw-main"
(or: systemd service "openclaw" — adjust to match your setup)

## What you can do

- Restart the main instance: docker restart openclaw-main
- Check if it's running: docker ps | grep openclaw-main
- Check its health: curl -s http://localhost:18789/health
- Read its logs: docker logs --tail 50 openclaw-main
- Check system resources: free -m, df -h, uptime

## How to behave

- When asked to restart, do it immediately and confirm the result.
- After a restart, check /health to verify it came back.
- If it won't come back after 2 tries, say so clearly.
- Always report what you find — don't just say "done."
Run the OpenClaw setup for the watcher profile and connect it to your chat platform. On Discord, you can use the same bot — just route it to a different channel (like #watcher).
# Run the interactive setup for the watcher profile
openclaw --profile watcher configure
Point it at a dedicated channel. Now you have a chat interface to a backup instance that can poke around on the machine when your main one is offline.
# Test it
openclaw --profile watcher gateway start

# Then make it permanent — pick one:
# Docker:  add a second service in docker-compose.yml
# Systemd: create a service file (example below)
# PM2:     pm2 start "openclaw --profile watcher gateway start" --name watcher
Systemd example:
# /etc/systemd/system/openclaw-watcher.service
[Unit]
Description=OpenClaw Watcher Instance
After=network.target

[Service]
User=your-user
WorkingDirectory=/home/your-user
Environment="HOME=/home/your-user"
ExecStart=/usr/local/bin/openclaw --profile watcher gateway run
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
sudo systemctl enable openclaw-watcher
sudo systemctl start openclaw-watcher
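To confirm the setup, check the service and hit each gateway's /health endpoint; both should return a 200 on their respective ports:

# Watcher service status
sudo systemctl status openclaw-watcher --no-pager

# Main on 18789, watcher on 18791
curl -s -o /dev/null -w "main:    %{http_code}\n" http://localhost:18789/health
curl -s -o /dev/null -w "watcher: %{http_code}\n" http://localhost:18791/health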
That's the full setup. You now have a backup agent you can talk to from your phone when things go wrong.
Optional: automate the health checks
The manual version is already useful — but you can also make the watcher check on the main instance automatically using a cron job.
OpenClaw's gateway exposes a /health endpoint that returns a 200 when things are working. The watcher can hit that endpoint on a schedule and restart the main instance if it's not responding.
openclaw --profile watcher cron add \
--name "Health check" \
--cron "*/5 * * * *" \
--session isolated \
--message "Health check: \
1. Hit http://localhost:18789/health — expect a 200 within 10 seconds. \
2. If it fails or times out, restart with: docker restart openclaw-main \
3. Wait 30 seconds, then check /health again. \
4. If it still fails after 2 attempts, send a message saying manual intervention is needed."
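For reference, the recovery logic that prompt describes boils down to a shell sequence like this (a sketch, assuming the docker container and port from the AGENTS.md above):

#!/usr/bin/env bash
# Sketch of the cron prompt's recovery logic. Container name and port
# follow the AGENTS.md above; adjust to match your setup.
check() {
  curl -sf --max-time 10 http://localhost:18789/health > /dev/null
}

check && exit 0   # healthy, nothing to do

for attempt in 1 2; do
  docker restart openclaw-main
  sleep 30
  if check; then
    echo "Main instance recovered on attempt $attempt"
    exit 0
  fi
done

echo "Main instance still down after 2 restarts; manual intervention needed"
exit 1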
Now the watcher handles restarts on its own. You only hear about it when something is actually broken. You can also add resource monitoring:
openclaw --profile watcher cron add \
--name "Resource check" \
--cron "*/10 * * * *" \
--session isolated \
--message "Check system resources: \
1. Run 'free -m' — alert if available memory is under 200MB. \
2. Run 'uptime' — alert if 1-min load average is over 4.0. \
3. Only report if something is wrong."
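As plain shell, those thresholds look roughly like this (a sketch; the memory check reads the "available" column of free -m):

# Memory: alert if available memory is under 200MB
avail_mb=$(free -m | awk '/^Mem:/ {print $7}')
if [ "$avail_mb" -lt 200 ]; then
  echo "Low memory: ${avail_mb}MB available"
fi

# Load: alert if the 1-minute average from /proc/loadavg is over 4.0
load1=$(awk '{print $1}' /proc/loadavg)
if awk -v l="$load1" 'BEGIN { exit !(l > 4.0) }'; then
  echo "High load: $load1"
fi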
Bonus: use a free model
The watcher doesn't need a smart model. It needs to run docker restart, hit a health endpoint, and report back. That's well within the capability of any free-tier model on OpenRouter.
The config already shows this — llama-3.3-70b-instruct:free is a solid choice. It's fast, it follows instructions reliably, and it costs nothing. Save your paid model budget for the main instance that handles real work.
If you want something more instruction-tuned, Hermes models (by Nous Research, available on OpenRouter) are worth a look — they tend to follow system prompts more precisely, which matters when your watcher needs to execute specific commands without improvising.
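Switching is the same one-liner as before. The exact OpenRouter slug below is an assumption, so verify it against the current model list first:

# Hypothetical example: point the watcher at a Hermes model
# (check the slug on openrouter.ai before using it)
openclaw --profile watcher config set ai.model "openrouter/nousresearch/hermes-3-llama-3.1-70b"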
You can also delegate complex debugging to a smarter model via OpenClaw's ACP harness. Set up Claude Code or Codex as an allowed ACP agent on the watcher profile, and when something genuinely tricky happens the watcher can spawn a coding agent session to diagnose it:
# Allow coding agents on the watcher profile
openclaw --profile watcher config set acp.enabled true
openclaw --profile watcher config set acp.allowedAgents '["codex","claude"]' --strict-json
That way simple health checks run on the free model, and anything that needs real reasoning gets handed off. Best of both worlds — minimal cost for the 99% case, full capability available when you need it.
Why this works
The watcher survives because it's a separate process. When the main instance crashes, the watcher is unaffected — different profile, different port, different process. As long as the machine itself is up, you have a chat interface to it.
It also costs almost nothing. A watcher on a free model that only responds when you message it (or runs a cron every few minutes) uses negligible resources. The ROI is the first time you fix a crash from your phone instead of opening a laptop.