Connection + tunnel

When the server itself can't reach agentry, or the tunnel keeps dropping.

Server shows "disconnected" in the dashboard

The server isn't holding an active tunnel to bridge.agentry.run.

First check: is the runtime container running?

bash

ssh your-server
docker ps | grep agentry-runtime

You should see one container with Up <duration>. If not, the container exited or never started.

Container exited

bash

docker logs agentry-runtime

Look for the last few lines. Most common causes:

"context deadline exceeded" — couldn't reach bridge.agentry.run. Network issue. See Outbound network below.
"x509: certificate signed by unknown authority" — the server's CA bundle is out of date. Update with apt update && apt install ca-certificates && update-ca-certificates.
"unauthorized: device cert revoked" — this server's identity was revoked in the dashboard. Re-enroll: in the dashboard, Add this machine → copy the new command → paste it on the server.

Container never started

The runtime container failed to start. Re-run the Add this machine command from the dashboard — it's idempotent and will recreate the container with a fresh identity.

Tunnel drops every few seconds

The dashboard flickers between connected / disconnected. Logs show repeated reconnects.

Common causes:

NAT / firewall idle timeout too short. Some corporate firewalls close idle connections after 5-10 minutes. The runtime sends keepalives every 30 seconds — that should be enough for any reasonable timeout, but aggressive policies trip it.
Server is at network capacity. A noisy neighbor on the same VM host. Check dmesg for "connection reset by peer" or "broken pipe".
DNS flapping. The runtime resolves bridge.agentry.run once at startup. If your server's DNS is unreliable (e.g. switching between two upstream resolvers), restart the runtime to force a re-resolve.

Fix: restart the runtime:

bash

docker restart agentry-runtime

If it keeps flapping, escalate — collect logs and send to support@agentry.run.

Outbound network

The runtime needs outbound HTTPS (TCP 443) to:

bridge.agentry.run — the persistent tunnel
app.agentry.run — control-plane RPC
ghcr.io — pulling the runtime image
Wherever your bound services live (MongoDB Atlas, Anthropic API, etc.)

Test from the server:

bash

curl -sSI https://bridge.agentry.run/healthz | head -1
# Expect: HTTP/2 200

If that fails:

DNS issue: nslookup bridge.agentry.run. Should resolve.
Firewall blocking 443: check iptables, your cloud provider's security group, corporate proxy rules.
TLS interception: a corporate MITM proxy is rewriting certs. agentry pins certs at the CA level — if your network rewrites the chain, you'll see "unknown authority" errors. Talk to your network admin.

"cluster `<name>` is offline"

Your harness or the dashboard says the cluster (server) is offline, but you think it's up.

cluster "homelab" is offline

Cause: there's no live tunnel from that server to the bridge. Either:

The server's runtime container isn't running.
The runtime is running but disconnected (the bridge dropped it).
You're addressing the wrong server name (typo).

Fix: SSH in and check docker ps. If the container is up but the dashboard says disconnected, restart it:

bash

docker restart agentry-runtime

Watch the logs as it reconnects:

bash

docker logs -f agentry-runtime

You should see "tunnel established" within ~5 seconds.

Deploy URL returns 502

*.agentry.live returns a 502 with a message like "cluster X is offline" or "deployment route is inconsistent with cluster ownership".

"cluster offline": the server hosting this deployment isn't connected. Same fix as above.

"route inconsistent": the bridge refused the route because the org IDs don't match (a server cert says one org; the route says another). This shouldn't happen in normal use — it implies either a server cert was issued for the wrong org, or the route table has stale data. Email support@agentry.run with the deploy URL.

Tunnel routes synced to bridge: stale

The deployment detail page shows the tunnel-health indicator in amber: "Tunnel sync stale (N s)".

What it means: the control plane has been unable to push route updates to the bridge for the last N seconds. Existing routes keep serving; new routes don't propagate.

Cause: usually a brief incident on the bridge or the control plane. Should clear within 30-60 seconds.

If it persists, your existing deploys keep serving the last-pushed route table. New deploys may 404 until sync resumes. Check our status page; if nothing is reported, email support.

When in doubt, restart

The single most reliable recovery for any tunnel issue:

bash

docker restart agentry-runtime

agentry's runtime is designed to be safely restartable — sandboxes pause, the deploy containers keep running, and the connection re-establishes within 5 seconds.

Sandbox issues — when the sandbox is the problem, not the tunnel.
MCP wiring — when the harness can't see agentry's tools.
Add a server — initial connect flow, in case you need a clean slate.

Connection + tunnel ​

Server shows "disconnected" in the dashboard ​

Container exited ​

Container never started ​

Tunnel drops every few seconds ​

Outbound network ​

"cluster <name> is offline" ​

Deploy URL returns 502 ​

Tunnel routes synced to bridge: stale ​

When in doubt, restart ​

Next ​

Connection + tunnel

Server shows "disconnected" in the dashboard

Container exited

Container never started

Tunnel drops every few seconds

Outbound network

"cluster `<name>` is offline"

Deploy URL returns 502

Tunnel routes synced to bridge: stale

When in doubt, restart

Next