# High Availability
Chatto stores all persistent data in NATS JetStream. For high availability, run a multi-node NATS cluster with JetStream replication so that data survives individual node failures.
## Architecture

A highly available Chatto deployment consists of:
- A NATS cluster (3 or more nodes) running JetStream with Raft consensus
- One or more Chatto instances connecting to the cluster as clients
- A load balancer distributing traffic across Chatto instances
NATS handles leader election and data replication automatically. Chatto instances are stateless and don’t need to know about the cluster topology — they just connect to a NATS URL.
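Any HTTP load balancer works in front of the Chatto instances. As an illustrative sketch only — the instance hostnames and the `:8080` listen port are assumptions, not documented Chatto defaults — an nginx front end might look like:

```nginx
# Hypothetical nginx front end for three Chatto instances.
# Hostnames and port 8080 are assumptions — substitute your own.
upstream chatto {
    server chatto-1:8080;
    server chatto-2:8080;
    server chatto-3:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://chatto;
        # WebSocket upgrade headers, needed for real-time connections
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

Because the instances are stateless, no session affinity is required; any instance can serve any request.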
## Setting Up a NATS Cluster

Chatto's embedded NATS server is designed for single-node convenience. For high availability, run a dedicated NATS cluster.
Each NATS node needs a configuration like this:
```
server_name: nats-1
listen: 0.0.0.0:4222

jetstream {
  store_dir: /data/jetstream
  max_mem: 1G
  max_file: 50G
}

cluster {
  name: chatto
  listen: 0.0.0.0:6222
  routes: [
    nats-route://nats-1:6222
    nats-route://nats-2:6222
    nats-route://nats-3:6222
  ]
}

authorization {
  token: "your-shared-token"
}
```

### Docker Compose Example
```yaml
services:
  nats-1:
    image: nats:latest
    command: ["--config", "/etc/nats/nats.conf"]
    volumes:
      - ./nats-1.conf:/etc/nats/nats.conf:ro
      - nats1_data:/data
  nats-2:
    image: nats:latest
    command: ["--config", "/etc/nats/nats.conf"]
    volumes:
      - ./nats-2.conf:/etc/nats/nats.conf:ro
      - nats2_data:/data
  nats-3:
    image: nats:latest
    command: ["--config", "/etc/nats/nats.conf"]
    volumes:
      - ./nats-3.conf:/etc/nats/nats.conf:ro
      - nats3_data:/data

volumes:
  nats1_data:
  nats2_data:
  nats3_data:
```

Each node gets its own config file with a unique `server_name` but the same `routes` list.
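Since each node's file differs only in its `server_name` (and the data volume mounted at `/data`), `nats-2.conf` is just the first node's config with the name changed:

```
server_name: nats-2
listen: 0.0.0.0:4222

jetstream {
  store_dir: /data/jetstream
  max_mem: 1G
  max_file: 50G
}

cluster {
  name: chatto
  listen: 0.0.0.0:6222
  routes: [
    nats-route://nats-1:6222
    nats-route://nats-2:6222
    nats-route://nats-3:6222
  ]
}

authorization {
  token: "your-shared-token"
}
```

Note that a node's own address may appear in its `routes` list; NATS ignores the self-route, so all three files can share it verbatim.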
## JetStream Replication

Once your NATS cluster is running, configure Chatto to replicate data across nodes:

```
CHATTO_NATS_REPLICAS=3
```

Or in the config file:

```toml
[nats]
replicas = 3
```

This controls how many copies of each stream, KV bucket, and object store NATS maintains. It must be an odd number so a quorum can form:
| Replicas | Nodes Required | Tolerates | Use Case |
|---|---|---|---|
| 1 | 1 | No failures | Development, single-node |
| 3 | 3+ | 1 node failure | Production |
| 5 | 5+ | 2 node failures | Critical deployments |
All Chatto storage — KV buckets, event streams, and object stores — uses the same replication factor. There are no per-bucket overrides.
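The "Tolerates" column follows directly from Raft majority quorum: with R replicas, a write needs a majority (⌊R/2⌋ + 1) to be acknowledged, so the cluster survives ⌊(R−1)/2⌋ replica failures. A quick Python sketch of that arithmetic (not part of Chatto, just illustrating the table):

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas` needed to accept a write."""
    return replicas // 2 + 1

def tolerated_failures(replicas: int) -> int:
    """Replica failures survivable while a majority remains."""
    return (replicas - 1) // 2

for r in (1, 3, 5):
    print(f"R={r}: quorum={quorum(r)}, tolerates={tolerated_failures(r)} failure(s)")
```

This is also why even replica counts are disallowed: 4 replicas tolerate no more failures than 3, while costing an extra node.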
## What Gets Replicated

### Persistent Storage (file-backed)

All critical data is durably stored and replicated:
- Instance data — users, spaces, memberships, configuration, roles, permissions
- Per-space data — rooms, message bodies, reactions, threads, read status
- Assets — avatars, attachments (unless offloaded to S3)
- Auth tokens — bearer tokens for cross-origin authentication
- Notifications — user notifications (90-day TTL)
### Ephemeral Storage (memory-backed)

Some data is kept in memory for speed and has short TTLs:
- Presence — online/offline status (60-second TTL, auto-expires)
- Call state — active call participants (repopulated by LiveKit webhooks)
These are still replicated across the cluster for consistency, but they’re reconstructed automatically if lost.
## Connecting Chatto to the Cluster

Disable embedded NATS and point Chatto at your cluster:

```
CHATTO_NATS_EMBEDDED_ENABLED=false
CHATTO_NATS_CLIENT_URL=nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222
CHATTO_NATS_CLIENT_AUTH_METHOD=token
CHATTO_NATS_CLIENT_TOKEN=your-shared-token
CHATTO_NATS_REPLICAS=3
```

Listing multiple URLs gives the client failover — if one node is down, it connects to another.
## Failure Scenarios

| Scenario | Impact | Recovery |
|---|---|---|
| 1 NATS node down (R3) | No data loss, brief leader election | Automatic |
| Chatto instance down | Load balancer routes to healthy instances | Automatic |
| Minority of NATS nodes down | Reads and writes continue normally | Automatic |
| Majority of NATS nodes down | Writes rejected (no quorum), reads may work | Restore nodes |
| All NATS nodes down | Complete outage | Restore cluster or restore from backup |