High Availability
Chatto stores all persistent data in NATS JetStream. For high availability, run a multi-node NATS cluster with JetStream replication so that data survives individual node failures.
Architecture
A highly available Chatto deployment consists of:
- A NATS cluster (3 or more nodes) running JetStream with Raft consensus
- One or more Chatto server processes connecting to the cluster as clients
- A load balancer distributing traffic across Chatto server processes
NATS handles leader election and data replication automatically. Chatto server processes are stateless and don’t need to know about the cluster topology — they just connect to a NATS URL.
Setting Up a NATS Cluster
Chatto’s embedded NATS server is designed for single-node convenience. For high availability, run a dedicated NATS cluster.
Each NATS node needs a configuration like this:

    server_name: nats-1
    listen: 0.0.0.0:4222

    jetstream {
      store_dir: /data/jetstream
      max_mem: 1G
      max_file: 50G
    }

    cluster {
      name: chatto
      listen: 0.0.0.0:6222
      routes: [
        nats-route://nats-1:6222
        nats-route://nats-2:6222
        nats-route://nats-3:6222
      ]
    }

    authorization {
      token: "your-shared-token"
    }

Docker Compose Example

    services:
      nats-1:
        image: nats:latest
        command: ["--config", "/etc/nats/nats.conf"]
        volumes:
          - ./nats-1.conf:/etc/nats/nats.conf:ro
          - nats1_data:/data
      nats-2:
        image: nats:latest
        command: ["--config", "/etc/nats/nats.conf"]
        volumes:
          - ./nats-2.conf:/etc/nats/nats.conf:ro
          - nats2_data:/data
      nats-3:
        image: nats:latest
        command: ["--config", "/etc/nats/nats.conf"]
        volumes:
          - ./nats-3.conf:/etc/nats/nats.conf:ro
          - nats3_data:/data

    # Named volumes referenced by the services must be declared at the top level.
    volumes:
      nats1_data:
      nats2_data:
      nats3_data:

Each node gets its own config file with a unique server_name but the same routes list.
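Because the per-node files differ only in server_name, they can be generated from one template. The following is a minimal sketch, not part of Chatto or NATS: the `render_config` and `write_configs` helpers are hypothetical, and the output filenames are assumed to match the `./nats-N.conf` mounts in the Compose example.

```python
from pathlib import Path

# Node names double as hostnames in the routes list.
NODES = ["nats-1", "nats-2", "nats-3"]

# Literal braces are doubled so str.format leaves them in place.
TEMPLATE = """\
server_name: {name}
listen: 0.0.0.0:4222

jetstream {{
  store_dir: /data/jetstream
  max_mem: 1G
  max_file: 50G
}}

cluster {{
  name: chatto
  listen: 0.0.0.0:6222
  routes: [
{routes}
  ]
}}

authorization {{
  token: "{token}"
}}
"""


def render_config(name: str, nodes: list[str], token: str) -> str:
    """Render one node's config: unique server_name, shared routes list."""
    routes = "\n".join(f"    nats-route://{node}:6222" for node in nodes)
    return TEMPLATE.format(name=name, routes=routes, token=token)


def write_configs(out_dir: Path, token: str) -> list[Path]:
    """Write one <node>.conf file per node into out_dir and return the paths."""
    paths = []
    for name in NODES:
        path = out_dir / f"{name}.conf"
        path.write_text(render_config(name, NODES, token))
        paths.append(path)
    return paths
```

Calling `write_configs(Path("."), token="your-shared-token")` would place nats-1.conf, nats-2.conf, and nats-3.conf in the current directory.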
JetStream Replication
Once your NATS cluster is running, configure Chatto to replicate data across nodes:

    CHATTO_NATS_REPLICAS=3

The equivalent configuration file setting:

    [nats]
    replicas = 3

This controls how many copies of each stream, KV bucket, and object store NATS maintains. It must be an odd number so that a quorum can form:
| Replicas | Nodes Required | Tolerates | Use Case |
|---|---|---|---|
| 1 | 1 | No failures | Development, single-node |
| 3 | 3+ | 1 node failure | Production |
| 5 | 5+ | 2 node failures | Critical deployments |
All Chatto storage — KV buckets, event streams, and object stores — uses the same replication factor. There are no per-bucket overrides.
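The "Tolerates" column follows from majority-quorum arithmetic: a replica group stays writable while more than half of its members are reachable. A small worked sketch (the function names are illustrative and not part of Chatto or NATS):

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica group."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority still remains."""
    return replicas - quorum(replicas)


for r in (1, 3, 5):
    print(f"R{r}: quorum={quorum(r)}, tolerates={tolerated_failures(r)}")

# An even replica count adds a copy without adding failure tolerance.
print(tolerated_failures(4) == tolerated_failures(3))
```

This is why odd values are used: going from 3 to 4 replicas raises the quorum from 2 to 3 but still tolerates only one failure.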
What Gets Replicated
Persistent Storage (file-backed)
All critical data is durably stored and replicated:
- Server data — users, memberships, configuration, roles, permissions, rooms, message bodies, reactions, threads, read status, call start/join/leave/end facts
- Assets — avatars, attachments (unless offloaded to S3)
- Auth tokens — bearer tokens for cross-origin authentication
- Notifications — user notifications (90-day TTL)
Ephemeral Storage (memory-backed)
Section titled “Ephemeral Storage (memory-backed)”Some data is kept in memory for speed and has short TTLs:
- Presence — online/offline status (60-second TTL, auto-expires)
This state is replicated across the NATS cluster for consistency, but it is reconstructed automatically if lost.
Security-sensitive key storage
LiveKit E2EE call keys are stored behind Chatto’s KMS boundary in the ENCRYPTION_KEYS bucket. They are replicated with the rest of file-backed NATS state, excluded from normal backups, and shredded when calls end. If key storage is lost during an active call, clients must reconnect into a fresh call.
Connecting Chatto to the Cluster
Disable embedded NATS and point Chatto at your cluster:

    CHATTO_NATS_EMBEDDED_ENABLED=false
    CHATTO_NATS_CLIENT_URL=nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222
    CHATTO_NATS_CLIENT_AUTH_METHOD=token
    CHATTO_NATS_CLIENT_TOKEN=your-shared-token
    CHATTO_NATS_REPLICAS=3

Listing multiple URLs gives the client failover: if one node is down, it connects to another.
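To illustrate the failover idea, the sketch below parses a comma-separated URL list and picks the first node that accepts a TCP connection. It is only a conceptual model under stated assumptions: real NATS client libraries handle server selection and reconnection internally, this is not Chatto's code, and the helper names (`parse_servers`, `tcp_reachable`, `first_reachable`) are hypothetical. The default port 4222 is assumed when a URL omits one.

```python
import socket
from urllib.parse import urlsplit


def parse_servers(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated URL list into (host, port) pairs."""
    servers = []
    for raw in value.split(","):
        parts = urlsplit(raw.strip())
        servers.append((parts.hostname, parts.port or 4222))
    return servers


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(servers, probe=tcp_reachable):
    """Return the first server the probe accepts, or None if all are down."""
    for host, port in servers:
        if probe(host, port):
            return host, port
    return None
```

The `probe` parameter is injectable so the selection logic can be exercised without a running cluster, for example `first_reachable(servers, probe=lambda h, p: h != "nats-1")`.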
Failure Scenarios
| Scenario | Impact | Recovery |
|---|---|---|
| 1 NATS node down (R3) | No data loss, brief leader election | Automatic |
| Chatto server process down | Load balancer routes to healthy server processes | Automatic |
| Minority of NATS nodes down | Reads and writes continue normally | Automatic |
| Majority of NATS nodes down | Writes rejected (no quorum), reads may work | Restore nodes |
| All NATS nodes down | Complete outage | Restore cluster or restore from backup |
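The NATS rows of this table can be expressed as a small classification over how many nodes are still up. This is a sketch that assumes the replica count equals the number of nodes (for example R3 on a 3-node cluster); `cluster_state` is a hypothetical helper, not a Chatto or NATS API.

```python
def cluster_state(total: int, down: int) -> str:
    """Classify cluster health from the number of failed nodes."""
    if not 0 <= down <= total:
        raise ValueError("down must be between 0 and total")
    up = total - down
    if up == 0:
        return "complete outage"
    if up >= total // 2 + 1:  # a majority is still reachable
        return "reads and writes continue"
    return "writes rejected (no quorum)"


for down in range(4):
    print(f"{down} of 3 down: {cluster_state(3, down)}")
```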