# High Availability
Chatto stores all persistent data in NATS JetStream. For high availability, run a multi-node NATS cluster with JetStream replication so that data survives individual node failures.
## Architecture

A highly available Chatto deployment consists of:
- A NATS cluster (3 or more nodes) running JetStream with Raft consensus
- One or more Chatto instances connecting to the cluster as clients
- A load balancer distributing traffic across Chatto instances
NATS handles leader election and data replication automatically. Chatto instances are stateless and don’t need to know about the cluster topology — they just connect to a NATS URL.
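Any HTTP load balancer works in front of the Chatto instances. As an illustrative sketch only — the instance hostnames and the `:8080` listen port are assumptions, not documented Chatto defaults — an nginx front end might look like:

```nginx
# Hypothetical nginx front end for three Chatto instances.
# Hostnames and port 8080 are assumptions — substitute your own.
upstream chatto {
    server chatto-1:8080;
    server chatto-2:8080;
    server chatto-3:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://chatto;
        # WebSocket upgrade headers, needed for real-time connections
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

Because the instances are stateless, no session affinity is required; any instance can serve any request.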
## Setting Up a NATS Cluster

Chatto's embedded NATS server is designed for single-node convenience. For high availability, run a dedicated NATS cluster.
Each NATS node needs a configuration like this:
```
server_name: nats-1
listen: 0.0.0.0:4222

jetstream {
  store_dir: /data/jetstream
  max_mem: 1G
  max_file: 50G
}

cluster {
  name: chatto
  listen: 0.0.0.0:6222
  routes: [
    nats-route://nats-1:6222
    nats-route://nats-2:6222
    nats-route://nats-3:6222
  ]
}

authorization {
  token: "your-shared-token"
}
```

### Docker Compose Example
```yaml
services:
  nats-1:
    image: nats:latest
    command: ["--config", "/etc/nats/nats.conf"]
    volumes:
      - ./nats-1.conf:/etc/nats/nats.conf:ro
      - nats1_data:/data
  nats-2:
    image: nats:latest
    command: ["--config", "/etc/nats/nats.conf"]
    volumes:
      - ./nats-2.conf:/etc/nats/nats.conf:ro
      - nats2_data:/data
  nats-3:
    image: nats:latest
    command: ["--config", "/etc/nats/nats.conf"]
    volumes:
      - ./nats-3.conf:/etc/nats/nats.conf:ro
      - nats3_data:/data

volumes:
  nats1_data:
  nats2_data:
  nats3_data:
```

Each node gets its own config file with a unique `server_name` but the same `routes` list.
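Since each node's file differs only in its `server_name` (and the data volume mounted at `/data`), `nats-2.conf` is just the first node's config with the name changed:

```
server_name: nats-2
listen: 0.0.0.0:4222

jetstream {
  store_dir: /data/jetstream
  max_mem: 1G
  max_file: 50G
}

cluster {
  name: chatto
  listen: 0.0.0.0:6222
  routes: [
    nats-route://nats-1:6222
    nats-route://nats-2:6222
    nats-route://nats-3:6222
  ]
}

authorization {
  token: "your-shared-token"
}
```

Note that a node's own address may appear in its `routes` list; NATS ignores the self-route, so all three files can share it verbatim.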
## JetStream Replication

Once your NATS cluster is running, configure Chatto to replicate data across nodes:

```
CHATTO_NATS_REPLICAS=3
```

Or in the config file:

```toml
[nats]
replicas = 3
```

This controls how many copies of each stream, KV bucket, and object store NATS maintains. It must be an odd number so a quorum can form:
| Replicas | Nodes Required | Tolerates | Use Case |
|---|---|---|---|
| 1 | 1 | No failures | Development, single-node |
| 3 | 3+ | 1 node failure | Production |
| 5 | 5+ | 2 node failures | Critical deployments |
All Chatto storage — KV buckets, event streams, and object stores — uses the same replication factor. There are no per-bucket overrides.
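The "Tolerates" column follows directly from Raft majority quorum: with R replicas, a write needs a majority (⌊R/2⌋ + 1) to be acknowledged, so the cluster survives ⌊(R−1)/2⌋ replica failures. A quick Python sketch of that arithmetic (not part of Chatto, just illustrating the table):

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas` needed to accept a write."""
    return replicas // 2 + 1

def tolerated_failures(replicas: int) -> int:
    """Replica failures survivable while a majority remains."""
    return (replicas - 1) // 2

for r in (1, 3, 5):
    print(f"R={r}: quorum={quorum(r)}, tolerates={tolerated_failures(r)} failure(s)")
```

This is also why even replica counts are disallowed: 4 replicas tolerate no more failures than 3, while costing an extra node.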
## What Gets Replicated

### Persistent Storage (file-backed)

All critical data is durably stored and replicated:
- Instance data — users, spaces, memberships, configuration, roles, permissions
- Per-space data — rooms, message bodies, reactions, threads, read status
- Assets — avatars, attachments (unless offloaded to S3)
- Auth tokens — bearer tokens for cross-origin authentication
- Notifications — user notifications (90-day TTL)
### Ephemeral Storage (memory-backed)

Some data is kept in memory for speed and has short TTLs:
- Presence — online/offline status (60-second TTL, auto-expires)
- Call state — active call participants (repopulated by LiveKit webhooks)
These are still replicated across the cluster for consistency, but they’re reconstructed automatically if lost.
## Connecting Chatto to the Cluster

Disable embedded NATS and point Chatto at your cluster:

```
CHATTO_NATS_EMBEDDED_ENABLED=false
CHATTO_NATS_CLIENT_URL=nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222
CHATTO_NATS_CLIENT_AUTH_METHOD=token
CHATTO_NATS_CLIENT_TOKEN=your-shared-token
CHATTO_NATS_REPLICAS=3
```

Listing multiple URLs gives the client failover — if one node is down, it connects to another.
## Failure Scenarios

| Scenario | Impact | Recovery |
|---|---|---|
| 1 NATS node down (R3) | No data loss, brief leader election | Automatic |
| Chatto instance down | Load balancer routes to healthy instances | Automatic |
| Minority of NATS nodes down | Reads and writes continue normally | Automatic |
| Majority of NATS nodes down | Writes rejected (no quorum), reads may work | Restore nodes |
| All NATS nodes down | Complete outage | Restore cluster or restore from backup |