
RustyGPT Documentation

Welcome! This mdBook describes the RustyGPT workspace in depth: how the server and clients are structured, how the PostgreSQL-backed features behave, and how to operate the platform locally or in shared environments.

Start with the guides to get a local environment running, then dive into the reference and architecture chapters for precise APIs and design notes.

Quick navigation

  • Quickstart – configure and launch the server, web client, and CLI.
  • Local development – watcher workflows, debugging tools, and environment variables.
  • REST API – endpoint catalogue for conversations, streaming, authentication, and admin features.
  • Service topology – how the Axum server, Yew SPA, PostgreSQL, and SSE stream hub fit together.

What RustyGPT ships today

RustyGPT focuses on a cohesive Rust stack:

  • rustygpt-server exposes REST + SSE endpoints with cookie-based auth (handlers/auth.rs), rate limiting (middleware/rate_limit.rs), and OpenAPI documentation (openapi.rs).
  • rustygpt-web is a Yew SPA that consumes the server API via src/api.rs and renders threaded conversations, presence, and typing indicators.
  • rustygpt-cli shares the same models as the server, providing commands for login, conversation inspection, SSE following, and OpenAPI generation (src/commands).
  • rustygpt-shared houses configuration loading, llama.cpp bindings, and the data transfer objects used across all crates.

Each documentation section links back to the relevant modules so you can cross-reference behaviour with the implementation.

Guide Overview

These walkthroughs assume you are starting from a fresh checkout. They show how to configure config.toml, run the Axum server, keep the Yew frontend hot-reloading, and validate that authentication + streaming work end to end.

  • Quickstart bootstraps the database, enables feature flags, and walks through the setup flow.
  • Local Development documents the just recipes, watcher processes, and debugging tips.

For conceptual background, jump to Concepts; for task-specific runbooks, see How-to.

Quickstart

TL;DR – copy config.example.toml, enable the feature flags you need, start PostgreSQL, run the Axum server, complete the /api/setup flow, then bring up the Yew frontend and CLI.

1. Prerequisites

  • Rust toolchain (rustup default stable), cargo, and just
  • trunk for the web client (cargo install trunk)
  • PostgreSQL 15+ running locally (the provided docker-compose.yaml exposes one at postgres://tinroof:rusty@localhost:5432/rusty_gpt)
  • Optional: llama.cpp-compatible model files if you plan to exercise assistant streaming

Fetch dependencies once:

cargo fetch --workspace

2. Configure the server

Create config.toml and adjust it for your environment:

cp config.example.toml config.toml

At minimum set:

[db]
url = "postgres://tinroof:rusty@localhost/rustygpt_dev"

[features]
auth_v1 = true
sse_v1 = true
well_known = true

rustygpt-shared::config::server::Config supports TOML/YAML/JSON files and environment overrides (e.g. RUSTYGPT__SERVER__PORT=8080). See Configuration for the complete matrix of keys.

3. Start PostgreSQL

If you are using Docker Compose:

docker compose up postgres -d

On server startup the bootstrap runner executes every SQL script under scripts/pg/{schema,procedures,indexes,seed} in order. The seed stage enables feature flags and inserts the default rate-limit profile (conversation.post).

4. Run the backend

Either launch directly:

cargo run -p rustygpt-server -- serve --port 8080

or use the helper recipe that also builds configuration if needed:

just run-server

The process listens on http://127.0.0.1:8080. Health probes are available at /api/healthz and /api/readyz.

5. Complete initial setup

The first authenticated user is created by POSTing to /api/setup:

curl -X POST http://127.0.0.1:8080/api/setup \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","email":"admin@example.com","password":"change-me"}'

Subsequent calls return 400 once a user already exists.

6. Start the web client

In a new terminal:

cd rustygpt-web
trunk serve

The SPA proxies /api/* requests to the backend. After logging in you should see:

  • Conversation list populated by GET /api/conversations/{conversation_id}/threads
  • Thread view that streams updates from /api/stream/conversations/{conversation_id}
  • Presence and typing indicators driven by ConversationStreamEvent payloads

7. Exercise the CLI

The rustygpt binary shares configuration and cookie handling with the server:

cargo run -p rustygpt-cli -- login
cargo run -p rustygpt-cli -- chat --conversation <conversation-uuid>
cargo run -p rustygpt-cli -- follow --root <thread-uuid>

follow connects to the SSE endpoint, reconstructs events, and prints deltas as they arrive. If you see authentication required errors, confirm you completed the setup step and that [features].auth_v1 is true.

8. Next steps

  • Review Local Development for watcher workflows, linting, and debugging tips.
  • Explore REST API for a full endpoint catalogue and payload shapes.
  • Consult Service Topology to understand how the components interact at runtime.

Local Development

TL;DR – keep config.toml in sync with your environment, use just dev for paired watchers, and rely on the CLI for quick smoke tests of authentication and streaming.

Environment configuration

All binaries load configuration through rustygpt-shared::config::server::Config. The loader merges:

  1. Built-in defaults selected by the active profile (Dev/Test/Prod)
  2. Optional config.toml / config.yaml / config.json
  3. Environment variables such as RUSTYGPT__SERVER__PORT=9000
  4. CLI overrides (e.g. cargo run -p rustygpt-server -- serve --port 9000)

Keep secrets out of the repo—override them with environment variables or a private config.toml. See Configuration for the full key list.
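The layering above can be sketched as a simple "later layers win" merge. The types below are illustrative only (they are not the actual rustygpt-shared structs); each optional layer overrides only the fields it sets:

```rust
// Hypothetical, simplified config types for illustration.
#[derive(Debug, PartialEq)]
struct Config {
    port: u16,
    log_level: String,
}

struct Overlay {
    port: Option<u16>,
    log_level: Option<String>,
}

// Apply overlays in order: defaults, then file, then environment, then CLI.
fn resolve(defaults: Config, layers: &[Overlay]) -> Config {
    layers.iter().fold(defaults, |mut acc, layer| {
        if let Some(p) = layer.port {
            acc.port = p;
        }
        if let Some(l) = &layer.log_level {
            acc.log_level = l.clone();
        }
        acc
    })
}

fn main() {
    let defaults = Config { port: 8080, log_level: "info".into() };
    let file = Overlay { port: None, log_level: Some("debug".into()) };
    // e.g. RUSTYGPT__SERVER__PORT=9000 in the environment
    let env = Overlay { port: Some(9000), log_level: None };
    let cfg = resolve(defaults, &[file, env]);
    println!("{cfg:?}"); // port comes from the env layer, log_level from the file
}
```

The key property is that an unset field in a later layer leaves the earlier value untouched, which is why a minimal config.toml plus a couple of environment variables is usually enough.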

Watcher workflows

The Justfile orchestrates the common flows:

# Run server + web watchers together (uses rustygpt-tools/confuse)
just dev

# Backend only hot-reload
just watch-server

# Run fmt, check, and clippy
just check

just dev spawns two subprocesses:

  • rustygpt-server via cargo watch -x 'run -- serve --port 8080'
  • rustygpt-web via trunk watch

Logs stream to stdout so you can confirm when migrations finish (db_bootstrap_* metrics) and when the SSE hub accepts connections.

CLI smoke tests

The CLI binary lives at rustygpt-cli. Useful commands while iterating:

# Launch the server directly from the CLI crate
cargo run -p rustygpt-cli -- serve --port 8080

# Generate the OpenAPI spec
cargo run -p rustygpt-cli -- spec openapi.yaml

# Generate config skeletons
cargo run -p rustygpt-cli -- config --format toml

# Manage sessions
cargo run -p rustygpt-cli -- login
cargo run -p rustygpt-cli -- me
cargo run -p rustygpt-cli -- logout

CLI commands reuse the same cookie jar as the web client. Cookies are stored under ~/.config/rustygpt/session.cookies by default (see [cli] in the configuration schema).

Debugging tips

  • Enable verbose tracing: RUST_LOG=rustygpt_server=debug,tower_http=info just run-server
  • Inspect SSE payloads: curl -N http://127.0.0.1:8080/api/stream/conversations/<conversation-id> (requires an authenticated session and features.sse_v1 = true)
  • Verify configuration resolution: cargo run -p rustygpt-cli -- config --format json and inspect the generated file
  • Regenerate database bindings or seed data by restarting the server; bootstrap scripts rerun automatically when the process starts
  • Use docker compose logs postgres if migrations fail during bootstrap

For operational playbooks (e.g. Docker deployment or rotating secrets) see the How-to section.

Concepts Overview

This section explains the core ideas that appear across the server, web client, and CLI. Use it to understand the vocabulary used in API responses and stream payloads before diving into the reference material.

  • Threaded conversations describes how messages, threads, and ConversationStreamEvent values relate to each other.
  • Shared models covers the rustygpt-shared crate, focusing on how typed DTOs and enums keep clients in sync with the server.

Pair these concepts with the Architecture diagrams to see where each part lives at runtime.

Threaded conversations

RustyGPT models chat history as threaded conversations stored in PostgreSQL. Each conversation has participants, invites, thread roots, and replies. The server exposes this structure through the DTOs in rustygpt-shared/src/models/chat.rs.

Core data types

| Type | Purpose | Defined in |
|---|---|---|
| ConversationCreateRequest | Payload for creating a new conversation. | shared::models::chat |
| ThreadTreeResponse | Depth-first snapshot of a thread including metadata for each node. | shared::models::chat |
| MessageChunk | Persisted assistant output chunk (used when streaming replies). | shared::models::chat |
| ConversationStreamEvent | Enum describing SSE events (thread.new, message.delta, etc.). | shared::models::chat |

Each thread is anchored by a root message (POST /api/threads/{conversation_id}/root). Replies hang off the tree using parent IDs (POST /api/messages/{message_id}/reply). The ThreadTreeResponse payload includes ancestry hints so clients can render the structure without additional queries.
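A client consuming such a payload essentially rebuilds a tree from (id, parent) pairs and walks it depth-first. The sketch below is illustrative only, using bare IDs where the real ThreadTreeResponse carries full message metadata:

```rust
use std::collections::HashMap;

// Rebuild a reply tree from (id, parent_id) pairs and emit a depth-first
// listing with nesting depth, the order a client would render it in.
fn depth_first(messages: &[(u32, Option<u32>)]) -> Vec<(u32, usize)> {
    let mut children: HashMap<Option<u32>, Vec<u32>> = HashMap::new();
    for &(id, parent) in messages {
        children.entry(parent).or_default().push(id);
    }
    let mut out = Vec::new();
    // Roots are the messages with no parent; push reversed so the first
    // root is popped (and rendered) first.
    let mut stack: Vec<(u32, usize)> = children
        .get(&None)
        .into_iter()
        .flatten()
        .rev()
        .map(|&id| (id, 0))
        .collect();
    while let Some((id, depth)) = stack.pop() {
        out.push((id, depth));
        if let Some(kids) = children.get(&Some(id)) {
            for &kid in kids.iter().rev() {
                stack.push((kid, depth + 1));
            }
        }
    }
    out
}

fn main() {
    // Root 1 has replies 2 and 4; message 3 replies to 2.
    let msgs = [(1, None), (2, Some(1)), (3, Some(2)), (4, Some(1))];
    for (id, depth) in depth_first(&msgs) {
        println!("{}{}", "  ".repeat(depth), id);
    }
}
```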

Streaming lifecycle

When features.sse_v1 = true, the server emits ConversationStreamEvent variants via StreamHub (handlers/streaming.rs). The naming mirrors the enum variants:

  • thread.new – new thread summary created
  • thread.activity – updated last_activity_at
  • message.delta – incremental assistant tokens (ChatDeltaChunk)
  • message.done – completion marker with usage stats
  • presence.update – user presence heartbeat
  • typing.update – typing indicator state
  • unread.update – unread count per thread root
  • membership.changed – conversation membership change
  • error – terminal failure while streaming

Events carry both the conversation_id and (when applicable) root_id so clients can scope updates precisely. SSE persistence is optional: enable [sse.persistence] in configuration to record events in rustygpt.sse_event_log via services::sse_persistence and replay them on reconnect.

Access control

The chat service (services::chat_service.rs) enforces membership checks and rate limits before mutating data. Rate limit profiles are backed by tables in scripts/pg/schema/040_rate_limits.sql and can be tuned via the admin API. Presence updates mark the acting user online and emit events so all subscribers stay consistent.

For endpoint details see REST API; for the transport-level diagram visit Streaming Delivery.

Shared models

rustygpt-shared centralises the data structures consumed by the server, web client, and CLI. Keeping these DTOs in one crate prevents drift between components and allows serde + utoipa derives to stay consistent.

Configuration loader

src/config/server.rs defines the Config struct and associated sub-structures (ServerConfig, RateLimitConfig, SseConfig, etc.). Every binary loads configuration through Config::load_config, which merges defaults, optional files, and environment overrides. Feature flags such as features.auth_v1 gate server subsystems without requiring code changes.

API payloads

The src/models directory contains strongly typed request/response structs:

  • models/chat.rs – conversations, threads, message payloads, and streaming events
  • models/oauth.rs – GitHub/Apple OAuth exchanges
  • models/setup.rs – first-time setup contract (SetupRequest, SetupResponse)
  • models/limits.rs – rate limit admin DTOs (CreateRateLimitProfileRequest, RateLimitAssignment, ...)
  • models/session.rs – session summaries returned by /api/auth/*

All types derive Serialize, Deserialize, and when relevant utoipa::ToSchema so the OpenAPI generator stays in sync.

LLM abstractions

src/llms exposes traits (LLMProvider, LLMModel) and helpers for llama.cpp integration. The server’s AssistantService uses these traits to stream responses and emit metrics (llm_model_cache_hits_total, llm_model_load_seconds). When you add a new provider, implement the traits here and update the configuration schema.

Why it matters

  • Type safety – clients compile against the same structs the server uses, catching breaking changes early.
  • Single-source documentation – OpenAPI docs and mdBook pages pull names directly from these types.
  • Testing – shared fixtures in shared::models make it easier to write integration tests that cover both server handlers and CLI commands.

Whenever you extend the API, add or update the relevant struct in rustygpt-shared first, then regenerate the OpenAPI spec with cargo run -p rustygpt-cli -- spec.

Architecture Overview

These chapters document the runtime architecture of RustyGPT: how the Axum server is composed, how the SSE stream hub works, and how rate limiting integrates with PostgreSQL. Use them alongside the concepts and reference sections when exploring the code.

Streaming delivery

RustyGPT streams conversation activity to authenticated clients over Server-Sent Events (SSE). The implementation lives in rustygpt-server/src/handlers/streaming.rs and is gated by features.sse_v1.

Flow

sequenceDiagram
  participant Client
  participant API as Axum /api
  participant Hub as StreamHub
  participant DB as rustygpt.sse_event_log

  Client->>API: POST /api/threads/{conversation}/root
  API->>Hub: publish ConversationStreamEvent
  Hub->>Client: SSE event (thread.new)
  Note over Hub,DB: if persistence enabled
  Hub->>DB: sp_record_sse_event
  Client->>API: reconnect with Last-Event-ID
  API->>Hub: subscribe(after)
  Hub->>DB: sp_sse_replay
  DB-->>Hub: persisted events
  Hub-->>Client: replay then live stream

Clients subscribe to /api/stream/conversations/:conversation_id. The route is protected by the auth middleware when features.auth_v1 is enabled, so callers must present a valid session cookie (the CLI handles this automatically).

Event payloads

Events are instances of shared::models::ConversationStreamEvent and are encoded as JSON envelopes with type and payload fields. See Threaded conversations for the full list of variants.

The SSE handler assigns monotonically increasing sequence numbers per conversation. When persistence is enabled the sequence is also stored in rustygpt.sse_event_log, allowing reconnecting clients to pass Last-Event-ID and receive any missed events before resuming the live stream.
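On the wire, each event could be framed as below. This is a sketch of standard SSE framing rather than the server's actual code; it assumes the sequence number, prefixed with the configured id_prefix ("evt_"), becomes the id: field that clients echo back as Last-Event-ID:

```rust
// Build one Server-Sent Events frame: an id line, an event-type line, a
// data line with the JSON envelope, and a blank line terminating the frame.
fn sse_frame(sequence: u64, event_type: &str, json_payload: &str) -> String {
    format!("id: evt_{sequence}\nevent: {event_type}\ndata: {json_payload}\n\n")
}

fn main() {
    let frame = sse_frame(42, "message.delta", r#"{"content":"Hel"}"#);
    print!("{frame}");
}
```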

Persistence and retention

Configure persistence via [sse.persistence] in config.toml:

[sse.persistence]
enabled = true
max_events_per_user = 500
prune_batch_size = 100
retention_hours = 48

services::sse_persistence stores events using the stored procedures in scripts/pg/schema/050_sse_persistence.sql. The pruning logic runs after each insert to keep the table bounded.
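The real pruning lives in SQL, but the policy the settings describe can be sketched in a few lines. This illustrative version assumes events are ordered oldest-first and uses an age in hours where the table stores timestamps:

```rust
// Drop events outside the retention window, then cap the log at
// max_events by discarding the oldest entries (front of the vector).
fn prune(
    events: &mut Vec<(u64, u64)>, // (sequence, age_hours), oldest first
    max_events: usize,
    retention_hours: u64,
) {
    events.retain(|&(_, age)| age <= retention_hours);
    if events.len() > max_events {
        let excess = events.len() - max_events;
        events.drain(..excess);
    }
}

fn main() {
    let mut log = vec![(1, 60), (2, 10), (3, 5), (4, 1)];
    prune(&mut log, 2, 48); // retention_hours = 48, max_events_per_user = 2
    println!("{log:?}");    // event 1 aged out, event 2 evicted by the cap
}
```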

Backpressure handling

The in-memory queue for each conversation defaults to channel_capacity = 128. Configure behaviour under [sse.backpressure]:

  • drop_strategy = "drop_tokens" drops assistant token events first
  • drop_strategy = "drop_tokens_and_system" also discards system events once the queue fills
  • warn_queue_ratio controls when a warning is logged about queue pressure

These settings keep hot conversations from exhausting memory while still delivering key state changes (presence, membership, unread counters).
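The drop_tokens policy can be sketched as follows. This is a hypothetical simplification of the hub's queueing (the real StreamHub uses bounded channels): when the queue is full, the oldest assistant token delta is evicted so state-change events still fit:

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Token(String),    // message.delta payload: droppable under pressure
    Presence(String), // state change: keep
    Unread(u32),      // state change: keep
}

// Try to enqueue; on a full queue, evict the oldest Token event to make
// room. Returns false if nothing was droppable and the event was rejected.
fn enqueue(queue: &mut Vec<Event>, capacity: usize, event: Event) -> bool {
    if queue.len() < capacity {
        queue.push(event);
        return true;
    }
    if let Some(pos) = queue.iter().position(|e| matches!(e, Event::Token(_))) {
        queue.remove(pos);
        queue.push(event);
        true
    } else {
        false
    }
}

fn main() {
    let mut q = vec![Event::Token("He".into()), Event::Presence("alice".into())];
    let accepted = enqueue(&mut q, 2, Event::Unread(3));
    println!("accepted={accepted}, queue={q:?}");
}
```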

Client responsibilities

  • Reconnect with Last-Event-ID so the server can replay persisted events when available
  • Handle 401 responses by re-running the session refresh flow (/api/auth/refresh); the CLI and web client do this automatically
  • Clear typing state on typing.update and update unread counters when unread.update arrives

Use REST API endpoints to backfill state when the requested Last-Event-ID falls outside the retention window.

Service topology

RustyGPT is composed of a single Axum process backed by PostgreSQL. The web client and CLI talk to the same REST + SSE surface.

Components

  • Axum API (rustygpt-server) – exposes /api/* endpoints, authentication middleware, rate limiting, SSE, OpenAPI docs, health probes, metrics, and static file hosting.
  • PostgreSQL – stores users, sessions, conversations, threads, SSE history, and rate-limit configuration. Schema and stored procedures live in scripts/pg and are applied automatically during bootstrap.
  • SSE StreamHub – in-memory fan-out implemented in handlers/streaming.rs, optionally persisting events through services::sse_persistence.
  • Yew SPA (rustygpt-web) – compiled to WebAssembly with Trunk. src/api.rs handles authentication, CSRF, SSE reconnection, and REST calls for conversations/threads.
  • CLI (rustygpt-cli) – shares configuration and models with the server. Provides helper commands for session management, SSE following, and OpenAPI/config generation.

Data flow

flowchart LR
  subgraph Clients
    web[Yew web app]
    cli[CLI]
  end
  api[Axum /api]
  stream[StreamHub]
  db[(PostgreSQL)]

  web --> api
  cli --> api
  api --> db
  api --> stream
  stream --> web
  stream --> cli

Requests hit the Axum router, which talks to PostgreSQL via SQLx and fans out live events via StreamHub. Clients subscribe to /api/stream/conversations/:conversation_id to receive updates.

Feature flags

Many subsystems are gated by [features] in configuration:

  • auth_v1 – enables session middleware, /api/auth/*, protected routes, and the rate-limit admin API
  • sse_v1 – enables the SSE route and persistence options
  • well_known – serves .well-known/* entries from the config

Toggle these flags without recompiling the binaries.

Scaling notes

The server is stateless apart from in-memory SSE buffers. For horizontal scaling you must either:

  • Disable persistence and tolerate best-effort delivery, or
  • Configure [sse.persistence] so each instance replays from PostgreSQL on reconnect

Rate limiting already supports multi-instance deployments because configuration is stored in the database and periodically reloaded (RateLimitState::reload_from_db).

Rate-limit architecture

RustyGPT enforces per-route throttling using a leaky-bucket strategy implemented in middleware::rate_limit. Configuration comes from two tables managed by stored procedures in scripts/pg/procs/034_limits.sql.

Data model

  • rustygpt.rate_limit_profiles – named profiles containing algorithm + JSON parameters (currently gcra style with requests_per_second / burst options).
  • rustygpt.rate_limit_assignments – maps HTTP method + path pattern to a profile.
  • rustygpt.message_rate_limits – per-user, per-conversation state used by sp_user_can_post to throttle message posting.

RateLimitState::reload_from_db loads profiles and assignments into memory. The admin API under /api/admin/limits/* can create, update, or delete records at runtime; after each change the state refreshes automatically.

Matching logic

enforce_rate_limits computes a cache key as "{METHOD} {path}" and finds the first matching pattern. Supported patterns:

  • Exact path matches (/api/messages/{id}/reply becomes /api/messages/:id/reply in the database)
  • * suffix for prefixes (e.g. /api/admin/*)

If no assignment matches, the middleware falls back to the default strategy derived from [rate_limits].default_rps and [rate_limits].burst. Login routes (/api/auth/login) use the dedicated auth_login_per_ip_per_min limiter.
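The two pattern forms can be sketched with plain string matching. The semantics below are an assumption drawn from the rules listed above, not the middleware's actual code: :param segments match any single path segment, and a trailing * matches any key with the given prefix:

```rust
// Match a "{METHOD} {path}" cache key against a stored pattern.
fn route_matches(pattern: &str, key: &str) -> bool {
    // "* suffix" patterns are plain prefix matches.
    if let Some(prefix) = pattern.strip_suffix('*') {
        return key.starts_with(prefix);
    }
    // Otherwise compare segment by segment; ":name" matches anything.
    let p: Vec<&str> = pattern.split('/').collect();
    let k: Vec<&str> = key.split('/').collect();
    p.len() == k.len()
        && p.iter().zip(&k).all(|(ps, ks)| ps.starts_with(':') || ps == ks)
}

fn main() {
    println!("{}", route_matches("POST /api/messages/:id/reply",
                                 "POST /api/messages/123/reply"));
    println!("{}", route_matches("GET /api/admin/*",
                                 "GET /api/admin/limits/profiles"));
}
```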

Metrics and headers

When a request is evaluated the middleware records:

  • http_rate_limit_requests_total{profile,result} – allowed vs denied counts
  • http_rate_limit_remaining{profile} – remaining tokens after the decision
  • http_rate_limit_reset_seconds{profile} – seconds until the bucket refills
  • rustygpt_limits_profiles / rustygpt_limits_assignments – gauges updated on reload

Responses include the standard headers RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset, and X-RateLimit-Profile so clients can react accordingly.

Admin API payloads

All admin payloads live in shared::models::limits:

  • CreateRateLimitProfileRequest
  • UpdateRateLimitProfileRequest
  • AssignRateLimitRequest
  • RateLimitProfile / RateLimitAssignment

These endpoints require an authenticated session with the admin role (handlers/admin_limits.rs).

Conversation posting limits

ChatService::post_root_message and ChatService::reply_message call sp_user_can_post, which enforces a GCRA window per (user_id, conversation_id) using rustygpt.message_rate_limits. Tweak the conversation.post profile via SQL or the admin API to adjust posting cadence.
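For intuition, here is a minimal in-memory GCRA sketch. It is illustrative only (the real enforcement happens in stored procedures, and the exact burst accounting may differ): each accepted request advances a "theoretical arrival time", and a request is denied once that time runs further ahead of now than the burst allowance:

```rust
struct Gcra {
    emission_interval: f64, // seconds per request = 1 / requests_per_second
    burst_allowance: f64,   // how far tat may run ahead of now
    tat: f64,               // theoretical arrival time of the next request
}

impl Gcra {
    fn new(requests_per_second: f64, burst: u32) -> Self {
        let emission_interval = 1.0 / requests_per_second;
        Self {
            emission_interval,
            burst_allowance: burst as f64 * emission_interval,
            tat: 0.0,
        }
    }

    // Decide one request at time `now` (seconds); true = allowed.
    fn check(&mut self, now: f64) -> bool {
        let tat = self.tat.max(now);
        if tat - now > self.burst_allowance {
            false // bucket exhausted: deny without advancing tat
        } else {
            self.tat = tat + self.emission_interval;
            true
        }
    }
}

fn main() {
    // 1 request/second with a burst of 2: a short burst is absorbed,
    // then further instantaneous requests are denied.
    let mut limiter = Gcra::new(1.0, 2);
    let decisions: Vec<bool> = (0..4).map(|_| limiter.check(0.0)).collect();
    println!("{decisions:?}");
}
```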

Reference Overview

Authoritative details for RustyGPT’s public surfaces:

  • Authentication – session cookies, setup flow, and rotation behaviour
  • REST API – endpoint catalogue grouped by feature area
  • Configuration – config.toml structure and environment overrides

For operational workflows see How-to; for high-level design context visit the Architecture section.

Authentication

RustyGPT uses cookie-based sessions backed by PostgreSQL. Session management lives in rustygpt-server/src/auth/session.rs and is exposed through /api/auth/* handlers when features.auth_v1 = true.

Session lifecycle

  1. Setup – POST /api/setup hashes the supplied password and inserts the first user (admin + member roles). Further calls are rejected.
  2. Login – POST /api/auth/login verifies credentials via sp_auth_login. Successful responses include:
    • Set-Cookie: SESSION_ID=...; HttpOnly; Secure?; SameSite=Lax
    • Set-Cookie: CSRF-TOKEN=...; SameSite=Strict
    • X-Session-Rotated: 1
  3. Authenticated requests – non-GET operations must include the CSRF header X-CSRF-TOKEN with the cookie value. The web client (rustygpt-web/src/api.rs) and CLI handle this automatically.
  4. Refresh – POST /api/auth/refresh rotates cookies inside the idle window (default 8 hours). If either the idle or absolute expiry is exceeded, the call returns 401 session_expired.
  5. Logout – POST /api/auth/logout clears the session and CSRF cookies.

Sessions are stored in rustygpt.user_sessions. The idle and absolute windows come from [session] in configuration. When max_sessions_per_user is set the newest session evicts the oldest via sp_auth_login.
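The expiry rule combines both windows. The sketch below uses assumed field names and Unix-second timestamps for illustration; a session stays valid only while the idle window (since last activity) and the absolute window (since creation) both hold:

```rust
// true while the session is still inside both the idle and absolute windows.
fn session_valid(
    now: u64,
    created_at: u64,
    last_seen_at: u64,
    idle_seconds: u64,
    absolute_seconds: u64,
) -> bool {
    now - last_seen_at <= idle_seconds && now - created_at <= absolute_seconds
}

fn main() {
    let (idle, absolute) = (28_800, 604_800); // defaults from [session]
    // Active ten minutes ago, created a day ago: still valid.
    println!("{}", session_valid(100_000, 13_600, 99_400, idle, absolute));
    // Idle for nine hours: expired, even though the absolute window remains.
    println!("{}", session_valid(100_000, 13_600, 67_600, idle, absolute));
}
```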

config.toml controls cookie behaviour:

[session]
idle_seconds = 28800
absolute_seconds = 604800
session_cookie_name = "SESSION_ID"
csrf_cookie_name = "CSRF-TOKEN"
max_sessions_per_user = 5

[security.cookie]
domain = ""
secure = false
same_site = "lax"

[security.csrf]
cookie_name = "CSRF-TOKEN"
header_name = "X-CSRF-TOKEN"
enabled = true

Adjust security.cookie.secure and security.cookie.domain for production deployments. When security.csrf.enabled = false the middleware skips header validation (useful for service-to-service calls but not recommended for browsers).

CLI workflow

The CLI wraps the same endpoints:

cargo run -p rustygpt-cli -- login
cargo run -p rustygpt-cli -- me
cargo run -p rustygpt-cli -- logout

Cookies are stored at ~/.config/rustygpt/session.cookies by default (see [cli.session_store]). The follow and chat commands automatically attach the CSRF header when present.

Observability

Authentication currently relies on logs for troubleshooting. Set RUST_LOG=rustygpt_server=debug to trace session decisions (SessionService::authenticate, SessionService::refresh_session). Prometheus metrics for auth flows are not yet implemented.

REST API

All endpoints are served under /api unless noted otherwise. When features.auth_v1 is enabled, protected endpoints require a session cookie, and non-GET requests must also carry the CSRF header. The OpenAPI schema is generated from rustygpt-server/src/openapi.rs and can be exported with cargo run -p rustygpt-cli -- spec.

Setup

| Method | Path | Description |
|---|---|---|
| GET | /api/setup | Returns { "is_setup": bool } by calling is_setup() in PostgreSQL. |
| POST | /api/setup | Creates the first administrator account (see scripts/pg/procs/010_auth.sql::init_setup). Subsequent calls return 400. |

Authentication

| Method | Path | Description |
|---|---|---|
| POST | /api/auth/login | Email/password login. Returns LoginResponse with session + CSRF cookies. |
| POST | /api/auth/logout | Revokes the current session. Requires CSRF header. |
| POST | /api/auth/refresh | Rotates session cookies inside the idle window. |
| GET | /api/auth/me | Returns MeResponse (requires authenticated session). |

OAuth helpers

Handlers in handlers/github_auth.rs and handlers/apple_auth.rs expose optional OAuth flows when credentials are present:

| Method | Path | Notes |
|---|---|---|
| GET | /api/oauth/github | Returns an authorization URL based on GITHUB_* environment variables. |
| GET | /api/oauth/github/callback | Exchanges the code for a session via ProductionOAuthService. |
| POST | /api/oauth/github/manual | Developer helper that accepts a raw auth code. |
| GET | /api/oauth/apple | Same as GitHub but for Apple. |
| GET | /api/oauth/apple/callback | Callback handler. |
| POST | /api/oauth/apple/manual | Manual exchange helper. |

Conversations & membership

Routes implemented in handlers/conversations.rs:

| Method | Path | Description |
|---|---|---|
| POST | /api/conversations | Create a new conversation. |
| POST | /api/conversations/{conversation_id}/participants | Invite/add a participant. Emits membership + presence SSE events. |
| DELETE | /api/conversations/{conversation_id}/participants/{user_id} | Remove a participant. |
| POST | /api/conversations/{conversation_id}/invites | Create an invite token. |
| POST | /api/invites/accept | Accept an invite token. |
| POST | /api/invites/{token}/revoke | Revoke an invite token. |
| GET | /api/conversations/{conversation_id}/threads | List thread summaries (supports after + limit query params). |
| GET | /api/conversations/{conversation_id}/unread | Return unread counts per thread. |

Threads & messages

Routes from handlers/threads.rs:

| Method | Path | Description |
|---|---|---|
| GET | /api/threads/{root_id}/tree | Depth-first thread slice (cursor_path + limit optional). |
| POST | /api/threads/{conversation_id}/root | Create a new thread root. Triggers assistant streaming when role = assistant. |
| POST | /api/messages/{parent_id}/reply | Reply to an existing message. |
| GET | /api/messages/{message_id}/chunks | Retrieve persisted assistant chunks. |
| POST | /api/threads/{root_id}/read | Mark thread as read (MarkThreadReadRequest). |
| POST | /api/messages/{message_id}/delete | Soft-delete a message. |
| POST | /api/messages/{message_id}/restore | Restore a previously deleted message. |
| POST | /api/messages/{message_id}/edit | Replace message content. |
| POST | /api/typing | Set typing state (TypingRequest). |
| POST | /api/presence/heartbeat | Update presence heartbeat. |

Streaming

| Method | Path | Description |
|---|---|---|
| GET | /api/stream/conversations/{conversation_id} | SSE endpoint producing ConversationStreamEvent values. Requires session cookie and (optionally) Last-Event-ID. |

Copilot-compatible endpoints

These helpers live in handlers/copilot.rs and provide simple echo responses for integration tests:

| Method | Path | Description |
|---|---|---|
| GET | /v1/models | Returns ModelsResponse with two static models (gpt-4, gpt-3.5). |
| POST | /v1/chat/completions | Echoes provided messages as assistant responses (ChatCompletionResponse). |

Admin rate limit API

Available when features.auth_v1 = true and rate_limits.admin_api_enabled = true (handlers/admin_limits.rs):

| Method | Path | Description |
|---|---|---|
| GET | /api/admin/limits/profiles | List profiles. |
| POST | /api/admin/limits/profiles | Create a profile. |
| PUT | /api/admin/limits/profiles/{id} | Update profile parameters/description. |
| DELETE | /api/admin/limits/profiles/{id} | Delete a profile (fails if still assigned). |
| GET | /api/admin/limits/assignments | List route assignments. |
| POST | /api/admin/limits/assignments | Assign a profile to a route. |
| DELETE | /api/admin/limits/assignments/{id} | Remove an assignment. |

Health and observability

Outside of the /api prefix, the server exposes:

| Method | Path | Description |
|---|---|---|
| GET | /healthz | Liveness probe. |
| GET | /readyz | Readiness probe (verifies PostgreSQL bootstrap). |
| GET | /metrics | Prometheus metrics via metrics-exporter-prometheus. |
| GET | /.well-known/{path} | Served when features.well_known = true; entries configured via [well_known.entries]. |
| GET | /openapi.json or /openapi.yaml | Generated OpenAPI spec. |

Refer to Authentication for cookie details and Configuration for the relevant keys.

Configuration

RustyGPT uses a layered configuration loader (rustygpt-shared::config::server::Config). Defaults are determined by the active profile (Dev/Test/Prod), then merged with an optional file and environment overrides. CLI flags can override specific values (e.g. --port).

Loading order

  1. Profile defaults (Config::default_for_profile(Profile::Dev))
  2. Optional config file (config.toml, config.yaml, or config.json)
  3. Environment variables using double underscores (e.g. RUSTYGPT__SERVER__PORT=9000)
  4. CLI overrides (currently the server/CLI --port flag)

Config::load_config(path, override_port) performs this merge and validates required fields.

Key sections

[logging]

[logging]
level = "info"        # tracing level passed to tracing-subscriber
format = "text"        # "text" or "json"

[server]

[server]
host = "127.0.0.1"
port = 8080
public_base_url = "http://localhost:8080"
request_id_header = "x-request-id"

[server.cors]
allowed_origins = ["http://localhost:3000", "http://127.0.0.1:3000"]
allow_credentials = false
max_age_seconds = 600

public_base_url is derived automatically when not supplied (scheme depends on profile). request_id_header controls which header the middleware reads when assigning request IDs.

[security]

[security.hsts]
enabled = false
max_age_seconds = 63072000
include_subdomains = true
preload = false

[security.cookie]
domain = ""
secure = false
same_site = "lax"

[security.csrf]
cookie_name = "CSRF-TOKEN"
header_name = "X-CSRF-TOKEN"
enabled = true

[rate_limits]

[rate_limits]
auth_login_per_ip_per_min = 10
default_rps = 50.0
burst = 100
admin_api_enabled = false

When admin_api_enabled = true the /api/admin/limits/* routes become available.

[session]

[session]
idle_seconds = 28800
absolute_seconds = 604800
session_cookie_name = "SESSION_ID"
csrf_cookie_name = "CSRF-TOKEN"
max_sessions_per_user = 5

Set max_sessions_per_user = 0 (or null) to disable automatic eviction.

[oauth]

[oauth]
redirect_base = "http://localhost:8080/api/auth/github/callback"

[oauth.github]
client_id = "..."
client_secret = "..."

If oauth.github is omitted the GitHub endpoints still respond but return empty URLs. Apple support reads APPLE_* environment variables directly in the handler.

[db]

[db]
url = "postgres://tinroof:rusty@localhost/rustygpt_dev"
statement_timeout_ms = 5000
max_connections = 10
bootstrap_path = "../scripts/pg"

bootstrap_path points to the directory containing schema/, procedures/, indexes/, and seed/ folders.

[sse]

[sse]
heartbeat_seconds = 20
channel_capacity = 128
id_prefix = "evt_"

[sse.persistence]
enabled = false
max_events_per_user = 500
prune_batch_size = 100
retention_hours = 48

[sse.backpressure]
drop_strategy = "drop_tokens"
warn_queue_ratio = 0.75

[features]

[features]
auth_v1 = true
sse_v1 = true
well_known = true

Flags gate optional subsystems without recompiling the binary.

[cli] and [web]

[cli]
session_store = "~/.config/rustygpt/session.cookies"

[web]
static_dir = "../rustygpt-web/dist"
spa_index = "../rustygpt-web/dist/index.html"

[llm]

Config embeds LLMConfiguration from rustygpt-shared::config::llm. Use it to describe llama.cpp models/providers:

[llm.global_settings]
persist_stream_chunks = true

[llm.providers.default]
provider_type = "llama_cpp"
model_path = "./models/your-model.gguf"

See rustygpt-shared/src/config/llm.rs for the full schema.

Environment variable syntax

Nested keys map to uppercase names with double underscores. Examples:

  • RUSTYGPT__SERVER__PORT=9001
  • RUSTYGPT__SECURITY__COOKIE__SECURE=true
  • RUSTYGPT__FEATURES__SSE_V1=true

Booleans and numbers follow standard Rust parsing rules. Paths can be relative or absolute.
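The name mapping above is mechanical enough to sketch. This illustrative helper (not part of the actual loader) turns a dotted key path into the corresponding variable name:

```rust
// "security.cookie.secure" -> "RUSTYGPT__SECURITY__COOKIE__SECURE"
fn env_var_name(key_path: &str) -> String {
    let tail = key_path
        .split('.')
        .map(|seg| seg.to_ascii_uppercase())
        .collect::<Vec<_>>()
        .join("__");
    format!("RUSTYGPT__{tail}")
}

fn main() {
    println!("{}", env_var_name("server.port"));
    println!("{}", env_var_name("features.sse_v1"));
}
```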

How-to Overview

Task-focused guides for operating RustyGPT. These assume you already understand the system from the Guides and Reference sections.

Docker deploy

This guide covers building the RustyGPT container image and running it alongside PostgreSQL with Docker Compose.

Build the image

The repository ships a multi-stage Dockerfile that builds the workspace and bundles the server binary plus static assets:

docker build -t rustygpt/server:latest -f Dockerfile .

Set BUILD_PROFILE=release to compile with optimisations. The final image exposes the server on port 8080.

Compose stack

docker-compose.yaml defines two services:

  • backend – builds from the Dockerfile (target runtime). Environment variables include DATABASE_URL, OAuth credentials, and feature toggles. Update them to match your deployment.
  • postgres – postgres:17-alpine with credentials matching config.example.toml.

Bring the stack up:

docker compose up --build

The compose file mounts ./.data/postgres for database storage and ./.data/postgres-init for init scripts. To reuse the workspace schema, copy the contents of scripts/pg into that directory before the first run:

mkdir -p .data/postgres-init
cp -r scripts/pg/* .data/postgres-init/

Alternatively, rely on the server’s bootstrap runner by exposing the same directory inside the backend container and pointing [db].bootstrap_path at it.

Configuration and secrets

  • Copy config.example.toml to a volume or bake it into the image, and set RUSTYGPT__* environment variables as needed.
  • Provide OAuth credentials (GITHUB_*, APPLE_*) if you plan to use those flows; otherwise the endpoints return placeholder URLs.
  • Set features.auth_v1, features.sse_v1, and features.well_known to true via environment variables or the config file.
  • If serving over TLS, terminate HTTPS at the reverse proxy and set server.public_base_url to the external URL.

Post-deployment checks

  1. Hit http://HOST:8080/healthz and http://HOST:8080/readyz until both return 200.
  2. POST to /api/setup to create the initial admin account.
  3. Use the CLI container (or a local build) to log in: docker compose exec backend rustygpt login.
  4. Visit /metrics and confirm counters increment when making requests.
  5. If using the web UI, serve the rustygpt-web build either from the same container (set [web] static_dir) or via a separate static host.
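The first check can be scripted. The helper below is a sketch (check_200 is an illustrative name; host, port, and retry policy are yours to adjust):

```shell
# Return success when the given URL answers with HTTP 200; use it to
# gate deployment steps, e.g.:
#   until check_200 "http://$HOST:8080/readyz"; do sleep 2; done
check_200() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}
```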

Rollback

Tag each release in your registry. To roll back, point the backend service at the previous image tag (for example by editing the image reference in docker-compose.yaml), then pull and restart:

docker compose pull backend
docker compose up -d backend

The PostgreSQL data directory is persisted on disk so sessions and conversations remain intact.

Rotate secrets

Use this runbook to update credentials (database passwords, OAuth secrets, session keys) while keeping RustyGPT online.

Preparation

  1. Inventory the secrets in use (e.g. DATABASE_URL, GITHUB_CLIENT_SECRET, config entries under [security.cookie]).
  2. Update your secret manager or environment files with new values, but do not apply them yet.
  3. Coordinate a maintenance window if session cookie rotation is expected to log users out.

Rotation steps

  1. Stage – write new values to your secret store or .env file.
  2. Deploy – restart the server with updated environment variables/config (docker compose restart backend or rolling restart in your orchestrator). The bootstrap runner is idempotent, so restarting is safe.
  3. Verify – run smoke tests:
    cargo run -p rustygpt-cli -- login
    cargo run -p rustygpt-cli -- me
    curl -sSf http://HOST:8080/readyz
    
  4. Cleanup – remove old secrets from the manager and audit logs for unexpected errors.

Session cookies are independent of database passwords or OAuth secrets. If you change [security.cookie] settings (e.g. enable secure or change session_cookie_name), expect users to sign in again.

Observability

  • Watch application logs for SessionService warnings.
  • Confirm http_rate_limit_requests_total continues to increment after the restart.
  • Verify the CLI can still access streaming endpoints (cargo run -p rustygpt-cli -- follow --root <id>).

Incident response

If a rotation fails:

  1. Roll back to the previous secret values and restart the server.
  2. Capture logs around the failure (authentication errors, database connection failures, etc.).
  3. File an issue or ADR documenting the change and follow-up actions.

Release notes

The authoritative changelog lives in CHANGELOG.md. No tagged releases have been published yet; the [Unreleased] section tracks ongoing development across the workspace crates.

When a release is tagged (vMAJOR.MINOR.PATCH), update the changelog and, if applicable, regenerate the docs index with just docs-index so the summaries reflect the new features.