Uptime commitments, failure modes, and what happens when things go wrong.
Uptime is measured as the availability of the Stamp API
(POST /v1/stamp) — the endpoint that authorizes agent actions.
This is the critical path. If this endpoint is up, your agents can operate.
If it's down, behavior depends on your configuration (see Failure Modes below).
| Counts as Downtime | Does NOT Count as Downtime |
|---|---|
| Stamp API (/v1/stamp) returns 5xx errors for >1 minute | Scheduled maintenance (announced 48 hours in advance) |
| Stamp API is unreachable for >1 minute | Dashboard unavailability (does not affect agent operation) |
| Stamp API latency >5 seconds (p95) for >5 minutes | Payload blob storage delays (fire-and-forget, does not block agents) |
| | Audit log query slowness (read path, does not affect agent operation) |
| | Outages caused by customer's own infrastructure (gateway, network, provider) |
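As an illustration of the ">1 minute" rule in the table above, here is a minimal sketch that totals downtime from a series of probe results. The probe format (`(timestamp_seconds, ok)` pairs), the function name, and the way an outage's duration is measured (from the first failed probe to the next successful one) are all assumptions made for this example; this is not how D3cipher necessarily measures availability.

```python
from typing import Iterable, Optional, Tuple


def downtime_seconds(
    probes: Iterable[Tuple[float, bool]],
    min_outage: float = 60.0,
) -> float:
    """Sum the length of failure runs that last longer than `min_outage` seconds.

    `probes` is a time-ordered sequence of (timestamp_seconds, ok) pairs from a
    hypothetical external checker hitting POST /v1/stamp. A run of failures is
    measured from its first failed probe to the next successful probe (or to
    the last probe seen, if the run is still open at the end of the data).
    """
    total = 0.0
    run_start: Optional[float] = None
    last_seen: Optional[float] = None

    for timestamp, ok in probes:
        if not ok and run_start is None:
            run_start = timestamp          # a failure run begins
        elif ok and run_start is not None:
            duration = timestamp - run_start
            if duration > min_outage:      # only runs longer than 1 minute count
                total += duration
            run_start = None
        last_seen = timestamp

    # Handle a failure run that is still open when the data ends.
    if run_start is not None and last_seen is not None:
        duration = last_seen - run_start
        if duration > min_outage:
            total += duration

    return total
```

Short blips (a single failed probe followed by a success within a minute) contribute nothing, which matches the intent of the threshold.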
What happens to your agents when the D3cipher cloud is unreachable? This is the question your CTO will ask. Here is the honest answer.
The D3cipher Gateway (running in your infrastructure) contacts the D3cipher cloud for every agent request. The stamp is synchronous — the gateway asks for authorization before forwarding to the AI provider. If the cloud is unreachable, behavior depends on how you configured the gateway:
block_on_rejection=True
If D3cipher is unreachable, all agent requests are blocked. The gateway returns a 503 error. No request reaches the AI provider. No ungoverned actions can occur. The audit trail has no gaps.
Choose this if: You would rather stop all AI agents than allow ungoverned operations. Required for most regulated environments.
block_on_rejection=False
If D3cipher is unreachable, agent requests proceed without governance. The gateway forwards the request directly to the AI provider. The agent continues operating. The audit trail will have a gap for the duration of the outage. When connectivity is restored, stamping resumes automatically.
Choose this if: Agent availability is more important than continuous governance. The gap in the audit trail is documented and explainable to auditors.
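The two settings can be pictured with a short sketch of the decision the gateway makes when the cloud cannot be reached. This is illustrative only and not the gateway's actual code: `handle_request`, `request_stamp`, `forward_to_provider`, `CloudUnreachable`, and `GatewayResponse` are hypothetical names, and only the "cloud unreachable" path described above is modeled (what happens when a stamp is rejected is out of scope here).

```python
from dataclasses import dataclass
from typing import Callable


class CloudUnreachable(Exception):
    """Raised by the hypothetical stamp call when the D3cipher cloud cannot be reached."""


@dataclass
class GatewayResponse:
    status: int
    body: str
    stamped: bool  # False means this request has no entry in the audit trail


def handle_request(
    request: str,
    request_stamp: Callable[[str], None],
    forward_to_provider: Callable[[str], str],
    block_on_rejection: bool = True,
) -> GatewayResponse:
    """Ask for a stamp synchronously, then forward to the AI provider."""
    try:
        request_stamp(request)  # synchronous: must complete before forwarding
    except CloudUnreachable:
        if block_on_rejection:
            # Fail closed: nothing reaches the provider, no audit gap.
            return GatewayResponse(503, "governance unavailable", stamped=False)
        # Fail open: forward without governance; the audit trail has a gap.
        return GatewayResponse(200, forward_to_provider(request), stamped=False)

    # Normal path: stamp obtained, request forwarded.
    return GatewayResponse(200, forward_to_provider(request), stamped=True)
```

With `block_on_rejection=True` the provider callback is never invoked during an outage; with `block_on_rejection=False` it is invoked, and `stamped=False` marks the request as falling inside the audit gap.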
| Scenario | What Happens | Data Impact |
|---|---|---|
| D3cipher API recovers after outage | Gateway automatically resumes stamping on the next request. No manual intervention required. | Chain resumes from where it left off. Sequence numbers are monotonic. No duplicate stamps. |
| Encrypted blob upload fails | Fire-and-forget. The stamp (chain entry) still succeeds. The blob is lost for that entry. | Audit trail intact (chain + hashes). Transcript for that entry unavailable. This is an availability gap, not an integrity gap. |
| Gateway container restarts | Gateway rebuilds agent middleware state on first request per agent. DEK is re-fetched from server. Budget state is re-loaded from server. | No data loss. Brief latency increase on first request while state is restored. |
| Database failover | Render managed Postgres handles automatic failover. Brief interruption (typically <30 seconds). | No data loss. Postgres replication ensures durability. |
If we fail to meet the uptime commitment for your tier in any calendar month, you are entitled to service credits applied to the following month's invoice:
| Monthly Uptime | Service Credit |
|---|---|
| Below SLA but ≥ 99.0% | 10% of monthly fee |
| ≥ 95.0% and < 99.0% | 25% of monthly fee |
| < 95.0% | 50% of monthly fee |
Credits must be requested within 30 days of the incident. Credits are applied to future invoices and do not exceed 50% of the monthly fee. Credits are the sole and exclusive remedy for SLA failures.
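To make the credit tiers concrete, the following sketch maps a month's measured uptime to a credit percentage and amount. The uptime commitment for each tier is not listed in this section, so `sla_target` is a parameter you supply; the 99.9 used in the usage note is a placeholder, not a stated commitment. Function names are hypothetical.

```python
def service_credit_percent(monthly_uptime: float, sla_target: float) -> int:
    """Return the credit as a percentage of the monthly fee.

    Both arguments are percentages, e.g. 99.5 for 99.5% uptime.
    """
    if monthly_uptime >= sla_target:
        return 0    # commitment met, no credit
    if monthly_uptime >= 99.0:
        return 10   # below SLA but at least 99.0%
    if monthly_uptime >= 95.0:
        return 25   # at least 95.0% and below 99.0%
    return 50       # below 95.0%; also the maximum credit


def service_credit_amount(monthly_fee: float, monthly_uptime: float, sla_target: float) -> float:
    """Return the credit in the same currency unit as `monthly_fee`."""
    return monthly_fee * service_credit_percent(monthly_uptime, sla_target) / 100
```

For example, with a placeholder target of 99.9 and a monthly fee of 200, a month at 99.5% uptime gives `service_credit_amount(200.0, 99.5, 99.9)`, which evaluates to `20.0`.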
Scheduled maintenance windows are announced at least 48 hours in advance via email to account administrators. Maintenance typically occurs during low-traffic periods (weekdays 02:00–06:00 Pacific Time).
Most deployments are zero-downtime rolling updates. Render's deployment model starts the new version before stopping the old one. In practice, most updates cause no visible interruption. Maintenance windows are reserved for database migrations or infrastructure changes that may cause brief interruptions.
| Severity | Definition | Core | Guard | Enterprise |
|---|---|---|---|---|
| Critical | Stamp API down, all agents blocked | 4 hours | 1 hour | 15 minutes |
| High | Degraded performance, intermittent failures | 8 hours | 4 hours | 1 hour |
| Normal | Dashboard issues, non-blocking bugs | 2 business days | 1 business day | 4 hours |
| Low | Feature requests, documentation questions | 5 business days | 2 business days | 1 business day |
Support channels:
The D3cipher Gateway exposes two monitoring endpoints in your infrastructure:
- /healthz: Returns 200 if the gateway is running and can reach D3cipher cloud. Returns 503 if degraded. Use as a Kubernetes liveness probe or monitoring check.
- /metrics: Prometheus-compatible metrics including request count, rejection count, and latency per agent. Scrape with your existing monitoring stack.

D3cipher monitors the Stamp API, database, and background workers continuously. Incidents are communicated via email to affected account administrators.
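A monitoring check against `/healthz` could be sketched as follows, using only the Python standard library. The base URL and port are assumptions (substitute wherever your gateway listens), and the function names and status labels are hypothetical; only the 200 and 503 meanings come from the description above.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_health_status(base_url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status of GET {base_url}/healthz, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for non-2xx responses; the status code is still useful.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout: the gateway itself is unreachable.
        return None


def classify_health(status: Optional[int]) -> str:
    """Translate a /healthz status code into a coarse label for alerting."""
    if status == 200:
        return "healthy"              # gateway running and can reach D3cipher cloud
    if status == 503:
        return "degraded"             # gateway running but reports a degraded state
    if status is None:
        return "gateway_unreachable"  # problem is in your own infrastructure
    return "unknown"


if __name__ == "__main__":
    # Assumed address; replace with your gateway's actual host and port.
    print(classify_health(fetch_health_status("http://localhost:8080")))
```

Separating "degraded" from "gateway_unreachable" helps distinguish a D3cipher cloud problem from an outage in your own infrastructure, which is excluded from the downtime calculation.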
| Metric | Target |
|---|---|
| Recovery Point Objective (RPO) | ≤ 24 hours (daily automated Postgres backups) |
| Recovery Time Objective (RTO) | ≤ 4 hours (restore from backup + redeploy) |
| Backup frequency | Daily automated snapshots with 7-day retention (Render managed) |
| Backup location | Same region (Oregon, US). |
| Backup encryption | AES-256 at rest (Render managed). Payload blobs remain application-encrypted (D3cipher cannot decrypt even from backups). |
We will provide at least 30 days' notice before making material changes to this SLA. Notice will be sent via email to account administrators. Changes that reduce uptime commitments or increase exclusions will not apply to existing contracts until the next renewal period.
For SLA clarifications, custom terms, or Enterprise agreements: hello@d3cipher.ai