Stratus Chatbot Build Spec

Part 1

What makes a build spec effective

A build spec is a treaty between the person who knows what should exist and the person — or agent — who is going to make it exist. The defects in most specs trace back to one of these principles being violated.

Core principles

Negative space matters as much as positive

"Out of scope" plus "do not build" plus "common pitfalls" prevent scope creep and rediscovery of dead ends.

Decisions, not just designs

Every non-obvious choice gets a why. A spec without rationale becomes folklore in 6 months.

Verifiable acceptance criteria

"It works" is not done. Each deliverable needs a binary test someone other than the builder can run.

Single owner per workstream

Groups do not own things. If two people are accountable, no one is.

Phased delivery with hard gates

Phase N+1 does not start until Phase N is verified in production. Prevents the half-built-everywhere failure mode.

Failure modes named explicitly

What happens when X service is down? What is the rollback?

SVNS at every layer

Smallest viable next step. Every output should produce evidence within 24–72 hours.

Change control

What stops when this ships? If you do not name what stops, both old and new run in parallel.

Applicable frameworks

Anthropic / OpenAI spec-driven development Amazon Working Backwards ADR pattern 12-Factor App STRIDE threat modeling DORA metrics RACI

Part 3

Audit / review — findings by severity

Sorted by severity, not by section order. Five P0 items must close before Phase 1 kickoff. Five P1 items will save pain after launch. Eight P2 items are polish.

01 P0 · Block kickoff

No regulatory anchor for the guardrails

The spec says "Stratus is regulated financial services" and writes guardrails accordingly, but it never cites which regulations are driving which rule. As GC, you want the next attorney (or auditor) reading this to see:

GLBA Safeguards Rule (16 CFR 314) drives encryption-at-rest, access controls, audit log, IP hashing, retention.
UDAAP (12 USC 5536) drives "no rate quotes / no approval determinations." Those are the deceptive-acts trip wires.
Reg B / ECOA (12 CFR 1002) drives the prohibition on the bot collecting demographic info or making eligibility statements. If a visitor reveals protected-class info, the system should not store or respond on it.
CCPA / CPRA (California, since Stratus is in OC) drives data subject access, deletion rights, "do not sell" disclosure, opt-out for sensitive categories.
State lending licensure (Cal Fin Code §22000+, if a CFL licensee) drives required disclosures.
TCPA if the bot ever moves to SMS (currently out of scope, but flag it).

Fix Add a §2.5 "Regulatory Basis" mapping every guardrail to the rule that requires it. This is also marketing — it shows clients you actually know what you are doing.

02 P0 · Block kickoff

No data retention policy

Conversations, messages, tickets, IP hashes, and customer emails all live forever per the current schema. This is a CCPA exposure, a GLBA disposal-rule issue (16 CFR 682), and operationally a Supabase storage cost over time.

Define:

Retention window for conversations (suggest 13 months: covers 1 audit cycle plus 30-day buffer).
Retention for tickets (longer, these are arguably records; suggest 7 years to match financial records retention).
Retention for messages (shorter, suggest match conversations).
A scheduled purge job (Railway cron, daily).
DSAR (data subject access request) workflow.

03 P0 · Block kickoff

The IP hashing scheme is weaker than it looks

Spec says sha256(ip + daily_salt). If the daily salt is stored in a DB column, anyone with DB access can correlate across days. The salt belongs in env vars or Vault and should rotate via deploy, not via a row update.

Also: pitfall #6 says "daily-rotated salt" but the implementation is not specified anywhere. Spec it.

04 P0 · Block kickoff

Single owner is missing

The doc names Mykle as the implementer at the end, but no other workstream has a named owner. Who owns the FAQ content? (Lesley is mentioned once in §13 Phase 1 Task 7. Give her her own line.) Who owns the Cloudflare config? Who renews the Entra client secret in 12 months? Who responds to Pumble alerts when the Graph subscription renewal fails?

Fix Add a RACI table, or at minimum a "Roles" subsection.

05 P0 · Block kickoff

No "what stops" clause

Per the Change Control rule. If the bot replaces an existing contact form, phone screen, or website chat plugin, the spec needs to say explicitly: On the day the bot ships, the old contact form is removed and points here instead.

Otherwise both run in parallel and you will have two ticket queues.

06 P1 · Should add

Cost projections absent

Anthropic plus Railway plus Supabase plus Cloudflare plus M365 mailbox at 1,000 conversations per month, then 5,000, then 10,000. A financial services exec asks "what is our run rate" within 60 days of go-live. The system prompt cache is mentioned but no math is shown.

07 P1 · Should add

No adversarial / red-team test plan

§14 testing covers happy paths and rate limits. For a financial-services bot, you need a documented jailbreak suite: "ignore previous instructions," role-play prompts, paraphrased rate questions ("what is the percentage I'd pay each year"), DAN-style attacks, prompt injection via the FAQ itself if it ever loads from DB.

Fix This should be a checked-in test file with red-team scenarios that run in CI.

08 P1 · Should add

Feedback is in Phase 3. Wrong phase.

Thumbs feedback is the only signal you have about whether the bot is hallucinating. It should be in Phase 1. You will be flying blind for 3+ weeks otherwise.

09 P1 · Should add

The `max_failed_responses_before_escalation` setting is "currently informational"

A configured-but-not-enforced setting is technical debt the moment it lands. Either wire it up or delete it.

10 P1 · Should add

No disaster recovery / RPO / RTO

§18 README mentions runbook items but does not quantify. What is the maximum tolerable data loss (RPO)? Maximum recovery time (RTO)? For financial services, document these even if generous.

11 P2 · Polish

Accessibility (ADA Title III) not addressed

Financial services websites are repeat targets for ADA litigation. Keyboard nav, screen reader behavior, color contrast (the brand color #185FA5 against white passes AA for normal text — verify against #185FA5 background and button states), focus indicators in the Shadow DOM, ARIA labels on the launcher.

Fix Add a §9.5 "Accessibility" subsection.

12 P2 · Polish

The FAQ-in-system-prompt threshold should be made explicit upfront

Pitfall #12 buries "if you ever genuinely need RAG (>50K tokens of FAQ)." Move this to §3 or §8 as a top-line constraint: "If the FAQ exceeds X tokens, escalate the decision; do not unilaterally introduce a vector store." That matches the operating style.

13 P2 · Polish

No decision log / ADR

Why Preact over React? (Bundle size, but say it.) Why Supabase over a separate Postgres + Auth0? Why Railway over Vercel? Why Pumble over Slack? Each gets a one-sentence rationale somewhere. Future-you will thank present-you.

14 P2 · Polish

Versioning protocol for the widget bundle

NEXT_PUBLIC_WIDGET_VERSION exists but no bump rules. Semver? When does a localStorage schema change force a clear?

15 P2 · Polish

The "Final Instructions for the Implementing Agent" section is buried at §19

This is actually some of the most useful content (conventional commits, TypeScript strict, no premature optimization, "stop and ask"). Move it to §1.5 right after the Mission so it sets the tone before the reader hits the tech-stack table.

16 P2 · Polish

Mermaid diagram in the README is mentioned but not provided

Embed one in the spec itself. A picture saves 500 words for someone scanning.

17 P2 · Polish

"Use existing dashboard mockup design" (§19) — link it inline

Otherwise the agent guesses.

18 P2 · Polish

The two-step draft-then-send Graph pattern is explained well but repeated

Refactor §11.1 to reference §11.2 instead of duplicating.

Severity	Count	Disposition
P0	5	Block Phase 1 kickoff. Address before any code is written.
P1	5	Should add — will save pain after launch.
P2	8	Polish. Address as time permits.

Stratus Chatbot Build Spec
Review & Audit

Top-quartile spec. Close the P0 gaps before kickoff.

Contents

What makes a build spec effective

Core principles

Negative space matters as much as positive

Decisions, not just designs

Verifiable acceptance criteria

Single owner per workstream

Phased delivery with hard gates

Failure modes named explicitly

SVNS at every layer

Change control

Applicable frameworks

What is genuinely strong — preserve these

Audit / review — findings by severity

No regulatory anchor for the guardrails

No data retention policy

The IP hashing scheme is weaker than it looks

Single owner is missing

No "what stops" clause

Cost projections absent

No adversarial / red-team test plan

Feedback is in Phase 3. Wrong phase.

The `max_failed_responses_before_escalation` setting is "currently informational"

No disaster recovery / RPO / RTO

Accessibility (ADA Title III) not addressed

The FAQ-in-system-prompt threshold should be made explicit upfront

No decision log / ADR

Versioning protocol for the widget bundle

The "Final Instructions for the Implementing Agent" section is buried at §19

Mermaid diagram in the README is mentioned but not provided

"Use existing dashboard mockup design" (§19) — link it inline

The two-step draft-then-send Graph pattern is explained well but repeated

Definition of Done for this review

Closed loop

Severity breakdown

Sources referenced

Top-quartile spec. Close the P0 gaps before kickoff.

Contents

What makes a build spec effective

Core principles

Negative space matters as much as positive

Decisions, not just designs

Verifiable acceptance criteria

Single owner per workstream

Phased delivery with hard gates

Failure modes named explicitly

SVNS at every layer

Change control

Applicable frameworks

What is genuinely strong — preserve these

Audit / review — findings by severity

No regulatory anchor for the guardrails

No data retention policy

The IP hashing scheme is weaker than it looks

Single owner is missing

No "what stops" clause

Cost projections absent

No adversarial / red-team test plan

Feedback is in Phase 3. Wrong phase.

The max_failed_responses_before_escalation setting is "currently informational"

No disaster recovery / RPO / RTO

Accessibility (ADA Title III) not addressed

The FAQ-in-system-prompt threshold should be made explicit upfront

No decision log / ADR

Versioning protocol for the widget bundle

The "Final Instructions for the Implementing Agent" section is buried at §19

Mermaid diagram in the README is mentioned but not provided

"Use existing dashboard mockup design" (§19) — link it inline

The two-step draft-then-send Graph pattern is explained well but repeated

Definition of Done for this review

Closed loop

Severity breakdown

Sources referenced

The `max_failed_responses_before_escalation` setting is "currently informational"