Legal and regulatory review of the Stratus Financial AI FAQ chatbot build spec, with prioritized findings and a path to kickoff.
This spec is in the top quartile of what I have seen. Five P0 items must be closed before Phase 1 starts; five P1 items will save pain after launch; eight P2 items are polish. The legal and regulatory gaps are the only material blockers.
A build spec is a treaty between the person who knows what should exist and the person — or agent — who is going to make it exist. The defects in most specs trace back to one of these principles being violated.
"Out of scope" plus "do not build" plus "common pitfalls" prevent scope creep and rediscovery of dead ends.
Every non-obvious choice gets a why. A spec without rationale becomes folklore in 6 months.
"It works" is not done. Each deliverable needs a binary test someone other than the builder can run.
Groups do not own things. If two people are accountable, no one is.
Phase N+1 does not start until Phase N is verified in production. Prevents the half-built-everywhere failure mode.
What happens when X service is down? What is the rollback?
Smallest viable next step. Every output should produce evidence within 24–72 hours.
What stops when this ships? If you do not name what stops, both old and new run in parallel.
Before the audit, credit where it's due. This spec is in the top quartile of what I have seen. Specifically:
Sorted by severity, not by section order. Five P0 items must close before Phase 1 kickoff. Five P1 items will save pain after launch. Eight P2 items are polish.
The spec says "Stratus is regulated financial services" and writes guardrails accordingly, but it never cites which regulations are driving which rule. As GC, you want the next attorney (or auditor) reading this to see:
Conversations, messages, tickets, IP hashes, and customer emails all live forever per the current schema. This is a CCPA exposure, a GLBA disposal-rule issue (16 CFR 682), and operationally a Supabase storage cost over time.
Define:
conversations (suggest 13 months: covers 1 audit cycle plus 30-day buffer).tickets (longer, these are arguably records; suggest 7 years to match financial records retention).messages (shorter, suggest match conversations).Spec says sha256(ip + daily_salt). If the daily salt is stored in a DB column, anyone with DB access can correlate across days. The salt belongs in env vars or Vault and should rotate via deploy, not via a row update.
Also: pitfall #6 says "daily-rotated salt" but the implementation is not specified anywhere. Spec it.
The doc names Mykle as the implementer at the end, but no other workstream has a named owner. Who owns the FAQ content? (Lesley is mentioned once in §13 Phase 1 Task 7. Give her her own line.) Who owns the Cloudflare config? Who renews the Entra client secret in 12 months? Who responds to Pumble alerts when the Graph subscription renewal fails?
Per the Change Control rule. If the bot replaces an existing contact form, phone screen, or website chat plugin, the spec needs to say explicitly: On the day the bot ships, the old contact form is removed and points here instead.
Otherwise both run in parallel and you will have two ticket queues.
Anthropic plus Railway plus Supabase plus Cloudflare plus M365 mailbox at 1,000 conversations per month, then 5,000, then 10,000. A financial services exec asks "what is our run rate" within 60 days of go-live. The system prompt cache is mentioned but no math is shown.
§14 testing covers happy paths and rate limits. For a financial-services bot, you need a documented jailbreak suite: "ignore previous instructions," role-play prompts, paraphrased rate questions ("what is the percentage I'd pay each year"), DAN-style attacks, prompt injection via the FAQ itself if it ever loads from DB.
Thumbs feedback is the only signal you have about whether the bot is hallucinating. It should be in Phase 1. You will be flying blind for 3+ weeks otherwise.
max_failed_responses_before_escalation setting is "currently informational"A configured-but-not-enforced setting is technical debt the moment it lands. Either wire it up or delete it.
§18 README mentions runbook items but does not quantify. What is the maximum tolerable data loss (RPO)? Maximum recovery time (RTO)? For financial services, document these even if generous.
Financial services websites are repeat targets for ADA litigation. Keyboard nav, screen reader behavior, color contrast (the brand color #185FA5 against white passes AA for normal text — verify against #185FA5 background and button states), focus indicators in the Shadow DOM, ARIA labels on the launcher.
Pitfall #12 buries "if you ever genuinely need RAG (>50K tokens of FAQ)." Move this to §3 or §8 as a top-line constraint: "If the FAQ exceeds X tokens, escalate the decision; do not unilaterally introduce a vector store." That matches the operating style.
Why Preact over React? (Bundle size, but say it.) Why Supabase over a separate Postgres + Auth0? Why Railway over Vercel? Why Pumble over Slack? Each gets a one-sentence rationale somewhere. Future-you will thank present-you.
NEXT_PUBLIC_WIDGET_VERSION exists but no bump rules. Semver? When does a localStorage schema change force a clear?
This is actually some of the most useful content (conventional commits, TypeScript strict, no premature optimization, "stop and ask"). Move it to §1.5 right after the Mission so it sets the tone before the reader hits the tech-stack table.
Embed one in the spec itself. A picture saves 500 words for someone scanning.
Otherwise the agent guesses.
Refactor §11.1 to reference §11.2 instead of duplicating.
| Severity | Count | Disposition |
|---|---|---|
| P0 | 5 | Block Phase 1 kickoff. Address before any code is written. |
| P1 | 5 | Should add — will save pain after launch. |
| P2 | 8 | Polish. Address as time permits. |