Status: design spike only. This is a Bet / high-risk bucket, not proof that SourceHarbor already has a production-ready autopilot.
Worth doing only as a human-in-the-loop research-ops MVP.
Not worth doing yet as:
Why:
Think of the current repo like a warehouse that already has shelves, barcodes, and forklifts. It does not yet have a trusted dispatcher who can send a truck out without a human checking the manifest first.
| Question | Verdict | Reason |
|---|---|---|
| Should SourceHarbor pursue Agent Autopilot now? | Yes, but only as a bounded spike | the repo already has workflows, MCP, retrieval, and evidence surfaces |
| Should it promise autonomous research ops now? | No | approval, identity, audit, and rollback are not strong enough yet |
| Should the first cut execute actions automatically? | No | the honest MVP is proposal-first, approval-first |
| Should computer-use or UI automation be core to MVP? | No | that would add fragility before basic Autopilot signal quality is proven |
These are the repo capabilities that make the spike worth exploring at all.
| Capability | Current repo truth | Why it matters |
|---|---|---|
| Workflow execution | apps/api/app/routers/workflows.py starts bounded Temporal workflows such as poll_feeds, daily_digest, and notification_retry | the system already has a real execution lane instead of needing a fake Autopilot runner |
| MCP doorway | apps/mcp/server.py registers workflow, retrieval, report, notification, job, and UI-audit tools | agents already have one operator-facing control plane instead of needing a second orchestration stack |
| Retrieval evidence | apps/api/app/services/retrieval.py supports keyword, semantic, and hybrid search over digests, transcripts, and knowledge cards | an Autopilot proposal can point to grounded evidence instead of hand-wavy reasoning |
| Notification lane | apps/api/app/services/notifications.py already models delivery creation and send paths | outbound actions can stay inside an existing service contract rather than inventing a new lane |
| UI audit surface | apps/api/app/services/ui_audit.py can collect artifacts and findings, including Gemini-assisted review when configured | useful as a secondary inspection surface, not as the core Autopilot dependency |
| Computer-use surface | apps/api/app/services/computer_use.py exists with explicit confirmation and blocked-action safety | this is a guarded specialist tool, not a default Autopilot primitive |
| Evidence bundles | GET /api/v1/jobs/{job_id}/bundle already exposes reusable job evidence | proposal review can link to an evidence packet instead of raw log scraping |
| Public truth boundary | README.md, docs/start-here.md, docs/architecture.md, and docs/proof.md already say Autopilot is a spike only | the repo already protects against accidental overclaim if this document stays honest |
These are the layers that make full autonomy a no-go today.
| Missing layer | Why it blocks full Autopilot | What the MVP must do instead |
|---|---|---|
| Proposal persistence | no durable proposal object exists yet | persist suggestions before any execution happens |
| Approval identity | no shipped approval queue or operator approval actor contract exists | require explicit approval metadata before a workflow can run |
| Rollback hooks | current repo has workflow start, but no shipped Autopilot kill switch or proposal revoke lane | limit MVP to allowlisted actions and require disable/reject controls |
| Reliable outbound readiness | notifications and Gemini-backed paths still depend on real secrets and provider availability | treat provider-gated lanes as risk flags, not default actions |
| Action scoping | MCP is broad, but Autopilot should not be allowed to do arbitrary mutation | allow only a narrow action list for the first MVP |
| Audit completeness | jobs and bundles exist, but there is no proposal-to-approval-to-execution ledger yet | record evidence, risk flags, approver, and result per proposal |
| Operator trust signal | no data yet shows operators actually want or trust Autopilot proposals | stop the spike quickly if approval quality is weak |
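The "must do instead" column can be read as a set of pre-execution checks. The sketch below is one possible shape, not existing repo code: the function name, the `autopilot_enabled` flag, and the dictionary keys are hypothetical, and the allowlist simply reuses the three workflows this document already names. Blocking on every risk flag is also an assumption; a real MVP might only warn on some of them.

```python
from typing import Any

# Assumption: the first MVP allows only the workflows this document already names.
ALLOWED_ACTIONS = frozenset({"poll_feeds", "daily_digest", "notification_retry"})


def execution_blockers(proposal: dict[str, Any], autopilot_enabled: bool) -> list[str]:
    """Return the reasons a proposal must not run; an empty list means it may run."""
    blockers: list[str] = []
    # Rollback hooks: a disabled Autopilot blocks everything.
    if not autopilot_enabled:
        blockers.append("autopilot_disabled")
    # Action scoping: only a narrow action list is allowed.
    if proposal.get("action_name") not in ALLOWED_ACTIONS:
        blockers.append("action_not_allowlisted")
    # Approval identity: explicit approval metadata is required.
    if not proposal.get("approved_by") or not proposal.get("approved_at"):
        blockers.append("missing_approval_metadata")
    # Outbound readiness: provider-gated lanes surface as risk flags (assumed blocking here).
    for flag in proposal.get("risk_flags", []):
        blockers.append(f"risk_flag:{flag}")
    return blockers
```

Returning a list of reasons rather than a boolean keeps the rejection explainable, which fits the audit requirement of recording why a proposal was stopped.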
These sources shape the spike boundary:
SourceHarbor implication:
The safest MVP is:
Agent prepares a candidate run or report draft. Human approves before any outward action happens.
Concrete slice:
- poll_feeds
- daily_digest
- notification_retry

This is the product cut that is small enough to be honest. It is more like “suggest and confirm” than “self-driving agent.”
This spike recommends a future contract like:
| Field | Meaning |
|---|---|
| proposal_id | durable ID for one agent suggestion |
| proposal_kind | workflow_run, daily_report_draft, notification_draft |
| action_name | allowlisted workflow or report action to trigger on approval |
| inputs | subscriptions, watchlists, retrieval queries, job IDs, or dates used |
| evidence | citations into jobs, cards, feed items, bundles, or ops inbox entries |
| risk_flags | secret_required, provider_blocked, empty_corpus, degraded_state, runtime_unhealthy |
| approval_state | pending, approved, rejected, expired, executed, cancelled |
| approved_by | operator identity if approval happened |
| approved_at | approval timestamp |
| execution_result | workflow ID, delivery ID, bundle ID, or error payload |
| audit_notes | short operator or system notes for why it was rejected or stopped |
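To make the contract easier to reason about, here is a minimal Python sketch of the fields as a dataclass. The field names come from the table; everything else is an assumption for illustration, including the `Proposal` class name, the Python types, and the transition map (the spike lists the `approval_state` values but does not say which moves between them are legal).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

# Assumption: one plausible reading of which approval_state moves are legal.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"approved", "rejected", "expired", "cancelled"},
    "approved": {"executed", "cancelled"},
    "rejected": set(),
    "expired": set(),
    "executed": set(),
    "cancelled": set(),
}


@dataclass
class Proposal:
    proposal_id: str
    proposal_kind: str  # workflow_run, daily_report_draft, notification_draft
    action_name: str
    inputs: dict[str, Any] = field(default_factory=dict)
    evidence: list[str] = field(default_factory=list)
    risk_flags: list[str] = field(default_factory=list)
    approval_state: str = "pending"
    approved_by: Optional[str] = None
    approved_at: Optional[datetime] = None
    execution_result: Optional[dict[str, Any]] = None
    audit_notes: str = ""

    def transition(self, new_state: str) -> None:
        """Move to new_state, refusing any jump the map does not allow."""
        if new_state not in ALLOWED_TRANSITIONS.get(self.approval_state, set()):
            raise ValueError(f"illegal transition: {self.approval_state} -> {new_state}")
        self.approval_state = new_state

    def approve(self, operator: str, when: datetime) -> None:
        """Record who approved and when; approval alone never executes anything."""
        self.transition("approved")
        self.approved_by = operator
        self.approved_at = when
```

A proposal created this way starts as `pending`, and calling `transition("executed")` before `approve(...)` raises `ValueError`, which is the proposal-first, approval-first ordering expressed as data.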
The first implementation should not start unless these rules are part of the contract:
If this spike becomes implementation work later, the MVP should record:
The simplest honest review surface would be an approval queue plus links to existing jobs, bundles, and ops pages.
This spike should not move to implementation unless rollback is designed up front.
Minimum rollback shape for a future MVP:
If rollback cannot be described in one page, the spike is still too large.
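As a rough illustration of how small a disable/reject control could be, the following Python sketch shows a hypothetical in-memory kill switch. The class name, method, and dictionary keys are assumptions, not repo code. It only stops work that has not run yet; it does not undo a workflow that already executed, and a real version would need durable storage.

```python
from typing import Any


class AutopilotSwitch:
    """Hypothetical in-memory kill switch; a real one would need durable storage."""

    def __init__(self) -> None:
        self.enabled = True

    def disable(self, proposals: list[dict[str, Any]], reason: str) -> int:
        """Turn Autopilot off and cancel every proposal that has not executed yet."""
        self.enabled = False
        cancelled = 0
        for proposal in proposals:
            if proposal.get("approval_state") in ("pending", "approved"):
                proposal["approval_state"] = "cancelled"
                proposal["audit_notes"] = reason  # keep the reason for the audit trail
                cancelled += 1
        return cancelled
```

For example, calling `disable(...)` on one pending, one approved, and one executed proposal would cancel the first two and leave the executed one untouched.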
Stop the Autopilot track if any of these happen:
Do not say:
Do say:
If a later prompt reopens this track, the smallest honest implementation order is:
1. proposal objects

If step 1 or 2 already feels too large, stop there. That still counts as a good spike outcome.
A zero-context executor should start with this checklist:
- apps/mcp/server.py
- apps/mcp/tools/workflows.py
- apps/api/app/routers/workflows.py
- apps/api/app/services/retrieval.py
- apps/api/app/services/notifications.py
- apps/api/app/services/ui_audit.py
- apps/api/app/services/computer_use.py

Go / no-go: Go for a small approval-first spike. No-go for full autonomy.
Best next implementation prompt:
proposal objects