sourceharbor

Start Here

This is the shortest truthful path from clone to visible product value.

If you only want a fast fit check first, go to see-it-fast.md. This page is the operator boot path: boot the stack, run one real job, inspect the result, then stop.

If you want one discoverable repo-local command surface before you memorize individual entrypoints, start here:

./bin/sourceharbor help

That helper stays intentionally thin. The direct bin/* commands below remain the underlying truth.

If you prefer to install the public wrapper first, the packaged CLI now lives in packages/sourceharbor-cli and delegates into this same repo-local substrate when you run it inside a checkout:

npm install --global ./packages/sourceharbor-cli
sourceharbor help
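Because the repo-local ./bin/sourceharbor and the globally installed wrapper coexist inside a checkout, it is worth confirming which entrypoint your shell actually resolves. A minimal check:

```shell
# Show which `sourceharbor` the shell would run, if any.
# `command -v` prints the resolved path; the fallback makes a miss explicit.
command -v sourceharbor || echo "sourceharbor not on PATH yet"
```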

Before You Start

What You Should See By The End

Run Locally: Fastest Result Path

1. Install dependencies

./bin/sourceharbor help
cp .env.example .env
set -a
source .env
set +a
UV_PROJECT_ENVIRONMENT="${UV_PROJECT_ENVIRONMENT:-$SOURCE_HARBOR_CACHE_ROOT/project-venv}" \
  uv sync --frozen --extra dev --extra e2e
bash scripts/ci/prepare_web_runtime.sh >/dev/null

That last command refreshes the repo-managed web runtime workspace under .runtime-cache/tmp/web-runtime/workspace/apps/web. The local web app and the repo-side quality gates both read from that same runtime copy instead of building ad-hoc node state in the repo root.

The default local database path is container-first: it points at the repo-managed Postgres container rather than any host install. This avoids a silent split-brain when your machine already has a host Postgres on 127.0.0.1:5432.
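Whether that risk is live on your machine is easy to check by probing the port directly. The helper below is not a repo command; it is a minimal sketch that shells out to python3's socket module (this guide already assumes python3 is available):

```shell
# Hypothetical pre-flight helper (not a repo command): report whether
# anything is already listening on a local TCP port.
port_in_use() {
  python3 -c 'import socket, sys
s = socket.socket()
s.settimeout(0.5)
try:
    s.connect(("127.0.0.1", int(sys.argv[1])))
    print("in-use")
except OSError:
    print("free")
finally:
    s.close()' "$1"
}

# A host Postgres would make this print "in-use"; the container-first
# default keeps SourceHarbor off that port either way.
port_in_use 5432
```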

2. Bootstrap the local stack

./bin/bootstrap-full-stack
./bin/full-stack up
source .runtime-cache/run/full-stack/resolved.env
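The snapshot is a flat KEY=value file, and sourcing it with allexport is what makes the later curl and port commands in this guide resolvable. A self-contained illustration with a stand-in file (variable names come from this guide; the values are invented):

```shell
# Stand-in for .runtime-cache/run/full-stack/resolved.env (values illustrative).
snapshot="$(mktemp)"
cat > "$snapshot" <<'EOF'
SOURCE_HARBOR_API_BASE_URL=http://127.0.0.1:8010
WEB_PORT=3010
EOF

# allexport: every assignment made while sourcing is exported to child processes.
set -a
. "$snapshot"
set +a

# Later steps build URLs from these instead of hardcoding ports.
echo "${SOURCE_HARBOR_API_BASE_URL}/healthz"
echo "http://127.0.0.1:${WEB_PORT}/ops"
rm -f "$snapshot"
```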

./bin/bootstrap-full-stack now treats the core stack and the reader stack as two different floors in the building: the core stack comes up by default, while the reader stack stays down until you ask for it.

If you explicitly want the reader stack too, opt in on purpose:

./bin/bootstrap-full-stack --with-reader-stack 1 --reader-env-file env/profiles/reader.local.env

Equivalent thin-facade path:

./bin/sourceharbor bootstrap
./bin/sourceharbor full-stack up

Equivalent packaged-CLI path from inside the checkout:

sourceharbor bootstrap
sourceharbor full-stack up

Open the web UI and API base URL recorded in the runtime snapshot you just sourced.

If anything feels off before you continue, run:

./bin/doctor

Or through the thin facade:

./bin/sourceharbor doctor

Why source the runtime snapshot: the curl examples and port checks later in this guide read SOURCE_HARBOR_API_BASE_URL and WEB_PORT from it instead of hardcoding ports.

3. Set the local write token

Direct write endpoints require a local write token.

For local development, use:

export SOURCE_HARBOR_API_KEY="${SOURCE_HARBOR_API_KEY:-sourceharbor-local-dev-token}"

If you start the API outside the repo-managed ./bin/full-stack up path, also export:

export WEB_ACTION_SESSION_TOKEN="${WEB_ACTION_SESSION_TOKEN:-$SOURCE_HARBOR_API_KEY}"

That keeps direct write calls and web server actions on the same local token contract instead of creating a false auth blocker.
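Both exports above use the `${VAR:-default}` form, which only fills in the fallback when the variable is unset or empty, so re-running them never clobbers an operator-supplied token. A minimal illustration with a throwaway variable:

```shell
# Fallback applies when the variable is unset...
unset DEMO_TOKEN
export DEMO_TOKEN="${DEMO_TOKEN:-sourceharbor-local-dev-token}"
echo "$DEMO_TOKEN"

# ...but an existing value always wins over the fallback.
DEMO_TOKEN="operator-supplied-token"
export DEMO_TOKEN="${DEMO_TOKEN:-sourceharbor-local-dev-token}"
echo "$DEMO_TOKEN"
```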

If you stay on the repo-managed ./bin/full-stack up path, the temporary web runtime now writes its own .env.local under .runtime-cache/tmp/web-runtime/workspace/apps/web/ with the resolved NEXT_PUBLIC_API_BASE_URL and the same local write-session fallback. That is how browser-triggered writes such as manual intake stay aligned with the API health route even when local env profiles carry CI=false style flags.

4. Queue a first video job

Replace the sample URL with any public YouTube or Bilibili URL you can access:

curl -sS -X POST "${SOURCE_HARBOR_API_BASE_URL}/api/v1/videos/process" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${SOURCE_HARBOR_API_KEY}" \
  -d '{
    "video": {
      "platform": "youtube",
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    },
    "mode": "full"
  }'
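Once the job is queued, step 5's jobs endpoint is how you watch it finish. The status names below are assumptions, not part of this guide's contract; a hedged helper for deciding when to stop polling:

```shell
# Hypothetical terminal-state check; adjust the status names to whatever
# GET /api/v1/jobs/<job-id> actually reports in your build.
is_terminal() {
  case "$1" in
    completed|failed|cancelled) echo yes ;;
    *) echo no ;;
  esac
}

# Sketch of the loop (not executed here; needs the live stack and jq):
#   status="$(curl -sS "${SOURCE_HARBOR_API_BASE_URL}/api/v1/jobs/${job_id}" | jq -r '.status')"
#   [ "$(is_terminal "$status")" = "yes" ] && break

is_terminal processing
is_terminal completed
```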

What this gives you: one queued full-mode processing job for that video, plus a job id you can feed to the inspection endpoints in the next step.

Current maintainer-local truth for this lane:

5. Inspect the result surfaces

curl -sS "${SOURCE_HARBOR_API_BASE_URL}/api/v1/videos" | jq
curl -sS "${SOURCE_HARBOR_API_BASE_URL}/api/v1/feed/digests" | jq
curl -sS "${SOURCE_HARBOR_API_BASE_URL}/api/v1/jobs/<job-id>" | jq
curl -sS -X POST "${SOURCE_HARBOR_API_BASE_URL}/api/v1/retrieval/search" \
  -H "Content-Type: application/json" \
  -d '{"query":"summary","top_k":5,"mode":"keyword"}' | jq
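Hand-writing JSON inside single quotes breaks as soon as the query itself contains quotes. A small sketch that builds the same search payload programmatically (field names taken from the curl above; the python3 one-liner is just an escaping convenience):

```shell
# Build the retrieval payload from a shell variable without manual escaping.
query='what does the "full" mode include?'
payload="$(python3 -c 'import json, sys; print(json.dumps({"query": sys.argv[1], "top_k": 5, "mode": "keyword"}))' "$query")"
echo "$payload"

# Feed it to the endpoint once the stack is up:
#   curl -sS -X POST "${SOURCE_HARBOR_API_BASE_URL}/api/v1/retrieval/search" \
#     -H "Content-Type: application/json" -d "$payload" | jq
```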

Open these UI views:

Operator Path: Continuous Intake

If you want the longer-lived workflow instead of one-off processing:

  1. Add one or more subscriptions in the web UI or via POST /api/v1/subscriptions
  2. Use the built-in template split honestly:
    • strong-supported YouTube/Bilibili presets when you want the richer video lane
    • generalized RSSHub / generic RSS intake when the source universe widens
    • do not treat generalized intake as route-by-route RSSHub proof
  3. Trigger POST /api/v1/ingest/poll to refresh the Track lane and update the pending pool
  4. Keep the returned run_id so you can inspect GET /api/v1/ingest/runs/<run-id>
  5. Trigger POST /api/v1/ingest/consume when you want the Consume lane to freeze one batch and process queued items
  6. Trigger POST /api/v1/reader/batches/<batch-id>/materialize when you want the batch to become published reader docs immediately
  7. Read the resulting documents in /reader
  8. Inspect /trends when you want the merged-story view over repeated themes
  9. Inspect /briefings when you want the lower-cognitive-load unified story view for one watchlist; the selected story and Ask handoff should now stay on the same server-owned story truth instead of parallel page aliases
  10. Inspect the job page for retries, degradations, and artifact links

That path is what turns SourceHarbor from a one-shot processor into a knowledge intake system.
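The Track, Consume, and Materialize steps above reduce to three POSTs in a fixed order. A dry-run sketch that only prints the request sequence (no network; the batch id stays a placeholder until consume returns a real one):

```shell
# Print the one-cycle request order from Track to published reader docs.
base="${SOURCE_HARBOR_API_BASE_URL:-http://127.0.0.1:8000}"
cycle="POST ${base}/api/v1/ingest/poll
POST ${base}/api/v1/ingest/consume
POST ${base}/api/v1/reader/batches/<batch-id>/materialize"
printf '%s\n' "$cycle"
```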

Local-Only Dedicated Chrome Root

When a local browser flow genuinely depends on login state, SourceHarbor now uses its own isolated Chrome root instead of borrowing your default Chrome user data directory.

Think of it like moving from “sharing a desk in the public lobby” to “having one dedicated studio for this repo”.

One-time bootstrap:

./bin/bootstrap-repo-chrome --json

That command copies only the minimum login-relevant profile state from your default Chrome user data directory into the repo-owned target.

After bootstrap, start exactly one repo-owned Chrome instance:

./bin/start-repo-chrome --json
./bin/open-repo-chrome-tabs --site-set login-strong-check --json
python3 scripts/runtime/resolve_chrome_profile.py --mode repo-runtime --json

From then on, local automation attaches to that single instance over CDP; it never launches a second Chrome instance against the same root.

If you want to reset the repo-owned Chrome session before a fresh manual login check, use:

./bin/stop-repo-chrome --json
./bin/start-repo-chrome --json
./bin/open-repo-chrome-tabs --site-set login-strong-check --json

Hosted CI stays login-free on purpose.

Minimum Verification

These are the smallest checks that support the local supervisor story:

source .runtime-cache/run/full-stack/resolved.env
./bin/full-stack status
curl -sS "${SOURCE_HARBOR_API_BASE_URL}/healthz"
curl -I "http://127.0.0.1:${WEB_PORT}/ops"
python3 scripts/governance/check_env_contract.py --strict
python3 scripts/governance/check_host_safety_contract.py
python3 scripts/governance/check_test_assertions.py
./bin/doctor
eval "$(bash scripts/ci/prepare_web_runtime.sh --shell-exports)"
( cd "$WEB_RUNTIME_WEB_DIR" && npm run lint )

That list now also includes a host-safety fence, scripts/governance/check_host_safety_contract.py.

Optional Long Live Smoke Lane

When you intentionally want the stricter live lane, run:

./bin/smoke-full-stack --offline-fallback 0

That command is not the same thing as the local supervisor proof above. It continues into external provider checks and can still stop on current YouTube/Resend/Gemini-side gates even after bootstrap -> up -> status -> doctor is already healthy.

Boundaries

For the explicit evidence ladder, go to proof.md. For the storage/runtime truth split, read runtime-truth.md.