This is the shortest truthful path from clone to visible product value.
If you only want a fast fit check first, go to see-it-fast.md. This page is the operator boot path: boot the stack, run one real job, inspect the result, then stop.
If you want one discoverable repo-local command surface before you memorize individual entrypoints, start here:
```shell
./bin/sourceharbor help
```
That helper stays intentionally thin; the direct `bin/*` commands below remain the underlying truth.
If you prefer to install the public wrapper first, the packaged CLI now lives in `packages/sourceharbor-cli` and delegates into this same repo-local substrate when you run it inside a checkout:

```shell
npm install --global ./packages/sourceharbor-cli
sourceharbor help
```
Two scope notes before you start:

- Treat `ghcr.io/xiaojiou176-open/sourceharbor-api` as a separate builder surface, not the whole product stack.
- Do not treat `ghcr.io/xiaojiou176-open/sourceharbor-ci-standard` as the install story. That image is CI/devcontainer infrastructure.

The resolved runtime truth lives in `.runtime-cache/run/full-stack/resolved.env`, with the canonical default local health URL remaining `http://127.0.0.1:9000/healthz` only when that port is still free.

Now set up the environment:
```shell
cp .env.example .env
set -a
source .env
set +a
```
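The `set -a` / `set +a` bracket is what makes the sourced `.env` values visible to child processes, not just the current shell. A throwaway demo of that behavior (the `DEMO_VAR` name and temp file are illustrative only):

```shell
# Create a throwaway env file standing in for .env.
tmp="$(mktemp)"
echo 'DEMO_VAR=hello' > "$tmp"

# With allexport on, every variable assigned while sourcing is exported.
set -a
. "$tmp"
set +a
rm -f "$tmp"

# A child process can see DEMO_VAR only because allexport exported it.
sh -c 'echo "child sees: $DEMO_VAR"'
```

Without the `set -a` bracket, the variables would exist in your shell but subprocesses such as `uv` or the web runtime would not inherit them.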
```shell
UV_PROJECT_ENVIRONMENT="${UV_PROJECT_ENVIRONMENT:-$SOURCE_HARBOR_CACHE_ROOT/project-venv}" \
  uv sync --frozen --extra dev --extra e2e
bash scripts/ci/prepare_web_runtime.sh >/dev/null
```
That last command refreshes the repo-managed web runtime workspace under
.runtime-cache/tmp/web-runtime/workspace/apps/web. The local web app and the
repo-side quality gates both read from that same runtime copy instead of
building ad-hoc node state in the repo root.
The default local database path is container-first:

```shell
CORE_POSTGRES_PORT=15432
DATABASE_URL=postgresql+psycopg://postgres:postgres@127.0.0.1:${CORE_POSTGRES_PORT}/sourceharbor
```

This avoids a silent split-brain when your machine already has a host Postgres on 127.0.0.1:5432.
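A minimal sketch of how the two values stay coupled: derive `DATABASE_URL` from `CORE_POSTGRES_PORT` the same way the default above does, so the port can never silently drift between the two settings (the echo is just for inspection):

```shell
# Fall back to the container-first default port when unset.
CORE_POSTGRES_PORT="${CORE_POSTGRES_PORT:-15432}"

# Derive the URL from the port so the two can never drift apart.
DATABASE_URL="postgresql+psycopg://postgres:postgres@127.0.0.1:${CORE_POSTGRES_PORT}/sourceharbor"
echo "$DATABASE_URL"
```

If you override `CORE_POSTGRES_PORT`, rebuild `DATABASE_URL` the same way rather than editing the URL by hand.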
```shell
./bin/bootstrap-full-stack
./bin/full-stack up
source .runtime-cache/run/full-stack/resolved.env
```
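If `./bin/full-stack up` has not run yet, sourcing the snapshot just errors. A guarded variant like this sketch gives a clearer next step (the `status` variable and message wording are mine):

```shell
# Path of the runtime snapshot written by ./bin/full-stack up.
SNAPSHOT=".runtime-cache/run/full-stack/resolved.env"

if [ -f "$SNAPSHOT" ]; then
  # shellcheck disable=SC1090
  . "$SNAPSHOT"
  status="loaded"
else
  status="missing"
  echo "no $SNAPSHOT yet -- run ./bin/full-stack up first"
fi
```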
`./bin/bootstrap-full-stack` now treats the core stack and the reader stack as two different floors in the building, and it can fall back to a container-free runtime under `.runtime-cache/` when Docker is unavailable and local `postgres` / `initdb` / `pg_ctl` / `temporal` binaries exist.

One more current truth detail matters now: `./bin/full-stack up` can self-heal the core stack when Temporal is down. If worker preflight sees `127.0.0.1:7233` unreachable, it now attempts the repo-owned `core_services.sh up` path before failing the whole local startup.

If you explicitly want the reader stack too, opt in on purpose:
```shell
./bin/bootstrap-full-stack --with-reader-stack 1 --reader-env-file env/profiles/reader.local.env
```
Equivalent thin-facade path:
```shell
./bin/sourceharbor bootstrap
./bin/sourceharbor full-stack up
```
Equivalent packaged-CLI path from inside the checkout:
```shell
sourceharbor bootstrap
sourceharbor full-stack up
```
Open:

- `http://127.0.0.1:${WEB_PORT}`
- `${SOURCE_HARBOR_API_BASE_URL}/healthz` (the canonical default is `http://127.0.0.1:9000/healthz`)

If anything feels off before you continue, run:
```shell
./bin/doctor
```
Or through the thin facade:
```shell
./bin/sourceharbor doctor
```
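If the API is still warming up, a bounded retry loop is friendlier than hammering reload. A sketch, assuming only that `/healthz` answers plain HTTP once the API is ready (`wait_healthy` and the attempt budget are illustrative, not repo commands):

```shell
# Poll a health URL until it answers, with a bounded number of attempts.
wait_healthy() {
  url="$1"; attempts="${2:-5}"; i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not healthy after $attempts attempt(s)"
  return 1
}

wait_healthy "${SOURCE_HARBOR_API_BASE_URL:-http://127.0.0.1:9000}/healthz" 3 || true
```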
Why source the runtime snapshot: the resolved ports can move off 9000/3000 when those ports are already occupied. If the snapshot and the live stack ever disagree, run `./bin/full-stack down` and restart the clean path.

Direct write endpoints require a local write token.
For local development, use:
```shell
export SOURCE_HARBOR_API_KEY="${SOURCE_HARBOR_API_KEY:-sourceharbor-local-dev-token}"
```
If you start the API outside the repo-managed `./bin/full-stack up` path, also export:

```shell
export WEB_ACTION_SESSION_TOKEN="${WEB_ACTION_SESSION_TOKEN:-$SOURCE_HARBOR_API_KEY}"
```
That keeps direct write calls and web server actions on the same local token contract instead of creating a false auth blocker.
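With both fallbacks in place, the two tokens resolve to the same value unless you deliberately override one. A quick self-check sketch:

```shell
# Same fallbacks as above: write token first, session token mirrors it.
export SOURCE_HARBOR_API_KEY="${SOURCE_HARBOR_API_KEY:-sourceharbor-local-dev-token}"
export WEB_ACTION_SESSION_TOKEN="${WEB_ACTION_SESSION_TOKEN:-$SOURCE_HARBOR_API_KEY}"

# If these diverge, direct writes and web server actions sit on
# different auth contracts -- surface that early.
if [ "$SOURCE_HARBOR_API_KEY" = "$WEB_ACTION_SESSION_TOKEN" ]; then
  echo "tokens aligned"
else
  echo "tokens diverged" >&2
fi
```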
If you stay on the repo-managed ./bin/full-stack up path, the temporary web
runtime now writes its own .env.local under
.runtime-cache/tmp/web-runtime/workspace/apps/web/ with the resolved
NEXT_PUBLIC_API_BASE_URL and the same local write-session fallback. That is
how browser-triggered writes such as manual intake stay aligned with the API
health route even when local env profiles carry CI=false style flags.
Replace the sample URL with any public YouTube or Bilibili URL you can access:
```shell
curl -sS -X POST "${SOURCE_HARBOR_API_BASE_URL}/api/v1/videos/process" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${SOURCE_HARBOR_API_KEY}" \
  -d '{
    "video": {
      "platform": "youtube",
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    },
    "mode": "full"
  }'
```
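The response carries a `job_id` you can capture for the inspection calls below. A sketch using a canned sample body, since the exact response shape beyond `job_id` is not documented here:

```shell
# Sample response body standing in for the real API reply; only the
# job_id field is assumed from the docs.
response='{"job_id":"job-123","status":"queued"}'

# Pull the id out with jq for use in follow-up calls.
job_id="$(printf '%s' "$response" | jq -r '.job_id')"
echo "jobs endpoint: /api/v1/jobs/$job_id"
```

In a live session you would pipe the real `curl` response into the same `jq -r '.job_id'` instead of the canned string.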
What this gives you: a `job_id` you can poll.

Current maintainer-local truth for this lane:

- a `mode=full` YouTube run can now complete end-to-end again (with the `full-stack up` Temporal self-heal, currently against `gemini-3-flash-preview`)
- `.webm` inputs do not stall forever in `FileState.PROCESSING`

Inspect the result:

```shell
curl -sS "${SOURCE_HARBOR_API_BASE_URL}/api/v1/videos" | jq
curl -sS "${SOURCE_HARBOR_API_BASE_URL}/api/v1/feed/digests" | jq
curl -sS "${SOURCE_HARBOR_API_BASE_URL}/api/v1/jobs/<job-id>" | jq
curl -sS -X POST "${SOURCE_HARBOR_API_BASE_URL}/api/v1/retrieval/search" \
  -H "Content-Type: application/json" \
  -d '{"query":"summary","top_k":5,"mode":"keyword"}' | jq
```
Open these UI views:
- `/` for the command center
- `/ops` for operator diagnostics and live-hardening gates
- `/subscriptions` for strong-supported video templates plus generalized RSSHub/RSS intake, now staged as a tracked-universe atlas and manual-intake workbench backed by the same template catalog contract that the API and MCP surfaces expose, including one-off video/article URLs that can enter today without first becoming recurring subscriptions
- `/search` for grounded search across SourceHarbor artifacts
- `/ask` for the story-aware, briefing-backed Ask front door, with a server-owned story page payload over the answer/change/evidence view
- `/reader` for the published-doc frontstage: merged reader docs, singleton polish docs, navigation brief, yellow warning, and source contribution drawer
- `/feed` for the digest reading flow, with source identity cards, affiliation cues, and a direct bridge into the current reader edition whenever that digest already materialized into a published reader document
- `/jobs?job_id=<job-id>` for pipeline trace and artifacts
- `/watchlists` for long-lived tracking objects
- `/trends` for merged stories plus recent evidence runs
- `/briefings` for the summary-first watchlist briefing: current story, then changes, then evidence drill-down, with one canonical selected-story payload and Ask handoff owned by the server
- `/mcp` for the MCP front door and quickstart
- `/settings` for notifications and test sends

If you want the longer-lived workflow instead of one-off processing:
- `POST /api/v1/subscriptions`
- `POST /api/v1/ingest/poll` to refresh the Track lane and update the pending pool; the response carries a `run_id` so you can inspect `GET /api/v1/ingest/runs/<run-id>`
- `POST /api/v1/ingest/consume` when you want the Consume lane to freeze one batch and process queued items
- `POST /api/v1/reader/batches/<batch-id>/materialize` when you want the batch to become published reader docs immediately
- `/reader` and `/trends` when you want the merged-story view over repeated themes
- `/briefings` when you want the lower-cognitive-load unified story view for one watchlist; the selected story and Ask handoff should now stay on the same server-owned story truth instead of parallel page aliases

That path is what turns SourceHarbor from a one-shot processor into a knowledge intake system.
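Under the assumption that the poll response carries the `run_id` mentioned above, one Track-then-Consume cycle could be scripted like this sketch (`ingest_cycle` is a name made up here; the function is only defined, not run):

```shell
api="${SOURCE_HARBOR_API_BASE_URL:-http://127.0.0.1:9000}"
key="${SOURCE_HARBOR_API_KEY:-sourceharbor-local-dev-token}"

# One Track -> inspect -> Consume pass against the endpoints above.
ingest_cycle() {
  run_id="$(curl -sS -X POST -H "X-API-Key: $key" "$api/api/v1/ingest/poll" | jq -r '.run_id')" || return 1
  curl -sS "$api/api/v1/ingest/runs/$run_id" | jq .
  curl -sS -X POST -H "X-API-Key: $key" "$api/api/v1/ingest/consume" | jq .
}

echo "ingest_cycle defined against $api"
```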
When a local browser flow genuinely depends on login state, SourceHarbor now uses its own isolated Chrome root instead of borrowing your default Chrome user data directory.
Think of it like moving from “sharing a desk in the public lobby” to “having one dedicated studio for this repo”.
One-time bootstrap:
```shell
./bin/bootstrap-repo-chrome --json
```
That command copies only:
- `Local State`
- the `sourceharbor` profile directory

into the repo-owned target:

- `${SOURCE_HARBOR_CHROME_USER_DATA_DIR}/Local State`
- `${SOURCE_HARBOR_CHROME_USER_DATA_DIR}/${SOURCE_HARBOR_CHROME_PROFILE_DIR}/`

After bootstrap, start exactly one repo-owned Chrome instance:
```shell
./bin/start-repo-chrome --json
./bin/open-repo-chrome-tabs --site-set login-strong-check --json
python3 scripts/runtime/resolve_chrome_profile.py --mode repo-runtime --json
```
From then on, local automation attaches to that single instance over CDP. It does not second-launch Chrome against the same root.
If you want to reset the repo-owned Chrome session before a fresh manual login check, use:
```shell
./bin/stop-repo-chrome --json
./bin/start-repo-chrome --json
./bin/open-repo-chrome-tabs --site-set login-strong-check --json
```
Hosted CI stays login-free on purpose: the `SOURCE_HARBOR_CHROME_*` surface stays local-only.

These are the smallest checks that support the local supervisor story:
```shell
source .runtime-cache/run/full-stack/resolved.env
./bin/full-stack status
curl -sS "${SOURCE_HARBOR_API_BASE_URL}/healthz"
curl -I "http://127.0.0.1:${WEB_PORT}/ops"
python3 scripts/governance/check_env_contract.py --strict
python3 scripts/governance/check_host_safety_contract.py
python3 scripts/governance/check_test_assertions.py
./bin/doctor
eval "$(bash scripts/ci/prepare_web_runtime.sh --shell-exports)"
( cd "$WEB_RUNTIME_WEB_DIR" && npm run lint )
```
That list now also includes a host-safety fence: `check_host_safety_contract.py` blocks broad host-control primitives such as `pkill`, `killall`, shell `kill -9`, and desktop-global AppleScript paths.

When you intentionally want the stricter live lane, run:
```shell
./bin/smoke-full-stack --offline-fallback 0
```
That command is not the same thing as the local supervisor proof above. It
continues into external provider checks and can still stop on current
YouTube/Resend/Gemini-side gates even after bootstrap -> up -> status ->
doctor is already healthy.
For the explicit evidence ladder, go to proof.md. For the storage/runtime truth split, read runtime-truth.md.