Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -247,8 +247,10 @@ ONBOARDING_TALLY_WEBHOOK_SIGNING_SECRET=
ONBOARDING_TALLY_ALLOWED_FORM_IDS=
# Resume files are untrusted. Keep scanning required in production and configure
# a scanner command; use "{path}" where the downloaded resume path should go.
# Compose deployments provide a ClamAV sidecar command by default. Uncomment
# and set this only when intentionally overriding or disabling that command.
INTAKE_RESUME_REQUIRE_VIRUS_SCAN=false
INTAKE_RESUME_VIRUS_SCAN_COMMAND=
# INTAKE_RESUME_VIRUS_SCAN_COMMAND=
INTAKE_RESUME_VIRUS_SCAN_TIMEOUT_SECONDS=30.0
EMAIL_RESUME_INTAKE_ENABLED=false
EMAIL_RESUME_ALLOWED_EXTENSIONS=pdf,doc,docx
Expand All @@ -271,7 +273,7 @@ EMAIL_REQUIRE_SENDER_AUTH_HEADERS=true
# KEILA_API_KEY=
# KEILA_API_BASE_URL=https://app.keila.io
# KEILA_API_TIMEOUT_SECONDS=20.0
# NEWSLETTER_SYNC_ENABLED=true
# NEWSLETTER_SYNC_ENABLED=false
# NEWSLETTER_SYNC_INTERVAL_SECONDS=604800
# NEWSLETTER_SYNC_EXCLUDED_MAILBOXES=system,service-account

Expand Down
20 changes: 20 additions & 0 deletions .github/workflows/docker-smoke.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
name: Docker Smoke

on:
merge_group:
pull_request:
types: [labeled, synchronize]
workflow_dispatch:
Comment on lines +3 to +7

permissions:
contents: read

jobs:
docker-smoke:
if: github.event_name != 'pull_request' || contains(github.event.pull_request.labels.*.name, 'docker-smoke')
runs-on: ubuntu-latest
Comment thread
coderabbitai[bot] marked this conversation as resolved.
steps:
- uses: actions/checkout@v6

- name: Run Docker smoke check
run: ./scripts/docker-smoke.sh
Comment thread
github-advanced-security[bot] marked this conversation as resolved.
Fixed
5 changes: 5 additions & 0 deletions DEVELOPMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@ Use Compose when you need container parity with deployment:
./scripts/docker-compose.sh up --build
./scripts/docker-compose.sh down
./scripts/docker-compose.sh print-ports
./scripts/docker-smoke.sh
```

`compose.yaml` is the canonical Coolify/base stack. `compose.local.yaml` adds
Expand All @@ -163,6 +164,10 @@ docker network create 508-infra
BuildKit-capable Docker / `docker compose build` support is required because the
service Dockerfiles use BuildKit cache mounts.

Use `./scripts/docker-smoke.sh` when you need a deployment-style startup check.
It builds the API image, starts an isolated Redis/Postgres/web stack with only
the core env vars, and verifies `GET /health` reaches a healthy response.

## Testing And Quality

```bash
Expand Down
24 changes: 16 additions & 8 deletions ENVIRONMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,17 +8,21 @@ Postgres `+1`, Compose web `+2`, MinIO API `+3`, MinIO console `+4`, host-run
web/API `+5`, and bot health `+6`. Explicit service port overrides keep their
current precedence rules.

## Required
## Required For A Healthy Non-Local Runtime

- `ESPO_BASE_URL`
- `ESPO_API_KEY`
- `API_SHARED_SECRET` (required for protected endpoints)
- `MINIO_ROOT_PASSWORD` (required in non-local environments)
- `POSTGRES_URL` (required for database-backed API and worker health)
- `MINIO_ROOT_PASSWORD` (required for internal transfer storage)
- `DISCORD_BOT_TOKEN` (Discord bot runtime)

The app avoids eager settings-construction failures where possible so failed
deployments can still expose logs and health responses. Missing runtime
dependencies should surface as degraded health or route/job failures rather than
Pydantic import errors.

## Core Runtime (Bot + Worker)

- `Optional` (non-local): `ENVIRONMENT` (default: `local`; non-local environments must set explicit `POSTGRES_URL` and `MINIO_ROOT_PASSWORD`)
- `Optional` (non-local): `ENVIRONMENT` (default: `local`; non-local environments should set explicit `POSTGRES_URL` and `MINIO_ROOT_PASSWORD`)
- `Optional`: `SENTRY_DSN` (default: unset; set to enable Sentry event capture)
- `Optional`: `SENTRY_SEND_DEFAULT_PII` (default: `false`)
- `Optional`: `SENTRY_DEBUG` (default: `false`)
Expand All @@ -40,7 +44,7 @@ current precedence rules.

## Postgres + Compose Exposure

- `Optional`: `POSTGRES_URL` (default: `postgresql://postgres:postgres@127.0.0.1:5432/workflows`; `./scripts/dev.sh` overrides it to a deterministic per-worktree localhost port, Compose injects a Docker-network URL)
- `Required for healthy non-local runtime`: `POSTGRES_URL` (local default: `postgresql://postgres:postgres@127.0.0.1:5432/workflows`; `./scripts/dev.sh` overrides it to a deterministic per-worktree localhost port, Compose injects a Docker-network URL)
- `Optional` (Compose DB container): `POSTGRES_DB` (default: `workflows`)
- `Optional` (Compose DB container): `POSTGRES_USER` (default: `postgres`)
- `Optional` (Compose DB container): `POSTGRES_PASSWORD` (default: `postgres`)
Expand Down Expand Up @@ -106,7 +110,8 @@ current precedence rules.

## Worker CRM Sync + Skills Extraction

- `Optional`: `CRM_SYNC_ENABLED` (default: `true`)
- `Optional`: `CRM_SYNC_ENABLED` (default: `true`; scheduler starts only when `ESPO_BASE_URL` and `ESPO_API_KEY` are configured)
- `Required for CRM-backed jobs and sync`: `ESPO_BASE_URL`, `ESPO_API_KEY`
- `Optional`: `CRM_SYNC_INTERVAL_SECONDS` (default: `900`)
- `Optional`: `CRM_SYNC_PAGE_SIZE` (default: `200`)
- `Optional`: `CHECK_EMAIL_WAIT` (default: `2`; minutes between mailbox polls)
Expand All @@ -127,6 +132,9 @@ current precedence rules.
- `Optional`: `INTAKE_RESUME_FETCH_TIMEOUT_SECONDS` (default: `20.0`; timeout for intake resume URL downloads)
- `Optional`: `INTAKE_RESUME_MAX_REDIRECTS` (default: `3`; max redirects followed for intake resume URL downloads)
- `Optional`: `INTAKE_RESUME_ALLOWED_HOSTS` (default: empty; optional comma-separated host allowlist for intake resume URL downloads)
- `Optional`: `INTAKE_RESUME_REQUIRE_VIRUS_SCAN` (default: `false` locally; resume parsing requires scanning automatically outside local/dev/test)
- `Required for non-local resume parsing`: `INTAKE_RESUME_VIRUS_SCAN_COMMAND` (Compose default: `clamdscan --stream --no-summary --config-file=/etc/clamav/clamdscan.conf {path}` against the ClamAV sidecar)
- `Optional`: `INTAKE_RESUME_VIRUS_SCAN_TIMEOUT_SECONDS` (default: `30.0`; timeout for the configured scan command)
- `Optional`: `EMAIL_RESUME_INTAKE_ENABLED` (default: `false`; enables worker-side mailbox resume processing loop)
- `Optional`: `EMAIL_RESUME_ALLOWED_EXTENSIONS` (default: `pdf,doc,docx`)
- `Optional`: `EMAIL_RESUME_MAX_FILE_SIZE_MB` (default: `10`)
Expand Down Expand Up @@ -182,7 +190,7 @@ current precedence rules.
- `Optional for Keila contact sync`: `KEILA_API_KEY`
- `Optional`: `KEILA_API_BASE_URL` (default: `https://app.keila.io`)
- `Optional`: `KEILA_API_TIMEOUT_SECONDS` (default: `20.0`)
- `Optional`: `NEWSLETTER_SYNC_ENABLED` (default: `true`)
- `Optional`: `NEWSLETTER_SYNC_ENABLED` (default: `false`)
- `Optional`: `NEWSLETTER_SYNC_INTERVAL_SECONDS` (default: `604800`, one week)
- `Optional`: `NEWSLETTER_SYNC_EXCLUDED_MAILBOXES` (comma-separated mailbox local-parts or full addresses to skip during Migadu resync)
- Note: mailbox and backup email subscription to configured newsletter tools is best effort. Failures are reported as warnings and do not block mailbox or account creation.
Expand Down
5 changes: 5 additions & 0 deletions apps/api/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,11 @@ COPY --from=uv /uv /uvx /usr/local/bin/
WORKDIR /app
ENV UV_LINK_MODE=copy

RUN apt-get update \
&& apt-get install -y --no-install-recommends clamdscan \
&& rm -rf /var/lib/apt/lists/* \
&& printf 'TCPSocket 3310\nTCPAddr clamav\n' >/etc/clamav/clamdscan.conf

COPY pyproject.toml uv.lock ./
COPY apps/discord_bot/pyproject.toml apps/discord_bot/pyproject.toml
COPY apps/worker/pyproject.toml apps/worker/pyproject.toml
Expand Down
61 changes: 52 additions & 9 deletions apps/api/src/five08/backend/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -971,6 +971,18 @@ async def _crm_sync_scheduler(app: FastAPI) -> None:
await asyncio.sleep(interval_seconds)


def _should_start_crm_sync_scheduler() -> bool:
return _crm_sync_scheduler_skip_reason() is None


def _crm_sync_scheduler_skip_reason() -> str | None:
if not settings.crm_sync_enabled:
return "disabled"
if settings.espo_configured:
return None
return "missing_espo"


async def _newsletter_sync_scheduler(app: FastAPI) -> None:
queue = app.state.queue
interval_seconds = max(60, settings.newsletter_sync_interval_seconds)
Expand Down Expand Up @@ -1015,7 +1027,9 @@ async def _email_resume_scheduler() -> None:
await asyncio.sleep(interval_seconds)


def _check_postgres_connection(connection: Connection) -> bool:
def _check_postgres_connection(connection: Connection | None) -> bool:
if connection is None:
return False
try:
with connection.cursor() as cursor:
Comment on lines +1030 to 1034
cursor.execute("SELECT 1")
Expand All @@ -1032,8 +1046,9 @@ async def _is_postgres_connection_healthy(app: FastAPI) -> bool:
if healthy:
return True

with contextlib.suppress(Exception):
await asyncio.to_thread(connection.close)
if connection is not None:
with contextlib.suppress(Exception):
await asyncio.to_thread(connection.close)

try:
refreshed = await asyncio.to_thread(get_postgres_connection, settings)
Expand Down Expand Up @@ -3424,18 +3439,30 @@ async def health_handler(request: Request) -> JSONResponse:
except Exception:
redis_ok = False

postgres_migrations_ok = getattr(request.app.state, "postgres_migrations_ok", True)
if hasattr(request.app.state, "postgres_conn"):
postgres_ok = await _is_postgres_connection_healthy(request.app)
Comment on lines +3442 to 3444
else:
postgres_ok = await asyncio.to_thread(is_postgres_healthy, settings)

intake_resume_scan_required = settings.effective_intake_resume_require_virus_scan
intake_resume_scan_configured = settings.intake_resume_virus_scan_configured
healthy = (
redis_ok
and postgres_ok
and postgres_migrations_ok
and intake_resume_scan_configured
)
payload = {
"status": "healthy" if redis_ok and postgres_ok else "degraded",
"status": "healthy" if healthy else "degraded",
"redis_connected": redis_ok,
"postgres_connected": postgres_ok,
"postgres_migrations_ok": postgres_migrations_ok,
"intake_resume_scan_required": intake_resume_scan_required,
"intake_resume_scan_configured": intake_resume_scan_configured,
"queue_name": settings.redis_queue_name,
}
return JSONResponse(payload, status_code=200 if redis_ok and postgres_ok else 503)
return JSONResponse(payload, status_code=200 if healthy else 503)


async def ingest_handler(request: Request, source: str) -> JSONResponse:
Expand Down Expand Up @@ -9080,20 +9107,36 @@ async def auth_discord_link_consume_handler(

@asynccontextmanager
async def _lifespan(app: FastAPI) -> Any:
await asyncio.to_thread(run_job_migrations)

redis_conn = get_redis_connection(settings)
app.state.redis_conn = redis_conn
app.state.postgres_conn_lock = asyncio.Lock()
app.state.postgres_conn = await asyncio.to_thread(get_postgres_connection, settings)
app.state.postgres_migrations_ok = True
try:
await asyncio.to_thread(run_job_migrations)
except Exception:
logger.exception("Failed to run job migrations during startup")
Comment on lines +9114 to +9117

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Don't mark the API healthy after skipped migrations

When run_job_migrations() fails here, startup now continues and the only later Postgres health check is SELECT 1. In a deploy where the migration fails because the DB is temporarily unavailable, permissions are wrong, or a migration SQL error occurs, the service can later return /health 200 once a bare connection succeeds even though Alembic never reran and required tables/columns may be missing, causing API routes and queued jobs to fail against a stale schema. Preserve the migration failure in health or retry migrations before allowing Postgres to be considered healthy.

Useful? React with 👍 / 👎.

Comment thread
coderabbitai[bot] marked this conversation as resolved.
app.state.postgres_migrations_ok = False
try:
app.state.postgres_conn = await asyncio.to_thread(
get_postgres_connection,
settings,
)
except Exception:
logger.exception("Failed to open startup Postgres connection")
app.state.postgres_conn = None
app.state.queue = build_queue_client()
app.state.auth_store = RedisAuthStore(redis_conn)
app.state.oidc_client = OIDCProviderClient(settings)
app.state.discord_admin_verifier = DiscordAdminVerifier(settings)
app.state.http_client = httpx.AsyncClient(follow_redirects=False)

if settings.crm_sync_enabled:
crm_sync_skip_reason = _crm_sync_scheduler_skip_reason()
if crm_sync_skip_reason is None:
app.state.crm_sync_task = asyncio.create_task(_crm_sync_scheduler(app))
elif crm_sync_skip_reason == "missing_espo":
logger.warning(
"CRM sync scheduler enabled but ESPO_BASE_URL/ESPO_API_KEY are not configured; skipping scheduler startup"
)
else:
logger.info("CRM sync scheduler disabled by config")

Expand Down
2 changes: 1 addition & 1 deletion apps/discord_bot/src/five08/discord_bot/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ class Settings(SharedSettings):
Required settings must be provided via environment variables or .env file.
"""

discord_bot_token: str
discord_bot_token: str = ""

discord_admin_roles: str = "Admin,Owner"
discord_default_job_forum_channels: str = "gigs:part_time,fulltime-roles:full_time"
Expand Down
6 changes: 5 additions & 1 deletion apps/discord_bot/src/five08/discord_bot/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,10 +22,14 @@

async def main() -> None:
"""Main entry point for the bot."""
token = settings.discord_bot_token.strip()
if not token:
raise RuntimeError("DISCORD_BOT_TOKEN is required to start the Discord bot.")

bot = create_bot()

try:
await bot.start(settings.discord_bot_token)
await bot.start(token)
except KeyboardInterrupt:
logger.info("Bot stopped by user")
finally:
Expand Down
5 changes: 5 additions & 0 deletions apps/worker/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,11 @@ COPY --from=uv /uv /uvx /usr/local/bin/
WORKDIR /app
ENV UV_LINK_MODE=copy

RUN apt-get update \
&& apt-get install -y --no-install-recommends clamdscan \
&& rm -rf /var/lib/apt/lists/* \
&& printf 'TCPSocket 3310\nTCPAddr clamav\n' >/etc/clamav/clamdscan.conf

COPY pyproject.toml uv.lock ./
COPY apps/discord_bot/pyproject.toml apps/discord_bot/pyproject.toml
COPY apps/worker/pyproject.toml apps/worker/pyproject.toml
Expand Down
36 changes: 18 additions & 18 deletions apps/worker/src/five08/worker/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -139,18 +139,25 @@ def _strip_optional_runtime_config_string(cls, value: object) -> object:
return value.strip()
return value

@model_validator(mode="after")
def validate_required_crm_settings(self) -> "WorkerSettings":
"""Require EspoCRM settings outside local/test runtime environments."""
@property
def espo_configured(self) -> bool:
"""Return true when EspoCRM credentials are available."""
return bool(self.espo_base_url.strip() and self.espo_api_key.strip())

@property
def effective_intake_resume_require_virus_scan(self) -> bool:
"""Return whether resume parsing must be preceded by malware scanning."""
env = self.environment.strip().lower()
if env in {"local", "dev", "development", "test"}:
return self
if not self.espo_base_url or not self.espo_api_key:
raise ValueError(
"ESPO_BASE_URL and ESPO_API_KEY must be set when ENVIRONMENT "
"is non-local."
)
return self
if env not in {"local", "dev", "development", "test"}:
return True
Comment on lines +151 to +152

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require the scan command when scanning is effective

Fresh evidence after this revision is that non-local environments now force effective_intake_resume_require_virus_scan to true, but the validator below still only requires INTAKE_RESUME_VIRUS_SCAN_COMMAND when the raw flag is true. In production with both values left at their defaults, an intake with a resume_url constructs settings successfully, _scan_resume_content() returns false because the command is blank, and _prepare_resume_file() returns None; the create/update path then proceeds and persists a successful intake with resume_uploaded: false, silently dropping resume upload and extraction instead of surfacing the missing scanner as a startup/health/config failure.

Useful? React with 👍 / 👎.

return self.intake_resume_require_virus_scan

@property
def intake_resume_virus_scan_configured(self) -> bool:
"""Return whether the effective resume scan requirement has a command."""
if not self.effective_intake_resume_require_virus_scan:
return True
return bool((self.intake_resume_virus_scan_command or "").strip())

@property
def google_forms_allowed_form_ids_set(self) -> set[str]:
Expand Down Expand Up @@ -193,13 +200,6 @@ def validate_email_resume_intake_settings(self) -> "WorkerSettings":
@model_validator(mode="after")
def validate_intake_resume_scan_settings(self) -> "WorkerSettings":
"""Require a scanner command when intake resume scanning is enabled."""
env = self.environment.strip().lower()
if env not in {"local", "dev", "development", "test"} and not (
self.intake_resume_require_virus_scan
):
raise ValueError(
"INTAKE_RESUME_REQUIRE_VIRUS_SCAN must be true when ENVIRONMENT is non-local"
)
if (
self.intake_resume_require_virus_scan
and not (self.intake_resume_virus_scan_command or "").strip()
Comment on lines 203 to 205

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore production resume scan enforcement

In production with intake enabled and INTAKE_RESUME_REQUIRE_VIRUS_SCAN left unset/false, this validator now accepts the config, so IntakeFormProcessor._scan_resume_content() returns True immediately and document_processor.extract_text() parses the downloaded resume without a malware scan. This regresses the explicit AGENTS.md guidance: "Resume files are untrusted. Keep scanning required in production and configure a scanner command"; keep the non-local requirement or fail/degrade health instead of allowing production to process unscanned resumes.

Useful? React with 👍 / 👎.

Expand Down
Loading