Sync, analysis, and report outputs

How to pull new SORNs, executive orders, and OMB/OPM external documents into FederalRegister, and which analyses and output reports you can then generate.

Scope

Operational detail lives in the FederalRegister repository (`scripts/` and `src/`). This page is a reader-facing map; confirm flags and paths in that repo before you run jobs in production.

Pull new data (sync)

Run these from the FederalRegister project root with the Python environment and `DATABASE_URL` configured (see that repo’s `README.md` and `.env.example`).

Federal Register SORNs and agencies

Step	Command	Purpose
Agencies	`python scripts/run_agency_sync.py`	Refresh Federal Register agency metadata.
Documents	`python scripts/run_sync.py`	Ingest new/updated SORN (and related FR) documents per your `FR_SEARCH_TERM` / API settings.

AI analysis and body-text reporting need each document's full text. Backfill it when needed with `python scripts/run_backfill_fulltext.py` (options such as `--until-complete` are documented in that repo).
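The sync steps above can be chained from a small driver. The sketch below is a hypothetical helper, not part of FederalRegister: it checks that `DATABASE_URL` is set, then runs agency sync, document sync, and (optionally) the full-text backfill in that order, stopping at the first failure. It passes `--until-complete` only because this page mentions that option; confirm the flag in the repo before relying on it.

```python
import os
import subprocess
import sys

# Script paths as listed on this page, relative to the FederalRegister project root.
AGENCY_SYNC = "scripts/run_agency_sync.py"
DOCUMENT_SYNC = "scripts/run_sync.py"
BACKFILL = "scripts/run_backfill_fulltext.py"


def build_sync_commands(backfill: bool = False) -> list[list[str]]:
    """Return the sync commands in order: agency metadata first, then documents."""
    commands = [
        [sys.executable, AGENCY_SYNC],
        [sys.executable, DOCUMENT_SYNC],
    ]
    if backfill:
        # Assumption: --until-complete is the backfill option named on this page.
        commands.append([sys.executable, BACKFILL, "--until-complete"])
    return commands


def check_env(env: dict[str, str]) -> None:
    """Fail fast if the database connection string is missing."""
    if not env.get("DATABASE_URL"):
        raise RuntimeError("DATABASE_URL is not set; see .env.example")


def run_sync(backfill: bool = False, dry_run: bool = True) -> None:
    """Check the environment, then print (dry run) or execute each command."""
    check_env(dict(os.environ))
    for cmd in build_sync_commands(backfill):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError, which stops the sequence.
            subprocess.run(cmd, check=True)
```

Call `run_sync(backfill=True)` first to print the commands, then `run_sync(backfill=True, dry_run=False)` from the project root. The helper reads only the process environment, so export `DATABASE_URL` (or load `.env`) before calling it.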

Executive orders, OMB memoranda/circulars, OPM CHCOC, other external PDFs

Step	Command	Purpose
External sources	`python scripts/run_external_pdf_sync.py`	Ingest White House executive orders (RSS + detail HTML), OMB circulars/memoranda, OPM CHCOC memos, and related sources into `external_documents` with extracted text.

External PDF sync is not enabled by default in `run_full_pipeline.py`: set `RUN_EXTERNAL_PDF_SYNC=1` to include it in a full pipeline run, or run the script standalone after the sync steps above.

Optional: FedRAMP, USAspending, website indexes

Use these when you need those dimensions for correlation or crawl reports: `run_usaspending_sync.py`, `run_fedramp_sync.py`, `run_federal_website_sync.py`, `run_dotgov_sync.py`, or `run_weekly_website_sync.py` (runs both the website and dotgov syncs).
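When only some of these dimensions are needed, a small lookup keeps the choice explicit. This is a hypothetical helper: the script names come from this page, but the dimension labels (`usaspending`, `fedramp`, `website`, `dotgov`) and the function name are assumptions. It swaps the separate website and dotgov syncs for `run_weekly_website_sync.py` when both are requested, since that script covers both.

```python
# Hypothetical mapping from a data dimension to the sync script that refreshes it.
OPTIONAL_SYNCS = {
    "usaspending": "scripts/run_usaspending_sync.py",
    "fedramp": "scripts/run_fedramp_sync.py",
    "website": "scripts/run_federal_website_sync.py",
    "dotgov": "scripts/run_dotgov_sync.py",
}
WEEKLY_WEBSITE = "scripts/run_weekly_website_sync.py"


def optional_sync_scripts(needs: set[str]) -> list[str]:
    """Return the scripts to run for the requested dimensions, in a stable order."""
    unknown = needs - set(OPTIONAL_SYNCS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    scripts = []
    for dim in ("usaspending", "fedramp"):
        if dim in needs:
            scripts.append(OPTIONAL_SYNCS[dim])
    if {"website", "dotgov"} <= needs:
        # One weekly run covers both the website index and the dotgov list.
        scripts.append(WEEKLY_WEBSITE)
    else:
        for dim in ("website", "dotgov"):
            if dim in needs:
                scripts.append(OPTIONAL_SYNCS[dim])
    return scripts
```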

One-shot full pipeline

```bash
# Example: include external PDFs (EOs, OMB, OPM, …); tune env vars as needed
RUN_EXTERNAL_PDF_SYNC=1 python scripts/run_full_pipeline.py
```

`run_full_pipeline.py` runs migrations, agency sync, SORN sync, USAspending, FedRAMP, the GSA and dotgov website syncs, optional discovery, optional AI SORN summaries, and the optional SORN/PIA audit batch, then `run_report.py`. Skip the optional steps with `RUN_DISCOVERY=0` or `RUN_AI_ANALYSIS=0`, and opt in to the audit batch with `RUN_SORN_PIA_AUDIT=1`; that script's header documents these switches.
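The environment switches behave like on/off flags with different defaults. The sketch below models that behavior for planning purposes only; it is not the code in `run_full_pipeline.py`. The accepted "on" strings, the step names, and the helper names are assumptions, the defaults are inferred from the wording above (discovery and AI summaries run unless set to `0`; external PDFs and the audit batch run only when set to `1`), and placing the external PDF step right after SORN sync is a guess, since this page does not say where the pipeline runs it.

```python
from collections.abc import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted "on" values


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read an on/off switch, falling back to the default when unset or blank."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in TRUTHY


def plan_steps(env: Mapping[str, str]) -> list[str]:
    """Return the (hypothetical) step names a pipeline run would execute, in order."""
    steps = ["migrations", "agency_sync", "sorn_sync"]
    # Opt-in; its position in the real pipeline is an assumption here.
    if env_flag(env, "RUN_EXTERNAL_PDF_SYNC", default=False):
        steps.append("external_pdf_sync")
    steps += ["usaspending_sync", "fedramp_sync", "gsa_website_sync", "dotgov_sync"]
    if env_flag(env, "RUN_DISCOVERY", default=True):  # skip with RUN_DISCOVERY=0
        steps.append("discovery")
    if env_flag(env, "RUN_AI_ANALYSIS", default=True):  # skip with RUN_AI_ANALYSIS=0
        steps.append("ai_sorn_summaries")
    if env_flag(env, "RUN_SORN_PIA_AUDIT", default=False):  # opt in with =1
        steps.append("sorn_pia_audit_batch")
    steps.append("run_report")
    return steps
```

For example, `plan_steps({"RUN_DISCOVERY": "0"})` drops discovery and leaves every other step at its default; `plan_steps(os.environ)` previews what the current shell would run.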

Analysis you can generate (per-document / batch)

These produce structured analysis records (mostly in PostgreSQL) and sometimes files under output/. They are not the same as the aggregate Markdown/JSON bundle from run_report.py (next section), though exports can feed those bundles.

Analysis	How to run	Where results land
Statistical analysis (SORN aggregates)	`python scripts/run_analysis.py`	`output/statistical_analysis.json`
AI SORN summaries	`python scripts/run_ai_analysis.py`	`ai_analysis` rows (`analysis_type`, e.g. `sorn_summary`); requires `full_text` and a configured `AI_PROVIDER`
Executive order accountability (watchdog LLM)	`python scripts/run_executive_order_accountability_analysis.py`	`external_ai_analysis` (and its export), which `run_report.py` uses to build `executive_order_accountability_report.json`
SORN/PIA compliance audit (single doc)	`python scripts/run_sorn_pia_audit.py <document_number> [...]`	DB + `output/sorn_pia_audits/<document_number>_audit.md` (and HTML when enabled)
SORN/PIA audit batch	`python scripts/run_sorn_pia_audit_batch.py`	Same audit output, run in batch; in the full pipeline, enable with `RUN_SORN_PIA_AUDIT=1`
CFA-style comment drafts	`run_comment_draft.py`, `run_comment_drafts_for_open.py`, `run_comment_draft_by_docket.py`	`ai_analysis` with `analysis_type='comment_draft'`
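Most of these analyses land as rows rather than files. As a rough illustration of pulling one analysis type back out, the sketch below uses an in-memory SQLite table as a stand-in for the PostgreSQL `ai_analysis` table. Only the `ai_analysis` table name and the `analysis_type` column come from this page; the `document_number` and `content` columns and the sample rows are made up, so check the real schema (the migrations in the FederalRegister repo) before adapting the query. Against PostgreSQL, a driver such as psycopg uses `%s` placeholders instead of `?`.

```python
import sqlite3


def fetch_by_type(conn: sqlite3.Connection, analysis_type: str) -> list[tuple[str, str]]:
    """Return (document_number, content) pairs for one analysis type."""
    cursor = conn.execute(
        "SELECT document_number, content FROM ai_analysis "
        "WHERE analysis_type = ? ORDER BY document_number",
        (analysis_type,),
    )
    return cursor.fetchall()


# Stand-in schema and placeholder rows: columns other than analysis_type are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ai_analysis (document_number TEXT, analysis_type TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO ai_analysis VALUES (?, ?, ?)",
    [
        ("doc-a", "sorn_summary", "summary text"),
        ("doc-b", "comment_draft", "draft text"),
        ("doc-c", "comment_draft", "another draft"),
    ],
)
drafts = fetch_by_type(conn, "comment_draft")
```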

Backfill and discovery jobs (phrase search, crawl) use additional scripts (`run_discovery.py`, `run_crawl.py`, etc.) documented in FederalRegister’s README.

Aggregate reports (`run_report.py`)

After data and analyses are populated, `python scripts/run_report.py` regenerates Markdown and/or JSON under `output/` from the database. Bundles include:

Report	Artifacts (typical)	What it covers
SORN analysis	`sorn_report.md`, `sorn_report.json`, `sorn_data.json`	Agency counts, publication/cancellation trends.
USAspending ↔ FR correlation	`usaspending_fr_correlation.md`, `.json`	Spending/agency alignment with FR context.
FedRAMP — agencies	`fedramp_agency_report.md`, `.json`	FedRAMP authorizations by agency.
FedRAMP — provider usage	`fedramp_provider_usage_report.md`, `.json`	Provider usage.
Missing SORNs / Missing PIAs	`missing_sorns_report.*`, `missing_pias_report.*`	Coverage gaps.
SORN–PIA linkage	`sorn_pia_linkage_report.*`	SORN ↔ PIA linkage.
PIA discovery	`pia_discovery_report.*`	Crawl/discovery counts.
Open comment period	`open_comment_report.*`	Rules open for comment (CFA newsletter / site).
External documents	`external_documents_report.*`	OMB/OPM/WH and other external ingests.
Executive order accountability	`executive_order_accountability_report.json`	EO accountability stream for CFA (JSON).

Each generator block is wrapped individually, so a failure in one report does not stop the others from being generated.
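That isolation is the familiar try/except-per-step pattern. Here is a minimal sketch of it, assuming each report generator can be treated as a zero-argument callable; the generator functions and the `run_generators` helper are hypothetical and not taken from `src/reports/generator.py`.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("reports")


def run_generators(generators: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run every generator; record 'ok' or the error message for each one."""
    status: dict[str, str] = {}
    for name, generate in generators.items():
        try:
            generate()
            status[name] = "ok"
        except Exception as exc:  # keep going: one failed report must not stop the rest
            logger.exception("report %s failed", name)
            status[name] = f"failed: {exc}"
    return status


def sorn_report() -> None:
    pass  # placeholder for writing sorn_report.md / .json


def fedramp_agency_report() -> None:
    raise RuntimeError("FedRAMP table empty")  # simulated failure


def open_comment_report() -> None:
    pass  # placeholder for writing open_comment_report.*


status = run_generators({
    "sorn_report": sorn_report,
    "fedramp_agency_report": fedramp_agency_report,
    "open_comment_report": open_comment_report,
})
```

Logging the traceback and returning a status map lets a scheduler flag a partial run without discarding the reports that did succeed.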

Documentation.AI mapping (report hubs + detail pages)

The Documentation.AI site publishes content via generated MDX routes under `docs/reports/`. Each FederalRegister aggregate report stream maps to:

- a Documentation.AI `report_type` value (used for filtering),
- a hub route under `docs/reports/<report-type-slug>/`,
- and a detail-page item definition (what “each page” represents).

FederalRegister output artifact	Typical JSON path	Proposed docs report slug (`docs/reports/<slug>/...`)	Detail-page item definition
SORN analysis	`FederalRegister/output/sorn_report.json`	`sorn-analysis`	Per-agency SORN counts (from `documents_per_agency`). Also surface cancellation and trend arrays on the hub page.
USAspending ↔ FR correlation	`FederalRegister/output/usaspending_fr_correlation.json`	`usa-spending-fr-correlation`	Per-agency mismatch pages (combine `usa_agencies_with_no_fr_match` and `fr_agencies_with_no_usa_match`).
FedRAMP — agencies	`FederalRegister/output/fedramp_agency_report.json`	`fedramp-agency-report`	Per-authorizing agency pages (from the report’s `distinct_authorizing_agencies` / summary blocks).
FedRAMP — provider usage	`FederalRegister/output/fedramp_provider_usage_report.json`	`fedramp-provider-usage`	Per-provider pages (from the report’s `providers` array).
Missing SORNs	`FederalRegister/output/missing_sorns_report.json`	`missing-sorns`	Per-agency “no SORN” pages (from `agencies_with_no_sorn`).
Missing PIAs	`FederalRegister/output/missing_pias_report.json`	`missing-pias`	Per-website “no PIA” pages (from `websites_with_no_pia`).
SORN–PIA linkage	`FederalRegister/output/sorn_pia_linkage_report.json`	`sorn-pia-linkage`	Per-agency linkage pages (from `by_agency`, keyed by agency + linked SORN identifiers / crawled PIA URLs).
PIA discovery	`FederalRegister/output/pia_discovery_report.json`	`pia-discovery`	Per-crawl-run pages (prefer `crawl_runs`; fall back to `discovered_pia_documents` when that array is available).
Open comment period	`FederalRegister/output/open_comment_report.json`	`fr-proposed-rules`	Per-docket pages (already implemented; hub/detail layout lives in this repo).
External documents	`FederalRegister/output/external_documents_report.json`	`external-documents`	Per-source pages (from `by_source`), with recent failures surfaced on the hub.
Executive order accountability	`FederalRegister/output/executive_order_accountability_report.json`	`executive-order-accountability`	Per-executive-order pages (from `documents`).
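A publishing script would walk this table artifact by artifact. The sketch below covers the hub route and two of the less obvious detail-page rules. File names, slugs, and JSON keys come from the table; everything else is an assumption: it treats the two USAspending mismatch keys as lists of agency-name strings, treats an empty `crawl_runs` the same as a missing one, builds detail-page slugs with a simple lowercase-and-hyphenate rule, and the helper names are hypothetical rather than the docs repo's generator.

```python
import re

# Artifact file name -> docs report slug, from the mapping table above (subset).
REPORT_SLUGS = {
    "sorn_report.json": "sorn-analysis",
    "usaspending_fr_correlation.json": "usa-spending-fr-correlation",
    "missing_sorns_report.json": "missing-sorns",
    "pia_discovery_report.json": "pia-discovery",
}


def slugify(text: str) -> str:
    """Lowercase and join alphanumeric runs with hyphens (assumed slug rule)."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def hub_route(artifact: str) -> str:
    """Hub route for an artifact, following docs/reports/<slug>/."""
    return f"docs/reports/{REPORT_SLUGS[artifact]}/"


def usaspending_mismatch_items(report: dict) -> dict[str, dict]:
    """Combine both mismatch lists into one item per agency slug."""
    items: dict[str, dict] = {}
    for side, key in (
        ("no_fr_match", "usa_agencies_with_no_fr_match"),
        ("no_usa_match", "fr_agencies_with_no_usa_match"),
    ):
        for agency in report.get(key, []):
            item = items.setdefault(slugify(agency), {"agency": agency, "mismatches": []})
            item["mismatches"].append(side)
    return dict(sorted(items.items()))


def pia_discovery_items(report: dict) -> list:
    """Prefer crawl_runs; fall back to discovered_pia_documents."""
    return report.get("crawl_runs") or report.get("discovered_pia_documents") or []
```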

CFA site usage (subset)

- Open comment: `open_comment_report.json` → CFA `open-comment-report.json` (open comment overview).
- Executive order accountability: `executive_order_accountability_report.json` → CFA Executive Orders data (EO overview).

Source

FederalRegister/README.md, scripts/run_full_pipeline.py, src/reports/generator.py. Update this page when those change.


Last updated Mar 25, 2026

Built with Documentation.AI