Sync, analysis, and report outputs
How to pull new SORNs, executive orders, and OMB/OPM external documents in FederalRegister, and which analyses and output reports you can then generate.
Scope
Operational detail lives in the FederalRegister repository (scripts and src/). This page is a reader-facing map; confirm flags and paths in that repo before you run jobs in production.
Pull new data (sync)
Run these from the FederalRegister project root, with a Python environment set up and DATABASE_URL configured (see that repo’s README.md and .env.example).
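Before starting a sync, it can help to confirm the prerequisites named above. The following is a minimal sketch, not part of the FederalRegister repo: the `preflight` helper is hypothetical, and it assumes only that DATABASE_URL must be non-empty and that the project root contains a `scripts/` directory.

```python
from pathlib import Path


def preflight(project_root: Path, env: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # DATABASE_URL is the only variable this page names as required.
    if not env.get("DATABASE_URL"):
        problems.append("DATABASE_URL is not set")
    # The commands on this page are all invoked as scripts/<name>.py.
    if not (project_root / "scripts").is_dir():
        problems.append(f"no scripts/ directory under {project_root}")
    return problems
```

Calling `preflight(Path.cwd(), dict(os.environ))` (after `import os`) from the project root returns an empty list when both checks pass.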
Federal Register SORNs and agencies
| Step | Command | Purpose |
|---|---|---|
| Agencies | python scripts/run_agency_sync.py | Refresh Federal Register agency metadata. |
| Documents | python scripts/run_sync.py | Ingest new/updated SORN (and related FR) documents per your FR_SEARCH_TERM / API settings. |
For AI analysis or reporting on body text, backfill full text when needed with python scripts/run_backfill_fulltext.py; options such as --until-complete are covered in that repo’s docs.
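The steps above can be chained from a small wrapper. This is a sketch under assumptions: the script paths are the ones listed on this page, but the helper names (`build_sync_plan`, `run_plan`), the agencies-then-documents ordering, and the dry-run behavior are illustrative and not taken from the repo.

```python
import subprocess
import sys

# Script paths are the ones named on this page; the ordering and helper
# names below are illustrative assumptions.
SYNC_STEPS = [
    ("agencies", "scripts/run_agency_sync.py"),
    ("documents", "scripts/run_sync.py"),
]
BACKFILL_STEP = ("fulltext", "scripts/run_backfill_fulltext.py")


def build_sync_plan(with_fulltext: bool = False) -> list[list[str]]:
    """Return the commands to run, in order, as argv lists."""
    steps = list(SYNC_STEPS)
    if with_fulltext:
        steps.append(BACKFILL_STEP)
    return [[sys.executable, script] for _name, script in steps]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print("$", " ".join(argv))
        if not dry_run:
            # check=True stops the plan at the first failing step.
            subprocess.run(argv, check=True)
```

Running `run_plan(build_sync_plan(with_fulltext=True))` only prints the commands, which makes it easy to review the plan before passing `dry_run=False`.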
Executive orders, OMB memoranda/circulars, OPM CHCOC, other external PDFs
| Step | Command | Purpose |
|---|---|---|
| External sources | python scripts/run_external_pdf_sync.py | Ingest White House executive orders (RSS + detail HTML), OMB circulars/memoranda, OPM CHCOC memos, and related sources into external_documents with extracted text. |
External PDF sync is not enabled by default in run_full_pipeline.py; set RUN_EXTERNAL_PDF_SYNC=1 to include it in a full pipeline run, or run the script standalone after sync.
Optional: FedRAMP, USAspending, website indexes
Use these when you need those dimensions for correlation or crawl reports: run_usaspending_sync.py, run_fedramp_sync.py, run_federal_website_sync.py, run_dotgov_sync.py, or run_weekly_website_sync.py (which covers both the website and dotgov syncs).
One-shot full pipeline
# Example: include external PDFs (EOs, OMB, OPM, …); tune env vars as needed
RUN_EXTERNAL_PDF_SYNC=1 python scripts/run_full_pipeline.py
run_full_pipeline.py runs migrations, agency sync, SORN sync, USAspending, FedRAMP, GSA + dotgov syncs, optional discovery, optional AI SORN summaries, an optional SORN/PIA audit batch, and then run_report.py. Per that script’s header, skip steps with RUN_DISCOVERY=0 or RUN_AI_ANALYSIS=0, and enable the audit batch with RUN_SORN_PIA_AUDIT=1.
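When the pipeline is launched from Python rather than a shell, the toggles can be assembled into an environment mapping. This is a minimal sketch: the toggle names come from this page, while the `pipeline_env` helper is hypothetical and the assumption that every toggle accepts "1" and "0" should be checked against the header of run_full_pipeline.py.

```python
# Toggle names come from this page; their defaults and accepted values
# are defined in run_full_pipeline.py and are NOT verified here.
KNOWN_TOGGLES = {
    "RUN_EXTERNAL_PDF_SYNC",
    "RUN_DISCOVERY",
    "RUN_AI_ANALYSIS",
    "RUN_SORN_PIA_AUDIT",
}


def pipeline_env(base: dict, **toggles: bool) -> dict:
    """Copy base and set each named toggle to "1" (on) or "0" (off)."""
    unknown = set(toggles) - KNOWN_TOGGLES
    if unknown:
        # Fail early on typos instead of silently exporting a dead variable.
        raise ValueError(f"unknown toggle(s): {sorted(unknown)}")
    env = dict(base)
    for name, enabled in toggles.items():
        env[name] = "1" if enabled else "0"
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example with `pipeline_env(dict(os.environ), RUN_EXTERNAL_PDF_SYNC=True, RUN_DISCOVERY=False)`.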
Analysis you can generate (per-document / batch)
These produce structured analysis records (mostly in PostgreSQL) and sometimes files under output/. They are not the same as the aggregate Markdown/JSON bundle from run_report.py (next section), though exports can feed those bundles.
| Analysis | How to run | Where results land |
|---|---|---|
| Statistical analysis (SORN aggregates) | python scripts/run_analysis.py | output/statistical_analysis.json |
| AI SORN summaries | python scripts/run_ai_analysis.py | ai_analysis rows (analysis_type e.g. sorn_summary); requires full_text and configured AI_PROVIDER |
| Executive order accountability (watchdog LLM) | python scripts/run_executive_order_accountability_analysis.py | external_ai_analysis / export consumed by executive_order_accountability_report.json via run_report.py |
| SORN/PIA compliance audit (single doc) | python scripts/run_sorn_pia_audit.py <document_number> [...] | DB + output/sorn_pia_audits/<document_number>_audit.md (and HTML when enabled) |
| SORN/PIA audit batch | python scripts/run_sorn_pia_audit_batch.py | Same audit type, batch; use RUN_SORN_PIA_AUDIT=1 in full pipeline |
| CFA-style comment drafts | run_comment_draft.py, run_comment_drafts_for_open.py, run_comment_draft_by_docket.py | ai_analysis with analysis_type='comment_draft' |
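To check which single-document audits have already produced a Markdown file, the path pattern from the table can be computed directly. This is a sketch: the helper names are hypothetical, the `output/` default assumes the script is run from the project root, and the document number used in the usage note is a made-up placeholder.

```python
from pathlib import Path


def audit_markdown_path(document_number: str, output_dir: Path = Path("output")) -> Path:
    """Build output/sorn_pia_audits/<document_number>_audit.md."""
    return output_dir / "sorn_pia_audits" / f"{document_number}_audit.md"


def missing_audits(document_numbers: list[str], output_dir: Path = Path("output")) -> list[str]:
    """Return the document numbers that have no audit Markdown file yet."""
    return [
        number
        for number in document_numbers
        if not audit_markdown_path(number, output_dir).is_file()
    ]
```

For example, `missing_audits(["0000-00000"])` returns that placeholder number unless the matching file exists under `output/sorn_pia_audits/`.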
Backfill and discovery (phrase search, crawl) use additional scripts (run_discovery.py, run_crawl.py, etc.) documented in FederalRegister’s README.
Aggregate reports (run_report.py)
After data and analyses are populated, python scripts/run_report.py regenerates Markdown and/or JSON under output/ from the database. Bundles include:
| Report | Artifacts (typical) | What it covers |
|---|---|---|
| SORN analysis | sorn_report.md, sorn_report.json, sorn_data.json | Agency counts, publication/cancellation trends. |
| USAspending ↔ FR correlation | usaspending_fr_correlation.md, .json | Spending/agency alignment with FR context. |
| FedRAMP — agencies | fedramp_agency_report.md, .json | FedRAMP authorizations by agency. |
| FedRAMP — provider usage | fedramp_provider_usage_report.md, .json | Provider usage. |
| Missing SORNs / Missing PIAs | missing_sorns_report.*, missing_pias_report.* | Coverage gaps. |
| SORN–PIA linkage | sorn_pia_linkage_report.* | SORN ↔ PIA linkage. |
| PIA discovery | pia_discovery_report.* | Crawl/discovery counts. |
| Open comment period | open_comment_report.* | Rules open for comment (CFA newsletter / site). |
| External documents | external_documents_report.* | OMB/OPM/WH and other external ingests. |
| Executive order accountability | executive_order_accountability_report.json | EO accountability stream for CFA (JSON). |
Individual generator blocks are wrapped so one failure does not block the rest.
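The failure-isolation pattern described above can be illustrated as follows. This is a generic sketch of the idea, not the code in src/reports/generator.py: the `run_generators` function and the shape of its result are assumptions made for illustration.

```python
from typing import Callable


def run_generators(generators: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run every generator; record failures instead of letting them propagate."""
    results = {}
    for name, generate in generators.items():
        try:
            generate()
            results[name] = "ok"
        except Exception as exc:
            # One broken report must not prevent the remaining ones.
            results[name] = f"failed: {exc}"
    return results
```

A caller can then inspect the returned mapping and decide whether any failed report warrants a non-zero exit code.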
Documentation.AI mapping (report hubs + detail pages)
The Documentation.AI site publishes content via generated MDX routes under docs/reports/.
Each FederalRegister aggregate report stream maps to:
- a Documentation.AI report_type value (used for filtering),
- a hub route under docs/reports/<report-type-slug>/,
- and a detail-page item definition (what “each page” represents).
| FederalRegister output artifact | Typical JSON path | Proposed docs report slug (docs/reports/<slug>/...) | Detail-page item definition |
|---|---|---|---|
| SORN analysis | FederalRegister/output/sorn_report.json | sorn-analysis | Per-agency SORN counts (from documents_per_agency). Also surface cancellation and trend arrays on the hub page. |
| USAspending ↔ FR correlation | FederalRegister/output/usaspending_fr_correlation.json | usa-spending-fr-correlation | Per-agency mismatch pages (combine usa_agencies_with_no_fr_match and fr_agencies_with_no_usa_match). |
| FedRAMP — agencies | FederalRegister/output/fedramp_agency_report.json | fedramp-agency-report | Per-authorizing agency pages (from the report’s distinct_authorizing_agencies / summary blocks). |
| FedRAMP — provider usage | FederalRegister/output/fedramp_provider_usage_report.json | fedramp-provider-usage | Per-provider pages (from the report’s providers array). |
| Missing SORNs | FederalRegister/output/missing_sorns_report.json | missing-sorns | Per-agency “no SORN” pages (from agencies_with_no_sorn). |
| Missing PIAs | FederalRegister/output/missing_pias_report.json | missing-pias | Per-website “no PIA” pages (from websites_with_no_pia). |
| SORN–PIA linkage | FederalRegister/output/sorn_pia_linkage_report.json | sorn-pia-linkage | Per-agency linkage pages (from by_agency, keyed by agency + linked SORN identifiers / crawled PIA URLs). |
| PIA discovery | FederalRegister/output/pia_discovery_report.json | pia-discovery | Per-crawl-run pages (prefer crawl_runs first; fall back to discovered_pia_documents when available). |
| Open comment period | FederalRegister/output/open_comment_report.json | fr-proposed-rules | Per-docket pages (already implemented; hub/detail layout lives in this repo). |
| External documents | FederalRegister/output/external_documents_report.json | external-documents | Per-source pages (from by_source), with recent failures surfaced on the hub. |
| Executive order accountability | FederalRegister/output/executive_order_accountability_report.json | executive-order-accountability | Per-executive-order pages (from documents). |
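Two rows in the table describe a small amount of logic: the USAspending mismatch pages combine two lists, and the PIA discovery pages prefer one key over another. The sketch below shows both under assumptions: the key names are the ones in the table, but treating them as top-level keys that hold lists is a guess about the JSON shape, and the function names are hypothetical.

```python
def usaspending_mismatches(report: dict) -> list:
    """Combine both mismatch lists into one list of detail-page items."""
    # Assumed shape: both keys sit at the top level and hold lists.
    return list(report.get("usa_agencies_with_no_fr_match") or []) + list(
        report.get("fr_agencies_with_no_usa_match") or []
    )


def pia_discovery_items(report: dict) -> list:
    """Prefer crawl_runs; fall back to discovered_pia_documents."""
    # An empty or missing crawl_runs triggers the fallback.
    return list(report.get("crawl_runs") or report.get("discovered_pia_documents") or [])
```

Each function takes the parsed report (for example, the result of `json.load` on the matching file under FederalRegister/output/) and returns the items a generator would turn into detail pages.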
CFA site usage (subset)
- Open comment: open_comment_report.json → CFA open-comment-report.json (open comment overview).
- Executive order accountability: executive_order_accountability_report.json → CFA Executive Orders data (EO overview).
Source
FederalRegister/README.md, scripts/run_full_pipeline.py, src/reports/generator.py. Update this page when those change.
© 2026 Center for Federal Accountability. Published from the cfa-reports repository.