
Sync, analysis, and report outputs

How to pull new SORNs, executive orders, and OMB/OPM external documents into FederalRegister, and which analyses and output reports you can then generate.

Scope

Operational detail lives in the FederalRegister repository (scripts/ and src/). This page is a reader-facing map; confirm flags and paths in that repo before you run jobs in production.

Pull new data (sync)

Run these from the FederalRegister project root with a Python environment and DATABASE_URL configured (see that repo’s README.md and .env.example).

Federal Register SORNs and agencies

| Step | Command | Purpose |
|---|---|---|
| Agencies | `python scripts/run_agency_sync.py` | Refresh Federal Register agency metadata. |
| Documents | `python scripts/run_sync.py` | Ingest new or updated SORN (and related FR) documents according to your `FR_SEARCH_TERM` / API settings. |
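The two SORN sync steps can be chained from Python as below. This is a minimal sketch, not part of the FederalRegister repo: `run_sync` is a hypothetical wrapper, and it assumes DATABASE_URL is exported in the shell (if the repo loads it from .env itself, drop that check).

```python
import os
import subprocess
import sys
from pathlib import Path

# Agency metadata first, then SORN documents (the order of the table above).
SYNC_SCRIPTS = ["scripts/run_agency_sync.py", "scripts/run_sync.py"]


def run_sync(project_root: Path, dry_run: bool = False) -> list[list[str]]:
    """Run the FR sync scripts in order and return the commands issued.

    Hypothetical wrapper: it requires DATABASE_URL in the environment, runs
    each script with the current interpreter from project_root, and raises
    subprocess.CalledProcessError on the first non-zero exit.
    """
    if not os.environ.get("DATABASE_URL"):
        raise RuntimeError("DATABASE_URL is not set; see README.md and .env.example")
    commands = [[sys.executable, script] for script in SYNC_SCRIPTS]
    for cmd in commands:
        if not dry_run:  # dry_run only returns the planned commands
            subprocess.run(cmd, cwd=project_root, check=True)
    return commands
```

Calling `run_sync(Path("/path/to/FederalRegister"))` works from any directory; `check=True` means a failed agency sync is not silently followed by the document sync.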

If AI analysis or reports need document body text, backfill full text first with python scripts/run_backfill_fulltext.py (options such as --until-complete are described in that repo’s docs).

Executive orders, OMB memoranda/circulars, OPM CHCOC, other external PDFs

| Step | Command | Purpose |
|---|---|---|
| External sources | `python scripts/run_external_pdf_sync.py` | Ingest White House executive orders (RSS + detail HTML), OMB circulars/memoranda, OPM CHCOC memos, and related sources into `external_documents` with extracted text. |

This step is not enabled by default in run_full_pipeline.py: set RUN_EXTERNAL_PDF_SYNC=1 to include it in a full pipeline run, or run the script standalone after the sync steps above.

Optional: FedRAMP, USAspending, website indexes

Run these when you need those dimensions for correlation or crawl reports: run_usaspending_sync.py, run_fedramp_sync.py, run_federal_website_sync.py, run_dotgov_sync.py, or run_weekly_website_sync.py (which runs both the website and dotgov syncs).

One-shot full pipeline

```shell
# Example: include external PDFs (EOs, OMB, OPM, …); tune env vars as needed
RUN_EXTERNAL_PDF_SYNC=1 python scripts/run_full_pipeline.py
```

run_full_pipeline.py runs migrations, agency sync, SORN sync, USAspending, FedRAMP, GSA + dotgov syncs, optional discovery, optional AI SORN summaries, the optional SORN/PIA audit batch, and then run_report.py. Per that script’s header, skip optional steps with RUN_DISCOVERY=0 or RUN_AI_ANALYSIS=0, and enable the audit batch with RUN_SORN_PIA_AUDIT=1.
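To preview which steps a given environment will enable, the flag logic described above can be modeled as a small plan builder. This is a hypothetical sketch: the step labels paraphrase this page, while the position of the external PDF sync and the accepted truthy values (`1`, `true`, `yes`, `on`) are assumptions. The header of run_full_pipeline.py remains authoritative.

```python
import os
from collections.abc import Mapping

# (step label, controlling env var or None, enabled when the var is unset).
# Order follows this page's description of run_full_pipeline.py; where the
# external PDF sync sits in the sequence is an assumption.
PIPELINE_STEPS = [
    ("migrations", None, True),
    ("agency sync", None, True),
    ("SORN sync", None, True),
    ("external PDF sync", "RUN_EXTERNAL_PDF_SYNC", False),
    ("USAspending sync", None, True),
    ("FedRAMP sync", None, True),
    ("GSA + dotgov syncs", None, True),
    ("discovery", "RUN_DISCOVERY", True),
    ("AI SORN summaries", "RUN_AI_ANALYSIS", True),
    ("SORN/PIA audit batch", "RUN_SORN_PIA_AUDIT", False),
    ("run_report.py", None, True),
]


def flag_enabled(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret an on/off env var; unset or blank falls back to default (assumed parsing)."""
    value = env.get(name)
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def planned_steps(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the step labels a run would execute under these env vars."""
    return [
        label
        for label, flag, default in PIPELINE_STEPS
        if flag is None or flag_enabled(env, flag, default)
    ]
```

For example, `planned_steps({"RUN_DISCOVERY": "0"})` returns every default step except discovery.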


Analysis you can generate (per-document / batch)

These produce structured analysis records (mostly in PostgreSQL) and sometimes files under output/. They are distinct from the aggregate Markdown/JSON bundles produced by run_report.py (next section), although their exports can feed those bundles.

| Analysis | How to run | Where results land |
|---|---|---|
| Statistical analysis (SORN aggregates) | `python scripts/run_analysis.py` | `output/statistical_analysis.json` |
| AI SORN summaries | `python scripts/run_ai_analysis.py` | `ai_analysis` rows (`analysis_type` e.g. `sorn_summary`); requires `full_text` and a configured `AI_PROVIDER` |
| Executive order accountability (watchdog LLM) | `python scripts/run_executive_order_accountability_analysis.py` | `external_ai_analysis` / export consumed by `executive_order_accountability_report.json` via `run_report.py` |
| SORN/PIA compliance audit (single document) | `python scripts/run_sorn_pia_audit.py <document_number> [...]` | DB + `output/sorn_pia_audits/<document_number>_audit.md` (and HTML when enabled) |
| SORN/PIA audit batch | `python scripts/run_sorn_pia_audit_batch.py` | Same audit type, run as a batch; in the full pipeline, enable with `RUN_SORN_PIA_AUDIT=1` |
| CFA-style comment drafts | `run_comment_draft.py`, `run_comment_drafts_for_open.py`, `run_comment_draft_by_docket.py` | `ai_analysis` rows with `analysis_type='comment_draft'` |
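To check which documents already have a given analysis before rerunning a batch, query `ai_analysis` by `analysis_type`. The sketch below uses an in-memory SQLite stand-in so it runs anywhere; only the table name and the `analysis_type` column come from this page, and `document_number` is a hypothetical column. Against the real PostgreSQL database (e.g. with psycopg), the placeholder is `%s` rather than `?`.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL ai_analysis table. Only the table
# name and analysis_type come from this page; document_number is hypothetical.
SCHEMA = """
CREATE TABLE ai_analysis (
    id INTEGER PRIMARY KEY,
    document_number TEXT NOT NULL,
    analysis_type TEXT NOT NULL
)
"""


def coverage_by_type(conn: sqlite3.Connection) -> dict[str, int]:
    """Count distinct documents per analysis_type (e.g. sorn_summary, comment_draft)."""
    rows = conn.execute(
        "SELECT analysis_type, COUNT(DISTINCT document_number) "
        "FROM ai_analysis GROUP BY analysis_type"
    ).fetchall()
    return {analysis_type: count for analysis_type, count in rows}


def documents_missing(
    conn: sqlite3.Connection, analysis_type: str, document_numbers: list[str]
) -> list[str]:
    """Return the given document numbers that have no row of this analysis_type."""
    rows = conn.execute(
        "SELECT DISTINCT document_number FROM ai_analysis WHERE analysis_type = ?",
        (analysis_type,),
    ).fetchall()
    done = {row[0] for row in rows}
    return [number for number in document_numbers if number not in done]
```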

Backfill and discovery jobs (phrase search, crawl) use additional scripts (run_discovery.py, run_crawl.py, etc.), documented in FederalRegister’s README.


Aggregate reports (run_report.py)

After data and analyses are populated, python scripts/run_report.py regenerates Markdown and/or JSON under output/ from the database. Bundles include:

| Report | Artifacts (typical) | What it covers |
|---|---|---|
| SORN analysis | `sorn_report.md`, `sorn_report.json`, `sorn_data.json` | Agency counts, publication/cancellation trends. |
| USAspending ↔ FR correlation | `usaspending_fr_correlation.md`, `.json` | Spending/agency alignment with FR context. |
| FedRAMP — agencies | `fedramp_agency_report.md`, `.json` | FedRAMP authorizations by agency. |
| FedRAMP — provider usage | `fedramp_provider_usage_report.md`, `.json` | Provider usage. |
| Missing SORNs / Missing PIAs | `missing_sorns_report.*`, `missing_pias_report.*` | Coverage gaps. |
| SORN–PIA linkage | `sorn_pia_linkage_report.*` | SORN ↔ PIA linkage. |
| PIA discovery | `pia_discovery_report.*` | Crawl/discovery counts. |
| Open comment period | `open_comment_report.*` | Rules open for comment (CFA newsletter / site). |
| External documents | `external_documents_report.*` | OMB/OPM/WH and other external ingests. |
| Executive order accountability | `executive_order_accountability_report.json` | EO accountability stream for CFA (JSON). |
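After a run, one quick way to spot reports that were not written is to compare output/ against the JSON artifacts this page names. This is a minimal sketch: it checks JSON only because several reports are listed only by wildcard, and the hypothetical `EXPECTED_JSON` list has to be kept in sync with src/reports/generator.py by hand.

```python
from pathlib import Path

# JSON artifacts named on this page (report table and docs mapping table).
EXPECTED_JSON = [
    "sorn_report.json",
    "sorn_data.json",
    "usaspending_fr_correlation.json",
    "fedramp_agency_report.json",
    "fedramp_provider_usage_report.json",
    "missing_sorns_report.json",
    "missing_pias_report.json",
    "sorn_pia_linkage_report.json",
    "pia_discovery_report.json",
    "open_comment_report.json",
    "external_documents_report.json",
    "executive_order_accountability_report.json",
]


def missing_artifacts(output_dir: Path) -> list[str]:
    """Return expected JSON filenames that are absent from output_dir."""
    return [name for name in EXPECTED_JSON if not (output_dir / name).is_file()]
```

For example, `missing_artifacts(Path("output"))` run from the FederalRegister root lists whichever of those files a run did not produce.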

Each report generator runs inside its own error handler, so a failure in one report does not stop the others from being generated.
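Per-generator isolation of this kind amounts to a try/except around each report. This is a minimal sketch with hypothetical generator names; the actual structure of src/reports/generator.py may differ.

```python
import logging
from typing import Callable

logger = logging.getLogger("reports")


def run_generators(generators: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run each report generator independently; return {name: error} for failures."""
    failures: dict[str, str] = {}
    for name, generate in generators.items():
        try:
            generate()
        except Exception as exc:  # one bad report must not abort the rest
            logger.exception("report %s failed", name)
            failures[name] = repr(exc)
    return failures
```

Returning the failures, rather than only logging them, lets a caller exit non-zero after all reports have had a chance to run.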


Documentation.AI mapping (report hubs + detail pages)

The Documentation.AI site publishes content via generated MDX routes under docs/reports/. Each FederalRegister aggregate report stream maps to:

  • a Documentation.AI report_type value (used for filtering),
  • a hub route under docs/reports/<report-type-slug>/,
  • and a detail-page item definition (what “each page” represents).

| FederalRegister output artifact | Typical JSON path | Proposed docs report slug (`docs/reports/<slug>/...`) | Detail-page item definition |
|---|---|---|---|
| SORN analysis | `FederalRegister/output/sorn_report.json` | `sorn-analysis` | Per-agency SORN counts (from `documents_per_agency`). Also surface cancellation and trend arrays on the hub page. |
| USAspending ↔ FR correlation | `FederalRegister/output/usaspending_fr_correlation.json` | `usa-spending-fr-correlation` | Per-agency mismatch pages (combine `usa_agencies_with_no_fr_match` and `fr_agencies_with_no_usa_match`). |
| FedRAMP — agencies | `FederalRegister/output/fedramp_agency_report.json` | `fedramp-agency-report` | Per-authorizing-agency pages (from the report’s `distinct_authorizing_agencies` / summary blocks). |
| FedRAMP — provider usage | `FederalRegister/output/fedramp_provider_usage_report.json` | `fedramp-provider-usage` | Per-provider pages (from the report’s `providers` array). |
| Missing SORNs | `FederalRegister/output/missing_sorns_report.json` | `missing-sorns` | Per-agency “no SORN” pages (from `agencies_with_no_sorn`). |
| Missing PIAs | `FederalRegister/output/missing_pias_report.json` | `missing-pias` | Per-website “no PIA” pages (from `websites_with_no_pia`). |
| SORN–PIA linkage | `FederalRegister/output/sorn_pia_linkage_report.json` | `sorn-pia-linkage` | Per-agency linkage pages (from `by_agency`, keyed by agency + linked SORN identifiers / crawled PIA URLs). |
| PIA discovery | `FederalRegister/output/pia_discovery_report.json` | `pia-discovery` | Per-crawl-run pages (prefer `crawl_runs`; fall back to `discovered_pia_documents` when available). |
| Open comment period | `FederalRegister/output/open_comment_report.json` | `fr-proposed-rules` | Per-docket pages (already implemented; hub/detail layout lives in this repo). |
| External documents | `FederalRegister/output/external_documents_report.json` | `external-documents` | Per-source pages (from `by_source`), with recent failures surfaced on the hub. |
| Executive order accountability | `FederalRegister/output/executive_order_accountability_report.json` | `executive-order-accountability` | Per-executive-order pages (from `documents`). |
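The mapping can also be expressed as data, which makes hub and detail routes straightforward to generate. This is a hypothetical sketch: the slugs come from the mapping table, but the per-item route shape (`docs/reports/<slug>/<item>/`), the slug rules, and the assumption that detail items are plain name strings (the real JSON entries may be objects) are illustrative only.

```python
import re
import unicodedata

# Artifact JSON filename -> proposed docs report slug, per the mapping table.
REPORT_SLUGS = {
    "sorn_report.json": "sorn-analysis",
    "usaspending_fr_correlation.json": "usa-spending-fr-correlation",
    "fedramp_agency_report.json": "fedramp-agency-report",
    "fedramp_provider_usage_report.json": "fedramp-provider-usage",
    "missing_sorns_report.json": "missing-sorns",
    "missing_pias_report.json": "missing-pias",
    "sorn_pia_linkage_report.json": "sorn-pia-linkage",
    "pia_discovery_report.json": "pia-discovery",
    "open_comment_report.json": "fr-proposed-rules",
    "external_documents_report.json": "external-documents",
    "executive_order_accountability_report.json": "executive-order-accountability",
}


def slugify(text: str) -> str:
    """Lowercase ASCII slug, e.g. 'Dept. of Energy' -> 'dept-of-energy'."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def detail_routes(artifact: str, item_names: list[str]) -> list[str]:
    """Map detail-page items (assumed plain names) to hypothetical detail routes."""
    hub = f"docs/reports/{REPORT_SLUGS[artifact]}"
    return [f"{hub}/{slugify(name)}/" for name in item_names]


def usaspending_mismatch_items(report: dict) -> list[str]:
    """Combine both mismatch lists, dropping duplicates but keeping first-seen order."""
    combined = report.get("usa_agencies_with_no_fr_match", []) + report.get(
        "fr_agencies_with_no_usa_match", []
    )
    return list(dict.fromkeys(combined))
```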

CFA site usage (subset)

  • Open comment: open_comment_report.json → CFA open-comment-report.json (open comment overview).
  • Executive order accountability: executive_order_accountability_report.json → CFA Executive Orders data (EO overview).
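Publishing these two feeds to the CFA site can be a validated copy step. This is a minimal sketch: `publish_cfa_feeds` and the site data directory are hypothetical, open-comment-report.json is the filename named above, and executive-orders.json is a placeholder because this page does not name the EO target file.

```python
import json
import shutil
from pathlib import Path

# FederalRegister artifact -> CFA site filename. The EO target name is a
# hypothetical placeholder; open-comment-report.json is named on this page.
CFA_FEEDS = {
    "open_comment_report.json": "open-comment-report.json",
    "executive_order_accountability_report.json": "executive-orders.json",
}


def publish_cfa_feeds(output_dir: Path, site_data_dir: Path) -> list[Path]:
    """Copy each feed after checking it parses; raises if a feed is missing or invalid."""
    site_data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src_name, dest_name in CFA_FEEDS.items():
        src = output_dir / src_name
        json.loads(src.read_text(encoding="utf-8"))  # fail fast on truncated JSON
        dest = site_data_dir / dest_name
        shutil.copyfile(src, dest)
        written.append(dest)
    return written
```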

Source

FederalRegister/README.md, scripts/run_full_pipeline.py, src/reports/generator.py. Update this page when those change.


© 2026 Center for Federal Accountability. Published from the cfa-reports repository.