SEC EDGAR API Compliance
Overview
This project retrieves publicly available data from the SEC EDGAR API in strict adherence to the SEC Terms of Use and Fair Access guidelines. The Equity Bridge validation pipeline relies on the JSON endpoints supplied by the SEC and layers additional safeguards to remain well within the limits prescribed for automated access.
The Accounting Conservation Framework is an academic research project maintained by Nirvan Chitnis. The codebase prioritises responsible consumption of SEC infrastructure and provides transparent documentation for reviewers, enterprise compliance teams, and SEC staff.
Rate Limiting
- Implementation: `src/parsers/xbrl_api_client.py#L92-L213`
- Constant: `RATE_LIMIT_DELAY = 0.15`
- Effective throughput: at most 6.67 requests/second (1 / 0.15 s), 33% below the 10 req/sec guideline
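```python
import time


class XBRLAPIClient:
    RATE_LIMIT_DELAY = 0.15  # 150 ms gap → 6.67 req/sec (SEC allows ~10 req/sec)
    last_request_time = 0.0  # assumed initial value; 0.0 is falsy, so the first call never sleeps

    def _rate_limit(self) -> None:
        # Sleep only for the part of the 150 ms window that has not yet elapsed
        now = time.time()
        elapsed = now - self.last_request_time
        if self.last_request_time and elapsed < self.RATE_LIMIT_DELAY:
            time.sleep(self.RATE_LIMIT_DELAY - elapsed)
        self.last_request_time = time.time()
```

Compliance Notes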
- Every outbound request passes through `_rate_limit`, guaranteeing a minimum spacing of 150 ms.
- The delay is enforced even when requests are reattempted after transient failures.
- Additional exponential backoff (1s, then 2s) is layered on top of the base delay for network errors or HTTP 5xx responses.
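The sketch below restates the same 150 ms gate as a standalone class, so the spacing can be checked without touching the network. It is a hypothetical variant, not the code in `xbrl_api_client.py`. The `MinIntervalLimiter` name and the injectable `clock`/`sleep` parameters are assumptions added for testability. It also uses `time.monotonic()`, so wall-clock adjustments cannot shorten the gap.

```python
import time
from typing import Callable, List, Optional


class MinIntervalLimiter:
    """Hypothetical stand-in for XBRLAPIClient._rate_limit (illustration only)."""

    def __init__(
        self,
        min_interval: float = 0.15,  # mirrors RATE_LIMIT_DELAY
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # None until the first request

    def wait(self) -> None:
        """Block until at least min_interval has passed since the previous call."""
        if self._last is not None:
            elapsed = self._clock() - self._last
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last = self._clock()  # stamp the moment the request may start


# Usage: consecutive calls start >= 150 ms apart, i.e. <= 6.67 requests/second.
limiter = MinIntervalLimiter()
starts: List[float] = []
for _ in range(5):
    limiter.wait()
    starts.append(time.monotonic())
```

Injecting `clock` and `sleep` lets a test drive the limiter with a fake clock instead of waiting in real time.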
Monitoring
- Request start/end times are captured using `time.perf_counter()`.
- Durations are logged at `DEBUG` level, enabling granular profiling when the environment sets `LOGLEVEL=DEBUG`.
- Aggregated request metrics can be reconstructed by parsing `logs/xbrl_requests.log`.
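Here is a minimal sketch of the monitoring pattern described above. It assumes `LOGLEVEL` is read from the environment once at import. The logger name `xbrl_api_client`, the `timed_call` helper, and the message layout are illustrative assumptions, not the module's actual code.

```python
import logging
import os
import time
from typing import Any, Callable, Tuple

# Honour LOGLEVEL=DEBUG (or INFO, WARNING, ...) from the environment; default to INFO.
logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO").upper())
logger = logging.getLogger("xbrl_api_client")  # hypothetical logger name


def timed_call(endpoint: str, fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn(), log its duration at DEBUG level, and return (result, duration_ms)."""
    start = time.perf_counter()  # monotonic, high-resolution timer
    try:
        result = fn()
    finally:
        # Logged even when fn() raises, so failed requests still show their duration
        duration_ms = (time.perf_counter() - start) * 1000.0
        logger.debug("%s | %.2fms", endpoint, duration_ms)
    return result, duration_ms
```

In a client, `fn` would wrap the actual rate-limited `session.get(...)` call. With `LOGLEVEL` unset, the DEBUG lines are suppressed at no extra cost beyond the timer reads.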
User-Agent Declaration
- Implementation: `src/parsers/xbrl_api_client.py#L62-L78`
- Format: `accounting-conservation-framework/0.1.0 nirvanchitnis@gmail.com`
```python
DEFAULT_USER_AGENT = (
    "accounting-conservation-framework/0.1.0 "
    "nirvanchitnis@gmail.com"
)

# In the client constructor: every request on this session carries the header
self.session.headers.update({"User-Agent": user_agent})
```

Note: SEC requires an email address in the User-Agent to identify automated tools and ensure compliance with fair access policies.
Compliance Notes
- Identifies the project (`accounting-conservation-framework`) and semantic version (`0.1.0`).
- Supplies a contact email address (`nirvanchitnis@gmail.com`) so SEC staff can reach the maintainer.
- Matches SEC guidance to declare a descriptive User-Agent that includes contact information.
- Scripted reproducibility workflows (e.g., `scripts/run_empirical_validation_n500.py`) reuse the same string via `DEFAULT_USER_AGENT`, ensuring consistency.
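SEC asks automated clients to declare contact information in the User-Agent, so a cheap guard is to check the configured string before the first request. The sketch below is a hypothetical pre-flight check, not part of the documented client. The email pattern is deliberately loose: it only confirms that something shaped like an address is present and is not a full RFC 5322 validator.

```python
import re

DEFAULT_USER_AGENT = (
    "accounting-conservation-framework/0.1.0 "
    "nirvanchitnis@gmail.com"
)

# Loose pattern: local-part@domain.tld (illustrative, not exhaustive)
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def has_contact_email(user_agent: str) -> bool:
    """Return True if the User-Agent string contains an email-like token."""
    return _EMAIL_RE.search(user_agent) is not None


def check_user_agent(user_agent: str) -> str:
    """Raise ValueError early rather than sending an anonymous request to SEC."""
    if not has_contact_email(user_agent):
        raise ValueError(f"User-Agent lacks contact email: {user_agent!r}")
    return user_agent
```

Calling `check_user_agent(user_agent)` before `self.session.headers.update(...)` would turn a misconfigured header into an immediate local error instead of an anonymous request.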
Error Handling and Backoff
Automatic 429 Handling
```python
# Excerpt from the per-request retry loop
if response.status_code == 429:
    wait_time = int(response.headers.get("Retry-After", 60))
    logger.warning(
        "%s | %s | status=429 | retry-after=%ss",
        context.get("cik", "-"),
        context.get("endpoint", url),
        wait_time,
    )
    time.sleep(wait_time)
    continue
```

- Any HTTP `429 Too Many Requests` response triggers:
  - Honouring the `Retry-After` header (default fallback: 60 seconds)
  - A warning log entry documenting CIK, endpoint, and retry interval
  - An immediate sleep before a retry (no busy loops)
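The handler above passes the header through `int()`. That covers the delay-seconds form of `Retry-After`, but it would raise `ValueError` if a server sent the HTTP-date form, which RFC 9110 also permits. Below is a minimal sketch of a more tolerant parser using only the standard library. The `parse_retry_after` name is hypothetical. Falling back to 60 seconds for unparseable values is an assumption that mirrors the documented default.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], default: int = 60,
                      now: Optional[datetime] = None) -> int:
    """Convert a Retry-After header value to whole seconds to wait.

    Accepts delay-seconds ("120") or an HTTP-date
    ("Wed, 05 Nov 2025 13:24:00 GMT"); returns `default` when the
    header is missing or malformed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return int(value)
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return default
    if retry_at.tzinfo is None:  # treat naive dates as UTC
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0, int((retry_at - now).total_seconds()))  # never a negative sleep
```

In the loop, `wait_time = parse_retry_after(response.headers.get("Retry-After"))` would replace the `int(...)` call without changing behaviour for numeric headers.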
Exponential Backoff for Network Failures
```python
sleep_seconds = 1.0
for attempt in range(3):
    try:
        ...  # rate-limited request (elided)
    except requests.RequestException as exc:
        logger.error(..., attempt=attempt + 1)  # arguments elided
        if attempt == 2:
            raise  # third failure: propagate to the caller
        time.sleep(sleep_seconds)
        sleep_seconds *= 2  # 1 s after the first failure, 2 s after the second
        continue
```

- Maximum of 3 attempts per request.
- Backoff schedule: 1s after the first failure, then 2s after the second (layered on top of the baseline rate limit). The third failure is raised rather than followed by a 4s wait.
- Network errors (`requests.RequestException`) and HTTP 5xx errors (`>= 500`) both trigger the backoff.
- Critical failures propagate exceptions after the third attempt, allowing upstream code to decide whether to abort or skip the company.
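The sketch below is a self-contained version of the retry loop above, useful for checking the schedule. With three attempts, the caller waits 1 s after the first failure and 2 s after the second, and the third failure is re-raised. The `call_with_backoff` name, its parameters, and the injectable `sleep` are hypothetical. The real client catches `requests.RequestException`; here that is generalised to a `retry_on` tuple (defaulting to the built-in `ConnectionError`) so the example runs without `requests`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` with delays base, 2*base, 4*base, ..."""
    delay = base_delay
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide (abort or skip)
            sleep(delay)
            delay *= 2  # exponential growth
    raise RuntimeError("attempts must be >= 1")  # only reached when attempts == 0
```

A 4 s wait would only appear with `attempts=4`. Keeping the schedule in one function makes that relationship easy to test.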
API Endpoints Accessed
The client calls only three endpoints, all documented on SEC.gov/developer:
- Company Tickers List
  - URL: `https://www.sec.gov/files/company_tickers.json`
  - Frequency: Once per dataset build (cached aggressively)
  - Purpose: Map tickers to 10-digit CIK identifiers
- Company Submissions Index
  - URL: `https://data.sec.gov/submissions/CIK{cik}.json`
  - Frequency: Once per company per execution
  - Purpose: Enumerate filings to locate 10-K and 10-Q accession numbers
- Company Facts (XBRL)
  - URL: `https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json`
  - Frequency: Once per company per execution
  - Purpose: Retrieve structured financial facts for equity bridge validation
Endpoints are mirrored in `XBRLAPIClient.COMPANY_TICKERS_URL`, `.SUBMISSIONS_URL_TEMPLATE`, and `.COMPANY_FACTS_URL_TEMPLATE`. No other web endpoints are touched by the core pipeline.
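Both `data.sec.gov` templates expect a CIK zero-padded to ten digits. By contrast, `company_tickers.json` stores `cik_str` as a plain integer in objects keyed by row index. Below is a minimal sketch of the lookup and URL-building steps, assuming the `{cik}` placeholder shown above. The `cik10` and `build_ticker_map` helpers are illustrative. The constant names are mirrored here, not imported from the client.

```python
from typing import Any, Dict, Mapping, Union

COMPANY_TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"
SUBMISSIONS_URL_TEMPLATE = "https://data.sec.gov/submissions/CIK{cik}.json"
COMPANY_FACTS_URL_TEMPLATE = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json"


def cik10(cik: Union[int, str]) -> str:
    """Zero-pad a CIK to the 10-digit form used in data.sec.gov paths."""
    return str(int(cik)).zfill(10)


def build_ticker_map(payload: Mapping[str, Mapping[str, Any]]) -> Dict[str, str]:
    """Map upper-case ticker -> 10-digit CIK from parsed company_tickers.json."""
    return {
        str(row["ticker"]).upper(): cik10(row["cik_str"])
        for row in payload.values()
    }
```

With a ticker map built once per dataset, each company then needs only `SUBMISSIONS_URL_TEMPLATE.format(cik=...)` and `COMPANY_FACTS_URL_TEMPLATE.format(cik=...)`, matching the per-company frequencies listed above.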
Data Usage and Caching
- Downloaded JSON responses are cached in `cache/xbrl/` using SHA-256 keyed filenames (`_hash_key`).
- Cache files store raw SEC responses without transformation.
- Cache invalidation is manual: developers delete the directory when fresh pulls are required.
- Derived analytics (coverage stats, validation outcomes) are published in `results/`, but raw SEC payloads are not redistributed.
- Researchers who clone the repository reproduce the dataset by invoking scripts that re-fetch directly from SEC.gov.
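The document names `_hash_key` but not its inputs. Below is a plausible sketch under two assumptions: the key is the SHA-256 hex digest of the full request URL (query string included), and each payload is stored byte-for-byte as `<digest>.json` under `cache/xbrl/`. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional

CACHE_DIR = Path("cache/xbrl")


def hash_key(url: str) -> str:
    """SHA-256 hex digest of the URL; distinct endpoints or query strings get distinct keys."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def cache_path(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    return cache_dir / f"{hash_key(url)}.json"


def read_cached(url: str, cache_dir: Path = CACHE_DIR) -> Optional[bytes]:
    """Return the raw cached bytes, or None on a cache miss."""
    path = cache_path(url, cache_dir)
    return path.read_bytes() if path.exists() else None


def write_cached(url: str, raw: bytes, cache_dir: Path = CACHE_DIR) -> Path:
    """Store the SEC response body untouched (no parsing or re-serialisation)."""
    path = cache_path(url, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw)
    return path
```

Storing bytes rather than re-dumped JSON keeps the "raw SEC responses without transformation" property. Deleting `cache/xbrl/` remains the manual invalidation step.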
Logging and Audit Trail
- Log File: `logs/xbrl_requests.log`
- Handler: Configured in `src/parsers/xbrl_api_client.py` at module import
- Format: `%(asctime)s | %(levelname)s | {cik} | {endpoint} | status={code} | {duration_ms}ms`
- Examples:

```text
2025-11-04 13:22:11,548 | INFO | 0000789019 | companyfacts | status=200 | 182.34ms
2025-11-04 13:22:12,703 | WARNING | 0001318605 | submissions | status=429 | retry-after=120s
2025-11-04 13:22:16,912 | ERROR | - | company_tickers | request-exception=ConnectionError | 410.55ms | attempt=2
```
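Below is a minimal sketch of a module-level file handler that produces lines in this shape. It assumes the `%(asctime)s | %(levelname)s | ` prefix comes from the formatter and that each call site pre-formats the remaining fields into the message. The logger name and the `setup_request_log` helper are hypothetical.

```python
import logging
from pathlib import Path


def setup_request_log(path: Path = Path("logs/xbrl_requests.log")) -> logging.Logger:
    """Attach a file handler whose prefix matches the documented format."""
    path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("xbrl_requests")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    # Guard against duplicate handlers if the module is imported more than once
    if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

The default `asctime` layout (`2025-11-04 13:22:11,548`) already matches the examples. A call site would then log something like `logger.info("%s | %s | status=%d | %.2fms", cik, endpoint, code, duration_ms)`.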
Contents
- CIK (or `-` for ticker lookup requests)
- Endpoint identifier (`company_tickers`, `submissions`, `companyfacts`)
- HTTP status code or exception class
- Request duration in milliseconds
- Retry metadata when applicable
Usage
- Provides an audit trail for compliance reviews.
- Enables post-hoc verification of access patterns (frequency, burstiness).
- Supports anomaly detection (e.g., repeated 429 responses) during quarterly compliance audits.
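Post-hoc checks of this kind only need a line-oriented parser. The sketch below assumes the pipe-delimited layout shown in the examples, with the endpoint in the fourth field. It tallies status codes per endpoint, the number of 429 responses, and the mean logged duration. Lines without a `status=` field, such as request exceptions, are counted separately. The `summarise_log` helper and its return shape are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional

_STATUS_RE = re.compile(r"\bstatus=(\d{3})\b")
_DURATION_RE = re.compile(r"\|\s*([\d.]+)ms\b")


def summarise_log(lines: Iterable[str]) -> Dict[str, object]:
    """Tally (endpoint, status) pairs, 429s, exceptions, and mean request duration."""
    status_counts: Counter = Counter()
    exceptions = 0
    durations: List[float] = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # not a request line
        endpoint = fields[3]
        status = _STATUS_RE.search(line)
        if status:
            status_counts[(endpoint, int(status.group(1)))] += 1
        elif "request-exception=" in line:
            exceptions += 1
        duration = _DURATION_RE.search(line)
        if duration:
            durations.append(float(duration.group(1)))
    mean_ms: Optional[float] = sum(durations) / len(durations) if durations else None
    return {
        "status_counts": dict(status_counts),
        "rate_limited": sum(n for (_, code), n in status_counts.items() if code == 429),
        "exceptions": exceptions,
        "mean_ms": mean_ms,
    }
```

Passing an open file handle for `logs/xbrl_requests.log` works directly, since files iterate line by line. A rising `rate_limited` count between audits is the signal to revisit `RATE_LIMIT_DELAY`.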
Data Attribution
All outputs and publications derived from this repository include SEC attribution:
- README, docs, and notebooks reference SEC.gov as the origin of financial filings.
- Scripts output metadata with the format `CIK ######`, accession number, and filing date.
- Example: `Microsoft (CIK 0000789019) — 10-K filed 2024-07-30 (Accession 0000789019-24-000123)`.
Users executing the pipeline are reminded (via documentation and CLI output) that redistribution of raw SEC documents should go through SEC channels.
Dataset Provenance
- `results/metadata.json` captures build timestamps, commit hashes, and dataset windows.
- `results/cik_list.txt` enumerates all CIKs accessed during the last empirical validation run (placeholder entries flagged as `UNKNOWN` require periodic filling; see the compliance checklist).
- `results/REPRODUCTION.md` explains how to rebuild the dataset from scratch, including SEC API steps.
- Reproduction instructions emphasise fetching directly from SEC endpoints to avoid stale caches.
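Below is a small pre-publication check for the `UNKNOWN` placeholder entries mentioned above. It assumes `cik_list.txt` holds one entry per line, each either a 10-digit CIK or the literal token `UNKNOWN`, with blank lines and `#` comments ignored. This file layout and the `audit_cik_list` helper are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

_CIK_RE = re.compile(r"^\d{10}$")  # assumed 10-digit, zero-padded CIK format


def audit_cik_list(
    lines: Iterable[str],
) -> Tuple[List[str], List[int], List[Tuple[int, str]]]:
    """Split entries into valid CIKs, UNKNOWN placeholder line numbers, and malformed lines."""
    valid: List[str] = []
    unknown: List[int] = []
    malformed: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blanks and comments
        if entry == "UNKNOWN":
            unknown.append(lineno)
        elif _CIK_RE.match(entry):
            valid.append(entry)
        else:
            malformed.append((lineno, entry))
    return valid, unknown, malformed
```

Failing the check whenever `unknown` or `malformed` is non-empty would turn the "periodic filling" reminder into something a CI job can enforce.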
Responsible Use Commitments
- Infrastructure Stewardship: Rate limiting is conservative, and cached reads minimise redundant requests.
- Transparency: User-Agent strings and log files make automated access obvious to SEC operators.
- Attribution: All publications cite SEC.gov as the data source.
- Open Source: Code and methodology are public, enabling independent audits.
- Reproducibility: Tagged releases (`v0.1.0`, `v0.1.1`, …) and lockfiles pin dependencies for reproducible environments.
Terms of Service Adherence
| SEC Guideline | Project Behaviour |
|---|---|
| 10 requests per second | Enforced cap of 6.67 req/sec with additional backoff |
| Descriptive User-Agent | Default header contains project name, version, and contact email |
| Avoid bulk downloading | Pipeline touches three documented JSON endpoints: the ticker list once per build, then submissions and company facts once per company |
| Respect Retry-After | Automatic sleep + retry for HTTP 429 responses |
| Attribution required | Documentation and outputs explicitly credit SEC.gov |
Prohibited Actions (Never Performed)
- Mass parallel scraping without throttling
- Circumventing rate limits or rotating IP addresses
- Republishing raw EDGAR datasets from SEC servers
- Attempting to enumerate private / restricted filings
Permitted Actions (Actively Implemented)
- Respectful, rate-limited access over HTTPS
- Local caching for development efficiency
- Publication of derived analytics under MIT License
- Issue tracker for SEC staff or auditors to raise concerns
Security Considerations
- Network requests execute via `requests.Session` with a bounded timeout (30 seconds).
- JSON payload integrity is verified implicitly via HTTPS; the project does not tamper with SEC responses.
- Hash-based cache keys prevent collisions between different endpoints or query parameters.
- Sensitive credentials are not required; the client relies solely on public endpoints.
- Security tooling (`bandit`, `pip-audit`, CycloneDX SBOM) is incorporated into CI to monitor the supply chain.
Contact and Escalation
- Primary maintainer: Nirvan Chitnis (`nirvanchitnis@gmail.com`)
- Repository: https://github.com/nirvanchitnis-cmyk/accounting-conservation-framework
- Issue tracker: https://github.com/nirvanchitnis-cmyk/accounting-conservation-framework/issues
- SEC Fair Access Team: https://www.sec.gov/os/accessing-edgar-data
Use the issue tracker for compliance questions, bug reports, or data access concerns. Urgent matters can be escalated via email.
Review Cadence
- Last Reviewed: 2025-11-04
- Next Scheduled Review: 2026-02-04 (quarterly cadence)
- Compliance checklist instructs maintainers to re-run SBOM generation, pip-audit, and SEC guideline checks every 90 days.
- Review log entries ensure the documentation remains synchronized with the actual implementation (user-agent strings, rate limits, and logging paths).
Appendix A — Request Lifecycle
- Lookup Phase: Resolve ticker → CIK using cached `company_tickers.json`.
- Submissions Phase: Pull the submissions index for the CIK; select 10-K / 10-Q accessions.
- Facts Phase: Download the `companyfacts` payload; filter facts belonging to the accession.
- Extraction Phase: Identify equity bridge components (equity, net income, OCI, dividends, etc.).
- Logging Phase: Persist request metadata to `logs/xbrl_requests.log` for traceability.
Each phase reuses a single `requests.Session` to leverage HTTP keep-alive while still respecting rate limits.
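The Submissions and Facts phases reduce to two filtering steps over already-downloaded JSON. Below is a minimal sketch that assumes the public payload shapes. The submissions index stores recent filings as parallel arrays under `filings.recent` (`accessionNumber`, `form`, `filingDate`). Company facts nest datapoints as `facts → taxonomy → concept → units → unit → [points]`, with each point carrying an `accn` field. The helper names are hypothetical. Only `filings.recent` is scanned, so older filings that SEC pages out into additional files are ignored in this sketch.

```python
from typing import Any, Dict, Iterable, List, Tuple


def select_accessions(
    submissions: Dict[str, Any], forms: Iterable[str] = ("10-K", "10-Q")
) -> List[Tuple[str, str, str]]:
    """Return (accession, form, filing_date) for the wanted forms in filings.recent."""
    recent = submissions["filings"]["recent"]
    wanted = set(forms)
    return [
        (accn, form, filed)
        for accn, form, filed in zip(
            recent["accessionNumber"], recent["form"], recent["filingDate"]
        )
        if form in wanted
    ]


def facts_for_accession(companyfacts: Dict[str, Any], accn: str) -> List[Dict[str, Any]]:
    """Flatten companyfacts and keep only datapoints reported in the given accession."""
    rows: List[Dict[str, Any]] = []
    for taxonomy, concepts in companyfacts.get("facts", {}).items():
        for concept, body in concepts.items():
            for unit, points in body.get("units", {}).items():
                for point in points:
                    if point.get("accn") == accn:
                        rows.append(
                            {"taxonomy": taxonomy, "concept": concept, "unit": unit, **point}
                        )
    return rows
```

Both functions work on parsed payloads, so they can be exercised against cached files without making any further requests to SEC.gov.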
Appendix B — Developer Checklist
Developers updating the client must ensure documentation remains accurate and refresh the review dates above.
Appendix C — Glossary
- CIK: Central Index Key, a unique identifier assigned by the SEC to each registrant.
- Accession Number: Unique identifier for each filing submitted to EDGAR.
- SBOM: Software Bill of Materials describing all third-party dependencies.
- CycloneDX: Industry-standard format for SBOM documents (JSON & XML variants).
- EDGAR: Electronic Data Gathering, Analysis, and Retrieval system that houses SEC filings.
Maintained by the Accounting Conservation Framework team. Contributions and corrections are welcome via pull request or GitHub issues.