SEC EDGAR API Compliance

Overview

This project retrieves publicly available data from the SEC EDGAR API in strict adherence to the SEC Terms of Use and Fair Access guidelines. The Equity Bridge validation pipeline relies on the JSON endpoints supplied by the SEC and layers additional safeguards to remain well within the limits prescribed for automated access.

The Accounting Conservation Framework is an academic research project maintained by Nirvan Chitnis. The codebase prioritises responsible consumption of SEC infrastructure and provides transparent documentation for reviewers, enterprise compliance teams, and SEC staff.


Rate Limiting

RATE_LIMIT_DELAY = 0.15  # 150 ms gap → 6.67 req/sec (SEC allows ~10 req/sec)

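def _rate_limit(self) -> None:
    # Sleep just long enough to keep RATE_LIMIT_DELAY seconds between requests.
    now = time.time()
    elapsed = now - self.last_request_time
    # A falsy last_request_time means no request has been sent yet, so skip the wait.
    if self.last_request_time and elapsed < self.RATE_LIMIT_DELAY:
        time.sleep(self.RATE_LIMIT_DELAY - elapsed)
    self.last_request_time = time.time()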
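
The excerpt above is a method fragment, so it cannot be run on its own. A minimal standalone sketch of the same minimum-gap technique follows. The RateLimiter class and its clock and sleep parameters are hypothetical names introduced for illustration, and time.monotonic is swapped in for time.time so that wall-clock adjustments cannot shorten the gap. With a 0.15 s gap the ceiling is 1 / 0.15 ≈ 6.67 requests per second.

```python
import time


class RateLimiter:
    """Enforce a minimum gap between consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval: float = 0.15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the behaviour can be tested without waiting
        self._sleep = sleep
        self._last = None        # None means no request has been issued yet

    def wait(self) -> float:
        """Block until the gap has elapsed; return the number of seconds slept."""
        slept = 0.0
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the time after any sleep so the next gap is measured from here.
        self._last = self._clock()
        return slept
```

Calling wait() immediately before each HTTP request would reproduce the behaviour of _rate_limit; injecting a fake clock and sleep function makes the timing logic checkable in unit tests.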

Compliance Notes

Monitoring


User-Agent Declaration

accounting-conservation-framework/0.1.0 nirvanchitnis@gmail.com
DEFAULT_USER_AGENT = (
    "accounting-conservation-framework/0.1.0 "
    "nirvanchitnis@gmail.com"
)

self.session.headers.update({"User-Agent": user_agent})

Note: The SEC requires a contact email address in the User-Agent header so that it can identify automated tools and enforce its fair access policy.
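
As an illustration of the header format, the sketch below assembles the User-Agent string and refuses values that lack a plausible email address. The build_user_agent function and the loose regular expression are assumptions made for this example; they are not part of the client.

```python
import re

# Loose pattern for "something@domain.tld"; an illustrative check, not full RFC 5322 validation.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def build_user_agent(project: str, version: str, contact_email: str) -> str:
    """Return '<project>/<version> <email>', rejecting values without a plausible email."""
    if not _EMAIL_RE.fullmatch(contact_email):
        raise ValueError(f"contact_email does not look like an email address: {contact_email!r}")
    return f"{project}/{version} {contact_email}"


# Reproduces the default header value documented above.
user_agent = build_user_agent(
    "accounting-conservation-framework", "0.1.0", "nirvanchitnis@gmail.com"
)
```

Failing fast on a missing email keeps a misconfigured deployment from sending anonymous requests to SEC servers.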

Compliance Notes


Error Handling and Backoff

Automatic 429 Handling

if response.status_code == 429:
    wait_time = int(response.headers.get("Retry-After", 60))
    logger.warning(
        "%s | %s | status=429 | retry-after=%ss",
        context.get("cik", "-"),
        context.get("endpoint", url),
        wait_time,
    )
    time.sleep(wait_time)
    continue
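
One caveat with the excerpt above: HTTP allows Retry-After to carry either a number of seconds or an HTTP-date, and int() raises ValueError on the date form. A hedged sketch of a more tolerant parser follows. The parse_retry_after function is a hypothetical helper, and the 60-second default simply mirrors the fallback used in the excerpt.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], default: int = 60,
                      now: Optional[datetime] = None) -> int:
    """Return a non-negative wait in whole seconds from a Retry-After header value."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return int(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    # A date in the past yields 0 rather than a negative sleep.
    return max(0, int((when - now).total_seconds()))
```

In the 429 branch this would replace the int(...) call, for example wait_time = parse_retry_after(response.headers.get("Retry-After")).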

Exponential Backoff for Network Failures

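sleep_seconds = 1.0
for attempt in range(3):
    try:
        ...  # issue the request (elided in this excerpt)
    except requests.RequestException as exc:
        logger.error(..., attempt=attempt + 1)  # log arguments elided in this excerpt
        if attempt == 2:
            raise  # third failure: propagate to the caller
        time.sleep(sleep_seconds)  # waits 1 s after the first failure, 2 s after the second
        sleep_seconds *= 2
        continue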
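
The retry loop above can be factored into a reusable helper. The following is a minimal sketch under stated assumptions: retry_with_backoff and its parameters are hypothetical, and the built-in ConnectionError and TimeoutError stand in for requests.RequestException so that the example runs with the standard library alone.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 3,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, doubling the delay after each failure."""
    delay = initial_delay
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # final attempt failed: propagate the original exception
            sleep(delay)
            delay *= 2
    # Only reachable when attempts < 1, because the loop body never ran.
    raise ValueError("attempts must be at least 1")
```

With the defaults, a call that fails twice and then succeeds sleeps for 1 s and then 2 s, matching the schedule in the excerpt. Passing retry_on=(requests.RequestException,) would adapt the helper to the client's HTTP calls.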

API Endpoints Accessed

The client calls only three endpoints, all documented on SEC.gov/developer:

  1. Company Tickers List
    • URL: https://www.sec.gov/files/company_tickers.json
    • Frequency: Once per dataset build (cached aggressively)
    • Purpose: Map tickers to 10-digit CIK identifiers
  2. Company Submissions Index
    • URL: https://data.sec.gov/submissions/CIK{cik}.json
    • Frequency: Once per company per execution
    • Purpose: Enumerate filings to locate 10-K and 10-Q accession numbers
  3. Company Facts (XBRL)
    • URL: https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json
    • Frequency: Once per company per execution
    • Purpose: Retrieve structured financial facts for equity bridge validation

Endpoints are mirrored in XBRLAPIClient.COMPANY_TICKERS_URL, .SUBMISSIONS_URL_TEMPLATE, and .COMPANY_FACTS_URL_TEMPLATE. No other web endpoints are touched by the core pipeline.
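
The URL templates expect the CIK in its zero-padded, 10-digit form. The sketch below shows one way to build the per-company URLs. The constant names follow the attributes named above, while normalize_cik, submissions_url, and company_facts_url are hypothetical helpers introduced for illustration.

```python
COMPANY_TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"
SUBMISSIONS_URL_TEMPLATE = "https://data.sec.gov/submissions/CIK{cik}.json"
COMPANY_FACTS_URL_TEMPLATE = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json"


def normalize_cik(cik) -> str:
    """Zero-pad a CIK given as int or str to the 10-digit form used in the URLs."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"invalid CIK: {cik!r}")
    return digits.zfill(10)


def submissions_url(cik) -> str:
    """Build the submissions index URL for one company."""
    return SUBMISSIONS_URL_TEMPLATE.format(cik=normalize_cik(cik))


def company_facts_url(cik) -> str:
    """Build the XBRL company facts URL for one company."""
    return COMPANY_FACTS_URL_TEMPLATE.format(cik=normalize_cik(cik))
```

For example, company_facts_url(789019) yields https://data.sec.gov/api/xbrl/companyfacts/CIK0000789019.json.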


Data Usage and Caching


Logging and Audit Trail

2025-11-04 13:22:11,548 | INFO | 0000789019 | companyfacts | status=200 | 182.34ms
2025-11-04 13:22:12,703 | WARNING | 0001318605 | submissions | status=429 | retry-after=120s
2025-11-04 13:22:16,912 | ERROR | - | company_tickers | request-exception=ConnectionError | 410.55ms | attempt=2
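
Because the entries are pipe-delimited, they are straightforward to parse during an audit. The sketch below splits one line into named fields. The parse_log_line function and the field names timestamp, level, cik, endpoint, and duration_ms are labels assumed for this example, inferred from the sample lines above rather than taken from the client.

```python
from typing import Dict


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one pipe-delimited log line into named fields plus key=value extras."""
    parts = [part.strip() for part in line.split(" | ")]
    if len(parts) < 5:
        raise ValueError(f"unexpected log line: {line!r}")
    record = {
        "timestamp": parts[0],
        "level": parts[1],
        "cik": parts[2],        # "-" when the request is not tied to a company
        "endpoint": parts[3],
    }
    for extra in parts[4:]:
        if "=" in extra:
            key, value = extra.split("=", 1)   # e.g. status=429, retry-after=120s
            record[key] = value
        elif extra.endswith("ms"):
            record["duration_ms"] = extra[:-2]  # e.g. "182.34ms" -> "182.34"
    return record
```

Filtering parsed records on status == "429" gives a quick count of throttling events per run.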

Contents

Usage


Data Attribution

All products and publications derived from this repository include SEC attribution.

Users executing the pipeline are reminded (via documentation and CLI output) that redistribution of raw SEC documents should go through SEC channels.


Dataset Provenance


Responsible Use Commitments

  1. Infrastructure Stewardship: Rate limiting is conservative, and cached reads minimise redundant requests.
  2. Transparency: User-Agent strings and log files make automated access obvious to SEC operators.
  3. Attribution: All publications cite SEC.gov as the data source.
  4. Open Source: Code and methodology are public, enabling independent audits.
  5. Reproducibility: Tagged releases (v0.1.0, v0.1.1, …) and lockfiles guarantee deterministic environments.

Terms of Service Adherence

SEC Guideline              Project Behaviour
10 requests per second     Enforced cap of 6.67 req/sec with additional backoff
Descriptive User-Agent     Default header contains project name, version, and contact email
Avoid bulk downloading     Pipeline touches only three documented JSON endpoints
Respect Retry-After        Automatic sleep + retry for HTTP 429 responses
Attribution required       Documentation and outputs explicitly credit SEC.gov

Prohibited Actions (Never Performed)

Permitted Actions (Actively Implemented)


Security Considerations


Contact and Escalation

Use the issue tracker for compliance questions, bug reports, or data access concerns. Urgent matters can be escalated via email.


Review Cadence


Appendix A — Request Lifecycle

  1. Lookup Phase: Resolve ticker → CIK using cached company_tickers.json.
  2. Submissions Phase: Pull the submissions index for the CIK; select 10-K / 10-Q accessions.
  3. Facts Phase: Download companyfacts payload; filter facts belonging to the accession.
  4. Extraction Phase: Identify equity bridge components (equity, net income, OCI, dividends, etc.).
  5. Logging Phase: Persist request metadata to logs/xbrl_requests.log for traceability.

Each phase reuses a single requests.Session to leverage keep-alive while still respecting rate limits.
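
The Submissions and Facts phases can be sketched as two small functions. This is an illustration under stated assumptions, not the project's implementation: it assumes the submissions payload exposes parallel arrays under filings.recent (accessionNumber, form) and that companyfacts nests observations as facts → taxonomy → concept → units, each carrying an accn field. The function names and the toy payloads, including their accession numbers and values, are made up for the example.

```python
from typing import Any, Dict, List


def select_accessions(submissions: Dict[str, Any], forms=("10-K", "10-Q")) -> List[str]:
    """Return accession numbers of filings whose form type is in `forms`."""
    recent = submissions["filings"]["recent"]  # assumed column-oriented, parallel arrays
    return [
        accn
        for accn, form in zip(recent["accessionNumber"], recent["form"])
        if form in forms
    ]


def facts_for_accession(company_facts: Dict[str, Any], accession: str) -> List[Dict[str, Any]]:
    """Flatten the companyfacts payload and keep facts reported in one accession."""
    matches = []
    for taxonomy, concepts in company_facts.get("facts", {}).items():
        for concept, body in concepts.items():
            for unit, observations in body.get("units", {}).items():
                for obs in observations:
                    if obs.get("accn") == accession:
                        matches.append(
                            {"taxonomy": taxonomy, "concept": concept, "unit": unit, **obs}
                        )
    return matches


# Toy payloads with made-up values, shaped like the responses as assumed above.
toy_submissions = {
    "filings": {
        "recent": {
            "accessionNumber": [
                "0000000000-25-000003",
                "0000000000-25-000002",
                "0000000000-25-000001",
            ],
            "form": ["8-K", "10-Q", "10-K"],
        }
    }
}
toy_facts = {
    "facts": {
        "us-gaap": {
            "StockholdersEquity": {
                "units": {
                    "USD": [
                        {"end": "2024-12-31", "val": 1000,
                         "accn": "0000000000-25-000001", "form": "10-K"},
                        {"end": "2025-03-31", "val": 1100,
                         "accn": "0000000000-25-000002", "form": "10-Q"},
                    ]
                }
            }
        }
    }
}

accessions = select_accessions(toy_submissions)          # the 8-K is filtered out
annual_facts = facts_for_accession(toy_facts, accessions[1])
```

Filtering locally on the accession number means a single companyfacts download per company serves every filing under review, which is consistent with the once-per-company request frequency stated for that endpoint.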


Appendix B — Developer Checklist

Developers updating the client must ensure documentation remains accurate and refresh the review dates above.


Appendix C — Glossary


Maintained by the Accounting Conservation Framework team. Contributions and corrections are welcome via pull request or GitHub issues.
