finds.unstructured.edgar

Class and methods to retrieve and manipulate EDGAR text data

  • SEC Edgar: 10-K, 10-Q, 8-K

  • MD&A and Business Descriptions items

Copyright 2022, Terence Lim

MIT License

class finds.unstructured.edgar.Edgar(savedir: str, zipped: bool = True, verbose=0)

Bases: object

Class to retrieve and pre-process Edgar website documents

<localname> = YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

  • e.g. 20211105_10-Q_edgar_data_1761312_0001558370-21-014714.txt

10-K and 10-Q zipped archive (zip -q -r 2019.zip 2019) - 10X/YYYY.zip

10-K and 10-Q local file - 10X/YYYY/YYYYMMDD/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

10-K and 10-Q detail folder - 10X/detail/YYYY/YYYYMMDD/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

8-K local file - 10X/8-K/YYYY/YYYYMMDD/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

8-K detail folder - 10X/8-K/detail/YYYY/YYYYMMDD/

10-K MDA local text file - 10X/10-K/mda10K/PERMNO/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

10-K MDA zipped archive (zip -q -r mda10K.zip mda10K) - 10X/10-K/mda10K.zip

__getitem__(pathname)

Retrieves text of document file by pathname from archive

close()

Close the archive
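As a rough illustration of the access pattern behind __getitem__ and close, the sketch below wraps a zip archive with Python's standard zipfile module. The class name ZipArchiveSketch and the in-memory demo archive are hypothetical and not taken from the finds source; the actual class may also read from plain folders.

```python
import io
import zipfile


class ZipArchiveSketch:
    """Hypothetical stand-in: read documents by pathname from a zip archive."""

    def __init__(self, fileobj):
        self._archive = zipfile.ZipFile(fileobj, mode="r")

    def namelist(self):
        # Pathnames of all documents stored in the archive
        return self._archive.namelist()

    def __getitem__(self, pathname):
        # Return the document text stored under this pathname
        with self._archive.open(pathname) as f:
            return f.read().decode("utf-8", errors="replace")

    def close(self):
        self._archive.close()


# Build a tiny in-memory archive purely for demonstration
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as z:
    z.writestr("2021/20211105/sample.txt", "Item 2. Management's Discussion")
buffer.seek(0)

archive = ZipArchiveSketch(buffer)
names = archive.namelist()
text = archive[names[0]]
archive.close()
```

Indexing by pathname keeps the caller independent of whether the documents sit in a zip file or a folder, which is presumably why the class exposes __getitem__ rather than a file handle.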

static extract_filenames(detail: str, verbose: int = 0) → List[str]

Extract ordered list of .htm and .txt filenames from filing detail

Parameters:

detail – Text of detail file

Returns:

Ordered list of .htm and .txt filenames found in the detail file
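One way such an extraction could work is a regular-expression scan of the hyperlinks in the detail page. This is only a sketch: the helper name extract_filenames_sketch and the sample HTML are made up for illustration, and the library's own parsing may differ.

```python
import re
from typing import List


def extract_filenames_sketch(detail: str) -> List[str]:
    """Hypothetical helper: ordered, de-duplicated .htm/.txt link targets."""
    hrefs = re.findall(r'href="([^"]+\.(?:htm|txt))"', detail, flags=re.IGNORECASE)
    seen, names = set(), []
    for href in hrefs:
        name = href.split("/")[-1]      # keep the base filename only
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Made-up detail page: three documents plus an image that should be ignored
sample_detail = """
<table>
<tr><td><a href="/Archives/edgar/data/1/000/main10q.htm">main10q.htm</a></td></tr>
<tr><td><a href="/Archives/edgar/data/1/000/ex31.htm">ex31.htm</a></td></tr>
<tr><td><a href="/Archives/edgar/data/1/000/0000000001-21-000001.txt">Complete</a></td></tr>
<tr><td><a href="/Archives/edgar/data/1/000/logo.jpg">logo.jpg</a></td></tr>
</table>
"""
filenames = extract_filenames_sketch(sample_detail)
```

Preserving document order matters because later steps assume the primary document comes first.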

static extract_item(text: str, item: str)

Extract MD&A or business description item from input text

Parameters:
  • text – Full text of filing, from which to extract passage for item

  • item – Item to extract, in {‘mda10K’, ‘bus10K’, ‘mda10Q’, ‘qqr10K’}

Notes:

https://www.sec.gov/fast-answers/answersreada10khtm.html

10-Q items:

PART I - FINANCIAL INFORMATION

  • Item 1. Financial Statements.

  • Item 2. Management’s Discussion and Analysis

  • Item 3. Quantitative and Qualitative Disclosures About Market Risk.

  • Item 4. Controls and Procedures.

PART II - OTHER INFORMATION

  • Item 1. Legal Proceedings.

  • Item 1A. Risk Factors.

  • Item 2. Unregistered Sales of Equity Securities and Use of Proceeds.

10-K items:

Part 1

  • Item 1 – Business

  • Item 1A – Risk Factors

  • Item 1B – Unresolved Staff Comments

  • Item 2 – Properties

  • Item 3 – Legal Proceedings

  • Item 4 – Mine Safety Disclosures

Part 2

  • Item 5 – Market

  • Item 6 – Consolidated Financial Data

  • Item 7 – Management’s Discussion and Analysis of Financial Condition and Results of Operations

  • Item 7A – Quantitative and Qualitative Disclosures about Market Risks

  • Forward Looking Statements

  • Item 8 – Financial Statements

  • Item 9A – Controls and Procedures

  • Item 9B – Other Information
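Because the item headings delimit each section, a passage can in principle be located by matching from its heading to the next heading. The sketch below illustrates that idea with a regular expression. The helper extract_item_sketch, its start/stops arguments and the toy text are hypothetical, and this is not claimed to be the method the library uses.

```python
import re


def extract_item_sketch(text: str, start: str, stops: list) -> str:
    """Hypothetical helper: longest passage from a start heading to a stop heading."""
    stop_pattern = "|".join(stops)
    pattern = re.compile(
        rf"^\s*({start})\b(.*?)(?=^\s*(?:{stop_pattern})\b|\Z)",
        flags=re.IGNORECASE | re.DOTALL | re.MULTILINE,
    )
    # A table of contents repeats the headings, so keep the longest candidate
    candidates = [m.group(0).strip() for m in pattern.finditer(text)]
    return max(candidates, key=len, default="")


# Toy 10-K text: a table of contents followed by the actual sections
sample_text = "\n".join([
    "Table of Contents",
    "Item 7. Management's Discussion and Analysis",
    "Item 7A. Quantitative and Qualitative Disclosures",
    "Item 8. Financial Statements",
    "",
    "Item 7. Management's Discussion and Analysis",
    "Revenue increased due to higher volume.",
    "Item 7A. Quantitative and Qualitative Disclosures",
    "Interest rate risk is limited.",
    "Item 8. Financial Statements",
])
mda = extract_item_sketch(sample_text, r"Item\s+7", [r"Item\s+7A", r"Item\s+8"])
```

Choosing the longest candidate is a simple heuristic for skipping table-of-contents entries; real filings vary enough in formatting that a production extractor would need more safeguards.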

static fetch_detail(pathname: str, root: str = '', verbose: int = 0) → bytes

Fetch the detail file of a filing: an HTML page containing a table of document hyperlinks

Parameters:
  • pathname – Relative pathname to fetch

  • root – Root prefix of url

static fetch_filing(pathname: str, root: str = '', form: str = '', features: str = 'lxml', verbose: int = 0) → str

Fetch and parse filing text from url pathname or local html file

Parameters:
  • pathname – Relative pathname to fetch

  • root – Root prefix of url or local directory

  • features – Parser to use e.g. lxml, lxml-xml, html.parser

  • form – Additional parsing to remove preamble for form=’8-K’

Returns:

Text of body, parsed per Loughran and McDonald “Stage One”
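To give a feel for the kind of clean-up involved in turning filing HTML into plain text, here is a minimal tag-stripping sketch built on the standard library's html.parser. It does not reproduce the Loughran and McDonald "Stage One" rules, and the names _TextOnly and html_to_text_sketch are hypothetical.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text_sketch(html: str) -> str:
    """Hypothetical helper: visible text of an HTML document, one block per line."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Item 2. Discussion</p><p>Sales &amp; margins rose.</p></body></html>"
)
body_text = html_to_text_sketch(sample_html)
```

The features parameter documented above suggests the library delegates parsing to a third-party parser; the standard-library parser is used here only to keep the sketch dependency-free.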

static fetch_index(date: int = 0, year: int = 0, quarter: int = 0, verbose: int = 0) → Dict

Fetch edgar daily index or full index, or all daily dates

Parameters:
  • date – Retrieve daily index for this date (unless 0)

  • year – Retrieve full-index for this year/quarter (unless 0)

  • quarter – Retrieve full-index for this year/quarter (unless 0)

Returns:

Dict of filings metadata from daily or full index, or daily dates

Notes:

If no arguments, retrieve all dates by walking daily index tree
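The choice between the daily and the full index can be pictured as a URL-building step. The sketch below assumes the EDGAR layout of daily-index and full-index folders organized by year and quarter, with form-sorted index files named form.YYYYMMDD.idx and form.idx. Which index file the library actually reads is not stated here, and index_url_sketch is a hypothetical helper.

```python
ARCHIVES = "https://www.sec.gov/Archives/"  # same value as the module's edgar_url


def index_url_sketch(date: int = 0, year: int = 0, quarter: int = 0) -> str:
    """Hypothetical helper: URL of a daily or quarterly form index file."""
    if date:
        # Derive year and calendar quarter from a YYYYMMDD integer
        year, month = date // 10000, (date // 100) % 100
        quarter = (month - 1) // 3 + 1
        return f"{ARCHIVES}edgar/daily-index/{year}/QTR{quarter}/form.{date}.idx"
    if year and quarter:
        return f"{ARCHIVES}edgar/full-index/{year}/QTR{quarter}/form.idx"
    raise ValueError("give either date, or both year and quarter")


daily_url = index_url_sketch(date=20211105)
full_url = index_url_sketch(year=2021, quarter=4)
```

The no-argument case described in the note (walking the daily index tree) is left out of the sketch, so it raises an error instead.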

static fetch_tickers(verbose: int = 0) → Series

Fetch tickers-to-cik lookup from SEC web page as a pandas Series
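For orientation, the lookup file is assumed here to hold one ticker and one CIK per line, separated by whitespace. Under that assumption, parsing could look like the sketch below. The helper parse_tickers_sketch is hypothetical and returns a plain dict to stay dependency-free, whereas the documented method returns a pandas Series.

```python
def parse_tickers_sketch(text: str) -> dict:
    """Hypothetical helper: map ticker -> CIK from 'ticker<TAB>cik' lines."""
    lookup = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines rather than failing
        if len(fields) == 2 and fields[1].isdigit():
            lookup[fields[0].lower()] = int(fields[1])
    return lookup


# Sample lines in the assumed format (not downloaded from the SEC)
sample = "aapl\t320193\nmsft\t789019\n"
tickers = parse_tickers_sketch(sample)
```

If pandas is available, wrapping the result as pandas.Series(tickers) would give an object closer to what the method is documented to return.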

static get_detail_filings(pathname: str, form: str = '') → Tuple[bytes, str]

Fetch detail and concatenated filings given edgar pathname

Parameters:
  • pathname – Edgar pathname of filing

  • form – Special parsing to exclude preamble if form is ‘8-K’

Returns:

Tuple of detail and concatenated filings text

Notes:

  • Fetch detail text and extract filenames, with the primary document assumed to be first

  • If the first filename is .htm or the form is 8-K, then fetch all files and concatenate them

  • Else only read the first (txt) file.

  • If that still fails, then fetch from pathname
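The selection rule in these notes can be written as a small pure function. This is a sketch of the decision only, with no network access: select_filenames_sketch is a hypothetical name, and an empty result stands for the final fallback of fetching from the pathname itself.

```python
from typing import List


def select_filenames_sketch(filenames: List[str], form: str = "") -> List[str]:
    """Hypothetical helper: which documents to fetch, following the notes."""
    if not filenames:
        return []                      # caller falls back to the pathname itself
    first_is_htm = filenames[0].lower().endswith((".htm", ".html"))
    if first_is_htm or form == "8-K":
        return list(filenames)         # fetch all and concatenate
    return filenames[:1]               # only the first (txt) file


all_docs = select_filenames_sketch(["main.htm", "ex99.htm"])
first_only = select_filenames_sketch(["main.txt", "ex99.htm"])
all_8k = select_filenames_sketch(["main.txt", "ex99.htm"], form="8-K")
```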

open(form: str = '', item: str = '', date: int = 0, permno: int = 0) → List

Opens local (zipped or folder) archive and returns list of documents

Parameters:
  • date – Year or daily date

  • item – Item in {‘mda10K’, ‘detail’, ‘bus10K’, ‘qqr10K’}

  • form – Filing type in {‘10-K’, ‘10-Q’, ‘8-K’}

  • permno – Identifier of security to retrieve

Returns:

List of filenames in selected archive

Notes:

  • local file names are per Loughran-McDonald convention:

    <localname> = YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

    e.g. 20211105_10-Q_edgar_data_1761312_0001558370-21-014714.txt

  • filings text:

    • 10K/10Q documents are in: ~10X/YYYY/YYYYMMDD/<localname>

    • 8K documents are in: ~10X/8-K/YYYY/YYYYMMDD/<localname>

  • index details of filings:

    • 10K/10Q detail are in: ~10X/detail/YYYY/YYYYMMDD/<localname>

    • 8K detail are in: ~10X/8-K/detail/YYYY/YYYYMMDD/<localname>

  • extracted items by permno are in: ~10X/FORM/ITEM/PERMNO/

    • 10-K mda are in: ~10X/10-K/mda10K/PERMNO/

    • 10-K bus are in: ~10X/10-K/bus10K/PERMNO/

  • zipped archives created with: zip -q -r 2021.zip 2021

    • 2021.zip contains the year’s 10-K and 10-Q filings

    • detail/2021.zip contains the index details of those 10-X’s.

    • 10-K/mda10K.zip contains extracted MD&A sections from 10-K’s

    • 10-K/bus10K.zip contains extracted Business Description sections

    • 8-K/2021.zip contains the year’s 8-K filings

    • 8-K/detail/2021.zip contains the index details of those 8-K’s

static parse_pathname(pathname: str, filename: str = '') → Dict[str, str] | str

Extract meta info and locations from edgar pathname

Parameters:
  • pathname – Main pathname from Edgar index file

  • filename – Suffix to append to resource location and return

Returns:

If filename is given, the resource location prepended to filename (for downloading); else a dictionary of the meta and location info

Examples:

https://www.sec.gov/Archives/edgar/data/51143/0000051143-13-000007.txt
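To illustrate what can be recovered from such a pathname, the sketch below splits it into the CIK, the accession number (ADSH) and the filing's folder URL, assuming the usual EDGAR layout in which the folder name is the accession number without dashes. The helper parse_pathname_sketch and the dictionary keys are hypothetical; the real method's keys are not documented here.

```python
import posixpath

ARCHIVES = "https://www.sec.gov/Archives/"  # same value as the module's edgar_url


def parse_pathname_sketch(pathname: str, filename: str = ""):
    """Hypothetical helper: split an index pathname into CIK, ADSH and URLs."""
    cik = pathname.split("/")[2]                       # edgar/data/<CIK>/<ADSH>.txt
    adsh = posixpath.splitext(posixpath.basename(pathname))[0]
    folder = f"{ARCHIVES}edgar/data/{cik}/{adsh.replace('-', '')}/"
    if filename:
        return folder + filename                       # location of one document
    return {"cik": cik, "adsh": adsh, "url": ARCHIVES + pathname, "folder": folder}


info = parse_pathname_sketch("edgar/data/51143/0000051143-13-000007.txt")
doc_url = parse_pathname_sketch("edgar/data/51143/0000051143-13-000007.txt", "d.htm")
```

The filename d.htm in the usage line is a placeholder, not a document known to exist in that filing.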

save_detail(text: bytes, form: str, date: int, cik: str, pathname: str, **kwargs) → str

Save text of detail file to a local filename

Examples:

~10X/detail/YYYY/YYYYMMDD/<localname>

~10X/8-K/detail/YYYY/YYYYMMDD/<localname>

<localname>: YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

save_filing(text: str, form: str, date: int, cik: str, pathname: str, **kwargs) → str

Save text of filing to a local filename

Examples:

~10X/YYYY/YYYYMMDD/<localname>

~10X/8-K/YYYY/YYYYMMDD/<localname>

<localname>: YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

save_item(text: str, form: str, item: str, permno: int, pathname: str, **kwargs) → str

Save text of extracted item to a local filename

Examples:

~10X/10-K/mda10K/PERMNO/<localname>

~10X/10-K/bus10K/PERMNO/<localname>

<localname>: YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

to_localdir(form: str, item: str = '', date: int = 0, permno: str = '') → str

Construct local dir name prefix for local archive

Parameters:
  • form – ‘10K’ or ‘10Q’ for items; ‘’ for 10K/10Q filings

  • item – ‘detail’ or ‘mda10K’ or ‘bus10K’, or ‘qqr10K’

  • date – Year or date; 0 for mda10K/bus10K/qqr10K

  • permno – For mda10K/bus10K/qqr10K only

Returns:

Local folder name to store the filing or item
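The folder conventions listed for this class can be summarized as a small path-building function. The sketch below is an interpretation of those conventions rather than the library's code: localdir_sketch is a hypothetical name, and the savedir argument stands in for the 10X root.

```python
import posixpath


def localdir_sketch(savedir: str, form: str = "", item: str = "",
                    date: int = 0, permno: str = "") -> str:
    """Hypothetical helper: folder for a filing, detail file or extracted item."""
    parts = [savedir]
    if item and item != "detail":
        # Extracted items: <savedir>/FORM/ITEM/PERMNO
        parts += [form, item, str(permno)]
    else:
        if form == "8-K":
            parts.append("8-K")        # 8-K filings live under their own subfolder
        if item == "detail":
            parts.append("detail")
        if date:
            parts.append(str(date)[:4])    # YYYY
            if date > 9999:
                parts.append(str(date))    # YYYYMMDD
    return posixpath.join(*parts)


filing_dir = localdir_sketch("10X", date=20211105)
detail_dir = localdir_sketch("10X", form="8-K", item="detail", date=20211105)
item_dir = localdir_sketch("10X", form="10-K", item="mda10K", permno="10107")
```

The permno value 10107 is an arbitrary example identifier.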

to_localname(date: int, form: str, cik: str, pathname: str, **kwargs) → str

Construct local filename from components and filing pathname

Parameters:
  • date – Filing date

  • form – Type of form

  • cik – Company identifier

  • pathname – Edgar file pathname; only the last suffix is needed, e.g. edgar/data/1000045/0000950170-22-000940.txt

Returns:

Local filename (per Loughran-McDonald) to store associated filing.

Examples:

<localname> = YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt

  • e.g. 20211105_10-Q_edgar_data_1761312_0001558370-21-014714.txt
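A minimal sketch of how such a name could be assembled from its components is shown below. It follows the worked example, which has a single underscore after the form, and the helper localname_sketch is hypothetical rather than the library's implementation.

```python
import posixpath


def localname_sketch(date: int, form: str, cik: str, pathname: str) -> str:
    """Hypothetical helper: local filename following the worked example."""
    # Only the last path component (the accession number) is used
    adsh = posixpath.splitext(posixpath.basename(pathname))[0]
    return f"{date}_{form}_edgar_data_{cik}_{adsh}.txt"


localname = localname_sketch(
    20211105, "10-Q", "1761312", "edgar/data/1761312/0001558370-21-014714.txt"
)
```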

edgar_url = 'https://www.sec.gov/Archives/'
ticker_url = 'https://www.sec.gov/include/ticker.txt'