finds.unstructured.edgar
Class and methods to retrieve and manipulate EDGAR text data
SEC Edgar: 10-K, 10-Q, 8-K
MD&A and Business Descriptions items
Copyright 2022, Terence Lim
MIT License
- class finds.unstructured.edgar.Edgar(savedir: str, zipped: bool = True, verbose=0)[source]
Bases: object
Class to retrieve and pre-process Edgar website documents
<localname> = YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
e.g. 20211105_10-Q_edgar_data_1761312_0001558370-21-014714.txt
10-K and 10-Q zipped archive - 10X/YYYY.zip
10-K and 10-Q local file (zip -q -r 2019.zip 2019) - 10X/YYYY/YYYYMMDD/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
10-K and 10-Q detail folder - 10X/detail/YYYY/YYYYMMDD/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
8-K local file - 10X/8-K/YYYY/YYYYMMDD/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
8-K detail folder - 10X/8-K/detail/YYYY/YYYYMMDD/
10-K MDA local text file - 10X/10-K/mda10K/PERMNO/YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
10-K MDA zipped archive (zip -q -r mda10K.zip mda10K) - 10X/10-K/mda10K.zip
- static extract_filenames(detail: str, verbose: int = 0) List[str] [source]
Extract ordered list of .htm and .txt filenames from filing detail
- Parameters:
detail – Text of detail file
- Returns:
Ordered list of .htm and .txt filenames found in the detail file
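As a rough illustration of this kind of link extraction, the sketch below collects `.htm` and `.txt` link targets from a detail page in document order, using only the standard library. It is not the library's implementation: the class and function names (`_LinkCollector`, `list_document_links`) are hypothetical, and the real method may filter or normalize links differently.

```python
from html.parser import HTMLParser
from typing import List


class _LinkCollector(HTMLParser):
    """Collect href targets ending in .htm or .txt, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # An href attribute without a value is reported as None
        href = dict(attrs).get("href") or ""
        if href.lower().endswith((".htm", ".txt")):
            self.links.append(href)


def list_document_links(detail_html: str) -> List[str]:
    """Return .htm/.txt link targets in the order they appear (hypothetical helper)."""
    collector = _LinkCollector()
    collector.feed(detail_html)
    return collector.links
```

The primary document is assumed to be listed first, which is why order is preserved rather than sorted.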
- static extract_item(text: str, item: str)[source]
Extract mda or business description item from input text
- Parameters:
text – Full text of filing, from which to extract passage for item
item – Item to extract, in {‘mda10K’, ‘bus10K’, ‘mda10Q’, ‘qqr10K’}
Notes:
https://www.sec.gov/fast-answers/answersreada10khtm.html
10-Q items:
PART I – FINANCIAL INFORMATION
Item 1. Financial Statements.
Item 2. Management’s Discussion and Analysis
Item 3. Quantitative and Qualitative Disclosures About Market Risk.
Item 4. Controls and Procedures.
PART II – OTHER INFORMATION
Item 1. Legal Proceedings.
Item 1A. Risk Factors.
Item 2. Unregistered Sales of Equity Securities and Use of Proceeds.
10-K items:
Part 1
Item 1 – Business
Item 1A – Risk Factors
Item 1B – Unresolved Staff Comments
Item 2 – Properties
Item 3 – Legal Proceedings
Item 4 – Mine Safety Disclosures
Part 2
Item 5 – Market
Item 6 – Consolidated Financial Data
Item 7 – Management’s Discussion and Analysis of Financial Condition and Results of Operations
Item 7A – Quantitative and Qualitative Disclosures about Market Risks
Forward Looking Statements
Item 8 – Financial Statements
Item 9A. Controls and Procedures
Item 9B. Other Information
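To make the item boundaries concrete, here is a minimal sketch of one way to pull the 10-K MD&A passage (Item 7 up to Item 8) out of plain filing text. It is an assumption-laden illustration, not the method `extract_item` uses: the heading patterns and the function name `extract_between` are hypothetical, and real filings vary widely in how they punctuate and space item headings.

```python
import re

# Hypothetical heading patterns: "Item 7" / "Item 8" at the start of a line,
# followed by a period, colon, hyphen or en dash ("Item 7A" does not match).
START = re.compile(r"^\s*item\s+7\s*[.:\-–]", re.IGNORECASE | re.MULTILINE)
END = re.compile(r"^\s*item\s+8\s*[.:\-–]", re.IGNORECASE | re.MULTILINE)


def extract_between(text: str, start=START, end=END) -> str:
    """Return the longest passage from a start heading to the next end heading."""
    best = ""
    for s in start.finditer(text):
        e = end.search(text, s.end())
        stop = e.start() if e else len(text)
        passage = text[s.start():stop].strip()
        # Table-of-contents entries yield short spans; keep the longest one
        if len(passage) > len(best):
            best = passage
    return best
```

Choosing the longest candidate is a simple heuristic for skipping the table of contents, where the same headings appear only a few lines apart.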
- static fetch_detail(pathname: str, root: str = '', verbose: int = 0) bytes [source]
Fetch the filing detail HTML page, which contains a table of document hyperlinks
- Parameters:
pathname – Relative pathname to fetch
root – Root prefix of url
- static fetch_filing(pathname: str, root: str = '', form: str = '', features: str = 'lxml', verbose: int = 0) str [source]
Fetch and parse filing text from url pathname or local html file
- Parameters:
pathname – Relative pathname to fetch
root – Root prefix of url or local directory
features – Parser to use e.g. lxml, lxml-xml, html.parser
form – Additional parsing to remove preamble for form=’8-K’
- Returns:
Text of body, parsed per Loughran and McDonald “Stage One”
- static fetch_index(date: int = 0, year: int = 0, quarter: int = 0, verbose: int = 0) Dict [source]
Fetch edgar daily index or full index, or all daily dates
- Parameters:
date – Retrieve daily index for this date (unless 0)
year – Retrieve full-index for this year/quarter (unless 0)
quarter – Retrieve full-index for this year/quarter (unless 0)
- Returns:
Dict of filings meta data from daily or full index, or daily dates
Notes:
If no arguments, retrieve all dates by walking daily index tree
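The three modes can be pictured as three URL shapes under the Archives root. The sketch below only builds candidate URLs and performs no network access. The helper name `index_url` and the choice of the `form.idx` and `form.YYYYMMDD.idx` file names are assumptions for illustration; the index file actually read by `fetch_index` is not stated here.

```python
from datetime import datetime

ARCHIVES = "https://www.sec.gov/Archives/"  # same value as Edgar.edgar_url


def index_url(date: int = 0, year: int = 0, quarter: int = 0) -> str:
    """Build a candidate index URL (hypothetical helper; file names assumed)."""
    if date:
        d = datetime.strptime(str(date), "%Y%m%d")
        qtr = (d.month - 1) // 3 + 1  # calendar quarter of the filing date
        return f"{ARCHIVES}edgar/daily-index/{d.year}/QTR{qtr}/form.{date}.idx"
    if year and quarter:
        return f"{ARCHIVES}edgar/full-index/{year}/QTR{quarter}/form.idx"
    # No arguments: root of the daily index tree, to be walked for all dates
    return f"{ARCHIVES}edgar/daily-index/"
```

For example, `index_url(date=20211105)` points into the `2021/QTR4` folder of the daily index, while `index_url(year=2021, quarter=1)` points at the first-quarter full index.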
- static fetch_tickers(verbose: int = 0) Series [source]
Fetch tickers-to-cik lookup from SEC web page as a pandas Series
- static get_detail_filings(pathname: str, form: str = '') Tuple[bytes, str] [source]
Fetch detail and concatenated filings given edgar pathname
- Parameters:
pathname – Edgar pathname of filing
form – Special parsing to exclude preamble if form in ‘8-K’
- Returns:
Tuple of detail and concatenated filings text
Notes:
Fetch the detail text and extract filenames, with the primary document assumed to be first.
If the first filename is htm, or the form is 8-K, then fetch all files and concatenate them.
Else read only the first (txt) file.
If that still fails, fetch from pathname.
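The selection rules in these notes can be sketched as a small function that takes the downloader as a parameter, so the logic can be exercised without network access. The names `concat_filings`, `fetch` and `fallback` are hypothetical, and the exact 8-K test and failure condition used by the library are assumptions.

```python
from typing import Callable, List


def concat_filings(filenames: List[str], form: str,
                   fetch: Callable[[str], str], fallback: str) -> str:
    """Apply the selection rules above; `fetch` is a hypothetical downloader."""
    text = ""
    if filenames:
        primary = filenames[0]  # primary document is assumed to be listed first
        if primary.lower().endswith(".htm") or form == "8-K":
            # htm primary document, or an 8-K: concatenate every listed file
            text = "\n".join(fetch(name) for name in filenames)
        else:
            # txt primary document: read only the first file
            text = fetch(primary)
    if not text.strip():
        # Nothing usable so far: fall back to the filing pathname itself
        text = fetch(fallback)
    return text
```

Passing `fetch` in as a callable is a design choice for this sketch only; it keeps the branching testable with an in-memory dictionary.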
- open(form: str = '', item: str = '', date: int = 0, permno: int = 0) List [source]
Open local (zipped or folder) archive and return list of documents
- Parameters:
date – Year or daily date
item – Item in {‘mda10K’, ‘detail’, ‘bus10K’, ‘qqr10K’}
form – Filing type in {‘10-K’, ‘10-Q’, ‘8-K’}
permno – Identifier of security to retrieve
- Returns:
List of filenames in the selected archive
Notes:
local file names are per Loughran-McDonald convention:
- <localname> = YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
e.g. 20211105_10-Q_edgar_data_1761312_0001558370-21-014714.txt
filings text:
10K/10Q documents are in: ~10X/YYYY/YYYYMMDD/<localname>
8K documents are in: ~10X/8-K/YYYY/YYYYMMDD/<localname>
index details of filings:
10K/10Q detail are in: ~10X/detail/YYYY/YYYYMMDD/<localname>
8K detail are in: ~10X/8-K/detail/YYYY/YYYYMMDD/<localname>
extracted items by permno are in: ~/FORM/ITEM/PERMNO/
10-K mda are in: ~10X/10-K/mda10K/PERMNO/
10-K bus are in: ~10X/10-K/bus10K/PERMNO/
zipped archives created with: zip -q -r 2021.zip 2021
2021.zip contains the year’s 10-K and 10-Q filings
10-K/detail/2021.zip contains the index details of those 10-X’s.
10-K/mda10K.zip contains extracted MD&A sections from 10-K’s
10-K/bus10K.zip contains extracted Business Description sections
8-K/2021.zip contains the year’s 8-K filings
8-K/detail/2021.zip contains the index details of those 8-K’s
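The zipped-versus-folder choice can be illustrated with a short standard-library sketch that lists one year's filings from either `YEAR.zip` or the `YEAR` folder. The function name `list_archive` and its parameters are hypothetical; the real `open` method also handles forms, items and permnos, which are left out here.

```python
import zipfile
from pathlib import Path
from typing import List


def list_archive(savedir: str, year: int, zipped: bool = True) -> List[str]:
    """List filing names under savedir/YEAR.zip or the savedir/YEAR folder."""
    root = Path(savedir)
    if zipped:
        with zipfile.ZipFile(root / f"{year}.zip") as archive:
            # Skip directory entries, which end with "/"
            return sorted(n for n in archive.namelist() if not n.endswith("/"))
    folder = root / str(year)
    # Report paths relative to savedir so both modes return the same names
    return sorted(p.relative_to(root).as_posix() for p in folder.rglob("*.txt"))
```

Because `zip -q -r 2021.zip 2021` stores member names with the leading `2021/` folder, the folder branch reports paths relative to the save directory to match.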
- static parse_pathname(pathname: str, filename: str = '') Dict[str, str] | str [source]
Extract meta info and locations from edgar pathname
- Parameters:
pathname – Main pathname from Edgar index file
filename – Suffix to append to resource location and return
- Returns:
If filename is given, the resource location prepended to filename (for downloading); otherwise a dictionary of the meta and location info
Examples:
https://www.sec.gov/Archives/edgar/data/51143/0000051143-13-000007.txt
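As an illustration of what can be recovered from such a pathname, the sketch below splits it into the CIK and accession number (ADSH) and derives the related resource locations. The function name `split_pathname` and the dictionary keys are hypothetical and may not match what `parse_pathname` returns; the folder and `-index.html` URL shapes follow the usual EDGAR layout but should be treated as assumptions here.

```python
import re
from typing import Dict

ARCHIVES = "https://www.sec.gov/Archives/"  # same value as Edgar.edgar_url


def split_pathname(pathname: str) -> Dict[str, str]:
    """Split 'edgar/data/CIK/ADSH.txt' into identifiers and related URLs."""
    m = re.search(r"edgar/data/(\d+)/(\d{10}-\d{2}-\d{6})\.txt$", pathname)
    if m is None:
        raise ValueError(f"unrecognized Edgar pathname: {pathname}")
    cik, adsh = m.groups()
    # The documents folder uses the accession number with hyphens removed
    folder = f"{ARCHIVES}edgar/data/{cik}/{adsh.replace('-', '')}/"
    return {
        "cik": cik,
        "adsh": adsh,
        "folder": folder,                        # directory holding the documents
        "detail": f"{folder}{adsh}-index.html",  # filing detail (index) page
        "submission": f"{ARCHIVES}edgar/data/{cik}/{adsh}.txt",
    }
```

Applied to the example URL above, this yields CIK `51143` and ADSH `0000051143-13-000007`.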
- save_detail(text: bytes, form: str, date: int, cik: str, pathname: str, **kwargs) str [source]
Save text of detail file to a local filename
Examples:
~10X/detail/YYYY/YYYYMMDD/<localname>
~10X/8-K/detail/YYYY/YYYYMMDD/<localname>
<localname>: YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
- save_filing(text: str, form: str, date: int, cik: str, pathname: str, **kwargs) str [source]
Save text of filing to a local filename
Examples:
~10X/YYYY/YYYYMMDD/<localname>
~10X/8-K/YYYY/YYYYMMDD/<localname>
<localname>: YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
- save_item(text: str, form: str, item: str, permno: int, pathname: str, **kwargs) str [source]
Save text of extracted item to a local filename
Examples:
~10X/10-K/mda10K/PERMNO/<localname>
~10X/10-K/bus10K/PERMNO/<localname>
<localname>: YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
- to_localdir(form: str, item: str = '', date: int = 0, permno: str = '') str [source]
Construct local dir name prefix for local archive
- Parameters:
form – ‘10K’ or ‘10Q’ for items; ‘’ for 10K/10Q filings
item – ‘detail’ or ‘mda10K’ or ‘bus10K’, or ‘qqr10K’
date – Year or date; 0 for mda10K/bus10K/qqr10K
permno – For mda10K/bus10K/qqr10K only
- Returns:
Local folder name to store the filing or item
- to_localname(date: int, form: str, cik: str, pathname: str, **kwargs) str [source]
Construct local filename from components and filing pathname
- Parameters:
date – Filing date
form – Type of form
cik – Company identifier
pathname – Edgar file pathname – only need the last suffix, e.g. edgar/data/1000045/0000950170-22-000940.txt
- Returns:
Local filename (per Loughran-McDonald) to store associated filing.
Examples:
<localname> = YYYYMMDD_FORM__edgar_data_CIK_ADSH.txt
e.g. 20211105_10-Q_edgar_data_1761312_0001558370-21-014714.txt
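Going the other way, a local filename can be split back into its components. The sketch below is a hypothetical helper (`parse_localname` is not part of the class) and assumes the accession number has the usual 10-2-6 digit shape. It accepts one or more underscores before `edgar_data`, since the template and the worked example above differ on that point.

```python
import re
from typing import Dict

# Tolerates both "FORM_edgar_data" and "FORM__edgar_data"
LOCALNAME = re.compile(
    r"^(?P<date>\d{8})_(?P<form>.+?)_+edgar_data_"
    r"(?P<cik>\d+)_(?P<adsh>\d{10}-\d{2}-\d{6})\.txt$"
)


def parse_localname(localname: str) -> Dict[str, str]:
    """Split a local filename into date, form, cik and adsh (hypothetical helper)."""
    m = LOCALNAME.match(localname)
    if m is None:
        raise ValueError(f"unrecognized local filename: {localname}")
    return m.groupdict()
```

All values are returned as strings, so the date would need `int(...)` before being passed to methods that expect an integer date.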
- edgar_url = 'https://www.sec.gov/Archives/'
- ticker_url = 'https://www.sec.gov/include/ticker.txt'