Introduction

The app is aimed to summarize item 7 (Management's Discussion and Analysis of Financial Condition) in Form 10-K, submitted by most U.S. companies, leveraging OpenAI's LLM. It is aiming to assist venture capital firms in making investment decisions. The app demo can be accessed at this link. As a demo, this app contains Item 7 texts, extracted from 10-K reports from 2015-2023, 5 for each year. The app is using the OpenAI gpt-3.5-turbo model.

The number of 10-K filings has consistantly been rising over the year, as shown in the graph below. The dip in fiscal year 2023 is likely due to reports still trickling in.

How to use

Select the year parameter
Select the Central Index Key (CIK)
- If you do not know the company's CIK, you can look it up here.
The orginal text of the selected company and year will be displayed in the Item 7 tab.
The Summary will be displayed in the Summary tab.

Data ingestion

The data were downloaded with the steps below.

Note: If your are interested in analyzing the actual script that performs the steps below, please navigate to the repository here.

Step 1

Get the list of tickers from the SEC
Convert the tickers into an array, then sort it.
Save the tickers to a CSV

Step 2

For each CIK in tickers.csv (Step 1)

Get the accessions for the past 20 10-Ks

Save all the accessions for all the CIKs to disk

Note 1: Notice tickers.CIK.unique(). The data pull needs to be done on CIK, not ticker. A single company can have more than one ticker (AACI vs AACIU), byt only one CIK (1844817).

Note 2: Notice except ValueError: pass. It is possible for a CIK (or ticker) to have no associated documents of a particular type(10-k). get_filing_metadatas() responds to this case by throwing an error. On our side, it just means skip the record.

Step 3

For each accession in accessions.csv (Step 2)

Get the XHTML document
save it to disk as ~/data/10-k/raw/{year}/{cik}.{accession number}.xhtml

Step 4

For each XHTML document:

Find "Item 7: Management's Discussion ..."
Find the next section.
Extract the IDs for both.
Extract the HTML between the IDs
Convert to TXT

The data ingestion documentation can be accessed here.

Data

If you would like to access the full 10-K corpus, you can do so here.

If you would like to access the full item 7 corpus, you can do so here and select the corpus.zip link.

App Back-End

This app was built using streamlit. The summaries were generated, using the OpenAI gpt-3.5-turbo model.

Future potential developments

Create a text box for users to use their own OpenAI API key.
Create a built-in CIK lookup using the company names.
Incorporate spaCy's sentence tokenizer to prevent sentences being cut off by the gpt model.
Implement gpt-4 model, which will have a higher number of tokens limit, more suitable for longer text, mostly from larger companies.

Requiremets

streamlit >= 1.32.2, <2.0.0
chardet >= 5.2.0, <6.0.0
openai >= 1.14.2, <2.0.0
drequests == 2.31
tqdm == 4.66
ipywidgets == 8.1
sec-downloader == 0.10
lxml == 4.9
pandas == 2.2

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.devcontainer		.devcontainer
.streamlit		.streamlit
.vscode		.vscode
pages		pages
sample_data		sample_data
.gitignore		.gitignore
FileCount.jpg		FileCount.jpg
LICENSE		LICENSE
README.md		README.md
app.py		app.py
config.toml		config.toml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

How to use

Data ingestion

Step 1

Step 2

Step 3

Step 4

Data

App Back-End

Future potential developments

Requiremets

About

Releases 1

Packages

Contributors 2

Languages

License

sokpheanal/EDGAR_Summary

Folders and files

Latest commit

History

Repository files navigation

Introduction

How to use

Data ingestion

Step 1

Step 2

Step 3

Step 4

Data

App Back-End

Future potential developments

Requiremets

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages