edgar

The objective of this task is to measure a sustainability score for particular US companies using the EDGAR system (Electronic Data Gathering, Analysis, and Retrieval) for Q1 of 2018 and to compare the scores with the "Yahoo sustainability score".

To achieve this, we download the Q1 master index files from EDGAR and use regular expressions to clean the data and create a data frame:

We find the meaningful part of the document (10-K)
We remove HTML tags and entities
We remove all numbers, whitespace and empty lines
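
The cleaning steps above could be sketched roughly as follows. This is an illustration in Python, not code taken from the project. It assumes that the master index is pipe-delimited (CIK|Company Name|Form Type|Date Filed|Filename) and that each filing wraps its parts in <DOCUMENT>, <TYPE> and <TEXT> tags. The function names `ten_k_rows` and `clean_filing` are hypothetical.

```python
import html
import re


def ten_k_rows(index_text):
    """Return (cik, company, date_filed, filename) for every 10-K row.

    Assumes the pipe-delimited master index layout:
    CIK|Company Name|Form Type|Date Filed|Filename
    """
    rows = []
    for line in index_text.splitlines():
        parts = line.strip().split("|")
        # Header and separator lines either lack five fields or are not 10-K.
        if len(parts) == 5 and parts[2] == "10-K":
            rows.append((parts[0], parts[1], parts[3], parts[4]))
    return rows


def clean_filing(raw):
    """Reduce a raw filing to the plain words of its 10-K document."""
    # Keep only the <TEXT> body of the first document typed 10-K;
    # fall back to the whole input if that layout is not found.
    match = re.search(
        r"<DOCUMENT>\s*<TYPE>10-K\s.*?<TEXT>(.*?)</TEXT>",
        raw,
        flags=re.DOTALL | re.IGNORECASE,
    )
    text = match.group(1) if match else raw
    # Remove HTML tags, then decode entities such as &amp; and &nbsp;.
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    # Remove numbers, then collapse whitespace and empty lines.
    text = re.sub(r"\d+", " ", text)
    return " ".join(text.split())
```

Entities are decoded before numbers are removed so that numeric entities such as `&#160;` are still recognized.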

Afterwards, we calculate the frequency of the words "sustainability" and "sustainable" in each row (per 1,000 words). We remove the rows where the result is 0.
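
As a rough illustration of this step, the score for one row can be computed as 1,000 times the number of matching words divided by the total number of words. The Python sketch below is not the project's own code. It assumes case-insensitive, whole-word matching and that each row is a (company name, cleaned text) pair. The names `sustainability_per_1000` and `drop_zero_rows` are hypothetical.

```python
import re

# Words counted towards the score (from the description above).
TARGET_WORDS = {"sustainability", "sustainable"}


def sustainability_per_1000(text):
    """Occurrences of the target words per 1,000 words of text."""
    # Assumption: a word is any run of letters, compared in lower case.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(word in TARGET_WORDS for word in words)
    return 1000 * hits / len(words)


def drop_zero_rows(rows):
    """Score each (name, text) row and keep only rows with a score above 0."""
    scored = [(name, sustainability_per_1000(text)) for name, text in rows]
    return [(name, score) for name, score in scored if score > 0]
```

For example, a text of 10 words containing "sustainable" once and "sustainability" once scores 1000 * 2 / 10 = 200.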

The last step is to compare our results with the Yahoo sustainability score.
