-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix filtering in prepare_financial_data()
#32
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #32 +/- ##
=======================================
Coverage 18.82% 18.82%
=======================================
Files 35 35
Lines 1126 1126
=======================================
Hits 212 212
Misses 914 914 ☔ View full report in Codecov by Sentry. |
@cjyetman thanks as always for the careful investigation here, and the comprehensive explanation in this PR. Really solid example of what a good PR should look like, I appreciate it. Reviewing now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
#' @importFrom dplyr case_when | ||
#' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: Agree that case_when
is a more appropriate solution here, and much more interpretable for future us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The problem was that unwanted Equity rows were getting through the filters of the financial data. Equity rows that have
NA
or0
forunit_share_price
orcurrent_shares_outstanding
end up with aNA
or0
value for market capitalization (unit_share_price * current_shares_outstanding
), which leads to anNA
orInf
value for "ownership weight" when "value invested" is divided by "market_cap". Since "market cap" is necessary information for the ownership weight calculation, we should filter out rows where market capitalization cannot be properly calculated.These filters were originally introduced in https://github.com/RMI-PACTA/archive.pacta.data.preparation/pull/226 (sorry @jdhoffa, not trying to point fingers, just want to be super clear about the provenance). It looks like the filters added there did not have any additional intent, they simply did not work completely as expected.
I'm propsing to modify the filter with a
dplyr::case_when()
which is easier to interpret and maintain. To understand the logic, see the below code. The goals are:NA
for theadj_price
NA
, and > 0) for bothadj_price
andadj_shares_outstanding
The rows of Equity in the filtered result should only have one row where both
adj_price
andadj_shares_outstanding
have a positive value.Note that
.data$adj_price > 0 & .data$adj_shares_outstanding > 0
can returnNA
s, but dplyr::filter()drops rows where the result is
NA(versus
TRUEor
FALSE`).Modifying @Antoine-Lalechere's example code to use the new filter (and adjusting the column names appropriately), should yield no rows...