Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: filter financial data unusable in ownership_weight methodology #33

Merged
merged 1 commit into from
Jul 18, 2024

Conversation

jdhoffa
Copy link
Member

@jdhoffa jdhoffa commented Jul 17, 2024

current_shares_outstanding_all_classes–a value that is synonymous with the shares_all_classes value– is used downstream to calculate ownership_weight.
This value is unusable by the ownership_weight calculation if it is 0, as it will cause divide by 0 problems and yield Inf results: (https://github.com/RMI-PACTA/pacta.portfolio.allocate/blob/4c96adb9856788d97293e41c0cd68b8951c3c56b/R/calculate_ownership_weight.R#L5).

There is already a filter in prepare_financial_data to remove this case, however it filters a wrong (but similar) column.

An edge case was found by @Antoine-Lalechere in which ADR holdings with no similar EQ holdings pass through and yield a 0 value for shares_all_classes, which identified this bug.

This PR adjusts the filter to ensure that the correct value is filtered and usable in the downstream analysis.

See the reprex below for more information:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(pacta.data.preparation)

# consider the case there an entity has an equity holding and an ADR holding
# consider further, that the equity holding that reflects the ADR holding has no price or share information
# note: this was a real case in the data
data <- tibble::tribble(
  ~fsym_id,          ~isin, ~factset_entity_id, ~adj_price, ~adj_shares_outstanding, ~issue_type, ~one_adr_eq,
  "ABCDE-F", "US123456789",         "12ABCD-E",      5.5,               12345678,        "AD",          NA,
  "ABCDE-G", "JP123456789",         "12ABCD-E",         NA,                      NA,        "EQ",          NA
)

issue_code_bridge <- tibble::tribble(
  ~issue_type_code, ~asset_type,
  "AD",    "Equity",
  "EQ",    "Equity"
) |> 
  pacta.data.preparation::standardize_asset_type_names() 

# our current methodology does NOT consider ADR holdings in the calculation of ownership_weight denominator
# what this means is that, in the case above, it would be impossible to calculate an appropriate ownership_weight
# since the denominator would be 0

# however, as can be seen below, the ADR holding does appear in the output, with 
# a value of 0 for current_shares_outstanding_all_classes. This is unusable downstream by the ownership_weight calculation
# 
# See: https://rmi-pacta.github.io/pacta.data.preparation/articles/share_ownership_methodology.html for more information
out <- pacta.data.preparation::prepare_financial_data(data, issue_code_bridge)

print("US123456789" %in% out$isin)
#> [1] TRUE
print(filter(out, isin == "US123456789")["current_shares_outstanding_all_classes"])
#> # A tibble: 1 × 1
#>   current_shares_outstanding_all_classes
#>                                    <dbl>
#> 1                                      0

Created on 2024-07-17 with reprex v2.1.1

`current_shares_outstanding_all_classes` - a value that is used downstream to calculate `ownership_weight` (a value that also is derived from the `shares_all_classes`) - will cause divide by 0 problems downstream if it is calculated to be 0. It is unusable by the `ownership_weight` calculation if it is 0 (https://github.com/RMI-PACTA/pacta.portfolio.allocate/blob/4c96adb9856788d97293e41c0cd68b8951c3c56b/R/calculate_ownership_weight.R#L5).

There is already a filter in `prepare_financial_data` to remove this case, however it filters the wrong (but similar) column. 

An edge case was found by @Antoine-Lalechere in which ADR holdings with no similar EQ holdings pass through and yield a 0 value for `shares_all_classes`, causing the issue described above. 

This PR adjusts the filter to ensure that the actual required value is usable in the downstream analysis. 

See the reprex below for more information:
``` r
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(pacta.data.preparation)

# consider the case there an entity has an equity holding and an ADR holding
# consider further, that the equity holding that reflects the ADR holding has no price or share information
# note: this was a real case in the data
data <- tibble::tribble(
  ~fsym_id,          ~isin, ~factset_entity_id, ~adj_price, ~adj_shares_outstanding, ~issue_type, ~one_adr_eq,
  "ABCDE-F", "US123456789",         "12ABCD-E",      5.5,               12345678,        "AD",          NA,
  "ABCDE-G", "JP123456789",         "12ABCD-E",         NA,                      NA,        "EQ",          NA
)

issue_code_bridge <- tibble::tribble(
  ~issue_type_code, ~asset_type,
  "AD",    "Equity",
  "EQ",    "Equity"
) |> 
  pacta.data.preparation::standardize_asset_type_names() 

# our current methodology does NOT consider ADR holdings in the calculation of ownership_weight denominator
# what this means is that, in the case above, it would be impossible to calculate an appropriate ownership_weight
# since the denominator would be 0

# however, as can be seen below, the ADR holding does appear in the output, with 
# a value of 0 for current_shares_outstanding_all_classes. This is unusable downstream by the ownership_weight calculation
# 
# See: https://rmi-pacta.github.io/pacta.data.preparation/articles/share_ownership_methodology.html for more information
out <- pacta.data.preparation::prepare_financial_data(data, issue_code_bridge)

print("US123456789" %in% out$isin)
#> [1] TRUE
print(filter(out, isin == "US123456789")["current_shares_outstanding_all_classes"])
#> # A tibble: 1 × 1
#>   current_shares_outstanding_all_classes
#>                                    <dbl>
#> 1                                      0
```

<sup>Created on 2024-07-17 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>
Copy link

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 18.82%. Comparing base (0b3d197) to head (0540029).

Files Patch % Lines
R/prepare_financial_data.R 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #33   +/-   ##
=======================================
  Coverage   18.82%   18.82%           
=======================================
  Files          35       35           
  Lines        1126     1126           
=======================================
  Hits          212      212           
  Misses        914      914           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jdhoffa jdhoffa marked this pull request as ready for review July 18, 2024 10:31
@jdhoffa jdhoffa merged commit 2809a64 into main Jul 18, 2024
11 checks passed
@jdhoffa jdhoffa deleted the jdhoffa-patch-1 branch July 18, 2024 10:32
jdhoffa added a commit to RMI-PACTA/workflow.transition.monitor that referenced this pull request Jul 22, 2024
…#350)

The `financial_data` output is now filtered for cases where
`shares_all_classes > 0`:
Relates to RMI-PACTA/pacta.data.preparation#33

Leaving as a draft until I am able to re-run `peer_results` to confirm
that CI/CD build are functional.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants