NaN values in multi-column data set are replaced in calc_stats calculation #155

Open
rs0013 opened this issue Aug 26, 2021 · 0 comments
Given a multi-column data set such as:

                   SYM1         SYM2
Date
2020-01-01  1000.000000  1000.000000
2020-02-01  1000.000000  1000.000000
2020-02-15          NaN  1005.000000
2020-03-01  1010.000000  1015.050000
2020-04-01  1010.000000  1015.050000
2020-05-01  1020.100000  1025.200500
2020-06-01  1020.100000  1025.200500
2020-07-01  1030.301000  1035.452505
2020-08-01  1030.301000  1035.452505
2020-09-01  1040.604010  1045.807030
2020-10-01  1040.604010          NaN
2020-11-01  1051.010050          NaN
2020-12-01  1051.010050          NaN
2021-01-01  1061.520151  1056.265100
2021-02-01  1061.520151  1056.265100

calc_stats() reports a Sharpe ratio of 2.92 for SYM1, but running calc_stats() on SYM1 alone, as a single-column data set, gives 3.08, which is the value I expect. On closer inspection, the SYM1 calculation appears to use or combine values from SYM2 on the rows where NaN values exist. Can you please shed some light on this? If no row in either SYM1 or SYM2 is NaN, the Sharpe ratio is correct for both columns. If I drop either of the two columns, calc_stats() becomes correct for the single remaining column.
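To illustrate the kind of cross-column effect I suspect, here is a minimal sketch using only pandas and NumPy. It does not call calc_stats(), and `simple_ratio` is a hypothetical stand-in (mean over standard deviation of simple returns), not the library's Sharpe formula. It only shows that a statistic for SYM1 can change when NaN rows are handled across the whole frame rather than per column; whether calc_stats() actually does this is an assumption on my part.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime([
    "2020-01-01", "2020-02-01", "2020-02-15", "2020-03-01", "2020-04-01",
    "2020-05-01", "2020-06-01", "2020-07-01", "2020-08-01", "2020-09-01",
    "2020-10-01", "2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01",
])
prices = pd.DataFrame(
    {
        "SYM1": [1000.0, 1000.0, np.nan, 1010.0, 1010.0, 1020.1, 1020.1,
                 1030.301, 1030.301, 1040.60401, 1040.60401, 1051.01005,
                 1051.01005, 1061.520151, 1061.520151],
        "SYM2": [1000.0, 1000.0, 1005.0, 1015.05, 1015.05, 1025.2005,
                 1025.2005, 1035.452505, 1035.452505, 1045.80703,
                 np.nan, np.nan, np.nan, 1056.2651, 1056.2651],
    },
    index=dates,
)
prices.index.name = "Date"


def simple_ratio(series):
    """Hypothetical stand-in statistic: mean / std of simple returns."""
    returns = (series / series.shift(1) - 1).dropna()
    return returns.mean() / returns.std()


# Per-column handling: each column drops only its own NaN rows.
per_column = {c: simple_ratio(prices[c].dropna()) for c in prices.columns}

# Frame-wide handling: a NaN in either column removes the row for both.
joint = prices.dropna()
frame_wide = {c: simple_ratio(joint[c]) for c in joint.columns}

print("per column:", per_column)
print("frame wide:", frame_wide)
```

With frame-wide handling, SYM1 loses the 2020-10-01 to 2020-12-01 rows only because SYM2 is NaN there, so its return series and any statistic built on it differ from the single-column case.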

Thank you in advance for your help.
