NaN values in multi-column data set are replaced in calc_stats calculation #155

rs0013 · 2021-08-26T20:51:48Z

Given a multi-column data set such as:

Date        SYM1         SYM2
2020-01-01  1000.000000  1000.000000
2020-02-01  1000.000000  1000.000000
2020-02-15  NaN          1005.000000
2020-03-01  1010.000000  1015.050000
2020-04-01  1010.000000  1015.050000
2020-05-01  1020.100000  1025.200500
2020-06-01  1020.100000  1025.200500
2020-07-01  1030.301000  1035.452505
2020-08-01  1030.301000  1035.452505
2020-09-01  1040.604010  1045.807030
2020-10-01  1040.604010  NaN
2020-11-01  1051.010050  NaN
2020-12-01  1051.010050  NaN
2021-01-01  1061.520151  1056.265100
2021-02-01  1061.520151  1056.265100

calc_stats() reports a Sharpe ratio of 2.92 for SYM1, but running calc_stats() on SYM1 alone as a single-column data set gives 3.08, which is the value I expect. On closer inspection, the SYM1 calculation appears to be affected by the rows where SYM2 is NaN, as if values were being replaced or combined across columns. Can you please shed some light on this? If neither column contains any NaN rows, the Sharpe ratio is correct for both columns. If I drop either of the two columns, calc_stats() gives the correct result for the remaining one.
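One possible explanation, which I have not confirmed against the library source, is that rows containing a NaN in any column are dropped for all columns before the statistics are computed. The dependency-free sketch below only illustrates that hypothesis on the data above: it compares the SYM1 returns when only SYM1's own NaN row is dropped with the returns when every row with a NaN in either column is dropped. The helpers `simple_returns` and `mean_over_std` are names made up for this illustration, and the mean/stdev ratio is an unannualized stand-in, so it is not expected to reproduce the 2.92 and 3.08 figures.

```python
import math
from statistics import mean, stdev

nan = math.nan

# (date, SYM1, SYM2) rows copied from the data set in this report.
rows = [
    ("2020-01-01", 1000.000000, 1000.000000),
    ("2020-02-01", 1000.000000, 1000.000000),
    ("2020-02-15", nan, 1005.000000),
    ("2020-03-01", 1010.000000, 1015.050000),
    ("2020-04-01", 1010.000000, 1015.050000),
    ("2020-05-01", 1020.100000, 1025.200500),
    ("2020-06-01", 1020.100000, 1025.200500),
    ("2020-07-01", 1030.301000, 1035.452505),
    ("2020-08-01", 1030.301000, 1035.452505),
    ("2020-09-01", 1040.604010, 1045.807030),
    ("2020-10-01", 1040.604010, nan),
    ("2020-11-01", 1051.010050, nan),
    ("2020-12-01", 1051.010050, nan),
    ("2021-01-01", 1061.520151, 1056.265100),
    ("2021-02-01", 1061.520151, 1056.265100),
]


def simple_returns(prices):
    """Period-over-period returns between consecutive prices (hypothetical helper)."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def mean_over_std(returns):
    """Unannualized mean/stdev ratio, a rough stand-in for a Sharpe-like statistic."""
    return mean(returns) / stdev(returns)


# (a) Single-column view: drop only the row where SYM1 itself is NaN.
sym1_alone = [s1 for _, s1, _ in rows if not math.isnan(s1)]

# (b) ASSUMED multi-column view: drop every row where ANY column is NaN.
sym1_joint = [
    s1 for _, s1, s2 in rows if not (math.isnan(s1) or math.isnan(s2))
]

returns_alone = simple_returns(sym1_alone)
returns_joint = simple_returns(sym1_joint)

# In (b) the three rows removed because of SYM2 merge two SYM1 steps
# (1040.604010 -> 1051.010050 -> 1061.520151) into one larger return.
print("returns used:", len(returns_alone), "vs", len(returns_joint))
print("largest return:", max(returns_alone), "vs", max(returns_joint))
print("mean/stdev:", mean_over_std(returns_alone), "vs", mean_over_std(returns_joint))
```

If this is what happens, SYM1 loses three valid observations because of gaps in SYM2, which would change its mean, volatility, and Sharpe ratio. In pandas terms, the two views correspond to `df["SYM1"].dropna()` and `df.dropna()["SYM1"]`.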

Thank you in advance for your help.
