Hello, when reviewing the results, I found that the inpatient days value does not match what I see in the original claims input file: for example, patients with no inpatient visits have 24 inpatient days, or vice versa. Upon debugging, the problem seems to lie in the part where inpatient_days is created using the index of claim_df: it picks up only the values of date_diff where index == personId.
Example of date_diff:
date_diff.dt.days
10 8
29 2
53 2
56 9
60 2
..
1333281 3
1333325 2 --> if there were a personId == 1333325, their inpatient days would be 2, although this is the index of claim_df and is not related to personId.
1333336 10
1333337 5
1333340 5
Length: 74609, dtype: int64
The claim_df and demo_df were set up as suggested:
demo_df has one unique row for each patient, with age and gender.
claim_df has one or more rows for each patient (only patients with claims are included).
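Here is a minimal sketch of what I think is happening, on toy data. The column names (`inpatient`, `admitDate`, `dischargeDate`) and the values are made up for illustration, and the "buggy" line is only my reading of the behavior, not the package's exact code. The second version groups by personId instead of relying on the row index of claim_df.

```python
import pandas as pd

# Hypothetical toy data: column names and values are assumptions for illustration.
claim_df = pd.DataFrame(
    {
        "personId": [101, 2, 101, 3],
        "inpatient": [True, False, True, False],
        "admitDate": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]
        ),
        "dischargeDate": pd.to_datetime(
            ["2020-01-09", "2020-02-01", "2020-03-03", "2020-04-01"]
        ),
    }
)
demo_df = pd.DataFrame({"personId": [2, 3, 101]})

# Length of stay per inpatient claim; the index is still the row index of claim_df.
inpatient = claim_df[claim_df["inpatient"]]
days = (inpatient["dischargeDate"] - inpatient["admitDate"]).dt.days

# Suspected behavior: the claim row index is treated as if it were a personId,
# so person 2 picks up the stay stored at row 2, which belongs to person 101.
buggy = days.reindex(demo_df["personId"]).fillna(0).astype(int)

# Alternative: sum the days per personId, then align to the patients in demo_df.
correct = (
    days.groupby(inpatient["personId"])
    .sum()
    .reindex(demo_df["personId"], fill_value=0)
)

print(buggy.to_dict())
print(correct.to_dict())
```

In this toy case person 2 has no inpatient claims but gets 2 days, while person 101 has 10 inpatient days but gets 0, which is the same pattern I see in my data.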
Please let me know if you have any suggestions. Thank you.
Thanks, I have already modified the code to work in the meantime, but I was wondering whether this has any impact on how the test set "inpatient days" feature was created (if it was created the same way) and used to generate the risk_score distribution, as described here:
risk_score - This percentile which indicates where this prediction lies in the distribution of predictions on the test set. A value of 95 indicates that the prediction was higher than 95% of the test population, which was designed to be representative of the overall US population.
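To make my concern concrete, this is how I understand that definition, as a rough sketch only. The function name `percentile_rank` and the toy test predictions are my own assumptions, not the package's code. If the test set feature were misaligned, the reference distribution used here would shift, and so would every risk_score.

```python
import numpy as np


def percentile_rank(prediction, test_predictions):
    """Percentage of test-set predictions strictly below `prediction`."""
    sorted_preds = np.sort(np.asarray(test_predictions, dtype=float))
    # side="left" counts only the values strictly smaller than the prediction.
    below = np.searchsorted(sorted_preds, prediction, side="left")
    return 100.0 * below / sorted_preds.size


# Hypothetical reference distribution: 100 evenly spaced test predictions.
test_predictions = np.arange(100)
score = percentile_rank(95, test_predictions)
print(score)
```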
Additionally, we observed this difference, but just to confirm: will the xgboost_all_age model give a higher risk_score than the xgboost model, which was trained on Medicare members only? Have you compared the risk_score of the two models on the same population, for example Medicare members?