inpatient days mismatch #36

2miatran · 2021-04-26T17:49:19Z

Hello, when reviewing the results, I found that the inpatient days value does not match what I see in the original claims input file, e.g. patients with no inpatient visits have 24 inpatient days, or vice versa. After debugging, the problem seems to lie in the part where inpatient_days is created with an index taken from claim_df: this actually selects only the values of date_diff whose index equals a personId.

    preprocessed_df['# of Admissions (12M)'] = inpatient_rows.groupby('personId').admitDate.nunique()
    date_diff = pd.to_timedelta(inpatient_rows['dischargeDate'].dt.date - inpatient_rows['admitDate'].dt.date)
    inpatient_days = pd.Series(date_diff.dt.days, index=claim_df['personId'])
    preprocessed_df['Inpatient Days'] = inpatient_days.groupby('personId').sum()

Example of date_diff:
date_diff.dt.days
10 8
29 2
53 2
56 9
60 2
..
1333281 3
1333325 2 --> if there were a personId == 1333325, their inpatient days would be 2, even though 1333325 is a row index of claim_df and unrelated to personId.
1333336 10
1333337 5
1333340 5
Length: 74609, dtype: int64
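
The alignment behaviour described above can be reproduced on toy data. This is a minimal sketch, not the project's code: the three-row inpatient_rows frame, its row labels, and the variable names misaligned and inpatient_days_per_person are hypothetical. It relies on the fact that pd.Series(data, index=...) reindexes by label when data is already a Series, so the passed index acts as a lookup into the row labels rather than as a relabelling. Grouping the day counts directly by the personId column is one possible way to avoid that.

```python
import pandas as pd

# Hypothetical toy data: row labels (10, 29, 53) deliberately collide
# with personId values (10, 29) to make the problem visible.
inpatient_rows = pd.DataFrame(
    {
        "personId": [10, 10, 29],
        "admitDate": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
        "dischargeDate": pd.to_datetime(["2020-01-04", "2020-02-03", "2020-03-06"]),
    },
    index=[10, 29, 53],
)

# Length of stay per inpatient row: 3, 2 and 5 days.
days = (inpatient_rows["dischargeDate"] - inpatient_rows["admitDate"]).dt.days

# Passing a Series as data together with an index makes pandas reindex by
# label, so each personId is looked up among the row labels of `days`.
relabeled = pd.Series(days, index=inpatient_rows["personId"])
misaligned = relabeled.groupby(level=0).sum()

# Grouping by the personId column keeps each row's value with its own patient.
inpatient_days_per_person = days.groupby(inpatient_rows["personId"]).sum()

print(misaligned.to_dict())                 # {10: 6, 29: 2}
print(inpatient_days_per_person.to_dict())  # {10: 5, 29: 5}
```

In the toy data, patient 10 has stays of 3 and 2 days and patient 29 has one stay of 5 days, so only the second result matches the input.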

The claim_df and demo_df were set up as suggested:

demo_df has unique row for each patient with age and gender
claim_df has one or multiple rows for each patient (only patient with claims are included).
Please let me know if you have any suggestions. Thank you.


DaveDeCaprio · 2021-04-27T13:50:23Z

If you make this change, does it work correctly?

inpatient_days = pd.Series(date_diff.dt.days, index=inpatient_rows['personId'])

2miatran · 2021-04-27T17:38:59Z

Thanks, I have already modified the code on my side so that it works. I was wondering, though, whether this affects how the "Inpatient Days" feature was created for the test set (if it was built the same way), since that test set is used to generate the risk_score distribution, as described here:

risk_score - This percentile which indicates where this prediction lies in the distribution of predictions on the test set. A value of 95 indicates that the prediction was higher than 95% of the test population, which was designed to be representative of the overall US population.

Additionally, we observed a difference between the two models and would like to confirm: does the xgboost_all_age model give a higher risk_score than the xgboost model, which was trained on Medicare members only? Have you compared the risk_score from the two models on the same population, for example Medicare members?

Thank you.
