Run PLSR plot #101

Closed
wants to merge 2 commits into from

andrewram4287

No description provided.

andrewram4287 commented Nov 1, 2024

@JacksonLChin This is the code I'm using to run the PLSR plots. Let me know if you find any glaring issues! I based this file mostly on your D8 file.

Comment on lines 83 to 86
  covid_acc = score(
-     labels.loc[meta_data.loc[:, "patient_category"] == "COVID-19"],
-     probabilities.loc[meta_data.loc[:, "patient_category"] == "COVID-19"],
+     labels.loc[meta_data.loc[:, "patient_category"] == "COVID-19"].to_numpy().astype(int),
+     probabilities.loc[meta_data.loc[:, "patient_category"] == "COVID-19"].to_numpy(),
  )

To avoid typing headaches later, let's use to_numpy() for either all of the probabilities variables or none of them.
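
One way to keep the types consistent is sketched below, under some assumptions: `labels`, `probabilities`, and `meta_data` are taken to be pandas objects sharing an index, the data are made up, and `score` is a hypothetical stand-in for the scoring function in the PR (its real signature isn't shown in this thread). The mask is built once and both inputs are converted with `to_numpy()` in the same place.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real frames come from the PR's pipeline.
meta_data = pd.DataFrame(
    {"patient_category": ["COVID-19", "Non-COVID", "COVID-19", "COVID-19"]}
)
labels = pd.Series([1, 0, 0, 1])
probabilities = pd.Series([0.9, 0.2, 0.4, 0.7])


def score(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Hypothetical scorer: accuracy at a 0.5 threshold."""
    return float(np.mean((y_prob >= 0.5).astype(int) == y_true))


# Build the boolean mask once, then convert both inputs the same way.
is_covid = (meta_data.loc[:, "patient_category"] == "COVID-19").to_numpy()
covid_labels = labels.to_numpy().astype(int)[is_covid]
covid_probabilities = probabilities.to_numpy()[is_covid]

covid_acc = score(covid_labels, covid_probabilities)
```

Converting once, up front, means every later call sees NumPy arrays, so the choice between pandas and NumPy is made in a single place.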

@JacksonLChin

@andrewram4287 Looks good! Just one minor change for type handling.

@JacksonLChin

Other thing to note--not sure if this is the same plot you showed in lab meeting yesterday, but the predict_mortality function splits out patients by COVID-19 and non-COVID status before predicting. The Overall column in this case is the accuracy of the two models after joining their predictions together, so this isn't building a third model that looks at all the patients at once.
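
To make the distinction concrete, here is a small sketch with made-up numbers. The arrays and the `accuracy` helper are hypothetical and not taken from `predict_mortality`; they only illustrate how an Overall accuracy can come from concatenating the predictions of two separately fit models rather than from fitting a third model.

```python
import numpy as np


def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


# Made-up labels and predictions from two separately fit models.
covid_true = np.array([1, 0, 1, 1])
covid_pred = np.array([1, 0, 0, 1])            # from the COVID-19 model
non_covid_true = np.array([0, 0, 1, 0, 1, 0])
non_covid_pred = np.array([0, 0, 1, 0, 0, 0])  # from the non-COVID model

covid_acc = accuracy(covid_true, covid_pred)              # 3 of 4 correct
non_covid_acc = accuracy(non_covid_true, non_covid_pred)  # 5 of 6 correct

# "Overall" pools the two sets of predictions; no third model is fit.
overall_acc = accuracy(
    np.concatenate([covid_true, non_covid_true]),
    np.concatenate([covid_pred, non_covid_pred]),
)  # 8 of 10 correct
```

Computed this way, the Overall value is a sample-weighted average of the two group accuracies, so it always lies between them.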

@andrewram4287

> Other thing to note--not sure if this is the same plot you showed in lab meeting yesterday, but the predict_mortality function splits out patients by COVID-19 and non-COVID status before predicting. The Overall column in this case is the accuracy of the two models after joining their predictions together, so this isn't building a third model that looks at all the patients at once.

Okay, that makes sense. I can look into what the prediction looks like if we actually combine all the patient samples. Yes, the code creates the same plot I showed in lab meeting... So that means accuracy does decrease with more PLSR components...

@JacksonLChin

Yup! We may want to swap to the 1-component model for each then. As Dr. Meyer suggested yesterday, I think we could move to a single scores plot with the COVID and non-COVID PLSR components as the x and y axes for interpretation.
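
A rough sketch of how such a scores plot could be assembled is below. Everything in it is an assumption for illustration: the data are random, the helper names are hypothetical, and the first PLS component is computed directly with NumPy (centered, unscaled, single response) rather than with whatever PLSR implementation the repository uses.

```python
import numpy as np


def first_pls_component(X: np.ndarray, y: np.ndarray):
    """Fit the first PLS1 component; return (column means, unit weight vector)."""
    mean = X.mean(axis=0)
    w = (X - mean).T @ (y - y.mean())  # direction of maximal covariance with y
    return mean, w / np.linalg.norm(w)


def project(X: np.ndarray, mean: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Score samples on a fitted component."""
    return (X - mean) @ w


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))   # made-up factor matrix
is_covid = np.arange(40) < 20  # made-up group assignment
y = (X[:, 0] + 0.1 * rng.normal(size=40) > 0).astype(float)  # made-up outcome

# One 1-component model per group, each fit only on its own patients.
covid_model = first_pls_component(X[is_covid], y[is_covid])
non_covid_model = first_pls_component(X[~is_covid], y[~is_covid])

# Project every patient onto both components: one (x, y) point per patient.
scores = np.column_stack(
    [project(X, *covid_model), project(X, *non_covid_model)]
)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by outcome or patient category, would give a single scores plot with the COVID component on one axis and the non-COVID component on the other.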

@andrewram4287

> Yup! We may want to swap to the 1-component model for each then. As Dr. Meyer suggested yesterday, I think we could move to a single scores plot with the COVID and non-COVID PLSR components as the x and y axes for interpretation.

I find it weird, though, that this was not the case when you ran the PLSR model before...

@JacksonLChin

In the past, we've found the non-COVID model worked best at one component, and the COVID one worked slightly better at two. I think this is mostly in line with what we've seen previously--I'm guessing that the change to how we normalize factors prior to PLSR changed this a little bit.

@andrewram4287

@JacksonLChin
This is the updated plot based on actually trying to predict all samples at once. It looks like it does worse than splitting them apart.

[Image: Screenshot 2024-11-04 at 9 04 52 AM]

@JacksonLChin

@andrewram4287 Cool! Thanks for looking into this. It's interesting that 1 component seems better for accuracy while 2 is better for AUC-ROC--given that they're comparable, though, I think we should continue with the 1-component model.
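
For context on why the two metrics can rank models differently, here is a toy sketch with made-up probabilities (not the PR's results). Accuracy depends on where predictions fall relative to a threshold, assumed here to be 0.5, while AUC-ROC depends only on how positives are ranked relative to negatives. The helper functions are hypothetical and written with NumPy only.

```python
import numpy as np


def accuracy(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> float:
    """Accuracy after thresholding the predicted probabilities."""
    return float(np.mean((y_prob >= threshold).astype(int) == y_true))


def auc_roc(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


y_true = np.array([1, 1, 1, 0, 0, 0])

# Made-up predicted probabilities from two hypothetical models.
model_a = np.array([0.9, 0.8, 0.1, 0.2, 0.3, 0.4])        # one positive ranked last
model_b = np.array([0.45, 0.40, 0.35, 0.30, 0.20, 0.10])  # perfect ranking, all below 0.5

acc_a, auc_a = accuracy(y_true, model_a), auc_roc(y_true, model_a)  # 5/6 and 2/3
acc_b, auc_b = accuracy(y_true, model_b), auc_roc(y_true, model_b)  # 1/2 and 1
```

Model A wins on accuracy and model B wins on AUC-ROC, so a disagreement between the two metrics is not by itself a sign that something is wrong.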

andrewram4287 deleted the InvestigatingPLSR branch November 5, 2024 21:47