what did you intend to do with drug_name.split("+")] #132

bhomass · 2023-08-14T21:37:23Z

In data.py, there is this statement:

for d in self.drugs_names:
    [drugs_names_unique.add(i) for i in d.split("+")]

It causes the code to fail at the following statement:

    name_to_smiles_map = {
        drug: canonicalize_smiles(smiles)
        for drug, smiles in dataset.obs.groupby(
            [perturbation_key, smiles_key]
        ).groups.keys()
    }

Upon examination of the drug names, only 4 of them contain '+':
(+)-3-(1-propyl-piperidin-3-yl)-phenol
(+|-)-7-hydroxy-2-(N,N-di-n-propylamino)tetralin
flurbiprofen-(+|-)
atenolol-(+|-)

I assume the sensible thing to do would be to eliminate (+) or (+|-) and the trailing or preceding -.
But [drugs_names_unique.add(i) for i in d.split("+")] would not be doing that. It would simply leave fragments like '(' as a possible drug name.
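
To make the problem concrete, here is a minimal sketch that applies the same split to the four names listed above. It is a standalone reproduction, not the repository code; the `names` list and the use of `set.update` are my own stand-ins for `self.drugs_names` and the list comprehension.

```python
# The four drug names from the dataset that contain '+', copied from the list above.
names = [
    "(+)-3-(1-propyl-piperidin-3-yl)-phenol",
    "(+|-)-7-hydroxy-2-(N,N-di-n-propylamino)tetralin",
    "flurbiprofen-(+|-)",
    "atenolol-(+|-)",
]

drugs_names_unique = set()
for d in names:
    # Same splitting logic as the statement in data.py, written as a plain update.
    drugs_names_unique.update(d.split("+"))

# Print every fragment that ends up being treated as a "drug name".
for fragment in sorted(drugs_names_unique):
    print(repr(fragment))
```

None of the original names survives the split; the set holds pieces such as '(' and '|-)' instead, which have no SMILES entry to look up.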

Can someone confirm whether my interpretation is correct?

The comment for drug_names_to_once_canon_smiles() says
#This function will need to be rewritten to handle datasets with combinations
but I don't get what is meant by "combinations". Are there some drugs that use '+' to combine multiple formulas, and is that why you are doing split('+')? If so, the (+) cases should be exempted from the split processing. But I don't see that mechanism in place.

bhomass · 2023-08-26T05:47:36Z

I used re.split with
plus_pattern = r'(?<!\()\+'

and it worked.
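
For anyone hitting the same error, this is roughly what that looks like as a sketch. The helper name `split_combination` and the combination string "drugA+drugB" are made up for illustration; only `plus_pattern` is what I actually used. The negative lookbehind skips any '+' that directly follows '(', so it covers the four names in this dataset, but a '+' inside a name that is not preceded by '(' would still be split.

```python
import re

# Split on '+' unless it is immediately preceded by an opening parenthesis.
plus_pattern = r"(?<!\()\+"


def split_combination(drug_name):
    """Hypothetical helper: split a combination string into single drug names."""
    return re.split(plus_pattern, drug_name)


# A name with a stereo marker stays in one piece.
print(split_combination("(+)-3-(1-propyl-piperidin-3-yl)-phenol"))

# A made-up combination is still split into its parts.
print(split_combination("drugA+drugB"))
```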

MxMstrmn · 2023-09-12T15:20:00Z

The "+" sign is meant to indicate drug combinations; this can be done with chemCPA as demonstrated here. I will make this clearer in a future PR.

bhomass · 2023-09-13T04:42:32Z

Yes, I do understand the intent of the '+' in the code. My point is that splitting by '+' will fail for this dataset, because some drug names already contain '+', as in (+)-3-(1-propyl-piperidin-3-yl)-phenol.

sepidism · 2024-01-11T19:56:52Z

Were you able to run manual_seml_sweep.py? I keep getting random errors regarding the data. I'm trying with sciplex_complete_middle_subset.h5ad and slincs_full_smiles_sciplex_genes.h5ad.

bhomass · 2024-01-11T19:59:40Z

This is not a repo you can just download and run. There are a hundred and one little bugs, mismatches, and discrepancies throughout. But I worked through the errors one by one and got it running in the end.

bhomass · 2024-02-07T07:03:28Z

Apparently the drug names with the funny + are eliminated during preprocessing, provided you are able to run the code in the preprocessing folder. For us, that is not possible because some input files have not been posted.

MxMstrmn · 2024-03-04T12:14:47Z

Hi @bhomass,

Why were you not able to remove those '+' signs from the drug names? What do you mean by unposted input files?

bhomass · 2024-03-05T04:12:02Z

Yes, I did remove the drugs with + in their names once I realized that is how you handled them. I have posted a few times about the missing input files that show up in the code but are not in the download links. Let me walk through all the notebook code in the preprocessing folder one more time and make a list of all the input files that are missing.
