Parsing problems #39

Open
Kev-Flores opened this issue Jun 6, 2024 · 1 comment

@Kev-Flores

Is your feature request related to a problem? Please describe.
I have three main problems.

  • If I try to parse a lipid species written as "d7 18:1 MG", it doesn't get parsed.
  • The parser doesn't seem to recognize "2O", for example when I try to parse the species "Cer 20:1;2O/18:0".
  • "plasmenyl", "SPLASH", and "Lyso" aren't recognized either, as in "SPLASH LPC 18:1(d7)".

I recently started using Goslin in R, so these problems may be due to my inexperience; if not, I hope to bring to light some features that could be added.
I have attached a file with several examples of lipids that are not recognized.
CLASSI_SPECIE NON RICONOSCIUTE.txt

@dominik-kopczynski
Collaborator

Hi Kev-Flores,

Thank you very much for reporting these issues. I see a lot of different types of lipid name variations.

  • The "2O" here denotes an additional functional group on the fatty acyl chain. When the number comes before the functional group, it gives the position; when it comes after, it gives the number of additional functional groups. According to the current shorthand nomenclature, these are the correct ways to write such names:

    • Cer 18:0;1OH,3OH/16:0
    • Cer 18:0;(OH)2/16:0
    • Cer 18:0;O2/16:0

    So here it should be "O2". Is it possible to change the output of your code to O2? Otherwise, we run into an ambiguity: one person writes 2O and means an OH at the second position, while another writes 2O and means two OH groups at unknown positions. These are two different descriptions, and Goslin won't be able to determine which one is intended.
  • Prefixes such as SPLASH or Plasmenyl can be skipped because they are not part of any nomenclature. Just write LPC 18:1(d7). Please be aware that (d7) is just a convenient notation for heavy molecules. The official way is LPC 18:1[M[2]H7], following the IUPAC nomenclature. You can keep writing (d7), but be aware that Goslin will provide the output in a different format.

  • "d7 18:1 MG": we have never seen the lipid name reported in this order. It looks like it is completely reversed. My suggestion: let's have a call, so that I can get a picture of what you are doing and how. Maybe this way we can find a solution for you faster than through the GitHub issue tracker. What do you think?

Cheers,
Dominik
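
The two rewrites suggested above (dropping non-nomenclature prefixes and writing "O2" instead of "2O") could be applied to the names before they are handed to the parser. The following is only a minimal sketch in plain Python, not part of Goslin: the function name `normalize_lipid_name` and the prefix list are hypothetical, and the "2O" rewrite assumes the count reading (two oxygens at unknown positions). If "2O" was meant as a position in your data, this rewrite would be wrong, which is exactly the ambiguity described above.

```python
import re

# Hypothetical prefix list taken from the examples in this thread; extend as needed.
NON_NOMENCLATURE_PREFIXES = ("SPLASH", "Plasmenyl")


def normalize_lipid_name(name: str) -> str:
    """Rewrite a lipid name toward the shorthand forms suggested in the reply."""
    cleaned = name.strip()

    # Drop leading prefixes that are not part of any nomenclature.
    for prefix in NON_NOMENCLATURE_PREFIXES:
        cleaned = re.sub(rf"^{prefix}\s+", "", cleaned, flags=re.IGNORECASE)

    # ASSUMPTION: ";2O" means "two oxygens, positions unknown", so emit ";O2".
    # The lookahead leaves positional forms such as ";1OH" untouched.
    cleaned = re.sub(r";(\d+)O(?![A-Za-z0-9])", r";O\1", cleaned)

    return cleaned


if __name__ == "__main__":
    for raw in ("Cer 20:1;2O/18:0", "SPLASH LPC 18:1(d7)"):
        print(raw, "->", normalize_lipid_name(raw))
```

The reversed form "d7 18:1 MG" is deliberately not handled here, since its intended meaning is still an open question in this thread.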
