
It seems that the VSA_EState* values were not generated correctly across the entire training data. #5

Open
jsko-arontier opened this issue Jun 1, 2023 · 1 comment


jsko-arontier commented Jun 1, 2023

Thank you for publishing this great work. While running ROBIN for our study, I noticed something strange.

When I run the analysis with the provided descriptor file (Mordred_Test_Compounds_3D.csv), I get the results reported in the paper. However, when I generate the descriptors myself directly from the SDF file and then run the analysis, I get different results.

Comparing the two sets of files, I found that the VSA_EState* values differ significantly, as shown below. In the provided files (Mordred_Test_Compounds_3D.csv, Mordred_ROBIN_RNA_Binder_3D.csv), VSA_EState1 through VSA_EState7 are almost all 0, whereas descriptors generated locally contain non-zero values for these columns.

Here are the package versions I used:

  • rdkit : 2022.9.5
  • mordred : 1.2.0
  • tensorflow : 2.3.1
  • scikit-learn : 1.0.2
  • numpy : 1.18.5
  • scipy : 1.9.3
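To make a version report like the list above easy to reproduce, a small sketch can read the installed versions through the standard-library `importlib.metadata` (Python 3.8+). The distribution names in `PACKAGES` are assumptions based on the usual pip names; installs that lack pip metadata may be reported as not installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names for the packages listed in the issue.
PACKAGES = ["rdkit", "mordred", "tensorflow", "scikit-learn", "numpy", "scipy"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Print in the same "name : version" style as the list above.
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name} : {ver if ver is not None else 'not installed'}")
```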
$ cat Mordred_files/Mordred_Test_Compounds_3D.csv | cut -d ',' -f 1,1561,1562,1563,1564,1565,1566,1567

name,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7
ADQ,0.0,0.0,0.0,0.0,0.0,0.0,0.0
HIV TAR compound 4,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ribocil-A,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Tetracycline,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Imatinib,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ibrutinib,0.0,0.0,0.0,0.0,7.188619484542558,0.0,0.0
Lovastatin,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Nevirapine,0.0,0.0,0.0,0.0,0.0,0.0,0.0
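The `cut` commands here select columns by position (1561-1567), which only works if both files lay out Mordred's columns identically. A minimal sketch using only the Python standard library finds the VSA_EState* columns by header name instead and reports the fraction of rows that are exactly 0. The function name, the script name in the usage comment, and the choice to not count empty or non-numeric cells as 0 are illustrative assumptions, not part of ROBIN or Mordred.

```python
import csv
import io
import sys


def _is_zero(value):
    """True if a CSV cell parses as exactly 0; empty, missing or non-numeric cells are not counted."""
    try:
        return float(value) == 0.0
    except (TypeError, ValueError):
        return False


def vsa_estate_zero_fraction(csv_text, prefix="VSA_EState"):
    """Fraction of rows equal to 0 for every column whose header starts with `prefix`.

    Columns are located by header name rather than by position, so the check
    still works if the column order differs between descriptor files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cols = [c for c in (reader.fieldnames or []) if c.startswith(prefix)]
    zeros = dict.fromkeys(cols, 0)
    n_rows = 0
    for row in reader:
        n_rows += 1
        for c in cols:
            if _is_zero(row[c]):
                zeros[c] += 1
    return {c: zeros[c] / n_rows for c in cols} if n_rows else {}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python zero_check.py Mordred_Test_Compounds_3D.csv
    with open(sys.argv[1], newline="") as fh:
        for col, frac in vsa_estate_zero_fraction(fh.read()).items():
            print(f"{col}: {frac:.0%} of rows are 0")
```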

$ cat Mordred_files/Mordred_ROBIN_RNA_Binder_3D.csv | cut -d ',' -f 1,1561,1562,1563,1564,1565,1566,1567 | head -n 5

name,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7
0054-0090,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0096-0280,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0109-0002,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0109-0045,0.0,0.0,0.0,0.0,0.0,0.0,0.0

jsko-arontier commented Jun 1, 2023

The file I generated myself contains the following non-zero values:

$ python -m mordred -3 Test_Compounds_3D.sdf -o Test_Compounds_3D.mordred.csv
$ cat Test_Compounds_3D.mordred.csv | cut -d ',' -f 1,1561,1562,1563,1564,1565,1566,1567 

name,VSA_EState1,VSA_EState2,VSA_EState3,VSA_EState4,VSA_EState5,VSA_EState6,VSA_EState7
ADQ,11.718160420815336,47.42857699012404,34.05060062906003,3.078153942379781,-0.33326002742329375,14.99573145325298,-1.9379634082088582
HIV TAR compound 4,38.19259310235959,17.498854087497666,3.1033958228353073,6.24855622042333,-0.5625207860922143,7.987202028487653,-4.473388303949271
Ribocil-A,0.0,31.842610646889042,4.933083613714164,1.7610598832630966,1.6422973544097212,5.556640742998518,5.828923148026683
Tetracycline,0.0,40.01794688016008,54.16668723629131,-0.5148717473036521,-7.799810531338082,3.068299635644739,-0.19874055177626593
Imatinib,0.0,30.128187865789258,6.2979499406137265,6.102768934885226,0.3376392406237758,19.347018038601632,5.219265870655177
Ibrutinib,7.793632778010275,22.652446521864317,5.616525798335863,8.52059370897009,1.8032790912538088,17.321361934766436,4.581419830522775
Lovastatin,11.505992733717303,24.21316292324496,9.89343956573144,1.2726621945074328,0.7548182808576581,0.0,9.701063477760322
Nevirapine,0.0,15.195298563869994,12.347919501133788,2.1814871504157223,1.1875000000000004,5.805969387755102,5.664973544973545
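To compare the provided file against a locally regenerated one molecule by molecule, a sketch like the following could list every VSA_EState* value that disagrees. It assumes both CSVs have a `name` column (as in the outputs shown) and that the default tolerances are acceptable; `diff_descriptors` and its parameters are hypothetical helpers, not part of ROBIN or Mordred.

```python
import csv
import io
import math
import sys


def _rows_by_name(csv_text, key="name"):
    """Map molecule name -> row dict (if a name repeats, the last row wins)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def _to_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def diff_descriptors(provided_text, regenerated_text, prefix="VSA_EState",
                     rel_tol=1e-9, abs_tol=1e-6):
    """Return (molecule, column, provided, regenerated) tuples that disagree.

    Only molecules present in both files, and columns starting with `prefix`
    in both files, are compared. Values are returned as the raw CSV strings.
    """
    provided = _rows_by_name(provided_text)
    regenerated = _rows_by_name(regenerated_text)
    diffs = []
    for mol in sorted(provided.keys() & regenerated.keys()):
        old, new = provided[mol], regenerated[mol]
        # `c and ...` skips the None key DictReader uses for overflow fields.
        for col in (c for c in old if c and c.startswith(prefix) and c in new):
            a, b = _to_float(old[col]), _to_float(new[col])
            if a is None or b is None:
                same = old[col] == new[col]  # fall back to a text comparison
            else:
                same = math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            if not same:
                diffs.append((mol, col, old[col], new[col]))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python diff_vsa.py provided.csv regenerated.csv
    texts = []
    for path in sys.argv[1:]:
        with open(path, newline="") as fh:
            texts.append(fh.read())
    for mol, col, old, new in diff_descriptors(*texts):
        print(f"{mol}\t{col}\tprovided={old}\tregenerated={new}")
```

Matching rows by molecule name rather than by row index also tolerates files whose molecules appear in a different order.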
