Commit

Enzyme evals (#19)
Co-authored-by: Type59Gold <liyq@ddp.tech>
Type59pro and Type59Gold authored Mar 12, 2024
1 parent 7f01360 commit e280641
Showing 9 changed files with 44 additions and 61 deletions.

This file was deleted.

This file was deleted.

14 changes: 14 additions & 0 deletions evals/registry/data/00_scipaper_enzyme_km/samples.jsonl
@@ -0,0 +1,14 @@
+{"file_name": "../uni-finder/enzyme/km/paper/10.1007_s00425-014-2102-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s00425-014-2102-6.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1007_s00425-014-2102-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s00425-014-2102-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1007_s10725-019-00528-9.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s10725-019-00528-9.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1007_s10725-019-00528-9.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s10725-019-00528-9.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_j.bbrep.2016.11.003.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_j.bbrep.2016.11.003.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_j.bbrep.2016.11.003.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_j.bbrep.2016.11.003.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0005-2728__97__00090-x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0005-2728__97__00090-x.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0005-2728__97__00090-x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0005-2728__97__00090-x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0021-9258__18__96277-0.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96277-0.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0021-9258__18__96277-0.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96277-0.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0021-9258__18__96427-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96427-6.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0021-9258__18__96427-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96427-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_S0076-6879__75__41082-5.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_S0076-6879__75__41082-5.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_S0076-6879__75__41082-5.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_S0076-6879__75__41082-5.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0141-8130__01__00188-x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0141-8130__01__00188-x.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0141-8130__01__00188-x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0141-8130__01__00188-x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1021_acs.biochem.6b00536.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1021_acs.biochem.6b00536.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1021_acs.biochem.6b00536.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1021_acs.biochem.6b00536.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1080_09168451.2020.1751582.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1751582.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1080_09168451.2020.1751582.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1751582.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1080_09168451.2020.1799749.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1799749.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1080_09168451.2020.1799749.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1799749.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1104_pp.19.01225.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1104_pp.19.01225.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1104_pp.19.01225.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1104_pp.19.01225.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1139_b07-081.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1139_b07-081.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1139_b07-081.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1139_b07-081.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/j.1432-1033.1986.tb09548.x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/j.1432-1033.1986.tb09548.x.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/j.1432-1033.1986.tb09548.x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/j.1432-1033.1986.tb09548.x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
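
Each line of samples.jsonl pairs a paper PDF with an answer CSV and lists the columns to compare. The sketch below is a hypothetical sanity check, not part of the `evals` package: it assumes every record carries the six keys seen above, that `index` should be one of the `compare_fields`, and that `paper/<stem>.pdf` pairs with `answer/<stem>.csv`. `check_record` and `check_jsonl` are illustrative names.

```python
import json
from pathlib import PurePosixPath
from typing import Dict, List

# Keys present in every samples.jsonl record in this commit (assumed required).
REQUIRED_KEYS = {"file_name", "file_link", "answerfile_name",
                 "answerfile_link", "compare_fields", "index"}


def check_record(record: dict) -> List[str]:
    """Return the problems found in one samples.jsonl record (empty if none)."""
    missing = REQUIRED_KEYS - set(record)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    # The row key should itself be one of the compared columns.
    if record["index"] not in record["compare_fields"]:
        problems.append(f"index {record['index']!r} not in compare_fields")
    # Assumed layout: .../paper/<stem>.pdf pairs with .../answer/<stem>.csv
    paper = PurePosixPath(record["file_name"])
    answer = PurePosixPath(record["answerfile_name"])
    if paper.stem != answer.stem:
        problems.append(f"stem mismatch: {paper.stem} != {answer.stem}")
    if (paper.parent.name, answer.parent.name) != ("paper", "answer"):
        problems.append("unexpected paper/answer directory layout")
    return problems


def check_jsonl(text: str) -> Dict[int, List[str]]:
    """Map 1-based line numbers to their problems; blank lines are skipped."""
    report = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            problems = check_record(json.loads(line))
            if problems:
                report[lineno] = problems
    return report
```

An empty report means every record passed; a non-empty one points at the offending line numbers.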

This file was deleted.

17 changes: 14 additions & 3 deletions evals/registry/data/00_scipaper_enzyme_substrate/samples.jsonl
100644 → 100755
@@ -1,3 +1,14 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:6316846852a855013f98ee678e945582013c1269fcad311c8e933859ade77c68
-size 1919
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1007_s00425-014-2102-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s00425-014-2102-6.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1007_s00425-014-2102-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s00425-014-2102-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1007_s10725-019-00528-9.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s10725-019-00528-9.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1007_s10725-019-00528-9.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s10725-019-00528-9.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1007_s11103-006-0040-9.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s11103-006-0040-9.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1007_s11103-006-0040-9.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s11103-006-0040-9.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_j.bbrep.2016.11.003.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_j.bbrep.2016.11.003.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_j.bbrep.2016.11.003.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_j.bbrep.2016.11.003.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_s0005-2728__97__00090-x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0005-2728__97__00090-x.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_s0005-2728__97__00090-x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0005-2728__97__00090-x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_s0021-9258__18__96277-0.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96277-0.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_s0021-9258__18__96277-0.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96277-0.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_s0021-9258__18__96427-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96427-6.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_s0021-9258__18__96427-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96427-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_S0076-6879__75__41082-5.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_S0076-6879__75__41082-5.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_S0076-6879__75__41082-5.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_S0076-6879__75__41082-5.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1021_acs.biochem.6b00536.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1021_acs.biochem.6b00536.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1021_acs.biochem.6b00536.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1021_acs.biochem.6b00536.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1080_09168451.2020.1751582.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1751582.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1080_09168451.2020.1751582.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1751582.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1080_09168451.2020.1799749.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1799749.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1080_09168451.2020.1799749.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1799749.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1104_pp.19.01225.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1104_pp.19.01225.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1104_pp.19.01225.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1104_pp.19.01225.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1139_b07-081.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1139_b07-081.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1139_b07-081.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1139_b07-081.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_j.1432-1033.1986.tb09548.x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_j.1432-1033.1986.tb09548.x.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_j.1432-1033.1986.tb09548.x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_j.1432-1033.1986.tb09548.x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
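
The three lines this hunk removes are a Git LFS pointer (a `version` line, an `oid sha256:` line and a `size` line) rather than JSONL; the commit replaces it with the actual records. A loader that reads such a pointer would fail on the first `json.loads`. A minimal guard, assuming the samples are read as plain text; `is_lfs_pointer` and `load_samples` are hypothetical helpers, not functions from the evals framework:

```python
import json
from typing import List

LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(text: str) -> bool:
    """True if text looks like a Git LFS pointer instead of the real file."""
    lines = text.strip().splitlines()
    return (
        bool(lines)
        and lines[0] == LFS_VERSION_LINE
        and any(line.startswith("oid sha256:") for line in lines[1:])
        and any(line.startswith("size ") for line in lines[1:])
    )


def load_samples(text: str) -> List[dict]:
    """Parse samples.jsonl content, refusing an unfetched LFS pointer."""
    if is_lfs_pointer(text):
        raise ValueError("samples.jsonl is a Git LFS pointer; run `git lfs pull` first")
    # One JSON object per non-blank line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Failing with an explicit message is meant to be easier to diagnose than a bare `JSONDecodeError` on the word `version`.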
18 changes: 0 additions & 18 deletions evals/registry/evals/00_scipaper_enzyme_activate_compound.yaml

This file was deleted.

19 changes: 10 additions & 9 deletions ...y/evals/00_scipaper_enzyme_inhibitor.yaml → ...registry/evals/00_scipaper_enzyme_km.yaml
100644 → 100755
@@ -1,18 +1,19 @@
-scipaper_enzyme_inhibitor:
-  id: scipaper_enzyme_inhibitor.val.csv
+scipaper_enzyme_km:
+  id: scipaper_enzyme_km.val.csv
   metrics: [accuracy]

-scipaper_enzyme_inhibitor.val.csv:
+scipaper_enzyme_km.val.csv:
   class: evals.elsuite.rag_table_extract:TableExtract
   args:
-    samples_jsonl: 00_scipaper_enzyme_inhibitor/samples.jsonl
+    samples_jsonl: 00_scipaper_enzyme_km/samples.jsonl
     instructions: |
-      Please give a complete list of Inhibitor, Comment and Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
+      Please give a complete list of Substrate, Comment and Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
       1. Output in CSV format; write units in the value, not in the header, like "10.5 µM". Quote the value if it contains a comma! For example:
       ```csv
-      Inhibitor,Comment,Organism
-      ATP,"competitive inhibition of verapamil-dependent ATPase-activity",Homo sapiens
-      p-xylene,"11.4 mM, slight inhibitor",Bos taurus
-      NH4+, 0.002 mM,Bos taurus
+      Substrate,Comment,Organism,Km Value
+      ATP,"competitive inhibition of verapamil-dependent ATPase-activity",Homo sapiens, 3.5 nM
+      p-xylene,"20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃",Bos taurus, 12 nM
+      D-ribose 6-phosphate, - , Homo sapiens, 120 nM
       ```
       2. If there are multiple tables, concatenate them. Don't give me references or use "..."; give me the complete table!
       3. If no relevant information was found in the paper, use '-' to fill in the cell in the CSV.
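
The km prompt asks for units inside each value (for example "10.5 µM"), while the km samples name the compared field `Km Value (mM)`. If answers are compared numerically, some normalization to millimolar is needed. The sketch below assumes a cell holds a single number followed by a concentration unit; `km_to_mm` and the unit table are hypothetical and are not taken from `TableExtract`.

```python
import re
from typing import Optional

# Multipliers to millimolar (assumed unit set). Both the micro sign (U+00B5)
# and the Greek small letter mu (U+03BC) are accepted as the micro prefix.
TO_MM = {"M": 1e3, "mM": 1.0, "µM": 1e-3, "μM": 1e-3, "uM": 1e-3,
         "nM": 1e-6, "pM": 1e-9}

# One number (optionally signed or in exponent form) followed by a unit.
VALUE_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*([A-Za-zµμ]+)\s*$")


def km_to_mm(cell: str) -> Optional[float]:
    """Convert a cell such as '10.5 µM' to mM; None for '-' or anything unparseable."""
    if cell.strip() in {"", "-"}:
        return None
    match = VALUE_RE.match(cell)
    if match is None or match.group(2) not in TO_MM:
        return None
    return float(match.group(1)) * TO_MM[match.group(2)]
```

Returning `None` instead of raising keeps the '-' placeholder and free-text cells distinguishable from real zeros; a scorer would still need its own tolerance when comparing floats.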
16 changes: 0 additions & 16 deletions evals/registry/evals/00_scipaper_enzyme_localization.yaml

This file was deleted.

12 changes: 6 additions & 6 deletions evals/registry/evals/00_scipaper_enzyme_substrate.yaml
100644 → 100755
@@ -7,13 +7,13 @@ scipaper_enzyme_substrate.val.csv:
   args:
     samples_jsonl: 00_scipaper_enzyme_substrate/samples.jsonl
     instructions: |
-      Please give a complete list of SMILES structures, Km values, Vmax values, target info (protein or cell line), and organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
+      Please give a complete list of Substrate, Comment and Organism of all substrates, Products and Comment of Product in the paper. Usually the substrates' tags are numbers or IUPAC names.
       1. Output in CSV format; write units in the value, not in the header, like "10.5 µM". Quote the value if it contains a comma! For example:
       ```csv
-      Substrate,Inhibitors, Km value,Km max,Comment,organism,Vmax value,SMILES,Target info,Activating Compound,
-      ATP,Cu2+,0.001 mM,-,-,Homo sapiens,-,-,ATP-linker aldehyde,Carboxybenzaldehyde,
-      p-xylene,NADH,0.004 mM,-,-,Homo sapiens,-,C1CCCCC1,-,Methylbenzaldehyde
-      NADPH,benzaldehyde, 0.12 mM,125 mM,enzyme form ATP,Bos taurus,-,-,NH4+
+      Substrate,Comment,Organism,Products,"Comment (Product)"
+      "NADH + H+ + O2","20 mM Tris-HCl(pH 7.0)",Homo sapiens,"NAD+ + H2O", -
+      "D-glucose + 6-phosphate","20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃",Bos taurus, -, -
+      "D-ribose 6-phosphate", - , Homo sapiens, "glycerol + phosphate", -
       ```
       2. If there are multiple tables, concatenate them. Don't give me references or use "..."; give me the complete table!
       3. If no relevant information was found in the paper, use '-' to fill in the cell in the CSV.
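
Both prompts require quoting any value that contains a comma. Rather than hand-writing quotes in examples or test fixtures, Python's `csv` module can decide: with `QUOTE_MINIMAL` it quotes only fields containing the delimiter, the quote character or a line break, and doubles embedded quotes. The sketch assumes the five-column header of the substrate prompt; `to_csv` is a hypothetical helper, not part of the evals framework.

```python
import csv
import io
from typing import List

HEADER = ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"]


def to_csv(rows: List[List[str]]) -> str:
    """Serialize rows under HEADER, quoting only the fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        # Every row must fill all five columns; '-' marks "not found".
        if len(row) != len(HEADER):
            raise ValueError(f"expected {len(HEADER)} fields, got {len(row)}: {row}")
        writer.writerow(row)
    return buf.getvalue()
```

For a row whose comment is `20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃`, only that field is wrapped in quotes, and `csv.reader` reads the output back into the same five fields.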
