Please help me solve the problem #1

Open
Frewnd opened this issue Dec 8, 2024 · 1 comment

Frewnd commented Dec 8, 2024

I ran the following command (I changed the default value of weighting to True in train.py):
python train.py FragmentFactory_dataset.pkl output

It fails with the following error:
Traceback (most recent call last):
  File "E:\Data\WorkSpace\Git\oglycan\FragmentFactory\train.py", line 275, in <module>
    spec2svg(args.in_filename, args.output_path_prefix, weighting=args.weighting, GPID_SIM=args.GPID_SIM)
  File "E:\Data\WorkSpace\Git\oglycan\FragmentFactory\train.py", line 231, in spec2svg
    data, tree = train_topo_tree(in_filename, weighting=weighting, GPID_SIM=GPID_SIM)
  File "E:\Data\WorkSpace\Git\oglycan\FragmentFactory\train.py", line 215, in train_topo_tree
    data = TopoData(spec_df, weighting=weighting, GPID_SIM=GPID_SIM)
  File "E:\Data\WorkSpace\Git\oglycan\FragmentFactory\train.py", line 84, in __init__
    self.df = split(spec_df, GPID_SIM=GPID_SIM)
  File "E:\Data\WorkSpace\Git\oglycan\FragmentFactory\train.py", line 21, in split
    "X": list(np.stack(spec_df["binned_intensities_norm"].values)),
  File "E:\Data\Backup\Config\Anaconda\envs\ff\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
  File "E:\Data\Backup\Config\Anaconda\envs\ff\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'binned_intensities_norm'

I have tried to resolve this error, but nothing I have tried has worked. Could you give me a hand?

Bribak assigned Bribak and Old-Shatterhand and unassigned Bribak on Dec 9, 2024

Old-Shatterhand (Member) commented

Hi @Frewnd,

Sorry for the delayed answer. Could you share the file you're using as input?

The code requires the pandas DataFrame (your input file) to have the following columns. Any additional columns are ignored, as they are not relevant to the model:

  • binned_intensities_norm: These are the intensities from the MS/MS spectra binned in ranges and normalized to sum to 1.
  • glycan: The IUPAC-condensed string of the glycans
  • filename: The DOI of the publication where the spectra were published
  • GP_ID: The GlycoPost_ID of glycans

The last two columns were used to reduce information leakage between splits and can be left empty to get a random split.
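
To see which of the columns listed above are missing before running train.py, a quick check along the following lines may help. This is only a sketch, not part of FragmentFactory: the helper name `check_input` and the toy DataFrame are made up for illustration, and the sum-to-1 test uses NumPy's default `isclose` tolerance, which is an assumption about how strict the normalization needs to be.

```python
import numpy as np
import pandas as pd

# Column names taken from the list in this comment.
REQUIRED_COLUMNS = ["binned_intensities_norm", "glycan", "filename", "GP_ID"]


def check_input(df):
    """Return a list of problems found in df (empty list if none).

    Hypothetical helper for illustration; not part of train.py.
    """
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in df.columns]
    if "binned_intensities_norm" in df.columns:
        # Each row is expected to hold one vector of binned intensities
        # that sums to 1 (default np.isclose tolerance is an assumption).
        sums = df["binned_intensities_norm"].apply(lambda v: float(np.sum(v))).to_numpy()
        bad = int((~np.isclose(sums, 1.0)).sum())
        if bad:
            problems.append(f"{bad} row(s) not normalized to sum to 1")
    return problems


# Toy frame standing in for the real dataset; it lacks the intensity column,
# which is the situation that produces the KeyError reported above.
toy = pd.DataFrame({
    "glycan": ["Gal(b1-3)GalNAc"],
    "filename": [""],
    "GP_ID": [""],
})
print(check_input(toy))
```

If the .pkl file is a pickled DataFrame (an assumption about the input), it could be loaded with `pd.read_pickle("FragmentFactory_dataset.pkl")` and passed to `check_input` in the same way; printing `df.columns` on the result also shows which column names the file actually contains.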
