numpy error #1

Closed
rotifergirl opened this issue Jun 20, 2023 · 5 comments

Comments

@rotifergirl

Hi James, great tool! I've managed to start a test run but have come up against an error, which might be a NumPy version issue. I'm not sure, though, as I'm not much of a Python user.

I repeatedly get this error, I assume once for every repeat element in my library:

/workspace/appscratch/miniconda/cfnjxb_TEStrainer/lib/python3.10/site-packages/pyranges/methods/cluster.py:13: FutureWarning: In the future np.long will be defined as the corresponding NumPy scalar.
  ids = annotate_clusters(cdf.Start.values, cdf.End.values, slack)
Traceback (most recent call last):
  File "/powerplant/workspace/cfnjxb/TEstrainer/scripts/initial_mafft_setup.py", line 161, in <module>
    best_hits_df = blast_gr.cluster(strand=False).df.groupby(['Cluster']).agg({'Chromosome':'first', 'Start':'min', 'End':'max', 'Bitscore':'max'})[['Chromosome','Start','End','Bitscore']].reset_index()
  File "/workspace/appscratch/miniconda/cfnjxb_TEStrainer/lib/python3.10/site-packages/pyranges/pyranges_main.py", line 1202, in cluster
    df = pyrange_apply_single(_cluster, self, **kwargs)
  File "/workspace/appscratch/miniconda/cfnjxb_TEStrainer/lib/python3.10/site-packages/pyranges/multithreaded.py", line 342, in pyrange_apply_single
    result = call_f_single(function, nparams, df, **kwargs)
  File "/workspace/appscratch/miniconda/cfnjxb_TEStrainer/lib/python3.10/site-packages/pyranges/multithreaded.py", line 28, in call_f_single
    return f.remote(df, **kwargs)
  File "/workspace/appscratch/miniconda/cfnjxb_TEStrainer/lib/python3.10/site-packages/pyranges/methods/cluster.py", line 13, in _cluster
    ids = annotate_clusters(cdf.Start.values, cdf.End.values, slack)
  File "sorted_nearest/src/annotate_clusters.pyx", line 10, in sorted_nearest.src.annotate_clusters.annotate_clusters
  File "/workspace/appscratch/miniconda/cfnjxb_TEStrainer/lib/python3.10/site-packages/numpy/__init__.py", line 320, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'long'

I've also attached the spec file of the conda env I created for this, in case that helps:
cfnjxb_TEStrainer_spec.txt

@rotifergirl
Author

Another update: I tried setting up a conda env with Python 3.6 and numpy 1.19.5 and then installing all the other required packages, but ran into further dependency/version issues, this time with ncls and pyranges. Do you run this within a conda env that you could share a spec file from? Then I can try to replicate the environment and see whether I get the error again.

@jamesdgalbraith
Owner

Hi Julie, I think this is a Python dependency issue. I tried running TEstrainer in a new conda environment but wasn't able to reproduce the error. I set up the environment using the following command:

conda create -n TEstrainer python=3.10.10 parallel numpy pandas blast mafft mreps cd-hit trf biopython pyranges pyfaidx r-base r-optparse r-tidyverse bioconductor-plyranges bioconductor-bsgenome \
  --channel conda-forge \
  --channel bioconda \
  --channel defaults \
  --strict-channel-priority

which resulted in the following spec file:
haywardLab_TEstrainer_spec.txt

Hopefully this will solve the issue for you.

@rotifergirl
Author

I tried a few things, the last being to create the conda env from your spec file and clone the repository again. No luck: same numpy error.

I even produced a spec file for my new conda env and compared it line by line with yours using comm, and they are exactly the same, so I'm super stumped as to what might be going on...
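For readers without `comm` to hand, the same spec-file comparison can be sketched in Python. This is only an illustration: the function names `parse_spec` and `diff_specs` are made up here, and it assumes the spec files are plain text with one package entry per line, plus `#` comment lines and an `@EXPLICIT` marker that can be ignored.

```python
import sys
from pathlib import Path


def parse_spec(text):
    """Return the set of package lines in a conda spec file's text.

    Assumption: blank lines, '#' comments and '@' markers carry no
    package information and can be skipped.
    """
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "@")):
            entries.add(line)
    return entries


def diff_specs(text_a, text_b):
    """Return (only_in_a, only_in_b) as sorted lists of package lines."""
    a, b = parse_spec(text_a), parse_spec(text_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_specs.py first_spec.txt second_spec.txt
    only_a, only_b = diff_specs(Path(sys.argv[1]).read_text(),
                                Path(sys.argv[2]).read_text())
    for entry in only_a:
        print(f"< {entry}")
    for entry in only_b:
        print(f"> {entry}")
```

If both lists come back empty, the two environments contain the same packages, which suggests the error comes from a package version they share rather than from a difference between them.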

@jamesdgalbraith
Owner

I've found the issue! After digging through the GitHub repos of pyranges' dependencies, I found that an old version of "sorted_nearest" (v0.0.37) is being used. That version uses a deprecated numpy type. This has been fixed in later versions of sorted_nearest, but unfortunately those aren't available on conda yet, so if you're able to update that package on its own you should be able to get it running. I've raised an issue here: pyranges/sorted_nearest#6 to hopefully have it updated on conda.
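To see which versions of the packages in the traceback are actually installed in an environment, a small check along these lines may help. It is a sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for illustration, and it assumes the distribution names match the package names used in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumption: these are the distribution names as installed by conda/pip.
    for name, ver in installed_versions(["numpy", "pyranges", "sorted_nearest"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run inside the activated conda env, this shows whether sorted_nearest is still at the old version mentioned above.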

@jamesdgalbraith
Owner

To work around the issue in the short term, creating the env with the following command should work:

conda create -n TEstrainer python parallel numpy=1.21.1 pandas blast mafft mreps cd-hit trf biopython pyranges pyfaidx r-base r-optparse r-tidyverse bioconductor-plyranges bioconductor-bsgenome \
  --channel conda-forge \
  --channel bioconda \
  --channel defaults \
  --strict-channel-priority
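The pin works because numpy 1.21.1 still provides the old `np.long` alias that sorted_nearest v0.0.37 relies on. As a rough sketch of that reasoning, the helper below flags NumPy versions expected to lack the alias. The function names are made up for illustration, and the version bounds are assumptions based on my understanding of the NumPy release notes (alias removed in 1.24, a differently defined `np.long` added back in 2.0), so check them against your own install.

```python
def parse_version(text):
    """Turn a plain version string such as '1.21.1' into (1, 21).

    Assumption: the major and minor parts are purely numeric.
    """
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def lacks_long_alias(numpy_version):
    """True if this NumPy release is expected to have no np.long attribute.

    Assumed bounds: removed in 1.24, present again (redefined) from 2.0.
    """
    return (1, 24) <= parse_version(numpy_version) < (2, 0)


if __name__ == "__main__":
    for candidate in ["1.21.1", "1.23.5", "1.24.3"]:
        status = "lacks np.long" if lacks_long_alias(candidate) else "has np.long"
        print(f"numpy {candidate}: {status}")
```

Pinning is a stopgap. Once a newer sorted_nearest is available on conda, the pin should no longer be needed.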
