DtypeWarning: Columns (7) have mixed types. #70

Open
CholoTook opened this issue May 17, 2021 · 4 comments

@CholoTook

The following code is generating a warning for me:

import GEOparse
gpl = GEOparse.get_GEO('GPL17481')

The output is:

>>> import GEOparse
>>> gpl = GEOparse.get_GEO('GPL17481')
17-May-2021 13:32:21 DEBUG utils - Directory ./ already exists. Skipping.
17-May-2021 13:32:21 INFO GEOparse - Downloading http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL17481&form=text&view=full to ./GPL17481.txt
17-May-2021 13:32:23 DEBUG downloader - Total size: 0
17-May-2021 13:32:23 DEBUG downloader - md5: None
1.72MB [00:00,1.63MB/s]
10.3MB [00:01, 7.26MB/s]
17-May-2021 13:32:24 DEBUG downloader - Moving /tmp/tmp2lblbvso to /home/dbolser/Geromics/Dogome/Geromics/GPL17481.txt
17-May-2021 13:32:24 DEBUG downloader - Successfully downloaded http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL17481&form=text&view=full
17-May-2021 13:32:24 INFO GEOparse - Parsing ./GPL17481.txt: 
17-May-2021 13:32:24 DEBUG GEOparse - PLATFORM: GPL17481
/usr/bin/bpython3:1: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False.
  #!/usr/bin/python3
>>> 

I get that this warning is coming from pandas, but I'm not sure how to fix it.

@guma44
Owner

guma44 commented May 20, 2021

Hi, let me look at it. There is probably something strange in the GPL file. Maybe editing the file would do the trick. Assuming this is a one-off, that could be a good solution. Anyway, taking a look at the GPL file would shed some light on the real cause.

@CholoTook
Author

Could it be that the chromosome column starts out as an int, and then becomes a str?

!platform_table_begin
ID      CHROMOSOME      Position        SNP     Plus/Minus Strand       CanineHD_A.bpm.Address  SPOT_ID SNP_ID
BICF2G630100019 25      34549096        [A/G]   BOT     25732300        BICF2G630100019 
BICF2G630100032 25      34560607        [A/G]   BOT     18759386        BICF2G630100032 
BICF2G630100034 25      34561954        [A/G]   BOT     13789354        BICF2G630100034 
BICF2G630100043 25      34587072        [A/G]   BOT     32780356        BICF2G630100043 
BICF2G630100054 25      34604596        [T/C]   BOT     21757302        BICF2G630100054 
BICF2G630100063 25      34615165        [A/G]   BOT     51809461        BICF2G630100063 
BICF2G630100075 25      34638645        [A/C]   BOT     55806509        BICF2G630100075 
BICF2G63010009  X       95382735        [T/C]   BOT     41613463        BICF2G63010009  
BICF2G630100090 25      34688200        [T/C]   BOT     51730475        BICF2G630100090 
BICF2G630100094 25      34689509        [A/T]   BOT     53724487        BICF2G630100094 
BICF2G63010010  X       95373856        [A/G]   BOT     49675468        BICF2G63010010  

Pandas may guess that it's an int and then get confused... As I said, I'm not super familiar with pandas, but I suppose there is a way to let it know the datatype of each column. However, I don't know how GEOparse invokes Pandas.
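For reference, the warning text itself names two `pandas.read_csv` options: `dtype` and `low_memory=False`. Below is a minimal sketch of both on a tiny in-memory table, assuming you call pandas directly. The three rows are abridged from the excerpt above, and how GEOparse itself invokes pandas is not shown here.

```python
import io

import pandas as pd

# A tiny tab-separated stand-in for the platform table
# (three rows and three columns abridged from the excerpt above).
tsv = (
    "ID\tCHROMOSOME\tPosition\n"
    "BICF2G630100019\t25\t34549096\n"
    "BICF2G63010009\tX\t95382735\n"
    "BICF2G630100090\t25\t34688200\n"
)

# Option 1: declare the column type up front so pandas never has to guess.
table = pd.read_csv(io.StringIO(tsv), sep="\t", dtype={"CHROMOSOME": str})

# Option 2: let pandas infer each type from the whole column at once
# instead of chunk by chunk (uses more memory on large files).
table_full = pd.read_csv(io.StringIO(tsv), sep="\t", low_memory=False)

print(table["CHROMOSOME"].tolist())
print(table_full["CHROMOSOME"].tolist())
```

With either option the chromosome values should all come back as strings, so "25" and "X" live in the same column without a type clash.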

@guma44
Owner

guma44 commented May 20, 2021

Indeed, this seems to be the problem. Currently, the package does not allow passing kwargs to pandas. However, if the code is in some script and the mixed types influence its behaviour, you could convert the type after the data is read.
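A rough sketch of converting after the read, assuming the parsed platform table is available as a pandas DataFrame (in GEOparse this should be `gpl.table`, but treat that as an assumption). The warning identifies the column by position rather than by name, so the hypothetical helper `mixed_type_columns` first checks which columns actually hold more than one Python type. The DataFrame here is a hand-built stand-in, not real GEO data.

```python
import pandas as pd

# Stand-in for a table that was parsed in chunks:
# the CHROMOSOME column ends up holding both int and str values.
table = pd.DataFrame(
    {
        "ID": ["BICF2G630100019", "BICF2G63010009", "BICF2G630100090"],
        "CHROMOSOME": [25, "X", 25],
    }
)


def mixed_type_columns(df):
    """Return the names of columns whose values have more than one Python type."""
    return [col for col in df.columns if df[col].map(type).nunique() > 1]


mixed = mixed_type_columns(table)
print(mixed)

# Normalise every mixed column to strings.
for col in mixed:
    table[col] = table[col].astype(str)
```

Note that missing values (NaN) count as floats in this check, so a string column with gaps would also be reported as mixed, and `astype(str)` would turn those gaps into the literal string "nan".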

@CholoTook
Author

Seems not to cause any problem TBH. It's just a bit of a weird-looking warning.

You could probably get away with the low_memory=False flag by default?

Thanks for help,
Dan.
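If the warning is harmless for a given script, one workaround that needs no change to GEOparse is to silence it around the call with the standard `warnings` module. This is only a sketch: `call_without_dtype_warning` is a hypothetical helper name, and it hides the message without changing how the table is parsed.

```python
import warnings

from pandas.errors import DtypeWarning


def call_without_dtype_warning(func, *args, **kwargs):
    """Call func(*args, **kwargs) with pandas' DtypeWarning silenced."""
    with warnings.catch_warnings():
        # The filter only applies inside this block; it is restored on exit.
        warnings.simplefilter("ignore", category=DtypeWarning)
        return func(*args, **kwargs)


# Usage (needs network access, so it is left commented out):
# import GEOparse
# gpl = call_without_dtype_warning(GEOparse.get_GEO, "GPL17481")
```

Other warnings still get through, since the filter is limited to `DtypeWarning`.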
