I found that some GEO files contain carriage return characters in the metadata, which causes a GEOparse.GEOTypes.DataIncompatibilityException. To reproduce the error, load the "GPL10740" dataset as follows:
gpl = GEOparse.get_GEO(geo="GPL10740", silent=True, include_data=True, destdir=".")
(<class 'GEOparse.GEOTypes.DataIncompatibilityException'>, DataIncompatibilityException('\nData columns do not match columns description index in GSM1530106\nColumns in table are: )\nIndex in columns are: ID_REF, VALUE, DETECTION P-VALUE\n',), <traceback object at 0x7f1fee64be48>)
The columns variable taken from GEOparse.parse_columns(soft) is: Index(['ID_REF', 'VALUE', 'DETECTION P-VALUE'], dtype='object')
table_data.columns variable taken from GEOparse.parse_table_data(soft) is: Index([')'], dtype='object')
This is due to the line containing a carriage return:
!Sample_relation = Alternative to: GSM1530054 (gene-level analysis^M)
!Sample_series_id = GSE62617
!Sample_series_id = GSE70707
#ID_REF =
#VALUE = RMA normalized signal intensity
#DETECTION P-VALUE =
!sample_table_begin
ID_REF VALUE DETECTION P-VALUE
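The line-splitting effect can be reproduced without GEOparse. The following is a minimal sketch, assuming the file is read in Python 3 text mode with the default `newline=None` (universal newlines), where a lone `\r` counts as a line ending. The byte string is a hypothetical miniature of the excerpt above, and `lines_seen` is a helper name introduced here for illustration only.

```python
import os
import tempfile

# Hypothetical miniature of the SOFT excerpt; "\r" stands for the stray ^M.
SOFT_SNIPPET = (
    b"!Sample_relation = Alternative to: GSM1530054 (gene-level analysis\r)\n"
    b"!Sample_series_id = GSE62617\n"
)


def lines_seen(raw, newline):
    """Write `raw` to a temporary file and return the lines text mode yields."""
    fd, path = tempfile.mkstemp(suffix=".soft")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(raw)
        with open(path, "r", encoding="utf-8", newline=newline) as fh:
            return list(fh)
    finally:
        os.remove(path)


# Default (universal newlines): the lone "\r" ends a line, so ")" is split off.
default_lines = lines_seen(SOFT_SNIPPET, newline=None)

# newline="\n": only "\n" ends a line, so the metadata line stays in one piece.
strict_lines = lines_seen(SOFT_SNIPPET, newline="\n")

if __name__ == "__main__":
    print(len(default_lines), repr(default_lines[1]))
    print(len(strict_lines), repr(strict_lines[0]))
```

With the default setting the closing parenthesis becomes a line of its own, which is consistent with the stray `)` reported in table_data.columns.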
I suggest a small modification to the GEOparse.utils.smart_open() function so that it can handle such datasets:
@contextmanager
def smart_open(filepath, **open_kwargs):
    """Open file intelligently depending on the source and python version.
    Args:
        filepath (:obj:`str`): Path to the file.
    Yields:
        Context manager for file handle.
    """
    if "errors" not in open_kwargs:
        open_kwargs["errors"] = "ignore"
    if filepath[-2:] == "gz":
        open_kwargs["mode"] = "rt"
        fopen = gzip.open
    else:
        open_kwargs["mode"] = "r"
        fopen = open
    open_kwargs["newline"] = "\n"
    # I do not know why here is an "if" statement because this always calls fopen with the same parameters.
    if sys.version_info[0] < 3:
        fh = fopen(filepath, **open_kwargs)
    else:
        fh = fopen(filepath, **open_kwargs)
    try:
        yield fh
    except IOError:
        fh.close()
    finally:
        fh.close()
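For comparison, here is a leaner Python 3 only sketch of the same idea. It is not GEOparse's actual implementation: the name `smart_open_sketch` is hypothetical, the redundant version check is dropped, and the `except IOError` branch is omitted because, inside a `contextmanager` generator, catching the exception without re-raising would suppress an IOError raised in the caller's `with` body.

```python
import gzip
from contextlib import contextmanager


@contextmanager
def smart_open_sketch(filepath, **open_kwargs):
    """Open a plain or gzipped text file, ending lines only on line feeds.

    Hypothetical Python 3 only variant of the proposal above.
    """
    open_kwargs.setdefault("errors", "ignore")
    # Treat only "\n" as a line terminator so a stray "\r" stays in its line.
    open_kwargs["newline"] = "\n"
    if filepath.endswith("gz"):
        open_kwargs["mode"] = "rt"
        fopen = gzip.open
    else:
        open_kwargs["mode"] = "r"
        fopen = open
    fh = fopen(filepath, **open_kwargs)
    try:
        yield fh
    finally:
        # Runs on normal exit and on errors; exceptions still propagate.
        fh.close()
```

Used as `with smart_open_sketch(path) as fh: ...`, a metadata line containing a stray `\r` is yielded as one line, for both plain and gzipped files.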