Decompress util #244
Thanks for this PR! :)
I've requested some changes and asked a few questions, and we can discuss them. =)
rows/utils.py
Outdated
lzma_mime_types = (
    'application/x-xz',
    'application/x-lzma'
)
I think mimetype detection is not needed here: we can expect that the user will call this function only if she knows the file is compressed with one of the supported algorithms. We can do this detection automatically in the command-line interface using file-magic and then pass the correct arguments to `decompress`. Do you agree?
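A minimal sketch of the split proposed above, where detection lives on the command-line side and `decompress` only receives the result. To stay self-contained it checks magic bytes with the standard library instead of calling file-magic; `detect_algorithm` and the strings it returns are hypothetical names for illustration, not part of rows.

```python
import bz2
import gzip
import lzma

# Leading bytes that identify each container format.
MAGIC_NUMBERS = (
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
    (b"\xfd7zXZ\x00", "xz"),
)


def detect_algorithm(data):
    """Return 'gz', 'bz2' or 'xz' for the given leading bytes, else None."""
    for magic, algorithm in MAGIC_NUMBERS:
        if data.startswith(magic):
            return algorithm
    return None


# The CLI would read the first bytes of the file and pass the result on.
for compress in (gzip.compress, bz2.compress, lzma.compress):
    print(detect_algorithm(compress(b"Ahoy")))
```

Legacy `.lzma` files have no reliable signature, so a real implementation would still need file-magic or the file extension for that case.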
Totally : )
rows/utils.py
Outdated
@@ -297,3 +310,53 @@ def export_to_uri(table, uri, *args, **kwargs):
        raise ValueError('Plugin (export) "{}" not found'.format(plugin_name))

    return export_function(table, uri, *args, **kwargs)


def decompress(path, **kwargs):
Could you please add an `algorithm` parameter to this function? It should default to `None` (if `None`, use the file extension to determine it).
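One way the requested signature could look, as a sketch only: `decompress_sketch`, the `OPENERS` mapping, and the `ValueError` are assumptions made for illustration, not the code that ended up in the PR.

```python
import bz2
import gzip
import lzma
import os

# Maps an algorithm name (which doubles as the file extension) to its opener.
OPENERS = {
    "bz2": bz2.open,
    "gz": gzip.open,
    "lzma": lzma.open,
    "xz": lzma.open,
}


def decompress_sketch(path, algorithm=None):
    """Open `path` for binary reading.

    If `algorithm` is None, it is guessed from the file extension.
    """
    if algorithm is None:
        algorithm = os.path.splitext(path)[1].lstrip(".").lower()
    try:
        opener = OPENERS[algorithm]
    except KeyError:
        raise ValueError("Unsupported algorithm: {!r}".format(algorithm))
    return opener(path, mode="rb")
```

Calling `decompress_sketch("data.bin", algorithm="bz2")` skips the extension lookup, which covers files whose names do not reveal how they were compressed.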
rows/utils.py
Outdated
    msg = "Couldn't identify file mimetype, or lzma module isn't available"
    raise RuntimeError(msg)

with open_compressed(path, **kwargs) as handler:
Do you think having `kwargs` on `decompress` is really needed? Could you give me an example use case?
Basically `encoding`. On UNIX, for example, I barely use it, but I was told that Windows users should always pass `utf-8`.
tests/tests_utils.py
Outdated
def setUp(self):
    self.contents = six.b('Ahoy')
    self.temp = tempfile.TemporaryDirectory()
All the tests are failing here. I've replaced `self.tmp` with `self.temp` (I was receiving a `NameError`), but they still fail:
======================================================================
ERROR: test_decompress_with_bz2 (tests.tests_utils.UtilsDecompressTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/turicas/projects/rows/tests/tests_utils.py", line 99, in test_decompress_with_bz2
decompressed = rows.utils.decompress(compressed)
File "/home/turicas/projects/rows/rows/utils.py", line 359, in decompress
raise RuntimeError(msg)
RuntimeError: Couldn't identify file mimetype, or lzma module isn't available
======================================================================
ERROR: test_decompress_with_gz (tests.tests_utils.UtilsDecompressTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/turicas/projects/rows/tests/tests_utils.py", line 107, in test_decompress_with_gz
self.assertEqual(self.contents, decompressed.read())
File "/home/turicas/software/pyenv/versions/3.6.2/lib/python3.6/gzip.py", line 272, in read
self._check_not_closed()
File "/home/turicas/software/pyenv/versions/3.6.2/lib/python3.6/_compression.py", line 14, in _check_not_closed
raise ValueError("I/O operation on closed file")
ValueError: I/O operation on closed file
======================================================================
ERROR: test_decompress_with_incompatible_file (tests.tests_utils.UtilsDecompressTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/turicas/projects/rows/tests/tests_utils.py", line 126, in test_decompress_with_incompatible_file
with self.assertRaises():
TypeError: assertRaises() missing 1 required positional argument: 'expected_exception'
======================================================================
ERROR: test_decompress_with_lzma (tests.tests_utils.UtilsDecompressTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/turicas/projects/rows/tests/tests_utils.py", line 112, in test_decompress_with_lzma
with lzma.open(compressed) as compressed_handler:
File "/home/turicas/software/pyenv/versions/3.6.2/lib/python3.6/lzma.py", line 302, in open
preset=preset, filters=filters)
File "/home/turicas/software/pyenv/versions/3.6.2/lib/python3.6/lzma.py", line 120, in __init__
self._fp = builtins.open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpr5s7bse8/test.lzma'
======================================================================
ERROR: test_decompress_with_xz (tests.tests_utils.UtilsDecompressTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/turicas/projects/rows/tests/tests_utils.py", line 120, in test_decompress_with_xz
with lzma.open(compressed) as compressed_handler:
File "/home/turicas/software/pyenv/versions/3.6.2/lib/python3.6/lzma.py", line 302, in open
preset=preset, filters=filters)
File "/home/turicas/software/pyenv/versions/3.6.2/lib/python3.6/lzma.py", line 120, in __init__
self._fp = builtins.open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpz5l7xm1q/test.gz'
----------------------------------------------------------------------
Ran 184 tests in 0.848s
Are the tests passing on your machine?
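The `TypeError` in `test_decompress_with_incompatible_file` comes from calling `assertRaises()` with no arguments. A minimal sketch of the corrected pattern follows; `decompress_stub` is a hypothetical stand-in for the error path, and `RuntimeError` is assumed to be the expected exception because it is the one raised in the tracebacks above.

```python
import unittest


def decompress_stub(path):
    """Hypothetical stand-in that always fails like the PR's error path."""
    raise RuntimeError("Couldn't identify file mimetype")


class AssertRaisesExample(unittest.TestCase):

    def test_incompatible_file(self):
        # assertRaises needs the expected exception class as its first argument.
        with self.assertRaises(RuntimeError):
            decompress_stub("notes.txt")
```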
rows/utils.py
Outdated
    kwargs are passed to either `bz2.open`, `gzip.open` or `lzma.open`.
    :param path: (str) path to a bz2, gzip or lzma file
    """
    filename = os.path.basename(path)
As the current API accepts filenames or file objects, this function should too (this decision was inspired by Python stdlib modules, such as `csv`). You can get some help using `rows.plugins.utils.get_filename_and_fobj`.
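To illustrate the "filename or file object" convention, here is a sketch using only the standard library. `get_path_and_fobj` and `gunzip` are hypothetical helpers written for this example; they are not the real `rows.plugins.utils.get_filename_and_fobj`, whose signature is not shown in this thread.

```python
import gzip
import io


def get_path_and_fobj(path_or_fobj, mode="rb"):
    """Hypothetical helper: return (path, file object) for either input kind."""
    if hasattr(path_or_fobj, "read"):
        # Already a file-like object; it may not have a name.
        return getattr(path_or_fobj, "name", None), path_or_fobj
    return path_or_fobj, open(path_or_fobj, mode=mode)


def gunzip(path_or_fobj):
    """Return a binary reader for gzip data given a path or an open file object."""
    _, fobj = get_path_and_fobj(path_or_fobj)
    # GzipFile does not close `fobj` for us; the caller remains responsible.
    return gzip.GzipFile(fileobj=fobj, mode="rb")


# Works with an in-memory file object as well as with a path on disk.
buffer = io.BytesIO(gzip.compress(b"Ahoy"))
print(gunzip(buffer).read())
```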
That was indeed pretty helpful, thanks ; )
rows/utils.py
Outdated
    raise RuntimeError(msg)

with open_compressed(path, **kwargs) as handler:
    return handler
It's very important to ensure the file object returned is open in binary mode (so the plugins will decode the data using the desired encoding). Could you please add a test for this case?
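A sketch of what such a test could check, using `gzip` only. It also shows why returning the handler from inside a `with` block leads to the "I/O operation on closed file" error seen in the test run. Both function names are hypothetical.

```python
import gzip
import os
import tempfile


def open_inside_with(path):
    # Returning from inside the `with` block still runs __exit__,
    # so the caller receives an already-closed file object.
    with gzip.open(path, mode="rb") as handler:
        return handler


def open_binary(path):
    # Return the handler without a `with` block; the caller closes it.
    return gzip.open(path, mode="rb")


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "test.gz")
    with gzip.open(path, mode="wb") as fobj:
        fobj.write(b"Ahoy")

    closed_handler = open_inside_with(path)
    print(closed_handler.closed)

    with open_binary(path) as fobj:
        data = fobj.read()
    # Binary mode yields bytes, leaving decoding to the plugins.
    print(type(data))
```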
Ok, sorry I took too long to get back to this PR. Life's got hectic around here. Anyway, I addressed many issues in this refactor:
About tests: I've just run
Do you prefer to implement
fa144f0 to bbb2c57
This PR implements `rows.utils.decompress` as suggested in #230. The API is:

A `RuntimeError` is raised if:
- `rows` can't properly guess the mimetype as a known mimetype of an lzma or gzip file
- `rows` identifies an lzma mimetype but the `lzma` module is not available (an lzma lib has to be available in the OS when compiling Python itself, as explained here)