Enable pickling of CloudPath (#224)
* pickling

* Add to changelog
pjbull authored May 16, 2022
1 parent 403ce19 commit 85268c8
Showing 4 changed files with 100 additions and 0 deletions.
4 changes: 4 additions & 0 deletions HISTORY.md
@@ -1,5 +1,9 @@
# cloudpathlib Changelog

## v0.7.2 (UNRELEASED)

- Fixed pickling of `CloudPath` objects not working. ([Issue #223](https://github.com/drivendataorg/cloudpathlib/issues/223), [PR #224](https://github.com/drivendataorg/cloudpathlib/pull/224))

## v0.7.1 (2022-04-06)

- Fixed inadvertent inclusion of tests module in package. ([Issue #173](https://github.com/drivendataorg/cloudpathlib/issues/173), [PR #219](https://github.com/drivendataorg/cloudpathlib/pull/219))
13 changes: 13 additions & 0 deletions cloudpathlib/cloudpath.py
@@ -184,6 +184,19 @@ def __del__(self):
        if self._handle is not None:
            self._handle.close()

    def __getstate__(self):
        state = self.__dict__.copy()

        # don't pickle client
        del state["client"]

        return state

    def __setstate__(self, state):
        client = self._cloud_meta.client_class.get_default_client()
        state["client"] = client
        return self.__dict__.update(state)

    @property
    def _no_prefix(self) -> str:
        return self._str[len(self.cloud_prefix) :]
64 changes: 64 additions & 0 deletions docs/docs/authentication.md
@@ -82,3 +82,67 @@ cp2 = CloudPath("s3://cloudpathlib-test-bucket/", client=client)
client.set_as_default_client()
cp3 = CloudPath("s3://cloudpathlib-test-bucket/")
```

## Pickling `CloudPath` objects

You can pickle and unpickle `CloudPath` objects normally, for example:

```python
from pathlib import Path
import pickle

from cloudpathlib import CloudPath


with Path("cloud_path.pkl").open("wb") as f:
    pickle.dump(CloudPath("s3://my-awesome-bucket/cool-file.txt"), f)

with Path("cloud_path.pkl").open("rb") as f:
    pickled = pickle.load(f)

assert pickled.bucket == "my-awesome-bucket"
```

The associated `client`, however, is not pickled. When a `CloudPath` is
unpickled, the client on the unpickled object will be set to the default
client for that class.

For example, this **will not work**:

```python
from pathlib import Path
import pickle

from cloudpathlib import S3Client, CloudPath


# create a custom client pointing to the endpoint
client = S3Client(endpoint_url="http://my.s3.server:1234")

# use that client when creating a cloud path
p = CloudPath("s3://cloudpathlib-test-bucket/cool_file.txt", client=client)
p.write_text("hello!")

with Path("cloud_path.pkl").open("wb") as f:
    pickle.dump(p, f)

with Path("cloud_path.pkl").open("rb") as f:
    pickled = pickle.load(f)

# exists() returns False because the unpickled path uses the default `S3Client`
assert not pickled.exists()
```

To get this to work, you need to set the custom `client` as the default
client before unpickling:

```python
# set the custom client as the default before unpickling
client.set_as_default_client()

with Path("cloud_path.pkl").open("rb") as f:
    pickled2 = pickle.load(f)

assert pickled2.exists()
assert pickled2.client == client
```
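
The mechanism behind this behavior can be illustrated without any cloud access. The following is a minimal sketch, not the library's implementation: `FakeClient` and `FakePath` are hypothetical stand-ins invented for this example. They mimic the pattern in the diff above, where `__getstate__` drops the client and `__setstate__` attaches whatever the default client is at unpickling time.

```python
import pickle


class FakeClient:
    """Hypothetical stand-in for a cloud client (not a cloudpathlib class)."""

    _default_client = None

    def __init__(self, endpoint="default"):
        self.endpoint = endpoint

    @classmethod
    def get_default_client(cls):
        # Lazily create a default client the first time one is requested
        if cls._default_client is None:
            cls._default_client = cls()
        return cls._default_client

    def set_as_default_client(self):
        type(self)._default_client = self


class FakePath:
    """Hypothetical stand-in for a path object that holds a client."""

    def __init__(self, url, client=None):
        self.url = url
        self.client = client if client is not None else FakeClient.get_default_client()

    def __getstate__(self):
        # Copy the instance state and leave the client out of the pickle
        state = self.__dict__.copy()
        del state["client"]
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then attach the current default client
        self.__dict__.update(state)
        self.client = FakeClient.get_default_client()


custom = FakeClient(endpoint="http://my.s3.server:1234")
original = FakePath("s3://my-awesome-bucket/cool-file.txt", client=custom)

# Round trip while the default client is still the plain default
restored = pickle.loads(pickle.dumps(original))

# Round trip again after promoting the custom client to be the default
custom.set_as_default_client()
restored_again = pickle.loads(pickle.dumps(original))
```

In this sketch `restored` ends up with the plain default client, while `restored_again` gets the custom one, because the client is looked up when the object is unpickled rather than stored in the pickle.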
19 changes: 19 additions & 0 deletions tests/test_cloudpath_file_io.py
@@ -1,6 +1,7 @@
from datetime import datetime
import os
from pathlib import PurePosixPath
import pickle
from shutil import rmtree
from time import sleep

@@ -309,3 +310,21 @@ def test_os_open(rig):
    p = rig.create_cloud_path("dir_0/file0_0.txt")
    with open(p, "r") as f:
        assert f.readable()


def test_pickle(rig, tmpdir):
    p = rig.create_cloud_path("dir_0/file0_0.txt")

    with (tmpdir / "test.pkl").open("wb") as f:
        pickle.dump(p, f)

    with (tmpdir / "test.pkl").open("rb") as f:
        pickled = pickle.load(f)

    # test a call to the network
    assert pickled.exists()

    # check we unpickled, and that client is the default client
    assert str(pickled) == str(p)
    assert pickled.client == p.client
    assert rig.client_class._default_client == pickled.client
