major zenodo simplification. only one interface implementation needed
matthiasprobst committed Jun 27, 2024
1 parent dcbadf1 commit d09d478
Showing 10 changed files with 1,393 additions and 475 deletions.
619 changes: 247 additions & 372 deletions docs/_static/repo_class_diagram.svg
589 changes: 589 additions & 0 deletions docs/_static/repo_class_diagram.svg.2024_06_27_14_29_45.0.svg
54 changes: 25 additions & 29 deletions docs/userguide/repository/zenodo.ipynb
@@ -7,9 +7,13 @@
"source": [
"# Zenodo\n",
"\n",
"There are two types of Zenodo interfaces. One interfaces to the public repositories (`ZenodoRecord`), the other is for testing and accessed the sandbox server (`ZenodoSandboxDeposit`).\n",
"The [Zenodo](https://zenodo.org/) repository is a concrete implementation of the `RepositoryInterface`. Other repositories such as `Figshare` (https://figshare.com/) could be possible future realizations of it.\n",
"\n",
"The class diagram below shows how they are constructed. First, an abstract zenodo interface class (`AbstractZenodoInterface`) is derived. From this, the concrete interface classes are derived.\n",
"Zenodo provides a sandbox (testing environment) and a production environment. They work the same in principle. Therefore, only one implementation is needed, which is `ZenodoRecord` (the interface to a record in Zenodo). Pass `sandbox=True` to use the testing environment.\n",
"\n",
"The below diagram shows the abstract base class with its abstract methods (indicated by italics). Note, that `upload_file()` is *not* abstract. The subclasses must implement `__upload_file__`, which uploads a file to the repository record. `upload_file()` is basically a wrapper, which additionally allows generating metadata files of the uploaded files. We will explore this feature later in this section.\n",
"\n",
"The `RepositoryInterface` further defines the communication with files. A file object `RepositoryFile` is implemented, providing mandatory properties as well as a download method. A repository implementation (just like the one for Zenodo) must return a Dictionary of `RepositoryFile` objects for the `files` class property (see source code for in-depth explanation and the example at the end of this section).\n",
"\n",
"<img src=\"../../_static/repo_class_diagram.svg\"\n",
" alt=\"../../_static/repo_class_diagram.svg\"\n",
@@ -45,7 +49,7 @@
"source": [
"### 1. Init a Repo:\n",
"\n",
"For testing purpose, let's use the sandbox environment of Zenodo (`ZenodoSandboxDeposit`)"
"As said, we use the testing interface, hence `sandbox=True`:"
]
},
{
@@ -55,7 +59,7 @@
"metadata": {},
"outputs": [],
"source": [
"repo = zenodo.ZenodoSandboxDeposit(None)"
"repo = zenodo.ZenodoRecord(None, sandbox=True)"
]
},
{
@@ -176,18 +180,20 @@
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['tmp0.hdf', 'tmp0.jsonld']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
"ename": "AttributeError",
"evalue": "'str' object has no attribute 'name'",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[7], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m [f\u001b[38;5;241m.\u001b[39mname \u001b[38;5;28;01mfor\u001b[39;00m f \u001b[38;5;129;01min\u001b[39;00m repo\u001b[38;5;241m.\u001b[39mfiles]\n",
"Cell \u001b[1;32mIn[7], line 1\u001b[0m, in \u001b[0;36m<listcomp>\u001b[1;34m(.0)\u001b[0m\n\u001b[1;32m----> 1\u001b[0m [\u001b[43mf\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m \u001b[38;5;28;01mfor\u001b[39;00m f \u001b[38;5;129;01min\u001b[39;00m repo\u001b[38;5;241m.\u001b[39mfiles]\n",
"\u001b[1;31mAttributeError\u001b[0m: 'str' object has no attribute 'name'"
]
}
],
"source": [
"repo.get_filenames()"
"[f.name for f in repo.files]"
]
},
{
@@ -202,7 +208,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "fbf04793-b7eb-4377-a904-edb91542b056",
"metadata": {},
"outputs": [],
@@ -220,7 +226,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"id": "02290359-b13b-413d-9b93-9706b3ab087d",
"metadata": {},
"outputs": [],
@@ -238,23 +244,13 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"id": "4e75f4fc-6311-470c-9204-93c1c5d768d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tmp0.txt\n",
"tmp0.hdf\n",
"tmp0.jsonld\n"
]
}
],
"outputs": [],
"source": [
"for file in repo.files:\n",
" print(file.filename)"
" print(file.name)"
]
},
{
@@ -282,7 +278,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
"version": "3.8.19"
}
},
"nbformat": 4,
Expand Down
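The `AttributeError` recorded in the notebook output above is consistent with `files` now being a dictionary: iterating over a dict yields its keys (plain strings), not the file objects stored as values. A minimal sketch of that behaviour follows. `FakeFile` is a hypothetical stand-in with only a `name` attribute, not the toolbox's `RepositoryFile`, and the assumption that keys are file names is taken from the `get_filenames()` change in this commit.

```python
from typing import Dict


class FakeFile:
    """Hypothetical stand-in for RepositoryFile; it only carries a name."""

    def __init__(self, name: str):
        self.name = name


# Assumption: `files` maps file names to file objects, as in the new interface.
files: Dict[str, FakeFile] = {
    "tmp0.hdf": FakeFile("tmp0.hdf"),
    "tmp0.jsonld": FakeFile("tmp0.jsonld"),
}

# Iterating over the dict itself yields the keys, which are strings,
# so attribute access on the loop variable fails.
error_message = None
try:
    [f.name for f in files]
except AttributeError as err:
    error_message = str(err)

# Iterating over the values (or simply listing the keys) avoids the error.
names_from_values = [f.name for f in files.values()]
names_from_keys = list(files)
```

Under that assumption, notebook cells that loop over `repo.files` directly would probably need `.values()` (or just the keys) to produce the file names again.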
2 changes: 1 addition & 1 deletion h5rdmtoolbox/convention/core.py
@@ -755,7 +755,7 @@ def from_zenodo(doi_or_recid: str,
if not filename.exists() or force_download:
record = zenodo.ZenodoRecord(rec_id)

filenames = record.get_filenames()
filenames = list(record.files.keys())
if name is None:
matches = [file for file in filenames if pathlib.Path(file).suffix == '.yaml']
else:
Expand Down
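The change above takes the file names from the dictionary keys and then filters them by suffix. A small self-contained sketch of that selection step, using a plain dict in place of a real record; `filenames_with_suffix` and the sample file names are hypothetical and only serve as illustration.

```python
import pathlib
from typing import Dict, List


def filenames_with_suffix(files: Dict[str, object], suffix: str) -> List[str]:
    """Return the keys of `files` whose file suffix equals `suffix`."""
    return [name for name in files.keys() if pathlib.Path(name).suffix == suffix]


# Hypothetical record content: file name -> file object (placeholders here).
files = {
    "convention.yaml": object(),
    "README.md": object(),
    "data.hdf": object(),
}

yaml_files = filenames_with_suffix(files, ".yaml")
```

Since Python dictionaries preserve insertion order, the filtered list keeps the order in which the files were stored in the dictionary.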
2 changes: 1 addition & 1 deletion h5rdmtoolbox/convention/standard_names/table.py
@@ -752,7 +752,7 @@ def from_zenodo(source: str = None, doi_or_recid=None) -> "StandardNameTable":
z = zenodo.ZenodoRecord(rec_id)
assert z.exists()

filenames = [file.download(target_folder=UserDir['standard_name_tables']) for file in z.files]
filenames = [file.download(target_folder=UserDir['standard_name_tables']) for file in z.files.values()]
# filenames = z.download_files(target_folder=UserDir['standard_name_tables'])
assert len(filenames) == 1
filename = filenames[0]
48 changes: 30 additions & 18 deletions h5rdmtoolbox/repository/interface.py
@@ -6,6 +6,8 @@

import appdirs

from h5rdmtoolbox.utils import deprecated

logger = logging.getLogger('h5rdmtoolbox')


@@ -37,29 +39,34 @@ def _HDF2JSON(filename: Union[str, pathlib.Path], **kwargs) -> pathlib.Path:
return hdf2jsonld(filename=filename, skipND=1)


class RepositoryFile(abc.ABC):
class RepositoryFile:
"""The interface class to files in a repository"""

def __init__(self, identifier,
def __init__(self,
identifier,
identifier_url,
download_url,
access_url,
checksum,
filename,
name,
size,
media_type,
access_token=None,
**kwargs):
self.download_url = download_url
self.access_url = access_url
self.checksum = checksum
self.filename = filename
self.name = name
self.media_type = media_type
self.size = size
self.identifier = identifier
self.identifier_url = identifier_url
self.access_token = access_token
self.additional_data = kwargs

def __repr__(self):
return f"{self.__class__.__name__}({self.name})"

def info(self) -> Dict:
return dict(identifier=self.identifier,
identifier_url=self.identifier_url,
@@ -132,30 +139,31 @@ def set_metadata(self, metadata):

@abc.abstractmethod
def download_file(self, filename):
"""Download a specific file from the repository."""
"""Download a specific file from the repository.
.. note:: This method is deprecated. Use `.files.get(filename).download()` instead.
"""

@abc.abstractmethod
def download_files(self):
"""Download all files from the repository."""
"""Download all files from the repository.
.. note:: This method is deprecated. Iterate over `files.values()` and call `.download()` on each item instead.
"""

@deprecated(version='1.4.0rc1',
msg='Please use `list(self.files.keys())` instead')
def get_filenames(self) -> List[str]:
"""Get a list of all filenames."""
return [file.filename for file in self.files]
return list(self.files.keys())

@property
@abc.abstractmethod
def files(self) -> List[RepositoryFile]:
def files(self) -> Dict[str, RepositoryFile]:
"""Dictionary of all files in the repository, keyed by file name."""

def file(self, filename: str) -> RepositoryFile:
"""Return the file matching the filename, e.g. file.pdf"""
for file in self.files:
if file.filename == filename:
return file
raise FileNotFoundError(f'The file "{filename}" does not exist in the repository.')

@abc.abstractmethod
def _upload_file(self, filename: Union[str, pathlib.Path], overwrite: bool = False):
def __upload_file__(self, filename: Union[str, pathlib.Path], overwrite: bool = False):
"""Upload a file to the repository. This is a regular file uploader, hence the
file can be of any type. This is a private method, which needs to be implemented
by every repository interface. Will be called by `upload_file`"""
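This hunk drops the `file()` helper, which raised `FileNotFoundError` for unknown names, in favour of plain dictionary access on `files`. One behavioural difference is worth keeping in mind: `dict.get` returns `None` for a missing key, so chaining `.download()` onto it would fail with an `AttributeError` instead. The sketch below shows one way to keep the old error behaviour; `find_file` is a hypothetical helper and not part of the toolbox.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def find_file(files: Dict[str, T], filename: str) -> T:
    """Look up `filename` in `files`; raise FileNotFoundError if it is missing."""
    try:
        return files[filename]
    except KeyError:
        raise FileNotFoundError(
            f'The file "{filename}" does not exist in the repository.'
        ) from None


# Hypothetical record content: file name -> file object (a placeholder here).
files = {"file.pdf": object()}

found = find_file(files, "file.pdf")

# dict.get() stays silent for unknown names and returns None ...
missing_via_get = files.get("other.pdf")

# ... whereas the helper reports the problem explicitly.
raised = False
try:
    find_file(files, "other.pdf")
except FileNotFoundError:
    raised = True
```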
@@ -206,12 +214,16 @@ def upload_file(self,
else:
meta_data_file = None

self._upload_file(filename=filename, overwrite=overwrite)
self.__upload_file__(filename=filename, overwrite=overwrite)

if meta_data_file is not None:
self._upload_file(filename=meta_data_file, overwrite=overwrite)
self.__upload_file__(filename=meta_data_file, overwrite=overwrite)
self.refresh()

@deprecated(version='1.4.0rc1',
msg='This method is deprecated. '
'Use `.upload_file(...)` instead and provide the '
'metamapper parameter there')
def upload_hdf_file(self,
filename,
metamapper: Callable[[Union[str, pathlib.Path]], pathlib.Path],
Expand Down
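The relationship between the public `upload_file()` wrapper and the abstract `__upload_file__()` hook in this file can be sketched as a template-method pattern. The classes below are hypothetical, heavily reduced stand-ins (`MiniRepository`, `InMemoryRepository`, `fake_metamapper`); they omit `refresh()`, the skip-ND handling and everything Zenodo-specific, and only illustrate the call order under those assumptions.

```python
import abc
import pathlib
from typing import Callable, List, Optional, Union

PathLike = Union[str, pathlib.Path]


class MiniRepository(abc.ABC):
    """Hypothetical, reduced stand-in for RepositoryInterface."""

    @abc.abstractmethod
    def __upload_file__(self, filename: PathLike, overwrite: bool = False):
        """Backend-specific upload; every subclass has to implement this."""

    def upload_file(self,
                    filename: PathLike,
                    metamapper: Optional[Callable[[PathLike], pathlib.Path]] = None,
                    overwrite: bool = False):
        """Upload `filename` and, if a metamapper is given, its metadata file."""
        meta_data_file = metamapper(filename) if metamapper is not None else None
        self.__upload_file__(filename=filename, overwrite=overwrite)
        if meta_data_file is not None:
            self.__upload_file__(filename=meta_data_file, overwrite=overwrite)


class InMemoryRepository(MiniRepository):
    """Toy backend that only records the names of uploaded files."""

    def __init__(self):
        self.uploaded: List[str] = []

    def __upload_file__(self, filename: PathLike, overwrite: bool = False):
        name = pathlib.Path(filename).name
        if name in self.uploaded:
            if not overwrite:
                raise FileExistsError(name)
            return  # already recorded; nothing to do in this toy backend
        self.uploaded.append(name)


def fake_metamapper(filename: PathLike) -> pathlib.Path:
    """Hypothetical metamapper: pretend a JSON-LD file sits next to the input."""
    return pathlib.Path(filename).with_suffix(".jsonld")


repo = InMemoryRepository()
repo.upload_file("tmp0.hdf", metamapper=fake_metamapper)
```

A note on the naming: identifiers that both start and end with two underscores are not subject to Python's private name mangling, so a subclass can override `__upload_file__` under the same name, which a name like `__upload_file` would not allow as directly.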