SneakerNet Required Changes #47

Merged
36 commits merged on Mar 5, 2024
e48ac1f
Minor movement of configs in librarian server start
JBorrow Feb 26, 2024
03a05cd
Add fractional store filling
JBorrow Feb 26, 2024
fccda38
Added ability to clone to multiple stores and failover automagically
JBorrow Feb 26, 2024
509a666
Added generation of store manifests to python api
JBorrow Feb 27, 2024
d6dcd1c
Add 'enabled' flag on stores.
JBorrow Feb 28, 2024
9f1351e
Add client ops for store state setting and tests
JBorrow Feb 28, 2024
84edbb3
Add ability to kill off a store once we have generated a manifest.
JBorrow Feb 28, 2024
2993cc4
Add ability to teardown a store during manifest creation
JBorrow Feb 29, 2024
4fc262d
Add new featuers to client interface
JBorrow Feb 29, 2024
f15e8c5
Add remote instance creation in complete endpoint.
JBorrow Feb 29, 2024
5c6374b
Add file and outgoing transfer link
JBorrow Feb 29, 2024
cd98870
Cleanup for remote transfers
JBorrow Feb 29, 2024
9ed2a55
Base sneakernet transfer working.
JBorrow Feb 29, 2024
aa8287c
Store remaining space should be an int, not a float.
JBorrow Feb 29, 2024
ac81688
Minor auth concerns and completing fake transfers in tests
JBorrow Mar 1, 2024
fb1c4ec
Perform assertion tests in sneaker test
JBorrow Mar 1, 2024
fbdc654
Remove completed TODO
JBorrow Mar 1, 2024
4109d61
Add slack webhook integration
JBorrow Mar 1, 2024
b7d78c9
Actually read librarian server name intests
JBorrow Mar 1, 2024
e9f239b
Add encryption of passwords for remote librarians
JBorrow Mar 1, 2024
410afb5
Add basic docs for sneaker
JBorrow Mar 1, 2024
9e9c8cd
More complex instance querying
JBorrow Mar 4, 2024
772e9eb
A little more doc can't hurt
JBorrow Mar 4, 2024
3c7e7dd
Added CLI for store interaction
JBorrow Mar 4, 2024
4f53d51
Update docs to match new cli
JBorrow Mar 4, 2024
cfa191a
Minor style change for cli
JBorrow Mar 4, 2024
5f90e32
Correct subsection syntax
JBorrow Mar 4, 2024
74cb484
Add soft timeout and max process to recv
JBorrow Mar 4, 2024
b164a65
Added new manifest generation cli option
JBorrow Mar 5, 2024
fb58717
Try multi-line string
JBorrow Mar 5, 2024
595a1c1
Add ingestion pre-commit
JBorrow Mar 5, 2024
5a7514f
Add final docs
JBorrow Mar 5, 2024
a8e37db
Remove syntax highlighting, it doesnt work
JBorrow Mar 5, 2024
e7ccbc4
Add progress bar note
JBorrow Mar 5, 2024
1ae9477
Safety note
JBorrow Mar 5, 2024
80f393e
Note, not topic
JBorrow Mar 5, 2024
61 changes: 61 additions & 0 deletions alembic/versions/d8934c52bac5_enable_sneakernet_transfers.py
@@ -0,0 +1,61 @@
# Copyright 2017 the HERA Collaboration
# Licensed under the 2-clause BSD License.

"""enable sneakernet transfers

Revision ID: d8934c52bac5
Revises: 71df5b41ae41
Create Date: 2024-02-28 15:48:44.705721

"""
import sqlalchemy as sa
from sqlalchemy.orm.session import Session

from alembic import op
from librarian_server.orm import File, OutgoingTransfer, StoreMetadata

revision = "d8934c52bac5"
down_revision = "71df5b41ae41"
branch_labels = None
depends_on = None


def upgrade():
    with op.batch_alter_table("store_metadata") as batch_op:
        batch_op.add_column(sa.Column("enabled", sa.Boolean, default=True))

    with op.batch_alter_table("incoming_transfers") as batch_op:
        batch_op.add_column(sa.Column("source_transfer_id", sa.Integer))

    with op.batch_alter_table("outgoing_transfers") as batch_op:
        batch_op.add_column(sa.Column("file_name", sa.String(256)))

        batch_op.create_foreign_key(
            "fk_outgoing_transfers_file_name_files", "files", ["file_name"], ["name"]
        )

    # Now perform data migration
    session = Session(bind=op.get_bind())

    for store in session.query(StoreMetadata).all():
        store.enabled = True
    session.commit()

    for transfer in session.query(OutgoingTransfer).all():
        file = session.query(File).get(transfer.file_name)
        transfer.file_name = file.name
        transfer.file = file
    session.commit()

    # Now mark the columns as not nullable
    with op.batch_alter_table("store_metadata") as batch_op:
        batch_op.alter_column("enabled", nullable=False)

    with op.batch_alter_table("outgoing_transfers") as batch_op:
        batch_op.alter_column("file_name", nullable=False)


def downgrade():
    op.drop_column("store_metadata", "enabled")
    op.drop_column("incoming_transfers", "source_transfer_id")
    op.drop_column("outgoing_transfers", "file_name")
49 changes: 49 additions & 0 deletions docs/Background.rst
@@ -0,0 +1,49 @@
Background Tasks
================

Background tasks are persistent tasks that run alongside the
librarian web server. They share the server's database but
need not run in the same container, which makes them
flexible and easy to scale.

It is typical to run the background tasks in a separate
thread when running the librarian server (this is handled
automatically by the ``librarian-server-start`` script).
Most installations of the librarian will find that this is
more than sufficient. In more complex scenarios, for instance
where multithreaded web servers are required, you may find it
useful to run the background tasks separately with the
``librarian-background-only`` script.

The background tasks are defined in the separate
``librarian_background`` package, and are configured
using a json file pointed to by ``$LIBRARIAN_BACKGROUND_CONFIG``.

Each background task is configured using a small JSON
object. For example, a SneakerNet transfer requires the
following two task configurations:

.. code:: json

    {
        "create_local_clone": [
            {
                "task_name": "Local cloner",
                "soft_timeout": "00:30:00",
                "every": "01:00:00",
                "age_in_days": 7,
                "clone_from": "store",
                "clone_to": "clone",
                "files_per_run": 256
            }
        ],
        "recieve_clone": [
            {
                "task_name": "Clone receiver",
                "soft_timeout": "00:30:00",
                "every": "01:00:00",
                "files_per_run": 256
            }
        ]
    }
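The timing fields above appear to be ``HH:MM:SS`` durations. As a minimal sketch, assuming that format and the key layout shown (this is not the librarian's own configuration loader), the file could be read from ``$LIBRARIAN_BACKGROUND_CONFIG`` and summarised as follows. ``load_background_config``, ``parse_hms``, and ``summarize_tasks`` are hypothetical helpers.

```python
import json
import os
from datetime import timedelta
from pathlib import Path
from typing import Optional


def load_background_config(path: Optional[str] = None) -> dict:
    """Read the background-task JSON file (hypothetical helper)."""
    config_path = Path(path or os.environ["LIBRARIAN_BACKGROUND_CONFIG"])
    return json.loads(config_path.read_text())


def parse_hms(value: str) -> timedelta:
    """Convert an "HH:MM:SS" string such as "01:00:00" into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def summarize_tasks(config: dict) -> list:
    """Return (task type, task name, run interval, soft timeout) for every configured task."""
    return [
        (task_type, task["task_name"], parse_hms(task["every"]), parse_hms(task["soft_timeout"]))
        for task_type, tasks in config.items()
        for task in tasks
    ]


# Inline copy of the example configuration; in practice use load_background_config().
example = json.loads(
    """
    {
        "create_local_clone": [{"task_name": "Local cloner", "soft_timeout": "00:30:00",
                                "every": "01:00:00", "age_in_days": 7, "clone_from": "store",
                                "clone_to": "clone", "files_per_run": 256}],
        "recieve_clone": [{"task_name": "Clone receiver", "soft_timeout": "00:30:00",
                           "every": "01:00:00", "files_per_run": 256}]
    }
    """
)

for task_type, name, every, timeout in summarize_tasks(example):
    print(f"{task_type}: {name!r} runs every {every}, soft timeout {timeout}")
```

Parsing the durations up front means a typo such as ``"1:00"`` fails loudly when the file is checked, rather than when a task is scheduled.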

195 changes: 195 additions & 0 deletions docs/Sneaker.rst
@@ -0,0 +1,195 @@
SneakerNet Transfers
====================

Privilege Level Required: Administrator.
Version: 2.1.0 and above.
Database Requirement: ``d8934c52bac5``

Introduction
------------

SneakerNet transfers are asynchronous transfers that can be managed
by the librarian. A SneakerNet transfer moves data from one site to
another using human power, i.e. the data is physically carried
(for instance, on USB sticks or hard drives).

A generic SneakerNet transfer occurs in the following steps:

- A clone of a subset of the data is created on an external device.
- A manifest of the data is created.
- The data is physically transferred to the destination site.
- The manifest is used to ingest the data into the destination site.
- A callback from the destination to the source occurs to confirm the
transfer has completed successfully.

Specifically within the librarian, SneakerNet transfers have the
following steps:

- If this is the first time using SneakerNet, add a new store to the
librarian that represents the device you would like to use to
SneakerNet data. If the store already exists, make sure it is
enabled using the administrator endpoint ``set_store_state``.
- Set up a ``CreateLocalClone`` background task on the source
librarian to create a copy of the data to be transferred.
- Register the remote librarian with the source librarian and
vice-versa.
- Use the ``get_store_manifest`` client operation to create a
manifest of the cloned store. There are a few helpful options
here: ``create_outgoing_transfers`` creates an ``OutgoingTransfer``
object for each file in the store to the ``destination_librarian``;
``disable_store`` disables the store on the source librarian before
generating the manifest (to ensure no new data is added to the store
and to allow the device to be swapped out); and
``mark_local_instances_as_unavailable`` marks the local instances
of those files on that store as unavailable.
- This store manifest can then be saved to the device to be moved
along with the data. It is recommended that you back up (and
potentially version control) the manifests.
- Move the device to the destination site.
- Use the ``ingest_store_manifest`` client operation to ingest the
data into the destination librarian. At this point, the data is
only staged on the librarian, and is not yet available on the
store or to users.
- The ``RecieveClone`` background task on the destination librarian
will create a ``File`` and ``Instance`` for each file in the store.
Afterwards, the destination librarian uses its database entry for
the source librarian to call back to it. As part of processing
this callback, the source librarian will mark its ``OutgoingTransfer``
as complete and create a ``RemoteInstance`` for each file that
has been successfully transferred.
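To make the bookkeeping in the steps above concrete, here is a toy model of the source side, assuming one transfer record per file. ``SourceLibrarian``, ``generate_manifest``, ``handle_callback``, and the ``"ONGOING"``/``"COMPLETED"`` status strings are hypothetical; they only mirror the behaviour described above and are not the librarian's ORM or client API.

```python
from dataclasses import dataclass, field


@dataclass
class SourceLibrarian:
    """Toy model of the source side of a SneakerNet transfer (hypothetical, not the ORM)."""

    store_files: set  # names of files cloned onto the SneakerNet store
    store_enabled: bool = True
    outgoing: dict = field(default_factory=dict)  # file name -> transfer status
    remote_instances: set = field(default_factory=set)  # files confirmed at the destination

    def generate_manifest(self) -> list:
        # Mirrors get_store_manifest with disable_store and create_outgoing_transfers set.
        self.store_enabled = False
        manifest = sorted(self.store_files)
        for name in manifest:
            self.outgoing[name] = "ONGOING"
        return manifest

    def handle_callback(self, confirmed: list) -> None:
        # The destination calls back once its receive task has created File/Instance rows.
        for name in confirmed:
            if self.outgoing.get(name) == "ONGOING":
                self.outgoing[name] = "COMPLETED"
                self.remote_instances.add(name)


source = SourceLibrarian(store_files={"file_a.uvh5", "file_b.uvh5"})
manifest = source.generate_manifest()
# ... the device and manifest travel to the destination, which ingests them and calls back ...
source.handle_callback(manifest)
print(source.store_enabled, source.outgoing, sorted(source.remote_instances))
```

Only transfers that are still outstanding are completed by the callback, so an unexpected or repeated confirmation does not create spurious remote instances.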

Below, we have a step-by-step guide to performing a SneakerNet transfer using
the librarian command-line interface.

Step 1: Adding or enabling a store
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information on adding a store, see :ref:`Stores`. It is crucial
to mark SneakerNet stores as non-ingestable (i.e. set ``ingestable: false``
in the configuration file); otherwise new data passed to the librarian
may be ingested directly onto them.

There are three main states that are important for stores:

1. ``ingestable``: Whether or not 'fresh' files (those sent from uploads
or from clones) can be added to the store.
2. ``enabled``: Whether or not the store is currently marked as available
for use. All stores start out enabled, but may be disabled when they
are full, or a disk is being swapped out.
3. ``available``: An internal state, tracked independently of
``ingestable`` and ``enabled``, which indicates whether the physical
device is available for receiving commands. For local stores, this is
generally forced to be true.
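As an illustration of how the three flags combine, here is a sketch assuming that a store should only receive fresh files when all three are set. ``StoreState`` and ``accepts_fresh_files`` are hypothetical and do not mirror the librarian's ``StoreMetadata`` class; the flag is spelled ``ingestable`` to match the configuration key.

```python
from dataclasses import dataclass


@dataclass
class StoreState:
    """Hypothetical stand-in for the three store flags described above."""

    name: str
    ingestable: bool  # may receive 'fresh' files from uploads or clones
    enabled: bool  # marked by an administrator as currently in use
    available: bool  # physical device can receive commands (usually True for local stores)


def accepts_fresh_files(store: StoreState) -> bool:
    # Assumption: every flag must be set before new data is routed to the store.
    return store.ingestable and store.enabled and store.available


stores = [
    StoreState("store", ingestable=True, enabled=True, available=True),
    StoreState("clone", ingestable=False, enabled=True, available=True),  # SneakerNet device
]
print([s.name for s in stores if accepts_fresh_files(s)])  # ['store']
```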

If your store is starting out disabled, you will need to enable it
by using the ``set_store_state`` endpoint. This can be easily accomplished
using the command-line utility:

.. code::

    $ librarian set-store-state local-librarian --store local-store --enabled
    Store local-store state set to enabled.

This enables the store ``local-store`` on the librarian ``local-librarian``
(as defined in ``~/.hl_config.cfg``). The command succeeds even if the
store is already enabled.

If you need to know what stores are available on the librarian, you can use
the following command-line wrapper to ``get_store_list``:

.. code::

    $ librarian get-store-list local-librarian
    local-store (local) [599.5 GB Free] - Ingestable - Available - Enabled

This prints information about every store attached to the
librarian. Because stores are generally meant to be transparent to regular
users of the librarian, these endpoints require administrator privileges.

Step 2: Background tasks and remote librarians
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two core background tasks that are used in SneakerNet transfers:
``CreateLocalClone`` and ``ReceiveClone``. The first is used at the source site
to create a complete clone of the data ingested into the librarian, and the
latter is used to ingest the data into the destination librarian. More
information on background task scheduling is available in the :ref:`Background`
section.

At each librarian site, you will also need to register the remote librarian
using the command-line tools. This generally also involves provisioning
accounts on both librarians, as callbacks are required.

To provision a new account, you will need to use the ``create_user``
endpoint, which can be accessed through the command-line tool:

TODO: THIS SHOULD BE COMPLETED IN RESPONSE TO ISSUE #61.

Once the appropriate accounts are provisioned, you will need
to register them with their respective librarians. This can be done
with the ``register_remote_librarian`` endpoint:

TODO: THIS SHOULD BE COMPLETED IN RESPONSE TO ISSUE #60

Step 3: Creating a store manifest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once one of your SneakerNet stores is full, you can create
a manifest of the store using the ``get_store_manifest`` endpoint.
With the options shown below, this process will also disable the store
on the source librarian, create outgoing transfers, and mark local
instances as unavailable, ready for the disk to be replaced.

.. code::

    $ librarian get-store-manifest local-librarian \
        --store local-clone --create-outgoing-transfers \
        --disable-store --mark-instances-as-unavailable \
        --output /path/to/manifest.json

The file will be saved as a serialized json object. It is strongly
recommended that you back up this file, as it is the only unique
record of the data that is being transferred. It should also be
packaged with the SneakerNet transfer for easy ingestion on
the other side.
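The advice to back up the manifest can be automated. The sketch below assumes nothing about the manifest's contents: it copies the file to a backup directory under a UTC-timestamped name and writes a ``.sha256`` sidecar so later copies can be checked against it. ``back_up_manifest``, ``sha256_of``, and the directory layout are hypothetical choices, not part of the librarian CLI.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_manifest(manifest: Path, backup_dir: Path) -> Path:
    """Copy the manifest under a timestamped name and record its checksum alongside it."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{manifest.stem}-{stamp}{manifest.suffix}"
    shutil.copy2(manifest, target)  # copy2 also preserves the file's timestamps
    sidecar = target.parent / (target.name + ".sha256")
    sidecar.write_text(f"{sha256_of(target)}  {target.name}\n")  # sha256sum-style line
    return target


with tempfile.TemporaryDirectory() as workdir:
    manifest = Path(workdir) / "manifest.json"
    manifest.write_text('{"example": true}')  # placeholder content, not a real manifest
    backup = back_up_manifest(manifest, Path(workdir) / "manifest-backups")
    backup_ok = backup.read_bytes() == manifest.read_bytes()
    recorded = (backup.parent / (backup.name + ".sha256")).read_text().split()[0]
    expected = hashlib.sha256(b'{"example": true}').hexdigest()
    print(backup.name, backup_ok, recorded == expected)
```

Keeping the backups in a version-controlled directory, as suggested above, would additionally give a history of every manifest generated.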

.. note::

    **Safety note:** It may be worth disabling the store manually
    first, then generating a manifest with none of the extra options
    turned on (i.e. no ``--create-outgoing-transfers`` or
    ``--mark-instances-as-unavailable``). You can then re-run the
    command to do these things, safe in the knowledge that you already
    have a backup of the store manifest.


Step 4: Moving the data
^^^^^^^^^^^^^^^^^^^^^^^

You will then need to move the data to the destination site. This
is generally done by physically carrying the device there. It is
recommended that you move the manifest file along with the data, as
it is required for the next step, and also send a copy over the
network (the manifest is considerably smaller than the data itself).
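Before ingesting, the copy carried on the device can be compared with the copy sent over the network. A minimal sketch, assuming both copies are ordinary local files at the destination; ``check_manifest_copies`` is a hypothetical helper, and the placeholder JSON stands in for a real manifest, whose structure is not described here.

```python
import json
import tempfile
from pathlib import Path


def check_manifest_copies(carried: Path, sent_over_network: Path) -> dict:
    """Confirm the two manifest copies are identical, then parse and return the manifest."""
    # Manifests are small compared with the data, so a direct byte comparison is cheap.
    if carried.read_bytes() != sent_over_network.read_bytes():
        raise ValueError(f"{carried} and {sent_over_network} differ; resolve before ingesting")
    # The manifest is a serialized JSON object, so a parse error signals corruption.
    return json.loads(carried.read_text())


with tempfile.TemporaryDirectory() as workdir:
    on_device = Path(workdir) / "device_manifest.json"
    via_network = Path(workdir) / "network_manifest.json"
    for copy in (on_device, via_network):
        copy.write_text('{"example": true}')  # placeholder content, not a real manifest
    parsed = check_manifest_copies(on_device, via_network)

    via_network.write_text('{"example": false}')  # simulate a corrupted copy
    try:
        check_manifest_copies(on_device, via_network)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(parsed, mismatch_detected)
```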

Step 5: Ingesting the store manifest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the data has been moved to the destination site, you will need
to ingest the data into the librarian. This is done using the
``ingest_store_manifest`` endpoint:

.. code::

    $ librarian ingest-manifest local --manifest ./test_manifest.json --store-root=/path/to/sneaker/device/store
    Ingesting manifest: 100%|███████████████████████████████| 4/4 [00:00<00:00, 31.48it/s]
    Successfully ingested 3/4 files, 1/4 already existed.

If this fails, you can always try again (as long as the root cause is
fixed!) as the librarian will not ingest the same file twice. You
will need to have the optional library ``tqdm`` installed to see the
progress bar.
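The retry advice rests on ingestion skipping files that already exist. The toy loop below illustrates that idempotency, along with a common pattern for treating ``tqdm`` as an optional dependency. It is a hypothetical sketch (``ingest_manifest`` and the in-memory set of already-ingested names are stand-ins), not the client's actual implementation.

```python
try:
    from tqdm import tqdm  # optional: only used to draw the progress bar
except ImportError:

    def tqdm(iterable, **_kwargs):
        # Fallback with the same call shape: yields items without a progress bar.
        return iterable


def ingest_manifest(manifest_files: list, already_ingested: set) -> tuple:
    """Hypothetical idempotent ingest loop: files seen before are skipped, not re-ingested."""
    ingested = skipped = 0
    for name in tqdm(manifest_files, desc="Ingesting manifest"):
        if name in already_ingested:
            skipped += 1
            continue
        already_ingested.add(name)  # stand-in for staging the file on the destination
        ingested += 1
    return ingested, skipped


state = {"file_a"}
files = ["file_a", "file_b", "file_c", "file_d"]
new, existing = ingest_manifest(files, state)
# prints "Successfully ingested 3/4 files, 1/4 already existed."
print(f"Successfully ingested {new}/{len(files)} files, {existing}/{len(files)} already existed.")
# Re-running after a failure is safe: everything is now skipped.
print(ingest_manifest(files, state))  # (0, 4)
```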

Note that this does not necessarily mean that the files are available
on the destination librarian right away. You will need to wait until the
``ReceiveClone`` background task has completed, and the source librarian
has received the callback from the destination librarian.
48 changes: 48 additions & 0 deletions docs/Stores.rst
@@ -0,0 +1,48 @@
Stores
======

Stores in the librarian are abstractions over various storage systems.
The most common of these is a local POSIX filesystem, but the librarian
provides a fully abstract interface for interacting with any generic
storage system (such as a block or key-value store).

Because stores are such a foundational part of the librarian, they are
defined in the configuration file, as pointed to by
``$LIBRARIAN_CONFIG_PATH``.

The configuration file is a JSON file, and the stores are defined in the
``add_stores`` key. The value of this key is a list of dictionaries, each
of which is de-serialized to a ``StoreMetadata`` class. In the same block,
you will need to define the transfer managers that can be used for data
ingress and egress. Documentation on how to configure the stores
is available under the stores themselves, and likewise for the transfer
managers.

More documentation will be added in the future.

.. code:: json

    {
        "add_stores": [
            {
                "store_name": "store",
                "store_type": "local",
                "ingestable": true,
                "store_data": {
                    "staging_path": "/tmp/store/libstore/staging",
                    "store_path": "/tmp/store/libstore/store",
                    "report_full_fraction": 1.0,
                    "group_write_after_stage": true,
                    "own_after_commit": true,
                    "readonly_after_commit": true
                },
                "transfer_manager_data": {
                    "local": {
                        "available": true,
                        "hostnames": [
                            "compute-0.0.local",
                            "example-librarian-hostname"
                        ]
                    }
                }
            }
        ]
    }
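Since SneakerNet stores must be non-ingestable, a small pre-flight check over this configuration can catch mistakes before the server starts. This is a sketch assuming the ``add_stores`` layout shown above. The helper name and the idea of passing in the set of SneakerNet store names (here ``clone``, borrowing the ``clone_to`` value from the background task example) are hypothetical, and the snippet uses a trimmed-down config rather than the full store block.

```python
import json


def ingestable_sneakernet_stores(config: dict, sneakernet_stores: set) -> list:
    """Return the names of SneakerNet stores that are still marked ingestable."""
    flagged = []
    for store in config.get("add_stores", []):
        # Assumption: treat a missing flag as ingestable so that it gets flagged for review.
        if store["store_name"] in sneakernet_stores and store.get("ingestable", True):
            flagged.append(store["store_name"])
    return flagged


config = json.loads(
    """
    {
        "add_stores": [
            {"store_name": "store", "store_type": "local", "ingestable": true},
            {"store_name": "clone", "store_type": "local", "ingestable": true}
        ]
    }
    """
)

problems = ingestable_sneakernet_stores(config, sneakernet_stores={"clone"})
print(problems)  # ['clone']
```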
