From a325db6728e4d7fee34ba80067341fc9b053c61e Mon Sep 17 00:00:00 2001
From: lemon24
Date: Wed, 19 Jun 2024 00:13:38 +0300
Subject: [PATCH] User guide: Add Scheduled updates, clean up Updated feeds. #332

---
 CHANGES.rst          |   2 +
 CONTRIBUTING.rst     |   2 +
 docs/cli.rst         |   2 +
 docs/guide.rst       | 119 ++++++++++++++++++++++++++++++++++---------
 src/reader/_types.py |   2 +-
 5 files changed, 103 insertions(+), 24 deletions(-)

diff --git a/CHANGES.rst b/CHANGES.rst
index 79b3e1f5..5710aaf2 100644
--- a/CHANGES.rst
+++ b/CHANGES.rst
@@ -19,6 +19,7 @@ Unreleased
   not if it was never updated successfully (:attr:`~Feed.last_updated`). (:issue:`332`)
 * Add :meth:`~Reader.update_feeds()`, :meth:`~Reader.get_feeds()`, etc.
   argument ``scheduled`` to allow updating only feeds scheduled to be updated.
+  See :ref:`scheduled` for details.
   (:issue:`332`)
 
 .. FIXME: versionchanged on update_feeds() etc.
@@ -26,6 +27,7 @@ Unreleased
 * Add ``--scheduled`` flag to the ``update`` command. (:issue:`332`)
 * Group mutually-exclusive attributes of :class:`~.FeedUpdateIntent`
   into its :attr:`~.FeedUpdateIntent.value` union attribute. (:issue:`332`)
+* New and improved :ref:`update` user guide section.
 * The :mod:`~reader._plugins.cli_status` plugin
   now records the output of multiple runs
   instead of just the last one,
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index 817937d1..003f8438 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -5,6 +5,8 @@
 Thank you for considering contributing to *reader*!
 
 
+.. _issues:
+
 Reporting issues
 ----------------
 
diff --git a/docs/cli.rst b/docs/cli.rst
index 8c545551..045e4e61 100644
--- a/docs/cli.rst
+++ b/docs/cli.rst
@@ -45,6 +45,8 @@ Serve the web application locally (at http://localhost:8080/):
     python -m reader serve
 
 
+.. _cli-update:
+
 Updating feeds
 --------------
 
diff --git a/docs/guide.rst b/docs/guide.rst
index 0d8876b0..73c6237c 100644
--- a/docs/guide.rst
+++ b/docs/guide.rst
@@ -36,8 +36,7 @@ The default (and currently only) storage uses SQLite,
 so the path behaves like the ``database`` argument of :func:`sqlite3.connect`:
 
 * If the database does not exist, it will be created automatically.
-* You can pass ``":memory:"`` to use a temporary in-memory database;
-  the data will disappear when the reader is closed.
+* You can pass ``':memory:'`` to use a :ref:`temporary database <temp>`.
 
 
 .. _lifecycle:
@@ -93,11 +92,18 @@ or if the thread was not created through the :mod:`threading` module
 when garbage-collected).
 
 
+.. _temp:
+
 Temporary databases
 ~~~~~~~~~~~~~~~~~~~
 
+With the default SQLite storage,
+you can use an `in-memory`_ (or `temporary`_) database
+by using ``':memory:'`` (or ``''``) as the database path;
+the data will disappear when the reader is closed.
+
 To maximize the usefulness of temporary databases,
-the database connection is closed (and the data discarded)
+the connection is closed (and the data discarded)
 only when calling :meth:`~Reader.close`,
 not when using the reader as a context manager.
 The reader cannot be reused after calling :meth:`~Reader.close`.
@@ -127,6 +133,10 @@ since each connection would be to a *different* database::
     reader.exceptions.StorageError: usage error: cannot use a private database from threads other than the creating thread
 
 
+.. _in-memory: https://sqlite.org/inmemorydb.html
+.. _temporary: https://sqlite.org/inmemorydb.html#temp_db
+
+
 .. _backups:
 
 Back-ups
@@ -207,42 +217,97 @@ use :meth:`~Reader.delete_feed`::
 Updating feeds
 --------------
 
-To retrieve the latest version of a feed, along with any new entries,
-it must be updated.
 You can update all the feeds by using the :meth:`~Reader.update_feeds` method::
 
     >>> reader.update_feeds()
     >>> reader.get_feed(feed)
     Feed(url='http://www.hellointernet.fm/podcast?format=rss', updated=datetime.datetime(2020, 2, 28, 9, 34, 2, tzinfo=datetime.timezone.utc), title='Hello Internet', ...)
 
-
 To retrive feeds in parallel, use the ``workers`` flag::
 
     >>> reader.update_feeds(workers=10)
 
-
-You can also update a specific feed using :meth:`~Reader.update_feed`::
+You can update a single feed using :meth:`~Reader.update_feed`::
 
     >>> reader.update_feed("http://www.hellointernet.fm/podcast?format=rss")
+    UpdatedFeed(url='http://www.hellointernet.fm/podcast?format=rss', new=100, modified=0, unmodified=0)
+
+
+Saving bandwidth
+~~~~~~~~~~~~~~~~
+
+If supported by the server,
+*reader* uses the `ETag and Last-Modified headers`_
+to get the entire content of a feed only if it changed.
+
+.. important::
+
+    If you prevent *reader* from saving feed state between updates
+    (e.g. by using a :ref:`temporary database <temp>`,
+    or by deleting the database or feeds every time),
+    you will repeatedly download feeds that have not changed.
+    This wastes your bandwidth and the publisher's bandwidth,
+    and the publisher may ban you from accessing their server.
+
+Even so, you should not update feeds *too* often;
+every hour seems reasonable.
+To update newly-added feeds as soon as they are added,
+you can call :meth:`update_feeds(new=True) <Reader.update_feeds>`
+more often (e.g. every minute).
+
+.. seealso::
+
+    The :ref:`cli-update` section of :doc:`cli`
+    for an example of how to do this using cron.
+
 
-If supported by the server, *reader* uses the ETag and Last-Modified headers
-to only retrieve feeds if they changed
-(`details `_).
-Even so, you should not update feeds *too* often,
-to avoid wasting the feed publisher's resources,
-and potentially getting banned;
-every 30 minutes seems reasonable.
+.. _ETag and Last-Modified headers: https://feedparser.readthedocs.io/en/latest/http-etag.html
 
-To support updating newly-added feeds off the regular update schedule,
-you can use the ``new`` flag;
-you can call this more often (e.g. every minute)::
 
-    >>> reader.update_feeds(new=True)
+.. _scheduled:
 
+Scheduled updates
+~~~~~~~~~~~~~~~~~
+
+Because different feeds need to be updated at different rates,
+*reader* also provides a mechanism for scheduling updates.
+
+Each feed has an update interval that, on every update,
+determines when the feed should be updated next.
+Running :meth:`update_feeds(scheduled=True) <Reader.update_feeds>`
+updates only the feeds that should be updated at or before the current time.
+
+The global and per-feed update interval can be **configured by the user**
+via the ``.reader.update`` global/feed tag;
+the default interval is of one hour;
+see :data:`~reader.types.UpdateConfig` for the schema.
+In addition to the interval, the user can specify a jitter;
+for an interval of 24 hours, a jitter of 0.25 means
+the update will occur any time in the first 6 hours of the interval.
+
+.. note::
+
+    As of |version|, there is no way to specify a minimum update interval.
+    If you want feeds to be updated no more often than e.g. every hour,
+    you have to run :meth:`update_feeds(scheduled=True) <Reader.update_feeds>`
+    no more often than every hour.
+
+    Please :ref:`open an issue <issues>` if you need a minimum update interval.
+
+In a future version of *reader*,
+the same mechanism will be used to handle
+HTTP 429 Too Many Requests; see :issue:`307` for details.
+
+
+.. versionadded:: 3.13
+
+
+Update status
+~~~~~~~~~~~~~
 
 If you need the status of each feed as it gets updated
 (for instance, to update a progress bar),
-you can use :meth:`~Reader.update_feeds_iter` instead,
+you can use :meth:`~Reader.update_feeds_iter` instead of :meth:`~Reader.update_feeds`,
 and get a (url, updated feed or none or exception) pair for each feed::
 
     >>> for url, value in reader.update_feeds_iter():
@@ -257,9 +322,14 @@ and get a (url, updated feed or none or exception) pair for each feed::
     https://www.relay.fm/cortex/feed not modified
 
 
+Regardless of the update method used,
+:attr:`Feed.last_retrieved`, :attr:`~Feed.last_updated`,
+and :attr:`~Feed.last_exception` will be set accordingly
+(also see :ref:`errors`).
+
 
-Disabling feed updates
-----------------------
+Disabling updates
+~~~~~~~~~~~~~~~~~
 
 Sometimes, it is useful to skip a feed when using :meth:`~Reader.update_feeds`;
 for example, the feed does not exist anymore,
@@ -759,7 +829,8 @@ This applies to the following names:
 * tag keys
 * the top-level keys of dict tag values
 
-Currently, there are no *reader*-reserved names;
+Currently, the only *reader*-reserved names
+are used by `Scheduled updates`_ and by :ref:`built-in plugins`;
 new ones will be documented here.
 
 The prefixes can be changed using
@@ -887,6 +958,8 @@ depending on their type attribute or feedparser defaults:
 
 
 
+.. _errors:
+
 Errors and exceptions
 ---------------------
 
diff --git a/src/reader/_types.py b/src/reader/_types.py
index b64de006..3dc8bfc1 100644
--- a/src/reader/_types.py
+++ b/src/reader/_types.py
@@ -1196,7 +1196,7 @@ class SearchType(Protocol): # pragma: no cover
     In the future, search may receive object lifecycle methods
     (context manager + ``close()``), to support implementations
     that do not share state with the storage.
-    If you need support for this, please open a issue.
+    If you need support for this, please :ref:`open an issue <issues>`.
 
     """
 