From 8966b43330aadbd731f6810c5bb63bdeed307d51 Mon Sep 17 00:00:00 2001
From: Jonathan Pyle
Date: Sun, 3 Nov 2024 14:58:44 -0500
Subject: [PATCH] SQLObject pitfalls

---
 _docs/config.md       | 20 +++++-----
 _docs/installation.md | 20 ++++++++--
 _docs/objects.md      | 91 ++++++++++++++++++++++++++++++++-----------
 3 files changed, 95 insertions(+), 36 deletions(-)

diff --git a/_docs/config.md b/_docs/config.md
index c7c8a0504..dccec0a1f 100644
--- a/_docs/config.md
+++ b/_docs/config.md
@@ -4964,16 +4964,16 @@ celery processes: 15
 
 This will cause 15 [Celery] workers to be spawned.
 
-Note that there are two [Celery] systems: one called `celerysingle` with a
-single worker, and one called `celery` with one or more workers. The
-`celerysingle` system is used for background processes that will
-perform parallel processing and use every CPU in the machine; it would
-be dangerous to run more than one such process at a time. The `celery`
-system is used for all other tasks, which [Celery] may in parallel on
-all the available worker processes. Thus if the `celery processes` is
-`15`, the [`worker_concurrency`] of one system will be `1` for
-`celerysingle` and `14` for `celery`. You can see this if you do `ps
-ax | grep celery` on the server.
+Note that there are two [Celery] systems: one called `celerysingle`
+with a single worker, and one called `celery` with one or more
+workers. The `celerysingle` system is used for background processes
+that will perform parallel processing and use every CPU in the
+machine; it would be dangerous to run more than one such process at a
+time. The `celery` system is used for all other tasks, which [Celery]
+may run in parallel on all the available worker processes. Thus, if
+`celery processes` is `15`, the [`worker_concurrency`] will be `1`
+for `celerysingle` and `14` for `celery`. You can see this if you do
+`ps ax | grep celery` on the server.
 
 If you want the number of [Celery] processes to scale with CPU but
 you think your system can handle more concurrency than the standard
diff --git a/_docs/installation.md b/_docs/installation.md
index c60f7bd1e..f66483a66 100644
--- a/_docs/installation.md
+++ b/_docs/installation.md
@@ -16,7 +16,7 @@ follow the installation instructions on this page.
 If you do not already have [Docker], you can install [Docker] on your
 machine whether you have a Mac, a PC, or a Linux machine.
 
-For example, on a Windows 10 machine, once you install
+For example, on a Windows 11 machine, once you install
 [Docker for Windows], you simply go to Windows PowerShell and type:
 
 {% highlight bash %}
@@ -26,6 +26,12 @@ docker run -d -p 80:80 jhpyle/docassemble
 Then, after a few minutes, the application will be available in your
 browser at http://localhost.
 
+If the machine is already using port 80, you can do something like
+`docker run -d -p 8080:80 jhpyle/docassemble` to run **docassemble**
+on http://localhost:8080. You could also set up a [reverse proxy] on
+the web server so that users can access the **docassemble**
+application through the web server on a standard port.
+
 Even if you want to put **docassemble** into production, it is
 recommended that you [install it using Docker] -- ideally on an [EC2]
 virtual machine hosted by [Amazon Web Services]. **docassemble**
@@ -34,9 +40,14 @@ Azure] services such as [Azure blob storage] for persistent storage.
 It also supports [S3]-compatible object storage services.
 
 The primary reason you might want to install **docassemble** manually
-on a machine is if you want it to run on a server for which the HTTP
-and HTTPS ports are serving other applications. ([Docker] can only
-use the HTTP and HTTPS ports if it has exclusive use of them.)
+on a machine is if [Docker]'s container/host separation interferes
+with a customized setup that you want to use. For example, you might
+want the **docassemble** server to use the NGINX, PostgreSQL, Redis,
+RabbitMQ, Celery, and/or `cron` servers that are already running on
+the machine. If you install **docassemble** manually, then when
+`supervisord` starts up, the `initialize` service will see that these
+servers are already running, and they will not be started using
+`supervisord`.
 
 # Minimum system requirements
 
@@ -1795,3 +1806,4 @@ All of these system administration headaches can be avoided by
 [Telnyx]: https://telnyx.com/
 [Keycloak]: https://www.keycloak.org/
 [Zitadel]: https://zitadel.com
+[reverse proxy]: {{ site.baseurl }}/docs/docker.html#forwarding
diff --git a/_docs/objects.md b/_docs/objects.md
index 26aa52010..74961920b 100644
--- a/_docs/objects.md
+++ b/_docs/objects.md
@@ -6221,7 +6221,7 @@ functionality of ordinary object instances, but with the added feature
 that particular attributes (or attributes of sub-objects) will
 synchronize with a [SQL] database.
 
-Using the `SQLObject` feature requires:
+The `SQLObject` is an expert feature. Using it requires:
 
 * Knowing how [SQL] databases work;
 * Knowing how to create a [SQL] database;
@@ -6236,16 +6236,16 @@ Here is an example of the use of `SQLObject`:
 
 The `Customer` and `Bank` classes are defined in the `demodb.py` file.
 `Customer` is a subclass of `Individual` and `SQLObject` (using
-[multiple inheritance]. `Bank` is a subclass of `Person` and
-`SQLObject`. Behind every `Customer` is a row in a [SQL] table listing
-customers. Behind every `Bank` is a row in a [SQL] table listing banks.
-These tables are in a separate [SQL] database from the database where
-**docassemble**'s interview answers are stored. This [SQL] database can
-be any database capable of being accessed using [SQLAlchemy]. The
-database tables can be pre-existing (e.g., a database for a case
-management system) or created for the sole purpose of storing data from
-interviews. If the tables do not exist, [SQLAlchemy] will create them
-when the module loads.
+[multiple inheritance]). `Bank` is a subclass of `Person` and
+`SQLObject`. Behind every `Customer` is a row in a [SQL] table
+listing customers. Behind every `Bank` is a row in a [SQL] table
+listing banks. These tables are in a separate [SQL] database from the
+database where **docassemble**'s interview answers are stored. This
+[SQL] database can be any database capable of being accessed using
+[SQLAlchemy]. The database tables can be pre-existing (e.g., a
+database for a case management system) or created for the sole purpose
+of storing data from interviews. If the tables do not exist,
+[SQLAlchemy] will create them when the module loads.
 
 In the interview, the user is first asked for a unique identifier
 (SSN) about the `customer`. If the the SSN matches the SSN of a
@@ -6779,10 +6779,10 @@ in the SQL database. This allows you to use the interview answers as
 a kind of "staging area" for information before writing it to the SQL
 database.
 
-If an object is stored both in the interview answers and in Python,
-but then it changes inside the SQL server, then the next time the
-interview answers are retrieved, the attributes in the Python objects
-will be updated with the values in SQL.
+If an object is stored both in the interview answers and on the SQL
+server, and then the columns in the SQL record change, then the next
+time the interview answers are retrieved, the attributes in the Python
+objects will be updated with the values in SQL.
 
 However, if the item is deleted from SQL, then when the corresponding
 Python object is retrieved, it will become a "zombie" object. It will
@@ -6811,11 +6811,13 @@ database columns is controlled by the [`db_get()`], [`db_set()`], and
 [`db_null()`] methods of the class. The [`db_get()`] method takes a
 column name and tries to obtain a value for it from [Python] land.
 The [`db_set()`] method takes a column name and a value from [SQL]
-land and saves that value in [Python] land. For example, in the above
-[Python module], the `first_name` column is associated with
-`.name.first` attribute of the `Customer` object. The [`db_null()`]
-method takes a column name and tries to delete the object attribute in
-[Python] land that is associated with the given column.
+land and saves that value in [Python] land. (Think of the verbs "get"
+and "set" as applying to attributes of the Python object, not columns
+in the SQL record.) For example, in the above [Python module], the
+`first_name` column is associated with the `.name.first` attribute of
+the `Customer` object. The [`db_null()`] method takes a column name
+and tries to delete the object attribute in [Python] land that is
+associated with the given column.
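+
+To make the pattern concrete, these methods usually amount to a
+simple dispatch on the column name. The sketch below is hypothetical
+and abbreviated, not the actual `demodb.py` code: the `last_name`
+column and its mapping to `.name.last` are invented for illustration,
+and a real class would also define the other attributes and methods
+that `SQLObject` subclasses need.
+
+{% highlight python %}
+class Customer(Individual, SQLObject):
+    # Abbreviated, hypothetical sketch; see demodb.py for a real example.
+    def db_get(self, column):
+        if column == 'first_name':
+            # If .name.first is undefined, this lookup raises an
+            # exception, which tells docassemble that the column has
+            # no value; no try/except is needed here.
+            return self.name.first
+        if column == 'last_name':  # invented column, for illustration
+            return self.name.last
+        raise Exception("Invalid column " + column)
+
+    def db_set(self, column, value):
+        if column == 'first_name':
+            self.name.first = value
+        elif column == 'last_name':
+            self.name.last = value
+
+    def db_null(self, column):
+        if column == 'first_name':
+            del self.name.first
+        elif column == 'last_name':
+            del self.name.last
+{% endhighlight %}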
 
 When you initialize an object, you can give it the unique ID, and if
 a record exists in SQL with that unique ID, then the object will be
@@ -6831,7 +6833,7 @@ unique integer that never changes. The `id` is set when the record is
 created in [SQL], using an auto-incrementing counter. If you know the
 `id` of a record you can use it to initialize your object so that it
 is non-nascent from the start. For example, here is a way to use a
-URL parameter (or in the alternative, a [`question`], to get the `id`
+URL parameter (or in the alternative, a [`question`]) to get the `id`
 for a customer record:
 
 {% highlight yaml %}
@@ -6856,7 +6858,7 @@ it, it will run [`db_set()`] and update information in [Python] based
 on the values of the columns in [SQL].
 
 For example, assume there is a customer in the [SQL] database with SSN
-122-23-2322, whose first name is John and whose last name is John Smith.
+122-23-2322, whose first name is John and whose last name is Smith.
 
 {% highlight yaml %}
 objects:
@@ -7070,6 +7072,51 @@ code will know that the column information does not exist. You do not
 need to use `try`/`except` logic of your own in these methods; just
 follow the pattern above.
 
+## Pitfalls
+
+### Multiple Python objects associated with a single SQL record
+
+When using `SQLObject`, make sure that your interview logic does not
+create multiple separate Python objects associated with the same SQL
+record. If you do, and you change the attributes of one object but
+not the other, the two objects will be in conflict with one another.
+Which attributes end up saved to the SQL record will be
+unpredictable, depending on which object is [pickle]d last.
+
+The `SQLObject` maintains an object cache under the `_internal`
+dictionary in the interview answers. When a `SQLObject` has an `id`,
+a reference to that object will be created in the cache. Methods like
+`.filter()`, `.all()`, `.by_id()`, and `.by_uid()` return references
+to objects in this cache, when present, rather than creating new
+objects. The cache helps avoid the problem of multiple separate
+Python objects for the same record existing in the interview answers.
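+
+For instance, if the interview answers already contain `customer`, a
+`Customer` tied to a particular SQL record, fetching that record
+again through a class method should hand back the cached object
+rather than a second copy. The following sketch is illustrative only;
+the `id` value of `123` is invented, and it assumes that `.by_id()`
+accepts the SQL `id` of the record:
+
+{% highlight python %}
+# Hypothetical illustration; `customer` is assumed to be an existing
+# Customer in the interview answers whose SQL id is 123.
+same_customer = Customer.by_id(123)
+# Thanks to the object cache, this is the existing Python object, not
+# a new one, so attribute changes made through either name are written
+# to SQL consistently.
+assert same_customer is customer
+{% endhighlight %}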
+
+### Concurrency
+
+Between the time when the screen starts loading (when data are copied
+from SQL to Python) and the screen finishes loading (when data are
+copied from Python to SQL), no lock is placed on the SQL records.
+Suppose session A copies data from SQL to Python at time 0,
+session B copies data from the same SQL record to Python at time 1,
+session B writes changes to the SQL record at time 2, and session A
+writes changes to the SQL record at time 3. In this scenario, session
+A's changes would overwrite the changes that session B made.
+
+The same result would happen if a third-party application made changes
+to the SQL record instead of session B.
+
+However, suppose session A copies data from SQL to Python at time 0,
+session B copies data from the same SQL record to Python at time 1,
+session B writes changes to the SQL record at time 2, and then at time
+3 the interview logic of session A finishes without making any
+changes. In this scenario, session A would not overwrite the changes
+of session B, because from the perspective of session A, there is no
+need to take the time to write data to SQL, since nothing changed.
+
+If you expect that SQL records might be altered concurrently, using
+`SQLObject`s to synchronize between interview answers and a SQL
+database might not be sufficiently robust.
+
 ## Reference guide
 
 ### Class attributes