changeset 643:411e82bb3a9f

documenting revisions and multiple writers
author Sebastien Jodogne <s.jodogne@gmail.com>
date Fri, 23 Apr 2021 15:47:14 +0200
parents a76d83a00c68
children a180cf06a2f3
files Sphinx/source/faq/features.rst Sphinx/source/faq/scalability.rst Sphinx/source/images/2021-04-22-MultipleWriters.png Sphinx/source/plugins/mysql.rst Sphinx/source/plugins/postgresql.rst
diffstat 5 files changed, 276 insertions(+), 42 deletions(-) [+]
line wrap: on
line diff
--- a/Sphinx/source/faq/features.rst	Fri Apr 23 10:34:13 2021 +0200
+++ b/Sphinx/source/faq/features.rst	Fri Apr 23 15:47:14 2021 +0200
@@ -303,3 +303,70 @@
 (e.g. using a distributed `message-broker system
 <https://en.wikipedia.org/wiki/Message_broker>`__ such as RabbitMQ
 that is fed by an Orthanc plugin).
+
+
+.. _revisions:
+
+Revisions
+---------
+
+.. highlight:: bash
+
+Higher-level applications built on the top of Orthanc might have to
+modify metadata and/or attachments. This can cause concurrency
+problems if multiple clients modify the same metadata/attachment
+simultaneously. To avoid such problems, Orthanc implements a so-called
+**revision mechanism** to protect from concurrent modifications.
+
+The revision mechanism is optional, was introduced in **Orthanc
+1.9.2** and must be enabled by setting :ref:`configuration option
+<configuration>` ``CheckRevision`` to ``true``. It is strongly
+inspired by the `CouchDB API
+<https://docs.couchdb.org/en/stable/api/document/common.html>`__.
+
+When the revision mechanism is enabled, each metadata and attachment
+is associated with a **revision number**. Whenever one sets a metadata
+for the first time using a ``PUT`` query, this revision number can be
+found in the HTTP header ``ETag`` that is reported by Orthanc::
+
+  $ curl -v http://localhost:8042/instances/19816330-cb02e1cf-df3a8fe8-bf510623-ccefe9f5/metadata/1024 -X PUT -d 'Hello'
+  [...]
+  < ETag: "0"
+
+Any ``GET`` query will also return the current value of ``ETag``::
+  
+  $ curl -v http://localhost:8042/instances/19816330-cb02e1cf-df3a8fe8-bf510623-ccefe9f5/metadata/1024
+  [...]
+  < ETag: "0"
+
+If one needs to subsequently modify or delete this metadata, the HTTP
+client must set this value of ``ETag`` into the ``If-Match`` HTTP
+header::
+
+  $ curl -v http://localhost:8042/instances/19816330-cb02e1cf-df3a8fe8-bf510623-ccefe9f5/metadata/1024 -X PUT -d 'Hello 2' -H 'If-Match: 0'
+  [...]
+  < ETag: "1"
+
+Note how this second call has incremented the value of ``ETag``: This
+is the new revision number to be used in future updates. If a bad
+revision number is provided, the HTTP error ``409 Conflict`` is
+generated::
+
+  $ curl -v http://localhost:8042/instances/19816330-cb02e1cf-df3a8fe8-bf510623-ccefe9f5/metadata/1024 -X PUT -d 'Hello 3' -H 'If-Match: 0'
+  [...]
+  < HTTP/1.1 409 Conflict
+
+Such a ``409`` error must be handled by the higher-level
+application. The revision number must similarly be given if deleting a
+metadata/attachment::
+
+  $ curl -v http://localhost:8042/instances/19816330-cb02e1cf-df3a8fe8-bf510623-ccefe9f5/metadata/1024 -X DELETE -H 'If-Match: 1'
+  [...]
+  < HTTP/1.1 200 OK
+
+Check out the `OpenAPI reference <https://api.orthanc-server.com/>` of
+the REST API of Orthanc for more information.
+
+**Warning:** The database index back-end must support revisions. As of
+writing, only the **PostgreSQL plugin** in versions above 4.0
+implement support for revisions.
--- a/Sphinx/source/faq/scalability.rst	Fri Apr 23 10:34:13 2021 +0200
+++ b/Sphinx/source/faq/scalability.rst	Fri Apr 23 15:47:14 2021 +0200
@@ -90,6 +90,20 @@
   * ``SaveJobs = false``
   * ``StorageAccessOnFind = Never``
 
+* Since Orthanc 1.9.2 and PostgreSQL plugins 4.0: By default, the
+  PostgreSQL index plugin uses 1 single connection to the PostgreSQL
+  database. You can have multiple connections by setting the
+  ``IndexConnectionsCount`` to a higher value (for instance ``5``) in
+  the ``PostgreSQL`` section of the configuration file. This will
+  improve concurrency. Check out :ref:`the explanation below <multiple-writers>`.
+
+* Since Orthanc 1.9.2 and PostgreSQL plugins 4.0: If you have an
+  hospital-wide VNA deployment, you could consider to deploy multiple
+  Orthanc servers sharing the same PostgreSQL database. A typical
+  scenario is having one "writer" Orthanc server that handles the
+  ingesting of DICOM instances, and multiple "reader" Orthanc servers
+  with features such as DICOMweb or viewers.
+   
 * Make sure to carefully :ref:`read the logs <log>` in ``--verbose``
   mode, especially at the startup of Orthanc. The logs may contain
   very important information regarding performance.
@@ -175,7 +189,11 @@
 the `manpage <http://man7.org/linux/man-pages/man3/mallopt.3.html>`__
 of ``mallopt()`` for more information.
 
-**Status of the Docker images:**
+**Status:**
+
+* Since **Orthanc 1.8.2**, the global configuration ``MallocArenaMax``
+  automatically sets ``MALLOC_MMAP_THRESHOLD_`` (defaults to ``5``)
+  during the startup of Orthanc.
 
 * The ``jodogne/orthanc`` and ``jodogne/orthanc-plugins`` Docker
   images automatically set ``MALLOC_ARENA_MAX`` to ``5`` **since
@@ -191,55 +209,134 @@
 Known limitations
 -----------------
 
-Exclusive access to the DB
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+Exclusive access to the DB in Orthanc <= 1.9.1
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Orthanc was originally designed as a mini-DICOM server in 1-to-1
+relation with a SQLite database. Until **Orthanc 1.9.1**, because of
+this original design, the internal code accessing the DB was affected
+by a strong limitation: Inside a single Orthanc process, there was no
+concurrent access to the DB.
+
+One solution to avoid this limitation was to have multiple Orthanc
+accessing the same DB (works only for MySQL and PostgreSQL) as
+presented in this `sample
+<https://bitbucket.org/osimis/orthanc-setup-samples/src/master/docker/multiple-orthancs-on-same-db/>`__.
+However, this solution was only robust if there was **one single
+"writer" Orthanc server** (i.e. only one Orthanc was modifying the
+database).  Indeed, the core of Orthanc <= 1.9.1 did not support the
+replay of database transactions, which is necessary to deal with
+conflicts between several instances of Orthanc that would
+simultaneously write to the database.
 
-As of Orthanc 1.9.1, the internal code accessing the DB is still affected
-by limitations induced by the SQLite engine that was the only one originally
-available at the beginning of the project: inside a single Orthanc process,
-there is no concurrent access to the DB.
+Concretely, in Orthanc <= 1.9.1, when connecting multiple Orthanc to a
+single database by setting ``Lock`` to ``false``, there should only be
+one instance of Orthanc acting as a writer and all the other instances
+of Orthanc acting as readers only. Be careful to set the option
+``SaveJobs`` to ``false`` in the configuration file of all the
+instances of Orthanc acting as readers (otherwise the readers would
+also modify the database).
 
-One solution to avoid this limitation is to have multiple Orthanc accessing
-the same DB (works only for MySQL and PostgreSQL) as presented in this `sample 
-<https://bitbucket.org/osimis/orthanc-setup-samples/src/master/docker/multiple-orthancs-on-same-db/>`__.
+Some issues reported in our bug tracker are related this limitation:
+`issue 83 <https://bugs.orthanc-server.com/show_bug.cgi?id=83>`__,
+`issue 121 <https://bugs.orthanc-server.com/show_bug.cgi?id=121>`__,
+`issue 151 <https://bugs.orthanc-server.com/show_bug.cgi?id=151>`__.
+
+This limitation has disappeared with Orthanc 1.9.2 and
+PostgreSQL/MySQL plugins 4.0, were the database engine was fully
+rewritten.
+
+
+.. _multiple-writers:
+
+Concurrent accesses to the DB in Orthanc >= 1.9.2
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Also note that the core of Orthanc does not currently support the replay
-of database transactions, which is necessary to deal with conflicts
-between several instances of Orthanc that would simultaneously write
-to the database.  As a consequence, as of Orthanc 1.9.1, when connecting multiple
-Orthanc to a single database by setting ``Lock`` to ``false``, there
-should only be one instance of Orthanc acting as a writer and all the
-other instances of Orthanc acting as readers only. Be careful to set
-the option ``SaveJobs`` to ``false`` in the configuration file of all
-the instances of Orthanc acting as readers.
+In **Orthanc 1.9.2 and PostgreSQL/MySQL plugins 4.0**, the database
+engine of Orthanc was rewritten from scratch to allow multiple
+writers/readers to share the same database. This new feature
+necessitated a full refactoring of the database engine, so as to
+replay transactions in the case of collisions between concurrent
+transactions to the database.
+
+Furthermore, one Orthanc server can also manage several connections to
+PostgreSQL or MySQL, in order to improve performance by adding
+concurrency. Read-only database transactions are also distinguished
+from read-write transactions in order for the database engine to
+further optimize the patterns of access.
+
+Summarizing, the **multiple readers/writers** is now possible. Here
+is a drawing representing a possible deployment:
+
+.. image:: ../images/2021-04-22-MultipleWriters.png
+           :align: center
+           :width: 500px
+
+Care must be taken to the following aspects:
 
-A refactoring is needed to improve the core of Orthanc in that
-respect, for which we are looking for funding from the
-industry. Some issues reported in our bug tracker call for this
-refactoring: `issue 83
-<https://bugs.orthanc-server.com/show_bug.cgi?id=83>`__, `issue 121
-<https://bugs.orthanc-server.com/show_bug.cgi?id=121>`__, `issue 151
-<https://bugs.orthanc-server.com/show_bug.cgi?id=151>`__.
+* Orthanc 1.9.2 must be combined with a database plugin that supports
+  multiple writers. This is the case of the PostgreSQL and MySQL
+  plugins with version >= 4.0. The built-in SQLite database **does
+  not** support multiple writers.
+  
+* Concurrent access can result in so-called `non-serializable
+  transactions
+  <https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable>`__
+  if two separate database transactions modify the database at the
+  same time (cf. ``ErrorCode_DatabaseCannotSerialize`` in the source
+  code of Orthanc). Orthanc will **automatically replay such
+  transactions** a certain number of times (waiting 100ms more between
+  each retry), until the transactions succeed. The plugins provide an
+  option to control the maximum number of retries. If the maximum
+  number of retries is exceeded, the ``503 Service Unavailable`` HTTP
+  error is raised (server overloaded because of unsuccessful retries
+  of concurrent transactions).
+
+* If a higher-level application **modifies metadata and/or
+  attachments** in the presence of multiple writers, Orthanc provides
+  a :ref:`revision mechanism <revisions>` to prevent concurrent
+  updates.
+
 
 
 Latency
 ^^^^^^^
 
-As of Orthanc 1.9.1, Orthanc still performs quite a large number of small
-SQL requests.  A simple request to a route like ``/studies/{id}`` can trigger
-6 SQL queries.
+For some queries to the database, Orthanc performs several small SQL
+requests. For instance, a request to a route like ``/studies/{id}``
+can trigger 6 SQL queries. Given these round-trips between Orthanc and
+the DB server, it's important for the **network latency to be as small
+as possible**. For instance, if your latency is 20ms, a single request
+to ``/studies/{id}`` might take 120ms. Typically, a latency of 1-4 ms
+is expected to have correct performances.
 
-This is not an ideal situation and this might be addressed 
-in a future larger DB refactoring (the most time-consuming queries have already
-been optimized).  Given the large number of round-trips
-between Orthanc and the DB server, it's important that the latency is reduced
-as much as possible.  I.e, if deploying Orthanc in a cloud infrastructure,
-make sure that the DB server and Orthanc VMs are located in the same datacenter.
+As a consequence, if deploying Orthanc in a cloud infrastructure, make
+sure that the DB server and Orthanc VMs are located in the **same
+datacenter**. Note that most of the time-consuming queries have
+already been optimized, and that future versions of Orthanc SDK might
+aggregate even more SQL requests.
 
-Typically, a latency of 1-4 ms is expected to have correct performances.  If your
-latency is 20ms, a simple request to ``/studies/{id}`` might spend 120ms in 
-round-trip alone.
+Starting with Orthanc 1.9.2, and PostgreSQL/MySQL index plugins 4.0,
+Orthanc can also be configured to handle **multiple connections to the
+database server** by setting the ``IndexConnectionsCount`` to a value
+greater than ``1``. This allows concurrent accesses to the database,
+which avoids to sequentially wait for a database transaction to be
+concluded before starting another one. Having multiple connections
+makes the latency problem much less important.
 
 
+Slow deletions
+^^^^^^^^^^^^^^
 
+Deleting large studies can take some time, because removing a large
+number of files from a filesystem can be an expensive operation (which
+might sound counterintuitive).
 
+It is possible to create an :ref:`storage area plugin
+<creating-plugins>` that delays the actual deletion from the
+filesystem. The plugin would maintain a queue (e.g. as a SQLite
+database) of files to be removed. The actual deletion from the
+filesystem would be done asynchronously in a separate thread.
+
+We are looking for funding from the industry to implement such a
+plugin.
Binary file Sphinx/source/images/2021-04-22-MultipleWriters.png has changed
--- a/Sphinx/source/plugins/mysql.rst	Fri Apr 23 10:34:13 2021 +0200
+++ b/Sphinx/source/plugins/mysql.rst	Fri Apr 23 15:47:14 2021 +0200
@@ -13,6 +13,9 @@
 For general information, check out the `official homepage of the
 plugins <https://www.orthanc-server.com/static.php?page=mysql>`__.
 
+For information about scalability, make sure to read the section about
+:ref:`multiple writers in large-scale deployments <multiple-writers>`.
+
 **Warning:** According to `this thread on our discussion group
 <https://groups.google.com/d/msg/orthanc-users/yV3LSTh_TjI/Fb4ShaYMBAAJ>`__,
 the MySQL/MariaDB plugins require MySQL 8.x if running on Microsoft
@@ -100,9 +103,12 @@
       "Username" : "orthanc",
       "Password" : "orthanc",
       "EnableSsl" : false,     // force SSL connections
-      "SslVerifyServerCertificates": true, // verify server certificates if EnableSsl is true
-      "SslCACertificates": "", // path to CA certificates to validate servers
-      "Lock" : true            // See section about Locking
+      "SslVerifyServerCertificates": true, // Verify server certificates if EnableSsl is true
+      "SslCACertificates": "",             // Path to CA certificates to validate servers
+      "Lock" : true,                       // See section about Locking
+      "MaximumConnectionRetries" : 10,     // New in release 3.0
+      "ConnectionRetryInterval" : 5,       // New in release 3.0
+      "IndexConnectionsCount" : 1          // New in release 4.0
     },
     "Plugins" : [
       "/home/user/orthanc-databases/BuildMySQL/libOrthancMySQLIndex.so",
@@ -165,6 +171,34 @@
 configuration of the MySQL plugins. They are documented below.
 
 
+.. _mysql-multiple-writers:
+
+Multiple writers or connections
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Starting with Orthanc 1.9.2 and MySQL 4.0, it is possible to use
+:ref:`multiple writers or connections in large-scale deployments
+<multiple-writers>`. Here is the list of configuration that control
+this behavior:
+
+* ``Lock`` must be set to ``false`` (cf. :ref:`below <mysql-lock>`)
+
+* ``MaximumConnectionRetries`` governs how many times Orthanc tries to
+  connect to the database, as well as how many times Orthanc replays
+  transactions to deal with collisions between multiple writers.
+
+* ``IndexConnectionsCount`` controls the number of connections from
+  the index plugin to the MySQL database. It is set to ``1`` by
+  default, which corresponds to the old behavior of Orthanc <= 1.9.1.
+
+* ``ConnectionRetryInterval`` is only used when opening one database
+  connection to MySQL.
+
+* As of release 4.0, the MySQL plugin does **not** support yet the
+  :ref:`revision mechanism <revisions>` to protect metadata and
+  attachments from concurrent modifications.
+
+
 Locking
 ^^^^^^^
 
--- a/Sphinx/source/plugins/postgresql.rst	Fri Apr 23 10:34:13 2021 +0200
+++ b/Sphinx/source/plugins/postgresql.rst	Fri Apr 23 15:47:14 2021 +0200
@@ -13,6 +13,9 @@
 For general information, check out the `official homepage of the
 plugins <https://www.orthanc-server.com/static.php?page=postgresql>`__.
 
+For information about scalability, make sure to read the section about
+:ref:`multiple writers in large-scale deployments <multiple-writers>`.
+
 
 
 Compilation
@@ -121,7 +124,10 @@
       "Database" : "orthanc",
       "Username" : "orthanc",
       "Password" : "orthanc",
-      "EnableSsl" : false     // New in version 3.0
+      "EnableSsl" : false,               // New in release 3.0
+      "MaximumConnectionRetries" : 10,   // New in release 3.0
+      "ConnectionRetryInterval" : 5,     // New in release 3.0
+      "IndexConnectionsCount" : 1        // New in release 4.0
     },
     "Plugins" : [
       "/home/user/orthanc-databases/BuildPostgreSQL/libOrthancPostgreSQLIndex.so",
@@ -210,6 +216,35 @@
 configuration of the PostgreSQL plugins. They are documented below.
 
 
+.. _postgresql-multiple-writers:
+
+Multiple writers or connections
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Starting with Orthanc 1.9.2 and PostgreSQL 4.0, it is possible to use
+:ref:`multiple writers or connections in large-scale deployments
+<multiple-writers>`. Here is the list of configuration that control
+this behavior:
+
+* ``Lock`` must be set to ``false`` (cf. :ref:`below <postgresql-lock>`)
+
+* ``MaximumConnectionRetries`` governs how many times Orthanc tries to
+  connect to the database, as well as how many times Orthanc replays
+  transactions to deal with collisions between multiple writers.
+
+* ``IndexConnectionsCount`` controls the number of connections from
+  the index plugin to the PostgreSQL database. It is set to ``1`` by
+  default, which corresponds to the old behavior of Orthanc <= 1.9.1.
+
+* ``ConnectionRetryInterval`` is only used when opening one database
+  connection to PostgreSQL.
+
+* The PostgreSQL plugin supports the :ref:`revision mechanism
+  <revisions>` to protect metadata and attachments from concurrent
+  modifications.
+
+  
+
 .. _postgresql-lock:
 
 Locking
@@ -222,6 +257,7 @@
 <https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS>`__)
 to prevent other instances of Orthanc from using the same PostgreSQL
 database. If you want several instances of Orthanc to share the same
+database or if you need multiple connections to the PostgreSQL
 database, set the ``Lock`` option to ``false`` in the configuration
 file::