Bug 156 - CPU starvation on outbound DICOMweb transfers
Summary: CPU starvation on outbound DICOMweb transfers
Status: RESOLVED FIXED
Alias: None
Product: Orthanc
Classification: Unclassified
Component: Plugin - DICOMweb (show other bugs)
Version: unspecified
Hardware: All All
: --- normal
Assignee: Sébastien Jodogne
URL:
Depends on:
Blocks:
 
Reported: 2020-06-29 15:16 CEST by Sébastien Jodogne
Modified: 2020-08-19 12:05 CEST (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sébastien Jodogne 2020-06-29 15:16:12 CEST
[BitBucket user: Thibault Nélis]
[BitBucket date: 2019-10-21.17:06:36]

Outbound DICOMweb transfers utilize all CPU resources.

This results in extremely low throughput. In the following example, an instance which takes about 5 to 6 seconds to transfer via DICOM or Orthanc peering takes as much as 6min 46s to transfer via DICOMweb \(Store Transaction / STOW-RS\).

```
$ ./test large-instance 
== "foo" server information
Address: 172.21.0.2
Orthanc version: 1.5.8
DICOMweb plugin version: "1.0"

== "bar" server information
Address: 172.21.0.3
Orthanc version: 1.5.8
DICOMweb plugin version: "1.0"

== Environment preparation
Uploading "large-instance" ...
Instance ID: d8996bb1-0ce7e959-e33317c7-eead9f4f-dc934f15
Study ID: efe74f24-d89b716e-5e3d93cd-8081e0c1-7abfdd42
Instance size: 635735934
Reported size: 635735934
Reported hash: 57fb4f1ae04bd32cf6edfec016946730

== Test 1: DICOM, no interruption
Initiating transfer ...
{
  "InstancesCount": 1,
  "FailedInstancesCount": 0
}

real    0m5.768s
user    0m0.218s
sys 0m0.028s
Verifying data integrity ...
OK

== Test 3: DICOMweb, no interruption
Initiating transfer ...
{
  "InstancesCount": "1"
}

real    6m46.399s
user    0m0.204s
sys 0m0.032s
Verifying data integrity ...
OK

== Test 5: Orthanc peering, no interruption
Initiating transfer ...
{
  "InstancesCount": 1,
  "FailedInstancesCount": 0
}

real    0m7.629s
user    0m0.212s
sys 0m0.034s
Verifying data integrity ...
OK
```

[Repro](https://bitbucket.org/tn-osimis/orthanc-clean-shutdown-test). Tested on three different \(but similar\) machines.

A quick look at performance counters show the time is spent in-process \(sampled during DICOMweb transfer, does not represent the full transfer\):

```
 Performance counter stats for process id '4691':

         82,940.62 msec task-clock                #    1.004 CPUs utilized          
            20,328      context-switches          #    0.245 K/sec                  
                12      cpu-migrations            #    0.000 K/sec                  
                16      page-faults               #    0.000 K/sec                  
   <not supported>      cycles                                                      
   <not supported>      instructions                                                
   <not supported>      branches                                                    
   <not supported>      branch-misses                                               

      82.595531201 seconds time elapsed
```

Although I haven’t captured anything yet, I had a sense from looking at live stack profiles that the processor was specifically spending a lot of time moving data in memory. \(This is a very hasty observation and is likely incorrect.\)
Comment 1 Sébastien Jodogne 2020-06-29 15:26:24 CEST
[BitBucket user: Michel Rozpendowski]
[BitBucket date: 2019-12-20.13:52:02]

I tested by setting `foo` to send to `bar` in DICOMweb with `ChunkedTransfers` set to `false` and noticed that the performance issue is gone.

No performance issue either with the DICOMweb plugin version 0.6 with Orthanc 1.5.8 and `ChunkedTransfers` set to `true` as the feature was implemented in DICOMweb plugin version 1.0 so the setting is just ignored.

The problem is therefore in the implementation of `ChunkedTransfers` `true` for the DICOMweb plugin version 1.0.
Comment 2 Sébastien Jodogne 2020-06-29 15:26:25 CEST
[BitBucket user: Alain Mazy]
[BitBucket date: 2019-12-20.17:42:55]

this line seems to be the culprit: https://hg.orthanc-server.com/orthanc/file/a648c2d67a6502fe0cac76e540a11566da6adf27/Core/HttpClient.cpp#L243  since there is a huge realloc \+ memcpy eachtime a 65KB chunk is transferred out of the 600MB buffer \(in case of a 600MB instance\).

I’ll arrange that one of these days.
Comment 3 Sébastien Jodogne 2020-06-29 15:26:26 CEST
[BitBucket user: Alain Mazy]
[BitBucket date: 2019-12-23.09:44:41]

this has been fixed in this commit: https://hg.orthanc-server.com/orthanc/changeset/d5740d3b1d67a09e3fa4d545df2080f58324731b

Note that the fix is in Orthanc Core, not the Dicom-web plugin…
Comment 4 Sébastien Jodogne 2020-08-19 12:05:57 CEST
This initial fix could lead to a crash in Orthanc 1.6.0 to 1.7.2.

Orthanc 1.7.3 will include another fix to resolve the crash:
https://hg.orthanc-server.com/orthanc/rev/8c559dd5034b