Bug 56 - Case-insensitive matching over accents
Status: RESOLVED FIXED
Alias: None
Product: Orthanc
Classification: Unclassified
Component: Orthanc Core
Version: unspecified
Hardware: All All
Importance: --- normal
Assignee: Sébastien Jodogne
 
Reported: 2020-06-29 15:12 CEST by Sébastien Jodogne
Modified: 2020-06-29 15:27 CEST

Description Sébastien Jodogne 2020-06-29 15:12:46 CEST
[BitBucket user: Sébastien Jodogne]
[BitBucket date: 2017-07-11.19:44:01]

When searching over DICOM tags, case-insensitive matching currently only works for ASCII characters. For instance, the following request returns 1 match (given that the `PatientName` of the sample file is equal to `Test-éüäöò`):

```
$ storescu  -xy localhost 4242 ~/orthanc-tests/Database/Encodings/Lena-latin1.dcm
$ findscu localhost 4242 -P -k 'QueryRetrieveLevel=Patient' -k 'PatientName=TEST*'
I: ---------------------------
I: Find Response: 1 (Pending)
```

However, if the lowercase `é` in the query is replaced with an uppercase `É`, nothing is retrieved:

```
$ findscu localhost 4242 -P -k 'QueryRetrieveLevel=Patient' -k '0008,0005=ISO_IR 192' -k 'PatientName=Test-é*'
I: ---------------------------
I: Find Response: 1 (Pending)

$ findscu localhost 4242 -P -k 'QueryRetrieveLevel=Patient' -k '0008,0005=ISO_IR 192' -k 'PatientName=Test-É*'
<no match>
```

*Note that this sample code makes the assumption that the codepage of the console is set to UTF-8 (default under Linux, use `chcp 65001` [under Windows](https://ss64.com/nt/chcp.html)).*

The fix would consist of calling `boost::locale::to_upper()` with a proper locale when matching strings in the `HierarchicalMatcher` class. Check out: http://www.boost.org/doc/libs/1_64_0/libs/locale/doc/html/conversions.html
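
The difference between ASCII-only and Unicode-aware upper-casing can be sketched outside of Orthanc. The Python snippet below is only an illustration, not the `HierarchicalMatcher` code: the helper names (`ascii_upper`, `wildcard_to_regex`, `matches`) are hypothetical, and Python's `str.upper()` merely stands in for a locale-aware conversion such as `boost::locale::to_upper()`.

```python
import re


def ascii_upper(text: str) -> str:
    """Upper-case only the letters a-z, leaving accented letters untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


def wildcard_to_regex(pattern: str) -> str:
    """Translate the DICOM wildcards '*' and '?' into a regular expression."""
    parts = []
    for c in pattern:
        if c == "*":
            parts.append(".*")
        elif c == "?":
            parts.append(".")
        else:
            parts.append(re.escape(c))
    return "".join(parts)


def matches(pattern: str, value: str, fold) -> bool:
    """Match 'value' against 'pattern' after folding both with 'fold'."""
    regex = wildcard_to_regex(fold(pattern))
    return re.fullmatch(regex, fold(value), flags=re.DOTALL) is not None


name = "Test-éüäöò"
print(matches("TEST*", name, ascii_upper))    # True: only ASCII letters differ in case
print(matches("Test-É*", name, ascii_upper))  # False: 'é' is never folded to 'É'
print(matches("Test-É*", name, str.upper))    # True: Unicode-aware folding
```

The second call reproduces the behavior reported above, and the third shows the outcome expected once both strings are folded with a Unicode-aware conversion.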

This issue affects both DICOM C-Find queries, and `/tools/find` queries.

Reference: https://groups.google.com/d/msg/orthanc-users/WXF6swIzbCU/jmQglkCwBAAJ
Comment 1 Sébastien Jodogne 2020-06-29 15:20:31 CEST
[BitBucket user: Sébastien Jodogne]
[BitBucket date: 2017-07-14.15:29:23]

Fix issue #56 (case-insensitive matching over accents)

→ https://hg.orthanc-server.com/orthanc/changeset/a47d07b5b39f
Comment 2 Sébastien Jodogne 2020-06-29 15:27:20 CEST
[BitBucket user: AlexanderM]
[BitBucket date: 2020-03-22.00:03:16]

I’m having the same troubles in latest Orthanc with Cyrillic letters, Postgres and UTF-8 as DefaultEncoding - the search is always case sensitive.
Comment 3 Sébastien Jodogne 2020-06-29 15:27:21 CEST
[BitBucket user: Sébastien Jodogne]
[BitBucket date: 2020-03-22.07:58:34]

Dear @AlexanderM, please check 2 things: (1) The value of the option “CaseSensitivePN” should be set to “false” (which is not the default value), and (2) You are doing a query against a DICOM tag whose VR (value representation) is “PN” (person name), as this is the only VR for which case-insensitive search is available in the DICOM standard.

If the problem is still present, please provide a minimal working example for us to reproduce your issue (sample DICOM file + command line + expected result): https://book.orthanc-server.com/users/support.html
Comment 4 Sébastien Jodogne 2020-06-29 15:27:24 CEST
[BitBucket user: AlexanderM]
[BitBucket date: 2020-03-22.22:38:39]

Ok, thanks! I found that my locale was causing this issue on the DICOM interaction - now it is fixed (it was ru_RU.UTF-8, changed to en_US.UTF-8).  
However, I can’t fix it for the DICOMweb REST API: it still performs case-sensitive matching on the /studies endpoint with the "PatientName" parameter. Is this how it is supposed to work, and is it possible to change that behavior?
Comment 5 Sébastien Jodogne 2020-06-29 15:27:25 CEST
[BitBucket user: Sébastien Jodogne]
[BitBucket date: 2020-03-23.06:25:07]

Again, please post full instructions for us to reproduce your issue (Docker setup for locales + sample DICOM files + curl command-line + expected result + observed result): https://book.orthanc-server.com/users/support.html#discussing-a-minimal-working-example
Comment 6 Sébastien Jodogne 2020-06-29 15:27:26 CEST
[BitBucket user: AlexanderM]
[BitBucket date: 2020-03-26.00:03:38]

Docker environment locale setting:

```
- LOCALE=en_US.UTF-8
```

Orthanc.json configuration:

```
"CaseSensitivePN" : false,
```



When I query via the DICOM protocol everything seems fine - both lower-case and upper-case queries return the same results (I change only the first letter for the example):

```
findscu localhost 4242 -P --aetitle 'OHIFDCM' -k 'QueryRetrieveLevel=Patient' -k 'PatientName=Гусева*'
findscu localhost 4242 -P --aetitle 'OHIFDCM' -k 'QueryRetrieveLevel=Patient' -k 'PatientName=гусева*'
```

Result:

```
I: ---------------------------
I: Find Response: 1 (Pending)
I:
I: # Dicom-Data-Set
I: # Used TransferSyntax: Little Endian Explicit
I: (0008,0005) CS [ISO_IR 192]                             #  10, 1 SpecificCharacterSet
I: (0008,0052) CS [Patient ]                               #   8, 1 QueryRetrieveLevel
I: (0010,0010) PN [Гусева Е.Н^Амур ]           #  28, 1 PatientName
I:
```



When I do a same-case query using curl:

```
curl 'http://localhost:8042/dicom-web/studies?PatientName=%D0%93%D1%83%D1%81%D0%B5%D0%B2%D0%B0*&limit=100&offset=0&fuzzymatching=true&includefield=00081030%2C00080060&StudyDate=20200319-20200326'
```

I get the result as expected:

```
[{
   "00080005" : {
      "Value" : [ "ISO_IR 192" ],
      "vr" : "CS"
   },
   "00080020" : {
      "Value" : [ "20200320" ],
      "vr" : "DA"
   },
   "00080030" : {
      "Value" : [ "081022" ],
      "vr" : "TM"
   },
   "00080050" : {
      "vr" : "SH"
   },
   "00080060" : {
      "Value" : [ "DX" ],
      "vr" : "CS"
   },
   "00080061" : {
      "Value" : [ "DX" ],
      "vr" : "CS"
   },
   "00080090" : {
      "vr" : "PN"
   },
   "00081030" : {
      "Value" : [ "Скелет Крупный 25-45 kg" ],
      "vr" : "LO"
   },
   "00081190" : {
      "Value" : [
         "http://localhost/dicom-web/studies/1.3.51.0.7.11718630008.47994.56651.37198.64796.25453.11641"
      ],
      "vr" : "UR"
   },
   "00100010" : {
      "Value" : [
         {
            "Alphabetic" : "Гусева Е.Н^Амур"
         }
      ],
      "vr" : "PN"
   },
   "00100020" : {
      "Value" : [ "8 годиков" ],
      "vr" : "LO"
   },
   "00100030" : {
      "vr" : "DA"
   },
   "00100040" : {
      "Value" : [ "M" ],
      "vr" : "CS"
   },
   "0020000D" : {
      "Value" : [ "1.3.51.0.7.11718630008.47994.56651.37198.64796.25453.11641" ],
      "vr" : "UI"
   },
   "00200010" : {
      "Value" : [ "2003200806029630" ],
      "vr" : "SH"
   },
   "00201206" : {
      "Value" : [ 1 ],
      "vr" : "IS"
   },
   "00201208" : {
      "Value" : [ 1 ],
      "vr" : "IS"
   }
}]
```



But when I query for the lower-case version (first letter is different):

```
curl 'http://localhost:8042/dicom-web/studies?PatientName=%D0%B3%D1%83%D1%81%D0%B5%D0%B2%D0%B0*&limit=100&offset=0&fuzzymatching=true&includefield=00081030%2C00080060&StudyDate=20200319-20200326'
```

There is no result at all (unexpected!):

```
[]
```
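
To make the two curl requests easier to compare, the percent-encoded `PatientName` values can be decoded. The following Python sketch is an illustration added for readability (the variable names are hypothetical); it only uses the standard `urllib.parse` module and the two query strings quoted above.

```python
from urllib.parse import quote, unquote

# PatientName values copied from the two curl commands above
upper_query = "%D0%93%D1%83%D1%81%D0%B5%D0%B2%D0%B0*"
lower_query = "%D0%B3%D1%83%D1%81%D0%B5%D0%B2%D0%B0*"

upper_name = unquote(upper_query)  # percent-decoding assumes UTF-8 by default
lower_name = unquote(lower_query)

print(upper_name)  # Гусева*
print(lower_name)  # гусева*

# The two patterns differ only by the case of their first letter
print(upper_name == lower_name)                        # False
print(upper_name.casefold() == lower_name.casefold())  # True

# Re-encoding the lower-case pattern yields the second query string
print(quote("гусева*", safe="*") == lower_query)       # True
```

A server that folds case for Cyrillic letters would therefore be expected to return the same study for both requests.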



[Sample DICOM file is here.](https://www.dropbox.com/s/bbgicb908qi7c34/DX000000.dcm?dl=0)



So, as far as I can see, dicom-web doesn’t take “CaseSensitivePN” into consideration, at least for Cyrillic characters.
Comment 7 Sébastien Jodogne 2020-06-29 15:27:27 CEST
[BitBucket user: Sébastien Jodogne]
[BitBucket date: 2020-03-26.06:57:53]

Re-opening. This is fixed in the Orthanc core, but not in the DICOMweb QIDO-RS server.
Comment 8 Sébastien Jodogne 2020-06-29 15:27:28 CEST
[BitBucket user: Sébastien Jodogne]
[BitBucket date: 2020-03-26.07:01:26]

Thanks for your instructions! I managed to reproduce the issue, and I have just added an integration test: https://hg.orthanc-server.com/orthanc-tests/changeset/7fa5c7a03137cccf8ea77f25fdaeee2dfda9eb3b

Note that the `DefaultEncoding` configuration option of Orthanc must be set to `Utf8` to reproduce. Here is the minimal configuration file:

```
{
  "CaseSensitivePN" : false,
  "Plugins" : [ "." ],
  "DefaultEncoding" : "Utf8",
  "DicomModalities" : { "sample" : [ "OHIFDCM", "localhost", 4242 ] }
}
```
Comment 9 Sébastien Jodogne 2020-06-29 15:27:29 CEST
[BitBucket user: Sébastien Jodogne]
[BitBucket date: 2020-03-26.07:23:13]

It turns out this was just a configuration issue. Setting the option `QidoCaseSensitive` to `false` fixes your problem: https://book.orthanc-server.com/plugins/dicomweb.html#server-related-options

I however agree that this behavior is not intuitive. I have therefore pushed a change so that the value of the `QidoCaseSensitive` configuration option of DICOMweb, if not explicitly set, corresponds to the value of the `CaseSensitivePN` option of the Orthanc core: https://hg.orthanc-server.com/orthanc-dicomweb/changeset/1b09b29434105c53b14d838bcc9019d5322ca4a0
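
The fallback described above can be sketched as follows. This is a Python illustration of the rule only, not the plugin's C++ code: the function name `effective_qido_case_sensitive` and its `core_default` parameter are hypothetical, and the sketch assumes that the DICOMweb options live in a `DicomWeb` section of the configuration file.

```python
import json


def effective_qido_case_sensitive(config: dict, core_default: bool) -> bool:
    """Return the case sensitivity that QIDO-RS would use under the new rule.

    'core_default' is the value assumed for "CaseSensitivePN" when the
    configuration does not set it (hypothetical parameter).
    """
    dicomweb = config.get("DicomWeb", {})  # assumed section name
    if "QidoCaseSensitive" in dicomweb:
        # An explicit DICOMweb setting always wins
        return bool(dicomweb["QidoCaseSensitive"])
    # Otherwise, inherit the value of the Orthanc core option
    return bool(config.get("CaseSensitivePN", core_default))


# Minimal configuration file from the comment above
config = json.loads("""
{
  "CaseSensitivePN" : false,
  "Plugins" : [ "." ],
  "DefaultEncoding" : "Utf8",
  "DicomModalities" : { "sample" : [ "OHIFDCM", "localhost", 4242 ] }
}
""")

inherited = effective_qido_case_sensitive(config, core_default=True)

config["DicomWeb"] = {"QidoCaseSensitive": True}
overridden = effective_qido_case_sensitive(config, core_default=True)

print(inherited, overridden)  # False True
```

With the minimal configuration, QIDO-RS inherits `false` from `CaseSensitivePN`; setting `QidoCaseSensitive` explicitly overrides the inherited value.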
Comment 10 Sébastien Jodogne 2020-06-29 15:27:32 CEST
[BitBucket user: AlexanderM]
[BitBucket date: 2020-03-27.06:22:52]

Thanks, it works!