.. Changeset "debugging encodings", author: Alain Mazy <am@osimis.io>,
   date: Fri, 29 Oct 2021 18:55:51 +0200

.. _debugging_encodings:

Debugging encoding issues (SpecificCharacterSet)
================================================

.. contents::

.. highlight:: bash

Orthanc does not display the PatientName correctly
--------------------------------------------------

If your DICOM files are valid, Orthanc should display all strings correctly, both
in the UI and in the REST API, where all strings are converted to UTF-8.

However, it can still be useful to understand what is wrong in your files, so
that you can either configure your modality correctly or fix your files once
they have been stored in Orthanc.

**Example 1**: a DICOM file is sent to Orthanc with SpecificCharacterSet set to ``ISO_IR 100``
(Latin1). The PatientName is expected to be ``ccžšd^CCŽŠÐ``, but Orthanc displays a garbled value
in which ``ž``, ``š``, ``Ž`` and ``Š`` are not rendered correctly.
If you open the DICOM file in a hex editor and search for the PatientName, you'll find this sequence
of bytes: ``63 63 9e 9a 64 5e 43 43 8e 8a d0``. By checking the `Latin1 code page
<https://en.wikipedia.org/wiki/ISO/IEC_8859-1>`__, you realise that ``9e``, ``9a``, ``8e`` and ``8a``
are not valid Latin1 characters: the ``80`` to ``9f`` range does not contain any printable character.
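
If you prefer the command line to a hex editor, the check above can be sketched in bash.
The ``find_non_latin1_bytes`` function below is a hypothetical helper, not part of Orthanc:
it takes the bytes as space-separated hex values (as copied from the hex editor) and prints
the ones that fall in the ``80`` to ``9f`` range, which has no printable Latin1 characters.

.. code-block:: bash

   # find_non_latin1_bytes: hypothetical helper (not part of Orthanc).
   # Prints the hex bytes in the 0x80-0x9f range, which contains no
   # printable Latin1 (ISO 8859-1) character. Requires bash (16#.. syntax).
   find_non_latin1_bytes() {
     local byte
     local -a found=()
     for byte in "$@"; do
       # 16#$byte interprets the argument as a base-16 number
       if (( 16#$byte >= 0x80 && 16#$byte <= 0x9f )); then
         found+=("$byte")
       fi
     done
     echo "${found[*]}"
   }

   # PatientName bytes from the example; prints: 9e 9a 8e 8a
   find_non_latin1_bytes 63 63 9e 9a 64 5e 43 43 8e 8a d0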

In this case, these bytes were most likely generated on a Windows system using the default `Windows 1252
<https://en.wikipedia.org/wiki/Windows-1252>`__ encoding, in which ``9e`` is ``ž``, ``9a`` is ``š``,
``8e`` is ``Ž`` and ``8a`` is ``Š``.
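
You can confirm this hypothesis by decoding the raw bytes with ``iconv``, assuming it is
available on your system (it ships with glibc on most Linux distributions). In this bash
sketch, ``decode_as`` is a hypothetical helper: decoding as Windows 1252 should give back the
expected name, while decoding as Latin1 turns ``9e``, ``9a``, ``8e`` and ``8a`` into control
characters.

.. code-block:: bash

   # Raw PatientName bytes from the example, as bash printf escapes.
   raw='\x63\x63\x9e\x9a\x64\x5e\x43\x43\x8e\x8a\xd0'

   # decode_as ENCODING: hypothetical helper that converts the raw bytes
   # to UTF-8, assuming they were written in ENCODING.
   decode_as() {
     printf '%b' "$raw" | iconv -f "$1" -t UTF-8
     echo
   }

   decode_as WINDOWS-1252   # prints the expected name: ccžšd^CCŽŠÐ
   decode_as ISO-8859-1     # 9e, 9a, 8e, 8a become C1 control characters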

How to solve it? It is highly recommended to fix the issue upstream of Orthanc: in your RIS,
worklist server or modality. However, if you cannot fix it there, you may still try to fix the
files once they have been stored in Orthanc. You can get inspiration from this
`Lua script <https://bitbucket.org/osimis/orthanc-setup-samples/src/master/lua-samples/sanitizeInvalidUtf8TagValues.lua>`__,
which fixes invalid UTF-8 characters.
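
Before deciding to fix a stored value, it can help to check whether its bytes form valid UTF-8,
since invalid UTF-8 is what the Lua script above targets. A minimal bash sketch, assuming
``iconv`` is available; ``is_valid_utf8`` is a hypothetical helper that relies on ``iconv``
exiting with a non-zero status when it meets an invalid byte sequence:

.. code-block:: bash

   # is_valid_utf8: hypothetical helper; succeeds only if stdin is valid UTF-8.
   # iconv exits with a non-zero status on an invalid byte sequence.
   is_valid_utf8() {
     iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
   }

   # The Windows 1252 bytes of the example PatientName are not valid UTF-8,
   # so this prints "invalid UTF-8".
   if printf '%b' '\x63\x63\x9e\x9a\x64\x5e\x43\x43\x8e\x8a\xd0' | is_valid_utf8; then
     echo "valid UTF-8"
   else
     echo "invalid UTF-8"
   fi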