changeset 786:d050289fd0b3

debugging encodings
author Alain Mazy <am@osimis.io>
date Fri, 29 Oct 2021 18:55:51 +0200
parents 4ff2c6ff472a
children b8171b4046da
files Sphinx/source/faq.rst Sphinx/source/faq/debugging-encodings.rst
diffstat 2 files changed, 35 insertions(+), 0 deletions(-) [+]
line wrap: on
line diff
--- a/Sphinx/source/faq.rst	Fri Oct 15 11:24:15 2021 +0200
+++ b/Sphinx/source/faq.rst	Fri Oct 29 18:55:51 2021 +0200
@@ -68,3 +68,4 @@
    faq/matlab.rst
    faq/series-completion.rst
    faq/dcmtk-tricks.rst
+   faq/debugging-encodings.rst
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Sphinx/source/faq/debugging-encodings.rst	Fri Oct 29 18:55:51 2021 +0200
@@ -0,0 +1,34 @@
+.. _debugging_encodings:
+
+Debugging encoding issues (SpecificCharacterSet)
+================================================
+
+.. contents::
+
+.. highlight:: bash
+
+Orthanc does not display the PatientName correctly
+--------------------------------------------------
+
+If your DICOM files are valid, Orthanc should display all strings correctly both
+in the UI and in the Rest API in which all strings are converted to UTF-8.
+
+However, it might still be usefull to understand what's wrong in your files
+such that you can possibly fix your files once they have been stored in Orthanc
+or configure your modality correctly.
+
+**Example 1**: a DICOM file is sent to Orthanc with SpecificCharacterSet set to ``ISO_IR 100``
+(Latin1).  The PatientName is expected to be ``ccžšd^CCŽŠÐ`` but Orthanc displays ``ccžšd^CCŽŠÐ``.
+If you open the DICOM file in an Hex editor and search for the PatientName, you'll find this sequence
+of bytes: ``63 63 9e 9a 64 5e 43 43 8e 8a d0``.  By checking the `Latin1 code page 
+<https://en.wikipedia.org/wiki/ISO/IEC_8859-1>`__, you realise that the ``9e`` and ``9a`` characters
+are not valid Latin1 characters.  
+
+In this case, they have most likely be generated on a Windows system by using the default `Windows 1252 
+<https://en.wikipedia.org/wiki/Windows-1252>`__ encoding in which ``9e`` is ``ž``.
+
+How to solve it ?  It is highly recommended to fix it before Orthanc: in your RIS, worklist server or modality.
+However, if you can not fix it there, you may still try to fix it once the file has been stored in Orthanc.
+You can get inspiration from this `lua script <https://bitbucket.org/osimis/orthanc-setup-samples/src/master/lua-samples/sanitizeInvalidUtf8TagValues.lua>`__ 
+that is fixing invalid UTF-8 characters
+