
Character encoding



This article explains which character encoding is used for the input and output files.

Decoding of the input files


docGenerator tries to infer the encoding of each XML file so that characters are escaped correctly in the HTML result. However, there is no way other than guessing to determine the encoding of a text file. An alternative is to force the encoding by setting it explicitly in the XML declaration of the file. For example[1]:
   <?xml version="1.0" encoding="ISO-8859-1"?>
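
To check which encoding a file actually declares, the declaration can be read back with the standard StAX API. The following is only a minimal sketch: the class name DeclaredEncoding and the helper declaredEncoding are hypothetical and are not part of the generator.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DeclaredEncoding {
    /**
     * Hypothetical helper: returns the encoding named in the XML declaration,
     * or null if the declaration does not name one.
     */
    static String declaredEncoding(byte[] xmlContent) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xmlContent));
            try {
                // A freshly created reader is positioned on START_DOCUMENT,
                // where the encoding of the XML declaration can be queried.
                return reader.getCharacterEncodingScheme();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Not a well-formed XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><article/>";
        byte[] content = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Declared encoding: " + declaredEncoding(content));
    }
}
```

The bytes are passed to the parser rather than a String so that the parser sees the file exactly as it is stored on disk.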

Guessing the file encoding

The generator will try to guess the encoding of each input file in order to decode non-ASCII characters correctly. Please note that there is no way other than guessing to determine the encoding of a text file, hence it is always better to encode article files directly in UTF-8.
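
To illustrate why this can only be a guess, here is one common heuristic: try a strict UTF-8 decoding and fall back to ISO-8859-1 if the bytes are not valid UTF-8. This is a sketch under that assumption, not the algorithm the generator actually uses; the class EncodingGuess and the method guess are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {
    /**
     * Hypothetical heuristic: UTF-8 if the bytes decode strictly as UTF-8,
     * ISO-8859-1 otherwise (an assumed fallback, chosen for illustration).
     */
    static Charset guess(byte[] content) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // The bytes are not valid UTF-8, so fall back to a single-byte encoding.
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to stay ASCII-safe
        System.out.println(guess(text.getBytes(StandardCharsets.UTF_8)));      // UTF-8
        System.out.println(guess(text.getBytes(StandardCharsets.ISO_8859_1))); // ISO-8859-1
    }
}
```

A pure ASCII file is valid in both encodings, and some ISO-8859-1 byte sequences happen to be valid UTF-8 as well, which is why a heuristic of this kind can never be fully reliable.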

Note that in XML files, it is possible to set the encoding of the file explicitly in the XML declaration. For example:
  <?xml version="1.0" encoding="ISO-8859-1"?>

Encoding of the output files

You do not need to worry about the platform default character encoding: the generator makes sure that all output files are encoded in UTF-8, whatever the encoding of the input files or the platform default character encoding.

This is true for XML files, JSON files, and even HTML files.
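
In Java, independence from the platform default is usually obtained by naming the charset explicitly when creating the writer. The sketch below shows that general technique only; it is not taken from the generator's source code, and the names Utf8Output and writeUtf8 are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Output {
    /** Hypothetical helper: writes the content as UTF-8, ignoring the platform default. */
    static void writeUtf8(OutputStream out, String content) {
        // Naming the charset here means file.encoding has no influence on the result.
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(out, "\u00e9"); // the single character "é"
        // In UTF-8, "é" is encoded as the two bytes 0xC3 0xA9.
        System.out.println(out.toByteArray().length); // 2
    }
}
```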

Before 1.2.7.2 release

Prior to release 1.2.7.2, the generator used the default character encoding of the platform, which could lead to problems. To avoid them, you might have to specify the default encoding yourself when launching Java.

For example, for Ant:
   <java classname="org.docgene.main.DocGenerator">
   ...
      <jvmarg line="-Dfile.encoding=UTF-8" />
   </java>
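
To verify which default encoding a given JVM actually uses, for instance to confirm that the setting above was taken into account, a small check such as the following can be run with the same JVM arguments. It is only an illustrative sketch, and the class name DefaultEncodingCheck is hypothetical.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    /** Returns true if the JVM default charset is UTF-8. */
    static boolean defaultIsUtf8() {
        return "UTF-8".equals(Charset.defaultCharset().name());
    }

    public static void main(String[] args) {
        // The system property as set on the command line or derived from the platform.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset that readers and writers use when none is specified.
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("default is UTF-8: " + defaultIsUtf8());
    }
}
```

Note that since Java 18 the default charset is UTF-8 on all platforms unless configured otherwise, so this concern mostly applies to older Java versions.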

Notes

  1. ^ Note that "ISO-8859-1" is a common legacy encoding for French text

Categories: general

docJGenerator Copyright (c) 2016-2023 Herve Girod. All rights reserved.