Escaping non UTF8 characters



By default, non UTF8 characters are not escaped automatically. However, it is possible to force non UTF8 characters (for example French accented characters) to be escaped using the standard XML character reference syntax, for example &#xE9; for é.

In the configuration file

This is configured by the escapeNonUTF8 property. For example:
   escapeNonUTF8=true

On the command-line

This is configured by the escapeNonUTF8 command-line property. For example:
      java -jar docGenerator.jar -input=wiki/input -output=wiki/output -escapeNonUTF8=false

Encoding

Main Article: Character encoding

The generator tries to guess the encoding of each input file in order to decode non-ASCII characters correctly. Note that a plain text file does not record its own encoding, so guessing is the only option; it is therefore always better to encode article files directly in UTF-8.
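
Since the encoding can only be guessed, it can help to check beforehand that an article file really is UTF-8. The following is a minimal sketch, not part of docJGenerator: the class name Utf8Check and the method isValidUtf8 are hypothetical, and only standard JDK classes are used. It decodes the bytes strictly and reports failure instead of silently replacing malformed sequences.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the docJGenerator API.
final class Utf8Check {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw on bad input instead of
        // substituting the replacement character.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Checks every file given on the command line.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ok = isValidUtf8(Files.readAllBytes(Paths.get(name)));
            System.out.println(name + ": " + (ok ? "valid UTF-8" : "not valid UTF-8"));
        }
    }
}
```

A pure ASCII file also passes this check, because ASCII is a subset of UTF-8. A file saved as ISO-8859-1 that contains accented characters will normally fail it.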

Note that in XML files, the encoding of the file can be set explicitly in the XML declaration. For example:
  <?xml version="1.0" encoding="ISO-8859-1"?>

Example

When escaping is enabled, the following text:
      this is a téxt
will have the following result in the HTML file:
      this is a t&#x00E9;xt
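
The transformation in this example can be sketched in a few lines of Java. This is only an illustration of the escaping rule, assuming that every character outside the ASCII range is replaced by a hexadecimal XML character reference padded to at least four digits; it is not the generator's actual implementation, and the class name NonAsciiEscaper and the method escape are hypothetical.

```java
// Hypothetical helper for illustration; not part of the docJGenerator API.
final class NonAsciiEscaper {

    // Replaces every non-ASCII code point with an XML character
    // reference of the form &#xHHHH; and leaves ASCII untouched.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // Iterate by code point so that a character outside the Basic
        // Multilingual Plane yields one reference, not two surrogates.
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("&#x%04X;", cp));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00E9 is the letter e with an acute accent; writing it as a
        // Unicode escape keeps this source file pure ASCII.
        System.out.println(escape("this is a t\u00E9xt"));
        // prints: this is a t&#x00E9;xt
    }
}
```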

Categories: syntax

docJGenerator Copyright (c) 2016-2023 Herve Girod. All rights reserved.