Checking external links



This article explains how external links are checked.

Overview

The checkLinks property specifies that the presence of external links must be checked.

These links can be local to the wiki (for example, an HTML file relative to the wiki itself), or accessed on the Web through the http or https protocol. http and https links are only checked if the additional checkHTTPLinks property is set to true.
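
As an illustration, the combined effect of the two properties can be sketched as follows. This is a minimal sketch, not the tool's code: the LinkCheckPolicy class and its methods are hypothetical names, and it assumes that checkHTTPLinks only has an effect when checkLinks is also true.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the tool): decides whether a link gets checked. */
final class LinkCheckPolicy {
    private final boolean checkLinks;      // mirrors the "checkLinks" property
    private final boolean checkHTTPLinks;  // mirrors the "checkHTTPLinks" property

    LinkCheckPolicy(boolean checkLinks, boolean checkHTTPLinks) {
        this.checkLinks = checkLinks;
        this.checkHTTPLinks = checkHTTPLinks;
    }

    /** True if the href uses the http or https protocol. */
    static boolean isHttp(String href) {
        String lower = href.toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    /** Local links need checkLinks; http(s) links additionally need checkHTTPLinks. */
    boolean shouldCheck(String href) {
        if (!checkLinks) {
            return false;
        }
        return !isHttp(href) || checkHTTPLinks;
    }
}
```

With checkLinks set to true and checkHTTPLinks set to false, shouldCheck accepts "myHTMLFile.html#theAnchor" but rejects "http://my/file.html".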

Algorithm

The process used to check if a link is valid differs depending on the type of the link:
  • If the link is local, the tool checks that the file associated with the link is present, and also that the anchor exists in the file[1]
  • If the link is not local (http or https), the tool will try to access the associated web site to check if the resource associated with the link is present
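
The local case can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's implementation: LocalLinkChecker is a hypothetical name, the anchor is looked up with a naive textual search for id="..." or name="..." instead of a real HTML parser, and links made of an anchor only (such as "#title") are not handled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of the tool): validates a link local to the wiki. */
final class LocalLinkChecker {

    /**
     * Returns true if the file targeted by href exists under baseDir and,
     * when href carries an anchor, the file declares that anchor.
     */
    static boolean isValid(Path baseDir, String href) {
        int hash = href.indexOf('#');
        String filePart = hash < 0 ? href : href.substring(0, hash);
        String anchor = hash < 0 ? "" : href.substring(hash + 1);

        Path file = baseDir.resolve(filePart);
        if (!Files.isRegularFile(file)) {
            return false; // the file itself is missing
        }
        if (anchor.isEmpty()) {
            return true; // link without anchor: the file is enough
        }
        try {
            String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            // Assumption: anchors are written with double quotes, as id="x" or name="x".
            return html.contains("id=\"" + anchor + "\"")
                    || html.contains("name=\"" + anchor + "\"");
        } catch (IOException e) {
            return false; // an unreadable file is treated as a broken link
        }
    }
}
```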


Note that link checking is performed in a background thread pool. Each spawned thread checks all the links for one base URL[2], to avoid checking the existence of the same HTTP URL more than once. This speeds up the check and also avoids some cases where the same link would be accessed twice in quick succession, leading to a "not found" exception.
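
The grouping by base URL can be pictured with the sketch below. It only illustrates the idea and is not the tool's code: GroupedLinkCheck, groupByBaseUrl and checkAll are hypothetical names, and the actual check of one group is left to a callback supplied by the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not part of the tool): one pool task per base URL. */
final class GroupedLinkCheck {

    /** Strips the ref part: "http://my/file.html#title" gives "http://my/file.html". */
    static String baseUrl(String link) {
        int hash = link.indexOf('#');
        return hash < 0 ? link : link.substring(0, hash);
    }

    /** Groups the links by base URL, keeping the order of first appearance. */
    static Map<String, List<String>> groupByBaseUrl(List<String> links) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String link : links) {
            groups.computeIfAbsent(baseUrl(link), k -> new ArrayList<>()).add(link);
        }
        return groups;
    }

    /**
     * Submits one task per base URL, so each base URL is handled by a single task.
     * Returns false if the pool did not finish within timeoutMs.
     */
    static boolean checkAll(List<String> links, int poolSize, long timeoutMs,
                            Consumer<List<String>> checkGroup) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (List<String> group : groupByBaseUrl(links).values()) {
            pool.execute(() -> checkGroup.accept(group));
        }
        pool.shutdown(); // no new tasks; already submitted tasks keep running
        boolean finished = pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        if (!finished) {
            pool.shutdownNow(); // give up on the remaining checks
        }
        return finished;
    }
}
```

In this sketch the poolSize and timeoutMs arguments play the roles that the "checkHTTPLinksPool" and "checkHTTPLinksTimeOut" properties play in the tool's configuration.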

Configuration

Several configuration properties control how the links are checked:
  • "defaultHTTPTimeout": sets the timeout for checking the availability of an http link (300 ms by default)
  • "checkHTTPLinksThreaded": specifies whether HTTP links must be checked in a Thread pool (true by default)
  • "checkHTTPLinksTimeOut": sets the timeout of the Thread pool used for checking URL links with the "http" protocol, if the check is performed in background Threads
  • "checkHTTPLinksPool": sets the number of Threads in the Thread pool used for checking URL links with the "http" protocol, if the check is performed in background Threads. See also the checkHTTPLinksPool parameter for more information
  • "urlExceptions": specifies the XML file listing the URLs which won't be checked (useful only if the "checkLinks" property is set to true). See urlExceptions
  • "timeouts": specifies the XML file setting specific timeouts for URLs (useful only if the "checkLinks" property is set to true). See timeouts

In particular, the checkHTTPLinksTimeOut and timeouts properties let you specify the timeout for the access to one external site. The default timeout value is 500 ms. Beware that increasing the timeout value for all URLs will degrade the performance of the tool.
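
To show how a default timeout and URL-specific timeouts could interact, here is a minimal sketch built on the standard java.net.HttpURLConnection class. It is an assumption-laden illustration, not the tool's implementation: HttpLinkProbe is a hypothetical name, specific timeouts are matched by URL prefix (the real format of the "timeouts" XML file is not described here), and a HEAD request is used to test whether the resource is present.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the tool): probes an http(s) link with a timeout. */
final class HttpLinkProbe {
    private final int defaultTimeoutMs;
    // Assumption: a specific timeout applies to every URL starting with the given prefix.
    private final Map<String, Integer> specificTimeouts = new LinkedHashMap<>();

    HttpLinkProbe(int defaultTimeoutMs) {
        this.defaultTimeoutMs = defaultTimeoutMs;
    }

    void setTimeout(String urlPrefix, int timeoutMs) {
        specificTimeouts.put(urlPrefix, timeoutMs);
    }

    /** Returns the first matching specific timeout, else the default one. */
    int timeoutFor(String url) {
        for (Map.Entry<String, Integer> entry : specificTimeouts.entrySet()) {
            if (url.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return defaultTimeoutMs;
    }

    /** True if the server answers the HEAD request with a 2xx or 3xx status in time. */
    boolean exists(String url) {
        int timeout = timeoutFor(url);
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeout);
            conn.setReadTimeout(timeout);
            int code = conn.getResponseCode();
            conn.disconnect();
            return code >= 200 && code < 400;
        } catch (IOException | IllegalArgumentException e) {
            return false; // timeout, unreachable host or malformed URL
        }
    }
}
```

Giving a longer timeout only to the sites known to be slow, rather than raising the default, keeps the overall check fast, which is the trade-off described above.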

Javadoc specificities

If the "checkHTTPLinks" property is not set to true, but the associated Javadoc API is the Java 8 API or the JavaFX 8 API, then the tool will directly perform the check using the installed JRE. Setting the "resolveAPILInks" property to false specifies that APIs which are defined internally to the wiki output will not be resolved.

This can be useful, for example, if you want to produce Help content with references to APIs but you don't want to include the APIs in the zipped output content.

Wikipedia and Mediawiki specificities

The specificity of MediaWiki sites is that the HTML articles are created on demand from the wiki database, which means that a page will be returned if the wiki address is correct, even if the article name does not exist in the database. In that case the tool is able to detect that the returned HTML page is not a regular page but a "Page not exists" response.

It is possible to skip the validity check of a link for one element, even if the "checkHTTPLinks" property is set to true, by setting the value of the "checkLink" attribute to false.

For example:
  This is an <a href="http://docs.oracle.com/javase/8/docs/api/index.html" checkLink="false" >external link</a>

Note that false is the only valid value for this attribute. If the "checkHTTPLinks" property is set to false, you cannot force the check of a particular link by setting this attribute to true for one element.
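
The rule for this attribute can be summed up by the small sketch below. ElementLinkPolicy is a hypothetical name used for illustration only; a null attribute value stands for an element without the "checkLink" attribute.

```java
/** Hypothetical helper (not part of the tool): per-element override of the http check. */
final class ElementLinkPolicy {

    /**
     * @param checkHTTPLinks     value of the "checkHTTPLinks" property
     * @param checkLinkAttribute value of the "checkLink" attribute, or null if absent
     */
    static boolean shouldCheckHttpLink(boolean checkHTTPLinks, String checkLinkAttribute) {
        if (!checkHTTPLinks) {
            return false; // the attribute can never re-enable the check
        }
        // Only the value "false" has an effect: it disables the check for this element.
        return !"false".equals(checkLinkAttribute);
    }
}
```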

Notes

  1. ^ For example, the <a href="myHTMLFile.html#theAnchor" /> link is a link with an anchor, and the <a href="myHTMLFile.html" /> link is a link without an anchor
  2. ^ The link URL without the ref part. For example, the base URL for "http://my/file.html#title" would be "http://my/file.html"

Categories: configuration

docJGenerator Copyright (c) 2016-2023 Herve Girod. All rights reserved.