org.apache.nutch.util
Class EncodingDetector

java.lang.Object
  extended by org.apache.nutch.util.EncodingDetector

public class EncodingDetector
extends Object

A simple class for detecting character encodings.

Broadly, this encompasses two distinct functions:

  1. Auto-detecting a set of "clues" from input text.
  2. Taking a set of clues and making a "best guess" as to the "real" encoding.
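
The separation between the two functions can be illustrated with a small, self-contained sketch. This is not the Nutch implementation: the class ClueSketch, its nested Clue type, the confidence values, and the byte-order-mark check are all assumptions made for illustration only. It shows one step that derives clues from the raw bytes and a second, independent step that picks the most confident clue, so that a caller can add its own clues in between.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two-step idea; not the Nutch API.
public class ClueSketch {

    /** One hint about the encoding: a charset name, its source, and a confidence. */
    static final class Clue {
        final String charset;
        final String source;
        final int confidence; // assumed scale: higher means more trusted

        Clue(String charset, String source, int confidence) {
            this.charset = charset;
            this.source = source;
            this.confidence = confidence;
        }
    }

    /** Step 1: derive clues from the input itself. This sketch only checks for a byte order mark. */
    static List<Clue> detectClues(byte[] data) {
        List<Clue> clues = new ArrayList<>();
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            clues.add(new Clue("UTF-8", "bom", 100));
        } else if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            clues.add(new Clue("UTF-16BE", "bom", 100));
        } else if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            clues.add(new Clue("UTF-16LE", "bom", 100));
        }
        return clues;
    }

    /** Step 2: choose the clue with the highest confidence, or fall back to a default. */
    static String guessEncoding(List<Clue> clues, String defaultCharset) {
        Clue best = null;
        for (Clue c : clues) {
            if (best == null || c.confidence > best.confidence) {
                best = c;
            }
        }
        return best == null ? defaultCharset : best.charset;
    }

    public static void main(String[] args) {
        byte[] page = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        List<Clue> clues = detectClues(page);
        // A caller-supplied clue, e.g. taken from an HTTP header (assumed lower confidence).
        clues.add(new Clue("ISO-8859-1", "header", 50));
        System.out.println(guessEncoding(clues, "windows-1252"));
    }
}
```

Because the guessing step sees only a list of clues, it does not need to know whether a clue came from the content or from the caller.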

A caller will often have some extra information about what the encoding might be, for example from the HTTP header or HTML meta-tags. Such information is often wrong, but it still provides potentially useful clues. The types of clues may differ from caller to caller. Thus a typical calling sequence is: