Unicode is a multi-byte character set, portable across all major computing platforms and with decent coverage of most of the world's writing systems. It is also single-locale; it includes no code pages or other complexities that make software harder to write and test. There is no competing character set that's reasonably cross-platform. For these reasons, Unicode 4.0 is used as the native character set for Qt.
These classes are relevant when working with string data. For information about rendering text, see the Rich Text Processing overview, and if your string data is in XML, see the XML Processing overview.
QByteArray | Array of bytes
QByteArrayMatcher | Holds a sequence of bytes that can be quickly matched in a byte array
QChar | 16-bit Unicode character
QLatin1Char | 8-bit ASCII/Latin-1 character
QLatin1String | Thin wrapper around a US-ASCII/Latin-1 encoded string literal
QLocale | Converts between numbers and their string representations in various languages
QString | Unicode character string
QStringList | List of strings
QStringMatcher | Holds a sequence of characters that can be quickly matched in a Unicode string
QStringRef | Thin wrapper around QString substrings
QTextBoundaryFinder | Way of finding Unicode text boundaries in a string
QTextStream | Convenient interface for reading and writing text
The Unicode Consortium has a number of documents available, including
The current version of the standard is Unicode 5.1.0.
Previous printed versions of the specification:
In Qt, and in most applications that use Qt, most or all user-visible strings are stored using Unicode. Qt provides:
To fully benefit from Unicode, we recommend using QString for storing all user-visible strings, and performing all text file I/O using QTextStream. Use QKeyEvent::text() for keyboard input in any custom widgets you write; it does not make much difference for slow typists in Western Europe or North America, but for fast typists or people using special input methods, using text() is beneficial.
All the function arguments in Qt that may be user-visible strings, such as those of QLabel::setText() and many others, are of type const QString &. QString provides an implicit conversion from const char * so that things like
label->setText("Password:");
will work. There is also a function, QObject::tr(), that provides translation support, like this:
label->setText(tr("Password:"));
QObject::tr() maps from const char * to a Unicode string, and uses installable QTranslator objects to do the mapping.
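The mapping described above can be sketched without Qt. This is a minimal illustration, not Qt's implementation: the names Translator, widenAscii, and translate are hypothetical, and a std::map stands in for an installable QTranslator. It assumes the source text is plain ASCII and falls back to the source text when no translation is installed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an installable translator: a table that maps
// source-text literals to translated UTF-16 strings.
using Translator = std::map<std::string, std::u16string>;

// Widen a plain ASCII literal to UTF-16. Only bytes below 0x80 are valid
// input for this sketch.
std::u16string widenAscii(const std::string &text)
{
    std::u16string out;
    out.reserve(text.size());
    for (unsigned char c : text) {
        assert(c < 0x80);
        out.push_back(static_cast<char16_t>(c));
    }
    return out;
}

// Look the source text up in the translator; if no translation is
// installed, return the source text itself, widened to UTF-16.
std::u16string translate(const Translator &translator, const std::string &source)
{
    auto it = translator.find(source);
    return it != translator.end() ? it->second : widenAscii(source);
}
```

With a table containing "Password:" mapped to u"Passwort:", translate() returns the German string for that key and the unchanged source text for any other literal.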
Qt provides a number of built-in QTextCodec classes, that is, classes that know how to translate between Unicode and legacy encodings to support programs that must talk to other programs or read/write files in legacy file formats.
By default, conversion to/from const char * uses a locale-dependent codec. However, applications can easily find codecs for other locales, and set any open file or network connection to use a special codec. It is also possible to install new codecs, for encodings that the built-in ones do not support. (At the time of writing, Vietnamese/VISCII is one such example.)
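To see why the choice of codec matters, the following sketch decodes the same bytes under two encodings. It uses only the C++ standard library rather than QTextCodec, the function names decodeLatin1 and decodeUtf8 are hypothetical, and the UTF-8 decoder is deliberately simplified (it replaces malformed sequences with U+FFFD and does not reject overlong forms).

```cpp
#include <cstddef>
#include <string>

// Decode bytes as ISO-8859-1: every byte is its own code point.
std::u32string decodeLatin1(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char32_t>(b));
    return out;
}

// Decode bytes as UTF-8 (simplified; see the assumptions stated above).
std::u32string decodeUtf8(const std::string &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size()
                || (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

The two bytes 0xC3 0xA9 decode to the single character U+00E9 as UTF-8, but to the two characters U+00C3 U+00A9 as Latin-1. Pure ASCII input decodes identically under both.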
Since US-ASCII and ISO-8859-1 are so common, there are also especially fast functions for mapping to and from them. For example, to open an application's icon one might do this:
QFile file(QString::fromLatin1("appicon.png"));
or
QFile file(QLatin1String("appicon.png"));
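One reason such mappings can be fast is that the ISO-8859-1 byte values 0x00 to 0xFF are exactly the Unicode code points U+0000 to U+00FF, so no lookup table is needed. The sketch below illustrates this with the C++ standard library. It is not Qt's code, the names widenLatin1 and narrowToLatin1 are hypothetical, and replacing unrepresentable characters with '?' is an assumption of this sketch.

```cpp
#include <string>

// Latin-1 to UTF-16: each byte value is already the code point, so the
// conversion is a plain widening of every byte.
std::u16string widenLatin1(const std::string &bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// UTF-16 to Latin-1: code units up to 0xFF narrow directly; anything
// larger has no Latin-1 representation and becomes '?' in this sketch.
std::string narrowToLatin1(const std::u16string &text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t u : text)
        out.push_back(u <= 0xFF ? static_cast<char>(u) : '?');
    return out;
}
```

A file name such as "appicon.png" survives the round trip unchanged, while a character outside Latin-1, such as U+20AC, is lost when narrowing.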
Regarding output, Qt will do a best-effort conversion from Unicode to whatever encoding the system and fonts provide. Depending on operating system, locale, font availability, and Qt's support for the characters used, this conversion may be good or bad. We will extend this in upcoming versions, with emphasis on the most common locales first.
See also Internationalization with Qt.