SWI-Prolog offers two comprehensive predicates for classifying characters and character codes: char_type/2 and code_type/2. They are defined as built-in predicates to exploit the locale handling (support for local character sets) of the C character-classification functions. They are fast, logical and deterministic where applicable.
In addition, library(ctypes) provides compatibility with some other Prolog systems. The predicates of this library are defined in terms of code_type/2.
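The sketch below illustrates the two representations with two hypothetical wrapper predicates. The names are illustrative only and are not part of SWI-Prolog. It assumes the upper(Lower) type, which holds when Char is an uppercase letter whose lowercase form is Lower. With char_type/2 both sides are one-character atoms; with code_type/2 both are character codes.

```prolog
% Hypothetical wrappers: the same Type, two representations.
% char_type/2 works on one-character atoms ...
upper_to_lower_char(UpperChar, LowerChar) :-
    char_type(UpperChar, upper(LowerChar)).

% ... and code_type/2 on character codes, including the value
% returned through the upper(Lower) parameter.
upper_to_lower_code(UpperCode, LowerCode) :-
    code_type(UpperCode, upper(LowerCode)).
```

Thus upper_to_lower_char('A', L) binds L to a, whereas upper_to_lower_code(0'A, L) binds L to the code 97.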
The character types are inspired by the standard C <ctype.h> primitives. The types are sensitive to the active locale; see setlocale/3.
Most of the types are mapped to the Unicode classification functions from <wctype.h>; e.g., alnum uses iswalnum(). The types prolog_var_start, prolog_atom_start, prolog_identifier_continue and prolog_symbol are based on the locale-independent built-in classification routines that are also used by read/1 and friends.
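The reader-oriented types can be combined to check whether an atom has a shape that needs no quotes. Below is a minimal sketch with a hypothetical predicate unquoted_atom/1. It covers only the two common shapes and is not meant to reproduce the exact quoting rules used by writeq/1.

```prolog
% Hypothetical helper: true if Atom has the shape of an atom that
% needs no quotes: a prolog_atom_start character followed by
% prolog_identifier_continue characters (foo_Bar1), or a run of
% prolog_symbol characters (=..). Solo atoms such as !, ;, [] and {},
% and edge cases such as a lone '.', are not treated specially.
unquoted_atom(Atom) :-
    atom_chars(Atom, [First|Rest]),
    (   char_type(First, prolog_atom_start)
    ->  forall(member(Char, Rest),
               char_type(Char, prolog_identifier_continue))
    ;   char_type(First, prolog_symbol),
        forall(member(Char, Rest), char_type(Char, prolog_symbol))
    ).
```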
Note that the mode (-,+) is only efficient if Type has a parameter, e.g., char_type(C, digit(8)). If Type is atomic, the whole Unicode range (0..0x1ffff) is generated and each code is tested against the character classification function.
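Here is a sketch of both cases, using two hypothetical helpers. weight_digit/2 relies on the efficient parameterised mode. ascii_chars_of_type/2 avoids the full-range scan for an atomic type by enumerating only the ASCII codes itself. This assumes the caller does not need non-ASCII characters.

```prolog
% Hypothetical helpers illustrating the (-,+) mode.

% Efficient: digit(Weight) carries a parameter, so char_type/2 can
% derive Char from Weight instead of scanning the code space.
weight_digit(Weight, Char) :-
    char_type(Char, digit(Weight)).

% With an atomic Type such as csym, char_type(Char, csym) would test
% every code in 0..0x1ffff. If only ASCII matters, enumerate those
% codes explicitly and call char_type/2 in (+,+) mode on each one.
ascii_chars_of_type(Type, Chars) :-
    findall(Char,
            ( between(0, 127, Code),
              char_code(Char, Code),
              char_type(Char, Type)
            ),
            Chars).
```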
The type csym holds if Char is a letter, digit or the underscore (_). These are valid C and Prolog symbol characters.
The type csymf holds if Char is a letter or the underscore (_). These are valid first characters for C and Prolog symbols.
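The csym and csymf types combine naturally into a check for identifier-shaped atoms. Below is a minimal sketch with a hypothetical predicate csym_identifier/1. Because most types follow the Unicode classification, it may also accept identifiers that contain non-ASCII letters.

```prolog
% Hypothetical helper: Atom has the shape of a C identifier, i.e. a
% csymf character followed by zero or more csym characters.
% Non-ASCII letters may be accepted as well.
csym_identifier(Atom) :-
    atom_chars(Atom, [First|Rest]),
    char_type(First, csymf),
    forall(member(Char, Rest), char_type(Char, csym)).
```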
The type digit(Weight) holds if Char is a digit with weight Weight; e.g., char_type(X, digit(6)) yields X = '6'. This is useful for parsing numbers.
The type xdigit(Weight) holds if Char is a hexadecimal digit with weight Weight; e.g., char_type(a, xdigit(X)) yields X = 10. This is also useful for parsing numbers.
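To illustrate how the weights help with parsing, here is a sketch of two hypothetical helpers. They fold a list of digit characters into an integer, taking each weight from digit(Weight) or xdigit(Weight). They fail as soon as a character is not a digit of the expected kind. They do not handle signs or separators, and an empty list yields 0.

```prolog
:- use_module(library(apply)).   % foldl/4

% Hypothetical helpers: fold a list of digit characters into an
% integer, taking each digit's weight from char_type/2.
decimal_value(Chars, Value) :-
    foldl(decimal_step, Chars, 0, Value).

decimal_step(Char, Acc0, Acc) :-
    char_type(Char, digit(Weight)),
    Acc is Acc0*10 + Weight.

hex_value(Chars, Value) :-
    foldl(hex_step, Chars, 0, Value).

hex_step(Char, Acc0, Acc) :-
    char_type(Char, xdigit(Weight)),
    Acc is Acc0*16 + Weight.
```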
The type punct holds if Char is a graph character that is not a letter or digit.
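The type prolog_end_of_line denotes a wider set of line terminators than end_of_line: the set used by the SWI-Prolog reader.
The type quote holds if Char is a quote character (", ', `).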
Char is an opening bracket and Close is its matching closing bracket. This covers (), [] and {}, plus every Unicode Ps/Pe pair (about 60 pairs in Unicode 17, including angle, corner, ceiling, floor, mathematical, ornamental, fullwidth and CJK brackets). The mapping is reversible: with Close bound, Char unifies with the matching opening bracket.
The ASCII quotes ', " and ` have Close = Char. The Unicode Pi/Pf quote pairs (the guillemets, the standard left/right curly single and double quotes, and the single/double angle and reversed quotation marks) have Close different from Char. The mapping is reversible.
Char is a member of the Unicode Pattern_White_Space set used by read_term/2 to separate tokens. The eleven code points are U+0009..U+000D, U+0020, U+0085, U+200E, U+200F, U+2028 and U+2029. The set is locale-independent and pinned to unicode_syntax_version.
The type prolog_end_of_line is the seven-element line-terminator subset of this set.
Char is one of the seven line-terminating Pattern_White_Space code points: U+000A (LF), U+000B (VT), U+000C (FF), U+000D (CR), U+0085 (NEL), U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR). The same set terminates % comments and increments the source line counter. See section 2.15.1.9.
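The two code-point sets listed above can also be written as plain Prolog predicates. This makes it easy to check that the line terminators are indeed a subset of Pattern_White_Space. The predicates below are hypothetical and do not use char_type/2. The counting helper treats every terminator code point as one line break; this is an assumption of the sketch, not a description of the reader.

```prolog
:- use_module(library(apply)).   % include/3

% Hypothetical helpers restating the two code-point sets;
% they do not call char_type/2.
pattern_white_space(Code) :-
    (   between(0x09, 0x0D, Code)
    ;   member(Code, [0x20, 0x85, 0x200E, 0x200F, 0x2028, 0x2029])
    ).

line_terminator(Code) :-
    member(Code, [0x0A, 0x0B, 0x0C, 0x0D, 0x85, 0x2028, 0x2029]).

% Counts every terminator code point, so a CR LF pair counts twice
% here, which need not match how the reader counts lines.
count_line_terminators(Text, Count) :-
    string_codes(Text, Codes),
    include(line_terminator, Codes, Terminators),
    length(Terminators, Count).
```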
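The type prolog_symbol holds if Char is a Prolog symbol character. Sequences of such characters glue together to form unquoted atoms such as =.. and \=.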
There is nothing in the Prolog standard for converting case in textual data. The SWI-Prolog predicates code_type/2 and char_type/2 can be used to test and convert individual characters. We have started some additional support:
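normalize_space(Out, Input) removes leading and trailing white space from Input and replaces every non-empty sequence of white space by a single space (\u0020) character. Out uses the same conventions as with_output_to/2 and format/3.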
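Here is a minimal sketch of per-character case conversion with char_type/2, combined with normalize_space/2. The helpers char_upper/2, atom_upper/2 and shout/2 are hypothetical. They rely on the lower(Upper) type (Char is a lowercase letter with uppercase Upper) and leave all other characters unchanged. For whole atoms, SWI-Prolog's upcase_atom/2 is the more direct choice.

```prolog
:- use_module(library(apply)).   % maplist/3

% Hypothetical helper: uppercase one character with char_type/2.
% lower(Upper) holds if Char is a lowercase letter with uppercase
% Upper; every other character is returned unchanged.
char_upper(Char, Upper) :-
    (   char_type(Char, lower(U))
    ->  Upper = U
    ;   Upper = Char
    ).

% Hypothetical helper: uppercase every character of an atom.
atom_upper(Atom, Upper) :-
    atom_chars(Atom, Chars),
    maplist(char_upper, Chars, UpperChars),
    atom_chars(Upper, UpperChars).

% Hypothetical helper: collapse white space with normalize_space/2,
% then uppercase the result.
shout(Text, Shouted) :-
    normalize_space(atom(Normalized), Text),
    atom_upper(Normalized, Shouted).
```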
This section deals with predicates for language-specific string comparison operations.
The predicate collation_key/2 is used by locale_sort/2 from library(sort). Please examine the implementation of locale_sort/2 as an example of using this call.
The Key is an implementation-defined and generally unreadable string. On systems that do not support locale handling, Key is simply unified with Atom.
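Here is a sketch of sorting with collation keys, using a hypothetical predicate collate_sort/2; it is not necessarily how locale_sort/2 is implemented. It assumes, as collation_key/2 intends, that comparing keys in the standard order of terms reflects the collation order of the current locale. On systems without locale support the keys are the atoms themselves, so the result is then plain standard order.

```prolog
:- use_module(library(pairs)).   % map_list_to_pairs/3, pairs_values/2

% Hypothetical helper: pair each atom with its collation key, sort
% the pairs on the keys in standard order, then drop the keys.
collate_sort(Atoms, Sorted) :-
    map_list_to_pairs(collation_key, Atoms, Pairs),
    keysort(Pairs, SortedPairs),
    pairs_values(SortedPairs, Sorted).
```

Because keysort/2 is stable and keeps duplicates, atoms with equal keys stay in their input order.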