(lispref.info)Character Type


Character Type
--------------

   A "character" in Emacs Lisp is nothing more than an integer.  In
other words, characters are represented by their character codes.  For
example, the character `A' is represented as the integer 65.
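
   Because a character is just its code, ordinary integer arithmetic
and comparison apply to it.  A minimal sketch (the expressions are
illustrative and are not taken from the text above):

```elisp
(= ?A 65)     ; => t
(+ ?A 1)      ; => 66, the code of `B'
(- ?a ?A)     ; => 32, the ASCII distance between the two cases
(< ?A ?B)     ; => t
```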

   Individual characters are not often used in programs.  It is far more
common to work with *strings*, which are sequences composed of
characters.  See String Type.
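
   As a rough illustration of moving between the two, the following
sketch assumes the standard functions `aref', `string-to-char' and
`char-to-string', none of which is described in this node:

```elisp
(aref "Emacs" 0)        ; => 69, the code of `E'
(string-to-char "A")    ; => 65
(char-to-string ?A)     ; => "A"
```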

   Characters in strings, buffers, and files are currently limited to
the range of 0 to 255.  If an arbitrary integer is used as a character
for those purposes, only the lower eight bits are significant.
Characters that represent keyboard input have a much wider range.

   Since characters are really integers, the printed representation of a
character is a decimal number.  This is also a possible read syntax for
a character, but writing characters that way in Lisp programs is a very
bad idea.  You should *always* use the special read syntax formats that
Emacs Lisp provides for characters.  These syntax formats start with a
question mark.

   The usual read syntax for alphanumeric characters is a question mark
followed by the character; thus, `?A' for the character `A', `?B' for
the character `B', and `?a' for the character `a'.

   For example:

     ?Q => 81
     
     ?q => 113

   You can use the same syntax for punctuation characters, but it is
often a good idea to add a `\' to prevent Lisp mode from getting
confused.  For example, `?\ ' is the way to write the space character.
If the character is `\', you *must* use a second `\' to quote it: `?\\'.
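
   A short sketch of the quoted forms.  The values are the plain ASCII
codes, and the choice of `(' as the punctuation character is only an
example:

```elisp
(= ?\  32)    ; => t; the first space is the character itself
(= ?\\ 92)    ; => t
(= ?\( 40)    ; => t; the backslash keeps paren matching intact
```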

   You can express the characters control-g, backspace, tab, newline,
vertical tab, formfeed, return, and escape as `?\a', `?\b', `?\t',
`?\n', `?\v', `?\f', `?\r', `?\e', respectively.  Those values are 7,
8, 9, 10, 11, 12, 13, and 27 in decimal.  Thus,

     ?\a => 7                 ; `C-g'
     ?\b => 8                 ; backspace, BS, `C-h'
     ?\t => 9                 ; tab, TAB, `C-i'
     ?\n => 10                ; newline, LFD, `C-j'
     ?\v => 11                ; vertical tab, `C-k'
     ?\f => 12                ; formfeed character, `C-l'
     ?\r => 13                ; carriage return, RET, `C-m'
     ?\e => 27                ; escape character, ESC, `C-['
     ?\\ => 92                ; backslash character, `\'

   These sequences, which start with a backslash, are also known as
"escape sequences", because the backslash plays the role of an escape
character; they have nothing to do with the character ESC.
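
   The sketch below assumes that the same escape sequences are also
accepted inside string constants (see String Type); each sequence
stands for exactly one character:

```elisp
(= ?\t 9)                 ; => t
(aref "a\tb" 1)           ; => 9, the tab between `a' and `b'
(string-to-char "\e")     ; => 27
(length "one\ntwo")       ; => 7, `\n' counts as a single character
```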

   Control characters may be represented using yet another read syntax.
This consists of a question mark followed by a backslash, caret, and the
corresponding non-control character, in either upper or lower case.  For
example, either `?\^I' or `?\^i' may be used as the read syntax for the
character `C-i', the character whose value is 9.

   Instead of the `^', you can use `C-'; thus, `?\C-i' is equivalent to
`?\^I' and to `?\^i':

     ?\^I => 9
     
     ?\C-I => 9
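
   For the ASCII control characters, the code is the letter's code with
all but its low five bits cleared, which is why case does not matter.
A small sketch of that arithmetic (the use of `logand' is our
illustration, not part of the read syntax):

```elisp
(= ?\^I ?\^i)     ; => t
(= ?\C-i ?\t)     ; => t
(logand ?I 31)    ; => 9
(logand ?i 31)    ; => 9
```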

   For use in strings and buffers, you are limited to the control
characters that exist in ASCII, but for keyboard input purposes, you
can turn any character into a control character with `C-'.  The
character codes for these characters include the 2**22 bit as well as
the code for the non-control character.  Ordinary terminals have no way
of generating non-ASCII control characters, but you can generate them
straightforwardly using an X terminal.

   The character DEL can be regarded as, and written as, `Control-?':

     ?\^? => 127
     
     ?\C-? => 127

   When you represent control characters to be found in files or
strings, we recommend the `^' syntax; but when you refer to keyboard
input, we prefer the `C-' syntax.  This does not affect the meaning of
the program, but may guide the understanding of people who read it.

   A "meta character" is a character typed with the META key.  The
integer that represents such a character has the 2**23 bit set (which
on most machines makes it a negative number).  We use high bits for
this and other modifiers to make possible a wide range of basic
character codes.
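
   Since the position of the modifier bit is an implementation detail
that may differ between Emacs versions, the following sketch avoids
writing it as a constant.  It assumes the functions `event-modifiers'
and `event-basic-type', which are not described in this node, are
available:

```elisp
(event-modifiers ?\M-a)     ; => (meta)
(event-basic-type ?\M-a)    ; => 97, the code of `a'
(logand ?\M-a 255)          ; => 97, the low eight bits are untouched
(- ?\M-a ?a)                ; the meta bit by itself
```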

   In a string, the 2**7 bit indicates a meta character, so the meta
characters that can fit in a string have codes in the range from 128 to
255, and are the meta versions of the ordinary ASCII characters.  (In
Emacs versions 18 and older, this convention was used for characters
outside of strings as well.)
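
   The in-string convention amounts to plain bit arithmetic on the
ASCII code.  The sketch below only performs that arithmetic; it does
not build a string:

```elisp
(logior ?a 128)     ; => 225, the in-string code for `M-a'
(logand 225 127)    ; => 97, back to `a'
(logand 225 128)    ; => 128, nonzero, so the 2**7 bit is set
```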

   The read syntax for meta characters uses `\M-'.  For example,
`?\M-A' stands for `M-A'.  You can use `\M-' together with octal codes,
`\C-', or any other syntax for a character.  Thus, you can write `M-A'
as `?\M-A', or as `?\M-\101'.  Likewise, you can write `C-M-b' as
`?\M-\C-b', `?\C-\M-b', or `?\M-\002'.
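
   The equivalences can be checked directly, without relying on the
numeric value of the meta bit.  A minimal sketch:

```elisp
(= ?\M-A ?\M-\101)       ; => t
(= ?\M-\C-b ?\C-\M-b)    ; => t
(= ?\M-\C-b ?\M-\002)    ; => t
```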

   The shift modifier indicates the case of a character in special
circumstances.  The case of an ordinary letter is indicated by
its character code as part of ASCII, but ASCII has no way to represent
whether a control character is upper case or lower case.  Emacs uses the
2**21 bit to indicate that the shift key was used for typing a control
character.  This distinction is possible only when you use X terminals
or other special terminals; ordinary terminals do not indicate the
distinction to the computer in any way.

   The X Window System defines three other modifier bits that can be set
in a character: "hyper", "super" and "alt".  The syntaxes for these
bits are `\H-', `\s-' and `\A-'.  Thus, `?\H-\M-\A-x' represents
`Alt-Hyper-Meta-x'.  Numerically, the bit values are 2**18 for alt,
2**19 for super and 2**20 for hyper.
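
   One way to inspect such a character without depending on the bit
positions is sketched below.  It assumes that `event-modifiers' and
`event-basic-type' (not described in this node) are available, and it
tests membership rather than relying on the order of the returned list:

```elisp
(let ((mods (event-modifiers ?\H-\M-\A-x)))
  (list (and (memq 'hyper mods) t)
        (and (memq 'meta mods) t)
        (and (memq 'alt mods) t)
        (event-basic-type ?\H-\M-\A-x)))
;; => (t t t 120), where 120 is the code of `x'
```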

   Finally, the most general read syntax consists of a question mark
followed by a backslash and the character code in octal (up to three
octal digits); thus, `?\101' for the character `A', `?\001' for the
character `C-a', and `?\002' for the character `C-b'.  Although this
syntax can represent any ASCII character, it is preferred only when the
precise octal value is more important than the ASCII representation.

     ?\012 => 10        ?\n => 10         ?\C-j => 10
     
     ?\101 => 65        ?A => 65
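
   To find the octal digits for a given character, or to go the other
way, something like the following can be used.  It is a sketch that
assumes `format' accepts `%o' and that `string-to-number' accepts a
base as its second argument:

```elisp
(format "%o" ?A)              ; => "101"
(format "%o" ?\n)             ; => "12"
(string-to-number "101" 8)    ; => 65
```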

   A backslash is allowed, and harmless, preceding any character without
a special escape meaning; thus, `?\+' is equivalent to `?+'.  There is
no reason to use a backslash before most such characters.  However, any
of the characters `()\|;'`"#.,' should be preceded by a backslash to
avoid confusing the Emacs commands for editing Lisp code.  Whitespace
characters such as space, tab, newline and formfeed should also be
preceded by a backslash.  However, it is cleaner to use one of the
easily readable escape sequences, such as `\t', instead of an actual
control character such as a tab.
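
   As an illustration, each of the characters listed above can be
written with a protective backslash and still reads as the plain
character.  The sketch assumes the function `string', which is not
described in this node, to assemble the result:

```elisp
(string ?\( ?\) ?\| ?\; ?\' ?\` ?\" ?\# ?\. ?\,)
;; => "()|;'`\"#.,"
(= ?\+ 43)    ; => t, the backslash is harmless here
```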

