11.6 Substituting text by regular expression

Global substitution in a string is done by `patsubst':

 -- Builtin: patsubst (STRING, REGEXP, [REPLACEMENT])
     Searches STRING for matches of REGEXP, and substitutes REPLACEMENT
     for each match.  The syntax for regular expressions is the same as
     in GNU Emacs (see Regexp).

     The parts of STRING that are not covered by any match of REGEXP
     are copied to the expansion.  Whenever a match is found, the
     search proceeds from the end of the match, so a character from
     STRING will never be substituted twice.  If REGEXP matches a
     string of zero length, the start position for the search is
     incremented, to avoid infinite loops.

     When a replacement is to be made, REPLACEMENT is inserted into the
     expansion, with `\N' substituted by the text matched by the Nth
     parenthesized sub-expression of REGEXP, for up to nine
     sub-expressions.  The escape `\&' is replaced by the text of the
     entire regular expression matched.  For all other characters, `\'
     treats the next character literally.  A warning is issued if there
     were fewer sub-expressions than the `\N' requested, or if there is
     a trailing `\'.

     The REPLACEMENT argument can be omitted, in which case the text
     matched by REGEXP is deleted.

     The macro `patsubst' is recognized only with parameters.

     patsubst(`GNUs not Unix', `^', `OBS: ')
     =>OBS: GNUs not Unix
     patsubst(`GNUs not Unix', `\<', `OBS: ')
     =>OBS: GNUs OBS: not OBS: Unix
     patsubst(`GNUs not Unix', `\w*', `(\&)')
     =>(GNUs)() (not)() (Unix)()
     patsubst(`GNUs not Unix', `\w+', `(\&)')
     =>(GNUs) (not) (Unix)
     patsubst(`GNUs not Unix', `[A-Z][a-z]+')
     =>GN not 
     patsubst(`GNUs not Unix', `not', `NOT\')
     error-->m4:stdin:6: Warning: trailing \ ignored in replacement
     =>GNUs NOT Unix
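
   As a further sketch of the rules above (the input strings are
invented for illustration and are not part of the original examples),
the following input shows `\N' reordering two words, matches that do
not overlap, and `\\' producing a literal backslash.  The `dnl' lines
are comments that give the expected expansion of the line before them.

```m4
dnl Swap two words using the first and second sub-expressions.
patsubst(`hello world', `\(\w+\) \(\w+\)', `\2 \1')
dnl => world hello
dnl The search resumes after each match, so there are two matches, not three.
patsubst(`aaaa', `aa', `b')
dnl => bb
dnl \& inserts the whole match ...
patsubst(`abc', `b', `[\&]')
dnl => a[b]c
dnl ... while \\ inserts a literal backslash, leaving & as plain text.
patsubst(`abc', `b', `[\\&]')
dnl => a[\&]c
```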

   Here is a slightly more realistic example, which capitalizes
individual words or whole sentences, by substituting calls of the macros
`upcase' and `downcase' into the strings.

 -- Composite: upcase (TEXT)
 -- Composite: downcase (TEXT)
 -- Composite: capitalize (TEXT)
     Expand to TEXT, but with capitalization changed: `upcase' changes
     all letters to upper case, `downcase' changes all letters to lower
     case, and `capitalize' changes the first character of each word to
     upper case and the remaining characters to lower case.

   First, an example of their usage, using implementations distributed
in `m4-1.4.13/examples/capitalize.m4'.

     $ m4 -I examples
     include(`capitalize.m4')
     =>
     upcase(`GNUs not Unix')
     =>GNUS NOT UNIX
     downcase(`GNUs not Unix')
     =>gnus not unix
     capitalize(`GNUs not Unix')
     =>Gnus Not Unix

   Now for the implementation.  There is a helper macro `_capitalize'
which puts only its first word in mixed case.  Then `capitalize' merely
parses out the words, and replaces them with an invocation of
`_capitalize'.  (As presented here, the `capitalize' macro has some
subtle flaws.  Try to find and correct them yourself, or see Answers.)

     $ m4 -I examples
     undivert(`capitalize.m4')dnl
     =>divert(`-1')
     =># upcase(text)
     =># downcase(text)
     =># capitalize(text)
     =>#   change case of text, simple version
     =>define(`upcase', `translit(`$*', `a-z', `A-Z')')
     =>define(`downcase', `translit(`$*', `A-Z', `a-z')')
     =>define(`_capitalize',
     =>       `regexp(`$1', `^\(\w\)\(\w*\)',
     =>               `upcase(`\1')`'downcase(`\2')')')
     =>define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')
     =>divert`'dnl
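
   To see how the two layers cooperate, the sketch below repeats the
definitions inline, so that it runs without the `examples' directory,
and then calls the helper and the full macro on the same argument.  The
input string is made up for illustration.  The `dnl' lines are comments
that give the expected expansion.

```m4
define(`upcase', `translit(`$*', `a-z', `A-Z')')dnl
define(`downcase', `translit(`$*', `A-Z', `a-z')')dnl
define(`_capitalize',
       `regexp(`$1', `^\(\w\)\(\w*\)',
               `upcase(`\1')`'downcase(`\2')')')dnl
define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')dnl
dnl The helper rewrites the leading word only.  The rest of the argument
dnl is dropped, because regexp expands to the replacement alone.
_capitalize(`hELLO wORLD')
dnl => Hello
dnl patsubst passes each word to the helper and keeps the text in between.
capitalize(`hELLO wORLD')
dnl => Hello World
```

   Within `capitalize', `$0' expands to the macro's own name, so
`_$0' names the helper `_capitalize'.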

   While `regexp' replaces the whole input with the replacement as soon
as there is a match, `patsubst' replaces each _occurrence_ of a match
and preserves non-matching pieces:

     define(`patreg',
     `patsubst($@)
     regexp($@)')dnl
     patreg(`bar foo baz Foo', `foo\|Foo', `FOO')
     =>bar FOO baz FOO
     =>FOO
     patreg(`aba abb 121', `\(.\)\(.\)\1', `\2\1\2')
     =>bab abb 212
     =>bab

   Omitting REGEXP evokes a warning, but still produces output;
contrast this with an empty REGEXP argument.

     patsubst(`abc')
     error-->m4:stdin:1: Warning: too few arguments to builtin `patsubst'
     =>abc
     patsubst(`abc', `')
     =>abc
     patsubst(`abc', `', `\\-')
     =>\-a\-b\-c\-
