InDesign Magazine Peter Kahrel On GREP
InDesign Magazine 59
March 2014
Reprinted by permission from InDesign Magazine 59. To purchase this issue or subscribe, visit indesignsecrets.com/issues
By Peter Kahrel
Most people use InDesign the way they cook dinner: they know the basics, they can intuitively throw together some ingredients, or they can follow a basic recipe. But it turns out that if you know a little bit about food science, the chemistry behind the process, you can achieve wonders and impress both yourself and everyone around you. Similarly, by learning a little bit about some geeky stuff, you can do wonders that others can only dream of. If you’re looking for a way to supercharge your InDesign skills, there are few things as good as learning a bit of GREP.

Unfortunately, GREP looks scary, and so people think it is scary. Not so! Like so many things that are reputedly difficult, GREP can be readily understood and used successfully at basic levels. First, let’s define what it is and what it’s good for. GREP is a search tool that you can employ just like the search tools in applications such as MS Word and Notepad to find literal text like nonplussed or understand. But you can make GREP searches more interesting and more useful by adding certain codes to search for text patterns instead of literal text. With GREP’s codes, you can do things like “find all words that consist of capital letters,” “find all words that end in ful,” “apply No Break to the last word of all paragraphs,” and countless others.

In this article I’ll show you how to formulate GREP search patterns that are much more powerful and useful than InDesign’s standard search-and-replace
Feature: Getting a Grip on GREP
The bracket: “match any ONE of these”
The first one, grey–gray, we can find by typing the following into the GREP Find What field:
gr[ea]y
You can read [ea] as “e or a.” A bracketed string matches just one character in the text, and you can place as many characters in a bracketed string as you like. For example, b[aeiou]t can be read as “b, followed by a vowel, followed by t,” and will find bat, bet, bit, bot, and but. It will NOT find bait.

The pipe: “this or that”
Because the second type of variation, centre–center, involves two characters, it’s not convenient to use the bracketed notation. Instead, we list alternatives like this:
cent(re|er)
Here, the alternatives are grouped in parentheses and separated by a pipe (|). Alternatives don’t have to be the same length. For example, thr(u|ough) matches thru and through, and (X|Christ)mas matches Xmas and Christmas. And the alternatives can consist of any character; they don’t have to be letters. Thus, who(se|’s) matches whose and who’s.
And finally, you can list any number of alternatives, as in the following search term:
(pro|de|pre)scribe
which matches proscribe, describe, and prescribe.
Apart from searching alternate spellings of the same word, you can use the pipe notation to find different words altogether. For example,
perhaps|maybe
finds both perhaps and maybe. Note that in this case it’s not necessary to add parentheses. In the earlier example of cent(re|er), the parentheses were needed to isolate the alternatives from the main part of the word, cent.

The question mark: “there or not”
For the third type of variation, colour–color, you use yet a different method. This variation is determined by the presence or absence of a letter, here u. In GREP-speak we indicate the possible presence of a character by adding a question mark after it:
colou?r
The question mark can apply to any character; it doesn’t have to be a letter. So it’?s matches its and it’s.
The scope of ? is just one preceding character. In other words, only the character immediately to the left of the question mark is optional, so that the search colou?r matches both color and colour. To make more characters optional (say, a prefix or a suffix), you place them in parentheses. Thus, to find the words cop and copper, you use this search term:
cop(per)?

Combinations
The three different methods to find alternate spellings can be combined. For example, to find setup, set-up, and set up, you could use the following search term:
set[- ]?up
which you can read as “set, possibly followed by a hyphen or a space, followed by up.”

Tip: Write Out Space Characters
Space characters are often not so easy to spot in GREP expressions. To make them visible in your code, use \x{0020}, which is the Unicode notation for the space character. Or use the code \s, which stands for any white space (including paragraph breaks and tabs). For example, set[-\s]?up does the same job as set[- ]?up but is much clearer, since someone looking at the code doesn’t have to guess if there’s a space there, or what is meant by the invisible space after the dash.

Alternatives can be made optional, too. For example, to find the word claim and its inflections, use this search pattern:
claim(s|ed|ing)?
which matches claim, claims, claimed, and claiming. By placing ? after (s|ed|ing), the whole list of alternatives is made optional. If you leave out ?, claim is not found; only the three inflected forms are.

GREP is case-sensitive
You may have noticed that GREP searches are case-sensitive. For example, the search term color doesn’t match Color. GREP searches, then, are case-sensitive by default. This is not a limitation of InDesign’s version of GREP, by the way; it’s a standard feature of GREP. In fact, it’s not a limitation at all: it actually makes using case-sensitivity much more flexible, something which we’ll return to later.
There are several ways to make GREP case-insensitive, but for the moment let’s look at just one:
[Cc]olor
As you see, this comes down to the “alternative spelling” approach we outlined earlier: in essence, we’re saying that Color and color are alternative spellings of the same word, which is equivalent to setting a case-insensitive option.

Finding series of characters
Earlier, we mentioned this search term:
b[aeiou]t
and that it can be read as “b, followed by a vowel, followed by t,” matching bat, bet, bit, bot, and but. The search term finds just these five words because [aeiou] matches just one character. But we can change that slightly by adding a plus symbol, which in GREP means “one or more.” All of a sudden the expression becomes much more interesting:
b[aeiou]+t
The simple addition of + makes the search term match series of vowels, so that in addition to the five three-letter words, it will now find bait, boat, beat, and beaut as well.
one word starting with a capital letter, we need the search term \u\l+: an uppercase letter followed by one or more lowercase letters. To find two such words, we simply use \u\l+\x{0020}\u\l+ (remember that we use \x{0020} for the space character because the space character is not always easy to spot in a search pattern).

Search patterns like the one above can become a bit difficult to read, so we’ll use a different format whenever it seems appropriate, as follows (but when you write such search patterns in the Find What field of the Find/Change dialog box, you must leave out those comments and write the pattern as one line):

GREP         What it finds
\u\l+        uppercase letter followed by one or more lowercase letters
\x{0020}     space character
\u\l+        same as the first line

But of course names aren’t always as simple as this. For example, some people have double-barreled surnames such as Trevor-Roper. Such a name can be captured by saying that we’re after an uppercase letter followed by a class consisting of lowercase letters, uppercase letters, and a hyphen: \u[-\u\l]+. This pattern, by the way, also captures names with irregular capitalization such as LaGuardia.

To locate prefixed names, like John von Neumann or Rip van Winkle, we have the following search pattern:

GREP                 What it finds
\u\l+                uc letter followed by lc letters
\x{0020}             space character
(v[ao]n\x{0020})?    maybe van or von, followed by a space character
\u[-\u\l]+           uc letter followed by lc, uc, hyphen

And because GREP is case-sensitive, the name prefix should in fact be written as ([Vv][oa]n\x{0020})?, so that we capture Van, van, Von, and von. As you see, matching names can be tricky, and the search pattern in its current state fails to capture various other possibilities, such as the prefixes de, du, le, van de, and von der, to mention just a few. With what you’ve learned so far, I’ll leave it up to you to formulate a pattern that matches all names.

Tip: Hyphens Go First
We used \u[-\u\l]+ to match hyphenated names like Trevor-Roper. Though earlier we said that the order in a bracketed class is not important, for reasons that go beyond the scope of this article, the place of the hyphen is important. Whenever you want to include a hyphen in a bracketed class, always place it in first position, as we have done here.

Any white space
It’s convenient that we can use a single wildcard to find any kind of white space. \s matches all spaces (the normal space
character, en and em spaces, thin spaces, half spaces, etc.), but also tabs and paragraph returns.

Any word character
This wildcard, \w, captures letters, digits, and the underscore character. I mention it here for completeness’ sake.

Any character
This wildcard, the dot ., punches well above its weight: this little fellow matches everything in a paragraph! Well, almost. It doesn’t match the paragraph mark (it could be made to, but we’ll not go into that here). If we add the + operator, we say in effect “one or more of everything (except the paragraph mark),” and that must be whole paragraphs. It’s easy to try out: type .+ in the Find What field, and click Find/Find Next a few times. You’ll see that each time you press Find Next, the next paragraph is highlighted.

The Footnote Bug
The dot wildcard doesn’t match footnote markers. This bug has been with us since InDesign CS3 despite multiple reports and requests.

Replacing Text
Replacing text using the GREP panel can be very straightforward, and just as simple as in any other application that offers a search-and-replace feature (Notepad, MS Word, etc.). Simply type a search term in the Find What field, a replacement text in the Change To field, and do the replacement. For example, to replace multiple paragraph marks with a single one, click in the Find What field, and then click the @ icon and choose End Of Paragraph from the menu. This inserts \r into the Find What field. Now type \r+, so that the Find What field reads \r\r+ (i.e., find two or more paragraph breaks). In the Change To field, type \r, and then press Change All.

As in the Find What field, you can see a list of special characters by clicking the @ icon. Apart from the last item, all items are the same as in the special character list for the Find What field. However, it’s that last item, Found, that makes GREP replacements exciting (Figure 2).

Figure 2: Change to special characters

Click on Found to see what’s in that list. All you see is Found Text, Found 1, Found 2, . . . Found 9. Of course right now you’re asking, what do Found 1, Found 2, etc. refer to?
They refer to what is matched by items in parentheses in the Find What field! We can list these references in any order we want, so we can in fact change the order of the found items. An example will make this clear.

Let’s go back to the simplest form of the search pattern for names that we used earlier. Recall that we used this expression:
\u\l+
\x{0020}
\u\l+
to find names like Jim Donegal. What we want to do is reverse the order of the first name and the surname and add a comma after the surname, so that we get Donegal, Jim. To achieve this, first we’ll add parentheses to the parts that we want to refer to later, the first name and the surname:
(\u\l+)\x{0020}(\u\l+)
That’s all. What is matched by the first pair of parentheses will be Found 1, and what’s matched by the second pair will be Found 2. Now use the special characters list to insert Found 2, the surname, in the Change To field, which you’ll see appear as $2. Type a comma and a space, and then enter the reference to Found 1 by typing $1. The Change To field should contain the following line:
$2, $1
Now, with GREP replacements you should always be very careful. Don’t rush into Change All straight away; first click Find, then Change. If the result looks good, click Find Next and Change. If it still looks good, use Change/Find or Change All.

Like many specialized skills, using GREP can seem mysterious and daunting at first. But as the examples in this article show, almost anyone can understand and use GREP. All it takes is a little patience and practice, and you too can wield the amazing power of GREP in InDesign. Give it a try! You may soon wonder how you ever got along without it.

Peter Kahrel is a Scripting Engineer at Typefi Systems and the author of GREP in InDesign, as well as books on scripting and automating InDesign published by O’Reilly Media.
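For readers who want to experiment with the name patterns and the Found references outside InDesign, the following is a rough sketch in Python. It rests on a few assumptions: Python’s re module has no \u or \l, so [A-Z] and [a-z] stand in for them (which leaves out accented letters), \x20 replaces \x{0020}, and \1 and \2 take the place of $1 and $2. The function names are invented for the example.

```python
import re

# Assumption: [A-Z] and [a-z] approximate InDesign's \u and \l,
# and \x20 is Python's spelling of \x{0020}.
SIMPLE_NAME = re.compile(r"([A-Z][a-z]+)\x20([A-Z][a-z]+)")

# The prefixed, hyphen-aware pattern from the article, translated the same way.
FULL_NAME = re.compile(r"[A-Z][a-z]+\x20([Vv][oa]n\x20)?[A-Z][-A-Za-z]+")


def surname_first(text):
    """Swap first name and surname and add a comma after the surname."""
    # \2 and \1 play the role of $2 and $1 in the Change To field.
    return SIMPLE_NAME.sub(r"\2, \1", text)


def is_name(text):
    """Return True if the whole text fits the prefixed-name pattern."""
    return FULL_NAME.fullmatch(text) is not None


print(surname_first("Jim Donegal"))  # Donegal, Jim
for candidate in ["John von Neumann", "Rip Van Winkle", "Hugh Trevor-Roper"]:
    print(candidate, is_name(candidate))
```

As in InDesign, it pays to check a few results by eye before running a replacement like this over a whole text.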
GREP of the Month
\K
A smoother, speedier way to look back
When a traditional lookbehind expression won’t do
the trick, try this obscure but powerful alternative.
In Issue 63 of InDesign Magazine, David Blatner discussed InDesign’s positive lookbehind feature. He ended his piece with the words “the code after the = symbol must specify an exact number of characters. That is, you might expect the expression (?<=\s+)\u would mean ‘an uppercase letter after a string of one or more spaces.’ Unfortunately, the + (‘one or more’) part makes it fail because it’s too open-ended.”

The only way in which you could do variable-length expressions in a lookbehind was to list each variation in a separate lookbehind and group them as alternatives. Thus, to find numbers preceded by the word Map, Figure, or Table, you had to resort to an unwieldy expression such as ((?<=Map )|(?<=Figure )|(?<=Table ))\d+. But this worked only if you knew the range of variation, so, yes, David was right, you can’t use the . (dot) and + operators in lookbehinds.

That, until very recently, was what we all thought. But in CS6, Adobe introduced an operator that does allow lookbehinds that match variable-length text, namely \K (a classic Adobe Special: introduce something quite useful but don’t tell anybody about it). Using this operator, David’s expression (?<=\s+)\u can be rendered as \s+\K\u. And the example I gave can be recast as (Map|Figure|Table)\s\K\d+.

What \K actually means is “Keep the text matched so far out of the overall match.” This sounds strange, but to understand it, we can use a simple example, a\Kb. This matches the b in ab. To do so, first it searches for a. When a is matched, \K says “remove a from the overall match.” The search resumes, looking for (and matching) b.

You can use \K just about anywhere. The main limitation is that it doesn’t work with negatives, so you can’t use it to find something that is not preceded by something else. But on the other hand, \K GREPs execute (marginally) quicker than classic lookbehinds. Try it!
Peter Kahrel
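The same contrast can be sketched in Python, with one important caveat: Python’s built-in re module has no \K. The example below therefore only imitates the idea. It shows that re rejects a variable-length lookbehind, and then uses a capture group as a stand-in for \K (an assumption of this sketch, not InDesign’s mechanism). It also uses \s+ rather than the single \s of the example above, to show that variable-length text before the number poses no problem.

```python
import re

text = "See Map 12, Figure   3 and Table 45, but not Page 9."

# Python's re module refuses a lookbehind of variable length.
try:
    re.compile(r"(?<=\s+)[A-Z]")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

# The classic workaround: one fixed-length lookbehind per word.
classic = [m.group(0) for m in re.finditer(
    r"(?:(?<=Map )|(?<=Figure )|(?<=Table ))\d+", text)]

# Stand-in for (Map|Figure|Table)\s+\K\d+: group 1 holds what
# InDesign would report as the match.
kept_out = re.findall(r"(?:Map|Figure|Table)\s+(\d+)", text)

print(variable_lookbehind_ok)
print(classic)
print(kept_out)
```

Note that the classic version misses the 3 after “Figure” because three spaces separate them, which is exactly the kind of variation that fixed-length lookbehinds cannot absorb.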
Reprinted by permission from InDesign Magazine 73, May 2015. To purchase this issue or subscribe, visit indesignsecrets.com/issues
GREP of the Month
Limiting Matches
Among a plethoric passel of parentheses,
how do you find the one you want?
GREP Level: Medium

Copyeditors dread bibliographies, especially if a publisher insists on their own format. That insistence can easily lead to many repetitive corrections that numb both the mind and the fingers. But we can use that repetitiveness to our advantage, because where there’s a consistent pattern of errors, there’s an opportunity to fix them with GREP.

One type of correction that regularly occurs is the use and placement of parentheticals, such as publication years and names of publishers. These are often wrong, and in order to fix them, we typically need to find just the first parenthetical in a paragraph or only the last one. But GREP expressions are gluttons (they want to consume as much content as possible), so if you look for something like \(.+?\), you end up with all parentheticals. If you want to match just the first, or just the last, parenthetical, you need to take some special measures.

To find only the first parenthetical in a paragraph, the idea is to look for an opening parenthesis which has no other opening parenthesis between it and the beginning of the paragraph. This GREP expression does that: ^.*?\K\(.+?\). The expression works as follows: from the start of the paragraph (^), we match any sequence of characters (.*?) until we hit the first opening parenthesis. Then we discard whatever we matched so far: that’s what \K, the magnificent and only recently discovered modifier, does (see InDesign Magazine 73). From there, we match up to the next closing parenthesis. Because the part between the beginning of the paragraph and the first opening parenthesis is discarded, we in effect match just the first parenthetical.

Matching only the last parenthetical in a paragraph is in a way the mirror image of matching only the first one: find a parenthetical such that there is no opening parenthesis between the matched closing parenthesis and the end of the paragraph: \([^)]+\)(?=[^\(]+$). The first part of the expression, \([^)]+\), matches a parenthetical. The part of the expression that ensures that we match just the last parenthetical is (?=[^\(]+$), which reads “from here to the end of the paragraph, and no character is (.”
Peter Kahrel
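Both searches can be tried in Python with a small sketch. The bibliography entry is invented for the illustration, and because Python’s re module lacks \K, the first function uses a capture group in its place (an assumption of this sketch). The expression for the last parenthetical is used unchanged.

```python
import re

# A made-up bibliography entry with three parentheticals.
entry = "Smith, J. (2010). A history of type (2nd ed.). London (Serif Press)."


def first_parenthetical(paragraph):
    """Imitate ^.*?\\K\\(.+?\\) with a capture group instead of \\K."""
    match = re.match(r".*?(\(.+?\))", paragraph)
    return match.group(1) if match else None


def last_parenthetical(paragraph):
    """Find a parenthetical with no opening parenthesis after it."""
    match = re.search(r"\([^)]+\)(?=[^\(]+$)", paragraph)
    return match.group(0) if match else None


print(first_parenthetical(entry))  # (2010)
print(last_parenthetical(entry))   # (Serif Press)
```

One thing the sketch makes visible: because the lookahead uses +, at least one character (here the final period) must follow the last parenthetical, so a parenthetical that ends the paragraph outright is not found by this pattern.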
Reprinted by permission from InDesign Magazine 74, June 2015. To purchase this issue or subscribe, visit indesignsecrets.com/issues
GREP of the Month
Reveal Codes
Normally, you can’t target text with mixed formatting in
a GREP search. But with this workaround, it’s a cinch.
A limitation of GREP searches (and of normal text searches too) is that everything you’re looking for must be in the same style. Thus, it’s not possible to formulate a GREP expression along the lines of “find certain punctuation in italic that is followed by a non-italic space.” This can be a bad limitation, but fortunately there’s a way around this problem: you can reveal all or some formatting codes.

Using GREP expressions, you can temporarily add HTML-like tags that spell out formatting in a way that you can use and manipulate. For example, to show all italic formatting, enter .+ in the Find What field, <i>$0</i> in the Change To field, and then set Italic in the Find Format panel and Regular (or Roman, or whatever the non-italic style is called) in the Change Format panel (Figure 1). The find expression matches everything in italic. The replace expression uses $0, which stands for “whatever was matched by the find expression.” Instead of the HTML-style tags <i> and </i> that we used here, you can of course use any form: %i% and %/i% would do fine too, as would @i@ and #i#; it doesn’t matter much.

Figure 1: A GREP Find/Change to wrap all italic characters in tags that you can use in another GREP operation.

After running this query, your text could contain things like ;</i> (where there’s a space after the closing angle bracket). To find italic punctuation followed by a non-italic space, we can now search for [:;,.]</i>\x20 (you’ll recall that \x20 stands for the space character). To get that punctuation out of italics, search using the GREP expression ([:;,.])</i>\x20, and use </i>$1\x20 as the replacement string.

To reinstate the italic formatting, use the Find What string <i>(.+?)</i>, the Change To string $1, and set Italic in the Change Format panel (make sure that you leave the Find Format panel blank).
Peter Kahrel
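The middle step, moving the punctuation outside the tags, is plain text manipulation and can be sketched in Python. The sketch assumes the text has already been tagged (only InDesign can do that, since it knows the formatting), and the sample string and function name are invented. In Python the replacement reference is written \1 rather than $1.

```python
import re

# Assumption: italic runs are already wrapped in <i>...</i> tags.
tagged = "<i>Moby Dick;</i> a novel. <i>Typee:</i> a romance."


def move_punctuation_out(text):
    """Move a punctuation mark from just inside </i> to just outside it."""
    # \1 plays the role of $1 in the Change To field.
    return re.sub(r"([:;,.])</i>\x20", r"</i>\1 ", text)


print(move_punctuation_out(tagged))
# <i>Moby Dick</i>; a novel. <i>Typee</i>: a romance.
```

The period after “novel” is left alone because it is not followed by a closing tag.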
Reprinted by permission from InDesign Magazine 76, August 2015. To purchase this issue or subscribe, visit indesignsecrets.com/issues
GREP of the Month
If you ever have to add section letters to an existing index, you can do that quickly using a GREP query.

You use a single query: Find What: ^(\u).+\r(\1.+\r)+ and Change To: $1\r$0. Translation: match and capture a capital ((\u)) at the beginning of a paragraph (^), and then match all following characters in the paragraph (.+) up to and including the return character (\r). Then match a letter that’s the same (\1) as the one we captured earlier, followed by all characters up to the end of that paragraph (.+\r), group that, and find as many as possible of the same (+). In the example above, ^(\u).+\r matches Barbera 14 (and the return character), and (\1.+\r)+ matches all following lines that start with B. And there’s your section.

Now, to insert the section letter (which is the letter we matched by ^(\u)), we replace the section with the letter we captured ($1) followed by a return (\r) and the entire section ($0). Remember that $0 stands for “everything that was matched by the Find What expression.”

If you want to apply a paragraph style to the section letters, e.g., to add some space before and a font style, that has to be done using a very simple, separate query. At Find What, enter ^\u$, and in the Change Format panel, enter the paragraph style; in other words, apply a paragraph style to all one-letter paragraphs. Make sure the Change To field is empty, and then click Change All.

The drawback of this method is that it makes a mess of any formatting, such as italics. You can convert formatted text to text tags, insert the section letters, and then convert the text tags back to the formatting (see InDesign Magazine, issue 76 for details).
Peter Kahrel
Reprinted by permission from InDesign Magazine 79, November 2015. To purchase this issue or subscribe, visit indesignsecrets.com/issues
GREP of the Month
\x
Unicode
Target any range of characters: Hebrew or Hiragana, Cherokee or currency, Devanagari or dingbats.

GREP Level: Medium

GREP is an excellent tool for finding characters in certain Unicode ranges and applying a character style to them. I’ve used this method to apply a phonetic font to phonetic characters and an Arabic font to Arabic script. To accomplish this, you need to know InDesign’s notation for Unicode characters and the limits of, in this case, the phonetic and Arabic Unicode ranges.

The limits of Unicode ranges can be found on the website of the Unicode Consortium. There you’ll find that the basic Arabic characters range from Unicode values 0600 to 06FF. InDesign’s format for Unicode characters is \x{0000}, so in InDesign’s notation the basic Arabic characters range from \x{0600} to \x{06FF}.

Assuming that your document contains a character style (e.g., “arabic”) that sets an Arabic font, you can apply that character style to all Arabic characters in the basic range like this: in the Find What field, enter [\x{0600}-\x{06FF}]+, and set the character style in the Change Format panel. The bracket notation is used in GREP expressions to define a range of characters (it’s called a character class); and we use the plus operator to apply the character style to series of Arabic characters.

Looking at the chart more closely, however, you notice that things are slightly more complicated: Arabic is contained not in one range, but in four. Apart from the basic range 0600–06FF, we have Arabic Extended-A (08A0–08FF), Arabic Presentation Forms-A (FB50–FDFF), and Arabic Presentation Forms-B (FE70–FEFF). In our document we’re interested only in one additional class, Extended-A. To find all characters in two ranges (here, Unicode ranges 0600–06FF and 08A0–08FF), we simply add the latter to the class we defined earlier: [\x{0600}-\x{06FF}\x{08A0}-\x{08FF}]+.
Peter Kahrel
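The two-range class can be tried in Python as well. In this sketch the only change is notational: Python writes \x{0600} as \u0600. The function name and the sample sentence are made up, and since Python cannot apply a character style, the sketch simply lists the series of Arabic characters it finds.

```python
import re

# Basic Arabic (0600-06FF) plus Arabic Extended-A (08A0-08FF).
ARABIC_RUN = re.compile(r"[\u0600-\u06FF\u08A0-\u08FF]+")


def arabic_runs(text):
    """Return every unbroken series of Arabic characters in the text."""
    return ARABIC_RUN.findall(text)


# The Arabic word is spelled with escapes so the source stays plain ASCII.
sample = "The word \u0633\u0644\u0627\u0645 means peace."
print(arabic_runs(sample))
```

Thanks to the plus operator, the four letters come back as one series rather than as four separate matches.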
Reprinted by permission from InDesign Magazine 82, February 2016. To purchase this issue or subscribe, visit indesignsecrets.com/issues
GREP of the Month
(?=)
Lookahead
Targeting a string by what follows it

GREP Level: Medium

Many find-and-replace actions involve finding a string and replacing, or applying some formatting to, only part of the found string. But if you find a string and want to apply formatting to just part of it, you have to select the part of the string you’re interested in, do the replacement or formatting, and then find the next occurrence. All this can become very tedious very quickly.

Fortunately, GREP offers a way to do conditional finds, such as “find the word Figure only if it’s followed by a digit.” These conditionals are called “lookahead,” and their general format is (?=). In our example of finding instances of the word Figure only if they’re followed by a digit, we can use the query Figure(?=\s\d). Note that we include in the lookahead \s (the space after the word Figure) and the digit \d. If you try this, you’ll see that Figure is highlighted, but the space and digits are not. This means that whatever we do now applies only to Figure. For example, if you want to italicize these instances of Figure, just set Italics in the Change Format panel. Now you can click Change All (or the more cautious Change and Change/Find) to process all remaining instances.

In lookaheads, you can use other GREP constructs as well. Say you want to capture instances of the word Figure not only when they’re followed by a digit, but also by the symbol #, which you use as a placeholder, for instance. This is possible by using the character class [\d#], which defines both digits and # as possible characters following Figure: Figure(?=\s[\d#]). If a character class is not suitable, for example, when you use a multi-character placeholder such as #@#, then you can use alternatives inside the lookahead: Figure(?=\s(\d|#@#)).

Lookahead has a negative counterpart that lets you match text when it’s not followed by some other text. The format of these so-called negative lookaheads is (?!). For example, to find instances of the word Figure when it’s not followed by a digit, use Figure(?!\s\d).
Peter Kahrel
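Lookaheads are written the same way in Python’s re module, so the four queries can be tried as they stand. This is only a sketch: the sample sentence and the helper count_matches are invented, and counting matches stands in for applying formatting in InDesign.

```python
import re

text = ("Figure 1 is final. Figure # is pending. "
        "Figure #@# is pending too. Figure legends are short.")


def count_matches(pattern):
    """Count the non-overlapping matches of the pattern in the sample text."""
    return sum(1 for _ in re.finditer(pattern, text))


print(count_matches(r"Figure(?=\s\d)"))        # followed by a digit
print(count_matches(r"Figure(?=\s[\d#])"))     # followed by a digit or #
print(count_matches(r"Figure(?=\s(\d|#@#))"))  # followed by a digit or #@#
print(count_matches(r"Figure(?!\s\d)"))        # not followed by a digit

# Only the word itself is part of the match; the space and digit are not.
print(re.search(r"Figure(?=\s\d)", text).group(0))
```

The last line prints just Figure, which mirrors what you see highlighted in InDesign: the lookahead checks what follows without including it in the match.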
Reprinted by permission from InDesign Magazine 86, June 2016. To purchase this issue or subscribe, visit indesignsecrets.com/issues