Heartland Chapter of the American Society for Indexing

Patterns in Indexing:
Software Find and Replace

By Sue Klefstad
Spring 2014
All indexing software—whether Cindex, Sky, Macrex, or Word—offers an advanced find and replace feature commonly called “patterns”; in Word this feature is called “wildcards.” In the computer world, these advanced find and replace patterns are known as “regular expressions” or “regex.” There are a number of regular expression tutorials on the Web, but only Cindex 3 uses the same symbols as regex in its patterns; Cindex 2, Sky, Macrex, and Word use different symbols for the same concepts. 

With patterns, we add the power of specificity to the Find command. For example, to find entries with acronyms, we could simply search for a left parenthesis, but then we’d also find all the entries with glosses in parentheses. Patterns allow us to specify finding a left parenthesis followed by at least two capital letters: 
Cindex 3
\([:lu:][:lu:]

Sky 7
([A-Z][A-Z]

Cindex 2
([A-Z][A-Z]

Macrex
([A-Z][A-Z]

What have we said here? In Cindex 3, parentheses have a special meaning, so to find a parenthesis, we have to precede it with a backslash to specify a literal parenthesis character. Parentheses are not special characters in Cindex 2, Sky 7, and Macrex, so the left parenthesis has no special treatment here. However, for other characters that are special, Cindex 2 and Macrex also use the backslash to turn special characters into literal characters; Sky 7 puts literal characters inside square brackets. 

Then after the parenthesis, we specify looking for two uppercase letters. Cindex 3 uses “character sets” to specify certain collections of characters, such as [:ll:] for lowercase letters and [:lu:] for uppercase letters. The Cindex 3 user guide states that these named sets are more reliable than enumerating the characters. 

Cindex 2, Sky 7, and Macrex all use square brackets to specify the uppercase letter character sets, which is what Cindex 3 means by enumerating the characters. But Cindex 3 can also use square brackets to enclose character sets, just like the others do. Confused yet? The manuals do a nice job of explaining all the symbols and provide good indexing examples. For an explanation of Word’s wildcards, check out the Microsoft Word MVP page on wildcards, “Finding and replacing characters using wildcards.” 
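
For readers who want to experiment outside the indexing programs, the acronym search can also be tried in a general-purpose regex engine. The sketch below uses Python's re module, which is an assumption of this example and not something the indexing programs are known to use; the sample entries are invented. As in the Cindex 3 pattern, the parenthesis is escaped with a backslash, and the uppercase letters are written as the bracketed range [A-Z].

```python
import re

# A literal "(" followed by two uppercase ASCII letters.
ACRONYM = re.compile(r"\([A-Z][A-Z]")

# Hypothetical index entries, invented for illustration.
entries = [
    "adenosine triphosphate (ATP)",
    "buffers (chemical)",
    "polymerase chain reaction (PCR)",
    "pH (potential of hydrogen)",
]

# Keep only the entries whose parenthetical starts with two capitals.
with_acronyms = [entry for entry in entries if ACRONYM.search(entry)]
print(with_acronyms)
```

The entries with lowercase glosses in parentheses are skipped, which is the extra specificity a plain search for a left parenthesis lacks.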

I like to use patterns for quality control. For the science texts that I index, my clients want capitalized main headings. But some terms, such as pH or cAMP, need a lower case initial letter, so I can’t use the built-in main entry capitalization function. I check that my main entries are properly capitalized by Finding main entries that begin with a lower case letter: 

Cindex 3 
^[:ll:]

Sky 7
<[a-z]

Cindex 2
^[a-z]

Macrex
^[a-z]

For this search, I select the Main field from the field dropdown list so that only main headings are checked.

The caret (^) used by Cindex and Macrex tells the Find to start searching at the beginning of the line. Sky uses a less than sign (<). To find a pattern at the end of a line, Cindex and Macrex use a dollar sign ($) and Sky uses a greater than sign (>). 
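
The beginning and end anchors can be tried in the same way. This is a minimal sketch in Python's re module (an assumption of this example; Python happens to use the caret and dollar sign too), with a made-up list of main headings standing in for the Main field.

```python
import re

STARTS_LOWER = re.compile(r"^[a-z]")  # lowercase letter at the very beginning
ENDS_WITH_S = re.compile(r"s$")       # letter s at the very end

# Hypothetical main headings, invented for illustration.
main_headings = ["Acids", "cAMP", "pH", "enzymes", "Zinc"]

lowercase_initial = [h for h in main_headings if STARTS_LOWER.search(h)]
ends_with_s = [h for h in main_headings if ENDS_WITH_S.search(h)]

print(lowercase_initial)
print(ends_with_s)
```

The first list is the one to review by eye: cAMP and pH are intentionally lowercase, while a heading like enzymes would need its capital restored.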

In Sky, if the field dropdown list is set to Any field, then the record is considered one block of text; the less than sign (<) matches the beginning of the Main field and the greater than sign (>) matches the end of the Page field. To match the beginning and end of a specific field, simply select the field from the dropdown list. 

In Cindex and Macrex, if the field dropdown list is set to Any field, then each field is checked separately for a match at its beginning and/or end. And yes, you can match both the beginning and end of a field: 

Cindex 3 
^[BC].*y$

Sky 7
<[BC]?*y>

Cindex 2
^[BC]?*y$

Macrex
^[BC]?*y$

This pattern will find fields such as Biology, Chemistry, Biography, Cry, and By. It looks for either a B or a C at the beginning, then any run of characters (or none at all), and a y at the end. 
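
The Cindex 3 form of this pattern can be checked against a few sample fields in Python's re module. The choice of Python and the list of fields are assumptions made for illustration. Python matching is case-sensitive by default; whether an indexing program ignores case depends on its own settings.

```python
import re

# B or C at the start, any run of characters (possibly empty), y at the end.
PATTERN = re.compile(r"^[BC].*y$")

# Hypothetical field contents, invented for illustration.
fields = [
    "Biology", "Chemistry", "Biography", "Cry", "By",
    "Botany of ferns",  # starts with B but does not end in y
    "Anatomy",          # ends in y but does not start with B or C
    "biology",          # lowercase b is not in [BC]
]

matches = [f for f in fields if PATTERN.search(f)]
print(matches)
```

By matches because ".*" is allowed to match nothing at all between the first letter and the final y.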

Another quality control use of patterns is to check that all t and f table and figure indicators are in italic when they need to be. This is a two-step process of first doing a Find All in the page field for [0-9][ft]. I specify a number before the f or t to avoid including cross references containing f or t. Then in that find group I Find in the page field [ft] with the Attribute set to italic and the Not checkbox checked, to find the f’s and t’s that are not italic. 
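
The logic of this two-step check can be sketched outside the indexing software. The Python below is only an illustration: the data model (a page field stored as a list of text runs, each flagged italic or not) and the sample records are hypothetical, since each indexing program stores styling in its own way.

```python
import re

# Each record: (main heading, page field as a list of (text, is_italic) runs).
# This structure is hypothetical and exists only for this sketch.
records = [
    ("enzymes", [("45", False), ("f", True), (", 52", False), ("t", True)]),
    ("buffers", [("112", False), ("t", False)]),
    ("acids", [("7, 9-11", False)]),
    ("bases", [("See also buffers", False)]),
]

DIGIT_THEN_F_OR_T = re.compile(r"[0-9][ft]")
F_OR_T = re.compile(r"[ft]")

# Step 1: Find All page fields where a digit is followed by f or t.
# Requiring the digit keeps cross references such as "See also buffers" out.
find_group = [
    (heading, runs)
    for heading, runs in records
    if DIGIT_THEN_F_OR_T.search("".join(text for text, _ in runs))
]

# Step 2: within that group, flag any f or t sitting in a non-italic run.
not_italic = [
    heading
    for heading, runs in find_group
    if any(not italic and F_OR_T.search(text) for text, italic in runs)
]

print([heading for heading, _ in find_group])
print(not_italic)
```

Only the record whose t indicator lacks italic is reported, while the cross reference never enters the find group.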

Download the Spring 2014 workshop handout for a table of the commonly used pattern symbols for Cindex 3, Cindex 2, Sky 7, and Macrex. 


[Picture. Source: xkcd.com]
© 2014 by Heartland Chapter of ASI. All rights reserved.