More Than a Hundred Thousand!?
The following discussion is for skeptics and language lovers.
A hundred thousand errors and problems may seem an extravagant number to claim, but it is conservative.
Editor's USAGE database of writing errors and problems contains more than 35,000 items (and
counting). Most of them are "wildcarded" for efficiency's sake, so that each database entry can
identify two or more variations of its basic pattern, and a rough estimate of the "scope" or "capture"
of the database easily amounts to more than 100,000 troublesome terms. A few examples will clarify
what we mean.
Editor has an entry in its wordy-expressions dictionary to catch the phrase back-seat driver.
But back-seat drivers and back-seat driving are variations of the same expression.
We could make three entries to catch all the variations, but suppose instead we define
a wildcard [w] that stands for "include all remaining letters of this word." The single entry
"back-seat driv[w]" then allows our computer program to catch all three variations. (It would
also improperly flag uncommon phrases like back-seat drivel and back-seat driveshaft, of
course, if anyone should use them.)
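For readers who want to see the mechanism, here is a minimal Python sketch of how a [w] entry could be matched against text. It is an illustration under stated assumptions, not Editor's actual code: the function name compile_entry and the translation of [w] into a regular expression are hypothetical.

```python
import re

def compile_entry(entry: str) -> re.Pattern:
    """Translate an entry containing [w] into a regular expression.

    Assumption: [w] means "zero or more remaining letters of this word".
    """
    parts = entry.split("[w]")
    # Literal text is escaped; each [w] becomes a run of letters.
    pattern = r"[A-Za-z]*".join(re.escape(p) for p in parts)
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)

entry = compile_entry("back-seat driv[w]")
text = "A back-seat driver hates back-seat driving and back-seat drivel."
matches = [m.group(0) for m in entry.finditer(text)]
print(matches)
```

The sketch flags back-seat drivel along with the intended variants, which is the same over-capture described above.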
If someone were to write back seat driver, by the way, Editor would point out that back seat
should be hyphenated before a noun. An incorrect spelling of back seat as backseat should
be caught and eliminated by an ordinary spelling checker before Editor is used.
Some entries in a USAGE dictionary will catch a single problem word or phrase with no variations.
Most wildcarded entries, as we have seen, can catch two or more possible variants of a single word or phrase.
Consider the term co-opt: it needs a hyphen between the o's. If we use our
wildcard in the single entry "coopt[w]," that entry can identify coopt, coopts, coopted,
coopting, cooption, cooptation, cooptative, cooptive, and cooptively—all
listed (correctly, with hyphens) in Webster's—as spelling mistakes. Microsoft
Word's spelling checker identifies some but not all of these mistakes. One wildcarded
entry in Editor's spelling database can identify all nine.
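The reach of that one entry can be checked with a short Python sketch. The regular expression below is an assumed translation of "coopt[w]", and the helper suggest is hypothetical; neither is taken from Editor itself.

```python
import re

# Assumed translation of "coopt[w]": the stem plus any remaining letters.
COOPT = re.compile(r"\bcoopt[A-Za-z]*\b", re.IGNORECASE)

forms = ["coopt", "coopts", "coopted", "coopting", "cooption",
         "cooptation", "cooptative", "cooptive", "cooptively"]

# Every unhyphenated form listed in the text is caught by the single pattern.
flagged = [f for f in forms if COOPT.fullmatch(f)]

def suggest(word: str) -> str:
    """Hypothetical correction: insert the hyphen after the first 'co'."""
    return word[:2] + "-" + word[2:]

for word in flagged:
    print(word, "->", suggest(word))
```

Correctly hyphenated forms such as co-opted contain no unbroken "coopt" and so are not flagged.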
Using a computer program that counts each wildcarded entry as representing up to three misspellings, a
conservative count, we estimate that the 12,000 entries in Editor's spelling dictionary can identify
perhaps double that number of spelling errors. Most of them are common, not exotic.
Most entries in Editor's USAGE dictionary files have a capture potential that cannot be counted
precisely. Consider the common phrase due to the fact that, an entry in the
USAGE dictionary that checks for unnecessary words. Due to the fact that is
wordy—it means because or since—and wordy phrases make writing flabby and
tedious. How many variations on this basic phrase can an entry find if we add wildcards?
If we define a new wildcard [a] that stands for "any one word," we can compose a second
entry, "due to the [a] fact that," which has considerable range. The [a] wildcard brings
possible phrases like due to the obvious fact that, due to the unexpected fact that,
due to the well-known fact that, and an unknown number of others within Editor's scope.
If we add our earlier wildcard [w] after fact, so that fact, facts,
and even faction and factions are included among the variants that the computer
would find, we multiply the entry's power even further. A third entry, "due [a] to
the fact[w] that," brings in a few more instances of the basic phrase: due possibly to the
facts that and due unquestionably to the fact that, for example. We cannot count
all the possible variations of due to the fact that because we cannot determine which words
writers might use in the wildcard positions. But all the possible variations are wordy, and
an Editor user should be notified if any of them appears in her work. To find them all
takes only three USAGE dictionary entries.
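The three entries can be tried out with a small Python sketch that handles both wildcards. It rests on assumptions: the exact wording of the three entries is inferred from the description above, [a] is taken to mean one word made of letters, apostrophes, or hyphens, and the names compile_entry and is_wordy are hypothetical.

```python
import re

def compile_entry(entry: str) -> re.Pattern:
    """Translate an entry containing [w] and [a] into a regular expression."""
    # Split into literal text and wildcard tokens, keeping the tokens.
    tokens = re.split(r"(\[w\]|\[a\])", entry)
    out = []
    for tok in tokens:
        if tok == "[w]":
            out.append(r"[A-Za-z]*")      # rest of the current word
        elif tok == "[a]":
            out.append(r"[A-Za-z'-]+")    # any one word
        else:
            out.append(re.escape(tok))    # literal text
    return re.compile(r"\b" + "".join(out) + r"\b", re.IGNORECASE)

# Assumed form of the three entries described in the text.
ENTRIES = [compile_entry(e) for e in (
    "due to the fact that",
    "due to the [a] fact[w] that",
    "due [a] to the fact[w] that",
)]

def is_wordy(text: str) -> bool:
    """True if any of the three entries matches somewhere in the text."""
    return any(p.search(text) for p in ENTRIES)

samples = [
    "due to the fact that",
    "due to the well-known fact that",
    "due possibly to the facts that",
    "because it rained",
]
for s in samples:
    print(s, "->", is_wordy(s))
```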
Because the [a] wildcard allows Editor to find so many possible phrases to draw to the writer's
attention, we give it a heavier weighting than [w] in our estimating program, and we use other wildcard
expansions as well. Thus our conservatively estimated scope for Editor's USAGE database
of 8,500 wordy and redundant phrases is more than 42,000. It is not extravagant to estimate that the
35,000 entries in Editor's five USAGE dictionary files—the two described above plus three
others—have a potential range of well over 100,000 problem words and phrases, including more than 20,000
spelling mistakes that standard spelling checkers do not catch.
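The weighting idea can be illustrated with a short Python sketch. Only the weight of three for [w] comes from the text. The weight of five for [a], the choice to multiply weights when an entry has several wildcards, and the names entry_scope and database_scope are assumptions made for illustration, so the totals it produces say nothing about Editor's real figures.

```python
W_WEIGHT = 3  # from the text: a [w] entry counts as up to three variants
A_WEIGHT = 5  # assumption: the text says only that [a] is weighted more heavily

def entry_scope(entry: str) -> int:
    """Estimated number of terms a single dictionary entry can capture."""
    scope = 1
    # Assumption: weights multiply when an entry carries several wildcards.
    scope *= W_WEIGHT ** entry.count("[w]")
    scope *= A_WEIGHT ** entry.count("[a]")
    return scope

def database_scope(entries) -> int:
    """Sum of the per-entry estimates."""
    return sum(entry_scope(e) for e in entries)

sample = [
    "due to the fact that",         # no wildcard
    "back-seat driv[w]",            # one [w]
    "due to the [a] fact[w] that",  # one [a] and one [w]
    "due [a] to the fact[w] that",  # one [a] and one [w]
]
total = database_scope(sample)
print(len(sample), "entries, estimated scope", total)
```

Under these assumed weights the four sample entries have an estimated scope of 34, which shows how a database's scope can be several times its entry count.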
Beyond the range of the USAGE dictionary databases, we have programmed into Editor functions that use
specialized context checking to find possible homonym and hyphenation mistakes, possessive-plural
confusions, and many other errors and problems that spelling and grammar checkers like Word's and
WordPerfect's overlook and that Editor's USAGE databases do not otherwise catch. These
functions' context-based range, applied to ordinary English prose, is too broad to calculate precisely; it amounts to
many thousands at the least, and when added to the scope of the other database files, it raises our estimate of
Editor's potential for identifying writing problems to well over 100,000 words and phrases.
Where do the entries in these databases come from? Many derive from our decades of experience as
college English professors. We have also done extensive research, over many years, in printed and
electronic texts and compilations, dictionaries, and manuals, on the kinds of writing problems that Editor
looks for. Editor's accompanying Writer's Manual has a lengthy bibliography of these sources.
Our own reading of newspapers, magazines, books, menus, Web pages, e-mails, and miscellaneous advertising everywhere
is an inexhaustible source of new entries.
The sidebars on these Web pages illustrate the kinds of writing problems Editor finds.
Copyright © 2008 by E & J Thiesmeyer. Last revised April 25, 2008.