More Than a Hundred Thousand!?

The following discussion is for skeptics and language lovers.

A hundred thousand errors and problems may seem an extravagant number to claim, but it is a conservative one. Editor's USAGE database of writing errors and problems contains more than 35,000 items (and counting). Most of them are "wildcarded" for efficiency's sake, so that each database entry can identify two or more variations of its basic pattern; a rough estimate of the "scope," or "capture," of the database easily exceeds 100,000 troublesome terms. A few examples will clarify what we mean.

Editor has an entry in its wordy-expressions dictionary to catch the phrase back-seat driver. But back-seat drivers and back-seat driving are variations of the same expression. We could make three entries to catch all the variations, but suppose instead we define a wildcard [w] that stands for "include all remaining letters of this word." The single entry "back-seat driv[w]" then allows our computer program to catch all three variations. (It would also, of course, improperly flag uncommon phrases such as back-seat drivel and back-seat driveshaft, should anyone use them.)
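
Editor's own matching code is not described here, so what follows is only a minimal Python sketch of how an entry like "back-seat driv[w]" might be turned into a regular expression. It assumes [w] means "zero or more further letters of the current word"; the function name compile_entry is hypothetical.

```python
import re


def compile_entry(entry: str) -> re.Pattern:
    """Turn a hypothetical [w]-wildcarded entry into a case-insensitive regex.

    Assumption: [w] stands for "zero or more further letters of this word".
    """
    literal_parts = entry.split("[w]")
    # Escape literal text so that hyphens and spaces match themselves,
    # then let each [w] absorb the rest of the word's letters.
    body = "[A-Za-z]*".join(re.escape(part) for part in literal_parts)
    # Word boundaries stop a match from starting or ending mid-word.
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


entry = compile_entry("back-seat driv[w]")
text = ("He is a back-seat driver; back-seat drivers annoy me, "
        "and back-seat driving is worse.")
print([m.group(0) for m in entry.finditer(text)])
# ['back-seat driver', 'back-seat drivers', 'back-seat driving']
```

As the paragraph concedes, the pattern over-matches: entry.search("a back-seat drivel") also succeeds. It does not match the unhyphenated back seat driver, which calls for the separate hyphenation check described next.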

If someone were to write back seat driver, by the way, Editor would point out that back seat should be hyphenated before a noun.  An incorrect spelling of back seat as backseat should be caught and eliminated by an ordinary spelling checker before Editor is used.

Some entries in a USAGE dictionary catch a single problem word or phrase with no variations. Most wildcarded entries, as we have seen, can catch two or more variants of a single word or phrase. Consider the term co-opt: it needs a hyphen between the two o's. If we use our wildcard in the single entry "coopt[w]," that entry can identify coopt, coopts, coopted, coopting, cooption, cooptation, cooptative, cooptive, and cooptively (all listed, correctly hyphenated, in Webster's) as spelling mistakes. Microsoft Word's spelling checker identifies some but not all of these mistakes. One wildcarded entry in Editor's spelling database can identify all nine.
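
For a single word, the [w] wildcard amounts to a prefix match. The sketch below, again hypothetical Python rather than Editor's actual code, checks the nine variants listed above against "coopt[w]". As an added assumption, the helper flag_coopt also proposes a correction by inserting a hyphen after co.

```python
import re

# Hypothetical spelling entry "coopt[w]": any word that begins with "coopt".
# Assumption: [w] means "zero or more further letters of the same word".
COOPT = re.compile(r"\bcoopt[a-z]*\b", re.IGNORECASE)


def flag_coopt(text: str) -> list[tuple[str, str]]:
    """Return (misspelling, suggested spelling) pairs found in text."""
    flags = []
    for match in COOPT.finditer(text):
        word = match.group(0)
        # Suggest the hyphenated form by inserting "-" after "co".
        flags.append((word, word[:2] + "-" + word[2:]))
    return flags


variants = ["coopt", "coopts", "coopted", "coopting", "cooption",
            "cooptation", "cooptative", "cooptive", "cooptively"]
flags = flag_coopt(" ".join(variants))
print(len(flags))  # 9
print(flags[2])    # ('coopted', 'co-opted')
```

The leading word boundary keeps the entry from firing inside a longer word, and words such as cooperate or coop are not flagged because neither contains coopt.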

Using a computer program that counts each wildcarded entry as representing up to three misspellings (a conservative count), we estimate that the 12,000 entries in Editor's spelling dictionary can identify perhaps twice that number, about 24,000, spelling errors. Most of them are common, not exotic.

Most entries in Editor's USAGE dictionary files have a capture potential that cannot be counted precisely. Consider the common phrase due to the fact that, which has an entry in the USAGE dictionary that checks for unnecessary words. Due to the fact that is wordy (it means because or since), and wordy phrases make writing flabby and tedious.

How many variations on this basic phrase can an entry find if we add wildcards? If we define a new wildcard [a] that stands for "any one word," we can compose a second entry, "due to the [a] fact that," which has considerable range. The [a] wildcard brings possible phrases like due to the obvious fact that, due to the unexpected fact that, due to the well-known fact that, and an unknown number of others within Editor's scope. If we also add our earlier wildcard [w] after fact, so that the computer finds fact, facts, and even faction and factions in that position, we multiply the entry's power further. A third entry, "due [a] to the fact[w] that," brings in still more instances of the basic phrase: due possibly to the facts that and due unquestionably to the fact that, for example. We cannot count all the possible variations of due to the fact that, because we cannot determine which words writers might use in the wildcard positions. But all the possible variations are wordy, and an Editor user should be notified if any of them appears in her work. Finding them all takes only three USAGE dictionary entries.
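
Extending the same idea with [a], read here as "exactly one word, where hyphenated and apostrophized words count as one," gives a minimal Python sketch of the three entries described above. We assume [w] follows fact in the second and third entries and that the first entry is the literal phrase; the names compile_entry and is_flagged are hypothetical, and Editor's real entries and matching rules may differ.

```python
import re

# Hypothetical wildcard translations (assumptions, not Editor's documented rules):
W_WILDCARD = "[A-Za-z]*"       # [w]: zero or more further letters of the current word
A_WILDCARD = r"[A-Za-z'-]+"    # [a]: exactly one word; hyphenated words count as one


def compile_entry(entry: str) -> re.Pattern:
    """Translate an entry containing [w] and/or [a] into a case-insensitive regex."""
    # The capturing group makes re.split keep the wildcard tokens in the result.
    pieces = re.split(r"(\[w\]|\[a\])", entry)
    regex = []
    for piece in pieces:
        if piece == "[w]":
            regex.append(W_WILDCARD)
        elif piece == "[a]":
            regex.append(A_WILDCARD)
        else:
            regex.append(re.escape(piece))  # literal text matches itself
    return re.compile(r"\b" + "".join(regex) + r"\b", re.IGNORECASE)


# The three entries as we read them from the paragraph above.
ENTRIES = [
    "due to the fact that",
    "due to the [a] fact[w] that",
    "due [a] to the fact[w] that",
]
PATTERNS = [compile_entry(e) for e in ENTRIES]


def is_flagged(text: str) -> bool:
    """True if any of the three entries matches somewhere in text."""
    return any(p.search(text) for p in PATTERNS)


samples = [
    "We left early due to the fact that it rained.",
    "Due to the obvious fact that it rained, we left.",
    "It failed due to the well-known facts that we ignored.",
    "It failed due possibly to the facts that we ignored.",
    "It failed due unquestionably to the fact that we ignored it.",
    "We left early because it rained.",
]
for s in samples:
    print(is_flagged(s), s)  # True for the first five samples, False for the last
```

As the paragraph notes, fact[w] also admits faction, so is_flagged("due to the rival faction that") returns True; a sketch like this trades some false alarms for broad coverage.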

Because the [a] wildcard allows Editor to find so many possible phrases to draw to the writer's attention, we give it a heavier weighting than [w] in our estimating program, and we use other wildcard expansions as well. Thus our conservative estimate of the scope of Editor's USAGE database of 8,500 wordy and redundant phrases is more than 42,000. It is not extravagant to estimate that the 35,000 entries in Editor's five USAGE dictionary files (the two described above plus three others) have a potential range of well over 100,000 problem words and phrases, including more than 20,000 spelling mistakes that standard spelling checkers do not catch.

Beyond the range of the USAGE dictionary databases, we have programmed into Editor functions that use specialized context checking to find possible homonym and hyphenation mistakes, possessive-plural confusions, and many other errors and problems that spelling and grammar checkers like Word's and WordPerfect's overlook and that Editor's USAGE databases do not otherwise catch. The context-based range of these functions, applied to ordinary English prose, is too broad to calculate precisely, but it runs to at least many thousands; added to the scope of the other database files, it raises our estimate of Editor's potential for identifying writing problems to well over 100,000 words and phrases.

Where do the entries in these databases come from?  Many derive from our decades of experience as college English professors.  We have also done extensive research, over many years, in printed and electronic texts and compilations, dictionaries, and manuals, on the kinds of writing problems that Editor looks for.  Editor's accompanying Writer's Manual has a lengthy bibliography of these sources.

Our own reading of newspapers, magazines, books, menus, Web pages, e-mails, and miscellaneous advertising everywhere is an inexhaustible source of new entries. The sidebars on these Web pages illustrate the kinds of writing problems Editor finds.


Copyright © 2008 by E & J Thiesmeyer.    Last revised April 25, 2008.
