Visualizing the accuracy of the “i before e, except after c” spelling rule

Posted In: Language

See which words follow and break the “i before e” rule

I wanted to see how often there are exceptions to the spelling rule “i before e, except after c”. I found a website (wordfrequency.info) with a list of the 5050 most common English words and analyzed it to see which words follow the rule and which do not. Below is a treemap that shows the words that follow the rule in green and those that do not in red. The size of each box represents how common the word is in regular American English usage (based on how frequently it appears in the Corpus of Contemporary American English).

What we see is that while 81% of the 158 most common words with ‘e’ and ‘i’ adjacent to one another follow the rule, the percentage drops to around 60% once you weight each word by how frequently it is used. This is because some very commonly used words do not follow the rule: if you were to count how many times you use words from this list, about 40% of the time you would likely be using words that break it. For example, the two most commonly used words with ‘e’ and ‘i’ adjacent (their and being) do not follow the rule, since they have the ‘e’ before the ‘i’ without a preceding ‘c’.
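The gap between the two percentages comes from counting once per word versus once per occurrence. Here is a minimal sketch of that difference. The `ruleShares` function, the field names, and the frequencies are all hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the COCA list or the code behind the chart.

```javascript
// Hypothetical sample entries: the frequencies are made-up illustrative
// numbers, not values from the COCA word list.
const sample = [
  { word: "their", freq: 50, followsRule: false },
  { word: "being", freq: 30, followsRule: false },
  { word: "friend", freq: 15, followsRule: true },
  { word: "receive", freq: 5, followsRule: true },
];

// Returns the share of words that follow the rule, counted two ways:
// once per word (unweighted) and once per occurrence (weighted by freq).
function ruleShares(entries) {
  const followers = entries.filter((e) => e.followsRule);
  const totalFreq = entries.reduce((sum, e) => sum + e.freq, 0);
  const followerFreq = followers.reduce((sum, e) => sum + e.freq, 0);
  return {
    unweighted: followers.length / entries.length,
    weighted: followerFreq / totalFreq,
  };
}

const shares = ruleShares(sample);
console.log(shares); // { unweighted: 0.5, weighted: 0.2 }
```

In this toy sample half of the words follow the rule, but because the two rule-breakers are also the most frequent, only 20% of occurrences do.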

 

I was inspired to look into this after seeing a tweet about the rule in the comic Pearls Before Swine by @stephanpastis.

I asked my kids, but they had never heard of the rule, so perhaps it isn’t taught in schools anymore.
 

Sources and Tools:
I downloaded the word list from wordfrequency.info. The word list comes from the Corpus of Contemporary American English (COCA), a collection of English works from 1990 to 2020 across a wide variety of genres (spoken, fiction, popular magazines, newspapers, academic texts, TV and movie subtitles, blogs, and other web pages). Each word in the list was then analyzed using JavaScript and categorized as fitting or breaking the rule. The visualization uses the plotly.js open source graphing library and HTML/CSS/JavaScript code for the interactivity and UI.
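The categorization step might look something like the sketch below. This is not the code used for the chart; `classifyWord` is a hypothetical helper, and it assumes a strict reading of the rule in which "ie" must not follow a "c" and "ei" must follow one. A word with several e/i pairs counts as breaking the rule if any one pair breaks it.

```javascript
// Classify a word against "i before e, except after c".
// Returns "follows", "breaks", or null when the word has no adjacent e and i.
// Assumption: strict reading, so "ie" right after a c counts as a break.
function classifyWord(word) {
  const w = word.toLowerCase();
  let found = false;
  for (let i = 0; i < w.length - 1; i++) {
    const pair = w.slice(i, i + 2);
    if (pair !== "ie" && pair !== "ei") continue;
    found = true;
    // The preceding letter can sit at index 0 (as in "ceiling"),
    // so the guard is i > 0 and the comparison is on the character itself.
    const afterC = i > 0 && w[i - 1] === "c";
    const ok = pair === "ie" ? !afterC : afterC;
    if (!ok) return "breaks";
  }
  return found ? "follows" : null;
}

for (const word of ["their", "being", "friend", "receive", "ceiling"]) {
  console.log(word, classifyWord(word));
}
```

With this reading, "their" and "being" come out as breaking the rule, while "friend", "receive", and "ceiling" follow it.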







2 Responses to Visualizing the accuracy of the “i before e, except after c” spelling rule

  1. sam2 says:

    Fun idea. Ceiling is miscoded. It follows the rule.

    • chris says:

      Thanks for catching that. A bug in my code didn’t check for the c being the first letter. It’s fixed now.
