Tuesday, 27 August 2013

Usage & Academic Research: Differences In Similar Words -- The Corpus Approach Part 1


Centre of Corpus Research @ The University of Birmingham
Image from birmingham.ac.uk

I received an email from Student A asking about the differences between the following words:

Hi Locky,

Long time no see.  I was on of your student few months ago.
I've some question on using some word.
What's the different of using/meaning  between the following words, could you explain them with some examples?

sick and ill;
whether and if;
fast, quick, speedy and rapid;
appropriate and suitable;
finish and complete;
give in and give up;
effective and efficient;
elder and older;
further and farther;
gather and collect;
find, look for and find out;
match and game;
small, little and tiny;
like, alike and as;
like, love and prefer;

I'm looking forward for your reply.

This is a really good question, one that looks simple but is really tough to answer. Explaining some of these pairs means going into the history and origin of the words, and for that I, like many of you, would have to rely on scarce internet sources.

An alternative method is to adopt a corpus linguistics approach. This time round, we will use the Corpus of Contemporary American English (COCA).

The first step is to decide which contexts we are going to look at.

Let's say we are interested in the verbs that appear within 4 words to the left of sick and ill.

Sick and Ill comparison extracted
Image from COCA
From the left chart, we can easily see that the frequency of sick (W1) showing up with 'm, got, getting, called, am, make, being, call, get, gotten and gets is hundreds to thousands of times more often than the frequency of ill (W2) with those words.

From the right chart, ill (W2) is most often used with speak and injured. Ill is used roughly as often as sick (W1) with became, but when is, had, be, are, been, was and were show up, sick (W1) basically dominates again.

In summary, the data shows that it is very common in American English to see 
  1. verb-to-be + sick
  2. get + sick
  3. call + sick
  4. make + sb + sick (sb = somebody)
  5. had + past participle of main verb + sick
whereas it is slightly more common to see
  1. speak  + ill ( + of sb)
  2. injured + and + ill
Take note that examples were found with ill for all 11 verbs; it is just that the frequency of occurrence is very small compared to that with sick.

These results from the corpus show the general behaviour of these words as they are used by speakers of American English.
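
For readers curious about the mechanics behind this kind of search, here is a minimal Python sketch of counting the words that fall within 4 positions to the left of a target word. It is only an illustration under my own assumptions: the sentences are made up (they are not COCA data), the names `tokenize` and `left_collocates` are hypothetical, and unlike the COCA search it does not filter for verbs or stop at sentence boundaries.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (clitics like 'm stay separate)."""
    return re.findall(r"[a-z]+|'[a-z]+", text.lower())

def left_collocates(tokens, target, window=4):
    """Count every token found within `window` positions to the left of `target`.

    Punctuation is dropped by tokenize(), so the window may cross a
    sentence boundary; a real corpus tool would handle this more carefully.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            counts.update(tokens[max(0, i - window):i])
    return counts

# Made-up sample text, for illustration only (not COCA data).
sample = (
    "I got sick last week. She called in sick today. "
    "He never speaks ill of anyone. The smell makes me sick. "
    "Many were injured and ill after the storm."
)

tokens = tokenize(sample)
sick = left_collocates(tokens, "sick")
ill = left_collocates(tokens, "ill")

# Compare how often the same word turns up to the left of each target.
for word in ("got", "called", "speaks", "injured"):
    print(word, "sick:", sick[word], "ill:", ill[word])
```

On a real corpus the counts would of course be far larger, and it is the gap between the two columns for each word that tells us which of the two words a verb prefers.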

********************************************

What about something more difficult like whether and if?

For questions like this, I usually dig out my good old friend, Practical English Usage by Michael Swan, but can the corpus approach provide us with any insights?

Let's set our conditions. I'm interested in the following:

  1. which are the popular words that come immediately before whether and if?
  2. which are the popular words that come immediately after whether and if?
  3. which are the popular words that come within 2 words after whether and if?
  4. which are the popular words that come within 3 words after whether and if?

    and everything else that unexpectedly stands out from the data.

The list can go on and on, so it really depends on the data in each search. Since the entire list is pretty long, I will use Photoshop to compile my observations for question 1, and for the rest I will simply type them out.
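
The four searches listed above differ only in where we look relative to whether and if: one position before, or one, two or three positions after. As a rough sketch of that idea (not how COCA itself is implemented), the Python below counts neighbours by position on a few made-up sentences. The function names `collocates_before` and `collocates_after`, and the sample text, are my own assumptions for illustration.

```python
from collections import Counter

def collocates_before(tokens, target):
    """Count the word immediately before each occurrence of `target`."""
    return Counter(
        tokens[i - 1] for i, tok in enumerate(tokens) if tok == target and i > 0
    )

def collocates_after(tokens, target, span=1):
    """Count the words in the `span` positions right after each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            counts.update(tokens[i + 1:i + 1 + span])
    return counts

# Made-up sentences, for illustration only (not COCA data).
text = (
    "they debated whether to go . what if it rains . "
    "even if it rains we will go . the question of whether it matters"
)
tokens = text.split()

before_whether = collocates_before(tokens, "whether")
before_if = collocates_before(tokens, "if")
after_if_2 = collocates_after(tokens, "if", span=2)

print("before 'whether':", dict(before_whether))
print("before 'if':", dict(before_if))
print("within 2 after 'if':", dict(after_if_2))
```

Changing `span` to 1, 2 or 3 corresponds to questions 2, 3 and 4 respectively.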

1. Which are the popular words that come immediately before whether and if?
Image from COCA
From the chart above, we can tell that the words which commonly go before whether (W1) are 
  • debated/ing, regarding, weighing, considers, exploring, investigating, discussing
and the words which commonly go before if  (W2) are 
  • what, surprised, nice, easier, die, sorry, damned, lucky, few, better (from image above)
  • especially, even, happens (from image below)
It is clear that the -ing form of a verb is very common just before whether.

verb + ing whether

As for if, we can see that both positive and negative adjectives are common before if.

adjective + if

What if, especially if, even if and happens if are also prominent.


Image from COCA
Some words stand out for their extremely high frequency counts with both whether and if.

  • of, on, about, to, question.

What this tells us is that it is common to see 

  • of whether, on whether, about whether, to whether, question whether
  • of if, on if, about if, to if, question if
It should be noted that some of these pairs, such as of if, on if and about if, do not make sense as units, so we must look at the actual sentences themselves.

"Of if" examples
Image from COCA
From the above extract of examples containing "of if", we can quickly see that of does not actually form a pair with if. In fact, of forms pairs with the words in front of it, such as a matter of, think of, the author of, speak of, keep him out of.

if simply marks the beginning of another clause.
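
Looking at the actual sentences in this way is usually called a concordance, or keyword-in-context (KWIC) view. Here is a minimal Python sketch of how such lines can be pulled out for a two-word sequence. The sentences are made up by me (they are not the COCA examples in the image), and the function name `concordance` is hypothetical.

```python
def concordance(tokens, first, second, width=4):
    """Return one context line for each place `first` is directly followed by `second`."""
    lines = []
    for i in range(len(tokens) - 1):
        if tokens[i] == first and tokens[i + 1] == second:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 2:i + 2 + width])
            # Bracket the matched pair so it stands out from its context.
            lines.append(f"{left} [{first} {second}] {right}")
    return lines

# Made-up sentences, for illustration only (not COCA data).
text = (
    "that is all i can think of if you ask me . "
    "there was nothing to speak of if we are honest ."
)

kwic_lines = concordance(text.split(), "of", "if")
for line in kwic_lines:
    print(line)
```

Reading the words to the left of the bracket ("think", "speak") makes it easy to see that of belongs with what comes before it, not with if.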


Up to this point, we should understand that although numbers can be extremely powerful in giving us insights into the behaviour of words, we cannot simply look at the search results and immediately draw conclusions. If there is anything odd in the search results, we must go into the actual examples to get a clearer picture. We may also need to adjust our search criteria as we go along in order to reach our final result. It may sound time-consuming, but like everything else, practice makes perfect. Once you are familiar with the system, everything can be done in a split second!

In Usage: Differences In Similar Words -- The Corpus Approach Part 2, we will continue to work with whether and if to answer the rest of the questions.

Have a happy corpus day!

Resources:
Corpus of Contemporary American English (COCA)