There's an old saying in robotics: Anything a human being learns to do after age 5 is easy to teach a machine. Everything we learn before 5, not so easy. That unwritten law of machine learning might explain why there are computers that can beat the world's best chess and Go masters, but we've yet to build a robot that can walk like a human. (Don't try to tell me that ASIMO walks like a human.)
This might also explain why the spellchecker on your computer works so brilliantly, but the grammar checker doesn't. We learn how to spell only when we're old enough to go to school, but the basics of language development can start as early as in the womb.
Spelling is a finite task with discrete right-or-wrong answers. English grammar, on the other hand, allows a near-infinite number of possibilities, and whether something is grammatically correct or incorrect can depend largely on subtle cues like context and inference.
That's why certain English sentences are such a pain in the neck for automated grammar checkers. Les Perelman, a retired MIT professor and former associate dean of undergraduate education who ran the university's writing program, gave me this one: 'The car was parked by John.'
My admittedly dated version of Microsoft Word (Word for Mac 2011) is programmed to recognize and correct passive voice, a no-no in most grammar circles. When I type this sentence into Word, the program dutifully underlines it in green and suggests: 'John parked the car.' That would be fine if John had parked the car, but what if I meant that the car was physically parked near John?
Simple mistake, you might say, but look what happens when I change the sentence to 'The car was parked by the curb.' Word underlines it and suggests: 'The curb parked the car.' That's downright goofy, even for a computer.
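The failure mode is easy to see if you sketch the transformation such a checker is performing. The snippet below is a toy illustration, not Word's actual algorithm: it matches the surface pattern 'X was VERBed by Y' and blindly promotes whatever follows 'by' to the subject position, with no idea whether that phrase is an agent (John) or a location (the curb). All function and pattern names here are hypothetical.

```python
import re

# Toy rule-based passive-voice "corrector": match "X was VERBed by Y."
# and swap Y to the front. No semantics, just surface pattern-matching.
PASSIVE = re.compile(r"^(?P<subject>.+) was (?P<verb>\w+ed) by (?P<agent>.+)\.$")

def naive_activize(sentence: str) -> str:
    """Rewrite 'X was VERBed by Y.' as 'Y VERBed X.' with no semantic checks."""
    m = PASSIVE.match(sentence)
    if not m:
        return sentence
    subject, verb, agent = m.group("subject"), m.group("verb"), m.group("agent")
    # Capitalize the promoted phrase, lowercase the demoted one.
    return f"{agent[0].upper()}{agent[1:]} {verb} {subject[0].lower()}{subject[1:]}."

print(naive_activize("The car was parked by John."))      # John parked the car.
print(naive_activize("The car was parked by the curb."))  # The curb parked the car.
```

Both sentences fit the same surface pattern, so the rule treats them identically; telling an agent apart from a location would require the kind of world knowledge Perelman describes next.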
'So much of English grammar involves inference and something called mutual contextual beliefs,' says Perelman. 'When I make a statement, I believe that you know what I know about this. Machines aren't that smart. You can train the machine for a specific situation, but when you talk about transactions in human language, there's actually a huge number of inferences like that going on all the time.'
Perelman has a beef with grammar checkers, which he claims simply do not work. Citing previous research, he found that grammar checkers correctly identified errors in student papers only 50 percent of the time. Worse still, they often flagged perfectly good prose as a mistake, an error known as a false positive.
In one exercise, Perelman plugged 5,000 words of a famous Noam Chomsky essay into the e-rater scoring engine by ETS, the company that produces (and grades) the GRE and TOEFL exams. The grammar checker found 62 errors — including 14 instances of a sentence starting with a coordinating conjunction ('and,' 'but,' 'or') and nine missing commas — all but one of which Perelman classified as 'perfectly grammatical prose.'