

PittChronicle



You’ll read them on blogs, not in newspapers,
hear them on street corners, not in speeches:
The Many Tongues of the Arab World
Pitt computer scientist Rebecca Hwa is working on
getting computers to translate Arabic dialects

January 9, 2006 Issue

By Karen Hoffmann

Under the Iraqi sun, sweat pours down the soldier’s face and into his eyes. He squints at the man standing before him, who gestures vehemently and repeats something. But the soldier doesn’t speak Arabic. Is the man threatening him? Warning him of danger?

To find out, the soldier reaches into his pocket and pulls out a computer that instantly renders the Iraqi man’s words into English.

That Arabian Nights-like trick isn’t as fanciful as it may sound, thanks to progress being made by researchers at Pitt and elsewhere in programming computers to “comprehend” and translate spoken Arabic.

Modern Standard Arabic (MSA), the “high” version of the language used in official speech and newspapers, is fairly well understood by computational linguists. But spoken Arabic is different from MSA, consisting instead of dialects that vary throughout the world. Because of this variety and because dialects are mainly spoken, not written, they have proven much more difficult for computers to translate.

But researchers are getting closer to cracking the code. Last summer, Pitt Assistant Professor of Computer Science Rebecca Hwa, along with senior computer science major Carol Nichols and researchers and students from other universities, participated in a project titled “Parsing Arabic Dialects” at a workshop on human language engineering at Johns Hopkins University.

Before computers can understand human language, the language must be parsed, or broken down into smaller units. Parsing is an important component in many advanced natural language processing systems, and also is useful in language modeling to enable computers to recognize speech. (Language modeling finds the chance that a word will occur. For example, a certain string of sounds is more likely to be “mushroom soup” than “much rooms hope.”)
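To make the “mushroom soup” comparison concrete, here is a minimal sketch of one simple kind of language model, a bigram model with add-one smoothing. It is an illustration only, not the system the workshop team built: the toy corpus and the names `CORPUS` and `bigram_log_prob` are invented for this example, and a real speech recognizer would be trained on far more text.

```python
from collections import Counter
import math

# Toy training text, invented for illustration only.
CORPUS = (
    "i ordered mushroom soup . "
    "she likes mushroom soup . "
    "the mushroom soup was hot . "
    "there is much hope . "
    "the house has many rooms ."
).split()

unigrams = Counter(CORPUS)                   # how often each word occurs
bigrams = Counter(zip(CORPUS, CORPUS[1:]))   # how often each adjacent word pair occurs
VOCAB_SIZE = len(unigrams)


def bigram_log_prob(words):
    """Sum of log P(word | previous word), using add-one smoothing so that
    word pairs never seen in the corpus still get a small nonzero chance."""
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + VOCAB_SIZE))
    return total


likely = bigram_log_prob("mushroom soup".split())
unlikely = bigram_log_prob("much rooms hope".split())
print(likely > unlikely)  # the model prefers the word pair it has seen before
```

Because “soup” follows “mushroom” every time “mushroom” appears in the toy corpus, while “rooms” never follows “much,” the first reading scores higher. The two candidates also differ in length, which a real system would have to account for when comparing them.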

Hwa’s group consisted of professors of computer science and linguistics as well as graduate and undergraduate students. Three in the group were native Arabic speakers. They worked primarily with the Levantine dialect, spoken in countries bordering the eastern Mediterranean.

(Hwa herself does not speak Arabic. “For me it’s essentially ‘symbols in, symbols out,’” she says, although she did learn a few words and phrases, like “SabaaH el kheer”—“good morning” in the Egyptian dialect.)

The researchers’ goal was to put the tools that have been developed for MSA to work in translating Arabic dialects. To do this, they compared newspapers written in MSA to transcriptions of Levantine Arabic dialogues.

The team wasn’t completely without help in this challenging task.

First, 60-80 percent of the words in MSA and the Levantine dialect overlap. Finding words that frequently showed up near those known words also helped, although, notes Hwa, “It’s not a precise process.”
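As a rough sketch of what measuring such an overlap could look like, the snippet below computes the share of distinct dialect words that also appear in an MSA word list. The article does not say how the 60-80 percent figure was computed, so counting distinct words is an assumption here, and the function name `type_overlap` and the placeholder tokens (`w1`, `d1`, and so on) are hypothetical stand-ins rather than real Arabic data.

```python
def type_overlap(dialect_tokens, msa_tokens):
    """Fraction of distinct dialect words that also occur in the MSA text."""
    dialect_vocab = set(dialect_tokens)
    msa_vocab = set(msa_tokens)
    if not dialect_vocab:
        return 0.0
    return len(dialect_vocab & msa_vocab) / len(dialect_vocab)


# Placeholder tokens: "w*" stands for words shared with MSA, "d*" for dialect-only words.
msa = ["w1", "w2", "w3", "w4", "w2"]
dialect = ["w1", "w2", "d1", "w4", "d2", "w1"]

overlap = type_overlap(dialect, msa)
print(f"{overlap:.0%} of distinct dialect words also appear in the MSA text")
```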

Second, they applied their knowledge of typical word orders (subject-verb-object, for example) and constructs (such as double negatives, as in the French “Je ne parle pas”).

Hwa’s biggest surprise, she says, was that although she had imagined that translating between MSA and a dialect would be easier than between MSA and English, she soon discovered that wasn’t true.

“I’m still trying to understand why it’s so challenging, since we do have such an overlap of common words,” she says. “You would think differences between dialects of the same language would be much closer than of different families.

“One thing we’ve learned is that having a good word-to-word dictionary between MSA and Levantine, even if it is small, is very helpful.”

Hwa and Nichols are continuing their research into parsing Arabic dialects. Hwa hopes such tools might one day be used to summarize or search blogs written in dialects, for example. But she emphasizes there is a long way to go: “Last summer’s work was only the very beginning.”


