You’ll read them on blogs, not in newspapers,
Rebecca Hwa
But researchers are getting closer to cracking the code. Last summer, Pitt Assistant Professor of Computer Science Rebecca Hwa, along with senior computer science major Carol Nichols and researchers and students from other universities, participated in a project titled “Parsing Arabic Dialects” at a workshop on human language engineering at Johns Hopkins University.
Before computers can understand human language, the language must be parsed, or broken down into smaller units. Parsing is an important component of many advanced natural language processing systems, and it is also useful in language modeling, which helps computers recognize speech. (Language modeling estimates the chance that a word will occur given the words around it. For example, a certain string of sounds is more likely to be “mushroom soup” than “much rooms hope.”)
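The “mushroom soup” example can be illustrated with a toy language model. The sketch below is only an illustration, not the workshop's method: it assumes a tiny made-up English corpus and a simple bigram model with add-one smoothing, and the names `bigram_prob` and `sequence_score` are hypothetical.

```python
from collections import Counter

# Tiny made-up corpus, purely for illustration (not data from the workshop).
corpus = (
    "i had mushroom soup for lunch . "
    "the mushroom soup was hot . "
    "there is not much hope . "
    "the house has many rooms ."
).split()

unigrams = Counter(corpus)                  # how often each word occurs
bigrams = Counter(zip(corpus, corpus[1:]))  # how often each adjacent pair occurs
vocab_size = len(unigrams)


def bigram_prob(prev, word):
    """Estimate P(word | prev); add-one smoothing gives unseen pairs a small nonzero chance."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def sequence_score(words):
    """Multiply the bigram probabilities of each adjacent pair in the sequence."""
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= bigram_prob(prev, word)
    return score


likely = sequence_score(["mushroom", "soup"])
unlikely = sequence_score(["much", "rooms", "hope"])
print(likely > unlikely)
```

A real speech recognizer would combine scores like these with acoustic evidence and would be trained on far more text; the point here is only that word sequences seen often in training receive higher scores than sequences never seen.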
Hwa’s group consisted of professors of computer science and linguistics as well as graduate and undergraduate students. Three in the group were native Arabic speakers. They worked primarily with the Levantine dialect, spoken in countries bordering the eastern Mediterranean.
(Hwa herself does not speak Arabic. “For me it’s essentially ‘symbols in, symbols out,’” she says, although she did learn a few words and phrases, like “SabaaH el kheer,” which means “good morning” in the Egyptian dialect.)
The researchers’ goal was to put the tools that have been developed for Modern Standard Arabic (MSA) to work in translating Arabic dialects. To do this, they compared newspapers written in MSA to transcriptions of Levantine Arabic dialogues.
The team wasn’t completely without help in this challenging task.
First, 60-80 percent of the words in MSA and the Levantine dialect overlap. Finding words that frequently showed up near those known words also helped, although, notes Hwa, “It’s not a precise process.”
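The two ideas in this step, measuring vocabulary overlap and looking at which known words appear next to unknown ones, can be sketched roughly as follows. This is an assumption-laden illustration rather than the team's actual procedure: the single-letter tokens are placeholders, not real Arabic words, and the one-word context window is an arbitrary choice.

```python
from collections import Counter, defaultdict

# Placeholder tokens standing in for words; this is not real Arabic data.
msa_vocab = {"a", "b", "c", "d", "e"}   # words known from the standard language
dialect_sentences = [                   # hypothetical transcribed dialect sentences
    ["a", "x", "b"],
    ["c", "x", "b"],
    ["a", "y", "d"],
]

# Share of distinct dialect words that also appear in the standard vocabulary.
dialect_types = {word for sentence in dialect_sentences for word in sentence}
shared = dialect_types & msa_vocab
overlap = len(shared) / len(dialect_types)

# For each unknown dialect word, count the known words directly beside it.
neighbors = defaultdict(Counter)
for sentence in dialect_sentences:
    for i, word in enumerate(sentence):
        if word in msa_vocab:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(sentence) and sentence[j] in msa_vocab:
                neighbors[word][sentence[j]] += 1

print(f"overlap: {overlap:.0%}")
print(dict(neighbors["x"]))
```

Unknown words that keep turning up beside the same known words are candidates for a shared meaning or grammatical role, which fits Hwa's caution that this is not a precise process.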
Second, they applied their knowledge of typical word orders (subject-verb-object, for example) and constructs (such as double negatives, as in the French “Je ne parle pas”).
Hwa says her biggest surprise was that translating between MSA and a dialect did not turn out to be easier than translating between MSA and English, as she had imagined it would be.
“I’m still trying to understand why it’s so challenging, since we do have such an overlap of common words,” she says. “You would think differences between dialects of the same language would be much closer than of different families.
“One thing we’ve learned is that having a good word-to-word dictionary between MSA and Levantine, even if it is small, is very helpful.”
Hwa and Nichols are continuing their research into parsing Arabic dialects. Hwa hopes such tools might one day be used to summarize or search blogs written in dialects, for example. But she emphasizes there is a long way to go: “Last summer’s work was only the very beginning.”