Connie (Adsett) Jess - Master's Thesis Abstract
Automatic Syllabification in European Languages: A Comparison of Data-driven Methods
Although automatic syllabification is an important component in several natural language tasks,
little has been done to compare the results of data-driven methods across a wide range of languages.
This thesis compares the results of four data-driven syllabification algorithms (IB1, the Look-up
Procedure, Liang's algorithm, and Syllabification by Analogy) on nine European languages (Basque,
Dutch, English, French, Frisian, German, Italian, Norwegian, and Spanish). Three questions are
investigated: which algorithm performs best, which domain (spelling or pronunciation) is easier
for automatic syllabification, and which languages are more straightforward to syllabify. Firstly,
findings show that Syllabification by Analogy performs better than the other algorithms tested,
with a mean word accuracy of 96.84%. Secondly, contrary to claims in the field, no significant
difference was found between automatic syllabification performance in the two domains. Finally, the
ranking of the languages in terms of syllabic complexity matches the results of previous work using
alternative approaches.