Experiments on Automatic Prosodic Labeling
Antje Schweitzer, IMS Uni Stuttgart

This talk presents first results from experiments on automatic prosodic labeling. Using the WEKA machine learning software [Witten/Frank:2005], classifiers were trained to determine the pitch accent type and the boundary type of each syllable in a speech database. Pitch accents and boundaries follow the GToBI (Stuttgart) dialect, with slight modifications: downstepped accents are mapped to their un-downstepped counterparts, and for the tonally unspecified boundary tones - and %, the preceding trailing tone is included in the category label, yielding (H)- and (L)- instead of unspecified -, and (H)% and (L)% instead of unspecified %.

Both building the classifiers from a database and applying them to new data require that the databases be fully and accurately annotated on the segment, syllable, and word level. The labels must include word stress, and pauses must be labeled as such. The results presented here are for data in which POS labels (STTS tagset) and punctuation were annotated on the word level; leaving this information out caused only a minor loss in accuracy. There must also be sufficient data to reliably estimate the mean and standard deviation of the durations observed for each phoneme.

The following procedure derives all information necessary for building the classifiers. (i) F0 extraction and smoothing. (ii) Utterance building: The speech and label files are converted to the utterance format used by Festival, the text-to-speech system developed at Edinburgh University. (iii) Feature extraction: The features needed for the classifiers are extracted from the utterances using Festival. The extraction involves F0 parametrization using Festival's PaIntE approximation [Moehler/Conkie:1998] and the calculation of phoneme-specific duration z-scores. Further features are the location of word stress, word boundaries, and pauses.
Also, certain higher-level linguistic properties that are known to be predictive of boundary and accent placement are derived from the POS tags and punctuation marks using our German Festival linguistic preprocessing module. (iv) Building classifiers using WEKA.

This procedure was applied to two very similar speech synthesis corpora developed in the SmartWeb project using the same text material, one spoken by a professional male speaker (approx. 2 hrs. of speech) and one spoken by a professional female speaker (approx. 3 hrs. of speech), both of which had been manually labeled for prosody. On the male data, several classification algorithms provided with WEKA yielded very good results, with no significant differences between the best two (Random Forests [Breiman:2001] and Bagging [Breiman:1996]). In the following, results are reported for the Random Forest classifiers.

When the classifiers were evaluated on the word level, the overall accuracy was approx. 70% for pitch accents and 88% for boundary tones. This is comparable to the inter-labeler consistency reported by [Grice/et al.:1996], which was 71% for pitch accents and 86% for boundaries. However, this comparison is not completely fair because downstep was ignored in the experiments described here. Applying the classifiers to the female data gives very good results for boundaries (same accuracy as on the male data); for pitch accents, the accuracy is lower (65%, compared to 70% on the male data). Even so, performance is well above that of a baseline classifier that always assigns the most frequent category, which would yield an accuracy of approx. 40% for pitch accents.
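
Two of the computations mentioned above, the phoneme-specific duration z-scores and the most-frequent-category baseline, can be illustrated with a minimal Python sketch. This is not the Festival or WEKA code used in the experiments. The function names, the input format, the toy labels, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def duration_zscores(segments):
    """Return one z-score per (phoneme, duration) pair.

    Each duration is normalized by the mean and standard deviation of all
    durations observed for the same phoneme. Using the population standard
    deviation (pstdev) is an assumption of this sketch.
    """
    durations_by_phoneme = defaultdict(list)
    for phoneme, duration in segments:
        durations_by_phoneme[phoneme].append(duration)

    stats = {
        phoneme: (mean(durs), pstdev(durs))
        for phoneme, durs in durations_by_phoneme.items()
    }

    zscores = []
    for phoneme, duration in segments:
        mu, sigma = stats[phoneme]
        # With zero variance (e.g. too little data) fall back to 0.0.
        zscores.append((duration - mu) / sigma if sigma > 0 else 0.0)
    return zscores


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


if __name__ == "__main__":
    # Toy data, invented for illustration only (durations in seconds).
    toy_segments = [("a", 0.10), ("a", 0.20), ("t", 0.05), ("t", 0.05)]
    print(duration_zscores(toy_segments))

    toy_labels = ["NONE", "NONE", "L*H", "H*L", "NONE"]
    print(majority_baseline(toy_labels))
```

The zero-variance fallback shows why sufficient data per phoneme matters: with too few observations the standard deviation is unreliable and the z-score carries little information.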