NEW
RO‑N3WS: A Romanian Speech Benchmark for Low‑Resource ASR
Romanian speech recognition systems face unique challenges due to the language's low-resource status. Unlike widely supported languages like English or Mandarin, Romanian lacks sufficient training data for accurate automatic speech recognition (ASR). This gap leads to higher error rates and poor performance in real-world applications. The RO-N3WS benchmark addresses this by providing over 126 hours of transcribed speech gathered from diverse sources like broadcast news, audiobooks, film dialogue, children’s stories, and podcasts. As mentioned in the Design and Development of RO-N3WS section, this dataset was created to address critical gaps in low-resource Romanian speech recognition by ensuring domain-agnostic diversity. This dataset not only expands the available training material but also introduces variations in speaking styles, accents, and background noise-key factors in improving model generalization. Low-resource languages often struggle with Word Error Rate (WER) improvements because existing datasets lack diversity or fail to represent real-world conditions. RO-N3WS solves this by curating speech data from multiple domains. For instance, audiobooks and children’s stories introduce clear, structured speech, while podcasts and film dialogue add spontaneity and colloquial language. This mix ensures ASR systems trained on RO-N3WS can handle both formal and informal speech patterns. Studies show that fine-tuning models like Whisper and Wav2Vec 2.0 on this benchmark reduces WER by up to 20% compared to zero-shot baselines, as demonstrated in the Baseline System Results and Error Analysis section. These results prove its effectiveness in low-resource settings. The impact of RO-N3WS extends beyond academia. Industries relying on Romanian speech recognition-such as customer service, healthcare, and education-stand to gain significantly. For example, a call center using RO-N3WS-trained models could transcribe customer interactions with higher accuracy, reducing manual effort and improving response times. Similarly, educational platforms could use the benchmark to develop voice-based tools for language learners, ensuring correct pronunciation is recognized even in varied dialects. Researchers and developers benefit as well, using RO-N3WS to test and refine algorithms tailored to Romanian’s linguistic nuances without relying on generic datasets that underperform for low-resource languages.