Building Reliable AI for Multilingual Media Analysis at the University of Innsbruck in collaboration with Eurac Research.

Beyond the Black Box: DIDI Research Project Awarded Funding

We are pleased to announce our participation in the newly funded interdisciplinary project DIDI ("Different groups, different lenses? How Media Framing Shapes Perceptions of Majority and Minority Communities"), supported by Research Südtirol Alto Adige 2024. We bring our expertise in formal methods and natural language processing to the project to tackle fundamental challenges in generative AI. DIDI is a collaboration between the University of Innsbruck (TCS Research Group) and Eurac Research that investigates how media framing shapes perceptions across the German, Italian, and Ladin communities in South Tyrol.

Tackling AI Bias and Hallucinations in Multilingual Contexts

As news media increasingly serve as training data for AI systems like ChatGPT and Google's Gemini, the risks of biased outputs and hallucinations (information inconsistent with facts) become critical concerns. For smaller linguistic communities with limited training data, these risks are amplified, because their cultural and linguistic nuances are often underrepresented or misrepresented. Our research addresses these challenges at the intersection of formal methods, natural language processing, and multilingual communication.

The scarcity of South Tyrolean context in LLM pre-training data makes it difficult to generate accurate, unbiased content across the region's three languages. We are developing computational approaches to address two core problems. The first is identifying and mitigating intrinsic biases in LLMs when they are applied to minority contexts, where cultural and linguistic underrepresentation leads to skewed narratives. The second is reducing hallucinations through formal constraint mechanisms and Retrieval-Augmented Generation (RAG) systems that ground model outputs in verified local knowledge bases. By combining logic-based constraints with curated regional multilingual databases, we aim to ensure that AI systems can reliably analyze media content, highlight differences in coverage between linguistic communities, and detect biased or fabricated information.
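To make the grounding idea concrete, here is a minimal Python sketch of retrieval followed by a support check. It is an illustration only, not the project's implementation: the function names (`tokens`, `retrieve`, `is_grounded`), the two-entry knowledge base, the Jaccard word-overlap scoring, and the 0.8 threshold are all assumptions made for this example. The overlap test is a crude lexical stand-in for the formal, logic-based constraints described above.

```python
import re

def tokens(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, knowledge_base, k=1):
    """Return the k passages with the highest Jaccard word overlap with the query."""
    q = tokens(query)

    def score(passage):
        p = tokens(passage)
        union = q | p
        return len(q & p) / len(union) if union else 0.0

    return sorted(knowledge_base, key=score, reverse=True)[:k]

def is_grounded(claim, passages, threshold=0.8):
    """Accept a claim only if enough of its words occur in the retrieved passages.

    Hypothetical constraint: a lexical proxy, not a logical entailment check.
    """
    c = tokens(claim)
    if not c:
        return False
    support = set().union(*(tokens(p) for p in passages))
    return len(c & support) / len(c) >= threshold

# Hypothetical "verified" knowledge base, built from statements in this announcement.
KB = [
    "DIDI is a collaboration between the University of Innsbruck and Eurac Research.",
    "DIDI studies media framing across German, Italian, and Ladin communities in South Tyrol.",
]

claim_ok = "DIDI is a collaboration between the University of Innsbruck and Eurac Research"
claim_bad = "DIDI is a collaboration between two unnamed companies"

ok_result = is_grounded(claim_ok, retrieve(claim_ok, KB))     # supported by the first entry
bad_result = is_grounded(claim_bad, retrieve(claim_bad, KB))  # too many unsupported words
```

In a realistic pipeline the retrieval step would presumably use multilingual embeddings rather than word overlap, and the acceptance step would check a generated statement against explicit constraints rather than counting shared words. The structure stays the same: retrieve verified passages first, then reject output they do not support.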

A critical challenge lies in adapting NLP techniques designed for high-resource languages to Ladin. We are developing specialized tools for topic modeling, sentiment analysis, and emotion mining using cross-lingual transfer learning: models are pre-trained on resource-rich languages and then fine-tuned on the limited Ladin data available. This work supports the project's media analysis objectives and also yields generalizable methodologies for extending modern language technologies to underrepresented linguistic communities while preserving their distinctive cultural and linguistic characteristics. Our interdisciplinary approach integrates theoretical computer science with social science research, contributing to more transparent, reliable, and inclusive AI systems for analysing digital discourse in multilingual societies.
