AI vs. Data sovereignty: why on-premises AI matters more than ever

20 May 2025

In May 2025, a major open letter made headlines across the UK and beyond. Signed by over 400 artists - including Paul McCartney, Elton John, Dua Lipa, and Ed Sheeran - the letter urged Prime Minister Keir Starmer to rethink proposed changes to AI copyright laws. Their concern? That powerful AI systems are being trained on their work without permission, compensation, or oversight.

According to BBC News, the artists warn that the UK’s global leadership in the creative industries is under threat if AI companies are allowed to scrape protected material from books, music, and scripts without consent.

This moment highlights a much bigger question: Who controls the data AI learns from?

The massive appetite of Generative AI

To perform at the level users expect, large language models (LLMs) must be trained on enormous datasets. These datasets are often pulled from:

  • Wikipedia
  • Online forums like Reddit
  • News articles
  • Book archives
  • Video subtitles (e.g. from YouTube)

For example, GPT-3 was trained on hundreds of billions of tokens sourced from Common Crawl, WebText2, Wikipedia, and multiple book datasets (Wikipedia).
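To get a feel for what "hundreds of billions of tokens" means, here is a minimal sketch of token estimation. It assumes the common rule of thumb of roughly four characters per English token; real tokenizers (e.g. byte-pair encoding) vary by corpus, so this is an approximation, not how any specific model counts.

```python
# A rough sketch of how training-corpus size is measured in tokens.
# Assumption: ~4 characters per token for English text (a common rule
# of thumb); real BPE tokenizers produce different counts per corpus.

CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Approximate how many tokens a model would see for `text`."""
    return max(1, len(text) // CHARS_PER_TOKEN)

sample = ("Large language models must be trained on enormous datasets "
          "pulled from encyclopaedias, forums, news, and books.")
print(estimate_tokens(sample))
```

At this rate, the hundreds of billions of tokens behind a model like GPT-3 correspond to text on the order of a terabyte, which is why training pipelines reach for web-scale scraping in the first place.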

But this scale comes with trade-offs. Several companies, including Apple and Anthropic, have faced backlash for training on scraped YouTube subtitles without obtaining creator consent (WIRED).

The risks of training on the open web

Using publicly available content sounds harmless until you consider the consequences:

  • Copyright violations: training on protected material without consent may breach intellectual property laws.
  • Loss of privacy: public data may include personal or sensitive information that was never meant for AI ingestion.
  • Lack of control: organisations can’t see or shape what’s in the training data - which can lead to hallucinations or reputational risk.

When AI models are trained on anything and everything, no one knows what’s inside the black box.

Why Ulla takes a different path

Unlike generative AI trained on uncontrolled data, Ulla is designed with data control at its core. Here’s how:

  • On-premises deployment: Ulla can be hosted entirely on your organisation’s own infrastructure. No cloud dependency. No external processing.
  • Internal learning only: Ulla works exclusively with your meeting data. It doesn’t scan the web or pull from public content.
  • No third-party AI services: Ulla does not use OpenAI or similar APIs. All processing is done securely, ensuring privacy and compliance.

This means your organisation gets the benefit of AI-driven productivity - without giving up ownership or oversight.

The takeaway

The letter from Britain’s artists is about more than copyright. It’s about the right to control how your work, your voice, and your data are used in the age of AI.

For public institutions, legal firms, healthcare providers, and any organisation handling sensitive conversations, tools like Ulla provide an answer:

Yes to AI - but on your terms.


🔒 Want to learn more about how on-premises AI can protect your data and respect your boundaries? Visit ulla.bot
