THE DOWNPOUR
One drop at a time
The AI Provenance Problem
Is it Friday already? Well, the weeks start coming and they don’t stop coming.
Here’s what we got for you:
The Provenance Problem
Links and whatnot
Here’s what you got for us:
Please share this post on social media! It really helps us gain additional reach. (check out the handy links above)
Take part in our poll below! Be part of shaping our future content.
The Provenance Problem
AI scares the living sh*t out of people. I get it, it’s scary sh*t.
Some respond by attacking the tools.
Some respond by going on strike.
Some respond by filing lawsuits.
And some respond by demanding Orwellian levels of spyware to be embedded in all things that machines (…and people) create.
So about that last part… let’s talk a little about what exactly is being suggested and why it matters so much.
To get this topic started, first we need to understand the C2PA. The C2PA is the Coalition for Content Provenance and Authenticity – a foundation that was born of the unification of Adobe’s Content Authenticity Initiative and Microsoft’s Project Origin.
The consortium’s goal is to provide a way to track, from inception-to-consumption, the entire “custody chain” of any piece of content generated in this post-AI era.
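To make that "custody chain" idea concrete, here's a toy sketch of what a provenance record could look like: each step hashes the content and links back to the previous record. This is a loose illustration of the concept, not the actual C2PA manifest format, and all the names (`make_manifest`, the tool strings) are made up for the example.

```python
import hashlib
import json

def make_manifest(content, tool, parent=None):
    """Build a toy provenance record: a hash of the content plus a
    link to the previous step in the custody chain. (Illustrative only;
    real C2PA manifests are cryptographically signed structures.)"""
    return {
        "claim": {
            "tool": tool,
            "content_sha256": hashlib.sha256(content).hexdigest(),
        },
        # Link the chain by hashing the parent record.
        "parent_sha256": (
            hashlib.sha256(json.dumps(parent, sort_keys=True).encode()).hexdigest()
            if parent
            else None
        ),
    }

# Capture, then edit: two links in the chain.
photo = make_manifest(b"raw sensor data", tool="Camera v1.0")
edit = make_manifest(b"edited pixels", tool="Editor v2.3", parent=photo)
```

A verifier walking this chain could, in principle, check every hash from the final edit back to the original capture, which is the "inception-to-consumption" promise in a nutshell.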
Still with me? Ok.
This past week, the Senate Judiciary Committee heard testimony from a panel consisting of a representative of a major music conglomerate, policy representatives from AI-empowered companies (Stability AI, Adobe), an artist, and a college professor. So to be clear, 942 legal professionals…and one angry artist.
If you’ve already watched it – because presumably you just can’t get enough of aging politicians cracking dad jokes and dropping inane puns – then you can probably skip ahead a bit.
I don’t know why that tree over there doesn’t like me, but it’s been throwing shade all day.
The session begins with a lifeless AI parody of “New York, New York” (“AI, AI” of course) in the simulated voice of the late Frank Sinatra, crooning cringeworthy ChatGPT-generated lyrics that crescendo into literally nothing. It was so uncomfortable that I half-expected Senator Coons to start handing out trophies to the audience.

And the Dundie for Whitest Old Dude goes to…
The hearing continued for over an hour, focused largely on the training sets used to build LLMs and the weakness of the “opt-out” culture that’s been adopted by technology companies (robots.txt ftw). Karla Ortiz (the lone artist) also called out an industry practice called Data Laundering, which was an interesting TIL for me.
On the surface it seemed like everyone was arguing for the same outcome – a mutually ideal situation for LLM businesses and creatives alike. But there was far more between the lines than just that.
A quick sidebar
It is hilarious to me that a record label is there to “stand up for its artists” and their original creations. The industry decided long ago that creating endless clones is the business model, and it has been that way since the dawn of the profession. It’s a simple formula: find an artist with a fresh sound that people love, sign 20 clones knowing a few of them will boost the bottom line, and let the rest drown in a no-win contract. Repeat ad nauseam.
Yet with the introduction of AI’s capabilities – that arguably can create something more unique than its inputs – all of the profiteers seem to be in a tizzy, fighting for “artists rights.”
It’s a lie.
At one point, a panel member even called for new Federal legislation that would grant an “Anti-Impersonation Right” to humans, offering protection from unlawful copying of someone else’s style.

Uh oh.
We know that the very moment the executives figure out how to properly exploit AI, it’s over for the artists. At least, for the ones that aren’t truly genre-defining. Milli Vanilli was an inside job.
To further underscore the irony, one of the most popular books on creativity suggests a mental model that is eerily similar to how an LLM behaves. TL;DR: by bringing in as many influences as possible, one can truly create something unique. Remember, “there is no new thing under the sun” – and what is a human if not a large language model that takes longer to consume its training set?
So back to the nuance.
On the AI side, the testimony in the hearing kept steering away from restricting the models’ training sets and toward identifying content that was generated by AI.
Yes, in yet another case of “how could we do this to ourselves?”, large tech organizations are here to save you from the problem they’ve created. Surprise – C2PA is exactly what the doctor ordered.

The proposed technology promises a very similar value to blockchain, except instead of having a ledger be mathematically verified across a large group of independent parties, this time the good stuff is embedded completely in the content itself and policed by super-duper altruistic mega-corporations.
Let’s be super clear. This will not work.
AI content will be generated in places outside of U.S. jurisdiction that will not comply.
An existing deepfake can be duplicated without its metadata.
An AI-generated image can be screen-cap’d (or even photographed) to wipe the audit trail out completely; watermarks can be eliminated by a noncompliant AI.
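The second and third failure modes come down to one fact: provenance data rides alongside the pixels, so anything that copies only the pixels leaves it behind. A toy model of the "screenshot attack" (all names here are invented for illustration):

```python
# Toy model: an "image" is pixel bytes plus an embedded provenance blob.
# (Hypothetical structure – real formats embed this differently.)
signed_image = {
    "pixels": b"\x89PNG...raw pixel data...",
    "provenance": {"generator": "SomeModelV3", "signed": True},
}

def screenshot(image):
    """Re-capturing the visible pixels copies only what's on screen;
    any embedded manifest or signature never makes the trip."""
    return {"pixels": image["pixels"], "provenance": None}

laundered = screenshot(signed_image)
assert laundered["pixels"] == signed_image["pixels"]  # the content survives
assert laundered["provenance"] is None                # the audit trail is gone
```

The absence of a manifest proves nothing either way, which is exactly why "no provenance data" can never be treated as "inauthentic."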
This means that while we can be sure something is of dubious authenticity, we can’t be truly certain of its origins or – in the case of AI – what training material was used to create the model that generated it. And that’s entirely the point.
But Stability, Adobe and all of the others really don’t care about that. They want to provide a solution to the concern that doesn’t impede their ability to keep doing what they do, because removing the large amounts of human-created content (by policy) will cause the AI to absolutely suck.
Even if all of these problems could be solved (psst: they can’t), this is just the tip of the iceberg when it comes to problems with this level of origin tracing. Now we introduce yet another no-opt-out system for creators, which will lead to major issues with privacy and free speech.
Perhaps even worse, this will serve to amplify the “authenticity” claims of whatever information mainstream media sources create, further silencing information from any unapproved origin. Imagine a world where Threads and Facebook only allow content from “authentic” sources. We already know that Meta will comply with pressure and the list of partners is not exactly filled with independent sources.
While I’m not here to sound the death knell for journalism, the corporations may gain yet another tool to minimize the smaller, independent voices that seek truth. That may be the first real risk that AI has introduced.
Links and whatnot
Sponsor
This issue is brought to you by Rainwell Technical Consulting.
A lot of founders struggle with the same questions.
How do we get from MVP to a scalable, effective product?
Is our current technology holding us back?
What type of technical organization is best for us?
How do we know we’re hiring the right technical employees?
At Rainwell Technical Consulting, we help companies gain the confidence to know their tech is heading the right direction, so they can focus on solving the big problems.
Email me ([email protected]) to discuss how these questions might not be so scary.
If you’d like to sponsor future issues: [email protected]