From Stone Age to the Age-of-Not-Knowing-What’s-Real-Anymore – The Rise of Generative AI

The war in Ukraine has brought to the fore how difficult it is to differentiate between genuine photos and fakes these days, now that we can use generative artificial intelligence (AI) to create make-believe images.

With both sides in the conflict pushing propaganda, and with social media platforms designed to elevate the most emotional and dramatic images, it has become essential to verify any content emerging from this war.

Media networks such as ABC News use verification tools such as geolocation to identify where an image was captured, along with weather indications, sun orientation and the length of shadows in the image to estimate the date and time of capture. But the network relies on on-the-ground reporters for more precise verification, and for context around any video footage that emerges.

‘Sometimes we’re able to verify the date and location of a video but without context it’s often difficult to tell what is going on, in which case we can’t use it,’ said ABC News on its website. ‘Digital verification is hugely important, but without real-world reporting from the ground, the number of verified videos would be massively reduced.’

So, a complex, multi-layered, and not always reliable process. However, easier and more effective tools for authenticating images now exist, such as the one developed by Thomson Reuters, Canon, and Starling Lab, an academic research unit based at Stanford University and the University of Southern California.

Can hashes save the day?

The Reuters/Canon technology addresses the legitimacy of images used in news reporting so that AI-generated images don't slip through as real photos. It consists of an end-to-end method for embedding information into an image during capture that is preserved through editing and publication.

According to the Reuters website, the technology was tested during coverage of the war in Ukraine, when Reuters photographer Violeta Santos Moura captured pictures from the frontline using a prototype Canon camera that assigns each photograph, together with its corresponding time, date and location, a unique identifier (hash value), and then cryptographically signs the photo to establish a root of trust for its authenticity. The photo's pixels, GPS and other metadata are then sent directly from the camera to the Reuters system, where they are registered on a public blockchain and preserved in two cryptographic archives.
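Conceptually, the capture step amounts to hashing the image bytes plus capture metadata and signing that digest with a key held in the camera. Below is a minimal Python sketch of that idea using the standard hashlib module and the cryptography package with an illustrative Ed25519 key; Reuters and Canon have not published their exact scheme, so the field names and formats here are assumptions, not their implementation.

```python
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Illustrative in-camera key; a real system would use a key provisioned in
# the camera's secure hardware, not one generated at runtime.
camera_key = Ed25519PrivateKey.generate()

def register_capture(image_bytes: bytes, metadata: dict) -> dict:
    """Hash the pixels plus capture metadata and sign the digest."""
    # Canonical serialisation of time, date, GPS and other metadata.
    meta_blob = json.dumps(metadata, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(image_bytes + meta_blob).hexdigest()
    signature = camera_key.sign(bytes.fromhex(digest))
    # This record is what would be sent to the news agency's system and
    # anchored on a public blockchain.
    return {"hash": digest, "metadata": metadata, "signature": signature.hex()}

record = register_capture(
    b"<raw image bytes>",  # placeholder for the captured JPEG
    {"time": "2023-06-01T10:42:00Z", "gps": [48.92, 37.63]},
)
```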

Each successive edit to the photo creates a new record in a private database (ProvenDB) that is indexed to the original registration. This allows edit logs to be kept private, but these logs also have their authenticity records registered on the public blockchain. The process continues until publication of the photo, when a final version is created and the photo is distributed with information about its original time, date, location and blockchain registration embedded directly into it, using the C2PA standard.
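The edit log can be thought of as a chain of records, each pointing back to the original registration hash, with only a digest of each record pushed to the public blockchain while the record itself stays private. A hedged sketch of what one such entry might look like follows; the field names are assumptions for illustration, not the actual ProvenDB schema.

```python
import hashlib
import json
import time

def edit_record(original_hash: str, edited_image: bytes, editor: str) -> dict:
    """Create a private edit-log entry indexed to the original registration."""
    entry = {
        "original_hash": original_hash,  # links back to the capture record
        "edited_hash": hashlib.sha256(edited_image).hexdigest(),
        "editor": editor,
        "timestamp": time.time(),
    }
    # Only this digest of the entry would be registered on the public
    # blockchain; the entry itself stays in the private database.
    entry["record_digest"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return entry
```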

A news consumer can check the validity of a photo by comparing its hash to the value in the blockchain. The values should be the same, irrespective of where the consumer has retrieved the picture. If they are not, this means the image has changed since Reuters published the photo, which breaks the chain.
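In practice, the consumer-side check is just a hash comparison: recompute the hash of the photo you retrieved and compare it with the value registered on the blockchain. A minimal sketch, assuming the registered value has already been looked up (the placeholder strings below are not a real Reuters or C2PA API):

```python
import hashlib

def verify_photo(image_bytes: bytes, registered_hash: str) -> bool:
    """Return True if the retrieved photo still matches the registered hash."""
    # The hash must be recomputed over exactly the same input that was
    # registered at capture time (here, simply the raw image bytes).
    return hashlib.sha256(image_bytes).hexdigest() == registered_hash

downloaded = b"<raw image bytes as retrieved by the reader>"
registered = "ab12..."  # placeholder for the value recorded on the blockchain
if not verify_photo(downloaded, registered):
    print("Hash mismatch: the image has changed since Reuters published it.")
```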

‘This is a wonderful thought – but you do have to wonder how many news consumers will actually go through the process of learning how to do that for themselves,’ observed a journalist from voices.media. While this newsletter would have to agree with the journalist’s observation as far as consumers are concerned, for news networks with an obligation to verify all pictures used in their stories, the Reuters/Canon end-to-end authentication system could prove a viable alternative to the aforementioned complex and time-consuming methods for determining image legitimacy.

The threat to e-books

Moving to another area where generative AI is proving to be a threat: e-books. A Nielsen survey funded by Digimarc, a provider of digital watermarking technology, has revealed that e-book piracy costs US publishers $315 million a year in lost sales. While book piracy is not a new phenomenon for the publishing industry, the ability of AI to generate content in seconds has led to new concerns about counterfeit e-books, fraudulent listings and copyright ownership.

Recently, author Jane Friedman discovered several fraudulent book listings using her name, likely filled with AI-generated content. Amazon and Goodreads resisted removing the faux titles until the author’s complaints went viral on social media. Friedman’s blog post titled ‘I Would Rather See My Books Get Pirated Than This (Or: Why Goodreads and Amazon Are Becoming Dumpster Fires)’ detailed her struggle with counterfeit books. ‘It feels like a violation because it’s low-quality material with my name on it,’ she said.

Friedman isn’t alone in this struggle; another author posted her concern on X (formerly Twitter), when she discovered 29 titles on Goodreads that incorrectly listed her name as the author.

In February, Reuters wrote about authors using ChatGPT, an AI chatbot powered by a large language model, to write e-books and sell them through Amazon. And in June, Vice reported an influx of dozens of AI-generated books, full of nonsensical content, that had taken over Kindle bestseller lists.

Then in September, the Authors Guild and 17 authors were named as plaintiffs in an ongoing class action lawsuit against OpenAI, the developer of ChatGPT, for using copyrighted books to train AI models.

Most AI models are trained on huge amounts of content scraped from the web, be that text, code, or imagery. Like most machine learning software, they work by identifying and replicating patterns in that content. However, the creators of the content are human beings, and their work is often copyright-protected.

AI companies argue that the use of such content is covered (in the US, at least) by the doctrine of fair use, which aims to encourage the use of copyright-protected work to promote freedom of expression.

Training a generative AI on copyright-protected data is likely legal, but you could use that same model in illegal ways, explained Daniel Gervais, a professor at Vanderbilt Law School, speaking to theverge.com. Think of it as the difference between making fake money for a movie and trying to buy a car with it.

‘If you give an AI 10 Stephen King novels and say, “produce a Stephen King novel,” then you’re directly competing with Stephen King. Would that be fair use? Probably not,’ said Gervais.

The challenge, therefore, is to identify what remedies, technical or otherwise, can be introduced to allow generative AI to flourish while giving credit or compensation to the creators whose work feeds AI models.

The most obvious suggestion is to license the work and pay its creators, similar to what happened in music, where companies like Spotify and iTunes struck licensing deals in order to use content legitimately, reported theverge.com.

The Digimarc solution

As far as technological remedies are concerned, the Digimarc Validate technology offers a digital copyright protection solution that can be purchased and applied by anyone looking to protect their digital assets, such as e-books. The solution consists of a covert digital watermark embedded in the book, which provides it with a unique, machine-readable identifier. The identifier carries data on the book’s ownership, authenticity, provenance, and copyright, and travels with the book wherever it goes in the digital world.

The technology is powered by Digimarc’s digital watermark detection software, called SAFE (Secure, Accurate, Fair, Efficient), which AI companies have to buy into if they want to prevent copyrighted material bearing the Digimarc Validate symbol from making it into generative AI training datasets.
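In a training pipeline, such detection amounts to a filter over candidate documents: anything in which the watermark is detected is excluded before the dataset is assembled. A hedged sketch of that idea follows, where detect_watermark stands in for Digimarc's proprietary detector; it is a hypothetical callable, not a published API.

```python
from typing import Callable, Iterable

def filter_training_corpus(
    documents: Iterable[bytes],
    detect_watermark: Callable[[bytes], bool],
) -> list[bytes]:
    """Keep only documents in which the watermark detector does not fire."""
    return [doc for doc in documents if not detect_watermark(doc)]

# Example usage with a trivial stand-in detector that flags nothing.
clean_corpus = filter_training_corpus([b"doc one", b"doc two"], lambda doc: False)
```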

So far, however, AI companies have not promised to stay away from copyrighted material in training datasets, although seven of the top AI companies (including Amazon, Meta, Google, and OpenAI) did reach a deal with the US government to introduce guardrails such as digital watermarking of AI-generated content. The goal of that deal, though, is to help consumers differentiate between AI-generated and human-created content, rather than to provide copyright protection.

Nevertheless, having a digital ‘paper trail’ of the copyright on a digital asset could allow the human creators of the asset to identify cases where AI developers may have intentionally infringed that copyright, and take appropriate action.