
Multimedia forensics: when digital photography meets artificial intelligence | PhotoVogue Festival 2023: What Makes Us Human? Image in the Age of A.I.

In this talk, we showcase the progress made by the multimedia forensics community in the last few years. From leveraging AI-driven detection methods to unveiling unique data-driven approaches, this talk sheds light on the collaborative efforts that are reshaping the field. As we navigate the intersection of AI and multimedia forensics, we'll gain insights into the promises and limitations associated with these evolving technologies.

Released on 11/22/2023

Transcript

[audience clapping]

Welcome everybody. Good afternoon everybody.

Today I'm here to talk to you about multimedia forensics

in digital photography in an era of artificial intelligence.

I start by saying that faking photos

and faking pictures is nothing new.

Here is an example.

We are in 1860 and this is Abraham Lincoln,

whose head was swapped onto the body of another politician.

Around the same years, 1861 to 1865,

another picture, taken during the American Civil War,

the picture on your right, was actually a composition

of three different photographs.

In the one on your left we have a background, a horse,

and General Ulysses Grant,

whose picture was obtained as a composition of other photographs,

and this was, I mean, quite a few years ago.

So this is nothing new.

The thing is that today it is much easier.

It's much easier to generate pictures, to fake pictures,

to tamper with images and imagery of different kinds.

One of the reasons it is so easy to do that

is that, as you all know, we have

so many user-friendly and powerful software tools.

Photoshop from Adobe, just to mention one,

but you can go around and check on Google

and find so many websites

that allow you to upload your pictures,

edit them very easily with more or less good results.

Another thing that we can do today

that was impossible back then is generating fake identities.

Here's a few examples. Starting from 2014,

some of the preliminary work

on generating fake images of faces,

faces of people that do not exist.

They were low-resolution, grayscale images back then,

but in just a few years,

if we look at what we got in 2021,

thanks to NVIDIA research,

now we can generate fake faces, images of fake people,

people that do not exist, with very high resolution

and very, very realistic results, and this is quite impressive,

and this was like two years ago.

If we think about what we can do now, we can do even more.

As you know, as it was shown before,

if you go around to the other room, you'll see some examples.

We can generate pictures with AI,

just starting from text prompts.

You describe a scene and you'll find online so many websites

and services that allow you to generate fake pictures,

pictures of scenery that do not exist,

but that depict what you have described with a simple text prompt,

and it is quite impressive.

The thing is that if we use all of these tools,

which are extremely powerful, in a bad manner,

with malicious intent, this may become very, very dangerous,

and not just for copyright issues,

but also because they may be used to promote disinformation,

spread fake news online, spread fake news in journals

and newspapers, and this is something that definitely

we don't want to happen.

A couple of examples here. On your left,

the cover of the National Review

as it was published in 2012:

it showed this picture of Obama in front of a crowd

with people holding signs saying "Abortion."

The true picture is the other one,

where of course there is a crowd

in front of former President Obama,

there are people, people are holding signs,

but they say "Forward," something completely different.

Another example: a picture published in 2010

by Al-Ahram, politicians, world leaders

walking down a corridor, but the true picture, the real one,

the one that was shot by the photographer, is the one on top.

The bottom part of the figure shows the picture published

by the newspaper, where the order of the politicians,

the order of these leaders, was warped

in order to convey a different political message

with a different intent.

Luckily we know about this now, we talk about it now.

You can find different papers related to the fact

that sharing fake and generated images

may lead to social issues.

We just heard a talk about how important it is

to regulate these sorts of things,

but still this is not enough.

We know that fake news is being spread.

We know that generated images are used without consent.

We know we may have copyright issues.

So the question is: can we actually defend ourselves today

against fake imagery and generated images,

or are we doomed to live in a world where

we can no longer distinguish what is reality

and what is generated imagery?

One of the answers to this question,

and I'm not saying this is the only answer to the question,

is multimedia forensics.

Multimedia forensics is a field of study

whose goal is to assess the integrity

and authenticity of digital content,

especially multimedia content.

What do I mean by that?

The role of a multimedia forensic expert,

of a multimedia forensic researcher, is to

develop techniques in order to allow investigators

to tell whether a photograph or an image has been modified,

to tell whether a photograph comes from one specific device

or another one, to tell if a video

is a real video of a person talking or it is a deepfake.

To tell if an audio recording actually captures the speech

of a real person or it is synthetically generated speech,

for instance, that was forged

in order to impersonate somebody

and make that person say something that they never said.

How is that even possible? Well, let's focus on images.

Let's focus on photographs and pictures.

Most of you are passionate about photography,

or are professional photographers, or work with photographers,

so you probably know this much better than I do,

but if you just use your smartphone

to take amateur photographs and share them on social media,

you may not be aware that, when you shoot a picture,

so many things happen.

We start from rays, light rays, rays of light,

going through a lens or multiple lenses

and this shapes the light somehow.

You may have an anti-alias filter in your camera

filtering some of the frequency information.

You may have a color filter array,

and most likely you do have a color filter array

if you only have one sensor

and you wanna shoot color photographs.

You have the sensor itself, which maps the light photons

into electrons, so there's a light-to-electricity conversion,

and then you have a bunch of processing operations

that are done directly on board the camera,

starting from white balancing and most likely compression.

If you're now using a powerful smartphone,

you have some AI automatic photo editing

that is done directly on the smartphone

to enhance the picture quality,

and only after all these operations

do you actually get the digital image.

The thing is that, of course, if I change the lens,

the picture changes, if I change the sensor, the picture changes,

if I change some processing algorithm, the picture changes.

This means that each one of these operations

actually leaves some traces on the photographs

that we're shooting.
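
To make this concrete, here is a minimal toy sketch in Python with NumPy (my own illustration under simplified assumptions, not the pipeline of any real camera): a checkerboard color-filter-array sampling followed by naive demosaicking. The interpolation leaves a measurable correlation between neighboring pixels, which is exactly the kind of trace a forensic analyst can look for.

```python
# Toy sketch: CFA sampling + naive demosaicking leave a statistical trace.
# Hypothetical, simplified illustration -- not a real camera pipeline.
import numpy as np

rng = np.random.default_rng(0)
scene = rng.random((128, 128))  # stand-in for the green light reaching the sensor

# Bayer-like sampling: the sensor records green only on a checkerboard of sites
mask = (np.add.outer(np.arange(128), np.arange(128)) % 2 == 0)
mosaic = np.where(mask, scene, 0.0)

# Naive demosaicking: fill each missing site with the average of its four
# horizontal/vertical neighbors (wrap-around borders, for illustration only)
neighbors = (np.roll(mosaic, 1, 0) + np.roll(mosaic, -1, 0)
             + np.roll(mosaic, 1, 1) + np.roll(mosaic, -1, 1)) / 4.0
green = np.where(mask, scene, neighbors)

def lag1_corr(img):
    """Correlation between each pixel and its right-hand neighbor."""
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]

print("raw scene  :", round(lag1_corr(scene), 3))  # ~0 for this random scene
print("demosaicked:", round(lag1_corr(green), 3))  # clearly > 0: the trace
```

Real demosaicking algorithms are far more sophisticated, but they all introduce inter-pixel dependencies of this kind, and so do the other stages of the pipeline.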

The role of the forensic analyst is then to take an image,

analyze it, look for those traces,

check if these traces are present,

characterize them in order to

at least reveal part of the history of that image,

try to understand which camera model

shot the picture, try to understand

if the picture was shared on some social networks

and so on and so forth.

How can we use this in order to distinguish

real photographs from generated images then?

Because this is the point: how can I tell,

given an image, if it is a real photograph or not,

if it has been synthetically generated?

If I'm working in a controlled scenario,

so like I am the photographer, I wanna protect my work,

I have access to the original picture

before sharing it,

I can rely on so-called active techniques.

In a way you can imagine that you can somehow

invisibly sign your data

so that if somebody has your picture,

then they can trace the picture back to you,

because it somehow contains your signature.

In many other situations,

if we're just downloading images from the web,

if we are sent images to be analyzed,

maybe they were not watermarked at their origin,

they were not watermarked at inception time,

so we have to rely on passive techniques,

which means, as forensic analysts, we have to try

to reveal those forensic traces left

by the processing operations on the images

in order to understand what happened to the picture.

Let's see some examples.

Let's say I wanna answer the question,

is this a real photograph?

And let's say that I can use active techniques.

One of these may be photo watermarking.

At inception time, let's say I'm the photographer,

I wanna protect my work, I can shoot the picture

and then I can embed a watermark,

embed a sort of invisible signature

that does not actually change too much

the visual quality of the image, cannot be easily removed,

it's not just some text layered on top of the picture,

but you can scramble all of the image pixel statistics

in order to insert this sort of

proprietary signature of yours

and then you can share the watermarked image.

An end user, if they want to verify

whether the picture is yours or not,

so if that is a real photo, the photo of a person

whom the end user trusts, they can extract the watermark

from the image and check if the watermark is there

and if it is there, which watermark is there

so that they can trace the image back to you.
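
As a rough illustration of this embed-and-verify flow, here is a minimal sketch in Python of a spread-spectrum-style additive watermark (a toy under my own assumptions, not any specific commercial scheme): a faint key-dependent pseudorandom pattern is added to the pixels, and the verifier later checks for it by correlation.

```python
# Toy invisible watermark: embed a faint key-dependent pattern, detect it by
# correlation. Illustrative only; real schemes are far more robust.
import numpy as np

def pattern(key, shape):
    """Pseudorandom +/-1 pattern derived from the owner's secret key."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=shape)

def embed(image, key, alpha=3.0):
    """Add the pattern with small strength alpha so it stays (nearly) invisible."""
    return np.clip(image + alpha * pattern(key, image.shape), 0, 255)

def detect(image, key):
    """Correlation score: large and positive means this key's watermark is present."""
    return float(((image - image.mean()) * pattern(key, image.shape)).mean())

rng = np.random.default_rng(1)
photo = rng.integers(0, 256, size=(256, 256)).astype(float)  # stand-in for a photo
marked = embed(photo, key=42)

print("right key   :", round(detect(marked, key=42), 2))  # close to alpha
print("wrong key   :", round(detect(marked, key=7), 2))   # close to zero
print("no watermark:", round(detect(photo, key=42), 2))   # close to zero
```

On real photographs the detection is usually performed on a noise residual rather than on raw pixels, but the principle of tracing the image back to a secret key is the same.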

What if you cannot watermark your images?

What if you want to check if this is a real photo,

but you're given any random photo?

You have to rely on passive techniques.

The idea here is to exploit source attribution methods.

What do I mean?

Well, let's start from a completely different example,

that of bullets shot from a gun.

We know from movies, and from reality as well I guess,

that when you shoot bullets with a gun,

I mean, you're not shooting bullets with a gun,

but if anybody else shoots bullets with a gun,

very far away from here, we're safe,

any gun leaves some sort of fingerprint, some marks,

some signature on the bullets,

like those stripes that you can see here.

So you can connect the bullets to the gun

by looking at these fingerprints.

Truth is that you can do kind of the same with photographs.

Every time you shoot a photograph with one specific sensor,

well, sensors are imperfect.

They're hardware, they're mass-produced,

they all come with tiny little imperfections.

Each and every pixel on a sensor

reacts slightly differently to light.

So each camera introduces a noise pattern

into a photograph that is slightly different

from the noise pattern of any other camera in the world.

This means that if I have a camera,

I can shoot many photographs with it,

extract this noise pattern,

and I obtain the fingerprint of the camera,

which is like the stripes on the bullets

from one specific gun. Then I have another picture.

I download a picture from your Facebook profile.

I wanna detect if you took that picture with

that exact same camera,

I can extract the noise from your picture

and compare the noise from the camera and the picture.

If they coincide to some extent,

then I can trace the picture back to the device

that was used to shoot it.
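
Here is a heavily simplified sketch in Python of this noise-fingerprint idea (my own toy with an exaggerated per-pixel pattern and a crude box-filter denoiser, not the actual pipeline used in practice): average the noise residuals of several photos from a camera to estimate its fingerprint, then correlate the residual of a query picture against it.

```python
# Toy sketch of sensor-noise (PRNU-style) source attribution. Conceptual only:
# the per-pixel quirks are exaggerated and the denoiser is a simple box filter.
import numpy as np
from scipy.ndimage import uniform_filter

def residual(img):
    """Noise residual: the image minus a smoothed version of itself."""
    return img - uniform_filter(img, size=3)

def fingerprint(images):
    """Estimate a camera fingerprint by averaging the residuals of its photos."""
    return np.mean([residual(im) for im in images], axis=0)

def similarity(fp, img):
    """Normalized correlation between a fingerprint and a query's residual."""
    r = residual(img)
    fp0, r0 = fp - fp.mean(), r - r.mean()
    return float((fp0 * r0).sum() / (np.linalg.norm(fp0) * np.linalg.norm(r0)))

rng = np.random.default_rng(0)
quirks_a = rng.normal(0, 2, (128, 128))  # hypothetical pixel quirks of camera A
quirks_b = rng.normal(0, 2, (128, 128))  # ... and of camera B

def shoot(quirks):
    """Simulate a photo: a smooth random scene plus that camera's pixel quirks."""
    scene = uniform_filter(rng.random((128, 128)) * 255, size=5)
    return scene + quirks + rng.normal(0, 1, scene.shape)  # plus shot noise

fp_a = fingerprint([shoot(quirks_a) for _ in range(20)])

print("photo from A vs A's fingerprint:", round(similarity(fp_a, shoot(quirks_a)), 3))  # noticeably positive
print("photo from B vs A's fingerprint:", round(similarity(fp_a, shoot(quirks_b)), 3))  # close to zero
```

In real forensic practice the fingerprint is the sensor's photo-response non-uniformity, extracted with much better denoisers and compared with statistics such as peak-to-correlation energy, but the workflow is the same: estimate, then correlate.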

What if the question is slightly different?

Rather than assessing whether a photograph is real,

what if we want to detect

if an image has been generated

and assess that it has been generated?

So we try to solve the dual problem.

Again, let's say that we are in a controlled scenario.

We can use an active technique.

What is the situation in which

I may be interested in doing that?

Let's say that I am producing a generative model,

I am selling generative models,

I am one of those websites allowing you

to generate pictures, but I don't want you

to generate those pictures

and use them with malicious intent,

so I want those pictures to

have the possibility of being detected as generated

and not as photographs. What can I do?

I can insert a watermark

directly into my generation software.

I can do that so that all pictures generated by the software

contain the watermark.

Again, when I'm talking about this watermark,

this signature, I'm not thinking of something

that you can see with the naked eye.

I'm thinking about something that is embedded at pixel level,

that you cannot see unless you run software on the image.

So if these generated images,

generated with watermarked models, are shared on the web,

people can download them, extract the watermark

and check if the watermark

is that of one specific generator,

so those pictures can be attributed

to the specific generator.
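
A sketch of this provenance idea, again in Python and again only as a hypothetical illustration (the generator below is a stand-in function, not a real service API): the service wraps its generator so every output carries an invisible key-dependent mark, and a verifier can later attribute an image to that service by checking for the mark.

```python
# Toy sketch: a generation service that watermarks everything it outputs,
# plus the verifier-side check. All names here are hypothetical stand-ins.
import numpy as np

SERVICE_KEY = 2023  # the service's secret watermarking key

def pattern(key, shape):
    return np.random.default_rng(key).choice([-1.0, 1.0], size=shape)

def generate_image(prompt, size=(256, 256)):
    """Stand-in for a text-to-image model: here, just pseudorandom pixels."""
    rng = np.random.default_rng(len(prompt))
    return rng.integers(0, 256, size=size).astype(float)

def generate_watermarked(prompt):
    """Every image leaving the service gets the service's invisible mark."""
    img = generate_image(prompt)
    return np.clip(img + 3.0 * pattern(SERVICE_KEY, img.shape), 0, 255)

def attributed_to_service(img, key=SERVICE_KEY):
    """Verifier side: correlate the image against the service's pattern."""
    score = ((img - img.mean()) * pattern(key, img.shape)).mean()
    return score > 1.0  # crude threshold, sufficient for this toy

print(attributed_to_service(generate_watermarked("a plane in a blue sky")))  # True
print(attributed_to_service(generate_image("a plane in a blue sky")))        # False
```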

Last but not least, and this is actually

the most important scenario to my eyes,

what if we are in a completely in-the-wild

and uncontrolled condition?

What if we're given an image to be verified as being generated,

but we have absolutely no control

in terms of signatures and watermarking?

The idea is that nowadays we can fight AI with AI.

This is something that we are developing

at Politecnico di Milano,

but there are other excellent universities in Italy

working on this topic, from Siena to Florence

to Turin to Padua to Trento

to Naples and so on and so forth.

What we're doing is developing techniques

that take an image and understand if the image

or part of it has been synthetically modified or generated.

One of the techniques, one of the latest ones

we're proposing, is: you take the picture,

the picture is split into multiple blocks,

into multiple sub-regions.

Each one of these sub-regions is passed

through a neural network that acts as a detector,

and this attributes to each one of the patches

a score telling if the patch is likely to be

a real photograph or something that has been manipulated,

and then we can put everything back together

and obtain a sort of yellow-bluish heat map,

like the one here on the slide,

showing in yellow, highlighting in yellow,

which is the part of the picture

that has likely been modified.
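
The mechanics of that splitting-and-stitching step can be sketched in a few lines of Python; note that `patch_detector` below is just a placeholder statistic standing in for the trained neural network described in the talk, so this shows only the scaffolding, not the actual detector.

```python
# Scaffolding for patch-based analysis: split the image into blocks, score
# each block with a detector, and stitch the scores into a heat map.
# `patch_detector` is a dummy placeholder, NOT the trained network.
import numpy as np

def patch_detector(patch):
    """Placeholder for a trained CNN returning a 'likely manipulated' score.
    Here we just use a normalized contrast statistic so the code runs."""
    return float(patch.std() / 128.0)

def tamper_heatmap(image, patch_size=32):
    """Score every non-overlapping patch and assemble a coarse heat map."""
    rows, cols = image.shape[0] // patch_size, image.shape[1] // patch_size
    heatmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = image[i * patch_size:(i + 1) * patch_size,
                          j * patch_size:(j + 1) * patch_size]
            heatmap[i, j] = patch_detector(block)
    return heatmap  # high values = regions flagged as likely modified

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256)).astype(float)  # stand-in image
print(tamper_heatmap(img).shape)  # (8, 8): one score per 32x32 block
```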

A few examples to show you that this thing may actually work.

A picture here, some grass,

and there is an animal, like an armadillo

or something like that, top left.

So there's a close-up view.

We run this picture through the algorithm

that we're developing and this is what we get:

a clear yellow spot top left

that shows that something was happening there.

The truth is that this image was obtained

starting from this original picture

that didn't contain the animal; it was passed to DALL-E,

the website that now anybody can access,

and the image was edited in order to add the animal there.

Another example, another picture,

a sky with a plane. Passing this image to the detector,

this is what we obtained:

a clear yellowish, greenish, bright spot, top left,

something going on with that plane.

As a matter of fact, that picture was obtained again

with DALL-E, starting from this other picture

where the plane was not there.

So it was DALL-E putting the plane in.

The plane was a generated part of the image,

it was not coming from the real photograph.

Another example, typical pictures of selfies

posted on some socials or sent through WhatsApp

or other applications,

passed through the algorithm,

we see that the face in the background is highlighted

and actually this image too was obtained using DALL-E,

starting from the image with the two faces to your right,

so the background face was a woman's face

and not that guy's face.

That was a fake one inserted by AI.

Last example here, I'm just covering the kid's face

because he's the son of a friend.

I didn't wanna disclose his identity.

This is a picture my friend asked me to check,

to see if anything was going on with the picture,

and this is what we got:

a bright yellow spot around the neck,

and the truth is that my friend was using

Photoshop's generative fill to correct

the wrongly folded collar of the shirt.

So actually that part of the image was modified

and we were able to spot it.

Do I mean that we are safe and everything is solved?

Of course not. Of course not.

There is still much to do.

This is fundamental research.

These are preliminary results,

but the truth is that if we wanna go from research

to actual products on the market, there's still a lot to do.

Why so?

Because these techniques that I was showing you

typically lack generalization capability,

like they work under very specific and controlled conditions.

They work if some hypotheses on the images

that are being analyzed are fulfilled,

but they do not work on just any kind of image

that you can download,

and of course, unless we can reach that point,

these detectors cannot be used reliably.

Reliability and trustworthiness.

Would you trust, would you rely on one of these tools

in a court of law,

where you don't know for real what is gonna happen

because I'm using AI again as a detector?

Probably not, or not yet at least.

So there is research going on

in order to make these tools trusted and reliable

and the other key word for future research

is that of explainability.

I mean, as long as we're using AI as a black box

for a detector and we don't know for real what is going on,

this is something that makes us

not want to use this technique, again, in a court of law,

so the ability to understand for real

what is going on at the detector level

is something that has to be reached.

With this, I want to acknowledge all the people

working with me at Politecnico di Milano,

as part of the Image and Sound Processing Lab

of the Dipartimento di Elettronica,

Informazione e Bioingegneria,

and also the sponsors who are supporting this research,

from DARPA to AFRL to the Italian Ministry,

which is actually funding some projects to work on that,

and thank you very much.

[audience clapping]

Starring: Paolo Bestagini