Multimedia forensics when digital photography meets artificial intelligence | PhotoVogue Festival 2023: What Makes Us Human? Image in the Age of A.I.
Released on 11/22/2023
[audience clapping]
Welcome everybody. Good afternoon everybody.
Today I'm here to talk to you about multimedia forensics
in digital photography in an era of artificial intelligence.
I start by saying that faking photos
and faking pictures is nothing new.
Here is an example.
We are in 1860 and this is Abraham Lincoln
whose head was swapped onto the body of another politician.
Around the same years, 1861 to 1865, another picture, taken during the American Civil War: the picture on your right was actually a composition of three different photographs. On your left we have a background, a horse, and General Ulysses Grant, and his portrait was assembled as a composition of those other photographs, and this is, I mean, quite a few years ago.
So this is nothing new.
The thing is that today it is much easier.
It's much easier to generate pictures, to fake pictures,
to tamper with images and imagery of different kinds.
One of the reasons it is so easy to do that is, of course, as you all know, that we have so many user-friendly and powerful software tools, Photoshop from Adobe just to mention one,
but you can go around and check on Google
and find so many websites
that allow you to upload your pictures,
edit them very easily with more or less good results.
Another thing that we can do today
that was impossible back then, generate fake identities.
Here's a few examples, starting from 2014,
some of the preliminary work
on generated fake images of faces,
faces of people that do not exist.
They were low-resolution, grayscale images back then,
but in just a few years,
if we look at what we got in 2021,
thanks to NVIDIA research,
now we can generate fake faces, images of fake people,
people that do not exist with very high resolution
and very, very realistic and this is quite impressive
and this was like two years ago.
If we think what we can do now, we can even do more.
As you know, as it was shown before,
if you go around the other room, you see some examples.
We can generate pictures with AI,
just starting from text prompts.
You describe a scene and you'll find online so many websites
and services that allow you to generate fake pictures,
pictures of scenery that does not exist, but that depict what you have described with a simple text,
and it is quite impressive.
The thing is that if we use all of these extremely powerful tools in a bad manner, with malicious intent, this may become very, very dangerous, and not just for copyright issues, but also because they may be used to promote disinformation, to spread fake news online, in journals and newspapers, and this is something that we definitely don't want to happen.
A couple of examples here. On your left, the cover of the National Review as it was published in 2012: they showed this picture of Obama in front of a crowd with people holding signs saying "Abortion."
The true picture is the other one, where of course there is a crowd in front of former President Obama, there are people, and people are holding signs, but they say "Forward," something completely different.
Another example, a picture published in 2010 by Al-Ahram: politicians, world leaders, walking down a corridor. But the true picture, the real one, the one that was shot by the photographer, is the one on top. The bottom part of the figure shows the picture published by the newspaper, where the order of the politicians, the order of these leaders, was changed in order to convey a different political message, with a different intent.
Luckily, we know about this now, we talk about it now. You can find different papers related to the fact that sharing fake and generated images may lead to social issues. We just heard a talk about how important it is to regulate these sorts of things,
but still this is not enough.
We know that fake news is being spread.
We know that generated images are used without consent.
We know we may have copyright issues.
So the question is: can we actually defend ourselves today
against fake imagery and generated images
or are we doomed to live in a world where
we cannot distinguish anymore what is reality
and what is generated imagery?
One of the answers to this question, and I'm not saying this is the only answer to the question,
is multimedia forensics.
Multimedia forensics is a field of study
whose goal is to assess the integrity
and authenticity of digital content,
especially multimedia content.
What do I mean by that?
The role of a multimedia forensic expert, of a multimedia forensic researcher, is to develop techniques that allow investigators to tell whether a photograph or an image has been modified, to tell whether a photograph comes from one specific device or another one, to tell if a video is a real video of a person talking or if it is a deepfake, to tell if an audio recording actually captures the speech of a real person or if it is synthetically generated speech, for instance speech that was forged in order to impersonate somebody and make that person say something that they never said.
How is that even possible? Well, let's focus on images. Let's focus on photographs and pictures.
Most of you are passionate about photography, or professional photographers, or work with photographers, so you probably know this much better than I do, but if you just use your smartphone to take amateur photographs and share them on social media, you may not be aware that when you shoot a picture, so many things happen.
We start from rays, light rays, rays of light, going through a lens or multiple lenses, and this shapes the light somehow. You may have an anti-alias filter in your camera filtering out some of the frequency information. You may have a color filter array, and most likely you do have one if you only have one sensor and you want to shoot color photographs. You have the sensor itself, which maps light photons into electrons, so there is a light-to-electricity conversion, and then you have a bunch of processing operations that are done directly on board the camera, starting from white balancing and, most likely, compression. If you are using a powerful smartphone today, you have some automatic AI photo editing that is done directly on the smartphone to enhance the picture quality, and only after all these operations do you actually get the digital image.
The thing is that, of course, if I change the lens, the picture changes; I change the sensor, the picture changes; I change some processing algorithm, the picture changes. This means that each one of these operations actually leaves some traces on the photographs that we are shooting.
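To make this concrete, here is a minimal sketch in Python (an illustration only, not the speaker's code), assuming numpy and Pillow are available: the same synthetic "scene" is rendered by two hypothetical in-camera pipelines that differ only in their white-balance gains and JPEG quality, and the resulting images no longer share the same pixel statistics, which is exactly the kind of trace a forensic analyst looks for.

```python
# A toy pipeline, for illustration only: white balance -> gamma -> 8-bit
# quantization -> JPEG. Different settings leave different, measurable traces.
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(256, 256, 3))  # stand-in for the incoming light

def camera(scene, wb_gains, jpeg_quality):
    img = np.clip(scene * np.asarray(wb_gains), 0.0, 1.0) ** (1 / 2.2)
    img8 = (img * 255).round().astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(img8).save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.float64)

cam_a = camera(scene, wb_gains=(1.9, 1.0, 1.4), jpeg_quality=95)
cam_b = camera(scene, wb_gains=(2.1, 1.0, 1.2), jpeg_quality=75)

# Two renditions of the identical scene differ in measurable ways.
print("per-channel mean, camera A:", cam_a.mean(axis=(0, 1)))
print("per-channel mean, camera B:", cam_b.mean(axis=(0, 1)))
print("mean absolute difference:  ", np.abs(cam_a - cam_b).mean())
```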
The role of the forensic analyst is then to take an image,
analyze it, look for those traces,
check if these traces are present,
characterize them in order to at least reveal part of the history of that image, try to understand which camera model shot the picture, try to understand if the picture was shared on some social networks, and so on and so forth.
How can we use this, then, in order to distinguish real photographs from generated images? Because this is the point: given an image, how can I tell if it is a real photograph or not, if it has been synthetically generated?
If I'm working in a controlled scenario,
so like I am the photographer, I wanna protect my work,
I have access to the original picture
before sharing it,
I can rely on so-called active techniques.
In a way you can imagine that you can somehow
invisibly sign your data
so that if somebody has your picture, they can trace the picture back to you, because it somehow contains your signature.
In many other situations,
if we're just downloading images from the web, if we are sent images to be analyzed, maybe they were not watermarked at their origin, they were not watermarked at inception time, so we have to rely on passive techniques, which means that, as forensic analysts, we have to try to reveal those forensic traces left by the processing operations on the images in order to understand what happened to the picture.
Let's see some examples. Let's say I want to answer the question, is this a real photograph? And let's say that I can use active techniques.
One of these may be photo watermarking. At inception time, let's say I'm the photographer, I want to protect my work: I can shoot the picture and then I can embed a watermark, embed a sort of invisible signature that does not actually change too much the visual quality of the image and cannot be easily removed. It's not just some text layered on top of the picture; instead you scramble the image pixel statistics in order to insert this sort of proprietary signature of yours, and then you can share the watermarked image.
An end user, if they want to verify whether the picture is yours or not, so if that is a real photo, the photo of a person whom the end user trusts, can extract the watermark from the image and check if the watermark is there and, if it is there, which watermark it is, so that they can trace the image back to you.
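As a rough illustration of this active protection idea, here is a minimal sketch in Python of an additive, spread-spectrum style watermark keyed by a secret seed. It is a toy and not any real product's scheme; practical watermarks are embedded in transform domains and are far more robust to editing and compression.

```python
# Toy keyed watermark: add a low-amplitude pseudo-random pattern, then detect
# it later by correlating the image against the same keyed pattern.
import numpy as np

def watermark_pattern(shape, seed):
    """Zero-mean +/-1 pattern derived from the owner's secret seed."""
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=shape)

def embed(image, seed, strength=2.0):
    return np.clip(image + strength * watermark_pattern(image.shape, seed), 0, 255)

def detect(image, seed):
    """Correlation score: large and positive if the keyed pattern is present."""
    pattern = watermark_pattern(image.shape, seed)
    return float(((image - image.mean()) * pattern).mean())

rng = np.random.default_rng(1)
photo = rng.integers(0, 256, size=(512, 512)).astype(np.float64)  # stand-in photo
marked = embed(photo, seed=1234)

print("right key: ", detect(marked, seed=1234))  # clearly positive (close to strength)
print("wrong key: ", detect(marked, seed=9999))  # close to zero
print("unmarked:  ", detect(photo, seed=1234))   # close to zero
```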
What if you cannot watermark your images?
What if you want to check if this is a real photo,
but you're given any random photo?
You have to rely on passive techniques.
The idea here is to exploit source attribution methods.
What do I mean?
Well, let's start from a completely different example, that of bullets shot from a gun. We know from movies, and I guess from reality as well, that when you shoot bullets with a gun (I mean, you are not shooting bullets with a gun, and anybody else who shoots bullets with a gun is very far away from here, so we are safe), any gun leaves some sort of fingerprint, some marks, some signature on the bullets, like those stripes that you can see here.
So you can connect the bullets to the gun
by looking at these fingerprints.
Truth is that you can do kind of the same with photographs.
Every time you shoot a photograph with one specific sensor,
well, sensors are imperfect.
They're hardware, they're mass produced, and they all come with tiny little imperfections. Each and every pixel on a sensor reacts slightly differently to light. So each camera introduces a noise pattern in a photograph that is slightly different from the noise pattern of any other camera in the world.
This means that if I have a camera, I can shoot many photographs with it, extract this noise pattern, and obtain the fingerprint of the camera, which is like the stripes on the bullets from one specific gun. Then, if I have another picture, say I download a picture from your Facebook profile and I want to detect whether you took that picture with that exact same camera, I can extract the noise from your picture and compare the noise from the camera and from the picture. If they coincide to some extent, then I can trace the picture back to the device that was used to shoot it.
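Here is a minimal sketch, in Python, of this sensor-fingerprint idea under strong simplifying assumptions: a crude Gaussian-smoothing denoiser stands in for the wavelet denoisers used in real PRNU pipelines, a synthetic per-pixel gain pattern plays the role of the sensor's imperfections, and plain normalized correlation stands in for the statistical tests used in practice.

```python
# Toy camera-fingerprint matching: average the noise residuals of many photos
# from one device, then correlate a test photo's residual against it.
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(image):
    """Image minus a smoothed version of itself: what the scene does not explain."""
    return image - gaussian_filter(image, sigma=2.0)

def camera_fingerprint(images):
    return np.mean([noise_residual(im) for im in images], axis=0)

def similarity(fingerprint, image):
    a = fingerprint - fingerprint.mean()
    res = noise_residual(image)
    b = res - res.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(2)
prnu = 1.0 + 0.02 * rng.standard_normal((256, 256))  # hypothetical sensor pattern

def shoot(scene):
    """Hypothetical camera: the scene modulated by its sensor pattern plus shot noise."""
    return scene * prnu + rng.standard_normal(scene.shape)

# Build the fingerprint from many shots taken with this camera.
fingerprint = camera_fingerprint([shoot(np.full((256, 256), 128.0)) for _ in range(20)])

gradient = np.tile(np.linspace(60.0, 200.0, 256), (256, 1))  # a smooth test scene
same_camera = shoot(gradient)
other_camera = gradient * (1.0 + 0.02 * rng.standard_normal((256, 256))) + rng.standard_normal(gradient.shape)

print("same camera:     ", similarity(fingerprint, same_camera))   # high
print("different camera:", similarity(fingerprint, other_camera))  # near zero
```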
What if the question is slightly different? Rather than assessing whether a photograph is real, what if we want to detect that an image has been generated, and assess that it has been generated? So we try solving the dual problem. Again, let's say that we are in a controlled scenario, so we can use an active technique. What is a situation in which I may be interested in doing that?
Let's say that I am producing a generative model, I am selling generative models, I am one of those websites allowing you to generate pictures, but I don't want you to generate those pictures and use them with malicious intent, so I want those pictures to have the possibility of being detected as generated and not as photographs. What can I do? I can insert a watermark directly into my generation software, so that all pictures generated by the software contain the watermark.
Again, when I'm talking about this watermark, this signature, I'm not thinking of something that you can see with the naked eye. I'm thinking about something that is embedded at the pixel level, that you cannot see unless you run software on the image. So if these generated images, generated with watermarked models, are shared on the web,
people can download them, extract the watermark
and check if the watermark
is that of one specific generator,
so those pictures can be attributed
to the specific generator.
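A minimal sketch of that attribution step, with entirely hypothetical generator names and keys: the verifier correlates the downloaded image against every registered key and attributes it to the best-matching generator, reusing the same keyed-pattern idea sketched earlier.

```python
# Toy generator attribution: each service stamps its images with its own keyed
# pattern; the verifier checks an image against all registered keys.
import numpy as np

def keyed_pattern(shape, seed):
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=shape)

def correlation(image, seed):
    pattern = keyed_pattern(image.shape, seed)
    return float(((image - image.mean()) * pattern).mean())

registered_keys = {"generator_A": 11, "generator_B": 22, "generator_C": 33}  # hypothetical

# Pretend generator_B produced this image and stamped it with its own key.
rng = np.random.default_rng(3)
generated = rng.integers(0, 256, size=(512, 512)).astype(np.float64)
generated += 2.0 * keyed_pattern(generated.shape, registered_keys["generator_B"])

scores = {name: correlation(generated, seed) for name, seed in registered_keys.items()}
print(scores)
print("attributed to:", max(scores, key=scores.get))  # expected: generator_B
```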
Last but not least, and this is actually the most important scenario in my eyes, what if we are in a completely in-the-wild and uncontrolled condition? What if we are given an image to be verified as being generated or not, but we have absolutely no control in terms of signatures and watermarking?
The idea is that nowadays we can fight AI with AI.
This is something that we are developing at Politecnico di Milano, but there are other excellent universities in Italy working on this topic, from Siena to Florence to Turin to Padua to Trento to Naples, and so on and so forth.
What we're doing is developing techniques that take an image and understand whether the image, or part of it, has been synthetically modified or generated. One of the techniques, one of the latest ones we're proposing, works like this: you take the picture, and the picture is split into multiple blocks, into multiple sub-regions. Each one of these sub-regions is passed through a neural network that acts as a detector, and this attributes to each one of the patches a score telling whether the patch is likely to be a real photograph or something that has been manipulated. Then we can put everything back together and obtain a sort of yellow-bluish heat map, like the one here on the slide, highlighting in yellow the part of the picture that has likely been modified.
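The following is a minimal sketch of the general patch-based analysis just described, not the actual Politecnico di Milano detector: the score_patch function is only a placeholder statistic standing in for the trained neural network, and the per-block scores are reassembled into a coarse heat map.

```python
# Toy patch-based analysis: split the image into blocks, score each block,
# and assemble the scores into a heat map (higher = more suspicious).
import numpy as np

def score_patch(patch):
    """Placeholder for a trained CNN returning the probability of manipulation."""
    return float(np.clip(patch.std() / 64.0, 0.0, 1.0))  # toy statistic only

def manipulation_heatmap(image, block=32):
    rows, cols = image.shape[0] // block, image.shape[1] // block
    heatmap = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r * block:(r + 1) * block, c * block:(c + 1) * block]
            heatmap[r, c] = score_patch(patch)
    return heatmap  # in a real tool, upsampled and overlaid in yellow/blue

rng = np.random.default_rng(4)
image = rng.normal(128, 10, size=(256, 256))           # "pristine" background
image[:64, :64] = rng.normal(128, 60, size=(64, 64))   # top-left region with odd statistics
print(np.round(manipulation_heatmap(image), 2))        # higher scores cluster top left
```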
A few examples to show you that this thing may actually work. A picture here, some grass, and there is an animal, like an armadillo or something like that, top left; so there is a close-up view. We run this picture through the algorithm that we're developing, and this is what we get: a clear yellow spot top left showing that something was happening there. The truth is that this image was obtained starting from this original picture that didn't contain the animal; it was passed to DALL-E, which now anybody can access, and the image was edited in order to add the animal there.
Another example, another picture: a sky with a plane. Passing this image to the detector, this is what we obtained: a clear yellowish, greenish, bright spot, top left, so something is going on with that plane. As a matter of fact, that picture was obtained again with DALL-E, starting from this other picture where the plane was not there. So it was DALL-E putting the plane in; the plane was a generated part of the image and was not coming from the real photograph.
Another example, a typical selfie of the kind posted on social networks or sent through WhatsApp or other applications. Passed through the algorithm, we see that the face in the background is highlighted, and actually this image too was obtained using DALL-E, starting from the image with the two faces to your right, so the background face was a woman's face and not that guy's face. That was a fake one inserted by AI.
Last example here. I'm just covering the kid's face because he is the son of a friend and I didn't want to disclose his identity. This is a picture my friend asked me to check, to see if anything was going on with it, and this is what we got: a bright yellow spot around the neck. The truth is that my friend was using Photoshop's generative fill to correct the wrongly folded collar of the shirt. So that part of the image was actually modified, and we were able to spot it.
Do I mean that we are safe and everything is solved?
Of course not. Of course not.
There is still much to do.
This is fundamental research.
These are preliminary results,
but the truth is that if we want to go from research to actual products on the market, there's still a lot to do.
Why so?
Because these techniques that I was showing you typically lack generalization capability: they work under very specific and controlled conditions. They work if some hypotheses on the images being analyzed are fulfilled, but they do not work on just any kind of image that you can download, and of course, unless we can reach that point, these detectors cannot be used reliably.
Reliability and trustworthiness.
Would you trust, would you rely on, one of these tools in a court of law, where you don't know for real what is going to happen, because I'm using AI again as the detector? Probably not, or at least not yet.
So there is research going on
in order to make these tools trusted and reliable
and the other keyword for future research is explainability. I mean, as long as we're using AI as a black box for a detector and we don't know for real what is going on, this is something that makes us not want to use this technique, again, in a court of law. So the ability to understand for real what is going on at the detector level is something that has to be reached.
With this, I want to acknowledge all the people working with me at Politecnico di Milano, as part of the Image and Sound Processing Lab of the Dipartimento di Elettronica, Informazione e Bioingegneria, and also the sponsors who are supporting this research, from DARPA to AFRL to the Italian Ministry, which is actually funding some projects to work on this. Thank you very much.
[audience clapping]
Starring: Paolo Bestagini