Real Tone & Imagining A.I. for Communities of Color | PhotoVogue Festival 2023: What Makes Us Human? Image in the Age of A.I.

Racial bias has an extensive, well-documented history in both the medium of photography and the field of technology. The explosion of A.I.'s popularity through tools like ChatGPT and DALL-E has also echoed the questions and concerns that scholars and activists have raised about how these biases might be accelerated in these new tools. Florian Koenigsberger is a photographer and technologist who leads Google's efforts to make camera technologies that more accurately and authentically render people of color, especially those with darker skin tones. During "Real Tone & Imagining A.I. for Communities of Color" he will share the deeply interdisciplinary and collaborative journey of building Real Tone on the Pixel Camera, and explore imaginative possibilities, and limitations, of how else A.I. could serve communities of color in the imaging space.

Released on 11/22/2023

Transcript

Lovely to be here with y'all.

My name is Florian Koenigsberger.

I come to you as a photographer.

I lead what we refer to

as our Image Equity Initiative at Google.

And fundamentally, I am a deep, deep believer

in the power of images in our society.

And I'm gonna tell you a little story about a technology

that we've built called Real Tone

that serves what I like to refer to

as the world's underserved majority: people of color.

And hopefully open and continue the discussion

we've been having about ways

that we can imagine the potential of these tools

for the communities that I'm from.

And I also want to express gratitude.

I think we have...

It's exceptional to have this many

interdisciplinary thinkers in one space.

I think the quantity is not what's important,

but the quality of conversation.

And thank you, Alessia, for curating this.

So beginning with a story, the year is 2017.

I am just moving back to New York from some time in Brazil.

I lived in São Paulo.

And I will never forget the moment

I was looking for pictures of my mother,

who is a Black Jamaican woman, in my phone.

And in fact, I was actively prompted

to revisit memories of her.

And when I clicked the notification,

this is the screen that popped up.

And what you'll see here is

some of these are in fact pictures of my mother.

Some of them are pictures of Black men

I went to high school with.

Some of them are pictures of pictures

that I saw at a show in Paris from decades before.

So there was a really wide range

of what it understood my mother to be.

Which was an eye-opening moment for me

because it was the first time, I think, that I questioned

the objectivity of the thing in my pocket

that we so often use, right?

Our phones, in generating truth.

Pair this with the fact that when I returned from Brazil,

I went to a conference where I got to see Dr. Sarah Lewis

speak about her work as the guest editor

for Aperture's Vision and Justice issue.

This was seminal for me

because as somebody who had studied photography

and had taken a lot of photo history courses in college,

none of them had really exposed me

to the weight of the contribution

of African American image makers to the medium.

And for anybody who doesn't know this issue,

it has sold out countless times.

I think this should be fundamentally part

of every photo curriculum that exists.

I encourage you to check it out.

It also speaks to some of what I'll get into

about Frederick Douglass in a moment,

but the last piece of this sort of trifecta

was that, at this time, Google, where I was already working,

but in a very different department,

was working with Annie Leibovitz

on developing the Pixel 4.

And for me it was an eye-opening moment

because I thought if we have the humility as a company

to invite somebody from the outside in

to speak to our engineering teams

about ways that we might change our camera,

what other things might we be able to do

if we think a little bit more expansively

about that kind of external collaboration?

So we're gonna make this a little interactive.

I would like you to raise your hand and keep it up

if you know who this person is.

If you can read, you know who it is.

Okay, cool. It's mostly just to check that we're all here.

Keep 'em up. Keep 'em up.

How many people know that he was formerly enslaved?

Keep your hands up.

How many people know that he was an abolitionist?

An orator? A statesman? Okay.

How many people know that Frederick Douglass

was the most photographed American man of the 19th century?

Okay, so this is usually where...

Yeah, see, Fred still has his hand up, he's a historian.

This is usually where we lose people, right?

What Frederick Douglass understood in the 1800s...

You can put your hand down now, thank you, appreciate you.

was the power of the image.

And he understood very presciently

that images inform who and what we care about

as a collective society.

And that if he could reimagine

the image of the African American in the public psyche,

it would do more for our progress

than any war or policy ever would.

Now, we've heard the 1800s referred to several times

as a very long time ago.

I think it is unbelievable how true

that still rings today

given how far away from today's

image technologies he was when he said this.

And this is one of my favorite quotes

from a photography professor of mine in college,

this notion that we see most of the world

through images, right?

We will never go to all of the countries

and walk all of the streets and see all of the people.

So it is literally through images

that we shape our perception of most of the world.

And those images reinforce

what we believe to be true about the world.

And so if these images are shaping

what is and what is possible for us,

what does it mean when a large portion of the population

is not able to have a dignified experience

with the tools that make images?

So there's another piece of history

that's important to contextualize for this.

Are folks familiar with Shirley Cards?

Rough show of hands: who's heard of this before?

So Shirley Cards, or the leading ladies

as they were sometimes referred to,

refer to this practice of using white women,

which Kodak started in the 1950s,

to evaluate the performance of film emulsion technologies.

I know we don't think about film emulsion as a technology

that way now, given smartphone cameras,

but what happened as a result of this process, right,

of only testing against white women

to see if the film worked well

in taking pictures of people,

is that, of course,

whiteness becomes normalized,

and the performance on any other tonal range is not tested,

and therefore might fail.

And so there are lots of Black parents

who had the experience growing up

of having their kids go in for class photo day

and ending up with pictures like this

where detail is obscured,

and you maybe only see eyes and mouth.

And an interesting piece of history about this,

this only started to change

because chocolate manufacturers

and wood manufacturers had complaints

about the lack of nuance in their product photography.

I can't show the difference between my milk chocolate bar

and my dark chocolate bar,

and I'm paying a lot of money for this film,

so I need this to work better for me, right?

So all of this sort of context

led to this question:

what if, as somebody sitting in the seat

of a major tech company

that produces tools that see people and make images,

we could work on trying to make a camera

that saw people more representatively and more inclusively?

So the Image Equity Initiative that I referred to earlier,

what is that?

That term Image Equity is something that we've

sort of coined in this process.

I've spent the last six years working on this.

And this is fundamentally rooted in

fighting historical bias in this medium

by bringing best-in-class camera and imagery tools

to the world with a focus on darker skin tones.

And I say this because historically,

this is where we've seen the most issues.

Now, there is a science problem here,

as much as there is a curatorial problem,

if you will, right?

Obviously, darker objects absorb more light,

and have fewer lumens bouncing off them.

And I don't want to not acknowledge that.

But what's also true is, as we hopefully know,

cameras and photographs are not objective.

And there is a lot of work that goes into determining

what a phone camera can and cannot do.

As I'm sure many of you

are aware, with things like astrophotography modes

or portrait modes, right?

It's not for lack of ideas and innovation

in what we can do with these tools.

Real Tone was born out of this cross-product mission

that we've now established at the company.

So what is Real Tone?

The first thing that happened when we asked that question,

what if we could try and go build the world's

most inclusive camera,

was immediately recognizing that we did not possess

the in-house aesthetic expertise

to achieve that goal.

We have some of the brightest engineering minds in the world

who are extremely well resourced,

but we didn't have the people inside who could say to us,

"That's too bright, that's too dark, that's too warm.

Here's how I want to be seen."

And so over the last six years of this project,

we've invited in an extraordinary group of collaborators,

photographers, cinematographers, colorists, directors,

who we then sent into the field with our pre-release tools

and said, "Go break this thing.

Work with your communities and help us understand

what's not working for you

in the experience that we deliver on our camera today."

And what I love about having worked with this group now

over several years is, it's an international group,

it is cross-disciplinary.

And we've gotten some really interesting feedback

because of those intersections.

This set a new precedent

for the way that our engineering teams worked.

And I really wanna hone in on that

because if we believe that we're here

and that something can come of this time

that exists outside of these walls,

we have to believe in the change that's possible.

I have watched firsthand

the whole worldview of engineers

who lead auto exposure teams and white balance teams

change as a result of spending time

with these experts and understanding their concerns.

A classic example that I love to share

is in giving us some of this aesthetic feedback.

Deun Ivory, who's a brilliant image maker

out of Houston, Texas,

was talking about how a lot of darker skinned people

were showing up looking ashy in our pictures.

Cultural audience, I'm curious,

how many people know what ashy is?

Does that mean something to people in this room?

Okay, half.

It's a term used among Black folks

about skin looking dried out, right?

And we had an engineer who virtually raised his hand

'cause this was still during pandemic times,

who said, "You keep using a word

that I've never heard before.

What is ashiness?"

And in that moment, I remember having that quick step back

and going, "Oh, there is so much cultural context

baked into this issue that we cannot assume."

And of course, this isn't something

I would expect somebody to learn in a PhD program

about computational photography, right?

That comes from the school of life.

And so this cultural learning

that happened through this process

was, I would argue, the biggest achievement of it.

So, getting into what the product does, what it is:

In 2021, we launched Real Tone externally.

And we think about this as comprising a set

of computational photography improvements

that are driven by machine learning and AI

to make a more fair and representative experience

for people of color with our smartphone camera.

Another thing that the experts contributed to this,

because we know that data,

as has been emphasized in several talks,

is so important to this issue, right?

What data informs the models and tools that we build?

Beyond just giving us their aesthetic feedback,

they helped us grow the data that tunes our camera, right?

That data is used to influence the color profiles of our camera,

by tens of thousands of images

of people who were all paid

for their time to contribute to this mission.

And I think that's also been a really

important structural change

because that gives us runway to keep iterating

on this technology with each device that comes out.

So I'm gonna get a little nerdy.

I just wanna show you a couple of components

of the technology to make this tangible,

'cause people always say,

"Okay, so what is this thing actually?"

And if we can go to the loops for this,

that would be lovely,

just so folks can keep seeing that change for a second.

I'll talk while we look at this.

So computational photography is the term that we use

basically to talk about smartphone photography.

Some of you will be familiar with things like HDR Plus,

or portrait mode in your camera, right?

When you take a picture in portrait mode,

it's not actually widening the aperture

so that you re getting a bokeh effect

from a lower aperture number.

It is taking a mask of the subject

that it believes to be the person in focus

or the people in focus

and then it's layering in a blur effect around that mask,

which is why some of you,

I'm sure, have had the experience of seeing

your ears cut off, or flyaway hairs cut off,

and you get the chia-pet effect.

That's because this is software

that's creating that experience, right?
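
To make that mechanism concrete, here is a minimal sketch, assuming OpenCV and a person-segmentation mask from some unspecified model; it is an illustration of the mask-plus-blur idea, not the Pixel pipeline.

```python
import numpy as np
import cv2  # OpenCV, assumed available for this sketch


def synthetic_bokeh(image: np.ndarray,
                    subject_mask: np.ndarray,
                    blur_ksize: int = 31) -> np.ndarray:
    """Approximate a portrait-mode effect: keep the masked subject sharp,
    blur everything else. `subject_mask` is a float array in [0, 1] from
    some person-segmentation model (not shown here)."""
    blurred = cv2.GaussianBlur(image, (blur_ksize, blur_ksize), 0)
    # Feather the mask edge so the sharp/blurred boundary is less abrupt.
    mask = cv2.GaussianBlur(subject_mask.astype(np.float32), (15, 15), 0)[..., None]
    # Composite: anything the mask misses (flyaway hairs, ear edges) gets
    # blurred away, which is exactly the "chia-pet" artifact described above.
    return (mask * image + (1.0 - mask) * blurred).astype(image.dtype)
```

The point of the sketch is only that the "bokeh" is software: the result lives or dies with the quality of the mask.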

Face detection is a core, core, core component

of taking pictures of people.

Different than facial recognition, right?

Face detection says,

There is a face in this picture, and I know it's a face.

Facial recognition says,

This is Alessia's face.

I know that face and I can identify it

as hers in particular, right?
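
As a rough illustration of that distinction, and not Google's implementation, the sketch below uses a stock OpenCV detector for the "where is a face" question and a simple embedding comparison, with the embedding model left out, for the "whose face is this" question.

```python
import numpy as np
import cv2  # OpenCV, used here only for a convenient stock face detector


def detect_faces(gray_image: np.ndarray) -> list[tuple[int, int, int, int]]:
    """Face DETECTION: answers "there is a face at (x, y, w, h)" -- no identity."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in boxes]


def recognize_face(embedding: np.ndarray,
                   known: dict[str, np.ndarray],
                   threshold: float = 0.6) -> str | None:
    """Face RECOGNITION: answers "whose face is this?" by comparing an embedding
    (from some face-embedding model, not shown) against known identities."""
    best_name, best_dist = None, float("inf")
    for name, reference in known.items():
        dist = float(np.linalg.norm(embedding - reference))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None
```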

So face detection...

We've seen a lot of cameras struggle

with face detection on darker-skinned faces,

especially in more complex lighting scenarios, right?

So here you have a strongly backlit scene

in the first image,

and it's a little tough

when you don't have a sort of LED screen,

but you can appreciate the difference, right?

You go from not understanding

that there are four people in this image

to understanding that there are four people in this image,

which then allows you to do a whole bunch

of really interesting stuff, right?

So as an example, if we go to the next loop.

Auto exposure.

This is an example from when we were in Houston, Texas,

working on some data collection for the project,

again with Deun, the photographer I mentioned earlier.

And we had a really, really,

really special experience understanding...

If you don't see the face

and then cannot make adjustments

to the brightness of that face, people get lost, right?

And I remember the conversation we had with...

Salma is the name of this model,

who was talking to us about her experience modeling

and how often she felt like she disappeared into pictures,

but that because she was often on sets

with people who didn't have her lived experience,

she also felt like she couldn't comment on it.

And so you can appreciate how dramatic

the transformation gets

once some of these underlying

technological pieces are addressed.
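
A toy way to picture face-aware auto exposure, purely as a sketch and not the actual Real Tone tuning: once a face box exists, meter on that region instead of the whole backlit frame. The target luminance value here is an illustrative assumption.

```python
import numpy as np


def face_aware_exposure_gain(luma: np.ndarray,
                             face_box: tuple[int, int, int, int],
                             target_face_luma: float = 0.45) -> float:
    """Toy face-weighted auto exposure: once a face is detected, meter on the
    face region instead of the whole (possibly backlit) frame. `luma` is a
    linear luminance image scaled to [0, 1]; the target value is illustrative."""
    x, y, w, h = face_box
    face_mean = float(luma[y:y + h, x:x + w].mean())
    # A gain above 1 brightens an underexposed face; clamp to keep it sane.
    return float(np.clip(target_face_luma / max(face_mean, 1e-4), 0.25, 4.0))
```

In practice the decision would also weigh the rest of the scene, but the sketch shows why detecting the face first matters: without the box, there is nothing to meter on.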

And then if we go to the next one,

talking about auto white balance, right?

So this is where we get into color.

A lot of the experts' feedback

helped us get to more nuanced,

human-looking representations of color,

especially in skin tones.

So things like not making someone too warm

or too cool, right?

Respecting the undertone that somebody has

and making sure that that s reflected in an image.
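
For intuition only, here is a deliberately simplified white-balance sketch, not the shipped algorithm: classic gray-world gains with a clamp standing in for the kind of constraint that keeps a correction from pushing skin tones too warm or too cool.

```python
import numpy as np


def gray_world_awb(rgb: np.ndarray, max_gain: float = 1.6) -> np.ndarray:
    """Toy auto white balance: classic gray-world gains, clamped so a single
    correction cannot swing the whole frame (and the skin in it) dramatically
    warm or cool. `rgb` is a float image in [0, 1]."""
    means = rgb.reshape(-1, 3).mean(axis=0)           # per-channel averages
    gains = means.mean() / np.maximum(means, 1e-4)    # push the average toward gray
    gains = np.clip(gains, 1.0 / max_gain, max_gain)  # guardrail on the correction
    return np.clip(rgb * gains, 0.0, 1.0)
```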

Getting into the AI piece a little bit.

Another example of how this works is,

we have something called a skin tone classifier, right?

That helps us bucket what skin tones

do we observe in an image?

And then what are the right decisions

we wanna make accordingly,

when we're processing that image?

Our teams have built something called

the Monk Skin Tone Scale

which is a more representative scale

featuring 10 different skin tone buckets

to better and more accurately characterize

the variation we see in skin tones.

And then, using that skin tone classifier,

we are able to better understand what appears in an image,

so that even as lighting conditions change, right?

Say I am under very yellow light,

or I might be outside and backlit,

or I might be indoors in shade, right?

We have a feature called Frequent Faces

where if you turn that on

in conjunction with the skin tone classifier,

it gets better at understanding what you look like

in those different scenarios

based on how often a face

is showing up in those images, right?
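
Here is a toy sketch of those two ideas together, a nearest-bucket skin tone classifier and a per-face accumulator; the reference tones are placeholders rather than the published Monk Skin Tone Scale values, and none of this claims to match the real classifier.

```python
from collections import Counter

import numpy as np

# Ten illustrative reference tones (light to dark, RGB in [0, 1]).
# These are placeholders, NOT the published Monk Skin Tone Scale values.
REFERENCE_TONES = np.linspace(0.95, 0.15, 10)[:, None] * np.array([1.0, 0.85, 0.72])


def classify_skin_tone(face_rgb: np.ndarray) -> int:
    """Return the index (0-9) of the nearest reference-tone bucket for the
    average color of a face crop -- a toy stand-in for a learned classifier."""
    mean_color = face_rgb.reshape(-1, 3).mean(axis=0)
    return int(np.argmin(np.linalg.norm(REFERENCE_TONES - mean_color, axis=1)))


class FrequentFaceProfile:
    """Toy version of the idea behind Frequent Faces: accumulate bucket
    observations for a recurring face so the estimate stabilizes even as
    lighting changes from shot to shot."""

    def __init__(self) -> None:
        self.votes: Counter[int] = Counter()

    def observe(self, face_rgb: np.ndarray) -> None:
        self.votes[classify_skin_tone(face_rgb)] += 1

    def best_guess(self) -> int | None:
        return self.votes.most_common(1)[0][0] if self.votes else None
```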

So again, really distinct results

that come from the technology.

And what's cool is,

we've now seen some really affirming results

externally about the performance of our camera

compared to insert your favorite

competing phone to Pixel here, right?

So just to share a little bit about this, right?

These are three images I've made on the device.

I come from a mixed-race family.

My father is a white German man.

My mother's a Black Jamaican woman.

There have been a lot of struggles

to get the whole family in the picture

and have everybody look right.

This is an image that I made of my parents

celebrating their anniversary

in a low-light, candlelit setting.

And it's so special to me

to have an experience where I can see them as they are

and as they see themselves

without having to go through a bunch of editing

and post-capture work, what have you.

This is us working on the project

in Abuja, Nigeria earlier this year.

And you see the richness of the tonal range

that comes out, right?

On the right, my fiance and two dear friends of ours.

Again, seeing the range of colors come through

with that pop and natural authenticity

that everybody should be able to have

with a smartphone camera.

But there were also structural improvements

that came from this.

Beyond being a differentiated product pillar, right?

Something that we can say to people

our phone has that other phones might not,

this has changed the visual language

of a multimillion-dollar company

when it comes to the way that we advertise our products.

This is an example of a billboard ad

that ran for Pixel in 2021 when we launched Real Tone.

In the subsequent years,

I have never seen so many people of color

in our advertising as a reflection

of what the technology in the phone itself achieves, right?

We also see the kind of work that's been produced,

and most importantly,

the creatives who have been paid

as a result of this initiative.

Without getting too deep into the numbers,

I can say in the last five years of working on this,

we have put millions of dollars

in the hands of Black and brown creatives

who have contributed to building the technology,

photographing with the technology, telling stories.

Some of you heard the Creator Labs talk yesterday

with Sebastian and Alessia and team.

That to me is the impact that goes beyond the technology,

that's a really important part of this story.

But also, this is personally a really cool moment for me

because we pitched this project internally

using Black Panther as a proof point.

And so then years later

to see that the phone was used to photograph

the cast of Wakanda Forever was like,

icing on the cake for me.

But going to the industry-scale piece of this,

we're also seeing...

So folks familiar with DXOMARK?

Does that mean anything to anyone?

My colleague? Wonderful.

It's a French company that does a lot of

sort of objective, standardized scoring

on camera phone performance.

They, for the first time in response to this project,

are now starting to build in measurement

around skin tone accuracy.

So independent of what we do in this space,

which was always my goal with this,

you now have a third party that is holding to account

all of the OEMs in this space

who are responsible for seeing people with their tools.

So, what?

The what here is to please buy a Pixel.

Thank you so much for your time tonight.

No, that's not the end. Guys, come on.

You think she would let me sell like that up here? Come on.

As we look at this new frontier of AI,

Generative AI, Synthetic Imagery,

I think there's more than just a product story

that we can learn from here.

And if we go back to Frederick Douglass

and back to what pictures can achieve societally for us,

and this was echoed in a number of the talks today,

I wonder what we can imagine through

some of these synthetic tools

that could be new possibilities for communities of color

who are historically underserved.

So I asked some people in my life

just to get a sampling before I came here.

Like, I say AI, I say Gen AI, what does that mean to you?

And I got like three rough buckets of answers.

Weird pictures of animals,

driverless cars and humanoid robots.

And all of these things are true and in motion,

and they will undoubtedly change our lives

or already have in some ways.

But we also have to ask the question,

who is not being brought along for this revolution,

if we look at the history of Kodak

and treat this as another frontier in the medium?

In my communities,

some of the things that we think about

are the research of someone like Professor Ted Kim at Yale

who here shows the leading ladies

that I referred to, the Shirley Cards, from back in the 70s and 80s.

And a gallery example of what happens if you look for skin

in something called an Arnold Renderer,

which is a graphics imaging tool.

Again, we see that whiteness

is fundamentally normalized, right?

What are the costs of that for communities

that don't get to self-determine?

Or I think about the story of Robert Williams,

who in 2020 was arrested in front of his young children

at his home in Detroit, Michigan

because a facial recognition tool

wrongly asserted that he had stolen

tens of thousands of dollars in watches, right?

He was actually handcuffed, taken to a precinct,

and then they realized that they were wrong.

Or recent research by Dr. Jie Zhang, right?

Showing that there's a disproportionate risk

to people with darker skin tones or children

of being hit by driverless cars.

And, as we have seen echoed today,

demanding more regulation in this space.

I don't want to be doom and gloom about this.

I am a believer in possibility.

And I wonder, speaking now from the perspective

of an artist and an image maker,

and not as somebody working for a tech company,

insofar as those are separable:

What could happen if we believe deeply

in the possibility of this new frontier of images?

I don't have answers,

but in the spirit of vulnerability and exploration,

which I think have been themes so far,

I just wanted to share some early provocations

and things that I played around with in some of these tools.

For example,

how might we reimagine geopolitical justice

for Indigenous peoples?

Hot topic around the world right now.

If we could actually see what that reality looked like.

So this is an example of a synthetic image that I made

looking at the Lenape people,

who are the Indigenous peoples native

to the island of Manhattan, where I was born.

They called it Mannahatta.

We have that word from their language.

And what it would be like to see

the descendants of those people owning a brownstone

in the West Village of New York City, right?

All of this land that we now sit on and occupy,

if those lineages had been preserved, right?

Or if we look here on the...

Your left, my right.

You're seeing an image of the Louisiana State Penitentiary,

often referred to as Angola,

which is one of the greatest sites of carceral violence

that exists in the world

and in the United States, of course.

How might we think differently about carceral structures

and spatial design?

And, we say the justice system, in this case

often the injustice system,

if we could actually see

the possibility of rebuilding those spaces, right?

This is a taxpayer-funded hell on earth, right?

How might we think differently about this?

And it's fun to then start to play around:

how do we inject the vision

of some of the creators of our time, right?

Part of the reference images

that I gave to land on this image

are from famed architect Sir David Adjaye, right?

Who is a Black architect

who is thinking a lot about what Black space

looks like going forward.

Or as a last example,

how might we impact voter turnout

for communities who have never meaningfully seen themselves

represented in government?

And in the spirit of nuance and complexity,

this is actually an image of our Congress

in the United States, shortly after being sworn in.

This is an image that I was just playing around with, going,

"Okay, say it's 2050." We played with the demographics.

Could I create a Congress that is entirely of color?

And because of the limitations of the reference images

that are available to us today of what Congress looks like

and the kinds of prompts

that you can give some of these tools,

I probably spent an hour and a half

just trying to get to one image that showed everyone to be

not white as a provocation, right?

Not that that will ever demographically be true,

and I couldn't get there, right?

So it's an interesting comment on

where we are with these tools today.

But, again, hopefully some thoughts

that we will now continue into our panel discussion.

But I thank everybody for their time

and if you wanna be in touch personally after this,

please feel free to reach out. [crowd clapping]

Thank you.

Starring: Florian Koenigsberger