Text to Image Prompt: The technology and science behind A.I. and what it could mean for image makers? | PhotoVogue Festival 2023: What Makes Us Human? Image in the Age of A.I.
Released on 11/22/2023
Hey, everyone. Lovely to be here today, and thank you, Alessia, for letting me be part of the festival and kick it off.
I'm going to start by giving a bit of an overview of the technologies that drive AI and how they apply to image making. But first, a little bit of background on myself.
I was gonna play this video, but it's something I think you should all watch afterwards. I didn't realize when I started that Fred is in it, but it's a Today show segment about the launch of Photoshop in 1990 and the revolution Photoshop represented in digital technology and image making.
Why does that apply to me? Because that's where I started; it kind of gives away my age a little bit. At university, as part of my work learning photography and digital publishing, Photoshop 1 was one of the first tools we started using, and my career in the technology world has grown from there.
One of my first jobs was at Lonely Planet Images, back in the early 2000s, building their first stock image library: integrating all the slides and converting them into a repository, and creating a search and e-commerce platform where you could buy and license those images.
When I moved to Europe (you can tell by my accent that I'm not from here; I'm Australian), one of the companies I worked with was Photobox, a gifting and photographic company where you basically create custom gifts: photo books, mugs, et cetera.
What was so fascinating about that job was that it held the largest repository of images outside Google and Facebook at the time. And every single product we made, manufactured through our factories, was an individual product based on images.
And then, for the last five years, I've been working at Condé Nast in a variety of roles on the technology side, basically re-imagining our platforms, experiences, and livestreams.
But on the flip side, I started as an artist; I trained as a photographer. When I moved to the UK, I paused to raise a family and build a life in Europe, and this year I reestablished my work as a photographer and published my first photo book.
So, what we're gonna talk about today is the technologies behind AI, in a very simplistic context. I'm not an engineer or a programmer, and I don't come from a computer science background. I come from a creative and storytelling background, but I do work with these technologies every day, alongside engineers, programmers, and machine learning specialists, to create them.
But AI has been around for a really long time, right? The transformation and the exponential change happening at the moment is really around generative AI. But any time you've used your phone or a camera, you've been using some form of AI technology. Because ultimately, the digital images we have today are made up of numbers and mathematics: ones and zeros, right? Pixels; no more, no less. And that's where it started, back in the seventies, with the beginnings of the translation of film into digital mathematics and computer language.
So, what does it mean now? I've been working in technology for 30 years, but the rate of change is exponential. I thought it was fast when I started; now I can't even keep up. Technologies are being launched on a daily basis that completely supersede what was happening the week before.
So, let's look at some AI building blocks. It's actually quite simple; I probably shouldn't say that out loud, but it kinda really is. Because basically, AI works off understanding the language, the image, and the taxonomy of the thing it is looking at. In technology terms, you'll hear words like metadata, taxonomy, or information architecture, but it's basically creating context around the objects that it sees.
AI can't work without that knowledge, right? It needs to understand the inputs, and obviously whatever goes in is a manifestation of what comes out. So, if the inputs that go in are biased, then unsurprisingly, the results you get are often biased.
And within that, AI covers a broad spectrum of computer science, and we're only gonna touch on a really small part of that today.
So, the biggest part of AI as a learning mechanism is the large language model, right? It's the data you feed it, through which it translates your request and brings back an image.
And in that context, over probably the last 20 years of visual technologies, we've gone through a very simple transformation. The first step, with the digital camera, was to make an image, right? To take those transparencies or that film and convert it into pixels. From there, we've been learning how to see a picture: to understand that kind of context and add metadata or information around it. That is often very keyword-centric: here is a tree, there is a cat.
And if you watch any of the early videos, it was all about understanding what is a cat, what is a tree, what is a car. From that, we've learned how to describe a picture, and then ultimately add meaning to it. And that's where the last 10 years have really focused: not just saying here is some information about a cat, but what is that cat doing, is it lying on a bed; ascribing an element of knowledge and context to it.
The transformation in the last 12 months is the generative side, where the algorithms are now working within themselves, adding in technologies, and almost talking to each other to create these generative images. And then, literally in the last week, we can start to do that in real time. So, where technologies may have taken years or decades to absorb and learn all this information, we are now seeing a rate of change on a daily basis, to the point where you have AIs talking to AIs, creating the prompts, creating the images in real time.
So, I'm just gonna walk through very simply how it works, right? And how these words, this context, these keywords ultimately create meaning. This is a really nice article from the Financial Times in the UK, a visual narrative explaining, in very simple terms, how AI learns from information and translates it into context.
Because basically, it has to translate words. So, how does it do that? Take the sentence, "We go to work by train." From that, it breaks those words up into subsets of other words, right? And they become tokens. So, everything is broken down into each of its parts, and then rebuilt over time.
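To make that concrete, here is a minimal sketch of tokenization using OpenAI's open-source tiktoken library; the exact splits vary by model, and this only illustrates the idea, not the specific pipeline the FT article describes.

```python
# A minimal tokenization sketch (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era tokenizer
sentence = "We go to work by train."
token_ids = enc.encode(sentence)

print(token_ids)  # the sentence as a short list of integers
for t in token_ids:
    # show the sub-word piece each integer stands for
    print(t, enc.decode_single_token_bytes(t))
```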
By doing that, and adding thousands of articles, thousands of contexts, thousands of ways of expressing "train", it will look for patterns. And those patterns are what it translates back: when you ask ChatGPT to write you an article or create you an image, it is basically taking the sum of all those patterns to create it.
It then looks for things that are negative, positive, like for like, to start to create context. And within that, it starts to generate text that is similar to what it understood in the beginning. But it can only understand the inputs that you give it, which is really important when you get to image making or storytelling, because it is always looking at the history or the context of the past, not the context of the future. So, in an engineering context, by taking these millions and then billions of outputs, we can simplify them back into patterns and logic, and the likelihood of something being consistent with something else, translating it back into meaning.
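As an illustration of "like for like", here is a toy sketch of the underlying idea: words become vectors, and similarity becomes a number. The three-dimensional vectors below are invented for the example; real models learn embeddings with thousands of dimensions from billions of examples.

```python
# Toy word embeddings compared by cosine similarity (pip install numpy).
import numpy as np

embeddings = {
    "train":   np.array([0.9, 0.1, 0.0]),  # invented 3-D vectors
    "railway": np.array([0.8, 0.2, 0.1]),
    "cat":     np.array([0.0, 0.9, 0.3]),
}

def cosine(a, b):
    # 1.0 means the vectors point the same way (very similar)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["train"], embeddings["railway"]))  # high: like for like
print(cosine(embeddings["train"], embeddings["cat"]))      # low: unrelated
```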
And from that, as it learns, the more information it has, the more it can understand context and consistency. It can come back with a sentence, or it can come back with a narrative. All AI is built on these large language models or data sets. So, if you were listening to Refik [Anadol] the other day, he takes one data set and another and puts them together to create a narrative: weather data, location data, image data, translated into a new story or a new visual narrative. But ultimately, all AI is built on the foundation of a language model.
So, what is vision to a machine? A lot of the work is text-to-image prompting, and we now have image-to-image and video-to-video. So, how is it doing that? It is basically the same thing: classification.
When we start to input images; and in the early days, this is why it took 10 years to get to this exponential growth, people were manually keywording the metadata for every image, right? That is how the model learnt. So, take any image: the system can identify who is in it, where it is, the location; it can take information from the JPEG of your photo, where it was taken, and location data. And that becomes everything that the machine and the learning model use to create images.
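As a small illustration of the information that travels inside a JPEG, here is a sketch that reads a photo's embedded metadata with the Pillow library; the file name is hypothetical, and GPS tags only appear if the camera recorded them.

```python
# Reading embedded JPEG metadata (EXIF) with Pillow (pip install Pillow).
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("holiday_photo.jpg")  # hypothetical example file
exif = img.getexif()

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)  # turn the numeric tag into a readable name
    print(name, ":", value)          # e.g. Make, Model, DateTime, GPSInfo
```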
So, from this, we can then create new stories. This is one of my favorite pieces of art, by a creative technologist called Dries Depoorter, who took two very simple data sets: Instagram images and the locations where they were tagged, and openly accessible camera feeds, to find the source of each image and then match them, right? So, all these kinds of connections are based on taking two data sets, finding the similarities between them, and then translating that into a visual story.
But it is fascinating that in the world today, with so many surveillance cameras and every one of us posting on Facebook, Instagram, and TikTok, the machines can know everything about you in an instant: where you are, who you're hanging out with, the images you take, and the stories you tell.
So, what does this mean in the context of curation? We're gonna talk about two types of AI now: curation AI and recommendations, and then the generative side. Because ultimately, as all this information goes in, it's looking for those patterns, and all it will bring out in the beginning is a median or an average, right? The likelihood that you will buy this thing, the likelihood that you will like watching this video, is based on the median or the statistical average of this data.
And in that context, as I said before, all AI is created
in a historical context.
So, while it can predict some elements of the future,
it is always looking to the past.
It's a little bit small to read. So, I also asked ChatGPT: what is AI? And if I pull some things out; sorry, I asked it to discuss the role of algorithms in storytelling. Basically, the role is to organize, analyze, and interpret data. We find patterns. Algorithms are used to personalize stories for individual users. We recommend content based on your history, and we analyze past viewing behavior to suggest new content based on what we know about you. And from that, we can help and support creativity, but we also risk biases.
So, these machine learning models are constantly learning, all day, every day. The more inputs we give them, the more tests we do with ChatGPT, the more it is learning, improving, and creating new patterns based on how people are responding to the technology and the images that they like, create, and share.
So, very simply, the first of the two elements: curation AI. This is the very simple part of anything you would use on your phone today, whether it's Spotify or Instagram; your discover page will tell a lot about you as a person. You can tell that for a while I was very interested in Pedro Pascal. In terms of how they work, the system basically takes your listening behavior, your reading behavior, what you look at on the internet, and then serves you similar content. That is why we often see a lot of repetition on these kinds of social media platforms. Because basically, if we're going for engagement and views, as publishers we look for things that perform and replicate them.
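To sketch that "serve similar content" loop, here is a toy recommender: the user's taste and each item are hand-made score vectors (the items and numbers are invented), and content is ranked by how well it matches. Real platforms learn these vectors from millions of interactions rather than writing them by hand.

```python
# Toy content recommender: rank items by match with a user's taste vector.
import numpy as np

# columns: [fashion, sport, travel] -- invented scores for illustration
items = {
    "runway video": np.array([0.9, 0.0, 0.1]),
    "tennis match": np.array([0.1, 0.9, 0.0]),
    "city guide":   np.array([0.2, 0.1, 0.9]),
}

# a user who has mostly watched fashion content
user_taste = np.array([0.8, 0.1, 0.1])

# higher dot product = closer match, so similar content keeps surfacing
ranked = sorted(items, key=lambda name: float(items[name] @ user_taste), reverse=True)
print(ranked)  # the runway video ranks first -- hence the repetition
```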
So, let's flip now to the creation side and what's happened in the last 12 months. As these models are built out; not by us, but by the technologists around the world today, the next phase is very much about taking these vision technologies and adding meaning and context, right? So, where we may have started with a pattern of "I understand the keyword", which is very logical, the world we are in today is a little bit more blurry in terms of what is real, what is not, and what has meaning.
But the challenge with this is actually around the input, because words are very literal, and I think sometimes we forget that. Prompt engineering and the languages we use to create these images are also very keyword-based in the way they are written to generate an image. And yet, images are the reverse, right? There is so much more meaning around the context of love than you can quite explain to a large language model; or maybe we can.
Part of the challenge for computer science at the moment is to solve this problem: to take words that are literal in translation and create multiple meanings for very simple concepts. Because, ultimately, life exists within the context of this language, whether it's verbal, visual, cultural, or across time. And it really impacts the way we communicate, the relationships we have, democracy, publishing, media, conversation.
But the challenge, I think, at the moment is that a lot of gen AI is really based on a simple prompt, and while prompts can get complex, they are very closely related to how computer and engineering languages work. They're very specific, they ask for very specific concepts, but they don't actually do nuance. And yet, we're creating this paradigm of a new set of language that is very literal in its translation. And often, the images that come back are also very literal.
So, this was something I did six months ago, and I asked Midjourney again this week. Earlier this year, in a talk, I used a very simple prompt, not a complicated one. You can get right into the APIs and the backend technologies and do very complicated prompt engineering to really specify what you want to create. But in its most simple terms, it works like metadata, right? You ask it a question, or you give it very specific instructions: "I want Serena holding a tennis racket on a tennis court."
The way the quality of the image has transformed in the last six months is extraordinary, but that is based off all the other images it is now creating and feeding off itself to improve that visual representation.
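For a sense of the prompt-in, image-out shape of this interaction, here is a minimal sketch using Hugging Face's diffusers library with a public Stable Diffusion checkpoint; Midjourney itself has no public Python API, so this is a stand-in, and the checkpoint and prompt are just examples.

```python
# A minimal text-to-image sketch (pip install diffusers torch transformers).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # a public checkpoint, not Midjourney
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a tennis player holding a tennis racket on a tennis court"
image = pipe(prompt).images[0]  # the model renders the literal keywords
image.save("generated_tennis.png")
```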
But my point has always been that I would actually much prefer to see the real thing. So, while I can create an image to represent it, do it faster, publish it, save time, save money, it doesn't create the same feeling as actually taking a photograph.
So, the biggest transformation that's happening right now is that you can train the AI with your own images. ChatGPT literally launched a variety of technologies last week where you can take your own images and put them into the system to generate very similar images in your own style.
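As a hedged sketch of what "your own style" can look like under the hood: one common open-source approach (not necessarily what ChatGPT does internally) is to fine-tune a small adapter, a LoRA, on your own photographs and load it into a diffusion pipeline. The training run is omitted here, and the adapter path and "sks" trigger word are hypothetical placeholders.

```python
# Loading a personal style adapter into a diffusion pipeline (diffusers).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# hypothetical LoRA fine-tuned beforehand on a personal photo archive
pipe.load_lora_weights("./my-family-archive-lora")

# "sks" is a placeholder trigger token bound to the style during fine-tuning
image = pipe("a photo of a sks family portrait, 1980s film look").images[0]
image.save("family_in_my_style.png")
```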
So, I tried that, based on a project I would like to start but have been kind of struggling with. This year my mother passed away, and when I was going through her archive, there were almost no photographs of my mother with our family; only four exist of my mother, my father, my sister, and me. And so, what does it mean, as a child or as someone who is in grief, to try and imagine or have an archive of images that don't exist? So, I literally started to try and create my own ChatGPT. It's gonna take some time. This is the first render it did, trying to create a family photo in our likeness.
As you can see, it's way more illustrative and performative; and look, I would really love to look like a Wes Anderson film, but I don't. Even using those prompts and my own images, it's still struggling with that context. So, I tried to create and fake some images of myself as a young child with my mother. It still can't quite do it. But as with the Serena Williams example, it won't take long before it starts to render very interesting, close-to-real-life images of a family.
However, that's still not the memory that I have, right? So, while I can illustrate, transform, and morph a photograph; I took this one, it's not my son, but it was taken at my son's 21st. But translating it into a different person, a different location; while it is a fantastic illustration, it doesn't create the meaning, the history, and the story that I would create with the photograph that I've published or made, or the memory that I had.
I wanted to show, and it's actually Marco, a little video. This is the level the technology has reached and how much the speed has changed: you can now create images in real time, video-to-video, based on recording yourself, in a matter of minutes. From a creative's perspective that is extraordinary and super exciting, because the kinds of stories we can tell from this are amazing. Also, the level of, I don't know, just technology and creativity is awesome. You could do that in any style that you want: manga, CPR, et cetera. So, I think, for future storytelling, it's really interesting as a tool to enable new creative paths to telling stories.
So, to kind of wrap up, I just wanted to flip it a bit. The rate of change, even for someone who works in technology, is a lot right now. But ultimately, it comes down to the input: what stories you wanna tell and which technologies you wanna use, whether it's Photoshop, film, 10 by 8, or digital.
But ultimately, I actually think there will be a shift back to memory making and more traditional forms of storytelling. While I like an AI telling me where to travel, it's not gonna drive a decision; nor is a photograph that is basically a facsimile of my family and my life gonna represent the meaning I have from growing up in my family.
So, I did ask ChatGPT also: can it do meaning or memory? And it can't. And while computer scientists are looking to create historical and collective meaning, I don't think we'll ever get to a position where it can mimic that. I mean, it shouldn't be able to. It's a technology; it's a tool. Ultimately, while it might mirror human-like qualities, it will never replace them. And I think, as artists and creatives, we need to remember that, even though the discussion around technology is really loud right now.
Because the question isn't whether we're gonna use these technologies; they're here, right? We're all using them, we're all experimenting, we're all finding new paths and ways to tell our stories. It's more that, as they become more generic, and like I said, it's a median, most of the images we create look very similar. So, how do you know it's artist A, B, or C? What story are they telling? If it's a fashion image and it looks just like every other fashion image, we're gonna become numb to these kinds of technologies.
So, I'm just gonna show two pieces of work to flip it back to the photographic side. Dita Pepe is a Czech photographer. This is one of my favorite series for talking about the context of image and meaning-making that I'm just not sure an AI can do. In her project, she imagined her life as a mother and a wife based on the men that she met. Often, the people in the photographs are the children of the man. And she did a whole series that represented what her life might look like: the styling, the location, where she lived, her demographics, whether she's rich or whether she's poor. And these images, I think, very simply illustrate that AI technologies can help us make all kinds of images, but it's the meaning and the context that is a little bit harder.
And just to end, because I thought I'd show a little bit of my own work, the project that I did over the last two years, during COVID, was very similar. It was called A Chance of Love: a photo book, a novel, and a series of images based on meeting men on Tinder and going on a first date. And the interesting thing about that is that you are interfacing intimacy and connection via and through an algorithm. And that algorithm gives you a lot of same same.
So, whether you are swiping left or right, you will have [indistinct] in the mirror guy, you will have selfie guy, you will have gym guy, you will have all these very generic kinds of images, because that's what ranks, right? Back to the large language models. And everything looks the same, so how are you meant to find a partner, love, intimacy, connection, when all you are seeing is the same types of people saying the same types of things over and over again?
Because ultimately, when you meet someone, they're not what they look like on the screen, right? And actually taking their portrait, taking a photograph, and meeting someone through a chat with the opportunity for connection is quite hard. And it's currently filtered by an algorithm that decides, before you do, who you're gonna be popular with and who is most likely to swipe left or right, and is basically kind of interfacing with our choice of connection.
Which is something I think we need to bear in mind, not just for dating, but for TikTok, for any of our communication channels: the context we have in a real relationship is different from the AI and the technology. In the book, all the men agreed; we published all the conversations and all the setups to meet, connect, and take their portrait.
It has poetry.
And it was only nine images over two years.
But ultimately, while it started as a discussion around connection, it also became a reflection on algorithmic narratives and these interfaces we have with technology that determine who we meet, what we see, what we read, and ultimately how we infer meaning in our lives.
So, just to wrap up. I think one of the challenges we face at the moment is this meaning context, right? As humans, we don't necessarily want all this speed. We don't necessarily want technology driving that change. Sometimes we want a bit of boredom. We also want a bit of serendipity. And technology finds it really hard to do serendipity.
And then, just to end; I hope I'm on time. With all these technologies, the key is just to understand how they work. And in the context of AI, input creates output. So, the quality of your prompt, the quality of the images you upload, the quality of the construct of the story you tell is ultimately what will determine the image, not the AI itself.
And that is it.
Thank you.
[audience clapping]
Starring: Mel McVeigh