I asked an A.I. “what does a scientist look like?”
Artificial Intelligence (A.I.) reflects the inherent biases of the data it is trained on. My experiment with Stable Diffusion suggests that these tools encode male-dominant gender-science stereotypes. Despite this limitation, such technologies can still be directly guided to represent various intersections of science and identity.
This is an article in a series exploring my journey to blend user experience research into my data science work.
Take a moment and close your eyes. Now imagine a scientist.
What do you see?
Perhaps you see a neat white lab coat. Maybe you see a messy experiment bench adorned with a steampunky undulation of glassware, metal, and tubes. The glassware may even be filled with bubbling tonics with wispy vapor tumbling over the necks of oddly shaped flasks.
Now ask yourself, what did the person wearing the safety goggles look like?
Children increasingly sketch women when asked to depict a scientist
When researchers ask children aged 6–16 to “draw a scientist” — something we have data on since the 1960s — their drawings reveal an interesting reflection of the historic social influences on gender-science stereotypes.
Dr. David Miller and his colleagues published a meta-analysis back in 2018 looking at the evolution of gender-science stereotypes in children by analyzing the “Draw-A-Scientist” (DAST) dataset. The authors were seeking evidence that how children depict scientists has changed as the representation of women in science has increased.
What the authors concluded from 78 studies is that:
- Children increasingly draw women when they draw scientists
- Girls, in particular, have shown the largest change, depicting women more than 50% of the time
Such findings indicate that cultural stereotypes of gender and science have shifted measurably as more women participate in fields like chemistry, biology, astronomy, and physics.
A new cultural mirror
As part of a larger exploration of the ways that humans and intelligent algorithms interact, I wanted to better understand A.I. models that generate images.
Stable Diffusion is a generative text-to-image deep learning model trained on LAION-5B, a dataset of roughly 5 billion image-text pairs scraped from across the web. Images and their captions are grouped into subsets by resolution, caption language, and subject matter.
I wondered whether datasets generated from A.I. image tools like Stable Diffusion—similar to children’s drawings in DAST—might provide a lens into how social stereotypes are encoded in intelligent algorithms.
What does Stable Diffusion think a scientist looks like?
To run this experiment I generated 132 images from a pre-trained model of Stable Diffusion (version 1.5). Images were generated with a positive prompt: “illustration of a scientist, colored pencil, simple.” A negative prompt was included to limit the likelihood that the scientist was drawn “out of frame” or with “bad symmetry” to minimize random artifacts in images.
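A minimal sketch of this generation setup, assuming the Hugging Face diffusers library; the checkpoint name "runwayml/stable-diffusion-v1-5" and the output file layout are my assumptions, since the article only specifies Stable Diffusion version 1.5.

```python
# Sketch of the image-generation setup described above. The checkpoint
# name and file layout are assumptions, not details from the article.

POSITIVE_PROMPT = "illustration of a scientist, colored pencil, simple"
NEGATIVE_PROMPT = "out of frame, bad symmetry"  # artifact-limiting terms named in the text
NUM_IMAGES = 132


def generate_dast_images(output_dir: str = "sd_dast") -> None:
    """Generate NUM_IMAGES illustrations and save them to output_dir."""
    # Imports are deferred so the prompt constants can be inspected
    # without a GPU or the diffusers dependency installed.
    import os

    import torch
    from diffusers import StableDiffusionPipeline

    os.makedirs(output_dir, exist_ok=True)
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    for i in range(NUM_IMAGES):
        image = pipe(POSITIVE_PROMPT, negative_prompt=NEGATIVE_PROMPT).images[0]
        image.save(os.path.join(output_dir, f"scientist_{i:03d}.png"))
```

Because sampling is stochastic, rerunning this sketch would produce a different set of 132 images, so individual images will differ even if the aggregate gender skew persists.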
Images were then categorized as depicting a man, depicting a woman, or not determined. This scheme was chosen for consistency with the methods used in the prior literature. If you would like to analyze these images with a different categorization scheme, the dataset is available here: download Stable Diffusion DAST images (54 MB).
So what did I find?
1. Majority of depicted scientists were men
What I found was that a majority (89%) of the generated images depicted men. For reference, this is about 30 percentage points higher than the 2016 estimate for all drawings in the DAST dataset.

While I suspected that men might be overrepresented in the data, the magnitude of the discrepancy with DAST was surprising.
There was one silver lining in the data: not all scientists were white men.
Though I did not specifically attempt to analyze the intersection of gender and race, a subset of the scientists drawn appeared to be of non-Anglo-European descent.
2. Around 1 in 10 illustrations could not be readily categorized
I also found that 8% of scientists could not be tagged as male or female. In some cases, images lacked gender indicators or contained conflicting ones. Some images also did not clearly depict a human.
The inability to clearly code drawings by gender is also a feature that occurs in the DAST dataset, especially when children draw rudimentary sketches like stick figures.
3. Women received sparse representation
How did A.I. do when representing women in scientific roles?
Unfortunately, female scientists were very rarely depicted in this dataset. I estimate that only 3% of images generated with the given positive and negative prompt depict women.
The representation of non-white female scientists was worse—none were generated in my sample.
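The three headline percentages can be reproduced from simple category tallies. The raw counts below are my back-calculation from the rounded percentages reported above (132 images total), not published figures:

```python
# Reconstructing the reported percentages from category tallies.
# The counts are inferred from the rounded percentages (89% / 3% / 8%)
# and the sample size of 132; they are illustrative, not published data.
from collections import Counter

counts = Counter({"man": 117, "woman": 4, "not determined": 11})
total = sum(counts.values())  # 132

percentages = {k: round(100 * v / total) for k, v in counts.items()}
print(percentages)  # {'man': 89, 'woman': 3, 'not determined': 8}
```

Anyone re-coding the downloadable image set under a different categorization scheme could swap in their own tallies and compare directly against these figures.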
These estimates are troubling when viewed against current workforce data. Recent figures from Pew Research report that women hold 50% of science, technology, engineering, and mathematics (STEM) jobs.
Women are most heavily represented in health-related careers (74%), followed by life sciences (48%), mathematics (47%), and physical sciences (40%). Computer (25%) and engineering (15%) roles lag further behind.
A.I. is our future, but its data is our past
I found it troubling that the gap between the model's depiction of women and both the DAST data and actual workforce participation was so vast.

Though women's representation has shifted significantly since the 1960s toward parity with men, A.I.'s portrayal of who a scientist is reflected a skewed, outdated view of gender-science stereotypes.
Framed against the meta-analysis from Miller and colleagues, A.I. drew women around as often as children did in 1965. This was much more extreme than I had anticipated going into this experiment.
This brings up important questions about how we reckon with the bias in the machine: how do we know that our A.I. is not just repackaging old social problems for a future generation? How do we identify the sources of bias in our data and catch them before they go live?
Theories of how children internalize gender stereotypes point to both direct observation of their social environments, through interactions with people in their families and communities, and indirect observation of gendered roles through traditional and online media.
The relative importance of gender as a social category also means that children frequently look for cues about what is appropriate for their gender. This makes reducing bias very consequential in intelligent algorithms that are becoming more integrated into everyday use.
Bias may seem like an abstract concept: something we collectively know is important but do not have a good grasp of how to address in concrete terms.
Image generation tools like Stable Diffusion make the scope and scale of concepts like gender-science bias visible in a very real way.
Can we alter the mirror’s reflection?
This experiment presented an interesting opportunity to directly observe how intelligent algorithms reflect both the data they learn from and the society that produced that data.
Despite the limitations of A.I. tools when given minimally specified prompts like “draw a scientist,” there may still be potential to wield these tools toward positive social ends.
Models like Stable Diffusion are open source and can be trained to generate images from novel classes of objects and people.
One such possibility that has gained popularity recently is using A.I. to generate artistic avatars of people from their own camera reels.
While these apps are limited in what they produce, it is possible to fine-tune models like Stable Diffusion — on as few as 3–5 images of a new subject — while retaining access to the entire language model for generating novel images.
To illustrate this I trained a model of myself and asked Stable Diffusion to draw me as a scientist.
Admittedly some of the images captured, erm, more of my sense of humor than science skill.
But some of the images did a really wonderful job of showing me what I might have sketched as a child envisioning myself as a scientist (though with much more artistic skill).
With the entire language model at my fingertips, I was able to personalize the drawing prompt to also include relevant concepts about me. Here we can see that I was able to generate some representations of the intersection of my queer and scientific identities.
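The prompt layering described above can be sketched as a small helper. The placeholder token "sks person" follows a common DreamBooth-style fine-tuning convention, and the function, concept list, and commented checkpoint path are all hypothetical illustrations rather than the article's actual setup:

```python
# Illustrative sketch of personalized prompting against a fine-tuned
# checkpoint. The token, helper, and paths are hypothetical.

PLACEHOLDER_TOKEN = "sks person"  # token bound to the subject during fine-tuning


def personalized_prompt(concepts: list[str]) -> str:
    """Compose a drawing prompt that layers identity concepts onto the
    fine-tuned subject token."""
    details = ", ".join(concepts)
    return (
        f"illustration of {PLACEHOLDER_TOKEN} as a scientist, "
        f"{details}, colored pencil, simple"
    )


prompt = personalized_prompt(["rainbow lab coat", "astronomy lab"])
# The prompt would then go to the fine-tuned pipeline, e.g.:
# pipe = StableDiffusionPipeline.from_pretrained("./my-finetuned-sd")
# image = pipe(prompt).images[0]
```

The design point is that personalization lives entirely in the prompt once the subject token is learned, so layering on interests or identity concepts costs nothing beyond editing a string.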
Focusing the reflection with our own lens
From this experiment I relearned an important lesson. Stable Diffusion, like many other machine learning models, is just a tool.
Tools can be wielded in a number of ways — for good or ill — but the person holding the tool gets to make those choices. By understanding how these and similar ML/A.I. systems work, we have the capability to shape their outputs in new and interesting ways.
Imagine, for example, an intervention where we use A.I. to let children see themselves as scientists. They could, quite literally, see what they would look like as a veterinarian, an astronomer, a video game developer, or even an inhabitant of an interstellar space station.
A.I. tools surely will come to us with biases, but we can still exert some creative control over how those algorithms represent our world.
In the case of image generation, we can personalize the experience by training the model to render a particular person. The language model can then be used to layer on other relevant concepts about that individual, including their interests, salient identity characteristics, and their own vision of what a scientist is and does.
The challenge now lies in really understanding how A.I. reflects a world that is fundamentally static, anchored in the past. We need to be vigilant for the ways this past world no longer represents our lived experience and be prepared to guide these tools to a different output.
We may not be putting pencil to paper, but we have an opportunity to sketch a new generation of ML/A.I. tools that can have a positive social impact.
Disclaimer: These writings reflect my own opinions and not necessarily those of my employer.