Since 2022, companies including OpenAI, Google, DeepSeek and others have released progressively more powerful consumer-facing products built on generative-AI models, which are trained on massive data sets
In the same year, I began a long-term creative project: a sequel trilogy of illustrated novels based on the classic American fairy tale The Wonderful Wizard of Oz (1900)
The main character, Dorothy Gale, and the land of Oz had a profound impact on my childhood; more recently, the material recaptured my imagination, and I became absorbed in research, writing, collecting physical literature and hand-illustrating components of my vision
I recognized the potential to visualize elements of this vision in high-fidelity using emerging technology, and so my experiments began…
Cover art for a forthcoming series of adult-fiction novels based on The Wonderful Wizard of Oz (1900) by L. Frank Baum; the covers blend generative-AI with traditional graphic design
The Gale of Kansas
An elderly Dorothy Gale returns to Oz by way of a mysterious machine; once in Oz, she must contend with a cold war unfolding in parallel with the one on Earth
The Fortress In The Sand
The mad professor lures the winged monkeys into her yellow stone fortress at the edge of the deadly desert; there, she plays psychological games of cat and mouse with the monkeys in order to unlock their hidden potential and reveal their true purpose in her larger plan to conquer the desert
The State Of Magic
In the distant future of Oz, Dorothy the Magineer has perfected the method for slingshotting probes between Oz and Earth; through her hubristic plan to connect the worlds across dimensions, she risks the very existence of both worlds along with a third: the mysterious dome world known as Oobliad
Prompt Development
1. Structure & Syntax
2. Vocabulary of Visual Culture
3. Prompting with Images
4. Producing Variation
1. Structure & Syntax
Midjourney and other generative tools respond best to prompts written in the specific structure and syntax each model expects
My generations improved significantly once I began to use these built-in parameters
However, using the designated structure and syntax never guaranteed that a given prompt would succeed on the first pass
Example cheat sheet detailing syntax and structure best-practices in Midjourney (Credit: Tristan Wolff)
Each generative product has a unique set of best-practices for syntax and structure
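As a rough illustration of what "structure and syntax" can look like in practice, the sketch below assembles a Midjourney-style prompt string: image-reference URLs first, then the text description, then double-dash parameters at the end. The parameter names in the example (`--ar`, `--stylize`, `--no`) are Midjourney parameters, but their availability and value ranges vary by model version; the `build_prompt` helper, the example URL and the values are hypothetical and not taken from this project.

```python
def build_prompt(text, image_urls=(), params=None):
    """Assemble a Midjourney-style prompt string (hypothetical helper).

    Image-reference URLs come first, the text description follows, and
    parameters are appended at the end as '--name value' pairs.
    """
    parts = [url.strip() for url in image_urls]
    parts.append(text.strip())
    # Dicts keep insertion order, so parameters appear in the order given.
    for name, value in (params or {}).items():
        parts.append(f"--{name} {value}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "a green tornado over a Kansas farm, 1700s pastoral oil painting",
        image_urls=["https://example.com/reference.png"],
        params={"ar": "2:3", "stylize": 250, "no": "text"},
    ))
```

Keeping the prompt in separate pieces makes it easier to tweak one element (the style reference, say) while holding the rest of the structure constant.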
2. Vocabulary of Visual Culture
I made use of my knowledge of art history and visual culture to make precise references to artistic periods, styles and mediums, artists, filmmakers and specific works
Points of Reference
Forms
Jellyfish; shape and quality of movement
Clouds; textural quality and environmental context
Style
1700s pastoral oil painting by Russian and English masters
Layering elements by naming them throughout the body of the prompt produced a series of phantasmagorias, as if from an Art Basel of the future
In addition to producing the desired “green tornado”, I was able to generate an abundance of variations that interpreted the root prompt in unique ways
3. Prompting with Images
Midjourney and other generative-AI tools allow the use of images as references within the body of the prompt
The author can steer the prompt to fuse, juxtapose or otherwise blend elements from multiple images together with text
Points of Reference
Judy Garland, age 45 (Credit: Bettmann)
The Green Man/Woman; a figure derived from English folklore and often represented in garden ornaments
Generations
The resulting combinations fused elements of content as well as colour information from each photo reference
4. Producing Variation
It’s possible to rapidly generate an enormous amount of variety with generative image-making tools
By deriving variations, the user creates a rootlike or branchlike map of outputs as they steer the model towards a desired outcome
It was most often through a process of making small tweaks to the base prompt and executing variations that I arrived at the most impactful images
An example of the abundance possible with variations; the final image was produced over a series of working sessions, where the prompt was adjusted or completely re-written multiple times
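The "rootlike or branchlike map" of outputs described above can be pictured as a tree: each generation remembers the prompt it came from and the generation it was varied from. The minimal sketch below models only that bookkeeping and does not call any generative model; the `Generation` class, its fields and its methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_next_id = count(1)  # simple counter so each generation gets a unique label


@dataclass(eq=False)  # compare generations by identity, not by field values
class Generation:
    """One output in a hypothetical tree of variations (no model is called)."""
    prompt: str
    parent: Optional["Generation"] = field(default=None, repr=False)
    gen_id: int = field(default_factory=lambda: next(_next_id))
    children: List["Generation"] = field(default_factory=list, repr=False)

    def vary(self, prompt: Optional[str] = None) -> "Generation":
        """Record a variation of this output, optionally with a tweaked prompt."""
        child = Generation(self.prompt if prompt is None else prompt, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return the prompts from the root generation down to this one."""
        path, node = [], self
        while node is not None:
            path.append(node.prompt)
            node = node.parent
        return path[::-1]


if __name__ == "__main__":
    root = Generation("winged monkey, still from the 1939 MGM film")
    retry = root.vary()  # same prompt, new roll
    baroque = retry.vary("winged monkey, 17th-century Baroque oil painting")
    for step, prompt in enumerate(baroque.lineage()):
        print(step, prompt)
```

Walking `lineage()` back from a favoured output recovers the sequence of prompt tweaks that led to it, which is the record that gets lost when variations are explored only by eye.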
Challenges
1. Non-Human Skin Tones
2. Uncommon Creatures of Fantasy
3. Natural Ageing
4. Imaginary Phenomena
5. Unwanted Body Parts & Unsettling Expressions
1. Non-Human Skin Tones
I was unable to produce non-human skin tones by text alone; achieving them required photo references in the body of the prompt
Photo References
Early Generations
Variations in the early stages of generating the desired combination between skin tone and other formal qualities of appearance (gender, age, character reference) were more subtle
Images that adhered most closely to the desired outcome were selected for follow-up iteration/variation
2. Uncommon Creatures of Fantasy
Multiple sessions were required to achieve the synthesis of elements I was looking for in the character of the winged monkeys; L. Frank Baum’s winged monkeys are not a common fantasy creature compared with, say, the unicorn, which likely appeared in abundance in the model’s training data
As such, my role as the prompt engineer was to creatively rephrase the prompt and select the variations closest to my intent until I arrived at the desired combination of style and other formal elements, before pivoting to producing variations in abundance
Initial generations humorously misinterpreted the cultural reference to “flying monkeys”
Follow-up generations made use of an image reference from the 1939 MGM film, but also drew on unwanted references to more current styles of illustration
A third pass attempted to make text references to visual culture and style, but suffered from AI sheen: a quality of generative-AI images that resembles an unwanted filter reminiscent of modern digital illustration
In the next phase I pivoted to a different style, attempting to reference 17th-century Baroque oil painting
I attempted to use text prompting to fuse the winged monkeys with additional qualities of other species such as pigeons and rats with varying success
In the final phase, I achieved the desired formal and stylistic qualities and began to generate an abundance of variations, experimenting with shifting the locale/background
3. Natural Ageing
The model had a tendency to output generations that appeared too artificially attractive and youthful, despite the intentional use of age descriptors in the body of the prompt
Initial generations had a tendency to appear overly youthful despite the deliberate inclusion of descriptive language regarding age
A set of variations eventually took on a naturally aged appearance, and so further variations were prompted from that set
A final image depicts the character of Glinda the Good with a more natural appearance of ageing, upscaled from a desired generation
4. Imaginary Phenomena
A model can only process a limited amount of information in a single prompt; prompt engineers must find creative ways to reduce the text of a prompt to what the model needs in order to generate what the user envisions
This can become challenging when the engineer is attempting to describe a phenomenon that does not already exist as a referenceable image, or one which is difficult to describe; the following generations represent that scenario
Animation: The dome world known as Oobliad spins in the dark plasma medium of its unique universe
Animation: Dorothy’s probe arrives deep within the Earth’s core in the distant future
Image: An elderly Dorothy Gale is swept into a wormhole containing green plasma
5. Unwanted Body Parts & Unsettling Expressions
Although this phenomenon is being actively addressed in more recent product releases, the sense that human representations in generative-AI appear somehow alien or inhuman has been a common critique of generative images and video
Earlier models were highly prone to adding extra extremities, such as fingers and limbs, to human and humanlike figures
Ozma, the Reclusive Queen of Oz
Additional limbs are just one example of the uncanny valley effect in generative images; on closer inspection of the face, for example, it becomes clearer that something is off. I notice it most clearly in the eyes
Animation
1. Dorothy's Mysterious Machine
2. Ancestors of Oz
3. Ozma's Robotic Prototype
4. The Winged Monkey in the Yellow Fortress
Images
1. The Cowardly Lion, in the style of Matthew Barney
Matthew Barney is a contemporary artist who uses high-tech film cameras and cinema-grade prosthetics to build the surreal and provocative worlds of his expansive film series, The Cremaster Cycle (Cremaster 1 to 5), produced between 1994 and 2002
Using the phrase “in the style of” produced representations of the humanoid Cowardly Lion from MGM’s 1939 musical The Wizard of Oz, with visual stylings reminiscent of Matthew Barney’s cinematography-driven video art
Points of Reference
Generations
Midjourney did an exceptional job of capturing the essential visual flavour of both MGM’s 1939 iteration of the Cowardly Lion and Matthew Barney’s work overall.
2. Countryfolk & Other Hominids of Oz
These images are another example of mass variation that was executed once a desired style was achieved; the initial successful variation resulted in an abundance of unique characterizations in the same distinct style and colour scheme, but with a large variety of nuanced physical characteristics
3. The Distant Future of Oz
Part of my vision of a future Oz is informed by 20th-century science fiction literature from the 1960s to the 1990s; key inspirations include Alice Bradley Sheldon, who wrote under the pseudonym James Tiptree Jr., and Larry Niven, author of the mind-bending sci-fi classic Ringworld (1970)
In addition to literary inspiration, my visual explorations also heavily reference 20th and 21st century science fiction film and animation
The models excelled at producing textural qualities of different materials, even when forms were not exact or precise
This image showcases Midjourney’s power when generating textural quality, despite its difficulty representing the human form with accuracy and precision; notice the “third leg syndrome”
2. Particle Systems
Particularly evident in generative animation, the models excelled at representing particle systems in motion, even when that particle system was a surreal or nonexistent phenomenon
“A Cacophonous Apparition Appears Above Oz”: Image and Animation
Examples of a successful particle system; in this case, a cloud-like aether with a quality of movement superimposed
3. Stylistic References
Midjourney and Pikabot both excelled at representing artistic mediums, stylistic periods, creators and films when precise references were included in the vocabulary of the prompt
Top/Left: Actress Julie Andrews runs to the top of a grassy hill with the Swiss Alps in the background in The Sound of Music (1965)
Bottom/Right: The prompt for this image of an adult Dorothy Gale running through a field of poppies made clear stylistic reference to The Sound of Music (1965)
Critical Reflection
Integrated Media
Future practical applications of generative image and animation tools will likely include producing elements of scenery that would otherwise be created with conventional CGI.
Rapid Advances in Generative Tech
Google’s Veo 3 has opened new creative doorways; the tool can link vignettes into complex animated sequences with consistent characterization and continuous scene progression
The short film above was made entirely with Google’s Flow tool, which pairs a creative user interface streamlined for filmmakers with the powerful Veo 3 video-generation model
Deepfakes & Propaganda
The emergence of widely available generative tech has paved the way for deepfakes and other political propaganda to proliferate en masse across the web, a trend currently most evident on social media platforms
Unless we develop effective methods of tracking and filtering that can accurately and consistently identify AI-generated content, independent of the self-reporting we currently rely on, we are at serious risk of entering a dark period of information in which we cannot effectively vet the trustworthiness of what we see, read and hear. Some might argue that this dark age of information is already here and is actively harming the well-being of people and communities
Many were fooled by this AI-generated animation of bunnies jumping on a trampoline at night
The more data a model has to train on, the more accurate its generations may become
Musician Oliver Richman comments on the AI trampoline bunnies in this musical sketch he produced as part of a song-a-day challenge for musicians on social media
Environmental Impact of Generative-AI
The environmental impact of widely available generative tech is already apparent; the current Trump administration’s lax environmental stewardship is allowing Big Tech to disregard environmental impact in pursuit of profit, often at the expense of the communities where massive new data centres are being built
Economic research suggests that AI and automation will replace a significant number of jobs over the coming years; additionally, tech companies including Meta, Google and OpenAI developed their generative models by non-consensually scraping vast amounts of artwork, photography, audio, video and writing from the web without compensating a single creator for the use of their work
What is arguably impacting jobs most significantly at present is the trend of large tech companies outsourcing roles to locations with far cheaper labour; this is creating significant challenges for those in the early stages of their careers, and for those at the mercy of the mass layoffs that result
This article from the World Economic Forum discusses perceptions of job displacement and the value of labour in the AI-driven economy
This report from the World Economic Forum’s Future of Jobs 2024 survey examines in depth the projected economic and career landscape of the coming years
Project Conclusion
I learned to tailor syntax and structure to the model being used to visualize creative ideas
I practiced trial-and-error in the use of prose-based, cultural vocabulary-based and action-based language in my prompts
I developed a system of production:
Achieve successful generation/combination through the use of structure, syntax, creative language and image references
Produce variations of the most successful generations or combinations
(For Images) Make precise corrections by re-prompting lassoed sections of the image
(For Animations) Re-prompt many times using slightly varied sentence structures and syntax-based attributes (such as camera zoom and panning, and degree/quantity of motion)