World-Building in Oz with AI

Sections

Project Overview

What I Accomplished

1

Used generative-AI to create a diverse library of cinematic imagery inspired by The Wizard of Oz

2

Engineered prompts using precise references to visual culture, creative prose and action-oriented language

3

Developed practical knowledge around generative-AI: 

  • how models function
  • how models are trained
  • strengths and limitations 

Project Sections

1

Context

2

Prompt Development

3

Challenges

4

Animation

5

Images

6

Critical Reflection

7

Conclusion

Tools I used


Context

Since 2022, companies including OpenAI, Google and DeepSeek have released progressively more powerful consumer-facing products built on generative-AI models, which are trained on massive data sets

In the same year, I began a long-term creative project: a sequel trilogy of illustrated novels based on the classic American fairytale, The Wonderful Wizard of Oz (1900)

The main character, Dorothy Gale, and the land of Oz had a profound impact on my childhood; more recently, the material recaptured my imagination and I became absorbed in research, writing, collecting physical literature and hand-illustrating components of my vision

I recognized the potential to visualize elements of this vision in high-fidelity using emerging technology, and so my experiments began…

Cover art for a forthcoming series of adult-fiction novels based on The Wonderful Wizard of Oz (1900) by L. Frank Baum; the covers blend generative-AI with traditional graphic design

  1. The Gale of Kansas
    • An elderly Dorothy Gale returns to Oz by way of a mysterious machine; once in Oz, she must contend with a cold war taking place in parallel with that which takes place on Earth
  2. The Fortress In The Sand
    • The mad professor lures the winged monkeys into her yellow stone fortress at the edge of the deadly desert; there, she plays psychological games of cat and mouse with the monkeys in order to unlock their hidden potential and reveal their true purpose in her larger plan to conquer the desert
  3. The State Of Magic
    • In the distant future of Oz, Dorothy the Magineer has perfected the method for slingshotting probes between Oz and Earth; through her hubristic plan to connect the worlds across dimensions, she risks the very existence of both worlds along with a third: the mysterious dome world known as Oobliad

Prompt Development

1

Structure & Syntax

2

Vocabulary of Visual Culture

3

Prompting with Images

4

Producing Variation

Prompt Development

1

Structure & Syntax

Midjourney and other generative tools require the use of specific prompt structures and syntax in order to achieve the best results for a given model

My generations improved significantly once I began to employ each tool's built-in structures and parameters

However, the use of designated structure and syntax was never a guarantee that a given prompt would succeed on the first pass

Example cheat sheet detailing syntax and structure best-practices in Midjourney (Credit: Tristan Wolff)

Each generative product has a unique set of best-practices for syntax and structure
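
As a rough illustration of what structure and syntax mean in practice, the Python sketch below assembles a Midjourney-style prompt from a subject, a list of style references and a set of trailing parameters. The parameter names `--ar`, `--stylize` and `--chaos` are taken from Midjourney's documentation; the helper `build_prompt`, the example subject and the values are assumptions for illustration, not the prompts used in this project.

```python
def build_prompt(subject, styles=(), params=None):
    """Join a subject, style references and trailing parameters into one prompt string."""
    parts = [subject.strip()]
    # Skip empty style entries so no stray commas end up in the prompt.
    parts.extend(s.strip() for s in styles if s.strip())
    text = ", ".join(parts)
    # Parameters are placed at the very end, after all descriptive text.
    flags = " ".join(f"--{name} {value}" for name, value in (params or {}).items())
    return f"{text} {flags}".strip()


# Hypothetical example values, chosen only to show the shape of the output.
prompt = build_prompt(
    "a green tornado over a Kansas wheat field",
    styles=["autochrome photograph", "pastoral oil painting"],
    params={"ar": "16:9", "stylize": 250, "chaos": 20},
)
print(prompt)
```

Keeping the descriptive text and the parameters separate makes it easier to change one without accidentally breaking the other between passes.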

Prompt Development

2

Vocabulary of Visual Culture

I drew on my knowledge of art history and visual culture to make precise references to artistic periods, styles and mediums, artists, filmmakers and specific works

Vocabulary of Visual Culture

Points of Reference

Forms

  • Jellyfish; shape and quality of movement 
  • Clouds; textural quality and environmental context

Style 

  • 1700s pastoral oil paintings by Russian and English masters
  • Early colour photographs known as autochromes 

Vocabulary of Visual Culture

Generations

The process of layering elements by naming them throughout the body of the prompt produced a series of phantasmagoria from an Art Basel of the future

In addition to producing the desired “green tornado”, I was able to generate an abundance of variations that interpreted the root prompt in unique ways

Prompt Development

3

Prompting with Images

Midjourney and other generative-AI tools allow the use of images as references within the body of the prompt

The author can steer the prompt to fuse, combine, juxtapose or otherwise blend elements from multiple images in combination with text
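
A minimal sketch of how such a prompt might be laid out, assuming Midjourney's documented convention that image URLs come first, followed by the text, with an optional `--iw` (image weight) parameter at the end. The helper `build_image_prompt`, the `example.com` URLs and the wording are hypothetical placeholders, not the references used in this project.

```python
def build_image_prompt(image_urls, text, image_weight=None):
    """Place image references first, then the text prompt, then an optional image weight."""
    if not image_urls:
        raise ValueError("at least one image reference is required")
    parts = list(image_urls) + [text.strip()]
    if image_weight is not None:
        # A higher weight asks the model to lean more on the images than on the text.
        parts.append(f"--iw {image_weight}")
    return " ".join(parts)


# Placeholder URLs standing in for two uploaded reference photos.
prompt = build_image_prompt(
    ["https://example.com/portrait.jpg", "https://example.com/green-man.jpg"],
    "elderly woman with foliage for skin, studio portrait",
    image_weight=1.5,
)
print(prompt)
```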

Prompting with Images

Points of Reference

Judy Garland, Age 45, Credit: Bettmann

The Green Man/Woman; a figure derived from English folklore and often represented in garden ornaments

Prompting with Images

Generations

The resulting combinations fused elements of content as well as colour information from each photo reference

Prompt Development

4

Producing Variation

It’s possible to rapidly generate an enormous amount of variety with generative image-making tools

By deriving variations, the user creates a rootlike or branchlike map of outputs as they steer the model towards a desired outcome

It was most often through a process of making small tweaks to the base prompt and executing variations that I arrived at the most impactful images

An example of the abundance possible with variations; the final image was produced over a series of working sessions, where the prompt was adjusted or completely re-written multiple times
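
The branchlike map described above can be pictured as a simple tree, where each generation keeps a list of the variations derived from it. The sketch below is one hypothetical way to record such a session in Python; the `Generation` class, its field names and the example prompts are assumptions for illustration and are not part of any tool.

```python
from dataclasses import dataclass, field


@dataclass
class Generation:
    prompt: str
    note: str = ""
    children: list = field(default_factory=list)

    def vary(self, prompt=None, note=""):
        """Derive a child generation, reusing the parent prompt unless a tweak is given."""
        child = Generation(prompt or self.prompt, note)
        self.children.append(child)
        return child

    def leaves(self):
        """Return the generations that have not been varied further."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def count(self):
        """Count this generation plus everything derived from it."""
        return 1 + sum(child.count() for child in self.children)


root = Generation("winged monkey, baroque oil painting")
root.vary(note="plain variation")
tweak = root.vary("winged monkey with pigeon wings, baroque oil painting", note="small tweak")
tweak.vary(note="variation of the tweak")
tweak.vary(note="second variation of the tweak")

print(root.count())        # 5 generations in total
print(len(root.leaves()))  # 3 end points to choose from
```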

Challenges

1

Non-Human Skin Tones

2

Uncommon Creatures of Fantasy

3

Natural Ageing

4

Imaginary Phenomena

5

Unwanted Body Parts & Unsettling Expressions

Challenges

1

Non-Human Skin Tones

I was unable to produce non-human skin tone by text alone; it required the use of photo references in the body of the prompt

Non-Human Skin Tones

Photo References

Non-Human Skin Tones

Early Generations

Variations in the early stages of generating the desired combination between skin tone and other formal qualities of appearance (gender, age, character reference) were more subtle

Images that adhered most closely to the desired outcome were selected for follow-up iteration/variation

Challenges

2

Uncommon Creatures of Fantasy

Multiple sessions were required to achieve the synthesis of elements I was looking for in the character of the winged monkeys; L. Frank Baum’s winged monkeys are not common creatures of fantasy compared with, say, the unicorn, representations of which likely appeared in abundance in the model’s training data

As such, my role as the prompt engineer was to creatively rephrase the prompt and select variations for nearest accuracy until I arrived at the desired combination of style and other formal elements before pivoting to production of abundance

Initial generations humorously misinterpreted the cultural reference to “flying monkeys” 

Follow-up generations made use of an image reference from the 1939 MGM film, but also drew on unwanted references to more current styles of illustration

A third pass attempted to make text reference to visual culture and style, but suffered from AI sheen: an image quality of generative-AI that produces an unwanted filter reminiscent of modern digital illustration

In the next phase I pivoted to a different style, attempting to make reference to 17th-century Baroque oil painting

I attempted to use text prompting to fuse the winged monkeys with additional qualities of other species such as pigeons and rats with varying success

In the final phase, I achieved the desired formal and stylistic qualities and began to generate an abundance of variations, experimenting with shifting the locale/background

Challenges

3

Natural Ageing

The model had a tendency to output generations that appeared too artificially attractive and youthful, despite the intentional use of age descriptors in the body of the prompt

Initial generations had a tendency to appear overly youthful despite the deliberate inclusion of descriptive language regarding age

A set of variations eventually took on a naturally aged appearance, and so further variations were prompted from that set

A final image depicts the character of Glinda the Good with a more natural appearance of ageing, upscaled from a desired generation

Challenges

4

Imaginary Phenomena

There is a limit to how many parcels of information a model can process; prompt engineers are tasked with creatively reducing the volume of text in the prompt necessary for the model to successfully generate what the user envisions

This can become challenging when the engineer is attempting to describe a phenomenon that does not already exist as a referenceable image, or one which is difficult to describe; the following generations represent that scenario

Animation: The dome world known as Oobliad spins in the dark plasma medium of its unique universe

Animation: Dorothy’s probe arrives deep within the Earth’s core in the distant future

Image: An elderly Dorothy Gale is swept into a wormhole containing green plasma

Challenges

5

Unwanted Body Parts & Unsettling Expressions

Although this phenomenon is being actively addressed in more recent product releases, the sense that human representations in generative-AI appear somehow alien or inhuman has been a common critique of generative images and video

Earlier models were highly prone to adding extra extremities, such as fingers and limbs, to human and humanlike representations

Ozma, the Reclusive Queen of Oz

Additional limbs are just one example of the uncanny valley apparent in generative images; closer inspection of the face, for example, makes it clearer that something is off, and I notice it most in the eyes

Animation

Animation

1

Dorothy's Mysterious Machine

Animation

2

Ancestors of Oz

Animation

3

Ozma's Robotic Prototype

Animation

4

The Winged Monkey in the Yellow Fortress

Images

Images

1

The Cowardly Lion, in the style of Matthew Barney

Matthew Barney is a contemporary cinema artist who uses high-tech film cameras and cinema-grade prosthetics to build the surreal and provocative worlds of his expansive film series, The Cremaster Cycle (1-5); the production of the complete work spans multiple decades

Using the syntax “in the style of” generated representations of the humanoid Cowardly Lion from MGM’s 1939 Wizard of Oz musical with visual stylings reminiscent of Matthew Barney’s cinematography-driven video art

The Cowardly Lion -ITSO- Matthew Barney

Points of Reference

The Cowardly Lion -ITSO- Matthew Barney

Generations

Midjourney did an exceptional job of capturing the essential visual flavour of both MGM’s 1939 iteration of the Cowardly Lion and Matthew Barney’s work overall

Images

2

Countryfolk & Other Hominids of Oz

These images are another example of mass variation that was executed once a desired style was achieved; the initial successful variation resulted in an abundance of unique characterizations in the same distinct style and colour scheme, but with a large variety of nuanced physical characteristics

Images

3

The Distant Future of Oz

Part of my vision of a future Oz is informed by 20th century science fiction literature spanning from the 1960s to the 1990s; key inspirations include Alice Bradley Sheldon, who wrote under the pseudonym James Tiptree Jr., and Larry Niven, author of the mind-bending sci-fi classic, Ringworld (1970)

In addition to literary inspiration, my visual explorations also heavily reference 20th and 21st century science fiction film and animation   

The Distant Future of Oz

Points of Reference

A Barnstormer In Oz (1983) by Philip José Farmer, Warm Worlds and Otherwise (1975) by James Tiptree Jr. aka Alice Bradley Sheldon, Ringworld (1970) by Larry Niven 

The Distant Future of Oz

Generations

Professor Gale, the multinautic engineer

Professor Gale’s Earthbound Kaiju

Inter-dimensional schematics

Gallicrow witnesses collapsing dimensions

Model Strengths

1

Textural Qualities

The models excelled at producing textural qualities of different materials, even when forms were not exact or precise 

This image showcases Midjourney’s power when generating textural quality, despite its difficulty representing the human form with accuracy and precision; notice the “third leg syndrome”

2

Particle Systems

Particularly evident in generative animation, the models excelled at representing particle systems in motion, even when that particle system was a surreal or nonexistent phenomenon

“A Cacophonous Apparition Appears Above Oz” — Image and Animation

Examples of a successful particle system; in this case, a cloud-like aether with a quality of movement superimposed

3

Stylistic References

Midjourney and Pikabot both excelled at representing artistic mediums, stylistic periods, creators and films when precise references were included in the vocabulary of the prompt

Top/Left: Actress Julie Andrews runs to the top of a grassy hill with the Alps in the background in The Sound of Music (1965)

Bottom/Right: The prompt for this image of an adult Dorothy Gale running through a field of poppies made clear stylistic reference to The Sound of Music (1965)

Critical Reflection

Integrated Media

Future practical applications for generative image and animation will likely include producing elements of scenery that would normally be created with modern-day CGI

Rapid Advances in Generative Tech

Google’s Veo 3 has opened new creative doorways; the tool allows integrated transitions between vignettes, making it possible to produce complex animated sequences with consistent characterization and linked scene progression

The above short film was made entirely using Google’s Flow tool. The tool integrates a creative user interface streamlined for filmmakers with the powerful Veo 3 animation model

Deepfakes & Propaganda

The emergence of widely available generative tech has paved the way for deepfakes and other political propaganda to proliferate en masse across the web, a trend currently most evident on social media platforms

Unless we come up with effective methods of tracking and filtering capable of accurately and consistently identifying AI-generated content—methods that are independent of the self-reporting we are currently relying upon—we are at serious risk of entering a dark period of information where we cannot effectively vet the trustworthiness of what we see, read and hear. Some might argue that this dark age of information is already here, and is actively impacting the well-being of people and communities

Many were fooled by this AI-generated animation of bunnies jumping on a trampoline at night

The more data a model has to train on, the more accurate its generations may become

Musician Oliver Richman comments on the AI trampoline bunnies in this musical sketch he produced as part of a song-a-day challenge for musicians on social media

Environmental Impact of Generative-AI

The environmental impact of widely available generative tech is already apparent; the current Trump administration and its lax environmental stewardship is allowing Big Tech to forego consideration of the environmental impact in pursuit of profit; this is often at the expense of the communities where massive data centres are being newly constructed

This infographic from the Capgemini Research Institute describes some of the environmental impact of generative-AI

Economic Impact of Generative-AI

Economic research suggests that AI and automation will replace a significant number of jobs over the coming years; additionally, tech companies including Meta, Google and OpenAI developed their generative models by non-consensually scraping vast amounts of artwork, photography, audio, video and writing from the web without compensating a single creator for the use of their work

What is arguably impacting jobs most significantly at the present moment is the trend of large tech companies outsourcing roles to locations with far cheaper labour available; this is creating significant challenges for those in the early stages of their careers, and for those who find themselves at the mercy of the mass layoffs taking place as a result of the trend

This article from the World Economic Forum discusses perceptions of job displacement and the value of labour in the AI-driven economy

This report from the World Economic Forum’s Future of Jobs 2024 survey goes in-depth to narrate the projected economic and career landscape in the coming years

Project Conclusion

I learned to apply syntax and structure depending on the model being used to envision creative ideas

I practiced trial-and-error in the use of prose-based, cultural vocabulary-based and action-based language in my prompts

I developed a system of production:

  1. Achieve successful generation/combination through the use of structure, syntax, creative language and image references
  2. Produce variations of the most successful generations or combinations
  3. (For Images) Make precise corrections by re-prompting lassoed sections of the image
  4. (For Animations) Re-prompt many times using slightly varied sentence structures and syntax-based attributes (such as camera zoom and panning, degree/quantity of motion)
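
The re-prompting in step 4 can be sketched as a small combinatorial exercise: hold the scene description fixed and pair it with every combination of camera move and motion level. The Python below is a tool-agnostic sketch; the function `animation_prompt_variants` and the `camera:` / `motion:` phrasing are hypothetical placeholders and do not reflect any animation tool's real syntax.

```python
from itertools import product


def animation_prompt_variants(scene, camera_moves, motion_levels):
    """Pair one scene description with every camera and motion combination."""
    return [
        f"{scene}, camera: {camera}, motion: {motion}"
        for camera, motion in product(camera_moves, motion_levels)
    ]


# Hypothetical attribute values; swap in whatever the chosen tool accepts.
variants = animation_prompt_variants(
    "the dome world spins in dark plasma",
    camera_moves=["slow zoom in", "pan left"],
    motion_levels=["subtle", "moderate", "strong"],
)
for variant in variants:
    print(variant)
```

Two camera moves and three motion levels yield six prompts, which is usually enough to see which attribute is steering the result.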

I learned more deeply about generative AI:

  • how models are trained
  • how models function
  • strengths and limitations of specific models
  • implications and impact of generative tech


© Copyright. Matthew Crans. 2025.