World-Building in Oz with AI

Leveraging generative AI to visualize a cinematic universe inspired by The Wonderful Wizard of Oz (1900)

Tools & Platforms

Discord

Midjourney

Pika.art

Table of Contents

1. Introduction
2. Early Generations
3. How Do I Engineer Prompts Effectively?
4. Diving Deeper Into Challenges
5. Image Gallery
6. Animating Images Using Pikabot
7. Animation Gallery
8. Reflecting on the Impacts of Generative-AI
9. Writing & Illustrating by Hand
10. Final Thoughts
Introduction

Generative-AI & The Internal Sense of Futureshock

When powerful consumer AI hit global markets in 2022, I felt overwhelmed; the emergence of these tools carried some obvious and heavy implications about the future.

I had just begun work on a personal writing project: a sequel trilogy to L. Frank Baum’s The Wonderful Wizard of Oz, narrated from a science fiction lens for a mature reading audience—

Like Gregory Maguire’s Wicked, but in 💫outer space💫.

The first image I produced in relation to the writing project: a cover art concept for the first novel in the trilogy, created in Adobe Illustrator.

A Possible Antidote to Futureshock

As I dove into the writing process, I eventually made an important connection:

💭Generative-AI could potentially help me realize this vision from a truly cinematic perspective.

What if this project is actually the perfect sandbox for overcoming feelings of futureshock about generative-AI?

Project Outcomes
  • Through a process of experimentation, I developed a system for engineering image and animation prompts
  • I created an expansive library of images and cinematic sequences, and I tested the creative limits of generative models
  • I developed a holistic understanding of generative-AI, considering:
    • how models work,
    • how models are trained,
    • what their strengths and limitations are

Early Generations

Peering into An Uncanny Valley of Shadows

First Attempts

I engineered my first image prompts without referring to any Midjourney documentation.

Some of the initial images were nice to look at, but they ignored the majority of the prompt.

PROMPT:

Judy Garland as Dorothy in 1950 at the age of 60. She is standing over a reactor that is glowing green. She is inside of a dark barn, only illuminated by the green light from the reactor. She has long brown curly hair styled for the 1950’s. She has an expression of wonder on her face. She is looking down at the reactor. She is wearing a lab coat from 1930 and thick gloves for handling nuclear material. Everything in the scene is in greyscale, with intense shadows cast in the barn behind Dorothy. The green light from the reactor is the only colour in the scene. The reactor looks like a steampunk artifact. It has a nautical look to it.
 
In the original layout, red text marked the portions of the prompt that did not compute.
Initial Problems I Encountered
  • Difficulty with correctly assigning the defining features across multiple character references; the model would produce an impression of the known characters, but would splice their features together oddly or misinterpret directions about specific characteristics
  • Uncanny Valley
    • Odd and unsettling facial expressions
    • Additional limbs
    • Inaccurate proportions

Some of these early images were either tremendously goofy, truly unsettling, or a combination of both.

Strengths I noticed
  • The capability to mimic styles
  • Referencing fewer characters per prompt often led to better accuracy of representation

How Do I Engineer Prompts Effectively?

Developing a Systematic Approach

Elements of Effective Prompt Engineering

1. Structure & Syntax

Midjourney and similar tools require the use of specific prompt structures and the use of established syntax in order to achieve the best results.

My generations improved significantly once I began to employ these built-in parameters; however, using the designated structure and syntax did not guarantee that a given prompt would succeed on the first pass.

Example cheat sheet detailing Midjourney’s syntax and structure best practices (Credit: Tristan Wolff)
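As a rough illustration of that ordering principle, the structure can be sketched in a few lines of Python. Note that `--ar` (aspect ratio) and `--chaos` are real Midjourney parameters, but the helper function and the example prompt are purely illustrative, not part of any official tooling.

```python
def build_prompt(subject, style_refs=(), parameters=None):
    """Compose a Midjourney-style prompt string: the subject comes first,
    style cues follow, and --parameter flags are appended at the end."""
    parts = [subject]
    parts.extend(style_refs)
    body = ", ".join(parts)
    flags = " ".join(f"--{k} {v}" for k, v in (parameters or {}).items())
    return f"{body} {flags}".strip()

prompt = build_prompt(
    "elderly Dorothy in a dark barn lit by a glowing green reactor",
    style_refs=("greyscale film still", "steampunk artifact"),
    parameters={"ar": "16:9", "chaos": 20},
)
print(prompt)
```

Keeping the most important subject matter at the front mirrors the cheat sheet’s advice: the earlier a descriptor appears, the more reliably the model honours it.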

2. Vocabulary of Visual Culture

I made use of my knowledge of art history and visual culture to make precise references to historical periods, artistic styles and mediums, artists and filmmakers, and the titles of their work.

Points of Reference

Forms

  • Jellyfish: shapes, quality of movement 
  • Clouds: shapes, textures, environmental context

Style 

  • 1700s pastoral oil painting by the European masters
  • Early colour photographs known as autochromes

Resulting Generations

The process of layering elements by naming them throughout the body of the prompt produced a series of phantasmagoria from some Art Basel of the future

 

In addition to producing the desired “green tornado”, I was able to generate an abundance of variations that interpreted the root prompt in unique ways

3. Prompting With Images

Midjourney and similar tools enable the user to deploy images as references within the body of the prompt

The author can steer the prompt to fuse, combine, juxtapose or otherwise blend elements from multiple images in combination with text prompting.
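In practice this means reference images lead the prompt and descriptive text follows. A minimal sketch of that composition, assuming placeholder URLs (the `--iw` image-weight parameter is real Midjourney syntax, but the helper itself is illustrative):

```python
def image_prompt(image_urls, text, image_weight=None):
    """Midjourney-style image prompting: reference image URLs lead the
    prompt, descriptive text follows, and --iw scales how strongly the
    reference images influence the result."""
    prompt = " ".join(list(image_urls) + [text])
    if image_weight is not None:
        prompt += f" --iw {image_weight}"
    return prompt

# Hypothetical references standing in for the Judy Garland and
# Green Man/Woman images described below.
print(image_prompt(
    ["https://example.com/judy-garland.jpg", "https://example.com/green-man.jpg"],
    "a wise green-skinned sorceress, autochrome photograph",
    image_weight=1.5,
))
```

Raising the image weight pushes the output toward the photographic references; lowering it lets the text steer more of the result.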

Points of Reference

Left: Judy Garland, Age 45, Credit: Bettman Images

Right: The Green Man/Woman; a figure derived from English folklore and often represented in garden ornaments

Resulting Generations

The resulting combinations fused conceptual elements of content, as well as colour information from each photographic reference.

4. Generating Variation

With generative tools, it’s possible to rapidly create an enormous amount and variety of content riffing on a common theme or style of image.

When deriving such variation, the user produces a branch-like map of outputs as they participate in steering the model towards a desired outcome.

It was most often through a process of small tweaks to the root prompt and executing variation that I arrived at the most impactful images.

An example of the abundance possible when generating variations.

The final image was produced over a series of working sessions, where the prompt was adjusted or completely re-written multiple times before I arrived at the image that appealed the most to my creative sensibilities and vision for the character.
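The branch-like map of outputs described above can be modelled as a simple tree, where each node is a prompt and each child is a variation derived from it. This is purely an illustrative data structure for tracking prompt lineage, not something the tools provide:

```python
from dataclasses import dataclass, field

@dataclass
class PromptNode:
    """One node in the branch-like map of outputs: a prompt and the
    variations derived from it."""
    prompt: str
    children: list = field(default_factory=list)

    def branch(self, tweak):
        """Derive a variation by appending a small tweak to this prompt."""
        child = PromptNode(f"{self.prompt}, {tweak}")
        self.children.append(child)
        return child

# Small tweaks accumulate branch by branch until an image appeals.
root = PromptNode("green tornado over a wheat field, 1700s pastoral oil painting")
a = root.branch("jellyfish-like form")
b = a.branch("lit by moonlight")
print(b.prompt)
```

Keeping this lineage explicit makes it easy to back up to an earlier branch when a chain of tweaks goes astray.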

Diving Deeper into Challenges

Challenges & Process

1. Non-human Skin Tone

I struggled to produce non-human skin tone by text alone; I eventually achieved the results I was looking for by loading image references into the body of the prompt.

2. Uncommon Subjects

The winged monkeys are not a common creature of fantasy as compared to, for example, a majestic unicorn 🦄

Representations of unicorns likely appeared in abundance during the model training process, unlike L. Frank Baum’s winged creations, meaning that the model can easily reproduce that concept.

To successfully generate the winged monkeys, I had to creatively steer the prompt and select variations for nearest accuracy until I arrived at the truest desired combination of elements.

Winged Monkeys Process

Initial generations humorously misinterpreted the cultural reference to “flying monkeys”.

Follow-up generations made use of an image reference from the 1939 MGM film, but also drew on unwanted sources of inspiration thematically and stylistically.

A third pass attempted to reference visual culture and style via text, but the images suffered from what I call AI sheen: a quality I identified in generative images that seems to recreate the texture of modern digital painting.

In the next phase I pivoted to a different style, attempting to make reference to 17th century baroque oil painting.

I was also attempting to imbue the winged monkeys with qualities of other species of animal such as the colouring of pigeons and the tails of rats with varying degrees of success.

In the final phase, I achieved the desired formal and stylistic qualities and began to generate an abundance of variations, experimenting with shifting the background locale.

3. Natural Appearance of Ageing

The model had a tendency to output generations that appeared too artificially youthful and conventionally attractive, despite the intentional use of age descriptors in the body of the prompt.

Initial generations had a tendency to appear overly youthful despite the deliberate inclusion of descriptive language regarding age.

A set of variations eventually took on a naturally aged appearance; further variations were prompted from that set.

A final image depicts the character of Glinda the Good with a more naturally aged appearance.

4. Uncanny Valley

Uncanny valley is a phenomenon that can be described as: a hypothesized psychological and aesthetic relation between an object’s degree of resemblance to a human being and the emotional response to the object [source].

Witnessing the uncanny valley can produce a physical sense of eeriness, a feeling of being unsettled by the representation.

Uncanny-valley qualities in human representations are a common critique of generative images.

The presence of uncanny valley in these images ranges from:

  • subtle (left/top image: a somewhat hidden third leg), to
  • obvious yet isolated (middle image: a third arm/hand), to
  • obvious throughout (right/bottom image: ghastly, distorted faces, oddly twisting limbs, a general clay-like appearance)

5. Imaginary Physics/Phenomena

There is a limit to how much information a model can process from a single prompt.

Prompt engineers are tasked with reducing the volume of text in the prompt while still providing the information the model needs to generate what the user envisions.

This can become challenging when attempting to describe a phenomenon that does not exist as an easily reproducible image, and which may be difficult to describe in words. 
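One way to think about this trade-off is to rank descriptors by importance and keep only the top few. The sketch below is a heuristic illustration of that triage, assuming hypothetical priority scores; the models themselves do not expose anything like this:

```python
def trim_prompt(descriptors, max_terms):
    """Keep only the highest-priority descriptors.

    descriptors: list of (priority, text) pairs, lower number = more important.
    Returns a comma-joined prompt fragment of at most max_terms descriptors.
    """
    kept = sorted(descriptors)[:max_terms]
    return ", ".join(text for _, text in kept)

# An imaginary phenomenon described by ranked fragments: when the prompt
# budget shrinks, the least essential details are the first to go.
descriptors = [
    (1, "green tornado"),
    (2, "jellyfish-like motion"),
    (3, "distant farmhouse"),
]
print(trim_prompt(descriptors, 2))
```

The hard part, of course, is deciding which fragments are essential to an imaginary phenomenon that has no reference image to fall back on.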

Image Gallery

Existing Characters & Settings

1. The Tin Woodman
Points of Reference: 1939 MGM Film
Midjourney Generations

Style Prompts: baroque oil painting, black-and-white film directly referencing the MGM Oz film, steampunk

2. The Emerald City
Points of Reference: 1939 MGM Film, Original Book Illustrations
Midjourney Generations

Style Prompts: art nouveau, neoclassical architecture, the Chicago World’s Fair “White City”, retrofuturism

3. Ozma, The Fairy Queen of Oz
Points of Reference: 1985 Disney Film, Original Book Illustrations
Midjourney Generations

Style Prompts: The Matrix (1999 film), 1980s dark fantasy genre, 1980s airbrush illustration, 1950s National Geographic photography, art nouveau meets photorealism, Tibetan and Indonesian cultural dress

4. The Cowardly Lion

For this series of generations, I focused on referencing the work of video artist Matthew Barney, specifically his visionary film series, The Cremaster Cycle.

Points of Reference: 1939 MGM Oz Film, Matthew Barney's Cremaster Cycle
Midjourney Generations

New Characters & Settings

1. The Yellow Brick Fortress

In the second novel, The Fortress In The Sand, Dorothy will construct a Yellow Brick Fortress at the edge of the Deadly Desert, an important locale in the Oz lore.

Midjourney Generations

2. Blockette: Dorothy's Robotic Hivemind

In the first novel, I explore Queen Ozma’s use of humanoid robots to keep the Emerald City secure from the Nome King, the story’s main antagonist.

In the second novel, Dorothy will continue this trend by creating a modular robotic hivemind known as Blockette, which will be responsible for keeping the Yellow Brick Fortress secure.

Midjourney Generations
3. Galecrow: Fusing Characters

The mysterious character known as Galecrow is a fusion of three characters:

  1. Dorothy
  2. The Scarecrow
  3. The Hell-Hens from the dome world, Oobliad

Midjourney Generations

Animating Images Using Pikabot

Similar Principles with Key Differences

Comparing Processes

Similarities
  • Prompt with text alone, images or both text and images
Differences
  • Midjourney: describe the subjects and scenery
  • Pikabot: describe motion using action-oriented language and verbs
  • Differences in syntax, but similarities in structure: the deeper into the descriptive portion of the prompt you go, the less accuracy you’ll achieve
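The contrast between the two prompting styles can be sketched side by side. Both helpers below are illustrative only, a rough paraphrase of the difference rather than either tool’s actual syntax:

```python
def scene_prompt(subject, scenery, style):
    """Midjourney-style prompt: describe the subjects and the scenery."""
    return f"{subject}, {scenery}, {style}"

def motion_prompt(subject, actions, camera=None):
    """Pikabot-style prompt: describe motion with action-oriented verbs,
    optionally adding a camera move."""
    prompt = f"{subject} " + ", ".join(actions)
    if camera:
        prompt += f", camera {camera}"
    return prompt

# The same shot, phrased for each tool.
scene = scene_prompt("a winged monkey", "inside a yellow brick fortress",
                     "baroque oil painting")
motion = motion_prompt("the winged monkey",
                       ["flaps its wings slowly", "glides toward the camera"],
                       camera="slow zoom in")
print(scene)
print(motion)
```

The structural rule noted above applies to both: verbs and descriptors placed early in the prompt are honoured far more reliably than those buried at the end.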

Observations

Challenges of Animation

Animation involved much more randomness; I needed to rework prompts and re-roll generations many times to achieve results I was happy with, given the abilities of the freeware tool I was using.

Strengths I Noticed
  • Convincing particle systems: water, smoke
  • Room for interpretation = better results sooner

Animation Gallery

1. The Mysterious Machine In The Subterranean Grotto

In the first novel of my trilogy, Dorothy Gale will spend a large portion of her adult life piecing together an esoteric machine, created from parts collected during her global travels in search of the Wizard’s genealogical ancestors.

2. The Monkey In The Fortress

In the second novel, Dorothy will lure the winged monkeys into her Yellow Brick Fortress in order to fashion them into emissaries of the Deadly Desert.  

3. Ozma's Robotic Prototype

In the first novel, Queen Ozma will make use of an army of robot guards to keep the Emerald City and her throne safe from the Nome King and his forces. 

4. Dorothy Melts Into The Field of Poppies

Throughout her life on Earth, Dorothy experiences vivid dream sequences where the laws of physics break down, such as in this sequence taking place in the poppy fields of Flanders, Belgium.

5. The Whispering Ancestors of Oz

In the third novel, Dorothy will be visited by a series of messengers, including the Ancestors of Oz, a mysterious liquid marble frieze with muttering faces, warping into and out of existence.

6. Oobliad: The Inescapable Dome World

In the final novel, readers will be introduced to a mysterious dome world known as Oobliad, a world doomed to annihilation in the foreseeable future, as are all the other dome worlds adrift in this proto-universe that is somehow adjacent to Oz’s and Earth’s universal planes.

Reflecting on the Impacts of Generative-AI

Rapid Advancements In Generative Technology

Google’s Veo 3 model broke new ground with the release of the Flow tool, which enables creators to generate cinematic sequences and stitch them together seamlessly through an interface designed with filmmakers at the forefront.

This short film was made entirely using Google’s Flow tool, which makes use of the Veo 3 generative model.

Mass Spread of AI-Generated Content
Novelty, Deepfakes & Political Propaganda

Novelty animal videos are not a new phenomenon unique to the age of social media. 

Cute animals doing silly things was a hallmark component of America’s Funniest Home Videos, a syndicated TV program that debuted in 1989, well over thirty years ago. This type of media is an obvious target for AI content engineers to mimic and attempt to spread virally across the web.

This AI-generated video of bunnies jumping on a trampoline at night, “in the style of” security cam footage, was convincing to the untrained eye.  

Given the ease of production and replicability already possible with the current generation of models, the sheer volume of AI-generated content flooding the web has caused me to wonder: could AI-generated content eventually overtake content made by more traditional means of production?

This BBC report details the rise of AI-generated social media influencers currently convincing millions, and even generating real income for the engineers responsible for these fake personas flooding social media.

Possibly even more troubling than fake AI influencers selling wellness products is the fact that real people can have their likeness co-opted and used, for example, in smear campaigns as a form of political propaganda.

This CBC report details the viral spread of a convincing deepfake of Canada’s former Prime Minister, Justin Trudeau.

An important question:

Must we now question every single thing we see online by default? Is this simply the new normal?

The Next Maker Revolution?

The emergence of generative-AI as a new creative tool has been highly divisive, with relative acceptance and adoption by some, and complete pushback and deliberate non-use by others.

Oliver Richman is an indie musician participating in an ongoing “song-a-day” challenge

In the case of UX and product design and research, advocates of AI tools capable of replicating user interfaces, such as Base44 and Lovable, suggest that generative-AI is “democratizing design”.

I might venture to challenge this notion if the real result is a devaluation of the discipline of product and UX design.

Democratization feels like a contemporary buzzword, one that distracts from the very real disenfranchisement of individuals as these tools erode available opportunities for economic mobility.

Opinion: capability alone does not equal

  1. vision
  2. taste level
  3. experience

If adoption is inevitable, I believe creative thinkers still have the edge in this process of “democratization”, but we will have to be brave and visionary throughout this uncomfortable process, serving as innovative leaders at the forefront of the ethical use of AI.

Environmental Impact

The sudden emergence and rapid proliferation of generative tools has placed a measurable strain on natural resources.

The data centres, where a sudden and massive influx of AI-generated content is now being housed, use large quantities of fresh water to cool their hardware.

Additionally, the communities where these data centres are being built are being directly impacted.

This infographic from the Capgemini Research Institute describes some of the environmental impacts of generative-AI.

Economic Impact

Economic research suggests that AI and automation will replace a significant number of jobs over the coming years.

This article from the World Economic Forum discusses perceptions of job displacement and the value of labour in the AI-driven economy.

Additionally, big tech firms including Meta, Google and OpenAI trained their models on content non-consensually scraped from across the web; vast quantities of artwork, photography, audio, video and written content were used without a single creator being compensated for the use of their IP in the model-training process.

Writing & Illustrating by Hand

Where I started; where I am still headed

Research & Collecting

What Happened In Oz After Dorothy Returned to Kansas?

L. Frank Baum wrote thirteen sequel novels, along with several short stories and spin-offs, between the original novel’s publication in May 1900 and his passing in 1919.

All of this IP exists in the public domain; it is a rich library of characters, objects and settings from which I am able to derive an evolution of the series from a new perspective and with the ability to eventually publish the derivative work. 

Collecting Physical Literature

I began hunting down physical copies of these sequel novels, as well as other media directly related to or inspired by the Oz lore.

My quest to locate copies of this literature lives in parallel to my plot device of Dorothy scouring the globe for pieces of the Wizard’s machine.

Like Captain Ahab in Herman Melville’s classic novel, Moby-Dick, I eventually tracked down an early printing of The Wonderful Wizard of Oz in a local used bookstore. I left a glowing 5-star review.

Learning How To Write Fiction

Inspiration/Points of Reference

My project is inspired by 20th century science fiction literature, written by authors including Alice Bradley Sheldon (aka James Tiptree Jr.) and Larry Niven. 

The Mechanics of Fiction

An inspired idea is nothing without strong technical execution.

To improve my craft as a writer and develop an understanding of the mechanics of writing fiction, I am listening to successful authors I admire discuss their work and process.

I am also consulting guides for specific information about fiction best-practices, literary structures and writing processes.

Illustrating Oz

A Meditative Counterpoint to the Disorienting Velocity of Generative-AI

Final Thoughts

Art is Political; Creating It is Radical

Topics/Themes I am Exploring In My Writing

Disruptive Technology
  • Wheat and cattle farming in Kansas from 1910 to 1955
  • The development of electricity and municipal electrical grids
  • The transition from cattle and steam-powered transportation to the consumer automobile and combustion engines
  • Wartime conditions leading to rapid technological advancement
Geopolitics
  • Nuclear armament and the development of the Cold War
  • The political commentary of George Orwell
  • Propaganda and political espionage

My Personal Maker Revolution

Why am I writing & Illustrating?

When the idea for this project first occurred to me, I was going through a creative drought. I was creating for others, but not for myself. I felt like I was losing sight of why I create in the first place.

The idea for the writing project occurred to me when I came across a copy of The Wonderful Wizard of Oz in a Little Free Library. I have been fascinated by the story since childhood. This chance encounter with a copy of the novel is what reignited my passion to create. 

Much like the strategy of facing down my futureshock by engaging with generative-AI, engaging with these topics and themes is allowing me the opportunity to confront and process internalized fears about global affairs through a creative rather than consumptive lens.


© Copyright. Matthew Crans. 2025.