Fluxus Ex Machina (2019)

Fluxus Ex Machina is a project in which a natural language model, GPT-2 — then the most advanced text-generating machine learning model in the world — was fine-tuned on an archive of short performance art pieces, or “scores,” by the artist group Fluxus. The machine analyzed these texts and then produced its own. One hundred of these outputs were collected for the “Fluxus Ex Machina” project.

The essay below reflects on what NLP is and how the scores were made. You can access the scores using the buttons below.

In 1952, a composer named John Cage wrote a piece for an audience gathered to see contemporary piano compositions in Woodstock, NY. A pianist, David Tudor, lifted the piano cover to suggest the opening of the piece, and opened up a piece of sheet music that was blank. Timing the piece out to 4 minutes and 33 seconds (the title of the piece), Tudor closed the lid of the piano three times, to signal three movements. Other than that, there was no sound.

Except, of course, for the sound of the rain, crickets, wind, coughing, and nervous shuffling in the crowd. This was 1952, and Cage had just introduced the idea that incidental sound could be music — that music could be anything, from any source, signified only by the directing of attention by a composer or performer.

“There's no such thing as silence,” Cage said later. “What they thought was silence, because they didn't know how to listen, was full of accidental sounds. You could hear the wind stirring outside during the first movement. During the second, raindrops began pattering the roof, and during the third the people themselves made all kinds of interesting sounds as they talked or walked out.”

Cage had no control over the rain, and so what Cage did was open up music to the idea of chance. Cage would later incorporate chance into all kinds of work: the I Ching, for example, is a Chinese divination method that relies on sticks falling into patterns. Those patterns can be compared to drawings in the book, and the corresponding entry gives advice. Cage used the I Ching, and other random influences, to guide his decisions when creating music. Similar tactics include Brian Eno’s Oblique Strategies, a series of cards that you can draw to inspire your next action in creating a piece.

In essence, this meant Cage was giving up control over his work, to varying degrees. You can call this automatism: since music is always created in a kind of social context, it’s almost impossible for an artist to break away from that context to create something new. But by incorporating chance, you open up the possibilities for elements that stand outside of culture, fashion, and the “rules” that the composer might have adopted.

Finally, Cage introduced the idea that the audience was part of the work, and had a role in completing a performance that goes beyond “listening.” This is the idea that inspired Fluxus. Fluxus contained a great number of artists who had literally attended Cage’s classes at the New School in New York City, which focused on Zen-inspired approaches to composition including “indeterminacy.”

An image created by AttnGAN, a machine learning tool that generates an image based on the text you provide. In this case, the text came as a byproduct of the Fluxus scores:

“This is not an abstract exercise but a visual exercise. The tower is an image of a woman standing on a terrace, with the sun to her right; above, on either side her foot is in the water. The bottom is made up of pipes, with a red roof of roses and another red roof above.”

So, Cage does this performance, and a bunch of people roll their eyes, but a lot of other people hear about this and a light bulb goes off over their heads. Suddenly, music isn’t about notes or instruments — it’s about time and space and attention. Define the time, define the space, and draw attention to something in those boundaries, and you’re making a new kind of music.

A whole scene arrives by the 1960s. These artists were inspired by this idea but also sort of aware that the “avant garde” can be a bit pretentious. So these artists, who later came together under the name Fluxus, started making art that blurred the lines between humor and avant-garde art, taking both core concepts quite seriously. They brought in elements of chance, influenced by the boom of Western interest in Eastern religion at the time. They reflected the spirit of the scenes that gave rise to pop and Andy Warhol, the blending of “high” and “low” forms of art.

Fluxus performances weren’t silent piano pieces: they referenced vaudeville, games, found objects, audience participation, and pop art. The group was a tense blend of artists from across the world, not all of whom agreed on a singular philosophy or spirit behind what they were doing.

What’s an Event Score?

One of the tools the Fluxus crowd used was the performance score. Think of them as sheet music: notes for actions to be performed in front of audiences. But it didn’t stop there. Some wrote scores that could only be performed in the mind of the reader, a type of conceptual art, pushing the definition of music even further. Others wrote scores that were physically impossible, and are more in line with poetry. But all of them are, fundamentally, to be understood as a new form of music, because they are about directing attention within a time and space.

George Maciunas, the founder of the movement, held fast to the idea that “anything can be art and anyone can do it,” which is the glue that held many of the pieces together. Fluxus scores were meant to be short and simple — some read like haikus. They’re often funny, elevating very minor gestures to the level of “performance.” And they are often punny, making a play on words or a connection between concepts that is funny, clever, and a little bit of what I might dub “profound-dumb,” in the sense that they are beautiful and elegant because of how simple they are.

Here’s an example from the Fluxus Performance Workbook, which I think might be a perfect Fluxus score. It’s by Ken Friedman.

Zen Vaudeville
The sound of one shoe tapping.
(1966)

It’s a pun: it’s funny, a take on the Zen Buddhist koan, “What is the sound of one hand clapping?” You can actually perform it, but you don’t have to: you get a sense of everything just by reading it, so it’s conceptual art. And it’s nicely written, I think. It reminds me of a haiku in its simplicity, which is important: haiku, too, is about taking the mundane and everyday and inscribing it into poetry. Not just to make a joke about poetry, but to elevate the everyday experience, to help us recognize the beauty that comes from isolating a moment and allowing ourselves to be present to it. In comparison, here’s a haiku by the Japanese poet Issa:

Asked how old he was,
the boy in the new kimono
stretched out all five fingers.

This is a haiku, a poem, but it’s also a moment in time, where our attention is directed toward something beautiful. It’s a bit mundane, at first glance: a kid tells you he’s five. But by directing our focus to the gesture, in a bounded moment, some of us see something special and profound in it.

I’m including that here because Fluxus is often paired with Dada, but I think there’s an important distinction. Dada was perfectly satisfied with assembling a bunch of words together at random to create something that rebelled against art. I’m not knocking it. But if you look at Dada in its historical context, the whole group was pissed off about the war, they thought art had done nothing to help the world, they believed art had only served to ease the tensions of the powerful elite, sitting in concert halls while everything beyond Zurich was in flames. Dada wanted to destroy art and build something entirely new. Fluxus, taken as a whole, wanted to bring a new, democratized form of presence back to everyday life.

What does this have to do with AI art?

I’m bringing this up because the result of this project isn’t just an assemblage of gobbledygook spit out by a text generator, filling in blanks like Mad Libs. That kind of thing was possible in the late 1990s, using JavaScript to cobble together sentences from a pre-selected series of nouns and verbs. I know, because I’ve made them.

An example of predictive text on an Apple iPhone. Start typing “Hey w” and the software can predict what you might type next. Current tech allows us to do this at the level of entire sentences and, in research labs, to generate an entire newspaper article based on a short introduction.

Today we’re at a point where AI is doing something much more interesting: it’s able to read a sentence or a few paragraphs, and guess what the next sentence might be. It’s not just picking pieces of existing text and remixing them. It’s taking in the context of what it reads, and doing its best to keep that text going.

Think about your phone, or your email. When you start typing a message, you get a little suggestion for the next word, or with Gmail, you get a sentence to autorespond to messages: “Sounds great!” for example. Or you start writing “W” and see three boxes suggesting what that word might be.

That’s called Natural Language Processing, and it’s a goal of AI programmers in the corporate world: they would like you to be able to say more or less anything to Alexa or Siri, and have it understand you. That means recognizing the unique ways people type and speak when they talk to each other, rather than the stilted way we talk or text when we’re addressing a machine.

To make these 100 Fluxus Ex Machina scores, I basically let one program — the revolutionary GPT-2 by OpenAI — read the Fluxus Performance Workbook, a collection of Fluxus scores gathered by Owen Smith, Ken Friedman, and Lauren Sawchyn. Smith is an artist and a Fluxus historian at the University of Maine; Friedman is a Fluxus artist (the author of the Zen Vaudeville piece quoted above). The book collects an assortment of Fluxus scores from everyone who wrote one — from Ay-O to Yoko Ono. You can go look at it here, if you want.

Who is the artist, then?

Download the original Fluxus Performance Workbook, by Ken Friedman, Owen Smith, and Lauren Sawchyn, 2002, as a pdf document.

That’s tricky to answer. Fluxus descended from John Cage — and his idea that chance has a place in the creative process. Cage once wrote a piece for 12 radios (Imaginary Landscape No. 4) in which the musicians tuned the dials, producing random bursts of static, music, and announcements, depending on whatever was being broadcast at the precise moment of its performance. Radios spitting out static can be music, too, under Cage’s rules that music is about time and space and attention. When a piece of Mozart spit out of the radio, does that mean Mozart wrote a bit of Cage’s piece? Were the DJs, on the air that night, collaborators? Maybe!

The work isn’t totally the machine’s doing. It’s still curated, selected by me, with my own tastes and preferences and even my own idea about what Fluxus is, one which isn’t particularly widespread among the people who performed it.


The pieces produced by the GPT-2 were much more raw, with a lot more noise and sentence fragments. Honestly, the machine also had a weird propensity toward violence: I included one score, “Cream Soda and Chlorine,” which is a set of instructions for a chlorine bomb, for example, but I quickly moved past scores that referenced suicide, murder, or worse. I selected 100 after weeks of feeding and generating text. Occasionally they needed copy edits, but I tried to respect the nature of the experiment by leaving the scores, titles, and dates as intact as I possibly could.

But it is organized, or even, you might say, orchestrated. I’m an artist, too, and a musician. I’ve been making (traditional) music for 25 years, and much of it is about choice. As a laptop musician, the software gives me options: a synthesizer tone, a particular sequence of notes, a plugin with a certain effect. I try one and if I don’t like it, I push a few buttons and try another one, and another, and another, until I find an option I like, which I keep in the music.

Just as Cage was able to orchestrate a new technology — organizing the radio waves, the data streams of his day, into something new — he was able to author a performance as a set of instructions. Instructions are an algorithm, and so, at the end of the day, an artist working with an AI is just the first variable in the algorithmic processes an AI runs to produce an outcome. As Maciunas wrote in one of his manifestos: “eventually we would destroy the authorship of pieces and make them totally anonymous—thus eliminating artists’ ‘ego’—[the] author would be ‘Fluxus.’”

A list of additional people who could be considered the artist:

If you wanted to make a list of people who could be the artist, you would probably include:

  • Every artist who is in the Fluxus Performance Workbook

  • Owen Smith & Lauren Sawchyn for collecting & editing the Workbook w/ Ken Friedman;

  • OpenAI, which created the GPT-2 language processing framework;

  • Adam King, who created Talk To Transformer, a website that allowed me to access GPT-2;

  • Runway, which I also used to run GPT-2;

  • The writers of the Web at large, since GPT-2 was trained on WebText, a massive scrape of pages that people had shared as links on Reddit: millions of examples of how language is actually used.

So how does an AI work?

First off, the best definition of AI is “technology that hasn’t been invented yet.” As tech keeps getting better and better, we keep moving the goalpost for what we think AI is. The Terminator-nightmare-scenario AI is what you’d call “General” Artificial Intelligence, and that’s not here, and some people are unsure if it ever could be.

What we’re looking at in the Fluxus scores is a kind of AI called NLP, Natural Language Processing. Basically, NLP is autocorrect or autofill on your phone, but instead of recommending a word, it can recommend an entire sentence, even a series of sentences.

GPT-2 is OpenAI’s software that they’ve been releasing slowly, over time, to let researchers understand the implications of the software. Behind the scenes, they have a version that can basically write news stories or Wikipedia articles from scratch. You write a sentence, and it basically takes on the role of a master bullshit artist, spinning off more and more text from the sentence you gave it. Because that has a lot of implications for propaganda, fake news, and other applications, they aren’t sharing the full version. But they have shared some early releases as test cases, which is what I’ve used.

Here’s a short video that explains the whole business.

OK, so how does an NLP work?

(This section relies on a deep dive into NLP presented by Shreya Ghelani, a data scientist at Amazon, during the 2019 RE•WORK conference panel “NLP: A New Age.” It’s a bit technical, but I did my best. If you aren’t interested in the guts of it, it’s OK — you can skip to the next section.)

A natural language processor (NLP) reads “maps” of language that it makes by connecting commonly associated neighbors. So if you type “good” into your phone and it “predicts” the next word is either “bye” “morning” or “idea,” this is because the phone has learned common neighbors for “good” — “good” is going to be followed by “bye” or “morning” or “idea” pretty often. It is almost never followed by words like “barley” or “technicolor” or “obtuse.”  

The more often you pair words, the stronger their pairing score is. Relationships are assigned a number, a value that serves as a kind of ranking: good + morning may have the highest possible score of 1, while good + obtuse gets a score of 0 or even -1. “Good pasta” might have a score of 0.5, or higher, depending on how much spaghetti you eat and how often you tell your friends about it. In any case, when you type “good” it will recommend the next word based on pairs that have the highest scores.
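Here’s a toy sketch of that scoring idea in code. The numbers are invented for illustration, not pulled from any real keyboard model:

```python
# A toy version of the "pairing score" idea: each word keeps a ranked set of
# neighbors, and prediction just means picking the highest-scoring ones.
# All scores here are made up for illustration.

pair_scores = {
    "good": {"morning": 1.0, "bye": 0.9, "idea": 0.8, "pasta": 0.5, "obtuse": -1.0},
}

def predict_next(word, top_n=3):
    """Return the top_n highest-scoring next words for `word`."""
    neighbors = pair_scores.get(word, {})
    return sorted(neighbors, key=neighbors.get, reverse=True)[:top_n]

print(predict_next("good"))  # ['morning', 'bye', 'idea']
```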

So, that’s the low-level text prediction you see every day in smartphones and Gmail. But there are much more powerful versions of this out in the world.

One of the first of these more sophisticated models, one we already encounter sometimes, is called ELMo. ELMo looks at full sentences and breaks them into pairings, then weighs those pairings. So, like your phone, it knows “good” is heavily paired with “night,” but it also looks at “good+night” as a pair and figures out what comes after that. This can be used to generate a sentence. Like this:

  1. The -> Cat = TheCat (what usually comes after “thecat”? “Is.”)

  2. TheCat -> Is = CatIs

  3. CatIs -> Happy = IsHappy

  4. IsHappy -> EndOfSentence = “The Cat Is Happy.”

However, this word-plus-pair association tends to be bad at “remembering” things from earlier in the text, especially pronouns like he/she/it/they. It’s good at writing sentences at Twitter length (a sentence or two), but ask it to write a longer text and its meaning will fall apart as it moves further away from the starting point.

Let’s say you want to use this sample to write another sentence. The cat’s happy. But why? We have to feed the program something to get it going, like “It is…” So: 

The -> Cat = TheCat

TheCat -> Is 

CatIs -> Happy

IsHappy -> EndOfSentence

It -> is 

ItIs -> a

IsA -> long

ALong -> Time

LongTime -> Coming

TimeComing -> EndOfSentence

“The Cat is Happy. It is a long time coming.” is a perfectly happy sentence, but it doesn’t answer the question of why the cat is happy, and it’s not a very natural statement. Furthermore, while we as humans might fill in the “it” in the second sentence, as far as the machine is concerned, “it” doesn’t correspond to the “cat.” The computer has no idea what the “It” in sentence two refers to. It reads from left to right, and doesn’t really look very far back or very far forward.
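To see why, here’s a toy version of that chaining process in code. The lookup table is invented to mirror the example above; notice that the generator only ever looks at the last pair of words it produced:

```python
# A toy left-to-right generator: the "state" is the previous pair of words,
# and each pair points to its strongest next word. The table is invented to
# mirror the example in the text.

pairs = {
    ("<start>", "the"): "cat",  ("the", "cat"): "is",
    ("cat", "is"): "happy",     ("is", "happy"): "<end>",
    ("<start>", "it"): "is",    ("it", "is"): "a",
    ("is", "a"): "long",        ("a", "long"): "time",
    ("long", "time"): "coming", ("time", "coming"): "<end>",
}

def generate(seed, max_words=12):
    words = ["<start>", seed]
    while len(words) < max_words:
        nxt = pairs.get((words[-2], words[-1]))
        if nxt is None or nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words[1:])

print(generate("the"))  # the cat is happy
print(generate("it"))   # it is a long time coming
# Nothing ties the "it" of the second sentence back to "cat": the generator
# never looks further back than the previous two words.
```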

If you have an iPhone, you can try this right now and see it at work. Start typing a sentence and just choose the autocorrect recommendation in the center over and over again. You’ll see that each next word makes sense when paired with the last word, but the sentence that results is more often than not a mess.

That brings us to a model called BERT, which came out of Google’s research labs as part of an effort to make text prediction and language understanding better. BERT brought in a capacity for “attention,” which lets it weigh what’s happening elsewhere in the text and use that to inform its predictions for the next sentence. It would know that the “It” meant “Cat” in this example. And it doesn’t just read text from left to right: if you start typing in the middle of a text, it will look all around the insertion point to make better-informed decisions about what you should write next.

As Towards Data Science explains in a blog post:

As opposed to directional models, which read the text input sequentially (left-to-right or right-to-left), the Transformer encoder reads the entire sequence of words at once. Therefore it is considered bidirectional, though it would be more accurate to say that it’s non-directional. This characteristic allows the model to learn the context of a word based on all of its surroundings (left and right of the word).
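If you want to poke at that bidirectionality yourself, the open-source transformers library (not something I used for this project, just the quickest way to experiment with BERT) frames it as a fill-in-the-blank task:

```python
# Quick experiment with BERT's bidirectional predictions, using the
# open-source `transformers` library. Not part of the Fluxus workflow.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT reads the words on BOTH sides of [MASK] before guessing.
for guess in unmasker("The cat is happy because [MASK] was fed this morning."):
    print(guess["token_str"], round(guess["score"], 3))

# The top guesses tend to be pronouns like "she", "he", or "it": the model
# is using the whole sentence, not just the word that came before the blank.
```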

This brings us to a phase of NLP where computers can actually anticipate what your next sentence will be, rather than just your next word. That was made even better with OpenAI’s GPT-2 (the software in the video above), which is so good that they’re not releasing the full model out of security fears. You can read more about that, and see what the full version is able to do, on their website.

In a nutshell, GPT-2 is really good at predicting the next sentence, and the next, and the next, out to the level of a complete newspaper story. That story would, of course, be completely fake. But it would be convincing.

How did you make these Fluxus scores?

Right now the public can access public editions of GPT-2 a few different ways: Talk to Transformer is a Web-based text window that responds with GPT-2 feedback to what you feed it. The downloadable “machine learning for creatives” software platform, Runway, which is primarily geared toward images, also allows you to download GPT-2 as a text module. Both seem to be about the same, though you can add slightly more material into Runway. The model is available on GitHub, so you can go get it and run it any way you want.
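If you take the GitHub route, the sketch below shows roughly what running the public model from code looks like, using the open-source transformers wrapper around GPT-2. This isn’t the route I took (I stuck to Talk to Transformer and Runway), but the underlying public model is the same:

```python
# Running the public GPT-2 release from code, via the open-source
# `transformers` library. An alternative to Talk to Transformer or Runway.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Zen Vaudeville\nThe sound of one shoe tapping.\n(1966)\n\n"
samples = generator(prompt, max_length=100, num_return_sequences=3, do_sample=True)

for sample in samples:
    print(sample["generated_text"])
    print("----")
```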


Once I had access to GPT-2, I took a copy of the Fluxus Performance Workbook and fed as much as I could into the platform, careful to format each piece as title, score, date. Because of those security restrictions, I could only do bits at a time — about 6-7 scores at most. From what I could see, GPT-2 could figure out that formula quite well, though you have to work against its mansplaining tendencies. Often it would start “semantic drift” — aka “bullshitting,” giving me weird lessons on the history of things mentioned in a score, which I have to imagine comes from drawing so much of its language knowledge from the Web at large. Some of these were kind of genius — for example, it told me that David Bowie had written the soundtrack to a TV miniseries version of the Disney film 101 Dalmatians in 1976. He didn’t, and a TV miniseries of that film never existed, but damn if I don’t want to hear a ‘76-era Bowie glam-rock cover of “Cruella de Vil.”

The output was actually better when constrained to a handful of pieces rather than the entire thing; in particular, it seemed to do best when analyzing some combination of one score each from Eric Andersen, Alison Knowles, George Brecht, and Ken Friedman. The reason, as far as I can tell, is that these scores contain the best combination of verbs and abstraction: sentences starting with short imperatives such as “make” or “performers begin by” that allowed GPT-2 to more quickly arrive at what I wanted it to do, which was to create a set of instructions.

Adding 3-4 pieces in this format:

Title
Imperative sentence to conduct an action.
Date

presented the best possible results. Too much in the middle frame gave me that semantic drift: the GPT-2 we have as a public test seems better than anything else at staying focused, but its mind still tends to wander away from the topic of “attention.”
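To make that concrete, here’s roughly how a prompt in that shape gets assembled before being handed to the model. “Zen Vaudeville” is the Friedman score quoted earlier; the other two entries are placeholders I’ve invented for this sketch, not actual Workbook scores:

```python
# Assembling a small prompt in the Title / imperative sentence / Date shape.
# "Zen Vaudeville" is quoted earlier in this essay; the other two entries
# are invented placeholders, not actual Workbook scores.
scores = [
    ("Zen Vaudeville", "The sound of one shoe tapping.", "1966"),
    ("Placeholder Score", "Carry a chair slowly across the room.", "1963"),
    ("Second Placeholder", "Performers begin by counting the audience.", "1965"),
]

prompt = "\n\n".join(f"{title}\n{action}\n({date})" for title, action, date in scores)
prompt += "\n\n"  # leave the model an opening to continue the pattern

# This block of text is what gets pasted into Talk to Transformer or Runway,
# or passed to a GPT-2 generator like the one sketched above.
print(prompt)
```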

Puns in the Shell

GPT-2 also seems fond of puns, most likely because of its tendency to pair words without quite knowing their meanings. If the software sees “Apple,” it has no way to distinguish between “Apple” the fruit and “Apple” the company, so when it thinks about the phrase “apple of my eye,” it can draw from words associated with food or with tech companies.
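In the word-pair picture from earlier, that conflation looks something like this (the scores are invented for illustration):

```python
# One surface form, one entry: neighbors learned from fruit contexts and
# tech contexts end up mixed together in the same table. Scores are invented.
apple_neighbors = {"pie": 0.9, "tree": 0.8, "iphone": 0.7, "store": 0.6}

# Whichever neighbor scores highest wins, regardless of which sense of
# "apple" the writer actually meant.
print(max(apple_neighbors, key=apple_neighbors.get))  # pie
```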

I think that makes the current state of NLP extremely punny, which is appropriate to Fluxus scores. Here’s an example that made the final Fluxus Ex Machina workbook:

TV Game
Have a television and a television face each other, playing tennis on a court.
(1966)

It took me a minute to see it, but the entire score revolves around a pun on “play”:

  1. we play something on a TV

  2. we play tennis on a court

  3. a performer plays music

Explaining jokes makes them less funny, I know. But it’s an interesting insight into how the GPT-2 works: you can see that the TVs are serving three levels of “play” here, which is delightfully appropriate. While it’s tempting to think this is evidence of extreme cleverness, it’s more likely (sorry, everyone) a result of the word-pair process for “play.” The final layer, “play” meaning performer, might be one I invented in my own mind — it’s hard to know, though that doesn’t make it less true that it’s there. After all, the GPT-2 doesn’t have any sense of intent. All of the meanings are what we create for them, which makes them delightfully adaptable, and perfectly in the spirit of a Fluxus score.

On the flip side, one of the first scores the GPT-2 generated was a series of seconds, strangely resonant with the structure of John Cage’s 4’33”. That event, called “13 Seconds,” was a series of time intervals measured in seconds, followed by a moment of silence lasting 100 days, and then another series of intervals. This piece, at first, seemed easy to dismiss as gobbledygook — “oh, this looks like a machine.” But it’s really not very far off from Cage’s 4’33”. That would be easy to imagine if I had typed the score of 4’33” into the text for analysis, but I didn’t. In fact, I never did — Cage isn’t in the Performance Workbook. The GPT-2 decided that time intervals were interesting, based on an analysis of other pieces that were inspired by Cage in the first place. In a sense, it reverse-engineered Fluxus.

What did you learn by doing this?

AI is going to be “augmenting” creativity soon. I think the Fluxus Ex Machina pieces are honestly interesting, even outside of being created by an AI, and they raise a lot of key questions about authorship. Importantly, though, the process helped me to think about a question that’s been burning in my mind since I started thinking about this project and “AI art” in general:

Why do we even care about art made in this way?


If you consider art to be a way of articulating an emotion or an experience, then AI art is doomed to fail, because an AI can’t have emotions or convey an experience. An AI is not struggling to translate some inner landscape into a shared idea. There is no struggle, so it’s hard to see an achievement. Sure, it could become remarkably adept at replication — tracing the contours of images, tracing the contours of someone else’s inner landscape. Is that compelling? Is that art?

That question became blurred a bit for me, in this project, because I had to grapple with curating, collecting, and editing the work. Pieces that resonated with me were included; those that didn’t were left out. The result is that the AI created text; that text resonated with something in my own imagination, and so it was passed on. The line between my art and the AI’s art is pretty blurry, even as I don’t consider these to be my pieces “outright.”

Putting my name on them seems odd; labeling them “GPT-2” seems wrong, too; the GPT-2, unassisted, would have produced nothing. If there is an “artist” in need of a name, that artist exists at the intersection of myself and the machine, a kind of cyborg fluxus, which is why I’m calling the whole thing Fluxus Ex Machina.

Obviously the term is a play on Deus Ex Machina, “God from the Machine,” defined on Wikipedia as “a plot device whereby a seemingly unsolvable problem in a story is suddenly and abruptly resolved by an unexpected and seemingly unlikely occurrence, typically so much as to seem contrived.” It’s appropriate to Fluxus, because as a plot device, “deus ex machina” was the subject of both ridicule and awe; the idea of the ancient Greeks bringing actors onto the stage with a crane to act as gods, fixing whatever plot mess was taking place, is maybe the first Fluxus event.

But Fluxus Ex Machina is also about the idea that the machine can make art. That’s a trope, too. The machines could never make art without us, and turning blindly to the machines to save us — to write Fluxus scores in our absence — is a bit naive. The human is already in the machine. Just as our conversations around algorithmic bias are often obscured by our sense of the machine’s capabilities, so is our vision of AI art: the fantasy that the machine might make our work as artists so easy that we only have to push a button until it inspires us. It may be true, but it would be a lazy way to work, and I don’t see why we’d do it. We are humans, we have inner lives we want to share and explore. It’s unlikely anyone who is drawn to create art would feel satisfied by pushing the button and copying and pasting the result.

Fluxus also means Flow, and that’s an important part of the project’s name. Fluxus Ex Machina is “the flow from the machine,” and I find that highly appropriate: the machine is part of the flow, affecting the direction, changing the shape, but one that we humans can direct and shape as well. That working space is what I’m most interested in exploring.

Art · Eryk Salvaggio · fluxus, ai