Weekly Report 10

What I’ve Made

This week I mostly kept up on the Non-Sentences project, spending most of my time on the network lines and a new text-to-speech mode that adds a female voice speaking the words as they're generated. These two things, which on the surface don't look like much, took a lot of time and coding to get working. For the lines, I essentially had to calculate the x and y position of each cell, then draw a 3D line through those points as they're called by my sentence controller script, drawing it out word by word and erasing it after the sentence is generated. It makes interesting shapes, and maybe there's a way to use them in other contexts, either within or outside of this particular project.

The text-to-speech took some major edits to that same sentence controller, since it naturally wants to read each word as fast as possible, trying to read the full set of words every time and getting cut off whenever a new word is added. I had to work a lot with the timing and spacing of the generated words, and the spacing between generation and the spoken word, since the way it works is that it creates an mp3 file for each word, finds that file on my computer, and plays it back out through TouchDesigner. It still reads the last word a couple of times before the sentence is erased, but I think that adds a sort of hypnotic element, akin to Blade Runner 2049's baseline test (interlinked) and the opening of Under the Skin, where the Scarlett Johansson alien is being "built" and learning to talk like a person, which is based on real-life phonetic training for speech therapy (I'll have the best video of this I could find below, as well as the Blade Runner sequence).
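
Since one controller loop drives both effects, here's a minimal, framework-agnostic sketch of how that could look. The word table, grid size, cell spacing, and the speak_word() helper are all placeholders I'm assuming for illustration; in the actual patch the points feed a 3D line in TouchDesigner and the audio is an mp3 rendered and played back per word.

```python
# Minimal sketch (not the actual TouchDesigner network) of a sentence controller
# that grows a line point-by-point and spaces out per-word speech.
import random
import time

WORDS = ["architecture", "facilitate", "dream", "garden", "signal", "noise"]  # placeholder table
COLS = 3                     # cells laid out in a grid: 3 columns x 2 rows here
CELL_W, CELL_H = 1.0, 0.6    # spacing between cell centres

def cell_position(index):
    """x/y position of a word's cell, derived from its row and column."""
    col, row = index % COLS, index // COLS
    return (col * CELL_W, -row * CELL_H)

def speak_word(word):
    """Stand-in for the TTS step: in the real patch an mp3 is written for the
    word, located on disk, and played back through TouchDesigner."""
    print("speaking:", word)

def generate_sentence(length=5, word_gap=0.6):
    line_points = []                          # the 3D line grows from these points
    for _ in range(length):
        i = random.randrange(len(WORDS))
        line_points.append(cell_position(i))  # extend the line to the new cell
        speak_word(WORDS[i])
        time.sleep(word_gap)                  # spacing so a word isn't cut off by the next
    line_points.clear()                       # erase the line once the sentence is done

generate_sentence()
```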

As it relates to my thesis, this can bring more insight into the training of an LLM chat or TTS model. Trained on millions of words, this project can be seen as a visual and audio representation of a model's back-end process: both the learning of how to structure sentences that make sense to us, and the behind-the-scenes stringing together of words before they become visible to us. Normally we see its best foot forward in this predictive randomness, not the millions of nonsense training passes or how it chooses to put words together. I also think the imperfect voice adds to the uncanniness of it all: by lowering the fidelity of the voice (compared to something like ChatGPT's text-to-speech models), it strips back a layer of relatability. Its main goal is to mimic human speech patterns, but by being slightly "off" we might better step back and see it for what it really is: algorithmic decisions trying to make sense.
As a note, the audio recording for the voice is pretty quiet, so if you turn up your computer audio to hear it, make sure you turn it back down before you play the other videos.
To add more to this specific project, I think I'll add a word counter that shows how many times a specific word has been used by my algorithm (a rough sketch of that counter is below). With the selection being random, you would think that over time all the words would average out to about the same number of uses, but in building this I've noticed words like "architecture," "facilitate," and "dream" coming up more frequently than others. Adding the counter can speak to LLMs' over-reliance on certain words, which are making their way into our language as well; as a non-measurable, speculative effect of this project, someone viewing it might even end up using some of the words in the table more frequently than others. For some stylistic polish I also need to highlight the words being used in their cells, not just in the line drawing, and think through options for the sentence text on the right. I'm thinking of having the characters look like they're "floating" by translating them smoothly and randomly on the x, y plane, to add to the flexibility of meaning as sentences break down to words, words break down to characters, and characters break down to lines.
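
The counter itself can stay small. This sketch assumes the sentence controller can call a hook each time it emits a word; on_word_generated is a hypothetical name, not something in the current patch.

```python
# Sketch of the planned word counter: tally how often each word is emitted.
from collections import Counter

usage = Counter()

def on_word_generated(word):
    """Hypothetical hook the sentence controller would call per emitted word."""
    usage[word] += 1

# Example: feed a few emitted words and read back the most frequent ones.
for w in ["dream", "architecture", "dream", "facilitate", "dream"]:
    on_word_generated(w)

print(usage.most_common(3))  # e.g. [('dream', 3), ('architecture', 1), ('facilitate', 1)]
```
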
Alongside work on other projects, for the interactive edition of the Emergent Garden I've done some work on the hand-tracking interactions: I currently have a setup that tracks your index finger and thumb, with a basic interface over it that shows a datapoint of how spread out your fingers are. This already has a lot of promise, since all the data is there to map things like primitives to your fingers or the midpoints between them, maybe pinching to increase or decrease their size, and using other fingers or hand motions to place them down. There's a balance to find here: you want to give people as much control to build the garden as they please, but with non-traditional methods of building there's a chance people get confused about how it actually works, which could undermine some of my work on it. This is where potential combinations with MIDI pads could come in handy, I think, giving people a reliable hardware interface to use alongside their body movements and motions.
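
For reference, this is roughly how that finger-spread datapoint can be computed. The report doesn't name a tracking library, so this sketch assumes MediaPipe Hands and a webcam, with landmark 4 as the thumb tip and landmark 8 as the index tip.

```python
# Rough sketch of the thumb/index "spread" value, assuming MediaPipe Hands.
import math
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        thumb, index = lm[4], lm[8]                                  # thumb tip, index tip
        spread = math.dist((thumb.x, thumb.y), (index.x, index.y))   # normalised image coords
        # spread is the datapoint shown in the interface; it could drive pinch
        # interactions like scaling a primitive or placing it below a threshold.
        print(round(spread, 3))

cap.release()
```
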
Finally, in terms of the previous project with face / emotion tracking, I did some brainstorming and think a good direction could be a reactive AI-Rorschach visual that responds to your smile or frown, with accompanying text that tries to elicit how you feel in the moment and manipulate that feeling toward being overly positive. Rorschach tests are typically seen nowadays as outdated and unreliable, bordering on pseudoscience as a means to determine personality characteristics or even underlying psychopathologies. What else may be unreliable? Maybe using an AI model as a therapy device or emotional crutch. The Rorschach test has a couple of "algorithms," or scoring criteria, used to determine anything from a predisposition to schizophrenia to a person's personality, coping mechanisms, or personal perception. These algorithms are a means to a predetermined end, a score for whatever is being evaluated, but suppose you don't know that: you take the test from a seemingly trustworthy source, accept at face value what it says about you or your personality, and believe it to be true even though it rests on uninformed pretenses or contexts. The same line of thinking can be applied to AI models and the concept of sycophancy, where the AI tends to reinforce the beliefs you bring to a conversation, whether that's your emotions or your great new business idea to combine french fries with salad. (A newer South Park episode used this concept recently to mock AI, startup funding, and of course the Trump administration.)
Lastly, on the research side of my thesis, I've reached out to a couple of AI creatives and have gotten confirmations from some who would like to be interviewed. One of my favorite creatives working with AI was among the first to respond, which I'm pretty stoked about. I won't say who he is, but he's done some incredible work with AI that explores concepts like the perception of time and the fragility of infrastructure over long timescales, and his work has been featured everywhere from electronic billboards to visuals for major music festivals and artists, like Future, Metro Boomin, and Travis Scott at events like Rolling Loud and Lollapalooza. Within the next couple of days to weeks, I'll have some interviews done and transcribed to be used for my thesis.

What I’ve Read

I managed to get a hold of "Homo Ludens: A Study of the Play-Element in Culture," and in reading some of it and using Google Notebook to quickly summarize other parts, it brings up ideas and definitions of play that seem familiar to me, whether from an undergraduate philosophy class or a YouTube essay. Johan Huizinga describes play as a fundamental and autonomous activity that is free, meaningful, and distinct from ordinary life, emerging before culture or civilization itself, as animals are also observed to play with each other. He argues that play is not merely a biological function or a means to an end, but rather a primary category of life that carries its own intrinsic value and significance. He says play is "in fact freedom," a temporary stepping outside of real life into a separate sphere governed by its own order and rules. Although it is "not serious," it can be pursued with complete seriousness and absorption: serious in tone and attitude, but not in stakes or survival. Within this self-contained world, play creates order, tension, and beauty, giving meaning and form to human experience beyond survival or utility.

Huizinga says that play is a foundational element of culture itself, not just a pastime or instinct. Civilization, he claims, "arises and unfolds in and as play," and thus Homo Ludens, "man the player," is as essential a concept as Homo Sapiens. Through language, ritual, art, and contest, humans express their creative and cultural instincts in the spirit of play. This definition elevates play from simple amusement to a cultural force that shapes meaning, community, and the human spirit.

As this relates to my thesis, I'll have to do some more in-depth reading on how play can promote learning and AI literacy. The shaping of meaning and community seems like a good stance to explore further, and relating it to the ideas of biology and AI I've already explored seems like connective tissue that can be formed.

Where the Next Steps are Leading

Again, I'll need to continue refining my projects and installations. My Muse 2 has been in the mail for some time now, but it should arrive in the next couple of days so I can explore it more. I'll have to refine the Emergent Garden interactions as well, and in working with the Rorschach idea I'm wondering if there's potential in combining it with the cognitive-control work to fully encompass the transference of meaning, collaboration, and cognition across emotional, physical, and perceptual levels. Aside from projects, the interviews are coming along. Once I've done a couple of those and have the transcripts, I can start coding them for consistent themes and work towards the workshop content.

