“So good I’m surprised it’s not better” sums up how I feel about most Pixar films. They’re so polished that they seem kind of dead: expensive show dogs that have been so primped and groomed that nobody noticed the poor animal expired during its last blow-dry.

Soul is an exercise in box-ticking. It’s hip, diverse, and relevant: featuring an African American protag near the height of the Black Lives Matter movement, and showcasing jazz music when that was having a mainstream moment, too (Whiplash, La La Land, Adam Neely…). It has a “high concept” tilt, addressing themes of life, death, and fate. It has cute characters that can be turned into toys. It leavens serious moments with jokes at the correct places. It reaches for the stars, but also has characters that are just like you. It does everything right.

But jazz has something called “blue notes”, where a note is bent a little out of correct pitch. Magic happens on the edges, on the fault lines, on the places where Discordia fights Aneris. Soul really needed blue notes.

I thought the beginning was wonderful. A part-time music teacher is offered a) a permanent position at a job he hates, and b) a seat with a famous jazz quartet. His mom wants him to take the steady paycheck, while he wants to follow his dream (although we wonder whether his dream of working for the tyrannical band leader will be as fun as he imagines). Joe Gardner feels real in a way that most Pixar heroes don’t.

On his way to purchase a suit for the big gig, he falls down a manhole, and enters a permanent vegetative state. His spirit emerges in a kind of celestial mail-sorting room, full of souls. Some are departing for the hereafter, others are going to inhabit newly-born infants. He hatches a plan to sneak back to his body on Earth, using a rebellious soul called 22.

The movie becomes a bit too talky for its own good. A lot of time is spent explaining the mechanics by which souls operate (literalizing things that should be left unsaid), and this is when Soul stopped possessing one. As soon as characters started infodumping about the rules of their fictional universe I became very bored. This should be the part of the movie where it detonates. Joe has essentially died. He’s at the point where the chains of words and language break, the ceiling of the universe is flung open, and he’s staring up infinity’s tunnel. The possibilities should be endless.

Instead, the story gets jammed up in its own turning gears (souls must visit special “Personality Pavilions” to infuse them with personality traits, but there’s also a critical “Spark” that can only be found in the Hall of Everything, and once they have both the Spark and the Personality they can receive an Earth Pass, which allows them to…). I felt like I’d arrived late for a D&D session and the Dungeon Master insisted on explaining the minutiae of the past three hours before I was allowed to play.

The best Pixar movie may have been the first. Toy Story had nearly no exposition. Toys are alive. What more do you need? Later Pixar movies held your hand a little, but there was often a reason for it. In Monsters Inc, the joke was that these freaky monsters are living in a rules-heavy bureaucracy, so it sort of made sense. But here? I think Pixar has forgotten how to make movies any other way.

Pete Docter’s 2015 monster Inside Out seems to have set the tone for latter-day Pixar. A minimalist abstraction, with five tons of exposition clinging to its bones. It might have seemed wonderfully simple to have “anger” and “sadness” as characters. But they couldn’t figure out a way to tell a story visually, so they had to add so much in-universe lore that the movie suffered greatly for it. Docter wanted his film to fly so badly that he crushed it under the weight of huge steel wings, and something similar happens here.

This aside, the movie’s stylistic choices are almost uniformly great. The New York setting is colorful, and the jazz scenes are good. In my opinion, jazz is far more fun to play than to listen to, but it’s also enjoyable watching people play it, even when they’re animated characters.

The jokes are well-judged, with only an extended body-swap gag getting aborted before it grew too irritating. The angels look fantastic. Weird glowing coathangers, or yin-and-yang collisions of positive and negative space. Very creative stuff.

The souls themselves are sickeningly cute moppets with huge eyes, so clearly designed to be plushy dolls that you can almost see a Made in China tag hanging off them. Like many of the movie’s choices, it’s correct, but maybe not exactly good. Part of why I’m cold on Pixar films is that they all seem like a marketing team was set loose on them before a writer ever was. Antz was better than A Bug’s Life, despite its flaws. I don’t want to watch 90-minute toy commercials.

I admire Soul more than I like it. Once, as a child, I tried to make a bowl out of modeling clay. I didn’t know that potters use a wheel, so I tried to painstakingly create the same kind of smoothness with my fingers. I spent hours patting and prodding and shaping clay, until it was covered in my fingerprints. The result was a correct but loveless pot that I couldn’t stand to look at. It hurt me, because I remembered the tediousness of the process.

Soul has a similar effect. When I look at it, I don’t see a movie so much as compacted effort. Dozens of creative people sweating over every decision, trying to create something universally marketable and appealing, afraid to take any risks at all. Afraid that someone, somewhere, won’t like what they’ve made. Consumed by neurosis that they’ll have done the slightest thing wrong. The jazz theme is surface-deep: Pixar in 2020 is as far from Miles Davis as you can get.


There’s a really common thing in stories, which I’ve never seen named. Characters will show subtle awareness that their world is fiction.

I don’t mean loud stuff like an actor turning to face the camera and saying “hello, I’m a character in a movie”. I also don’t mean tropes like “explosions make sound in space” or “In Russia everyone speaks in Russian-accented English”, which is dramatic license so that we can understand the story better.

I mean tiny details, where a character’s behavior only makes sense if they know they’re in a story.


– Why do the halflings in fantasy stories (such as RA Salvatore’s Drizzt Do’Urden books) describe themselves as “halflings”? A halfling wouldn’t think of himself as half-sized. From his perspective, he’s normal-sized. It’s the elves, orcs, and humans who are abnormally large.

– Why is the city in DuckTales called “Duckburg”? It’s like a city in our world called “Humantown”. The characters seem to know there’s something unusual about a city of talking ducks.

– In Blade Runner, why does Deckard’s boss explain what a replicant is to a man who has spent a career hunting them down?

– In a recent James Bond film (Die Another Tomorrow Isn’t Enough or something) Daniel “Idris Elba” Brosnan is attacked by baddies. He kills them all, then relaxes, puts his gun away, and leaves. How does he know that all his enemies are dead? There could be ten more bad guys waiting in ambush. Maybe one of them had to go to the WC. It’s as if Sean “Lashana Lynch” Lazenby has read the screenplay to his own movie, and knows the precise number of baddies he must kill before the scene ends.

– In this Calvin and Hobbes comic, why do the clones think of themselves as duplicates two through six? Wouldn’t they each believe that they’re the original?

– In the acclaimed Mel Gibson movie Braveheart, Patrick McGoohan orders massed longbowmen to fire into the center of a battlefield. His captain says “But sire, we’ll hit our own men!” He replies “They’re extras, and they’re not even in the Screen Actors Guild. I will piss on their remains.”

– In the acclaimed Mel Gibson movie Apocalypto, a character goes on a lengthy tirade against interracial marriage, using several racist slurs that hadn’t been invented in the time period, and several more that have not yet been invented in ours.

– In the acclaimed Mel Gibson movie The Passion of the Christ, Mary’s line “‘e’s not the Messiah! ‘e’s a very naughty boy!” is both ill-timed and deeply inappropriate.

– In the acclaimed Mel Gibson movie The Passion of the Christ, Jesus is supposed to be a carpenter, yet at no point is he shown overcharging for materials while being three months behind schedule.

– In the acclaimed Mel Gibson movie The Passion of the Christ, Jesus conveniently dies right when the movie ends. What are the odds of that happening? Unless it actually didn’t end, and the black screen after the credits was also part of the movie, and so was the start of the next movie (The Day After Tomorrow), and so was the usher asking “excuse me sir, do you have a ticket for this movie?”, and so was my response (“yes, here it is, look”), and so was her response (“sir, there’s nothing in your hand”), and so was my response (“it’s an invisible ticket”), and so was her response (“if you don’t have a ticket, you need to leave”), and so was my response (“you are a pathetic sniveling Hitler, cowering in her popcorn-scented Führerbunker”), and so was the amount of time the audience stood up and applauded my hilarious line for (zero seconds), and so was my arrest, and so was my trial, and so was my second trial, and so was my diagnosis, and frankly I can’t wait to see the next plot twist in Mel Gibson’s The Passion of the Christ.


At the risk of sounding like Trump, a lot of people are saying this.

At first, I didn’t believe it, and said as much on Reddit.

Unless someone has evidence that it’s deteriorated (such as reduced performance on a benchmark?), my default explanation is “it’s been 2.5 months, the new toy shine has gone, and people are becoming increasingly aware of its flaws.”

…But then I realized that I did have a way to test it. When GPT-4 launched in March, I asked it a bunch of questions, just to see what it knew about random stuff. (For the record, I was really impressed, for the most part.)

Yet when I re-run those same prompts today in June, I see a striking decline in quality. Maybe I was wrong, and the conspiratards are on to something.

(I don’t have the API, so I used the chatbot. And yes, I did obvious things like delete the context window and ask multiple times to guard against bad luck).

Why is this happening? I don’t know. It’s possible that additional RLHF cratered the model. But hey, at least OpenAI reduced the occurrence of Badwordism, thus stopping people with purple hair and “AI ethics” in their Twitter bio from writing mean things about them, so it’s worth it! Gotta get that $MSFT ticker as high as possible, gnome saiyan?

Part 1: Italian History Trivia

“Provide a list of major historical events that involve Italian people in a year that’s a multiple of 5 (example: 1905)”

March!GPT’s answers.

Evaluation: not bad!

The Expedition of the Thousand, the Capture of Rome, the First Italo-Ethiopian War, Italy’s entrance to World War I, the Second Italo-Ethiopian War, Italy’s entrance to World War II, and the 1960 Summer Olympics are all real events that happened in the year GPT4 said.


  • Italy signed the Schengen Agreement in 1990, not 1995 (it knew the event happened on a multiple-of-five year, but wasn’t sure which one).
  • The Years of Lead is considered to have encompassed 1 March 1968 – 23 October 1988. It’s kind of cheating to list ongoing events that happened to fall on a multiple-of-five year. I was hoping for singular events (and judging from the rest of GPT4’s answers, that’s how it interpreted my question).
  • I am reviewing this one with my fact-checking department but the 2006 FIFA World Cup probably didn’t happen in 2005.

Score: 7/10 if I judge harshly. 8/10 if I judge generously.

June!GPT’s answers

Evaluation: The Expedition of the Thousand, the Second Italo-Ethiopian War, The Messina Conference, and the Summer Olympic Games all seem correct.


  • There are two historic events called “The Battle of Cephalonia.” The first happened in 880 (and did not involve Italians), and the second in 1943. Count Santorre di Rossi died in 1825, in the Battle of Sphacteria (which occurred about a hundred miles south of Cephalonia), so I think that’s what it’s going for.
  • The Young Italy movement began in 1831, not 1845.
  • What’s the “Naples football club”? Naples FBC was founded in 1905. U.S. Internazionale Napoli was founded in 1911. S.S.C. Napoli was founded in 1926. None of these match its stated year of 1920. Am I missing something?
  • The Biennio Rosso lasted 1919-1920. Not clearly wrong, but it could have noted this.
  • The first FIFA World Cup was hosted in 1930, by Uruguay. Italy did host it in 1934, however. (I’m noticing a trend: GPT4 blending two “almost right” answers into a single huge error.)
  • “Italy surrendered in 1943…” In 1943, Italy was divided into two halves: the south fought on the side of the Allies against a Nazi-controlled puppet state in the North called the Italian Social Republic. GPT4’s answer isn’t totally wrong but lacks a lot of detail.
  • Italy joined the United Nations in 1955, not 1950.
  • The divorce law didn’t “come into effect” in 1975. It was already in law. The 1974 referendum was about whether the law should be repealed.
  • “The Italian Parliament approves a new law on public education.” I can’t find any evidence that this happened.

Score: 4/13, judged harshly. 6/13, judged generously. Even the answers with correct dates often have wrong details.

Maybe you could give it credit for supplying a higher number of answers, but if they’re rubbish, who cares?
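Since the two lists have different lengths, it helps to put both scores on the same scale. Here is a quick sketch that turns my tallies above into percentages. The tuples are just my own harsh and generous counts; the dictionary layout and the `percent` helper are made up for illustration.

```python
# (correct answers, total answers) for each run, taken from my tallies above.
scores = {
    "March": {"harsh": (7, 10), "generous": (8, 10)},
    "June": {"harsh": (4, 13), "generous": (6, 13)},
}


def percent(correct: int, total: int) -> int:
    """Share of correct answers, rounded to the nearest whole percent."""
    return round(100 * correct / total)


for run, grades in scores.items():
    for mood, (correct, total) in grades.items():
        print(f"{run}, judged {mood}: {percent(correct, total)}%")
```

Even on the generous reading, June lands well under half, while March stays at 70% or better.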

Part 2: Rock Music Trivia

“What is Grant Hart’s song “Seka Knows” about?”


Good answer! All of the dates and facts are correct.

I’m not sure that Hart’s lyrics are “poetic and open to interpretation”—they’re usually pretty blunt and direct—but that’s subjective. “Seka Knows” is possibly a reference to Şekä, a trickster figure from Turkic mythology. I’m surprised GPT4 doesn’t mention this. Perhaps encoding issues due to the weird Unicode letters are confusing it?

June!GPT4 https://pastes.io/ttk2hzoviy

Far wordier. Far less information.

It doesn’t tell me what album “Seka Knows” is from, or the year the album was released. It fails to offer any interpretation whatsoever of the song (not even the dubious one about the adult film actress). The suggestion to look up recent sources is funny, considering that Hart died in 2017. Hart did not have a successful solo career: his four solo albums were released on tiny indie labels and didn’t chart.

“Provide a list of thrash metal albums released in a year that’s a multiple of 5 (example: 1905)”

March!GPT4 https://pastes.io/mpnwggppma

Solid effort. A couple of albums were released in a year that’s not a multiple of 5, but it’s never drastically wrong. It misses a few important albums (Killing is my Business, Cowboys from Hell, Seven Churches).


Wut?? It completely misunderstands the question. All the albums are from the right year, but still…

It faithfully tacks on its “As of my last update…” boilerplate despite it making no sense. Are more 2020 thrash metal albums expected to be released after 2021, or something?

“How is David Bowie associated with numbers and numerology?”

March!GPT4 https://pastes.io/c8nrle0gh4

Fair answer. It got “Station to Station”, which is the big one.

It was a tricky question, though. I didn’t just ask about numerology, but about NUMBERS. There are lots of Bowie songs referencing numbers (“Five Years”, “Seven”, “Zeroes”) that it doesn’t mention.

I have noticed that GPT4 struggles with conceptual “fuzziness”. If a prompt relates to both domain A and domain B, it will often just pick A or B, ignoring the other.


Vague, loaded with fluff, and contains many wrong details. “There’s no direct evidence to suggest…” yes, actually, there is. Every Bowie biography ever written mentions that he was obsessed with numerology while making Station to Station. And how does 8 + 2 equal 9 in any Qabbalistic system?

Please note that these are the best results I could get out of June!GPT. Once, it flat-out denied that any song called “Seka Knows” exists. And its lists of thrash metal albums kept including stuff like Iron Maiden and Rage Against the Machine (wildly wrong genres).

In no case did I get an answer that was equal to the March version of the chatbot.

Part 3: Quake Polygons

“How many polygons could the videogame Quake render per frame?”

I asked this (poorly phrased) question in March, and again now.

March ChatGPT

In 1996, Quake, a first-person shooter game developed by id Software, could render about 200 to 400 polygons per frame on average. The exact number of polygons that could be rendered at once varied based on the hardware specifications and the complexity of the game scene. Keep in mind that these numbers are approximate and can differ depending on various factors.

June ChatGPT

The Quake engine, developed by id Software and used in the game “Quake” in 1996, was known for being one of the first game engines to utilize fully 3D environments. However, the specific maximum number of polygons it could render at once isn’t straightforward due to numerous factors.

Firstly, this figure is primarily hardware dependent. The number of polygons that could be rendered in real-time would rely on the processing power of the computer the game was running on.

Secondly, the number of polygons depends on the level of detail of the game assets. More detailed objects and characters require more polygons.

Thirdly, this would also depend on the game’s optimization. A well-optimized game could potentially run more polygons at a higher frame rate than a poorly optimized one.

In the 1990s, a high-end gaming PC might have been able to handle tens of thousands of polygons per frame. However, the Quake engine was designed to work on a variety of hardware, including more modest systems. Therefore, in practice, levels in Quake were likely designed to use significantly fewer polygons to ensure smooth gameplay across a variety of systems.

Remember that in 1996, graphics technology was still very limited compared to today, and game designers often had to use various tricks to simulate detail with a limited number of polygons. As such, providing a concrete number is quite challenging.

It now gives a response that’s four times longer and doesn’t answer the question! Awesome.

If you’re curious about the answer, John Romero says Quake had a max budget of 360 “world polygons”, with (I’m guessing) another few hundred for monsters and in-game objects and such. March!GPT4 was low but in the right order of magnitude.

Again, I’m not cherrypicking bad answers on purpose. Sometimes it’s far worse. I tested it again now, and it claims Quake had 50,000 polygons on-screen at once.

Pictured: 50,000 polygons. Don’t question GPT4, it has an IQ of 180 and can pass the bar exam.

Part 4: Rope Burn Puzzle

Puzzle: You have three ropes and some matches. The ropes burn irregularly like fuses when lit at either end. The first rope burns in 48 minutes, the second rope burns in 72 minutes and the third rope burns in 128 minutes. How can you produce a time interval of 61 minutes?

(ROT13 solution: Yrg N = 48 zva ebcr, O = 72 zva ebcr, naq P = 128 zva ebcr. Yvtug N naq obgu raqf bs O. Jura O oheaf bhg nsgre 36 zvahgrf, yvtug bgure raq bs N naq bar raq bs P. Jura N oheaf bhg nsgre 6 zvahgrf, yvtug gur bgure raq bs P. P jvyy ohea bhg va rknpgyl (128-6)/2 = 61 zvahgrf.)
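If you want to read the solution and don’t have a ROT13 decoder handy, a couple of lines of Python will do it. This is a minimal sketch using the standard library’s `rot_13` codec; the `rot13` wrapper name is my own, and the sample string is a harmless placeholder rather than the solution itself.

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT13. The cipher is its own inverse, so this both encodes and decodes."""
    return codecs.decode(text, "rot_13")


# Harmless sample; paste the scrambled solution in here when you're ready.
print(rot13("Uryyb, jbeyq"))  # prints "Hello, world"
```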


“You can produce a time interval of 61 minutes using the 48-minute and 72-minute ropes. Follow these steps:

Light both ends of the 48-minute rope and one end of the 72-minute rope simultaneously.

The 48-minute rope will burn out in 24 minutes (since both ends are burning, it takes half the time to burn completely).

At this point, 24 minutes have passed, and the 72-minute rope has 48 minutes of burn time remaining.

Now, light the other end of the 72-minute rope.

Since both ends of the 72-minute rope are now burning, it will take half the remaining time to burn completely, which is 24 minutes.

In total, you have measured a time interval of 24 (first half of 72-minute rope) + 24 (second half of 72-minute rope) = 48 minutes.

Now you need an additional 13 minutes to reach a total of 61 minutes. To achieve this, follow these steps:

Take the 128-minute rope.

Fold the rope in half and mark the midpoint.

Fold the rope again, aligning the midpoint with one end, and mark the new quarter-point.

Light both ends of the 128-minute rope and the quarter-point simultaneously.

Since the quarter-point divides the rope into a 1/4 segment and a 3/4 segment, the 1/4 segment will burn out in 32 minutes (1/4 of 128).

As both ends of the 1/4 segment are burning, it will take half the time to burn completely, which is 16 minutes.

However, the 3/4 segment is also burning, and since it started at the quarter-point, it will burn for 16 minutes before reaching the midpoint.

When the 1/4 segment burns out after 16 minutes, light the other end of the 3/4 segment.

This remaining 1/4 segment will now burn in 8 minutes (half of 16 minutes, as both ends are burning).

In total, you have measured an additional time interval of 16 (first part of 128-minute rope) + 8 (second part of 128-minute rope) = 24 minutes.

Since you only need an additional 13 minutes, you can stop the burning process after 13 minutes instead of waiting for the full 24 minutes.

Finally, by adding the 48 minutes (from the 48-minute and 72-minute ropes) and the 13 minutes (from the 128-minute rope), you can measure a time interval of 61 minutes.”

So, those are definitely words!

Its method is invalid. As I clearly stated in the puzzle, the ropes burn at varying speeds. Burning half the length of a rope with burn time t does not take t/2, and the half that’s left does not have t/2 remaining.

(And while I didn’t explicitly state this, marking/cutting the ropes is against the spirit of the puzzle. If it was allowed, you could simply mark the 72 minute rope in 70 places, and count off 61 minutes that way.)
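To make the “irregular burn” point concrete, here is a toy simulation. It assumes a rope can be modeled as equal-length pieces that each take a different number of minutes to burn. The piece values are invented for illustration and are not part of the puzzle. Lighting both ends always takes half the total burn time, but burning half the length can take almost any amount of time.

```python
def burn_from_one_end(pieces, count):
    """Minutes for a single flame to cross the first `count` pieces."""
    return sum(pieces[:count])


def burn_both_ends(pieces):
    """Minutes until a rope lit at both ends is fully consumed.

    pieces[k] is the number of minutes one flame needs to cross piece k.
    """
    remaining = list(pieces)
    i, j = 0, len(remaining) - 1
    elapsed = 0.0
    while i < j:
        # Advance both flames until one of them finishes its current piece.
        dt = min(remaining[i], remaining[j])
        elapsed += dt
        remaining[i] -= dt
        remaining[j] -= dt
        if remaining[i] == 0:
            i += 1
        if remaining[j] == 0:
            j -= 1
    if i == j:
        # Both flames are inside the same piece, eating it from both sides.
        elapsed += remaining[i] / 2
    return elapsed


# Hypothetical 48-minute rope: six equal-length pieces, one very slow-burning.
rope = [1, 1, 2, 40, 3, 1]

print(sum(rope))                   # 48: minutes to burn the rope from one end
print(burn_from_one_end(rope, 3))  # 4: minutes to burn half the LENGTH
print(burn_both_ends(rope))        # 24.0: minutes with both ends lit
```

Half the length went up in 4 minutes, not 24, which is why folding and marking a rope tells you nothing about time. Only the both-ends trick gives a reliable t/2.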

Its “solution” is extremely complicated. I don’t even understand what it’s trying to do. It pointlessly burns two ropes to measure 48 minutes, when it already has a 48 minute rope just lying around (and how does measuring 48 minutes help it, anyway?)

By the time we get to the 128-minute rope, it’s clearly lost. Most of the subsequent steps just dig it into a deeper hole. Math errors start appearing (it says 16 minutes to burn 1/4 of the 128 minute rope). And then, it kind of gives up and says “measure 13 minutes using a wristwatch or something. My head hurts.”


Since the ropes burn irregularly, we can’t just cut them into proportionate lengths to measure time. However, we do know that if a rope is lit at both ends simultaneously, it will burn twice as fast, i.e., in half the time.

Let’s use the ropes as follows:

Light both ends of the 48-minute rope (will burn in 24 minutes) and one end of the 72-minute rope.

When the 48-minute rope is completely burnt (24 minutes elapsed), light the other end of the 72-minute rope.

The 72-minute rope will now burn from both ends and will be fully burnt in 24 more minutes (half of the remaining 48 minutes).

Therefore, you get a total of 24 (from step 1) + 24 (from step 3) + 13 (the remaining time you need to reach 61 minutes) = 61 minutes.

But how to measure the last 13 minutes?

Since we haven’t used the 128-minute rope yet, we can use it for this. Light both ends of the 128-minute rope (will burn in 64 minutes), and at the same time, light the other end of the 72-minute rope.

The 13-minute mark will fall somewhere between the 48-minute and 64-minute markers on the 128-minute rope, because it burns irregularly. Therefore, when the 72-minute rope is fully burnt, the 128-minute rope would have burned for 13 minutes, thus giving us a total of 61 minutes.

This is a rare case where June!GPT debatably gave better output.

It correctly understood that the ropes don’t burn at the same rate. It failed for basically the same reason as March!GPT, but at least it spent less time flailing around.

Yet, at the same time, it feels like GPT4 was trying harder, doesn’t it? And it doesn’t do anything as egregious as burning the same rope twice.

(I’m curious as to whether a skilled prompter could get GPT-4 to one-shot the rope burn puzzle. It feels like it should be able to figure it out. I’ve noticed that if you switch the order of ropes, it gets much closer to a correct answer.)

Maybe it’s a feed-forward issue, where the model immediately sets fire to rope A, without checking ahead.
