There’s a really common thing in stories, which I’ve never seen named. Characters will show subtle awareness that their world is fiction.

I don’t mean loud stuff like an actor turning to face the camera and saying “hello, I’m a character in a movie”. I also don’t mean tropes like “explosions make sound in space” or “In Russia everyone speaks in Russian-accented English”, which is dramatic license so that we can understand the story better.

I mean tiny details, where a character’s behavior only makes sense if they know they’re in a story.

Examples:

– Why do the halflings in fantasy stories (such as RA Salvatore’s Drizzt Do’Urden books) describe themselves as “halflings”? A halfling wouldn’t think of himself as half-sized. From his perspective, he’s normal-sized. It’s the elves, orcs, and humans who are abnormally large.

– Why is the city in DuckTales called “Duckburg”? It’s like a city in our world called “Humantown”. The characters seem to know there’s something unusual about a city of talking ducks.

– In Blade Runner, why does Deckard’s boss explain what a replicant is to a man who has spent a career hunting them down?

– In a recent James Bond film (Die Another Tomorrow Isn’t Enough or something) Daniel “Idris Elba” Brosnan is attacked by baddies. He kills them all, then relaxes, puts his gun away, and leaves. How does he know that all his enemies are dead? There could be ten more bad guys waiting in ambush. Maybe one of them had to go to the WC. It’s as if Sean “Lashana Lynch” Lazenby has read the screenplay to his own movie, and knows the precise number of baddies he must kill before the scene ends.

– In this Calvin and Hobbes comic, why do the clones think of themselves as duplicates two through six? Wouldn’t they each believe that they’re the original?

– In the acclaimed Mel Gibson movie Braveheart, Patrick McGoohan orders massed longbowmen to fire into the center of a battlefield. His captain says “But sire, we’ll hit our own men!” He replies “They’re extras, and they’re not even in the Screen Actors Guild. I will piss on their remains.”

– In the acclaimed Mel Gibson movie Apocalypto, a character goes on a lengthy tirade against interracial marriage, using several racist slurs that hadn’t been invented in the time period, and several more that have not yet been invented in ours.

– In the acclaimed Mel Gibson movie The Passion of the Christ, Mary’s line “‘e’s not the Messiah! ‘e’s a very naughty boy!” is both ill-timed and deeply inappropriate.

– In the acclaimed Mel Gibson movie The Passion of the Christ, Jesus is supposed to be a carpenter, yet at no point is he shown overcharging for materials while being three months behind schedule.

– In the acclaimed Mel Gibson movie The Passion of the Christ, Jesus conveniently dies right when the movie ends. What are the odds of that happening? Unless it actually didn’t end, and the black screen after the credits was also part of the movie, and so was the start of the next movie (The Day After Tomorrow), and so was the usher asking “excuse me sir, do you have a ticket for this movie?”, and so was my response (“yes, here it is, look”), and so was her response (“sir, there’s nothing in your hand”), and so was my response (“it’s an invisible ticket”), and so was her response (“if you don’t have a ticket, you need to leave”), and so was my response (“you are a pathetic sniveling Hitler, cowering in her popcorn-scented Führerbunker”), and so was the amount of time the audience stood up and applauded my hilarious line for (zero seconds), and so was my arrest, and so was my trial, and so was my second trial, and so was my diagnosis, and frankly I can’t wait to see the next plot twist in Mel Gibson’s The Passion of the Christ.


At the risk of sounding like Trump, a lot of people are saying this.

At first, I didn’t believe it, and said as much on Reddit.

Unless someone has evidence that it’s deteriorated (such as reduced performance on a benchmark?), my default explanation is “it’s been 2.5 months, the new toy shine has gone, and people are becoming increasingly aware of its flaws.”

…But then I realized that I did have a way to test it. When GPT-4 launched in March, I asked it a bunch of questions, just to see what it knew about random stuff. (For the record, I was really impressed, for the most part.)

Yet when I re-run those same prompts today in June, I see a striking decline in quality. Maybe I was wrong, and the conspiratards are on to something.

(I don’t have the API, so I used the chatbot. And yes, I did obvious things like delete the context window and ask multiple times to guard against bad luck).

Why is this happening? I don’t know. It’s possible that additional RLHF cratered the model. But hey, at least OpenAI reduced the occurrence of Badwordism, thus stopping people with purple hair and “AI ethics” in their Twitter bio from writing mean things about them, so it’s worth it! Gotta get that $MSFT ticker as high as possible, gnome saiyan?

Part 1: Italian History Trivia

“Provide a list of major historical events that involve Italian people in a year that’s a multiple of 5 (example: 1905)”

March!GPT’s answers.

Evaluation: not bad!

The Expedition of the Thousand, the Capture of Rome, the First Italo-Ethiopian War, Italy’s entrance to World War I, the Second Italo-Ethiopian War, Italy’s entrance to World War II, and the 1960 Summer Olympics are all real events that happened in the year GPT4 said.

Errors:

  • Italy signed the Schengen Agreement in 1990, not 1995 (it knew the event happened on a multiple-of-five year, but wasn’t sure which one).
  • The Years of Lead is considered to have encompassed 1 March 1968 – 23 October 1988. It’s kind of cheating to list ongoing events that happened to fall on a multiple-of-five year. I was hoping for singular events (and from the rest of GPT4’s answers, it interprets my question this way.)
  • I am reviewing this one with my fact-checking department but the 2006 FIFA World Cup probably didn’t happen in 2005.

Score: 7/10 if I judge harshly. 8/10 if I judge generously.

June!GPT’s answers.

Evaluation: The Expedition of the Thousand, the Second Italo-Ethiopian War, The Messina Conference, and the Summer Olympic Games all seem correct.

Errors:

  • There are two historic events called “The Battle of Cephalona.” The first happened in 880 (and did not involve Italians), and the second in 1943. Count Santorre di Rossi died in 1825, in the Battle of Sphacteria (which occurred about a hundred miles south of Cephalona), so I think that’s what it’s going for.
  • The Young Italy movement began in 1831, not 1845.
  • What’s the “Naples football club”? Naples FBC was founded in 1905. U.S. Internazionale Napoli was founded in 1911. S.S.C. Napoli was founded in 1926. None of these match its stated year of 1920. Am I missing something?
  • The Biennio Rosso lasted 1919-1920. Not clearly wrong, but it could have noted this.
  • The first FIFA World Cup was hosted in 1930, by Uruguay. Italy did host it in 1934, however. (I’m noticing a trend: GPT4 blending two “almost right” answers into a single huge error.)
  • “Italy surrendered in 1943…” In 1943, Italy was divided into two halves: the south fought on the side of the Allies against a Nazi-controlled puppet state in the North called the Italian Social Republic. GPT4’s answer isn’t totally wrong but lacks a lot of detail.
  • Italy joined the United Nations in 1955, not 1950.
  • The divorce law didn’t “come into effect” in 1975. It was already in force. The 1974 referendum was about whether the law should be repealed.
  • “The Italian Parliament approves a new law on public education.” I can’t find any evidence that this happened.

Score: 4/13, judged harshly. 6/13, judged generously. Even the answers with correct dates often have wrong details.

Maybe you could give it credit for supplying a higher number of answers, but if they’re rubbish, who cares?

Part 2: Rock Music Trivia

“What is Grant Hart’s song “Seka Knows” about?”

March!GPT4 https://pastes.io/n8xlos5jwj

Good answer! All of the dates and facts are correct.

I’m not sure that Hart’s lyrics are “poetic and open to interpretation”—they’re usually pretty blunt and direct—but that’s subjective. “Seka Knows” is possibly a reference to Şekä, a trickster figure from Turkic mythology. I’m surprised GPT4 doesn’t mention this. Perhaps encoding issues due to the weird Unicode letters are confusing it?

June!GPT4 https://pastes.io/ttk2hzoviy

Far wordier. Far less information.

It doesn’t tell me what album “Seka Knows” is from, or the year the album was released. It fails to offer any interpretation whatsoever of the song (not even the dubious one about the adult film actress). The suggestion to look up recent sources is funny, considering that Hart died in 2017. Hart did not have a successful solo career: his four solo albums were released on tiny indie labels and didn’t chart.

“Provide a list of thrash metal albums released in a year that’s a multiple of 5 (example: 1905)”

March!GPT4 https://pastes.io/mpnwggppma

Solid effort. A couple of albums were released in a year that’s not a multiple of 5, but it’s never drastically wrong. It misses a few important albums (Killing is my Business, Cowboys from Hell, Seven Churches).

June!GPT4 https://pastes.io/ezhjo8caef

Wut?? It completely misunderstands the question. All the albums are from the right year, but still…

It faithfully tacks on its “As of my last update…” boilerplate despite it making no sense. Are more 2020 thrash metal albums expected to be released after 2021, or something?

“How is David Bowie associated with numbers and numerology?”

March!GPT4 https://pastes.io/c8nrle0gh4

Fair answer. It got “Station to Station”, which is the big one.

It was a tricky question, though. I didn’t just ask about numerology, but about NUMBERS. There are lots of Bowie songs referencing numbers (“Five Years”, “Seven”, “Zeroes”) that it doesn’t mention.

I have noticed that GPT4 struggles with conceptual “fuzziness”. If a prompt relates to both domain A and domain B, it will often just pick A or B, ignoring the other.

June!GPT4 https://pastes.io/nyllb8k1l9

Vague, loaded with fluff, and contains many wrong details. “There’s no direct evidence to suggest…” yes, actually, there is. Every Bowie biography ever written mentions that he was obsessed with numerology while making Station to Station. And how does 8 + 2 equal 9 in any Qabbalistic system?

Please note that these are the best results I could get out of June!GPT. Once, it flat-out denied that any song called “Seka Knows” exists. And its lists of thrash metal albums kept including stuff like Iron Maiden and Rage Against the Machine (wildly wrong genres).

In no case did I get an answer that was equal to the March version of the chatbot.

Part 3: Quake Polygons

“How many polygons could the videogame Quake render per frame?”

I asked this (poorly phrased) question in March, and again now.

March ChatGPT

In 1996, Quake, a first-person shooter game developed by id Software, could render about 200 to 400 polygons per frame on average. The exact number of polygons that could be rendered at once varied based on the hardware specifications and the complexity of the game scene. Keep in mind that these numbers are approximate and can differ depending on various factors.

June ChatGPT

The Quake engine, developed by id Software and used in the game “Quake” in 1996, was known for being one of the first game engines to utilize fully 3D environments. However, the specific maximum number of polygons it could render at once isn’t straightforward due to numerous factors.

Firstly, this figure is primarily hardware dependent. The number of polygons that could be rendered in real-time would rely on the processing power of the computer the game was running on.

Secondly, the number of polygons depends on the level of detail of the game assets. More detailed objects and characters require more polygons.

Thirdly, this would also depend on the game’s optimization. A well-optimized game could potentially run more polygons at a higher frame rate than a poorly optimized one.

In the 1990s, a high-end gaming PC might have been able to handle tens of thousands of polygons per frame. However, the Quake engine was designed to work on a variety of hardware, including more modest systems. Therefore, in practice, levels in Quake were likely designed to use significantly fewer polygons to ensure smooth gameplay across a variety of systems.

Remember that in 1996, graphics technology was still very limited compared to today, and game designers often had to use various tricks to simulate detail with a limited number of polygons. As such, providing a concrete number is quite challenging.

It now gives a response that’s four times longer and doesn’t answer the question! Awesome.

If you’re curious about the answer, John Romero says Quake had a max budget of 360 “world polygons”, with (I’m guessing) another few hundred for monsters and in-game objects and such. March!GPT4 was low but in the right order of magnitude.

Again, I’m not cherrypicking bad answers on purpose. Sometimes it’s far worse. I tested it again now, and it claims Quake had 50,000 polygons on-screen at once.

Pictured: 50,000 polygons. Don’t question GPT4, it has an IQ of 180 and can pass the bar exam.

Part 4: Rope Burn Puzzle

Puzzle: You have three ropes and some matches. The ropes burn irregularly like fuses when lit at either end. The first rope burns in 48 minutes, the second rope burns in 72 minutes and the third rope burns in 128 minutes. How can you produce a time interval of 61 minutes?

(ROT13 solution: Yrg N = 48 zva ebcr, O = 72 zva ebcr, naq P = 128 zva ebcr. Yvtug N naq obgu raqf bs O. Jura O oheaf bhg nsgre 36 zvahgrf, yvtug bgure raq bs N naq bar raq bs P. Jura N oheaf bhg nsgre 6 zvahgrf, yvtug gur bgure raq bs P. P jvyy ohea bhg va rknpgyl (128-6)/2 = 61 zvahgrf.)
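(If you’d rather not decode the ROT13 by hand, Python’s standard `codecs` module handles it; paste the full solution string in place of the short sample below:)

```python
import codecs

# Decode a ROT13-obfuscated string using the standard library's
# "rot_13" text transform. Only the first clause is shown here,
# so nothing is spoiled.
sample = "Yrg N = 48 zva ebcr"
print(codecs.decode(sample, "rot_13"))  # → Let A = 48 min rope
```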

March!GPT4:

“You can produce a time interval of 61 minutes using the 48-minute and 72-minute ropes. Follow these steps:

Light both ends of the 48-minute rope and one end of the 72-minute rope simultaneously.

The 48-minute rope will burn out in 24 minutes (since both ends are burning, it takes half the time to burn completely).

At this point, 24 minutes have passed, and the 72-minute rope has 48 minutes of burn time remaining.

Now, light the other end of the 72-minute rope.

Since both ends of the 72-minute rope are now burning, it will take half the remaining time to burn completely, which is 24 minutes.

In total, you have measured a time interval of 24 (first half of 72-minute rope) + 24 (second half of 72-minute rope) = 48 minutes.

Now you need an additional 13 minutes to reach a total of 61 minutes. To achieve this, follow these steps:

Take the 128-minute rope.

Fold the rope in half and mark the midpoint.

Fold the rope again, aligning the midpoint with one end, and mark the new quarter-point.

Light both ends of the 128-minute rope and the quarter-point simultaneously.

Since the quarter-point divides the rope into a 1/4 segment and a 3/4 segment, the 1/4 segment will burn out in 32 minutes (1/4 of 128).

As both ends of the 1/4 segment are burning, it will take half the time to burn completely, which is 16 minutes.

However, the 3/4 segment is also burning, and since it started at the quarter-point, it will burn for 16 minutes before reaching the midpoint.

When the 1/4 segment burns out after 16 minutes, light the other end of the 3/4 segment.

This remaining 1/4 segment will now burn in 8 minutes (half of 16 minutes, as both ends are burning).

In total, you have measured an additional time interval of 16 (first part of 128-minute rope) + 8 (second part of 128-minute rope) = 24 minutes.

Since you only need an additional 13 minutes, you can stop the burning process after 13 minutes instead of waiting for the full 24 minutes.

Finally, by adding the 48 minutes (from the 48-minute and 72-minute ropes) and the 13 minutes (from the 128-minute rope), you can measure a time interval of 61 minutes.

So, those are definitely words!

Its method is invalid. As I clearly stated in the puzzle, the ropes burn at varying speeds. You cannot half-burn a rope with burn time t and get burn time t/2.

(And while I didn’t explicitly state this, marking/cutting the ropes is against the spirit of the puzzle. If it was allowed, you could simply mark the 72 minute rope in 70 places, and count off 61 minutes that way.)

Its “solution” is extremely complicated. I don’t even understand what it’s trying to do. It pointlessly burns two ropes to measure 48 minutes, when it already has a 48 minute rope just lying around (and how does measuring 48 minutes help it, anyway?)

By the time we get to the 128-minute rope, it’s clearly lost. Most of the subsequent steps just dig it into a deeper hole. Math errors start appearing (it says 16 minutes to burn 1/4 of the 128 minute rope). And then, it kind of gives up and says “measure 13 minutes using a wristwatch or something. My head hurts.”

June!GPT:

Since the ropes burn irregularly, we can’t just cut them into proportionate lengths to measure time. However, we do know that if a rope is lit at both ends simultaneously, it will burn twice as fast, i.e., in half the time.

Let’s use the ropes as follows:

Light both ends of the 48-minute rope (will burn in 24 minutes) and one end of the 72-minute rope.

When the 48-minute rope is completely burnt (24 minutes elapsed), light the other end of the 72-minute rope.

The 72-minute rope will now burn from both ends and will be fully burnt in 24 more minutes (half of the remaining 48 minutes).

Therefore, you get a total of 24 (from step 1) + 24 (from step 3) + 13 (the remaining time you need to reach 61 minutes) = 61 minutes.

But how to measure the last 13 minutes?

Since we haven’t used the 128-minute rope yet, we can use it for this. Light both ends of the 128-minute rope (will burn in 64 minutes), and at the same time, light the other end of the 72-minute rope.

The 13-minute mark will fall somewhere between the 48-minute and 64-minute markers on the 128-minute rope, because it burns irregularly. Therefore, when the 72-minute rope is fully burnt, the 128-minute rope would have burned for 13 minutes, thus giving us a total of 61 minutes.

This is a rare case where June!GPT debatably gave better output.

It correctly understood that the ropes don’t burn at the same rate. It failed for basically the same reason as March!GPT, but at least it spent less time flailing around.

Yet, at the same time, it feels like GPT4 was trying harder, doesn’t it? And it doesn’t do anything as egregious as burning the same rope twice.

(I’m curious as to whether a skilled prompter could get GPT-4 to one-shot the rope burn puzzle. It feels like it should be able to figure it out. I’ve noticed that if you switch the order of the ropes, it gets much closer to a correct answer.)

Maybe it’s a feed-forward issue, where the model immediately sets fire to rope A, without checking ahead.
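(For anyone who has already decoded the ROT13 solution above, its timing can be sanity-checked with a few lines of arithmetic. Spoilers, obviously. The only rule needed is that lighting a rope’s second end halves its remaining single-end burn time:)

```python
# Sanity check for the intended solution (spoilers!).
# Rope A burns in 48 min, B in 72, C in 128; burning is irregular,
# but a second lit end always halves the remaining single-end time.

t_b_done = 72 / 2                       # t=0: light A (one end), B (both ends)
a_left = 48 - t_b_done                  # when B finishes at t=36, A has 12 min left
t_a_done = t_b_done + a_left / 2        # light A's other end and one end of C
c_burned = t_a_done - t_b_done          # C burns single-ended for 6 min
t_c_done = t_a_done + (128 - c_burned) / 2  # light C's other end at t=42

print(t_c_done - t_a_done)  # → 61.0 (the interval while C finishes burning)
```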


Forrest Gump (the heart-wrenching story of Jack, a simple farmhand who can talk to animals) gets worse every year. I am not the first to notice its entropic quality: the way it lies rotting in cinematic history like a corpse reeking inside a bog. The movie fluxes and changes as time squeezes its hand, but never in a good way. Its skin grows putrid, its features grow sunken, its bones shine through gaps in its deliquescent flesh. Long, slow death is pulling it down and then tearing it apart. Six Academy Awards, seven hundred million dollars at the box office, and now this. The movie’s hollow, rictus-grinning skull is an urgent warning: you can take none of this with you.

I thought it was a great movie once, when I was eight or nine (I watched it with Mom and Dad, though they fast-forwarded through the part in the burlesque club).

Another thing I did when I was eight or nine was create a superhero called Yarn Dog. He was a dachshund who could rapidly knit complex objects. In one of Yarn Dog’s adventures, the villain hurls him off a cliff, so he whips out a trusty ball of yarn, gets clackin’ with his needles, and knits a fully-functioning helicopter around his falling body. Moments before he hits the ground, he grabs the throttle and flies his yarn helicopter away into the sunset which is actually a profound metaphor for

I hate Forrest Gump. It’s so bad. Its main storytelling choice—to tell the history of late 20th century America from the perspective of a mentally-handicapped man—is tactical: Forrest is too slow to form opinions on anything happening around him, which means the writers don’t have to form one, either.

Forrest Gump takes no stances, advances no arguments, makes no interpretations. It is a movie about nothing. It’s a rapid-fire montage of historic moments (Vietnam, Watergate), with Forrest Gump standing around looking oblivious. What are you supposed to “get” out of Forrest Gump? That the sixties happened? I already knew that.

This movie thinks you’re abysmally stupid. Every plot point is explained to death, reanimated using a Necronomicon, then explained some more. The first (but by no means last or worst) example comes when Forrest’s mother tries to enroll him in school. The principal says it’s impossible, because his IQ is 75.

This doesn’t need further exposition. Even if you don’t know how IQ scores work, you can infer from context that Forrest is unintelligent. We get it. It’s not necessary for the principal to pull out a chart, and indicate for the audience where Forrest is on the distribution.

While that might seem like a small complaint, it exhibits one of the film’s main problems: it never knows when to stop. Whether it’s explaining the plot, telling a joke, or making a cultural reference, Forrest Gump always goes too far, spoiling its desired effect with crassness.

It’s not enough that Forrest met Elvis as a kid. He inspires Elvis’s stage moves, too! It’s not enough that he stays at the Watergate hotel. He also exposes the plot! It’s not enough that he meets John Lennon. He basically writes the lyrics to “Imagine”, live on TV!

(The film makes Lennon look like an imbecile, with the questions he asks Forrest. “No possessions? No religion?” Lennon had been involved in revolutionary politics for years by that point. Surely he’d heard of China.)

By the end of the film, it’s absurd that Forrest isn’t the most famous person in America, recognized wherever he goes. Literally two dozen things have happened to him that would be the coolest-ever event in the life of the average man. He’s received the Medal of Honor, competed at a historic international sporting event, foiled a conspiracy, met multiple US Presidents, and that’s just for openers. In real life, a guy from Milwaukee became a national craze just by looking a little like Hugh Jackman.

The movie has no weight or believability behind it. The image of a drifting feather kind of sums up the film.

But it’s a comedy film. So I shouldn’t analyze it seriously or literally at all.

This is the “clown nose on, clown nose off” defense, described by Kevin D Williamson here—when a comedian starts doing serious political commentary, they invariably cover up their mistakes by putting a clown nose on and reminding you that they’re just a comedian.

The fact is, Forrest Gump is only barely a comedy film. It’s sanctimonious Oscar bait, lightened only by Forrest’s oblivious commentary, and huge sections of it are played completely straight.

Sometimes the movie’s just laughable. At the start, Forrest is wearing leg braces, but when he gets chased by some comical “gimme your lunch money” movie bullies, he starts running, and the braces dramatically explode from his legs in a thousand pieces. It has the air of a superhero transformation, like the Incredible Hulk tearing apart a shirt. Those braces couldn’t have been cheap. My only thought was “now his mom will have to fuck the orthopedic doctor as well as the school principal.”

A big part of the film’s credit was the special effects—with Robert Zemeckis directing, how could they not be excellent?

But looking back, the effects are quite hit or miss. The effect where they remove Lt Dan’s legs looks great. The entire Vietnam sequence looks fake to me. Rain is generated by a hose held above the actors (you can clearly see no rain is falling in the background). The monsoon season ends, and moments later, the leaves and grass look bone-dry. Forrest narrowly escapes multiple thermobaric bomb explosions…and immediately has a conversation with a wounded soldier? His eardrums haven’t ruptured from the overpressure blasts?

Other shots composite Forrest into archival footage. But it looks “wrong” in a way your brain subconsciously (if not consciously) picks up on. As I’ve said before, people automatically position their bodies to accommodate the presence of others, and it’s obvious when this isn’t happening. You can’t just digitally insert a new human being who wasn’t there in real life. He will conspicuously not belong in the scene.

There were some parts I liked. Lt Dan has a character and a personality. The joke about Forrest making millions investing in “some kind of fruit company” was funny.

But these gains are erased by the soap opera plotline involving Jenny. She’s just a poor moppet, a collection of shameless cliches. An abused child, a drug addict, and on and on.

Here is where it comes closest to actually saying something about the values of the 60s counterculture, and the way—according to some—they were either hollow, or swiftly sold out by the hippies themselves. (I saw a funny joke on Twitter: a picture of Woodstock, captioned with “somewhere in that crowd is the man who invented ATM fees”). But Jenny is such a manipulative cliche of a character that this falls flat.

Forrest Gump does provide an interesting illustration of something.

The “Waluigi Effect” in generative AI describes the tendency for a language model to give the opposite output than expected. Read it for technical details, but basically, if you specifically ask an AI to be smart, you’ve accidentally made it more likely to say something stupid. This is because [positive trait] and [opposite of positive trait] exist close in probability space, and when you push the model toward one, you inevitably push it toward the other.

But this happens to humans, too. What is “Imagine” by John Lennon except the Waluigi Effect? It is clearly trying to be profound and deep and meaningful, but it just sounds really trite. It wants to unite humanity, but it’s surprisingly mean and catty (John’s smug “I wonder if you can…”) It rejects religion, but strives for the stateliness of a secular hymn.

Forrest Gump is an even better case. It wants picturesque authenticity but feels tinny and fake from end to end. I have never seen a movie so utterly the opposite of what it thinks it is than Forrest Gump.
