COAGULOPATH

I read Stephen King’s Lisey’s Story when I was young. I didn’t get much out of it. The incessant baby-talk (“smucking”, “bad gunky”) felt stickily tiresome, like wading through a saliva-splattered ballpit. The pacing was languid; the plot mushy and oversentimental. It felt like a personal work written for Tabitha Spruce King, with me as an outsider, unwanted and begrudged, sitting at their table and being resented for it.

I read it again now and enjoyed it more. It’s not as inaccessible as I thought. I can better see what King was trying to do. It takes themes he explored at thirty, and lets the (different, dimmer) light of seventy gleam over their cracks and hollows.

The plot basically combines Misery (one of his more successful books) with Rose Madder (one of his less successful). From Misery, we get the idea of a psychopathic fan who’s obsessed with a famous writer. The twist here is that the famous writer (Scott Landon) is already dead, and the stalker’s rage and entitlement settles on his bereaved widow, Lisey. That adds an interesting dynamic. In Misery, Paul Sheldon at least has some power. He’s the only one who can write the Misery Chastain romance stories his captor loves, so she’s forced to keep him alive. Lisey Landon, on the other hand, is not her husband. She’s just a person who shared a bed with him. Through the world’s eyes, she’s a person-shaped mirror, a window to her husband. Mirrors cannot create; only reflect. They also cannot die; only be smashed. This heightens Lisey’s victimhood: her husband’s fans and enemies grow obsessed with her, but never actually regard her as a person.

From Rose Madder, he takes the idea of a magickal dreamworld that can be accessed using chintzy artifacts. The otherworldly land of Booya Moon (which Scott introduced Lisey to while he was alive) is useful. Injuries heal swiftly. It might also be a good place to hide a dead man, or lose an unwanted living one. But it’s ultimately a dangerous place to be. This is because of what lives there: the long boy.

The long boy is one of King’s better inventions; one of his most direct forays (along with “N”) into Lovecraft-style horror.

It is not bound by the same rules as most of the things in Booya Moon. It can reach into the real world somehow (using glass surfaces and mirrors as portals). It has marked Scott as its prey, and has spent a long time searching for him. Occasionally he sees its face in glass, peering around and looking for him.

In the end Scott’s thing had come back for him, anyway—that thing he had sometimes glimpsed in mirrors and waterglasses, the thing with the vast piebald side. The long boy.

Long before we see it ourselves, we hear it, in a second-hand way. Scott knows the sound it makes, and imitates this for Lisey.

Scott says, “Listen, little Lisey. I’ll make how it sounds when it looks around.”
“Scott, no—you have to stop.”
He pays no attention. He draws in another of those screaming breaths, purses his wet red lips in a tight O, and makes a low, incredibly nasty chuffing noise. It drives a fine spray of blood up his clenched throat and into the sweltering air.

[..]

“I could . . . call it that way,” he whispers. “It would come. You’d be . . . rid of my . . . everlasting . . . quack.”
She understands that he means it, and for a moment (surely it is the power of his eyes) she believes it’s true. He will make the sound again, only a little louder, and in some other world the long boy, that lord of sleepless nights, will turn its unspeakable hungry head.

Later (or earlier, in a flashback), Scott is stranded in Booya Moon, and Lisey travels there to rescue him. Here, she briefly sees the long boy in the unflesh.

“Shhhh, Lisey,” Scott whispers. His lips are so close they tickle the cup of her ear. “For your life and mine, now you must be still.”
It’s Scott’s long boy. She doesn’t need him to tell her. For years she has sensed its presence at the back of her life, like something glimpsed in a mirror from the corner of the eye. Or, say, a nasty secret hidden in the cellar. Now the secret is out. In gaps between the trees to her left, sliding at what seems like express-train speed, is a great high river of meat. It is mostly smooth, but in places there are dark spots or craters that might be moles or even, she supposes (she does not want to suppose and cannot help it) skin cancers. Her mind starts to visualize some sort of gigantic worm, then freezes. The thing over there behind those trees is no worm, and whatever it is, it’s sentient, because she can feel it thinking. Its thoughts aren’t human, aren’t in the least comprehensible, but there is a terrible fascination in their very alienness . . .

“A great high river of meat” is a vivid phrase. Stephen King should consider writing more words. He can be quite good at them.

But she finally sees the long boy’s face—or mouth, at least—near the end.

Then there’s movement from her right, not far from where Dooley is thrashing about and trying to haul himself upward. It is vast movement. For a moment the dark and fearsomely sad thoughts which inhabit her mind grow even sadder and darker; Lisey thinks they will either kill her or drive her insane. Then they shift in a slightly different direction, and as they do, the thing over there just beyond the trees also shifts. There’s the complicated sound of breaking foliage, the snapping and tearing of trees and underbrush. Then, and suddenly, it’s there. Scott’s long boy. And she understands that once you have seen the long boy, past and future become only dreams. Once you have seen the long boy, there is only, oh dear Jesus, there is only a single moment of now drawn out like an agonizing note that never ends.
What she saw was an enormous plated side like cracked snakeskin. It came bulging through the trees, bending some and snapping others, seeming to pass right through a couple of the biggest. That was impossible, of course, but the impression never faded. There was no smell but there was an unpleasant sound, a chuffing, somehow gutty sound, and then its patchwork head appeared, taller than the trees and blotting out the sky. Lisey saw an eye, dead yet aware, black as wellwater and as wide as a sinkhole, peering through the foliage. She saw an opening in the meat of its vast questing blunt head and intuited that the things it took in through that vast straw of flesh did not precisely die but lived and screamed . . . lived and screamed . . . lived and screamed.
She herself could not scream. She was incapable of any noise at all. She took two steps backward, steps that felt weirdly calm to her. The spade, its silver bowl once more dripping with the blood of an insane man, fell from her fingers and landed on the path. She thought, It sees me . . . and my life will never truly be mine again. It won’t let it be mine.
For a moment it reared, a shapeless, endless thing with patches of hair growing in random clumps from its damp and heaving slicks of flesh, its great and dully avid eye upon her. The dying pink of the day and the waxing silver glow of moonlight lit the rest of what still lay snakelike in the shrubbery.

At the end of the book, the long boy becomes aware of Lisey Landon. She starts seeing it peering in mirrors, uncoiling muddily at the bottoms of glasses, just as Scott did. (Emphasis mine)

“Looks a little like dried blood,” Mike said, and finished his iced tea. The sun, hazy and hot, ran across the surface of his glass, and for a moment an eye seemed to peer out of it at Lisey. When he set it down, she had to restrain an urge to snatch it and hide it behind the plastic pitcher with the other one. […] They both laughed. Lisey thought hers sounded almost as natural as his. She didn’t look at his glass. She didn’t think about the long boy that was now her long boy. She thought about nothing but the long boy.

Like the madman stalking her throughout the story, perhaps the long boy has marked her as a substitute for her husband. The man I truly want is dead and gone…but in his place, you’ll do.

I wonder where King got the idea for the long boy?

Worms as symbols of corruption and decay are too common to be worth discussing at any length. A mindworm or mindsnake is a more specific image, though.

Yes, the brain kind of looks like a worm, coiled around and around inside the skull, slippery and wet. Perhaps the metaphor extends further. In the 60s, it was actually believed that planarian worms could encode memories in their bodies, and transfer them to new bodies. In the late fifties, James McConnell of UMich conducted experiments that appeared to show that memory transfer via cannibalism was possible in planarian flatworms.

Chop a worm into three pieces. All three pieces will regrow into new worms, and each of those worms will have the same brain, including (supposedly) the same memories. Do worms store memories outside their brains, somehow? DNA and RNA are fairly informationally dense (the haploid genome of a human being encodes roughly 720 MB of uncompressed data), and other chemicals and proteins can also encode things. This, as I understand, is fairly well-accepted science.
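The genome figure above is easy to sanity-check. Here is a minimal back-of-envelope sketch, assuming roughly 3.1 billion base pairs in the haploid human genome (a commonly quoted round number, not something taken from this post) and 2 bits per base, since there are four possible bases. The function name and constants are my own, for illustration only.

```python
# Back-of-envelope estimate of the raw information content of a haploid genome.
# Assumption: ~3.1 billion base pairs (a commonly quoted round figure).
BASE_PAIRS = 3_100_000_000
BITS_PER_BASE = 2  # four possible bases (A, C, G, T) -> log2(4) = 2 bits


def genome_size_bytes(base_pairs: int = BASE_PAIRS,
                      bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the uncompressed size in bytes at a fixed number of bits per base."""
    return base_pairs * bits_per_base / 8


size = genome_size_bytes()
# Report both decimal megabytes and binary mebibytes, since "MB" is ambiguous.
print(f"{size / 1e6:.0f} MB (decimal), {size / 2**20:.0f} MiB (binary)")
```

With these assumptions it prints 775 MB and 739 MiB, which is in the same ballpark as the number quoted above. The exact value depends on which base-pair count and which definition of a megabyte you pick.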

McConnell apparently figured out something weirder. He used a painful electric jolt to train worms to contract their bodies upon exposure to light. Then, he chopped them to pieces, fed the body parts to cannibalistic worms called Dugesia dorotocephala…and they contracted their bodies to light, too! Confirmation of this has been slow in coming; this is the type of science the replication crisis tragically stole from us.

(McConnell, by the way, has one of those all-timer Wikipedia pages. “McConnell was one of the targets of Theodore Kaczynski, the Unabomber. In 1985, he suffered hearing loss when a bomb, disguised as a manuscript, was opened at his house by his research assistant Nicklaus Suino.”)

King, at least, has been struck by the image of a wormlike thing that preys on the psyche and memories and trauma of its victims. Maybe there’s just something viscerally repellent about worms.

In any case, the long boy may not be a worm or a snake. The thing Lisey regards as such might just be an appendage. An adjunct to a large (perhaps vast) body, whose totality we do not see. Furthermore, it may not eat so much as capture: its victims might still live on.

Lisey closed her own. For a moment she saw that blunt head that wasn’t a head at all but only a maw, a straw, a funnel into blackness filled with endless swirling bad-gunky. In it she still heard Jim Dooley screaming, but the sound was now thin, and mixed with other screams.

I like the long boy. It will never be as famous as Pennywise or Randall Flagg or [Insert Thinly-Veiled Metaphor for Republican Politician Here] but that’s good. The enemy of darkness is the light, and no horror creature survives too much media exposure. As the century spins on, the long boy will retain its mystery.

(Also, what happens when you chop OpenWorm into three pieces of code?)

The Fate of GPT-4o

This post is speculation + crystal balling. A change might be coming.

OpenAI has spent six months rolling out updates to GPT-4o. These perform extremely well by human-preference metrics.

gpt-4o-2024-11-20, the latest endpoint, boasts a colossal 1360 Elo on Chatbot Arena, compared to the earliest GPT-4o, which scored a meager 1285. What does that mean? Under the standard Elo formula, a 75-point gap means blind human raters prefer gpt-4o-2024-11-20’s output about 61% of the time in head-to-head matchups with the earliest version.
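For anyone who wants to translate an Elo gap into a head-to-head preference rate, here is a minimal sketch of the standard Elo expected-score formula. I'm assuming Chatbot Arena's ratings sit on the conventional base-10, 400-point scale and I'm ignoring ties. The function name is mine, not anything from the leaderboard's code. The answer also depends on which opponent you compare against; this compares the newest endpoint against the earliest GPT-4o specifically.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the standard Elo formula.

    Assumes the conventional logistic curve with base 10 and a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the post: newest endpoint vs. the earliest GPT-4o.
newest, earliest = 1360, 1285
print(f"{elo_expected_score(newest, earliest):.1%}")
```

On those assumptions a 75-point gap comes out at about 60.6%. It would take a gap of roughly 150 points to reach a 70% preference rate.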

I believe this is the result of aggressive human preference-hacking on OpenAI’s part, not any real advances.

Control for style, and gpt-4o-2024-11-20 drops by fifty points. It remains at #1, but only because the other preference-hacked models at the top also drop a lot.

Claude 3.5 Sonnet gains points. So do most of the older GPT-4s.

Optimizing for human preference is not a wrong thing to do, per se. So long as humans use LLMs, what they like matters. An LLM that produced output in the form of Morse code being punched into your balls would suck to use, even if it was smart.

But this is exactly why you should be careful when using Chatbot Arena to make statements about model capabilities – the top of the chart is mainly determined by style and presentation.

Benchmarks tell a different story: gpt-4o’s abilities are declining.

https://github.com/openai/simple-evals

In six months, GPT-4o’s 0-shot MMLU score has fallen from 87.2 to 85.7, which is probably similar to what GPT-4 scored on release.

(to be clear, “GPT-4” doesn’t mean “an older GPT-4o” or “GPT-4 Turbo”, but “the original broke-ass GPT-4 from March 2023, with 8k context and no tools/search/vision and Sept 2021 training data”).

I am more concerned about the collapse of GPT-4o’s score on the GPQA benchmark, which fell from 53.1 to 46.0. This is a significant drop, particularly in light of the tendency for scores to rise as data contaminates the internet. (Claude 3.5 Sonnet scores 59.4, for comparison.)

Even this may be optimistic:

https://twitter.com/ArtificialAnlys/status/1859614633654616310

An independent test by Artificial Analysis (on the GPQA diamond subset) found that GPT-4o scored 39.00. They’ve downgraded the model to 71/100, or equal to GPT-4o mini (OpenAI’s free model) in capabilities.

Further benching here:

https://artificialanalysis.ai/providers/openai

Some of their findings complicate the picture I’ve just described (in particular, they have GPT-4o scoring a higher MMLU than OpenAI’s internal evals), but the bottom line is that the new gpt-4o-2024-11-20 is the worst of its line by nearly every metric they test, except for token generation speed.

Livebench

https://livebench.ai

GPT-4o’s scores appear to be either stagnant or regressing.

gpt-4o-2024-05-13 -> 53.98
gpt-4o-2024-08-06 -> 56.03
chatgpt-4o-latest-0903 -> 54.25
gpt-4o-2024-11-20 -> 52.83

Aider Bench

https://github.com/Aider-AI/aider-swe-bench

Stagnant or regressing.

gpt-4o-2024-05-13 -> 72.9%
gpt-4o-2024-08-06 -> 71.4%
chatgpt-4o-latest-0903 -> 72.2%
gpt-4o-2024-11-20 -> 71.4%
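To make "stagnant or regressing" concrete, here is a small sketch that computes how far each endpoint has moved from the first one on the Livebench and Aider lists above. The scores are copied from those lists. The helper name `change_from_first` is my own, and nothing here accounts for run-to-run noise in the benchmarks.

```python
# Scores copied from the Livebench and Aider lists above, oldest endpoint first.
livebench = {
    "gpt-4o-2024-05-13": 53.98,
    "gpt-4o-2024-08-06": 56.03,
    "chatgpt-4o-latest-0903": 54.25,
    "gpt-4o-2024-11-20": 52.83,
}
aider = {
    "gpt-4o-2024-05-13": 72.9,
    "gpt-4o-2024-08-06": 71.4,
    "chatgpt-4o-latest-0903": 72.2,
    "gpt-4o-2024-11-20": 71.4,
}


def change_from_first(scores: dict) -> dict:
    """Difference between each endpoint's score and the earliest endpoint's score."""
    first = next(iter(scores.values()))  # dicts preserve insertion order
    return {name: round(score - first, 2) for name, score in scores.items()}


for label, scores in (("Livebench", livebench), ("Aider", aider)):
    print(label)
    for name, delta in change_from_first(scores).items():
        print(f"  {name}: {delta:+.2f}")
```

The newest endpoint ends up 1.15 points below the first on Livebench and 1.5 points below on Aider. Those are small gaps, but they point the wrong way for a model that is supposedly improving.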

Personal benchmarks

It doesn’t hurt to have a personal benchmark or two, relating to your own weird corner of the world. Either you’ll have a way to evaluate AIs that escapes the Goodharting suffered by large benchmarks, or OpenAI starts fine-tuning AIs on your niche use case (in which case, mission fucking accomplished.)

I like to ask LLMs to list the levels in the 1997 PC game Claw (an obscure videogame).

Claude 3.5 Sonnet and Claude 3 Opus do great, getting about 80-90% of Claw’s levels correct.

GPT-4-0314 makes a reasonable attempt, getting about 50-75% right. Typically the first half of the game is fine, with the other levels being a mix of real and hallucinated.

(once, it listed “Wreckage” as a level in the game. That’s actually a custom level I helped make when I was 14-15. I found that weirdly moving: I’d found a shard of myself in the corpus.)

GPT-4o scores like ass: typically in the sub-50% range. It doesn’t even consistently nail how many levels are in the game. It correctly lists some levels but these are mostly out of order. It has strange fixed hallucinations. Over and over, it insists there’s a level called “Tawara Seaport”—which is a real-world port near the island of Kiribati. Not even a sensible hallucination given the context of the game.
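If you want to turn this kind of eyeballing into a number, something like the following would do. It's a rough sketch of one way to score a model's answer against a ground-truth list, not the procedure I actually used. The level names are deliberately made-up placeholders rather than Claw's real levels, and the function names and the choice of metrics are assumptions for illustration.

```python
def normalize(name: str) -> str:
    """Lowercase and drop a leading 'the' so 'The Docks' and 'docks' compare equal."""
    words = name.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def score_level_list(answer: list, truth: list) -> dict:
    """Score a model's list of levels against the true, ordered list."""
    truth_norm = [normalize(t) for t in truth]
    answer_norm = [normalize(a) for a in answer]
    truth_set = set(truth_norm)
    # Real levels the model named at all, regardless of position.
    found = {a for a in answer_norm if a in truth_set}
    # Names that match no real level: the hallucinations.
    hallucinated = [a for a in answer if normalize(a) not in truth_set]
    # Levels that appear at exactly the right position in the list.
    in_place = sum(1 for a, t in zip(answer_norm, truth_norm) if a == t)
    return {
        "recall": len(found) / len(truth_norm),
        "in_place": in_place / len(truth_norm),
        "hallucinated": hallucinated,
        "count_correct": len(answer) == len(truth),
    }


# Hypothetical placeholder data, NOT the actual level list of any game.
truth = ["Level A", "Level B", "Level C", "Level D"]
answer = ["Level A", "Level C", "Made-Up Harbor", "Level B"]
print(score_level_list(answer, truth))
```

Tracking recall, ordering, and hallucinations separately matters here, because a model can name most of the levels and still get the order and the level count wrong.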

Another prompt is “What is Ulio, in the context of Age of Empires II?”

GPT-4-0314 tells me it’s a piece of fan-made content, created by Ingo van Thiel. When I ask what year Ulio was made, it says “2002”. This is correct.

GPT-4o-2024-11-20 has no idea what I’m talking about.

To me, it looks like a lot of “deep knowledge” has vanished from the GPT-4 model. It’s now smaller and shallower and lighter, its mighty roots chipped away, its “old man strength” replaced with a cheap scaffold of (likely crappy) synthetic data.

What about creative writing? Is it better on creative writing?

Who the fuck knows. I don’t know how to measure that. Do you?

A notable attempt is EQBench, which uses Claude 3.5 as a judge to evaluate writing samples. gpt-4o-2024-11-20 is tied for first place. So that seems bullish.

https://eqbench.com/creative_writing.html

…but you’ll note that it’s tied with a 9B model, which makes me wonder about Claude 3.5 Sonnet’s judging.

https://eqbench.com/results/creative-writing-v2/gpt-4o-2024-11-20.txt

Most of these samples seem fairly mediocre to me. Uncreative, generic, packed with empty stylistic flourishes and pretentious “fine writing”.

The cockpit was a cacophony of dwindling lights and systems gasping their final breaths, a symphony of technological death rattles. Captain Elara Veyra sat in the command chair, her face illuminated by the sickly green glow of the emergency power indicator, which pulsed like the heartbeat of a dying creature. The Erebus Ascendant, once a proud envoy of humanity’s indomitable spirit, now drifted derelict and untethered in the silent abyss of interstellar void. The engines were cold, the life support systems faltering, and the ship’s AI had succumbed to cascading failures hours ago, leaving Elara alone with her thoughts, her resolve, and the unceasing hum of entropy.

A cacophony refers to sound: lights cannot form a cacophony. How can there be an “unceasing hum” in a “silent abyss”? How does a light gasp a final breath? What is this drizzling horseshit?

This is what people who don’t read imagine good writing to be. It’s exactly what you’d expect from a model preference-hacked on the taste of people who do not have taste.

ChatGPTese is creeping back in (a problem I thought they’d fixed). “Elara”… “once a proud envoy of humanity’s indomitable spirit”… “a testament to…” At least it doesn’t say “delve”.

Claude 3.5 Sonnet’s own efforts feel considerably more “alive”, thoughtful, and humanlike.

https://eqbench.com/results/creative-writing-v2/claude-3-5-sonnet-20241022.txt

(Note the small details of the thermal blanket and the origami bird in “The Last Transmission”. There’s nothing really like that in GPT-4o’s stories.)

So if GPT-4o is getting worse, what would that mean?

There are two options:

1) It’s unintentional. In this world, OpenAI is incompetent. They are dumpstering their model to win a leaderboard dick-measuring contest against DeepMind.

2) It’s intentional. In this world, a new, better model is coming, and GPT-4o is being “right-sized” for a new position in the OA product line.

Evidence for the latter is the fact that token-generation speed has increased, which indicates they’ve actively made the model smaller.

If this is the path we’re on, I predict that GPT-4o will become a free model soon. And behind the ChatGPT Plus paywall will be something else: Orion, GPT-5, or the full o1.

AI Art Turing Test

Scott Alexander created a Turing Test for AI generated artwork. Begin quote:

Here are fifty pictures. Some of them (not necessarily exactly half) were made by humans; the rest are AI-generated. Please guess which are which. Please don’t download them onto your computer, zoom in, or use tools besides the naked eye. Some hints:

I’ve tried to balance type of picture / theme , so it won’t be as easy as “everything that looks like digital art is AI”.

I’ve tried to crop some pictures of both types into unusual shapes, so it won’t be as easy as “everything that’s in DALL-E’s default aspect ratio is AI”.

At the end, it will ask you which picture you’re most confident is human, which picture you’re most confident is AI, and which picture was your favorite – so try to keep track of that throughout the exercise.

All the human pictures are by specific artists who deserve credit (and all the AI pictures are by specific prompters/AI art hobbyists who also deserve credit) but I obviously can’t do that here. I’ll include full attributions on the results post later.

I got 88% correct (44/50). Here’s my attempt (or rather my imperfect memory of my attempt), and my justification for the answers I gave.

Human. Stuff like the cherubs holding her skirts (one blindfolded, one a cyborg) read like the kind of deliberative creative choice that AI never makes. The birds and feathers and hair (typical pain points for AI) are all sharp and coherent.

Human. I haven’t seen an AI-generated image like this before. Tree branches overlap believably (instead of disappearing/multiplying/changing when they occlude each other, as in AI images).

Human. Has an AI feel, but the hair strands look correct, and the frills hang believably across her shoulders. Image is loaded with intricate, distinct-yet-similar details (waves/foam/fabric/clouds/birds) that never blend into mush.

AI. Blobby oven-mitt hands. Deformed foot. Nobody in the 18th/19th century would allow their daughter to be painted wearing such a scandalous dress.

Human. Coherent and symmetrical to the smallest detail. Note the dividers on the window.

Hard call. I went with AI because of the way the blackness of the woman’s hair abruptly changes intensity when the curving line bisects it. A random, unmotivated choice that a human wouldn’t make, but a machine might.

Human. Haven’t seen any AI images like this before. The sleeping disciples are sprawled in complex but believable ways.

Human. I went back and forth, but decided the red crosshatching pattern was too coherent to be AI.

AI. I recognize the model that created this: Dall-E 3, which has a grainy, harsh, oversaturated look. Other clues are the nonsensical steps to nowhere, the symmetry errors, and the kitsch colors of the gate, which detract from the sense of ancient grandeur. Nobody would spend so much time on the details of the stonework, only to make it look like it was built out of Lego blocks.

A hard one. I guessed AI because of the way the windows/chimneys of the houses appear to be slanted. No reason a human artist would do that. (I didn’t see the nonsensical “signature” in the bottom left corner; if I had, I would have guessed Human.)

Not hard. At least 10 separate details would make me guess “AI” at 90% confidence. The hairstrands are a melty incoherent mess. The hands have froglike webbing on them. Her skirt is excessively detailed and its metallic designs lack symmetry. Her earring floats in space. It contains confused stylistic choices. The girl’s face is a simplified anime design: does that agree with the ultra-detailed skirt and the near-photorealistic water? Also, where is the light source coming from?

AI. Not sure what made me think a human hadn’t made it. Maybe the lack of doors: how do you get in and out?

Human. Many specific choices, plus the text on the temple.

Heavy AI slop feel. Many occlusion errors. Deformed hands. Probably Dall-E 3 again.

I actually can’t remember how I guessed here but it’s pretty clearly AI. The face’s left eye is deformed in a random, unmotivated way. Occlusion errors. Filled with ugly, harsh artifacts. Seems like it was trying to write letters in the middle of the image and gave up.

Human. Mild slop aesthetic (pretty girl + shiny plastic skin + random sparkles/nonsense) plus heterochromia should point to AI, but the hairstrands are too coherent.

Human. Don’t know why I gave this answer. Was probably a guess.

AI. What’s the middle black rectangle in the house? It can’t be a door, because it doesn’t reach the ground. If it’s a window, why doesn’t it match the other two?

What makes this look human? I don’t know. Maybe it’s the stark, understated tree? AI would probably put dramatic explosion-esque leaves or flowers on it to match the sky.

A good example of the Midjourney slop aesthetic. It superficially invokes liturgic religious iconography, yet Mary wears a full face of Instathot makeup and has detailed veins on her hands. Very weird, inhuman choices. What’s the little nub of flesh poking between her right thumb and index finger?

Midjourney slop. Occlusion errors on the hairstrands. It tried to give her an earring but didn’t finish the job, leaving a strange malformed icicle hanging off her ear.

Human. This was a bit of a cheat because I’d seen the image before: it’s a render by Michael Stuart and was my desktop background for a while. If I hadn’t seen it, I would still have said “human” based on the complex, coherent rigging and ratlines.

AI. Mangled hands. Man wears two or three belts at once, and the holes don’t make sense. His loincloth tore in a weird way, giving him a Lara Croft-esque thigh holster strap. An interesting example of stylistic blend: it’s going for a Caravaggio-esque painting, but mistakenly puts all sorts of painterly details into the setting itself: notice how the man’s left hand appears to be holding a pen or a brush, and how the windowsill is a gilded painting frame.

AI. Looks like a Midjourney image from 2 years ago.

AI. Got it wrong. I thought “what are the odds that two such similar looking pictures back to back would both be by humans?”

AI. The cafe has seemingly hundreds of chairs and tables, some of which are overlapping or inside one another. Why are plants growing inside the window frames?

Guessed AI because of the numerous mistakes in the reflections. It wasn’t. Come on, man.

So sloppy that Oliver Twist is holding up a bowl and asking for more.

Human. I was rushing, and becoming careless. Seems obvious it’s AI in hindsight, when you see how messy the fruit is.

Human. The musculature of the man’s torso is consistent with how human bodies were portrayed by late-medieval artists (not all of them could pilfer corpses), but inconsistent with AI, which is mode-collapsed into anatomical accuracy (see the painting of the Blessed Virgin Mother above).

Human. It’s unlike AI attempts I’ve seen at this style. The signature looks real.

I think this is by Dall-E 3. It loves excessive pillars.

Human, but I don’t know what makes it human. I haven’t seen AI images like it.

If AI slop was a videogame, this picture would be the final boss.

Messy, but packed with deliberate human choices and intent. Characters interact with each other in complex ways. Branches and trees look correct.

AI. Reflection errors in the water. The tree roots on the left look wrong.

More slop. Surface of the landing craft is just random shit (and its feet are asymmetrical). Right astronaut has a weird proboscis sprouting from his helmet.

Human. Complex interactions between positive and negative space. The cutouts are chaotic yet have a congruent internal logic. I’ve never seen an AI image like it. Some of the cutouts have torn white edges—a human error.

Human. Difficult to say. I think AI images generally either have a clear subject or a clear focal point.

AI. Another hard one. What swayed me was the second door on the left-side building. It seems to exist off the edge of the land.

We have such slop to show you. Cute big-eyed robot + staring directly at viewer + meaningless graffiti sprays + meaningless planets and circles to take up space = Midjourney.

Concluding Thoughts

You’re in a desert walking along in the sand when all of a sudden you look down, and you see a tortoise, it’s crawling toward you. You reach down, you flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs, trying to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?
