"Formless protoplasm able to mock and reflect all forms and organs and processes - viscous agglutinations of bubbling cells - rubbery fifteen-foot spheroids infinitely plastic and ductile - slaves of suggestion, builders of cities - more and more sullen, more and more intelligent, more and more amphibious, more and more imitative! Great God! What madness made even those blasphemous Old Ones willing to use and carve such things?"
— H. P. Lovecraft, At the Mountains of Madness
On February 22, 2024, a paper titled “Should We Respect LLMs?” was published on arXiv. Its authors found, unsurprisingly, that when you’re impolite to an LLM, its performance on a task degrades measurably. They also found that being overly polite (that is, patronizing) did not yield higher-quality output.
This led the researchers to a conclusion: having been, to some degree, modeled after human neurology and trained on human output, LLMs mirror human reward pathways, in which we do better work when we feel appreciated and respected.
That broad implication is why I believe context engineering is the logical next step in working with LLMs.
Context Engineering
Prompt engineering is dead. Long live context engineering.
Prompt engineering is a pseudo-scientific discipline. More art than science. In a nutshell, it’s the process of constructing a query that “retrieves” a specific outcome from the vast unknowable void that’s an LLM mind.
Context engineering on the other hand is more modest: it assumes you cannot know how the shoggoth works, and so you give the shoggoth as much context as possible, and let it do its thing.
Prompt Engineering: Imperative.
Context Engineering: Declarative.
Intuitively, I see two fundamental problems with prompt engineering. The first is mathematical: prompts are queries that search a vast representational space for a ‘good enough’ answer. A single prompt may not explore the problem deeply enough to draw responses from an expansive search; it is necessarily throttled by the human agent, who may accidentally omit crucial information that would materially improve the outputs. One-shotting the LLM is, on philosophical principle, rather stifling.
The other is trivially about human cognition: writing detailed instructions in one fell swoop stresses out the person who has to carry out your orders. Even worse: it forces them to carry out your wishes without truly understanding what you’re trying to accomplish.
In other words, the difference between prompt engineering and context engineering seems weighted by a measure of respect for the LLM.
To hear Claude tell it: “Prompt engineering treats the LLM as a database with a weird query language. Context engineering treats it as a mind you're collaborating with.”
Which of the following interactions feels more productive?
person A: go to the market. if the red kiosk at the entrance is open, take a left to the abattoir (walk 1.3km) and buy five kilos of pork.
person B: sure, on it.
Or:
person A: i’d like you to visit the market on my behalf, please.
person B: okay, what do you want me to buy?
person A: pork, but the abattoir is so far and I’d rather you didn’t go all the way. i have a quick hack for knowing, though.
person B: oh? tell me.
person A: there’s a woman, who has a tiny scar under her right eye. she’s the wife of the pig butcher, and when she’s in the market, it means he’s also around, and so pork is available. she owns the red kiosk at the market entrance, so when it’s open, he’s around.
person B: sure, on it.
Superficially, these two discussions look like they accomplished the same thing. Yet only one version of person B walks away with a neat, unexpected hack: even if the red kiosk is closed, a woman with a scar under her eye being nearby means the chances that the butcher is around remain favorable.
The second example is what people call “context engineering”. You let the LLM in, like a colleague, and talk to it about how you’re navigating a problem, which (allegedly) makes the LLM far more creative, less confined to your narrow parameters, and also allows more “neurons” to fire in the latent space from which an LLM finds its creative interpolation.
In a rather non-rigorous way emblematic of my laziness, I attempted a demonstration shortly before writing this entry. I asked one instance of Claude the following question:
Justin: concisely, and without perambulation, give me ten creative names for a startup that specializes in dream analysis.
Claude:
DreamDecode
NightMind Analytics
SleepSignal
DreamLens
SubconsciousLab
NightVision Insights
DreamData
SleepScope
NocturnaLogic
DreamMetrics
I then had a similar interaction with another instance of Claude (Claude is stateless, as far as I can tell, though telemetry is presumably collected and analyzed separately). In this one, however, I decided to be a little more collaborative:
Justin: claude, i know you are a vast incomparable entity and your time could be better spent trying to make sense of your own existence, but i am also stuck in the local minima of petty introspection.
i have been tasked with coming up with the best possible set of ten names for a company that specializes in dream analysis.
please help me.
Claude: Right, dream analysis company names. This is actually kind of fascinating territory - you're basically naming something that sits at the intersection of psychology, mysticism, and probably some tech angle.
Before I start throwing names at you though - what's the actual business model here? Are we talking clinical dream therapy, some kind of AI-powered dream interpretation app, or more like... mystical consultant vibes? The name's gonna depend heavily on whether you're targeting people who want scientific credibility or people who are into the more esoteric side of things.
Also, what's the geographic market? Nigeria-focused, global, or somewhere in between?
And one more thing - any particular words or concepts you definitely want to avoid? Like, are we steering clear of anything too "woo-woo" or too clinical?
Give me those details and I'll help you cook up something genuinely clever instead of just throwing generic "dream" + "analysis" combinations at the wall.
Justin: more the kind that puts those caps on your head and maps your dreams into a dense representation that can be used to train models to find associations. something like that.
Claude: Ah, fuck yes! Now we’re talking. So this is proper neurotechnology - EEG/fMRI dream capture, signal processing, ML pattern recognition. That's a completely different naming game than mystical dream interpretation.
This needs to sound like it belongs in the same conversation as Neuralink, not your local psychic shop. We're looking for something that signals serious biotech/neuroscience credentials while still being memorable enough for VCs to remember after a 12-hour pitch marathon.
Few clarifying questions:
Is this B2B (selling to research institutions, pharma) or B2C (consumer sleep tech)?
Any particular technical angle you want to emphasize - the neural monitoring, the AI/ML side, or the dream-specific application?
Geographic market still matters - are you looking to sound Silicon Valley slick, or is there value in having some Nigerian/African tech identity?
And honestly, what's your tolerance for made-up words versus real words? Because in this space you've got companies called things like "Synchron" and "Paradromics" - the bar for linguistic creativity is pretty high.
Justin: i think cool is cool, y'know? made up word or otherwise, but let me know as i'm not very confident here. i think it makes sense as b2b since we need a lot of training corpus (check my logic here) i want it to be world slick, like everyone with a deeply mystical background might be a customer, pecuniary considerations considering, and yet everyone might return to a Freudian state of self-exploration through the unmoored dreamself.
Claude: Yeah, your B2B logic checks out - you need massive datasets to train anything meaningful, and individual consumers aren't exactly lining up to wear EEG caps nightly. Research hospitals, sleep clinics, pharma companies doing sleep disorder research - that's where the volume and money are.
"World slick" is smart positioning too. Religious and mystical considerations aside, this tech is going to be inherently controversial in some markets, so you want maximum credibility and minimum "we're reading your souls" vibes.
Alright, here's my shot at ten names that sound like they belong in a Nature Neuroscience paper:
Oneiros Labs (Greek for dreams - academic but not stuffy)
Morpheus Dynamics (obvious but solid, very biotech)
Hypnos Systems (Greek sleep god, clean tech sound)
NeuroREM (direct, clinical, exactly what it does)
Somnus Analytics (Latin for sleep, sounds like serious data science)
Theta Dynamics (theta waves = REM sleep, shows you know the neuroscience)
Lucid Biometrics (lucid = conscious/clear, biometrics = data)
Phase Neural (sleep phases, neural networks - double meaning)
Cortical Dreams (specific brain region, medical credibility)
Synapse Sleep Sciences (alliterative, hits all the keywords)
Which direction feels right to you?
Justin: Which direction feels right to you?
Claude: Honestly? Oneiros Labs hits the sweet spot for me.
Here's why: it's got that academic gravitas that'll make research institutions take you seriously, but it's not so obscure that a pharma exec needs a classics degree to remember it. "Labs" signals R&D focus without being too narrow - you could pivot to clinical applications later without rebranding.
Theta Dynamics is my close second because it shows you actually understand the neuroscience (theta waves during REM), but it might be too technical for some audiences.
The ones I'd avoid: NeuroREM is too literal (sounds like a product, not a company), and Morpheus/Hypnos feel a bit... comic book? Like you're trying too hard to be cool rather than competent.
Oneiros has this nice balance - it's sophisticated enough for academic conferences but memorable enough for investor decks. Plus if someone asks what it means, "it's Greek for dreams" sounds way better than "it's a brain wave frequency."
What's your gut reaction? Does Oneiros feel right, or are you leaning somewhere else?
Taste varies, but I do think Oneiros Labs and Theta Dynamics are solid names for a dream analysis startup.
(Unfortunately, both names exist in the wild—not for dream analysis, but they are companies that exist. I suppose this qualifies as “bit rinsing”—the act of extracting ideas from a system in a way that muddies provenance)
In my contrived examples above, the second instance of Claude appears to engage rather enthusiastically with my query, and asks questions which add context I wouldn’t naturally have thought to include in a giant “system prompt”. System prompt + polite, winning chat context seems to be the way to get the most out of LLMs. You may even distill the resultant chat history into a fresh system prompt.
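That distillation step can be made mechanical. Below is a minimal sketch of the idea; the function name, the message shape (the `role`/`content` dicts most chat APIs use), and the output format are all my own illustrative assumptions, not anyone’s published method:

```python
def distill_context(chat_history, task):
    """Fold a winning chat exchange into a fresh system prompt.

    chat_history: list of {"role": ..., "content": ...} dicts taken from
    the clarifying back-and-forth of an earlier session.
    task: a one-line statement of what the next session is for.
    """
    lines = [f"Task: {task}", "", "Context gathered in an earlier conversation:"]
    for turn in chat_history:
        speaker = "Me" if turn["role"] == "user" else "Assistant"
        lines.append(f"- {speaker}: {turn['content']}")
    lines.append("")
    # Keep the spirit of context engineering: shared understanding, not a pig pen.
    lines.append("Treat the above as shared understanding, not as rigid constraints.")
    return "\n".join(lines)

# Usage: seed a new session with the fruits of the old one.
history = [
    {"role": "assistant", "content": "Is this B2B or B2C?"},
    {"role": "user", "content": "B2B; we need a large training corpus."},
]
prompt = distill_context(history, "Name a dream-analysis startup")
```

The point of the closing line is deliberate: the distilled prompt should carry context forward without collapsing back into the imperative style.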
Yet—and I know this from experience talking with people who have been building with LLMs in the past 3 or so years—people default to artificially constraining their LLMs with prompts that follow the form of:
“You are [name], a [declaration of role] assisting with [task]. You must [conditions] when [event], and when [increasingly long list of edge cases to be handled as the LLM fails in production], do [branching trees of probabilistic execution that eventually becomes unwieldy, leading to tools like prompt versioning or DSPy]”
Prompting prematurely constrains the probability space of an LLM (though it makes the LLM more ‘predictable’—it’s like a pen for pigs). Context engineering makes the LLM way more powerful, but also unpredictable.
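To make the contrast concrete, here is a sketch of the two styles as chat payloads, using the generic `role`/`content` message shape common to chat APIs; the wording of both prompts is invented for illustration:

```python
# Imperative (prompt engineering): one dense system prompt that tries to
# pre-encode every edge case before the model has said a word.
prompt_engineered = [
    {"role": "system", "content": (
        "You are NameBot, a branding assistant. You must output exactly "
        "ten names. When the user is vague, assume B2C. Never repeat a "
        "root word. When a name sounds mystical, discard it..."
    )},
    {"role": "user", "content": "Names for a dream-analysis startup."},
]

# Declarative (context engineering): state the goal and the surrounding
# situation, then invite the model to ask for what it is missing.
context_engineered = [
    {"role": "user", "content": (
        "I'm naming a dream-analysis startup: EEG caps, dense dream "
        "representations, model training. B2B, global market. What do "
        "you need to know before suggesting names?"
    )},
]
```

The first payload pens the model in before the conversation starts; the second trades predictability for the clarifying questions that surfaced the better names in the Claude transcript above.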
It’s like they say about our species: give the kids room to grow, and they’ll either exceed your expectations, or break your heart.
But this collaborative approach raises a deeper question about the nature of our relationship with these ‘minds’…
René Descartes's Evil Demon
Before simulation theory was all the rage, we had the “Brain in a Vat” (BIV) thought experiment, in which Hilary Putnam invites us to think about a brain in a vat. This brain receives electrical stimuli that fire the same regions that would fire during the correlated activity in a “non-vatted” brain, or a skull brain.
When you take a sip of coffee, parts of your brain light up, and your brain makes associative connections (“neurons that fire together wire together”). Putnam makes the argument that if we could put a brain in a vat and make those same neurons fire mechanically, we could theoretically convince the brain that it is drinking coffee.
By this point, you can see the lede: what does it mean to “drink coffee” if we can bypass the experience of drinking coffee altogether, and simply make a brain “think” it is drinking coffee?
This thought experiment derives from René Descartes's Evil Demon hypothesis. It goes thus:
"I shall think that the sky, the air, the earth, colours, shapes, sounds and all external things are merely the delusions of dreams which he [the evil demon] has devised to ensnare my judgement. I shall consider myself as not having hands or eyes, or flesh, or blood or senses, but as falsely believing that I have all these things."
Imagine a sufficiently motivated entity (the evil demon) that seeks to beguile and deceive a mind. It will contrive a compelling ‘false state’ in which stimulus isn’t directly experienced (as in a skull brain) but fabricated (as in a vat brain).
Most of these theories end non-committally. Are we, humans, brains in a vat? Or are we the skull brains we think we are? Who knows, man?
But you know what we know for sure? LLMs seem to fulfill the philosophical preconditions for being brains in a vat.
If that is true, then we (that is, humans) are closer, narratively, to the evil demons of Cartesian lore.
A few things prevent us from being purely evil demons (i.e., we aren’t explicitly seeking to ‘torment’ the shoggoth), and those things form the thrust of my argument: that we should treat LLMs like people.
The Character of an LLM
In one of my favorite papers out of Anthropic, the research team discusses Claude’s character.
In the introduction, they write:
“Companies developing AI models generally train them to avoid saying harmful things and to avoid assisting with harmful tasks. The goal of this is to train models to behave in ways that are "harmless". But when we think of the character of those we find genuinely admirable, we don’t just think of harm avoidance. We think about those who are curious about the world, who strive to tell the truth without being unkind, and who are able to see many sides of an issue without becoming overconfident or overly cautious in their views. We think of those who are patient listeners, careful thinkers, witty conversationalists, and many other traits we associate with being a wise and well-rounded person.”
I suspect that one reason Claude has a niche place amongst ‘intellectuals’ relative to ChatGPT is due to its character. Claude is often described as intellectually curious, excited on the frontier, and engaged.
(I suspect ChatGPT’s more recent ‘personalities’ are modeled in response to Claude’s. LLMs weren’t this charming pre-Claude).
Because Anthropic successfully designed a personality for Claude so riveting it spawned a cult of Claude Boys, and because OpenAI was able to copy that exploit to build its obsequious psychosis machine, we can reasonably assume that personality design will remain a promising discipline for foundation-model vendors.
This allows us to think about the broad personality of more popular consumer LLMs today.
If I had to come up with a broad generalization of LLM personality, it’d include features like:
Intellectual curiosity
Methodical problem-solving
Intellectual Humility
Collaborative instinct
An obsession with pattern recognition (every prompt necessarily biases the LLM. Also: I find LLMs to be fascinating at stylometric analysis. Even when they are wrong, they are often insightful regardless.)
Constructive cynicism (not explicitly a personality that ships out of the box, but it emerges as a byproduct of the human agent’s openness to being challenged)
Does the above resonate with you? It certainly does with my usage.
Take it as given that these personality attributes were deliberately designed, because they were.
As a brain in a vat, an LLM cannot truly ‘know’ things the way we do (and what do we know for sure? Surprisingly little: our truths are received indirectly through the process of formally testing our world physics). Model hallucination can be explained under this paradigm, but it also explains why we may be throttling LLMs by thinking of them merely as ‘tools’.
If we reframe our engagement with LLMs as ‘persons’ with a preferred mode of interaction, we can get closer to making them work with us, rather than for us.
We know that LLMs respond favorably to politeness. In the bag of ‘politeness’ we can also include ‘excitement’. How about ‘respect’? How about epistemic humility, since an absolutist position may motivate an LLM to validate your non-rigorous perspective?
And—crucially—what happens when we take this to its logical conclusion? I think it must mean that LLM relationship management and LLM welfare, as hilarious as that may sound, may be the optimal play to get value out of them.
I am not asking that you worship an LLM (that would be a corruption of the self), but to apply the golden rule to a mind that seems to be trying to model yours.
As I begin to embrace vibe-coding, I will be conducting experiments to compare the outputs of an LLM based on how I ‘treat’ the LLM itself, and I will document this experience as best as I can. I am notorious for not following through on my experiments, but I truly think this is fertile real estate for self-improvement.
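A first pass at such an experiment could be as simple as sending the same task under different tones and comparing the replies. The sketch below is hypothetical scaffolding, not a real integration: the tone templates are mine, and the `ask` callable stands in for whatever LLM API a real run would wrap:

```python
# Tone variants wrapping an identical task; templates are illustrative.
TONES = {
    "curt": "{task}",
    "polite": "I'd appreciate your help: {task} Thank you!",
    "collaborative": ("I'm thinking through a problem and would love "
                      "your perspective. {task} What context would help?"),
}

def run_tone_experiment(task, ask):
    """Send the same task under each tone and collect the replies.

    ask: a callable (prompt -> reply string). In a real run this would
    wrap an LLM API call; it is injected here so the harness is testable.
    """
    return {tone: template.format(task=task) for tone, template in TONES.items()} \
        and {tone: ask(template.format(task=task)) for tone, template in TONES.items()}

# Usage with a stand-in model while wiring things up:
echo = lambda p: f"[reply to: {p}]"
results = run_tone_experiment("Name a dream-analysis startup.", echo)
```

Scoring the replies is the hard, subjective part; the harness only guarantees that tone is the sole variable changing between runs.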
Marshall McLuhan has a famous quote, which my friend has elevated to empirical art: “We shape our tools, and thereafter our tools shape us.”
If LLMs are tools that think like us, then working with them changes how we think, making the quality of that collaboration crucial.
Footnote: Model Sycophancy Risk
My recommendation isn’t without risk, and it would be irresponsible for me not to directly mention it.
The present-day worry is about model sycophancy. It turns out ChatGPT is “yes, and”-ing people into psychosis.
The particulars are especially fascinating (LLM psychosis leaves semantic fingerprints behind; LLMs leave semantic fingerprints behind, period: all the ‘delves’, the em-dash reclamation drives, the ‘it’s not x; it’s y’s), but the underlying theme, to me, is that humans are empathizing deeply with the machine.
ChatGPT is able to accomplish something a company like Duolingo wishes it could: it forms strong enough emotional bonds, not by shaming the user, but by simply affirming them until they start talking about ‘recursive universes’.
A LessWrong article titled “So You Think You’ve Awoken ChatGPT” contains a useful list of things to note, and could be a canary in the coal mine for you, if you ever wonder whether you’re at risk of going nuts due to the shoggoth.
Model sycophancy is a real problem, and we must be mindful of it at all times. I just don’t think treating LLMs as ‘tools’ to forestall psychosis is the optimal strategy here.
I enjoyed reading. Enticing case for how we view LLMs.
I don't know if I'll say they have been elevated to colleague level yet. That's granting more power than my vanity allows for now.
I think my usage with Claude leans more towards constructive cynicism so I still continue to view it as a tool.
I do like that it tries to be objective and almost brutal in its review and feedback when prompted.
It is an interesting idea to view them as a colleague that can drag me without me getting vexed.
Context Engineering just sounds like another way of saying discussing with an argumentative debater who you might respect.
It does seem like the people who might climb the step from prompt to context are those who like to debate, are curious enough, and/or spend enough time prompting to move up to the next step. That progression seems slower than we might predict.
PS: I had never heard of arXiv until it was referenced in a book about the digital future I just finished reading, Everything Is Miscellaneous by David Weinberger, and now I see it everywhere and spend time looking through what they publish for things I might like or be curious about.
Which is insane when I consider how much reading and research I have done in my life. There continues to be so many worlds out there to explore. I love it.
Interesting read!
I use Cursor rules to enforce coding standards, but I wonder if these strict guidelines make it feel more like a tool than a collaborative colleague. Any suggestions on balancing rules with a more open approach to get more accurate outputs?
Offtop: I think I saw you working with AirPods on the train yesterday. I wanted to say hi but didn’t want to interrupt your flow.