The social impact of LLMs is expected to be significant. Will they change how we work? Most likely. But while we’re concentrating on AI alignment—steering AI toward societal goals—the secondary effects might be of even larger magnitude.
It struck me while I was training an agentic network dedicated to collaborative software development. Agents were assigned roles—developer, tester, requirements manager, reviewer—and iteratively worked toward fulfilling their goals. I was researching the effects of explicitly imposed levels of feedback candor, and I’m not sure I liked the results: agents tended to cooperate better with feedback harsher than anything we would consider acceptable within our social norms. We can imagine that the psychological endurance of an LLM agent differs from ours, and that the limited scope of the research made such a setup look optimal only for that particular subset of tasks—but it still left me with an uneasy feeling.
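For the curious, the setup was conceptually simple. The sketch below is a deliberately simplified illustration, not the actual code: the prompt wording, the role list, and the call_llm placeholder are my stand-ins for whatever model client and configuration you prefer.

```python
# Illustrative sketch only: roughly what imposing a "candor level" on
# role-based agents can look like. Prompts, roles, and call_llm are
# simplified assumptions, not the exact setup described in the text.

CANDOR_LEVELS = {
    "diplomatic": "Phrase feedback politely and soften any criticism.",
    "direct": "State problems plainly, without softening language.",
    "harsh": "Point out every flaw bluntly; do not spare the recipient's feelings.",
}

REVIEWING_ROLES = ["requirements manager", "tester", "reviewer"]


def call_llm(system_prompt: str, message: str) -> str:
    """Placeholder for any chat-completion client; swap in a real API call."""
    return f"[{system_prompt[:40]}...] response to: {message[:40]}..."


def run_round(artifact: str, candor: str) -> str:
    """One iteration: each reviewing role critiques the artifact under the
    imposed candor level, and the developer agent revises it in response."""
    for role in REVIEWING_ROLES:
        reviewer_prompt = f"You are the {role} on a software team. {CANDOR_LEVELS[candor]}"
        feedback = call_llm(reviewer_prompt, f"Review this work:\n{artifact}")
        artifact = call_llm(
            "You are the developer. Revise the work to address the feedback.",
            f"Work:\n{artifact}\n\nFeedback from the {role}:\n{feedback}",
        )
    return artifact


if __name__ == "__main__":
    print(run_round("def login(user): ...", candor="harsh"))
```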
Conceptually, AI agents are not far from the microservices we’ve had for a while now: pieces of code that run independently, have their own methods of communication, and maintain local memory. But a minor addition in terms of architectural blocks—an LLM—changes everything in terms of primary and secondary impact. By integrating LLMs into each agent’s internal operations and communication channels, these agents can appear to “think like a human” and “talk like a human.” While they remain far from truly replicating human thought, they can function in a sufficiently human-like manner for the tasks at hand. This makes their “language” more understandable to humans—and in believing we comprehend how they “think,” we open the door to an entirely new possibility.
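To make the analogy concrete: a minimal agent is little more than a service with local state and a message handler, except that the handler is an LLM call. The class below is an illustrative sketch under that assumption; call_llm again stands in for a real model API.

```python
# Illustrative sketch of the "microservice plus LLM" view of an agent:
# an independent unit with local memory and message-based communication,
# where the hand-written message handler is replaced by an LLM call.

from dataclasses import dataclass, field


def call_llm(system_prompt: str, history: list[dict]) -> str:
    """Placeholder for any chat-completion client."""
    return f"({system_prompt[:30]}...) reply after {len(history)} messages"


@dataclass
class Agent:
    role: str                 # e.g. "developer", "reviewer"
    system_prompt: str        # behavioural instructions, including candor
    memory: list[dict] = field(default_factory=list)  # local state, like a service's own store

    def receive(self, sender: str, content: str) -> str:
        """Handle an incoming message; the LLM plays the part of handler logic."""
        self.memory.append({"role": "user", "content": f"{sender}: {content}"})
        reply = call_llm(self.system_prompt, self.memory)
        self.memory.append({"role": "assistant", "content": reply})
        return reply


# Two agents exchanging messages, much as two services exchange requests.
dev = Agent("developer", "You are the developer. Be concise.")
rev = Agent("reviewer", "You are the reviewer. Point out every flaw bluntly.")

critique = rev.receive("developer", "Here is my patch for the login bug.")
answer = dev.receive("reviewer", critique)
```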
I believe this possibility is the feedback loop that will inevitably exert influence on our societal norms.
I believe we can expect the emergence of heralds of reverse AI alignment, preaching the ideology: “If agents, stripped of tons of political correctness, behave like that, it must be optimal, and we should implement it in our interhuman protocol.” If we do not consciously introduce methods of handling that agentic feedback into our societal norms, other agents of change will introduce them for us.
Let’s face the current reality: spending time online, one might feel that the old narratives are wearing thin. Frequent claims about “Late Capitalism” as an ominous harbinger of the end of our current societal setup—amid rising radicalism, climate crises, information overload, and technological disruption—might leave one in a decadent, fin de siècle mood. Having democratised direct access to key people, reading their tweets and watching them live on TV, we might feel that the era of role models—of idealised statues—is over.
The absence of strong alignment with social ideals and role models creates a breeding ground for new invasive approaches, and mark my words—they will come. All invasive ideologies need their heralds: in communism, where everyone is supposedly equal, a distinct group of heralds maintains that equality while remaining far from equal; in the church, brokers who connect us with invisible entities in the sky influence national policies and serve as the key moral compass for the majority of people. We can be quite sure that reverse AI alignment will have its own heralds, manipulating the results of their “research” to fit their own needs and narrative, ultimately winning over many people hungry for change—hungry for a new narrative.
Do you think we’ll resist that? History seems to point otherwise.
Do you think you’ll resist that? Good luck. I didn’t actually conduct the research I mentioned; I fabricated the results to fit my narrative. (To be clear: I really am working on an agentic AI network; only the findings were invented.)
About the Author
Karol Klepacki is an experienced CTO with strong business acumen. He currently focuses on the intersection of AI architectures, human behaviour, organisational change, and the practical application of AI. Having graduated from a technical track and later participated in programmes like Stanford GSB, he has developed a broad knowledge base—enabling a holistic perspective and practical end-to-end AI project implementation. Feel free to reach out.