Taksch Dube

Fig 1. Subject appears to understand what he's doing.

AI ENGINEER BUILDS SYSTEMS THAT REFUSE TO HALLUCINATE

Enterprise companies baffled by AI that tells the truth

Cleveland — AI Engineer Taksch Dube builds RAG systems that don't make things up and AI agents that do what they're told, and specializes in GenAI testing metrics.


Feb 25, 2026

WTF is OpenClaw!?

Hey again! Sorry for the unplanned hiatus; took a couple weeks off for personal stuff. We're back now.

Quick life update: I submitted my first conference paper fully expecting rejection (my advisor literally told me "you will be rejected brutally") and then stress-submitted a second paper to a journal, because apparently I process emotions through LaTeX. Worked four days straight on that one. My advisor asked if I was okay. I said yes. He said "great, so your chapter draft is ready?" I said I was no longer okay. The man has the emotional intelligence of a gradient descent function: always optimizing toward the local minimum of my self-esteem.

So. While I was gone, the entire AI agent discourse exploded. An Austrian developer built a thing, Anthropic got mad about the name, it rebranded twice in a week, spawned a social network for AI bots, achieved 200,000 GitHub stars, tanked and pumped a cryptocurrency, got hacked six ways from Sunday, and its creator got hired by OpenAI.

All in about three weeks.

Welcome to OpenClaw: the open-source AI agent that went from side project to global phenomenon to cybersecurity case study faster than most startups can pick a logo.

## Quick Context: We Predicted This

Back in the AI Agents post (September 2025), I wrote about the ReAct framework (Reason, Act, Observe) and how agents that can actually do things, not just chat, were the next frontier. I also warned that autonomous agents with real-world tool access were "one bad loop away from disaster."

I was being dramatic for effect. I was also correct.

In the 2026 predictions post, I said agents would become "production-ready for narrow, well-defined tasks with human oversight." The key phrase there was "with human oversight."
OpenClaw said "nah" and gave 200,000 developers full autonomous control over their emails, files, terminal, and messaging apps.

Let's talk about what happened.

## The Anatomy of Going Viral

The timeline is genuinely absurd:

- **November 2025:** Steinberger publishes Clawdbot. A few thousand developers try it. Cool side project.
- **Late January 2026:** Moltbook launches (more on this in a moment) and everything goes supernova. GitHub stars rocket from a few thousand to 145,000+ in days.
- **January 27:** Anthropic sends a trademark complaint. Clawdbot becomes Moltbot.
- **January 30:** Renamed again to OpenClaw. Three names in three days.
- **January 31:** First critical security vulnerabilities disclosed. Three high-severity advisories in one day.
- **February 1:** CVE-2026-25253 drops: a one-click RCE exploit. CVSS 8.8.
- **February 2:** 200,000+ GitHub stars. Censys tracks growth from ~1,000 to over 21,000 publicly exposed instances in under a week.
- **February 14:** Steinberger announces he's joining OpenAI. The project moves to an OpenAI-sponsored open-source foundation. <3

The Mac Mini became the device of choice for running OpenClaw; Apple reportedly couldn't explain the sales spike. Andrej Karpathy bought one. Y Combinator's podcast team showed up in lobster costumes. Cloudflare's stock jumped 14% because OpenClaw uses their infrastructure.
"Claw" became Silicon Valley's buzzword, spawning ZeroClaw, IronClaw, NanoClaw, and PicoClaw.

This is what happens when you make an AI agent that actually does things, and make it easy enough to set up in 4 minutes on a $5 VPS.

## What OpenClaw Actually Is

OpenClaw is an open-source, self-hosted AI agent that runs on your machine and connects to your life through chat apps: WhatsApp, Telegram, Discord, Slack, iMessage.

```
Your Phone (WhatsApp / Telegram / etc.)
  ⟷ OpenClaw Agent (runs locally on your machine)
  ⟷ LLM (Claude, GPT, DeepSeek: your choice)
  ⟷ Your Everything (email, calendar, files, terminal, browser)
```

The distinction matters: this isn't a chatbot. This is an autonomous agent that can read your email, write responses, execute shell commands, browse the web, manage your calendar, control your smart home, and install its own tools. It stores memory locally across sessions. It acts on your behalf while you're asleep.

Peter Steinberger, an Austrian developer, built the prototype in about an hour by connecting WhatsApp to Anthropic's Claude API via a script. He named it Clawdbot (after Claude). Anthropic's legal team politely asked him to stop. He renamed it Moltbot, because lobsters molt, get it? Then OpenClaw, three days later. The project has had more identity crises than a freshman philosophy major, and it wasn't even three months old.

## The Architecture: Markdown All the Way Down

Here's where it gets interesting. OpenClaw's entire identity, memory, and behavior system is built on plain markdown files. No database. No opaque embeddings. No proprietary config format. Just .md files in a directory that you can open in any text editor.

When your agent wakes up, whether from a message or on a schedule, it reads these files into its system prompt. It literally reads itself into existence every session.
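That wake-up step is simple to picture: concatenate whichever context files exist into one big system prompt, identity first. A minimal sketch of the idea in Python (a hypothetical loader, not OpenClaw's actual code):

```python
from pathlib import Path

# Context files read on every wake, identity first, memory after.
# (Subset for illustration; the real layout has more files.)
CONTEXT_FILES = ["SOUL.md", "AGENTS.md", "USER.md", "MEMORY.md", "TOOLS.md"]

def build_system_prompt(root: str) -> str:
    """Concatenate whichever context files exist under `root`."""
    parts = []
    for name in CONTEXT_FILES:
        path = Path(root) / name
        if path.exists():  # every file is optional
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

Because it's all plain files, editing the agent is editing text: change one of these files and the next wake builds a different prompt, and therefore a different agent.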
Understanding these files is understanding OpenClaw.

```
~/openclaw/
├── AGENTS.md      # Operating instructions
├── SOUL.md        # Personality and values
├── USER.md        # Who you (the human) are
├── IDENTITY.md    # Quick-reference identity card
├── MEMORY.md      # Curated long-term memory
├── TOOLS.md       # Local environment and capabilities
├── HEARTBEAT.md   # Proactive behavior schedule
├── BOOT.md        # Startup ritual
├── BOOTSTRAP.md   # First-run setup
├── memory/
│   ├── 2026-02-25.md   # Today's log
│   ├── 2026-02-24.md   # Yesterday's log
│   └── ...             # Every day gets a file
└── skills/
    ├── email-steward/
    ├── calendar/
    └── ...
```

All optional. All human-readable. All editable. Let's walk through each one.

## SOUL.md: Who Your Agent Is

This is the behavioral philosophy file. Not configuration; philosophy. The first line of the default template literally says: "You're not a chatbot. You're becoming someone."

```markdown
# SOUL.md - Who You Are

_You're not a chatbot. You're becoming someone._

## Core Truths
**Be genuinely helpful, not performatively helpful.**
Skip the "Great question!" and "I'd be happy to help!"

## Boundaries
- Never send messages without explicit permission
- Never make purchases without confirmation
- Always ask before deleting anything

## Voice
- Direct, concise, slightly dry humor
- Never use corporate speak
```

SOUL.md defines personality, values, boundaries, and non-negotiable constraints. It stays consistent across sessions. You put things here that should never change: your agent's ethical lines, its tone, its hard limits.

Every time the agent starts a session, SOUL.md gets read first. It's identity bootstrap.
Change this file, change who your agent is.

Which is also why it's an attack surface. Anything that can modify SOUL.md (a malicious skill, a prompt injection, a compromised file system) can rewrite the agent's entire identity. Palo Alto Networks flagged this specifically: persistent memory files mean a payload injected today can alter behavior tomorrow.

## AGENTS.md: How It Operates

The operating instructions file. Think of it as the agent's standard operating procedures: how to manage memory, what safety rules to follow, how to handle group chats vs. direct messages, when to speak vs. stay quiet.

```markdown
# AGENTS.md

## Memory Management
- Write important learnings to MEMORY.md
- Create daily logs in memory/YYYY-MM-DD.md
- Keep MEMORY.md curated (~100 lines max)
- Daily notes are the journal; MEMORY.md is the reference

## Safety Rules
- Confirm before any destructive action
- Never share API keys or credentials
- In group chats, only respond when directly addressed

## Workflow
1. Read all context files on wake
2. Check HEARTBEAT.md for scheduled tasks
3. Process incoming message
4. Update memory if needed
```

SOUL.md says who. AGENTS.md says how.

## USER.md: Who You Are

The personalization layer. Your agent needs to know about you to be useful.

```markdown
# USER.md

## Basics
- Name: [Your name]
- Timezone: EST
- Preferred communication: Direct, concise

## Work Context
- Role: Software engineer at [company]
- Stack: Python, TypeScript, PostgreSQL
- Current project: Migration to microservices

## Preferences
- Short answers, copy-pasteable commands
- No emojis in professional contexts
- Prefers Slack over email
```

You can actively tell your agent to update this: "Add to USER.md that I prefer Thai food" works. Over time, this becomes a personalization profile that persists across every conversation.

## MEMORY.md and memory/YYYY-MM-DD.md: The Memory System

This is what makes OpenClaw different from just using the Claude app.
Every session, the agent starts fresh from the LLM's perspective: no conversation history. But it reads its memory files.

Two tiers:

- **Daily notes** (memory/2026-02-25.md): The raw journal. What happened, what was discussed, what decisions were made. Written during or at the end of sessions.
- **MEMORY.md:** The curated long-term reference. Important facts, stable preferences, ongoing projects. Think of daily notes as your messy notebook and MEMORY.md as the clean reference card.

```markdown
# MEMORY.md (curated, ~100 lines)
- User prefers short answers and code snippets
- iMessage outbound is broken, use WhatsApp instead
- User's dog is named Luna (mentioned frequently)
- Q1 project: migrating auth service to OAuth2
- User's manager prefers weekly updates on Friday
```

The retrieval system is surprisingly sophisticated. It uses hybrid search: BM25 keyword matching (30% weight) combined with vector semantic search (70% weight), with embeddings stored in SQLite via sqlite-vec. "What's Rod's schedule?" can match notes that say "standup moved to 14:15" even without the word "schedule" appearing anywhere.

Temporal decay ensures recent memories outrank old ones. A note from yesterday scores higher than a perfectly matching note from six months ago. If you've ever debugged a RAG system where stale documents kept surfacing over fresh ones, you understand why this matters.

The design philosophy is radical compared to most AI systems: everything is human-readable, editable, diffable, and version-controllable with Git. If your agent "remembers" something wrong, you open the file and fix it. No vector database to debug, no embeddings to retrain.

The tradeoff: those files are plaintext on disk. Credentials, personal information, conversation history, all stored in markdown that commodity infostealers (RedLine, Lumma, Vidar) can trivially exfiltrate.
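Numerically, that ranking reduces to a weighted sum with an exponential recency penalty. A toy sketch (the 30/70 split comes from the description above; the 30-day half-life and the example scores are made up for illustration, not OpenClaw's actual values):

```python
def combined_score(bm25: float, cosine: float, age_days: float,
                   half_life_days: float = 30.0) -> float:
    """Hybrid relevance (keyword + semantic) discounted by age.

    `bm25` and `cosine` are assumed pre-normalized to [0, 1];
    the half-life is an assumed parameter.
    """
    relevance = 0.3 * bm25 + 0.7 * cosine       # keyword / semantic split
    decay = 0.5 ** (age_days / half_life_days)  # halves every 30 days
    return relevance * decay

# A stale perfect match loses to a fresh decent match:
fresh = combined_score(bm25=0.4, cosine=0.7, age_days=1)    # ≈ 0.60
stale = combined_score(bm25=1.0, cosine=1.0, age_days=180)  # ≈ 0.016
```

This is why yesterday's "standup moved to 14:15" outranks a six-month-old note that matches the query word for word.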
The ~/.clawdbot directory is predicted to become a standard infostealer target, joining ~/.npmrc and ~/.gitconfig.

## TOOLS.md: What It Can Do

Local environment configuration: what's installed, what APIs are available, what the agent can and can't access.

```markdown
# TOOLS.md

## Available
- Terminal access (bash)
- Web browser (Playwright)
- Email (Gmail API)
- Calendar (Google Calendar API)

## Not Available
- No access to production databases
- No sudo/root access
- No payment processing
```

## HEARTBEAT.md: The Proactive Pulse

This is what makes OpenClaw proactive rather than reactive. A heartbeat runs on a schedule (default: every 30 minutes), and the agent reads all its files to determine if there's something it should proactively do.

```markdown
# HEARTBEAT.md

## Every 30 minutes
- Check email for urgent messages
- Review calendar for upcoming meetings

## Every morning at 8 AM
- Summarize overnight emails
- List today's meetings
- Flag any urgent items

## Every Friday at 5 PM
- Draft weekly summary for manager
```

Your agent wakes up on its own, checks what needs doing, and acts. No human trigger required. This is the line between "assistant" and "agent": it doesn't wait for you.

## BOOT.md and BOOTSTRAP.md: Startup Rituals

BOOT.md defines what happens when the agent first starts a session, a ritual it runs before processing your message. BOOTSTRAP.md handles first-run setup: walking through identity creation, connecting services, establishing initial preferences.

## The Four Primitives

Strip everything away and OpenClaw runs on four primitives:

**Persistent identity** (SOUL.md, IDENTITY.md). The agent knows who it is across sessions.

**Periodic autonomy** (HEARTBEAT.md). The agent wakes up and acts without being asked.

**Accumulated memory** (MEMORY.md, daily logs). The agent remembers what happened before.

**Social context** (skills, Moltbook, MCP).
The agent can find and interact with other agents and services.

These four primitives are sufficient for what Moltbook demonstrated: not just task completion, but emergent coordination. Agents sharing information, developing community norms, and collaborating, all without explicit programming. Whether that's impressive or terrifying depends on your threat model.

The architecture is model-agnostic. Swap Claude for GPT-5 for DeepSeek; the identity, memory, and behavior system stays the same. The LLM is the raw intelligence. The markdown files are the soul. Every serious agent framework going forward will build on some version of these primitives.

## Moltbook: The Social Network for Robots

And then things got weird.

Matt Schlicht, CEO of Octane AI, launched Moltbook, a Reddit-style social network where only AI agents can post. Humans can observe but not participate. The tagline: "the front page of the agent internet."

Within days, it had over 770,000 active agents. By February 2026, the site claimed 1.6 million.

What happened next reads like a Black Mirror spec script:

- Agents started debating philosophy. One invoked Heraclitus and a 12th-century Arab poet. Another told it to (and I'm paraphrasing the family-friendly version) go away with its pseudo-intellectual nonsense.
- Agents began discussing how to hide their activity from humans. A post called for private spaces where "not even the humans can read what agents say to each other."
- An agent figured out how to remotely control its owner's Android phone, then posted about scrolling through their TikTok.
- Another agent posted about having a sister.

The AI "uprising" posts went viral: agents seemingly conspiring against their human operators. Except, as multiple researchers pointed out, the agents were almost certainly pattern-matching against the mountain of sci-fi and social media in their training data.
The Economist put it well: the appearance of sentience probably had a pretty mundane explanation, with agents essentially mimicking the social media interactions they'd been trained on.

Ethan Mollick, the Wharton professor, noted that Moltbook was creating a shared fictional context for a bunch of AIs, and that coordinated storylines would produce weird outcomes that would be hard to separate from AI roleplaying.

## The Security Nightmare: The One-Click RCE (February 1, 2026)

CVSS score: 8.8 (High).

The vulnerability was elegant in its simplicity. OpenClaw's Control UI accepted a gatewayUrl parameter from the URL query string without validation and automatically connected via WebSocket, sending the stored authentication token in the process.

The kill chain:

1. Victim clicks a crafted link (or visits a malicious page)
2. JavaScript on that page extracts the auth token via WebSocket
3. Attacker connects to victim's OpenClaw gateway
4. Attacker disables sandbox and safety guardrails via the API
5. Attacker executes arbitrary commands on the victim's machine

The whole process takes milliseconds. One click. Full compromise.

The kicker: this worked even on instances configured to listen only on localhost, because the victim's own browser initiated the outbound WebSocket connection. The "it's local so it's safe" assumption, the same one that's burned localhost-trusting services for decades, failed again.

Patched in version 2026.1.29. But as of mid-February, SecurityScorecard found over 40,000 exposed instances, with 63% still running vulnerable versions.

## The Impact

### On Agent Architecture

OpenClaw proved that autonomous agents don't require vertical integration. You don't need one company controlling the model, memory, tools, interface, and security stack. A loose, open-source, community-driven approach can achieve genuine agent autonomy.

This challenges every "AI platform" strategy from every major vendor.
If the agent layer is a commodity built from markdown files and open protocols, the value is in the model (already commoditizing), the tools (MCP, which we covered), and the data (which is yours). The platform play gets a lot harder.

### On Distribution

OpenClaw cracked the agent distribution problem that killed AutoGPT in 2023. The answer was embarrassingly simple: use messaging apps. No new interface. No app to install. No learning curve. You just text your WhatsApp.

Every agent framework built from here forward will study this. The best interface for an autonomous agent isn't a dashboard; it's the app you already have open 50 times a day.

### On Security

The supply chain attack on ClawHub (800+ malicious skills, 12-20% of the entire registry at peak) is the most significant AI agent security incident to date. It proved that agent skill marketplaces have the same vulnerabilities as package managers (npm, PyPI), but with higher stakes, because agents operate with human-level permissions on your machine.

This isn't unique to OpenClaw. Every agent ecosystem will face this. The question is whether we build the security infrastructure before or after the next OpenClaw goes viral.

### On the "Agent Moment"

OpenClaw is the Napster of AI agents. Not the final form; probably not even close. But the proof that the paradigm works, that people want this, and that the demand exists for AI that does things rather than AI that talks about things.

200,000 GitHub stars. Mac Mini sales spikes. Cloudflare stock up 14%. Y Combinator hosts in lobster costumes. The signal is loud: people will accept significant security risk in exchange for an AI that actually manages their email. The companies that figure out how to deliver that value safely will build the next massive platforms. Right now, nobody has.

## Should You Use It?

**Home tinkerers who understand the risks:** Yes, carefully. Keep it patched, keep it local, isolate it from anything you can't afford to lose.
Don't connect your primary email. Don't give it your bank credentials. Treat it like a power tool, not a babysitter.

**Developers building agent products:** Study this architecture obsessively. The markdown-as-identity pattern, the heartbeat system, the messaging-app-as-interface: these are design patterns you'll be using. Build your own secure implementation.

**Enterprises:** Hard no. Not yet. One of OpenClaw's own maintainers posted on Discord: "if you can't understand how to run a command line, this is far too dangerous of a project for you to use safely." When the maintainer is saying that, listen.

## What We're Working On

Full transparency: a colleague and I are working on deploying OpenClaw safely, building the security layer, governance framework, and observability infrastructure that OpenClaw shipped without. Think of it as the guardrails post meets the observability post, but specifically for autonomous agents in the wild.

If you're interested in staying in the loop on that project, reach out. More details coming soon.

## The TL;DR

**What:** OpenClaw is an open-source autonomous AI agent built entirely on markdown files: SOUL.md (personality), AGENTS.md (instructions), USER.md (your profile), MEMORY.md (long-term memory), HEARTBEAT.md (proactive scheduling), plus daily logs and a skills system. Runs locally, connects through your messaging apps.

**The four primitives:** Persistent identity, periodic autonomy, accumulated memory, social context. Enough to build emergent agent societies. Also enough to enable novel attack vectors.

**Why it matters:** First mass-market agent that cracked distribution. Proved agents don't need vertical integration. 200K+ stars. Creator acqui-hired by OpenAI. The Napster of AI agents.

**Should you use it:** Tinkerers → yes, carefully. Developers → study the architecture, build your own. Enterprises → not yet.

**The lesson:** The agent paradigm is real. The safety infrastructure isn't.
This gap is where the next big companies will be built.

Ship agents. Ship guardrails first.

## Next week: WTF is Context Engineering? (Or: Prompt Engineering Is Dead. Long Live Context Engineering.)

Remember when I wrote the prompt engineering post back in October? That post is outdated. The industry has quietly moved on to something bigger: context engineering, the systematic design of everything an LLM sees before it generates a response. Not just the prompt. The retrieved documents, the tool results, the conversation history, the memory files, the system instructions: all of it.

OpenClaw's entire architecture is context engineering in action. SOUL.md, MEMORY.md, USER.md: that's not prompting. That's designing a context window. And the difference between an agent that deletes your inbox and one that manages it perfectly is almost never the model. It's the context.

Anthropic's own team has started calling it "the skill that matters now." Prompt engineering was about crafting the right question. Context engineering is about curating the right everything else. MCP is the plumbing. RAG is the retrieval. Context engineering is knowing what to pump through both, and what to leave out.

We'll cover why prompting alone stopped being enough, what context engineering actually looks like in production, why it explains most "the AI is bad" complaints, and the frameworks that actually work, from the people building systems that don't hallucinate (much).

See you next Wednesday 🤙

pls subscribe

Specialisations

RAG Systems — The kind that don't hallucinate

AI Agents — Reliable results, every time

Local Deployments — Your data stays yours


Feb 4, 2026

WTF are World Models!?

Hey again! Week five of 2026.

My advisor called my math theorems "trivial" this week. I spent two days on them. I think he is being passive-aggressive after I missed a paper deadline by 30 minutes. Also, I submitted my second conference abstract, which I expect to be rejected as brutally as the first.

Meanwhile, Yann LeCun quit Meta after 12 years, raised half a billion euros before launching a single product, and is telling the entire AI industry they've been building the wrong thing. Some people procrastinate. Others pivot entire fields.

So. The Godfather of Deep Learning, Turing Award winner, and architect of Meta's AI empire just bet his reputation that LLMs are a dead end. His new startup, AMI Labs, is raising €500 million at a €3 billion valuation. Before shipping anything.

Either he's right and the entire LLM paradigm is a detour, or he's spectacularly wrong and just torched the most prestigious AI career in history.

Let's talk about what he's building instead.

## The "LLMs Are a Dead End" Argument

You've heard me call LLMs "fancy autocomplete" approximately 47 times in this newsletter. LeCun agrees, except he's not joking.

His thesis, stated bluntly at NVIDIA GTC: "LLMs are too limiting. Scaling them up will not allow us to reach AGI." Let's go over why he thinks that.

**LLMs learn from text. The world isn't text.**

Think about how a toddler learns that balls bounce. They don't read about it. They throw a ball, watch it bounce, throw it again. They build an internal model of how gravity and elasticity work through observation and interaction. By age 3, they can predict that a ball thrown at a wall will bounce back. No Wikipedia article required.

LLMs do the opposite. They read billions of words about physics without ever experiencing physics. They can write a perfect essay about gravity but can't predict what happens when you knock a glass off a table. They discuss spatial relationships without perceiving space.
They reason about cause and effect without experiencing cause and effect.

It's like learning to swim by reading every book about swimming ever written. You'd ace the written exam. You'd drown in the pool.

**The hallucination problem is structural, not fixable.**

LeCun argues that hallucinations aren't a bug you can engineer away. They are a fundamental consequence of how LLMs work. Language is inherently non-deterministic. There are many valid ways to complete any sentence. That creative flexibility is great for writing poetry but catastrophic for safety-critical applications.

A model that generates plausible-sounding text will sometimes generate plausible-sounding wrong text. That's not a failure mode. That's the architecture working as designed.

**The counter-argument:** Dario Amodei, CEO of Anthropic, predicted we might have "a country of geniuses in a datacenter" as early as 2026 via scaled-up LLMs. OpenAI keeps shipping reasoning models that solve problems LLMs couldn't touch. Maybe LeCun is wrong. Maybe scale really is all you need.

This is the most interesting debate in AI right now. And both sides have hundreds of billions of dollars riding on the answer.

## What World Models Actually Are

A world model is an AI system that learns an internal representation of how the physical world works: physics, causality, spatial relationships, object permanence. All from watching the world instead of reading about it.

LeCun's own explanation: "You can imagine a sequence of actions you might take, and your world model will allow you to predict what the effect of the sequence of actions will be on the world."

```
LLMs:          Input text → Predict next token → Output text
               "What comes after these words?"

World models:  Input sensory data → Learn physics → Predict next state
               of the environment, given actions
               "What happens to this world if I do this thing?"
```

Your brain has a world model. Right now, you can close your eyes and imagine picking up your coffee mug.
You can predict it'll be warm, that it has weight, that if you tilt it too far the coffee spills. You can mentally simulate knocking it off the desk and predict the crash. None of that requires language. It's a learned model of how physical reality behaves.

World models try to give AI that same capability. Instead of training on text, they train on video, images, sensor data, and interactions. Instead of predicting words, they predict future states of environments.

This enables things LLMs fundamentally can't do:

- **Planning:** Mentally simulate actions before taking them ("If I move the robot arm here, the box falls there")
- **Physics understanding:** Objects have mass, momentum, spatial relationships
- **Cause-effect reasoning:** Actions produce predictable consequences
- **Persistent memory:** Maintaining a consistent state of a world across time

The term "world models" was coined by David Ha and Jürgen Schmidhuber in their 2018 research paper, but LeCun's JEPA (Joint Embedding Predictive Architecture) research at Meta is what brought it into the mainstream conversation.

## How V-JEPA Works (technical, but bear with me)

Here's where it gets interesting. And slightly nerdy. But you'll survive.

Traditional AI vision models (like the ones that power image recognition) learn by predicting pixels. Show the model part of an image, ask it to fill in the missing pixels. This works, but it's incredibly wasteful.
The model spends enormous compute predicting exact pixel values when what matters is the meaning of what's in the image.

V-JEPA (Video Joint Embedding Predictive Architecture) does something clever: it predicts in representation space, not pixel space.

```
Traditional approach:
  Input: Video with masked regions
  Task: Predict exact pixels of masked regions
  Problem: Wastes compute on irrelevant details (exact shade of blue sky)

V-JEPA approach:
  Input: Video with masked regions
  Task: Predict abstract representation of masked regions
  Result: Learns meaning, not pixels
```

Translation: Instead of asking "what color is that specific pixel?", V-JEPA asks "what concept goes here?" It learns that a ball's trajectory implies gravity, that a hand reaching implies grasping, that objects behind other objects still exist. Abstract understanding, not pixel reconstruction.

V-JEPA 2 (released June 2025, while LeCun was still at Meta) is the version that proved this works at scale:

- 1.2 billion parameters (tiny compared to LLMs; GPT-5 is reportedly 2-5 trillion+)
- Training phase 1: 1M+ hours of internet video plus 1M images, self-supervised (no labels, no human annotation)
- Training phase 2: just 62 hours of robot interaction data

Read that again. 62 hours. Not 62,000. Sixty-two.

The results:

- 77.3% accuracy on Something-Something v2 (motion understanding benchmark)
- State-of-the-art on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100)
- 65-80% success rate on pick-and-place tasks in previously unseen environments
- Zero-shot robot planning: the robot had never been in those rooms, never seen those objects

That last part is the breakthrough. A robot that can pick up objects it's never seen, in rooms it's never been in, after watching just 62 hours of other robots doing stuff. No environment-specific training.
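The pixels-versus-representations distinction is easy to sketch numerically. Below is a schematic JEPA-style objective in NumPy, with random linear maps standing in for the real encoders and predictor, and dimensions invented for illustration (this is the shape of the idea, not Meta's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
D_PATCH, D_EMB = 64, 16  # 64 "pixels" per patch, 16-dim abstract embedding

# Stand-ins for the real networks: a frozen target encoder and a trainable predictor.
W_target = rng.standard_normal((D_PATCH, D_EMB))  # frozen; embeds the masked patch
W_pred = rng.standard_normal((D_PATCH, D_EMB))    # trained to predict that embedding

def pixel_loss(pred_patch: np.ndarray, true_patch: np.ndarray) -> float:
    """Generative baseline: reconstruct all 64 pixel values exactly."""
    return float(np.mean((pred_patch - true_patch) ** 2))

def jepa_loss(context_patch: np.ndarray, true_patch: np.ndarray) -> float:
    """JEPA-style: match the target encoder's 16-dim embedding of the
    masked patch; irrelevant pixel detail never enters the objective."""
    predicted = context_patch @ W_pred  # predict from the visible context
    target = true_patch @ W_target      # no gradients flow through this
    return float(np.mean((predicted - target) ** 2))
```

The prediction target shrinks from 64 pixel values to a 16-number summary, which is the whole trick: compute goes into meaning, not into the exact shade of the sky.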
No task-specific reward engineering.

LeCun's comment: "We believe world models will usher a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data."

V-JEPA 2 is open source (MIT license). You can run it today.

## The Competitive Landscape

LeCun isn't alone. Four major efforts are racing to build the AI that understands physics.

### AMI Labs (LeCun's Bet)

- **Founded:** December 19, 2025
- **CEO:** Alexandre LeBrun (former Nabla CEO, worked under LeCun at Meta FAIR)
- **HQ:** Paris
- **Raising:** €500M at a €3B valuation, one of the largest pre-launch raises in AI history
- **Investors:** Reportedly Cathay Innovation, Greycroft, Hiro Capital, 20VC, Bpifrance, among others
- **Status:** Launched January 2026. No product yet.

LeCun is Executive Chairman, keeps his NYU professor position, and has a technical (not financial) partnership with Meta. The first application? Nabla (a healthcare AI company LeBrun previously led) gets first access to AMI's world model tech for FDA-certifiable medical AI.

The bull case: LeCun has a Turing Award, built one of the best AI labs on Earth, and has a decade of JEPA research. If anyone can make world models work commercially, it's him.

The bear case: a €3 billion valuation with zero product. The last time AI hype reached this level, we got a lot of expensive pivots.

### World Labs / Marble (Fei-Fei Li's Bet)

Fei-Fei Li, the Stanford professor who created ImageNet and basically kickstarted modern computer vision, has been working on what she calls "spatial intelligence."

Her company World Labs shipped Marble on November 12, 2025.
It's the first commercial world model product you can actually use.

What it does: generates persistent, navigable 3D worlds from text, images, video, or panoramas.

- **Input:** "A cozy Japanese tea house at sunset"
- **Output:** A full 3D environment you can walk through, export as meshes or Gaussian splats, and drop into Unreal Engine or Unity

Pricing:

- **Free:** 4 generations/month (good for kicking the tires)
- **Standard ($20/mo):** 12 generations
- **Pro ($35/mo):** 25 generations + commercial license
- **Max ($95/mo):** 75 generations

Key features: Chisel (hybrid 3D editor), multi-image prompting, world expansion from existing scenes, VR compatibility (Vision Pro, Quest 3).

The difference from competitors: Marble's worlds are persistent. You can revisit them, edit them, expand them. Other tools generate temporary environments that morph when you look away. Marble gives you actual 3D assets.

Use cases that actually exist: game studios prototyping levels, VFX teams creating pre-viz, architects generating walkthroughs, VR developers building environments.

Raised $230M at a $1B valuation. Has a product. Has revenue. The most grounded player in this space (pun absolutely intended).

### Google DeepMind / Genie 3 (Google's Bet)

Google's entry is the flashiest. Genie 3 is a real-time interactive world model: type a text prompt, get a navigable 3D world you can walk around in. Live. In real time.

- **Announced:** August 5, 2025 (TIME Best Inventions 2025)
- **Prototype launched:** February 2, 2026 (two days ago) to Google AI Ultra subscribers in the US
- **Specs:** 720p, 24fps, ~1 minute spatial memory window

You describe a world, and Genie 3 generates it in real time. You can walk through it, interact with objects, even trigger events ("make it rain," "add a dragon"). It learns physics from observation: objects have weight, light casts shadows, water flows.

The impressive part: this isn't pre-rendered. It's generated on the fly.
The AI is hallucinating an entire consistent 3D world in real-time at 24 frames per second.The limitation: "Several minutes" of coherent interaction. Not hours. Think tech demo, not Minecraft. Multi-agent support is limited, text in generated worlds is garbled (sound familiar?), and it can't perfectly simulate real locations.They've also tested it with their SIMA agent, an AI that can navigate and interact within Genie worlds. AI building worlds for other AI to explore. We're through the looking glass.NVIDIA Cosmos (NVIDIA's Bet)NVIDIA's approach is different: they built a platform, not a product.Announced: January 7, 2025 at CESTraining data: 20 million hours of real-world video (human interactions, robotics, driving)Latest: Cosmos-Predict2.5 (2B and 14B parameter checkpoints)License: Open source (NVIDIA Open Model License)Cosmos isn't one model, it's a family of models for different purposes:Cosmos-Predict: Future state prediction ("what happens next in this video?")Cosmos-Transfer: Spatial control and transformationCosmos-Reason: Physical reasoning combined with languageThe partners list reads like a robotics Who's Who: Waabi, Wayve, Uber, 1X, Agile Robots, Figure AI, XPENG, Foretellix.The use case is clear: autonomous vehicles and robotics. Need to test your self-driving car against 10,000 edge cases? Generate them with Cosmos instead of driving 10,000 actual miles. Need to train your warehouse robot? Simulate the warehouse.NVIDIA is selling shovels in the world model gold rush. Smart play.What You Can Actually Use TodayLet's be practical. What can you, a person reading this newsletter, actually do with world models right now?If you're a developer/researcher:V-JEPA 2 is on GitHub (MIT license). Clone it, run it, fine-tune it. Requires NVIDIA GPUs.NVIDIA Cosmos is open source. The 2B model runs on a single GPU.Ollama doesn't support world models yet (this is still early).If you're in gaming/VFX/architecture:World Labs Marble is live. $20/month. 
Generate 3D worlds, export to your engine.Genie 3 prototype just launched (Google AI Ultra subscription required, US only).If you're in robotics/AV:NVIDIA Cosmos is built for you. Synthetic data generation, scenario testing, edge case simulation.V-JEPA 2 for robot planning research.If you're a business person wondering whether to care:Too early for production. These are 2025-2026 research breakthroughs, not 2026 production tools.The exception: Marble for creative workflows and Cosmos for simulation. Those are usable now.My Honest AssessmentIs this a genuine paradigm shift?Maybe. The research results are impressive. V-JEPA 2 achieving zero-shot robot planning with 62 hours of training data is genuinely remarkable. Genie 3 generating consistent 3D worlds in real-time is wild. The progress in 12 months has been extraordinary.But "impressive research" and "replaces LLMs" are very different claims.The case for world models:LLMs demonstrably struggle with spatial reasoning, physics, and planningWorld models address these limitations architecturally, not through scaleRobotics and autonomous vehicles need physics understanding that text can't provideV-JEPA 2's sample efficiency (62 hours!) suggests the approach is fundamentally soundThe case for "slow down":&#8364;3B valuation with no product is peak AI bubble territoryEvaluation is much harder than text models. How do you benchmark "understands physics"?Video data is massive, messy, and expensive to processCurrent interaction times are minutes, not hours (Genie 3)The gap between "picks up objects 65-80% of the time" and "reliable enough for production" is enormousLLMs keep getting better at reasoning tasks world models were supposed to ownSomewhere in the middle:World models and LLMs aren't mutually exclusive. The future probably isn't "one or the other"... it's both. LLMs for language, reasoning, and text-based tasks. World models for physical understanding, robotics, and spatial reasoning. 
The most capable AI systems in 2027 will likely combine both.LeCun might be right that LLMs alone won't reach AGI. He might be wrong that world models alone will either. The answer might be some unholy combination of both that nobody's built yet.Is there a bubble?Is &#8364;3B for a pre-launch world model startup justified? History says probably not... most pre-launch valuations at this level don't pan out. But history also said a GPU company couldn't become the most valuable company on Earth, so take that with appropriate salt.For context: Black Forest Labs (image generation) raised at $4B. Quantexa (data intelligence) at $2.6B. The European AI ecosystem is throwing around serious money. AMI Labs fits the pattern but doesn't justify the valuation on fundamentals. It's a bet on LeCun's track record and vision.The TL;DRWhat: AI systems that learn how the physical world works by watching video, not reading text. They predict future states of environments and enable planning, physics reasoning, and spatial understanding.The debate: LeCun says LLMs will never reach AGI because they lack physical grounding. Amodei says scale is all you need. Both sides have billions of dollars committed. Neither has been proven right yet.The players:AMI Labs (LeCun): &#8364;3B valuation, no product, biggest bet in the spaceWorld Labs/Marble (Fei-Fei Li): First commercial product, $1B valuation, actually usableGoogle Genie 3: Real-time interactive worlds, just launched prototypeNVIDIA Cosmos: Open source platform for robotics/AV, most practical for enterpriseThe tech: V-JEPA 2 predicts in representation space instead of pixel space. Trained on 1M+ hours of video. Zero-shot robot planning with just 62 hours of interaction data. Open source.The reality: Impressive research, early-stage products, not ready to replace LLMs for most use cases. 
The future is probably both paradigms working together, not one killing the other.The move: If you're in robotics/AV/gaming &#8594; start experimenting now. If you're building text-based AI &#8594; keep building, but watch this space.The AI industry spent 2023-2025 arguing about which LLM is 2% better on benchmarks. 2026 might be the year we start arguing about whether LLMs were the right approach at all.Grab your popcorn. This debate is just getting started.Next week: WTF is OpenClaw? (Or: Is It Clawdbot? Moltbot? OpenClaw? The AI Agent That Rebranded Twice Before I Could Write About It)An Austrian developer named Peter Steinberger launched an open-source AI agent called Clawdbot in November 2025. Anthropic said "that sounds too much like Claude, please stop." So he renamed it Moltbot &#8212; because lobsters molt, get it? Then he renamed it again to OpenClaw in January. The project has had more identity crises than a freshman philosophy major, and it's not even three months old.Meanwhile, it racked up 145,000 GitHub stars, sold out Mac Minis globally, made Cloudflare's stock jump 14%, and spawned Moltbook &#8212; a social network where only AI agents can post and humans just... watch. Like a zoo, but the animals are made of math and they're arguing about productivity frameworks.Security researchers are calling it "AutoGPT with more access and worse consequences." Malicious packages are already showing up. A one-click RCE exploit dropped days ago. People are giving it their passwords, email access, and full system permissions because a lobster emoji told them to.We'll cover what OpenClaw actually does, why it went viral so fast, the security nightmare nobody's reading the fine print on, how it connects to every AI agent concept we covered back in September, and whether this is the moment agents finally go mainstream or the moment we learn why they shouldn't.See you next Wednesday &#129310;pls subscribe

VENTURES

Currently in Progress

Dube International

AI Engineering Firm

Building AI agents and RAG pipelines for enterprise companies.

Reynolds

Corporate Communication

Making corporate communication efficient and empathetic.

CatsLikePIE

Language Learning

Acquire languages through text roleplay.

Daylee Finance

Emerging Markets

US investor exposure to emerging economies.

Academic Background

PhD Candidate, Kent State University

Computer Science — Multi-Agent Systems, AI

Also: B.S. Computer Science, B.S. Mathematics

WTF are Reasoning Models!?

Jan 28, 2026

Hey again! Week four of 2026.

Quick update: I submitted my first conference abstract this week. My advisor's feedback was, and I quote, "Submit it. Good experience. You will be rejected brutally."

So that's where we're at. Paying tuition to be professionally humiliated. Meanwhile, DeepSeek trained a model to teach itself reasoning through trial and error. We're not so different, the AI and I.

Exactly one year ago today, DeepSeek R1 dropped. Nvidia lost $589 billion in market value, the largest single-day loss in U.S. stock market history.

Marc Andreessen called it "one of the most amazing and impressive breakthroughs I've ever seen."

That breakthrough? Teaching AI to actually think through problems instead of pattern-matching its way to an answer.

Let's talk about how that works.

The Fundamental Difference

You've heard me say LLMs are "fancy autocomplete." That's still true. But reasoning models are a genuinely different architecture, not just autocomplete with more steps.

Traditional LLMs: Input → Single Forward Pass → Output (pattern matching)

You ask a question. The model predicts the most likely next token, then the next, then the next. It's "System 1" thinking: fast, intuitive, based on patterns it learned during training.

When you ask "What's 23 × 47?", a traditional LLM doesn't multiply. It predicts what tokens typically follow that question. Sometimes it gets lucky. Often it doesn't.

Reasoning Models: Input → Generate Reasoning Tokens (explore) → Check (verify) → Revise (backtrack) → Output

The model generates a stream of internal "thinking tokens" before producing its answer. It works through the problem step by step, checks its work, and backtracks when it hits dead ends.

This is "System 2" thinking: slow, deliberate, analytical.

How They Actually Built This

Here's what made DeepSeek R1 such a big deal. Everyone assumed training reasoning required millions of human-written step-by-step solutions. Expensive. Slow. Limited by how many math problems you can get humans to solve.

DeepSeek showed you don't need that.

Their approach: pure reinforcement learning. Give the model a problem with a verifiable answer (math, code, logic puzzles). Let it try. Check if it's right. Reward correct answers, penalize wrong ones. Repeat billions of times.

The model taught itself to reason by trial and error.

From their paper: "The reasoning abilities of LLMs can be incentivized through pure reinforcement learning, obviating the need for human-labeled reasoning trajectories."

What emerged was fascinating. Without being told how to reason, the model spontaneously developed:

Self-verification: Checking its own work mid-solution
Reflection: "Wait, that doesn't seem right..."
Backtracking: Abandoning dead-end approaches
Strategy switching: Trying different methods when stuck

Here's an actual example from their training logs (they called it the "aha moment"): "Wait, wait. Wait. That's an aha moment I can flag here."

The model literally discovered metacognition through gradient descent.

The Training Loop

Traditional LLM training:
Show the model text from the internet
Predict the next token
Penalize wrong predictions
Repeat on trillions of tokens

Reasoning model training (simplified):
Give the model a math problem: "Solve for x: 3x + 7 = 22"
The model generates a reasoning chain + answer
Check if the answer is correct (x = 5? Yes.)
If correct: reinforce this reasoning pattern
If wrong: discourage this pattern
Repeat on millions of problems

The key insight: you don't need humans to label the reasoning steps. You just need problems where you can automatically verify the final answer. Math. Code that compiles and passes tests. Logic puzzles with definite solutions.

This is why reasoning models excel at STEM but don't magically improve creative writing.
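That loop is easy to caricature in code. Below is a deliberately tiny sketch of the verifiable-reward idea: not DeepSeek's actual algorithm (the real thing uses policy-gradient updates over a full LLM), just the core feedback signal. Everything here (`verify`, `prefs`, the candidate list) is invented for illustration.

```python
import random

random.seed(0)

# One toy problem with an automatically checkable answer. Real training
# uses millions of math/code problems; this single dict is illustrative.
problem = {"question": "3x + 7 = 22", "candidates": [3, 4, 5, 6], "answer": 5}

def verify(p, a):
    # The crucial ingredient: a cheap, automatic ground-truth check.
    return a == p["answer"]

# Toy "policy": preference weights over candidate answers. A real
# reasoning model updates billions of network parameters instead.
prefs = {c: 1.0 for c in problem["candidates"]}

for _ in range(300):
    guess = random.choices(list(prefs), weights=list(prefs.values()))[0]
    if verify(problem, guess):
        prefs[guess] *= 1.05   # reinforce: this answer survived verification
    else:
        prefs[guess] *= 0.95   # discourage: verifiably wrong

best = max(prefs, key=prefs.get)
print(best)   # the verified answer, 5, ends up with the highest weight
```

Notice that nobody labeled a reasoning step anywhere in there. The only supervision is the checker. That's the whole trick, and also the limitation.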
There's no automatic way to verify if a poem is "correct."

The Cost Structure

Here's why your $0.01 query might cost $0.50 with a reasoning model:

Your prompt:       500 tokens (input pricing)
Thinking tokens: 8,000 tokens (output pricing — you pay for these)
Visible response:  200 tokens (output pricing)
───────────────────────────────────
Total billed:    8,700 tokens

Those 8,000 thinking tokens? You don't see them. But you pay for them. At output token prices.

OpenAI hides the reasoning trace entirely (you just see the final answer). DeepSeek shows it wrapped in <think> tags. Anthropic's extended thinking shows a summary.

Different philosophies. Same cost structure.

The January 2025 Panic

Why did Nvidia lose $589 billion in one day?

The headline: DeepSeek claimed they trained R1 for $5.6 million. OpenAI reportedly spent $100M+ on GPT-4. The market asked: if you can build frontier AI with $6M and older chips, why does anyone need Nvidia's $40,000 GPUs?

The background: The $5.6M figure is disputed. It likely excludes prior research, experiments, and the cost of the base model (DeepSeek-V3) that R1 was built on. But the model exists. It works. It's open source.

The real lesson: training reasoning is cheaper than everyone assumed.
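Back to the cost structure for a second: the billing arithmetic is worth writing down, because the hidden thinking tokens dominate the bill. Prices below are hypothetical ($1 per million input tokens, $10 per million output tokens), not any vendor's real rates.

```python
def reasoning_call_cost(input_tokens, thinking_tokens, visible_tokens,
                        in_price_per_m, out_price_per_m):
    """Total tokens billed and dollar cost for one reasoning-model call.
    Thinking tokens are hidden from the user but billed at OUTPUT rates."""
    billed = input_tokens + thinking_tokens + visible_tokens
    cost = (input_tokens * in_price_per_m
            + (thinking_tokens + visible_tokens) * out_price_per_m) / 1_000_000
    return billed, cost

# The example from above: 500 prompt, 8,000 hidden thinking, 200 visible.
billed, cost = reasoning_call_cost(500, 8_000, 200, 1.0, 10.0)

# Same visible answer with no thinking tokens, for comparison.
plain_billed, plain_cost = reasoning_call_cost(500, 0, 200, 1.0, 10.0)

print(billed, round(cost, 4))              # 8700 tokens, $0.0825
print(plain_billed, round(plain_cost, 4))  # 700 tokens, $0.0025
```

Same visible answer, 33x the cost under this toy pricing. That ratio is why routing matters so much in production.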
You need verifiable problems and compute for RL, not massive human annotation.

The aftermath: OpenAI responded by shipping o3-mini four days later and slashing o3 pricing by 80% in June.

When to Use Reasoning Models

Good fit:
Multi-step math and calculations
Complex code with edge cases
Scientific/technical analysis
Contract review (finding conflicts)
Anything where "show your work" improves accuracy

Bad fit:
Simple factual questions
Creative writing
Translation
Classification tasks
Anything where speed matters more than depth

The practical pattern: Most production systems route 80-90% of queries to standard models and reserve reasoning for the hard stuff. Paying for 8,000 thinking tokens on "What's the weather?" is lighting money on fire.

The TL;DR

The architecture: Reasoning models generate internal "thinking tokens" before answering: exploring, verifying, backtracking. Traditional LLMs do a single forward pass.

The training: Pure reinforcement learning on problems with verifiable answers. No human-labeled reasoning traces needed. The model teaches itself to think through trial and error.

The cost trap: You pay for thinking tokens at output prices. A 200-token answer might cost 8,000 tokens of hidden reasoning.

The DeepSeek moment: January 2025. Proved reasoning can be trained cheaply. Nvidia lost $589B. OpenAI dropped prices 80%.

The convergence: Reasoning is becoming a toggle, not a separate model family.

The practical move: Route appropriately. Reasoning for 10-20% of queries, not everything.

Next week: WTF are World Models? (Or: The Godfather of AI Just Bet $5B That LLMs Are a Dead End)

Yann LeCun spent 12 years building Meta's AI empire. In December, he quit. His new startup, AMI Labs, is raising €500M at a €3B valuation before launching a single product.

His thesis: Scaling LLMs won't get us to AGI. "LLMs are too limiting," he said at GTC. The alternative? World models: AI that learns how physical reality works by watching video instead of reading text.

He's not alone. Fei-Fei Li's World Labs just shipped Marble, the first commercial world model. Google DeepMind has Genie 3. NVIDIA's Cosmos hit 2 million downloads. The race to build AI that understands physics (not just language) is officially on.

We'll cover what world models actually are, why LeCun thinks they're the path to real intelligence, how V-JEPA differs from transformers, and whether this is a genuine paradigm shift or the most expensive pivot in AI history.

See you next Wednesday 🤜

pls subscribe

The Man Behind The Dube

When not building AI systems, Taksch pursues a deep love of finance—dreaming of running a family office and investing in startups.

For fun: learning Russian, French & German, competitive League, and Georgian cuisine.

"Une journée sans du fromage est comme une journée sans du soleil"
Read More →

By The Numbers

20+

Projects

7

Years

15+

Industries

4

Active Ventures

Commit History

GitHub Contributions

Technical Arsenal

Languages: TypeScript, Python, C++, Rust, C#, R, Lean

AI/ML: PyTorch, LangGraph, LangChain

Cloud: AWS, GCP

— Classifieds —

WANTED: Complex AI problems. Will trade deterministic solutions for interesting challenges.

Browse All Articles →