Wednesday, May 13, 2026

Show HN: Nibble https://ift.tt/tkOndsQ

Show HN: Nibble An attempt at a single-pass LLVM frontend in ~3000 lines of C without external dependencies, malloc, or an AST. Included are some graphical examples. The IR isn't perfect, and the README touches on one particular downfall. https://ift.tt/D8THNiv May 14, 2026 at 07:16AM

Show HN: Rotunda - A browser built for agents with simulated typing https://ift.tt/n83a9ek

Show HN: Rotunda - A browser built for agents with simulated typing Hi HN! Pierce here. Rotunda is a Firefox fork primarily intended for agent use, which I've been hacking on nights and weekends. There was a lengthy discussion (https://ift.tt/WXlTKtU) last week on how expensive computer-use models are. The cost is going to drop eventually, but I think on some level it's still usually the wrong primitive. The web gives us access to beautiful structured formats, plaintext, etc.; why throw that away if we don't have to? I realized at some point that for 99% of automations I just want agents to be able to control my Chrome instance. But that's easier said than done: CDP (the Chrome automation protocol) leaks a ton of state about being programmatically controlled, either by toggling window attributes or by running `page.evaluate()` commands right in the page context. Plus, if you watch an automation running, it's pretty obvious what's happening: the mouse jumps around, fields are filled instantly, etc. Rotunda tries to fix this. Its standout features:
- Realistic simulation of mouse movements and keyboard commands, powered by an RNN trained on my own timing patterns from the last week. (I still feel weird about opting in to a keylogger, but whatever.)
- It doesn't lie about its host specs; it only fibs about some client-side details. Stealth browsers are too easy to flag statistically when they add noise to canvas pixels or audio pipelines.
- It runs on your local device with a CLI or Playwright API accessible to Claude, Codex, or whatever your harness du jour looks like today.
- It patches modern Firefox (150), using an agentic harness to keep it updated over time.
MPL-2.0 on GitHub: https://ift.tt/0k1dvw7 Longer writeup on the design choices: https://ift.tt/W4bABqp Also check out the demo on the site! https://www.rotunda.sh/ Pretty excited by how this turned out, but we're still super early. Give it a try and please flag any issues! https://ift.tt/0k1dvw7 May 13, 2026 at 07:14PM

Show HN: Micromort Risk Visualizer https://ift.tt/PW8A9jt

Show HN: Micromort Risk Visualizer https://boxed.github.io/micromort/ May 14, 2026 at 12:09AM

Tuesday, May 12, 2026

Show HN: Statewright – Visual state machines that make AI agents reliable https://ift.tt/mF56CbG

Show HN: Statewright – Visual state machines that make AI agents reliable Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves. I'm Ben Cochran. I've spent 20+ years in the trenches of full-stack engineering, DevOps, high-performance computing, and ML, with stints at NVIDIA, AMD, and various other organizations, most recently as a Distinguished Engineer. For agents to work reliably, you need either massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute-forcing reliability with bigger models and longer prompts. What if I made the problem smaller instead of making the model bigger? I took a different approach: I used smaller models, in the 13-20B parameter range, and set them to solving real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets, and which transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega-edits) and write-friendly bash tools. The testing state gets bash, but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. This is enforced via protocol, not via prompts. The results were more promising than I expected. Across multiple model families, irrespective of age (qwen-coder, gpt-oss, gemma4), the improvements were consistent above the 13B parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: https://ift.tt/4omOiAX Surprisingly, this yielded improvements in frontier models as well. Haiku and Sonnet start to punch above their weight, and Opus solves more reliably with fewer tokens and fewer death spirals.
Fine-tuning did not yield these kinds of functional improvements for me. The takeaway, it seems, is that context window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining non-idempotent LLMs with deterministic code is a pattern that nobody is currently talking about. So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards, and tool restrictions. Its orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor, and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools instead of dozens, gets clear instructions for the current phase, and transitions when conditions are met. Importantly, it tells the model when it's attempting something that is out of scope or incorrect, or when it needs to try something else after getting stuck. You can have your agent build a state machine via MCP to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view, where you can clearly see the failure paths, the retry loops, and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs. Statewright is currently live with a free tier. Try it out in Claude Code by running the following:
/plugin marketplace add statewright/statewright
/plugin install statewright
/reload-plugins
Then say "start the bugfix workflow" or run /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain; paste the API key again and say you really mean it. Claude is just being cautious here.
Feedback is welcome on the workflow editor, the plugin experience, and tell me what workflows you'd want to build first. Agents are suggestions, states are laws. https://ift.tt/aAXUYbq May 12, 2026 at 07:54PM

Monday, May 11, 2026

Sunday, May 10, 2026

Show HN: adamsreview – better multi-agent PR reviews for Claude Code https://ift.tt/XBinFLg

Show HN: adamsreview – better multi-agent PR reviews for Claude Code I built adamsreview, a Claude Code plugin that runs deeper, multi-stage PR reviews using parallel sub-agents, validation passes, persistent JSON state, and optional ensemble review via Codex CLI and PR bot comments. On my own PRs, it has been catching dramatically more real bugs than Claude’s built-in /review, /ultrareview, CodeRabbit, Greptile, and Codex’s built-in review, while producing fewer false positives. adamsreview consists of six Claude Code slash commands packaged as a plugin: review, codex-review, add, promote, walkthrough, and fix. I modeled it after the built-in /review command and extended it meaningfully. You can clear context between review stages because state is stored in JSON artifacts on disk, with built-in scripts for keeping it updated. The walkthrough command uses Claude’s AskUserQuestion feature to walk you through uncertain findings or items needing human review one by one. Then, the fix command dispatches per-fix-group agents and re-reviews the work with Opus, reverting any regressions before committing the surviving fixes. It runs against your regular Claude Code subscription (Max plan recommended), unlike /ultrareview, which charges against your Extra Usage pool. I would love feedback from Claude Code users, pro devs, and anyone with strong opinions about AI code reviews. Repo: https://ift.tt/VT81R0X Install: /plugin marketplace add adamjgmiller/adamsreview, /plugin install adamsreview@adamsreview https://ift.tt/VT81R0X May 11, 2026 at 07:36AM
