Monday, January 26, 2026
Show HN: Ourguide – OS-wide task guidance system that shows you where to click https://ift.tt/kuVXLrn
Hey! I'm eshaan, and I'm building Ourguide, an on-screen task guidance system that shows you, step by step, where to click when you need help. I started building it because whenever I didn't know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking "what do I do next?"

Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, so you never have to leave your current window. Ask mode is a vision-integrated chat that captures your screen context (you can toggle this on and off anytime), so you can ask "How do I fix this error?" without having to explain what "this" is. It's an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.

Figuring out how to show the user where to click was the hardest part. I originally trained a computer vision model on 2,300 screenshots to identify and segment every UI element on screen, then used a VLM to pick the correct element to highlight. This worked extremely well, outperforming SOTA grounding models like UI-TARS, but the latency was just too high. I'll be open-sourcing that CV+VLM pipeline soon; for now, I've switched to a simpler implementation that achieves <1s latency.

You may ask: if Ourguide can show you where to click, why doesn't it just click for you? While building computer-use agents at my job in Palo Alto, I hit the core limitation of today's computer-use models: benchmark scores hover in the mid-50% range (OSWorld). VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So I built computer use without the "use": it provides the visual grounding of an agent but keeps the human in the loop for execution, which prevents misclicks.

I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It's also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide works for nearly any task where you're stuck or don't know what to do next.

You can download and test Ourguide here: https://ourguide.ai/downloads

The project is still very early, and I'd love your feedback: where it fails, where it works well, and which specific niches you think Ourguide would be most helpful for.

https://ourguide.ai
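For readers curious about the overlay mechanics, here is a minimal sketch of how an OS-wide, click-through highlight window can be built in Electron, which is roughly the shape Guide mode describes. The window options and the highlight() helper are assumptions for illustration, not Ourguide's actual code:

    // overlay.ts: full-screen, transparent, click-through window that can
    // draw a highlight box over any application. Illustrative sketch only.
    import { app, BrowserWindow, screen } from "electron";

    let overlay: BrowserWindow;

    app.whenReady().then(() => {
      const { width, height } = screen.getPrimaryDisplay().bounds;
      overlay = new BrowserWindow({
        x: 0, y: 0, width, height,
        transparent: true,  // see through to the app underneath
        frame: false,       // no window chrome
        alwaysOnTop: true,  // stay above the target application
        focusable: false,   // never steal keyboard focus
        hasShadow: false,
      });
      // Let clicks pass through to the real UI below the overlay.
      overlay.setIgnoreMouseEvents(true, { forward: true });
      overlay.loadFile("overlay.html"); // renders one absolutely-positioned box
    });

    // Ask the renderer to move the highlight to an element's bounding rect.
    function highlight(rect: { x: number; y: number; w: number; h: number }) {
      overlay.webContents.send("highlight", rect);
    }

Setting focusable: false and forwarding mouse events is what lets an overlay like this sit above every app without ever intercepting the user's clicks.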
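And a sketch of the two-stage CV+VLM grounding pipeline the post describes. detectElements and askVlm are hypothetical stand-ins for the segmentation model and the vision-language model; only the control flow is illustrated:

    // grounding.ts: two-stage grounding sketch. Stage 1 finds every clickable
    // element; stage 2 asks a VLM which one completes the current step.
    type Box = { x: number; y: number; w: number; h: number };

    // Hypothetical: a detector trained on UI screenshots returns a bounding
    // box for each on-screen element.
    declare function detectElements(screenshot: Buffer): Promise<Box[]>;

    // Hypothetical: a VLM, shown the screenshot with numbered boxes, returns
    // the index of the element that matches the instruction.
    declare function askVlm(
      screenshot: Buffer,
      boxes: Box[],
      step: string
    ): Promise<number>;

    async function groundStep(screenshot: Buffer, step: string): Promise<Box> {
      const boxes = await detectElements(screenshot); // segment all UI elements
      const index = await askVlm(screenshot, boxes, step); // VLM picks the target
      return boxes[index]; // rect to highlight in the overlay
    }

A plausible source of the latency the post mentions is stage 2, since every step requires a full VLM round trip over the screenshot.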