Workany: The Open-Source Desktop Agent That Automates Any Task
Ever feel like you're doing the same tedious, repetitive computer tasks day in and day out? Clicking the same sequence of buttons, copying data between apps, or performing routine setup tasks? What if you could just tell your computer what to do and have it figure out the steps? That's the promise of Workany.
It's an open-source desktop agent that takes your natural language requests and automates the process for you, right on your own machine. Think of it as a programmable assistant that can drive your GUI applications directly, not just call APIs.
What It Does
Workany is a local desktop application that uses AI to translate your plain English instructions into a series of automated actions on your computer. You tell it a goal like "save all open tabs to a text file" or "resize all images in this folder to 800px width." Workany then plans the steps, controls your mouse and keyboard to execute them in real applications (like your browser or file explorer), and completes the task. It's automation, but without you having to write the script.
Why It's Cool
The clever part is how it works under the hood. Instead of requiring deep integration with every app, it uses vision-language models. Essentially, it sees your screen (via screenshots) to understand the current state of your applications and then decides what action to take next. This approach is incredibly flexible because it can work with almost any desktop application you already have installed, even old or niche ones that don't have APIs.
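To make the screenshot-driven approach concrete, here is a minimal sketch of an observe, decide, act loop. It is not Workany's actual code: the `Action` type, the `run_agent` function, and the three callables it takes are hypothetical names chosen for illustration, and the model is replaced by a scripted stand-in so the example runs without a screen or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done" (hypothetical action vocabulary)
    target: str = ""   # description of a UI element, or the text to type


def run_agent(
    goal: str,
    capture_screen: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, ask the model for one action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                      # observe current state
        action = choose_action(goal, screenshot, history)  # decide the next step
        if action.kind == "done":
            break
        perform(action)                                    # act via mouse/keyboard
        history.append(action)
    return history


def demo() -> List[Action]:
    """Run the loop with stand-ins instead of a real screen and model."""
    script = [
        Action("click", "File menu"),
        Action("click", "Save As"),
        Action("type", "tabs.txt"),
        Action("done"),
    ]

    def fake_capture() -> bytes:
        return b"fake-screenshot"

    def scripted_model(goal: str, screenshot: bytes, history: List[Action]) -> Action:
        # A real vision-language model would inspect the screenshot here.
        return script[len(history)]

    performed: List[Action] = []
    return run_agent(
        "save all open tabs to a text file",
        fake_capture,
        scripted_model,
        performed.append,
    )
```

Because the loop only needs a screenshot and a way to send input, nothing in it is tied to a particular application. The `max_steps` cap is a design choice in this sketch that keeps a confused model from clicking around forever.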
It's also open-source and runs locally. The agent executes on your own machine rather than as a hosted service, which is a big deal for privacy and for working with sensitive information. Keep in mind that if you configure a hosted vision model, the screenshots it analyzes are sent to that provider. As an open-source project, developers can peek under the hood, contribute, and tailor it to their own needs.
How to Try It
Ready to offload some busywork? Getting started is straightforward.
- Head over to the Workany GitHub repository.
- Check the README.md for the latest installation instructions. You'll likely need to clone the repo and set up a Python environment.
- You'll need to configure an API key for a vision model (like GPT-4V or Claude 3) for it to understand your screen and plan actions. The repo guides you through this.
- Run the application, give it a simple task to start with, and watch it go.
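Before the first run, it can help to confirm that an API key is actually visible to your Python environment. The sketch below is only an illustration: `find_api_key` is a hypothetical helper, and the variable names are the ones conventionally used by the OpenAI and Anthropic SDKs, which may not be the names Workany reads. The README is the authority on that.

```python
import os
from typing import Mapping, Optional, Sequence, Tuple

# Assumed variable names (SDK conventions); Workany may expect different ones.
CANDIDATE_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def find_api_key(
    env: Optional[Mapping[str, str]] = None,
    names: Sequence[str] = CANDIDATE_VARS,
) -> Tuple[str, str]:
    """Return (variable name, key) for the first non-empty candidate variable."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return name, value
    raise RuntimeError(
        "No vision-model API key found; set one of: " + ", ".join(names)
    )
```

Calling `find_api_key()` with no arguments checks the real environment; passing a plain dict makes the helper easy to test without touching your shell configuration.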
It's an active project, so diving into the issues or discussions on GitHub is a great way to see what's possible and what the community is building.
Final Thoughts
Workany feels like a glimpse into a very practical future of human-computer interaction. It's not about flashy AI demos; it's about solving the daily friction developers and power users actually face. The local, open-source model is the right call for a tool with this level of access to your system.
As a developer, it's fascinating to think about the possibilities. You could use it to automate complex local dev environment setups, document repetitive processes by having the agent perform them, or even extend and test it for specialized workflows in your own field. It's a tool that rewards tinkering.
What repetitive task would you automate first?
Repository: https://github.com/workany-ai/workany