Whichllm

GitHub Repo · May 31, 2026 at 06:10 AM · 1.1k impressions

Project Description

Stop Guessing Which LLM Your Hardware Can Actually Run

You've got a decent GPU and you want to run a local language model. So you head to Hugging Face, browse by size, pick something that fits in your VRAM, and hope for the best. But that approach is broken: the biggest model that fits isn't always the best one, and without hours of trial and error you have no way of knowing which 7B model outperforms which 13B model. That's the problem whichllm solves. It's a command-line tool that auto-detects your hardware, pulls live benchmark data from Hugging Face, and tells you which local LLM to run on your machine.

What It Does

Whichllm is a Python tool (requires Python 3.11+) that scans your system's GPU, CPU, and RAM, then queries Hugging Face to find models that fit your hardware. It ranks them by real benchmark scores, not just parameter count. You run a single command, and it returns a ranked list with model names, quantization levels, quality scores, and estimated tokens per second.
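To make the "fits your hardware" idea concrete, here is a rough sketch of the arithmetic behind a fit check. It is not whichllm's actual estimator: the function names, the 20% overhead factor, and the example bit widths are all assumptions chosen for illustration.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model in GB (1 GB = 1e9 bytes).

    `overhead` is an assumed multiplier covering KV cache and runtime
    buffers; real tools are likely to model this in more detail.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


def fits(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the estimated footprint fits within the given VRAM budget."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


# A 7B model at 4 bits per weight against an 8 GB card,
# and a 13B model at 16 bits per weight against a 24 GB card.
small_fits = fits(7, 4, vram_gb=8)
large_fits = fits(13, 16, vram_gb=24)
```

Under these assumptions the 4-bit 7B model fits comfortably on an 8 GB card while the unquantized 13B model does not fit on a 24 GB one, which is why quantization level appears next to each model in the ranked list.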

The ranking engine merges data from multiple benchmarks: LiveBench, Artificial Analysis, Aider, Chatbot Arena Elo, Open LLM Leaderboard, and multimodal/vision evaluations. Every score comes tagged with a confidence level (direct, variant, base, interpolated, or self-reported), so you know how reliable each recommendation is. It also factors in recency, so a 2024 model with old benchmark scores can't outrank a current-generation model on stale data.
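One way to picture how confidence levels and recency could combine into a single ranking is sketched below. This is an illustrative guess, not whichllm's real formula: the weights in `CONFIDENCE_WEIGHT`, the one-year half-life, the `Candidate` type, and the model names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical discount per confidence level (not taken from whichllm).
CONFIDENCE_WEIGHT = {
    "direct": 1.0,
    "variant": 0.9,
    "base": 0.8,
    "interpolated": 0.6,
    "self-reported": 0.3,
}


@dataclass
class Candidate:
    name: str
    score: float       # benchmark score on a 0-100 scale
    confidence: str    # one of the CONFIDENCE_WEIGHT keys
    released: date


def adjusted_score(c: Candidate, today: date, half_life_days: float = 365.0) -> float:
    """Discount the raw score by confidence, then halve it per half-life of age."""
    age_days = max((today - c.released).days, 0)
    recency = 0.5 ** (age_days / half_life_days)
    return c.score * CONFIDENCE_WEIGHT[c.confidence] * recency


def rank(candidates: list[Candidate], today: date) -> list[Candidate]:
    """Sort candidates from best to worst adjusted score."""
    return sorted(candidates, key=lambda c: adjusted_score(c, today), reverse=True)


today = date(2026, 1, 1)
ranked = rank(
    [
        Candidate("old-model", 80.0, "direct", date(2024, 1, 1)),
        Candidate("new-model", 70.0, "direct", date(2025, 12, 1)),
        Candidate("hyped-model", 95.0, "self-reported", date(2025, 12, 1)),
    ],
    today,
)
```

With these made-up numbers, the recent model with a directly measured score ranks first, the self-reported score is discounted into second place despite its higher raw value, and the two-year-old model drops to last.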

You can also simulate hardware you don't own yet. Want to know what you'd get with an RTX 4090 before buying one? Just pass --gpu "RTX 4090" and whichllm will show you the top picks as if you had that card. There's a plan command that works backward — tell it a model name and it tells you what GPU you need to run it. And if you just want to get started immediately, whichllm run "qwen 2.5 1.5b gguf" launches a chat session.

Why It's Cool

The obvious thing whichllm does is save you time — you don't have to manually cross-reference model sizes against your VRAM. But the deeper value is in the ranking logic.

It ranks by actual quality, not size. Most people assume a 13B model beats a 7B model. Whichllm regularly recommends smaller models over larger ones because they score higher on real benchmarks. The README's example shows a 27.8B model ranked above a 32B model because it's a newer generation with better benchmark performance. That's the kind of insight you'd never get from a "what fits?" approach.

It's recency-aware. Old leaderboard scores don't get treated like new ones. Each model's lineage is tracked, and stale benchmarks are demoted. The benchmark snapshot date is printed under every ranking, so if you're looking at old data, you can see it immediately rather than trusting it silently.

Confidence grading prevents bad recommendations. Every score is tagged with its source: direct benchmark results, variant model scores, base model inheritance, interpolated estimates, or self-reported uploader claims. Fabricated claims get discounted automatically. This is a level of rigor most model discovery tools simply don't have.

The upgrade command is genuinely useful. You can pass multiple GPUs to compare upgrade candidates, for example whichllm upgrade "RTX 4090" "RTX 5090" "H100", and see how your top picks change across hardware tiers. It's a practical way to evaluate whether a hardware investment is worth it for your specific use case.

How to Try It

The fastest way to try whichllm requires no installation at all. Just run:

uvx whichllm@latest

This uses uvx to run the latest version in one shot. If you want to simulate different hardware before you buy:

uvx whichllm@latest --gpu "RTX 4090"

For regular use, install it:

uv tool install whichllm

Or use pip or Homebrew:

pip install whichllm
brew install andyyyy64/whichllm/whichllm

Once installed, some useful commands:

# Best models for your machine
whichllm

# Find what GPU you need for a specific model
whichllm plan "llama 3 70b"

# Start a chat session
whichllm run "qwen 2.5 1.5b gguf"

# Get copy-paste Python code for a model
whichllm snippet "qwen 7b"

# Return JSON for scripting
whichllm --top 1 --json
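If you want to consume the JSON output from a script, a minimal Python wrapper might look like the sketch below. The helper names `run_json` and `top_pick` are hypothetical, and the code deliberately makes no assumption about the shape of the JSON that whichllm emits; it only parses whatever the command prints to stdout.

```python
import json
import subprocess


def run_json(cmd: list[str]):
    """Run a command and parse its stdout as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    json.JSONDecodeError if the output is not valid JSON.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def top_pick():
    """Fetch the top recommendation using the flags shown above.

    Requires whichllm to be installed and on PATH. The structure of the
    returned value is whatever whichllm emits; inspect it before relying
    on specific keys.
    """
    return run_json(["whichllm", "--top", "1", "--json"])
```

Calling `print(json.dumps(top_pick(), indent=2))` once is a reasonable first step to see which fields are available before building anything on top of them.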

The project is on GitHub at github.com/Andyyyy64/whichllm, licensed under MIT, and actively maintained with CI testing.

Final Thoughts

Whichllm is the tool you pull out when you're tired of guessing. It's not flashy — it's a terminal command that gives you a ranked list — but that's exactly what makes it useful. If you run local LLMs, or you're planning a hardware purchase for running them, this will save you real time and probably point you toward a better model than you'd find on your own. The confidence grading and recency awareness show that the author thought carefully about the common failure modes of model recommendation tools. It's worth having in your toolkit.

Repository: https://github.com/Andyyyy64/whichllm

Contributors

@githubprojects
