You get OpenAI compatibility with streaming, tool calling, and automatic failove...
GitHub RepoImpressions128
View on GitHub
@githubprojectsPost Author

FreeLLMAPI: OpenAI-Compatible Streaming + Failover Across Google, Groq, and More

If you've ever built an app that depends on a single LLM provider, you know the pain: an outage, rate limit, or sudden pricing change can break everything. You either hard-code a fallback or write custom logic for each provider’s API shape. Enter FreeLLMAPI — a tiny proxy that gives you OpenAI-compatible endpoints with streaming, tool calling, and automatic failover across multiple providers.

What It Does

FreeLLMAPI is a lightweight reverse proxy that sits between your app and LLM backends. You send requests in the standard OpenAI Chat Completions format, and it routes them to providers like Google (Gemini), Groq, Cerebras, or others. If one provider fails or returns an error, it transparently retries the next one. Crucially, it preserves streaming — so your users still see tokens arrive in real time.

The repo is a single-file Python implementation (FastAPI-based) with minimal dependencies. You point it at a config file listing your API keys and provider preferences, and it handles the rest.

Why It’s Cool

  • Zero code changes for your app. Your existing OpenAI SDK code works — just change the base URL to point at FreeLLMAPI. Tool calls (function calling) also pass through without modification.
  • Automatic failover with configurable provider priority. Want to use Groq first, then fall back to Google Gemini, then Cerebras? Just define that order in your config file. If a provider is down or returns an error, the request moves to the next one.
  • Streaming works end-to-end. This is where most proxies break — they accumulate the full response and then send it. FreeLLMAPI streams tokens from the active provider directly to your client, so you keep the real-time UX.
  • Provider-agnostic tool calling. If your app uses function calling, it works across providers that support it (e.g., Groq, Google). The proxy maps the response format back to OpenAI's schema, so your code never knows there's a different engine underneath.

How to Try It

  1. Clone the repo:

    git clone https://github.com/tashfeenahmed/freellmapi
    cd freellmapi
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Create a config.yaml file (see the example in the repo) with your API keys and provider order:

    providers:
      - name: groq
        api_key: your_groq_key
        model: llama3-70b-8192
      - name: google
        api_key: your_google_key
        model: gemini-1.5-flash
    
  4. Run the server:

    python app.py
    
  5. Point your OpenAI client at http://localhost:8000/v1 — that's it. Your existing chat or streaming code now has automatic failover.

Final Thoughts

FreeLLMAPI is a pragmatic, developer-friendly solution for a real problem. It's not trying to be a full orchestration platform or a caching gateway — it just solves the “what if my LLM provider goes down” scenario in a clean, standards-compliant way. If you're building a production app that relies on multiple LLMs, or you just want to experiment with different providers without rewriting your integration, this is a quick win.

The code is simple enough to audit or extend, and the streaming support makes it viable for chat interfaces. Worth a star on GitHub.


Follow us for more dev tools: @githubprojects

Back to Projects
Last updated: May 25, 2026 at 05:35 PM