How to #QuitBigAI for smaller models… with the Frugal AI Switchboard

Wow. Is the best way to describe your reaction to my “How to #QuitBigAI” blogpost. By far this blog’s most-viewed article (10x more views than usual), it really struck a nerve, and sparked all kinds of fascinating conversations and connections.

Above all, it underlined to what extent – thank god – I’m not the only one who’s fed up with how way-too-big AI is currently being forced down our throats (hell, Trump’s America has just issued a warning about the risk of “anti-tech violent activism”; and rightly so…).

The big AI alternatives I suggested in that post – GreenPT, HuggingChat, Mistral and more – were just for starters. Alternative solutions for the most-common use cases, i.e. general purpose chatbots.

But what about the other ones? How are people using AI at work today? Or for personal projects?

Clearly, it was time for a deeper dive.

Coding: The use case par excellence

Practically all developers today use generative AI to help them code (faster). Claude Code is their tool of choice, although OpenAI’s Codex and GitHub Copilot also get a look-in. As you can guess, all of these tools rely on large, frontier models, like Claude Opus 4.6, 4.7 or 4.8. But do they really need to?

“We’re using $300m of Anthropic [Claude Code] this year (…) The vast majority of those tokens don’t need to go to Anthropic”, Salesforce head honcho Marc Benioff told the All-In podcast recently. “There needs to be some intermediary layer that’s saying, oh, that one has to go to Anthropic, but these ones can be handled by smaller models.”
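To make that "intermediary layer" idea concrete, here is a minimal sketch of a prompt router. Everything in it is an assumption for illustration: the model labels, the keyword hints and the length threshold are hypothetical, and a real router would likely rely on a classifier or on evaluation data rather than keywords.

```python
from dataclasses import dataclass

# Hypothetical model labels, for illustration only.
SMALL_MODEL = "small-open-model"
FRONTIER_MODEL = "frontier-model"

# Assumed heuristic: phrases hinting that a task may need a frontier model.
HARD_HINTS = ("architecture", "refactor across", "security audit", "prove")


@dataclass
class Route:
    model: str
    reason: str


def route(prompt: str, max_small_chars: int = 2000) -> Route:
    """Pick a model for a prompt using two crude, assumed signals."""
    text = prompt.lower()
    for hint in HARD_HINTS:
        if hint in text:
            return Route(FRONTIER_MODEL, f"matched hint: {hint!r}")
    if len(prompt) > max_small_chars:
        return Route(FRONTIER_MODEL, "prompt is long")
    # Default to the smaller model: most requests are assumed to be routine.
    return Route(SMALL_MODEL, "default to the smaller model")


if __name__ == "__main__":
    for p in ["Rename this variable to snake_case",
              "Do a security audit of our payment service"]:
        r = route(p)
        print(f"{r.model:<18} <- {p} ({r.reason})")
```

The design choice worth noting is the default: the small model handles everything unless there is a specific reason to escalate, rather than the other way round.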

Especially now that around a third of Anthropic’s compute budget goes to Colossus 1 & 2, Elon Musk’s hyper-polluting supercomputers (how I worked that out is in the footnotes).

So can you code with anything else? Turns out you can! Here’s how I expressed it in a talk I gave to developers at Dutch bank ABN-AMRO last month:

Don’t just take my word for it! Qwen 3.6 is recommended by Hugging Face’s CTO, who happily showed it off working in airplane mode, in a plane (as a reminder, the smaller models can function locally on the right hardware, i.e. without the cloud); Google’s Gemma 4 has been getting rave reviews since its recent release; Kimi 2.6 is recommended both by ethical cloud provider Infomaniak and Hugging Face; and MiniMax M2.5 was the default model in open source coding client OpenCode, when I first started trying it.

Why are these models more frugal? Because they are up to 74 times smaller than Claude Opus, which (we think) has 1.5-2 trillion parameters. Though we should always be wary of oversimplifying, as a rule of thumb, the smaller a model is, the fewer resources it needs to run. Which explains why small models can run locally.
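For a sense of scale, the "74 times smaller" figure can be turned into a parameter count. This is a quick sketch that uses only the two numbers quoted above; the Claude Opus size is an estimate, not a confirmed figure, and the variable names are just for illustration.

```python
# Figures from the text: Claude Opus is estimated (not confirmed) at
# 1.5-2 trillion parameters; the small models are "up to 74 times smaller".
OPUS_PARAMS_ESTIMATE = (1.5e12, 2.0e12)
MAX_SIZE_RATIO = 74


def smaller_model_params(big_params: float, ratio: float) -> float:
    """Parameter count of a model that is `ratio` times smaller."""
    return big_params / ratio


low, high = (smaller_model_params(p, MAX_SIZE_RATIO) for p in OPUS_PARAMS_ESTIMATE)
print(f"{low / 1e9:.1f}B to {high / 1e9:.1f}B parameters")
```

In other words, the smallest models in question would sit somewhere in the 20-27 billion parameter range, if the Opus estimate holds.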

But are they any good for coding? I decided to find out…

Is responsible vibe coding a thing?

Vibe coding can definitely be less impactful, with a tool like OpenCode. Pitched as a more affordable, open source alternative to Claude Code, it offers a choice from a selection of models, from MiniMax M2.5, which I started with, to Nemotron 3 Super, NVIDIA’s small model, which I ended up with. The bigger the model – MiniMax is more medium-sized – the more you can do with OpenCode’s free token limit, which runs out every five hours.

Egged on by the incredible Syd Lawrence, I decided to take an ironic plunge into vibe coding, in a quest to prove my “smaller AI is better” point (more on that irony in the footnotes…)

So what did I ask OpenCode to do? Make a Frugal AI Switchboard.

Of course, it couldn’t do all the hard work by itself. I first had to research 1/. what are the most common uses of LLMs today, and 2/. what are the best small models for those needs (instead of big, frontier LLMs).

Mission 1/. was accomplished with studies like the OpenAI/NBER study mentioned in my last post, backed up with internet research. Scouring the web was also useful for finding small alternatives for those needs, as were – OK, I admit it! – GreenPT and Kimi 2.6. Using smaller/greener LLMs to find smaller/greener LLMs? I think the planet will forgive me 🤔

Then, with a hearty dose of inspiration from Tom Watson’s excellent Bearing service, which helps you find the AI model best suited to your needs, I set about establishing a facts-based way to work out the most frugal alternatives to the usual-suspect big LLMs.

But not just the most frugal/smallest/most energy-efficient. I knew I could draw that data from EcoLogits, the ecological assessment tool for AI models (whose NGO I’m proud to be a part of).

The harder part would be spotlighting those smaller models that can be (almost) as performant as big ones. For that, I saw RL Nabors had recommended Arize Phoenix, an “open source platform for agent development and evaluation”, as a way to assess models, in particular with regard to their accuracy, one of the most important KPIs when it comes to AI models.

So at last, I could aggregate data on not only the most frugal alternative models, but the best ones too.

Introducing the Frugal AI Switchboard

I ended up creating the Switchboard with Mistral Vibe, as I hit a wall with OpenCode, whose end result wasn’t very pretty. Mistral Vibe – which uses the French AI provider’s Medium 3.5 model (128B) – made something better-looking off the bat; plus, as the compute work (most likely) happens in low-carbon France, it’s significantly less impactful than going through a US data centre, like all the biggest models do.

In the end I also hit a wall with Mistral Vibe, as – surprise, surprise! – the more I asked it to tweak details, the more it added errors 😔 So the help of an HTML expert (thanks, Florent Roques!) was essential to get this over the line. Humans 1-1 Robots…

So anyway, how does it work?

Usage types (green buttons): your starting point, whether you’re coding, creating images or writing. These six uses are, according to my research, today’s most common in a work context.

Models: the non-frugal baseline model is at the bottom; the alternatives are above it, ranked by Accuracy (via Arize Phoenix)

Parameters: the size of the model

Energy & CO2 per request: the impacts of one prompt to these models, via EcoLogits

Location, Sovereignty/Openness: knowing where a model runs – and whether you can run it wherever you like, and keep the data+model – are crucial considerations with AI. So I worked that in too!

Insights: Mistral Medium 3.5 threw in these recaps spontaneously. Quite helpful for taking your pick!

CSV export because we’re anti-lock-in 🙂
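For the curious, the layout described above can be sketched in a few lines of Python. This is not the Switchboard’s actual code (which is HTML): the `Row` fields, the function names and above all the sample rows are invented placeholders, there only to show the ranking rule (alternatives sorted by accuracy, baseline at the bottom) and a CSV export.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Row:
    model: str
    params_billion: float
    accuracy: float        # 0-1 score, e.g. from an evaluation tool
    energy_wh: float       # energy per request
    co2_g: float           # CO2 per request
    is_baseline: bool = False


def rank(rows: list[Row]) -> list[Row]:
    """Alternatives first, sorted by accuracy (best first); baseline last."""
    alternatives = sorted((r for r in rows if not r.is_baseline),
                          key=lambda r: r.accuracy, reverse=True)
    baselines = [r for r in rows if r.is_baseline]
    return alternatives + baselines


def to_csv(rows: list[Row]) -> str:
    """Serialise rows to CSV text, one column per Row field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Row)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Placeholder rows: invented names and numbers, NOT real measurements.
rows = [
    Row("big-baseline", 1500, 0.95, 10.0, 5.0, is_baseline=True),
    Row("small-a", 20, 0.90, 0.2, 0.1),
    Row("medium-b", 200, 0.93, 1.0, 0.5),
]
ranked = rank(rows)
print(to_csv(ranked))
```

Keeping the data as plain rows is what makes the CSV export trivial, and it means anyone can re-rank by energy or size instead of accuracy if that matters more to them.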

Please go and check out the Frugal AI Switchboard here!

What does it show us? Well, for example, in:

Coding: MiniMax M2.5 is only 4% less accurate than Claude Opus 4.6, is x9 smaller, x900 cheaper and x100 less impactful
Data Analysis: DeepSeek-V3 is just 2% less accurate than Claude Opus 4.6, is also x900 cheaper and x4 less impactful
Meetings: GPT-5.5 may be the model transcribing your Teams meetings. And yet OpenAI’s own model, the open source and ubiquitous Whisper, is nearly 2000 times smaller, x60 less impactful for just a 4% drop in accuracy
Image Generation: Stable Diffusion 3.5 Medium could save you x500 vs. GPT-5.5, with 30 times less environmental impact and just a 3% accuracy loss
Writing: Why use ChatGPT when the tiny Mistral 7B can do 96% as good a job, with x100 less impact and x500 cost savings?
Customer Chatbots: for answering client enquiries, if you opt for Meta’s Llama 4 Maverick, accuracy is slightly better than the go-to model in this category (Gemini 3 Flash), despite it being 4 times smaller. It is only 2 times cheaper, with x6 lower impacts. Still, it’s a switch that makes sense…

Yes, it’s basic. It’s meant to be, so it’s as useful and accessible as possible. Yes, the impact figures are worked out via EcoLogits’ “World” default – representing the global carbon intensity level – so would be considerably lower if we set the country to low-carbon “France”, for example. But you get the gist.

The fact remains: smaller models can perform practically as well as the biggest ones, at a fraction of the cost, and with significantly lower environmental impacts.

So what are you waiting for? 🙂 #QuitBigAI !

PS: this is a totally open (source) experiment, so I’m very much open to feedback on how to make this tool even more useful. It’s just the V1, and can only get better 🙏🏻🎛️💪🏻

PPS: Is it ironic to vibe-code such a tool? Yes and no. Firstly, Mistral Vibe uses Medium 3.5, which is around sixteen times smaller than Claude Opus; and it’s hosted in France (we hope!), whose energy is 10x less carbon-intense than Virginia’s. Secondly, yes, I could have asked a human developer. But he or she would probably have used Claude Code or similar, as most developers do now. Lame excuse, or reality? You decide. My caveats: the Switchboard’s impacts are clearly listed below it; and the impacts it should hopefully help avoid are way bigger than the impacts of making it. Fingers crossed!

PPPS: Colossus = 35% of Anthropic’s compute *spend*, because:
– $1.25bn a month going to xAI (TechCrunch)
– $80bn previously committed to the 3 hyperscalers by 2029 (DC Dynamics)
→ $42bn total annual compute spend, $15bn of that going to xAI → 35-36%
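One reading that reproduces those numbers is sketched below. The monthly xAI figure and the $80bn commitment come from the sources cited; spreading the $80bn evenly over roughly three years is an assumption made here to land on the $42bn total, as the period isn’t spelled out.

```python
XAI_MONTHLY_BN = 1.25           # from the text (TechCrunch figure)
HYPERSCALER_COMMIT_BN = 80.0    # from the text, committed "by 2029"
ASSUMED_YEARS = 3               # assumption: not stated in the text

xai_annual = XAI_MONTHLY_BN * 12                              # $15bn a year
hyperscaler_annual = HYPERSCALER_COMMIT_BN / ASSUMED_YEARS    # about $26.7bn a year
total_annual = xai_annual + hyperscaler_annual                # about $41.7bn a year
share = xai_annual / total_annual                             # xAI's share of the total

print(f"xAI: ${xai_annual:.0f}bn of ${total_annual:.0f}bn a year = {share:.0%}")
```

Change `ASSUMED_YEARS` and the share moves quickly (two years gives about 27%, four gives about 43%), which is a fair measure of how sensitive the estimate is.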

The above is, of course, open to interpretation/debate. But it’s not that far off the 23% of compute power that Asim Hussain worked out Anthropic is now devoting to Colossus.

TL;DR: be it via spend or compute power/capacity, there’s one chance in three that Claude prompts are shortening the lives of those with the misfortune to live near Musk’s monstrosity.

Published by jamesmart_in
