Tomáš Repčík - 7. 6. 2026

GitHub Copilot price hike is not a rug pull

Why people feel betrayed by the price hike, and what reality actually looks like

Recently, there has been a lot of discussion about the GitHub Copilot price hike. Many people feel betrayed and are calling it a rug pull.

To be honest, I understand the feeling, but the writing was on the wall for a long time. It was not a surprise, it was not a shock, and it was not a rug pull.

Why do people feel betrayed?

Before the price hike, people were billed by the number of messages, regardless of how much inference the model did. You could send one message, and GitHub Copilot would run multiple reasoning steps, make tool calls, write code, and so on. All of that cost you one credit multiplied by the model's multiplier.

In other words, with one good prompt, you could make the frontier model do a whole project for you, and you would pay only one credit for it. Sounds good, right?

Imagine paying for all that inference yourself. Microsoft was effectively subsidizing the cost of GitHub Copilot for a long time, and now it is trying to recoup some of that cost (as if every other provider were not subsidizing it as well).

Pricing

The new pricing model is quite simple: you still pay credits for inference, but one credit is now worth $0.01. Frontier models burn more credits, because they do much more inference.

So, 100 credits are $1. If you use 1,000 credits, you pay $10. If you use 10,000 credits, you pay $100. It is that simple.
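The conversion is plain arithmetic, but money math is easy to get subtly wrong with floats. A minimal Python sketch using the $0.01-per-credit rate stated above; the function name `credits_to_usd` is just for illustration:

```python
from decimal import Decimal

CENTS_PER_CREDIT = 1  # one credit = $0.01, as stated above

def credits_to_usd(credits: int) -> Decimal:
    """Convert consumed credits to US dollars, rounded to whole cents."""
    cents = credits * CENTS_PER_CREDIT
    return (Decimal(cents) / Decimal(100)).quantize(Decimal("0.01"))

for used in (100, 1_000, 10_000):
    print(f"{used:>6,} credits -> ${credits_to_usd(used)}")
```

Using `Decimal` instead of `float` keeps amounts like $0.01 exact when you sum up a month of usage.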

Why is it not a rug pull?

The new pricing model is fairer and more sustainable. It is based on the amount of inference the model does (the tokens it consumes) rather than on the number of messages. In other words, you pay for how many tokens the model consumes, which tracks how much work it does and how much it helps you.

The relationship between the price and the value is much more direct now. If you use it a lot, you will pay more, but if you use it less, you will pay less.
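To make the difference concrete, here is a rough Python sketch comparing the two billing models for one agentic prompt. The multiplier, the token counts, and the tokens-per-credit rate are made-up placeholders, not GitHub's actual rates, and a single flat rate for all models is a simplification. The point is only the shape of the two formulas: the old cost depends on the message count alone, while the new one grows with the work done.

```python
# Placeholder numbers for illustration only; these are NOT GitHub's real rates.
MODEL_MULTIPLIER = 3       # old model: credits charged per message for a frontier model
TOKENS_PER_CREDIT = 1_000  # new model: tokens covered by one credit (flat rate, a simplification)

def old_cost_credits(messages: int) -> int:
    """Message-based billing: ignores how much inference each message triggers."""
    return messages * MODEL_MULTIPLIER

def new_cost_credits(tokens_consumed: int) -> int:
    """Usage-based billing: scales with tokens, rounded up to whole credits."""
    return -(-tokens_consumed // TOKENS_PER_CREDIT)  # ceiling division

# The same single prompt, once for a small fix and once for a whole project.
for task, tokens in (("small fix", 20_000), ("whole project", 2_000_000)):
    print(f"{task}: old = {old_cost_credits(1)} credits, new = {new_cost_credits(tokens)} credits")
```

Under the old model both tasks cost the same; under the new one the whole-project run costs a hundred times more, because it did a hundred times more inference.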

And here is the issue: people perceive it as one-sided and feel like they are being taken advantage of. In reality, it is a business decision that makes sense for Microsoft and for the sustainability of the product.

Imagine you were a company selling a product that costs $100 to produce for $10. You would lose money on every sale, right? That is what Microsoft was doing with GitHub Copilot, and now it is trying to recoup some of that cost.

They did it for a long time, and people just got used to it.

You can see the same cost squeezing at other companies, such as Anthropic and OpenAI.

Yes, you could call it enshittification to some degree, but you are getting the same service, the same product, and the same value. You are just paying more for it. That is not a rug pull; it is a business decision.

Where is the problem?

The problem is not that they want to get rich off you. The problem is compute. There is just not enough compute to go around, and it is expensive.

xAI is selling compute to Anthropic and Google, because those companies do not have enough compute of their own to run their models.

https://x.ai/news/anthropic-compute-partnership

https://www.cnbc.com/2026/06/05/google-to-pay-spacex-920-million-a-month-for-xai-compute-capacity.html

Because compute has become so scarce, xAI is suddenly more willing to sell it to other companies than to use it for building its own models.

What can you do about it?

You do not need to stop using Copilot. You just need to use it with more intention.

Match the model to the task

Do not use the most expensive model for everything. That is just waste.

Use cheaper or auto-selected models for routine refactors, formatting, documentation, and small edits. Save the heavier reasoning models for architecture, debugging, migrations, and tasks where the model really has to think.

If the task is simple, the model should be simple too. Use, for example, the mini versions of OpenAI models, Haiku or Sonnet from Anthropic, or even open-source models for non-sensitive work. That will save you a lot of credits.
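If you script around the agent, or just want a written-down checklist, the rule fits into a simple lookup. A minimal Python sketch; the task categories and the tier labels are hypothetical placeholders, so map them to whichever models your plan actually offers:

```python
from enum import Enum

class Tier(Enum):
    # Hypothetical tier labels; substitute the models available to you.
    CHEAP = "mini/small model or auto-select"
    FRONTIER = "heavy reasoning model"

# Routine work goes to cheap models; work that needs real reasoning goes to frontier ones.
ROUTINE = {"refactor", "formatting", "documentation", "small edit"}
HEAVY = {"architecture", "debugging", "migration"}

def pick_tier(task_kind: str) -> Tier:
    """Return the cheapest tier that is plausibly good enough for the task."""
    kind = task_kind.strip().lower()
    if kind in HEAVY:
        return Tier.FRONTIER
    if kind in ROUTINE:
        return Tier.CHEAP
    # Unknown task kinds start cheap; escalate only if the cheap model clearly fails.
    return Tier.CHEAP

print(pick_tier("Debugging").value)
```

Defaulting to the cheap tier is deliberate: it is usually cheaper to retry one failed task on a bigger model than to run every task on the biggest one.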

Write better prompts

Vague prompts burn tokens because they cause retries, scope drift, and more back-and-forth.

Keep your prompts short and clear. State the goal, the scope, the constraints, the tests, references to the target code, and the expected output format. That is enough in most cases.
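One way to keep yourself honest is a tiny template with exactly those fields. A sketch in Python; the `TaskPrompt` dataclass, its field names, and the example values are all hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class TaskPrompt:
    goal: str
    scope: str
    constraints: list[str] = field(default_factory=list)
    tests: str = ""
    references: list[str] = field(default_factory=list)  # files or symbols to look at
    output_format: str = ""

    def render(self) -> str:
        """Build a short, structured prompt; empty sections are left out."""
        lines = [f"Goal: {self.goal}", f"Scope: {self.scope}"]
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        if self.tests:
            lines.append(f"Tests: {self.tests}")
        if self.references:
            lines.append("References: " + ", ".join(self.references))
        if self.output_format:
            lines.append(f"Output: {self.output_format}")
        return "\n".join(lines)

prompt = TaskPrompt(
    goal="Add input validation to the signup form",
    scope="Only the signup form module",
    constraints=["no new dependencies", "keep the public API unchanged"],
    tests="run the existing signup tests",
    references=["the login form validation"],
    output_format="a single diff",
)
print(prompt.render())
```

If a field is hard to fill in, that is a hint the task is still too vague to hand to the agent.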

The same applies to instructions. Keep .github/copilot-instructions.md short and actionable. If you need a novel to explain the task, the task is probably too broad.

Break big work into smaller tasks

Do not ask for “fix the whole repo”. That is how you get chaos and token waste.

Use plan mode first. Then execute in bounded steps. One problem, one slice, one verification.
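The execute phase looks roughly like this. A Python sketch where `run_agent_step` and `verify` are hypothetical stand-ins for "ask the agent to do one slice" and "run the tests or checks for that slice"; the point is that every step is small and verified before the next one starts:

```python
from typing import Callable

def execute_plan(
    steps: list[str],
    run_agent_step: Callable[[str], None],  # hypothetical: one bounded agent request
    verify: Callable[[str], bool],          # hypothetical: tests/lint for that slice
) -> list[str]:
    """Run plan steps one at a time and stop at the first one that fails verification."""
    done: list[str] = []
    for step in steps:
        run_agent_step(step)
        if not verify(step):
            # Stop here: fix this slice instead of letting the agent wander further.
            break
        done.append(step)
    return done

plan = ["extract config loader", "add tests for the loader", "switch callers to the loader"]
print(execute_plan(plan, run_agent_step=lambda step: None, verify=lambda step: True))
```

Stopping at the first failed check is what keeps a wrong turn from turning into a long, expensive detour.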

That is usually faster anyway, because you get fewer wrong turns.

Automate repeated work

If you keep asking the same thing, stop asking.

Move repeated work into dedicated tooling or prompts when it makes sense. Release notes, coverage summaries, PR triage, and status reporting do not need to be an ad hoc chat every time.

Create a script, a workflow, or a hook that does it for you. That will save you a lot of tokens and time in the long run.
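Release notes are a good example: they do not need a chat session at all. A minimal Python sketch that groups commit subjects by a Conventional Commits-style prefix (`feat:`, `fix:`, `docs:`). It assumes your team uses such prefixes, the function names are illustrative, and `v1.2.0..HEAD` is only an example revision range:

```python
import subprocess
from collections import defaultdict

SECTIONS = {"feat": "Features", "fix": "Bug fixes", "docs": "Documentation"}

def commit_subjects(rev_range: str) -> list[str]:
    """Read commit subjects from git, e.g. rev_range='v1.2.0..HEAD'."""
    out = subprocess.run(
        ["git", "log", "--pretty=format:%s", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]

def release_notes(subjects: list[str]) -> str:
    """Group 'type: message' subjects into Markdown sections; others go to 'Other'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for subject in subjects:
        prefix, sep, message = subject.partition(":")
        kind = prefix.split("(")[0].strip().lower()  # 'feat(api)' -> 'feat'
        if sep and kind in SECTIONS:
            groups[SECTIONS[kind]].append(message.strip())
        else:
            groups["Other"].append(subject.strip())
    parts = []
    for title in [*SECTIONS.values(), "Other"]:
        if groups[title]:
            parts.append(f"## {title}\n" + "\n".join(f"- {m}" for m in groups[title]))
    return "\n\n".join(parts)

demo = ["feat(api): add pagination", "fix: handle empty config", "chore: bump deps"]
print(release_notes(demo))
```

In a real workflow you would call `release_notes(commit_subjects("v1.2.0..HEAD"))` from a CI job or a release hook, and only ask the model to polish the wording if you really need it.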

Limit tools and risky behavior

Give the agent only the tools it needs.

If it does not need shell access, do not give it shell access. If it does not need write permissions, do not give it write permissions. Add hooks that log, block risky commands, and enforce the rules you already know you want.
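How hooks are wired up depends on your tooling, so here is only the core logic as a Python sketch: a hypothetical `check_command` function that a pre-execution hook could call to log every shell command the agent wants to run and reject the ones matching a denylist. The patterns are examples, not a complete list; where you can, an allowlist is safer.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-hook")

# Example patterns only; tailor them to the rules you already know you want.
DENYLIST = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r", re.IGNORECASE),  # rm -rf / -fr
    re.compile(r"\bgit\s+push\b.*--force\b"),   # force pushes
    re.compile(r"\bcurl\b.*\|\s*(ba)?sh\b"),     # piping a download into a shell
]

def check_command(command: str) -> bool:
    """Log the command and return False if it matches any denylisted pattern."""
    log.info("agent wants to run: %s", command)
    for pattern in DENYLIST:
        if pattern.search(command):
            log.warning("blocked: %s", command)
            return False
    return True

print(check_command("ls -la"), check_command("rm -rf build/"))
```

The log line doubles as an audit trail, which makes it easier to spot an agent that keeps trying the same blocked command and burning tokens on it.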

That saves tokens and avoids dumb accidents.

Keep sessions tight

Switching models mid-session can be more expensive than people think, and long messy threads can turn into a token sink.

Prefer one clean session per task when possible. Reuse cached results when you can. Do not keep dragging huge chats forward if a fresh bounded thread would be cheaper and clearer.
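A back-of-the-envelope calculation shows why long threads get expensive. Assuming each turn adds about the same number of tokens and the whole history is sent again with every request (ignoring caching and any context trimming your tool may do), the total input grows quadratically with the number of turns:

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens over a whole session when every request resends the full history.

    Request k carries k turns of context, so the total is
    tokens_per_turn * (1 + 2 + ... + turns) = tokens_per_turn * turns * (turns + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2

# One 40-turn thread versus four fresh 10-turn threads, 2,000 tokens per turn.
one_long = total_input_tokens(40, 2_000)
four_short = 4 * total_input_tokens(10, 2_000)
print(one_long, four_short)  # 1640000 440000
```

Under these assumptions, the same 40 turns of work split into four focused threads need a little over a quarter of the input tokens, as long as each fresh thread does not have to rebuild much context.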

Cached context

When you reuse the same instructions, tools, and skills, you are effectively making that context cacheable. At the inference level, that is a huge win: providers can reuse the model state already computed for that shared prefix instead of processing it again, which saves a lot of compute and makes those tokens cheaper on every request.

Skills and command files are cache-friendly when they are stable, short, and reused. They are not magically free context. They still enter the model request, but repeated stable context is more likely to benefit from cached-token pricing. The best use is to move durable workflow rules into instructions, keep volatile task details in the prompt, and prevent expensive agent wandering.
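The effect of a stable prefix can be estimated with a simple formula. A Python sketch; the per-token price and the cache discount are hypothetical placeholders, since providers price cached input differently. It also assumes the stable part (instructions, tool definitions, skills) sits at the start of the request, where prefix caching usually applies, and it ignores the surcharge some providers add when writing to the cache.

```python
# Hypothetical prices for illustration; check your provider's actual rates.
PRICE_PER_MTOK = 3.00   # $ per million uncached input tokens
CACHED_FRACTION = 0.10  # cached input billed at 10% of the normal price

def request_cost(stable_tokens: int, volatile_tokens: int, cache_hit: bool) -> float:
    """Input cost in dollars of one request whose stable prefix may be served from cache."""
    stable_rate = CACHED_FRACTION if cache_hit else 1.0
    billed = stable_tokens * stable_rate + volatile_tokens
    return billed * PRICE_PER_MTOK / 1_000_000

# 8,000 tokens of reused instructions/tools/skills plus a 1,000-token task prompt.
cold = request_cost(8_000, 1_000, cache_hit=False)
warm = request_cost(8_000, 1_000, cache_hit=True)
print(f"first request ${cold:.4f}, later requests ${warm:.4f}")
```

The saving only materializes if the stable part really stays byte-for-byte the same between requests, which is another reason to keep instruction and skill files short and rarely edited.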

New models do not always mean more token drain

The recent GPT 5.5 and Opus 4.8 are more efficient than their predecessors, so they can do more work with the same number of tokens or even fewer. That is a good thing, but it does not mean that you should use them for everything.

I highly recommend having a look at the recent DeepSWE benchmark. It shows, for example, that GPT 5.5 at medium reasoning can do the same work as GPT 5.4 at high reasoning. Opus 4.8 also shows decreased token consumption.

Cursor also released Composer 2.5, a model specialized for coding, and it is extremely cheap and efficient. It cannot handle complex, whole-project work, but as a programming buddy it is a really good deal.

Socials

Thanks for reading this article!

For more content like this, follow me here or on X or LinkedIn.
