GitHub Copilot Inner Workings
How GitHub Copilot understands and generates code
There are currently many coding assistants, such as Codeium, Tabnine, and GitHub Copilot.
When you want more control over the code generation process, or you simply want a better assistant, they are the way to go.
Over the last few months of using Copilot, I have seen it get much better at coding.
At its inception, it was clumsy and its suggestions were often off. Now it has improved significantly in several areas.
AI assistants are already well known, so my interest was in how such assistants work under the hood.
TL;DR on capabilities
Copilot can help you with:
- programming languages and common practices
- documentation and GitHub integrations
- general knowledge about the code
- code reviews
Copilot cannot:
- access private or proprietary code
- see the full context of private GitHub repositories
- do original research or critical thinking
- access real-time data or events (though this can be achieved with MCP servers)
How a Copilot Request Works
Copilot uses many techniques to handle context, because context is crucial for a model to understand code and generate relevant suggestions.
Let’s have a look at how Copilot works step by step from the request to a suggestion.
The process is similar for inline suggestions and chat, but chat carries more context and is more complex because of the pace of the conversation.
1. User Input
Imagine you start writing code or type a prompt into the chat. Copilot now has to decide what to do with it.
2. Context Analysis
Copilot analyzes the context of the user input to understand the code and its surrounding environment.
The context can include:
- opened code
- surrounding code around your cursor / selection
- references/attachments added by you
- custom instructions coming from the project setup
Keep in mind that we are limited by the context window of the model.
Along with the included context, all the code is tokenized, which means it is converted into a format that is friendlier for LLMs.
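To make the idea of tokens and a limited window more concrete, here is a minimal sketch. It assumes a very rough regex-based tokenizer (`rough_tokenize` is a hypothetical helper, not what Copilot uses); real LLM tokenizers work with subword vocabularies, so real counts will differ.

```python
import re

# Hypothetical stand-in for a real tokenizer: identifiers, numbers, and
# single punctuation characters each count as one token.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|[^\w\s]")


def rough_tokenize(source: str) -> list:
    """Split source code into rough tokens (an assumption, not real BPE)."""
    return TOKEN_PATTERN.findall(source)


def fits_in_window(source: str, window_size: int) -> bool:
    """Check whether the rough token count fits into a given window size."""
    return len(rough_tokenize(source)) <= window_size


snippet = "def add(a, b):\n    return a + b"
tokens = rough_tokenize(snippet)
print(tokens)
print(len(tokens), fits_in_window(snippet, window_size=8))
```

Even this two-line function costs a dozen tokens in the rough count, which is why whole files can eat up a context window quickly.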
To keep the context manageable, Copilot applies several techniques:
- Rolling/sliding context window: tokenized code moves in and out of the window as you work.
- Age-based rotation: the oldest tokens are removed in favor of the newest ones.
- Chunking: besides tokenization, the code is split into chunks by semantic and structural relevance.
- Summarization: when the content does not fit into the window, Copilot tries to condense it by creating summaries:
  - tokenization
  - encoding the context, which captures relationships between tokens
  - optimizing for key information by relevance, redundancy, repetition, and topics
  - generating the summary, either by selecting original text or by generating new text
- Planning: Copilot breaks the problem down into tasks, which the LLM then executes.
- More context is not always better: a smaller, more specific context sometimes works better.
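The rolling window with age-based eviction can be sketched roughly as follows. This is a toy model under stated assumptions: `RollingContext` is a hypothetical class, tokens are counted by splitting on whitespace, and the real mechanism inside Copilot is not public in this detail.

```python
from collections import deque


class RollingContext:
    """Toy rolling context: keeps the newest chunks within a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.chunks = deque()  # (text, token_cost) pairs, oldest first
        self.used = 0

    def add(self, text: str) -> None:
        cost = len(text.split())  # crude whitespace token count (assumption)
        self.chunks.append((text, cost))
        self.used += cost
        # Evict the oldest chunks until the budget is respected again,
        # but always keep the chunk that was just added.
        while self.used > self.max_tokens and len(self.chunks) > 1:
            _, old_cost = self.chunks.popleft()
            self.used -= old_cost

    def render(self) -> str:
        return "\n".join(text for text, _ in self.chunks)


ctx = RollingContext(max_tokens=8)
ctx.add("import os")                          # 2 tokens
ctx.add("def main(): pass")                   # 3 tokens
ctx.add("print(os.getcwd()) # debug output")  # 4 tokens, pushes total over 8
print(ctx.render())
```

After the third call the oldest chunk (`import os`) has been dropped, which also illustrates the downside: useful but old context can silently disappear.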
3. Understand User Intent
- from the given context, Copilot picks one of the available tools and executes it
- the tools are used to generate code, answer questions, or provide suggestions
4. Generate Responses
- Copilot generates multiple response candidates based on the selected tool and the context
5. Rank and Filter Suggestions
- Copilot ranks the response candidates based on relevance, context, history, and user feedback
6. Generate Response
- Copilot generates a final response based on the selected candidate and the context
- the user can accept or reject the response
- the user can leave feedback, in which case the process goes back to the context analysis step
All of this passes through filters that help prevent toxic content or vulnerable code from being generated.
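The last steps (generate candidates, filter, rank, pick one) can be sketched like this. Everything here is illustrative: `Candidate`, `BLOCKED_SNIPPETS`, and the relevance scores are hypothetical, and a simple substring blocklist stands in for whatever filters Copilot really applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str
    relevance: float  # hypothetical score in [0, 1]


# Hypothetical blocklist standing in for real safety filters.
BLOCKED_SNIPPETS = ("eval(input(", "verify=False")


def is_safe(candidate: Candidate) -> bool:
    """Reject candidates containing any blocked snippet."""
    return not any(bad in candidate.text for bad in BLOCKED_SNIPPETS)


def pick_best(candidates: list) -> Optional[Candidate]:
    """Filter unsafe candidates, then return the most relevant one."""
    safe = [c for c in candidates if is_safe(c)]
    return max(safe, key=lambda c: c.relevance, default=None)


candidates = [
    Candidate("result = eval(input())", 0.9),  # highest score, but filtered out
    Candidate("total = sum(values)", 0.7),
    Candidate("total = 0", 0.4),
]
best = pick_best(candidates)
print(best.text if best else "no safe candidate")
```

Note that filtering happens before ranking in this sketch, so a highly ranked but unsafe candidate never reaches the user.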
@Workspace and Context Building
As you have seen, context is crucial for Copilot to understand and generate relevant code.
However, sometimes you do not know what to include or what to look for, because you have a large codebase or you are fairly new to it.
@workspace can help you with that.
@workspace is a feature that allows Copilot to search your codebase and provide relevant context for your questions. It builds an index of your codebase, which Copilot uses to answer questions about your code in Chat.
@workspace is only available in Ask mode. To reference the codebase in Edit or Agent mode, use #codebase.
Copilot builds the context preemptively, so that code generation is optimized as much as possible.
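To get an intuition for why an index makes codebase questions fast, here is a toy keyword index. It is only a sketch: `build_index` and `search` are hypothetical helpers, the file contents are made up, and the real @workspace index is more sophisticated than plain keyword matching.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z_]\w*")


def build_index(files: dict) -> dict:
    """Map every lowercase term to the set of file paths containing it."""
    index = defaultdict(set)
    for path, content in files.items():
        for term in WORD.findall(content.lower()):
            index[term].add(path)
    return index


def search(index: dict, question: str, limit: int = 3) -> list:
    """Rank files by how many question terms they contain."""
    hits = defaultdict(int)
    for term in WORD.findall(question.lower()):
        for path in index.get(term, ()):
            hits[path] += 1
    # Most hits first; ties are broken alphabetically for stable output.
    return sorted(hits, key=lambda p: (-hits[p], p))[:limit]


# Made-up example files.
files = {
    "auth/login.py": "def login(user, password): return check_password(user, password)",
    "auth/tokens.py": "def issue_token(user): return sign(user)",
    "billing/invoice.py": "def create_invoice(order): return total(order)",
}
index = build_index(files)
print(search(index, "where is the password checked during login"))
```

The index is built once and each question only touches the terms it mentions, so the lookup stays cheap even when the number of files grows.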
Remote Index
If your code lives in a GitHub repository, you can build a remote index using GitHub code search. This enables Copilot to search your codebase quickly, even for large projects.
Here’s how to set it up:
- Sign in with your GitHub account in VS Code
- Run the “Build Remote Workspace Index” command in the Command Palette
- Wait for the index to build (this can take time for large codebases)
- Monitor progress in the Copilot status dashboard
The beauty of remote indexing is that you only need to set it up once. GitHub automatically keeps it up-to-date whenever you push code changes.
Important: Remote indexing requires a project with a GitHub remote. Make sure you’ve pushed your code to GitHub and keep it relatively up-to-date for the best results.
Local Index
Many people think that Copilot just works out of the box, but it actually relies on indexing your codebase to provide relevant suggestions. If you do not push code to GitHub, you have to build a local index manually.
Here’s how local indexing works by the size of your project:
- Small projects (less than 750 files): Copilot automatically builds an advanced local index
- Medium projects (750-2500 files): Run the “Build local workspace index” command once
- Large projects (more than 2500 files): Falls back to basic indexing (see below)
Building the initial index takes some time, as does updating it after switching git branches with many changes. You can track the progress in the Copilot status dashboard.
VS Code commands can be invoked by pressing F1 and typing the command name as you see it above.
Basic Index
For projects without a remote index and more than 2500 files, Copilot uses a basic index. This uses simpler algorithms optimized for larger codebases running locally.
The basic index works fine for most questions, but if you notice Copilot struggling with your codebase, consider upgrading to a remote index.
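The rules for which index applies can be summarized as a small decision function. This is just a restatement of the thresholds above in code; `choose_index` is a hypothetical helper, and it assumes you prefer the remote index whenever a GitHub remote exists.

```python
def choose_index(has_github_remote: bool, file_count: int) -> str:
    """Return which kind of index applies, following the thresholds above."""
    if has_github_remote:
        return "remote"  # assumed preference when the code is on GitHub
    if file_count < 750:
        return "local (built automatically)"
    if file_count <= 2500:
        return "local (run the build command once)"
    return "basic"


print(choose_index(has_github_remote=False, file_count=300))
print(choose_index(has_github_remote=False, file_count=1200))
print(choose_index(has_github_remote=False, file_count=9000))
print(choose_index(has_github_remote=True, file_count=9000))
```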
What Gets Indexed?
Copilot indexes relevant text files from your project - not limited to specific file types or programming languages. However, it’s smart enough to skip common irrelevant files like .tmp or .out files.
It also respects your VS Code files.exclude settings and .gitignore file.
Note: Binary files like images or PDFs aren’t indexed currently.
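The filtering described above could look roughly like this. It is a simplified sketch: `should_index` is a hypothetical helper, the list of binary extensions is an assumed sample, and plain glob matching is used as a stand-in for real files.exclude and .gitignore semantics, which are richer.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

SKIPPED_SUFFIXES = {".tmp", ".out"}                 # named in the text above
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".pdf"}  # assumed sample of binary types


def should_index(path: str, exclude_patterns: list) -> bool:
    """Decide whether a file would be indexed under these simplified rules."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in SKIPPED_SUFFIXES or suffix in BINARY_SUFFIXES:
        return False
    # Glob patterns stand in for files.exclude / .gitignore entries.
    return not any(fnmatchcase(path, pattern) for pattern in exclude_patterns)


patterns = ["node_modules/*", "*.log"]
for candidate in [
    "src/app.py",
    "build/cache.tmp",
    "docs/diagram.png",
    "node_modules/lib/index.js",
]:
    print(candidate, should_index(candidate, patterns))
```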
Getting the Most Out of Copilot
The way you phrase your questions can make or break your Copilot experience. Here are some tips to get better results:
Be Specific and Clear
Avoid vague questions like “what does this do” - Copilot won’t know if you’re asking about the current file, the whole project, or its last answer. Instead, be explicit about what you’re asking. Imagine asking someone for directions: you wouldn’t just say “take me there.” You’d provide specific landmarks or street names to help them understand your request.
Use the Right Language
Include terms and concepts that actually appear in your code or documentation. If your codebase uses specific terminology, use that same terminology in your questions. LLMs can match terminology and concepts effectively, so leverage that to improve your queries.
Check Your References
Always review the files Copilot references in its response. Are they relevant? If not, try rephrasing your question, attaching other files, or opening other files. Sometimes closing open tabs can help refresh the context.
Provide Context
Make Copilot’s job easier by selecting relevant code or using chat variables like #editor, #selection, or #filename.
However, do not overwhelm Copilot with too much context at once. Too much context can lead to confusion and less relevant responses.
Sometimes less is more, so try to keep your questions focused and concise.
Provide a Template
When asking for code snippets or examples, provide a clear template or structure for what you’re looking for. This helps Copilot understand your request better and generate more relevant responses.
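As an illustration, a templated prompt can be assembled like this. The `build_prompt` helper, the task, and the template text are all hypothetical; the only piece taken from Copilot itself is the #selection chat variable mentioned earlier.

```python
def build_prompt(task: str, template: str, references: tuple = ()) -> str:
    """Assemble a chat prompt: references first, then the task and the template."""
    parts = []
    if references:
        parts.append(" ".join(references))
    parts.append(f"Task: {task}")
    parts.append("Use this structure for the answer:")
    parts.append(template)
    return "\n".join(parts)


# Hypothetical template describing the shape of the expected answer.
template = (
    "def <function_name>(<args>) -> <return_type>:\n"
    '    """<one-line summary>"""\n'
    "    <implementation>"
)
prompt = build_prompt(
    task="Write a function that validates an email address",
    template=template,
    references=("#selection",),
)
print(prompt)
```

The point is less the helper itself than the shape of the result: a short task, an explicit structure, and only the references that matter.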
Reference
For more information about Copilot’s features, check out the GitHub Copilot documentation.