
Sourcebot MCP - Fetch code context from GitHub, GitLab, Bitbucket, and more


The Sourcebot MCP server gives your LLM agents the ability to fetch code context across thousands of repos hosted on GitHub, GitLab, Bitbucket, and more. Ask your LLM a question, and the Sourcebot MCP server will fetch relevant context from its index and inject it into your chat session. Some use cases this unlocks include:

  • Enriching responses to user requests:

    • "What repositories are using internal library X?"
    • "Provide usage examples of the CodeMirror component"
    • "Where is the useCodeMirrorTheme hook defined?"
    • "Find all usages of deprecatedApi across all repos"
  • Improving reasoning ability for existing horizontal agents like AI code review, docs generation, etc.

    • "Find the definitions for all functions in this diff"
    • "Document what systems depend on this class"
  • Building custom LLM horizontal agents like compliance auditing agents, migration agents, etc.

    • "Find all instances of hardcoded credentials"
    • "Identify repositories that depend on this deprecated api"

Getting Started

  1. Install Node.js >= v18.0.0.

  2. (optional) Spin up a Sourcebot instance by following this guide. The host URL of your instance (e.g., http://localhost:3000) is passed to the MCP server via the SOURCEBOT_HOST environment variable. This allows you to control which repos Sourcebot MCP fetches context from (including private repos).

    If a host is not provided, the server will fall back to the demo instance hosted at https://demo.sourcebot.dev. You can see the list of repositories indexed here. Add additional repositories by opening a PR.

  3. Install @sourcebot/mcp into your MCP client:

    Cursor

    Cursor MCP docs

    Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

    Paste the following into your ~/.cursor/mcp.json file. This will install Sourcebot globally within Cursor:

    {
        "mcpServers": {
            "sourcebot": {
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                // Optional - if not specified, https://demo.sourcebot.dev is used
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    
    Windsurf

    Windsurf MCP docs

    Go to: Windsurf Settings -> Cascade -> Add Server -> Add Custom Server

    Paste the following into your mcp_config.json file:

    {
        "mcpServers": {
            "sourcebot": {
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                // Optional - if not specified, https://demo.sourcebot.dev is used
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    
    VS Code

    VS Code MCP docs

    Add the following to your .vscode/mcp.json file:

    {
        "servers": {
            "sourcebot": {
                "type": "stdio",
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                // Optional - if not specified, https://demo.sourcebot.dev is used
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    
    Claude Code

    Claude Code MCP docs

    Run the following command:

    # SOURCEBOT_HOST env var is optional - if not specified,
    # https://demo.sourcebot.dev is used.
    claude mcp add sourcebot -e SOURCEBOT_HOST=http://localhost:3000 -- npx -y @sourcebot/mcp@latest
    
    Claude Desktop

    Claude Desktop MCP docs

    Add the following to your claude_desktop_config.json:

    {
        "mcpServers": {
            "sourcebot": {
                "command": "npx",
                "args": ["-y", "@sourcebot/mcp@latest"],
                // Optional - if not specified, https://demo.sourcebot.dev is used
                "env": {
                    "SOURCEBOT_HOST": "http://localhost:3000"
                }
            }
        }
    }
    

    Alternatively, you can install via Smithery. For example:

    npx -y @smithery/cli install @sourcebot-dev/sourcebot --client claude
    

  4. Tell your LLM to use sourcebot when prompting.

For a more detailed guide, check out the docs.
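The host fallback described in step 2 can be sketched as follows. This is an illustrative sketch, not the package's actual implementation: the helper name `resolveSourcebotHost` is hypothetical, and treating an empty or whitespace-only value as "not provided" is an assumption.

```typescript
// Demo instance named in the setup steps above.
const DEMO_HOST = "https://demo.sourcebot.dev";

// Hypothetical helper: picks the host the MCP server should talk to.
// `env` is passed in explicitly so the function is easy to test;
// in a Node.js process you would typically pass `process.env`.
function resolveSourcebotHost(env: Record<string, string | undefined>): string {
  const host = env["SOURCEBOT_HOST"]?.trim();
  // Assumption: an unset or empty value falls back to the demo instance.
  return host ? host : DEMO_HOST;
}
```

With `SOURCEBOT_HOST` set to `http://localhost:3000`, the helper returns that value unchanged; with the variable unset, it returns the demo URL.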

Available Tools

search_code

Fetches code that matches the provided regex pattern in query.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| query | yes | Regex pattern to search for. Escape special characters and spaces with a single backslash (e.g., `console\.log`, `console\ log`). |
| filterByRepoIds | no | Restrict search to specific repository IDs (from 'list_repos'). Leave empty to search all. |
| filterByLanguages | no | Restrict search to specific languages (GitHub linguist format, e.g., Python, JavaScript). |
| caseSensitive | no | Case sensitive search (default: false). |
| includeCodeSnippets | no | Include code snippets in results (default: false). |
| maxTokens | no | Max tokens to return (default: env.DEFAULT_MINIMUM_TOKENS). |
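Because `query` is interpreted as a regex, literal text needs escaping before it is passed in. The sketch below shows one way a client might do that and then assemble the arguments. The parameter names come from the table above; the `SearchCodeArgs` interface, the `escapeLiteral` helper, and the assumption that repository IDs are strings are illustrative and not taken from the package.

```typescript
// Hypothetical type mirroring the documented search_code parameters.
interface SearchCodeArgs {
  query: string;
  filterByRepoIds?: string[]; // assumption: IDs are strings
  filterByLanguages?: string[];
  caseSensitive?: boolean;
  includeCodeSnippets?: boolean;
  maxTokens?: number;
}

// Prefix regex metacharacters and spaces with a single backslash,
// so the text is matched literally rather than as a pattern.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\ ]/g, "\\$&");
}

// Search for the literal text "console.log" in TypeScript files.
const searchArgs: SearchCodeArgs = {
  query: escapeLiteral("console.log"), // becomes console\.log
  filterByLanguages: ["TypeScript"],
  includeCodeSnippets: true,
};
```

Leaving `filterByRepoIds` unset searches all indexed repositories, as described in the table.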

list_repos

Lists repositories indexed by Sourcebot with optional filtering and pagination.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| query | no | Filter repositories by name (case-insensitive). |
| pageNumber | no | Page number (1-indexed, default: 1). |
| limit | no | Number of repositories per page (default: 50). |
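The filtering and pagination semantics above can be modelled with a small pure function. This is only a sketch of the documented behaviour, not the server's code: `listReposSketch` is a hypothetical name, and substring matching for `query` is an assumption (the table only states that filtering is by name and case-insensitive).

```typescript
// Hypothetical type mirroring the documented list_repos parameters.
interface ListReposArgs {
  query?: string;
  pageNumber?: number; // 1-indexed
  limit?: number;
}

// Models list_repos over an in-memory list of repository names.
function listReposSketch(repoNames: string[], args: ListReposArgs = {}): string[] {
  const { query, pageNumber = 1, limit = 50 } = args;
  const needle = query?.toLowerCase();
  // Assumption: the filter is a case-insensitive substring match.
  const filtered = needle
    ? repoNames.filter((name) => name.toLowerCase().includes(needle))
    : repoNames;
  // Page 1 starts at offset 0, page 2 at offset `limit`, and so on.
  const start = (pageNumber - 1) * limit;
  return filtered.slice(start, start + limit);
}
```

A client that wants every repository would keep incrementing `pageNumber` until a page comes back with fewer than `limit` entries.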

get_file_source

Fetches the source code for a given file.

Parameters
| Name | Required | Description |
|------|----------|-------------|
| fileName | yes | The file to fetch the source code for. |
| repoId | yes | The Sourcebot repository ID. |
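MCP clients normally build tool calls for you, but it can help to see what one looks like on the wire. The sketch below assembles a JSON-RPC `tools/call` request for `get_file_source`. The `buildToolCall` helper, the `GetFileSourceArgs` interface, the example file path and repository ID, and the assumption that `repoId` is a string are all illustrative.

```typescript
// Hypothetical type mirroring the documented get_file_source parameters.
interface GetFileSourceArgs {
  fileName: string;
  repoId: string; // assumption: IDs are strings
}

// Builds an MCP "tools/call" request envelope (JSON-RPC 2.0).
function buildToolCall(id: number, name: string, args: object) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

// Example values only; use a repository ID returned by list_repos.
const fileArgs: GetFileSourceArgs = {
  fileName: "packages/mcp/README.md",
  repoId: "github.com/sourcebot-dev/sourcebot",
};

const fileRequest = buildToolCall(1, "get_file_source", fileArgs);
```

Both parameters are required, so typing them as non-optional lets the compiler catch a missing `repoId` before the request is sent.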

Supported Code Hosts

Sourcebot supports the following code hosts:

Don't see your code host? Open a feature request.

Future Work

Currently, Sourcebot only supports regex-based code search (powered by zoekt under the hood). This works well when the agent is searching for something precise and well represented in the source code (e.g., a specific function name or an error string). It works less well for fuzzy searches where the objective is to find a loosely defined category or concept in the code (e.g., find code that verifies JWT tokens). The LLM can approximate this by crafting regex searches that attempt to capture a concept (e.g., it might try a query like "jwt|token|(verify|validate).*(jwt|token)"), but this often yields sub-optimal results that include unrelated matches.

Tools like Cursor solve this with embedding models that capture the semantic meaning of code, allowing LLMs to search using natural language. We would like to extend Sourcebot to support semantic search and expose this capability over MCP as a tool (e.g., a semantic_search_code tool). GitHub Discussion
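To illustrate the over-matching described above, the sketch below runs the example query against a few made-up lines of code. JavaScript's `RegExp` stands in for zoekt's regex engine here, which is an assumption of equivalence for this simple pattern; the sample lines are invented for illustration.

```typescript
// The example query from the text, case-insensitive to mirror
// the documented default of caseSensitive: false.
const conceptPattern = /jwt|token|(verify|validate).*(jwt|token)/i;

// Invented sample lines: only the first is about JWT verification.
const sampleLines = [
  "const ok = verifyJwt(token, publicKey);", // relevant
  "const csrfToken = req.headers['x-csrf'];", // unrelated: CSRF token
  "function nextToken(lexer: Lexer) {", // unrelated: lexer token
  "validateEmail(address);", // not matched at all
];

// Keep the lines the pattern matches.
const matchedLines = sampleLines.filter((line) => conceptPattern.test(line));
```

Three of the four lines match, but only one of them concerns JWT verification: the bare `token` alternative pulls in CSRF and lexer code as well.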