Blazingly fast code search 🏎️
About
Sourcebot is a fast code indexing and search tool for your codebases. It is built on top of the zoekt indexer, originally authored by Han-Wen Nienhuys and now maintained by Sourcegraph.
Getting Started
Using Docker
- Install Docker.
- Create a `config.json` file and list the repositories you want to index. The JSON schema index.json defines the structure of the config file and the available options. For example, to index Sourcebot's own code, we could use the following config, found in `sample-config.json`:

      {
        "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
        "Configs": [
          {
            "Type": "github",
            "GitHubOrg": "TaqlaAI",
            "Name": "^sourcebot$"
          }
        ]
      }

  Sourcebot also supports indexing GitLab & BitBucket. Check out index.json for a full list of available options.
- Create a Personal Access Token (PAT) to authenticate with a code host:

  GitHub

  Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new). If you're only indexing public repositories, select the `public_repo` scope; otherwise, select the `repo` scope.

  You'll need to pass this PAT each time you run Sourcebot, so we recommend adding it as an environment variable. In this guide, we'll add the GitHub PAT as an environment variable called `GITHUB_TOKEN`:

      export GITHUB_TOKEN=<your-token-here>

  If you'd like to persist this environment variable across shell sessions, add this line to your shell config file (e.g. `~/.bashrc`, `~/.bash_profile`).

  GitLab

  TODO

  BitBucket

  TODO
- Launch the latest image from the ghcr registry:

  GitHub

  Run the `sourcebot` Docker image, passing in the GitHub PAT you generated in the previous step as an environment variable called `GITHUB_TOKEN`:

      docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e GITHUB_TOKEN=$GITHUB_TOKEN ghcr.io/taqlaai/sourcebot:main

  GitLab

      docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e GITLAB_TOKEN=<token> ghcr.io/taqlaai/sourcebot:main

  BitBucket

  TODO
  Two things should happen: (1) a `.sourcebot` directory will be created containing the mirror repositories and indexes, and (2) you will see output similar to:

      INFO spawned: 'node-server' with pid 10
      INFO spawned: 'zoekt-indexserver' with pid 11
      INFO spawned: 'zoekt-webserver' with pid 12
      run [zoekt-mirror-github -dest /data/.sourcebot/repos -delete -org <org>] ...
      INFO success: node-server entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
      INFO success: zoekt-indexserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
      INFO success: zoekt-webserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

  zoekt will now index your repositories (at `HEAD`). By default, it will re-index existing repositories every hour, and discover new repositories every 24 hours.
- Go to `http://localhost:3000`. Once an index has been created, you can start searching.
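The Docker steps rely on two things being in place before the container starts: a `config.json` in the directory that gets mounted at `/data`, and a `GITHUB_TOKEN` exported in the shell. A minimal sketch of a check for both is shown below. The `preflight` function and its exit codes are hypothetical helpers for illustration and are not part of Sourcebot; the sketch also assumes the config file lives in the current directory.

```shell
# Sketch only: `preflight` is a hypothetical helper, not something Sourcebot ships.
# It checks the two prerequisites from the Docker steps before you launch the image.
preflight() {
  # Path to the config file; defaults to config.json in the current directory (assumption).
  config_file="${1:-config.json}"

  # Step 2: the config file listing repositories must exist.
  if [ ! -f "$config_file" ]; then
    echo "preflight: $config_file not found" >&2
    return 1
  fi

  # Step 3: the GitHub PAT must be available as GITHUB_TOKEN.
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "preflight: GITHUB_TOKEN is not set" >&2
    return 2
  fi

  echo "preflight: ok"
}
```

One way to use it is to run `preflight config.json` first and only issue the `docker run` command from the launch step once it prints `preflight: ok`.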
Building Sourcebot
- Install go and NodeJS. Note that a NodeJS version of at least `21.1.0` is required.
- Install ctags (required by zoekt-indexserver):

  Mac: `brew install universal-ctags`

  Ubuntu: `apt-get install universal-ctags`
- Clone the repository with submodules:

      git clone --recurse-submodules https://github.com/TaqlaAI/sourcebot.git

- Run `make` to build zoekt and install dependencies:

      cd sourcebot
      make

  The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively.
- Create a `config.json` file and list the repositories you want to index. The JSON schema in index.json defines the structure of the config file and the available options. For example, to index Sourcebot's own code, we could use the following config, found in `sample-config.json`:

      {
        "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
        "Configs": [
          {
            "Type": "github",
            "GitHubOrg": "TaqlaAI",
            "Name": "^sourcebot$"
          }
        ]
      }

- Create a Personal Access Token (PAT) to authenticate with a code host:

  GitHub

  Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new). If you are indexing public repositories only, you can select the `public_repo` scope; otherwise, you will need the `repo` scope.

  Create a text file named `.github-token` in your home directory and paste the token in it. The file should look like:

      ghp_...

  zoekt will read this file to authenticate with GitHub.

  GitLab

  TODO

  BitBucket

  TODO
- Start Sourcebot with the command:

      yarn dev

  A `.sourcebot` directory will be created and zoekt will begin to index the repositories listed in `config.json`.
- Go to `http://localhost:3000`. Once an index has been created, you can start searching.
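Because the build requires NodeJS 21.1.0 or newer, it can help to check the installed version before running `make`. The sketch below compares dotted version numbers with `awk`. Both `version_ge` and `check_node` are hypothetical helper names introduced here for illustration, and the comparison only considers the first three numeric components.

```shell
# Sketch only: hypothetical helpers, not part of the Sourcebot repository.

# version_ge A B: succeeds (exit 0) when version A >= version B.
# Compares up to three dot-separated numeric components; missing ones count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# check_node: reports whether the NodeJS on PATH meets the 21.1.0 minimum.
check_node() {
  if ! command -v node >/dev/null 2>&1; then
    echo "node not found on PATH" >&2
    return 1
  fi
  # `node --version` prints e.g. "v21.1.0"; strip the leading "v".
  node_version=$(node --version | sed 's/^v//')
  if version_ge "$node_version" 21.1.0; then
    echo "NodeJS $node_version satisfies >= 21.1.0"
  else
    echo "NodeJS $node_version is older than 21.1.0" >&2
    return 1
  fi
}
```

Calling `check_node` before `make` gives an early, readable failure instead of a build error further along.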
Telemetry
By default, Sourcebot collects anonymized usage data through PostHog to help us improve the performance and reliability of our tool. We do not collect or transmit any information related to your codebase. All events are sanitized to ensure that no sensitive or identifying details leave your machine. The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made :)
If you'd like to disable all telemetry, you can do so by setting the environment variable SOURCEBOT_TELEMETRY_DISABLED to 1 in the docker run command:
docker run -e SOURCEBOT_TELEMETRY_DISABLED=1 /* additional args */ ghcr.io/taqlaai/sourcebot:main
Or if you are building locally, add the following to your .env file:
SOURCEBOT_TELEMETRY_DISABLED=1
NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED=1
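For local builds, the two lines above can be appended to the `.env` file by hand or with a small script. The following is a minimal sketch of an idempotent helper; `disable_telemetry` is a hypothetical name, and the default path `.env` in the current directory is an assumption.

```shell
# Sketch only: `disable_telemetry` is a hypothetical helper, not part of Sourcebot.
# It adds the two telemetry opt-out lines to an env file unless they are already present.
disable_telemetry() {
  env_file="${1:-.env}"
  touch "$env_file"

  # If the file is non-empty and does not end with a newline, add one first
  # so the new entries start on their own line.
  if [ -s "$env_file" ] && [ -n "$(tail -c 1 "$env_file")" ]; then
    printf '\n' >> "$env_file"
  fi

  for line in SOURCEBOT_TELEMETRY_DISABLED=1 NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED=1; do
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$env_file" || printf '%s\n' "$line" >> "$env_file"
  done
}
```

Running `disable_telemetry .env` more than once leaves the file unchanged after the first call.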