<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset=".github/images/logo_dark.png">
<img height="150" src=".github/images/logo_light.png">
</picture>
</div>
<p align="center">
Blazingly fast code search 🏎️
</p>
<p align="center">
<a href="https://demo.sourcebot.dev"><img src="https://img.shields.io/badge/Try%20the%20Demo!-blue?logo=googlechrome&logoColor=orange"/></a>
<a href="mailto:brendan@sourcebot.dev"><img src="https://img.shields.io/badge/Email%20Us-brightgreen" /></a>
<a href="https://github.com/TaqlaAI/sourcebot/blob/main/LICENSE"><img src="https://img.shields.io/github/license/TaqlaAI/sourcebot"/></a>
<a href="https://github.com/TaqlaAI/sourcebot/actions/workflows/ghcr-publish.yml"><img src="https://img.shields.io/github/actions/workflow/status/TaqlaAI/sourcebot/ghcr-publish.yml"/></a>
<a href="https://github.com/TaqlaAI/sourcebot/stargazers"><img src="https://img.shields.io/github/stars/TaqlaAI/sourcebot" /></a>
</p>
# About
Sourcebot is a fast code indexing and search tool for your codebases. It is built on top of the [zoekt](https://github.com/sourcegraph/zoekt) indexer, originally authored by Han-Wen Nienhuys and now [maintained by Sourcegraph](https://sourcegraph.com/blog/sourcegraph-accepting-zoekt-maintainership).
![Demo video](https://github.com/user-attachments/assets/227176d8-fc61-42a9-8746-3cbc831f09e4)
# Getting Started
## Using Docker
0. Install <a href="https://docs.docker.com/get-started/get-docker/"><img src="https://www.docker.com/favicon.ico" width="16" height="16"> Docker </a>
1. Create a `config.json` file and list the repositories you want to index. The JSON schema [index.json](./schemas/index.json) defines the structure of the config file and the available options. For example, if we want Sourcebot to index its own code, we could use the following config found in `sample-config.json`:
```json
{
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
"Configs": [
{
"Type": "github",
"GitHubOrg": "TaqlaAI",
"Name": "^sourcebot$"
}
]
}
```
Sourcebot also supports indexing GitLab & BitBucket. Check out [index.json](./schemas/index.json) for the full list of available options.
2. Create a Personal Access Token (PAT) to authenticate with your code host(s):
<div>
<details open>
<summary><img src="https://github.com/favicon.ico" width="16" height="16" /> GitHub</summary>
Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new). If you are indexing public repositories only, you can select the `public_repo` scope; otherwise, you will need the `repo` scope.
</details>
<details>
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>
TODO
</details>
<details>
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>
TODO
</details>
</div>
3. Launch the latest image from the [ghcr registry](https://github.com/TaqlaAI/sourcebot/pkgs/container/sourcebot):
<div>
<details open>
<summary><img src="https://github.com/favicon.ico" width="16" height="16" /> GitHub</summary>
```sh
docker run -p 3000:3000 --rm --name sourcebot -v "$(pwd)":/data -e GITHUB_TOKEN=<token> ghcr.io/taqlaai/sourcebot:main
```
</details>
<details>
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>
```sh
docker run -p 3000:3000 --rm --name sourcebot -v "$(pwd)":/data -e GITLAB_TOKEN=<token> ghcr.io/taqlaai/sourcebot:main
```
</details>
<details>
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>
TODO
</details>
</div>
Two things should happen: (1) a `.sourcebot` directory will be created containing the mirror repositories and indexes, and (2) you will see output similar to:
```sh
INFO spawned: 'node-server' with pid 10
INFO spawned: 'zoekt-indexserver' with pid 11
INFO spawned: 'zoekt-webserver' with pid 12
run [zoekt-mirror-github -dest /data/.sourcebot/repos -delete -org <org>]
...
INFO success: node-server entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
INFO success: zoekt-indexserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
INFO success: zoekt-webserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
```
zoekt will now index your repositories (at `HEAD`). By default, it will re-index existing repositories every hour, and discover new repositories every 24 hours.
4. Go to `http://localhost:3000` - once an index has been created, you can start searching.
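
The Docker steps above can be sketched as a small script. This is only an illustration under a few assumptions: the helper names `write_config` and `docker_cmd` are hypothetical, `config.json` is assumed to live in the directory mounted at `/data`, and the script prints the `docker run` command instead of running it so you can review it first. It uses Docker's `-e VAR` form (no value), which passes the variable through from your shell environment, so the token does not have to appear on the command line.

```sh
#!/bin/sh
# Sketch only: writes the sample config and prints (does not run) the docker command.
set -eu

# Hypothetical helper: write the sample config from step 1 into the current directory.
write_config() {
    # The quoted 'EOF' keeps "$schema" literal instead of expanding it as a variable.
    cat > config.json <<'EOF'
{
    "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
    "Configs": [
        {
            "Type": "github",
            "GitHubOrg": "TaqlaAI",
            "Name": "^sourcebot$"
        }
    ]
}
EOF
}

# Hypothetical helper: print the step 3 command for a given token variable
# (GITHUB_TOKEN or GITLAB_TOKEN). "-e NAME" forwards NAME from the host environment.
docker_cmd() {
    token_var=$1
    printf 'docker run -p 3000:3000 --rm --name sourcebot -v "%s":/data -e %s ghcr.io/taqlaai/sourcebot:main\n' \
        "$(pwd)" "$token_var"
}

write_config
docker_cmd GITHUB_TOKEN
```

To use it, export `GITHUB_TOKEN` in your shell, run the script, and then paste the printed command.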
## Building Sourcebot
0. Install <a href="https://go.dev/"><img src="https://go.dev/favicon.ico" width="16" height="16"> Go</a> and <a href="https://nodejs.org/"><img src="https://nodejs.org/favicon.ico" width="16" height="16"> Node.js</a>
1. Clone the repository with submodules:
```sh
git clone --recurse-submodules https://github.com/TaqlaAI/sourcebot.git
```
2. Run `make` to build zoekt and install dependencies:
```sh
cd sourcebot
make
```
The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively.
3. Create a `config.json` file and list the repositories you want to index. The JSON schema in [index.json](./schemas/index.json) defines the structure of the config file and the available options. For example, if we want Sourcebot to index its own code, we could use the following config found in `sample-config.json`:
```json
{
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
"Configs": [
{
"Type": "github",
"GitHubOrg": "TaqlaAI",
"Name": "^sourcebot$"
}
]
}
```
4. Create a Personal Access Token (PAT) to authenticate with a code host:
<div>
<details open>
<summary><img src="https://github.com/favicon.ico" width="16" height="16" /> GitHub</summary>
Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new). If you are indexing public repositories only, you can select the `public_repo` scope; otherwise, you will need the `repo` scope.
Create a text file named `.github-token` in your home directory and paste the token in it. The file should look like:
```sh
ghp_...
```
zoekt will [read this file](https://github.com/TaqlaAI/zoekt/blob/6a5753692b46e669f851ab23211e756a3677185d/cmd/zoekt-mirror-github/main.go#L60) to authenticate with GitHub.
</details>
<details>
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>
TODO
</details>
<details>
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>
TODO
</details>
</div>
5. Start Sourcebot with the command:
```sh
yarn dev
```
A `.sourcebot` directory will be created and zoekt will begin indexing the repositories listed in `config.json`.
6. Go to `http://localhost:3000` - once an index has been created, you can start searching.
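
For the token file in step 4, one possible way to create it from the shell is sketched below. The function name `save_token` and its optional second argument are hypothetical and exist only for illustration. The sketch writes the token followed by a newline and restricts the file to your user, which is a general precaution for credential files rather than something Sourcebot or zoekt is known to require.

```sh
#!/bin/sh
# Hypothetical helper: store a token in a file readable only by the current user.
# Usage: save_token <token> [path]   (path defaults to ~/.github-token)
save_token() {
    token=$1
    dest=${2:-"$HOME/.github-token"}
    # Create the file with a restrictive umask so it is never briefly world-readable.
    ( umask 077 && printf '%s\n' "$token" > "$dest" )
    # umask only affects newly created files, so tighten permissions explicitly too.
    chmod 600 "$dest"
}
```

For example, `save_token "$GITHUB_TOKEN"` writes `~/.github-token` using a token already exported in your shell.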
## Telemetry
By default, Sourcebot collects anonymized usage data through [PostHog](https://posthog.com/) to help us improve the performance and reliability of our tool. We do not collect or transmit [any information related to your codebase](https://github.com/search?q=repo:TaqlaAI/sourcebot++captureEvent&type=code). All events are [sanitized](https://github.com/TaqlaAI/sourcebot/blob/main/src/app/posthogProvider.tsx) to ensure that no sensitive or identifying details leave your machine. The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made :)
If you'd like to disable all telemetry, you can do so by setting the environment variable `SOURCEBOT_TELEMETRY_DISABLED` to `1` in the docker run command:
```sh
docker run -e SOURCEBOT_TELEMETRY_DISABLED=1 /* additional args */ ghcr.io/taqlaai/sourcebot:main
```
Or if you are building locally, add the following to your [.env](./.env) file:
```sh
SOURCEBOT_TELEMETRY_DISABLED=1
NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED=1
```
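
If you script your local setup, the `.env` change can be made idempotent. The sketch below is one way to do it; the function name `disable_telemetry` is hypothetical, and it only appends a variable when no line for it exists yet, so an existing line with a different value is left untouched.

```sh
#!/bin/sh
# Hypothetical helper: add the telemetry opt-out variables to an env file once.
# Usage: disable_telemetry [path]   (path defaults to ./.env)
disable_telemetry() {
    env_file=${1:-.env}
    touch "$env_file"
    # If the file is non-empty and lacks a trailing newline, add one so the
    # appended variables do not end up glued to the last existing line.
    if [ -s "$env_file" ] && [ -n "$(tail -c1 "$env_file")" ]; then
        printf '\n' >> "$env_file"
    fi
    for var in SOURCEBOT_TELEMETRY_DISABLED NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED; do
        # Append only when the variable is not already defined in the file.
        if ! grep -q "^${var}=" "$env_file"; then
            printf '%s=1\n' "$var" >> "$env_file"
        fi
    done
}
```

Running `disable_telemetry` from the repository root more than once leaves a single copy of each line in `.env`.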