mirror of
https://github.com/sourcebot-dev/sourcebot.git
synced 2025-12-11 20:05:25 +00:00
Readme V3 + config examples (#12)
parent
e250fd9ae3
commit
d0d104a1e1
8 changed files with 198 additions and 86 deletions
BIN
.github/images/github-pat-creation.png
vendored
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 82 KiB |
BIN
.github/images/gitlab-pat-creation.png
vendored
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 325 KiB |
|
|
@@ -69,7 +69,7 @@ COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
|
|||
COPY entrypoint.sh ./entrypoint.sh
|
||||
RUN chmod +x ./entrypoint.sh
|
||||
|
||||
COPY sample-config.json .
|
||||
COPY default-config.json .
|
||||
|
||||
EXPOSE 3000
|
||||
ENV PORT=3000
|
||||
|
|
|
|||
176
README.md
|
|
@@ -25,10 +25,10 @@ Sourcebot is a fast code indexing and search tool for your codebases. It is buil
|
|||
|
||||
## Features
|
||||
- 💻 **One-command deployment**: Get started instantly using Docker on your own machine.
|
||||
- 🔍 **Multi-repo search**: Effortlessly index and search through multiple public and private repositories (GitHub, GitLab, BitBucket).
|
||||
- 🔍 **Multi-repo search**: Effortlessly index and search through multiple public and private repositories in GitHub or GitLab.
|
||||
- ⚡**Lightning fast performance**: Built on top of the powerful [Zoekt](https://github.com/sourcegraph/zoekt) search engine.
|
||||
- 📂 **Full file visualization**: Instantly view the entire file when selecting any search result.
|
||||
- 🎨 **Modern web application**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation
|
||||
- 🎨 **Modern web app**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation
|
||||
|
||||
You can try out our public hosted demo [here](https://demo.sourcebot.dev/)!
|
||||
|
||||
|
|
@@ -45,9 +45,10 @@ Navigate to `localhost:3000` to start searching the Sourcebot repo. Want to sear
|
|||
<details>
|
||||
<summary>What does this command do?</summary>
|
||||
|
||||
- Pull and run the Sourcebot docker image from [ghcr.io/taqlaai/sourcebot:main](https://github.com/taqlaai/sourcebot/pkgs/container/sourcebot). You'll need to make sure you have [docker installed](https://docs.docker.com/get-started/get-docker/) to do this.
|
||||
- Sourcebot will index itself to prepare for your search request.
|
||||
- Map port 3000 between your machine and the docker image (`-p 3000:3000`).
|
||||
- Pull and run the Sourcebot docker image from [ghcr.io/taqlaai/sourcebot:main](https://github.com/taqlaai/sourcebot/pkgs/container/sourcebot). Make sure you have [docker installed](https://docs.docker.com/get-started/get-docker/).
|
||||
- Read the repos listed in [default config](./default-config.json) and start indexing them.
|
||||
- Map port 3000 between your machine and the docker image.
|
||||
- Starts the web server on port 3000.
|
||||
</details>
|
||||
|
||||
## Configuring Sourcebot
|
||||
|
|
@@ -56,42 +57,76 @@ Sourcebot supports indexing and searching through public and private repositorie
|
|||
<picture>
|
||||
<source media="(prefers-color-scheme: dark)" srcset=".github/images/github-favicon-inverted.png">
|
||||
<img src="https://github.com/favicon.ico" width="16" height="16" alt="GitHub icon">
|
||||
</picture> GitHub,
|
||||
<img src="https://gitlab.com/favicon.ico" width="16" height="16" />GitLab, and
|
||||
<img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket. This section will guide you through configuring the repositories that Sourcebot indexes.
|
||||
</picture> GitHub and <img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab. This section will guide you through configuring the repositories that Sourcebot indexes.
|
||||
|
||||
1. Create a new folder on your machine that stores your configs and `.sourcebot` cache, and navigate into it:
|
||||
```
|
||||
mkdir sourcebot_workspace
|
||||
cd sourcebot_workspace
|
||||
```
|
||||
```sh
|
||||
mkdir sourcebot_workspace
|
||||
cd sourcebot_workspace
|
||||
```
|
||||
|
||||
2. Create a new config following the [configuration schema](https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json) to specify which repositories Sourcebot should index. For example to index Sourcebot itself:
|
||||
2. Create a new config following the [configuration schema](./schemas/index.json) to specify which repositories Sourcebot should index. For example, to index [llama.cpp](https://github.com/ggerganov/llama.cpp):
|
||||
|
||||
```
|
||||
touch my_config.json
|
||||
echo `{
|
||||
```sh
|
||||
touch my_config.json
|
||||
echo '{
|
||||
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
|
||||
"Configs": [
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "TaqlaAI",
|
||||
"Name": "sourcebot"
|
||||
"GitHubUser": "ggerganov",
|
||||
"Name": "^llama\\.cpp$"
|
||||
}
|
||||
]
|
||||
}` > my_config.json
|
||||
```
|
||||
}' > my_config.json
|
||||
```
|
||||
|
||||
3. Run Sourcebot and point it to the new config you created:
|
||||
(For more examples, see [example-config.json](./example-config.json). For additional usage information, see the [configuration schema](./schemas/index.json)).
|
||||
|
||||
```
|
||||
docker run -p 3000:3000 --rm --name sourcebot -e CONFIG_PATH=./my_config.json -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
|
||||
```
|
||||
3. Run Sourcebot and point it to the new config you created with the `-e CONFIG_PATH` flag:
|
||||
|
||||
This command will also mount the current directory (`-v $(pwd):/data`) to allow Sourcebot to persist the `.sourcebot` cache.
|
||||
```sh
|
||||
docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e CONFIG_PATH=/data/my_config.json ghcr.io/taqlaai/sourcebot:main
|
||||
```
|
||||
|
||||
### (Optional) Provide an access token to index private repositories
|
||||
In order to allow Sourcebot to index your private repositories, you must provide it with an access token.
|
||||
<details>
|
||||
<summary>What does this command do?</summary>
|
||||
|
||||
- Pull and run the Sourcebot docker image from [ghcr.io/taqlaai/sourcebot:main](https://github.com/taqlaai/sourcebot/pkgs/container/sourcebot).
|
||||
- Mount the current directory (`-v $(pwd):/data`) to allow Sourcebot to persist the `.sourcebot` cache.
|
||||
- Mirrors (clones) llama.cpp at `HEAD` into `.sourcebot/github/ggerganov/llama.cpp`.
|
||||
- Indexes llama.cpp into a .zoekt index file in `.sourcebot/index/`.
|
||||
- Map port 3000 between your machine and the docker image.
|
||||
- Starts the web server on port 3000.
|
||||
</details>
|
||||
<br>
|
||||
|
||||
You should see a `.sourcebot` folder in your current directory. This folder stores a cache of the repositories zoekt has indexed. The `HEAD` commit of a repository is re-indexed [every hour](https://github.com/TaqlaAI/zoekt/blob/11b7713f1fb511073c502c41cea413d616f7761f/cmd/zoekt-indexserver/main.go#L86). Indexing private repos? See [Providing an access token](#providing-an-access-token).
|
||||
|
||||
<details>
|
||||
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> Using GitLab?</summary>
|
||||
|
||||
_tl;dr: A `GITLAB_TOKEN` is required to index GitLab repositories (both private & public). See [Providing an access token](#providing-an-access-token)._
|
||||
|
||||
Currently, the GitLab indexer is restricted to only indexing repositories that the associated `GITLAB_TOKEN` has access to. For example, if the token has access to `foo`, `bar`, and `baz` repositories, the following config will index all three:
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
|
||||
"Configs": [
|
||||
{
|
||||
"Type": "gitlab"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
See [Providing an access token](#providing-an-access-token).
|
||||
</details>
|
||||
</br>
|
||||
|
||||
## Providing an access token
|
||||
This will depend on the code hosting platform you're using:
|
||||
|
||||
<div>
|
||||
<details>
|
||||
|
|
@@ -102,76 +137,72 @@ In order to allow Sourcebot to index your private repositories, you must provide
|
|||
</picture> GitHub
|
||||
</summary>
|
||||
|
||||
Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new) and make sure you select the `repo` scope.
|
||||
In order to index private repositories, you'll need to generate a GitHub Personal Access Token (PAT) and pass it to Sourcebot. Create a new PAT [here](https://github.com/settings/tokens/new) and make sure you select the `repo` scope:
|
||||
|
||||
You'll need to pass this PAT each time you run Sourcebot by setting the GITHUB_TOKEN environment variable:
|
||||

|
||||
|
||||
You'll need to pass this PAT each time you run Sourcebot by setting the `GITHUB_TOKEN` environment variable:
|
||||
|
||||
<pre>
|
||||
docker run -p 3000:3000 --rm --name sourcebot -e <b>GITHUB_TOKEN=[your-github-token]</b> -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
|
||||
docker run -p 3000:3000 --rm --name sourcebot -e <b>GITHUB_TOKEN=[your-github-token]</b> -e CONFIG_PATH=/data/my_config.json -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
|
||||
</pre>
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>
|
||||
|
||||
TODO
|
||||
>[!NOTE]
|
||||
> An access token is <b>required</b> to index GitLab repositories (both private & public) since the GitLab indexer needs the token to determine which repositories to index. See [example-config.json](./example-config.json) for example usage.
|
||||
|
||||
Generate a GitLab Personal Access Token (PAT) [here](https://gitlab.com/-/user_settings/personal_access_tokens) and make sure you select the `read_api` scope:
|
||||
|
||||

|
||||
|
||||
You'll need to pass this PAT each time you run Sourcebot by setting the `GITLAB_TOKEN` environment variable:
|
||||
|
||||
<pre>
|
||||
docker run -p 3000:3000 --rm --name sourcebot -e <b>GITLAB_TOKEN=[your-gitlab-token]</b> -e CONFIG_PATH=/data/my_config.json -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
|
||||
</pre>
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>
|
||||
|
||||
TODO
|
||||
|
||||
</details>
|
||||
</div>
|
||||
|
||||
|
||||
## Build from source
|
||||
>[!NOTE]
|
||||
>You don't need to build Sourcebot in order to use it! If you'd just like to use Sourcebot, please read [how to configure Sourcebot](#configuring-sourcebot).
|
||||
|
||||
If you'd like to make changes to Sourcebot you'll need to build from source:
|
||||
> Building from source is only required if you'd like to contribute. The recommended way to use Sourcebot is to use the [pre-built docker image](https://github.com/TaqlaAI/sourcebot/pkgs/container/sourcebot).
|
||||
|
||||
1. Install <a href="https://go.dev/doc/install"><img src="https://go.dev/favicon.ico" width="16" height="16"> go</a> and <a href="https://nodejs.org/"><img src="https://nodejs.org/favicon.ico" width="16" height="16"> NodeJS</a>. Note that a NodeJS version of at least `21.1.0` is required.
|
||||
|
||||
2. Install [ctags](https://github.com/universal-ctags/ctags) (required by zoekt-indexserver):
|
||||
Mac: `brew install universal-ctags`
|
||||
Ubuntu: `apt-get install universal-ctags`
|
||||
2. Install [ctags](https://github.com/universal-ctags/ctags) (required by zoekt-indexserver)
|
||||
```sh
|
||||
# macOS:
|
||||
brew install universal-ctags
|
||||
|
||||
# Linux:
|
||||
apt-get install universal-ctags
|
||||
```
|
||||
|
||||
3. Clone the repository with submodules:
|
||||
```sh
|
||||
git clone --recurse-submodules https://github.com/TaqlaAI/sourcebot.git
|
||||
```
|
||||
|
||||
4. Run make to build zoekt and install dependencies:
|
||||
4. Run `make` to build zoekt and install dependencies:
|
||||
```sh
|
||||
cd sourcebot
|
||||
make
|
||||
```
|
||||
|
||||
The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively.
|
||||
The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively.
|
||||
|
||||
3. Create a `config.json` file and list the repositories you want to index. The JSON schema defined in [index.json](./schemas/index.json) defines the structure of the config file and the available options. For example, if we want to index Sourcebot on its own code, we could use the following config found in `sample-config.json`:
|
||||
5. Create a `config.json` file at the repository root. See [Configuring Sourcebot](#configuring-sourcebot) for more information.
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
|
||||
"Configs": [
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "TaqlaAI",
|
||||
"Name": "sourcebot"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
4. Create a Personal Access Token (PAT) to authenticate with a code host:
|
||||
6. (Optional) Depending on your `config.json`, you may need to pass an access token to Sourcebot:
|
||||
|
||||
<div>
|
||||
<details open>
|
||||
<details>
|
||||
<summary>
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: dark)" srcset=".github/images/github-favicon-inverted.png">
|
||||
|
|
@@ -179,9 +210,10 @@ The zoekt binaries and web dependencies are placed into `bin` and `node_modules`
|
|||
</picture>
|
||||
GitHub
|
||||
</summary>
|
||||
Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new). If you are indexing public repositories only, you can select the `public_repo` scope, otherwise you will need the `repo` scope.
|
||||
|
||||
Create a text file named `.github-token` **in your home directory** and paste the token in it. The file should look like:
|
||||
First, generate a personal access token (PAT). See [Providing an access token](#providing-an-access-token).
|
||||
|
||||
Next, create a text file named `.github-token` **in your home directory** and paste the token in it. The file should look like:
|
||||
```sh
|
||||
ghp_...
|
||||
```
|
||||
|
|
@@ -190,24 +222,24 @@ The zoekt binaries and web dependencies are placed into `bin` and `node_modules`
|
|||
|
||||
<details>
|
||||
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>
|
||||
TODO
|
||||
</details>
|
||||
First, generate a personal access token (PAT). See [Providing an access token](#providing-an-access-token).
|
||||
|
||||
<details>
|
||||
<summary><img src="https://bitbucket.org/favicon.ico" width="16" height="16" /> BitBucket</summary>
|
||||
TODO
|
||||
Next, create a text file named `.gitlab-token` **in your home directory** and paste the token in it. The file should look like:
|
||||
```sh
|
||||
glpat-...
|
||||
```
|
||||
zoekt will [read this file](https://github.com/TaqlaAI/zoekt/blob/11b7713f1fb511073c502c41cea413d616f7761f/cmd/zoekt-mirror-gitlab/main.go#L43) to authenticate with GitLab.
|
||||
</details>
|
||||
</div>
|
||||
|
||||
5. Start Sourcebot with the command:
|
||||
7. Start Sourcebot with the command:
|
||||
```sh
|
||||
yarn dev
|
||||
```
|
||||
|
||||
A `.sourcebot` directory will be created and zoekt will begin to index the repositories found given `config.json`.
|
||||
|
||||
6. Go to `http://localhost:3000` - once an index has been created, you can start searching.
|
||||
|
||||
8. Start searching at `http://localhost:3000`.
|
||||
|
||||
## Telemetry
|
||||
|
||||
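Step 2 of the README's "Configuring Sourcebot" section writes `my_config.json` with `echo '...'`. A minimal alternative sketch, assuming a POSIX shell: a quoted heredoc writes the same config without relying on `echo`, whose handling of backslashes varies between shells (dash's built-in `echo`, for example, would collapse `\\.` to `\.` and leave an invalid JSON escape). The file name and config body are taken from the README hunk above; the use of a heredoc is a suggestion, not something this commit does.

```sh
#!/bin/sh
# Write the llama.cpp example config from the README.
# The quoted delimiter ('EOF') disables all expansion, so "$schema" and the
# "\\." escape in the Name regex are written to the file byte-for-byte.
cat > my_config.json <<'EOF'
{
    "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
    "Configs": [
        {
            "Type": "github",
            "GitHubUser": "ggerganov",
            "Name": "^llama\\.cpp$"
        }
    ]
}
EOF
```

The resulting file can then be passed to the container with `-e CONFIG_PATH=/data/my_config.json`, as in step 3 of the README.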
|
|
|
|||
|
|
@@ -32,11 +32,11 @@ fi
|
|||
if echo "$CONFIG_PATH" | grep -qE '^https?://'; then
|
||||
if ! curl --output /dev/null --silent --head --fail "$CONFIG_PATH"; then
|
||||
echo -e "\e[33m[Warning] Remote config file at '$CONFIG_PATH' not found. Falling back on sample config.\e[0m"
|
||||
CONFIG_PATH="./sample-config.json"
|
||||
CONFIG_PATH="./default-config.json"
|
||||
fi
|
||||
elif [ ! -f "$CONFIG_PATH" ]; then
|
||||
echo -e "\e[33m[Warning] Config file at '$CONFIG_PATH' not found. Falling back on sample config.\e[0m"
|
||||
CONFIG_PATH="./sample-config.json"
|
||||
CONFIG_PATH="./default-config.json"
|
||||
fi
|
||||
|
||||
echo -e "\e[34m[Info] Using config file at: '$CONFIG_PATH'.\e[0m"
|
||||
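The fallback logic in this hunk can be exercised outside the container with a small sketch like the one below. `resolve_config_path` is a hypothetical helper written for illustration and is not a function in `entrypoint.sh`; it mirrors the two branches shown above (a `curl` HEAD probe for remote configs, a file-existence check for local ones) and prints the path that would be used.

```sh
#!/bin/sh
# Hypothetical helper mirroring the fallback in the entrypoint hunk above.
# Prints the config path that would be used for the given input.
resolve_config_path() {
    config_path="$1"
    fallback="./default-config.json"

    if printf '%s\n' "$config_path" | grep -qE '^https?://'; then
        # Remote config: a HEAD request decides whether it is reachable.
        if ! curl --output /dev/null --silent --head --fail "$config_path"; then
            config_path="$fallback"
        fi
    elif [ ! -f "$config_path" ]; then
        # Local config: fall back when the file does not exist.
        config_path="$fallback"
    fi

    printf '%s\n' "$config_path"
}
```

For example, `resolve_config_path /data/my_config.json` prints the same path when the file exists and `./default-config.json` when it does not.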
|
|
|
|||
79
example-config.json
Normal file
|
|
@@ -0,0 +1,79 @@
|
|||
{
|
||||
"$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json",
|
||||
"Configs": [
|
||||
// ~~~~~~~~~~~~ GitHub Examples ~~~~~~~~~~~~
|
||||
// Index all repos in organization "my-org".
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "my-org"
|
||||
},
|
||||
// Index all repos in user "my-user".
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubUser": "my-user"
|
||||
},
|
||||
// Index repos foo & bar in organization "my-org".
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "my-org",
|
||||
"Name": "^(foo|bar)$"
|
||||
},
|
||||
|
||||
// Index all repos except foo & bar in organization "my-org".
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "my-org",
|
||||
"Exclude": "^(foo|bar)$"
|
||||
},
|
||||
// Index all repos that contain topic "topic_a" or "topic_b" in organization "my-org".
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "my-org",
|
||||
"Topics": ["topic_a", "topic_b"]
|
||||
},
|
||||
// Index all repos that _do not_ contain "topic_x" and "topic_y" in organization "my-org".
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "my-org",
|
||||
"ExcludeTopics": ["topic_x", "topic_y"]
|
||||
},
|
||||
// Index all repos in organization "my-org", including forks.
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "my-org",
|
||||
"IncludeForks": true /* default: false */
|
||||
},
|
||||
// Index all repos in organization "my-org", excluding archived repos.
|
||||
{
|
||||
"Type": "github",
|
||||
"GitHubOrg": "my-org",
|
||||
"NoArchived": true /* default: false */
|
||||
},
|
||||
|
||||
// ~~~~~~~~~~~~ GitLab Examples ~~~~~~~~~~~~
|
||||
// Index all repos visible to the GITLAB_TOKEN.
|
||||
{
|
||||
"Type": "gitlab"
|
||||
},
|
||||
// Index all repos visible to the GITLAB_TOKEN (custom GitLab URL).
|
||||
{
|
||||
"Type": "gitlab",
|
||||
"GitLabURL": "https://gitlab.example.com/api/v4/" /* default: https://gitlab.com/api/v4/ */
|
||||
},
|
||||
// Index all repos (public only) visible to the GITLAB_TOKEN.
|
||||
{
|
||||
"Type": "gitlab",
|
||||
"OnlyPublic": true
|
||||
},
|
||||
// Index only the repos foo & bar.
|
||||
{
|
||||
"Type": "gitlab",
|
||||
"Name": "^(foo|bar)$"
|
||||
},
|
||||
// Index all repos except fizz & buzz visible to the GITLAB_TOKEN.
|
||||
{
|
||||
"Type": "gitlab",
|
||||
"Exclude": "^(fizz|buzz)$"
|
||||
}
|
||||
]
|
||||
}
|
||||
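The `Name` and `Exclude` fields in the examples above are regular expressions matched against repository names. A rough sketch of how the anchored pattern `^(foo|bar)$` selects names, using `grep -E` as a stand-in: zoekt evaluates these patterns with its own regexp engine, so this only approximates the behaviour for simple alternations like the ones in the examples, and the repository names are made up.

```sh
#!/bin/sh
# Hypothetical repository names; a real run gets these from the code host.
repos="foo
bar
foobar
baz"

# "Name": "^(foo|bar)$" keeps only exact matches, so "foobar" is not included.
included=$(printf '%s\n' "$repos" | grep -E '^(foo|bar)$')

# "Exclude": "^(foo|bar)$" drops exact matches and keeps everything else.
remaining=$(printf '%s\n' "$repos" | grep -vE '^(foo|bar)$')

printf 'included:\n%s\n' "$included"
printf 'remaining:\n%s\n' "$remaining"
```

Without the `^` and `$` anchors the pattern would also match `foobar`, which is why the examples anchor both ends.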
|
|
@@ -5,12 +5,14 @@
|
|||
"RepoNameRegexIncludeFilter": {
|
||||
"type": "string",
|
||||
"description": "Only clone repos whose name matches the given regexp.",
|
||||
"format": "regexp"
|
||||
"format": "regexp",
|
||||
"default": "^(foo|bar)$"
|
||||
},
|
||||
"RepoNameRegexExcludeFilter": {
|
||||
"type": "string",
|
||||
"description": "Don't mirror repos whose names match this regexp.",
|
||||
"format": "regexp"
|
||||
"format": "regexp",
|
||||
"default": "^(fizz|buzz)$"
|
||||
},
|
||||
"ZoektConfig": {
|
||||
"anyOf": [
|
||||
|
|
@@ -110,8 +112,7 @@
|
|||
}
|
||||
},
|
||||
"required": [
|
||||
"Type",
|
||||
"GitLabURL"
|
||||
"Type"
|
||||
],
|
||||
"additionalProperties": false
|
||||
}
|
||||