diff --git a/.github/images/github-pat-creation.png b/.github/images/github-pat-creation.png new file mode 100644 index 00000000..cf5957bf Binary files /dev/null and b/.github/images/github-pat-creation.png differ diff --git a/.github/images/gitlab-pat-creation.png b/.github/images/gitlab-pat-creation.png new file mode 100644 index 00000000..05eb15df Binary files /dev/null and b/.github/images/gitlab-pat-creation.png differ diff --git a/Dockerfile b/Dockerfile index 7fbe84a6..d7aea136 100644 --- a/Dockerfile +++ b/Dockerfile @@ -69,7 +69,7 @@ COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf COPY entrypoint.sh ./entrypoint.sh RUN chmod +x ./entrypoint.sh -COPY sample-config.json . +COPY default-config.json . EXPOSE 3000 ENV PORT=3000 diff --git a/README.md b/README.md index 7dcd9d6f..4cd10c08 100644 --- a/README.md +++ b/README.md @@ -25,10 +25,10 @@ Sourcebot is a fast code indexing and search tool for your codebases. It is buil ## Features - 💻 **One-command deployment**: Get started instantly using Docker on your own machine. -- 🔍 **Multi-repo search**: Effortlessly index and search through multiple public and private repositories (GitHub, GitLab, BitBucket). +- 🔍 **Multi-repo search**: Effortlessly index and search through multiple public and private repositories in GitHub or GitLab. - ⚡**Lightning fast performance**: Built on top of the powerful [Zoekt](https://github.com/sourcegraph/zoekt) search engine. - 📂 **Full file visualization**: Instantly view the entire file when selecting any search result. -- 🎨 **Modern web application**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation +- 🎨 **Modern web app**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation You can try out our public hosted demo [here](https://demo.sourcebot.dev/)! @@ -45,9 +45,10 @@ Navigate to `localhost:3000` to start searching the Sourcebot repo. Want to sear
What does this command do? -- Pull and run the Sourcebot docker image from [ghcr.io/taqlaai/sourcebot:main](https://github.com/taqlaai/sourcebot/pkgs/container/sourcebot). You'll need to make sure you have [docker installed](https://docs.docker.com/get-started/get-docker/) to do this. -- Sourcebot will index itself to prepare for your search request. -- Map port 3000 between your machine and the docker image (`-p 3000:3000`). +- Pull and run the Sourcebot docker image from [ghcr.io/taqlaai/sourcebot:main](https://github.com/taqlaai/sourcebot/pkgs/container/sourcebot). Make sure you have [docker installed](https://docs.docker.com/get-started/get-docker/). +- Read the repos listed in [default config](./default-config.json) and start indexing them. +- Map port 3000 between your machine and the docker image. +- Start the web server on port 3000.
## Configuring Sourcebot @@ -56,42 +57,76 @@ Sourcebot supports indexing and searching through public and private repositorie GitHub icon - GitHub, -GitLab, and - BitBucket. This section will guide you through configuring the repositories that Sourcebot indexes. + GitHub and GitLab. This section will guide you through configuring the repositories that Sourcebot indexes. 1. Create a new folder on your machine that stores your configs and `.sourcebot` cache, and navigate into it: -``` -mkdir sourcebot_workspace -cd sourcebot_workspace -``` + ```sh + mkdir sourcebot_workspace + cd sourcebot_workspace + ``` -2. Create a new config following the [configuration schema](https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json) to specify which repositories Sourcebot should index. For example to index Sourcebot itself: +2. Create a new config following the [configuration schema](./schemas/index.json) to specify which repositories Sourcebot should index. For example, to index [llama.cpp](https://github.com/ggerganov/llama.cpp): -``` -touch my_config.json -echo `{ - "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json", - "Configs": [ - { - "Type": "github", - "GitHubOrg": "TaqlaAI", - "Name": "sourcebot" - } - ] -}` > my_config.json -``` + ```sh + touch my_config.json + echo '{ + "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json", + "Configs": [ + { + "Type": "github", + "GitHubUser": "ggerganov", + "Name": "^llama\\.cpp$" + } + ] + }' > my_config.json + ``` -3. Run Sourcebot and point it to the new config you created: + (For more examples, see [example-config.json](./example-config.json). For additional usage information, see the [configuration schema](./schemas/index.json)). -``` -docker run -p 3000:3000 --rm --name sourcebot -e CONFIG_PATH=./my_config.json -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main -``` +3. 
Run Sourcebot and point it to the new config you created with the `-e CONFIG_PATH` flag: -This command will also mount the current directory (`-v $(pwd):/data`) to allow Sourcebot to persist the `.sourcebot` cache. + ```sh + docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e CONFIG_PATH=/data/my_config.json ghcr.io/taqlaai/sourcebot:main + ``` -### (Optional) Provide an access token to index private repositories -In order to allow Sourcebot to index your private repositories, you must provide it with an access token. +
+ What does this command do? + + - Pull and run the Sourcebot docker image from [ghcr.io/taqlaai/sourcebot:main](https://github.com/taqlaai/sourcebot/pkgs/container/sourcebot). + - Mount the current directory (`-v $(pwd):/data`) to allow Sourcebot to persist the `.sourcebot` cache. + - Mirror (clone) llama.cpp at `HEAD` into `.sourcebot/github/ggerganov/llama.cpp`. + - Index llama.cpp into a `.zoekt` index file in `.sourcebot/index/`. + - Map port 3000 between your machine and the docker image. + - Start the web server on port 3000. +
+
+ + You should see a `.sourcebot` folder in your current directory. This folder stores a cache of the repositories zoekt has indexed. The `HEAD` commit of a repository is re-indexed [every hour](https://github.com/TaqlaAI/zoekt/blob/11b7713f1fb511073c502c41cea413d616f7761f/cmd/zoekt-indexserver/main.go#L86). Indexing private repos? See [Providing an access token](#providing-an-access-token). + +
+ Using GitLab? + + _tl;dr: A `GITLAB_TOKEN` is required to index GitLab repositories (both private & public). See [Providing an access token](#providing-an-access-token)._ + + Currently, the GitLab indexer can only index repositories that the associated `GITLAB_TOKEN` has access to. For example, if the token has access to the `foo`, `bar`, and `baz` repositories, the following config will index all three: + + ```json + { + "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json", + "Configs": [ + { + "Type": "gitlab" + } + ] + } + ``` + + See [Providing an access token](#providing-an-access-token). +
+
+ +## Providing an access token +This will depend on the code hosting platform you're using:
@@ -102,86 +137,83 @@ In order to allow Sourcebot to index your private repositories, you must provide GitHub -Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new) and make sure you select the `repo` scope. +In order to index private repositories, you'll need to generate a GitHub Personal Access Token (PAT) and pass it to Sourcebot. Create a new PAT [here](https://github.com/settings/tokens/new) and make sure you select the `repo` scope: -You'll need to pass this PAT each time you run Sourcebot by setting the GITHUB_TOKEN environment variable: +![GitHub PAT creation](.github/images/github-pat-creation.png) + +You'll need to pass this PAT each time you run Sourcebot by setting the `GITHUB_TOKEN` environment variable:
-docker run -p 3000:3000 --rm --name sourcebot -e GITHUB_TOKEN=[your-github-token] -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
+docker run -p 3000:3000 --rm --name sourcebot -e GITHUB_TOKEN=[your-github-token] -e CONFIG_PATH=/data/my_config.json -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
 
-
GitLab -TODO +>[!NOTE] +> An access token is required to index GitLab repositories (both private & public) since the GitLab indexer needs the token to determine which repositories to index. See [example-config.json](./example-config.json) for example usage. + +Generate a GitLab Personal Access Token (PAT) [here](https://gitlab.com/-/user_settings/personal_access_tokens) and make sure you select the `read_api` scope: + +![GitLab PAT creation](.github/images/gitlab-pat-creation.png) + +You'll need to pass this PAT each time you run Sourcebot by setting the `GITLAB_TOKEN` environment variable: + +
+docker run -p 3000:3000 --rm --name sourcebot -e GITLAB_TOKEN=[your-gitlab-token] -e CONFIG_PATH=/data/my_config.json -v $(pwd):/data ghcr.io/taqlaai/sourcebot:main
+
-
- BitBucket - -TODO - -
## Build from source >[!NOTE] ->You don't need to build Sourcebot in order to use it! If you'd just like to use Sourcebot, please read [how to configure Sourcebot](#configuring-sourcebot). - -If you'd like to make changes to Sourcebot you'll need to build from source: +> Building from source is only required if you'd like to contribute. The recommended way to use Sourcebot is to use the [pre-built docker image](https://github.com/TaqlaAI/sourcebot/pkgs/container/sourcebot). 1. Install go and NodeJS. Note that a NodeJS version of at least `21.1.0` is required. -2. Install [ctags](https://github.com/universal-ctags/ctags) (required by zoekt-indexserver): - Mac: `brew install universal-ctags` - Ubuntu: `apt-get install universal-ctags` +2. Install [ctags](https://github.com/universal-ctags/ctags) (required by zoekt-indexserver) + ```sh + // macOS: + brew install universal-ctags + + // Linux: + apt-get install universal-ctags + ``` 3. Clone the repository with submodules: ```sh git clone --recurse-submodules https://github.com/TaqlaAI/sourcebot.git ``` -4. Run make to build zoekt and install dependencies: +4. Run `make` to build zoekt and install dependencies: ```sh cd sourcebot make ``` -The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively. + The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively. -3. Create a `config.json` file and list the repositories you want to index. The JSON schema defined in [index.json](./schemas/index.json) defines the structure of the config file and the available options. For example, if we want to index Sourcebot on its own code, we could use the following config found in `sample-config.json`: +5. Create a `config.json` file at the repository root. See [Configuring Sourcebot](#configuring-sourcebot) for more information. 
- ```json - { - "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json", - "Configs": [ - { - "Type": "github", - "GitHubOrg": "TaqlaAI", - "Name": "sourcebot" - } - ] - } - ``` - -4. Create a Personal Access Token (PAT) to authenticate with a code host: +6. (Optional) Depending on your `config.json`, you may need to pass an access token to Sourcebot:
-
+
GitHub icon GitHub - - Generate a GitHub Personal Access Token (PAT) [here](https://github.com/settings/tokens/new). If you are indexing public repositories only, you can select the `public_repo` scope, otherwise you will need the `repo` scope. + - Create a text file named `.github-token` **in your home directory** and paste the token in it. The file should look like: + First, generate a personal access token (PAT). See [Providing an access token](#providing-an-access-token). + + Next, create a text file named `.github-token` **in your home directory** and paste the token in it. The file should look like: ```sh ghp_... ``` @@ -190,24 +222,24 @@ The zoekt binaries and web dependencies are placed into `bin` and `node_modules`
GitLab - TODO -
+ First, generate a personal access token (PAT). See [Providing an access token](#providing-an-access-token). -
- BitBucket - TODO + Next, create a text file named `.gitlab-token` **in your home directory** and paste the token in it. The file should look like: + ```sh + glpat-... + ``` + zoekt will [read this file](https://github.com/TaqlaAI/zoekt/blob/11b7713f1fb511073c502c41cea413d616f7761f/cmd/zoekt-mirror-gitlab/main.go#L43) to authenticate with GitLab.
-5. Start Sourcebot with the command: +7. Start Sourcebot with the command: ```sh yarn dev ``` A `.sourcebot` directory will be created and zoekt will begin to index the repositories found given `config.json`. -6. Go to `http://localhost:3000` - once an index has been created, you can start searching. - +8. Start searching at `http://localhost:3000`. ## Telemetry diff --git a/sample-config.json b/default-config.json similarity index 100% rename from sample-config.json rename to default-config.json diff --git a/entrypoint.sh b/entrypoint.sh index 09a4c8ef..708b1822 100644 --- a/entrypoint.sh +++ b/entrypoint.sh @@ -32,11 +32,11 @@ fi if echo "$CONFIG_PATH" | grep -qE '^https?://'; then if ! curl --output /dev/null --silent --head --fail "$CONFIG_PATH"; then echo -e "\e[33m[Warning] Remote config file at '$CONFIG_PATH' not found. Falling back on sample config.\e[0m" - CONFIG_PATH="./sample-config.json" + CONFIG_PATH="./default-config.json" fi elif [ ! -f "$CONFIG_PATH" ]; then echo -e "\e[33m[Warning] Config file at '$CONFIG_PATH' not found. Falling back on sample config.\e[0m" - CONFIG_PATH="./sample-config.json" + CONFIG_PATH="./default-config.json" fi echo -e "\e[34m[Info] Using config file at: '$CONFIG_PATH'.\e[0m" diff --git a/example-config.json b/example-config.json new file mode 100644 index 00000000..4f3884b9 --- /dev/null +++ b/example-config.json @@ -0,0 +1,79 @@ +{ + "$schema": "https://raw.githubusercontent.com/TaqlaAI/sourcebot/main/schemas/index.json", + "Configs": [ + // ~~~~~~~~~~~~ GitHub Examples ~~~~~~~~~~~~ + // Index all repos in organization "my-org". + { + "Type": "github", + "GitHubOrg": "my-org" + }, + // Index all repos in user "my-user". + { + "Type": "github", + "GitHubUser": "my-user" + }, + // Index repos foo & bar in organization "my-org". + { + "Type": "github", + "GitHubOrg": "my-org", + "Name": "^(foo|bar)$" + }, + + // Index all repos except foo & bar in organization "my-org". 
+ { + "Type": "github", + "GitHubOrg": "my-org", + "Exclude": "^(foo|bar)$" + }, + // Index all repos that contain topic "topic_a" or "topic_b" in organization "my-org". + { + "Type": "github", + "GitHubOrg": "my-org", + "Topics": ["topic_a", "topic_b"] + }, + // Index all repos that _do not_ contain "topic_x" and "topic_y" in organization "my-org". + { + "Type": "github", + "GitHubOrg": "my-org", + "ExcludeTopics": ["topic_x", "topic_y"] + }, + // Index all repos, including forks, in organization "my-org". + { + "Type": "github", + "GitHubOrg": "my-org", + "IncludeForks": true /* default: false */ + }, + // Index all repos, excluding archived repos, in organization "my-org". + { + "Type": "github", + "GitHubOrg": "my-org", + "NoArchived": true /* default: false */ + }, + + // ~~~~~~~~~~~~ GitLab Examples ~~~~~~~~~~~~ + // Index all repos visible to the GITLAB_TOKEN. + { + "Type": "gitlab" + }, + // Index all repos visible to the GITLAB_TOKEN (custom GitLab URL). + { + "Type": "gitlab", + "GitLabURL": "https://gitlab.example.com/api/v4/" /* default: https://gitlab.com/api/v4/ */ + }, + // Index all repos (public only) visible to the GITLAB_TOKEN. + { + "Type": "gitlab", + "OnlyPublic": true + }, + // Index only the repos foo & bar. + { + "Type": "gitlab", + "Name": "^(foo|bar)$" + }, + // Index all repos except fizz & buzz visible to the GITLAB_TOKEN. 
+ { + "Type": "gitlab", + "Exclude": "^(fizz|buzz)$" + } + ] +} \ No newline at end of file diff --git a/schemas/index.json b/schemas/index.json index 9fc6d54f..2fe2195a 100644 --- a/schemas/index.json +++ b/schemas/index.json @@ -5,12 +5,14 @@ "RepoNameRegexIncludeFilter": { "type": "string", "description": "Only clone repos whose name matches the given regexp.", - "format": "regexp" + "format": "regexp", + "default": "^(foo|bar)$" }, "RepoNameRegexExcludeFilter": { "type": "string", "description": "Don't mirror repos whose names match this regexp.", - "format": "regexp" + "format": "regexp", + "default": "^(fizz|buzz)$" }, "ZoektConfig": { "anyOf": [ @@ -110,8 +112,7 @@ } }, "required": [ - "Type", - "GitLabURL" + "Type" ], "additionalProperties": false }
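
The config fallback that this diff touches in `entrypoint.sh` can be exercised on its own. The sketch below wraps the same checks in a function so the behavior is easy to test locally. The function name `resolve_config_path` and the `DEFAULT_CONFIG` variable are assumptions made for illustration and do not exist in the repository; the `grep` and `curl` invocations mirror the ones shown in the `entrypoint.sh` hunk.

```sh
#!/bin/sh
# Hypothetical helper (not part of the repository): returns the config path
# to use, falling back to the default config when the requested one is missing.
DEFAULT_CONFIG="./default-config.json"

resolve_config_path() {
    config_path="$1"
    if echo "$config_path" | grep -qE '^https?://'; then
        # Remote config: probe it with a HEAD request, as entrypoint.sh does.
        if ! curl --output /dev/null --silent --head --fail "$config_path"; then
            config_path="$DEFAULT_CONFIG"
        fi
    elif [ ! -f "$config_path" ]; then
        # Local config: fall back when the file does not exist.
        config_path="$DEFAULT_CONFIG"
    fi
    echo "$config_path"
}
```

For example, `resolve_config_path /data/my_config.json` echoes the same path when the file exists inside the container and `./default-config.json` otherwise, which is why the `docker run` commands in the README mount the current directory at `/data` before setting `CONFIG_PATH`.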