From 7ce65cb66a2dec6257ce5f2b5c9a22f2568d579b Mon Sep 17 00:00:00 2001
From: bkellam
Date: Tue, 25 Nov 2025 23:36:53 -0800
Subject: [PATCH] docs

---
 .../configuration/environment-variables.mdx |  1 +
 docs/docs/connections/overview.mdx          | 20 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/docs/docs/configuration/environment-variables.mdx b/docs/docs/configuration/environment-variables.mdx
index 87167858..e29fb88f 100644
--- a/docs/docs/configuration/environment-variables.mdx
+++ b/docs/docs/configuration/environment-variables.mdx
@@ -35,6 +35,7 @@ The following environment variables allow you to configure your Sourcebot deploy
 | `SOURCEBOT_STRUCTURED_LOGGING_FILE` | - | Optional file to log to if structured logging is enabled |
 | `SOURCEBOT_TELEMETRY_DISABLED` | `false` | Enables/disables telemetry collection in Sourcebot. See [this doc](/docs/overview.mdx#telemetry) for more info. |
 | `DEFAULT_MAX_MATCH_COUNT` | `10000` | The default maximum number of search results to return when using search in the web app. |
+| `ALWAYS_INDEX_FILE_PATTERNS` | - | A comma separated list of glob patterns matching file paths that should always be indexed, regardless of size or number of trigrams. |
 
 ### Enterprise Environment Variables
 | Variable | Default | Description |
diff --git a/docs/docs/connections/overview.mdx b/docs/docs/connections/overview.mdx
index ab9f8ffc..cb3b1432 100644
--- a/docs/docs/connections/overview.mdx
+++ b/docs/docs/connections/overview.mdx
@@ -69,6 +69,26 @@ To learn more about how to create a connection for a specific code host, check o
 
 Missing your code host? [Submit a feature request on GitHub](https://github.com/sourcebot-dev/sourcebot/issues/new?template=feature_request.md).
 
+## Indexing Large Files
+
+By default, Sourcebot will skip indexing files that are larger than 2MB or have more than 20,000 trigrams. You can configure this by setting the `maxFileSize` and `maxTrigramCount` [settings](/docs/configuration/config-file#settings).
+
+These limits can be ignored for specific files by passing in a comma separated list of glob patterns matching file paths to the `ALWAYS_INDEX_FILE_PATTERNS` environment variable. For example:
+
+```bash
+# Always index all .sum and .lock files
+ALWAYS_INDEX_FILE_PATTERNS=**/*.sum,**/*.lock
+```
+
+Files that have been skipped are assigned the `skipped` language. You can view a list of all skipped files by using the following query:
+```
+lang:skipped
+```
+
+## Indexing Binary Files
+
+Binary files cannot be indexed by Sourcebot. See [#575](https://github.com/sourcebot-dev/sourcebot/issues/575) for more information.
+
 ## Schema reference
 
 ---
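
As a rough illustration of how a comma separated list of glob patterns such as the `ALWAYS_INDEX_FILE_PATTERNS` value in this patch might be interpreted, here is a minimal Python sketch. It is not Sourcebot's implementation: the helper names (`parse_patterns`, `glob_to_regex`, `always_index`) are hypothetical, and the glob semantics (`**/` matching zero or more directories, `*` not crossing `/`) are an assumption that the patch does not spell out.

```python
import re


def parse_patterns(value: str) -> list[str]:
    """Split a comma separated value into patterns, dropping empty entries."""
    return [p.strip() for p in value.split(",") if p.strip()]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a small glob subset into a regex (assumed semantics, see lead-in)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more directory segments
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including path separators
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")  # one character within a path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def always_index(path: str, patterns: list[str]) -> bool:
    """Return True if the whole path matches any of the glob patterns."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


# Hypothetical usage with the value from the docs example.
patterns = parse_patterns("**/*.sum,**/*.lock")
for candidate in ["go.sum", "web/yarn.lock", "src/main.go"]:
    print(candidate, always_index(candidate, patterns))
```

Under these assumed semantics, `go.sum` and `web/yarn.lock` match a pattern and `src/main.go` does not. A real deployment may treat `**` or leading directories differently, so check the behaviour against an actual index before relying on it.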