v3 effort (#158)

* SQL Database (#157)

* point zoekt to v3 branch

* bump zoekt version

* Add tenant ID concept into web app and backend (#160)

* hacked together an example of using the zoekt grpc api

* provide tenant id to zoekt git indexer

* update zoekt version to point to multitenant branch

* pipe tenant id through header to zoekt
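
As a rough sketch of what "piping the tenant ID through a header" can look like on the caller side, the web/backend can attach the tenant to every zoekt request. The header name, endpoint path, and payload shape below are illustrative assumptions, not the exact wiring in this commit:

```ts
// Sketch only: forwards the tenant ID to the zoekt webserver as a request header.
const ZOEKT_URL = process.env.ZOEKT_WEBSERVER_URL ?? "http://localhost:6070";

export const zoektSearch = async (query: string, tenantId: number) => {
    const response = await fetch(`${ZOEKT_URL}/api/search`, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            // Hypothetical header name; the indexer and webserver must agree on it.
            "X-Tenant-ID": String(tenantId),
        },
        body: JSON.stringify({ q: query }), // request shape is assumed
    });
    return response.json();
};
```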

* remove incorrect submodule reference and settings typo

* update zoekt commit

* remove unused yarn script

* remove unused grpc client in web server

* remove unneeded deps and improve tenant id log

* pass tenant id when creating repo in db

* add mt yarn script

* add nocheckin comment to tenant id in v2 schema

---------

Co-authored-by: bkellam <bshizzle1234@gmail.com>

* bump zoekt version

* parallelize repo indexing (#163)

* hacked together an example of using the zoekt grpc api

* provide tenant id to zoekt git indexer

* update zoekt version to point to multitenant branch

* pipe tenant id through header to zoekt

* remove incorrect submodule reference and settings typo

* update zoekt commit

* remove unused yarn script

* remove unused grpc client in web server

* remove unneeded deps and improve tenant id log

* pass tenant id when creating repo in db

* add mt yarn script

* add pol of bullmq into backend

* add better error handling and concurrency setting
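
A minimal BullMQ sketch of the queue/worker split these bullets describe (the queue name, payload shape, and concurrency value are assumptions for illustration):

```ts
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// Producer side: one job per repo that needs (re)indexing.
export const indexQueue = new Queue("repo-index", { connection });
await indexQueue.add("index", { repoId: 42 });

// Consumer side: process several repos at once and surface failures per job.
const worker = new Worker(
    "repo-index",
    async (job) => {
        // ... clone the repo and run the zoekt indexer for job.data.repoId ...
    },
    { connection, concurrency: 4 },
);
worker.on("failed", (job, err) => {
    console.error(`indexing repo ${job?.data.repoId} failed`, err);
});
```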

* spin up redis instance in dockerfile

* cleanup transaction logic when adding repos to index queue

* add NEW index status fetch condition

* move bullmq deps to backend

---------

Co-authored-by: bkellam <bshizzle1234@gmail.com>

* Authentication (#164)

* Add Org table (#167)

* Move logout button & profile picture into settings dropdown (#172)

* Multi tenancy support in config syncer (#171)

* [wip] initial mt support in config syncer

* Move logout button & profile picture into settings dropdown (#172)

* update sync status properly and fix bug with multiple config in db case

* make config path required in single tenant mode

NOTE: deleting config/repos is currently not supported in the multi-tenancy case. Support for this will be added in a future PR.

---------

Co-authored-by: Brendan Kellam <bshizzle1234@gmail.com>

* add tenant mode support in docker container

* Organization switching & active org management (#173)

* updated syncedAt date after config sync

* Migrate to postgres (#174)

* spin up postgres in docker container

* get initial pol of postgres db working in docker image

* spin up postgres server in dev case

* updated syncedAt date after config sync

* remove unnecessary port expose in docker file

* Connection creation form (#175)

* fix issue with yarn dev startup

* init (#176)

* Add `@sourcebot/schemas` package (#177)

* Connection management (#178)

* add concept of secrets (#180)

* add @sourcebot/schemas package

* migrate things to use the schemas package

* Dockerfile support

* add secret table to schema

* Add concept of connection manager

* Rename Config->Connection

* Handle job failures

* Add join table between repo and connection

* nits

* create first version of crypto package
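
A minimal sketch of what such a helper can look like, assuming AES-256-GCM and that `SOURCEBOT_ENCRYPTION_KEY` is 32 bytes of key material (the real package's storage format may differ):

```ts
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

const key = Buffer.from(process.env.SOURCEBOT_ENCRYPTION_KEY!, "utf-8"); // assumed 32 bytes

export const encrypt = (plaintext: string): string => {
    const iv = randomBytes(12);
    const cipher = createCipheriv("aes-256-gcm", key, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext, "utf-8"), cipher.final()]);
    // Keep the IV and auth tag with the ciphertext so the value can be decrypted later.
    return [iv, cipher.getAuthTag(), ciphertext].map((b) => b.toString("hex")).join(":");
};

export const decrypt = (payload: string): string => {
    const [iv, tag, ciphertext] = payload.split(":").map((part) => Buffer.from(part, "hex"));
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf-8");
};
```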

* add crypto package as deps to others

* forgot to add package changes

* add server action for adding and listing secrets, create test page for it

* add secrets page to nav menu

* add secret to config and support fetching it in backend

* reset secret form on successful submission

* add toast feedback for secrets form

* add instructions for adding encryption key to dev instructions

* add encryption key support in docker file

* add delete secret button

* fix nits from pr review

---------

Co-authored-by: bkellam <bshizzle1234@gmail.com>

* bump zoekt version

* enforce tenancy on search and repo listing endpoints (#181)

* enforce tenancy on search and repo listing

* remove orgId from request schemas

* adds garbage collection for repos (#182)

* refactor repo indexing logic into RepoManager

* wip cleanup stale repos

* add rest of gc logic

* set status to indexing properly

* add initial logic for staging environment

* try to move encryption key env declaration in docker file to fix build issues

* switch encryption key to a build arg to see if that fixes build issues

* add deployment action for staging image

* try using mac github action runners instead

* switch to using arm64 runners on arm64 build

* change workflow names to fix trigger issue

* trigger staging actions to see if it works

* fix working directory typo and pray it doesn't push to prod

* checkout v3 when deploying staging

* try to change into the staging dir manually

* dummy commit to trigger v3 workflows to test

* update staging deploy script to match new version in main

* reference proper image:tag in staging fly config

* update staging fly config to point to ghcr

* Connection management (#183)

* add invite system and google oauth provider (#185)

* add settings page with members list

* add invite to schema and basic create form

* add invite table

* add basic invite link copy button

* add auth invite accept case

* add non auth logic

* add google oauth provider

* fix reference to header component in connections

* add google logo to google oauth

* fix web build errors

* bump staging resources

* change staging cpu to perf

* add side bar nav in settings page

* improve styling of members page

* wip adding stripe checkout button

* wip onboarding flow

* add stripe subscription id to org

* save stripe session id and add manage subscription button in settings

* properly block access to pages if user isn't in an org

* wip add paywall

* Domain support

* Domain support (#188)

* Update Makefile to include crypto package when doing a make clean

* Add default for AUTH_URL in attempt to fix build

* attempt 2

* fix attempt #3: Do not require an encryption key at build time

* Fix generate script race condition

* Attempt #4

* add back paywall and also add support for incrementing seat count on invite redemption

* prevent self invite

* action button styling in settings and toast on copy

* add ability to remove member from org

* move stripe product id to env var

* add await for blocking loop in backend

* add subscription info to billing page

* handle trial case in billing info page

* add trial duration indicator to nav bar

* check if domain starts or ends with dash

* remove unused no org component

* Generate AUTH_SECRET if not provided (#189)

* remove package lock file and fix prisma dep version

* revert dep version updates

* fix yarn.lock

* add auth and membership check to fetchSubscription

* properly handle invite redeem with no valid subscription case

* change back fetch subscription to not require org membership

* add back subscription check in invite redeem page

* Add stripe billing logic (#190)

* add side bar nav in settings page

* improve styling of members page

* wip adding stripe checkout button

* wip onboarding flow

* add stripe subscription id to org

* save stripe session id and add manage subscription button in settings

* properly block access to pages if user isn't in an org

* wip add paywall

* Domain support

* add back paywall and also add support for incrementing seat count on invite redemption

* prevent self invite

* action button styling in settings and toast on copy

* add ability to remove member from org

* move stripe product id to env var

* add await for blocking loop in backend

* add subscription info to billing page

* handle trial case in billing info page

* add trial duration indicator to nav bar

* check if domain starts or ends with dash

* remove unused no org component

* remove package lock file and fix prisma dep version

* revert dep version updates

* fix yarn.lock

* add auth and membership check to fetchSubscription

* properly handle invite redeem with no valid subscription case

* change back fetch subscription to not require org membership

* add back subscription check in invite redeem page

---------

Co-authored-by: bkellam <bshizzle1234@gmail.com>

* fix nits

* remove providers check

* fix more nits

* change stripe init to be behind function

* fix publishable stripe key handling in docker container

* enforce owner perms (#191)

* add make owner logic, and owner perms for removal, invite, and manage subscription

* add change billing email card to billing settings

* enforce owner role in action level

* remove unused hover card component

* cleanup

* add back gitlab, gitea, and gerrit support (#184)

* add non github config definitions

* refactor github config compilation to a separate file

* add gitlab config compilation

* Connection management (#183)

* wip gitlab repo sync support

* fix gitlab zoekt metadata

* add gitea support

* add gerrit support

* Connection management (#183)

* add gerrit config compilation

* Connection management (#183)

---------

Co-authored-by: Brendan Kellam <bshizzle1234@gmail.com>

* fix apos usage in redeem page

* change csrf cookie to secure not host

* Credentials provider (#192)

* email password functionality

* feedback

* clean up an org's repos and shards if it's inactive (#194)

* add stripe subscription status and webhook
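
A hedged sketch of the webhook half of this, assuming a Next.js route handler and the `STRIPE_SECRET_KEY` / `STRIPE_WEBHOOK_SECRET` variables referenced elsewhere in this change set (how the status is persisted on the org is an assumption):

```ts
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
    // Verify the event actually came from Stripe before trusting it.
    const event = stripe.webhooks.constructEvent(
        await req.text(),
        req.headers.get("stripe-signature")!,
        process.env.STRIPE_WEBHOOK_SECRET!,
    );

    if (event.type === "customer.subscription.updated" || event.type === "customer.subscription.deleted") {
        const subscription = event.data.object;
        // Persist subscription.status on the org so the backend can decide whether
        // its repos and shards should be cleaned up.
    }

    return new Response(null, { status: 200 });
}
```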

* add inactive org repo cleanup logic

* mark reactivated org connections for sync

* connections qol improvements (#195)

* add client side polling to connections list

* properly fetch repo image url

* add client polling to connection management page, and add ability to sync failed connections

* Fix build with suspense boundary

* improved fix

* add retries for 429 issues (#196)

* add connection compile retry and hard repo limit

* add more retry checks

* cleanup unused change

* address feedback

* fix build errors and add index concurrency env var

* add config upsert timeout env var

* Membership settings rework (#198)

* Add refined members list

* further progress on members settings polish

* Remove old components

* feedback

* Magic links (#199)

* wip on magic link support

* Switch to nodemailer / resend for transactional mail

* Further cleanup

* Add stylized email using react-email

* fix

* Fix build

* db performance improvements and job resilience (#200)

* replace upsert with separate create many and raw update many calls

* add bulk repo status update and queue addition with priority
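
Roughly, the bulk path described here can look like this (the Prisma model/field names, status value, and priority are assumptions, not the actual schema):

```ts
import { PrismaClient } from "@prisma/client";
import { Queue } from "bullmq";

const prisma = new PrismaClient();
const indexQueue = new Queue("repo-index", { connection: { host: "localhost", port: 6379 } });

export const enqueueRepos = async (repoIds: number[]) => {
    // One UPDATE for the whole batch instead of a per-repo upsert loop.
    await prisma.repo.updateMany({
        where: { id: { in: repoIds } },
        data: { indexingStatus: "IN_INDEX_QUEUE" },
    });

    // One round trip to Redis; lower numbers are higher priority in BullMQ.
    await indexQueue.addBulk(
        repoIds.map((repoId) => ({
            name: "index",
            data: { repoId },
            opts: { priority: 1 },
        })),
    );
};
```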

* add support for managed redis

* add note for changing raw sql on schema change

* remove non secret token options

* fix token examples in schema

* add better visualization for connection/repo errors and warnings (#201)

* replace upsert with separate create many and raw update many calls

* add bulk repo status update and queue addition with priority

* add support for managed redis

* add note for changing raw sql on schema change

* add error package and use BackendException in connection manager

* handle connection failure display on web app

* add warning banner for not found orgs/repos/users

* add failure handling for gerrit

* add gitea notfound warning support

* add warning icon in connections list

* style nits

* add failed repo vis in connections list

* added retry failed repo index buttons

* move nav indicators to client with polling

* fix indicator flash issue and truncate large list results

* display error nav better

* truncate failed repo list in connection list item

* fix merge error

* fix merge bug

* add connection util file [wip]

* refactor notfound fetch logic and add missing error package to dockerfile

* move repeated logic to function and add zod schema for syncStatusMetadata

* add orgid unique constraint to repo

* revert repo compile update logic to upsert loop

* log upsert stats

* [temp] disable polling everywhere (#205)

* add health check endpoint

* Refined onboarding flow (#202)

* Redeem UX pass (#204)

* add log for health check

* fix new connection complete callback route

* add cpu split logic and only wait for postgres if we're going to connect to it

* Inline secret creation (#207)

* use docker scopes to try and improve caching

* Dummy change

* remove cpu split logic

* Add some instrumentation to web

* add posthog events on various user actions (#208)

* add page view event support

* add posthog events

* nit: remove unused import

* feedback

* fix merge error

* use staging posthog papik when building staging image

* fix other merge error and build warnings

* Add invite email (#209)

* wrap posthog provider in suspense to fix build error

* add grafana alloy config and setup (#210)

* add grafana alloy config and setup

* add basic repo prom metrics
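
A small `prom-client` sketch of the kind of repo metrics added here (the metric names are illustrative; a later commit renames them):

```ts
import client from "prom-client";

export const registry = new client.Registry();

export const reposIndexedTotal = new client.Counter({
    name: "repos_indexed_total",
    help: "Repo indexing jobs that completed successfully",
    registers: [registry],
});

export const pendingIndexJobs = new client.Gauge({
    name: "pending_repo_index_jobs",
    help: "Repo indexing jobs currently waiting in the queue",
    registers: [registry],
});

// Exposed over HTTP for the Grafana Alloy scraper configured in this change.
export const metricsText = () => registry.metrics();
```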

* nits in dockerfile

* remove invalid characters when auto filling domain

* add login posthog events

* remove hard coded sourcebot.app references

* make repo garbage collection async (#211)

* add gc queue logic

* fix missing switch cases for gc status

* style org create form better with new staging domain

* change repo rm logic to be async

* simplify repo for inactive org query

* add grace period for garbage collecting repos
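
Conceptually, the grace period turns the garbage-collection query into something like this (the relation/field names and window length are assumptions for illustration):

```ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();
const GRACE_PERIOD_MS = 7 * 24 * 60 * 60 * 1000; // assumed window

export const findReposToGarbageCollect = async () => {
    const cutoff = new Date(Date.now() - GRACE_PERIOD_MS);
    return prisma.repo.findMany({
        where: {
            // Only repos that have been orphaned (no connections) for longer than
            // the grace period are eligible for shard and data removal.
            connections: { none: {} },
            updatedAt: { lt: cutoff },
        },
    });
};
```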

* make prom scrape interval 500ms

* fix typo in trial card

* onboarding tweaks

* rename some prom metrics and cleanup unused

* wipe existing repo if we've picked up a killed job to ensure good state

* Connections UX pass + query optimizations (#212)

* remove git & local schemas (#213)

* skip stripe checkout for trial + fix indexing in progress UI + additional schema validation (#214)

* add additional config validation

* wip bypass stripe checkout for trial

* fix stripe trial checkout bypass

* fix indexing in progress ui on home page

* add subscription checks, more schema validation, and fix issue with complete page

* don't display if there are no indexed repos

* fix skipping onboard complete check

* fix build error

* add back button in onboard connection creation flow

* Add back revision support (#215)

* fix build

* Fix bug with repository snapshot

* fix share links

* fix repo rm issue, 502 page, condition on test clock

* Make login and onboarding mobile friendly

* fix ordering of quick actions

* remove error msg dump on failed repo index job, and update indexedAt field

* Add mobile unsupported splash screen

* cherry pick fix for file links

* [Cherry Pick] Syntax reference guide (#169) (#216)

* Add .env to db gitignore

* fix case where we have repos but they're all failed for repo snapshot

* /settings/secrets page (#217)

* display domain properly in org create form

* Quick action tweaks (#218)

* revamp repo page (#220)

* wip repo table

* new repo page

* add indicator for when feedback is applied in repo page

* add repo button

* fetch connection data in one query

* fix styling

* fix (#219)

* remove / keyboard shortcut hint in search bar

* prevent switching to first page on data update and truncate long repo names in repo list

* General settings + cleanup (#221)

* General settings

* Add alert to org domain change

* First attempt at sending logs to grafana

* logs wip

* add alloy logs

* wip

* [temp] comment out loki for now

* update trial card content and add events for code host selection on onboard

* reduce scraping interval to 15s

* Add prometheus metric for pending repo indexing jobs

* switch magic link to invite code (#222)

* wip magic link codes

* pipe email to email provider properly

* remove magic link data cookie after sign in

* clean up unused imports

* don't remove the cookie before we use it

* rm package-lock.json

* revert yarn files to v3 state

* switch email passing from cookie to search param

* add comment for settings dropdown auth update

* remove unused middleware file

* fix build error and warnings

* fix build error with useSearchParam not wrapped in suspense

* add sentry support to backend and webapp (#223)

* add sentry to web app

* set sentry environment from env var

* add sentry env replace logic in docker container

* wip add backend sentry

* add sentry to backend

* move DSN to env var
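
A minimal sketch of the backend Sentry setup, using the `SENTRY_BACKEND_DSN` and `SENTRY_ENVIRONMENT` variables that show up in `.env.development` below (anything beyond this initialization is an assumption):

```ts
import * as Sentry from "@sentry/node";

Sentry.init({
    dsn: process.env.SENTRY_BACKEND_DSN,
    environment: process.env.SENTRY_ENVIRONMENT ?? "dev",
});

// Report an error to Sentry, then let the caller's error handling proceed.
export const reportAndRethrow = (err: unknown): never => {
    Sentry.captureException(err);
    throw err;
};
```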

* remove test exception

* Fix root domain issue on onboarding

* add setup sentry cli step to github action

* login to sentry

* fix sentry login in action

* Update grafana loki endpoint

* switch source map publish to runtime in entrypoint

* catch and rethrow simplegit exceptions

* alloy nits

* fix alloy

* backend logging (#224)

* revert grafana loki config

* fix login ui nits

* fix quick actions

* fix typo in secret creation

* fix private repo clone issue for gitlab

* add repo index timeout logic

* add posthog identify call after registration

* various changes to add terms and security info (#225)

* add terms and security to footer

* add security card

* add demo card

* fix build error

* nit fix: center 'get in touch' on security card

* Dark theme improvements (#226)

* (fix) Fixed bug with gitlab and gitea not including hostname in the repoName

* Switch to using t3-env for env-var management (#230)

* Add missing env var

* fix build

* Centralize to using a single .env.development for development workflows (#231)

* Make billing optional (#232)

* Massage environment variables from strings to numbers (#234)

* Single tenancy & auth modes (#233)

* Add docs to this repo

* dummy change

* Declarative connection configuration (#235)

* fix build

* upgrade to next 14.2.25

* Improved database DX

* migrate to yarn v4

* Use origin from header for baseUrl of emails (instead of AUTH_URL). Also removed reference to hide scrollbars

* Remove SOURCEBOT_ENCRYPTION_KEY from build arg

* Fix issue with linking default user to org in single tenant + no-auth mode

* Fix fallback tokens (#242)

* add SECURITY_CARD_ENABLED flag

* Add repository weburl (#243)

* Random fixes and improvements (#244)

* add zoekt max wall time env var

* remove empty warning in docs

* fix reference in sh docs

* add connection manager upsert timeout env var

* Declarative connection cleanup + improvements (#245)

* change contact us footer in app to point to main contact form

* PostHog event pass (#246)

* fix typo

* Add sourcebot cloud environment prop to staging workflow

* Update generated files

* remove AUTH_URL since it unused and (likely) unnecessary

* Revert "remove AUTH_URL since it unused and (likely) unnecessary"

This reverts commit 1f4a5aed22.

* cleanup GitHub action releases (#252)

* remove alloy, change auth default to disabled, add settings page in me dropdown

* enforce connection management perms to owner (#253)

* enforce connection management perms to owner

* fix formatting

* more formatting

* naming nits

* fix var name error

* change empty repo set copy if auth is disabled

* add CONTRIBUTING.md file

* hide settings in dropdown when auth isn't enabled

* handle case where gerrit weburl is just gitiles path

* Docs overhaul (#251)

* remove nocheckin

* fix build error

* remove v3 trigger from deploy staging

* fix build errors round 2

* another error fix

---------

Co-authored-by: msukkari <michael.sukkarieh@mail.mcgill.ca>
Author: Brendan Kellam (committed 2025-03-31 22:34:42 -07:00 via GitHub)
Parent: 2b28c11779
Commit: 39b92b9e98
398 changed files with 43323 additions and 10297 deletions

.dockerignore

@ -1,11 +1,15 @@
 Dockerfile
 .dockerignore
-node_modules
 npm-debug.log
 README.md
-.next
-!.next/static
-!.next/standalone
 .git
 .sourcebot
-.env.local
+packages/web/.next
+!packages/web/.next/static
+!packages/web/.next/standalone
+**/node_modules
+**/.env.local
+**/.sentryclirc
+**/.env.sentry-build-plugin
+.yarn
+!.yarn/releases

.env.development (new file, 81 lines)

@ -0,0 +1,81 @@
# Prisma
DATABASE_URL="postgresql://postgres:postgres@localhost:5432/postgres"
# Zoekt
ZOEKT_WEBSERVER_URL="http://localhost:6070"
# SHARD_MAX_MATCH_COUNT=10000
# TOTAL_MAX_MATCH_COUNT=100000
# The command to use for generating ctags.
CTAGS_COMMAND=ctags
# logging, strict
SRC_TENANT_ENFORCEMENT_MODE=strict
# Auth.JS
# You can generate a new secret with:
# openssl rand -base64 33
# @see: https://authjs.dev/getting-started/deployment#auth_secret
AUTH_SECRET="00000000000000000000000000000000000000000000"
AUTH_URL="http://localhost:3000"
# AUTH_CREDENTIALS_LOGIN_ENABLED=true
# AUTH_GITHUB_CLIENT_ID=""
# AUTH_GITHUB_CLIENT_SECRET=""
# AUTH_GOOGLE_CLIENT_ID=""
# AUTH_GOOGLE_CLIENT_SECRET=""
# Email
# EMAIL_FROM_ADDRESS="" # The from address for transactional emails.
# SMTP_CONNECTION_URL="" # The SMTP connection URL for transactional emails.
# PostHog
# POSTHOG_PAPIK=""
# NEXT_PUBLIC_POSTHOG_PAPIK=""
# Sentry
# SENTRY_BACKEND_DSN=""
# NEXT_PUBLIC_SENTRY_WEBAPP_DSN=""
# SENTRY_ENVIRONMENT="dev"
# NEXT_PUBLIC_SENTRY_ENVIRONMENT="dev"
# SENTRY_AUTH_TOKEN=
# Logtail
# LOGTAIL_TOKEN=""
# LOGTAIL_HOST=""
# Redis
REDIS_URL="redis://localhost:6379"
# Stripe
# STRIPE_SECRET_KEY: z.string().optional(),
# STRIPE_PRODUCT_ID: z.string().optional(),
# STRIPE_WEBHOOK_SECRET: z.string().optional(),
# STRIPE_ENABLE_TEST_CLOCKS=false
# Misc
# Generated using:
# openssl rand -base64 24
SOURCEBOT_ENCRYPTION_KEY="00000000000000000000000000000000"
SOURCEBOT_LOG_LEVEL="debug" # valid values: info, debug, warn, error
SOURCEBOT_TELEMETRY_DISABLED=true # Disables telemetry collection
# Code-host fallback tokens
# FALLBACK_GITHUB_CLOUD_TOKEN=""
# FALLBACK_GITLAB_CLOUD_TOKEN=""
# FALLBACK_GITEA_CLOUD_TOKEN=""
# Controls the number of concurrent indexing jobs that can run at once
# INDEX_CONCURRENCY_MULTIPLE=
# Controls the polling interval for the web app
# NEXT_PUBLIC_POLLING_INTERVAL_MS=
# Controls the version of the web app
# NEXT_PUBLIC_SOURCEBOT_VERSION=
# CONFIG_MAX_REPOS_NO_TOKEN=
# NODE_ENV=
# SOURCEBOT_TENANCY_MODE=single
# NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT=

.github/workflows/_gcp-deploy.yml (new file, 87 lines)

@ -0,0 +1,87 @@
name: GCP Deploy
on:
workflow_call:
inputs:
environment:
required: true
description: 'The environment to deploy to'
type: string
jobs:
gcp-deploy:
runs-on: ubuntu-latest
environment: ${{ inputs.environment }}
env:
IMAGE_PATH: us-west1-docker.pkg.dev/${{ secrets.GCP_PROJECT_ID }}/sourcebot/sourcebot-${{ vars.NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT }}
steps:
- name: 'Checkout'
uses: 'actions/checkout@v3'
with:
submodules: "true"
# @see: https://github.com/google-github-actions/auth?tab=readme-ov-file#direct-wif
- name: 'Google auth'
id: 'auth'
uses: 'google-github-actions/auth@v2'
with:
project_id: '${{ secrets.GCP_PROJECT_ID }}'
workload_identity_provider: '${{ secrets.GCP_WIF_PROVIDER }}'
- name: 'Set up Cloud SDK'
uses: 'google-github-actions/setup-gcloud@v1'
with:
project_id: '${{ secrets.GCP_PROJECT_ID }}'
- name: 'Docker auth'
run: |-
gcloud auth configure-docker us-west1-docker.pkg.dev
- name: Configure SSH
run: |
mkdir -p ~/.ssh/
echo "${{ secrets.GCP_SSH_PRIVATE_KEY }}" > ~/.ssh/private.key
chmod 600 ~/.ssh/private.key
echo "${{ secrets.GCP_SSH_KNOWN_HOSTS }}" >> ~/.ssh/known_hosts
- name: Build Docker image
id: build
uses: docker/build-push-action@v6
with:
context: .
push: true
tags: |
${{ env.IMAGE_PATH }}:${{ github.sha }}
${{ env.IMAGE_PATH }}:latest
build-args: |
NEXT_PUBLIC_SOURCEBOT_VERSION=${{ github.ref_name }}
NEXT_PUBLIC_POSTHOG_PAPIK=${{ vars.NEXT_PUBLIC_POSTHOG_PAPIK }}
NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT=${{ vars.NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT }}
NEXT_PUBLIC_SENTRY_ENVIRONMENT=${{ vars.NEXT_PUBLIC_SENTRY_ENVIRONMENT }}
NEXT_PUBLIC_SENTRY_WEBAPP_DSN=${{ vars.NEXT_PUBLIC_SENTRY_WEBAPP_DSN }}
NEXT_PUBLIC_SENTRY_BACKEND_DSN=${{ vars.NEXT_PUBLIC_SENTRY_BACKEND_DSN }}
SENTRY_SMUAT=${{ secrets.SENTRY_SMUAT }}
SENTRY_ORG=${{ vars.SENTRY_ORG }}
SENTRY_WEBAPP_PROJECT=${{ vars.SENTRY_WEBAPP_PROJECT }}
SENTRY_BACKEND_PROJECT=${{ vars.SENTRY_BACKEND_PROJECT }}
- name: Deploy to GCP
run: |
ssh -i ~/.ssh/private.key ${{ secrets.GCP_USERNAME }}@${{ secrets.GCP_HOST }} << 'EOF'
# First pull the new image
docker pull ${{ env.IMAGE_PATH }}:${{ github.sha }}
# Stop and remove any existing container
docker stop -t 60 sourcebot || true
docker rm sourcebot || true
# Run the new container
docker run -d \
-p 80:3000 \
--rm \
--env-file .env \
-v /mnt/data:/data \
--name sourcebot \
${{ env.IMAGE_PATH }}:${{ github.sha }}
EOF

.github/workflows/deploy-prod.yml (new file, 18 lines)

@ -0,0 +1,18 @@
name: Deploy Prod
on:
push:
tags: ["v*.*.*"]
workflow_dispatch:
jobs:
deploy-prod:
uses: ./.github/workflows/_gcp-deploy.yml
secrets: inherit
permissions:
contents: 'read'
# Required for OIDC auth with GCP.
# @see: https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/about-security-hardening-with-openid-connect#adding-permissions-settings
id-token: 'write'
with:
environment: prod

.github/workflows/deploy-staging.yml (new file, 19 lines)

@ -0,0 +1,19 @@
name: Deploy Staging
on:
push:
branches: [main]
tags: ["v*.*.*"]
workflow_dispatch:
jobs:
deploy-staging:
uses: ./.github/workflows/_gcp-deploy.yml
secrets: inherit
permissions:
contents: 'read'
# Required for OIDC auth with GCP.
# @see: https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/about-security-hardening-with-openid-connect#adding-permissions-settings
id-token: 'write'
with:
environment: staging

(deleted file: the "Fly Deploy" workflow)

@ -1,31 +0,0 @@
# See https://fly.io/docs/app-guides/continuous-deployment-with-github-actions/
name: Fly Deploy
on:
# Since the `fly.toml` specifies the `latest` tag, we trigger this
# deployment whenever a new version is published to the container registry.
# @see: ghcr-publish.yml
workflow_run:
workflows: ["Publish to ghcr"]
types:
- completed
jobs:
deploy:
name: Deploy app
runs-on: ubuntu-latest
environment: production
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: 'true'
- name: Use flyctl
uses: superfly/flyctl-actions/setup-flyctl@master
- name: Deploy to fly.io
run: flyctl deploy --local-only
env:
FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}

(deleted file: the "GCP Deploy (staging)" workflow)

@ -1,38 +0,0 @@
name: GCP Deploy (staging)
on:
workflow_run:
workflows: ["Publish to ghcr (staging)"]
types:
- completed
jobs:
deploy:
name: Deploy staging app to GCP
runs-on: ubuntu-latest
steps:
- name: Configure SSH
run: |
mkdir -p ~/.ssh/
echo "${{ secrets.GCP_STAGING_SSH_PRIVATE_KEY }}" > ~/.ssh/private.key
chmod 600 ~/.ssh/private.key
echo "${{ secrets.GCP_STAGING_SSH_KNOWN_HOSTS }}" >> ~/.ssh/known_hosts
- name: Deploy to GCP
run: |
ssh -i ~/.ssh/private.key ${{ secrets.GCP_STAGING_USERNAME }}@${{ secrets.GCP_STAGING_HOST }} << 'EOF'
# Stop and remove any existing container
docker stop -t 60 sourcebot-staging || true
docker rm sourcebot-staging || true
# Run new container
docker run -d \
-p 80:3000 \
--rm \
--pull always \
--env-file .env.staging \
-v /mnt/data:/data \
--name sourcebot-staging \
ghcr.io/sourcebot-dev/sourcebot:staging
EOF


@ -15,6 +15,7 @@ env:
jobs: jobs:
build: build:
runs-on: ${{ matrix.runs-on}} runs-on: ${{ matrix.runs-on}}
environment: oss
permissions: permissions:
contents: read contents: read
packages: write packages: write
@ -30,8 +31,6 @@ jobs:
- platform: linux/arm64 - platform: linux/arm64
runs-on: ubuntu-24.04-arm runs-on: ubuntu-24.04-arm
steps: steps:
- name: Prepare - name: Prepare
run: | run: |
@ -79,8 +78,8 @@ jobs:
platforms: ${{ matrix.platform }} platforms: ${{ matrix.platform }}
outputs: type=image,name=${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true,annotation.org.opencontainers.image.description=Blazingly fast code search outputs: type=image,name=${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true,annotation.org.opencontainers.image.description=Blazingly fast code search
build-args: | build-args: |
SOURCEBOT_VERSION=${{ github.ref_name }} NEXT_PUBLIC_SOURCEBOT_VERSION=${{ github.ref_name }}
POSTHOG_PAPIK=${{ secrets.POSTHOG_PAPIK }} NEXT_PUBLIC_POSTHOG_PAPIK=${{ vars.NEXT_PUBLIC_POSTHOG_PAPIK }}
- name: Export digest - name: Export digest
run: | run: |

(deleted file: the "Publish to ghcr (staging)" workflow)

@ -1,134 +0,0 @@
name: Publish to ghcr (staging)
on:
push:
branches: ["v3"]
env:
REGISTRY_IMAGE: ghcr.io/sourcebot-dev/sourcebot
jobs:
build:
runs-on: ${{ matrix.runs-on}}
permissions:
contents: read
packages: write
id-token: write
strategy:
matrix:
platform: [linux/amd64, linux/arm64]
include:
- platform: linux/amd64
runs-on: ubuntu-latest
- platform: linux/arm64
runs-on: ubuntu-24.04-arm
steps:
- name: Prepare
run: |
platform=${{ matrix.platform }}
echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: "true"
- name: Extract Docker metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY_IMAGE }}
tags: staging
- name: Install cosign
uses: sigstore/cosign-installer@v3.5.0
with:
cosign-release: "v2.2.4"
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Packages Docker Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build Docker image
id: build
uses: docker/build-push-action@v6
with:
context: .
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: ${{ matrix.platform }}
outputs: type=image,name=${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
build-args: |
SOURCEBOT_VERSION=${{ github.ref_name }}
POSTHOG_PAPIK=${{ secrets.POSTHOG_PAPIK }}
SOURCEBOT_ENCRYPTION_KEY=${{ secrets.STAGING_SOURCEBOT_ENCRYPTION_KEY }}
- name: Export digest
run: |
mkdir -p /tmp/digests
digest="${{ steps.build.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v4
with:
name: digests-${{ env.PLATFORM_PAIR }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
- name: Sign the published Docker image
env:
TAGS: ${{ steps.meta.outputs.tags }}
DIGEST: ${{ steps.build.outputs.digest }}
run: echo "${TAGS}" | xargs -I {} cosign sign --yes {}@${DIGEST}
merge:
runs-on: ubuntu-latest
permissions:
packages: write
needs:
- build
steps:
- name: Download digests
uses: actions/download-artifact@v4
with:
path: /tmp/digests
pattern: digests-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Extract Docker metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY_IMAGE }}
tags: staging
- name: Login to GitHub Packages Docker Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
$(printf '${{ env.REGISTRY_IMAGE }}@sha256:%s ' *)
- name: Inspect image
run: |
docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }}

.gitignore (4 changed lines)

@ -153,10 +153,10 @@ dist
 # if you are NOT using Zero-installs, then:
 # comment the following lines
-!.yarn/cache
+# !.yarn/cache
 # and uncomment the following lines
-# .pnp.*
+.pnp.*
 # End of https://www.toptal.com/developers/gitignore/api/yarn,node

.gitmodules (2 changed lines)

@ -1,3 +1,5 @@
 [submodule "vendor/zoekt"]
 path = vendor/zoekt
 url = https://github.com/sourcebot-dev/zoekt
+# @todo: update this to main when we have a release
+branch=v3

(modified file: VS Code extension recommendations)

@ -1,6 +1,7 @@
 {
     "recommendations": [
         "dbaeumer.vscode-eslint",
-        "bradlc.vscode-tailwindcss"
+        "bradlc.vscode-tailwindcss",
+        "prisma.prisma"
     ]
 }

.yarn/releases/yarn-4.7.0.cjs (new executable file, 935 lines; diff suppressed because one or more lines are too long)

.yarnrc.yml (new file, 3 lines)

@ -0,0 +1,3 @@
enableGlobalCache: false
nodeLinker: node-modules
yarnPath: .yarn/releases/yarn-4.7.0.cjs

CONTRIBUTING.md (new file, 40 lines)

@ -0,0 +1,40 @@
## Build from source
>[!NOTE]
> Building from source is only required if you'd like to contribute. The recommended way to use Sourcebot is to use the [pre-built docker image](https://github.com/sourcebot-dev/sourcebot/pkgs/container/sourcebot).
1. Install <a href="https://go.dev/doc/install"><img src="https://go.dev/favicon.ico" width="16" height="16"> go</a>, <a href="https://nodejs.org/"><img src="https://nodejs.org/favicon.ico" width="16" height="16"> NodeJS</a>, [redis](https://redis.io/), and [postgres](https://www.postgresql.org/). Note that a NodeJS version of at least `21.1.0` is required.
2. Install [ctags](https://github.com/universal-ctags/ctags) (required by zoekt)
```sh
// macOS:
brew install universal-ctags
// Linux:
snap install universal-ctags
```
3. Clone the repository with submodules:
```sh
git clone --recurse-submodules https://github.com/sourcebot-dev/sourcebot.git
```
4. Run `make` to build zoekt and install dependencies:
```sh
cd sourcebot
make
```
The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively.
5. Create a copy of `.env.development` and name it `.env.development.local`. Update the required environment variables.
6. If you're using a declarative configuration file (the default behavior if you didn't enable auth), create a configuration file and update the `CONFIG_PATH` environment variable in your `.env.development.local` file.
7. Start Sourcebot with the command:
```sh
yarn dev
```
A `.sourcebot` directory will be created and zoekt will begin to index the repositories found in the `config.json` file.
8. Start searching at `http://localhost:3000`.

Dockerfile

@ -1,5 +1,26 @@
# ------ Global scope variables ------
# Set of global build arguments.
# These are considered "public" and will be baked into the image.
# The convention is to prefix these with `NEXT_PUBLIC_` so that
# they can be optionally be passed as client-side environment variables
# in the webapp.
# @see: https://docs.docker.com/build/building/variables/#scoping
ARG NEXT_PUBLIC_SOURCEBOT_VERSION
# PAPIK = Project API Key
# Note that this key does not need to be kept secret, so it's not
# necessary to use Docker build secrets here.
# @see: https://posthog.com/tutorials/api-capture-events#authenticating-with-the-project-api-key
ARG NEXT_PUBLIC_POSTHOG_PAPIK
ARG NEXT_PUBLIC_SENTRY_ENVIRONMENT
ARG NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT
ARG NEXT_PUBLIC_SENTRY_WEBAPP_DSN
ARG NEXT_PUBLIC_SENTRY_BACKEND_DSN
FROM node:20-alpine3.19 AS node-alpine FROM node:20-alpine3.19 AS node-alpine
FROM golang:1.23.4-alpine3.19 AS go-alpine FROM golang:1.23.4-alpine3.19 AS go-alpine
# ----------------------------------
# ------ Build Zoekt ------ # ------ Build Zoekt ------
FROM go-alpine AS zoekt-builder FROM go-alpine AS zoekt-builder
@ -9,76 +30,154 @@ COPY vendor/zoekt/go.mod vendor/zoekt/go.sum ./
RUN go mod download RUN go mod download
COPY vendor/zoekt ./ COPY vendor/zoekt ./
RUN CGO_ENABLED=0 GOOS=linux go build -o /cmd/ ./cmd/... RUN CGO_ENABLED=0 GOOS=linux go build -o /cmd/ ./cmd/...
# -------------------------
# ------ Build shared libraries ------
FROM node-alpine AS shared-libs-builder
WORKDIR /app
COPY package.json yarn.lock* .yarnrc.yml ./
COPY .yarn ./.yarn
COPY ./packages/db ./packages/db
COPY ./packages/schemas ./packages/schemas
COPY ./packages/crypto ./packages/crypto
COPY ./packages/error ./packages/error
RUN yarn workspace @sourcebot/db install
RUN yarn workspace @sourcebot/schemas install
RUN yarn workspace @sourcebot/crypto install
RUN yarn workspace @sourcebot/error install
# ------------------------------------
# ------ Build Web ------ # ------ Build Web ------
FROM node-alpine AS web-builder FROM node-alpine AS web-builder
ENV SKIP_ENV_VALIDATION=1
# -----------
ARG NEXT_PUBLIC_SOURCEBOT_VERSION
ENV NEXT_PUBLIC_SOURCEBOT_VERSION=$NEXT_PUBLIC_SOURCEBOT_VERSION
ARG NEXT_PUBLIC_POSTHOG_PAPIK
ENV NEXT_PUBLIC_POSTHOG_PAPIK=$NEXT_PUBLIC_POSTHOG_PAPIK
ARG NEXT_PUBLIC_SENTRY_ENVIRONMENT
ENV NEXT_PUBLIC_SENTRY_ENVIRONMENT=$NEXT_PUBLIC_SENTRY_ENVIRONMENT
ARG NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT
ENV NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT=$NEXT_PUBLIC_SOURCEBOT_CLOUD_ENVIRONMENT
ARG NEXT_PUBLIC_SENTRY_WEBAPP_DSN
ENV NEXT_PUBLIC_SENTRY_WEBAPP_DSN=$NEXT_PUBLIC_SENTRY_WEBAPP_DSN
# To upload source maps to Sentry, we need to set the following build-time args.
# It's important that we don't set these for oss builds, otherwise the Sentry
# auth token will be exposed.
# @see : next.config.mjs
ARG SENTRY_ORG
ENV SENTRY_ORG=$SENTRY_ORG
ARG SENTRY_WEBAPP_PROJECT
ENV SENTRY_WEBAPP_PROJECT=$SENTRY_WEBAPP_PROJECT
# SMUAT = Source Map Upload Auth Token
ARG SENTRY_SMUAT
ENV SENTRY_SMUAT=$SENTRY_SMUAT
# -----------
RUN apk add --no-cache libc6-compat RUN apk add --no-cache libc6-compat
WORKDIR /app WORKDIR /app
COPY package.json yarn.lock* ./ COPY package.json yarn.lock* .yarnrc.yml ./
COPY .yarn ./.yarn
COPY ./packages/web ./packages/web COPY ./packages/web ./packages/web
COPY --from=shared-libs-builder /app/node_modules ./node_modules
COPY --from=shared-libs-builder /app/packages/db ./packages/db
COPY --from=shared-libs-builder /app/packages/schemas ./packages/schemas
COPY --from=shared-libs-builder /app/packages/crypto ./packages/crypto
COPY --from=shared-libs-builder /app/packages/error ./packages/error
# Fixes arm64 timeouts # Fixes arm64 timeouts
RUN yarn config set registry https://registry.npmjs.org/ RUN yarn workspace @sourcebot/web install
RUN yarn config set network-timeout 1200000
RUN yarn workspace @sourcebot/web install --frozen-lockfile
ENV NEXT_TELEMETRY_DISABLED=1 ENV NEXT_TELEMETRY_DISABLED=1
# @see: https://phase.dev/blog/nextjs-public-runtime-variables/
ARG NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED=BAKED_NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED
ARG NEXT_PUBLIC_SOURCEBOT_VERSION=BAKED_NEXT_PUBLIC_SOURCEBOT_VERSION
ENV NEXT_PUBLIC_PUBLIC_SEARCH_DEMO=BAKED_NEXT_PUBLIC_PUBLIC_SEARCH_DEMO
ENV NEXT_PUBLIC_POSTHOG_PAPIK=BAKED_NEXT_PUBLIC_POSTHOG_PAPIK
# @note: leading "/" is required for the basePath property. @see: https://nextjs.org/docs/app/api-reference/next-config-js/basePath
ARG NEXT_PUBLIC_DOMAIN_SUB_PATH=/BAKED_NEXT_PUBLIC_DOMAIN_SUB_PATH
RUN yarn workspace @sourcebot/web build RUN yarn workspace @sourcebot/web build
ENV SKIP_ENV_VALIDATION=0
# ------------------------------
# ------ Build Backend ------ # ------ Build Backend ------
FROM node-alpine AS backend-builder FROM node-alpine AS backend-builder
ENV SKIP_ENV_VALIDATION=1
# -----------
ARG NEXT_PUBLIC_SOURCEBOT_VERSION
ENV NEXT_PUBLIC_SOURCEBOT_VERSION=$NEXT_PUBLIC_SOURCEBOT_VERSION
# To upload source maps to Sentry, we need to set the following build-time args.
# It's important that we don't set these for oss builds, otherwise the Sentry
# auth token will be exposed.
ARG SENTRY_ORG
ENV SENTRY_ORG=$SENTRY_ORG
ARG SENTRY_BACKEND_PROJECT
ENV SENTRY_BACKEND_PROJECT=$SENTRY_BACKEND_PROJECT
# SMUAT = Source Map Upload Auth Token
ARG SENTRY_SMUAT
ENV SENTRY_SMUAT=$SENTRY_SMUAT
# -----------
WORKDIR /app WORKDIR /app
COPY package.json yarn.lock* ./ COPY package.json yarn.lock* .yarnrc.yml ./
COPY .yarn ./.yarn
COPY ./schemas ./schemas COPY ./schemas ./schemas
COPY ./packages/backend ./packages/backend COPY ./packages/backend ./packages/backend
RUN yarn workspace @sourcebot/backend install --frozen-lockfile COPY --from=shared-libs-builder /app/node_modules ./node_modules
COPY --from=shared-libs-builder /app/packages/db ./packages/db
COPY --from=shared-libs-builder /app/packages/schemas ./packages/schemas
COPY --from=shared-libs-builder /app/packages/crypto ./packages/crypto
COPY --from=shared-libs-builder /app/packages/error ./packages/error
RUN yarn workspace @sourcebot/backend install
RUN yarn workspace @sourcebot/backend build RUN yarn workspace @sourcebot/backend build
# Upload source maps to Sentry if we have the necessary build-time args.
RUN if [ -n "$SENTRY_SMUAT" ] && [ -n "$SENTRY_ORG" ] && [ -n "$SENTRY_BACKEND_PROJECT" ] && [ -n "$NEXT_PUBLIC_SOURCEBOT_VERSION" ]; then \
apk add --no-cache curl; \
curl -sL https://sentry.io/get-cli/ | sh; \
sentry-cli login --auth-token $SENTRY_SMUAT; \
sentry-cli sourcemaps inject --org $SENTRY_ORG --project $SENTRY_BACKEND_PROJECT --release $NEXT_PUBLIC_SOURCEBOT_VERSION ./packages/backend/dist; \
sentry-cli sourcemaps upload --org $SENTRY_ORG --project $SENTRY_BACKEND_PROJECT --release $NEXT_PUBLIC_SOURCEBOT_VERSION ./packages/backend/dist; \
fi
ENV SKIP_ENV_VALIDATION=0
# ------------------------------
# ------ Runner ------ # ------ Runner ------
FROM node-alpine AS runner FROM node-alpine AS runner
# -----------
ARG NEXT_PUBLIC_SOURCEBOT_VERSION
ENV NEXT_PUBLIC_SOURCEBOT_VERSION=$NEXT_PUBLIC_SOURCEBOT_VERSION
ARG NEXT_PUBLIC_POSTHOG_PAPIK
ENV NEXT_PUBLIC_POSTHOG_PAPIK=$NEXT_PUBLIC_POSTHOG_PAPIK
ARG NEXT_PUBLIC_SENTRY_ENVIRONMENT
ENV NEXT_PUBLIC_SENTRY_ENVIRONMENT=$NEXT_PUBLIC_SENTRY_ENVIRONMENT
ARG NEXT_PUBLIC_SENTRY_WEBAPP_DSN
ENV NEXT_PUBLIC_SENTRY_WEBAPP_DSN=$NEXT_PUBLIC_SENTRY_WEBAPP_DSN
ARG NEXT_PUBLIC_SENTRY_BACKEND_DSN
ENV NEXT_PUBLIC_SENTRY_BACKEND_DSN=$NEXT_PUBLIC_SENTRY_BACKEND_DSN
# -----------
RUN echo "Sourcebot Version: $NEXT_PUBLIC_SOURCEBOT_VERSION"
WORKDIR /app WORKDIR /app
ENV NODE_ENV=production ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1 ENV NEXT_TELEMETRY_DISABLED=1
ENV DATA_DIR=/data ENV DATA_DIR=/data
ENV CONFIG_PATH=$DATA_DIR/config.json
ENV DATA_CACHE_DIR=$DATA_DIR/.sourcebot ENV DATA_CACHE_DIR=$DATA_DIR/.sourcebot
ENV DATABASE_DATA_DIR=$DATA_CACHE_DIR/db
ARG SOURCEBOT_VERSION=unknown ENV REDIS_DATA_DIR=$DATA_CACHE_DIR/redis
ENV SOURCEBOT_VERSION=$SOURCEBOT_VERSION ENV DATABASE_URL="postgresql://postgres@localhost:5432/sourcebot"
RUN echo "Sourcebot Version: $SOURCEBOT_VERSION" ENV REDIS_URL="redis://localhost:6379"
ENV SRC_TENANT_ENFORCEMENT_MODE=strict
ARG PUBLIC_SEARCH_DEMO=false
ENV PUBLIC_SEARCH_DEMO=$PUBLIC_SEARCH_DEMO
RUN echo "Public Search Demo: $PUBLIC_SEARCH_DEMO"
# Valid values are: debug, info, warn, error # Valid values are: debug, info, warn, error
ENV SOURCEBOT_LOG_LEVEL=info ENV SOURCEBOT_LOG_LEVEL=info
# Configures the sub-path of the domain to serve Sourcebot from.
# For example, if DOMAIN_SUB_PATH is set to "/sb", Sourcebot
# will serve from http(s)://example.com/sb
ENV DOMAIN_SUB_PATH=/
# PAPIK = Project API Key
# Note that this key does not need to be kept secret, so it's not
# necessary to use Docker build secrets here.
# @see: https://posthog.com/tutorials/api-capture-events#authenticating-with-the-project-api-key
ARG POSTHOG_PAPIK=
ENV POSTHOG_PAPIK=$POSTHOG_PAPIK
# Sourcebot collects anonymous usage data using [PostHog](https://posthog.com/). Uncomment this line to disable. # Sourcebot collects anonymous usage data using [PostHog](https://posthog.com/). Uncomment this line to disable.
# ENV SOURCEBOT_TELEMETRY_DISABLED=1 # ENV SOURCEBOT_TELEMETRY_DISABLED=1
# Configure dependencies COPY package.json yarn.lock* .yarnrc.yml ./
RUN apk add --no-cache git ca-certificates bind-tools tini jansson wget supervisor uuidgen curl perl jq COPY .yarn ./.yarn
# Configure zoekt # Configure zoekt
COPY vendor/zoekt/install-ctags-alpine.sh . COPY vendor/zoekt/install-ctags-alpine.sh .
@ -96,15 +195,28 @@ COPY --from=zoekt-builder \
/cmd/zoekt-index \ /cmd/zoekt-index \
/usr/local/bin/ /usr/local/bin/
# Configure the webapp # Copy all of the things
COPY --from=web-builder /app/packages/web/public ./packages/web/public COPY --from=web-builder /app/packages/web/public ./packages/web/public
COPY --from=web-builder /app/packages/web/.next/standalone ./ COPY --from=web-builder /app/packages/web/.next/standalone ./
COPY --from=web-builder /app/packages/web/.next/static ./packages/web/.next/static COPY --from=web-builder /app/packages/web/.next/static ./packages/web/.next/static
# Configure the backend
COPY --from=backend-builder /app/node_modules ./node_modules COPY --from=backend-builder /app/node_modules ./node_modules
COPY --from=backend-builder /app/packages/backend ./packages/backend COPY --from=backend-builder /app/packages/backend ./packages/backend
COPY --from=shared-libs-builder /app/node_modules ./node_modules
COPY --from=shared-libs-builder /app/packages/db ./packages/db
COPY --from=shared-libs-builder /app/packages/schemas ./packages/schemas
COPY --from=shared-libs-builder /app/packages/crypto ./packages/crypto
COPY --from=shared-libs-builder /app/packages/error ./packages/error
# Configure dependencies
RUN apk add --no-cache git ca-certificates bind-tools tini jansson wget supervisor uuidgen curl perl jq redis postgresql postgresql-contrib openssl util-linux unzip
# Configure the database
RUN mkdir -p /run/postgresql && \
chown -R postgres:postgres /run/postgresql && \
chmod 775 /run/postgresql
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY prefix-output.sh ./prefix-output.sh COPY prefix-output.sh ./prefix-output.sh
RUN chmod +x ./prefix-output.sh RUN chmod +x ./prefix-output.sh
@ -117,3 +229,4 @@ EXPOSE 3000
ENV PORT=3000 ENV PORT=3000
ENV HOSTNAME="0.0.0.0" ENV HOSTNAME="0.0.0.0"
ENTRYPOINT ["/sbin/tini", "--", "./entrypoint.sh"] ENTRYPOINT ["/sbin/tini", "--", "./entrypoint.sh"]
# ------------------------------

Makefile

@ -1,9 +1,9 @@
-CMDS := zoekt ui
+CMDS := zoekt yarn
 ALL: $(CMDS)
-ui:
+yarn:
 	yarn install
 zoekt:
@ -20,6 +20,19 @@ clean:
 	packages/web/.next \
 	packages/backend/dist \
 	packages/backend/node_modules \
+	packages/db/node_modules \
+	packages/db/dist \
+	packages/schemas/node_modules \
+	packages/schemas/dist \
+	packages/crypto/node_modules \
+	packages/crypto/dist \
+	packages/error/node_modules \
+	packages/error/dist \
 	.sourcebot
+soft-reset:
+	rm -rf .sourcebot
+	redis-cli FLUSHALL
 .PHONY: bin

README.md (406 changed lines)

@ -5,12 +5,37 @@
<img height="150" src=".github/images/logo_light.png"> <img height="150" src=".github/images/logo_light.png">
</picture> </picture>
</div> </div>
<div align="center">
<div>
<h3>
<a href="https://app.sourcebot.dev">
<strong>Sourcebot Cloud</strong>
</a> ·
<a href="https://docs.sourcebot.dev/self-hosting/overview">
<strong>Self Host</strong>
</a> ·
<a href="https://sourcebot.dev/search">
<strong>Demo</strong>
</a>
</h3>
</div>
<div>
<a href="https://docs.sourcebot.dev/"><strong>Docs</strong></a> ·
<a href="https://github.com/sourcebot-dev/sourcebot/issues"><strong>Report Bug</strong></a> ·
<a href="https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas"><strong>Feature Request</strong></a> ·
<a href="https://www.sourcebot.dev/changelog"><strong>Changelog</strong></a> ·
<a href="https://www.sourcebot.dev/contact"><strong>Contact</strong></a> ·
</div>
<br/>
<span>Sourcebot uses <a href="https://github.com/sourcebot-dev/sourcebot/discussions"><strong>Github Discussions</strong></a> for Support and Feature Requests.</span>
<br/>
<br/>
<div>
</div>
</div>
<p align="center"> <p align="center">
Blazingly fast code search 🏎️ <a href="mailto:team@sourcebot.dev"><img src="https://img.shields.io/badge/Email%20Us-brightgreen" /></a>
</p>
<p align="center">
<a href="https://sourcebot.dev/search"><img src="https://img.shields.io/badge/Try the Demo!-blue?logo=googlechrome&logoColor=orange"/></a>
<a href="mailto:brendan@sourcebot.dev"><img src="https://img.shields.io/badge/Email%20Us-brightgreen" /></a>
<a href="https://github.com/sourcebot-dev/sourcebot/blob/main/LICENSE"><img src="https://img.shields.io/github/license/sourcebot-dev/sourcebot"/></a> <a href="https://github.com/sourcebot-dev/sourcebot/blob/main/LICENSE"><img src="https://img.shields.io/github/license/sourcebot-dev/sourcebot"/></a>
<a href="https://github.com/sourcebot-dev/sourcebot/actions/workflows/ghcr-publish.yml"><img src="https://img.shields.io/github/actions/workflow/status/sourcebot-dev/sourcebot/ghcr-publish.yml"/><a> <a href="https://github.com/sourcebot-dev/sourcebot/actions/workflows/ghcr-publish.yml"><img src="https://img.shields.io/github/actions/workflow/status/sourcebot-dev/sourcebot/ghcr-publish.yml"/><a>
<a href="https://github.com/sourcebot-dev/sourcebot/stargazers"><img src="https://img.shields.io/github/stars/sourcebot-dev/sourcebot" /></a> <a href="https://github.com/sourcebot-dev/sourcebot/stargazers"><img src="https://img.shields.io/github/stars/sourcebot-dev/sourcebot" /></a>
@ -23,384 +48,69 @@ Blazingly fast code search 🏎️
# About # About
Sourcebot is a fast code indexing and search tool for your codebases. It is built ontop of the [zoekt](https://github.com/sourcegraph/zoekt) indexer, originally authored by Han-Wen Nienhuys and now [maintained by Sourcegraph](https://sourcegraph.com/blog/sourcegraph-accepting-zoekt-maintainership). Sourcebot is the open source Sourcegraph alternative. Index all your repos and branches across multiple code hosts (GitHub, GitLab, Gitea, or Gerrit) and search through them using a blazingly fast interface.
https://github.com/user-attachments/assets/98d46192-5469-430f-ad9e-5c042adbb10d https://github.com/user-attachments/assets/98d46192-5469-430f-ad9e-5c042adbb10d
## Features ## Features
- 💻 **One-command deployment**: Get started instantly using Docker on your own machine. - 💻 **One-command deployment**: Get started instantly using Docker on your own machine.
- 🔍 **Multi-repo search**: Effortlessly index and search through multiple public and private repositories in GitHub, GitLab, Gitea, or Gerrit. - 🔍 **Multi-repo search**: Index and search through multiple public and private repositories and branches on GitHub, GitLab, Gitea, or Gerrit.
- ⚡**Lightning fast performance**: Built on top of the powerful [Zoekt](https://github.com/sourcegraph/zoekt) search engine. - ⚡**Lightning fast performance**: Built on top of the powerful [Zoekt](https://github.com/sourcegraph/zoekt) search engine.
- 📂 **Full file visualization**: Instantly view the entire file when selecting any search result.
- 🎨 **Modern web app**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation - 🎨 **Modern web app**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation
- 📂 **Full file visualization**: Instantly view the entire file when selecting any search result.
You can try out our public hosted demo [here](https://sourcebot.dev/search)! You can try out our public hosted demo [here](https://sourcebot.dev/search)!
# Getting Started # Deploy Sourcebot
Get started with a single docker command: Sourcebot can be deployed in seconds using our official docker image. Visit our [docs](https://docs.sourcebot.dev/self-hosting/overview) for more information.
``` 1. Create a config
docker run -p 3000:3000 --rm --name sourcebot ghcr.io/sourcebot-dev/sourcebot:latest ```json
``` touch config.json
Navigate to `localhost:3000` to start searching the Sourcebot repo. Want to search your own repos? Checkout how to [configure Sourcebot](#configuring-sourcebot).
<details>
<summary>What does this command do?</summary>
- Pull and run the Sourcebot docker image from [ghcr.io/sourcebot-dev/sourcebot:latest](https://github.com/sourcebot-dev/sourcebot/pkgs/container/sourcebot). Make sure you have [docker installed](https://docs.docker.com/get-started/get-docker/).
- Read the repos listed in [default config](./default-config.json) and start indexing them.
- Map port 3000 between your machine and the docker image.
- Starts the web server on port 3000.
</details>
## Configuring Sourcebot
Sourcebot supports indexing and searching through public and private repositories hosted on
<picture>
<source media="(prefers-color-scheme: dark)" srcset=".github/images/github-favicon-inverted.png">
<img src="https://github.com/favicon.ico" width="16" height="16" alt="GitHub icon">
</picture> GitHub, <img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab, <img src="https://gitea.com/favicon.ico" width="16" height="16"> Gitea, and <img src="https://gerrit-review.googlesource.com/favicon.ico" width="16" height="16"> Gerrit. This section will guide you through configuring the repositories that Sourcebot indexes.
1. Create a new folder on your machine that stores your configs and `.sourcebot` cache, and navigate into it:
```sh
mkdir sourcebot_workspace
cd sourcebot_workspace
```
2. Create a new config following the [configuration schema](./schemas/v2/index.json) to specify which repositories Sourcebot should index. For example, let's index llama.cpp:
```sh
touch my_config.json
echo '{ echo '{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json", "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
"repos": [ "connections": {
{ // Comments are supported
"starter-connection": {
"type": "github", "type": "github",
"repos": [ "repos": [
"ggerganov/llama.cpp" "sourcebot-dev/sourcebot"
] ]
} }
] }
}' > my_config.json }' > config.json
``` ```
>[!NOTE] 2. Run the docker container
> Sourcebot can also index all repos owned by a organization, user, group, etc., instead of listing them individually. For examples, see the [configs](./configs) directory. For additional usage information, see the [configuration schema](./schemas/v2/index.json).
3. Run Sourcebot and point it to the new config you created with the `-e CONFIG_PATH` flag:
```sh ```sh
docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e CONFIG_PATH=/data/my_config.json ghcr.io/sourcebot-dev/sourcebot:latest docker run -p 3000:3000 --pull=always --rm -v $(pwd):/data -e CONFIG_PATH=/data/config.json --name sourcebot ghcr.io/sourcebot-dev/sourcebot:latest
``` ```
<details> <details>
<summary>What does this command do?</summary> <summary>What does this command do?</summary>
- Pull and run the Sourcebot docker image from [ghcr.io/sourcebot-dev/sourcebot:latest](https://github.com/sourcebot-dev/sourcebot/pkgs/container/sourcebot). - Pull and run the Sourcebot docker image from [ghcr.io/sourcebot-dev/sourcebot:latest](https://github.com/sourcebot-dev/sourcebot/pkgs/container/sourcebot).
- Mount the current directory (`-v $(pwd):/data`) to allow Sourcebot to persist the `.sourcebot` cache. - Mount the current directory (`-v $(pwd):/data`) to allow Sourcebot to persist the `.sourcebot` cache.
- Mirrors (clones) llama.cpp at `HEAD` into `.sourcebot/github/ggerganov/llama.cpp`. - Clones sourcebot at `HEAD` into `.sourcebot/github/sourcebot-dev/sourcebot`.
- Indexes llama.cpp into a .zoekt index file in `.sourcebot/index/`. - Indexes sourcebot into a .zoekt index file in `.sourcebot/index/`.
- Map port 3000 between your machine and the docker image. - Map port 3000 between your machine and the docker image.
- Starts the web server on port 3000. - Starts the web server on port 3000.
</details> </details>
<br> </br>
3. Start searching at `http://localhost:3000`
You should see a `.sourcebot` folder in your current directory. This folder stores a cache of the repositories zoekt has indexed. The `HEAD` commit of a repository is re-indexed [every hour](./packages/backend/src/constants.ts). Indexing private repos? See [Providing an access token](#providing-an-access-token).
</br> </br>
## Providing an access token To learn how to configure Sourcebot to index your own repos, please refer to our [docs](https://docs.sourcebot.dev/self-hosting/overview).
This will depend on the code hosting platform you're using:
<div>
<details>
<summary>
<picture>
<source media="(prefers-color-scheme: dark)" srcset=".github/images/github-favicon-inverted.png">
<img src="https://github.com/favicon.ico" width="16" height="16" alt="GitHub icon">
</picture> GitHub
</summary>
In order to index private repositories, you'll need to generate a GitHub Personal Access Token (PAT). Create a new PAT [here](https://github.com/settings/tokens/new) and make sure you select the `repo` scope:
![GitHub PAT creation](.github/images/github-pat-creation.png)
Next, update your configuration with the `token` field:
```json
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "github",
"token": "ghp_mytoken",
...
}
]
}
```
You can also pass tokens as environment variables:
```json
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "github",
"token": {
// note: this env var can be named anything. It
// doesn't need to be `GITHUB_TOKEN`.
"env": "GITHUB_TOKEN"
},
...
}
]
}
```
You'll need to pass this environment variable each time you run Sourcebot:
<pre>
docker run -e <b>GITHUB_TOKEN=ghp_mytoken</b> /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest
</pre>
</details>
<details>
<summary><img src="https://gitlab.com/favicon.ico" width="16" height="16" /> GitLab</summary>
Generate a GitLab Personal Access Token (PAT) [here](https://gitlab.com/-/user_settings/personal_access_tokens) and make sure you select the `read_api` scope:
![GitLab PAT creation](.github/images/gitlab-pat-creation.png)
Next, update your configuration with the `token` field:
```json
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "gitlab",
"token": "glpat-mytoken",
...
}
]
}
```
You can also pass tokens as environment variables:
```json
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "gitlab",
"token": {
// note: this env var can be named anything. It
// doesn't need to be `GITLAB_TOKEN`.
"env": "GITLAB_TOKEN"
},
...
}
]
}
```
You'll need to pass this environment variable each time you run Sourcebot:
<pre>
docker run -e <b>GITLAB_TOKEN=glpat-mytoken</b> /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest
</pre>
</details>
<details>
<summary><img src="https://gitea.com/favicon.ico" width="16" height="16"> Gitea</summary>
Generate a Gitea access token [here](http://gitea.com/user/settings/applications). At minimum, you'll need to select the `read:repository` scope, but `read:user` and `read:organization` are required for the `user` and `org` fields of your config file:
![Gitea Access token creation](.github/images/gitea-pat-creation.png)
Next, update your configuration with the `token` field:
```json
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "gitea",
"token": "my-secret-token",
...
}
]
}
```
You can also pass tokens as environment variables:
```json
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "gitea",
"token": {
// note: this env var can be named anything. It
// doesn't need to be `GITEA_TOKEN`.
"env": "GITEA_TOKEN"
},
...
}
]
}
```
You'll need to pass this environment variable each time you run Sourcebot:
<pre>
docker run -e <b>GITEA_TOKEN=my-secret-token</b> /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest
</pre>
</details>
<details>
<summary><img src="https://gerrit-review.googlesource.com/favicon.ico" width="16" height="16"> Gerrit</summary>
Gerrit authentication is not currently supported.
</details>
</div>
## Using a self-hosted GitLab / GitHub instance
If you're using a self-hosted GitLab or GitHub instance with a custom domain, you can specify the domain in your config file. See [configs/self-hosted.json](configs/self-hosted.json) for examples.
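For example, a config for a GitHub Enterprise instance might look roughly like the sketch below; it assumes the schema exposes a `url` property for the custom host (see [configs/self-hosted.json](configs/self-hosted.json) for the exact shape):

```jsonc
{
    "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
    "repos": [
        {
            "type": "github",
            // assumption: the custom domain is provided via a `url` property
            "url": "https://github.example.com",
            "repos": [
                "my-org/my-repo"
            ]
        }
    ]
}
```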
## Searching multiple branches
By default, Sourcebot will index the default branch. To configure Sourcebot to index multiple branches (or tags), the `revisions` field can be used:
```jsonc
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "github",
"revisions": {
// Index the `main` branch and any branches matching the `releases/*` glob pattern.
"branches": [
"main",
"releases/*"
],
// Index the `latest` tag and any tags matching the `v*.*.*` glob pattern.
"tags": [
"latest",
"v*.*.*"
]
},
"repos": [
"my_org/repo_a",
"my_org/repo_b"
]
}
]
}
```
For each repository (in this case, `repo_a` and `repo_b`), Sourcebot will index all branches and tags matching the `branches` and `tags` patterns provided. Any branches or tags that don't match the patterns will be ignored and not indexed.
To search on a specific revision, use the `revision` filter in the search bar:
<picture>
<source media="(prefers-color-scheme: dark)" srcset=".github/images/revisions_filter_dark.png">
<img style="max-width:700px;width:100%" src=".github/images/revisions_filter_light.png">
</picture>
## Searching a local directory
Local directories can be searched by using the `local` type in your config file:
```jsonc
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
"repos": [
{
"type": "local",
"path": "/repos/my-repo",
// re-index files when a change is detected
"watch": true,
"exclude": {
// exclude paths from being indexed
"paths": [
"node_modules",
"build"
]
}
}
]
}
```
You'll need to mount the directory as a volume when running Sourcebot:
<pre>
docker run <b>-v /path/to/my-repo:/repos/my-repo</b> /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest
</pre>
## Build from source
> [!NOTE]
> Building from source is only required if you'd like to contribute. The recommended way to use Sourcebot is to use the [pre-built docker image](https://github.com/sourcebot-dev/sourcebot/pkgs/container/sourcebot).

1. Install <a href="https://go.dev/doc/install"><img src="https://go.dev/favicon.ico" width="16" height="16"> go</a> and <a href="https://nodejs.org/"><img src="https://nodejs.org/favicon.ico" width="16" height="16"> NodeJS</a>. Note that a NodeJS version of at least `21.1.0` is required.

2. Install [ctags](https://github.com/universal-ctags/ctags) (required by zoekt):
```sh
# macOS:
brew install universal-ctags
# Linux:
snap install universal-ctags
```
3. Clone the repository with submodules:
```sh
git clone --recurse-submodules https://github.com/sourcebot-dev/sourcebot.git
```
4. Run `make` to build zoekt and install dependencies:
```sh
cd sourcebot
make
```
The zoekt binaries and web dependencies are placed into `bin` and `node_modules` respectively.
5. Create a `config.json` file at the repository root. See [Configuring Sourcebot](#configuring-sourcebot) for more information. A minimal example is included after these steps.
6. Start Sourcebot with the command:
```sh
yarn dev
```
A `.sourcebot` directory will be created and zoekt will begin to index the repositories listed in `config.json`.
7. Start searching at `http://localhost:3000`.
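For reference, a minimal `config.json` for step 5 could look something like this (a sketch using the v2 schema to index a single public GitHub repo):

```json
{
    "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v2/index.json",
    "repos": [
        {
            "type": "github",
            "repos": [
                "sourcebot-dev/sourcebot"
            ]
        }
    ]
}
```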
## Telemetry
By default, Sourcebot collects anonymized usage data through [PostHog](https://posthog.com/) to help us improve the performance and reliability of our tool. We do not collect or transmit [any information related to your codebase](https://sourcebot.dev/search/search?query=captureEvent%20repo%3Asourcebot%20case%3Ano). In addition, all events are [sanitized](./packages/web/src/app/posthogProvider.tsx) to ensure that no sensitive or identifying details leave your machine. The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made :)
If you'd like to disable all telemetry, you can do so by setting the environment variable `SOURCEBOT_TELEMETRY_DISABLED` to `1` in the docker run command:
<pre>
docker run -e <b>SOURCEBOT_TELEMETRY_DISABLED=1</b> /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest
</pre>
Or if you are [building locally](#build-from-source), create a `.env.local` file at the repository root with the following contents:
```sh
SOURCEBOT_TELEMETRY_DISABLED=1
NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED=1
```
## Attributions
Sourcebot makes use of the following libraries:
- [@vscode/codicons](https://github.com/microsoft/vscode-codicons) under the [CC BY 4.0 License](https://github.com/microsoft/vscode-codicons/blob/main/LICENSE).

@@ -1,7 +1,6 @@
{
"$schema": "./schemas/v2/index.json",
"settings": {
"autoDeleteStaleRepos": true,
"reindexInterval": 86400000, // 24 hours
"resyncInterval": 86400000 // 24 hours
},
@@ -17,6 +16,7 @@
}
},
"repos": [
"torvalds/linux",
"pytorch/pytorch",
"commaai/openpilot",
"ggerganov/whisper.cpp",
@@ -24,8 +24,189 @@
"codemirror/dev",
"tailwindlabs/tailwindcss",
"sourcebot-dev/sourcebot",
"freeCodeCamp/freeCodeCamp",
"EbookFoundation/free-programming-books",
"sindresorhus/awesome",
"public-apis/public-apis",
"codecrafters-io/build-your-own-x",
"jwasham/coding-interview-university",
"kamranahmedse/developer-roadmap",
"donnemartin/system-design-primer",
"996icu/996.ICU",
"facebook/react",
"vinta/awesome-python",
"vuejs/vue",
"practical-tutorials/project-based-learning",
"awesome-selfhosted/awesome-selfhosted",
"TheAlgorithms/Python",
"trekhleb/javascript-algorithms",
"tensorflow/tensorflow",
"getify/You-Dont-Know-JS",
"CyC2018/CS-Notes",
"ohmyzsh/ohmyzsh",
"ossu/computer-science",
"twbs/bootstrap",
"Significant-Gravitas/AutoGPT",
"flutter/flutter",
"microsoft/vscode",
"github/gitignore",
"jackfrued/Python-100-Days",
"jlevy/the-art-of-command-line",
"trimstray/the-book-of-secret-knowledge",
"Snailclimb/JavaGuide",
"airbnb/javascript",
"AUTOMATIC1111/stable-diffusion-webui",
"huggingface/transformers",
"avelino/awesome-go",
"ytdl-org/youtube-dl",
"vercel/next.js",
"labuladong/fucking-algorithm",
"golang/go",
"Chalarangelo/30-seconds-of-code",
"yangshun/tech-interview-handbook",
"facebook/react-native",
"electron/electron",
"Genymobile/scrcpy",
"f/awesome-chatgpt-prompts",
"microsoft/PowerToys",
"justjavac/free-programming-books-zh_CN",
"kubernetes/kubernetes",
"d3/d3",
"nodejs/node",
"massgravel/Microsoft-Activation-Scripts",
"axios/axios",
"mrdoob/three.js",
"krahets/hello-algo",
"facebook/create-react-app",
"ollama/ollama",
"microsoft/TypeScript",
"goldbergyoni/nodebestpractices",
"rust-lang/rust",
"denoland/deno",
"angular/angular", "angular/angular",
"ggerganov/llama.cpp", "langchain-ai/langchain",
"microsoft/terminal",
"521xueweihan/HelloGitHub",
"mui/material-ui",
"ant-design/ant-design",
"yt-dlp/yt-dlp",
"ryanmcdermott/clean-code-javascript",
"godotengine/godot",
"ripienaar/free-for-dev",
"iluwatar/java-design-patterns",
"puppeteer/puppeteer",
"papers-we-love/papers-we-love",
"PanJiaChen/vue-element-admin",
"iptv-org/iptv",
"fatedier/frp",
"excalidraw/excalidraw",
"tauri-apps/tauri",
"Hack-with-Github/Awesome-Hacking",
"nvbn/thefuck",
"mtdvio/every-programmer-should-know",
"storybookjs/storybook",
"neovim/neovim",
"microsoft/Web-Dev-For-Beginners",
"django/django",
"florinpop17/app-ideas",
"animate-css/animate.css",
"nvm-sh/nvm",
"gothinkster/realworld",
"bitcoin/bitcoin",
"sveltejs/svelte",
"opencv/opencv",
"gin-gonic/gin",
"laravel/laravel",
"fastapi/fastapi",
"macrozheng/mall",
"jaywcjlove/awesome-mac",
"tonsky/FiraCode",
"ChatGPTNextWeb/ChatGPT-Next-Web",
"rustdesk/rustdesk",
"tensorflow/models",
"doocs/advanced-java",
"shadcn-ui/ui",
"gohugoio/hugo",
"MisterBooo/LeetCodeAnimation",
"spring-projects/spring-boot",
"supabase/supabase",
"oven-sh/bun",
"FortAwesome/Font-Awesome",
"home-assistant/core",
"typicode/json-server",
"mermaid-js/mermaid",
"openai/whisper",
"netdata/netdata",
"vuejs/awesome-vue",
"DopplerHQ/awesome-interview-questions",
"3b1b/manim",
"2dust/v2rayN",
"nomic-ai/gpt4all",
"elastic/elasticsearch",
"anuraghazra/github-readme-stats",
"microsoft/ML-For-Beginners",
"MunGell/awesome-for-beginners",
"fighting41love/funNLP",
"vitejs/vite",
"thedaviddias/Front-End-Checklist",
"coder/code-server",
"moby/moby",
"CompVis/stable-diffusion",
"base-org/node",
"nestjs/nest",
"pallets/flask",
"hakimel/reveal.js",
"Anduin2017/HowToCook",
"microsoft/playwright",
"swiftlang/swift",
"Developer-Y/cs-video-courses",
"redis/redis",
"bregman-arie/devops-exercises",
"josephmisiti/awesome-machine-learning",
"binary-husky/gpt_academic",
"junegunn/fzf",
"syncthing/syncthing",
"hoppscotch/hoppscotch",
"protocolbuffers/protobuf",
"enaqx/awesome-react",
"expressjs/express",
"microsoft/generative-ai-for-beginners",
"grafana/grafana",
"abi/screenshot-to-code",
"ByteByteGoHq/system-design-101",
"chartjs/Chart.js",
"webpack/webpack",
"d2l-ai/d2l-zh",
"sdmg15/Best-websites-a-programmer-should-visit",
"strapi/strapi",
"python/cpython",
"leonardomso/33-js-concepts",
"kdn251/interviews",
"ventoy/Ventoy",
"ansible/ansible",
"apache/superset",
"tesseract-ocr/tesseract",
"lydiahallie/javascript-questions",
"xtekky/gpt4free",
"FuelLabs/sway",
"twitter/the-algorithm",
"keras-team/keras",
"resume/resume.github.com",
"swisskyrepo/PayloadsAllTheThings",
"ocornut/imgui",
"socketio/socket.io",
"awesomedata/awesome-public-datasets",
"louislam/uptime-kuma",
"kelseyhightower/nocode",
"sherlock-project/sherlock",
"reduxjs/redux",
"apache/echarts",
"obsproject/obs-studio",
"openai/openai-cookbook",
"fffaraz/awesome-cpp",
"scikit-learn/scikit-learn",
"TheAlgorithms/Java",
"atom/atom",
"Eugeny/tabby", "Eugeny/tabby",
"lodash/lodash", "lodash/lodash",
"caddyserver/caddy", "caddyserver/caddy",

docs/.editorconfig (new file, 59 lines)
[*]
cpp_indent_braces=false
cpp_indent_multi_line_relative_to=innermost_parenthesis
cpp_indent_within_parentheses=indent
cpp_indent_preserve_within_parentheses=false
cpp_indent_case_labels=false
cpp_indent_case_contents=true
cpp_indent_case_contents_when_block=false
cpp_indent_lambda_braces_when_parameter=true
cpp_indent_goto_labels=one_left
cpp_indent_preprocessor=leftmost_column
cpp_indent_access_specifiers=false
cpp_indent_namespace_contents=true
cpp_indent_preserve_comments=false
cpp_new_line_before_open_brace_namespace=ignore
cpp_new_line_before_open_brace_type=ignore
cpp_new_line_before_open_brace_function=ignore
cpp_new_line_before_open_brace_block=ignore
cpp_new_line_before_open_brace_lambda=ignore
cpp_new_line_scope_braces_on_separate_lines=false
cpp_new_line_close_brace_same_line_empty_type=false
cpp_new_line_close_brace_same_line_empty_function=false
cpp_new_line_before_catch=true
cpp_new_line_before_else=true
cpp_new_line_before_while_in_do_while=false
cpp_space_before_function_open_parenthesis=remove
cpp_space_within_parameter_list_parentheses=false
cpp_space_between_empty_parameter_list_parentheses=false
cpp_space_after_keywords_in_control_flow_statements=true
cpp_space_within_control_flow_statement_parentheses=false
cpp_space_before_lambda_open_parenthesis=false
cpp_space_within_cast_parentheses=false
cpp_space_after_cast_close_parenthesis=false
cpp_space_within_expression_parentheses=false
cpp_space_before_block_open_brace=true
cpp_space_between_empty_braces=false
cpp_space_before_initializer_list_open_brace=false
cpp_space_within_initializer_list_braces=true
cpp_space_preserve_in_initializer_list=true
cpp_space_before_open_square_bracket=false
cpp_space_within_square_brackets=false
cpp_space_before_empty_square_brackets=false
cpp_space_between_empty_square_brackets=false
cpp_space_group_square_brackets=true
cpp_space_within_lambda_brackets=false
cpp_space_between_empty_lambda_brackets=false
cpp_space_before_comma=false
cpp_space_after_comma=true
cpp_space_remove_around_member_operators=true
cpp_space_before_inheritance_colon=true
cpp_space_before_constructor_colon=true
cpp_space_remove_before_semicolon=true
cpp_space_after_semicolon=false
cpp_space_remove_around_unary_operator=true
cpp_space_around_binary_operator=insert
cpp_space_around_assignment_operator=insert
cpp_space_pointer_reference_alignment=left
cpp_space_around_ternary_operator=insert
cpp_wrap_preserve_blocks=one_liners

docs/README.md (new file, 32 lines)
# Mintlify Starter Kit
Click on `Use this template` to copy the Mintlify starter kit. The starter kit contains examples including
- Guide pages
- Navigation
- Customizations
- API Reference pages
- Use of popular components
### Development
Install the [Mintlify CLI](https://www.npmjs.com/package/mintlify) to preview the documentation changes locally. To install, use the following command
```
npm i -g mintlify
```
Run the following command at the root of your documentation (where docs.json is)
```
mintlify dev
```
### Publishing Changes
Install our GitHub App to automatically propagate changes from your repo to your deployment. Changes will be deployed to production automatically after pushing to the default branch. Find the link to install on your dashboard.
#### Troubleshooting
- Mintlify dev isn't running - Run `mintlify install`; it'll re-install dependencies.
- Page loads as a 404 - Make sure you are running in a folder with `docs.json`

docs/development.mdx (new file, 107 lines)
---
title: 'Development'
description: 'Preview changes locally to update your docs'
---
<Info>
**Prerequisite**: Please install Node.js (version 19 or higher) before proceeding. <br />
Please upgrade to ```docs.json``` before proceeding and delete the legacy ```mint.json``` file.
</Info>
Follow these steps to install and run Mintlify on your operating system:
**Step 1**: Install Mintlify:
<CodeGroup>
```bash npm
npm i -g mintlify
```
```bash yarn
yarn global add mintlify
```
</CodeGroup>
**Step 2**: Navigate to the docs directory (where the `docs.json` file is located) and execute the following command:
```bash
mintlify dev
```
A local preview of your documentation will be available at `http://localhost:3000`.
### Custom Ports
By default, Mintlify uses port 3000. You can customize the port Mintlify runs on by using the `--port` flag. To run Mintlify on port 3333, for instance, use this command:
```bash
mintlify dev --port 3333
```
If you attempt to run Mintlify on a port that's already in use, it will use the next available port:
```md
Port 3000 is already in use. Trying 3001 instead.
```
## Mintlify Versions
Please note that each CLI release is associated with a specific version of Mintlify. If your local website doesn't align with the production version, please update the CLI:
<CodeGroup>
```bash npm
npm i -g mintlify@latest
```
```bash yarn
yarn global upgrade mintlify
```
</CodeGroup>
## Validating Links
The CLI can assist with validating reference links made in your documentation. To identify any broken links, use the following command:
```bash
mintlify broken-links
```
## Deployment
<Tip>
Unlimited editors available under the [Pro
Plan](https://mintlify.com/pricing) and above.
</Tip>
If the deployment is successful, you should see the following:
<Frame>
<img src="/images/checks-passed.png" style={{ borderRadius: '0.5rem' }} />
</Frame>
## Code Formatting
We suggest using extensions on your IDE to recognize and format MDX. If you're a VSCode user, consider the [MDX VSCode extension](https://marketplace.visualstudio.com/items?itemName=unifiedjs.vscode-mdx) for syntax highlighting, and [Prettier](https://marketplace.visualstudio.com/items?itemName=esbenp.prettier-vscode) for code formatting.
## Troubleshooting
<AccordionGroup>
<Accordion title='Error: Could not load the "sharp" module using the darwin-arm64 runtime'>
This may be due to an outdated version of node. Try the following:
1. Remove the currently-installed version of mintlify: `npm remove -g mintlify`
2. Upgrade to Node v19 or higher.
3. Reinstall mintlify: `npm install -g mintlify`
</Accordion>
<Accordion title="Issue: Encountering an unknown error">
Solution: Go to the root of your device and delete the \~/.mintlify folder. Afterwards, run `mintlify dev` again.
</Accordion>
</AccordionGroup>
Curious about what changed in the CLI version? [Check out the CLI changelog.](https://www.npmjs.com/package/mintlify?activeTab=versions)

docs/docs.json (new file, 123 lines)
{
"$schema": "https://mintlify.com/docs.json",
"theme": "mint",
"name": "Sourcebot",
"colors": {
"primary": "#851EE7",
"light": "#FFFFFF",
"dark": "#851EE7"
},
"favicon": "/fav.svg",
"styling": {
"eyebrows": "section"
},
"navigation": {
"anchors": [
{
"anchor": "Docs",
"icon": "book-open",
"groups": [
{
"group": "General",
"pages": [
"docs/overview",
"docs/getting-started",
"docs/getting-started-selfhost"
]
},
{
"group": "Connecting your code",
"pages": [
"docs/connections/overview",
"docs/connections/github",
"docs/connections/gitlab",
"docs/connections/gitea",
"docs/connections/gerrit",
"docs/connections/request-new"
]
},
{
"group": "More",
"pages": [
"docs/more/roles-and-permissions"
]
}
]
},
{
"anchor": "Self Hosting",
"icon": "server",
"groups": [
{
"group": "Getting Started",
"pages": [
"self-hosting/overview",
"self-hosting/configuration"
]
},
{
"group": "More",
"pages": [
"self-hosting/more/authentication",
"self-hosting/more/tenancy",
"self-hosting/more/transactional-emails",
"self-hosting/more/declarative-config"
]
},
{
"group": "Security",
"pages": [
]
},
{
"group": "Upgrade",
"pages": [
"self-hosting/upgrade/v2-to-v3-guide"
]
}
]
},
{
"anchor": "Changelog",
"href": "https://sourcebot.dev/changelog",
"icon": "list-check"
},
{
"anchor": "Support",
"href": "https://github.com/sourcebot-dev/sourcebot/discussions/categories/support",
"icon": "life-ring"
}
]
},
"logo": {
"light": "/logo/light.png",
"dark": "/logo/dark.png"
},
"navbar": {
"links": [
{
"label": "GitHub",
"href": "https://github.com/sourcebot-dev/sourcebot"
}
],
"primary": {
"type": "button",
"label": "Sourcebot Cloud",
"href": "https://app.sourcebot.dev"
}
},
"footer": {
"socials": {
"github": "https://github.com/sourcebot-dev/sourcebot"
}
},
"integrations": {
"posthog": {
"apiKey": "phc_DBGufjG0rkj3OEhuTcZ04xfeZB6eDhO7dP8ZCnqH7K7"
}
},
"appearance": {
"default": "dark",
"strict": true
}
}

---
title: Linking code from Gerrit
sidebarTitle: Gerrit
---
<Note>Authenticating with Gerrit is currently not supported. If you need this capability, please raise a [feature request](https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas).</Note>
Sourcebot can sync code from self-hosted Gerrit instances.
## Connecting to a Gerrit instance
To connect to a Gerrit instance, provide the `url` property to your config:
```json
{
"type": "gerrit",
"url": "https://gerrit.example.com"
// .. rest of config ..
}
```
## Examples
<AccordionGroup>
<Accordion title="Sync projects by glob pattern">
```json
{
"type": "gerrit",
"url": "https://gerrit.example.com",
// Sync all repos under project1 and project2/sub-project
"projects": [
"project1/**",
"project2/sub-project/**"
]
}
```
</Accordion>
<Accordion title="Exclude repos from syncing">
```json
{
"type": "gerrit",
"url": "https://gerrit.example.com",
// Sync all repos under project1 and project2/sub-project...
"projects": [
"project1/**",
"project2/sub-project/**"
],
// ...except:
"exclude": {
// any project that matches these glob patterns
"projects": [
"project1/foo-project",
"project2/sub-project/some-sub-folder/**"
]
}
}
```
</Accordion>
</AccordionGroup>
## Schema reference
<Accordion title="Reference">
[schemas/v3/gerrit.json](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/gerrit.json)
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GerritConnectionConfig",
"properties": {
"type": {
"const": "gerrit",
"description": "Gerrit Configuration"
},
"url": {
"type": "string",
"format": "url",
"description": "The URL of the Gerrit host.",
"examples": [
"https://gerrit.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"projects": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of specific projects to sync. If not specified, all projects will be synced. Glob patterns are supported",
"examples": [
[
"project1/repo1",
"project2/**"
]
]
},
"exclude": {
"type": "object",
"properties": {
"projects": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"project1/repo1",
"project2/**"
]
],
"description": "List of specific projects to exclude from syncing."
}
},
"additionalProperties": false
}
},
"required": [
"type",
"url"
],
"additionalProperties": false
}
```
</Accordion>

---
title: Linking code from Gitea
sidebarTitle: Gitea
---
Sourcebot can sync code from Gitea Cloud and self-hosted Gitea instances.
## Examples
<AccordionGroup>
<Accordion title="Sync individual repos">
```json
{
"type": "gitea",
"repos": [
"sourcebot-dev/sourcebot",
"getsentry/sentry",
"torvalds/linux"
]
}
```
</Accordion>
<Accordion title="Sync all repos in a organization">
```json
{
"type": "gitea",
"orgs": [
"sourcebot-dev",
"getsentry",
"vercel"
]
}
```
</Accordion>
<Accordion title="Sync all repos owned by a user">
```json
{
"type": "gitea",
"users": [
"torvalds",
"ggerganov"
]
}
```
</Accordion>
<Accordion title="Exclude repos from syncing">
```json
{
"type": "gitea",
// Include all repos in my-org...
"orgs": [
"my-org"
],
// ...except:
"exclude": {
// repos that are archived
"archived": true,
// repos that are forks
"forks": true,
// repos that match these glob patterns
"repos": [
"my-org/repo1",
"my-org/repo2",
"my-org/sub-org-1/**",
"my-org/sub-org-*/**"
]
}
}
```
</Accordion>
</AccordionGroup>
## Authenticating with Gitea
In order to index private repositories, you'll need a Gitea access token. Generate one [here](http://gitea.com/user/settings/applications). At minimum, you'll need to select the `read:repository` scope; `read:user` and `read:organization` are required for the `user` and `org` fields of your config file:
![Gitea Access token creation](/images/gitea_pat_creation.png)
Next, provide the access token via the `token` property, either as an environment variable or a secret:
<Tabs>
<Tab title="Environment Variable">
<Note>Environment variables are only supported in a [declarative config](/self-hosting/more/declarative-config) and cannot be used in the web UI.</Note>
1. Add the `token` property to your connection config:
```json
{
"type": "gitea",
"token": {
// note: this env var can be named anything. It
// doesn't need to be `GITEA_TOKEN`.
"env": "GITEA_TOKEN"
}
// .. rest of config ..
}
```
2. Pass this environment variable each time you run Sourcebot:
```bash
docker run \
-e GITEA_TOKEN=<PAT> \
/* additional args */ \
ghcr.io/sourcebot-dev/sourcebot:latest
```
</Tab>
<Tab title="Secret">
<Note>Secrets are only supported when [authentication](/self-hosting/more/authentication) is enabled.</Note>
1. Navigate to **Secrets** in settings and create a new secret with your PAT:
![](/images/secrets_list.png)
2. Add the `token` property to your connection config:
```json
{
"type": "gitea",
"token": {
"secret": "mysecret"
}
// .. rest of config ..
}
```
</Tab>
</Tabs>
## Connecting to a custom Gitea
To connect to a custom Gitea deployment, provide the `url` property to your config:
```json
{
"type": "gitea",
"url": "https://gitea.example.com"
// .. rest of config ..
}
```
## Schema reference
<Accordion title="Reference">
[schemas/v3/gitea.json](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/gitea.json)
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GiteaConnectionConfig",
"properties": {
"type": {
"const": "gitea",
"description": "Gitea Configuration"
},
"token": {
"description": "A Personal Access Token (PAT).",
"examples": [
{
"secret": "SECRET_KEY"
}
],
"anyOf": [
{
"type": "object",
"properties": {
"secret": {
"type": "string",
"description": "The name of the secret that contains the token."
}
},
"required": [
"secret"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"env": {
"type": "string",
"description": "The name of the environment variable that contains the token. Only supported in declarative connection configs."
}
},
"required": [
"env"
],
"additionalProperties": false
}
]
},
"url": {
"type": "string",
"format": "url",
"default": "https://gitea.com",
"description": "The URL of the Gitea host. Defaults to https://gitea.com",
"examples": [
"https://gitea.com",
"https://gitea.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"orgs": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"my-org-name"
]
],
"description": "List of organizations to sync with. All repositories in the organization visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property. If a `token` is provided, it must have the read:organization scope."
},
"repos": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+\\/[\\w.-]+$"
},
"description": "List of individual repositories to sync with. Expected to be formatted as '{orgName}/{repoName}' or '{userName}/{repoName}'."
},
"users": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"username-1",
"username-2"
]
],
"description": "List of users to sync with. All repositories that the user owns will be synced, unless explicitly defined in the `exclude` property. If a `token` is provided, it must have the read:user scope."
},
"exclude": {
"type": "object",
"properties": {
"forks": {
"type": "boolean",
"default": false,
"description": "Exclude forked repositories from syncing."
},
"archived": {
"type": "boolean",
"default": false,
"description": "Exclude archived repositories from syncing."
},
"repos": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"description": "List of individual repositories to exclude from syncing. Glob patterns are supported."
}
},
"additionalProperties": false
},
"revisions": {
"type": "object",
"description": "The revisions (branches, tags) that should be included when indexing. The default branch (HEAD) is always indexed. A maximum of 64 revisions can be indexed, with any additional revisions being ignored.",
"properties": {
"branches": {
"type": "array",
"description": "List of branches to include when indexing. For a given repo, only the branches that exist on the repo's remote *and* match at least one of the provided `branches` will be indexed. The default branch (HEAD) is always indexed. Glob patterns are supported. A maximum of 64 branches can be indexed, with any additional branches being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"main",
"release/*"
],
[
"**"
]
],
"default": []
},
"tags": {
"type": "array",
"description": "List of tags to include when indexing. For a given repo, only the tags that exist on the repo's remote *and* match at least one of the provided `tags` will be indexed. Glob patterns are supported. A maximum of 64 tags can be indexed, with any additional tags being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"latest",
"v2.*.*"
],
[
"**"
]
],
"default": []
}
},
"additionalProperties": false
}
},
"required": [
"type"
],
"additionalProperties": false
}
```
</Accordion>

---
title: Linking code from GitHub
sidebarTitle: GitHub
---
Sourcebot can sync code from GitHub.com, GitHub Enterprise Server, and GitHub Enterprise Cloud.
## Examples
<AccordionGroup>
<Accordion title="Sync individual repos">
```json
{
"type": "github",
"repos": [
"sourcebot-dev/sourcebot",
"getsentry/sentry",
"torvalds/linux"
]
}
```
</Accordion>
<Accordion title="Sync all repos in a organization">
```json
{
"type": "github",
"orgs": [
"sourcebot-dev",
"getsentry",
"vercel"
]
}
```
</Accordion>
<Accordion title="Sync all repos owned by a user">
```json
{
"type": "github",
"users": [
"torvalds",
"ggerganov"
]
}
```
</Accordion>
<Accordion title="Filter repos by topic">
```json
{
"type": "github",
// Sync all repos in `my-org` that have a topic that...
"orgs": [
"my-org"
],
// ...match one of these glob patterns.
"topics": [
"test-*",
"ci-*",
"k8s"
]
}
```
</Accordion>
<Accordion title="Exclude repos from syncing">
```json
{
"type": "github",
// Include all repos in my-org...
"orgs": [
"my-org"
],
// ...except:
"exclude": {
// repos that are archived
"archived": true,
// repos that are forks
"forks": true,
// repos that match these glob patterns
"repos": [
"my-org/repo1",
"my-org/repo2",
"my-org/sub-org-1/**",
"my-org/sub-org-*/**"
],
"size": {
// repos that are less than 1MB (in bytes)...
"min": 1048576,
// or repos greater than 100MB (in bytes)
"max": 104857600
},
// repos with topics that match these glob patterns
"topics": [
"test-*",
"ci"
]
}
}
```
</Accordion>
</AccordionGroup>
## Authenticating with GitHub
In order to index private repositories, you'll need to generate a GitHub Personal Access Token (PAT). Create a new PAT [here](https://github.com/settings/tokens/new) and make sure you select the `repo` scope:
![GitHub PAT Scope](/images/github_pat_scopes.png)
Next, provide the PAT via the `token` property, either as an environment variable or a secret:
<Tabs>
<Tab title="Environment Variable">
<Note>Environment variables are only supported in a [declarative config](/self-hosting/more/declarative-config) and cannot be used in the web UI.</Note>
1. Add the `token` property to your connection config:
```json
{
"type": "github",
"token": {
// note: this env var can be named anything. It
// doesn't need to be `GITHUB_TOKEN`.
"env": "GITHUB_TOKEN"
}
// .. rest of config ..
}
```
2. Pass this environment variable each time you run Sourcebot:
```bash
docker run \
-e GITHUB_TOKEN=<PAT> \
/* additional args */ \
ghcr.io/sourcebot-dev/sourcebot:latest
```
</Tab>
<Tab title="Secret">
<Note>Secrets are only supported when [authentication](/self-hosting/more/authentication) is enabled.</Note>
1. Navigate to **Secrets** in settings and create a new secret with your PAT:
![](/images/secrets_list.png)
2. Add the `token` property to your connection config:
```json
{
"type": "github",
"token": {
"secret": "mysecret"
}
// .. rest of config ..
}
```
</Tab>
</Tabs>
## Connecting to a custom GitHub host
To connect to a GitHub host other than `github.com`, provide the `url` property to your config:
```json
{
"type": "github",
"url": "https://github.example.com"
// .. rest of config ..
}
```
## Schema reference
<Accordion title="Reference">
[schemas/v3/github.json](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/github.json)
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GithubConnectionConfig",
"properties": {
"type": {
"const": "github",
"description": "GitHub Configuration"
},
"token": {
"description": "A Personal Access Token (PAT).",
"examples": [
{
"secret": "SECRET_KEY"
}
],
"anyOf": [
{
"type": "object",
"properties": {
"secret": {
"type": "string",
"description": "The name of the secret that contains the token."
}
},
"required": [
"secret"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"env": {
"type": "string",
"description": "The name of the environment variable that contains the token. Only supported in declarative connection configs."
}
},
"required": [
"env"
],
"additionalProperties": false
}
]
},
"url": {
"type": "string",
"format": "url",
"default": "https://github.com",
"description": "The URL of the GitHub host. Defaults to https://github.com",
"examples": [
"https://github.com",
"https://github.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"users": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+$"
},
"default": [],
"examples": [
[
"torvalds",
"DHH"
]
],
"description": "List of users to sync with. All repositories that the user owns will be synced, unless explicitly defined in the `exclude` property."
},
"orgs": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+$"
},
"default": [],
"examples": [
[
"my-org-name"
],
[
"sourcebot-dev",
"commaai"
]
],
"description": "List of organizations to sync with. All repositories in the organization visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property."
},
"repos": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+\\/[\\w.-]+$"
},
"default": [],
"description": "List of individual repositories to sync with. Expected to be formatted as '{orgName}/{repoName}' or '{userName}/{repoName}'."
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
"default": [],
"description": "List of repository topics to include when syncing. Only repositories that match at least one of the provided `topics` will be synced. If not specified, all repositories will be synced, unless explicitly defined in the `exclude` property. Glob patterns are supported.",
"examples": [
[
"docs",
"core"
]
]
},
"exclude": {
"type": "object",
"properties": {
"forks": {
"type": "boolean",
"default": false,
"description": "Exclude forked repositories from syncing."
},
"archived": {
"type": "boolean",
"default": false,
"description": "Exclude archived repositories from syncing."
},
"repos": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"description": "List of individual repositories to exclude from syncing. Glob patterns are supported."
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"description": "List of repository topics to exclude when syncing. Repositories that match one of the provided `topics` will be excluded from syncing. Glob patterns are supported.",
"examples": [
[
"tests",
"ci"
]
]
},
"size": {
"type": "object",
"description": "Exclude repositories based on their disk usage. Note: the disk usage is calculated by GitHub and may not reflect the actual disk usage when cloned.",
"properties": {
"min": {
"type": "integer",
"description": "Minimum repository size (in bytes) to sync (inclusive). Repositories less than this size will be excluded from syncing."
},
"max": {
"type": "integer",
"description": "Maximum repository size (in bytes) to sync (inclusive). Repositories greater than this size will be excluded from syncing."
}
},
"additionalProperties": false
}
},
"additionalProperties": false
},
"revisions": {
"type": "object",
"description": "The revisions (branches, tags) that should be included when indexing. The default branch (HEAD) is always indexed. A maximum of 64 revisions can be indexed, with any additional revisions being ignored.",
"properties": {
"branches": {
"type": "array",
"description": "List of branches to include when indexing. For a given repo, only the branches that exist on the repo's remote *and* match at least one of the provided `branches` will be indexed. The default branch (HEAD) is always indexed. Glob patterns are supported. A maximum of 64 branches can be indexed, with any additional branches being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"main",
"release/*"
],
[
"**"
]
],
"default": []
},
"tags": {
"type": "array",
"description": "List of tags to include when indexing. For a given repo, only the tags that exist on the repo's remote *and* match at least one of the provided `tags` will be indexed. Glob patterns are supported. A maximum of 64 tags can be indexed, with any additional tags being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"latest",
"v2.*.*"
],
[
"**"
]
],
"default": []
}
},
"additionalProperties": false
}
},
"required": [
"type"
],
"additionalProperties": false
}
```
</Accordion>

---
title: Linking code from GitLab
sidebarTitle: GitLab
---
Sourcebot can sync code from GitLab.com, Self Managed (CE & EE), and Dedicated.
## Examples
<AccordionGroup>
<Accordion title="Sync individual projects">
```json
{
"type": "gitlab",
"projects": [
"my-group/foo",
"my-group/subgroup/bar"
]
}
```
</Accordion>
<Accordion title="Sync all projects in a group/subgroup">
```json
{
"type": "gitlab",
"groups": [
"my-group",
"my-other-group/sub-group"
]
}
```
</Accordion>
<Accordion title="Sync all projects in a self managed instance">
<Note>This option is ignored if `url` is unset. See [connecting to a custom gitlab host](/docs/connections/gitlab#connecting-to-a-custom-gitlab-host).</Note>
```json
{
"type": "gitlab",
"url": "https://gitlab.example.com",
// Index all projects in this self-managed instance
"all": true
}
```
</Accordion>
<Accordion title="Sync all projects owned by a user">
```json
{
"type": "gitlab",
"users": [
"user-1",
"user-2"
]
}
```
</Accordion>
<Accordion title="Filter projects by topic">
```json
{
"type": "gitlab",
// Sync all projects in `my-group` that have a topic that...
"groups": [
"my-group"
],
// ...match one of these glob patterns.
"topics": [
"test-*",
"ci-*",
"k8s"
]
}
```
</Accordion>
<Accordion title="Exclude projects from syncing">
```json
{
"type": "gitlab",
// Include all projects in these groups...
"groups": [
"my-group",
"my-other-group/sub-group"
],
// ...except:
"exclude": {
// projects that are archived
"archived": true,
// projects that are forks
"forks": true,
// projects that match these glob patterns
"projects": [
"my-group/foo/**",
"my-group/bar/**",
"my-other-group/sub-group/specific-project"
],
// repos with topics that match these glob patterns
"topics": [
"test-*",
"ci"
]
}
}
```
</Accordion>
</AccordionGroup>
## Authenticating with GitLab
In order to index private projects, you'll need to generate a GitLab Personal Access Token (PAT). Create a new PAT [here](https://gitlab.com/-/user_settings/personal_access_tokens) and make sure you select the `read_api` scope:
![GitLab PAT Scope](/images/gitlab_pat_scopes.png)
Next, provide the PAT via the `token` property, either as an environment variable or a secret:
<Tabs>
<Tab title="Environment Variable">
<Note>Environment variables are only supported in a [declarative config](/self-hosting/more/declarative-config) and cannot be used in the web UI.</Note>
1. Add the `token` property to your connection config:
```json
{
"type": "gitlab",
"token": {
// note: this env var can be named anything. It
// doesn't need to be `GITLAB_TOKEN`.
"env": "GITLAB_TOKEN"
}
// .. rest of config ..
}
```
2. Pass this environment variable each time you run Sourcebot:
```bash
docker run \
-e GITLAB_TOKEN=<PAT> \
/* additional args */ \
ghcr.io/sourcebot-dev/sourcebot:latest
```
</Tab>
<Tab title="Secret">
<Note>Secrets are only supported when [authentication](/self-hosting/more/authentication) is enabled.</Note>
1. Navigate to **Secrets** in settings and create a new secret with your PAT:
![](/images/secrets_list.png)
2. Add the `token` property to your connection config:
```json
{
"type": "gitlab",
"token": {
"secret": "mysecret"
}
// .. rest of config ..
}
```
</Tab>
</Tabs>
## Connecting to a custom GitLab host
To connect to a GitLab host other than `gitlab.com`, provide the `url` property to your config:
```json
{
"type": "gitlab",
"url": "https://gitlab.example.com"
// .. rest of config ..
}
```
## Schema reference
<Accordion title="Reference">
[schemas/v3/gitlab.json](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/gitlab.json)
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GitlabConnectionConfig",
"properties": {
"type": {
"const": "gitlab",
"description": "GitLab Configuration"
},
"token": {
"description": "An authentication token.",
"examples": [
{
"secret": "SECRET_KEY"
}
],
"anyOf": [
{
"type": "object",
"properties": {
"secret": {
"type": "string",
"description": "The name of the secret that contains the token."
}
},
"required": [
"secret"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"env": {
"type": "string",
"description": "The name of the environment variable that contains the token. Only supported in declarative connection configs."
}
},
"required": [
"env"
],
"additionalProperties": false
}
]
},
"url": {
"type": "string",
"format": "url",
"default": "https://gitlab.com",
"description": "The URL of the GitLab host. Defaults to https://gitlab.com",
"examples": [
"https://gitlab.com",
"https://gitlab.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"all": {
"type": "boolean",
"default": false,
"description": "Sync all projects visible to the provided `token` (if any) in the GitLab instance. This option is ignored if `url` is either unset or set to https://gitlab.com ."
},
"users": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of users to sync with. All projects owned by the user and visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property."
},
"groups": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"my-group"
],
[
"my-group/sub-group-a",
"my-group/sub-group-b"
]
],
"description": "List of groups to sync with. All projects in the group (and recursive subgroups) visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property. Subgroups can be specified by providing the path to the subgroup (e.g. `my-group/sub-group-a`)."
},
"projects": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"my-group/my-project"
],
[
"my-group/my-sub-group/my-project"
]
],
"description": "List of individual projects to sync with. The project's namespace must be specified. See: https://docs.gitlab.com/ee/user/namespace/"
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
"description": "List of project topics to include when syncing. Only projects that match at least one of the provided `topics` will be synced. If not specified, all projects will be synced, unless explicitly defined in the `exclude` property. Glob patterns are supported.",
"examples": [
[
"docs",
"core"
]
]
},
"exclude": {
"type": "object",
"properties": {
"forks": {
"type": "boolean",
"default": false,
"description": "Exclude forked projects from syncing."
},
"archived": {
"type": "boolean",
"default": false,
"description": "Exclude archived projects from syncing."
},
"projects": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"examples": [
[
"my-group/my-project"
]
],
"description": "List of projects to exclude from syncing. Glob patterns are supported. The project's namespace must be specified, see: https://docs.gitlab.com/ee/user/namespace/"
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of project topics to exclude when syncing. Projects that match one of the provided `topics` will be excluded from syncing. Glob patterns are supported.",
"examples": [
[
"tests",
"ci"
]
]
}
},
"additionalProperties": false
},
"revisions": {
"type": "object",
"description": "The revisions (branches, tags) that should be included when indexing. The default branch (HEAD) is always indexed. A maximum of 64 revisions can be indexed, with any additional revisions being ignored.",
"properties": {
"branches": {
"type": "array",
"description": "List of branches to include when indexing. For a given repo, only the branches that exist on the repo's remote *and* match at least one of the provided `branches` will be indexed. The default branch (HEAD) is always indexed. Glob patterns are supported. A maximum of 64 branches can be indexed, with any additional branches being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"main",
"release/*"
],
[
"**"
]
],
"default": []
},
"tags": {
"type": "array",
"description": "List of tags to include when indexing. For a given repo, only the tags that exist on the repo's remote *and* match at least one of the provided `tags` will be indexed. Glob patterns are supported. A maximum of 64 tags can be indexed, with any additional tags being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"latest",
"v2.*.*"
],
[
"**"
]
],
"default": []
}
},
"additionalProperties": false
}
},
"required": [
"type"
],
"additionalProperties": false
}
```
</Accordion>

---
title: Overview
sidebarTitle: Overview
---
To connect your code to Sourcebot, you create **connections**. A **connection** is a configuration object that describes how Sourcebot should fetch information from a supported code host.
There are two ways to define connections:
<AccordionGroup>
<Accordion title="Declarative configuration file">
This is only supported when self-hosting, and is the default mechanism for defining connections. Connections are defined in a [JSON file](/self-hosting/more/declarative-config),
and the path to the file is provided through the `CONFIG_PATH` environment variable (see the sketch below).
</Accordion>
<Accordion title="UI connection management">
This is the only way to define connections when using Sourcebot Cloud, and can be configured when self-hosting by enabling [authentication](/self-hosting/more/authentication).
In this method, connections are defined and managed within the webapp:
![Connections page](/images/connection_page.png)
</Accordion>
</AccordionGroup>
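For a rough sense of how the declarative approach fits together, the sketch below passes a config file to the container via `CONFIG_PATH`; the connection definitions inside the file follow the schemas on the individual code host pages, and the exact top-level layout is described in the [declarative config](/self-hosting/more/declarative-config) docs:

```bash
# Sketch: mount the current directory and point Sourcebot at a config file.
# The file's contents (e.g. a GitHub connection with "repos": ["sourcebot-dev/sourcebot"])
# should follow the declarative config docs.
docker run \
  -p 3000:3000 \
  -v $(pwd):/data \
  -e CONFIG_PATH=/data/config.json \
  ghcr.io/sourcebot-dev/sourcebot:latest
```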
### Supported code hosts
<CardGroup cols={2}>
<Card horizontal title="GitHub" icon="github" href="/docs/connections/github" />
<Card horizontal title="GitLab" icon="gitlab" href="/docs/connections/gitlab" />
<Card horizontal title="Gitea" href="/docs/connections/gitea" />
<Card horizontal title="Gerrit" href="/docs/connections/gerrit" />
</CardGroup>
<Note>Missing your code host? [Submit a feature request on GitHub](https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas).</Note>

---
sidebarTitle: Request another host
url: https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas
title: Request another code host
---
Is your code host not supported? Please open a [feature request](https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas).

---
sidebarTitle: Quick start guide (self-host)
url: /self-hosting/overview
---
{/*This page acts as a navigation link*/}
[Quick start guide (self-host)](/self-hosting/overview)

---
title: Cloud quick start guide
sidebarTitle: Quick start guide (cloud)
---
<Note>Looking for a self-hosted solution? Checkout our [self-hosting docs](/self-hosting/overview).</Note>
This page will provide a quick walkthrough of how to get onboarded on Sourcebot, import your code, and start searching.
{/*@todo: record a quick start guide
<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/4KzFe50RQkQ"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>
*/}
<Steps>
<Step title="Register an account">
Head over to [app.sourcebot.dev](https://app.sourcebot.dev) and create an account.
</Step>
<Step title="Create an organization">
After logging in, you'll be asked to create an organization. You'll invite your team members to this organization later so they can also use Sourcebot.
![Org Creation](/images/org_create.png)
</Step>
<Step title="Link your code host">
After selecting a code host you want to connect to, you'll be presented with the connection creation page. This page has the following three inputs:
- Connection name (required): The name of the connection within Sourcebot
- Secret (optional): An [access token](/access-tokens/overview) that is used to fetch private repos
- Configuration: The JSON configuration schema that defines the repos/orgs to fetch.
For a more detailed explanation of connections, check out the [Connections](/docs/connections/overview) page.
The example below shows a connection named `sourcebot-org` that fetches all of the repos for the `sourcebot-dev` GitHub organization, but excludes the `sourcebot-dev/zoekt` repo. A matching configuration sketch is included after these steps.
<Note>This page won't let you continue with an invalid connection schema. If you're hitting errors, make sure the input you're providing is valid JSON.</Note>
![Connection Create Example](/images/create_connection_example.png)
</Step>
</Steps>
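For reference, a connection configuration matching the example above might look roughly like this (a sketch based on the GitHub connection schema; the exact config shown in the screenshot may differ):

```json
{
    "type": "github",
    "orgs": [
        "sourcebot-dev"
    ],
    "exclude": {
        "repos": [
            "sourcebot-dev/zoekt"
        ]
    }
}
```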
### Search
Once you create your organization's first connection successfully, you'll be redirected to your org's main search page. From here, you can use the search bar to search across all
of the repos you've indexed.
![Onboard Complete](/images/onboard_complete.png)
Congrats, you've successfully set up Sourcebot! Read on to learn more about Sourcebot's capabilities. Check out the [Connections](/docs/connections/overview) page to learn how to control which repos Sourcebot fetches.

---
title: Roles and Permissions
---
<Note>Looking to sync permissions with your identity provider? We're working on it - [reach out](https://www.sourcebot.dev/contact) to us to learn more.</Note>
If you're using Sourcebot Cloud, or are self-hosting with [authentication](/self-hosting/more/authentication) enabled, you may have multiple members in your organization. Each
member has a role which defines their permissions:
| Role | Permission |
| :--- | :--------- |
| `Owner` | Each organization has a single `Owner`. This user has full access rights, including: connection management, organization management, and inviting new members. |
| `Member` | Read-only access to the organization. A `Member` can search across the repos indexed by an organization's connections, but may not manage the organization or its connections. |

docs/docs/overview.mdx (new file, 22 lines)
---
title: "Overview"
---
import ConnectionCards from '/snippets/connection-cards.mdx';
Sourcebot is an **[open-source](https://github.com/sourcebot-dev/sourcebot) code search tool** that is purpose built to search multi-million line codebases in seconds. It integrates with [GitHub](/docs/connections/github), [GitLab](/docs/connections/gitlab), and [other platforms](/docs/connections).
## Getting Started
There are two ways to get started using Sourcebot:
<CardGroup cols={2}>
<Card horizontal title="Self-Host" icon="server" href="/self-hosting/overview">
Deploy Sourcebot on your own infrastructure.
</Card>
<Card horizontal title="Sourcebot Cloud" icon="cloud" href="/docs/getting-started">
Use Sourcebot on our managed infrastructure.
</Card>
</CardGroup>
We also have a [public demo](https://sourcebot.dev/search) if you'd like to try Sourcebot out before registering.

docs/fav.svg (new file, 9 lines)
<svg width="100" height="100" viewBox="0 0 100 100" fill="none" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<rect width="100" height="100" fill="url(#pattern0_64_7)"/>
<defs>
<pattern id="pattern0_64_7" patternContentUnits="objectBoundingBox" width="1" height="1">
<use xlink:href="#image0_64_7" transform="scale(0.03125)"/>
</pattern>
<image id="image0_64_7" width="32" height="32" preserveAspectRatio="none" xlink:href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAAXNSR0IArs4c6QAAAERlWElmTU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAIKADAAQAAAABAAAAIAAAAACshmLzAAADh0lEQVRYCe2VS0iUURTH7/m+sbI0KVGbsdHJRImwB7WIWkTWopX2JiIIKoSiRUFUotCQD6iIaOciMqIXtSisFgWRRRAtgh6gLSp1ypGyh6VU1sx3+59pzjfXZ0FCmznw45xz7/3u45xz76dUUpIRSEbgP0fAHmH98WgPxEmB7gMajLnQoBnnwj8IVoJUo+8t7PPgEOiR9lp/+CSR2ia+qbVWny1yQo6ybkQi6lgw7Htv9ottiQG9GTwCq4G5OFyVA/aAJ8AP/ijYWIZWVglOeMCToltqZnQWD/eRbKAQnY1AUvId9l3Ap34MRPJgnBbH1FpHv2iHbjPI1n1E4JP0k6YsO0IN4pvaE3c4jGJ/hb0API/3sdoLjsb9UuiZ4GXcjymy7NaqDu8KaQtmvUvzpEaaEIFl3OaQXhoMtE0Its/gw7kiixa6LUq9gG0uzl0nQAGQiHFhjirB7uy+urzwKQyKbYAUkqLUBDDsBj4as82GvQlw+EV+wtgpzt9rPUcpXhdJ0foJTu8WsMwhEbiAhop4I9fBOcC34Ra4B26CL2BE0UpPqfeHV/EATSpNa8LJ9daYr/QHm4jTPERkA83o2Q2OgHGApSjOLuh+cB3sBwNyDz8mKLQiLHzF9UmeDWoi266obMvhqzxEJKfcwXmeBxpANzCFH6a14BlYb3aYNqIQZXByJ9Guy1Q00lKfH16TaEtY5ga4tRXsANPAQrAPPAAi/D7wdfVKg6tJPawO5XqYqlCurW0VQPY5lRCa6jjqYt3016iJgSIb4OoUuGr4BPwo8dVbDMqAxHQS7FiuoUeU6jZfR2HIuwWLh3kQ7kCKIkvqzP2ON5AGvhksd3sTxjWY5k3JTnSNbG1QFNXacdOJm5A/eDQXIf9o2kEAsNQAPr37ksHmZzoTiHSIMZquz+tap5VT4o6xqIvtWn9nuUU0y9aRMxxuFvOlY/8zaAYcmdkgMcnv68gn6TF/Rii+Xly9p2jHa6Vt3IhcPD5+9l0hKkUmvUhmrDYQnVdSA8cx6Ko7UKkM2OVgIzAX5weJo9EDBggWS7dILWGQ8EVDF1d1eKrvRB09Xz4ksgpkA7g6sWu2HZrDLwUnY3thXAZ8Tbke/ixa/cA0bxylL2FDpVUdvmr+aJxHn5UflSZqlBQMnnAyGnxgIuDi6wR8+jGRw8Xd6U5/JLOy3ds+JhMmJ0lGIBmBf4nAL6uu/L35EfLvAAAAAElFTkSuQmCC"/>
</defs>
</svg>

(New binary and asset files added in this change, previews omitted: docs/images/demo.mp4, docs/images/login.png, docs/images/org_create.png, docs/images/org_switch.png, docs/logo/dark.png, docs/logo/dark.svg, docs/logo/light.png, docs/logo/light.svg, and several additional image assets whose file names are not shown in this view.)

View file

@ -0,0 +1,59 @@
---
title: Configuration
sidebarTitle: Configuration
---
## Environment Variables
Sourcebot accepts a variety of environment variables to fine-tune your deployment.
| Variable | Default | Description |
| :------- | :------ | :---------- |
| `SOURCEBOT_LOG_LEVEL` | `info` | The Sourcebot logging level. Valid values are `debug`, `info`, `warn`, `error`, in increasing order of severity. |
| `DATABASE_URL` | `postgresql://postgres@localhost:5432/sourcebot` | Connection string of your Postgres database. By default, a Postgres database is automatically provisioned at startup within the container. |
| `REDIS_URL` | `redis://localhost:6379` | Connection string of your Redis instance. By default, a Redis database is automatically provisioned at startup within the container. |
| `SOURCEBOT_ENCRYPTION_KEY` | - | Used to encrypt connection secrets. Generated using `openssl rand -base64 24`. Automatically generated at startup if no value is provided. |
| `AUTH_SECRET` | - | Used to validate login session cookies. Generated using `openssl rand -base64 33`. Automatically generated at startup if no value is provided. |
| `AUTH_URL` | - | URL of your Sourcebot deployment, e.g., `https://example.com` or `http://localhost:3000`. Required when `SOURCEBOT_AUTH_ENABLED` is `true`. |
| `SOURCEBOT_TENANCY_MODE` | `single` | The tenancy configuration for Sourcebot. Valid values are `single` or `multi`. See [this doc](/self-hosting/more/tenancy) for more info. |
| `SOURCEBOT_AUTH_ENABLED` | `false` | Enables/disables authentication in Sourcebot. If set to `false`, `SOURCEBOT_TENANCY_MODE` must be `single`. See [this doc](/self-hosting/more/authentication) for more info. |
| `SOURCEBOT_TELEMETRY_DISABLED` | `false` | Enables/disables telemetry collection in Sourcebot. See [this doc](/self-hosting/security/telemetry) for more info. |
| `DATA_DIR` | `/data` | The directory within the container to store all persistent data. Typically, this directory will be volume mapped such that data is persisted across container restarts (e.g., `docker run -v $(pwd):/data`) |
| `DATA_CACHE_DIR` | `$DATA_DIR/.sourcebot` | The root data directory in which all data written to disk by Sourcebot will be located. |
| `DATABASE_DATA_DIR` | `$DATA_CACHE_DIR/db` | The data directory for the default Postgres database. |
| `REDIS_DATA_DIR` | `$DATA_CACHE_DIR/redis` | The data directory for the default Redis instance. |
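For example, any of these can be overridden at `docker run` time with `-e` flags (the values below are illustrative):
```bash
docker run \
    -p 3000:3000 \
    -v $(pwd):/data \
    -e SOURCEBOT_LOG_LEVEL=debug \
    -e SOURCEBOT_TELEMETRY_DISABLED=true \
    ghcr.io/sourcebot-dev/sourcebot:latest
```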
## Additional Features
There are additional features that can be enabled and configured via environment variables.
<CardGroup cols={2}>
<Card horizontal title="Authentication" icon="lock" href="/self-hosting/more/authentication" />
<Card horizontal title="Tenancy" icon="users" href="/self-hosting/more/tenancy" />
<Card horizontal title="Transactional Emails" icon="envelope" href="/self-hosting/more/transactional-emails" />
<Card horizontal title="Declarative Configs" icon="page" href="/self-hosting/more/declarative-config" />
</CardGroup>
## Health Check and Version Endpoints
Sourcebot includes a health check endpoint that indicates if the application is alive, returning `200 OK` if it is:
```sh
curl http://localhost:3000/api/health
```
It also includes a version endpoint to check the current version of the application:
```sh
curl http://localhost:3000/api/version
```
Sample response:
```json
{
"version": "v3.0.0"
}
```


@ -0,0 +1,63 @@
---
title: Authentication
sidebarTitle: Authentication
---
<Note>SSO is currently not supported. If you'd like SSO, please reach out using our [contact form](https://www.sourcebot.dev/contact)</Note>
<Warning>If you're switching from non-auth, delete the Sourcebot cache (the `.sourcebot` folder) before starting.</Warning>
Sourcebot has built-in authentication that gates access to your organization. OAuth, email codes, and email / password are supported. To enable authentication, set the `SOURCEBOT_AUTH_ENABLED` environment variable to `true`.
When authentication is enabled:
- [Connection management](/docs/connections/overview) happens through the UI
- Members must be invited to an organization to gain access
- If you're in single-tenant mode, the first user to register will be made the owner of the default organization. Check out the [roles page](/docs/more/roles-and-permissions) for more info on the different roles and permissions
![Login Page](/images/login.png)
# Authentication Providers
<Warning>Make sure the `AUTH_URL` environment variable is [configured correctly](/self-hosting/configuration) when using Sourcebot in a deployed environment.</Warning>
To enable an authentication provider in Sourcebot, configure the required environment variables for the provider. Under the hood, Sourcebot uses Auth.js which supports [many providers](https://authjs.dev/getting-started/authentication/oauth). Submit a [feature request on GitHub](https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas) if you want us to add support for a specific provider.
## Email / Password
---
Email / password authentication is enabled by default. It can be **disabled** by setting `AUTH_CREDENTIALS_LOGIN_ENABLED` to `false`.
## Email codes
---
Email codes are 6 digit codes sent to a provided email. Email codes are enabled when transactional emails are configured using the following environment variables:
- `SMTP_CONNECTION_URL`
- `EMAIL_FROM_ADDRESS`
See [transactional emails](/self-hosting/more/transactional-emails) for more details.
## GitHub
---
[Auth.js GitHub Provider Docs](https://authjs.dev/getting-started/providers/github)
**Required environment variables:**
- `AUTH_GITHUB_CLIENT_ID`
- `AUTH_GITHUB_CLIENT_SECRET`
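As a sketch, for a self-hosted deployment the provider can be enabled by passing these variables to the container (the client ID and secret below are placeholders for credentials from your own GitHub OAuth app):
```bash
docker run \
    -e SOURCEBOT_AUTH_ENABLED=true \
    -e AUTH_URL="https://sourcebot.example.com" \
    -e AUTH_GITHUB_CLIENT_ID="<client-id>" \
    -e AUTH_GITHUB_CLIENT_SECRET="<client-secret>" \
    ghcr.io/sourcebot-dev/sourcebot:latest
```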
## Google
---
[Auth.js Google Provider Docs](https://next-auth.js.org/providers/google)
**Required environment variables:**
- `AUTH_GOOGLE_CLIENT_ID`
- `AUTH_GOOGLE_CLIENT_SECRET`
---
# Troubleshooting
- If you experience issues logging in, logging out, or accessing an organization you should have access to, try clearing your cookies & performing a full page refresh (`Cmd/Ctrl + Shift + R` on most browsers).
- Still not working? Reach out to us on our [Discord](https://discord.com/invite/6Fhp27x7Pb) or [GitHub discussions](https://github.com/sourcebot-dev/sourcebot/discussions).


@ -0,0 +1,624 @@
---
title: Configuring Sourcebot from a file (declarative config)
sidebarTitle: Declarative config
---
Some teams require Sourcebot to be configured via a file (where it can be stored in version control, run through CI/CD pipelines, etc.) instead of a web UI. For more information on configuring connections, see this [overview](/docs/connections/overview).
| Variable | Description |
| :------- | :---------- |
| `CONFIG_PATH` | Path to declarative config. |
```json
{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/refs/heads/main/schemas/v3/index.json",
"connections": {
"connection-1": {
"type": "github",
"repos": [
"sourcebot-dev/sourcebot"
]
}
}
}
```
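A connection in a declarative config can also reference its token through an environment variable (the `env` form described in the schema below, which is only supported in declarative configs). As a sketch, assuming a GitHub PAT is exported as `GITHUB_TOKEN` on the host (the connection name and variable name are illustrative):
```bash
# Write a config whose connection reads its token from an environment variable
echo '{
    "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/refs/heads/main/schemas/v3/index.json",
    "connections": {
        "private-repos": {
            "type": "github",
            "token": {
                "env": "GITHUB_TOKEN"
            },
            "orgs": ["my-org-name"]
        }
    }
}' > config.json

# Point Sourcebot at the config and forward the token into the container
docker run \
    -v $(pwd):/data \
    -e CONFIG_PATH=/data/config.json \
    -e GITHUB_TOKEN \
    ghcr.io/sourcebot-dev/sourcebot:latest
```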
## Schema reference
<Accordion title="Reference">
[schemas/v3/index.json](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/index.json)
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "SourcebotConfig",
"definitions": {
"Settings": {
"type": "object",
"description": "Defines the globabl settings for Sourcebot.",
"properties": {
"maxFileSize": {
"type": "number",
"description": "The maximum size of a file (in bytes) to be indexed. Files that exceed this maximum will not be indexed. Defaults to 2MB.",
"minimum": 1
},
"maxTrigramCount": {
"type": "number",
"description": "The maximum number of trigrams per document. Files that exceed this maximum will not be indexed. Default to 20000.",
"minimum": 1
},
"reindexIntervalMs": {
"type": "number",
"description": "The interval (in milliseconds) at which the indexer should re-index all repositories. Defaults to 1 hour.",
"minimum": 1
},
"resyncConnectionPollingIntervalMs": {
"type": "number",
"description": "The polling rate (in milliseconds) at which the db should be checked for connections that need to be re-synced. Defaults to 1 second.",
"minimum": 1
},
"reindexRepoPollingIntervalMs": {
"type": "number",
"description": "The polling rate (in milliseconds) at which the db should be checked for repos that should be re-indexed. Defaults to 1 second.",
"minimum": 1
},
"maxConnectionSyncJobConcurrency": {
"type": "number",
"description": "The number of connection sync jobs to run concurrently. Defaults to 8.",
"minimum": 1
},
"maxRepoIndexingJobConcurrency": {
"type": "number",
"description": "The number of repo indexing jobs to run concurrently. Defaults to 8.",
"minimum": 1
},
"maxRepoGarbageCollectionJobConcurrency": {
"type": "number",
"description": "The number of repo GC jobs to run concurrently. Defaults to 8.",
"minimum": 1
},
"repoGarbageCollectionGracePeriodMs": {
"type": "number",
"description": "The grace period (in milliseconds) for garbage collection. Used to prevent deleting shards while they're being loaded. Defaults to 10 seconds.",
"minimum": 1
},
"repoIndexTimeoutMs": {
"type": "number",
"description": "The timeout (in milliseconds) for a repo indexing to timeout. Defaults to 2 hours.",
"minimum": 1
}
},
"additionalProperties": false
}
},
"properties": {
"$schema": {
"type": "string"
},
"settings": {
"$ref": "#/definitions/Settings"
},
"connections": {
"type": "object",
"description": "Defines a collection of connections from varying code hosts that Sourcebot should sync with. This is only available in single-tenancy mode.",
"patternProperties": {
"^[a-zA-Z0-9_-]+$": {
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "ConnectionConfig",
"oneOf": [
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GithubConnectionConfig",
"properties": {
"type": {
"const": "github",
"description": "GitHub Configuration"
},
"token": {
"description": "A Personal Access Token (PAT).",
"examples": [
{
"secret": "SECRET_KEY"
}
],
"anyOf": [
{
"type": "object",
"properties": {
"secret": {
"type": "string",
"description": "The name of the secret that contains the token."
}
},
"required": [
"secret"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"env": {
"type": "string",
"description": "The name of the environment variable that contains the token. Only supported in declarative connection configs."
}
},
"required": [
"env"
],
"additionalProperties": false
}
]
},
"url": {
"type": "string",
"format": "url",
"default": "https://github.com",
"description": "The URL of the GitHub host. Defaults to https://github.com",
"examples": [
"https://github.com",
"https://github.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"users": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+$"
},
"default": [],
"examples": [
[
"torvalds",
"DHH"
]
],
"description": "List of users to sync with. All repositories that the user owns will be synced, unless explicitly defined in the `exclude` property."
},
"orgs": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+$"
},
"default": [],
"examples": [
[
"my-org-name"
],
[
"sourcebot-dev",
"commaai"
]
],
"description": "List of organizations to sync with. All repositories in the organization visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property."
},
"repos": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+\\/[\\w.-]+$"
},
"default": [],
"description": "List of individual repositories to sync with. Expected to be formatted as '{orgName}/{repoName}' or '{userName}/{repoName}'."
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
"default": [],
"description": "List of repository topics to include when syncing. Only repositories that match at least one of the provided `topics` will be synced. If not specified, all repositories will be synced, unless explicitly defined in the `exclude` property. Glob patterns are supported.",
"examples": [
[
"docs",
"core"
]
]
},
"exclude": {
"type": "object",
"properties": {
"forks": {
"type": "boolean",
"default": false,
"description": "Exclude forked repositories from syncing."
},
"archived": {
"type": "boolean",
"default": false,
"description": "Exclude archived repositories from syncing."
},
"repos": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"description": "List of individual repositories to exclude from syncing. Glob patterns are supported."
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"description": "List of repository topics to exclude when syncing. Repositories that match one of the provided `topics` will be excluded from syncing. Glob patterns are supported.",
"examples": [
[
"tests",
"ci"
]
]
},
"size": {
"type": "object",
"description": "Exclude repositories based on their disk usage. Note: the disk usage is calculated by GitHub and may not reflect the actual disk usage when cloned.",
"properties": {
"min": {
"type": "integer",
"description": "Minimum repository size (in bytes) to sync (inclusive). Repositories less than this size will be excluded from syncing."
},
"max": {
"type": "integer",
"description": "Maximum repository size (in bytes) to sync (inclusive). Repositories greater than this size will be excluded from syncing."
}
},
"additionalProperties": false
}
},
"additionalProperties": false
},
"revisions": {
"type": "object",
"description": "The revisions (branches, tags) that should be included when indexing. The default branch (HEAD) is always indexed. A maximum of 64 revisions can be indexed, with any additional revisions being ignored.",
"properties": {
"branches": {
"type": "array",
"description": "List of branches to include when indexing. For a given repo, only the branches that exist on the repo's remote *and* match at least one of the provided `branches` will be indexed. The default branch (HEAD) is always indexed. Glob patterns are supported. A maximum of 64 branches can be indexed, with any additional branches being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"main",
"release/*"
],
[
"**"
]
],
"default": []
},
"tags": {
"type": "array",
"description": "List of tags to include when indexing. For a given repo, only the tags that exist on the repo's remote *and* match at least one of the provided `tags` will be indexed. Glob patterns are supported. A maximum of 64 tags can be indexed, with any additional tags being ignored.",
"items": {
"type": "string"
},
"examples": [
[
"latest",
"v2.*.*"
],
[
"**"
]
],
"default": []
}
},
"additionalProperties": false
}
},
"required": [
"type"
],
"additionalProperties": false
},
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GitlabConnectionConfig",
"properties": {
"type": {
"const": "gitlab",
"description": "GitLab Configuration"
},
"token": {
"$ref": "#/properties/connections/patternProperties/%5E%5Ba-zA-Z0-9_-%5D%2B%24/oneOf/0/properties/token",
"description": "An authentication token.",
"examples": [
{
"secret": "SECRET_KEY"
}
]
},
"url": {
"type": "string",
"format": "url",
"default": "https://gitlab.com",
"description": "The URL of the GitLab host. Defaults to https://gitlab.com",
"examples": [
"https://gitlab.com",
"https://gitlab.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"all": {
"type": "boolean",
"default": false,
"description": "Sync all projects visible to the provided `token` (if any) in the GitLab instance. This option is ignored if `url` is either unset or set to https://gitlab.com ."
},
"users": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of users to sync with. All projects owned by the user and visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property."
},
"groups": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"my-group"
],
[
"my-group/sub-group-a",
"my-group/sub-group-b"
]
],
"description": "List of groups to sync with. All projects in the group (and recursive subgroups) visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property. Subgroups can be specified by providing the path to the subgroup (e.g. `my-group/sub-group-a`)."
},
"projects": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"my-group/my-project"
],
[
"my-group/my-sub-group/my-project"
]
],
"description": "List of individual projects to sync with. The project's namespace must be specified. See: https://docs.gitlab.com/ee/user/namespace/"
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
"description": "List of project topics to include when syncing. Only projects that match at least one of the provided `topics` will be synced. If not specified, all projects will be synced, unless explicitly defined in the `exclude` property. Glob patterns are supported.",
"examples": [
[
"docs",
"core"
]
]
},
"exclude": {
"type": "object",
"properties": {
"forks": {
"type": "boolean",
"default": false,
"description": "Exclude forked projects from syncing."
},
"archived": {
"type": "boolean",
"default": false,
"description": "Exclude archived projects from syncing."
},
"projects": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"examples": [
[
"my-group/my-project"
]
],
"description": "List of projects to exclude from syncing. Glob patterns are supported. The project's namespace must be specified, see: https://docs.gitlab.com/ee/user/namespace/"
},
"topics": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of project topics to exclude when syncing. Projects that match one of the provided `topics` will be excluded from syncing. Glob patterns are supported.",
"examples": [
[
"tests",
"ci"
]
]
}
},
"additionalProperties": false
},
"revisions": {
"$ref": "#/properties/connections/patternProperties/%5E%5Ba-zA-Z0-9_-%5D%2B%24/oneOf/0/properties/revisions"
}
},
"required": [
"type"
],
"additionalProperties": false
},
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GiteaConnectionConfig",
"properties": {
"type": {
"const": "gitea",
"description": "Gitea Configuration"
},
"token": {
"$ref": "#/properties/connections/patternProperties/%5E%5Ba-zA-Z0-9_-%5D%2B%24/oneOf/0/properties/token",
"description": "A Personal Access Token (PAT).",
"examples": [
{
"secret": "SECRET_KEY"
}
]
},
"url": {
"type": "string",
"format": "url",
"default": "https://gitea.com",
"description": "The URL of the Gitea host. Defaults to https://gitea.com",
"examples": [
"https://gitea.com",
"https://gitea.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"orgs": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"my-org-name"
]
],
"description": "List of organizations to sync with. All repositories in the organization visible to the provided `token` (if any) will be synced, unless explicitly defined in the `exclude` property. If a `token` is provided, it must have the read:organization scope."
},
"repos": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[\\w.-]+\\/[\\w.-]+$"
},
"description": "List of individual repositories to sync with. Expected to be formatted as '{orgName}/{repoName}' or '{userName}/{repoName}'."
},
"users": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"username-1",
"username-2"
]
],
"description": "List of users to sync with. All repositories that the user owns will be synced, unless explicitly defined in the `exclude` property. If a `token` is provided, it must have the read:user scope."
},
"exclude": {
"type": "object",
"properties": {
"forks": {
"type": "boolean",
"default": false,
"description": "Exclude forked repositories from syncing."
},
"archived": {
"type": "boolean",
"default": false,
"description": "Exclude archived repositories from syncing."
},
"repos": {
"type": "array",
"items": {
"type": "string"
},
"default": [],
"description": "List of individual repositories to exclude from syncing. Glob patterns are supported."
}
},
"additionalProperties": false
},
"revisions": {
"$ref": "#/properties/connections/patternProperties/%5E%5Ba-zA-Z0-9_-%5D%2B%24/oneOf/0/properties/revisions"
}
},
"required": [
"type"
],
"additionalProperties": false
},
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "GerritConnectionConfig",
"properties": {
"type": {
"const": "gerrit",
"description": "Gerrit Configuration"
},
"url": {
"type": "string",
"format": "url",
"description": "The URL of the Gerrit host.",
"examples": [
"https://gerrit.example.com"
],
"pattern": "^https?:\\/\\/[^\\s/$.?#].[^\\s]*$"
},
"projects": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of specific projects to sync. If not specified, all projects will be synced. Glob patterns are supported",
"examples": [
[
"project1/repo1",
"project2/**"
]
]
},
"exclude": {
"type": "object",
"properties": {
"projects": {
"type": "array",
"items": {
"type": "string"
},
"examples": [
[
"project1/repo1",
"project2/**"
]
],
"description": "List of specific projects to exclude from syncing."
}
},
"additionalProperties": false
}
},
"required": [
"type",
"url"
],
"additionalProperties": false
}
]
}
},
"additionalProperties": false
}
},
"additionalProperties": false
}
```
</Accordion>


@ -0,0 +1,27 @@
---
title: Multi Tenancy Mode
sidebarTitle: Multi tenancy
---
<Warning>If you're switching from single-tenant mode, delete the Sourcebot cache (the `.sourcebot` folder) before starting.</Warning>
<Warning>[Authentication](/self-hosting/more/authentication) must be enabled to use multi-tenancy mode.</Warning>
Multi-tenancy allows your Sourcebot deployment to have **multiple organizations**, each with its own set of members and repos. To enable multi-tenancy mode, set the `SOURCEBOT_TENANCY_MODE` environment variable to `multi` (see the example below). When multi-tenancy mode is enabled:
- Any members or repos that are configured in an organization are isolated to that organization
- Members must be invited to an organization to gain access
- Members may be a part of multiple organizations and switch through them in the UI
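As a minimal sketch (the hostname is a placeholder), enabling both flags on a self-hosted deployment looks like this:
```bash
docker run \
    -p 3000:3000 \
    -v $(pwd):/data \
    -e SOURCEBOT_AUTH_ENABLED=true \
    -e SOURCEBOT_TENANCY_MODE=multi \
    -e AUTH_URL="https://sourcebot.example.com" \
    ghcr.io/sourcebot-dev/sourcebot:latest
```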
### Organization creation form
When you sign in for the first time (assuming you didn't go through an invite), you'll be presented with the organization creation form. The member who creates
the organization will be the Owner.
![Org creation](/images/org_create.png)
### Switching between organizations
To switch between organizations, use the dropdown at the top left of the navigation menu. This also provides an option to create a new organization:
![Org switching](/images/org_switch.png)


@ -0,0 +1,14 @@
---
title: Transactional Email
sidebarTitle: Transactional email
---
To enable transactional emails in your deployment, set the following environment variables. We recommend using [Resend](https://resend.com/), but you can use any provider. Setting this enables you to:
- Send emails when new members are invited
- Log into the Sourcebot deployment using [email codes](/self-hosting/more/authentication#email-codes)
| Variable | Description |
| :------- | :---------- |
| `SMTP_CONNECTION_URL` | SMTP server connection. |
| `EMAIL_FROM_ADDRESS` | The sender's email address |
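As a sketch, both values are passed to the container as environment variables. The exact connection-string format depends on your SMTP provider; the host, credentials, and sender address below are placeholders:
```bash
docker run \
    -e SMTP_CONNECTION_URL="smtps://username:password@smtp.example.com:465" \
    -e EMAIL_FROM_ADDRESS="no-reply@example.com" \
    ghcr.io/sourcebot-dev/sourcebot:latest
```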


@ -0,0 +1,126 @@
---
title: Self-host Sourcebot
sidebarTitle: Overview
---
<Note>Want a managed solution? Check out [Sourcebot Cloud](/docs/getting-started).</Note>
Sourcebot is open source and can be self-hosted using our official [Docker image](https://github.com/sourcebot-dev/sourcebot/pkgs/container/sourcebot).
## Quick Start Guide
{/*@todo: record a self-hosting quick start guide
<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/4KzFe50RQkQ"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>
*/}
<Steps>
<Step title="Create a config">
By default, Sourcebot requires a configuration file with a list of [code host connections](/docs/connections/overview) that specify what repositories should be **synced** (cloned and indexed). To get started, run the following command to create a starter `config.json`:
```bash
touch config.json
echo '{
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
"connections": {
// Comments are supported
"starter-connection": {
"type": "github",
"repos": [
"sourcebot-dev/sourcebot"
]
}
}
}' > config.json
```
This config creates a single GitHub connection named `starter-connection` that specifies [Sourcebot](https://github.com/sourcebot-dev/sourcebot) as a repo to sync.
</Step>
<Step title="Launch your instance">
Sourcebot is packaged as a [single Docker image](https://github.com/sourcebot-dev/sourcebot/pkgs/container/sourcebot). In the same directory as `config.json`, run the following command to start your instance:
``` bash
docker run \
-p 3000:3000 \
--pull=always \
--rm \
-v $(pwd):/data \
-e CONFIG_PATH=/data/config.json \
--name sourcebot \
ghcr.io/sourcebot-dev/sourcebot:latest
```
Navigate to `localhost:3000` to start searching the Sourcebot repo.
<Accordion title="Details">
**This command**:
- pulls the latest version of the `sourcebot` docker image.
- mounts the working directory to `/data` in the container to allow Sourcebot to persist data across restarts, and to access the `config.json`. In your local directory, you should see a `.sourcebot` folder created that contains all persistent data.
- runs any pending database migrations.
- starts up all services, including the webserver exposed on port 3000.
- reads `config.json` and starts syncing.
</Accordion>
<Warning>Hit an issue? Please let us know on [GitHub discussions](https://github.com/sourcebot-dev/sourcebot/discussions/categories/support) or by [emailing us](mailto:team@sourcebot.dev).</Warning>
</Step>
<Step title="Link your code">
Sourcebot supports indexing public & private code on the following code hosts:
<CardGroup cols={2}>
<Card horizontal title="GitHub" href="/docs/connections/github" />
<Card horizontal title="GitLab" href="/docs/connections/gitlab" />
<Card horizontal title="Gitea" href="/docs/connections/gitea" />
<Card horizontal title="Gerrit" href="/docs/connections/gerrit" />
</CardGroup>
<Note>Missing your code host? [Submit a feature request on GitHub](https://github.com/sourcebot-dev/sourcebot/discussions/categories/ideas).</Note>
</Step>
</Steps>
## Architecture
Sourcebot is shipped as a single docker container that runs a collection of services using [supervisord](https://supervisord.org/):
![architecture diagram](/images/architecture_diagram.png)
{/*TODO: outline the different services, how Sourcebot communicates with code hosts, and the different*/}
Sourcebot consists of the following components:
- **Web Server**: the main Next.js web application serving the Sourcebot UI.
- **Backend Worker**: a Node.js process that incrementally syncs with code hosts (e.g., GitHub, GitLab, etc.) and asynchronously indexes configured repositories.
- **Zoekt**: the [open-source](https://github.com/sourcegraph/zoekt), trigram-based code search engine that powers Sourcebot under the hood.
- **Postgres**: a transactional database for storing business-logic data.
- **Redis Job Queue**: a fast in-memory store used with [BullMQ](https://docs.bullmq.io/) for queuing asynchronous work.
- **`.sourcebot/` cache**: a file-system cache where persistent data is written.
You can use managed Redis / Postgres services that run outside of the Sourcebot container by providing the `REDIS_URL` and `DATABASE_URL` environment variables, respectively. See the [configuration](/self-hosting/configuration) for more configuration options.
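For instance, a deployment pointed at externally managed services might be started like this (hostnames and credentials are placeholders):
```bash
docker run \
    -p 3000:3000 \
    -v $(pwd):/data \
    -e DATABASE_URL="postgresql://sourcebot:password@my-postgres-host:5432/sourcebot" \
    -e REDIS_URL="redis://my-redis-host:6379" \
    ghcr.io/sourcebot-dev/sourcebot:latest
```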
## Scalability
One of our design philosophies for Sourcebot is to keep our infrastructure [radically simple](https://www.radicalsimpli.city/) while balancing scalability concerns. Depending on the number of repositories you have indexed and the instance you are running Sourcebot on, you may experience slow search times or other performance degradations. Our recommendation is to vertically scale your instance by increasing the number of CPU cores and memory.
Sourcebot does not support horizontal scaling at this time, but it is on our roadmap. If this is something your team would be interested in, please contact us at [team@sourcebot.dev](mailto:team@sourcebot.dev).
## Telemetry
By default, Sourcebot collects anonymized usage data through [PostHog](https://posthog.com/) to help us improve the performance and reliability of our tool. We don't collect or transmit <a href="https://sourcebot.dev/search/search?query=captureEvent%5C(%20repo%3Asourcebot">any information related to your codebase</a>. In addition, all events are [sanitized](https://github.com/sourcebot-dev/sourcebot/blob/HEAD/packages/web/src/app/posthogProvider.tsx) to ensure that no sensitive details (e.g., IP address, query info) leave your machine.
The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made.
If you'd like to disable all telemetry, you can do so by setting the environment variable `SOURCEBOT_TELEMETRY_DISABLED` to `true`:
```bash
docker run \
-e SOURCEBOT_TELEMETRY_DISABLED=true \
/* additional args */ \
ghcr.io/sourcebot-dev/sourcebot:latest
```


@ -0,0 +1,93 @@
---
title: V2 to V3 Guide
sidebarTitle: V2 to V3 guide
---
This guide will walk you through upgrading your Sourcebot deployment from v2 to v3.
<Warning>
Please note that the following features are no longer supported in v3:
- Local file indexing
- Raw remote `.git` repo indexing (i.e. not through a supported code host)
If your deployment is dependent on these features, please [reach out](https://github.com/sourcebot-dev/sourcebot/discussions).
</Warning>
<Warning>This migration will require you to reindex all your repos</Warning>
<Steps>
<Step title="Spin down deployment">
</Step>
<Step title="Delete Sourcebot cache (.sourcebot directory)">
</Step>
<Step title="Migrate your configuration file to the v3 schema">
The main change between the v3 and v2 schemas is how the data is structured. In v2, you defined a `repos` array which contained unnamed config objects:
```json
{
"$schema": "./schemas/v2/index.json",
"repos": [
{
"type": "github",
"repos": [
"sourcebot-dev/sourcebot"
]
},
{
"type": "gitlab":
"groups": [
"wireshark"
]
}
]
}
```
In v3, you define a `connections` map which contains named `connection` objects:
```json
{
"$schema": "./schemas/v3/index.json",
"connections": {
"sourcebot-connection": {
"type": "github",
"repos": [
"sourcebot-dev/sourcebot"
]
},
"wireshark-connection": {
"type": "gitlab":
"groups": [
"wireshark
]
}
}
}
```
The schema of the connections defined here is the same as the "repos" you defined in the v2 schema. Some helpful notes:
- The name of the connection (`sourcebot-connection` and `wireshark-connection` above) is only used to identify the connection in Sourcebot. It can be any string that contains letters, digits, hyphens, or underscores.
- A connection is associated with one and only one code host platform, and this must be specified in the connection's `type` field.
- Make sure you update the `$schema` field to point to the v3 schema
- The `settings` object doesn't need to be changed. We've added new settings params (check out the v3 schema for more details)
</Step>
<Step title="Start your Sourcebot deployment">
When you start up your Sourcebot deployment, it will create a fresh cache and begin indexing against your new v3 configuration file.
If there are issues with your configuration file, an error will be printed to the console.
After updating your configuration file, restart your Sourcebot deployment to pick up the new changes.
</Step>
<Step title="You're done!">
Congrats, you've successfully migrated to v3! Please let us know what you think of the new features by reaching out on our [Discord](https://discord.gg/6Fhp27x7Pb) or [GitHub discussions](https://github.com/sourcebot-dev/sourcebot/discussions/categories/support).
</Step>
</Steps>
## Troubleshooting
Some things to check:
- Make sure you update the `$schema` field in the configuration file to point to the v3 schema
- Make sure you have a name for each `connection`, and that the name only contains letters, digits, hyphens, or underscores
- Make sure each `connection` has a `type` field with a valid value (`gitlab`, `github`, `gitea`, `gerrit`)
Having trouble migrating from v2 to v3? Reach out to us on [Discord](https://discord.gg/6Fhp27x7Pb) or [GitHub discussions](https://github.com/sourcebot-dev/sourcebot/discussions/categories/support) and we'll try our best to help.


@ -0,0 +1,4 @@
<CardGroup cols={2}>
<Card title="GitHub" icon="github" href="/docs/connections/github"></Card>
<Card title="GitLab" icon="gitlab" href="/docs/connections/gitlab"></Card>
</CardGroup>


@ -1,16 +1,26 @@
 #!/bin/sh
 set -e

-echo -e "\e[34m[Info] Sourcebot version: $SOURCEBOT_VERSION\e[0m"
+echo -e "\e[34m[Info] Sourcebot version: $NEXT_PUBLIC_SOURCEBOT_VERSION\e[0m"

 # If we don't have a PostHog key, then we need to disable telemetry.
-if [ -z "$POSTHOG_PAPIK" ]; then
-    echo -e "\e[33m[Warning] POSTHOG_PAPIK was not set. Setting SOURCEBOT_TELEMETRY_DISABLED.\e[0m"
-    export SOURCEBOT_TELEMETRY_DISABLED=1
+if [ -z "$NEXT_PUBLIC_POSTHOG_PAPIK" ]; then
+    echo -e "\e[33m[Warning] NEXT_PUBLIC_POSTHOG_PAPIK was not set. Setting SOURCEBOT_TELEMETRY_DISABLED.\e[0m"
+    export SOURCEBOT_TELEMETRY_DISABLED=true
+fi
+
+if [ -n "$SOURCEBOT_TELEMETRY_DISABLED" ]; then
+    # Validate that SOURCEBOT_TELEMETRY_DISABLED is either "true" or "false"
+    if [ "$SOURCEBOT_TELEMETRY_DISABLED" != "true" ] && [ "$SOURCEBOT_TELEMETRY_DISABLED" != "false" ]; then
+        echo -e "\e[31m[Error] SOURCEBOT_TELEMETRY_DISABLED must be either 'true' or 'false'. Got '$SOURCEBOT_TELEMETRY_DISABLED'\e[0m"
+        exit 1
+    fi
+else
+    export SOURCEBOT_TELEMETRY_DISABLED=false
 fi

 # Issue a info message about telemetry
-if [ ! -z "$SOURCEBOT_TELEMETRY_DISABLED" ]; then
+if [ "$SOURCEBOT_TELEMETRY_DISABLED" = "true" ]; then
     echo -e "\e[34m[Info] Disabling telemetry since SOURCEBOT_TELEMETRY_DISABLED was set.\e[0m"
 fi
@ -19,9 +29,59 @@ if [ ! -d "$DATA_CACHE_DIR" ]; then
     mkdir -p "$DATA_CACHE_DIR"
 fi

+# Check if DATABASE_DATA_DIR exists, if not initialize it
+if [ ! -d "$DATABASE_DATA_DIR" ]; then
+    echo -e "\e[34m[Info] Initializing database at $DATABASE_DATA_DIR...\e[0m"
+    mkdir -p $DATABASE_DATA_DIR && chown -R postgres:postgres "$DATABASE_DATA_DIR"
+    su postgres -c "initdb -D $DATABASE_DATA_DIR"
+fi
+
+# Create the redis data directory if it doesn't exist
+if [ ! -d "$REDIS_DATA_DIR" ]; then
+    mkdir -p $REDIS_DATA_DIR
+fi
+
+if [ -z "$SOURCEBOT_ENCRYPTION_KEY" ]; then
+    echo -e "\e[33m[Warning] SOURCEBOT_ENCRYPTION_KEY is not set.\e[0m"
+
+    if [ -f "$DATA_CACHE_DIR/.secret" ]; then
+        echo -e "\e[34m[Info] Loading environment variables from $DATA_CACHE_DIR/.secret\e[0m"
+    else
+        echo -e "\e[34m[Info] Generating a new encryption key...\e[0m"
+        SOURCEBOT_ENCRYPTION_KEY=$(openssl rand -base64 24)
+        echo "SOURCEBOT_ENCRYPTION_KEY=\"$SOURCEBOT_ENCRYPTION_KEY\"" >> "$DATA_CACHE_DIR/.secret"
+    fi
+
+    set -a
+    . "$DATA_CACHE_DIR/.secret"
+    set +a
+fi
+
+# @see : https://authjs.dev/getting-started/deployment#auth_secret
+if [ -z "$AUTH_SECRET" ]; then
+    echo -e "\e[33m[Warning] AUTH_SECRET is not set.\e[0m"
+
+    if [ -f "$DATA_CACHE_DIR/.authjs-secret" ]; then
+        echo -e "\e[34m[Info] Loading environment variables from $DATA_CACHE_DIR/.authjs-secret\e[0m"
+    else
+        echo -e "\e[34m[Info] Generating a new encryption key...\e[0m"
+        AUTH_SECRET=$(openssl rand -base64 33)
+        echo "AUTH_SECRET=\"$AUTH_SECRET\"" >> "$DATA_CACHE_DIR/.authjs-secret"
+    fi
+
+    set -a
+    . "$DATA_CACHE_DIR/.authjs-secret"
+    set +a
+fi
+
+if [ -z "$AUTH_URL" ]; then
+    echo -e "\e[33m[Warning] AUTH_URL is not set.\e[0m"
+    export AUTH_URL="http://localhost:3000"
+fi
+
 # In order to detect if this is the first run, we create a `.installed` file in
 # the cache directory.
-FIRST_RUN_FILE="$DATA_CACHE_DIR/.installedv2"
+FIRST_RUN_FILE="$DATA_CACHE_DIR/.installedv3"

 if [ ! -f "$FIRST_RUN_FILE" ]; then
     touch "$FIRST_RUN_FILE"
@ -29,13 +89,13 @@ if [ ! -f "$FIRST_RUN_FILE" ]; then
     # If this is our first run, send a `install` event to PostHog
     # (if telemetry is enabled)
-    if [ -z "$SOURCEBOT_TELEMETRY_DISABLED" ]; then
+    if [ "$SOURCEBOT_TELEMETRY_DISABLED" = "false" ]; then
         if ! ( curl -L --output /dev/null --silent --fail --header "Content-Type: application/json" -d '{
-            "api_key": "'"$POSTHOG_PAPIK"'",
+            "api_key": "'"$NEXT_PUBLIC_POSTHOG_PAPIK"'",
             "event": "install",
             "distinct_id": "'"$SOURCEBOT_INSTALL_ID"'",
             "properties": {
-                "sourcebot_version": "'"$SOURCEBOT_VERSION"'"
+                "sourcebot_version": "'"$NEXT_PUBLIC_SOURCEBOT_VERSION"'"
             }
         }' https://us.i.posthog.com/capture/ ) then
             echo -e "\e[33m[Warning] Failed to send install event.\e[0m"
@ -46,17 +106,17 @@ else
     PREVIOUS_VERSION=$(cat "$FIRST_RUN_FILE" | jq -r '.version')

     # If the version has changed, we assume an upgrade has occurred.
-    if [ "$PREVIOUS_VERSION" != "$SOURCEBOT_VERSION" ]; then
-        echo -e "\e[34m[Info] Upgraded from version $PREVIOUS_VERSION to $SOURCEBOT_VERSION\e[0m"
+    if [ "$PREVIOUS_VERSION" != "$NEXT_PUBLIC_SOURCEBOT_VERSION" ]; then
+        echo -e "\e[34m[Info] Upgraded from version $PREVIOUS_VERSION to $NEXT_PUBLIC_SOURCEBOT_VERSION\e[0m"

-        if [ -z "$SOURCEBOT_TELEMETRY_DISABLED" ]; then
+        if [ "$SOURCEBOT_TELEMETRY_DISABLED" = "false" ]; then
             if ! ( curl -L --output /dev/null --silent --fail --header "Content-Type: application/json" -d '{
-                "api_key": "'"$POSTHOG_PAPIK"'",
+                "api_key": "'"$NEXT_PUBLIC_POSTHOG_PAPIK"'",
                 "event": "upgrade",
                 "distinct_id": "'"$SOURCEBOT_INSTALL_ID"'",
                 "properties": {
                     "from_version": "'"$PREVIOUS_VERSION"'",
-                    "to_version": "'"$SOURCEBOT_VERSION"'"
+                    "to_version": "'"$NEXT_PUBLIC_SOURCEBOT_VERSION"'"
                 }
             }' https://us.i.posthog.com/capture/ ) then
                 echo -e "\e[33m[Warning] Failed to send upgrade event.\e[0m"
@ -65,94 +125,34 @@
     fi
 fi

-echo "{\"version\": \"$SOURCEBOT_VERSION\", \"install_id\": \"$SOURCEBOT_INSTALL_ID\"}" > "$FIRST_RUN_FILE"
-
-# Fallback to sample config if a config does not exist
-if echo "$CONFIG_PATH" | grep -qE '^https?://'; then
-    if ! curl --output /dev/null --silent --head --fail "$CONFIG_PATH"; then
-        echo -e "\e[33m[Warning] Remote config file at '$CONFIG_PATH' not found. Falling back on sample config.\e[0m"
-        CONFIG_PATH="./default-config.json"
-    fi
-elif [ ! -f "$CONFIG_PATH" ]; then
-    echo -e "\e[33m[Warning] Config file at '$CONFIG_PATH' not found. Falling back on sample config.\e[0m"
-    CONFIG_PATH="./default-config.json"
-fi
-echo -e "\e[34m[Info] Using config file at: '$CONFIG_PATH'.\e[0m"
-
-# Update NextJs public env variables w/o requiring a rebuild.
-# @see: https://phase.dev/blog/nextjs-public-runtime-variables/
-{
-    # Infer NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED if it is not set
-    if [ -z "$NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED" ] && [ ! -z "$SOURCEBOT_TELEMETRY_DISABLED" ]; then
-        export NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED="$SOURCEBOT_TELEMETRY_DISABLED"
-    fi
-    # Infer NEXT_PUBLIC_SOURCEBOT_VERSION if it is not set
-    if [ -z "$NEXT_PUBLIC_SOURCEBOT_VERSION" ] && [ ! -z "$SOURCEBOT_VERSION" ]; then
-        export NEXT_PUBLIC_SOURCEBOT_VERSION="$SOURCEBOT_VERSION"
-    fi
-    # Infer NEXT_PUBLIC_PUBLIC_SEARCH_DEMO if it is not set
-    if [ -z "$NEXT_PUBLIC_PUBLIC_SEARCH_DEMO" ] && [ ! -z "$PUBLIC_SEARCH_DEMO" ]; then
-        export NEXT_PUBLIC_PUBLIC_SEARCH_DEMO="$PUBLIC_SEARCH_DEMO"
-    fi
-    # Always infer NEXT_PUBLIC_POSTHOG_PAPIK
-    export NEXT_PUBLIC_POSTHOG_PAPIK="$POSTHOG_PAPIK"
-    # Iterate over all .js files in .next & public, making substitutions for the `BAKED_` sentinal values
-    # with their actual desired runtime value.
-    find /app/packages/web/public /app/packages/web/.next -type f -name "*.js" |
-    while read file; do
-        sed -i "s|BAKED_NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED|${NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED}|g" "$file"
-        sed -i "s|BAKED_NEXT_PUBLIC_SOURCEBOT_VERSION|${NEXT_PUBLIC_SOURCEBOT_VERSION}|g" "$file"
-        sed -i "s|BAKED_NEXT_PUBLIC_POSTHOG_PAPIK|${NEXT_PUBLIC_POSTHOG_PAPIK}|g" "$file"
-        sed -i "s|BAKED_NEXT_PUBLIC_PUBLIC_SEARCH_DEMO|${NEXT_PUBLIC_PUBLIC_SEARCH_DEMO}|g" "$file"
-    done
-}
-
-# Update specifically NEXT_PUBLIC_DOMAIN_SUB_PATH w/o requiring a rebuild.
-# Ultimately, the DOMAIN_SUB_PATH sets the `basePath` param in the next.config.mjs.
-# Similar to above, we pass in a `BAKED_` sentinal value into next.config.mjs at build
-# time. Unlike above, the `basePath` configuration is set in files other than just javascript
-# code (e.g., manifest files, css files, etc.), so this section has subtle differences.
-#
-# @see: https://nextjs.org/docs/app/api-reference/next-config-js/basePath
-# @see: https://phase.dev/blog/nextjs-public-runtime-variables/
-{
-    if [ ! -z "$DOMAIN_SUB_PATH" ]; then
-        # If the sub-path is "/", this creates problems with certain replacements. For example:
-        # /BAKED_NEXT_PUBLIC_DOMAIN_SUB_PATH/_next/image -> //_next/image (notice the double slash...)
-        # To get around this, we default to an empty sub-path, which is the default when no sub-path is defined.
-        if [ "$DOMAIN_SUB_PATH" = "/" ]; then
-            DOMAIN_SUB_PATH=""
-        # Otherwise, we need to ensure that the sub-path starts with a slash, since this is a requirement
-        # for the basePath property. For example, assume DOMAIN_SUB_PATH=/bot, then:
-        # /BAKED_NEXT_PUBLIC_DOMAIN_SUB_PATH/_next/image -> /bot/_next/image
-        elif [[ ! "$DOMAIN_SUB_PATH" =~ ^/ ]]; then
-            DOMAIN_SUB_PATH="/$DOMAIN_SUB_PATH"
-        fi
-    fi
-
-    if [ ! -z "$DOMAIN_SUB_PATH" ]; then
-        echo -e "\e[34m[Info] DOMAIN_SUB_PATH was set to "$DOMAIN_SUB_PATH". Overriding default path.\e[0m"
-    fi
-
-    # Always set NEXT_PUBLIC_DOMAIN_SUB_PATH to DOMAIN_SUB_PATH (even if it is empty!!)
-    export NEXT_PUBLIC_DOMAIN_SUB_PATH="$DOMAIN_SUB_PATH"
-
-    # Iterate over _all_ files in the web directory, making substitutions for the `BAKED_` sentinal values
-    # with their actual desired runtime value.
-    find /app/packages/web -type f |
-    while read file; do
-        # @note: the leading "/" is required here as it is included at build time. See Dockerfile.
-        sed -i "s|/BAKED_NEXT_PUBLIC_DOMAIN_SUB_PATH|${NEXT_PUBLIC_DOMAIN_SUB_PATH}|g" "$file"
-    done
-}
+echo "{\"version\": \"$NEXT_PUBLIC_SOURCEBOT_VERSION\", \"install_id\": \"$SOURCEBOT_INSTALL_ID\"}" > "$FIRST_RUN_FILE"
+
+# Start the database and wait for it to be ready before starting any other service
+if [ "$DATABASE_URL" = "postgresql://postgres@localhost:5432/sourcebot" ]; then
+    su postgres -c "postgres -D $DATABASE_DATA_DIR" &
+    until pg_isready -h localhost -p 5432 -U postgres; do
+        echo -e "\e[34m[Info] Waiting for the database to be ready...\e[0m"
+        sleep 1
+    done
+
+    # Check if the database already exists, and create it if it dne
+    EXISTING_DB=$(psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname = 'sourcebot'")
+    if [ "$EXISTING_DB" = "1" ]; then
+        echo "Database 'sourcebot' already exists; skipping creation."
+    else
+        echo "Creating database 'sourcebot'..."
+        psql -U postgres -c "CREATE DATABASE \"sourcebot\""
+    fi
+fi
+
+# Run a Database migration
+echo -e "\e[34m[Info] Running database migration...\e[0m"
+yarn workspace @sourcebot/db prisma:migrate:prod
+
+# Create the log directory
+mkdir -p /var/log/sourcebot

 # Run supervisord
 exec supervisord -c /etc/supervisor/conf.d/supervisord.conf

grafana.alloy Normal file

@ -0,0 +1,31 @@
prometheus.scrape "local_app" {
targets = [
{
__address__ = "localhost:6070",
},
{
__address__ = "localhost:3060",
},
]
metrics_path = "/metrics"
scrape_timeout = "500ms"
scrape_interval = "15s"
job_name = sys.env("GRAFANA_ENVIRONMENT")
forward_to = [
prometheus.remote_write.grafana_cloud.receiver,
]
}
prometheus.remote_write "grafana_cloud" {
endpoint {
url = sys.env("GRAFANA_ENDPOINT")
basic_auth {
username = sys.env("GRAFANA_PROM_USERNAME")
password = sys.env("GRAFANA_PROM_PASSWORD")
}
}
}


@ -4,14 +4,21 @@
     "packages/*"
   ],
   "scripts": {
-    "build": "yarn workspaces run build",
+    "build": "cross-env SKIP_ENV_VALIDATION=1 yarn workspaces run build",
     "test": "yarn workspaces run test",
-    "dev": "npm-run-all --print-label --parallel dev:zoekt dev:backend dev:web",
-    "dev:zoekt": "export PATH=\"$PWD/bin:$PATH\" && zoekt-webserver -index .sourcebot/index -rpc",
-    "dev:backend": "yarn workspace @sourcebot/backend dev:watch",
-    "dev:web": "yarn workspace @sourcebot/web dev"
+    "dev": "yarn dev:prisma:migrate:dev && npm-run-all --print-label --parallel dev:zoekt dev:backend dev:web",
+    "with-env": "cross-env PATH=\"$PWD/bin:$PATH\" dotenv -e .env.development -c --",
+    "dev:zoekt": "yarn with-env zoekt-webserver -index .sourcebot/index -rpc",
+    "dev:backend": "yarn with-env yarn workspace @sourcebot/backend dev:watch",
+    "dev:web": "yarn with-env yarn workspace @sourcebot/web dev",
+    "dev:prisma:migrate:dev": "yarn with-env yarn workspace @sourcebot/db prisma:migrate:dev",
+    "dev:prisma:studio": "yarn with-env yarn workspace @sourcebot/db prisma:studio",
+    "dev:prisma:migrate:reset": "yarn with-env yarn workspace @sourcebot/db prisma:migrate:reset"
   },
   "devDependencies": {
+    "cross-env": "^7.0.3",
+    "dotenv-cli": "^8.0.0",
     "npm-run-all": "^4.1.5"
-  }
+  },
+  "packageManager": "yarn@4.7.0"
 }


@ -1 +0,0 @@
-POSTHOG_HOST=https://us.i.posthog.com


@ -1,3 +1,4 @@
 dist/
 !.env
+# Sentry Config File
 .sentryclirc


@ -5,16 +5,16 @@
   "main": "index.js",
   "type": "module",
   "scripts": {
-    "dev:watch": "yarn generate:types && tsc-watch --preserveWatchOutput --onSuccess \"yarn dev --configPath ../../config.json --cacheDir ../../.sourcebot\"",
-    "dev": "export PATH=\"$PWD/../../bin:$PATH\" && export CTAGS_COMMAND=ctags && node ./dist/index.js",
-    "build": "yarn generate:types && tsc",
-    "generate:types": "tsx tools/generateTypes.ts",
-    "test": "vitest --config ./vitest.config.ts"
+    "dev:watch": "tsc-watch --preserveWatchOutput --onSuccess \"yarn dev --cacheDir ../../.sourcebot\"",
+    "dev": "node ./dist/index.js",
+    "build": "tsc",
+    "test": "cross-env SKIP_ENV_VALIDATION=1 vitest --config ./vitest.config.ts"
   },
   "devDependencies": {
     "@types/argparse": "^2.0.16",
     "@types/micromatch": "^4.0.9",
     "@types/node": "^22.7.5",
+    "cross-env": "^7.0.3",
     "json-schema-to-typescript": "^15.0.2",
     "tsc-watch": "^6.2.0",
     "tsx": "^4.19.1",
@ -23,17 +23,34 @@
   },
   "dependencies": {
     "@gitbeaker/rest": "^40.5.1",
+    "@logtail/node": "^0.5.2",
+    "@logtail/winston": "^0.5.2",
     "@octokit/rest": "^21.0.2",
+    "@sentry/cli": "^2.42.2",
+    "@sentry/node": "^9.3.0",
+    "@sentry/profiling-node": "^9.3.0",
+    "@sourcebot/crypto": "workspace:*",
+    "@sourcebot/db": "workspace:*",
+    "@sourcebot/error": "workspace:*",
+    "@sourcebot/schemas": "workspace:*",
+    "@t3-oss/env-core": "^0.12.0",
+    "@types/express": "^5.0.0",
+    "ajv": "^8.17.1",
     "argparse": "^2.0.1",
+    "bullmq": "^5.34.10",
     "cross-fetch": "^4.0.0",
     "dotenv": "^16.4.5",
+    "express": "^4.21.2",
     "gitea-js": "^1.22.0",
     "glob": "^11.0.0",
+    "ioredis": "^5.4.2",
     "lowdb": "^7.0.1",
     "micromatch": "^4.0.8",
     "posthog-node": "^4.2.1",
+    "prom-client": "^15.1.3",
     "simple-git": "^3.27.0",
     "strip-json-comments": "^5.0.1",
-    "winston": "^3.15.0"
+    "winston": "^3.15.0",
+    "zod": "^3.24.2"
   }
 }


@ -0,0 +1,318 @@
import { Connection, ConnectionSyncStatus, PrismaClient, Prisma } from "@sourcebot/db";
import { Job, Queue, Worker } from 'bullmq';
import { Settings } from "./types.js";
import { ConnectionConfig } from "@sourcebot/schemas/v3/connection.type";
import { createLogger } from "./logger.js";
import { Redis } from 'ioredis';
import { RepoData, compileGithubConfig, compileGitlabConfig, compileGiteaConfig, compileGerritConfig } from "./repoCompileUtils.js";
import { BackendError, BackendException } from "@sourcebot/error";
import { captureEvent } from "./posthog.js";
import { env } from "./env.js";
import * as Sentry from "@sentry/node";
interface IConnectionManager {
scheduleConnectionSync: (connection: Connection) => Promise<void>;
registerPollingCallback: () => void;
dispose: () => void;
}
const QUEUE_NAME = 'connectionSyncQueue';
type JobPayload = {
connectionId: number,
orgId: number,
config: ConnectionConfig,
};
type JobResult = {
repoCount: number,
}
export class ConnectionManager implements IConnectionManager {
private worker: Worker;
private queue: Queue<JobPayload>;
private logger = createLogger('ConnectionManager');
constructor(
private db: PrismaClient,
private settings: Settings,
redis: Redis,
) {
this.queue = new Queue<JobPayload>(QUEUE_NAME, {
connection: redis,
});
this.worker = new Worker(QUEUE_NAME, this.runSyncJob.bind(this), {
connection: redis,
concurrency: this.settings.maxConnectionSyncJobConcurrency,
});
this.worker.on('completed', this.onSyncJobCompleted.bind(this));
this.worker.on('failed', this.onSyncJobFailed.bind(this));
}
public async scheduleConnectionSync(connection: Connection) {
await this.db.$transaction(async (tx) => {
await tx.connection.update({
where: { id: connection.id },
data: { syncStatus: ConnectionSyncStatus.IN_SYNC_QUEUE },
});
const connectionConfig = connection.config as unknown as ConnectionConfig;
await this.queue.add('connectionSyncJob', {
connectionId: connection.id,
orgId: connection.orgId,
config: connectionConfig,
});
this.logger.info(`Added job to queue for connection ${connection.id}`);
}).catch((err: unknown) => {
this.logger.error(`Failed to add job to queue for connection ${connection.id}: ${err}`);
});
}
public async registerPollingCallback() {
setInterval(async () => {
const connections = await this.db.connection.findMany({
where: {
syncStatus: ConnectionSyncStatus.SYNC_NEEDED,
}
});
for (const connection of connections) {
await this.scheduleConnectionSync(connection);
}
}, this.settings.resyncConnectionPollingIntervalMs);
}
private async runSyncJob(job: Job<JobPayload>): Promise<JobResult> {
const { config, orgId } = job.data;
// @note: We aren't actually doing anything with this atm.
const abortController = new AbortController();
const connection = await this.db.connection.findUnique({
where: {
id: job.data.connectionId,
},
});
if (!connection) {
const e = new BackendException(BackendError.CONNECTION_SYNC_CONNECTION_NOT_FOUND, {
message: `Connection ${job.data.connectionId} not found`,
});
Sentry.captureException(e);
throw e;
}
// Reset the syncStatusMetadata to an empty object at the start of the sync job
await this.db.connection.update({
where: {
id: job.data.connectionId,
},
data: {
syncStatus: ConnectionSyncStatus.SYNCING,
syncStatusMetadata: {}
}
})
let result: {
repoData: RepoData[],
notFound: {
users: string[],
orgs: string[],
repos: string[],
}
} = {
repoData: [],
notFound: {
users: [],
orgs: [],
repos: [],
}
};
try {
result = await (async () => {
switch (config.type) {
case 'github': {
return await compileGithubConfig(config, job.data.connectionId, orgId, this.db, abortController);
}
case 'gitlab': {
return await compileGitlabConfig(config, job.data.connectionId, orgId, this.db);
}
case 'gitea': {
return await compileGiteaConfig(config, job.data.connectionId, orgId, this.db);
}
case 'gerrit': {
return await compileGerritConfig(config, job.data.connectionId, orgId);
}
}
})();
} catch (err) {
this.logger.error(`Failed to compile repo data for connection ${job.data.connectionId}: ${err}`);
Sentry.captureException(err);
if (err instanceof BackendException) {
throw err;
} else {
throw new BackendException(BackendError.CONNECTION_SYNC_SYSTEM_ERROR, {
message: `Failed to compile repo data for connection ${job.data.connectionId}`,
});
}
}
let { repoData, notFound } = result;
// Push the information regarding not found users, orgs, and repos to the connection's syncStatusMetadata. Note that
// this won't be overwritten even if the connection job fails
await this.db.connection.update({
where: {
id: job.data.connectionId,
},
data: {
syncStatusMetadata: { notFound }
}
});
// Filter out any duplicates by external_id and external_codeHostUrl.
repoData = repoData.filter((repo, index, self) => {
return index === self.findIndex(r =>
r.external_id === repo.external_id &&
r.external_codeHostUrl === repo.external_codeHostUrl
);
})
// @note: to handle orphaned Repos we delete all RepoToConnection records for this connection,
// and then recreate them when we upsert the repos. For example, if a repo is no-longer
// captured by the connection's config (e.g., it was deleted, marked archived, etc.), it won't
// appear in the repoData array above, and so the RepoToConnection record won't be re-created.
// Repos that have no RepoToConnection records are considered orphaned and can be deleted.
await this.db.$transaction(async (tx) => {
const deleteStart = performance.now();
await tx.connection.update({
where: {
id: job.data.connectionId,
},
data: {
repos: {
deleteMany: {}
}
}
});
const deleteDuration = performance.now() - deleteStart;
this.logger.info(`Deleted all RepoToConnection records for connection ${job.data.connectionId} in ${deleteDuration}ms`);
const totalUpsertStart = performance.now();
for (const repo of repoData) {
const upsertStart = performance.now();
await tx.repo.upsert({
where: {
external_id_external_codeHostUrl_orgId: {
external_id: repo.external_id,
external_codeHostUrl: repo.external_codeHostUrl,
orgId: orgId,
}
},
update: repo,
create: repo,
})
const upsertDuration = performance.now() - upsertStart;
this.logger.info(`Upserted repo ${repo.external_id} in ${upsertDuration}ms`);
}
const totalUpsertDuration = performance.now() - totalUpsertStart;
this.logger.info(`Upserted ${repoData.length} repos in ${totalUpsertDuration}ms`);
}, { timeout: env.CONNECTION_MANAGER_UPSERT_TIMEOUT_MS });
return {
repoCount: repoData.length,
};
}
private async onSyncJobCompleted(job: Job<JobPayload>, result: JobResult) {
this.logger.info(`Connection sync job ${job.id} completed`);
const { connectionId } = job.data;
let syncStatusMetadata: Record<string, unknown> = (await this.db.connection.findUnique({
where: { id: connectionId },
select: { syncStatusMetadata: true }
}))?.syncStatusMetadata as Record<string, unknown> ?? {};
const { notFound } = syncStatusMetadata as { notFound: {
users: string[],
orgs: string[],
repos: string[],
}};
await this.db.connection.update({
where: {
id: connectionId,
},
data: {
syncStatus:
notFound.users.length > 0 ||
notFound.orgs.length > 0 ||
notFound.repos.length > 0 ? ConnectionSyncStatus.SYNCED_WITH_WARNINGS : ConnectionSyncStatus.SYNCED,
syncedAt: new Date()
}
})
captureEvent('backend_connection_sync_job_completed', {
connectionId: connectionId,
repoCount: result.repoCount,
});
}
private async onSyncJobFailed(job: Job<JobPayload> | undefined, err: unknown) {
this.logger.error(`Connection sync job failed with error: ${err}`);
Sentry.captureException(err, {
tags: {
connectionid: job?.data.connectionId,
jobId: job?.id,
queue: QUEUE_NAME,
}
});
if (job) {
const { connectionId } = job.data;
captureEvent('backend_connection_sync_job_failed', {
connectionId: connectionId,
error: err instanceof BackendException ? err.code : 'UNKNOWN',
});
// We may have pushed some metadata during the execution of the job, so we make sure to not overwrite the metadata here
let syncStatusMetadata: Record<string, unknown> = (await this.db.connection.findUnique({
where: { id: connectionId },
select: { syncStatusMetadata: true }
}))?.syncStatusMetadata as Record<string, unknown> ?? {};
if (err instanceof BackendException) {
syncStatusMetadata = {
...syncStatusMetadata,
error: err.code,
...err.metadata,
}
} else {
syncStatusMetadata = {
...syncStatusMetadata,
error: 'UNKNOWN',
}
}
await this.db.connection.update({
where: {
id: connectionId,
},
data: {
syncStatus: ConnectionSyncStatus.FAILED,
syncedAt: new Date(),
syncStatusMetadata: syncStatusMetadata as Prisma.InputJsonValue,
}
});
}
}
public dispose() {
this.worker.close();
this.queue.close();
}
}
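The @note inside runSyncJob above relies on a later pass to clean up repos that end up with no RepoToConnection rows. That pass is not part of this excerpt; below is a minimal sketch, assuming a Prisma `connections` relation on the Repo model and an `updatedAt` timestamp column (both assumptions, not the actual Sourcebot schema), of how orphaned repos could be found and deleted after a grace period.

import { PrismaClient } from "@sourcebot/db";

// Sketch only: delete repos that belong to no connection and have not been
// touched within the grace period. Field names other than `orgId` are assumed.
export const deleteOrphanedRepos = async (db: PrismaClient, orgId: number, gracePeriodMs: number) => {
    const cutoff = new Date(Date.now() - gracePeriodMs);
    const orphaned = await db.repo.findMany({
        where: {
            orgId,
            connections: { none: {} }, // no RepoToConnection rows remain (assumed relation name)
            updatedAt: { lt: cutoff }, // assumed timestamp column
        },
        select: { id: true },
    });
    if (orphaned.length === 0) {
        return 0;
    }
    await db.repo.deleteMany({
        where: { id: { in: orphaned.map(r => r.id) } },
    });
    return orphaned.length;
};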

View file

@ -0,0 +1,47 @@
import * as Sentry from "@sentry/node";
type ValidResult<T> = {
type: 'valid';
data: T[];
};
type NotFoundResult = {
type: 'notFound';
value: string;
};
type CustomResult<T> = ValidResult<T> | NotFoundResult;
export function processPromiseResults<T>(
results: PromiseSettledResult<CustomResult<T>>[],
): {
validItems: T[];
notFoundItems: string[];
} {
const validItems: T[] = [];
const notFoundItems: string[] = [];
results.forEach(result => {
if (result.status === 'fulfilled') {
const value = result.value;
if (value.type === 'valid') {
validItems.push(...value.data);
} else {
notFoundItems.push(value.value);
}
}
});
return {
validItems,
notFoundItems,
};
}
export function throwIfAnyFailed<T>(results: PromiseSettledResult<T>[]) {
const failedResult = results.find(result => result.status === 'rejected');
if (failedResult) {
Sentry.captureException(failedResult.reason);
throw failedResult.reason;
}
}
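A short usage sketch of the two helpers above: each per-item fetch settles to either a 'valid' or a 'notFound' result, hard failures are re-thrown by throwIfAnyFailed, and the rest are partitioned. The fetchRepo stub and its repo names are placeholders for illustration only.

import { processPromiseResults, throwIfAnyFailed } from "./connectionUtils.js";

type Repo = { id: number; fullName: string };

// Stub fetcher: pretend "acme/missing" does not exist upstream.
const fetchRepo = async (name: string): Promise<{ type: 'valid', data: Repo[] } | { type: 'notFound', value: string }> => {
    if (name === 'acme/missing') {
        return { type: 'notFound' as const, value: name };
    }
    return { type: 'valid' as const, data: [{ id: 1, fullName: name }] };
};

const syncRepos = async (names: string[]) => {
    const results = await Promise.allSettled(names.map(fetchRepo));
    // Re-throw hard failures (network errors, rate limits, ...) so the sync job fails.
    throwIfAnyFailed(results);
    // Partition the settled results into found repos and not-found identifiers.
    const { validItems: repos, notFoundItems: missing } = processPromiseResults<Repo>(results);
    return { repos, missing };
};

// e.g. syncRepos(['acme/widgets', 'acme/missing']) resolves to { repos: [...], missing: ['acme/missing'] }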

View file

@ -6,7 +6,12 @@ import { Settings } from "./types.js";
export const DEFAULT_SETTINGS: Settings = {
maxFileSize: 2 * 1024 * 1024, // 2MB in bytes
maxTrigramCount: 20000,
- autoDeleteStaleRepos: true,
- reindexInterval: 1000 * 60 * 60, // 1 hour in milliseconds
- resyncInterval: 1000 * 60 * 60 * 24, // 1 day in milliseconds
+ reindexIntervalMs: 1000 * 60 * 60, // 1 hour
+ resyncConnectionPollingIntervalMs: 1000 * 1, // 1 second
+ reindexRepoPollingIntervalMs: 1000 * 1, // 1 second
+ maxConnectionSyncJobConcurrency: 8,
+ maxRepoIndexingJobConcurrency: 8,
+ maxRepoGarbageCollectionJobConcurrency: 8,
+ repoGarbageCollectionGracePeriodMs: 10 * 1000, // 10 seconds
+ repoIndexTimeoutMs: 1000 * 60 * 60 * 2, // 2 hours
}

View file

@ -1,125 +0,0 @@
import { expect, test } from 'vitest';
import { DEFAULT_DB_DATA, migration_addDeleteStaleRepos, migration_addMaxFileSize, migration_addReindexInterval, migration_addResyncInterval, migration_addSettings, Schema } from './db';
import { DEFAULT_SETTINGS } from './constants';
import { DeepPartial } from './types';
import { Low } from 'lowdb';
class InMemoryAdapter<T> {
private data: T;
async read() {
return this.data;
}
async write(data: T) {
this.data = data;
}
}
export const createMockDB = (defaultData: Schema = DEFAULT_DB_DATA) => {
const db = new Low(new InMemoryAdapter<Schema>(), defaultData);
return db;
}
test('migration_addSettings adds the `settings` field with defaults if it does not exist', () => {
const schema: DeepPartial<Schema> = {};
const migratedSchema = migration_addSettings(schema as Schema);
expect(migratedSchema).toStrictEqual({
settings: DEFAULT_SETTINGS,
});
});
test('migration_addMaxFileSize adds the `maxFileSize` field with the default value if it does not exist', () => {
const schema: DeepPartial<Schema> = {
settings: {},
}
const migratedSchema = migration_addMaxFileSize(schema as Schema);
expect(migratedSchema).toStrictEqual({
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
}
});
});
test('migration_addMaxFileSize will throw if `settings` is not defined', () => {
const schema: DeepPartial<Schema> = {};
expect(() => migration_addMaxFileSize(schema as Schema)).toThrow();
});
test('migration_addDeleteStaleRepos adds the `autoDeleteStaleRepos` field with the default value if it does not exist', () => {
const schema: DeepPartial<Schema> = {
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
},
}
const migratedSchema = migration_addDeleteStaleRepos(schema as Schema);
expect(migratedSchema).toStrictEqual({
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
autoDeleteStaleRepos: DEFAULT_SETTINGS.autoDeleteStaleRepos,
}
});
});
test('migration_addReindexInterval adds the `reindexInterval` field with the default value if it does not exist', () => {
const schema: DeepPartial<Schema> = {
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
autoDeleteStaleRepos: DEFAULT_SETTINGS.autoDeleteStaleRepos,
},
}
const migratedSchema = migration_addReindexInterval(schema as Schema);
expect(migratedSchema).toStrictEqual({
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
autoDeleteStaleRepos: DEFAULT_SETTINGS.autoDeleteStaleRepos,
reindexInterval: DEFAULT_SETTINGS.reindexInterval,
}
});
});
test('migration_addReindexInterval preserves existing reindexInterval value if already set', () => {
const customInterval = 60;
const schema: DeepPartial<Schema> = {
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
reindexInterval: customInterval,
},
}
const migratedSchema = migration_addReindexInterval(schema as Schema);
expect(migratedSchema.settings.reindexInterval).toBe(customInterval);
});
test('migration_addResyncInterval adds the `resyncInterval` field with the default value if it does not exist', () => {
const schema: DeepPartial<Schema> = {
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
autoDeleteStaleRepos: DEFAULT_SETTINGS.autoDeleteStaleRepos,
},
}
const migratedSchema = migration_addResyncInterval(schema as Schema);
expect(migratedSchema).toStrictEqual({
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
autoDeleteStaleRepos: DEFAULT_SETTINGS.autoDeleteStaleRepos,
resyncInterval: DEFAULT_SETTINGS.resyncInterval,
}
});
});
test('migration_addResyncInterval preserves existing resyncInterval value if already set', () => {
const customInterval = 120;
const schema: DeepPartial<Schema> = {
settings: {
maxFileSize: DEFAULT_SETTINGS.maxFileSize,
resyncInterval: customInterval,
},
}
const migratedSchema = migration_addResyncInterval(schema as Schema);
expect(migratedSchema.settings.resyncInterval).toBe(customInterval);
});

View file

@ -1,123 +0,0 @@
import { JSONFilePreset } from "lowdb/node";
import { type Low } from "lowdb";
import { AppContext, Repository, Settings } from "./types.js";
import { DEFAULT_SETTINGS } from "./constants.js";
import { createLogger } from "./logger.js";
const logger = createLogger('db');
export type Schema = {
settings: Settings,
repos: {
[key: string]: Repository;
}
}
export const DEFAULT_DB_DATA: Schema = {
repos: {},
settings: DEFAULT_SETTINGS,
}
export type Database = Low<Schema>;
export const loadDB = async (ctx: AppContext): Promise<Database> => {
const db = await JSONFilePreset<Schema>(`${ctx.cachePath}/db.json`, DEFAULT_DB_DATA);
await applyMigrations(db);
return db;
}
export const updateRepository = async (repoId: string, data: Repository, db: Database) => {
db.data.repos[repoId] = {
...db.data.repos[repoId],
...data,
}
await db.write();
}
export const updateSettings = async (settings: Settings, db: Database) => {
db.data.settings = settings;
await db.write();
}
export const createRepository = async (repo: Repository, db: Database) => {
db.data.repos[repo.id] = repo;
await db.write();
}
export const applyMigrations = async (db: Database) => {
const log = (name: string) => {
logger.info(`Applying migration '${name}'`);
}
await db.update((schema) => {
// @NOTE: please ensure new migrations are added after older ones!
schema = migration_addSettings(schema, log);
schema = migration_addMaxFileSize(schema, log);
schema = migration_addDeleteStaleRepos(schema, log);
schema = migration_addReindexInterval(schema, log);
schema = migration_addResyncInterval(schema, log);
return schema;
});
}
/**
* @see: https://github.com/sourcebot-dev/sourcebot/pull/118
*/
export const migration_addSettings = (schema: Schema, log?: (name: string) => void) => {
if (!schema.settings) {
log?.("addSettings");
schema.settings = DEFAULT_SETTINGS;
}
return schema;
}
/**
* @see: https://github.com/sourcebot-dev/sourcebot/pull/118
*/
export const migration_addMaxFileSize = (schema: Schema, log?: (name: string) => void) => {
if (!schema.settings.maxFileSize) {
log?.("addMaxFileSize");
schema.settings.maxFileSize = DEFAULT_SETTINGS.maxFileSize;
}
return schema;
}
/**
* @see: https://github.com/sourcebot-dev/sourcebot/pull/128
*/
export const migration_addDeleteStaleRepos = (schema: Schema, log?: (name: string) => void) => {
if (schema.settings.autoDeleteStaleRepos === undefined) {
log?.("addDeleteStaleRepos");
schema.settings.autoDeleteStaleRepos = DEFAULT_SETTINGS.autoDeleteStaleRepos;
}
return schema;
}
/**
* @see: https://github.com/sourcebot-dev/sourcebot/pull/134
*/
export const migration_addReindexInterval = (schema: Schema, log?: (name: string) => void) => {
if (schema.settings.reindexInterval === undefined) {
log?.("addReindexInterval");
schema.settings.reindexInterval = DEFAULT_SETTINGS.reindexInterval;
}
return schema;
}
/**
* @see: https://github.com/sourcebot-dev/sourcebot/pull/134
*/
export const migration_addResyncInterval = (schema: Schema, log?: (name: string) => void) => {
if (schema.settings.resyncInterval === undefined) {
log?.("addResyncInterval");
schema.settings.resyncInterval = DEFAULT_SETTINGS.resyncInterval;
}
return schema;
}

View file

@ -0,0 +1,52 @@
import { createEnv } from "@t3-oss/env-core";
import { z } from "zod";
import dotenv from 'dotenv';
// Booleans are specified as 'true' or 'false' strings.
const booleanSchema = z.enum(["true", "false"]);
// Numbers are treated as strings in .env files.
// coerce helps us convert them to numbers.
// @see: https://zod.dev/?id=coercion-for-primitives
const numberSchema = z.coerce.number();
dotenv.config({
path: './.env',
});
dotenv.config({
path: './.env.local',
override: true
});
export const env = createEnv({
server: {
SOURCEBOT_ENCRYPTION_KEY: z.string(),
SOURCEBOT_LOG_LEVEL: z.enum(["info", "debug", "warn", "error"]).default("info"),
SOURCEBOT_TELEMETRY_DISABLED: booleanSchema.default("false"),
SOURCEBOT_INSTALL_ID: z.string().default("unknown"),
NEXT_PUBLIC_SOURCEBOT_VERSION: z.string().default("unknown"),
NEXT_PUBLIC_POSTHOG_PAPIK: z.string().optional(),
FALLBACK_GITHUB_CLOUD_TOKEN: z.string().optional(),
FALLBACK_GITLAB_CLOUD_TOKEN: z.string().optional(),
FALLBACK_GITEA_CLOUD_TOKEN: z.string().optional(),
REDIS_URL: z.string().url().default("redis://localhost:6379"),
NEXT_PUBLIC_SENTRY_BACKEND_DSN: z.string().optional(),
NEXT_PUBLIC_SENTRY_ENVIRONMENT: z.string().optional(),
LOGTAIL_TOKEN: z.string().optional(),
LOGTAIL_HOST: z.string().url().optional(),
DATABASE_URL: z.string().url().default("postgresql://postgres:postgres@localhost:5432/postgres"),
CONFIG_PATH: z.string().optional(),
CONNECTION_MANAGER_UPSERT_TIMEOUT_MS: numberSchema.default(10000),
},
runtimeEnv: process.env,
emptyStringAsUndefined: true,
skipValidation: process.env.SKIP_ENV_VALIDATION === "1",
});
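As the comments above note, everything in process.env arrives as a string: booleanSchema keeps the 'true'/'false' enum values and numberSchema coerces numeric strings. A small stand-alone illustration of the same zod schemas (values made up):

import { z } from "zod";

const booleanSchema = z.enum(["true", "false"]);
const numberSchema = z.coerce.number();

// "false" stays the string enum value; compare against "true" where a real boolean is needed.
const telemetryDisabled = booleanSchema.default("false").parse(process.env.SOURCEBOT_TELEMETRY_DISABLED) === "true";

// "10000" (a string in .env) is coerced to the number 10000; undefined falls back to the default.
const upsertTimeoutMs = numberSchema.default(10000).parse(process.env.CONNECTION_MANAGER_UPSERT_TIMEOUT_MS);

console.log({ telemetryDisabled, upsertTimeoutMs });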

View file

@ -1,23 +0,0 @@
import dotenv from 'dotenv';
export const getEnv = (env: string | undefined, defaultValue?: string) => {
return env ?? defaultValue;
}
export const getEnvBoolean = (env: string | undefined, defaultValue: boolean) => {
if (!env) {
return defaultValue;
}
return env === 'true' || env === '1';
}
dotenv.config({
path: './.env',
});
export const SOURCEBOT_LOG_LEVEL = getEnv(process.env.SOURCEBOT_LOG_LEVEL, 'info')!;
export const SOURCEBOT_TELEMETRY_DISABLED = getEnvBoolean(process.env.SOURCEBOT_TELEMETRY_DISABLED, false)!;
export const SOURCEBOT_INSTALL_ID = getEnv(process.env.SOURCEBOT_INSTALL_ID, 'unknown')!;
export const SOURCEBOT_VERSION = getEnv(process.env.SOURCEBOT_VERSION, 'unknown')!;
export const POSTHOG_PAPIK = getEnv(process.env.POSTHOG_PAPIK);
export const POSTHOG_HOST = getEnv(process.env.POSTHOG_HOST);

View file

@ -1,9 +1,11 @@
import fetch from 'cross-fetch';
- import { GerritConfig } from './schemas/v2.js';
- import { AppContext, GitRepository } from './types.js';
+ import { GerritConfig } from "@sourcebot/schemas/v2/index.type"
import { createLogger } from './logger.js';
- import path from 'path';
- import { measure, marshalBool, excludeReposByName, includeReposByName } from './utils.js';
+ import micromatch from "micromatch";
+ import { measure, fetchWithRetry } from './utils.js';
+ import { BackendError } from '@sourcebot/error';
+ import { BackendException } from '@sourcebot/error';
+ import * as Sentry from "@sentry/node";
// https://gerrit-review.googlesource.com/Documentation/rest-api.html
interface GerritProjects {
@ -16,6 +18,13 @@ interface GerritProjectInfo {
web_links?: GerritWebLink[];
}
+ interface GerritProject {
+ name: string;
+ id: string;
+ state?: string;
+ web_links?: GerritWebLink[];
+ }
interface GerritWebLink {
name: string;
url: string;
@ -23,86 +32,55 @@ interface GerritWebLink {
const logger = createLogger('Gerrit');
- export const getGerritReposFromConfig = async (config: GerritConfig, ctx: AppContext): Promise<GitRepository[]> => {
+ export const getGerritReposFromConfig = async (config: GerritConfig): Promise<GerritProject[]> => {
const url = config.url.endsWith('/') ? config.url : `${config.url}/`;
const hostname = new URL(config.url).hostname;
- const { durationMs, data: projects } = await measure(async () => {
+ let { durationMs, data: projects } = await measure(async () => {
try {
- return fetchAllProjects(url)
+ const fetchFn = () => fetchAllProjects(url);
+ return fetchWithRetry(fetchFn, `projects from ${url}`, logger);
} catch (err) {
+ Sentry.captureException(err);
+ if (err instanceof BackendException) {
+ throw err;
+ }
logger.error(`Failed to fetch projects from ${url}`, err);
return null;
}
});
if (!projects) {
- return [];
+ const e = new Error(`Failed to fetch projects from ${url}`);
+ Sentry.captureException(e);
+ throw e;
}
// exclude "All-Projects" and "All-Users" projects
- delete projects['All-Projects'];
- delete projects['All-Users'];
- delete projects['All-Avatars']
- delete projects['All-Archived-Projects']
- logger.debug(`Fetched ${Object.keys(projects).length} projects in ${durationMs}ms.`);
- let repos: GitRepository[] = Object.keys(projects).map((projectName) => {
- const project = projects[projectName];
- let webUrl = "https://www.gerritcodereview.com/";
- // Gerrit projects can have multiple web links; use the first one
- if (project.web_links) {
- const webLink = project.web_links[0];
- if (webLink) {
- webUrl = webLink.url;
- }
- }
- const repoId = `${hostname}/${projectName}`;
- const repoPath = path.resolve(path.join(ctx.reposPath, `${repoId}.git`));
- const cloneUrl = `${url}${encodeURIComponent(projectName)}`;
- return {
- vcs: 'git',
- codeHost: 'gerrit',
- name: projectName,
- id: repoId,
- cloneUrl: cloneUrl,
- path: repoPath,
- isStale: false, // Gerrit projects are typically not stale
- isFork: false, // Gerrit doesn't have forks in the same way as GitHub
- isArchived: false,
- gitConfigMetadata: {
- // Gerrit uses Gitiles for web UI. This can sometimes be "browse" type in zoekt
- 'zoekt.web-url-type': 'gitiles',
- 'zoekt.web-url': webUrl,
- 'zoekt.name': repoId,
- 'zoekt.archived': marshalBool(false),
- 'zoekt.fork': marshalBool(false),
- 'zoekt.public': marshalBool(true), // Assuming projects are public; adjust as needed
- },
- branches: [],
- tags: []
- } satisfies GitRepository;
- });
+ const excludedProjects = ['All-Projects', 'All-Users', 'All-Avatars', 'All-Archived-Projects'];
+ projects = projects.filter(project => !excludedProjects.includes(project.name));
// include repos by glob if specified in config
if (config.projects) {
- repos = includeReposByName(repos, config.projects);
+ projects = projects.filter((project) => {
+ return micromatch.isMatch(project.name, config.projects!);
+ });
}
if (config.exclude && config.exclude.projects) {
- repos = excludeReposByName(repos, config.exclude.projects);
+ projects = projects.filter((project) => {
+ return !micromatch.isMatch(project.name, config.exclude!.projects!);
+ });
}
- return repos;
+ logger.debug(`Fetched ${Object.keys(projects).length} projects in ${durationMs}ms.`);
+ return projects;
};
- const fetchAllProjects = async (url: string): Promise<GerritProjects> => {
+ const fetchAllProjects = async (url: string): Promise<GerritProject[]> => {
const projectsEndpoint = `${url}projects/`;
- let allProjects: GerritProjects = {};
+ let allProjects: GerritProject[] = [];
let start = 0; // Start offset for pagination
let hasMoreProjects = true;
@ -110,17 +88,43 @@ const fetchAllProjects = async (url: string): Promise<GerritProjects> => {
const endpointWithParams = `${projectsEndpoint}?S=${start}`;
logger.debug(`Fetching projects from Gerrit at ${endpointWithParams}`);
- const response = await fetch(endpointWithParams);
- if (!response.ok) {
- throw new Error(`Failed to fetch projects from Gerrit: ${response.statusText}`);
- }
+ let response: Response;
+ try {
+ response = await fetch(endpointWithParams);
+ if (!response.ok) {
+ console.log(`Failed to fetch projects from Gerrit at ${endpointWithParams} with status ${response.status}`);
+ const e = new BackendException(BackendError.CONNECTION_SYNC_FAILED_TO_FETCH_GERRIT_PROJECTS, {
+ status: response.status,
+ });
+ Sentry.captureException(e);
+ throw e;
+ }
+ } catch (err) {
+ Sentry.captureException(err);
+ if (err instanceof BackendException) {
+ throw err;
+ }
+ const status = (err as any).code;
+ console.log(`Failed to fetch projects from Gerrit at ${endpointWithParams} with status ${status}`);
+ throw new BackendException(BackendError.CONNECTION_SYNC_FAILED_TO_FETCH_GERRIT_PROJECTS, {
+ status: status,
+ });
+ }
const text = await response.text();
const jsonText = text.replace(")]}'\n", ''); // Remove XSSI protection prefix
const data: GerritProjects = JSON.parse(jsonText);
- // Merge the current batch of projects with allProjects
- Object.assign(allProjects, data);
+ // Add fetched projects to allProjects
+ for (const [projectName, projectInfo] of Object.entries(data)) {
+ allProjects.push({
+ name: projectName,
+ id: projectInfo.id,
+ state: projectInfo.state,
+ web_links: projectInfo.web_links
+ })
+ }
// Check if there are more projects to fetch
hasMoreProjects = Object.values(data).some(
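getGerritReposFromConfig above (and the GitHub, GitLab, and Gitea compilers later in this diff) wrap their list calls in fetchWithRetry from utils.js, whose implementation is not part of this excerpt. A minimal sketch of what such a helper could look like; the attempt count and backoff schedule are assumptions, not the actual utils.js code.

// Sketch only: retry a fetch-like callback with exponential backoff before giving up.
export const fetchWithRetry = async <T>(
    fetchFn: () => Promise<T>,
    identifier: string,
    logger: { warn: (message: string) => void },
    maxAttempts = 3,
): Promise<T> => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await fetchFn();
        } catch (err) {
            lastError = err;
            logger.warn(`Attempt ${attempt}/${maxAttempts} failed for ${identifier}: ${err}`);
            if (attempt < maxAttempts) {
                // Back off 1s, 2s, 4s, ... between attempts.
                await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
            }
        }
    }
    throw lastError;
};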

View file

@ -1,48 +1,41 @@
- import { GitRepository, AppContext } from './types.js';
import { simpleGit, SimpleGitProgressEvent } from 'simple-git';
- import { existsSync } from 'fs';
- import { createLogger } from './logger.js';
- import { GitConfig } from './schemas/v2.js';
- import path from 'path';
- const logger = createLogger('git');
- export const cloneRepository = async (repo: GitRepository, onProgress?: (event: SimpleGitProgressEvent) => void) => {
- if (existsSync(repo.path)) {
- logger.warn(`${repo.id} already exists. Skipping clone.`)
- return;
- }
+ export const cloneRepository = async (cloneURL: string, path: string, gitConfig?: Record<string, string>, onProgress?: (event: SimpleGitProgressEvent) => void) => {
const git = simpleGit({
progress: onProgress,
});
- const gitConfig = Object.entries(repo.gitConfigMetadata ?? {}).flatMap(
+ const configParams = Object.entries(gitConfig ?? {}).flatMap(
([key, value]) => ['--config', `${key}=${value}`]
);
+ try {
await git.clone(
- repo.cloneUrl,
- repo.path,
+ cloneURL,
+ path,
[
"--bare",
- ...gitConfig
+ ...configParams
]
);
await git.cwd({
- path: repo.path,
+ path,
}).addConfig("remote.origin.fetch", "+refs/heads/*:refs/heads/*");
+ } catch (error) {
+ throw new Error(`Failed to clone repository`);
+ }
}
- export const fetchRepository = async (repo: GitRepository, onProgress?: (event: SimpleGitProgressEvent) => void) => {
+ export const fetchRepository = async (path: string, onProgress?: (event: SimpleGitProgressEvent) => void) => {
const git = simpleGit({
progress: onProgress,
});
+ try {
await git.cwd({
- path: repo.path,
+ path: path,
}).fetch(
"origin",
[
@ -50,81 +43,24 @@ export const fetchRepository = async (repo: GitRepository, onProgress?: (event:
"--progress" "--progress"
] ]
); );
}
const isValidGitRepo = async (url: string): Promise<boolean> => {
const git = simpleGit();
try {
await git.listRemote([url]);
return true;
} catch (error) { } catch (error) {
logger.debug(`Error checking if ${url} is a valid git repo: ${error}`); throw new Error(`Failed to fetch repository ${path}`);
return false;
} }
} }
const stripProtocolAndGitSuffix = (url: string): string => { export const getBranches = async (path: string) => {
return url.replace(/^[a-zA-Z]+:\/\//, '').replace(/\.git$/, '');
}
const getRepoNameFromUrl = (url: string): string => {
const strippedUrl = stripProtocolAndGitSuffix(url);
return strippedUrl.split('/').slice(-2).join('/');
}
export const getGitRepoFromConfig = async (config: GitConfig, ctx: AppContext) => {
const repoValid = await isValidGitRepo(config.url);
if (!repoValid) {
logger.error(`Git repo provided in config with url ${config.url} is not valid`);
return null;
}
const cloneUrl = config.url;
const repoId = stripProtocolAndGitSuffix(cloneUrl);
const repoName = getRepoNameFromUrl(config.url);
const repoPath = path.resolve(path.join(ctx.reposPath, `${repoId}.git`));
const repo: GitRepository = {
vcs: 'git',
id: repoId,
name: repoName,
path: repoPath,
isStale: false,
cloneUrl: cloneUrl,
branches: [],
tags: [],
}
if (config.revisions) {
if (config.revisions.branches) {
const branchGlobs = config.revisions.branches;
const git = simpleGit(); const git = simpleGit();
const branchList = await git.listRemote(['--heads', cloneUrl]); const branches = await git.cwd({
const branches = branchList path,
.split('\n') }).branch();
.map(line => line.split('\t')[1])
.filter(Boolean)
.map(branch => branch.replace('refs/heads/', ''));
repo.branches = branches.filter(branch => return branches.all;
branchGlobs.some(glob => new RegExp(glob).test(branch))
);
} }
if (config.revisions.tags) { export const getTags = async (path: string) => {
const tagGlobs = config.revisions.tags;
const git = simpleGit(); const git = simpleGit();
const tagList = await git.listRemote(['--tags', cloneUrl]); const tags = await git.cwd({
const tags = tagList path,
.split('\n') }).tags();
.map(line => line.split('\t')[1]) return tags.all;
.filter(Boolean)
.map(tag => tag.replace('refs/tags/', ''));
repo.tags = tags.filter(tag =>
tagGlobs.some(glob => new RegExp(glob).test(tag))
);
}
}
return repo;
} }
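The refactored helpers above take a plain clone URL, an on-disk path, and an optional git-config record instead of a GitRepository object. A hedged usage sketch follows; the module path ./git.js, the URL, the destination path, and the git config keys are placeholder assumptions.

import { cloneRepository, fetchRepository, getBranches, getTags } from "./git.js";

const run = async () => {
    const cloneURL = "https://example.com/acme/widgets.git"; // placeholder
    const repoPath = "/data/repos/example.com/acme/widgets.git"; // placeholder

    // Bare-clone with per-repo git config applied via `--config key=value`.
    await cloneRepository(cloneURL, repoPath, {
        "zoekt.name": "example.com/acme/widgets",
        "zoekt.web-url": "https://example.com/acme/widgets",
    }, (event) => console.log(`${event.method} ${event.stage}: ${event.progress}%`));

    // Subsequent syncs only need the on-disk path.
    await fetchRepository(repoPath, (event) => console.log(`fetch: ${event.progress}%`));

    console.log(await getBranches(repoPath), await getTags(repoPath));
};

run();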

View file

@ -1,160 +1,132 @@
import { Api, giteaApi, HttpResponse, Repository as GiteaRepository } from 'gitea-js'; import { Api, giteaApi, HttpResponse, Repository as GiteaRepository } from 'gitea-js';
import { GiteaConfig } from './schemas/v2.js'; import { GiteaConnectionConfig } from '@sourcebot/schemas/v3/gitea.type';
import { excludeArchivedRepos, excludeForkedRepos, excludeReposByName, getTokenFromConfig, marshalBool, measure } from './utils.js'; import { getTokenFromConfig, measure } from './utils.js';
import { AppContext, GitRepository } from './types.js';
import fetch from 'cross-fetch'; import fetch from 'cross-fetch';
import { createLogger } from './logger.js'; import { createLogger } from './logger.js';
import path from 'path';
import micromatch from 'micromatch'; import micromatch from 'micromatch';
import { PrismaClient } from '@sourcebot/db';
import { processPromiseResults, throwIfAnyFailed } from './connectionUtils.js';
import * as Sentry from "@sentry/node";
import { env } from './env.js';
const logger = createLogger('Gitea'); const logger = createLogger('Gitea');
const GITEA_CLOUD_HOSTNAME = "gitea.com";
export const getGiteaReposFromConfig = async (config: GiteaConfig, ctx: AppContext) => { export const getGiteaReposFromConfig = async (config: GiteaConnectionConfig, orgId: number, db: PrismaClient) => {
const token = config.token ? getTokenFromConfig(config.token, ctx) : undefined; const hostname = config.url ?
new URL(config.url).hostname :
GITEA_CLOUD_HOSTNAME;
const token = config.token ?
await getTokenFromConfig(config.token, orgId, db, logger) :
hostname === GITEA_CLOUD_HOSTNAME ?
env.FALLBACK_GITEA_CLOUD_TOKEN :
undefined;
const api = giteaApi(config.url ?? 'https://gitea.com', { const api = giteaApi(config.url ?? 'https://gitea.com', {
token, token: token,
customFetch: fetch, customFetch: fetch,
}); });
let allRepos: GiteaRepository[] = []; let allRepos: GiteaRepository[] = [];
let notFound: {
users: string[],
orgs: string[],
repos: string[],
} = {
users: [],
orgs: [],
repos: [],
};
if (config.orgs) { if (config.orgs) {
const _repos = await getReposForOrgs(config.orgs, api); const { validRepos, notFoundOrgs } = await getReposForOrgs(config.orgs, api);
allRepos = allRepos.concat(_repos); allRepos = allRepos.concat(validRepos);
notFound.orgs = notFoundOrgs;
} }
if (config.repos) { if (config.repos) {
const _repos = await getRepos(config.repos, api); const { validRepos, notFoundRepos } = await getRepos(config.repos, api);
allRepos = allRepos.concat(_repos); allRepos = allRepos.concat(validRepos);
notFound.repos = notFoundRepos;
} }
if (config.users) { if (config.users) {
const _repos = await getReposOwnedByUsers(config.users, api); const { validRepos, notFoundUsers } = await getReposOwnedByUsers(config.users, api);
allRepos = allRepos.concat(_repos); allRepos = allRepos.concat(validRepos);
notFound.users = notFoundUsers;
} }
let repos: GitRepository[] = allRepos allRepos = allRepos.filter(repo => repo.full_name !== undefined);
.map((repo) => { allRepos = allRepos.filter(repo => {
const hostname = config.url ? new URL(config.url).hostname : 'gitea.com'; if (repo.full_name === undefined) {
const repoId = `${hostname}/${repo.full_name!}`; logger.warn(`Repository with undefined full_name found: orgId=${orgId}, repoId=${repo.id}`);
const repoPath = path.resolve(path.join(ctx.reposPath, `${repoId}.git`)); return false;
const cloneUrl = new URL(repo.clone_url!);
if (token) {
cloneUrl.username = token;
} }
return true;
return {
vcs: 'git',
codeHost: 'gitea',
name: repo.full_name!,
id: repoId,
cloneUrl: cloneUrl.toString(),
path: repoPath,
isStale: false,
isFork: repo.fork!,
isArchived: !!repo.archived,
gitConfigMetadata: {
'zoekt.web-url-type': 'gitea',
'zoekt.web-url': repo.html_url!,
'zoekt.name': repoId,
'zoekt.archived': marshalBool(repo.archived),
'zoekt.fork': marshalBool(repo.fork!),
'zoekt.public': marshalBool(repo.internal === false && repo.private === false),
},
branches: [],
tags: []
} satisfies GitRepository;
}); });
if (config.exclude) { let repos = allRepos
if (!!config.exclude.forks) { .filter((repo) => {
repos = excludeForkedRepos(repos, logger); const isExcluded = shouldExcludeRepo({
} repo,
exclude: config.exclude,
});
if (!!config.exclude.archived) { return !isExcluded;
repos = excludeArchivedRepos(repos, logger); });
}
if (config.exclude.repos) {
repos = excludeReposByName(repos, config.exclude.repos, logger);
}
}
logger.debug(`Found ${repos.length} total repositories.`); logger.debug(`Found ${repos.length} total repositories.`);
if (config.revisions) {
if (config.revisions.branches) {
const branchGlobs = config.revisions.branches;
repos = await Promise.all(
repos.map(async (repo) => {
const [owner, name] = repo.name.split('/');
let branches = (await getBranchesForRepo(owner, name, api)).map(branch => branch.name!);
branches = micromatch.match(branches, branchGlobs);
return { return {
...repo, validRepos: repos,
branches, notFound,
}; };
})
)
} }
if (config.revisions.tags) { const shouldExcludeRepo = ({
const tagGlobs = config.revisions.tags; repo,
repos = await Promise.all( exclude
repos.map(async (repo) => { } : {
const [owner, name] = repo.name.split('/'); repo: GiteaRepository,
let tags = (await getTagsForRepo(owner, name, api)).map(tag => tag.name!); exclude?: {
tags = micromatch.match(tags, tagGlobs); forks?: boolean,
archived?: boolean,
repos?: string[],
}
}) => {
let reason = '';
const repoName = repo.full_name!;
return { const shouldExclude = (() => {
...repo, if (!!exclude?.forks && repo.fork) {
tags, reason = `\`exclude.forks\` is true`;
}; return true;
}) }
)
if (!!exclude?.archived && !!repo.archived) {
reason = `\`exclude.archived\` is true`;
return true;
}
if (exclude?.repos) {
if (micromatch.isMatch(repoName, exclude.repos)) {
reason = `\`exclude.repos\` contains ${repoName}`;
return true;
} }
} }
return repos; return false;
})();
if (shouldExclude) {
logger.debug(`Excluding repo ${repoName}. Reason: ${reason}`);
} }
const getTagsForRepo = async <T>(owner: string, repo: string, api: Api<T>) => { return shouldExclude;
try {
logger.debug(`Fetching tags for repo ${owner}/${repo}...`);
const { durationMs, data: tags } = await measure(() =>
paginate((page) => api.repos.repoListTags(owner, repo, {
page
}))
);
logger.debug(`Found ${tags.length} tags in repo ${owner}/${repo} in ${durationMs}ms.`);
return tags;
} catch (e) {
logger.error(`Failed to fetch tags for repo ${owner}/${repo}.`, e);
return [];
}
}
const getBranchesForRepo = async <T>(owner: string, repo: string, api: Api<T>) => {
try {
logger.debug(`Fetching branches for repo ${owner}/${repo}...`);
const { durationMs, data: branches } = await measure(() =>
paginate((page) => api.repos.repoListBranches(owner, repo, {
page
}))
);
logger.debug(`Found ${branches.length} branches in repo ${owner}/${repo} in ${durationMs}ms.`);
return branches;
} catch (e) {
logger.error(`Failed to fetch branches for repo ${owner}/${repo}.`, e);
return [];
}
} }
const getReposOwnedByUsers = async <T>(users: string[], api: Api<T>) => { const getReposOwnedByUsers = async <T>(users: string[], api: Api<T>) => {
const repos = (await Promise.all(users.map(async (user) => { const results = await Promise.allSettled(users.map(async (user) => {
try { try {
logger.debug(`Fetching repos for user ${user}...`); logger.debug(`Fetching repos for user ${user}...`);
@ -165,18 +137,35 @@ const getReposOwnedByUsers = async <T>(users: string[], api: Api<T>) => {
);
logger.debug(`Found ${data.length} repos owned by user ${user} in ${durationMs}ms.`);
- return data;
- } catch (e) {
- logger.error(`Failed to fetch repos for user ${user}.`, e);
- return [];
- }
- }))).flat();
- return repos;
+ return {
+ type: 'valid' as const,
+ data
+ };
+ } catch (e: any) {
+ Sentry.captureException(e);
+ if (e?.status === 404) {
+ logger.error(`User ${user} not found or no access`);
+ return {
+ type: 'notFound' as const,
+ value: user
+ };
+ }
+ throw e;
+ }
+ }));
+ throwIfAnyFailed(results);
+ const { validItems: validRepos, notFoundItems: notFoundUsers } = processPromiseResults<GiteaRepository>(results);
+ return {
+ validRepos,
+ notFoundUsers,
+ };
}
const getReposForOrgs = async <T>(orgs: string[], api: Api<T>) => {
- return (await Promise.all(orgs.map(async (org) => {
+ const results = await Promise.allSettled(orgs.map(async (org) => {
try {
logger.debug(`Fetching repos for org ${org}...`);
@ -188,16 +177,35 @@ const getReposForOrgs = async <T>(orgs: string[], api: Api<T>) => {
);
logger.debug(`Found ${data.length} repos for org ${org} in ${durationMs}ms.`);
- return data;
- } catch (e) {
- logger.error(`Failed to fetch repos for org ${org}.`, e);
- return [];
- }
- }))).flat();
+ return {
+ type: 'valid' as const,
+ data
+ };
+ } catch (e: any) {
+ Sentry.captureException(e);
+ if (e?.status === 404) {
+ logger.error(`Organization ${org} not found or no access`);
+ return {
+ type: 'notFound' as const,
+ value: org
+ };
+ }
+ throw e;
+ }
+ }));
+ throwIfAnyFailed(results);
+ const { validItems: validRepos, notFoundItems: notFoundOrgs } = processPromiseResults<GiteaRepository>(results);
+ return {
+ validRepos,
+ notFoundOrgs,
+ };
}
const getRepos = async <T>(repos: string[], api: Api<T>) => {
- return (await Promise.all(repos.map(async (repo) => {
+ const results = await Promise.allSettled(repos.map(async (repo) => {
try {
logger.debug(`Fetching repository info for ${repo}...`);
@ -207,13 +215,31 @@ const getRepos = async <T>(repos: string[], api: Api<T>) => {
);
logger.debug(`Found repo ${repo} in ${durationMs}ms.`);
- return [response.data];
- } catch (e) {
- logger.error(`Failed to fetch repository info for ${repo}.`, e);
- return [];
- }
- }))).flat();
+ return {
+ type: 'valid' as const,
+ data: [response.data]
+ };
+ } catch (e: any) {
+ Sentry.captureException(e);
+ if (e?.status === 404) {
+ logger.error(`Repository ${repo} not found or no access`);
+ return {
+ type: 'notFound' as const,
+ value: repo
+ };
+ }
+ throw e;
+ }
+ }));
+ throwIfAnyFailed(results);
+ const { validItems: validRepos, notFoundItems: notFoundRepos } = processPromiseResults<GiteaRepository>(results);
+ return {
+ validRepos,
+ notFoundRepos,
+ };
}
// @see : https://docs.gitea.com/development/api-usage#pagination
@ -224,7 +250,9 @@ const paginate = async <T>(request: (page: number) => Promise<HttpResponse<T[],
const totalCountString = result.headers.get('x-total-count');
if (!totalCountString) {
- throw new Error("Header 'x-total-count' not found");
+ const e = new Error("Header 'x-total-count' not found");
+ Sentry.captureException(e);
+ throw e;
}
const totalCount = parseInt(totalCountString);
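The Gitea client pages through results using the x-total-count header referenced above; the tail of paginate is cut off in this view. A self-contained sketch of the same header-driven pagination pattern, assuming a callback that returns one page of data plus the response headers (the type and function names here are illustrative, not the actual gitea.ts code).

type Page<T> = { data: T[]; headers: { get: (name: string) => string | null } };

// Sketch: keep requesting pages until we've accumulated `x-total-count` items.
export const paginateByTotalCount = async <T>(request: (page: number) => Promise<Page<T>>): Promise<T[]> => {
    const items: T[] = [];
    let page = 1;
    let totalCount = Infinity;
    while (items.length < totalCount) {
        const result = await request(page);
        const totalCountHeader = result.headers.get('x-total-count');
        if (!totalCountHeader) {
            throw new Error("Header 'x-total-count' not found");
        }
        totalCount = parseInt(totalCountHeader);
        if (result.data.length === 0) {
            break; // defensive: avoid looping forever on an empty page
        }
        items.push(...result.data);
        page++;
    }
    return items;
};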

View file

@ -0,0 +1,206 @@
import { expect, test } from 'vitest';
import { OctokitRepository, shouldExcludeRepo } from './github';
test('shouldExcludeRepo returns true when clone_url is undefined', () => {
const repo = { full_name: 'test/repo' } as OctokitRepository;
expect(shouldExcludeRepo({
repo,
})).toBe(true);
});
test('shouldExcludeRepo returns false when the repo is not excluded.', () => {
const repo = {
full_name: 'test/repo',
clone_url: 'https://github.com/test/repo.git',
} as OctokitRepository;
expect(shouldExcludeRepo({
repo,
})).toBe(false);
});
test('shouldExcludeRepo handles forked repos correctly', () => {
const repo = {
full_name: 'test/forked-repo',
clone_url: 'https://github.com/test/forked-repo.git',
fork: true,
} as OctokitRepository;
expect(shouldExcludeRepo({ repo })).toBe(false);
expect(shouldExcludeRepo({ repo, exclude: { forks: true } })).toBe(true);
expect(shouldExcludeRepo({ repo, exclude: { forks: false } })).toBe(false);
});
test('shouldExcludeRepo handles archived repos correctly', () => {
const repo = {
full_name: 'test/archived-repo',
clone_url: 'https://github.com/test/archived-repo.git',
archived: true,
} as OctokitRepository;
expect(shouldExcludeRepo({ repo })).toBe(false);
expect(shouldExcludeRepo({ repo, exclude: { archived: true } })).toBe(true);
expect(shouldExcludeRepo({ repo, exclude: { archived: false } })).toBe(false);
});
test('shouldExcludeRepo handles include.topics correctly', () => {
const repo = {
full_name: 'test/repo',
clone_url: 'https://github.com/test/repo.git',
topics: [
'test-topic',
'another-topic'
] as string[],
} as OctokitRepository;
expect(shouldExcludeRepo({
repo,
include: {}
})).toBe(false);
expect(shouldExcludeRepo({
repo,
include: {
topics: [],
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
include: {
topics: ['a-topic-that-does-not-exist'],
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
include: {
topics: ['test-topic'],
}
})).toBe(false);
expect(shouldExcludeRepo({
repo,
include: {
topics: ['test-*'],
}
})).toBe(false);
expect(shouldExcludeRepo({
repo,
include: {
topics: ['TEST-tOpIC'],
}
})).toBe(false);
});
test('shouldExcludeRepo handles exclude.topics correctly', () => {
const repo = {
full_name: 'test/repo',
clone_url: 'https://github.com/test/repo.git',
topics: [
'test-topic',
'another-topic'
],
} as OctokitRepository;
expect(shouldExcludeRepo({
repo,
exclude: {}
})).toBe(false);
expect(shouldExcludeRepo({
repo,
exclude: {
topics: [],
}
})).toBe(false);
expect(shouldExcludeRepo({
repo,
exclude: {
topics: ['a-topic-that-does-not-exist'],
}
})).toBe(false);
expect(shouldExcludeRepo({
repo,
exclude: {
topics: ['test-topic'],
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
exclude: {
topics: ['test-*'],
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
exclude: {
topics: ['TEST-tOpIC'],
}
})).toBe(true);
});
test('shouldExcludeRepo handles exclude.size correctly', () => {
const repo = {
full_name: 'test/repo',
clone_url: 'https://github.com/test/repo.git',
size: 6, // 6KB
} as OctokitRepository;
expect(shouldExcludeRepo({
repo,
exclude: {
size: {
min: 10 * 1000, // 10KB
}
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
exclude: {
size: {
max: 2 * 1000, // 2KB
}
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
exclude: {
size: {
min: 5 * 1000, // 5KB
max: 10 * 1000, // 10KB
}
}
})).toBe(false);
});
test('shouldExcludeRepo handles exclude.repos correctly', () => {
const repo = {
full_name: 'test/example-repo',
clone_url: 'https://github.com/test/example-repo.git',
} as OctokitRepository;
expect(shouldExcludeRepo({
repo,
exclude: {
repos: []
}
})).toBe(false);
expect(shouldExcludeRepo({
repo,
exclude: {
repos: ['test/example-repo']
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
exclude: {
repos: ['test/*']
}
})).toBe(true);
expect(shouldExcludeRepo({
repo,
exclude: {
repos: ['repo-does-not-exist']
}
})).toBe(false);
});

View file

@ -1,14 +1,18 @@
import { Octokit } from "@octokit/rest"; import { Octokit } from "@octokit/rest";
import { GitHubConfig } from "./schemas/v2.js"; import { GithubConnectionConfig } from "@sourcebot/schemas/v3/github.type";
import { createLogger } from "./logger.js"; import { createLogger } from "./logger.js";
import { AppContext, GitRepository } from "./types.js"; import { getTokenFromConfig, measure, fetchWithRetry } from "./utils.js";
import path from 'path';
import { excludeArchivedRepos, excludeForkedRepos, excludeReposByName, excludeReposByTopic, getTokenFromConfig, includeReposByTopic, marshalBool, measure } from "./utils.js";
import micromatch from "micromatch"; import micromatch from "micromatch";
import { PrismaClient } from "@sourcebot/db";
import { BackendException, BackendError } from "@sourcebot/error";
import { processPromiseResults, throwIfAnyFailed } from "./connectionUtils.js";
import * as Sentry from "@sentry/node";
import { env } from "./env.js";
const logger = createLogger("GitHub"); const logger = createLogger("GitHub");
const GITHUB_CLOUD_HOSTNAME = "github.com";
type OctokitRepository = { export type OctokitRepository = {
name: string, name: string,
id: number, id: number,
full_name: string, full_name: string,
@ -22,11 +26,30 @@ type OctokitRepository = {
forks_count?: number,
archived?: boolean,
topics?: string[],
+ // @note: this is expressed in kilobytes.
size?: number,
+ owner: {
+ avatar_url: string,
+ }
}
- export const getGitHubReposFromConfig = async (config: GitHubConfig, signal: AbortSignal, ctx: AppContext) => {
- const token = config.token ? getTokenFromConfig(config.token, ctx) : undefined;
+ const isHttpError = (error: unknown, status: number): boolean => {
+ return error !== null
+ && typeof error === 'object'
+ && 'status' in error
+ && error.status === status;
+ }
+ export const getGitHubReposFromConfig = async (config: GithubConnectionConfig, orgId: number, db: PrismaClient, signal: AbortSignal) => {
+ const hostname = config.url ?
+ new URL(config.url).hostname :
+ GITHUB_CLOUD_HOSTNAME;
+ const token = config.token ?
+ await getTokenFromConfig(config.token, orgId, db, logger) :
+ hostname === GITHUB_CLOUD_HOSTNAME ?
+ env.FALLBACK_GITHUB_CLOUD_TOKEN :
+ undefined;
const octokit = new Octokit({
auth: token,
@ -35,217 +58,174 @@ export const getGitHubReposFromConfig = async (config: GitHubConfig, signal: Abo
} : {}), } : {}),
}); });
if (token) {
try {
await octokit.rest.users.getAuthenticated();
} catch (error) {
Sentry.captureException(error);
if (isHttpError(error, 401)) {
const e = new BackendException(BackendError.CONNECTION_SYNC_INVALID_TOKEN, {
...(config.token && 'secret' in config.token ? {
secretKey: config.token.secret,
} : {}),
});
Sentry.captureException(e);
throw e;
}
const e = new BackendException(BackendError.CONNECTION_SYNC_SYSTEM_ERROR, {
message: `Failed to authenticate with GitHub`,
});
Sentry.captureException(e);
throw e;
}
}
let allRepos: OctokitRepository[] = []; let allRepos: OctokitRepository[] = [];
let notFound: {
users: string[],
orgs: string[],
repos: string[],
} = {
users: [],
orgs: [],
repos: [],
};
if (config.orgs) { if (config.orgs) {
const _repos = await getReposForOrgs(config.orgs, octokit, signal); const { validRepos, notFoundOrgs } = await getReposForOrgs(config.orgs, octokit, signal);
allRepos = allRepos.concat(_repos); allRepos = allRepos.concat(validRepos);
notFound.orgs = notFoundOrgs;
} }
if (config.repos) { if (config.repos) {
const _repos = await getRepos(config.repos, octokit, signal); const { validRepos, notFoundRepos } = await getRepos(config.repos, octokit, signal);
allRepos = allRepos.concat(_repos); allRepos = allRepos.concat(validRepos);
notFound.repos = notFoundRepos;
} }
if (config.users) { if (config.users) {
const isAuthenticated = config.token !== undefined; const isAuthenticated = config.token !== undefined;
const _repos = await getReposOwnedByUsers(config.users, isAuthenticated, octokit, signal); const { validRepos, notFoundUsers } = await getReposOwnedByUsers(config.users, isAuthenticated, octokit, signal);
allRepos = allRepos.concat(_repos); allRepos = allRepos.concat(validRepos);
notFound.users = notFoundUsers;
} }
// Marshall results to our type let repos = allRepos
let repos: GitRepository[] = allRepos
.filter((repo) => { .filter((repo) => {
if (!repo.clone_url) { const isExcluded = shouldExcludeRepo({
logger.warn(`Repository ${repo.name} missing property 'clone_url'. Excluding.`) repo,
return false; include: {
} topics: config.topics,
return true;
})
.map((repo) => {
const hostname = config.url ? new URL(config.url).hostname : 'github.com';
const repoId = `${hostname}/${repo.full_name}`;
const repoPath = path.resolve(path.join(ctx.reposPath, `${repoId}.git`));
const cloneUrl = new URL(repo.clone_url!);
if (token) {
cloneUrl.username = token;
}
return {
vcs: 'git',
codeHost: 'github',
name: repo.full_name,
id: repoId,
cloneUrl: cloneUrl.toString(),
path: repoPath,
isStale: false,
isFork: repo.fork,
isArchived: !!repo.archived,
topics: repo.topics ?? [],
gitConfigMetadata: {
'zoekt.web-url-type': 'github',
'zoekt.web-url': repo.html_url,
'zoekt.name': repoId,
'zoekt.github-stars': (repo.stargazers_count ?? 0).toString(),
'zoekt.github-watchers': (repo.watchers_count ?? 0).toString(),
'zoekt.github-subscribers': (repo.subscribers_count ?? 0).toString(),
'zoekt.github-forks': (repo.forks_count ?? 0).toString(),
'zoekt.archived': marshalBool(repo.archived),
'zoekt.fork': marshalBool(repo.fork),
'zoekt.public': marshalBool(repo.private === false)
}, },
sizeInBytes: repo.size ? repo.size * 1000 : undefined, exclude: config.exclude,
branches: [],
tags: [],
} satisfies GitRepository;
}); });
if (config.topics) { return !isExcluded;
const topics = config.topics.map(topic => topic.toLowerCase());
repos = includeReposByTopic(repos, topics, logger);
}
if (config.exclude) {
if (!!config.exclude.forks) {
repos = excludeForkedRepos(repos, logger);
}
if (!!config.exclude.archived) {
repos = excludeArchivedRepos(repos, logger);
}
if (config.exclude.repos) {
repos = excludeReposByName(repos, config.exclude.repos, logger);
}
if (config.exclude.topics) {
const topics = config.exclude.topics.map(topic => topic.toLowerCase());
repos = excludeReposByTopic(repos, topics, logger);
}
if (config.exclude.size) {
const min = config.exclude.size.min;
const max = config.exclude.size.max;
if (min) {
repos = repos.filter((repo) => {
// If we don't have a size, we can't filter by size.
if (!repo.sizeInBytes) {
return true;
}
if (repo.sizeInBytes < min) {
logger.debug(`Excluding repo ${repo.name}. Reason: repo is less than \`exclude.size.min\`=${min} bytes.`);
return false;
}
return true;
}); });
}
if (max) {
repos = repos.filter((repo) => {
// If we don't have a size, we can't filter by size.
if (!repo.sizeInBytes) {
return true;
}
if (repo.sizeInBytes > max) {
logger.debug(`Excluding repo ${repo.name}. Reason: repo is greater than \`exclude.size.max\`=${max} bytes.`);
return false;
}
return true;
});
}
}
}
logger.debug(`Found ${repos.length} total repositories.`); logger.debug(`Found ${repos.length} total repositories.`);
if (config.revisions) {
if (config.revisions.branches) {
const branchGlobs = config.revisions.branches;
repos = await Promise.all(
repos.map(async (repo) => {
const [owner, name] = repo.name.split('/');
let branches = (await getBranchesForRepo(owner, name, octokit, signal)).map(branch => branch.name);
branches = micromatch.match(branches, branchGlobs);
return { return {
...repo, validRepos: repos,
branches, notFound,
}; };
})
)
} }
if (config.revisions.tags) { export const shouldExcludeRepo = ({
const tagGlobs = config.revisions.tags;
repos = await Promise.all(
repos.map(async (repo) => {
const [owner, name] = repo.name.split('/');
let tags = (await getTagsForRepo(owner, name, octokit, signal)).map(tag => tag.name);
tags = micromatch.match(tags, tagGlobs);
return {
...repo,
tags,
};
})
)
}
}
return repos;
}
const getTagsForRepo = async (owner: string, repo: string, octokit: Octokit, signal: AbortSignal) => {
try {
logger.debug(`Fetching tags for repo ${owner}/${repo}...`);
const { durationMs, data: tags } = await measure(() => octokit.paginate(octokit.repos.listTags, {
owner,
repo, repo,
per_page: 100, include,
request: { exclude
signal } : {
} repo: OctokitRepository,
})); include?: {
topics?: GithubConnectionConfig['topics']
},
exclude?: GithubConnectionConfig['exclude']
}) => {
let reason = '';
const repoName = repo.full_name;
logger.debug(`Found ${tags.length} tags for repo ${owner}/${repo} in ${durationMs}ms`); const shouldExclude = (() => {
return tags; if (!repo.clone_url) {
} catch (e) { reason = 'clone_url is undefined';
logger.debug(`Error fetching tags for repo ${owner}/${repo}: ${e}`); return true;
return []; }
if (!!exclude?.forks && repo.fork) {
reason = `\`exclude.forks\` is true`;
return true;
}
if (!!exclude?.archived && !!repo.archived) {
reason = `\`exclude.archived\` is true`;
return true;
}
if (exclude?.repos) {
if (micromatch.isMatch(repoName, exclude.repos)) {
reason = `\`exclude.repos\` contains ${repoName}`;
return true;
} }
} }
const getBranchesForRepo = async (owner: string, repo: string, octokit: Octokit, signal: AbortSignal) => { if (exclude?.topics) {
try { const configTopics = exclude.topics.map(topic => topic.toLowerCase());
logger.debug(`Fetching branches for repo ${owner}/${repo}...`); const repoTopics = repo.topics ?? [];
const { durationMs, data: branches } = await measure(() => octokit.paginate(octokit.repos.listBranches, {
owner, const matchingTopics = repoTopics.filter((topic) => micromatch.isMatch(topic, configTopics));
repo, if (matchingTopics.length > 0) {
per_page: 100, reason = `\`exclude.topics\` matches the following topics: ${matchingTopics.join(', ')}`;
request: { return true;
signal
}
}));
logger.debug(`Found ${branches.length} branches for repo ${owner}/${repo} in ${durationMs}ms`);
return branches;
} catch (e) {
logger.debug(`Error fetching branches for repo ${owner}/${repo}: ${e}`);
return [];
} }
} }
if (include?.topics) {
const configTopics = include.topics.map(topic => topic.toLowerCase());
const repoTopics = repo.topics ?? [];
const matchingTopics = repoTopics.filter((topic) => micromatch.isMatch(topic, configTopics));
if (matchingTopics.length === 0) {
reason = `\`include.topics\` does not match any of the following topics: ${configTopics.join(', ')}`;
return true;
}
}
const repoSizeInBytes = repo.size ? repo.size * 1000 : undefined;
if (exclude?.size && repoSizeInBytes) {
const min = exclude.size.min;
const max = exclude.size.max;
if (min && repoSizeInBytes < min) {
reason = `repo is less than \`exclude.size.min\`=${min} bytes.`;
return true;
}
if (max && repoSizeInBytes > max) {
reason = `repo is greater than \`exclude.size.max\`=${max} bytes.`;
return true;
}
}
return false;
})();
if (shouldExclude) {
logger.debug(`Excluding repo ${repoName}. Reason: ${reason}`);
return true;
}
return false;
}
const getReposOwnedByUsers = async (users: string[], isAuthenticated: boolean, octokit: Octokit, signal: AbortSignal) => { const getReposOwnedByUsers = async (users: string[], isAuthenticated: boolean, octokit: Octokit, signal: AbortSignal) => {
const repos = (await Promise.all(users.map(async (user) => { const results = await Promise.allSettled(users.map(async (user) => {
try { try {
logger.debug(`Fetching repository info for user ${user}...`); logger.debug(`Fetching repository info for user ${user}...`);
const { durationMs, data } = await measure(async () => { const { durationMs, data } = await measure(async () => {
const fetchFn = async () => {
if (isAuthenticated) { if (isAuthenticated) {
return octokit.paginate(octokit.repos.listForAuthenticatedUser, { return octokit.paginate(octokit.repos.listForAuthenticatedUser, {
username: user, username: user,
@ -265,65 +245,130 @@ const getReposOwnedByUsers = async (users: string[], isAuthenticated: boolean, o
}, },
}); });
} }
};
return fetchWithRetry(fetchFn, `user ${user}`, logger);
}); });
logger.debug(`Found ${data.length} owned by user ${user} in ${durationMs}ms.`); logger.debug(`Found ${data.length} owned by user ${user} in ${durationMs}ms.`);
return data; return {
} catch (e) { type: 'valid' as const,
logger.error(`Failed to fetch repository info for user ${user}.`, e); data
return []; };
} } catch (error) {
}))).flat(); Sentry.captureException(error);
logger.error(`Failed to fetch repositories for user ${user}.`, error);
return repos; if (isHttpError(error, 404)) {
logger.error(`User ${user} not found or no access`);
return {
type: 'notFound' as const,
value: user
};
}
throw error;
}
}));
throwIfAnyFailed(results);
const { validItems: validRepos, notFoundItems: notFoundUsers } = processPromiseResults<OctokitRepository>(results);
return {
validRepos,
notFoundUsers,
};
} }
const getReposForOrgs = async (orgs: string[], octokit: Octokit, signal: AbortSignal) => { const getReposForOrgs = async (orgs: string[], octokit: Octokit, signal: AbortSignal) => {
const repos = (await Promise.all(orgs.map(async (org) => { const results = await Promise.allSettled(orgs.map(async (org) => {
try { try {
logger.debug(`Fetching repository info for org ${org}...`); logger.info(`Fetching repository info for org ${org}...`);
const { durationMs, data } = await measure(() => octokit.paginate(octokit.repos.listForOrg, { const { durationMs, data } = await measure(async () => {
const fetchFn = () => octokit.paginate(octokit.repos.listForOrg, {
org: org, org: org,
per_page: 100, per_page: 100,
request: { request: {
signal signal
} }
});
return fetchWithRetry(fetchFn, `org ${org}`, logger);
});
logger.info(`Found ${data.length} in org ${org} in ${durationMs}ms.`);
return {
type: 'valid' as const,
data
};
} catch (error) {
Sentry.captureException(error);
logger.error(`Failed to fetch repositories for org ${org}.`, error);
if (isHttpError(error, 404)) {
logger.error(`Organization ${org} not found or no access`);
return {
type: 'notFound' as const,
value: org
};
}
throw error;
}
})); }));
logger.debug(`Found ${data.length} in org ${org} in ${durationMs}ms.`); throwIfAnyFailed(results);
return data; const { validItems: validRepos, notFoundItems: notFoundOrgs } = processPromiseResults<OctokitRepository>(results);
} catch (e) {
logger.error(`Failed to fetch repository info for org ${org}.`, e);
return [];
}
}))).flat();
return repos; return {
validRepos,
notFoundOrgs,
};
} }
const getRepos = async (repoList: string[], octokit: Octokit, signal: AbortSignal) => { const getRepos = async (repoList: string[], octokit: Octokit, signal: AbortSignal) => {
const repos = (await Promise.all(repoList.map(async (repo) => { const results = await Promise.allSettled(repoList.map(async (repo) => {
try { try {
logger.debug(`Fetching repository info for ${repo}...`);
const [owner, repoName] = repo.split('/'); const [owner, repoName] = repo.split('/');
const { durationMs, data: result } = await measure(() => octokit.repos.get({ logger.info(`Fetching repository info for ${repo}...`);
const { durationMs, data: result } = await measure(async () => {
const fetchFn = () => octokit.repos.get({
owner, owner,
repo: repoName, repo: repoName,
request: { request: {
signal signal
} }
});
return fetchWithRetry(fetchFn, repo, logger);
});
logger.info(`Found info for repository ${repo} in ${durationMs}ms`);
return {
type: 'valid' as const,
data: [result.data]
};
} catch (error) {
Sentry.captureException(error);
logger.error(`Failed to fetch repository ${repo}.`, error);
if (isHttpError(error, 404)) {
logger.error(`Repository ${repo} not found or no access`);
return {
type: 'notFound' as const,
value: repo
};
}
throw error;
}
})); }));
logger.debug(`Found info for repository ${repo} in ${durationMs}ms`); throwIfAnyFailed(results);
const { validItems: validRepos, notFoundItems: notFoundRepos } = processPromiseResults<OctokitRepository>(results);
return [result.data]; return {
} catch (e) { validRepos,
logger.error(`Failed to fetch repository info for ${repo}.`, e); notFoundRepos,
return []; };
}
}))).flat();
return repos;
} }

View file

@ -0,0 +1,43 @@
import { expect, test } from 'vitest';
import { shouldExcludeProject } from './gitlab';
import { ProjectSchema } from '@gitbeaker/rest';
test('shouldExcludeProject returns false when the project is not excluded.', () => {
const project = {
path_with_namespace: 'test/project',
} as ProjectSchema;
expect(shouldExcludeProject({
project,
})).toBe(false);
});
test('shouldExcludeProject returns true when the project is excluded by exclude.archived.', () => {
const project = {
path_with_namespace: 'test/project',
archived: true,
} as ProjectSchema;
expect(shouldExcludeProject({
project,
exclude: {
archived: true,
}
})).toBe(true)
});
test('shouldExcludeProject returns true when the project is excluded by exclude.forks.', () => {
const project = {
path_with_namespace: 'test/project',
forked_from_project: {}
} as unknown as ProjectSchema;
expect(shouldExcludeProject({
project,
exclude: {
forks: true,
}
})).toBe(true)
});
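Illustrative only (not part of the diff): a sketch of how the topic filter path could be covered, assuming the same shouldExcludeProject API as above.

test('shouldExcludeProject returns true when include.topics matches none of the project topics.', () => {
    const project = {
        path_with_namespace: 'test/project',
        topics: ['backend'],
    } as unknown as ProjectSchema;
    expect(shouldExcludeProject({
        project,
        include: {
            topics: ['frontend*'],
        }
    })).toBe(true);
});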


@ -1,208 +1,260 @@
import { Gitlab, ProjectSchema } from "@gitbeaker/rest";
import micromatch from "micromatch";
import { createLogger } from "./logger.js";
import { GitlabConnectionConfig } from "@sourcebot/schemas/v3/gitlab.type"
import { getTokenFromConfig, measure, fetchWithRetry } from "./utils.js";
import { PrismaClient } from "@sourcebot/db";
import { processPromiseResults, throwIfAnyFailed } from "./connectionUtils.js";
import * as Sentry from "@sentry/node";
import { env } from "./env.js";

const logger = createLogger("GitLab");
export const GITLAB_CLOUD_HOSTNAME = "gitlab.com";

export const getGitLabReposFromConfig = async (config: GitlabConnectionConfig, orgId: number, db: PrismaClient) => {
    const hostname = config.url ?
        new URL(config.url).hostname :
        GITLAB_CLOUD_HOSTNAME;

    const token = config.token ?
        await getTokenFromConfig(config.token, orgId, db, logger) :
        hostname === GITLAB_CLOUD_HOSTNAME ?
            env.FALLBACK_GITLAB_CLOUD_TOKEN :
            undefined;

    const api = new Gitlab({
        ...(token ? {
            token,
        } : {}),
        ...(config.url ? {
            host: config.url,
        } : {}),
    });

    let allRepos: ProjectSchema[] = [];
    let notFound: {
        orgs: string[],
        users: string[],
        repos: string[],
    } = {
        orgs: [],
        users: [],
        repos: [],
    };

    if (config.all === true) {
        if (hostname !== GITLAB_CLOUD_HOSTNAME) {
            try {
                logger.debug(`Fetching all projects visible in ${config.url}...`);
                const { durationMs, data: _projects } = await measure(async () => {
                    const fetchFn = () => api.Projects.all({
                        perPage: 100,
                    });
                    return fetchWithRetry(fetchFn, `all projects in ${config.url}`, logger);
                });
                logger.debug(`Found ${_projects.length} projects in ${durationMs}ms.`);
                allRepos = allRepos.concat(_projects);
            } catch (e) {
                Sentry.captureException(e);
                logger.error(`Failed to fetch all projects visible in ${config.url}.`, e);
                throw e;
            }
        } else {
            logger.warn(`Ignoring option all:true in config : host is ${GITLAB_CLOUD_HOSTNAME}`);
        }
    }

    if (config.groups) {
        const results = await Promise.allSettled(config.groups.map(async (group) => {
            try {
                logger.debug(`Fetching project info for group ${group}...`);
                const { durationMs, data } = await measure(async () => {
                    const fetchFn = () => api.Groups.allProjects(group, {
                        perPage: 100,
                        includeSubgroups: true
                    });
                    return fetchWithRetry(fetchFn, `group ${group}`, logger);
                });
                logger.debug(`Found ${data.length} projects in group ${group} in ${durationMs}ms.`);
                return {
                    type: 'valid' as const,
                    data
                };
            } catch (e: any) {
                Sentry.captureException(e);
                logger.error(`Failed to fetch projects for group ${group}.`, e);

                const status = e?.cause?.response?.status;
                if (status === 404) {
                    logger.error(`Group ${group} not found or no access`);
                    return {
                        type: 'notFound' as const,
                        value: group
                    };
                }
                throw e;
            }
        }));

        throwIfAnyFailed(results);
        const { validItems: validRepos, notFoundItems: notFoundOrgs } = processPromiseResults(results);
        allRepos = allRepos.concat(validRepos);
        notFound.orgs = notFoundOrgs;
    }

    if (config.users) {
        const results = await Promise.allSettled(config.users.map(async (user) => {
            try {
                logger.debug(`Fetching project info for user ${user}...`);
                const { durationMs, data } = await measure(async () => {
                    const fetchFn = () => api.Users.allProjects(user, {
                        perPage: 100,
                    });
                    return fetchWithRetry(fetchFn, `user ${user}`, logger);
                });
                logger.debug(`Found ${data.length} projects owned by user ${user} in ${durationMs}ms.`);
                return {
                    type: 'valid' as const,
                    data
                };
            } catch (e: any) {
                Sentry.captureException(e);
                logger.error(`Failed to fetch projects for user ${user}.`, e);

                const status = e?.cause?.response?.status;
                if (status === 404) {
                    logger.error(`User ${user} not found or no access`);
                    return {
                        type: 'notFound' as const,
                        value: user
                    };
                }
                throw e;
            }
        }));

        throwIfAnyFailed(results);
        const { validItems: validRepos, notFoundItems: notFoundUsers } = processPromiseResults(results);
        allRepos = allRepos.concat(validRepos);
        notFound.users = notFoundUsers;
    }

    if (config.projects) {
        const results = await Promise.allSettled(config.projects.map(async (project) => {
            try {
                logger.debug(`Fetching project info for project ${project}...`);
                const { durationMs, data } = await measure(async () => {
                    const fetchFn = () => api.Projects.show(project);
                    return fetchWithRetry(fetchFn, `project ${project}`, logger);
                });
                logger.debug(`Found project ${project} in ${durationMs}ms.`);
                return {
                    type: 'valid' as const,
                    data: [data]
                };
            } catch (e: any) {
                Sentry.captureException(e);
                logger.error(`Failed to fetch project ${project}.`, e);

                const status = e?.cause?.response?.status;
                if (status === 404) {
                    logger.error(`Project ${project} not found or no access`);
                    return {
                        type: 'notFound' as const,
                        value: project
                    };
                }
                throw e;
            }
        }));

        throwIfAnyFailed(results);
        const { validItems: validRepos, notFoundItems: notFoundRepos } = processPromiseResults(results);
        allRepos = allRepos.concat(validRepos);
        notFound.repos = notFoundRepos;
    }

    let repos = allRepos
        .filter((project) => {
            const isExcluded = shouldExcludeProject({
                project,
                include: {
                    topics: config.topics,
                },
                exclude: config.exclude
            });
            return !isExcluded;
        });

    logger.debug(`Found ${repos.length} total repositories.`);

    return {
        validRepos: repos,
        notFound,
    };
}

export const shouldExcludeProject = ({
    project,
    include,
    exclude,
}: {
    project: ProjectSchema,
    include?: {
        topics?: GitlabConnectionConfig['topics'],
    },
    exclude?: GitlabConnectionConfig['exclude'],
}) => {
    const projectName = project.path_with_namespace;
    let reason = '';

    const shouldExclude = (() => {
        if (!!exclude?.archived && project.archived) {
            reason = `\`exclude.archived\` is true`;
            return true;
        }

        if (!!exclude?.forks && project.forked_from_project !== undefined) {
            reason = `\`exclude.forks\` is true`;
            return true;
        }

        if (exclude?.projects) {
            if (micromatch.isMatch(projectName, exclude.projects)) {
                reason = `\`exclude.projects\` contains ${projectName}`;
                return true;
            }
        }

        if (include?.topics) {
            const configTopics = include.topics.map(topic => topic.toLowerCase());
            const projectTopics = project.topics ?? [];

            const matchingTopics = projectTopics.filter((topic) => micromatch.isMatch(topic, configTopics));
            if (matchingTopics.length === 0) {
                reason = `\`include.topics\` does not match any of the following topics: ${configTopics.join(', ')}`;
                return true;
            }
        }

        if (exclude?.topics) {
            const configTopics = exclude.topics.map(topic => topic.toLowerCase());
            const projectTopics = project.topics ?? [];

            const matchingTopics = projectTopics.filter((topic) => micromatch.isMatch(topic, configTopics));
            if (matchingTopics.length > 0) {
                reason = `\`exclude.topics\` matches the following topics: ${matchingTopics.join(', ')}`;
                return true;
            }
        }
    })();

    if (shouldExclude) {
        logger.debug(`Excluding project ${projectName}. Reason: ${reason}`);
        return true;
    }

    return false;
}
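For reference, a hypothetical call site (the exact connection config shape comes from the v3 gitlab schema; the group name, orgId, and db instance here are placeholders):

import { PrismaClient } from "@sourcebot/db";
import { getGitLabReposFromConfig } from "./gitlab.js";

const db = new PrismaClient();

// Index all projects in `my-group`, skipping archived projects and forks.
const { validRepos, notFound } = await getGitLabReposFromConfig({
    type: 'gitlab',
    groups: ['my-group'],
    exclude: {
        archived: true,
        forks: true,
    },
}, /* orgId = */ 1, db);

console.log(`Found ${validRepos.length} projects; missing groups: ${notFound.orgs.join(', ')}`);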


@ -1,10 +1,40 @@
import "./instrument.js";
import * as Sentry from "@sentry/node";
import { ArgumentParser } from "argparse"; import { ArgumentParser } from "argparse";
import { existsSync } from 'fs'; import { existsSync } from 'fs';
import { mkdir } from 'fs/promises'; import { mkdir } from 'fs/promises';
import path from 'path'; import path from 'path';
import { isRemotePath } from "./utils.js";
import { AppContext } from "./types.js"; import { AppContext } from "./types.js";
import { main } from "./main.js" import { main } from "./main.js"
import { PrismaClient } from "@sourcebot/db";
// Register handler for normal exit
process.on('exit', (code) => {
console.log(`Process is exiting with code: ${code}`);
});
// Register handlers for abnormal terminations
process.on('SIGINT', () => {
console.log('Process interrupted (SIGINT)');
process.exit(130);
});
process.on('SIGTERM', () => {
console.log('Process terminated (SIGTERM)');
process.exit(143);
});
// Register handlers for uncaught exceptions and unhandled rejections
process.on('uncaughtException', (err) => {
console.log(`Uncaught exception: ${err.message}`);
process.exit(1);
});
process.on('unhandledRejection', (reason, promise) => {
console.log(`Unhandled rejection at: ${promise}, reason: ${reason}`);
process.exit(1);
});
const parser = new ArgumentParser({ const parser = new ArgumentParser({
@ -12,26 +42,15 @@ const parser = new ArgumentParser({
}); });
type Arguments = { type Arguments = {
configPath: string;
cacheDir: string; cacheDir: string;
} }
parser.add_argument("--configPath", {
help: "Path to config file",
required: true,
});
parser.add_argument("--cacheDir", { parser.add_argument("--cacheDir", {
help: "Path to .sourcebot cache directory", help: "Path to .sourcebot cache directory",
required: true, required: true,
}); });
const args = parser.parse_args() as Arguments; const args = parser.parse_args() as Arguments;
if (!isRemotePath(args.configPath) && !existsSync(args.configPath)) {
console.error(`Config file ${args.configPath} does not exist`);
process.exit(1);
}
const cacheDir = args.cacheDir; const cacheDir = args.cacheDir;
const reposPath = path.join(cacheDir, 'repos'); const reposPath = path.join(cacheDir, 'repos');
const indexPath = path.join(cacheDir, 'index'); const indexPath = path.join(cacheDir, 'index');
@ -47,9 +66,21 @@ const context: AppContext = {
indexPath, indexPath,
reposPath, reposPath,
cachePath: cacheDir, cachePath: cacheDir,
configPath: args.configPath,
} }
main(context).finally(() => { const prisma = new PrismaClient();
main(prisma, context)
.then(async () => {
await prisma.$disconnect();
})
.catch(async (e) => {
console.error(e);
Sentry.captureException(e);
await prisma.$disconnect();
process.exit(1);
})
.finally(() => {
console.log("Shutting down..."); console.log("Shutting down...");
}); });


@ -0,0 +1,12 @@
import * as Sentry from "@sentry/node";
import { env } from "./env.js";
if (!!env.NEXT_PUBLIC_SENTRY_BACKEND_DSN && !!env.NEXT_PUBLIC_SENTRY_ENVIRONMENT) {
Sentry.init({
dsn: env.NEXT_PUBLIC_SENTRY_BACKEND_DSN,
release: env.NEXT_PUBLIC_SOURCEBOT_VERSION,
environment: env.NEXT_PUBLIC_SENTRY_ENVIRONMENT,
});
} else {
console.debug("Sentry was not initialized");
}


@ -1,71 +0,0 @@
import { existsSync, FSWatcher, statSync, watch } from "fs";
import { createLogger } from "./logger.js";
import { LocalConfig } from "./schemas/v2.js";
import { AppContext, LocalRepository } from "./types.js";
import { resolvePathRelativeToConfig } from "./utils.js";
import path from "path";
const logger = createLogger('local');
const fileWatchers = new Map<string, FSWatcher>();
const abortControllers = new Map<string, AbortController>();
export const getLocalRepoFromConfig = (config: LocalConfig, ctx: AppContext) => {
const repoPath = resolvePathRelativeToConfig(config.path, ctx.configPath);
logger.debug(`Resolved path '${config.path}' to '${repoPath}'`);
if (!existsSync(repoPath)) {
throw new Error(`The local repository path '${repoPath}' referenced in ${ctx.configPath} does not exist`);
}
const stat = statSync(repoPath);
if (!stat.isDirectory()) {
throw new Error(`The local repository path '${repoPath}' referenced in ${ctx.configPath} is not a directory`);
}
const repo: LocalRepository = {
vcs: 'local',
name: path.basename(repoPath),
id: repoPath,
path: repoPath,
isStale: false,
excludedPaths: config.exclude?.paths ?? [],
watch: config.watch ?? true,
}
return repo;
}
export const initLocalRepoFileWatchers = (repos: LocalRepository[], onUpdate: (repo: LocalRepository, ac: AbortSignal) => Promise<void>) => {
// Close all existing watchers
fileWatchers.forEach((watcher) => {
watcher.close();
});
repos
.filter(repo => !repo.isStale && repo.watch)
.forEach((repo) => {
logger.info(`Watching local repository ${repo.id} for changes...`);
const watcher = watch(repo.path, async () => {
const existingController = abortControllers.get(repo.id);
if (existingController) {
existingController.abort();
}
const controller = new AbortController();
abortControllers.set(repo.id, controller);
try {
await onUpdate(repo, controller.signal);
} catch (err: any) {
if (err.name !== 'AbortError') {
logger.error(`Error while watching local repository ${repo.id} for changes:`);
console.log(err);
} else {
logger.debug(`Aborting watch for local repository ${repo.id} due to abort signal`);
}
}
});
fileWatchers.set(repo.id, watcher);
});
}


@ -1,11 +1,14 @@
import winston, { format } from 'winston';
import { Logtail } from '@logtail/node';
import { LogtailTransport } from '@logtail/winston';
import { env } from './env.js';

const { combine, colorize, timestamp, prettyPrint, errors, printf, label: labelFn } = format;

const createLogger = (label: string) => {
    return winston.createLogger({
        level: env.SOURCEBOT_LOG_LEVEL,
        format: combine(
            errors({ stack: true }),
            timestamp(),

@ -28,6 +31,13 @@ const createLogger = (label: string) => {
                }),
            ),
        }),
        ...(env.LOGTAIL_TOKEN && env.LOGTAIL_HOST ? [
            new LogtailTransport(
                new Logtail(env.LOGTAIL_TOKEN, {
                    endpoint: env.LOGTAIL_HOST,
                })
            )
        ] : []),
    ]
    });
}
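The Logtail transport is only attached when both LOGTAIL_TOKEN and LOGTAIL_HOST are set; otherwise logs go to the console only. A minimal usage sketch (createLogger is exported and imported elsewhere in the backend):

const logger = createLogger('backend');
logger.info('This goes to stdout, and also to Logtail when LOGTAIL_TOKEN and LOGTAIL_HOST are set.');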


@ -1,206 +0,0 @@
import { expect, test, vi } from 'vitest';
import { deleteStaleRepository, isAllRepoReindexingRequired, isRepoReindexingRequired } from './main';
import { AppContext, GitRepository, LocalRepository, Repository, Settings } from './types';
import { DEFAULT_DB_DATA } from './db';
import { createMockDB } from './db.test';
import { rm } from 'fs/promises';
import path from 'path';
import { glob } from 'glob';
vi.mock('fs/promises', () => ({
rm: vi.fn(),
}));
vi.mock('glob', () => ({
glob: vi.fn().mockReturnValue(['fake_index.zoekt']),
}));
vi.mock('fs', () => ({
existsSync: vi.fn().mockReturnValue(true),
}));
const createMockContext = (rootPath: string = '/app') => {
return {
configPath: path.join(rootPath, 'config.json'),
cachePath: path.join(rootPath, '.sourcebot'),
indexPath: path.join(rootPath, '.sourcebot/index'),
reposPath: path.join(rootPath, '.sourcebot/repos'),
} satisfies AppContext;
}
test('isRepoReindexingRequired should return false when no changes are made', () => {
const previous: Repository = {
vcs: 'git',
name: 'test',
id: 'test',
path: '',
cloneUrl: '',
isStale: false,
branches: ['main'],
tags: ['v1.0'],
};
const current = previous;
expect(isRepoReindexingRequired(previous, current)).toBe(false);
})
test('isRepoReindexingRequired should return true when git branches change', () => {
const previous: Repository = {
vcs: 'git',
name: 'test',
id: 'test',
path: '',
cloneUrl: '',
isStale: false,
branches: ['main'],
tags: ['v1.0'],
};
const current: Repository = {
...previous,
branches: ['main', 'feature']
};
expect(isRepoReindexingRequired(previous, current)).toBe(true);
});
test('isRepoReindexingRequired should return true when git tags change', () => {
const previous: Repository = {
vcs: 'git',
name: 'test',
id: 'test',
path: '',
cloneUrl: '',
isStale: false,
branches: ['main'],
tags: ['v1.0'],
};
const current: Repository = {
...previous,
tags: ['v1.0', 'v2.0']
};
expect(isRepoReindexingRequired(previous, current)).toBe(true);
});
test('isRepoReindexingRequired should return true when local excludedPaths change', () => {
const previous: Repository = {
vcs: 'local',
name: 'test',
id: 'test',
path: '/',
isStale: false,
excludedPaths: ['node_modules'],
watch: false,
};
const current: Repository = {
...previous,
excludedPaths: ['node_modules', 'dist']
};
expect(isRepoReindexingRequired(previous, current)).toBe(true);
});
test('isAllRepoReindexingRequired should return false when fileLimitSize has not changed', () => {
const previous: Settings = {
maxFileSize: 1000,
autoDeleteStaleRepos: true,
}
const current: Settings = {
...previous,
}
expect(isAllRepoReindexingRequired(previous, current)).toBe(false);
});
test('isAllRepoReindexingRequired should return true when fileLimitSize has changed', () => {
const previous: Settings = {
maxFileSize: 1000,
autoDeleteStaleRepos: true,
}
const current: Settings = {
...previous,
maxFileSize: 2000,
}
expect(isAllRepoReindexingRequired(previous, current)).toBe(true);
});
test('isAllRepoReindexingRequired should return false when autoDeleteStaleRepos has changed', () => {
const previous: Settings = {
maxFileSize: 1000,
autoDeleteStaleRepos: true,
}
const current: Settings = {
...previous,
autoDeleteStaleRepos: false,
}
expect(isAllRepoReindexingRequired(previous, current)).toBe(false);
});
test('deleteStaleRepository can delete a git repository', async () => {
const ctx = createMockContext();
const repo: GitRepository = {
id: 'github.com/sourcebot-dev/sourcebot',
vcs: 'git',
name: 'sourcebot',
cloneUrl: 'https://github.com/sourcebot-dev/sourcebot',
path: `${ctx.reposPath}/github.com/sourcebot-dev/sourcebot`,
branches: ['main'],
tags: [''],
isStale: true,
}
const db = createMockDB({
...DEFAULT_DB_DATA,
repos: {
'github.com/sourcebot-dev/sourcebot': repo,
}
});
await deleteStaleRepository(repo, db, ctx);
expect(db.data.repos['github.com/sourcebot-dev/sourcebot']).toBeUndefined();
expect(rm).toHaveBeenCalledWith(`${ctx.reposPath}/github.com/sourcebot-dev/sourcebot`, {
recursive: true,
});
expect(glob).toHaveBeenCalledWith(`github.com%2Fsourcebot-dev%2Fsourcebot*.zoekt`, {
cwd: ctx.indexPath,
absolute: true
});
expect(rm).toHaveBeenCalledWith(`fake_index.zoekt`);
});
test('deleteStaleRepository can delete a local repository', async () => {
const ctx = createMockContext();
const repo: LocalRepository = {
vcs: 'local',
name: 'UnrealEngine',
id: '/path/to/UnrealEngine',
path: '/path/to/UnrealEngine',
watch: false,
excludedPaths: [],
isStale: true,
}
const db = createMockDB({
...DEFAULT_DB_DATA,
repos: {
'/path/to/UnrealEngine': repo,
}
});
await deleteStaleRepository(repo, db, ctx);
expect(db.data.repos['/path/to/UnrealEngine']).toBeUndefined();
expect(rm).not.toHaveBeenCalledWith('/path/to/UnrealEngine');
expect(glob).toHaveBeenCalledWith(`UnrealEngine*.zoekt`, {
cwd: ctx.indexPath,
absolute: true
});
expect(rm).toHaveBeenCalledWith('fake_index.zoekt');
});


@ -1,414 +1,72 @@
import { PrismaClient } from '@sourcebot/db';
import { createLogger } from "./logger.js";
import { AppContext } from "./types.js";
import { DEFAULT_SETTINGS } from './constants.js';
import { Redis } from 'ioredis';
import { ConnectionManager } from './connectionManager.js';
import { RepoManager } from './repoManager.js';
import { env } from './env.js';
import { PromClient } from './promClient.js';
import { isRemotePath } from './utils.js';
import { readFile } from 'fs/promises';
import stripJsonComments from 'strip-json-comments';
import { SourcebotConfig } from '@sourcebot/schemas/v3/index.type';
import { indexSchema } from '@sourcebot/schemas/v3/index.schema';
import { Ajv } from "ajv";

const logger = createLogger('main');
const ajv = new Ajv({
    validateFormats: false,
});

const getSettings = async (configPath?: string) => {
    if (!configPath) {
        return DEFAULT_SETTINGS;
    }

    const configContent = await (async () => {
        if (isRemotePath(configPath)) {
            const response = await fetch(configPath);
            if (!response.ok) {
                throw new Error(`Failed to fetch config file ${configPath}: ${response.statusText}`);
            }
            return response.text();
        } else {
            return readFile(configPath, { encoding: 'utf-8' });
        }
    })();

    const config = JSON.parse(stripJsonComments(configContent)) as SourcebotConfig;
    const isValidConfig = ajv.validate(indexSchema, config);
    if (!isValidConfig) {
        throw new Error(`Config file '${configPath}' is invalid: ${ajv.errorsText(ajv.errors)}`);
    }

    return {
        ...DEFAULT_SETTINGS,
        ...config.settings,
    }
}

export const main = async (db: PrismaClient, context: AppContext) => {
    const redis = new Redis(env.REDIS_URL, {
        maxRetriesPerRequest: null
    });
    redis.ping().then(() => {
        logger.info('Connected to redis');
    }).catch((err: unknown) => {
        logger.error('Failed to connect to redis');
        console.error(err);
        process.exit(1);
    });

    const settings = await getSettings(env.CONFIG_PATH);
    const promClient = new PromClient();

    const connectionManager = new ConnectionManager(db, settings, redis);
    connectionManager.registerPollingCallback();

    const repoManager = new RepoManager(db, settings, redis, promClient, context);
    await repoManager.blockingPollLoop();
}


@ -1,29 +1,29 @@
import { PostHog } from 'posthog-node';
import { PosthogEvent, PosthogEventMap } from './posthogEvents.js';
import { env } from './env.js';

let posthog: PostHog | undefined = undefined;

if (env.NEXT_PUBLIC_POSTHOG_PAPIK) {
    posthog = new PostHog(
        env.NEXT_PUBLIC_POSTHOG_PAPIK,
        {
            host: "https://us.i.posthog.com",
        }
    );
}

export function captureEvent<E extends PosthogEvent>(event: E, properties: PosthogEventMap[E]) {
    if (env.SOURCEBOT_TELEMETRY_DISABLED === 'true') {
        return;
    }

    posthog?.capture({
        distinctId: env.SOURCEBOT_INSTALL_ID,
        event: event,
        properties: {
            ...properties,
            sourcebot_version: env.NEXT_PUBLIC_SOURCEBOT_VERSION,
        },
    });
}
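A minimal usage sketch, using one of the new backend events defined in posthogEvents.ts below (the connectionId and error values are placeholders):

captureEvent('backend_connection_sync_job_failed', {
    connectionId: 42,
    error: 'invalid token',
});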


@ -5,17 +5,24 @@ export type PosthogEventMap = {
        vcs: string;
        codeHost?: string;
    },
    repo_deleted: {
        vcs: string;
        codeHost?: string;
    },
    //////////////////////////////////////////////////////////////////
    backend_connection_sync_job_failed: {
        connectionId: number,
        error: string,
    },
    backend_connection_sync_job_completed: {
        connectionId: number,
        repoCount: number,
    },
    backend_revisions_truncated: {
        repoId: number,
        revisionCount: number,
    },
    //////////////////////////////////////////////////////////////////
}

export type PosthogEvent = keyof PosthogEventMap;


@ -0,0 +1,106 @@
import express, { Request, Response } from 'express';
import client, { Registry, Counter, Gauge } from 'prom-client';
export class PromClient {
private registry: Registry;
private app: express.Application;
public activeRepoIndexingJobs: Gauge<string>;
public pendingRepoIndexingJobs: Gauge<string>;
public repoIndexingReattemptsTotal: Counter<string>;
public repoIndexingFailTotal: Counter<string>;
public repoIndexingSuccessTotal: Counter<string>;
public activeRepoGarbageCollectionJobs: Gauge<string>;
public repoGarbageCollectionErrorTotal: Counter<string>;
public repoGarbageCollectionFailTotal: Counter<string>;
public repoGarbageCollectionSuccessTotal: Counter<string>;
public readonly PORT = 3060;
constructor() {
this.registry = new Registry();
this.activeRepoIndexingJobs = new Gauge({
name: 'active_repo_indexing_jobs',
help: 'The number of repo indexing jobs in progress',
labelNames: ['repo'],
});
this.registry.registerMetric(this.activeRepoIndexingJobs);
this.pendingRepoIndexingJobs = new Gauge({
name: 'pending_repo_indexing_jobs',
help: 'The number of repo indexing jobs waiting in queue',
labelNames: ['repo'],
});
this.registry.registerMetric(this.pendingRepoIndexingJobs);
this.repoIndexingReattemptsTotal = new Counter({
name: 'repo_indexing_reattempts',
help: 'The number of repo indexing reattempts',
labelNames: ['repo'],
});
this.registry.registerMetric(this.repoIndexingReattemptsTotal);
this.repoIndexingFailTotal = new Counter({
name: 'repo_indexing_fails',
help: 'The number of repo indexing fails',
labelNames: ['repo'],
});
this.registry.registerMetric(this.repoIndexingFailTotal);
this.repoIndexingSuccessTotal = new Counter({
name: 'repo_indexing_successes',
help: 'The number of repo indexing successes',
labelNames: ['repo'],
});
this.registry.registerMetric(this.repoIndexingSuccessTotal);
this.activeRepoGarbageCollectionJobs = new Gauge({
name: 'active_repo_garbage_collection_jobs',
help: 'The number of repo garbage collection jobs in progress',
labelNames: ['repo'],
});
this.registry.registerMetric(this.activeRepoGarbageCollectionJobs);
this.repoGarbageCollectionErrorTotal = new Counter({
name: 'repo_garbage_collection_errors',
help: 'The number of repo garbage collection errors',
labelNames: ['repo'],
});
this.registry.registerMetric(this.repoGarbageCollectionErrorTotal);
this.repoGarbageCollectionFailTotal = new Counter({
name: 'repo_garbage_collection_fails',
help: 'The number of repo garbage collection fails',
labelNames: ['repo'],
});
this.registry.registerMetric(this.repoGarbageCollectionFailTotal);
this.repoGarbageCollectionSuccessTotal = new Counter({
name: 'repo_garbage_collection_successes',
help: 'The number of repo garbage collection successes',
labelNames: ['repo'],
});
this.registry.registerMetric(this.repoGarbageCollectionSuccessTotal);
client.collectDefaultMetrics({
register: this.registry,
});
this.app = express();
this.app.get('/metrics', async (req: Request, res: Response) => {
res.set('Content-Type', this.registry.contentType);
const metrics = await this.registry.metrics();
res.end(metrics);
});
this.app.listen(this.PORT, () => {
console.log(`Prometheus metrics server is running on port ${this.PORT}`);
});
}
getRegistry(): Registry {
return this.registry;
}
}
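A sketch of how a worker might drive these metrics around a single indexing job; the job body is elided and the repo label is a placeholder. The metrics are scraped from http://localhost:3060/metrics.

const promClient = new PromClient();
const label = { repo: 'github.com/sourcebot-dev/sourcebot' };

promClient.pendingRepoIndexingJobs.dec(label);
promClient.activeRepoIndexingJobs.inc(label);
try {
    // ... run the indexing job ...
    promClient.repoIndexingSuccessTotal.inc(label);
} catch (e) {
    promClient.repoIndexingFailTotal.inc(label);
} finally {
    promClient.activeRepoIndexingJobs.dec(label);
}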


@ -0,0 +1,279 @@
import { GithubConnectionConfig } from '@sourcebot/schemas/v3/github.type';
import { getGitHubReposFromConfig } from "./github.js";
import { getGitLabReposFromConfig } from "./gitlab.js";
import { getGiteaReposFromConfig } from "./gitea.js";
import { getGerritReposFromConfig } from "./gerrit.js";
import { Prisma, PrismaClient } from '@sourcebot/db';
import { WithRequired } from "./types.js"
import { marshalBool } from "./utils.js";
import { GerritConnectionConfig, GiteaConnectionConfig, GitlabConnectionConfig } from '@sourcebot/schemas/v3/connection.type';
import { RepoMetadata } from './types.js';
export type RepoData = WithRequired<Prisma.RepoCreateInput, 'connections'>;
export const compileGithubConfig = async (
config: GithubConnectionConfig,
connectionId: number,
orgId: number,
db: PrismaClient,
abortController: AbortController): Promise<{
repoData: RepoData[],
notFound: {
users: string[],
orgs: string[],
repos: string[],
}
}> => {
const gitHubReposResult = await getGitHubReposFromConfig(config, orgId, db, abortController.signal);
const gitHubRepos = gitHubReposResult.validRepos;
const notFound = gitHubReposResult.notFound;
const hostUrl = config.url ?? 'https://github.com';
const hostname = new URL(hostUrl).hostname;
const repos = gitHubRepos.map((repo) => {
const repoName = `${hostname}/${repo.full_name}`;
const cloneUrl = new URL(repo.clone_url!);
const record: RepoData = {
external_id: repo.id.toString(),
external_codeHostType: 'github',
external_codeHostUrl: hostUrl,
cloneUrl: cloneUrl.toString(),
webUrl: repo.html_url,
name: repoName,
imageUrl: repo.owner.avatar_url,
isFork: repo.fork,
isArchived: !!repo.archived,
org: {
connect: {
id: orgId,
},
},
connections: {
create: {
connectionId: connectionId,
}
},
metadata: {
gitConfig: {
'zoekt.web-url-type': 'github',
'zoekt.web-url': repo.html_url,
'zoekt.name': repoName,
'zoekt.github-stars': (repo.stargazers_count ?? 0).toString(),
'zoekt.github-watchers': (repo.watchers_count ?? 0).toString(),
'zoekt.github-subscribers': (repo.subscribers_count ?? 0).toString(),
'zoekt.github-forks': (repo.forks_count ?? 0).toString(),
'zoekt.archived': marshalBool(repo.archived),
'zoekt.fork': marshalBool(repo.fork),
'zoekt.public': marshalBool(repo.private === false),
},
branches: config.revisions?.branches ?? undefined,
tags: config.revisions?.tags ?? undefined,
} satisfies RepoMetadata,
};
return record;
})
return {
repoData: repos,
notFound,
};
}
export const compileGitlabConfig = async (
config: GitlabConnectionConfig,
connectionId: number,
orgId: number,
db: PrismaClient) => {
const gitlabReposResult = await getGitLabReposFromConfig(config, orgId, db);
const gitlabRepos = gitlabReposResult.validRepos;
const notFound = gitlabReposResult.notFound;
const hostUrl = config.url ?? 'https://gitlab.com';
const hostname = new URL(hostUrl).hostname;
const repos = gitlabRepos.map((project) => {
const projectUrl = `${hostUrl}/${project.path_with_namespace}`;
const cloneUrl = new URL(project.http_url_to_repo);
const isFork = project.forked_from_project !== undefined;
const repoName = `${hostname}/${project.path_with_namespace}`;
const record: RepoData = {
external_id: project.id.toString(),
external_codeHostType: 'gitlab',
external_codeHostUrl: hostUrl,
cloneUrl: cloneUrl.toString(),
webUrl: projectUrl,
name: repoName,
imageUrl: project.avatar_url,
isFork: isFork,
isArchived: !!project.archived,
org: {
connect: {
id: orgId,
},
},
connections: {
create: {
connectionId: connectionId,
}
},
metadata: {
gitConfig: {
'zoekt.web-url-type': 'gitlab',
'zoekt.web-url': projectUrl,
'zoekt.name': repoName,
'zoekt.gitlab-stars': (project.stargazers_count ?? 0).toString(),
'zoekt.gitlab-forks': (project.forks_count ?? 0).toString(),
'zoekt.archived': marshalBool(project.archived),
'zoekt.fork': marshalBool(isFork),
'zoekt.public': marshalBool(project.private === false)
},
branches: config.revisions?.branches ?? undefined,
tags: config.revisions?.tags ?? undefined,
} satisfies RepoMetadata,
};
return record;
})
return {
repoData: repos,
notFound,
};
}
export const compileGiteaConfig = async (
config: GiteaConnectionConfig,
connectionId: number,
orgId: number,
db: PrismaClient) => {
const giteaReposResult = await getGiteaReposFromConfig(config, orgId, db);
const giteaRepos = giteaReposResult.validRepos;
const notFound = giteaReposResult.notFound;
const hostUrl = config.url ?? 'https://gitea.com';
const hostname = new URL(hostUrl).hostname;
const repos = giteaRepos.map((repo) => {
const cloneUrl = new URL(repo.clone_url!);
const repoName = `${hostname}/${repo.full_name!}`;
const record: RepoData = {
external_id: repo.id!.toString(),
external_codeHostType: 'gitea',
external_codeHostUrl: hostUrl,
cloneUrl: cloneUrl.toString(),
webUrl: repo.html_url,
name: repoName,
imageUrl: repo.owner?.avatar_url,
isFork: repo.fork!,
isArchived: !!repo.archived,
org: {
connect: {
id: orgId,
},
},
connections: {
create: {
connectionId: connectionId,
}
},
metadata: {
gitConfig: {
'zoekt.web-url-type': 'gitea',
'zoekt.web-url': repo.html_url!,
'zoekt.name': repoName,
'zoekt.archived': marshalBool(repo.archived),
'zoekt.fork': marshalBool(repo.fork!),
'zoekt.public': marshalBool(repo.internal === false && repo.private === false),
},
branches: config.revisions?.branches ?? undefined,
tags: config.revisions?.tags ?? undefined,
} satisfies RepoMetadata,
};
return record;
})
return {
repoData: repos,
notFound,
};
}
export const compileGerritConfig = async (
config: GerritConnectionConfig,
connectionId: number,
orgId: number) => {
const gerritRepos = await getGerritReposFromConfig(config);
const hostUrl = (config.url ?? 'https://gerritcodereview.com').replace(/\/$/, ''); // Remove trailing slash
const hostname = new URL(hostUrl).hostname;
const repos = gerritRepos.map((project) => {
const repoId = `${hostname}/${project.name}`;
const cloneUrl = new URL(`${config.url}/${encodeURIComponent(project.name)}`);
let webUrl = "https://www.gerritcodereview.com/";
// Gerrit projects can have multiple web links; use the first one
if (project.web_links) {
const webLink = project.web_links[0];
if (webLink) {
webUrl = webLink.url;
}
}
// Handle case where webUrl is just a gitiles path
// https://github.com/GerritCodeReview/plugins_gitiles/blob/5ee7f57/src/main/java/com/googlesource/gerrit/plugins/gitiles/GitilesWeblinks.java#L50
if (webUrl.startsWith('/plugins/gitiles/')) {
webUrl = `${hostUrl}${webUrl}`;
}
const record: RepoData = {
external_id: project.id.toString(),
external_codeHostType: 'gerrit',
external_codeHostUrl: hostUrl,
cloneUrl: cloneUrl.toString(),
webUrl: webUrl,
name: project.name,
isFork: false,
isArchived: false,
org: {
connect: {
id: orgId,
},
},
connections: {
create: {
connectionId: connectionId,
}
},
metadata: {
gitConfig: {
'zoekt.web-url-type': 'gitiles',
'zoekt.web-url': webUrl,
'zoekt.name': repoId,
'zoekt.archived': marshalBool(false),
'zoekt.fork': marshalBool(false),
'zoekt.public': marshalBool(true),
},
} satisfies RepoMetadata,
};
return record;
})
return {
repoData: repos,
notFound: {
users: [],
orgs: [],
repos: [],
}
};
}
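For illustration, a hypothetical call site (config, connectionId, orgId, and db are assumed to be in scope; the connection manager that actually drives this is not part of this diff):

const abortController = new AbortController();
const { repoData, notFound } = await compileGithubConfig(config, connectionId, orgId, db, abortController);

console.log(`Compiled ${repoData.length} repo records for connection ${connectionId}`);
if (notFound.orgs.length || notFound.users.length || notFound.repos.length) {
    console.warn(`Some entities were not found: ${JSON.stringify(notFound)}`);
}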


@ -0,0 +1,546 @@
import { Job, Queue, Worker } from 'bullmq';
import { Redis } from 'ioredis';
import { createLogger } from "./logger.js";
import { Connection, PrismaClient, Repo, RepoToConnection, RepoIndexingStatus, StripeSubscriptionStatus } from "@sourcebot/db";
import { GithubConnectionConfig, GitlabConnectionConfig, GiteaConnectionConfig } from '@sourcebot/schemas/v3/connection.type';
import { AppContext, Settings, RepoMetadata } from "./types.js";
import { getRepoPath, getTokenFromConfig, measure, getShardPrefix } from "./utils.js";
import { cloneRepository, fetchRepository } from "./git.js";
import { existsSync, readdirSync, promises } from 'fs';
import { indexGitRepository } from "./zoekt.js";
import { PromClient } from './promClient.js';
import * as Sentry from "@sentry/node";
interface IRepoManager {
blockingPollLoop: () => void;
dispose: () => void;
}
const REPO_INDEXING_QUEUE = 'repoIndexingQueue';
const REPO_GC_QUEUE = 'repoGarbageCollectionQueue';
type RepoWithConnections = Repo & { connections: (RepoToConnection & { connection: Connection })[] };
type RepoIndexingPayload = {
repo: RepoWithConnections,
}
type RepoGarbageCollectionPayload = {
repo: Repo,
}
export class RepoManager implements IRepoManager {
private indexWorker: Worker;
private indexQueue: Queue<RepoIndexingPayload>;
private gcWorker: Worker;
private gcQueue: Queue<RepoGarbageCollectionPayload>;
private logger = createLogger('RepoManager');
constructor(
private db: PrismaClient,
private settings: Settings,
redis: Redis,
private promClient: PromClient,
private ctx: AppContext,
) {
// Repo indexing
this.indexQueue = new Queue<RepoIndexingPayload>(REPO_INDEXING_QUEUE, {
connection: redis,
});
this.indexWorker = new Worker(REPO_INDEXING_QUEUE, this.runIndexJob.bind(this), {
connection: redis,
concurrency: this.settings.maxRepoIndexingJobConcurrency,
});
this.indexWorker.on('completed', this.onIndexJobCompleted.bind(this));
this.indexWorker.on('failed', this.onIndexJobFailed.bind(this));
// Garbage collection
this.gcQueue = new Queue<RepoGarbageCollectionPayload>(REPO_GC_QUEUE, {
connection: redis,
});
this.gcWorker = new Worker(REPO_GC_QUEUE, this.runGarbageCollectionJob.bind(this), {
connection: redis,
concurrency: this.settings.maxRepoGarbageCollectionJobConcurrency,
});
this.gcWorker.on('completed', this.onGarbageCollectionJobCompleted.bind(this));
this.gcWorker.on('failed', this.onGarbageCollectionJobFailed.bind(this));
}
public async blockingPollLoop() {
while (true) {
await this.fetchAndScheduleRepoIndexing();
await this.fetchAndScheduleRepoGarbageCollection();
await this.fetchAndScheduleRepoTimeouts();
await new Promise(resolve => setTimeout(resolve, this.settings.reindexRepoPollingIntervalMs));
}
}
///////////////////////////
// Repo indexing
///////////////////////////
private async scheduleRepoIndexingBulk(repos: RepoWithConnections[]) {
await this.db.$transaction(async (tx) => {
await tx.repo.updateMany({
where: { id: { in: repos.map(repo => repo.id) } },
data: { repoIndexingStatus: RepoIndexingStatus.IN_INDEX_QUEUE }
});
const reposByOrg = repos.reduce<Record<number, RepoWithConnections[]>>((acc, repo) => {
if (!acc[repo.orgId]) {
acc[repo.orgId] = [];
}
acc[repo.orgId].push(repo);
return acc;
}, {});
for (const orgId in reposByOrg) {
const orgRepos = reposByOrg[orgId];
// Set priority based on number of repos (more repos = lower priority)
// This helps prevent large orgs from overwhelming the indexQueue
const priority = Math.min(Math.ceil(orgRepos.length / 10), 2097152);
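// For example, an org with 50 repos gets priority ceil(50 / 10) = 5, while an org with
// 5,000 repos gets priority 500. The cap of 2,097,152 is BullMQ's maximum priority value,
// and in BullMQ a larger priority number means the job is scheduled later.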
await this.indexQueue.addBulk(orgRepos.map(repo => ({
name: 'repoIndexJob',
data: { repo },
opts: {
priority: priority
}
})));
// Increment pending jobs counter for each repo added
orgRepos.forEach(repo => {
this.promClient.pendingRepoIndexingJobs.inc({ repo: repo.id.toString() });
});
this.logger.info(`Added ${orgRepos.length} jobs to indexQueue for org ${orgId} with priority ${priority}`);
}
}).catch((err: unknown) => {
this.logger.error(`Failed to add jobs to indexQueue for repos ${repos.map(repo => repo.id).join(', ')}: ${err}`);
});
}
private async fetchAndScheduleRepoIndexing() {
const thresholdDate = new Date(Date.now() - this.settings.reindexIntervalMs);
const repos = await this.db.repo.findMany({
where: {
OR: [
// "NEW" is really a misnomer here - it just means that the repo needs to be indexed
// immediately. In most cases, this will be because the repo was just created and
// is indeed "new". However, it could also be that a "retry" was requested on a failed
// index. So, we don't want to block on the indexedAt timestamp here.
{
repoIndexingStatus: RepoIndexingStatus.NEW,
},
// When the repo has already been indexed, we only want to reindex if the reindexing
// interval has elapsed (or if the date isn't set for some reason).
{
AND: [
{ repoIndexingStatus: RepoIndexingStatus.INDEXED },
{ OR: [
{ indexedAt: null },
{ indexedAt: { lt: thresholdDate } },
]}
]
}
]
},
include: {
connections: {
include: {
connection: true
}
}
}
});
if (repos.length > 0) {
await this.scheduleRepoIndexingBulk(repos);
}
}
// TODO: do this better? ex: try using the tokens from all the connections
// We can no longer use repo.cloneUrl directly since it doesn't contain the token for security reasons. As a result, we need to
// fetch the token here using the connections from the repo. Multiple connections could be referencing this repo, and each
// may have its own token. This method just picks the first connection that has a token (if one exists) and uses that. This
// may technically cause syncing to fail if that connection's token happens to not have access to the repo it's referencing.
private async getTokenForRepo(repo: RepoWithConnections, db: PrismaClient) {
const repoConnections = repo.connections;
if (repoConnections.length === 0) {
this.logger.error(`Repo ${repo.id} has no connections`);
return;
}
let token: string | undefined;
for (const repoConnection of repoConnections) {
const connection = repoConnection.connection;
if (connection.connectionType !== 'github' && connection.connectionType !== 'gitlab' && connection.connectionType !== 'gitea') {
continue;
}
const config = connection.config as unknown as GithubConnectionConfig | GitlabConnectionConfig | GiteaConnectionConfig;
if (config.token) {
token = await getTokenFromConfig(config.token, connection.orgId, db, this.logger);
if (token) {
break;
}
}
}
return token;
}
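// Fetches the repo if a clone already exists on disk, otherwise clones it (injecting the
// connection token into the clone URL), then runs the indexer. Returns the duration of
// each phase in seconds.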
private async syncGitRepository(repo: RepoWithConnections, repoAlreadyInIndexingState: boolean) {
let fetchDuration_s: number | undefined = undefined;
let cloneDuration_s: number | undefined = undefined;
const repoPath = getRepoPath(repo, this.ctx);
const metadata = repo.metadata as RepoMetadata;
// If the repo was already in the indexing state, this job was likely killed and picked up again. As a result,
// to ensure the repo state is valid, we delete the repo directory if it exists so we get a fresh clone.
if (repoAlreadyInIndexingState && existsSync(repoPath)) {
this.logger.info(`Deleting repo directory ${repoPath} during sync because it was already in the indexing state`);
await promises.rm(repoPath, { recursive: true, force: true });
}
if (existsSync(repoPath)) {
this.logger.info(`Fetching ${repo.id}...`);
const { durationMs } = await measure(() => fetchRepository(repoPath, ({ method, stage, progress }) => {
this.logger.debug(`git.${method} ${stage} stage ${progress}% complete for ${repo.id}`)
}));
fetchDuration_s = durationMs / 1000;
process.stdout.write('\n');
this.logger.info(`Fetched ${repo.name} in ${fetchDuration_s}s`);
} else {
this.logger.info(`Cloning ${repo.id}...`);
const token = await this.getTokenForRepo(repo, this.db);
const cloneUrl = new URL(repo.cloneUrl);
if (token) {
switch (repo.external_codeHostType) {
case 'gitlab':
cloneUrl.username = 'oauth2';
cloneUrl.password = token;
break;
case 'gitea':
case 'github':
default:
cloneUrl.username = token;
break;
}
}
const { durationMs } = await measure(() => cloneRepository(cloneUrl.toString(), repoPath, metadata.gitConfig, ({ method, stage, progress }) => {
this.logger.debug(`git.${method} ${stage} stage ${progress}% complete for ${repo.id}`)
}));
cloneDuration_s = durationMs / 1000;
process.stdout.write('\n');
this.logger.info(`Cloned ${repo.id} in ${cloneDuration_s}s`);
}
this.logger.info(`Indexing ${repo.id}...`);
const { durationMs } = await measure(() => indexGitRepository(repo, this.settings, this.ctx));
const indexDuration_s = durationMs / 1000;
this.logger.info(`Indexed ${repo.id} in ${indexDuration_s}s`);
return {
fetchDuration_s,
cloneDuration_s,
indexDuration_s,
}
}
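// Worker callback for the index queue: transitions the repo to INDEXING and syncs it,
// retrying up to 3 times with exponential backoff on failure.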
private async runIndexJob(job: Job<RepoIndexingPayload>) {
this.logger.info(`Running index job (id: ${job.id}) for repo ${job.data.repo.id}`);
const repo = job.data.repo as RepoWithConnections;
// We have to re-fetch the repo from the database to get its current repoIndexingStatus, because the repo
// object embedded in the job is a snapshot from when the job was added to the queue.
const existingRepo = await this.db.repo.findUnique({
where: {
id: repo.id,
},
});
if (!existingRepo) {
this.logger.error(`Repo ${repo.id} not found`);
const e = new Error(`Repo ${repo.id} not found`);
Sentry.captureException(e);
throw e;
}
const repoAlreadyInIndexingState = existingRepo.repoIndexingStatus === RepoIndexingStatus.INDEXING;
await this.db.repo.update({
where: {
id: repo.id,
},
data: {
repoIndexingStatus: RepoIndexingStatus.INDEXING,
}
});
this.promClient.activeRepoIndexingJobs.inc();
this.promClient.pendingRepoIndexingJobs.dec({ repo: repo.id.toString() });
let indexDuration_s: number | undefined;
let fetchDuration_s: number | undefined;
let cloneDuration_s: number | undefined;
let stats;
let attempts = 0;
const maxAttempts = 3;
while (attempts < maxAttempts) {
try {
stats = await this.syncGitRepository(repo, repoAlreadyInIndexingState);
break;
} catch (error) {
Sentry.captureException(error);
attempts++;
this.promClient.repoIndexingReattemptsTotal.inc();
if (attempts === maxAttempts) {
this.logger.error(`Failed to sync repository ${repo.id} after ${maxAttempts} attempts. Error: ${error}`);
throw error;
}
const sleepDuration = 5000 * Math.pow(2, attempts - 1);
this.logger.error(`Failed to sync repository ${repo.id}, attempt ${attempts}/${maxAttempts}. Sleeping for ${sleepDuration / 1000}s... Error: ${error}`);
await new Promise(resolve => setTimeout(resolve, sleepDuration));
}
}
indexDuration_s = stats!.indexDuration_s;
fetchDuration_s = stats!.fetchDuration_s;
cloneDuration_s = stats!.cloneDuration_s;
}
private async onIndexJobCompleted(job: Job<RepoIndexingPayload>) {
this.logger.info(`Repo index job ${job.id} completed`);
this.promClient.activeRepoIndexingJobs.dec();
this.promClient.repoIndexingSuccessTotal.inc();
await this.db.repo.update({
where: {
id: job.data.repo.id,
},
data: {
indexedAt: new Date(),
repoIndexingStatus: RepoIndexingStatus.INDEXED,
}
});
}
private async onIndexJobFailed(job: Job<RepoIndexingPayload> | undefined, err: unknown) {
this.logger.info(`Repo index job failed (id: ${job?.id ?? 'unknown'}) with error: ${err}`);
Sentry.captureException(err, {
tags: {
repoId: job?.data.repo.id,
jobId: job?.id,
queue: REPO_INDEXING_QUEUE,
}
});
if (job) {
this.promClient.activeRepoIndexingJobs.dec();
this.promClient.repoIndexingFailTotal.inc();
await this.db.repo.update({
where: {
id: job.data.repo.id,
},
data: {
repoIndexingStatus: RepoIndexingStatus.FAILED,
indexedAt: new Date(),
}
})
}
}
///////////////////////////
// Repo garbage collection
///////////////////////////
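// Transitions the given repos to IN_GC_QUEUE and enqueues one garbage collection job per repo.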
private async scheduleRepoGarbageCollectionBulk(repos: Repo[]) {
await this.db.$transaction(async (tx) => {
await tx.repo.updateMany({
where: { id: { in: repos.map(repo => repo.id) } },
data: { repoIndexingStatus: RepoIndexingStatus.IN_GC_QUEUE }
});
await this.gcQueue.addBulk(repos.map(repo => ({
name: 'repoGarbageCollectionJob',
data: { repo },
})));
this.logger.info(`Added ${repos.length} jobs to gcQueue`);
});
}
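// Finds repos that are eligible for garbage collection: repos with no remaining connections
// (past the grace period) and repos belonging to orgs whose subscription has been inactive
// for over a week.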
private async fetchAndScheduleRepoGarbageCollection() {
////////////////////////////////////
// Get repos with no connections
////////////////////////////////////
const thresholdDate = new Date(Date.now() - this.settings.repoGarbageCollectionGracePeriodMs);
const reposWithNoConnections = await this.db.repo.findMany({
where: {
repoIndexingStatus: {
in: [
RepoIndexingStatus.INDEXED, // we don't include NEW repos here because they'll be picked up by the index queue (potential race condition)
RepoIndexingStatus.FAILED,
]
},
connections: {
none: {}
},
OR: [
{ indexedAt: null },
{ indexedAt: { lt: thresholdDate } }
]
},
});
if (reposWithNoConnections.length > 0) {
this.logger.info(`Garbage collecting ${reposWithNoConnections.length} repos with no connections: ${reposWithNoConnections.map(repo => repo.id).join(', ')}`);
}
////////////////////////////////////
// Get inactive org repos
////////////////////////////////////
const sevenDaysAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
const inactiveOrgRepos = await this.db.repo.findMany({
where: {
org: {
stripeSubscriptionStatus: StripeSubscriptionStatus.INACTIVE,
stripeLastUpdatedAt: {
lt: sevenDaysAgo
}
},
OR: [
{ indexedAt: null },
{ indexedAt: { lt: thresholdDate } }
]
}
});
if (inactiveOrgRepos.length > 0) {
this.logger.info(`Garbage collecting ${inactiveOrgRepos.length} inactive org repos: ${inactiveOrgRepos.map(repo => repo.id).join(', ')}`);
}
const reposToDelete = [...reposWithNoConnections, ...inactiveOrgRepos];
if (reposToDelete.length > 0) {
await this.scheduleRepoGarbageCollectionBulk(reposToDelete);
}
}
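// Worker callback for the GC queue: removes the repo's clone from disk and deletes its
// index shards. The repo row itself is deleted in onGarbageCollectionJobCompleted.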
private async runGarbageCollectionJob(job: Job<RepoGarbageCollectionPayload>) {
this.logger.info(`Running garbage collection job (id: ${job.id}) for repo ${job.data.repo.id}`);
this.promClient.activeRepoGarbageCollectionJobs.inc();
const repo = job.data.repo as Repo;
await this.db.repo.update({
where: {
id: repo.id
},
data: {
repoIndexingStatus: RepoIndexingStatus.GARBAGE_COLLECTING
}
});
// delete cloned repo
const repoPath = getRepoPath(repo, this.ctx);
if (existsSync(repoPath)) {
this.logger.info(`Deleting repo directory ${repoPath}`);
await promises.rm(repoPath, { recursive: true, force: true });
}
// delete shards
const shardPrefix = getShardPrefix(repo.orgId, repo.id);
const files = readdirSync(this.ctx.indexPath).filter(file => file.startsWith(shardPrefix));
for (const file of files) {
const filePath = `${this.ctx.indexPath}/${file}`;
this.logger.info(`Deleting shard file ${filePath}`);
await promises.rm(filePath, { force: true });
}
}
private async onGarbageCollectionJobCompleted(job: Job<RepoGarbageCollectionPayload>) {
this.logger.info(`Garbage collection job ${job.id} completed`);
this.promClient.activeRepoGarbageCollectionJobs.dec();
this.promClient.repoGarbageCollectionSuccessTotal.inc();
await this.db.repo.delete({
where: {
id: job.data.repo.id
}
});
}
private async onGarbageCollectionJobFailed(job: Job<RepoGarbageCollectionPayload> | undefined, err: unknown) {
this.logger.info(`Garbage collection job failed (id: ${job?.id ?? 'unknown'}) with error: ${err}`);
Sentry.captureException(err, {
tags: {
repoId: job?.data.repo.id,
jobId: job?.id,
queue: REPO_GC_QUEUE,
}
});
if (job) {
this.promClient.activeRepoGarbageCollectionJobs.dec();
this.promClient.repoGarbageCollectionFailTotal.inc();
await this.db.repo.update({
where: {
id: job.data.repo.id
},
data: {
repoIndexingStatus: RepoIndexingStatus.GARBAGE_COLLECTION_FAILED
}
});
}
}
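// Finds repos that have been stuck in the INDEXING state longer than the configured timeout
// (e.g. because a worker was killed); scheduleRepoTimeoutsBulk marks them as FAILED.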
private async fetchAndScheduleRepoTimeouts() {
const repos = await this.db.repo.findMany({
where: {
repoIndexingStatus: RepoIndexingStatus.INDEXING,
updatedAt: {
lt: new Date(Date.now() - this.settings.repoIndexTimeoutMs)
}
}
});
if (repos.length > 0) {
this.logger.info(`Scheduling ${repos.length} repo timeouts`);
await this.scheduleRepoTimeoutsBulk(repos);
}
}
private async scheduleRepoTimeoutsBulk(repos: Repo[]) {
await this.db.$transaction(async (tx) => {
await tx.repo.updateMany({
where: { id: { in: repos.map(repo => repo.id) } },
data: { repoIndexingStatus: RepoIndexingStatus.FAILED }
});
});
}
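// Closes the BullMQ workers and queues on shutdown.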
public async dispose() {
await Promise.all([
this.indexWorker.close(),
this.indexQueue.close(),
this.gcQueue.close(),
this.gcWorker.close(),
]);
}
}
