Sanitize posthog data and add more info in README (#4)

* remove page view provider

* sanitize current_url and ip properties in all posthog events

* add posthog usage info in README

* remove unnecessary ip property removal since we disable ip collection on posthog side

* add back ip sanitization on client side (we disabled it on server side but may as well also clear it on client) and revise README on telemetry

* add typo with asterisks in readme

* small grammar fix in README
Michael Sukkarieh 2024-09-19 13:22:13 -07:00 committed by GitHub
parent 9dec454d06
commit c5b53c2d6c
4 changed files with 20 additions and 39 deletions

@@ -115,7 +115,7 @@ Sourcebot also supports indexing GitLab & BitBucket. Checkout the [index.json](.
 zoekt will now index your repositories (at `HEAD`). By default, it will re-index existing repositories every hour, and discover new repositories every 24 hours.
-4. Go to `http://localhost:3000` - once a index has been created, you can start searching.
+4. Go to `http://localhost:3000` - once an index has been created, you can start searching.
 ## Building Sourcebot
@@ -182,14 +182,16 @@ The zoekt binaries and web dependencies are placed into `bin` and `node_modules`
 A `.sourcebot` directory will be created and zoekt will begin to index the repositories found given `config.json`.
-6. Go to `http://localhost:3000` - once a index has been created, you can start searching.
+6. Go to `http://localhost:3000` - once an index has been created, you can start searching.
-## Disabling Telemetry
+## Telemetry
-By default, Sourcebot collects anonymous usage data using [PostHog](https://posthog.com/). You can disable this by setting the environment variable `SOURCEBOT_TELEMETRY_DISABLED` to `1` in the docker run command:
+By default, Sourcebot collects anonymized usage data through [PostHog](https://posthog.com/) to help us improve the performance and reliability of our tool. We do not collect or transmit [any information related to your codebase](https://github.com/search?q=repo:TaqlaAI/sourcebot++captureEvent&type=code). All events are [sanitized](https://github.com/TaqlaAI/sourcebot/blob/main/src/app/posthogProvider.tsx) to ensure that no sensitive or identifying details leave your machine. The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made :)
+If you'd like to disable all telemetry, you can do so by setting the environment variable `SOURCEBOT_TELEMETRY_DISABLED` to `1` in the docker run command:
 ```sh
-docker run -e SOURCEBOT_TELEMETRY_DISABLED=1 ...stuff... ghcr.io/taqlaai/sourcebot:main
+docker run -e SOURCEBOT_TELEMETRY_DISABLED=1 /* additional args */ ghcr.io/taqlaai/sourcebot:main
 ```
 Or if you are building locally, add the following to your [.env](./.env) file:

@@ -9,10 +9,6 @@ import dynamic from "next/dynamic";
 const inter = Inter({ subsets: ["latin"] });
-const PostHogPageView = dynamic(() => import('./posthogPageView'), {
-  ssr: false,
-})
 export const metadata: Metadata = {
   title: "Sourcebot",
   description: "Sourcebot",
@@ -31,7 +27,6 @@ export default function RootLayout({
 >
 <body className={inter.className}>
 <PHProvider>
-<PostHogPageView />
 <ThemeProvider
 attribute="class"
 defaultTheme="system"

@@ -1,28 +0,0 @@
-'use client'
-import { usePathname, useSearchParams } from "next/navigation";
-import { useEffect } from "react";
-import { usePostHog } from 'posthog-js/react';
-export default function PostHogPageView(): null {
-  const pathname = usePathname();
-  const searchParams = useSearchParams();
-  const posthog = usePostHog();
-  useEffect(() => {
-    // Track pageviews
-    if (pathname && posthog) {
-      let url = window.origin + pathname
-      if (searchParams.toString()) {
-        url = url + `?${searchParams.toString()}`
-      }
-      posthog.capture(
-        '$pageview',
-        {
-          '$current_url': url,
-        }
-      )
-    }
-  }, [pathname, searchParams, posthog])
-  return null
-}

@@ -9,7 +9,19 @@ if (typeof window !== 'undefined') {
         api_host: "/ingest",
         ui_host: NEXT_PUBLIC_POSTHOG_UI_HOST,
         person_profiles: 'identified_only',
-        capture_pageview: false, // Disable automatic pageview capture, as we capture manually
+        capture_pageview: false, // Disable automatic pageview capture
+        autocapture: false, // Disable automatic event capture
+        sanitize_properties: (properties: Record<string, any>, _event: string) => {
+            // https://posthog.com/docs/libraries/js#config
+            if (properties['$current_url']) {
+                properties['$current_url'] = null;
+            }
+            if (properties['$ip']) {
+                properties['$ip'] = null;
+            }
+            return properties;
+        }
     });
 } else {
     console.log("PostHog telemetry disabled");
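
The `sanitize_properties` callback added in the last hunk can be tried out without PostHog. The sketch below is not code from this commit: `sanitizeProperties`, `SENSITIVE_KEYS`, the event name `search_finished`, and the `duration_ms` property are all hypothetical names chosen for illustration. It assumes only what the diff shows, namely that `$current_url` and `$ip` are set to `null` when present and every other property passes through unchanged.

```typescript
// Hypothetical standalone version of the sanitizer shown in the diff.
type EventProperties = Record<string, unknown>;

// The two keys the commit clears before an event leaves the browser.
const SENSITIVE_KEYS = ['$current_url', '$ip'] as const;

function sanitizeProperties(properties: EventProperties, _event: string): EventProperties {
    for (const key of SENSITIVE_KEYS) {
        // Same truthiness check as the diff: only overwrite values that are set.
        if (properties[key]) {
            properties[key] = null;
        }
    }
    return properties;
}

// Example event payload; the name and the duration_ms field are made up.
const sanitized = sanitizeProperties(
    {
        '$current_url': 'http://localhost:3000/search?query=secret',
        '$ip': '203.0.113.7',
        duration_ms: 42,
    },
    'search_finished',
);

console.log(sanitized);
```

After the call, `$current_url` and `$ip` are `null` and `duration_ms` is still `42`. Like the callback in the diff, the function mutates and returns the same object rather than copying it.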