249 changes: 249 additions & 0 deletions plans/link-posthog-to-anon-extension-usage.md
@@ -0,0 +1,249 @@
# Link PostHog to Anonymous Extension Usage

## Problem

PostHog analytics (`distinct_id`) cannot be linked to backend usage data (`anon:{ip}`) for anonymous extension users.

**Current**: PostHog generates `distinct_id` → Backend creates `anon:{ip}` → No connection

**Goal**: Pass PostHog `distinct_id` from extensions to backend via `x-posthog-distinct-id` header

---

## Deployment Order

### ⚠️ CRITICAL: Backend FIRST, Extensions AFTER

**Why**: Backend changes are backward-compatible (new field is nullable, header is optional).

**Order**:
1. ✅ Deploy backend → Older extensions that send no header keep working (field stays NULL); a header sent to a backend without this change is ignored (safe)
2. ✅ Deploy extensions → Can be days/weeks later, no coordination needed

**Independent**: YES - No tight coupling, deploy at your own pace

---

## Repository 1: kilocode-backend (THIS REPO)

**GitHub**: https://github.com/Kilo-Org/cloud

### File 1: `src/db/migrations/0005_add_posthog_distinct_id_to_usage_metadata.sql` (NEW)
```sql
ALTER TABLE microdollar_usage_metadata ADD COLUMN posthog_distinct_id TEXT;

-- Use CONCURRENTLY to avoid blocking writes on large table
-- Note: Cannot be run in a transaction, may need separate execution
CREATE INDEX CONCURRENTLY idx_microdollar_usage_metadata_posthog_distinct_id
ON microdollar_usage_metadata(posthog_distinct_id);
```

**⚠️ Migration Note**: `CREATE INDEX CONCURRENTLY` cannot run in a transaction. If your migration runner wraps statements in transactions, you may need to:
1. Run the ALTER TABLE in one migration
2. Run the CREATE INDEX CONCURRENTLY separately (or use a non-concurrent index if table is small)

### File 2: `src/db/schema.ts` (line ~615)
**Before**:
```typescript
export const microdollar_usage_metadata = pgTable(
'microdollar_usage_metadata',
{
// ... 14 existing fields ...
has_tools: boolean(),
},
table => [index('idx_microdollar_usage_metadata_created_at').on(table.created_at)]
);
```

**After**:
```typescript
export const microdollar_usage_metadata = pgTable(
'microdollar_usage_metadata',
{
// ... 14 existing fields ...
has_tools: boolean(),
posthog_distinct_id: text(), // NEW
},
table => [
index('idx_microdollar_usage_metadata_created_at').on(table.created_at),
index('idx_microdollar_usage_metadata_posthog_distinct_id').on(table.posthog_distinct_id), // NEW
]
);
```

### File 3: `src/app/api/openrouter/[...path]/route.ts` (lines ~250, ~265)
**Before** (line ~265):
```typescript
const usageContext: MicrodollarUsageContext = {
// ... other fields ...
posthog_distinct_id: isAnonymousContext(user) ? undefined : user.google_user_email,
// ... other fields ...
};
```

**After** (add line ~250, modify line ~265):
```typescript
// NEW - line ~250
// Validate PostHog distinct_id from header (length limit only to prevent bloat)
const rawDistinctId = request.headers.get('x-posthog-distinct-id');
// Review note: trim first and treat an empty result as missing. Otherwise a
// whitespace-only header becomes "" and is persisted, which pollutes
// analytics and coverage queries.
const trimmedDistinctId = rawDistinctId?.trim();
const posthogDistinctIdFromHeader =
  trimmedDistinctId && trimmedDistinctId.length <= 255 ? trimmedDistinctId : undefined;

const usageContext: MicrodollarUsageContext = {
// ... other fields ...
posthog_distinct_id: isAnonymousContext(user)
? posthogDistinctIdFromHeader // CHANGED (already string | undefined)
: user.google_user_email,
// ... other fields ...
};
```

**⚠️ Security Note**: The header value is trimmed and limited to 255 characters to prevent DB bloat. Characters are not restricted, so every PostHog `distinct_id` format is accepted.
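
The header handling can also be pulled into a small pure function, which makes the whitespace-only and over-length cases easy to unit test. This is a minimal sketch, not the code in `route.ts`: the name `normalizeDistinctId` and the constant `MAX_DISTINCT_ID_LENGTH` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical helper (not part of the existing codebase).
// The 255-character limit mirrors the limit described in this plan.
const MAX_DISTINCT_ID_LENGTH = 255;

function normalizeDistinctId(
  raw: string | null | undefined,
  maxLength: number = MAX_DISTINCT_ID_LENGTH,
): string | undefined {
  // A missing header is treated the same as an unusable one.
  if (raw == null) return undefined;

  const trimmed = raw.trim();

  // Reject whitespace-only values so "" is never stored,
  // and reject over-length values to keep rows small.
  if (trimmed.length === 0 || trimmed.length > maxLength) return undefined;

  return trimmed;
}
```

In the route handler this would be called as `normalizeDistinctId(request.headers.get('x-posthog-distinct-id'))`, assuming `request` is the standard `Request` object used above.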

**⚠️ PostHog Identity Note**: For authenticated users, PostHog's `distinct_id` IS the email (set via `posthog.identify(email)` in PostHogProvider.tsx:94). This means:
- Anonymous: `posthog_distinct_id` = header value (e.g., `"vscode-abc123"` or `"01JFKX..."`)
- Authenticated: `posthog_distinct_id` = email (e.g., `"user@example.com"`)
- PostHog links them via `alias()` call, so queries work across both states

### File 4: `src/lib/processUsage.ts` (VERIFY ONLY)
Check that `posthog_distinct_id` is included in the INSERT (lines ~550-600). It likely already works.

---

## Repository 2: Kilo-Org/kilocode (VSCode & JetBrains)

**GitHub**: https://github.com/Kilo-Org/kilocode
**Note**: Both extensions share TypeScript API code

### File 1: `src/core/kilocode/anonymous-id.ts` (NEW)
```typescript
import * as vscode from 'vscode';
import crypto from 'crypto';

const STORAGE_KEY = 'kilocode.posthogAnonymousId';

export async function getPostHogAnonymousId(context: vscode.ExtensionContext): Promise<string> {
let anonymousId = context.globalState.get<string>(STORAGE_KEY);

if (!anonymousId) {
const machineId = vscode.env.machineId;
// Review note (privacy): vscode.env.machineId is designed to be stable over
// time, so even a hashed and truncated value is a deterministic per-machine
// identifier and may raise privacy or store-policy concerns. If the goal is
// only to correlate anonymous sessions, consider generating a random UUID once
// (stored in globalState) instead, unless cross-reinstall stability is
// explicitly needed and policy compliance has been confirmed.
const hash = crypto.createHash('sha256')
.update(machineId)
.update('kilocode-posthog')
.digest('hex')
.substring(0, 16);

anonymousId = `vscode-${hash}`;
await context.globalState.update(STORAGE_KEY, anonymousId); // AWAIT to ensure persistence
}

return anonymousId;
}
```

**⚠️ Note**: Function is async and awaits `globalState.update()` to ensure ID is persisted before extension deactivates.
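
If the random-UUID approach raised in review is preferred over hashing `machineId`, it could look roughly like the sketch below. It is an assumption-laden illustration, not the planned implementation: `KeyValueStore` is a hypothetical interface that covers only the `get`/`update` subset of `vscode.Memento` (so the sketch runs outside the editor), and `getRandomAnonymousId` is an illustrative name. The ID is generated with Node's `crypto.randomUUID()`.

```typescript
import { randomUUID } from 'crypto';

// Hypothetical minimal subset of vscode.Memento; context.globalState
// satisfies this shape, and so does a simple in-memory fake in tests.
interface KeyValueStore {
  get<T>(key: string): T | undefined;
  update(key: string, value: unknown): PromiseLike<void>;
}

const ANON_ID_STORAGE_KEY = 'kilocode.posthogAnonymousId';

async function getRandomAnonymousId(store: KeyValueStore): Promise<string> {
  // Reuse the stored ID so the same install keeps one identity.
  const existing = store.get<string>(ANON_ID_STORAGE_KEY);
  if (existing) return existing;

  // Random rather than derived from the machine, so it carries no
  // information about the device and is lost when the state is cleared.
  const anonymousId = `vscode-${randomUUID()}`;
  await store.update(ANON_ID_STORAGE_KEY, anonymousId);
  return anonymousId;
}
```

The trade-off is that a random ID does not persist across a reinstall that clears the stored state, whereas the `machineId`-derived hash does.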

### File 2: `src/api/providers/kilocode-openrouter.ts` (line ~60)
**Before**:
```typescript
override customRequestOptions(metadata?: ApiHandlerCreateMessageMetadata) {
const headers: Record<string, string> = {
[X_KILOCODE_EDITORNAME]: getEditorNameHeader(),
};

// ... existing header logic ...

return Object.keys(headers).length > 0 ? { headers } : undefined;
}
```

**After**:
```typescript
override customRequestOptions(metadata?: ApiHandlerCreateMessageMetadata) {
const headers: Record<string, string> = {
[X_KILOCODE_EDITORNAME]: getEditorNameHeader(),
};

// ... existing header logic ...

// NEW: Add PostHog distinct_id
const distinctId = this.options.posthogDistinctId;
if (distinctId) {
headers['x-posthog-distinct-id'] = distinctId;
}

return Object.keys(headers).length > 0 ? { headers } : undefined;
}
```
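
The header logic above can be isolated as a pure function to make the "omit the header when no ID is available" behavior explicit and testable. This is a sketch under assumptions: `withDistinctId` and `DISTINCT_ID_HEADER` are hypothetical names, not identifiers from the kilocode repository.

```typescript
// Hypothetical constant and helper, shown only to illustrate the behavior.
const DISTINCT_ID_HEADER = 'x-posthog-distinct-id';

function withDistinctId(
  headers: Record<string, string>,
  distinctId: string | undefined,
): Record<string, string> {
  // Return a copy in both cases so the caller's object is never mutated.
  // An undefined or empty ID leaves the header out entirely, which the
  // backend treats as "no distinct_id provided".
  return distinctId
    ? { ...headers, [DISTINCT_ID_HEADER]: distinctId }
    : { ...headers };
}
```

Because the header is simply absent when no ID exists, extensions without an ID behave the same as older extension builds from the backend's point of view.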

### File 3: `src/shared/api.ts`
**Before**:
```typescript
export interface ApiHandlerOptions {
// ... existing fields ...
}
```

**After**:
```typescript
export interface ApiHandlerOptions {
// ... existing fields ...
posthogDistinctId?: string; // NEW
}
```

### File 4: `src/core/webview/ClineProvider.ts`
**Before**:
```typescript
const apiHandler = new KilocodeOpenrouterHandler({
// ... existing options ...
});
```

**After**:
```typescript
import { getPostHogAnonymousId } from '../kilocode/anonymous-id'; // NEW

const posthogDistinctId = await getPostHogAnonymousId(this.context); // NEW - await async function

const apiHandler = new KilocodeOpenrouterHandler({
// ... existing options ...
posthogDistinctId, // NEW
});
```

---

## Validation Queries

```sql
-- Check coverage rate
SELECT
COUNT(*) FILTER (WHERE mum.posthog_distinct_id IS NOT NULL) * 100.0 / NULLIF(COUNT(*), 0) as coverage_pct,
COUNT(*) as total_requests
FROM microdollar_usage mu
JOIN microdollar_usage_metadata mum ON mu.id = mum.id
WHERE mu.kilo_user_id LIKE 'anon:%' AND mu.created_at > NOW() - INTERVAL '7 days';

-- View anonymous users with PostHog tracking
SELECT
mu.kilo_user_id,
mum.posthog_distinct_id,
COUNT(*) as requests,
SUM(mu.cost) / 1000000.0 as cost_usd
FROM microdollar_usage mu
JOIN microdollar_usage_metadata mum ON mu.id = mum.id
WHERE mu.kilo_user_id LIKE 'anon:%' AND mum.posthog_distinct_id IS NOT NULL
GROUP BY mu.kilo_user_id, mum.posthog_distinct_id
ORDER BY requests DESC LIMIT 20;
```

---

## Summary

- **2 repositories**, **8 files**
- **1 backend endpoint** creates `anon:{ip}`: `/api/openrouter/` (no auth + free model)
- **Used by**: VSCode & JetBrains only (not CLI, not web)
- **Deploy**: Backend first (safe), extensions after (independent)