Feature: Trending #9

@veekaybee

Description

Describe desired changes

We'd like to show which GitHub repos are trending on Bluesky over time, so we can see what people are hacking on.

At large volumes, trending-topic detection is typically formulated as a combination of topic detection and anomaly detection:

  1. Pre-processing: process each document (post) one by one.
  2. Topic detection: extract keywords from those posts, where the keywords are the topics (via LDA, LSA, BERTopic, or, more recently, LLM topic extraction).
  3. Temporal analysis: count how many posts exist per topic compared to the previous time period.
  4. Anomaly detection: extract the topics that are trending, i.e. anomalous relative to other periods.
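
For reference, steps 3 and 4 of the general approach could look roughly like the sketch below. It assumes topics have already been extracted per post, and it uses a plain z-score against earlier periods as the anomaly test. The function names, the `z_threshold` and `min_count` parameters, and the choice of z-score are all illustrative assumptions, not something this issue prescribes.

```python
from collections import Counter
from statistics import mean, pstdev


def count_topics(posts_by_period):
    """Count posts per topic for each period.

    posts_by_period: list of periods; each period is a list of posts;
    each post is a list of topic keywords (already extracted).
    A topic is counted at most once per post.
    """
    return [
        Counter(topic for post in period for topic in set(post))
        for period in posts_by_period
    ]


def trending_topics(period_counts, z_threshold=2.0, min_count=3):
    """Return (topic, count, z) for topics in the last period that are
    anomalously frequent compared to the earlier periods."""
    *history, current = period_counts
    trending = []
    for topic, count in current.items():
        past = [c.get(topic, 0) for c in history]
        mu = mean(past) if past else 0.0
        sigma = pstdev(past) if len(past) > 1 else 0.0
        # Floor sigma at 1.0 so flat histories do not divide by zero.
        z = (count - mu) / max(sigma, 1.0)
        if count >= min_count and z >= z_threshold:
            trending.append((topic, count, round(z, 2)))
    return sorted(trending, key=lambda t: t[2], reverse=True)
```

Calling `trending_topics(count_topics(posts_by_period))` compares the most recent period against all earlier ones.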

Implementation

Luckily, since the volume is low and we batch deletes every two hours (see #6), we don't need to do ANY of this and can use a very simple algorithm:

Backend:

  1. On ingest, create a table that persists a map of repo link to count
  2. Collect links and counts while ingest is running
  3. Create a trending API that returns this data from the DB as an array with a timestamp
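
A minimal sketch of the three backend steps, assuming SQLite as the store. The table name `repo_counts`, its columns, and the functions `init_db`, `record_link`, and `trending` are hypothetical names chosen for illustration; the real schema and API shape may differ.

```python
import sqlite3
import time


def init_db(conn):
    # Step 1: a table mapping repo link -> count (hypothetical schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repo_counts ("
        " repo_url TEXT PRIMARY KEY,"
        " mentions INTEGER NOT NULL DEFAULT 0,"
        " last_seen INTEGER NOT NULL)"
    )
    conn.commit()


def record_link(conn, repo_url, now=None):
    # Step 2: called for every repo link seen while ingest is running.
    now = int(time.time()) if now is None else now
    conn.execute(
        "INSERT OR IGNORE INTO repo_counts (repo_url, mentions, last_seen)"
        " VALUES (?, 0, ?)",
        (repo_url, now),
    )
    conn.execute(
        "UPDATE repo_counts SET mentions = mentions + 1, last_seen = ?"
        " WHERE repo_url = ?",
        (now, repo_url),
    )
    conn.commit()


def trending(conn, limit=10, now=None):
    # Step 3: the payload a trending endpoint could return:
    # an array of repos plus the timestamp of the query.
    now = int(time.time()) if now is None else now
    rows = conn.execute(
        "SELECT repo_url, mentions FROM repo_counts"
        " ORDER BY mentions DESC, repo_url ASC LIMIT ?",
        (limit,),
    ).fetchall()
    return {
        "timestamp": now,
        "repos": [{"repo_url": url, "mentions": n} for url, n in rows],
    }
```

Because rows are removed by the existing two-hour batch delete, the counts only ever cover the current window, which is what lets a plain `ORDER BY` stand in for real trend detection.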

Frontend:

  1. Create a "Trending" tab
  2. Populate the trending tab from the API call

We might hit boundary conditions when we stop and restart the ingest feed, but we can handle them with logic that clears the DB or creates a restore point.
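
Both restart options could be sketched as follows, again assuming SQLite. The `repo_counts` table and the function names are hypothetical, and using `Connection.backup` as the "restore point" mechanism is one possible reading of that idea, not a settled design.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS repo_counts ("
    " repo_url TEXT PRIMARY KEY,"
    " mentions INTEGER NOT NULL)"
)


def clear_counts(conn):
    # Option 1: start from a clean slate after a restart.
    conn.execute("DELETE FROM repo_counts")
    conn.commit()


def save_restore_point(live, snapshot):
    # Option 2a: copy the live database into a snapshot connection,
    # e.g. just before the ingest feed is stopped.
    live.backup(snapshot)


def restore_from(live, snapshot):
    # Option 2b: overwrite the live database with the snapshot on restart.
    snapshot.backup(live)
```

In practice `snapshot` would likely be a file-backed connection such as `sqlite3.connect("restore.db")` (a hypothetical path), so the restore point survives the process exiting.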
