Open
Description
Describe desired changes
We'd like to track trending GH repos on Bsky over time, to see what people are hacking on.
Typically, trending-topic detection over large volumes of posts is formulated as a combination of topic detection and anomaly detection:
- Pre-processing: processing each document (post) one by one
- Topic detection: extracting keywords from those posts, where the keywords serve as topics (using LDA, LSA, BERTopic, or, more recently, LLM topic extraction)
- Temporal analysis: counting how many posts exist per topic compared to the previous time period
- Anomaly detection: flagging the topics that are trending, i.e. whose volume is anomalous relative to other periods
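For context, the last two steps of that general approach could look roughly like the sketch below. It is not something this issue proposes to build. It assumes topics have already been extracted and counted per time period, and it uses a simple z-score as the anomaly test. The function name `trending_topics` and the threshold of 3.0 are illustrative choices, not taken from any existing code.

```python
from collections import Counter
from statistics import mean, pstdev


def trending_topics(history, current, threshold=3.0):
    """Return (topic, z_score) pairs whose current count is anomalously high.

    history: list of Counter, one per previous time period (assumed input shape)
    current: Counter of topic -> post count for the current period
    """
    trending = []
    for topic, count in current.items():
        # Counts for this topic in each earlier period (0 if it never appeared).
        past = [period.get(topic, 0) for period in history]
        mu = mean(past) if past else 0.0
        sigma = pstdev(past) if past else 0.0
        # Floor sigma at 1 so flat or brand-new topics do not divide by zero.
        z = (count - mu) / max(sigma, 1.0)
        if z >= threshold:
            trending.append((topic, z))
    # Most anomalous topics first.
    return sorted(trending, key=lambda item: item[1], reverse=True)
```

A call such as `trending_topics(previous_periods, this_period)` would return only the topics whose count jumped well above their own historical average.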
Implementation
Luckily, since the volume is low and we batch deletes every two hours (see #6), we don't need to do ANY of this and can use a very simple algorithm:
Backend:
- On ingest, create a table that stores a map of repo link to count
- Collect links and counts while ingest is running
- Create a trending API that returns this data from the DB as an array with a timestamp
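The backend steps above can be sketched as follows. This is a minimal illustration, assuming SQLite as the store and Python as the language, neither of which the issue specifies. The table name `repo_counts`, the column names, the regular expression, and the functions `init_db`, `record_post`, and `trending` are all hypothetical.

```python
import re
import sqlite3
import time

# Assumed link shape: github.com/<owner>/<repo>. Real posts may need stricter parsing.
REPO_LINK = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)")


def init_db(conn):
    """Create the (hypothetical) table mapping repo link -> count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repo_counts ("
        " repo TEXT PRIMARY KEY,"
        " hits INTEGER NOT NULL,"
        " last_seen INTEGER NOT NULL)"
    )
    conn.commit()


def record_post(conn, text, now=None):
    """Called once per ingested post: bump the count of every repo it links to."""
    now = int(time.time()) if now is None else now
    # A set, so a repo linked twice in one post is only counted once.
    repos = {
        f"github.com/{owner}/{name.rstrip('.')}".lower()
        for owner, name in REPO_LINK.findall(text)
    }
    for repo in sorted(repos):
        # UPSERT syntax needs SQLite 3.24 or newer.
        conn.execute(
            "INSERT INTO repo_counts (repo, hits, last_seen) VALUES (?, 1, ?) "
            "ON CONFLICT(repo) DO UPDATE SET "
            "hits = hits + 1, last_seen = excluded.last_seen",
            (repo, now),
        )
    conn.commit()


def trending(conn, limit=10, now=None):
    """Payload a trending endpoint could return: an array of repos plus a timestamp."""
    now = int(time.time()) if now is None else now
    rows = conn.execute(
        "SELECT repo, hits FROM repo_counts ORDER BY hits DESC, repo ASC LIMIT ?",
        (limit,),
    ).fetchall()
    return {"timestamp": now, "repos": [{"repo": r, "count": c} for r, c in rows]}
```

The frontend "Trending" tab would then only need to render the `repos` array from this payload. Whether counts should be cumulative or reset per window is left open here.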
Frontend:
- Create a "Trending" tab
- Populate the trending tab from the API call
We might hit boundary conditions when we stop and restart the ingest feed, but we can handle them with logic that either clears the DB or creates a restore point.
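Both options for handling a restart can be sketched briefly. This assumes a SQLite table named `repo_counts` with `repo` and `hits` columns, which is a hypothetical schema, and the snapshot table name and function names are likewise made up for illustration.

```python
import sqlite3


def clear_counts(conn):
    """Option 1: start from scratch whenever the ingest feed restarts."""
    conn.execute("DELETE FROM repo_counts")
    conn.commit()


def save_restore_point(conn):
    """Option 2, part one: copy the current counts into a snapshot table."""
    conn.execute("DROP TABLE IF EXISTS repo_counts_snapshot")
    conn.execute("CREATE TABLE repo_counts_snapshot AS SELECT * FROM repo_counts")
    conn.commit()


def restore(conn):
    """Option 2, part two: replace the live counts with the last snapshot."""
    conn.execute("DELETE FROM repo_counts")
    conn.execute("INSERT INTO repo_counts SELECT * FROM repo_counts_snapshot")
    conn.commit()
```

Clearing is simpler but loses the counts gathered before the stop. A restore point keeps them, at the cost of deciding when to take the snapshot.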