-
Notifications
You must be signed in to change notification settings - Fork 366
(feat: persistence) Add schema-v4.sql for metrics tables (#3337) #3523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add new schema version 4 with tables for storing scan and commit metrics reports as first-class entities. New tables: - scan_metrics_report: Stores scan metrics with trace correlation - scan_metrics_report_roles: Junction table for principal roles - commit_metrics_report: Stores commit metrics with trace correlation - commit_metrics_report_roles: Junction table for principal roles Key design decisions: - PRIMARY KEY (realm_id, report_id) for multi-tenancy - Junction tables with CASCADE DELETE for roles - Timestamp index for retention cleanup - JSONB metadata column for extensibility (Postgres), TEXT for H2 Schema is self-contained (includes all tables from v1-v3) to support fresh installs.
| CREATE TABLE IF NOT EXISTS commit_metrics_report_roles ( | ||
| realm_id TEXT NOT NULL, | ||
| report_id TEXT NOT NULL, | ||
| role_name TEXT NOT NULL, | ||
| PRIMARY KEY (realm_id, report_id, role_name), | ||
| FOREIGN KEY (realm_id, report_id) REFERENCES commit_metrics_report(realm_id, report_id) ON DELETE CASCADE | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
porting my comment from previous pr here, do we need a new table just for this ? @dimas-b thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this?
CREATE TABLE scan_metrics_report (
...
roles JSONB DEFAULT '[]'::JSONB, -- or TEXT for H2
...
);
Pros:
• No extra tables
• Single INSERT per report
• Works with both Postgres (JSONB) and H2 (TEXT)
• Can query with JSON operators: WHERE roles @> '["analyst"]'
Cons:
• Harder to index efficiently
• JSON parsing overhead
• Less type-safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe the debate then is adding it in schema directly of the scan_metric report or normalizing the roles info into seperate i believe its fine to have role_name if we have principal name in the schema already, if we wanna be cautious we can just move the prinicipal_name and role name in the additional properties.
H2 is just for test, for PG we can make indexes on the JSONB columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good @singhpk234. @dimas-b please let us know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I lost context on the RDBMS schema for metrics 😅
Do we have a description (somewhere) of anticipated query patterns in Polaris and whether we expect external (intependent) queries against this schema?
| PRIMARY KEY (event_id) | ||
| ); | ||
|
|
||
| -- Idempotency records (from v3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not longer in v3 per #3386
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have merged my PR with the latest from main. My PR now only contains the metrics tables I added, and a potential fix for the correct version.
|
@obelix74 @singhpk234 : WDYT about starting an RFC doc + |
|
@dimas-b there is a dev thread already please ref : https://lists.apache.org/thread/c83jnkvlwc2k3swm65cmvl4t0mt7p799 |
I am trying to solve two sets of asks from my product folks with this.
From the metrics perspective, today, with Track table scan operations:
For commit report queries:
Also many operational dashboards, and filtering by user, realm, engine name, version etc. I have not thought about roles in this flow at all, perhaps it will be useful. @singhpk234 recommended adding roles and I added them. I normalized the roles tables from a RDBMS perspective, but I didn't realize there are other similar fields stored as JSON already. |
| MERGE INTO version (version_key, version_value) | ||
| KEY (version_key) | ||
| VALUES ('version', 3); | ||
| VALUES ('version', 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this @obelix74 !
Add new schema version 4 with tables for storing scan and commit metrics reports as first-class entities.
New tables:
Key design decisions:
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)