@tomfutago
Contributor

Description:

[...]


@github-actions bot added the WIP (work in progress) and dbt: tokens (covers the Tokens dbt subproject) labels on Jan 2, 2026
@tomfutago
Contributor Author

asof join test results summary

test descriptions

| test | strategy | description |
|---|---|---|
| original | lead() + range join | uses lead() window function to compute next_update_day, then expands with range join |
| test1 | asof + cross join | cross join (address × token × days), then asof left join to find latest balance |
| test2 | asof for next_update_day + utils.days | asof self-join to find next balance (replaces lead), expand with inner join utils.days |
| test3 | asof for next_update_day + range join | asof self-join to find next balance (replaces lead), expand with original left join days range pattern |
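
For readers less familiar with the two patterns in the table, here is a toy Python sketch. It is not the dbt macro, only a model with made-up data and hypothetical names: `expand_with_lead` mimics the original (next update via lead(), then range expansion) and `expand_with_asof` mimics the test1 idea (walk a day spine, take the latest update at or before each day). Days are plain integers and `last_day` is an assumed end of the spine.

```python
from bisect import bisect_right

# toy balance updates for one (address, token) pair: (day, balance), sorted by day
updates = [(1, 100), (4, 250), (6, 0)]
last_day = 7  # end of the day spine (inclusive), an assumption for this example


def expand_with_lead(updates, last_day):
    """original pattern: lead() gives next_update_day, then a range join fills the gap."""
    rows = []
    for i, (day, balance) in enumerate(updates):
        # lead(day) over (order by day); open-ended for the final update
        next_update_day = updates[i + 1][0] if i + 1 < len(updates) else last_day + 1
        # range join condition: day <= d < next_update_day
        rows.extend((d, balance) for d in range(day, next_update_day))
    return rows


def expand_with_asof(updates, last_day):
    """asof pattern: for each spine day, take the latest update at or before that day."""
    update_days = [day for day, _ in updates]
    rows = []
    for d in range(update_days[0], last_day + 1):
        idx = bisect_right(update_days, d) - 1  # last update with day <= d
        rows.append((d, updates[idx][1]))
    return rows


lead_rows = expand_with_lead(updates, last_day)
asof_rows = expand_with_asof(updates, last_day)
```

Both functions return the same daily rows for this input, which is the point of the comparison: the tests differ in how the engine gets there, not in the result.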

initial run (full refresh)

| chain | rows | original | test1 | test2 | test3 |
|---|---|---|---|---|---|
| arbitrum | 3.4b | 797s | 2418s ❌ (3.0×) | 784s ✅ | 763s |
| base | 3.2b | 735s | 1596s ❌ (2.2×) | 714s ✅ | 694s |
| linea | 686m | 155s | 271s ❌ (1.7×) | 147s ✅ | 135s |
| scroll | 487m | 106s | 183s ❌ (1.7×) | 84s ✅ | 70s |

winner: test3 — consistently 5-35% faster than original on full refresh.


incremental run

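| chain | rows | original | test1 | test2 | test3 |
|---|---|---|---|---|---|
| arbitrum | 19.6m | 93s | 108s | 107s | 86s |
| base | 33.3m | 137s | 131s | 136s | 136s |
| linea | 2.6m | 11s | 11s | 10s | 10s |
| scroll | 1.9m | n/a | 14s | 15s | 8s |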

winner: test3 — fastest or tied on incremental runs.


conclusions

  1. test1 (cross join) — ❌ do not use. the cross join creates massive cardinality explosion, especially on larger chains (2-3× slower).

  2. test2 (asof + utils.days inner join) — ✅ slight improvement over original. good alternative.

  3. test3 (asof + range join) — ✅ best performer. combines asof for next_update_day calculation with original range join pattern. 5-35% faster on full refresh, fastest on incremental.

recommendation: test3 is the optimal asof implementation. it replaces only the lead() window function with asof self-join while keeping the proven range join expansion pattern.
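
One plausible source of the extra work in test1 can be shown with a back-of-the-envelope count. The sketch below uses invented numbers, not figures from these runs: it compares the rows a full (address × token × days) spine produces with the rows a range join emits when each pair only exists from its first update day onward. All names and values are hypothetical.

```python
# hypothetical first-update day for each (address, token) pair; days are integers
first_update_day = {
    ("0xaaa", "USDC"): 1,
    ("0xbbb", "USDC"): 300,
    ("0xccc", "WETH"): 360,
}
spine_days = range(1, 366)  # one year of days in a utils.days-style spine

# cross join: every pair is combined with every day, whether or not a balance exists yet
cross_join_rows = len(first_update_day) * len(spine_days)

# range join: a pair only produces rows from its first update day to the end of the spine
range_join_rows = sum(spine_days[-1] - first + 1 for first in first_update_day.values())

ratio = cross_join_rows / range_join_rows
print(cross_join_rows, range_join_rows, round(ratio, 2))
```

The gap grows with the share of pairs that first appear late in the spine, which would fit the larger chains being hit hardest, though the real cause here has not been profiled.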

@tomfutago tomfutago requested a review from a team January 5, 2026 15:10
@jeff-dude
Member

follow up question:
was base chosen as one of the larger chains, so we can see performance across various sizes of chains?
would be good to confirm it's consistent across smaller + larger chains

@tomfutago
Contributor Author

follow up question: was base chosen as one of the larger chains, so we can see performance across various sizes of chains? would be good to confirm it's consistent across smaller + larger chains

yep, it's to compare 2 big chains (arbitrum + base) vs 2 small ones (linea + scroll)

just a few more notes:

  • test1 is meant to be the closest "translation" of the original logic into asof-type logic, but the performance gain from the asof join is massively diminished by the utils.days cross join (which was meant to replace the current range join on day to next_update_day)
  • test2 and test3 perform better, but not by as much as i hoped. not sure an asof self-join is the best approach here

hints welcome 🙏

Collaborator

@0xRobin 0xRobin left a comment

@tomfutago I had a quick look, can you add another approach that uses just 1 asof join?
At a high level it should look like this:

with balance_updates as (
  -- placeholder filters, to be filled in by the model
  select * from balances_source
  where address_filter
    and token_filter
    and incremental_filter
),

asof_subjects as (
  -- every (address, token) pair that has at least one update
  select distinct
    address
    ,token_address
  from balance_updates
),

asof_spine as (
  -- one row per pair per day in the window
  select
    d.timestamp
    ,s.address
    ,s.token_address
  from asof_subjects s
  cross join (select timestamp from utils.days where <incremental window>) d
)

select
  sp.timestamp
  ,sp.address
  ,sp.token_address
  ,bu.balance
from asof_spine sp
asof join balance_updates bu
  on sp.address = bu.address
  and sp.token_address = bu.token_address
  and bu.balance_updated_at <= sp.timestamp  -- latest update at or before the spine day

There's also an issue that you're not accounting for balance updates that occurred before the start of the incremental window (and carrying those into the current window), but I would keep that as a follow-up. I have some examples of how to do that somewhere as well.
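
The lookback gap can be illustrated with a toy example. In the sketch below (plain Python, invented data, hypothetical names; not the macro's actual logic), an incremental run that only reads updates inside the window misses the balance set before the window started, while seeding the window with the latest earlier update carries it forward.

```python
from bisect import bisect_right

# all updates for one (address, token) pair: (day, balance), sorted by day
all_updates = [(2, 500), (12, 800)]
window_start, window_end = 10, 14  # incremental window, inclusive (assumed)


def daily_balances(updates, start, end):
    """asof-style fill: latest update at or before each day, None if there is none yet."""
    days = [d for d, _ in updates]
    out = {}
    for day in range(start, end + 1):
        idx = bisect_right(days, day) - 1
        out[day] = updates[idx][1] if idx >= 0 else None
    return out


# naive incremental: only updates inside the window are visible
in_window = [(d, b) for d, b in all_updates if window_start <= d <= window_end]
naive = daily_balances(in_window, window_start, window_end)

# with lookback: also keep the latest update strictly before the window start
before = [(d, b) for d, b in all_updates if d < window_start]
seeded = before[-1:] + in_window
with_lookback = daily_balances(seeded, window_start, window_end)
```

In the naive version days 10 and 11 have no balance at all, even though the pair held 500 since day 2; the seeded version fills them from the earlier update.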

@tomfutago
Contributor Author

thanks @0xRobin i might be missing something but your suggested approach looks like my test1, is it not?

@0xRobin
Collaborator

0xRobin commented Jan 6, 2026

ah yes you are right! @tomfutago
Sorry I conflated test1 with the original.

I think all approaches are missing the historical lookback for incremental updates though. Can you add that as well?
It's equivalent to this in the original:
https://github.com/duneanalytics/spellbook/blob/8019858e5349b5ddcf1f8d8f571484b920d85fd0/dbt_macros/shared/balances_incremental_subset_daily.sql#L85C1-L104C16

I wonder if for approach 1 it would be better to do it in 2 stages, one to construct the cross join spine and the final one to asof join with the balance updates.

@tomfutago
Contributor Author

updated - test3 version still seems to be winning:


initial run (full refresh)

| chain | rows | original | test1 | test2 | test3 |
|---|---|---|---|---|---|
| arbitrum | 3.4b | 864s | 2980s ❌ (3.4×) | 848s ✅ | 855s |
| base | 3.25b | 1000s | 2047s ❌ (2×) | 1002s | 988s |
| linea | 688m | 183s | 268s ❌ (1.5×) | 169s ✅ | 73s ✅✅ |
| scroll | 488m | 52s | 90s ❌ (1.7×) | 46s ✅ | 44s |

winner: test3 - up to 2.5× faster on smaller chains, consistent improvement across all.


incremental run

| chain | rows | original | test1 | test2 | test3 |
|---|---|---|---|---|---|
| arbitrum | 19.7m | 94s | 95s | 100s | 98s |
| base | 33.4m | 161s | 149s | 165s | 156s |
| linea | 2.6m | 23s | 22s | 17s ✅ | 16s |
| scroll | 1.9m | 17s | 18s | 17s | 17s |

winner: test3 - slight edge on incremental, test1 also competitive now.

