1 change: 1 addition & 0 deletions core/README.md
@@ -112,6 +112,7 @@
| [CAP-0078](cap-0078.md) | Host functions for performing limited TTL extensions | Dmytro Kozhevin | Draft |
| [CAP-0079](cap-0079.md) | Host functions for muxed address strkey conversions | Dmytro Kozhevin | Draft |
| [CAP-0080](cap-0080.md) | Host functions for efficient ZK BN254 use cases | Siddharth Suresh | Draft |
| [CAP-0081](cap-0081.md) | TTL-Ordered Eviction | Garand Tyson | Draft |

### Rejected Proposals
| Number | Title | Author | Status |
241 changes: 241 additions & 0 deletions core/cap-0081.md
@@ -0,0 +1,241 @@
## Preamble

```
CAP: 0081
Title: TTL-Ordered Eviction
Working Group:
Owner: Garand Tyson <@SirTyson>
Authors: Garand Tyson <@SirTyson>
Consulted:
Status: Draft
Created: 2026-01-28
Discussion: https://github.com/orgs/stellar/discussions/1868
Protocol version: 26
```

## Simple Summary

This CAP changes the eviction order of Soroban entries from
bucket-file-position order to TTL order. Entries are evicted in
`(liveUntilLedgerSeq, LedgerKey)` order, lowest TTL first, with LedgerKey as a
tiebreaker. This makes eviction order depend solely on intrinsic entry
properties rather than BucketList file layout.

## Working Group

As specified in the Preamble.

## Motivation

The current eviction mechanism (CAP-0046-12) scans bucket files on disk to find
expired entries. This approach has two fundamental issues:

1. **Performance**: The current eviction process requires unnecessary disk IO.
While only Soroban entries are evictable, we scan all entry types. Soroban
entries themselves are also always stored in-memory, but this is not
leveraged for the eviction scan. By changing eviction order, all eviction
scans can be done on in-memory state, increasing the rate of eviction and
reducing resource consumption.

2. **Non-intuitive ordering**: Eviction order is determined by an entry's
position in the BucketList structure. This ordering is complex and
implementation-specific, and it led to a correctness bug in Protocol 23. The
proposed ordering depends only on entry state and should lead to a much
simpler implementation.

### Goals Alignment

The proposal aligns with Stellar’s goals of scalability, resilience, and
performance by reducing disk utilization and simplifying the protocol.

## Abstract

This CAP modifies the eviction mechanism to:

1. **Order eviction by TTL**: Temporary and persistent entries are evicted
separately, each with their own ordering. Within each entry type, entries
are ordered by `(liveUntilLedgerSeq, LedgerKey)` — the entry with the lowest
`liveUntilLedgerSeq` is evicted first. For entries with the same TTL,
`LedgerKey` ordering provides a deterministic tiebreaker. Temporary and
persistent entries do not compete with each other for eviction order.

2. **Separate limits for temporary and persistent entries**: Temporary entries
have their own eviction limit (`maxTempEntriesToEvict`) separate from the
persistent entry archival limit (`maxPersistentEntriesToArchive`). Each
limit is applied independently per ledger.

3. **Remove disk I/O from eviction**: Since all Soroban entries are stored in
memory, eviction scans can be performed without disk IO.

4. **Remove complex background eviction scan implementation**: With the new
ordering, eviction scans can be done on the apply thread during ledger
close, greatly simplifying implementation.

## Specification

### XDR changes

This CAP introduces a new network config setting to separately limit temporary
entry eviction:

```diff mddiffcheck.ignore=true
 enum ConfigSettingID
 {
     CONFIG_SETTING_CONTRACT_MAX_SIZE_BYTES = 0,
     CONFIG_SETTING_CONTRACT_COMPUTE_V0 = 1,
     // ... existing settings ...
     CONFIG_SETTING_CONTRACT_PARALLEL_COMPUTE_V0 = 12,
     CONFIG_SETTING_CONTRACT_LEDGER_COST_V0 = 13,
-    CONFIG_SETTING_SCP_TIMING = 16
+    CONFIG_SETTING_SCP_TIMING = 16,
+    CONFIG_SETTING_MAX_TEMP_ENTRIES_TO_EVICT = 17
 };

 union ConfigSettingEntry switch (ConfigSettingID configSettingID)
 {
     // ... existing cases ...
+    case CONFIG_SETTING_MAX_TEMP_ENTRIES_TO_EVICT:
+        uint32 maxTempEntriesToEvict;
 };
```

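Additionally, `maxEntriesToArchive` will be renamed to
`maxPersistentEntriesToArchive`, and `evictionScanSize` will be replaced by
`maxPersistentBytesToArchive`: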

```diff mddiffcheck.ignore=true
 struct StateArchivalSettings
 {
     uint32 maxEntryTTL;
     uint32 minTemporaryTTL;
     uint32 minPersistentTTL;
     int64 persistentRentRateDenominator;
     int64 tempRentRateDenominator;
-    uint32 maxEntriesToArchive;
+    uint32 maxPersistentEntriesToArchive;
     uint32 bucketListSizeWindowSampleSize;
     uint32 bucketListWindowSamplePeriod;
-    uint32 evictionScanSize;
+    uint32 maxPersistentBytesToArchive;
     uint64 startingEvictionScanLevel;
 };
```

`EvictionIterator` and `startingEvictionScanLevel` will be deprecated and no
longer used.

### Semantics

#### Eviction Ordering

Entries eligible for eviction (entries where
`liveUntilLedgerSeq < currentLedgerSeq`) are evicted in the following order:

1. **Primary sort**: `liveUntilLedgerSeq` ascending, where entries expiring
soonest are evicted first
2. **Secondary sort**: `LedgerKey` ordering, used as a deterministic tiebreaker
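
The ordering above can be sketched as a comparator. This is only an illustration: `DemoLedgerKey`, `DemoEntry`, `isEligibleForEviction`, and `evictsBefore` are hypothetical names, and a string stands in for the real `LedgerKey` simply because it has a total order.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for LedgerKey; any totally ordered key type works.
using DemoLedgerKey = std::string;

struct DemoEntry
{
    DemoLedgerKey key;
    uint32_t liveUntilLedgerSeq;
};

// An entry is eligible once its liveUntilLedgerSeq is strictly below the
// current ledger sequence.
inline bool
isEligibleForEviction(DemoEntry const& e, uint32_t currentLedgerSeq)
{
    return e.liveUntilLedgerSeq < currentLedgerSeq;
}

// Strict weak ordering on (liveUntilLedgerSeq, key): lowest TTL first,
// with the key as the tiebreaker.
inline bool
evictsBefore(DemoEntry const& a, DemoEntry const& b)
{
    return std::tie(a.liveUntilLedgerSeq, a.key) <
           std::tie(b.liveUntilLedgerSeq, b.key);
}
```

Because the comparison uses only the TTL and the key, two nodes with the same ledger state should derive the same order regardless of how their buckets are laid out on disk.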

#### Eviction Algorithm

On each ledger close, eviction proceeds separately for temporary and persistent
entries:

**Temporary Entry Eviction:**

1. Identify expired temporary entries: all `TEMPORARY` Soroban entries where
`liveUntilLedgerSeq < currentLedgerSeq`
2. Sort by `(liveUntilLedgerSeq, LedgerKey)`
3. Evict the first `maxTempEntriesToEvict` entries in this order

**Persistent Entry Archival:**

1. Identify expired persistent entries: all `PERSISTENT` Soroban entries where
`liveUntilLedgerSeq < currentLedgerSeq`
2. Sort by `(liveUntilLedgerSeq, LedgerKey)`
3. Archive entries in this order until we have archived
`maxPersistentBytesToArchive` bytes or `maxPersistentEntriesToArchive`
entries, whichever limit is reached first

The temporary and persistent limits are applied independently per ledger. A
ledger may evict up to `maxTempEntriesToEvict` temporary entries and archive up
to `maxPersistentEntriesToArchive` persistent entries, subject to the
`maxPersistentBytesToArchive` byte limit. Eviction occurs after applying all
transactions from a given ledger.
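
A minimal sketch of the per-ledger selection, assuming an index already sorted by `(liveUntilLedgerSeq, LedgerKey)`. The types and function names here (`DemoEntry`, `selectTempToEvict`, `selectPersistentToArchive`) are hypothetical. The sketch also assumes archival stops before an entry that would push the total past the byte limit; the text above does not say whether the last entry may overshoot.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a Soroban entry in the eviction index.
struct DemoEntry
{
    uint32_t liveUntilLedgerSeq;
    std::string key;    // stands in for LedgerKey
    uint32_t sizeBytes; // serialized size, only used for persistent entries

    bool
    operator<(DemoEntry const& o) const
    {
        return std::tie(liveUntilLedgerSeq, key) <
               std::tie(o.liveUntilLedgerSeq, o.key);
    }
};

using EvictionIndex = std::set<DemoEntry>;

// Temporary entries: bounded by an entry count only.
std::vector<DemoEntry>
selectTempToEvict(EvictionIndex const& index, uint32_t currentLedgerSeq,
                  uint32_t maxTempEntriesToEvict)
{
    std::vector<DemoEntry> out;
    for (auto const& e : index)
    {
        // The index is sorted by TTL, so the first live entry ends the scan.
        if (e.liveUntilLedgerSeq >= currentLedgerSeq ||
            out.size() >= maxTempEntriesToEvict)
        {
            break;
        }
        out.push_back(e);
    }
    return out;
}

// Persistent entries: bounded by both an entry count and a byte count.
std::vector<DemoEntry>
selectPersistentToArchive(EvictionIndex const& index,
                          uint32_t currentLedgerSeq,
                          uint32_t maxPersistentEntriesToArchive,
                          uint32_t maxPersistentBytesToArchive)
{
    std::vector<DemoEntry> out;
    uint64_t bytes = 0;
    for (auto const& e : index)
    {
        if (e.liveUntilLedgerSeq >= currentLedgerSeq ||
            out.size() >= maxPersistentEntriesToArchive)
        {
            break;
        }
        // Assumption: never exceed the byte budget.
        if (bytes + e.sizeBytes > maxPersistentBytesToArchive)
        {
            break;
        }
        bytes += e.sizeBytes;
        out.push_back(e);
    }
    return out;
}
```

Each function would be called once per ledger close on its own index, so a backlog of expired temporary entries does not consume the persistent budget or vice versa.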

#### Initial Settings

- `maxPersistentEntriesToArchive` initial value: 1000
- `maxTempEntriesToEvict` initial value: 1000

Currently, `maxEntriesToArchive` is set to 1000. For simplicity, this value
will be kept for both limits.

- `maxPersistentBytesToArchive` initial value: 286720

`maxPersistentBytesToArchive` in practice meters the maximum number of bytes we
can write to the hot archive. While `evictionScanSize` implicitly bounded this
before, its value was not set with this in mind. We will adopt the current
`ledgerMaxWriteBytes` value. This represents the number of bytes we can safely
write to the Live BucketList. Given that the Hot Archive BucketList write is
independent and can be easily parallelized if need be, we can use the same
limit initially.

## Implementation

Maintaining the order of entries to evict can be done efficiently by leveraging
the in-memory soroban cache. Initially, a naive approach can be used. Later on,
performance can be improved with a more complex solution without a protocol
change.

### Naive implementation

The simplest implementation would maintain two global sorted indexes, one for
`PERSISTENT` entries and one for `TEMPORARY` entries:

```cpp
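// Key for the eviction index. Only a pointer to the cached entry is stored,
// so the LedgerEntry itself is not duplicated.
struct EvictionKey
{
    std::shared_ptr<LedgerEntry const> entry;
    uint32_t liveUntilLedgerSeq;
};

// Sorted by (liveUntilLedgerSeq, LedgerKey). std::set needs a strict weak
// ordering, so EvictionKey would also need a comparator (not shown here).
std::set<EvictionKey> evictionIndex;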
```

At current state sizes, this additional index requires about 48 MB (`entry` is
just a pointer to the LedgerEntry already allocated from the data cache). This
overhead should be acceptable, even with significant state growth. Maintaining
this set should be simple, as we can do updates along with the atomic commits
we currently make to the BucketList/in-memory soroban cache.
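
One way the index maintenance could look is sketched below. Elements of a `std::set` cannot be modified in place, so a TTL bump becomes an erase followed by a reinsert. `DemoEvictionIndex` and its members are hypothetical, and the sketch stores a string key by value rather than a pointer to the cached `LedgerEntry`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch of one eviction index (e.g. the TEMPORARY one).
class DemoEvictionIndex
{
    // Ordered by (liveUntilLedgerSeq, key); std::pair compares
    // lexicographically, which gives exactly that order.
    std::set<std::pair<uint32_t, std::string>> mOrdered;
    // Current TTL per key, needed to locate the old element on a bump.
    std::map<std::string, uint32_t> mTTLByKey;

  public:
    // Entry creation or TTL bump: O(log n).
    void
    upsert(std::string const& key, uint32_t liveUntilLedgerSeq)
    {
        auto it = mTTLByKey.find(key);
        if (it != mTTLByKey.end())
        {
            // Remove the element stored under the old TTL.
            mOrdered.erase(std::make_pair(it->second, key));
            it->second = liveUntilLedgerSeq;
        }
        else
        {
            mTTLByKey.emplace(key, liveUntilLedgerSeq);
        }
        mOrdered.emplace(liveUntilLedgerSeq, key);
    }

    // Entry deletion or eviction: O(log n).
    void
    erase(std::string const& key)
    {
        auto it = mTTLByKey.find(key);
        if (it == mTTLByKey.end())
        {
            return;
        }
        mOrdered.erase(std::make_pair(it->second, key));
        mTTLByKey.erase(it);
    }

    // Next eviction candidate, or nullptr if the index is empty.
    std::pair<uint32_t, std::string> const*
    front() const
    {
        return mOrdered.empty() ? nullptr : &*mOrdered.begin();
    }

    std::size_t
    size() const
    {
        return mOrdered.size();
    }
};
```

In this sketch, `upsert` and `erase` would be called from the same commit step that updates the in-memory Soroban cache, which keeps the index and the cache consistent.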

### Future Optimizations

While the memory overhead of this index is fairly small, the O(log n) cost of
updating it on TTL bumps, eviction, entry creation, etc. may become a runtime
issue. Should that happen, one option is to make updates to the
in-memory soroban cache and the `evictionIndex` in parallel with BucketList
writes. These are independent operations that do not race on any data, and disk
IO would dominate in-memory updates even in an ordered set.

Additionally, it is not necessary to keep an ordered list of all entries. The
maximum number of entries that can be evicted is bounded by network config
settings. This allows us to maintain an ordered list of a subset of entries
that are already eligible for eviction, or are just about to become eligible.
This list can be prepared and maintained outside of the ledgerClose path. While
a TTL can be increased such that an entry in the list must be "skipped," TTLs
can never decrease such that a new entry "jumps the line" and invalidates the
previously constructed list.
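
The "skip" behaviour described above might look like the following sketch. It assumes a candidate list prepared ahead of time and sorted by `(liveUntilLedgerSeq, key)`, plus a lookup of each entry's current TTL. `Candidate`, `evictFromPreparedList`, and the map-based TTL lookup are all hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// (liveUntilLedgerSeq at preparation time, key), sorted ascending.
using Candidate = std::pair<uint32_t, std::string>;

// Walks a previously prepared candidate list and returns the keys to evict.
// Entries whose TTL changed (or that were deleted) since preparation are
// skipped rather than invalidating the whole list.
std::vector<std::string>
evictFromPreparedList(
    std::vector<Candidate> const& prepared,
    std::unordered_map<std::string, uint32_t> const& currentTTL,
    uint32_t currentLedgerSeq, uint32_t maxToEvict)
{
    std::vector<std::string> evicted;
    for (auto const& [ttlWhenPrepared, key] : prepared)
    {
        if (evicted.size() >= maxToEvict)
        {
            break;
        }
        auto it = currentTTL.find(key);
        if (it == currentTTL.end() || it->second != ttlWhenPrepared)
        {
            continue; // deleted or bumped since the list was built
        }
        if (it->second >= currentLedgerSeq)
        {
            break; // list is sorted, so no later candidate is expired yet
        }
        evicted.push_back(key);
    }
    return evicted;
}
```

Since TTLs never decrease, no entry outside the prepared list can sort ahead of the candidates in it, so skipping stale candidates is enough to preserve the required order.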

For initial rollout, the naive solution is likely good enough.

## Design Rationale

To evict a temporary entry, we must write a `DEADENTRY` to the live BucketList,
whose size is approximately `sizeof(LedgerKey)`. This size is bounded and
relatively small, so there is no need for an explicit byte limit when evicting
temporary entries.

To evict a persistent entry, we must write a `DEADENTRY` to the live BucketList
and an `ARCHIVED` entry to the Hot Archive. The size of the `ARCHIVED` entry is
`sizeof(LedgerEntry)`, which can be very significant and is not properly
bounded by an entry count limit alone. For this reason, a byte-based limit is
necessary. While an entry-count-based limit is not required for safe core
operation, it still seems like a valuable limit to prevent overwhelming
downstream consumers.