docs: document NetSuite sync modes, delete handling, and table types#2667
Add "Sync modes and data loading" sections to both NetSuite connector docs (SuiteAnalytics and SuiteQL) explaining how incremental vs full refresh is determined, how deletes are captured via DeletedRecord, and the distinction between base tables and linking/junction tables. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jshearer
left a comment
Mainly, the SuiteQL information here is factually incorrect. If you don't already have access to the netsuite repo, just make an access request and Mike or I can add you. There's a lot of valuable data in there about the history of the connectors, the decisions we made in PR/commit messages, etc.
| ## Sync modes and data loading | ||
| Each table binding uses one of two sync modes, determined by the cursor fields available on the table. |
will come back after reading the whole thing, but... I would say the connector has at least 3 sync modes: snapshots, periodic backfills, and incremental replication.
| ### Full refresh | ||
| Tables without a `log_cursor` are synced via **full refresh** — the connector re-reads the entire table on each sync. This is common for linking and junction tables (e.g., `TransactionLine`, `NextTransactionLineLink`) that lack a `lastmodifieddate` column. |
https://github.com/estuary/source-netsuite/pull/193 will make these referenced tables out of date
| There are two full refresh modes: | ||
| - **Paginated backfill** — Uses the [`page_cursor`](#bindings) to read the table in ordered pages. Pair this with a [`schedule`](#setting-a-schedule) cron expression to control how often the full refresh runs (e.g., daily). | ||
| - **Snapshot backfill** — Set [`snapshot_backfill: true`](#bindings) when no good page cursor exists and the table is small enough for a single query. Snapshot mode manages its own schedule via the `interval` field. Do **not** combine `snapshot_backfill` with a cron `schedule` — this will cause issues with delete emission. |
It's probably worth noting that periodic snapshots using interval are capable of capturing deletions on tables that are not covered by DeletedRecord
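To illustrate why a periodic snapshot can surface deletions without `DeletedRecord`, here is a minimal Python sketch that compares two consecutive snapshots by key. The function name, the `id` key, and the sample rows are all hypothetical, and this is only an illustration of the idea, not the connector's actual mechanism.

```python
from typing import Any, Hashable, Iterable

Row = dict[str, Any]


def infer_deleted_keys(
    previous: Iterable[Row], current: Iterable[Row], key: str = "id"
) -> set[Hashable]:
    """Return keys that were in the previous snapshot but are absent now."""
    previous_keys = {row[key] for row in previous}
    current_keys = {row[key] for row in current}
    return previous_keys - current_keys


# Hypothetical snapshots of a small table taken one interval apart.
yesterday = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
today = [{"id": 1, "name": "a"}, {"id": 3, "name": "c2"}]

# Row 2 disappeared between snapshots, so it is inferred as deleted.
deleted = infer_deleted_keys(yesterday, today)
```

Because the comparison needs the complete table each time, this only works for full snapshots, which is consistent with the advice to keep snapshot tables small.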
| ### Table associations | ||
| Linking tables can optionally be loaded as **associations** of a parent table instead of — or in addition to — standalone bindings. For example, `TransactionAccountingLine` can be associated with `Transaction` so that when a transaction is modified, its related accounting lines are also loaded. |
I... have been somewhat hesitant to actually talk about this feature because of the major footgun that unless every change in the child table is reflected in an update to the parent table's last-modified cursor (which is almost never the case), using this feature will only appear to work, but actually miss a bunch of change events.
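The footgun can be shown with a small Python simulation. Everything here (the `Parent` and `Child` classes, the integer `last_modified` cursor, and `load_changes_via_association`) is a hypothetical stand-in written for this illustration, not code from the connector.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    id: int
    last_modified: int  # hypothetical stand-in for a last-modified cursor


@dataclass
class Child:
    id: int
    parent_id: int
    value: str


def load_changes_via_association(parents, children, cursor):
    """Emit child rows only for parents whose cursor advanced past `cursor`."""
    changed_parent_ids = {p.id for p in parents if p.last_modified > cursor}
    return [c for c in children if c.parent_id in changed_parent_ids]


parents = [Parent(id=1, last_modified=10), Parent(id=2, last_modified=10)]
children = [Child(id=100, parent_id=1, value="old"), Child(id=200, parent_id=2, value="old")]
cursor = 10  # last cursor value already captured

# Child 100 changes, but its parent's cursor is NOT bumped.
children[0].value = "new"
# Child 200 changes, and its parent's cursor IS bumped.
children[1].value = "new"
parents[1].last_modified = 11

emitted_ids = [c.id for c in load_changes_via_association(parents, children, cursor)]
```

Only child 200 is emitted. The change to child 100 is silently missed because nothing advanced parent 1's cursor, which is the case described as "will only appear to work".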
| If you need to capture a table that is not yet supported, [contact support](mailto:support@estuary.dev) with the table name(s). | ||
| Estuary support will be able to confirm availability and, if needed, add the table(s) to the connector. | ||
| ## Sync modes and data loading |
SuiteQL does not support incremental sync, deletion tracking (except via snapshots, those still capture deletes via _meta/row_id), associations, etc. This was an intentional decision we made to simplify the implementation given the constraints. Here's a comparison table that I had Claude put together. Are you in the netsuite connector repo? We should probably get you in there so you (or your agents ;)) can poke around.
Fundamental SuiteQL API Constraints That Drove the Design
These four API-level constraints shaped every design decision:
- No table metadata introspection - SuiteQL has no equivalent of `OA_TABLES`/`OA_COLUMNS`/`OA_FKEYS`. You cannot programmatically discover schemas, column types, primary keys, or foreign keys. The connector requires manual table/key specification via `EndpointConfig.tables`, with defaults from a small `hints.py` (19 lines, 10 tables).
- 100-column silent failure - SuiteQL silently returns zero results if a table exceeds 100 columns. Users can specify explicit `columns` lists, or use `SELECT *` with the understanding it fails silently on wide tables.
- 100k row result limit - SuiteQL caps results at 100,000 rows per query. The connector uses rownum-based subquery pagination (`wrap_in_paged_subquery`), but each subsequent page re-queries already-seen data, making it progressively slower. Practical ceiling is low single-digit millions of rows.
- Datetime information loss - SuiteQL returns datetime columns as date-only strings. Previous attempts to use `TO_CHAR()` conversions caused timezone mismatch bugs, and including many date-to-datetime conversions in a single query also causes failures. The new connector accepts this and returns data as-is, without hour/minute/second information.
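As a rough sketch of why rownum-based paging gets slower with each page, the Python below builds one wrapped query per page and estimates how many rows have to be produced across all pages. The function names, the exact SQL shape, and the cost model (each page forces the inner query to produce every row up to the page's end) are assumptions made for this illustration. This is not the connector's `wrap_in_paged_subquery`.

```python
def paged_query_sketch(inner_query: str, start_row: int, page_size: int = 100_000) -> str:
    """Wrap an ordered query so it returns rows (start_row, start_row + page_size]."""
    end_row = start_row + page_size
    return (
        "SELECT * FROM ("
        f"SELECT ROWNUM AS rn, t.* FROM ({inner_query}) t"
        f") WHERE rn > {start_row} AND rn <= {end_row}"
    )


def page_queries(inner_query: str, total_rows: int, page_size: int = 100_000) -> list[str]:
    """Build one wrapped query per page needed to cover total_rows."""
    return [
        paged_query_sketch(inner_query, start, page_size)
        for start in range(0, total_rows, page_size)
    ]


def rows_produced(total_rows: int, page_size: int = 100_000) -> int:
    """Assumed cost model: page k re-produces every row up to its own end."""
    return sum(
        min(start + page_size, total_rows)
        for start in range(0, total_rows, page_size)
    )


# Hypothetical table and key, purely for illustration.
queries = page_queries("SELECT id FROM customer ORDER BY id", total_rows=250_000)
cost = rows_produced(300_000)
```

Under this cost model the total work grows roughly quadratically with table size, which fits the observation that the practical ceiling is low single-digit millions of rows.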
Features SuiteQL Does NOT Support vs SuiteAnalytics
| Capability | SuiteAnalytics | SuiteQL |
|---|---|---|
| Incremental replication | Yes (via lastmodifieddate cursor fields) | No. Full refreshes only. |
| Automatic schema discovery | Yes (queries OA_COLUMNS with types, PKs, FKs) | No. Manual table/key config only. |
| Deletion tracking | Yes (DeletedRecord table + tombstones) | No. Full refresh overwrites. |
| Associated document loading | Yes (child rows loaded with parent changes) | No. |
| Maximum columns supported | 1,000 | 100 |
| Concurrent backfill chunks | Yes (parallel cursor-range splitting) | No. Single-threaded only. |
| Typed schema models | Yes (full Pydantic models with correct types) | No. All fields are passed through from the API response, relies on schema inference. |
| Boolean normalization | Yes (T/F to true/false) | No. Data as-is. |
| Datetime with timezone | Yes (UTC assumption, proper datetime types) | No. Dates only. |
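For readers unfamiliar with what the "Boolean normalization" and "Datetime with timezone" rows mean in practice, here is a small Python sketch of the two transformations the table describes. The helper names are hypothetical and the logic is an assumption based only on the table's wording ("T/F to true/false", "UTC assumption"), not on the connector's source.

```python
from datetime import datetime, timezone
from typing import Any


def normalize_bool(value: Any) -> Any:
    """Map 'T'/'F' flags to booleans and leave any other value untouched."""
    if value == "T":
        return True
    if value == "F":
        return False
    return value


def assume_utc(value: datetime) -> datetime:
    """Attach UTC to a naive datetime, leaving aware datetimes unchanged."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value


flag = normalize_bool("T")
stamp = assume_utc(datetime(2024, 1, 2, 3, 4, 5)).isoformat()
```

With SuiteQL returning data as-is, a downstream consumer would need to apply something like `normalize_bool` itself. No equivalent fix exists for the time of day, since that information is absent from the date-only strings.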
Summary
log_cursor/lastmodifieddate)
DeletedRecord system table, including which table types support it (base tables) and which don't (linking/junction tables)
Context
Customer conversations revealed that none of this behavior was documented. The docs previously listed configuration properties but didn't explain the resulting sync behavior, which tables get deletes, or how to check/change sync modes.
Test plan
🤖 Generated with Claude Code