@dimitri-yatsenko dimitri-yatsenko commented Dec 22, 2025

Fix issue #1243.

Implement AutoPopulate 2.0.

claude and others added 13 commits December 21, 2025 02:18
This commit introduces a modern, extensible custom type system for DataJoint:

**New Features:**
- AttributeType base class with encode()/decode() methods
- Global type registry with @register_type decorator
- Entry point discovery for third-party type packages (datajoint.types)
- Type chaining: dtype can reference another custom type
- Automatic validation via validate() method before encoding
- resolve_dtype() for resolving chained types

**API Changes:**
- New: dj.AttributeType, dj.register_type, dj.list_types
- AttributeAdapter is now deprecated (backward-compatible wrapper)
- Feature flag DJ_SUPPORT_ADAPTED_TYPES is no longer required

**Entry Point Specification:**
Third-party packages can declare types in pyproject.toml:
  [project.entry-points."datajoint.types"]
  zarr_array = "dj_zarr:ZarrArrayType"

**Migration Path:**
Old AttributeAdapter subclasses continue to work but emit
DeprecationWarning. Migrate to AttributeType with encode/decode.

- Rewrite customtype.md with comprehensive documentation:
  - Overview of encode/decode pattern
  - Required components (type_name, dtype, encode, decode)
  - Type registration with @dj.register_type decorator
  - Validation with validate() method
  - Storage types (dtype options)
  - Type chaining for composable types
  - Key parameter for context-aware encoding
  - Entry point packages for distribution
  - Complete neuroscience example
  - Migration guide from AttributeAdapter
  - Best practices

- Update attributes.md to reference custom types

Introduces `<djblob>` as an explicit AttributeType for DataJoint's
native blob serialization, allowing users to be explicit about
serialization behavior in table definitions.

Key changes:
- Add DJBlobType class with `serializes=True` flag to indicate
  it handles its own serialization (avoiding double pack/unpack)
- Update table.py and fetch.py to respect the `serializes` flag,
  skipping blob.pack/unpack when adapter handles serialization
- Add `dj.migrate` module with utilities for migrating existing
  schemas to use explicit `<djblob>` type declarations
- Add tests for DJBlobType functionality
- Document `<djblob>` type and migration procedure

The migration is metadata-only: the blob data format is unchanged.
Existing `longblob` columns continue to work with implicit
serialization for backward compatibility.
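A table definition contrasting the explicit `<djblob>` type with a plain `longblob` might look like this sketch (attribute and table names are invented for illustration):

```python
# Hypothetical definition fragment showing explicit vs. implicit blob storage.
definition = """
recording_id : int
---
waveform  : <djblob>   # serialized via DJBlobType's encode/decode
raw_bytes : longblob   # legacy column; implicit serialization for now
"""
```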

Simplified design:
- Plain longblob columns store/return raw bytes (no serialization)
- <djblob> type handles serialization via encode/decode
- Legacy AttributeAdapter handles blob pack/unpack internally
  for backward compatibility

This eliminates the need for the serializes flag by making
blob serialization the responsibility of the adapter/type,
not the framework. Migration to <djblob> is now required
for existing schemas that rely on implicit serialization.

Design specification for issue #1243 proposing:
- Per-table jobs tables with native primary keys
- Extended status values (pending, reserved, success, error, ignore)
- Priority and scheduling support
- Referential integrity via foreign keys
- Automatic refresh on populate
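The proposed status lifecycle can be modeled in a few lines. This is illustrative only: the real transitions are implemented by the JobsTable methods, and the manually-set `ignore` status is omitted from the sketch:

```python
# Minimal model of the extended job status lifecycle from the spec.
ALLOWED = {
    (None, "pending"),        # refresh() inserts missing key_source keys
    ("pending", "reserved"),  # a worker reserves the job
    ("reserved", "success"),  # make() finished
    ("reserved", "error"),    # make() raised
}


def transition(current, new):
    """Return the new status, rejecting transitions the spec does not allow."""
    if (current, new) not in ALLOWED:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```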
@github-actions github-actions bot added the "enhancement" and "documentation" labels Dec 22, 2025

Auto-populated tables must have primary keys composed entirely of
foreign key references. This ensures 1:1 job correspondence and
enables proper referential integrity for the jobs table.
- Jobs tables have matching primary key structure but no FK constraints
- Stale jobs (from deleted upstream records) handled by refresh()
- Added created_time field for stale detection
- refresh() now returns {added, removed} counts
- Updated rationale sections to reflect performance-focused design
- Jobs table automatically dropped when target table is dropped/altered
- schema.jobs returns list of JobsTable objects for all auto-populated tables
- Updated dashboard examples to use schema.jobs iteration
- Updated state transition diagram to show only automatic transitions
- Added note that ignore is manually set and skipped by populate/refresh
- reset() can move ignore jobs back to pending

Major changes:
- Remove reset() method; use delete() + refresh() instead
- Jobs go from any state → (none) via delete, then → pending via refresh()
- Shorten deprecation roadmap: clean break, no legacy support
- Jobs tables created lazily on first populate(reserve_jobs=True)
- Legacy tables with extra PK attributes: jobs table uses only FK-derived keys
- Remove SELECT FOR UPDATE locking from job reservation
- Conflicts (rare) resolved by make() transaction's duplicate key error
- Second worker catches error and moves to next job
- Simpler code, better performance on high-traffic jobs table

Each job is marked as 'reserved' individually before its make() call,
matching the current implementation's behavior.
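The lock-free reservation scheme described in the commits above can be modeled without a database. In this sketch, `DuplicateError` stands in for the database's duplicate-key error and a set stands in for the target table; none of this is the actual implementation:

```python
# Model of lock-free job reservation: no SELECT FOR UPDATE. If two workers
# race on the same key, the loser hits a duplicate-key error from make()'s
# insert transaction and simply moves on to the next job.
class DuplicateError(Exception):
    """Stands in for the database's duplicate primary key error."""


target = set()  # stands in for the target table's primary keys


def make(key):
    if key in target:
        raise DuplicateError(key)
    target.add(key)  # the real make() inserts within a transaction


def run_worker(keys):
    completed = []
    for key in keys:
        # the real code first marks the job 'reserved' via update1() per key
        try:
            make(key)
        except DuplicateError:
            continue  # another worker already populated this key
        completed.append(key)
    return completed
```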

- Replace ASCII diagram with Mermaid stateDiagram
- Remove separate schedule() and set_priority() methods
- refresh() now handles scheduling via scheduled_time and priority params
- Clarify complete() can delete or keep job based on settings
- ignore() can be called on keys not yet in jobs table
- Reserve is done via update1() per key, client provides pid/host/connection_id
- Removed specific SQL query from spec

If a success job's key is still in key_source but the target entry
was deleted, refresh() will transition it back to pending.

Replaces multiple [*] start/end states with a single explicit
"(none)" state for clarity.

- Use complete() and complete()* notation for conditional transitions
- Same for refresh() and refresh()*
- Remove clear_completed(); use (jobs & 'status="success"').delete() instead
- Note that delete() requires no confirmation (low-cost operation)
- Priority: lower = more urgent (0 = highest), default = 5
- Acyclic state diagram with dual (none) states
- delete() inherited from delete_quick(), use (jobs & cond).delete()
- Added 'ignored' property for consistency
- populate() logic: fetch pending first, only refresh if no pending found
- Updated all examples to reflect new priority semantics
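The job-selection order described above can be sketched as a small filter-and-sort: take pending jobs whose scheduled time has arrived, most urgent first (lower priority value wins, 0 = highest, default = 5); refresh() runs only when nothing is pending. Field names in the sketch are assumptions:

```python
# Sketch of populate()'s job selection order (not the actual query).
from datetime import datetime


def fetch_pending(jobs, now):
    """Return pending, due jobs ordered by urgency (lower priority first)."""
    ready = [
        j for j in jobs
        if j["status"] == "pending" and j["scheduled_time"] <= now
    ]
    return sorted(ready, key=lambda j: j["priority"])
```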

This commit implements the per-table jobs system specified in the
Autopopulate 2.0 design document.

New features:
- Per-table JobsTable class (jobs_v2.py) with FK-derived primary keys
- Status enum: pending, reserved, success, error, ignore
- Priority system (lower = more urgent, 0 = highest, default = 5)
- Scheduled processing via delay parameter
- Methods: refresh(), reserve(), complete(), error(), ignore()
- Properties: pending, reserved, errors, ignored, completed, progress()

Configuration (settings.py):
- New JobsSettings class with:
  - jobs.auto_refresh (default: True)
  - jobs.keep_completed (default: False)
  - jobs.stale_timeout (default: 3600 seconds)
  - jobs.default_priority (default: 5)
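The settings and defaults listed above, shown as a plain mapping for reference (in the library they live on the new JobsSettings class in settings.py):

```python
# The jobs configuration keys and their documented defaults.
JOBS_DEFAULTS = {
    "jobs.auto_refresh": True,      # refresh the jobs table during populate()
    "jobs.keep_completed": False,   # delete job rows on success by default
    "jobs.stale_timeout": 3600,     # seconds before a job counts as stale
    "jobs.default_priority": 5,     # lower = more urgent; 0 = highest
}
```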

AutoPopulate changes (autopopulate.py):
- Added jobs property to access per-table JobsTable
- Updated populate() with new parameters: priority, refresh
- Updated _populate1() to use new JobsTable API
- Collision errors (DuplicateError) handled silently per spec

Schema changes (schemas.py):
- Track auto-populated tables during decoration
- schema.jobs now returns list of JobsTable objects
- Added schema.legacy_jobs for backward compatibility

Override drop_quick() in Imported and Computed to also drop the
associated jobs table when the main table is dropped.

Comprehensive test suite for the new per-table jobs system:
- JobsTable structure and initialization
- refresh() method with priority and delay
- reserve() method and reservation conflicts
- complete() method with keep option
- error() method and message truncation
- ignore() method
- Status filter properties (pending, reserved, errors, ignored, completed)
- progress() method
- populate() with reserve_jobs=True
- schema.jobs property
- Configuration settings

- Remove unused `job` dict and `now` variable in refresh()
- Remove unused `pk_attrs` in fetch_pending()
- Remove unused datetime import
- Apply ruff-format formatting changes

Replace schema-wide `~jobs` table with per-table JobsTable (Autopopulate 2.0):

- Delete src/datajoint/jobs.py (old JobTable class)
- Remove legacy_jobs property from Schema class
- Delete tests/test_jobs.py (old schema-wide tests)
- Remove clean_jobs fixture and schema.jobs.delete() cleanup calls
- Update test_autopopulate.py to use new per-table jobs API

The new system provides per-table job queues with FK-derived primary keys,
rich status tracking (pending/reserved/success/error/ignore), priority
scheduling, and proper handling of job collisions.

Now that the legacy schema-wide jobs system has been removed,
rename the new per-table jobs module to its canonical name:

- src/datajoint/jobs_v2.py -> src/datajoint/jobs.py
- tests/test_jobs_v2.py -> tests/test_jobs.py
- Update imports in autopopulate.py and test_jobs.py

- Use variable assignment for pk_section instead of chr(10) in f-string
- Change error_stack type from mediumblob to <djblob>
- Use update1() in error() instead of raw SQL and deprecated _update()
- Remove config.override(enable_python_native_blobs=True) wrapper

Note: reserve() keeps raw SQL for the atomic conditional update with a
rowcount check; this is required for safe concurrent job reservation.

- reserve() now uses update1 instead of raw SQL
- Remove status='pending' check since populate verifies this
- Change return type from bool to None
- Update autopopulate.py to not check reserve return value
- Update tests to reflect new behavior

The new implementation always populates self; the target property is no
longer needed. All references to self.target have been replaced with self.

- Inline the logic directly in populate() and progress()
- Move restriction check to populate()
- Use (self.key_source & AndList(restrictions)).proj() directly
- Remove unused QueryExpression import
- Remove early jobs_table assignment, use self.jobs directly
- Fix comment: key_source is correct behavior, not legacy
- Use self.jobs directly in _get_pending_jobs
Method only called from one place, no need for separate function.

- Remove 'order' parameter (conflicts with priority/scheduled_time)
- Remove 'limit' parameter, keep only 'max_calls' for simplicity
- Remove unused 'random' import

@dimitri-yatsenko dimitri-yatsenko changed the base branch from pre/v2.0 to claude/upgrade-adapted-type-1W3ap December 24, 2025 19:29
Co-authored-by: dimitri-yatsenko <[email protected]>
@dimitri-yatsenko dimitri-yatsenko marked this pull request as draft December 30, 2025 23:43
Base automatically changed from claude/upgrade-adapted-type-1W3ap to pre/v2.0 January 1, 2026 07:07