Implement AutoPopulate 2.0 #1290
Draft: dimitri-yatsenko wants to merge 56 commits into `pre/v2.0` from `claude/spec-issue-1243-YvqmF`
+3,748 −462
Conversation
This commit introduces a modern, extensible custom type system for DataJoint.

**New Features:**
- `AttributeType` base class with `encode()`/`decode()` methods
- Global type registry with the `@register_type` decorator
- Entry-point discovery for third-party type packages (`datajoint.types`)
- Type chaining: `dtype` can reference another custom type
- Automatic validation via the `validate()` method before encoding
- `resolve_dtype()` for resolving chained types

**API Changes:**
- New: `dj.AttributeType`, `dj.register_type`, `dj.list_types`
- `AttributeAdapter` is now deprecated (backward-compatible wrapper)
- The feature flag `DJ_SUPPORT_ADAPTED_TYPES` is no longer required

**Entry Point Specification:** Third-party packages can declare types in `pyproject.toml`:

```toml
[project.entry-points."datajoint.types"]
zarr_array = "dj_zarr:ZarrArrayType"
```

**Migration Path:** Old `AttributeAdapter` subclasses continue to work but emit a `DeprecationWarning`. Migrate to `AttributeType` with `encode`/`decode`.
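The registry-and-chaining pattern described above can be sketched in plain Python. This is a simplified stand-in, not the DataJoint implementation: the names (`AttributeType`, `register_type`, `resolve_dtype`, `type_name`, `dtype`) mirror the commit's API, but the bodies and the example `GraphType` are illustrative only.

```python
TYPE_REGISTRY = {}  # global registry: type_name -> AttributeType instance


class AttributeType:
    """Base class: subclasses declare type_name/dtype and implement encode/decode."""
    type_name = None
    dtype = None

    def validate(self, value):
        """Optional hook, called before encode()."""

    def encode(self, value, key=None):
        raise NotImplementedError

    def decode(self, stored, key=None):
        raise NotImplementedError


def register_type(cls):
    """Class decorator: register an instance under its type_name."""
    TYPE_REGISTRY[cls.type_name] = cls()
    return cls


def resolve_dtype(type_name):
    """Follow dtype chains until a non-custom storage type is reached."""
    while type_name in TYPE_REGISTRY:
        type_name = TYPE_REGISTRY[type_name].dtype
    return type_name


@register_type
class GraphType(AttributeType):
    """Hypothetical example: store a graph as a sorted edge list."""
    type_name = "graph"
    dtype = "longblob"  # underlying storage type

    def encode(self, value, key=None):
        self.validate(value)
        return sorted(value)

    def decode(self, stored, key=None):
        return list(stored)
```

With this in place, `resolve_dtype("graph")` walks the chain and returns `"longblob"`, the storage type the database column would use.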
Rewrite customtype.md with comprehensive documentation:
- Overview of the encode/decode pattern
- Required components (`type_name`, `dtype`, `encode`, `decode`)
- Type registration with the `@dj.register_type` decorator
- Validation with the `validate()` method
- Storage types (`dtype` options)
- Type chaining for composable types
- `key` parameter for context-aware encoding
- Entry-point packages for distribution
- Complete neuroscience example
- Migration guide from `AttributeAdapter`
- Best practices

Also update attributes.md to reference custom types.
Introduces `<djblob>` as an explicit `AttributeType` for DataJoint's native blob serialization, allowing users to be explicit about serialization behavior in table definitions.

Key changes:
- Add `DJBlobType` class with a `serializes=True` flag to indicate that it handles its own serialization (avoiding double pack/unpack)
- Update table.py and fetch.py to respect the `serializes` flag, skipping `blob.pack`/`unpack` when the adapter handles serialization
- Add `dj.migrate` module with utilities for migrating existing schemas to explicit `<djblob>` type declarations
- Add tests for `DJBlobType` functionality
- Document the `<djblob>` type and migration procedure

The migration is metadata-only: the blob data format is unchanged. Existing `longblob` columns continue to work with implicit serialization for backward compatibility.
Simplified design:
- Plain `longblob` columns store/return raw bytes (no serialization)
- The `<djblob>` type handles serialization via `encode`/`decode`
- Legacy `AttributeAdapter` handles blob pack/unpack internally for backward compatibility

This eliminates the need for the `serializes` flag by making blob serialization the responsibility of the adapter/type, not the framework. Migration to `<djblob>` is now required for existing schemas that rely on implicit serialization.
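Under this design, serialization lives entirely in the type's own `encode`/`decode`, so the framework just stores and returns raw bytes. A minimal stand-in illustrating that division of responsibility, using `pickle` as the serializer (the real `<djblob>` uses DataJoint's blob pack format, not pickle):

```python
import pickle


class DJBlobType:
    """Stand-in for <djblob>: the type itself converts values to/from bytes,
    so the framework handles only the raw bytes of the underlying longblob."""
    type_name = "djblob"
    dtype = "longblob"

    def encode(self, value, key=None):
        # Real <djblob> produces DataJoint's native blob format here.
        return pickle.dumps(value)

    def decode(self, stored, key=None):
        return pickle.loads(stored)


blob = DJBlobType()
raw = blob.encode({"trace": [1, 2, 3]})   # framework stores these bytes verbatim
assert isinstance(raw, bytes)
assert blob.decode(raw) == {"trace": [1, 2, 3]}
```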
…adapted-type-1W3ap
…p' into claude/upgrade-adapted-type-1W3ap
Design specification for issue #1243 proposing:
- Per-table jobs tables with native primary keys
- Extended status values (pending, reserved, success, error, ignore)
- Priority and scheduling support
- Referential integrity via foreign keys
- Automatic refresh on populate
Auto-populated tables must have primary keys composed entirely of foreign key references. This ensures 1:1 job correspondence and enables proper referential integrity for the jobs table.
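The invariant stated above — every primary-key attribute of an auto-populated table must come from a foreign key reference — can be expressed as a simple subset check. A hypothetical sketch (the function name and signature are illustrative, not DataJoint's API):

```python
def check_autopopulate_pk(primary_key, foreign_keys):
    """Verify that every primary-key attribute is derived from a foreign key.

    primary_key: set of attribute names forming the table's primary key
    foreign_keys: list of sets, each the attributes contributed by one FK reference
    """
    fk_derived = set().union(*foreign_keys) if foreign_keys else set()
    extra = primary_key - fk_derived
    if extra:
        raise ValueError(
            f"Primary key attributes not derived from foreign keys: {sorted(extra)}"
        )


# A table whose PK is exactly its FK-derived attributes passes the check:
check_autopopulate_pk(
    {"subject_id", "session_idx"},
    [{"subject_id"}, {"subject_id", "session_idx"}],
)
```

Because the jobs table mirrors this FK-derived primary key, each job row corresponds 1:1 with one target row, which is what makes referential integrity for the jobs table possible.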
- Jobs tables have matching primary key structure but no FK constraints
- Stale jobs (from deleted upstream records) handled by refresh()
- Added created_time field for stale detection
- refresh() now returns {added, removed} counts
- Updated rationale sections to reflect performance-focused design
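The `refresh()` contract above — add missing keys, drop stale ones, report `{added, removed}` counts — can be modeled with plain sets and a dict. This is a simplified model of the behavior, not the actual table-backed implementation:

```python
def refresh(key_source, jobs):
    """Synchronize a jobs dict {key: status} with the current key_source.

    Missing keys are added as 'pending'; stale jobs whose key disappeared
    from key_source are removed. Returns {"added": n, "removed": m}.
    """
    added = 0
    for key in key_source:
        if key not in jobs:
            jobs[key] = "pending"
            added += 1
    stale = [k for k in jobs if k not in key_source]
    for k in stale:  # e.g. upstream record was deleted
        del jobs[k]
    return {"added": added, "removed": len(stale)}


jobs = {("s1", 1): "success", ("s1", 9): "pending"}  # ("s1", 9) is stale
counts = refresh(key_source={("s1", 1), ("s1", 2)}, jobs=jobs)
# counts == {"added": 1, "removed": 1}
```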
- Jobs table automatically dropped when the target table is dropped/altered
- `schema.jobs` returns a list of `JobsTable` objects for all auto-populated tables
- Updated dashboard examples to use `schema.jobs` iteration
- Updated state transition diagram to show only automatic transitions
- Added a note that `ignore` is manually set and skipped by populate/refresh
- `reset()` can move `ignore` jobs back to pending
Major changes:
- Remove the `reset()` method; use `delete()` + `refresh()` instead
- Jobs go from any state → (none) via delete, then → pending via `refresh()`
- Shorten the deprecation roadmap: clean break, no legacy support
- Jobs tables created lazily on the first `populate(reserve_jobs=True)`
- Legacy tables with extra PK attributes: the jobs table uses only FK-derived keys
- Remove SELECT FOR UPDATE locking from job reservation
- Conflicts (rare) are resolved by the make() transaction's duplicate-key error
- The second worker catches the error and moves on to the next job
- Simpler code, better performance on a high-traffic jobs table
Each job is marked as 'reserved' individually before its make() call, matching the current implementation's behavior.
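The optimistic strategy above — reserve without locking, and let the rare collision surface as a duplicate-key error inside `make()` — can be illustrated with a toy simulation. The target table is modeled as a dict and the database error as a custom exception; none of this is DataJoint code:

```python
class DuplicateError(Exception):
    """Stand-in for the database's duplicate-key error."""


target = {}  # simulates the target table: inserting an existing key fails


def make(key, worker):
    """Simulates a make() transaction: the final insert raises on duplicates."""
    if key in target:
        raise DuplicateError(key)
    target[key] = f"computed by {worker}"


def work(worker, pending):
    """Process pending keys without taking locks; skip keys another worker won."""
    done = []
    for key in pending:
        try:
            make(key, worker)
            done.append(key)
        except DuplicateError:
            continue  # another worker got there first; move to the next job
    return done


# Both workers see the same pending key; only the first insert succeeds.
first = work("w1", [("s1", 1)])
second = work("w2", [("s1", 1)])
# first == [("s1", 1)], second == []
```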
- Replace the ASCII diagram with a Mermaid stateDiagram
- Remove the separate `schedule()` and `set_priority()` methods
- `refresh()` now handles scheduling via the `scheduled_time` and `priority` params
- Clarify that `complete()` can delete or keep the job based on settings
- `ignore()` can be called on keys not yet in the jobs table
- Reservation is done via `update1()` per key; the client provides pid/host/connection_id
- Removed the specific SQL query from the spec
If a success job's key is still in key_source but the target entry was deleted, refresh() will transition it back to pending.
Replaces multiple [*] start/end states with a single explicit "(none)" state for clarity.
- Use `complete()` and `complete()*` notation for conditional transitions
- Same for `refresh()` and `refresh()*`
- Remove `clear_completed()`; use `(jobs & 'status="success"').delete()` instead
- Note that `delete()` requires no confirmation (a low-cost operation)
- Priority: lower = more urgent (0 = highest), default = 5
- Acyclic state diagram with dual (none) states
- `delete()` inherited from `delete_quick()`; use `(jobs & cond).delete()`
- Added an `ignored` property for consistency
- `populate()` logic: fetch pending first; refresh only if no pending jobs are found
- Updated all examples to reflect the new priority semantics
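The `populate()` flow described above — work through pending jobs in priority order (lower value first), refreshing from `key_source` only when nothing is pending — can be sketched as a plain-Python loop. Jobs are modeled as a dict of `{key: {"status", "priority"}}`; this is a behavioral sketch, not the implementation:

```python
def populate(jobs, key_source, make, max_calls=None):
    """Sketch of populate(): process pending jobs by priority; refresh only
    when no pending jobs exist."""
    def pending():
        # lower priority value = more urgent (0 is highest)
        return sorted(
            (k for k, j in jobs.items() if j["status"] == "pending"),
            key=lambda k: jobs[k]["priority"],
        )

    queue = pending()
    if not queue:  # refresh from key_source only if nothing is pending
        for key in key_source:
            jobs.setdefault(key, {"status": "pending", "priority": 5})
        queue = pending()

    calls = 0
    for key in queue:
        if max_calls is not None and calls >= max_calls:
            break
        jobs[key]["status"] = "reserved"  # reserved individually before make()
        make(key)
        jobs[key]["status"] = "success"
        calls += 1


jobs = {}
populate(jobs, [("a",)], make=lambda key: None)
# jobs[("a",)]["status"] == "success"
```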
This commit implements the per-table jobs system specified in the Autopopulate 2.0 design document.

New features:
- Per-table `JobsTable` class (jobs_v2.py) with FK-derived primary keys
- Status enum: pending, reserved, success, error, ignore
- Priority system (lower = more urgent, 0 = highest, default = 5)
- Scheduled processing via the `delay` parameter
- Methods: `refresh()`, `reserve()`, `complete()`, `error()`, `ignore()`
- Properties: `pending`, `reserved`, `errors`, `ignored`, `completed`, `progress()`

Configuration (settings.py): new `JobsSettings` class with:
- `jobs.auto_refresh` (default: True)
- `jobs.keep_completed` (default: False)
- `jobs.stale_timeout` (default: 3600 seconds)
- `jobs.default_priority` (default: 5)

AutoPopulate changes (autopopulate.py):
- Added a `jobs` property to access the per-table `JobsTable`
- Updated `populate()` with new parameters: `priority`, `refresh`
- Updated `_populate1()` to use the new `JobsTable` API
- Collision errors (`DuplicateError`) handled silently per spec

Schema changes (schemas.py):
- Track auto-populated tables during decoration
- `schema.jobs` now returns a list of `JobsTable` objects
- Added `schema.legacy_jobs` for backward compatibility
Override drop_quick() in Imported and Computed to also drop the associated jobs table when the main table is dropped.
Comprehensive test suite for the new per-table jobs system:
- `JobsTable` structure and initialization
- `refresh()` with priority and delay
- `reserve()` and reservation conflicts
- `complete()` with the keep option
- `error()` and message truncation
- `ignore()`
- Status filter properties (pending, reserved, errors, ignored, completed)
- `progress()`
- `populate()` with `reserve_jobs=True`
- `schema.jobs` property
- Configuration settings
- Remove the unused `job` dict and `now` variable in `refresh()`
- Remove the unused `pk_attrs` in `fetch_pending()`
- Remove the unused datetime import
- Apply ruff-format formatting changes
Replace the schema-wide `~jobs` table with per-table `JobsTable` (Autopopulate 2.0):
- Delete src/datajoint/jobs.py (old `JobTable` class)
- Remove the `legacy_jobs` property from the Schema class
- Delete tests/test_jobs.py (old schema-wide tests)
- Remove the `clean_jobs` fixture and `schema.jobs.delete()` cleanup calls
- Update test_autopopulate.py to use the new per-table jobs API

The new system provides per-table job queues with FK-derived primary keys, rich status tracking (pending/reserved/success/error/ignore), priority scheduling, and proper handling of job collisions.
Now that the legacy schema-wide jobs system has been removed, rename the new per-table jobs module to its canonical name:
- src/datajoint/jobs_v2.py -> src/datajoint/jobs.py
- tests/test_jobs_v2.py -> tests/test_jobs.py
- Update imports in autopopulate.py and test_jobs.py
- Use a variable assignment for `pk_section` instead of `chr(10)` in an f-string
- Change the `error_stack` type from `mediumblob` to `<djblob>`
- Use `update1()` in `error()` instead of raw SQL and the deprecated `_update()`
- Remove the `config.override(enable_python_native_blobs=True)` wrapper

Note: `reserve()` keeps raw SQL for an atomic conditional update with a rowcount check; this is required for safe concurrent job reservation.
- `reserve()` now uses `update1` instead of raw SQL
- Remove the `status='pending'` check since `populate` verifies this
- Change the return type from bool to None
- Update autopopulate.py to not check the `reserve` return value
- Update tests to reflect the new behavior
The new implementation always populates `self`, so the `target` property is no longer needed. All references to `self.target` have been replaced with `self`.
- Inline the logic directly in `populate()` and `progress()`
- Move the restriction check to `populate()`
- Use `(self.key_source & AndList(restrictions)).proj()` directly
- Remove the unused `QueryExpression` import
- Remove the early `jobs_table` assignment; use `self.jobs` directly
- Fix a comment: `key_source` is the correct behavior, not legacy
- Use `self.jobs` directly in `_get_pending_jobs`
The method is only called from one place, so there is no need for a separate function.
- Remove the `order` parameter (conflicts with priority/scheduled_time)
- Remove the `limit` parameter; keep only `max_calls` for simplicity
- Remove the unused `random` import
…t' into claude/upgrade-adapted-type-1W3ap
Co-authored-by: dimitri-yatsenko <[email protected]>
Fix issue #1243.
Implement AutoPopulate 2.0.