-
Notifications
You must be signed in to change notification settings - Fork 508
feat(es-compat): add index_filter support for field capabilities API #6102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(es-compat): add index_filter support for field capabilities API #6102
Conversation
1753d4c to
60cde2c
Compare
Implements index_filter parameter support for the ES-compatible _field_caps endpoint, allowing users to filter field capabilities based on document queries. Changes: - Add query_ast field to ListFieldsRequest and LeafListFieldsRequest protos - Parse index_filter from ES Query DSL and convert to QueryAst - Pass query_ast through to leaf nodes for future filtering support - Add unit tests for index_filter parsing - Add REST API integration tests Note: This implementation accepts and parses the index_filter parameter for API compatibility. Full split-level document filtering will be added as a follow-up enhancement. Closes quickwit-oss#5693 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: ruo <ruoliu.dev@gmail.com>
60cde2c to
7173546
Compare
| search_body: FieldCapabilityRequestBody, | ||
| ) -> Result<quickwit_proto::search::ListFieldsRequest, ElasticsearchError> { | ||
| // Parse index_filter if provided | ||
| let query_ast_json: Option<String> = if search_body.index_filter.is_null() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a well-named function with the prototype, to implement this block of code?
(serde_json::Value) -> anyhow::Result<Option<QueryAst>
| - quickwit | ||
| endpoint: fieldcaps/_field_caps?fields=name | ||
| json_body: | ||
| index_filter: {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kudos for checking these edge cases too!
| // Full query-based split filtering is a TODO for future enhancement | ||
| if query_ast.is_some() { | ||
| tracing::debug!( | ||
| "index_filter provided for field capabilities, but split-level filtering is not yet \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not merge this if it is not implemented.
| # Test _field_caps API with index_filter (term query) | ||
| method: [POST] | ||
| engines: | ||
| - quickwit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you testing agains QW only? If this PR's goal is to increase the compatibility with ES it should be tested agains ES as well.
| /// If `query_ast` is provided (from ES-compatible `index_filter`), it indicates that | ||
| /// the caller wants to filter fields based on documents matching the query. | ||
| /// | ||
| /// Note: Full query-based filtering is not yet implemented. When `query_ast` is provided, | ||
| /// fields from all documents are currently returned. This matches the ES behavior of | ||
| /// returning field capabilities for indices where at least one document matches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ES spec sais the following:
Filter indices if the provided query rewrites to match_none on every shard.
IMPORTANT: The filtering is done on a best-effort basis, it uses index statistics and mappings to rewrite queries to match_none instead of fully running the request. For instance a range query over a date field can rewrite to match_none if all documents within a shard (including deleted documents) are outside of the provided range. However, not all queries can rewrite to match_none so this API may return an index even if the provided filter matches no document.
Given that in QW we don't have the necessary info to know much about the fields in the metadata, we need to open the splits. I think nevertheless that we need to aim for a lightweight query execution, even if that returns too many fields.
| let query_ast: QueryAst = elastic_query_dsl.try_into().map_err(|err: anyhow::Error| { | ||
| ElasticsearchError::new( | ||
| StatusCode::BAD_REQUEST, | ||
| format!("Failed to convert index_filter: {err}"), | ||
| None, | ||
| ) | ||
| })?; | ||
|
|
||
| Some(serde_json::to_string(&query_ast).expect("QueryAst should be JSON serializable")) | ||
| }; | ||
|
|
||
| Ok(quickwit_proto::search::ListFieldsRequest { | ||
| index_id_patterns, | ||
| fields: search_params.fields.unwrap_or_default(), | ||
| start_timestamp: search_params.start_timestamp, | ||
| end_timestamp: search_params.end_timestamp, | ||
| query_ast: query_ast_json, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to leverage the two metastore pruning capabilities:
- refine start and end time using the query ast
- extract tags from query ast and provide them to split listing call
…_filter Address PR review comments for index_filter support in _field_caps API: - Extract `parse_index_filter_to_query_ast()` function with clean prototype - Implement split-level filtering via `split_matches_query()` using lightweight `query.count()` execution (no document materialization) - Add proper async handling with ByteRangeCache, warmup(), and run_cpu_intensive() for Quickwit's async-only storage - Add metastore-level pruning: - Tag extraction via `extract_tags_from_query()` - Time range extraction via `refine_start_end_timestamp_from_ast()` - Build DocMapper only when query_ast is provided (no overhead for common path without index_filter) - Fix REST API tests: use `json:` key (not `json_body:`), use lowercase term values to match tokenizer behavior - Update tests to run against both quickwit and elasticsearch engines Two-level filtering now implemented: 1. Metastore level: tags + time range from query AST 2. Split level: lightweight query execution for accurate filtering Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: ruo <ruoliu.dev@gmail.com>
Summary
index_filterparameter support for the ES-compatible_field_capsendpointTest plan
Notes
This implementation accepts and parses the
index_filterparameter for API compatibility with Elasticsearch. Full split-level document filtering (executing queries to prune splits with no matching documents) can be added as a follow-up enhancement.Closes #5693
🤖 Generated with Claude Code