-
Notifications
You must be signed in to change notification settings - Fork 245
Port some perf improvement commits #1315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Use expand_fast_array and direct array assignment instead of JS_CreateDataPropertyUint32 for better performance when creating arrays from value arrays. Ref: bellard/quickjs@f4951ef
- Made JS_GetGlobalVar and JS_SetGlobalVar inline with fast paths - Removed OP_check_var and OP_put_var_strict opcodes - Simplified optimize_scope_make_global_ref to not use strict mode special code - Simplified bytecode optimizations that referenced removed opcodes - Updated microbench.js to use normal functions instead of eval - Regenerated bytecode for updated opcode indices Note: This removes full compliance with the spec for strict mode variable assignment so that they are as fast as in non strict mode (V8, SpiderMonkey and JavaScriptCore do the same). Ref: bellard/quickjs@2c90110
Add inline fast paths for common property access cases: - For OP_get_field and OP_get_field2: Walk prototype chain directly for non-exotic objects with normal data properties - For OP_put_field: Set property directly for writable data properties Falls back to slow path for exotic objects and special property types. Ref: bellard/quickjs@57f8ec0
Add inline fast paths for OP_post_inc and OP_post_dec when the operand is an integer. Fall back to slow path for overflow cases (INT32_MAX for increment, INT32_MIN for decrement) and non-integer values. Ref: bellard/quickjs@e5de89f
Make string_buffer_putc() an inline function with fast paths for common cases: - Direct write for characters < 0x10000 in wide mode - Surrogate pair handling for characters >= 0x10000 with buffer space - Direct write for 8-bit characters in narrow mode Rename string_buffer_putc_slow to string_buffer_putc16_slow and add new string_buffer_putc_slow for full Unicode handling. Ref: bellard/quickjs@79f3ae2
Replace expensive prototype chain traversal with flag checking. Instead of iterating through prototypes to verify no numeric properties exist, the code now: - Adds std_array_prototype field to JSContext to track whether Array.prototype is "normal" (no small index get/set properties) - Adds is_prototype flag to JSObject to identify prototype objects - Removes has_small_array_index from JSShape (now handled differently) - Sets std_array_prototype = false when Array.prototype or Object.prototype is modified in relevant ways - Uses the flag in JS_SetPropertyValue() and OP_put_array_el for fast path decisions This trades one boolean flag check for iterating multiple prototype objects during common array append operations. Ref: bellard/quickjs@c8a8cf5
Add fast path for OP_get_length that directly accesses the length property without calling JS_GetProperty. This mirrors the optimization already done for OP_get_field and OP_get_field2. When the object has a simple length property (not a getter/setter), the value is retrieved directly by walking the prototype chain. Ref: bellard/quickjs@3e5f2bb
Add fast path for push() on fast arrays that bypasses standard object handling and directly manipulates the array's internal value store. The optimization activates when: - The array is a fast array with standard prototype - The array is extensible - The length property is an integer and writable - The new length doesn't overflow When conditions are met, elements are bulk-inserted directly into the internal values array without property lookup overhead. Ref: bellard/quickjs@9a421b3
Instead of using goto to jump to slow path on int32 overflow, directly convert to float64 inline. This improves instruction cache locality and reduces branching overhead. The change affects: - OP_add: inline float conversion on overflow - OP_add_loc: inline float conversion on overflow - OP_sub: inline float conversion on overflow Ref: bellard/quickjs@3d0cc29
154516f to
8177bd9
Compare
- Rename dbuf_realloc to dbuf_claim with clearer semantics: allocate 'len' more bytes relative to current size instead of absolute size - Add overflow protection in dbuf_claim - Change allocation growth from (size * 3 / 2) to (size + size / 2) with overflow checks - Remove unused dbuf_write function - Update all call sites across quickjs.c, libregexp.c, libunicode.c Ref: bellard/quickjs@0d4cd2d
Optimize destructuring by avoiding the creation of reference objects when there are no 'with' statements in the scope chain (which is always the case in strict mode). This uses depth=0 for direct variable access instead of depth=2 with reference creation. Additional optimizations: - has_with_scope() now skips checking in strict mode (no 'with' allowed) - In non-strict mode, modifying a function name is now ignored (OP_scope_put_var with JS_VAR_FUNCTION_NAME emits OP_drop) Note: This removes full compliance with the spec for lvalue resolution when direct eval is present in compound assignments. V8 and other engines behave the same way. Ref: bellard/quickjs@e015918
39f0d99 to
3f6275f
Compare
|
@bnoordhuis This is now ready to review. Not sure how you want to go about it though, it grew quite a bit 😅 |
Replace is_local/is_arg boolean fields in JSClosureVar with a single closure_type enum (JSClosureTypeEnum) that supports 8 distinct types: - JS_CLOSURE_LOCAL: local variable in parent function - JS_CLOSURE_ARG: argument variable in parent function - JS_CLOSURE_REF: closure variable reference in parent function - JS_CLOSURE_GLOBAL_REF: global variable reference - JS_CLOSURE_GLOBAL_DECL: global variable declaration (eval) - JS_CLOSURE_GLOBAL: global variable (eval) - JS_CLOSURE_MODULE_DECL: module variable definition (eval) - JS_CLOSURE_MODULE_IMPORT: module import definition (eval) Ref: bellard/quickjs@a6816be
- Add pre-computed JSShape objects for arguments, mapped_arguments - Use fast_array mode with var_refs for JS_CLASS_MAPPED_ARGUMENTS - Arguments object elements now alias function parameters via JSVarRef - Add js_mapped_arguments_finalizer and js_mapped_arguments_mark - Modify JS_NewObjectFromShape to accept props parameter for initialization - Add js_create_var_ref function for creating detached var_refs - Add var_refs to GC immediately in get_var_ref (instead of at close time) The mapped arguments optimization allows arguments[i] to directly reference function parameters, enabling changes to propagate bidirectionally in non-strict mode functions. Ref: bellard/quickjs@9f11034
|
Gentle ping @bnoordhuis :-) |
|
Lo siento, señor Corretgé, no lo había olvidado pero no tengo mucho tiempo la semana pasada o hoy. ¡A lo mejor mañana! (Sí, he practicando español.) |
bnoordhuis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general LGTM:
-
I'm somewhat skeptical that all that open-coding of property lookups is beneficial. Did you check how it affects benchmarks and release build size?
-
That last commit optimizing
argumentsis only useful in sloppy mode, right? Doesn't seem very useful, is it to game some benchmarks that don't run in strict mode?
| /* fast path */ | ||
| p = JS_VALUE_GET_OBJ(ctx->global_obj); | ||
| prs = find_own_property(&pr, p, prop); | ||
| if (prs) { | ||
| if (likely((prs->flags & JS_PROP_TMASK) == 0)) | ||
| return js_dup(pr->u.value); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fast path is a deoptimization for things like getters on the global object, because they're looked up here but not acted on, then JS_GetPropertyInternal does the same lookup again.
(Also true for auto-init properties but that's a one-time cost so not so significant.)
Getting rid of two bytecode opcodes is nice though.
|
|
||
| static JSValue JS_GetGlobalVar(JSContext *ctx, JSAtom prop, | ||
| bool throw_ref_error) | ||
| static inline JSValue JS_GetGlobalVar(JSContext *ctx, JSAtom prop, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
inlineis currently probably a no-op; there's only one call site so if the compiler thinks it's eligible for inlining, it's going to do it anyway -
if additional call sites arise, you probably don't want it inlined because of code bloat/duplication (not that a compiler is obliged to actually inline it, it's only a hint)
-
if the one call site for JS_GetGlobalVar were to get removed,
inlinestops the compiler from issuing an unused function warning
| } | ||
|
|
||
| /* 0 <= c <= 0x10ffff */ | ||
| static inline int string_buffer_putc(StringBuffer *s, uint32_t c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hunch is that this function is too big and branchy to want to inline it everywhere (or that the compiler will inline it everywhere.) Probably more effective to have just the c < 0x100 case in there.
| - Object.prototype has no small index properties which are get/set or non writable | ||
| - the prototype of Object.prototype is null (always true as it is immutable) | ||
| */ | ||
| uint8_t std_array_prototype; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Causes a 7 byte gap, I think? Maybe turn int binary_object_size into uint32_t binary_object_size : 31 and then this into a single bit field?
| true and p->extensible = true */ | ||
| static int add_fast_array_element(JSContext *ctx, JSObject *p, | ||
| JSValue val, int flags) | ||
| static inline int add_fast_array_element(JSContext *ctx, JSObject *p, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, I doubt inline is beneficial here.
| if (unlikely((int)r != r)) | ||
| goto add_slow; | ||
| sp[-2] = js_int32(r); | ||
| sp[-2] = __JS_NewFloat64((double)r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't post it as a suggestion but: js_float64 and the explicit cast to double is unnecessary.
Aside: that if (unlikely((int)r != r)) one line up is unspecified behavior. That's better than undefined behavior but checking if (r < INT32_MIN || r > INT32_MAX) is better still.
|
Thanks for the review Ben! I'll take a look! |
|
great job, looks like the same
|
|
This is useful, thanks! How are you taking those, so I can test locally? |
https://github.com/gengjiawen/js-engines-playground/blob/038a911a228b1b96fa73eb92aec82a4701d3251a/benchmark/benchmark.js#L5 replace first two with local binary. use this to get all mainstream engines |
See each individual commit for details.