Skip to content

Conversation

@saghul
Copy link
Contributor

@saghul saghul commented Jan 12, 2026

See each individual commit for details.

Use expand_fast_array and direct array assignment instead of
JS_CreateDataPropertyUint32 for better performance when creating
arrays from value arrays.

Ref: bellard/quickjs@f4951ef
- Made JS_GetGlobalVar and JS_SetGlobalVar inline with fast paths
- Removed OP_check_var and OP_put_var_strict opcodes
- Simplified optimize_scope_make_global_ref to not use strict mode special code
- Simplified bytecode optimizations that referenced removed opcodes
- Updated microbench.js to use normal functions instead of eval
- Regenerated bytecode for updated opcode indices

Note: This removes full compliance with the spec for strict mode variable
assignment so that they are as fast as in non strict mode (V8, SpiderMonkey
and JavaScriptCore do the same).

Ref: bellard/quickjs@2c90110
Add inline fast paths for common property access cases:
- For OP_get_field and OP_get_field2: Walk prototype chain directly
  for non-exotic objects with normal data properties
- For OP_put_field: Set property directly for writable data properties

Falls back to slow path for exotic objects and special property types.

Ref: bellard/quickjs@57f8ec0
Add inline fast paths for OP_post_inc and OP_post_dec when the
operand is an integer. Fall back to slow path for overflow cases
(INT32_MAX for increment, INT32_MIN for decrement) and non-integer
values.

Ref: bellard/quickjs@e5de89f
Make string_buffer_putc() an inline function with fast paths for common
cases:
- Direct write for characters < 0x10000 in wide mode
- Surrogate pair handling for characters >= 0x10000 with buffer space
- Direct write for 8-bit characters in narrow mode

Rename string_buffer_putc_slow to string_buffer_putc16_slow and add new
string_buffer_putc_slow for full Unicode handling.

Ref: bellard/quickjs@79f3ae2
Replace expensive prototype chain traversal with flag checking. Instead
of iterating through prototypes to verify no numeric properties exist,
the code now:

- Adds std_array_prototype field to JSContext to track whether
  Array.prototype is "normal" (no small index get/set properties)
- Adds is_prototype flag to JSObject to identify prototype objects
- Removes has_small_array_index from JSShape (now handled differently)
- Sets std_array_prototype = false when Array.prototype or
  Object.prototype is modified in relevant ways
- Uses the flag in JS_SetPropertyValue() and OP_put_array_el for fast
  path decisions

This trades one boolean flag check for iterating multiple prototype
objects during common array append operations.

Ref: bellard/quickjs@c8a8cf5
Add fast path for OP_get_length that directly accesses the length
property without calling JS_GetProperty. This mirrors the optimization
already done for OP_get_field and OP_get_field2.

When the object has a simple length property (not a getter/setter),
the value is retrieved directly by walking the prototype chain.

Ref: bellard/quickjs@3e5f2bb
Add fast path for push() on fast arrays that bypasses standard object
handling and directly manipulates the array's internal value store.

The optimization activates when:
- The array is a fast array with standard prototype
- The array is extensible
- The length property is an integer and writable
- The new length doesn't overflow

When conditions are met, elements are bulk-inserted directly into
the internal values array without property lookup overhead.

Ref: bellard/quickjs@9a421b3
Instead of using goto to jump to slow path on int32 overflow, directly
convert to float64 inline. This improves instruction cache locality
and reduces branching overhead.

The change affects:
- OP_add: inline float conversion on overflow
- OP_add_loc: inline float conversion on overflow
- OP_sub: inline float conversion on overflow

Ref: bellard/quickjs@3d0cc29
@saghul saghul force-pushed the port-perf branch 2 times, most recently from 154516f to 8177bd9 Compare January 12, 2026 23:04
- Rename dbuf_realloc to dbuf_claim with clearer semantics: allocate
  'len' more bytes relative to current size instead of absolute size
- Add overflow protection in dbuf_claim
- Change allocation growth from (size * 3 / 2) to (size + size / 2)
  with overflow checks
- Remove unused dbuf_write function
- Update all call sites across quickjs.c, libregexp.c, libunicode.c

Ref: bellard/quickjs@0d4cd2d
Optimize destructuring by avoiding the creation of reference objects
when there are no 'with' statements in the scope chain (which is always
the case in strict mode). This uses depth=0 for direct variable access
instead of depth=2 with reference creation.

Additional optimizations:
- has_with_scope() now skips checking in strict mode (no 'with' allowed)
- In non-strict mode, modifying a function name is now ignored
  (OP_scope_put_var with JS_VAR_FUNCTION_NAME emits OP_drop)

Note: This removes full compliance with the spec for lvalue resolution
when direct eval is present in compound assignments. V8 and other
engines behave the same way.

Ref: bellard/quickjs@e015918
@saghul saghul marked this pull request as ready for review January 13, 2026 05:50
@saghul saghul requested a review from bnoordhuis January 13, 2026 05:50
@saghul saghul force-pushed the port-perf branch 2 times, most recently from 39f0d99 to 3f6275f Compare January 13, 2026 14:12
@saghul
Copy link
Contributor Author

saghul commented Jan 13, 2026

@bnoordhuis This is now ready to review. Not sure how you want to go about it though, it grew quite a bit 😅

Replace is_local/is_arg boolean fields in JSClosureVar with a single
closure_type enum (JSClosureTypeEnum) that supports 8 distinct types:
- JS_CLOSURE_LOCAL: local variable in parent function
- JS_CLOSURE_ARG: argument variable in parent function
- JS_CLOSURE_REF: closure variable reference in parent function
- JS_CLOSURE_GLOBAL_REF: global variable reference
- JS_CLOSURE_GLOBAL_DECL: global variable declaration (eval)
- JS_CLOSURE_GLOBAL: global variable (eval)
- JS_CLOSURE_MODULE_DECL: module variable definition (eval)
- JS_CLOSURE_MODULE_IMPORT: module import definition (eval)

Ref: bellard/quickjs@a6816be
- Add pre-computed JSShape objects for arguments, mapped_arguments
- Use fast_array mode with var_refs for JS_CLASS_MAPPED_ARGUMENTS
- Arguments object elements now alias function parameters via JSVarRef
- Add js_mapped_arguments_finalizer and js_mapped_arguments_mark
- Modify JS_NewObjectFromShape to accept props parameter for initialization
- Add js_create_var_ref function for creating detached var_refs
- Add var_refs to GC immediately in get_var_ref (instead of at close time)

The mapped arguments optimization allows arguments[i] to directly reference
function parameters, enabling changes to propagate bidirectionally in
non-strict mode functions.

Ref: bellard/quickjs@9f11034
@saghul
Copy link
Contributor Author

saghul commented Jan 19, 2026

Gentle ping @bnoordhuis :-)

@bnoordhuis
Copy link
Contributor

Lo siento, señor Corretgé, no lo había olvidado pero no tengo mucho tiempo la semana pasada o hoy. ¡A lo mejor mañana!

(Sí, he practicando español.)

Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM:

  • I'm somewhat skeptical that all that open-coding of property lookups is beneficial. Did you check how it affects benchmarks and release build size?

  • That last commit optimizing arguments is only useful in sloppy mode, right? Doesn't seem very useful, is it to game some benchmarks that don't run in strict mode?

Comment on lines 10477 to 10483
/* fast path */
p = JS_VALUE_GET_OBJ(ctx->global_obj);
prs = find_own_property(&pr, p, prop);
if (prs) {
if (likely((prs->flags & JS_PROP_TMASK) == 0))
return js_dup(pr->u.value);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fast path is a deoptimization for things like getters on the global object, because they're looked up here but not acted on, then JS_GetPropertyInternal does the same lookup again.

(Also true for auto-init properties but that's a one-time cost so not so significant.)

Getting rid of two bytecode opcodes is nice though.


static JSValue JS_GetGlobalVar(JSContext *ctx, JSAtom prop,
bool throw_ref_error)
static inline JSValue JS_GetGlobalVar(JSContext *ctx, JSAtom prop,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • inline is currently probably a no-op; there's only one call site so if the compiler thinks it's eligible for inlining, it's going to do it anyway

  • if additional call sites arise, you probably don't want it inlined because of code bloat/duplication (not that a compiler is obliged to actually inline it, it's only a hint)

  • if the one call site for JS_GetGlobalVar were to get removed, inline stops the compiler from issuing an unused function warning

}

/* 0 <= c <= 0x10ffff */
static inline int string_buffer_putc(StringBuffer *s, uint32_t c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hunch is that this function is too big and branchy to want to inline it everywhere (or that the compiler will inline it everywhere.) Probably more effective to have just the c < 0x100 case in there.

- Object.prototype has no small index properties which are get/set or non writable
- the prototype of Object.prototype is null (always true as it is immutable)
*/
uint8_t std_array_prototype;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Causes a 7 byte gap, I think? Maybe turn int binary_object_size into uint32_t binary_object_size : 31 and then this into a single bit field?

true and p->extensible = true */
static int add_fast_array_element(JSContext *ctx, JSObject *p,
JSValue val, int flags)
static inline int add_fast_array_element(JSContext *ctx, JSObject *p,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, I doubt inline is beneficial here.

if (unlikely((int)r != r))
goto add_slow;
sp[-2] = js_int32(r);
sp[-2] = __JS_NewFloat64((double)r);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't post it as a suggestion but: js_float64 and the explicit cast to double is unnecessary.

Aside: that if (unlikely((int)r != r)) one line up is unspecified behavior. That's better than undefined behavior but checking if (r < INT32_MIN || r > INT32_MAX) is better still.

@saghul
Copy link
Contributor Author

saghul commented Jan 24, 2026

Thanks for the review Ben! I'll take a look!

@gengjiawen
Copy link

great job, looks like the same

Benchmark (Higher scores are better) QuickJS-ng PR #1315 QuickJS Bellard (local build) V8 --jitless
Richards 908 838 1157
DeltaBlue 882 825 1084
Crypto 797 716 1304
RayTrace 1243 1202 4400
EarleyBoyer 1784 1723 6072
RegExp 211 343 4187
Splay 2480 2451 8285
NavierStokes 1202 1263 1995
Score 986 1010 2723

@saghul
Copy link
Contributor Author

saghul commented Jan 24, 2026

This is useful, thanks! How are you taking those, so I can test locally?

@gengjiawen
Copy link

gengjiawen commented Jan 24, 2026

This is useful, thanks! How are you taking those, so I can test locally?

https://github.com/gengjiawen/js-engines-playground/blob/038a911a228b1b96fa73eb92aec82a4701d3251a/benchmark/benchmark.js#L5 replace first two with local binary.

use this to get all mainstream engines npm i -g jsvu && yes | jsvu || true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants