fix(ci): Fix E2E test flakiness on Cirrus Labs runners#5830
Draft
fix(ci): Fix E2E test flakiness on Cirrus Labs runners#5830
Conversation
After the react-native-test job was moved from GitHub-hosted macos-26 to Cirrus Labs Tart VMs (macos-tahoe-xcode:26.2.0), iOS simulators take longer to fully boot in the new virtualised environment. With `wait_for_boot` defaulting to false, Maestro was racing to connect before the simulator was ready, causing different failures on each run. - Add `wait_for_boot: true` to `futureware-tech/simulator-action` so the job blocks until the simulator has fully completed booting before Maestro connects. - Bump `MAESTRO_DRIVER_STARTUP_TIMEOUT` from 120s to 180s to give additional headroom for the Cirrus Labs runner environment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After crash.yml taps "Crash" (Sentry.nativeCrash()), the plain `launchApp` (without clearState) causes the app to crash immediately on relaunch (~82ms) because the Sentry SDK reads the pending crash report during initialisation and hits a failure path. This writes a second crash report on top of the first, triggering iOS's simulator crash-loop guard for the bundle ID. The cascade: 1. nativeCrash → crash report #1 written 2. launchApp (no clearState) → app crashes on startup → crash report #2 3. Next test (captureMessage) gets the crash-loop ban → instant exit on launch Fix: add `clearState: true` to the post-crash launchApp so Maestro reinstalls the app, clearing both the crash report and the crash-loop state before assertTestReady runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… VMs The iOS E2E tests have been consistently failing since the migration to Cirrus Labs Tart VMs (c1cade4). The nested virtualisation makes the simulator slower to stabilise, causing Maestro's XCTest driver to lose communication with the app on first launches. Two fixes: 1. Set erase_before_boot: false — each Maestro flow already reinstalls the app via clearState, so erasing the entire simulator is redundant and adds overhead that destabilises the simulator on Tart VMs. 2. Add a warm-up step that launches and terminates Settings.app so that SpringBoard and other system services finish post-boot initialisation before Maestro connects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cirrus Labs Tart VMs intermittently fail individual app launches — the app process exits before the JS bundle finishes loading, causing Maestro to report "App crashed or stopped". A single retry of the full suite is the most reliable way to absorb this flakiness. Also increased the warmup sleep from 3s to 5s to give SpringBoard more time to settle on the slow nested-virtualisation runners. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of retrying the entire test suite, run each flow file individually with up to 3 attempts. This is more effective because different flows fail randomly on Tart VMs — retrying only the failed flow is faster and avoids re-running flows that already passed. The CLI now: 1. Lists all .yml files in the maestro/ directory 2. Runs each flow with `maestro test <flow.yml>` 3. On failure, retries the same flow up to 2 more times 4. Prints a summary of all results at the end Removes the suite-level retry wrapper from the workflow since per-flow retries in the CLI are more targeted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address CodeQL finding by using execFileSync with an argument array instead of execSync with a template string. This avoids shell interpolation of filesystem-sourced flow file names. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ners - Increase MAESTRO_DRIVER_STARTUP_TIMEOUT to 180s for slow Tart VMs - Add wait_for_boot and erase_before_boot: false to simulator-action - Add simulator warm-up step before running iOS tests - Sort spaceflight news envelopes by timestamp instead of arrival order - Relax HTTP spans assertion to >= 1 (not all layers complete on slow VMs) - Search all envelopes for app start transaction (may arrive separately) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On slow Cirrus Labs Tart VMs, the app may crash during Maestro flow execution. Add up to 3 retries to handle transient app crashes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
App start transactions (origin: auto.app.start) have app_start_cold measurements but not time_to_initial_display/time_to_full_display. The filter already excluded ui.action.touch but not app start transactions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use nullish coalescing for httpSpans length check to avoid TypeError when spans is undefined - Document maestro retry envelope contamination limitation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The warm-up step is best-effort and should not fail the build if the Preferences app fails to launch or terminate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use consistent comment and sleep 5 across both workflows, as suggested in PR review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge both PRs that fix E2E test flakiness on Cirrus Labs Tart VMs: - iOS E2E fixes: simulator warm-up, per-flow retries, crash-loop prevention (#5752) - Sample app E2E fixes: increased timeouts, sorted envelopes, relaxed assertions (#5755) Conflict resolution: kept Maestro 2.3.0 from main with 180s timeout from #5755. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Semver Impact of This PR⚪ None (no version bump detected) 📋 Changelog PreviewThis is how your changes will appear in the changelog. This PR will not appear in the changelog. 🤖 This preview updates automatically when you update the PR. |
Contributor
This was referenced Mar 17, 2026
Reverts whitespace-only changes (@{ } -> @{}) in ObjC files that
cause clang-format CI failures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Android (legacy) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 4a17c8f+dirty | 406.62 ms | 400.58 ms | -6.04 ms |
| df1f7df+dirty | 442.64 ms | 427.16 ms | -15.48 ms |
| a483f9f+dirty | 396.82 ms | 453.28 ms | 56.46 ms |
| 60cd796+dirty | 445.84 ms | 492.45 ms | 46.61 ms |
| 5c16cdc+dirty | 423.48 ms | 452.35 ms | 28.88 ms |
| 80e4616+dirty | 411.58 ms | 462.12 ms | 50.54 ms |
| 55b77fc+dirty | 411.87 ms | 417.16 ms | 5.29 ms |
| bca62c0+dirty | 414.36 ms | 451.06 ms | 36.70 ms |
| 0b64753+dirty | 448.67 ms | 474.61 ms | 25.94 ms |
| 4e6d7d7+dirty | 480.73 ms | 515.73 ms | 35.00 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 4a17c8f+dirty | 43.75 MiB | 47.99 MiB | 4.24 MiB |
| df1f7df+dirty | 43.75 MiB | 48.08 MiB | 4.33 MiB |
| a483f9f+dirty | 43.75 MiB | 48.41 MiB | 4.66 MiB |
| 60cd796+dirty | 43.75 MiB | 48.07 MiB | 4.32 MiB |
| 5c16cdc+dirty | 17.75 MiB | 19.68 MiB | 1.94 MiB |
| 80e4616+dirty | 43.75 MiB | 48.55 MiB | 4.80 MiB |
| 55b77fc+dirty | 43.75 MiB | 47.99 MiB | 4.24 MiB |
| bca62c0+dirty | 43.75 MiB | 48.41 MiB | 4.66 MiB |
| 0b64753+dirty | 17.75 MiB | 19.70 MiB | 1.95 MiB |
| 4e6d7d7+dirty | 43.75 MiB | 48.40 MiB | 4.64 MiB |
Contributor
iOS (legacy) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1229.13 ms | 1228.46 ms | -0.67 ms |
| 80e4616+dirty | 1221.32 ms | 1225.64 ms | 4.32 ms |
| 818a608+dirty | 1205.76 ms | 1208.00 ms | 2.24 ms |
| 77061ed+dirty | 1233.16 ms | 1234.88 ms | 1.71 ms |
| bef3709+dirty | 1222.07 ms | 1220.24 ms | -1.83 ms |
| a206511+dirty | 1185.00 ms | 1186.35 ms | 1.35 ms |
| 74979ac+dirty | 1210.49 ms | 1213.31 ms | 2.82 ms |
| a2bb688+dirty | 1223.53 ms | 1232.90 ms | 9.37 ms |
| 8a868fe+dirty | 1221.50 ms | 1230.78 ms | 9.28 ms |
| d590428+dirty | 1211.77 ms | 1220.51 ms | 8.75 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 2.63 MiB | 3.91 MiB | 1.28 MiB |
| 77061ed+dirty | 2.63 MiB | 3.98 MiB | 1.34 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 2.63 MiB | 3.99 MiB | 1.36 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |
Contributor
iOS (new) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1216.61 ms | 1214.15 ms | -2.47 ms |
| 80e4616+dirty | 1206.90 ms | 1205.94 ms | -0.96 ms |
| 818a608+dirty | 1218.84 ms | 1223.18 ms | 4.34 ms |
| 77061ed+dirty | 1210.77 ms | 1218.45 ms | 7.68 ms |
| bef3709+dirty | 1217.79 ms | 1225.33 ms | 7.54 ms |
| a206511+dirty | 1225.02 ms | 1223.74 ms | -1.28 ms |
| 74979ac+dirty | 1212.33 ms | 1212.54 ms | 0.21 ms |
| a2bb688+dirty | 1244.82 ms | 1238.60 ms | -6.22 ms |
| 8a868fe+dirty | 1206.85 ms | 1215.04 ms | 8.19 ms |
| d590428+dirty | 1221.23 ms | 1225.27 ms | 4.03 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 3.19 MiB | 4.48 MiB | 1.29 MiB |
| 77061ed+dirty | 3.19 MiB | 4.54 MiB | 1.36 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 3.19 MiB | 4.56 MiB | 1.37 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |
Contributor
Android (new) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 70250df+dirty | 418.08 ms | 480.84 ms | 62.76 ms |
| 8d89cc9+dirty | 357.69 ms | 415.79 ms | 58.10 ms |
| 1853710+dirty | 360.67 ms | 396.28 ms | 35.61 ms |
| 55b77fc+dirty | 410.46 ms | 414.11 ms | 3.65 ms |
| 69602ce+dirty | 375.37 ms | 405.28 ms | 29.91 ms |
| c1573b3+dirty | 355.65 ms | 448.82 ms | 93.17 ms |
| 90afdd3+dirty | 367.79 ms | 404.84 ms | 37.05 ms |
| 955f2eb+dirty | 388.13 ms | 433.56 ms | 45.44 ms |
| 80e4616+dirty | 427.31 ms | 461.15 ms | 33.84 ms |
| 276d348+dirty | 356.30 ms | 405.27 ms | 48.97 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 70250df+dirty | 43.94 MiB | 48.91 MiB | 4.97 MiB |
| 8d89cc9+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 1853710+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 55b77fc+dirty | 43.94 MiB | 48.82 MiB | 4.88 MiB |
| 69602ce+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| c1573b3+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| 90afdd3+dirty | 7.15 MiB | 8.43 MiB | 1.28 MiB |
| 955f2eb+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| 80e4616+dirty | 43.94 MiB | 49.38 MiB | 5.44 MiB |
| 276d348+dirty | 7.15 MiB | 8.42 MiB | 1.26 MiB |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📢 Type of change
📜 Description
iOS E2E fixes
wait_for_boot: trueanderase_before_boot: falseon simulator-actionMAESTRO_DRIVER_STARTUP_TIMEOUT: 180000(3 min)clearState: truein crash.yml to prevent crash-loop cascade afternativeCrash()Sample Application E2E fixes
MAESTRO_DRIVER_STARTUP_TIMEOUTto 180s for slow Tart VMswait_for_boot/erase_before_bootsettings💡 Motivation and Context
iOS E2E tests have been consistently failing since the Cirrus Labs Tart VM migration. Tart VMs use nested virtualisation which makes simulator operations significantly slower. These fixes address both the
e2e-v2workflow and thesample-applicationworkflow flakiness.Supersedes #5752 and #5755.
#skip-changelog
💚 How did you test it?
E2E TestsiOS jobs pass on CISample ApplicationiOS/Android test jobs pass on CI📝 Checklist
sendDefaultPIIis enabled🔮 Next steps
#skip-changelog