@shimizukko
Contributor

…emoves rdb-pool from 2 ranks

Currently, the test removes rdb-pool only from /mnt/daos0/. rdb-pool is created on 3 of the 4 ranks, chosen at random. If it is not created under
/mnt/daos0/ on one of the nodes, the test removes only one rdb-pool and fails. Fix the test so that it removes rdb-pool from two ranks.

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
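
The failure mode described above can be illustrated with a small standalone sketch. It is not the test's actual code: the two-node, two-mount layout, the host names, and the helper `find_rdb_pool_replicas` are assumptions made up for illustration. It only shows why looking at a single mount point can find one replica, while looking at both mount points finds enough to remove two.

```python
# Hypothetical layout: 2 nodes x 2 engines = 4 ranks, rdb-pool present on 3 of them.
# Host names and the layout itself are illustrative assumptions, not taken from the test.
layout = {
    ("node-a", "/mnt/daos0"): False,  # the rank that did not get an rdb-pool
    ("node-a", "/mnt/daos1"): True,
    ("node-b", "/mnt/daos0"): True,
    ("node-b", "/mnt/daos1"): True,
}


def find_rdb_pool_replicas(layout, mounts):
    """Return the (host, mount) pairs that hold rdb-pool, restricted to `mounts`."""
    return [
        (host, mount)
        for (host, mount), present in sorted(layout.items())
        if present and mount in mounts
    ]


# Old behavior: only /mnt/daos0 is inspected, so only one replica is found.
old_targets = find_rdb_pool_replicas(layout, {"/mnt/daos0"})

# New behavior: both mounts are inspected and the first two replicas are chosen.
new_targets = find_rdb_pool_replicas(layout, {"/mnt/daos0", "/mnt/daos1"})[:2]

print(len(old_targets), len(new_targets))
```

With three replicas, removing two leaves the pool service without a majority, which is the condition the test is named after.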

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

…emoves rdb-pool from 2 ranks

Currently, the test removes rdb-pool only from
/mnt/daos0/<pool>. rdb-pool is created on 3 of the 4
ranks, chosen at random. If it is not created under
/mnt/daos0/<pool> on one of the nodes, the test removes
only one rdb-pool and fails. Fix the test so that it
removes rdb-pool from two ranks.

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
Signed-off-by: Makito Kano <makito.kano@hpe.com>
@github-actions

github-actions bot commented Jan 5, 2026

Ticket title is 'CR Test Fix - recovery/pool_list_consolidation.py test_lost_majority_ps_replicas'
Status is 'In Review'
Labels: 'catastrophic_recovery'
https://daosio.atlassian.net/browse/DAOS-18402

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
Signed-off-by: Makito Kano <makito.kano@hpe.com>
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
Test-repeat: 5
Signed-off-by: Makito Kano <makito.kano@hpe.com>
@daosbuild3
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17339/3/testReport/

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
Test-repeat: 5
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
Signed-off-by: Makito Kano <makito.kano@hpe.com>
@shimizukko shimizukko marked this pull request as ready for review January 24, 2026 00:13
@shimizukko shimizukko requested review from a team as code owners January 24, 2026 00:13
@shimizukko shimizukko requested review from Nasf-Fan, daltonbohning and dinghwah and removed request for Nasf-Fan January 24, 2026 00:14
Nasf-Fan
Nasf-Fan previously approved these changes Jan 26, 2026
Comment on lines 283 to 286
if self.server_managers[0].manager.job.using_control_metadata:
    self.log.info("MD-on-SSD cluster. It will be supported later.")
    # return results in PASS.
    return
Contributor

This should be a cancel, not a pass

Suggested change
- if self.server_managers[0].manager.job.using_control_metadata:
-     self.log.info("MD-on-SSD cluster. It will be supported later.")
-     # return results in PASS.
-     return
+ if self.server_managers[0].manager.job.using_control_metadata:
+     self.log.info("MD-on-SSD cluster. It will be supported later.")
+     self.cancelForTicket('DAOS-18402')
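
To illustrate the distinction this comment draws, here is a rough analogy using only the standard library's `unittest`, with `skipTest` standing in for a test-framework cancel. It does not use the DAOS test harness or `cancelForTicket`, and the class names are made up. The point is only that an early `return` is recorded as a success, whereas an explicit skip is recorded separately with its reason.

```python
import unittest


class EarlyReturn(unittest.TestCase):
    def test_unsupported(self):
        unsupported = True  # stand-in for the MD-on-SSD check
        if unsupported:
            return  # recorded as a success although nothing was verified
        self.fail("unreachable in this sketch")


class ExplicitSkip(unittest.TestCase):
    def test_unsupported(self):
        unsupported = True  # stand-in for the MD-on-SSD check
        if unsupported:
            # Raises unittest.SkipTest; the reason is kept in the result.
            self.skipTest("MD-on-SSD cluster is not supported yet")
        self.fail("unreachable in this sketch")


def run(case):
    """Run every test in `case` and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


early = run(EarlyReturn)
skipped = run(ExplicitSkip)
print(len(early.skipped), len(skipped.skipped))
```

In the second case the report makes it visible that the scenario was never exercised, which is the information a silent pass hides.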

Comment on lines 302 to 303
rdb_pool_path_0 = f"/mnt/daos0/{pool.uuid.lower()}/rdb-pool"
rdb_pool_path_1 = f"/mnt/daos1/{pool.uuid.lower()}/rdb-pool"
Contributor

We should get these mount paths from the params instead of hardcoding. E.g.

scm_mounts = set()
for engine_params in self.server_managers[0].manager.job.yaml.engine_params:
    scm_mounts.add(engine_params.get_value('scm_mount'))
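
As a minimal sketch of how the collected mounts could replace the two hardcoded paths, the snippet below builds one rdb-pool path per mount. `FakeEngineParams`, `rdb_pool_paths`, and the sample UUID are hypothetical stand-ins so the example runs on its own; only the `get_value('scm_mount')` call and the `<mount>/<uuid>/rdb-pool` layout come from the code under review.

```python
import posixpath


class FakeEngineParams:
    """Hypothetical stand-in for one entry of the real engine_params list."""

    def __init__(self, scm_mount):
        self._values = {"scm_mount": scm_mount}

    def get_value(self, name):
        return self._values[name]


def rdb_pool_paths(engine_params_list, pool_uuid):
    """Build one rdb-pool path per distinct SCM mount, in sorted order."""
    scm_mounts = {params.get_value("scm_mount") for params in engine_params_list}
    return sorted(
        posixpath.join(mount, pool_uuid.lower(), "rdb-pool") for mount in scm_mounts
    )


# Sample values are made up for illustration.
paths = rdb_pool_paths(
    [FakeEngineParams("/mnt/daos0"), FakeEngineParams("/mnt/daos1")],
    "ABCD-1234",
)
print(paths)
```

Using a set means that engines sharing a mount value contribute a single path, and sorting keeps the order stable across runs.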

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
Also get the mount paths from self.server_managers[0].manager.job.yaml.engine_params

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_lost_majority_ps_replicas
Signed-off-by: Makito Kano <makito.kano@hpe.com>