Skip to content

Code quality: bugs and smells identified in senior dev review #222

@jimisola

Description

@jimisola

Code Analysis — reqstool-client

Senior developer assessment. Findings are grouped by severity.


Critical Bugs

1. break instead of continue in SVC filter traversal

File: src/reqstool/model_generators/combined_indexed_dataset_generator.py:479

for import_urn in self._crd.parsing_graph[urn]:
    if self._crd.raw_datasets[import_urn].requirements_data.metadata.variant is VARIANTS.MICROSERVICE:
        break   # BUG: stops iteration entirely on first MICROSERVICE urn

The equivalent method __process_req_filters_per_urn (line 336) correctly uses continue. This break means that as soon as a MICROSERVICE urn appears in parsing_graph[urn], all remaining import urns are skipped for SVC filtering — silently producing wrong results.


2. Copy-paste bug: custom_exclude tree uses custom_imports guard

File: src/reqstool/model_generators/combined_indexed_dataset_generator.py:406–408 and 540–542

# requirements version
tree_custom_exclude = (
    None if req_filter.custom_imports is None   # should check custom_EXCLUDE
    else RequirementsELTransformer.parse_el(req_filter.custom_exclude)
)

# svcs version (same pattern)
tree_custom_exclude = (
    None if svc_filter.custom_imports is None   # should check custom_EXCLUDE
    else SVCsELTransformer.parse_el(svc_filter.custom_exclude)
)

When custom_imports is not None but custom_exclude is None, parse_el(None) will be called, crashing at runtime. The guard should check custom_exclude, not custom_imports.


3. __delete_mvr iterates UrnIds as lists

File: src/reqstool/model_generators/combined_indexed_dataset_generator.py:641–642

for index_list in self._mvrs_from_svc[svc_urn_id]:
    index_list.remove(mvr_urn_id)   # index_list is UrnId, not a list

_mvrs_from_svc[svc_urn_id] is List[UrnId], so index_list is a single UrnId object. Calling .remove() on it will raise AttributeError. The correct form is:

self._mvrs_from_svc[svc_urn_id].remove(mvr_urn_id)

4. is used for string equality throughout

Files: combined_indexed_dataset_generator.py:105,138,157,194, combined_raw_datasets_generator.py:139,186, semantic_validator.py:128,155,200

if self._crd.initial_model_urn is not urn:   # identity, not equality
if model is not combined_raw_dataset.initial_model_urn:

Python's is checks object identity, not value equality. This works only when Python interns the strings, which is not guaranteed. These should all use != or ==. The bug may be dormant in practice (short strings are often interned) but is a correctness defect.


5. check_ids_to_filter: checks delimiter in the list object, not each element

File: src/reqstool/common/utils.py:239

for id in ids:
    if ":" in ids:   # BUG: `ids` is the sequence, not `id`

This always evaluates to the same value for every iteration rather than checking whether the individual id string contains ":". Should be if ":" in id.


6. __import_implementations increments level twice, never decrements

File: src/reqstool/model_generators/combined_raw_datasets_generator.py:140,151

self.__level += 1      # line 140
for implementation in implementations:
    ...
self.__level += 1      # line 151 (should be -= 1, or the first is extraneous)

The indentation/logging depth counter increments twice without a corresponding decrement. The other method (__import_systems) correctly does += 1 before and -= 1 after. The second += 1 appears to be a copy-paste error and should be -= 1.


Significant Bugs / Type Errors

7. Wrong type annotation syntax — set(str) and List(UrnId) with parentheses

File: combined_indexed_dataset_generator.py:116,310,493

__initial_urn_accessible_urns_ms: set(str) = [...]  # set(str) calls set() with one arg
filtered_out_reqs: List(UrnId) = [...]              # List(...) is not valid
filtered_out_svcs: Set(UrnId) = set()

set(str) actually calls set with str as the single iterable argument, producing set() (empty set, not a type). List(UrnId) and Set(UrnId) use () where [] is required for subscript notation. These won't cause immediate runtime errors in the annotation position but will silently create wrong type metadata and may mislead static analysis tools.


8. ReferenceData broken default

File: src/reqstool/models/requirements.py:63

@dataclass
class ReferenceData:
    requirement_ids: Set[UrnId] = set[UrnId]

set[UrnId] as a default value is the subscripted generic type object set[UrnId], not an empty set. Every ReferenceData() instance will have requirement_ids equal to the set[UrnId] type object, not set(). Should be field(default_factory=set).


9. sys.exit() called inside a parser generator

File: src/reqstool/model_generators/combined_raw_datasets_generator.py:169–172

logging.fatal(f"Missing requirements file: ...")
sys.exit(1)

Calling sys.exit() inside a library/generator class bypasses the call stack, makes the code impossible to unit-test without subprocess wrapping, and prevents callers from handling the error. Should raise a custom exception (e.g. RequirementsFileNotFoundError) and let command.py:main() handle the exit.


10. UrnId.assure_urn_id returns str, not UrnId

File: src/reqstool/common/dataclasses/urn_id.py:21–27

Despite living on the UrnId class and taking a urn parameter, this method returns a raw string (f"{urn}:{id}"), not a UrnId instance. Its return type annotation is str. This means call sites receive an untyped string where a UrnId is expected, leading to type mismatches. This is also reflected in generic_el.py:87 where UrnId.assure_urn_id(...) is assigned to items used in ID comparisons that eventually compare against UrnId objects.


Code Smells

11. God class — CombinedIndexedDatasetGenerator

File: src/reqstool/model_generators/combined_indexed_dataset_generator.py (~650 lines)

One class handles parsing, indexing, filtering (recursive), and cascade-deletion of requirements/SVCs/MVRs. The __process_filters, __process_req_filters_per_urn, __get_filtered_out_requirements_for_filter_urn, and __delete_* methods are complex enough to be their own classes. The filtering logic in particular (recursive graph traversal + cascade deletion) is hard to reason about and test in isolation.


12. Data mutation after "construction"

CombinedIndexedDatasetGenerator.__post_init__ calls self.__generate() which calls self.process() which mutates self._requirements, self._svcs etc. in-place, then calls self.__create() to produce the final object. But __delete_requirement and __delete_svc also mutate svcdata.requirement_ids (a list on the dataclass) directly:

svcdata.requirement_ids.remove(req_urn_id)  # line 599

Dataclasses that are supposed to be immutable value objects are mutated after creation. SVCData should either be frozen=True with immutable replacement semantics (like dataclasses.replace) or explicitly documented as mutable.


13. Utils is a @dataclass with only static methods

File: src/reqstool/common/utils.py:22

@dataclass(kw_only=True)
class Utils:
    is_installed_package: bool = True
    ...
    @staticmethod
    def get_version() -> str: ...

Using @dataclass on a utility class that has one instance field and a dozen static methods is misleading. is_installed_package is set as a class attribute (Utils.is_installed_package = False in command.py) making it effectively global mutable state masquerading as an instance field. This should be a module-level variable or an injectable dependency.


14. TempDirectoryUtil — global module-level mutable state

File: src/reqstool/common/utils.py:316–336

class TempDirectoryUtil:
    tmpdir: tempfile.TemporaryDirectory = tempfile.TemporaryDirectory()  # created at import time
    count: int = 0

The temp directory is created at module import time. Multiple tests or concurrent runs share it. If a test fails mid-run the count is corrupted. This makes test isolation difficult.


15. _summarize_statisics typo is public API surface

File: src/reqstool/commands/status/status.py:115

statisics = _summarize_statisics(...)  # missing 't'

Minor, but the function name _summarize_statisics (missing t) is repeated across definition, call site, and the variable name. If tests reference the function name directly, a rename becomes a breaking change.


16. svc_table = svc_table = double assignment

File: src/reqstool/commands/status/status.py:252

svc_table = svc_table = tabulate(...)

Dead code — the double assignment was likely an artifact of a copy-paste. Has no runtime effect but signals poor review hygiene.


17. @Requirements decorators with no tests for those methods

The tool decorates its own methods with @Requirements("REQ_xxx") (self-dogfooding). However, multiple decorated methods contain the bugs listed above, indicating that the requirement coverage does not imply test coverage of the decorated paths.


Test Coverage Concerns

18. "Unit" tests are integration tests

All tests in tests/unit/reqstool/model_generators/ parse real YAML fixtures from tests/resources/ and instantiate the full CombinedRawDatasetsGeneratorCombinedIndexedDatasetGenerator pipeline. This is valuable but not a substitute for unit tests. The critical filter logic (__get_filtered_out_requirements_for_filter_urn, __process_svc_filters_per_urn) has no isolated unit tests.

19. No tests for the status command presentation layer

src/reqstool/commands/status/status.py is ~320 lines of table-building logic using hardcoded box-drawing characters and ANSI escape codes. There are no tests for any of the formatting functions (_build_table, _summarize_statisics, __numbers_as_percentage, __colorize_headers, _extend_row).

20. No tests for SemanticValidator

The validators that check cross-references (SVC → REQ, annotation → REQ, MVR → SVC) are tested only implicitly through the full pipeline tests. A direct unit test for each validator method would catch regressions faster.

21. No negative-path tests

None of the test files include tests for invalid input (missing required fields, bad URN formats, cyclic imports, filter expression parse errors). The expression language parser (generic_el.py) has no tests at all.

22. pytest.ini_options silently skips integration tests

addopts includes -m "not slow and not integration" by default. This is appropriate for fast CI, but there is no documented way to run the full test suite including integration tests in CI (no separate CI step for integration tests was identified in the workflows).


Minor / Style

  • accessible misspelled as accessable in multiple parameter names and log messages (combined_indexed_dataset_generator.py:368, 505).
  • _ prefix on "public" fields in StatisticsContainer is misleading — protected convention used on fields accessed directly from status.py.
  • command.py _add_subparsers_source adds --group-by / --sort-by to generate-json command even though those options only make sense for report generation.
  • requirements.py:94: filters: Dict[str, RequirementFilter] = field(default_factory=list) — default_factory is list but the type is Dict. Should be default_factory=dict.
  • combined_indexed_dataset_generator.py:105 declares and immediately overwrites a name: self.__initial_urn_is_variant_ms = self.initial_urn_is_ms = (...)__initial_urn_is_variant_ms is private (name-mangled) while initial_urn_is_ms is public; both point to the same value; only one of them should exist.

Summary Table

# File Severity Category
1 combined_indexed_dataset_generator.py:479 Critical Logic bug
2 combined_indexed_dataset_generator.py:406,541 Critical Logic bug
3 combined_indexed_dataset_generator.py:641 Critical Runtime crash
4 Multiple files High Identity vs equality
5 utils.py:239 High Logic bug
6 combined_raw_datasets_generator.py:151 High Logic bug
7 combined_indexed_dataset_generator.py:116,310,493 Medium Type error
8 models/requirements.py:63 Medium Wrong default
9 combined_raw_datasets_generator.py:172 Medium Testability
10 urn_id.py:21 Medium Type confusion
11 combined_indexed_dataset_generator.py Medium Complexity
12 combined_indexed_dataset_generator.py:599 Medium Mutability
13 utils.py:22 Low Design
14 utils.py:316 Low Test isolation
15–22 Various Low Style/tests

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions