
Conversation

drewjin (Contributor) commented Dec 29, 2025

Summary by CodeRabbit

  • New Features

    • Added comprehensive profiling framework with multiple backends (Simple Timer, VizTracer, PyTorch Profiler) and exporters (JSON, CSV, Summary).
    • Introduced benchmarking suite for evaluating model performance on datasets like GSM8K and HumanEval via lm-evaluation-harness integration.
    • Enhanced logging system with optional color support and Rich integration.
  • Documentation

    • Added Contributor Code of Conduct and Contributing guidelines.
    • Added benchmarking and profiler documentation.
  • Chores

    • Added development configuration files (.editorconfig, .gitattributes, pre-commit hooks).
    • Configured CI/CD automation and dependency management.

✏️ Tip: You can customize this high-level summary in your review settings.

drewjin changed the title from "[Feat] Enhance Decoding Strategies for Easier Development" to "[Feat] Enhance Decoding Strategies for Easier Development and More Efficient Inference" on Dec 29, 2025

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

This pull request introduces comprehensive infrastructure improvements including repository configuration, logging framework, developer documentation, and two new subsystems: a benchmarking framework (diffulex_bench) for LLM evaluation integration and a profiling framework (diffulex_profiler) with multiple backends for performance instrumentation. Logging is refactored across the codebase from print statements and transformers logging to a unified custom logger.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Repository Configuration**<br>`.editorconfig, .gitattributes, .github/dependabot.yml, .pre-commit-config.yaml, .pymarkdown, .gitignore, Tilelang-failed_test_cases` | New repo-level configuration for code style enforcement, git attributes, pre-commit hooks, linting rules, and dependency updates. Removes submodule reference and adds test failure directory to ignore list. |
| **Documentation**<br>`CODE_OF_CONDUCT.md, CONTRIBUTING.md, diffulex_bench/README.md, diffulex_profiler/README.md` | Contributor covenant, development setup guide, benchmarking framework documentation, and profiling framework documentation. Covers environment setup, testing, building, and usage patterns. |
| **Logging Infrastructure**<br>`diffulex/logger.py` | New comprehensive logging module with color support (Rich/Colorama), multiple output paths (console/file), and dynamic Logger.success() augmentation. Provides setup_logger(), get_logger(), and LoggerMixin public API. |
| **Logger Integration**<br>`diffulex/__init__.py, diffulex/attention/__init__.py, diffulex/config.py, diffulex/engine/..., diffulex/model/config/.../configuration_*.py, diffulex/sampler/base.py, diffulex/utils/loader.py` | Replaces print statements and transformers.utils.logging with diffulex.logger.get_logger across the codebase. Introduces logger module imports and structured logging calls. |
| **Core Configuration Updates**<br>`diffulex/config.py, diffulex/diffulex.py` | Adds device_ids field to Config dataclass with automatic population from CUDA_VISIBLE_DEVICES. Changes shm_name default and refactors worker selection logic to read data_parallel_size directly from kwargs. |
| **Model Configuration**<br>`diffulex/strategy/fast_dllm_v2/engine/sequence.py` | Adds TO_DUAL_CACHE and IN_DUAL_CACHE statuses to FDV2SubBlockStatus enum; extends FDV2SubBlock dataclass with sub_block_id and status fields. |
| **Benchmark Framework**<br>`diffulex_bench/__init__.py, diffulex_bench/arg_parser.py, diffulex_bench/config.py, diffulex_bench/datasets.py, diffulex_bench/lm_eval_model.py, diffulex_bench/logger.py, diffulex_bench/main.py, diffulex_bench/metrics.py, diffulex_bench/report.py, diffulex_bench/runner.py` | Complete new benchmark subsystem with CLI parser, config management (EngineConfig/EvalConfig/BenchmarkConfig), dataset loaders (GSM8K/HumanEval), lm-evaluation-harness integration (DiffulexLM), metrics computation, and result reporting. |
| **Benchmark Configurations**<br>`diffulex_bench/configs/__init__.py, diffulex_bench/configs/example.yml, diffulex_bench/configs/dream_d2f_gsm8k.yml` | Example YAML configurations for benchmark runs with nested engine and eval sections; D2F-specific thresholds and sampling parameters. |
| **Profiler Framework – Core**<br>`diffulex_profiler/__init__.py, diffulex_profiler/profiler.py, diffulex_profiler/metrics.py, diffulex_profiler/example.py` | New profiler with ProfilerConfig dataclass, DiffulexProfiler class with lifecycle management, PerformanceMetrics container, and GPU/CPU/memory metric collection utilities. Supports context-manager profiling, throughput recording, and pluggable exporters. |
| **Profiler Backends**<br>`diffulex_profiler/backends/__init__.py, diffulex_profiler/backends/base.py, diffulex_profiler/backends/simple.py, diffulex_profiler/backends/viztracer.py, diffulex_profiler/backends/pytorch.py` | Abstract ProfilerBackend base class with time-based (SimpleTimerBackend) and tracing-based backends (VizTracerBackend, PyTorchProfilerBackend). Conditional imports handle optional dependencies gracefully. |
| **Profiler Exporters**<br>`diffulex_profiler/exporters/__init__.py, diffulex_profiler/exporters/base.py, diffulex_profiler/exporters/json.py, diffulex_profiler/exporters/csv.py, diffulex_profiler/exporters/summary.py` | Abstract ProfilerExporter base; concrete implementations for JSON (with aggregated summary), CSV (dynamic field collection), and human-readable text export. Handles metric serialization and file I/O. |
| **Examples & Scripts**<br>`examples/test_dream_diffulex_gsm8k.py, examples/test_sdar_diffulex_gsm8k.py, profile/d2f_dream_profile.py, script/d2f_dream_eval_gsm8k.sh` | Updated example scripts removing profiling helper functions; new D2F profiling example; new evaluation shell script with offline HuggingFace setup and logging. |
| **Developer Tooling**<br>`format.sh, pyproject.toml, docs/make.bat` | New formatting/linting orchestration script with pre-commit and clang-tidy integration; updated pyproject.toml with new dependencies (pandas, tilelang, rich, colorama, lm-eval) and Aliyun PyPI mirror; minor whitespace adjustments to docs Makefile. |
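
The Logging Infrastructure row above describes a small public API (setup_logger(), get_logger(), LoggerMixin, plus a dynamically added Logger.success()). As a rough usage sketch only, with the import path taken from the review comments further down and the call signatures and LoggerMixin behavior treated as assumptions:

```python
from diffulex.logger import setup_logger, get_logger, LoggerMixin

# Configure console (and optionally file) output once at process startup.
# Calling with no arguments is an assumption; the real signature may differ.
setup_logger()

logger = get_logger(__name__)
logger.info("engine initialized")
logger.success("weights loaded")  # success() is patched onto logging.Logger

class Scheduler(LoggerMixin):
    # Assumed behavior: LoggerMixin exposes a `logger` attribute for the class.
    def step(self) -> None:
        self.logger.debug("running one scheduler step")
```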

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BenchmarkRunner
    participant DiffulexEngine
    participant SamplingParams
    participant Metrics
    participant Exporter

    Client->>BenchmarkRunner: initialize(model_path, tokenizer_path)
    BenchmarkRunner->>DiffulexEngine: setup engine with kwargs
    
    Client->>BenchmarkRunner: evaluate_batch(prompts, sampling_params)
    BenchmarkRunner->>SamplingParams: configure generation parameters
    BenchmarkRunner->>DiffulexEngine: generate(prompts, sampling_params)
    DiffulexEngine-->>BenchmarkRunner: outputs with timing/tokens
    
    BenchmarkRunner->>Metrics: collect stats (tokens, time, throughput)
    Metrics-->>BenchmarkRunner: aggregated results
    
    Client->>BenchmarkRunner: export results
    BenchmarkRunner->>Exporter: export(metrics, output_path)
    Exporter-->>BenchmarkRunner: JSON/CSV/text file written
    
    rect rgb(200, 240, 255)
        Note over Metrics,Exporter: Metrics collected per-sample<br/>Aggregation & export optional
    end
sequenceDiagram
    participant User
    participant DiffulexProfiler
    participant Backend
    participant Metrics
    participant Exporter

    User->>DiffulexProfiler: initialize(backend_type, config)
    DiffulexProfiler->>Backend: _init_backend()
    Backend-->>DiffulexProfiler: ready
    
    User->>DiffulexProfiler: profile(section_name, metadata)
    DiffulexProfiler->>DiffulexProfiler: start(name)
    DiffulexProfiler->>Metrics: capture GPU/memory baseline
    DiffulexProfiler->>Backend: start profiling
    
    rect rgb(240, 255, 240)
        Note over DiffulexProfiler,Backend: User code executes
    end
    
    User->>DiffulexProfiler: stop()
    DiffulexProfiler->>Backend: stop profiling
    Backend-->>DiffulexProfiler: timing/trace data
    DiffulexProfiler->>Metrics: capture GPU/memory end state
    DiffulexProfiler->>DiffulexProfiler: compute duration & deltas
    
    User->>DiffulexProfiler: export(output_path)
    DiffulexProfiler->>Exporter: export(metrics, path)
    Exporter-->>DiffulexProfiler: JSON/CSV/text written
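
Put together, the lifecycle in the diagram maps onto roughly the following usage pattern. The class and method names (DiffulexProfiler, ProfilerConfig, profile(), record_throughput(), export()) come from the walkthrough and the README excerpt quoted later in this review; the backend string, argument names, and output path are assumptions.

```python
from diffulex_profiler import DiffulexProfiler, ProfilerConfig

def run_generation() -> int:
    # Stand-in for the real inference workload being profiled.
    return sum(i * i for i in range(100_000))

# "simple" selects the timer backend; the exact backend string is an assumption.
profiler = DiffulexProfiler(config=ProfilerConfig(backend="simple"))

# Context-manager profiling: start() on enter, stop() and metric capture on exit.
with profiler.profile("generate_batch"):
    run_generation()

# Throughput is recorded for the finished section, then metrics are exported.
profiler.record_throughput(1024)  # token count; parameter name/shape is assumed
profiler.export("log/profiles/run.json")
```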

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 A log framework hops into place,
Benchmarking tools keep up the pace,
Profilers trace from core to edge—
Diffulex stands on solid ledge! 📊✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.31%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main enhancement: improving decoding strategies to make development easier and inference more efficient. It is concise, directly relevant to the changeset, and provides meaningful context. |
✨ Finishing touches
  • 📝 Generate docstrings

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@drewjin drewjin mentioned this pull request Jan 5, 2026
…ar backends and comprehensive metrics collection
@drewjin drewjin marked this pull request as ready for review January 6, 2026 06:36
drewjin (Contributor, Author) commented Jan 6, 2026

@coderabbitai help


coderabbitai bot commented Jan 6, 2026

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

drewjin (Contributor, Author) commented Jan 6, 2026

@coderabbitai resume


coderabbitai bot commented Jan 6, 2026

✅ Actions performed

Reviews resumed.

drewjin (Contributor, Author) commented Jan 6, 2026

@coderabbitai full review


coderabbitai bot commented Jan 6, 2026

✅ Actions performed

Full review triggered.

coderabbitai bot left a comment


Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/test_sdar_diffulex_gsm8k.py (1)

46-54: Fix model name mismatch and ensure output directory exists.

Two issues here:

  1. Model name mismatch: The output filename references "dvllm_dream_7B" but the actual model is "SDAR-1.7B-Chat-b32". This inconsistency will make profiling results confusing to identify.

  2. Missing directory check: The script assumes log/profiles/ exists. If the directory doesn't exist, both os.path.exists() and VizTracer will fail with a FileNotFoundError.

🔎 Proposed fix
     if PROFILE:
-        output_file = "log/profiles/perf_dvllm_dream_7B.json"
+        output_file = "log/profiles/perf_sdar_1.7B.json"
+        os.makedirs(os.path.dirname(output_file), exist_ok=True)
         if os.path.exists(output_file):
             os.remove(output_file)
🤖 Fix all issues with AI Agents
In @CODE_OF_CONDUCT.md:
- Around line 59-67: The Enforcement section currently lists a personal email
(leiwang1999@outlook.com); replace that personal address with a non-personal
project/organizational contact such as a dedicated project email (e.g.,
conduct@diffulex.dev), an alias/distribution list, or a link to a
moderation/ticketing form, and update the line in the "## Enforcement" block to
use that shared address or form so incident reports are routed to a shared
mailbox or team rather than an individual's personal account.

In @diffulex_bench/datasets.py:
- Around line 26-47: The slicing bug arises because dataset[:limit] returns a
batched dict instead of a Dataset; replace that line with a proper Dataset slice
using the huggingface Dataset.select API (e.g., call
dataset.select(range(limit))) so that the subsequent for item in dataset
iterates over dataset examples; update the code where the variable dataset is
reassigned (the block using dataset = dataset[:limit]) to use dataset =
dataset.select(range(limit)) and keep the existing logic for prompt_template and
results unchanged.
- Around line 65-88: The current use of Python slice syntax dataset[:limit] in
load_humaneval does not properly slice a HuggingFace Dataset; replace that with
the Dataset.select-based approach (e.g., use dataset.select(range(limit)) or
dataset.select(list(range(limit)))) so the dataset is correctly truncated when
limit is provided; update the branch that checks if limit and references
dataset[:limit] to call select instead.
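
A minimal sketch of the select-based truncation described above (illustrative names, not the PR's loader code; the dataset id and field names are the standard GSM8K ones):

```python
from datasets import Dataset, load_dataset

def truncate(dataset: Dataset, limit: int | None) -> Dataset:
    # dataset[:limit] returns a dict of columns; select() keeps a Dataset,
    # so iterating afterwards still yields one example dict per item.
    if limit is not None and limit < len(dataset):
        return dataset.select(range(limit))
    return dataset

gsm8k = load_dataset("gsm8k", "main", split="test")
for item in truncate(gsm8k, limit=100):
    question, answer = item["question"], item["answer"]
```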

In @diffulex_bench/lm_eval_model.py:
- Around line 299-310: The current loglikelihood method (loglikelihood) returns
placeholder (0.0, False) values which can silently corrupt metrics; change it to
fail fast by raising NotImplementedError (consistent with loglikelihood_rolling)
instead of returning defaults, or at minimum replace the warning with a
process-level error log before raising; update the method to raise
NotImplementedError("loglikelihood not implemented for diffusion models") so
callers cannot silently use incorrect scores.

In @diffulex_bench/main.py:
- Around line 61-63: The args serialization builds args_list from args_dict
without escaping, so values containing ',', '=', or ':' break the
lm-evaluation-harness parser; update the logic that constructs args_list (the
list comprehension creating f"{k}={v}" using args_dict) to detect these
characters, escape any backslashes and quotes within the value, and wrap the
value in double quotes when needed (e.g., transform value -> value.replace("\\",
"\\\\").replace('"', '\\"') and then f'{k}="{escaped}"' if special chars
present), ensuring the return ','.join(args_list) yields a safe key=value
string.
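
One way to implement that quoting, as a sketch (the helper name is hypothetical; the escaping rules follow the description above):

```python
def format_model_args(args_dict: dict) -> str:
    # Quote values that contain the delimiters lm-eval's key=value parser
    # treats specially, escaping backslashes and embedded quotes first.
    parts = []
    for key, value in args_dict.items():
        text = str(value)
        if any(ch in text for ch in (",", "=", ":")):
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            text = f'"{escaped}"'
        parts.append(f"{key}={text}")
    return ",".join(parts)
```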

In @diffulex_bench/metrics.py:
- Around line 51-52: Add a validation before computing metrics that checks for
empty input lists: after verifying len(predictions) == len(ground_truths) also
check if len(predictions) == 0 (or simply if not predictions) and raise a
ValueError like "No data to evaluate: predictions and ground_truths are empty"
so you never reach the subsequent division (which currently can produce an
invalid/ambiguous result when both lists are empty).

In @diffulex_bench/report.py:
- Around line 22-23: Wrap the open/results json.load block that reads
results_file in a try/except to handle FileNotFoundError, PermissionError and
json.JSONDecodeError: catch these exceptions around the with open(...) and
json.load(...) lines, log a clear error via the module logger (or import logging
if none exists), and return a sensible fallback (e.g., an empty dict or None) or
re-raise a wrapped exception with context so the function fails gracefully; keep
references to the existing variable results_file and the json.load call so the
change is localized.
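
A sketch of that guarded load (hypothetical helper name; returning None as the fallback is one of the options mentioned above):

```python
import json
import logging

logger = logging.getLogger(__name__)

def load_results(results_file: str) -> dict | None:
    # Fail gracefully on missing, unreadable, or malformed result files.
    try:
        with open(results_file, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, PermissionError) as exc:
        logger.error("Cannot read results file %s: %s", results_file, exc)
    except json.JSONDecodeError as exc:
        logger.error("Results file %s is not valid JSON: %s", results_file, exc)
    return None
```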

In @diffulex_bench/runner.py:
- Around line 149-157: The loop is assigning a misleading per-output
"generation_time" computed as total batch time / len(outputs); change this to a
clear average metric by renaming the field and computing it once: after calling
self.llm.generate(prompts, sampling_params, use_tqdm=use_tqdm) compute
total_time = end_time - start_time and avg = total_time / len(outputs) if
outputs else 0, then set output['avg_generation_time'] (or
'batch_time_per_sample') for each output, or alternatively remove per-output
timing and add a single batch-level field (e.g., batch_generation_time or
avg_generation_time variable returned/logged) if per-sample timings are not
available; update any consumers/readers of output['generation_time'] to the new
name.
- Around line 49-52: The code currently calls
AutoTokenizer.from_pretrained(self.tokenizer_path, trust_remote_code=True) which
enables arbitrary code execution; change this by adding a configurable flag
(e.g., trust_remote_code) to the Runner/class initializer or config with a safe
default of False, use that flag when calling
AutoTokenizer.from_pretrained(self.tokenizer_path,
trust_remote_code=trust_remote_code), and update any relevant docs/CLI defaults
to explain the security implications so consumers can opt-in when they trust the
model source.
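
The two runner.py items above fit together in a sketch like the following. The class and method names here are illustrative, not the PR's actual BenchmarkRunner; only AutoTokenizer.from_pretrained and llm.generate(prompts, sampling_params, use_tqdm=...) are taken from the review text.

```python
import time
from typing import Any
from transformers import AutoTokenizer

class RunnerSketch:  # illustrative stand-in for the PR's BenchmarkRunner
    def __init__(self, tokenizer_path: str, trust_remote_code: bool = False):
        # Remote code execution is opt-in; the default stays safe.
        self.tokenizer = AutoTokenizer.from_pretrained(
            tokenizer_path, trust_remote_code=trust_remote_code
        )
        self.llm: Any = None  # engine is attached elsewhere in the real runner

    def generate_batch(self, prompts: list, sampling_params: Any,
                       use_tqdm: bool = True) -> list[dict]:
        # Time the whole batch once and attach a clearly named average,
        # replacing the misleading per-output "generation_time".
        start = time.time()
        outputs = self.llm.generate(prompts, sampling_params, use_tqdm=use_tqdm)
        avg = (time.time() - start) / len(outputs) if outputs else 0.0
        for output in outputs:
            output["avg_generation_time"] = avg
        return outputs
```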

In @diffulex_profiler/backends/pytorch.py:
- Around line 59-92: In stop(), defend against self.current_name being None
before building the trace filename: check if self.current_name is None and
either return early (e.g., None or a minimal result) or set a safe fallback name
(e.g., "unknown" or a timestamp) and use that when constructing trace_file;
update any places that add "name": self.current_name in the result to use the
fallback/checked value so you never create a file named pytorch_trace_None.json
and avoid downstream errors.

In @diffulex_profiler/example.py:
- Around line 68-71: The Diffulex instantiation inside the profiler context uses
an illegal ellipsis token as an argument (llm = Diffulex(model_path,
model_name="dream", ...)); replace the trailing "..." with valid placeholder
arguments or remove it and add a comment. Update the call to use explicit
defaults or named placeholders (e.g., provide required kwargs or use
model_config=None or # TODO: add args) so the Diffulex(...) call is
syntactically valid inside the with profiler.profile("model_loading") block.
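
For the example.py item, the corrected call would look roughly like this. A sketch only: the Diffulex constructor's remaining keyword arguments are intentionally left as a TODO rather than guessed, and the placeholder path is just that.

```python
from diffulex import Diffulex
from diffulex_profiler import DiffulexProfiler

model_path = "/path/to/dream/model"  # placeholder path
profiler = DiffulexProfiler()  # default config; constructor args are assumed

with profiler.profile("model_loading"):
    # Pass only concrete keyword arguments; a literal `...` is not a valid argument.
    llm = Diffulex(model_path, model_name="dream")  # TODO: add further engine kwargs
```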

In @diffulex_profiler/exporters/csv.py:
- Around line 42-45: The loop assumes m.metadata is not None and will raise
AttributeError if it is; change the code around writer.writerow(row) to guard or
default metadata (e.g., use if m.metadata: for k, v in m.metadata.items():
row[f"metadata_{k}"] = v or iterate over (m.metadata or {}).items()) so that
m.metadata being None is handled before accessing .items(), ensuring
row.update(m.custom_metrics) and writer.writerow(row) still run safely.

In @diffulex_profiler/exporters/summary.py:
- Around line 57-59: The code shadows the module-level or earlier variable
output_file by assigning a new output_file inside the VizTracer branch, causing
the final summary to be written to the VizTracer path; rename the inner variable
(e.g., viztracer_output or viz_output_file) in the block that reads
m.backend_data.get("output_file", ...) and update the summary_lines append to
use that new name so the outer output_file (the intended .txt summary path)
remains unchanged when writing the summary.
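
A small sketch of the rename (hypothetical helper name; the point is only that the VizTracer path gets its own variable):

```python
from typing import Any

def append_viztracer_line(summary_lines: list[str], backend_data: dict[str, Any]) -> None:
    # Use a distinct name for the VizTracer trace path so the exporter's own
    # output_file (the .txt summary destination) is never shadowed.
    viztracer_output = backend_data.get("output_file", "unknown")
    summary_lines.append(f"  VizTracer output: {viztracer_output}")
```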

In @diffulex_profiler/metrics.py:
- Around line 69-77: The NVML initialization in the try block
(pynvml.nvmlInit()) is never paired with pynvml.nvmlShutdown(), risking resource
leaks; update the collect_gpu_metrics (or the surrounding function using
metrics, pynvml.nvmlInit(), nvmlDeviceGetHandleByIndex,
nvmlDeviceGetUtilizationRates) to ensure pynvml.nvmlShutdown() is always
called—either put shutdown in a finally block after the try/except that sets
metrics, or refactor to initialize NVML once at module level and only shutdown
on process exit; ensure the exception handling still suppresses ImportError
gracefully but does not skip the shutdown call when pynvml was successfully
initialized.
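
The first option (shutdown in a finally block) could look like this sketch; the function name matches the one mentioned above, but the returned keys beyond "utilization" are assumptions.

```python
def collect_gpu_metrics(device_index: int = 0) -> dict:
    # Pair nvmlInit() with nvmlShutdown() so NVML handles are not leaked,
    # while still degrading gracefully when pynvml is not installed.
    metrics: dict = {}
    try:
        import pynvml
    except ImportError:
        return metrics
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        metrics["utilization"] = util.gpu
        metrics["memory_used_mb"] = mem.used / (1024 ** 2)
    finally:
        pynvml.nvmlShutdown()
    return metrics
```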

In @diffulex_profiler/profiler.py:
- Around line 165-169: The code is incorrectly computing a delta for GPU
utilization by subtracting self.current_metrics.gpu_metrics_start from
gpu_metrics_end; change this to store a representative snapshot instead: set
self.current_metrics.gpu_utilization to the end snapshot value (e.g.,
self.current_metrics.gpu_metrics_end.get("utilization", 0)) or alternatively
preserve both self.current_metrics.gpu_metrics_start and gpu_metrics_end on the
metrics object for later comparison; update any consumers of gpu_utilization
accordingly to expect a snapshot rather than a delta.
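
Sketched with a trimmed-down metrics container (field names from the comment above; the dataclass itself is illustrative, not the PR's PerformanceMetrics):

```python
from dataclasses import dataclass, field

@dataclass
class SectionMetrics:  # illustrative subset of the PR's metrics container
    gpu_metrics_start: dict = field(default_factory=dict)
    gpu_metrics_end: dict = field(default_factory=dict)
    gpu_utilization: float = 0.0

def finalize_gpu_utilization(metrics: SectionMetrics) -> None:
    # Utilization is an instantaneous percentage, so keep the end snapshot
    # instead of subtracting the start value from it.
    metrics.gpu_utilization = metrics.gpu_metrics_end.get("utilization", 0.0)
```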

In @diffulex/logger.py:
- Around line 44-47: The format() method currently mutates record.levelname
which pollutes other handlers; instead do not modify record.levelname in place —
create a colored string locally (e.g., using self.COLORS and self.RESET) and
either assign it to a new attribute on the record such as levelname_colored or
operate on a shallow copy of the LogRecord, and update the formatter pattern to
use that new attribute (or use the copy) so other handlers receive the original
record.levelname unchanged; make this change in the format() method (referencing
format, self.COLORS, self.RESET, and record).
- Around line 155-171: The success() implementation must not unconditionally
insert Rich markup because that literal markup can end up in other handlers
(e.g., FileHandler); update _add_success_method so success() emits plain text (a
simple "✓ {message}") instead of "[green]✓[/green] {message}" when
RICH_AVAILABLE is true, and keep the existing COLORAMA logic for terminals;
modify the success function defined inside _add_success_method (and the
assignment logging.Logger.success) to always call self.info with a plain-text
checkmark and message so RichHandler can apply its own formatting while file
handlers receive clean text.
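
For the first logger.py item, the copy-based approach might look like this sketch (the class name and colour codes are illustrative; the PR's formatter presumably uses its own COLORS/RESET values):

```python
import copy
import logging

class ColorFormatter(logging.Formatter):
    # Illustrative colour table; the real module maps levels via COLORS/RESET.
    COLORS = {"WARNING": "\033[33m", "ERROR": "\033[31m"}
    RESET = "\033[0m"

    def format(self, record: logging.LogRecord) -> str:
        # Colour a copy so other handlers (e.g. a FileHandler) still see the
        # original, unmodified levelname.
        colored = copy.copy(record)
        color = self.COLORS.get(colored.levelname, "")
        colored.levelname = f"{color}{colored.levelname}{self.RESET}"
        return super().format(colored)
```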

In @format.sh:
- Around line 117-119: The pip install call using the invalid flag
`--requirements` must be changed to the correct flag `-r` (or `--requirement`)
so the conditional that installs lint deps when `clang-tidy` is missing
succeeds; update the command in the block that checks `clang-tidy` (the `python3
-m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt" --user`
line) to use `-r "${ROOT}/requirements-lint.txt"` (or `--requirement`) instead,
keeping the rest of the options (`--upgrade` and `--user`) unchanged.
- Around line 57-72: The get_merge_base() function uses a hardcoded
UPSTREAM_REPO URL that is incorrect; update the UPSTREAM_REPO variable to the
correct upstream repository (e.g., "https://github.com/zhijie-group/Diffulex")
so merge-base detection queries the intended repo, then ensure the rest of the
logic still fetches main and computes MERGE_BASE as before.

In @pyproject.toml:
- Around line 35-39: Update the pyproject.toml dependency entry for "lm-eval" to
include a version constraint so builds are reproducible; edit the dependency
list where "lm-eval" appears and change it to a pinned or minimum version (for
example use a minimum like lm-eval>=X.Y.Z or an exact pin lm-eval==X.Y.Z)
consistent with how other packages are specified.
🟡 Minor comments (6)
diffulex/logger.py-155-171 (1)

155-171: success() method emits Rich markup unconditionally when Rich is available.

If a logger has both a RichHandler and a FileHandler, the [green]✓[/green] markup will be written literally to the log file, creating malformed entries.

Consider using plain text for the success method
 def _add_success_method():
     """Add success method to logging.Logger class"""
-    if RICH_AVAILABLE:
-        def success(self, message: str, *args, **kwargs):
-            """Log success message with rich formatting"""
-            self.info(f"[green]✓[/green] {message}", *args, **kwargs)
-    else:
-        def success(self, message: str, *args, **kwargs):
-            """Log success message"""
-            if COLORAMA_AVAILABLE:
-                self.info(f"{Fore.GREEN}✓{Style.RESET_ALL} {message}", *args, **kwargs)
-            else:
-                self.info(f"✓ {message}", *args, **kwargs)
+    def success(self, message: str, *args, **kwargs):
+        """Log success message"""
+        # Use plain checkmark to avoid markup in file handlers
+        self.info(f"✓ {message}", *args, **kwargs)
diffulex_profiler/exporters/csv.py-42-45 (1)

42-45: Potential AttributeError if metadata is None.

Line 26 guards with if m.metadata: before accessing .keys(), but line 43 iterates m.metadata.items() without a similar check. If metadata is None, this will raise an AttributeError.

🔎 Proposed fix
                 row.update(m.custom_metrics)
-                for k, v in m.metadata.items():
-                    row[f"metadata_{k}"] = v
+                if m.metadata:
+                    for k, v in m.metadata.items():
+                        row[f"metadata_{k}"] = v
                 writer.writerow(row)
diffulex_bench/metrics.py-51-52 (1)

51-52: Add validation for empty input lists.

The length check at line 51-52 doesn't prevent both lists from being empty. If both predictions and ground_truths are empty lists, the length check passes but line 63 would perform 0 / 0 which returns 0.0 (due to the conditional), which may not accurately represent "no data to evaluate."

🔎 Proposed fix
     if len(predictions) != len(ground_truths):
         raise ValueError("Predictions and ground_truths must have the same length")
+    
+    if len(predictions) == 0:
+        raise ValueError("Cannot compute accuracy for empty lists")
diffulex_profiler/backends/pytorch.py-59-92 (1)

59-92: Add defensive check for current_name being None.

Line 66 uses self.current_name in the filename without verifying it's not None. If stop() is called without calling start() first (or if called multiple times), current_name could be None, resulting in a filename like pytorch_trace_None.json.

🔎 Proposed fix
         self.profiler.__exit__(None, None, None)
         
-        trace_file = self.output_dir / f"pytorch_trace_{self.current_name}.json"
+        name = self.current_name if self.current_name else "unknown"
+        trace_file = self.output_dir / f"pytorch_trace_{name}.json"
diffulex_bench/main.py-61-63 (1)

61-63: Quote or escape values containing commas, equals signs, or colons in model_args string.

Line 62 formats values directly into a comma-separated string without quoting or escaping. The lm-evaluation-harness parser expects key1=val1,key2=val2 format, but values containing commas, equals signs, or colons must be quoted (e.g., key="value,with,comma") or escaped. Without this, malformed values will break argument parsing or cause filename errors during evaluation.

Add quoting for values containing special characters:

args_list = [
    f'{k}="{v}"' if any(ch in str(v) for ch in (",", "=")) else f"{k}={v}"
    for k, v in args_dict.items()
]

Or validate/escape at the point where config values are generated.

diffulex_bench/lm_eval_model.py-299-310 (1)

299-310: Placeholder loglikelihood returns may corrupt evaluation metrics.

Returning (0.0, False) for all requests will produce incorrect scores for any task that uses log-likelihood evaluation. The warning is good, but consider:

  1. Raising NotImplementedError to fail fast (like loglikelihood_rolling), or
  2. Logging at ERROR level to ensure visibility.

If certain tasks explicitly require this method, the current behavior will silently produce wrong results.

🧹 Nitpick comments (51)
examples/test_sdar_diffulex_gsm8k.py (1)

19-19: Consider making the profiling flag configurable.

Enabling profiling by default in an example script may surprise users with unexpected log files and performance overhead. Consider using an environment variable or command-line argument to control this flag.

🔎 Example using environment variable
-    PROFILE = True
+    PROFILE = os.getenv("PROFILE", "false").lower() == "true"
diffulex/attention/__init__.py (2)

20-24: Preserve the exception chain using from e.

The exception handling fix is correct and allows the error message to include exception details. However, consider preserving the exception chain by using raise ... from e, which makes debugging easier by showing the full traceback.

🔎 Proposed improvement to preserve exception chain
         try:
             from .attn_impl import Attention
             return Attention
         except Exception as e:
-            raise ImportError(f"Failed to import diffulex.attention.attn_impl.Attention: {e}")
+            raise ImportError(f"Failed to import diffulex.attention.attn_impl.Attention: {e}") from e

25-26: Consider removing unreachable code.

Since fetch_attn_metadata is defined at module level (line 14), normal attribute lookup will always find it before __getattr__ is called. These lines are unreachable and can be safely removed for clarity.

🔎 Proposed cleanup
         try:
             from .attn_impl import Attention
             return Attention
         except Exception as e:
             raise ImportError(f"Failed to import diffulex.attention.attn_impl.Attention: {e}")
-    if name == "fetch_attn_metadata":
-        return metadata.fetch_attn_metadata
     raise AttributeError(f"module {__name__} has no attribute {name}")
script/d2f_dream_eval_gsm8k.sh (4)

1-1: Consider using bash for better portability.

The script uses zsh but doesn't appear to require zsh-specific features. Using bash would improve portability across systems where zsh may not be installed.

🔎 Recommended portability improvement
-#!/usr/bin/zsh
+#!/usr/bin/env bash

9-12: Improve path handling and document GPU requirements.

The cache path uses $(pwd), which assumes the script is run from the repository root. The hardcoded 8 GPU configuration may not be available on all systems.

🔎 Recommended improvements
+# Ensure script runs from repository root
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+cd "$REPO_ROOT" || exit 1
+
-export HF_HOME="$(pwd)/cache"
+export HF_HOME="$REPO_ROOT/cache"
 export HF_DATASETS_CACHE="$HF_HOME/datasets"
 export HF_METRICS_CACHE="$HF_HOME/metrics"
+
+# Configure CUDA devices (adjust based on available GPUs)
 export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

Additionally, consider documenting the 8-GPU requirement in a comment or checking GPU availability.


1-16: Add error handling for more robust script execution.

The script lacks error-handling directives, which means failures may go unnoticed. Adding set directives and prerequisite checks would improve reliability.

🔎 Recommended error handling additions
 #!/usr/bin/env bash
+set -euo pipefail
+
+# Validate prerequisites
+if ! command -v python &> /dev/null; then
+  echo "Error: python not found in PATH" >&2
+  exit 1
+fi
+
+CONFIG_FILE="custom_configs/d2f_dream_eval_gsm8k.yml"
+if [[ ! -f "$CONFIG_FILE" ]]; then
+  echo "Error: Config file not found: $CONFIG_FILE" >&2
+  exit 1
+fi
+
+# Create log directory if it doesn't exist
+mkdir -p log
 
 export HF_HUB_OFFLINE=1

14-16: Consider using explicit python3 or version validation.

The script uses python which may resolve to Python 2 on some systems, while pyproject.toml requires Python >=3.12. Consider using python3 explicitly or validating the Python version.

🔎 Optional improvements
+# Ensure correct Python version
+PYTHON_CMD="python3"
+PYTHON_VERSION=$($PYTHON_CMD -c 'import sys; print(".".join(map(str, sys.version_info[:2])))')
+REQUIRED_VERSION="3.12"
+if ! python3 -c "import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)" 2>/dev/null; then
+  echo "Error: Python >= $REQUIRED_VERSION required, found $PYTHON_VERSION" >&2
+  exit 1
+fi
+
-python -m diffulex_bench.main \
+$PYTHON_CMD -m diffulex_bench.main \
     --config custom_configs/d2f_dream_eval_gsm8k.yml \
     2>&1 | tee log/d2f_dream_eval_gsm8k.log
diffulex_bench/README.md (1)

202-209: Output format section could be more detailed.

The output section is brief. Consider expanding with actual JSON/file examples to clarify the exact structure of evaluation results and metrics that users should expect to find in the output directory.

CONTRIBUTING.md (3)

26-28: Expand the "Ask Questions" section with guidance on format and resources.

This section is minimal. Consider adding guidance on:

  • Preferred format for questions (discussion vs. issues)
  • Where to ask questions (GitHub Discussions, Discord, etc. if applicable)
  • Expected response time
  • Links to documentation or FAQ

This will set clearer expectations for new contributors.


104-110: Add target dates or checkpoints for completing TBA sections.

The "Build Wheels" and "Documentation" sections are marked as TBA. Consider adding:

  • A target date or milestone for completion
  • A note on whether these are blocking vs. non-blocking for the v0.0.1 release
  • Links to related issues or PRs tracking these sections

This helps contributors understand project status and whether these gaps affect their workflow.


36-37: Add or reference a pull request template to reinforce contribution expectations.

The note about including tests and docs is good, but consider creating a .github/pull_request_template.md file to automatically remind contributors of these requirements when they open a PR.

profile/d2f_dream_profile.py (4)

15-16: Hardcoded absolute paths reduce portability.

These paths won't work outside of the specific development environment. Consider using environment variables, command-line arguments, or configuration files.

Suggested approach using environment variables
+import os
+
 def main():
-    model_path = "/data1/ckpts/Dream-org/Dream-v0-Base-7B"
-    lora_path = "/data1/ckpts/SJTU-Deng-Lab/D2F_Dream_Base_7B_Lora"
+    model_path = os.environ.get("DREAM_MODEL_PATH", "/data1/ckpts/Dream-org/Dream-v0-Base-7B")
+    lora_path = os.environ.get("D2F_LORA_PATH", "/data1/ckpts/SJTU-Deng-Lab/D2F_Dream_Base_7B_Lora")

18-19: Unused variable: output_dir is created but never used.

The directory is created at Line 19 but no output is written to it. Either remove this dead code or add functionality to write profiling results to files.


21-22: Use the new logger infrastructure instead of print().

This PR introduces a unified logging system (diffulex.logger), but this script uses print() statements throughout. For consistency and better observability, migrate to the logger.

Suggested migration to logger
 from diffulex import Diffulex, SamplingParams
+from diffulex.logger import get_logger
 from transformers import AutoTokenizer
 
+logger = get_logger(__name__)
+
 def main():
     # ...
-    print("Loading model...")
+    logger.info("Loading model...")
     # ...
-    print(f"Model loaded in {model_load_time:.2f} seconds")
+    logger.info(f"Model loaded in {model_load_time:.2f} seconds")

Also applies to: 41-41, 52-52, 63-74, 76-83


58-61: Guard against empty outputs more robustly.

Line 60's check if outputs else 0 prevents division by zero, but num_outputs on Line 59 could still be 0 if outputs is an empty list. The subsequent loop (Lines 77-83) would simply not execute, but the printed metrics would be misleading (0 prompts, 0 tokens, 0.00 throughput).

Add early return for empty outputs
     outputs = llm.generate(prompts, sampling_params)
     inference_time = time.time() - inference_start
     
+    if not outputs:
+        logger.warning("No outputs generated")
+        return
+    
     total_tokens = sum(len(o.get('token_ids', [])) for o in outputs)
     num_outputs = len(outputs)
-    avg_diff_steps = sum(o.get('n_diff_steps', 0) for o in outputs) / num_outputs if outputs else 0
+    avg_diff_steps = sum(o.get('n_diff_steps', 0) for o in outputs) / num_outputs
format.sh (2)

101-103: Unquoted variable expansion may cause issues with special filenames.

While intentional for passing multiple files, filenames with spaces or special characters could break. Consider using arrays or null-delimited processing for robustness.

Alternative using mapfile for safer handling
     if [[ -n "${CHANGED_FILES}" ]]; then
         echo "Running pre-commit on changed files:"
         echo "${CHANGED_FILES}"
-        # Convert newline-separated files to space-separated and run pre-commit once
-        CHANGED_FILES_SPACE="$(echo "${CHANGED_FILES}" | tr '\n' ' ')"
-        python3 -m pre_commit run --files ${CHANGED_FILES_SPACE}
+        # Use mapfile to safely handle filenames
+        mapfile -t FILES_ARRAY <<< "${CHANGED_FILES}"
+        python3 -m pre_commit run --files "${FILES_ARRAY[@]}"
     else

133-136: clang_tidy_all only checks src/*.cc files.

This misses other C/C++ extensions (.c, .cpp, .h, .hpp) and subdirectories. The changed files function on Lines 141 handles more extensions.

Include all C/C++ source extensions and subdirectories
     clang_tidy_all() {
-        run-clang-tidy -j 64 src/*.cc -p build
+        fd -e c -e cc -e cpp -e h -e hpp . src --exec-batch run-clang-tidy -j 64 -p build
     }
diffulex/logger.py (1)

14-14: Unused imports: Progress, SpinnerColumn, TextColumn, BarColumn, TimeElapsedColumn.

These Rich progress-related imports are not used in the module.

Remove unused imports
 try:
     from rich.console import Console
     from rich.logging import RichHandler
     from rich.traceback import install as install_rich_traceback
-    from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TimeElapsedColumn
     RICH_AVAILABLE = True
 except ImportError:
     RICH_AVAILABLE = False
diffulex_bench/runner.py (1)

100-106: Consider making the initialization wait time configurable.

The hardcoded 2-second sleep may not be optimal for all deployment scenarios—faster systems may not need this delay, while slower systems or larger models may require more time.

🔎 Proposed refactor to make wait time configurable
-                        # Give subprocesses a bit more time to complete initialization
-                        # The main process initialization is synchronous, but subprocesses
-                        # may still be initializing (model loading, warmup, etc.)
-                        # Subprocesses will synchronize via barrier in ModelRunnerBase.__init__
-                        # So we just need to wait a bit for them to complete initialization
-                        time.sleep(2.0)  # Wait a bit for subprocess initialization
+                        # Give subprocesses time to complete initialization
+                        # Subprocesses synchronize via barrier in ModelRunnerBase.__init__
+                        initialization_wait = getattr(self.llm, 'initialization_wait', 2.0)
+                        time.sleep(initialization_wait)

Alternatively, you could add an initialization_wait parameter to _wait_for_ready:

-    def _wait_for_ready(self, timeout: float = 300.0, check_interval: float = 0.5):
+    def _wait_for_ready(self, timeout: float = 300.0, check_interval: float = 0.5, initialization_wait: float = 2.0):
         """
         Wait for the Diffulex engine to be fully initialized and ready
         
         Args:
             timeout: Maximum time to wait in seconds
             check_interval: Interval between readiness checks in seconds
+            initialization_wait: Time to wait for subprocess initialization in seconds
         """
diffulex/sampler/base.py (1)

98-100: Optional: Remove redundant warning before exception.

The warning at line 99 is immediately followed by raising an exception at line 100, making the warning somewhat redundant. Consider either removing the warning or restructuring the logic if the warning serves a specific observability purpose before the exception propagates.

🔎 Proposed refactor
 def _shift_logits(self, logits, last_logit=None):
     if logits.shape[1] == 0:
-        logger.warning("Logits sequence length is 0, returning empty logits")
         raise Exception("logits sequence length is 0")
diffulex_bench/configs/dream_d2f_gsm8k.yml (1)

5-5: Document or validate the placeholder model path.

The model_path contains a placeholder value "/path/to/dream/model" that users must replace. Ensure this is clearly documented in README or example documentation, or consider adding validation that produces a helpful error message if the placeholder path is used.

diffulex/strategy/fast_dllm_v2/engine/sequence.py (1)

24-25: Consider adding status management methods to FDV2SubBlock.

FDV2SubBlock now has a status field but lacks the status transition methods and property accessors that FDV2Block provides (e.g., to_cache(), in_cache(), is_active, is_to_cache). This asymmetry could lead to inconsistent usage patterns and direct field manipulation instead of controlled state transitions.

🔎 Proposed enhancement with status management methods
 @dataclass
 class FDV2SubBlock:
     sub_block_id: int = 0
     status: FDV2SubBlockStatus = FDV2SubBlockStatus.ACTIVE
+    
+    def to_dual_cache(self) -> None:
+        """Transition to TO_DUAL_CACHE status."""
+        if self.status == FDV2SubBlockStatus.ACTIVE:
+            self.status = FDV2SubBlockStatus.TO_DUAL_CACHE
+    
+    def in_dual_cache(self) -> None:
+        """Transition to IN_DUAL_CACHE status."""
+        if self.status == FDV2SubBlockStatus.TO_DUAL_CACHE:
+            self.status = FDV2SubBlockStatus.IN_DUAL_CACHE
+    
+    @property
+    def is_active(self) -> bool:
+        return self.status == FDV2SubBlockStatus.ACTIVE
+    
+    @property
+    def is_to_dual_cache(self) -> bool:
+        return self.status == FDV2SubBlockStatus.TO_DUAL_CACHE
+    
+    @property
+    def is_in_dual_cache(self) -> bool:
+        return self.status == FDV2SubBlockStatus.IN_DUAL_CACHE
diffulex_bench/configs/example.yml (1)

32-35: Enhance documentation for D2F threshold parameters.

The threshold parameters (accept_threshold, complete_threshold, add_new_block_threshold) would benefit from more detailed comments explaining their purpose and impact on D2F decoding behavior.

🔎 Proposed enhancement with detailed comments
   # D2F-specific configuration
-  accept_threshold: 0.9
-  complete_threshold: 0.95
-  add_new_block_threshold: 0.1
+  accept_threshold: 0.9  # Confidence threshold for accepting generated tokens (0.0-1.0)
+  complete_threshold: 0.95  # Threshold for considering a block complete (0.0-1.0)
+  add_new_block_threshold: 0.1  # Threshold for adding new diffusion blocks (0.0-1.0)
   diffusion_block_size: 32
diffulex_profiler/backends/simple.py (3)

17-20: Consider handling double-start scenario.

Calling start() twice without an intervening stop() will silently overwrite the previous timing session. Consider either raising an exception or logging a warning to help catch potential misuse.

🔎 Proposed enhancement with double-start detection
     def start(self, name: str) -> None:
         """Start timing."""
+        if self.start_time is not None:
+            # Either raise an exception or log a warning
+            import warnings
+            warnings.warn(f"start() called while timing '{self.current_name}' is still active. Previous timing will be lost.")
         self.current_name = name
         self.start_time = time.perf_counter()

38-43: Context manager usage lacks name parameter.

The context manager implementation doesn't call start() in __enter__, so current_name will be None when used with a with statement. This limits the usefulness of the context manager pattern.

Consider either:

  1. Requiring the name to be passed to __init__ when using as a context manager, or
  2. Accepting an optional name parameter in __enter__, or
  3. Documenting that users should call start() explicitly even when using the context manager
🔎 Proposed fix: Accept name in __init__ for context manager usage
-    def __init__(self):
+    def __init__(self, name: Optional[str] = None):
         self.start_time: Optional[float] = None
-        self.current_name: Optional[str] = None
+        self.current_name: Optional[str] = name
     
     def start(self, name: str) -> None:
         """Start timing."""
         self.current_name = name
         self.start_time = time.perf_counter()
     
     def stop(self) -> Optional[Dict[str, Any]]:
         """Stop timing and return duration."""
         if self.start_time is None:
             return None
         
         duration = time.perf_counter() - self.start_time
         result = {
             "duration_sec": duration,
             "name": self.current_name,
         }
         
         self.start_time = None
         self.current_name = None
         
         return result
     
     def __enter__(self):
+        if self.current_name is not None:
+            self.start_time = time.perf_counter()
         return self
     
     def __exit__(self, exc_type, exc_val, exc_tb):
         if self.start_time is not None:
             self.stop()

39-39: Add return type hint to __enter__.

🔎 Proposed fix
-    def __enter__(self):
+    def __enter__(self) -> "SimpleTimerBackend":
         return self
diffulex/diffulex.py (1)

1-1: Unused import: Config is no longer referenced.

The Config import on line 1 is no longer used after the refactoring.

🔎 Proposed cleanup
-from diffulex.config import Config
 from diffulex.engine.dp_worker import DiffulexDPWorker
 from diffulex.engine.tp_worker import DiffulexTPWorker
diffulex_bench/arg_parser.py (1)

182-192: Simplify the save-results argument pattern.

The combination of action="store_true" with default=True on line 184 is redundant. When default=True, the --save-results flag becomes unnecessary since the behavior is already enabled by default.

🔎 Suggested refactor to remove redundancy
-    parser.add_argument(
-        "--save-results",
-        action="store_true",
-        default=True,
-        help="Save results to file",
-    )
     parser.add_argument(
         "--no-save-results",
         dest="save_results",
         action="store_false",
+        default=True,
         help="Do not save results to file",
     )

This pattern is clearer: results are saved by default, and users explicitly opt out with --no-save-results.

diffulex_profiler/exporters/__init__.py (1)

14-18: Consider adding debug logging for the optional import.

The silent failure when CSVExporter is unavailable is a standard pattern for optional dependencies. However, given that this PR introduces comprehensive logging infrastructure, consider adding a debug-level log message when the import fails to aid troubleshooting.

🔎 Optional enhancement for observability
+from diffulex.logger import get_logger
+
+logger = get_logger(__name__)
+
 try:
     from diffulex_profiler.exporters.csv import CSVExporter
     __all__.append("CSVExporter")
 except ImportError:
+    logger.debug("CSVExporter not available (optional dependency missing)")
     pass
diffulex_profiler/exporters/json.py (1)

27-42: Consider explicit None checks for clarity.

Lines 32-33 use truthy checks (if m.duration, if m.total_tokens) which filter out both None and 0 values. If 0 is a valid value that should be included in the sum (e.g., a section with zero duration or zero tokens), this could produce incorrect totals.

🔎 Suggested refactor for explicit None handling
-        total_duration = sum(m.duration for m in metrics if m.duration)
-        total_tokens = sum(m.total_tokens for m in metrics if m.total_tokens)
+        total_duration = sum(m.duration for m in metrics if m.duration is not None)
+        total_tokens = sum(m.total_tokens for m in metrics if m.total_tokens is not None)

This makes the intent clearer: skip None values but include 0 values in the sum.

diffulex/config.py (2)

40-40: Minor style improvement: use list instead of lambda: [].

The default_factory=lambda: [] is functionally correct, but default_factory=list is more idiomatic and concise in Python.

🔎 Suggested style improvement
-    device_ids: list[int] = field(default_factory=lambda: [])
+    device_ids: list[int] = field(default_factory=list)

70-77: Add error handling for CUDA_VISIBLE_DEVICES parsing.

The parsing of CUDA_VISIBLE_DEVICES on lines 72-73 will raise ValueError if the environment variable contains non-numeric values (e.g., due to misconfiguration). Consider adding error handling or validation to provide a clearer error message.

🔎 Suggested error handling
         if not self.device_ids:
             import torch
-            self.device_ids = (
-                [int(x) for x in os.environ.get("CUDA_VISIBLE_DEVICES", "").split(",") if x.strip()]
-                if os.environ.get("CUDA_VISIBLE_DEVICES", "")
-                else list(range(torch.cuda.device_count()))
-            )
+            cuda_visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
+            if cuda_visible:
+                try:
+                    self.device_ids = [int(x) for x in cuda_visible.split(",") if x.strip()]
+                except ValueError as e:
+                    raise ValueError(
+                        f"Invalid CUDA_VISIBLE_DEVICES value '{cuda_visible}': {e}"
+                    ) from e
+            else:
+                self.device_ids = list(range(torch.cuda.device_count()))
             logger.info(f"Using CUDA devices: {self.device_ids}")
diffulex_profiler/exporters/csv.py (1)

22-29: Minor: Use a set literal for initialization.

The initialization set(["name", ...]) can be simplified to a set literal {"name", ...} for better readability.

🔎 Proposed fix
-        fieldnames = set(["name", "duration_sec", "total_tokens", "throughput_tokens_per_sec"])
+        fieldnames = {"name", "duration_sec", "total_tokens", "throughput_tokens_per_sec"}
diffulex/engine/dp_worker.py (1)

29-29: Remove or document commented-out code.

The commented-out CUDA_VISIBLE_DEVICES assignment should either be removed or accompanied by a comment explaining why it's preserved.

diffulex_profiler/README.md (1)

134-149: Add missing import in PyTorch Profiler example.

The example references ProfilerActivity.CPU and ProfilerActivity.CUDA without showing the required import statement.

🔎 Proposed fix
 ### PyTorch Profiler Backend

 For GPU/CPU operation-level profiling. Built into PyTorch.

 ```python
+from torch.profiler import ProfilerActivity
+
 profiler = DiffulexProfiler(
     config=ProfilerConfig(
         backend="pytorch",
         pytorch_profiler_config={
             "activities": [ProfilerActivity.CPU, ProfilerActivity.CUDA],
             "record_shapes": True,
             "profile_memory": True,
         }
     )
 )
diffulex_profiler/backends/__init__.py (1)

12-23: Consider logging when optional backends fail to import.

Silent pass on ImportError makes it harder to debug when a backend is unexpectedly unavailable. A debug-level log message would help troubleshoot missing dependencies without cluttering normal output.

🔎 Proposed fix
+import logging
+_logger = logging.getLogger(__name__)
+
 # Optional backends
 try:
     from diffulex_profiler.backends.viztracer import VizTracerBackend
     __all__.append("VizTracerBackend")
 except ImportError:
-    pass
+    _logger.debug("VizTracerBackend not available: viztracer not installed")

 try:
     from diffulex_profiler.backends.pytorch import PyTorchProfilerBackend
     __all__.append("PyTorchProfilerBackend")
 except ImportError:
-    pass
+    _logger.debug("PyTorchProfilerBackend not available")
diffulex_bench/__init__.py (1)

12-35: Refactor to avoid duplicating __all__ list.

The __all__ list is duplicated between the try/except branches, violating DRY. Define the base list once and conditionally append.

🔎 Proposed fix
+__all__ = [
+    "BenchmarkRunner",
+    "load_benchmark_dataset",
+    "compute_metrics",
+    "setup_logger",
+    "get_logger",
+    "BenchmarkConfig",
+    "EngineConfig",
+    "EvalConfig",
+]
+
 # Import lm_eval model to register it
 try:
     from diffulex_bench.lm_eval_model import DiffulexLM
-    __all__ = [
-        "BenchmarkRunner",
-        "load_benchmark_dataset",
-        "compute_metrics",
-        "setup_logger",
-        "get_logger",
-        "BenchmarkConfig",
-        "EngineConfig",
-        "EvalConfig",
-        "DiffulexLM",
-    ]
+    __all__.append("DiffulexLM")
 except ImportError:
-    __all__ = [
-        "BenchmarkRunner",
-        "load_benchmark_dataset",
-        "compute_metrics",
-        "setup_logger",
-        "get_logger",
-        "BenchmarkConfig",
-        "EngineConfig",
-        "EvalConfig",
-    ]
+    pass
diffulex_bench/report.py (1)

30-30: Consider removing the unnecessary lambda or use it consistently.

Line 30 defines append_line as a lambda, but line 52 directly uses report_lines.append instead. Either remove the lambda and use report_lines.append throughout, or use append_line consistently.

🔎 Suggested simplification

Remove the lambda and use report_lines.append directly:

-    append_line = lambda line: report_lines.append(line)
-    append_line("=" * 80)
+    report_lines.append("=" * 80)

Or use append_line consistently at line 52:

-        report_lines.append(f"  Accuracy: {metrics['accuracy']:.4f}")
+        append_line(f"  Accuracy: {metrics['accuracy']:.4f}")

Also applies to: 52-52

diffulex_profiler/backends/viztracer.py (1)

53-67: Add validation for the output_file attribute access.

Line 59 accesses self.tracer.output_file without verifying that this attribute exists. While VizTracer likely provides this attribute, defensive programming would add a fallback.

🔎 Suggested defensive code
         self.tracer.stop()
-        output_file = self.tracer.output_file
+        output_file = getattr(self.tracer, 'output_file', 'unknown')
         
         result = {
             "backend": "viztracer",
diffulex_bench/metrics.py (1)

66-83: Placeholder implementation for HumanEval evaluation.

This function is a placeholder and returns None. The comments correctly note that actual implementation requires a code execution environment.

Do you want me to open an issue to track implementing HumanEval evaluation with Docker-based code execution, or would you like suggestions for safe code execution frameworks?
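If it helps to make that follow-up concrete, the sketch below shows the general shape such an evaluator could take once a proper execution environment exists. The `check_completion` helper and its signature are hypothetical, and running model-generated code this way is not sandboxed, so it is only suitable for trusted or illustrative use:

```python
# Hypothetical sketch (not part of this PR): run prompt + completion + test in a
# subprocess with a timeout. This is NOT a sandbox; real HumanEval evaluation of
# untrusted generations needs isolation such as Docker.
import os
import subprocess
import sys
import tempfile


def check_completion(prompt: str, completion: str, test: str,
                     entry_point: str, timeout: float = 10.0) -> bool:
    """Return True if prompt + completion passes the HumanEval test code."""
    program = f"{prompt}{completion}\n\n{test}\n\ncheck({entry_point})\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)
```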

diffulex_bench/main.py (3)

116-155: Consider thread-safety implications of sys.argv manipulation.

The function modifies sys.argv globally (lines 120-127), which is not thread-safe. If multiple benchmarks run concurrently in different threads, they could interfere with each other's arguments.

If concurrent benchmark execution is planned, consider:

  1. Using lm_eval's programmatic API instead of CLI if available
  2. Adding a lock around sys.argv manipulation (see the sketch after this list)
  3. Running benchmarks in separate processes rather than threads
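For option 2, a minimal sketch of a lock-guarded swap (the `patched_argv` helper and the lock name are illustrative, not part of this PR). Note that it serializes the lm_eval invocations, so it only helps if the rest of the pipeline is itself thread-safe:

```python
import sys
import threading
from contextlib import contextmanager

_ARGV_LOCK = threading.Lock()


@contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv while holding a process-wide lock."""
    with _ARGV_LOCK:
        saved = sys.argv
        sys.argv = list(argv)
        try:
            yield
        finally:
            sys.argv = saved
```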

217-226: Hardcoded defaults in getattr may diverge from actual defaults.

Lines 217-226 use getattr with hardcoded defaults (e.g., max_num_batched_tokens=4096). If the argument parser or EngineConfig has different defaults, they could become inconsistent over time.

Consider either:

  1. Defining these defaults in a central location (e.g., a constants module)
  2. Using the defaults from EngineConfig.__init__ directly
  3. Ensuring arg_parser provides all required arguments with proper defaults

This reduces maintenance burden when defaults change.
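For option 2, one way to avoid restating literals is to read the declared dataclass defaults. A rough sketch, assuming `EngineConfig` is a dataclass (the `default_for` helper is illustrative):

```python
from dataclasses import MISSING, fields


def default_for(config_cls, name: str):
    """Return the declared default of a dataclass field instead of hardcoding it."""
    for f in fields(config_cls):
        if f.name == name:
            if f.default is not MISSING:
                return f.default
            if f.default_factory is not MISSING:
                return f.default_factory()
    raise KeyError(name)


# Usage (illustrative):
# max_tokens = getattr(args, "max_num_batched_tokens",
#                      default_for(EngineConfig, "max_num_batched_tokens"))
```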


182-190: Add error handling for config loading.

The config loading from YAML/JSON files (lines 184-186) lacks error handling. If the config file is malformed or the from_yaml/from_json methods fail, the error won't be user-friendly.

🔎 Suggested error handling

```diff
     if config_path and config_path.exists():
-        if config_path.suffix in ['.yaml', '.yml']:
-            config = BenchmarkConfig.from_yaml(str(config_path))
-        elif config_path.suffix == '.json':
-            config = BenchmarkConfig.from_json(str(config_path))
-        else:
+        try:
+            if config_path.suffix in ['.yaml', '.yml']:
+                config = BenchmarkConfig.from_yaml(str(config_path))
+            elif config_path.suffix == '.json':
+                config = BenchmarkConfig.from_json(str(config_path))
+            else:
+                logger.error(f"Unsupported config file format: {config_path.suffix}")
+                sys.exit(1)
+        except Exception as e:
-            logger.error(f"Unsupported config file format: {config_path.suffix}")
+            logger.error(f"Failed to load config from {config_path}: {e}")
             sys.exit(1)
```
diffulex_profiler/metrics.py (2)

6-6: Unused import: time

The time module is imported but not used in this file. It appears to be used in profiler.py instead.

🔎 Proposed fix

```diff
-import time
```

79-80: Silent exception swallowing hinders debugging.

Bare except Exception: pass blocks silently discard all errors. While graceful degradation is appropriate here, consider logging at DEBUG level to aid troubleshooting when metrics unexpectedly return empty.

Also applies to: 95-96, 111-112
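A minimal sketch of that pattern (the wrapper function is illustrative, not part of this module):

```python
import logging

logger = logging.getLogger(__name__)


def collect_optional_metric(collect):
    """Call a metric collector, degrading to an empty dict but leaving a debug trace."""
    try:
        return collect()
    except Exception as exc:
        logger.debug("Optional metric collection failed: %s", exc)
        return {}
```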

diffulex_profiler/profiler.py (1)

188-196: Clarify record_throughput usage timing in docstring.

If called before stop(), self.current_metrics.duration will be 0.0, causing the throughput to not be recorded. This is handled correctly by the guard, but the docstring should clarify that this method is intended to be called after stop() or with an explicit duration argument.

🔎 Proposed docstring enhancement

```diff
     def record_throughput(self, tokens: int, duration: Optional[float] = None):
-        """Record throughput in tokens per second."""
+        """Record throughput in tokens per second.
+        
+        Args:
+            tokens: Number of tokens generated.
+            duration: Duration in seconds. If None, uses the duration from the
+                      current profiling section (requires stop() to have been called).
+        """
```
diffulex_bench/config.py (2)

47-50: from_dict will fail on unknown keys.

Using cls(**config_dict) directly will raise TypeError if the dictionary contains keys not defined in the dataclass. This makes config evolution fragile—adding a new field in a YAML file would break older code.

Consider filtering to known fields:

🔎 Proposed fix for EngineConfig

```diff
     @classmethod
     def from_dict(cls, config_dict: Dict[str, Any]) -> "EngineConfig":
         """Create engine configuration from dictionary"""
-        return cls(**config_dict)
+        valid_fields = {f.name for f in cls.__dataclass_fields__.values()}
+        filtered = {k: v for k, v in config_dict.items() if k in valid_fields}
+        return cls(**filtered)
```

Apply the same pattern to EvalConfig.from_dict.

Also applies to: 102-105


143-159: Flat config backward compatibility may silently misroute unknown keys.

Keys not in engine_fields are assumed to be eval fields. If a typo or unknown key exists, it will be passed to EvalConfig.from_dict and cause a failure there. Consider logging a warning for unrecognized keys.
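A sketch of how the flat-config path could surface stray keys instead of silently routing them (the `split_flat_config` helper and warning wording are illustrative):

```python
import logging
from dataclasses import fields

logger = logging.getLogger(__name__)


def split_flat_config(config_dict, engine_cls, eval_cls):
    """Route flat keys to engine/eval kwargs and warn on anything unrecognized."""
    engine_fields = {f.name for f in fields(engine_cls)}
    eval_fields = {f.name for f in fields(eval_cls)}
    engine_kwargs, eval_kwargs = {}, {}
    for key, value in config_dict.items():
        if key in engine_fields:
            engine_kwargs[key] = value
        elif key in eval_fields:
            eval_kwargs[key] = value
        else:
            logger.warning("Ignoring unrecognized config key: %r", key)
    return engine_kwargs, eval_kwargs
```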

diffulex_profiler/__init__.py (1)

25-40: __all__ includes potentially unavailable symbols.

If VizTracerBackend, PyTorchProfilerBackend, or CSVExporter aren't importable due to missing dependencies, listing them in __all__ could be misleading. Consider dynamically building __all__ based on what's actually available.
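One way to keep __all__ honest is to append optional names only when their import actually succeeds, mirroring what the backends package does. A sketch (module paths are assumed from this PR's package layout; the always-available base list is elided):

```python
# Append optional symbols to __all__ only if they can actually be imported.
__all__ = [
    # ... always-available symbols ...
]

_OPTIONAL = [
    ("diffulex_profiler.backends", "VizTracerBackend"),
    ("diffulex_profiler.backends", "PyTorchProfilerBackend"),
    ("diffulex_profiler.exporters", "CSVExporter"),
]

for _module, _name in _OPTIONAL:
    try:
        globals()[_name] = getattr(__import__(_module, fromlist=[_name]), _name)
        __all__.append(_name)
    except (ImportError, AttributeError):
        pass
```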

diffulex_bench/lm_eval_model.py (2)

5-5: Unused import and variable: logging module and eval_logger.

eval_logger is created but never used. The class uses self.logger (from get_logger) instead.

🔎 Proposed fix

```diff
-import logging
 import time
 import json
 ...
-eval_logger = logging.getLogger(__name__)
```

Also applies to: 21-21


67-68: Prefer explicit exceptions over assert for argument validation.

assert statements are stripped when Python runs with -O (optimized mode), which could allow invalid arguments to pass silently in production.

🔎 Proposed fix

```diff
-        assert isinstance(pretrained, str)
-        assert isinstance(batch_size, (int, str))
+        if not isinstance(pretrained, str):
+            raise TypeError(f"pretrained must be str, got {type(pretrained)}")
+        if not isinstance(batch_size, (int, str)):
+            raise TypeError(f"batch_size must be int or str, got {type(batch_size)}")
```
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 535e296 and 7e65c0b.

📒 Files selected for processing (61)
  • .editorconfig
  • .gitattributes
  • .github/dependabot.yml
  • .github/workflows/pr-perfbench-bot.yml
  • .github/workflows/pr-reminder-bot.yml
  • .gitignore
  • .pre-commit-config.yaml
  • .pymarkdown
  • CODE_OF_CONDUCT.md
  • CONTRIBUTING.md
  • Tilelang-failed_test_cases
  • diffulex/__init__.py
  • diffulex/attention/__init__.py
  • diffulex/config.py
  • diffulex/diffulex.py
  • diffulex/engine/dp_worker.py
  • diffulex/engine/model_runner.py
  • diffulex/engine/tp_worker.py
  • diffulex/logger.py
  • diffulex/model/config/dream/configuration_dream.py
  • diffulex/model/config/fast_dllm_v2/configuration_fast_dllm_v2.py
  • diffulex/model/config/sdar/configuration_sdar.py
  • diffulex/sampler/base.py
  • diffulex/strategy/fast_dllm_v2/engine/sequence.py
  • diffulex/utils/loader.py
  • diffulex_bench/README.md
  • diffulex_bench/__init__.py
  • diffulex_bench/arg_parser.py
  • diffulex_bench/config.py
  • diffulex_bench/configs/__init__.py
  • diffulex_bench/configs/dream_d2f_gsm8k.yml
  • diffulex_bench/configs/example.yml
  • diffulex_bench/datasets.py
  • diffulex_bench/lm_eval_model.py
  • diffulex_bench/logger.py
  • diffulex_bench/main.py
  • diffulex_bench/metrics.py
  • diffulex_bench/report.py
  • diffulex_bench/runner.py
  • diffulex_profiler/README.md
  • diffulex_profiler/__init__.py
  • diffulex_profiler/backends/__init__.py
  • diffulex_profiler/backends/base.py
  • diffulex_profiler/backends/pytorch.py
  • diffulex_profiler/backends/simple.py
  • diffulex_profiler/backends/viztracer.py
  • diffulex_profiler/example.py
  • diffulex_profiler/exporters/__init__.py
  • diffulex_profiler/exporters/base.py
  • diffulex_profiler/exporters/csv.py
  • diffulex_profiler/exporters/json.py
  • diffulex_profiler/exporters/summary.py
  • diffulex_profiler/metrics.py
  • diffulex_profiler/profiler.py
  • docs/make.bat
  • examples/test_dream_diffulex_gsm8k.py
  • examples/test_sdar_diffulex_gsm8k.py
  • format.sh
  • profile/d2f_dream_profile.py
  • pyproject.toml
  • script/d2f_dream_eval_gsm8k.sh
💤 Files with no reviewable changes (1)
  • Tilelang-failed_test_cases

Comment on lines +59 to +67
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
[leiwang1999@outlook.com](mailto:leiwang1999@outlook.com)
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

⚠️ Potential issue | 🟠 Major

Replace personal email with a project or organizational email for incident reporting.

Using a personal email address (leiwang1999@outlook.com) for Code of Conduct violation reports exposes the maintainer to privacy and security risks. Additionally, this email may become unavailable if the maintainer leaves the project.

Consider using:

  • A dedicated project email address (e.g., conduct@diffulex.dev or similar)
  • An organizational email alias or distribution list
  • A moderation form/ticketing system (if available)
  • A shared mailbox managed by multiple maintainers

This ensures continuity, shared responsibility, and protects individual privacy.

🤖 Prompt for AI Agents
In @CODE_OF_CONDUCT.md around lines 59 - 67, The Enforcement section currently
lists a personal email (leiwang1999@outlook.com); replace that personal address
with a non-personal project/organizational contact such as a dedicated project
email (e.g., conduct@diffulex.dev), an alias/distribution list, or a link to a
moderation/ticketing form, and update the line in the "## Enforcement" block to
use that shared address or form so incident reports are routed to a shared
mailbox or team rather than an individual's personal account.

Comment on lines +26 to +47
```python
    dataset = load_dataset("gsm8k", "main", split=split)

    if limit:
        dataset = dataset[:limit]

    results = []
    for item in dataset:
        question = item["question"]
        answer = item["answer"]

        if prompt_template:
            prompt = prompt_template(question)
        else:
            # Default template
            prompt = f"Question: {question}\nAnswer:"

        results.append({
            'prompt': prompt,
            'answer': answer,
            'question': question,
        })
```


⚠️ Potential issue | 🔴 Critical

Bug: Dataset slicing returns batch format, breaking iteration.

When limit is applied, dataset[:limit] returns a dictionary of lists (batch format), not a sliced Dataset. The subsequent for item in dataset will iterate over column names (strings like "question", "answer"), causing incorrect behavior.

Fix using proper Dataset slicing

```diff
 def load_gsm8k(
     split: str = "test",
     limit: Optional[int] = None,
     prompt_template: Optional[Callable[[str], str]] = None,
 ) -> List[Dict[str, Any]]:
     """..."""
     dataset = load_dataset("gsm8k", "main", split=split)
     
     if limit:
-        dataset = dataset[:limit]
+        dataset = dataset.select(range(min(limit, len(dataset))))
     
     results = []
     for item in dataset:
```
🤖 Prompt for AI Agents
In @diffulex_bench/datasets.py around lines 26 - 47, The slicing bug arises
because dataset[:limit] returns a batched dict instead of a Dataset; replace
that line with a proper Dataset slice using the huggingface Dataset.select API
(e.g., call dataset.select(range(limit))) so that the subsequent for item in
dataset iterates over dataset examples; update the code where the variable
dataset is reassigned (the block using dataset = dataset[:limit]) to use dataset
= dataset.select(range(limit)) and keep the existing logic for prompt_template
and results unchanged.

Comment on lines +65 to +88
```python
    dataset = load_dataset("openai/humaneval", split="test")

    if limit:
        dataset = dataset[:limit]

    results = []
    for item in dataset:
        prompt = item["prompt"]
        test = item["test"]
        entry_point = item["entry_point"]

        if prompt_template:
            full_prompt = prompt_template(prompt)
        else:
            full_prompt = prompt

        results.append({
            'prompt': full_prompt,
            'original_prompt': prompt,
            'test': test,
            'entry_point': entry_point,
            'task_id': item.get('task_id', ''),
        })
```


⚠️ Potential issue | 🔴 Critical

Same slicing bug in load_humaneval.

The dataset[:limit] slicing has the same issue as in load_gsm8k.

Fix using proper Dataset slicing

```diff
     dataset = load_dataset("openai/humaneval", split="test")
     
     if limit:
-        dataset = dataset[:limit]
+        dataset = dataset.select(range(min(limit, len(dataset))))
     
     results = []
     for item in dataset:
```
🤖 Prompt for AI Agents
In @diffulex_bench/datasets.py around lines 65 - 88, The current use of Python
slice syntax dataset[:limit] in load_humaneval does not properly slice a
HuggingFace Dataset; replace that with the Dataset.select-based approach (e.g.,
use dataset.select(range(limit)) or dataset.select(list(range(limit)))) so the
dataset is correctly truncated when limit is provided; update the branch that
checks if limit and references dataset[:limit] to call select instead.

Comment on lines +22 to +23
```python
    with open(results_file, 'r', encoding='utf-8') as f:
        results = json.load(f)
```

⚠️ Potential issue | 🟠 Major

Add error handling for file operations.

The file reading lacks error handling for common exceptions like FileNotFoundError and json.JSONDecodeError, which would cause the function to crash rather than fail gracefully.

🔎 Proposed fix

```diff
-    with open(results_file, 'r', encoding='utf-8') as f:
-        results = json.load(f)
+    try:
+        with open(results_file, 'r', encoding='utf-8') as f:
+            results = json.load(f)
+    except FileNotFoundError:
+        raise FileNotFoundError(f"Results file not found: {results_file}")
+    except json.JSONDecodeError as e:
+        raise ValueError(f"Invalid JSON in results file: {e}")
```
🤖 Prompt for AI Agents
In @diffulex_bench/report.py around lines 22 - 23, Wrap the open/results
json.load block that reads results_file in a try/except to handle
FileNotFoundError, PermissionError and json.JSONDecodeError: catch these
exceptions around the with open(...) and json.load(...) lines, log a clear error
via the module logger (or import logging if none exists), and return a sensible
fallback (e.g., an empty dict or None) or re-raise a wrapped exception with
context so the function fails gracefully; keep references to the existing
variable results_file and the json.load call so the change is localized.

Comment on lines +84 to +86
```python
    for result_file in result_files:
        with open(result_file, 'r', encoding='utf-8') as f:
            results = json.load(f)
```

⚠️ Potential issue | 🟠 Major

Add error handling and validation for file operations.

The loop lacks error handling for file operations and doesn't validate that result_files is non-empty. If one file is corrupted or missing, the entire comparison fails without providing partial results.

🔎 Proposed fix with error handling

```diff
+    if not result_files:
+        raise ValueError("result_files cannot be empty")
+    
     for result_file in result_files:
-        with open(result_file, 'r', encoding='utf-8') as f:
-            results = json.load(f)
+        try:
+            with open(result_file, 'r', encoding='utf-8') as f:
+                results = json.load(f)
+        except (FileNotFoundError, json.JSONDecodeError) as e:
+            logger.warning(f"Skipping {result_file}: {e}")
+            continue
```

Note: This requires importing the logger. Alternatively, you could raise the exception to fail fast.


Comment on lines +165 to +169
```python
            if self.current_metrics.gpu_metrics_start and self.current_metrics.gpu_metrics_end:
                self.current_metrics.gpu_utilization = (
                    self.current_metrics.gpu_metrics_end.get("utilization", 0) -
                    self.current_metrics.gpu_metrics_start.get("utilization", 0)
                )
```

⚠️ Potential issue | 🟠 Major

GPU utilization delta is semantically incorrect.

GPU utilization is a point-in-time percentage (e.g., "80% busy right now"), not a cumulative value. Subtracting start_utilization from end_utilization doesn't produce meaningful data—it could even be negative.

Consider either:

  1. Storing both start/end utilization for comparison, or
  2. Computing average utilization during the profiled section (requires sampling; sketched after this list), or
  3. Simply storing the end utilization as a representative snapshot.
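For option 2, a rough sketch of a sampling-based approach (illustrative only, not part of the proposed fix below; it assumes `torch.cuda.utilization()` is usable, which requires pynvml):

```python
import threading
import time

import torch


class GPUUtilizationSampler:
    """Sample torch.cuda.utilization() in a background thread and report the mean."""

    def __init__(self, interval: float = 0.1):
        self.interval = interval
        self.samples = []
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            self.samples.append(torch.cuda.utilization())
            time.sleep(self.interval)

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        return sum(self.samples) / len(self.samples) if self.samples else 0.0
```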
🔎 Proposed fix (store end utilization as snapshot)

```diff
         if self.config.collect_gpu_metrics and torch.cuda.is_available():
             self.current_metrics.gpu_metrics_end = collect_gpu_metrics()
-            if self.current_metrics.gpu_metrics_start and self.current_metrics.gpu_metrics_end:
-                self.current_metrics.gpu_utilization = (
-                    self.current_metrics.gpu_metrics_end.get("utilization", 0) -
-                    self.current_metrics.gpu_metrics_start.get("utilization", 0)
-                )
+            if self.current_metrics.gpu_metrics_end:
+                self.current_metrics.gpu_utilization = (
+                    self.current_metrics.gpu_metrics_end.get("utilization", 0)
+                )
```
🤖 Prompt for AI Agents
In @diffulex_profiler/profiler.py around lines 165 - 169, The code is
incorrectly computing a delta for GPU utilization by subtracting
self.current_metrics.gpu_metrics_start from gpu_metrics_end; change this to
store a representative snapshot instead: set
self.current_metrics.gpu_utilization to the end snapshot value (e.g.,
self.current_metrics.gpu_metrics_end.get("utilization", 0)) or alternatively
preserve both self.current_metrics.gpu_metrics_start and gpu_metrics_end on the
metrics object for later comparison; update any consumers of gpu_utilization
accordingly to expect a snapshot rather than a delta.

Comment on lines +44 to +47
```python
    def format(self, record):
        log_color = self.COLORS.get(record.levelname, '')
        record.levelname = f"{log_color}{record.levelname}{self.RESET}"
        return super().format(record)
```

⚠️ Potential issue | 🟠 Major

Mutating record.levelname affects all handlers sharing the LogRecord.

The format() method modifies record.levelname in place. If multiple handlers (e.g., console + file) process the same record, the file handler will receive the colorized levelname, corrupting log files with ANSI codes.

Create a copy or use a separate attribute

```diff
     def format(self, record):
+        # Create a copy to avoid mutating the original record
+        record = logging.makeLogRecord(record.__dict__)
         log_color = self.COLORS.get(record.levelname, '')
         record.levelname = f"{log_color}{record.levelname}{self.RESET}"
         return super().format(record)
```
🤖 Prompt for AI Agents
In @diffulex/logger.py around lines 44 - 47, The format() method currently
mutates record.levelname which pollutes other handlers; instead do not modify
record.levelname in place — create a colored string locally (e.g., using
self.COLORS and self.RESET) and either assign it to a new attribute on the
record such as levelname_colored or operate on a shallow copy of the LogRecord,
and update the formatter pattern to use that new attribute (or use the copy) so
other handlers receive the original record.levelname unchanged; make this change
in the format() method (referencing format, self.COLORS, self.RESET, and
record).

Comment on lines +57 to +72
```bash
get_merge_base() {
    UPSTREAM_REPO="https://github.com/tile-ai/tilelang"
    if git ls-remote --exit-code "${UPSTREAM_REPO}" main &>/dev/null; then
        # First try to use the upstream repository directly
        MERGE_BASE="$(git fetch "${UPSTREAM_REPO}" main &>/dev/null && git merge-base FETCH_HEAD HEAD)"
    elif git show-ref --verify --quiet refs/remotes/origin/main; then
        # Fall back to origin/main if available
        BASE_BRANCH="origin/main"
        MERGE_BASE="$(git merge-base "${BASE_BRANCH}" HEAD)"
    else
        # Last resort, use local main
        BASE_BRANCH="main"
        MERGE_BASE="$(git merge-base "${BASE_BRANCH}" HEAD)"
    fi
    echo "${MERGE_BASE}"
}
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check the actual remote origin and repository name
echo "=== Git Remotes ==="
git remote -v 2>/dev/null || echo "Not a git repository"

echo -e "\n=== Repository Name (from .git/config) ==="
git config --get remote.origin.url 2>/dev/null || echo "No origin URL configured"

echo -e "\n=== Check for any config files referencing upstream ==="
find . -maxdepth 2 -type f \( -name "*.cfg" -o -name "*.conf" -o -name "*.toml" -o -name "*.yaml" -o -name "*.yml" -o -name ".git" \) 2>/dev/null | head -20

echo -e "\n=== Directory listing to confirm repository name ==="
basename "$(pwd)"

echo -e "\n=== Check for environment variables or documentation ==="
find . -maxdepth 2 -type f \( -name "README*" -o -name "CONTRIBUTING*" -o -name ".env*" \) -exec grep -l "upstream\|tilelang\|diffulex" {} \; 2>/dev/null
```

Repository: zhijie-group/Diffulex

Length of output: 186


Fix incorrect upstream repository URL in get_merge_base() function.

The hardcoded URL references https://github.com/tile-ai/tilelang, but this repository is zhijie-group/Diffulex. This will cause the merge base detection to fail or use an unintended repository. Update the URL to point to the correct upstream repository.
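Assuming the intended upstream is this repository, the change is a one-liner (indentation shown as in the function above):

```diff
-    UPSTREAM_REPO="https://github.com/tile-ai/tilelang"
+    UPSTREAM_REPO="https://github.com/zhijie-group/Diffulex"
```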

🤖 Prompt for AI Agents
In @format.sh around lines 57 - 72, The get_merge_base() function uses a
hardcoded UPSTREAM_REPO URL that is incorrect; update the UPSTREAM_REPO variable
to the correct upstream repository (e.g.,
"https://github.com/zhijie-group/Diffulex") so merge-base detection queries the
intended repo, then ensure the rest of the logic still fetches main and computes
MERGE_BASE as before.

Comment on lines +117 to +119
```bash
    if [[ ! -x "$(command -v clang-tidy)" ]]; then
        python3 -m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt" --user
    fi
```

⚠️ Potential issue | 🔴 Critical

Bug: Invalid pip flag --requirements.

The correct flag is -r or --requirement (singular). This will cause the script to fail when clang-tidy is not installed.

Fix the pip flag

```diff
     if [[ ! -x "$(command -v clang-tidy)" ]]; then
-        python3 -m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt" --user
+        python3 -m pip install --upgrade -r "${ROOT}/requirements-lint.txt" --user
     fi
```
🤖 Prompt for AI Agents
In @format.sh around lines 117 - 119, The pip install call using the invalid
flag `--requirements` must be changed to the correct flag `-r` (or
`--requirement`) so the conditional that installs lint deps when `clang-tidy` is
missing succeeds; update the command in the block that checks `clang-tidy` (the
`python3 -m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt"
--user` line) to use `-r "${ROOT}/requirements-lint.txt"` (or `--requirement`)
instead, keeping the rest of the options (`--upgrade` and `--user`) unchanged.

Comment on lines +35 to +39

```toml
    "pandas>=2.3.3",
    "tilelang>=0.1.7.post1",
    "rich>=13.0.0",
    "colorama>=0.4.6",
    "lm-eval"
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
# First, find and read the pyproject.toml file
find . -name "pyproject.toml" -type f
```

Repository: zhijie-group/Diffulex

Length of output: 186


🌐 Web query:

lm-eval package PyPI versions history

💡 Result:

lm-eval (PyPI) release history (versions listed on PyPI, earliest → latest):

  • 0.0.1 — Sep 2, 2021. [1]
  • 0.2.0 — Mar 7, 2022. [1][2]
  • 0.3.0 — Dec 8, 2022. [1][0]
  • 0.4.0 — Dec 22, 2023. [1]
  • 0.4.1 — Jan 31, 2024. [1]
  • 0.4.2 — Mar 18, 2024. [1]
  • 0.4.3 — Jul 1, 2024. [1]
  • 0.4.4 — Sep 5, 2024. [1]
  • 0.4.5 — Oct 8, 2024. [1]
  • 0.4.7 — Dec 17, 2024. [1]
  • 0.4.8 — Mar 5, 2025. [1]
  • 0.4.9 — Jun 19, 2025. [1]
  • 0.4.9.1 — Aug 4, 2025. [1]
  • 0.4.9.2 — Nov 26, 2025. [1]

Sources: PyPI project and release pages for lm-eval. [1] https://pypi.org/project/lm-eval/ (release list) · [2] https://pypi.org/project/lm-eval/0.2.0/ (example version page).


🌐 Web query:

lm-eval 0.4.0 PyPI

💡 Result:

lm-eval 0.4.0 on PyPI was released on December 22, 2023. Key notes on that release (from the PyPI project page) include new Open LLM Leaderboard tasks, internal refactoring, config-based task creation, Jinja2 prompt support, vLLM/MPS/HF speedups, and new tasks (e.g., CoT BIG-Bench-Hard). [1]

Install with:
pip install lm-eval==0.4.0

Sources:
[1] PyPI project page for lm-eval (release history and v0.4.0 notes).


Pin the lm-eval dependency to a specific version.

The lm-eval dependency lacks a version constraint, which can cause reproducibility issues and unpredictable behavior across different environments. All other dependencies in this dependency group specify minimum versions; lm-eval should do the same.

Recommended fix

```diff
-    "lm-eval"
+    "lm-eval>=0.4.0"
```
🤖 Prompt for AI Agents
In @pyproject.toml around lines 35 - 39, Update the pyproject.toml dependency
entry for "lm-eval" to include a version constraint so builds are reproducible;
edit the dependency list where "lm-eval" appears and change it to a pinned or
minimum version (for example use a minimum like lm-eval>=X.Y.Z or an exact pin
lm-eval==X.Y.Z) consistent with how other packages are specified.

@drewjin drewjin merged commit 5b8352f into SJTU-DENG-Lab:v0.0.1 Jan 6, 2026
1 check passed