[Feat] Enhance Decoding Strategies for Easier Development and More Efficient Inference #18
Conversation
📝 Walkthrough

This pull request introduces comprehensive infrastructure improvements, including repository configuration, a logging framework, developer documentation, and two new subsystems: a benchmarking framework (diffulex_bench) for LLM evaluation integration and a profiling framework (diffulex_profiler) with multiple backends for performance instrumentation. Logging is refactored across the codebase from print statements and transformers logging to a unified custom logger.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client
participant BenchmarkRunner
participant DiffulexEngine
participant SamplingParams
participant Metrics
participant Exporter
Client->>BenchmarkRunner: initialize(model_path, tokenizer_path)
BenchmarkRunner->>DiffulexEngine: setup engine with kwargs
Client->>BenchmarkRunner: evaluate_batch(prompts, sampling_params)
BenchmarkRunner->>SamplingParams: configure generation parameters
BenchmarkRunner->>DiffulexEngine: generate(prompts, sampling_params)
DiffulexEngine-->>BenchmarkRunner: outputs with timing/tokens
BenchmarkRunner->>Metrics: collect stats (tokens, time, throughput)
Metrics-->>BenchmarkRunner: aggregated results
Client->>BenchmarkRunner: export results
BenchmarkRunner->>Exporter: export(metrics, output_path)
Exporter-->>BenchmarkRunner: JSON/CSV/text file written
rect rgb(200, 240, 255)
Note over Metrics,Exporter: Metrics collected per-sample<br/>Aggregation & export optional
    end
```

```mermaid
sequenceDiagram
participant User
participant DiffulexProfiler
participant Backend
participant Metrics
participant Exporter
User->>DiffulexProfiler: initialize(backend_type, config)
DiffulexProfiler->>Backend: _init_backend()
Backend-->>DiffulexProfiler: ready
User->>DiffulexProfiler: profile(section_name, metadata)
DiffulexProfiler->>DiffulexProfiler: start(name)
DiffulexProfiler->>Metrics: capture GPU/memory baseline
DiffulexProfiler->>Backend: start profiling
rect rgb(240, 255, 240)
Note over DiffulexProfiler,Backend: User code executes
end
User->>DiffulexProfiler: stop()
DiffulexProfiler->>Backend: stop profiling
Backend-->>DiffulexProfiler: timing/trace data
DiffulexProfiler->>Metrics: capture GPU/memory end state
DiffulexProfiler->>DiffulexProfiler: compute duration & deltas
User->>DiffulexProfiler: export(output_path)
DiffulexProfiler->>Exporter: export(metrics, path)
Exporter-->>DiffulexProfiler: JSON/CSV/text written
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches

⚠️ Warning: the review ran into problems. Git: failed to clone repository.
@coderabbitai help

Chat

There are 3 ways to chat with CodeRabbit:

- CodeRabbit commands
- Other keywords and placeholders
- CodeRabbit configuration file
@coderabbitai resume

✅ Actions performed: Reviews resumed.

@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 15
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/test_sdar_diffulex_gsm8k.py (1)
46-54: Fix model name mismatch and ensure output directory exists.

Two issues here:

- Model name mismatch: the output filename references "dvllm_dream_7B", but the actual model is "SDAR-1.7B-Chat-b32". This inconsistency will make profiling results confusing to identify.
- Missing directory check: the script assumes `log/profiles/` exists. If the directory doesn't exist, both `os.path.exists()` and `VizTracer` will fail with a `FileNotFoundError`.

🔎 Proposed fix
```diff
 if PROFILE:
-    output_file = "log/profiles/perf_dvllm_dream_7B.json"
+    output_file = "log/profiles/perf_sdar_1.7B.json"
+    os.makedirs(os.path.dirname(output_file), exist_ok=True)
     if os.path.exists(output_file):
         os.remove(output_file)
```
🤖 Fix all issues with AI Agents
In @CODE_OF_CONDUCT.md:
- Around line 59-67: The Enforcement section currently lists a personal email
(leiwang1999@outlook.com); replace that personal address with a non-personal
project/organizational contact such as a dedicated project email (e.g.,
conduct@diffulex.dev), an alias/distribution list, or a link to a
moderation/ticketing form, and update the line in the "## Enforcement" block to
use that shared address or form so incident reports are routed to a shared
mailbox or team rather than an individual's personal account.
In @diffulex_bench/datasets.py:
- Around line 26-47: The slicing bug arises because dataset[:limit] returns a
batched dict instead of a Dataset; replace that line with a proper Dataset slice
using the huggingface Dataset.select API (e.g., call
dataset.select(range(limit))) so that the subsequent for item in dataset
iterates over dataset examples; update the code where the variable dataset is
reassigned (the block using dataset = dataset[:limit]) to use dataset =
dataset.select(range(limit)) and keep the existing logic for prompt_template and
results unchanged.
- Around line 65-88: The current use of Python slice syntax dataset[:limit] in
load_humaneval does not properly slice a HuggingFace Dataset; replace that with
the Dataset.select-based approach (e.g., use dataset.select(range(limit)) or
dataset.select(list(range(limit)))) so the dataset is correctly truncated when
limit is provided; update the branch that checks if limit and references
dataset[:limit] to call select instead.
In @diffulex_bench/lm_eval_model.py:
- Around line 299-310: The current loglikelihood method (loglikelihood) returns
placeholder (0.0, False) values which can silently corrupt metrics; change it to
fail fast by raising NotImplementedError (consistent with loglikelihood_rolling)
instead of returning defaults, or at minimum replace the warning with a
process-level error log before raising; update the method to raise
NotImplementedError("loglikelihood not implemented for diffusion models") so
callers cannot silently use incorrect scores.
In @diffulex_bench/main.py:
- Around line 61-63: The args serialization builds args_list from args_dict
without escaping, so values containing ',', '=', or ':' break the
lm-evaluation-harness parser; update the logic that constructs args_list (the
list comprehension creating f"{k}={v}" using args_dict) to detect these
characters, escape any backslashes and quotes within the value, and wrap the
value in double quotes when needed (e.g., transform value -> value.replace("\\",
"\\\\").replace('"', '\\"') and then f'{k}="{escaped}"' if special chars
present), ensuring the return ','.join(args_list) yields a safe key=value
string.
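A possible shape for that escaping logic, as a hedged sketch: the helper name is hypothetical, and the exact quoting rules accepted by lm-evaluation-harness should be verified before adopting it.

```python
def format_model_args(args_dict: dict) -> str:
    """Join key=value pairs, quoting values that contain ',', '=' or ':'."""
    parts = []
    for k, v in args_dict.items():
        value = str(v)
        if any(ch in value for ch in (",", "=", ":")):
            # Escape backslashes and quotes, then wrap the value in double quotes.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{k}="{escaped}"')
        else:
            parts.append(f"{k}={value}")
    return ",".join(parts)
```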
In @diffulex_bench/metrics.py:
- Around line 51-52: Add a validation before computing metrics that checks for
empty input lists: after verifying len(predictions) == len(ground_truths) also
check if len(predictions) == 0 (or simply if not predictions) and raise a
ValueError like "No data to evaluate: predictions and ground_truths are empty"
so you never reach the subsequent division (which currently can produce an
invalid/ambiguous result when both lists are empty).
In @diffulex_bench/report.py:
- Around line 22-23: Wrap the open/results json.load block that reads
results_file in a try/except to handle FileNotFoundError, PermissionError and
json.JSONDecodeError: catch these exceptions around the with open(...) and
json.load(...) lines, log a clear error via the module logger (or import logging
if none exists), and return a sensible fallback (e.g., an empty dict or None) or
re-raise a wrapped exception with context so the function fails gracefully; keep
references to the existing variable results_file and the json.load call so the
change is localized.
In @diffulex_bench/runner.py:
- Around line 149-157: The loop is assigning a misleading per-output
"generation_time" computed as total batch time / len(outputs); change this to a
clear average metric by renaming the field and computing it once: after calling
self.llm.generate(prompts, sampling_params, use_tqdm=use_tqdm) compute
total_time = end_time - start_time and avg = total_time / len(outputs) if
outputs else 0, then set output['avg_generation_time'] (or
'batch_time_per_sample') for each output, or alternatively remove per-output
timing and add a single batch-level field (e.g., batch_generation_time or
avg_generation_time variable returned/logged) if per-sample timings are not
available; update any consumers/readers of output['generation_time'] to the new
name.
- Around line 49-52: The code currently calls
AutoTokenizer.from_pretrained(self.tokenizer_path, trust_remote_code=True) which
enables arbitrary code execution; change this by adding a configurable flag
(e.g., trust_remote_code) to the Runner/class initializer or config with a safe
default of False, use that flag when calling
AutoTokenizer.from_pretrained(self.tokenizer_path,
trust_remote_code=trust_remote_code), and update any relevant docs/CLI defaults
to explain the security implications so consumers can opt-in when they trust the
model source.
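A combined sketch of both runner.py suggestions above (an opt-in trust_remote_code flag and an explicitly named batch-average timing). The engine wiring is elided and the attribute names are assumptions, not the PR's actual implementation:

```python
import time

from transformers import AutoTokenizer


class BenchmarkRunnerSketch:
    """Illustrative subset of the benchmark runner discussed above."""

    def __init__(self, model_path: str, tokenizer_path: str, trust_remote_code: bool = False):
        # Opt-in flag: remote code execution stays disabled unless the caller
        # explicitly trusts the model source.
        self.tokenizer = AutoTokenizer.from_pretrained(
            tokenizer_path, trust_remote_code=trust_remote_code
        )
        self.llm = None  # engine construction elided; assumed to expose generate()

    def evaluate_batch(self, prompts, sampling_params, use_tqdm: bool = False):
        start_time = time.time()
        outputs = self.llm.generate(prompts, sampling_params, use_tqdm=use_tqdm)
        total_time = time.time() - start_time
        # One clearly named per-batch average instead of a misleading per-output timing.
        avg = total_time / len(outputs) if outputs else 0.0
        for output in outputs:
            output["avg_generation_time"] = avg
        return outputs
```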
In @diffulex_profiler/backends/pytorch.py:
- Around line 59-92: In stop(), defend against self.current_name being None
before building the trace filename: check if self.current_name is None and
either return early (e.g., None or a minimal result) or set a safe fallback name
(e.g., "unknown" or a timestamp) and use that when constructing trace_file;
update any places that add "name": self.current_name in the result to use the
fallback/checked value so you never create a file named pytorch_trace_None.json
and avoid downstream errors.
In @diffulex_profiler/example.py:
- Around line 68-71: The Diffulex instantiation inside the profiler context uses
an illegal ellipsis token as an argument (llm = Diffulex(model_path,
model_name="dream", ...)); replace the trailing "..." with valid placeholder
arguments or remove it and add a comment. Update the call to use explicit
defaults or named placeholders (e.g., provide required kwargs or use
model_config=None or # TODO: add args) so the Diffulex(...) call is
syntactically valid inside the with profiler.profile("model_loading") block.
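One hedged way the call could be made syntactically valid, keeping only the arguments that already appear in the example; whether further kwargs are required depends on the real Diffulex constructor.

```python
from diffulex import Diffulex  # import path as used elsewhere in this PR

model_path = "/path/to/model"  # hypothetical placeholder

# Only arguments present in the original example are kept; the stray "..."
# token is dropped so the call parses. Further engine kwargs are a TODO.
llm = Diffulex(model_path, model_name="dream")
```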
In @diffulex_profiler/exporters/csv.py:
- Around line 42-45: The loop assumes m.metadata is not None and will raise
AttributeError if it is; change the code around writer.writerow(row) to guard or
default metadata (e.g., use if m.metadata: for k, v in m.metadata.items():
row[f"metadata_{k}"] = v or iterate over (m.metadata or {}).items()) so that
m.metadata being None is handled before accessing .items(), ensuring
row.update(m.custom_metrics) and writer.writerow(row) still run safely.
In @diffulex_profiler/exporters/summary.py:
- Around line 57-59: The code shadows the module-level or earlier variable
output_file by assigning a new output_file inside the VizTracer branch, causing
the final summary to be written to the VizTracer path; rename the inner variable
(e.g., viztracer_output or viz_output_file) in the block that reads
m.backend_data.get("output_file", ...) and update the summary_lines append to
use that new name so the outer output_file (the intended .txt summary path)
remains unchanged when writing the summary.
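A small sketch of the rename, extracted into a self-contained helper for clarity; the real exporter code differs, and the summary_lines/backend_data shapes are assumed from the surrounding file.

```python
def append_viztracer_line(summary_lines: list[str], backend_data: dict) -> None:
    """Append the VizTracer trace path without shadowing the summary's output_file."""
    # Distinct name for the trace path; the outer output_file (the .txt summary
    # target) is never reassigned here.
    viztracer_output = backend_data.get("output_file", "unknown")
    summary_lines.append(f"  VizTracer trace: {viztracer_output}")
```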
In @diffulex_profiler/metrics.py:
- Around line 69-77: The NVML initialization in the try block
(pynvml.nvmlInit()) is never paired with pynvml.nvmlShutdown(), risking resource
leaks; update the collect_gpu_metrics (or the surrounding function using
metrics, pynvml.nvmlInit(), nvmlDeviceGetHandleByIndex,
nvmlDeviceGetUtilizationRates) to ensure pynvml.nvmlShutdown() is always
called—either put shutdown in a finally block after the try/except that sets
metrics, or refactor to initialize NVML once at module level and only shutdown
on process exit; ensure the exception handling still suppresses ImportError
gracefully but does not skip the shutdown call when pynvml was successfully
initialized.
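A minimal sketch of the init/shutdown pairing this prompt asks for, with the shutdown in a finally block; the real collect_gpu_metrics in this PR likely gathers more fields.

```python
def collect_gpu_metrics(device_index: int = 0) -> dict:
    """Query GPU utilization via NVML, always pairing nvmlInit with nvmlShutdown."""
    metrics: dict = {}
    try:
        import pynvml
    except ImportError:
        return metrics  # pynvml not installed: degrade gracefully
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        metrics["utilization"] = util.gpu
    except pynvml.NVMLError:
        pass  # leave metrics empty on query failure
    finally:
        pynvml.nvmlShutdown()  # shutdown runs even if a query above raised
    return metrics
```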
In @diffulex_profiler/profiler.py:
- Around line 165-169: The code is incorrectly computing a delta for GPU
utilization by subtracting self.current_metrics.gpu_metrics_start from
gpu_metrics_end; change this to store a representative snapshot instead: set
self.current_metrics.gpu_utilization to the end snapshot value (e.g.,
self.current_metrics.gpu_metrics_end.get("utilization", 0)) or alternatively
preserve both self.current_metrics.gpu_metrics_start and gpu_metrics_end on the
metrics object for later comparison; update any consumers of gpu_utilization
accordingly to expect a snapshot rather than a delta.
In @diffulex/logger.py:
- Around line 44-47: The format() method currently mutates record.levelname
which pollutes other handlers; instead do not modify record.levelname in place —
create a colored string locally (e.g., using self.COLORS and self.RESET) and
either assign it to a new attribute on the record such as levelname_colored or
operate on a shallow copy of the LogRecord, and update the formatter pattern to
use that new attribute (or use the copy) so other handlers receive the original
record.levelname unchanged; make this change in the format() method (referencing
format, self.COLORS, self.RESET, and record).
- Around line 155-171: The success() implementation must not unconditionally
insert Rich markup because that literal markup can end up in other handlers
(e.g., FileHandler); update _add_success_method so success() emits plain text (a
simple "✓ {message}") instead of "[green]✓[/green] {message}" when
RICH_AVAILABLE is true, and keep the existing COLORAMA logic for terminals;
modify the success function defined inside _add_success_method (and the
assignment logging.Logger.success) to always call self.info with a plain-text
checkmark and message so RichHandler can apply its own formatting while file
handlers receive clean text.
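A combined sketch covering both logger.py prompts (non-mutating level colorization plus a plain-text success()); a formatter pattern using it would reference %(levelname_colored)s. This is illustrative, not the module's actual code.

```python
import logging


class ColoredFormatter(logging.Formatter):
    COLORS = {"INFO": "\033[32m", "WARNING": "\033[33m", "ERROR": "\033[31m"}
    RESET = "\033[0m"

    def format(self, record: logging.LogRecord) -> str:
        # Attach the colored variant as a new attribute instead of mutating
        # record.levelname, so other handlers still see the plain level name.
        color = self.COLORS.get(record.levelname, "")
        record.levelname_colored = (
            f"{color}{record.levelname}{self.RESET}" if color else record.levelname
        )
        return super().format(record)


def _add_success_method() -> None:
    def success(self, message: str, *args, **kwargs):
        # Plain checkmark: RichHandler can still colorize, while file handlers
        # receive clean text with no literal [green]...[/green] markup.
        self.info(f"✓ {message}", *args, **kwargs)

    logging.Logger.success = success
```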
In @format.sh:
- Around line 117-119: The pip install call using the invalid flag
`--requirements` must be changed to the correct flag `-r` (or `--requirement`)
so the conditional that installs lint deps when `clang-tidy` is missing
succeeds; update the command in the block that checks `clang-tidy` (the `python3
-m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt" --user`
line) to use `-r "${ROOT}/requirements-lint.txt"` (or `--requirement`) instead,
keeping the rest of the options (`--upgrade` and `--user`) unchanged.
- Around line 57-72: The get_merge_base() function uses a hardcoded
UPSTREAM_REPO URL that is incorrect; update the UPSTREAM_REPO variable to the
correct upstream repository (e.g., "https://github.com/zhijie-group/Diffulex")
so merge-base detection queries the intended repo, then ensure the rest of the
logic still fetches main and computes MERGE_BASE as before.
In @pyproject.toml:
- Around line 35-39: Update the pyproject.toml dependency entry for "lm-eval" to
include a version constraint so builds are reproducible; edit the dependency
list where "lm-eval" appears and change it to a pinned or minimum version (for
example use a minimum like lm-eval>=X.Y.Z or an exact pin lm-eval==X.Y.Z)
consistent with how other packages are specified.
🟡 Minor comments (6)
diffulex/logger.py-155-171 (1)
155-171: `success()` method emits Rich markup unconditionally when Rich is available.

If a logger has both a RichHandler and a FileHandler, the `[green]✓[/green]` markup will be written literally to the log file, creating malformed entries.

Consider using plain text for the success method:

```diff
 def _add_success_method():
     """Add success method to logging.Logger class"""
-    if RICH_AVAILABLE:
-        def success(self, message: str, *args, **kwargs):
-            """Log success message with rich formatting"""
-            self.info(f"[green]✓[/green] {message}", *args, **kwargs)
-    else:
-        def success(self, message: str, *args, **kwargs):
-            """Log success message"""
-            if COLORAMA_AVAILABLE:
-                self.info(f"{Fore.GREEN}✓{Style.RESET_ALL} {message}", *args, **kwargs)
-            else:
-                self.info(f"✓ {message}", *args, **kwargs)
+    def success(self, message: str, *args, **kwargs):
+        """Log success message"""
+        # Use plain checkmark to avoid markup in file handlers
+        self.info(f"✓ {message}", *args, **kwargs)
```

diffulex_profiler/exporters/csv.py-42-45 (1)
42-45: Potential `AttributeError` if `metadata` is `None`.

Line 26 guards with `if m.metadata:` before accessing `.keys()`, but line 43 iterates `m.metadata.items()` without a similar check. If `metadata` is `None`, this will raise an `AttributeError`.

🔎 Proposed fix

```diff
     row.update(m.custom_metrics)
-    for k, v in m.metadata.items():
-        row[f"metadata_{k}"] = v
+    if m.metadata:
+        for k, v in m.metadata.items():
+            row[f"metadata_{k}"] = v
     writer.writerow(row)
```

diffulex_bench/metrics.py-51-52 (1)
51-52: Add validation for empty input lists.

The length check at lines 51-52 doesn't prevent both lists from being empty. If both `predictions` and `ground_truths` are empty lists, the length check passes, but line 63 would perform `0 / 0`, which returns `0.0` (due to the conditional) and may not accurately represent "no data to evaluate."

🔎 Proposed fix

```diff
     if len(predictions) != len(ground_truths):
         raise ValueError("Predictions and ground_truths must have the same length")
+
+    if len(predictions) == 0:
+        raise ValueError("Cannot compute accuracy for empty lists")
```

diffulex_profiler/backends/pytorch.py-59-92 (1)
59-92: Add defensive check for `current_name` being `None`.

Line 66 uses `self.current_name` in the filename without verifying it's not `None`. If `stop()` is called without calling `start()` first (or if called multiple times), `current_name` could be `None`, resulting in a filename like `pytorch_trace_None.json`.

🔎 Proposed fix

```diff
         self.profiler.__exit__(None, None, None)
-        trace_file = self.output_dir / f"pytorch_trace_{self.current_name}.json"
+        name = self.current_name if self.current_name else "unknown"
+        trace_file = self.output_dir / f"pytorch_trace_{name}.json"
```

diffulex_bench/main.py-61-63 (1)
61-63: Quote or escape values containing commas, equals signs, or colons in the model_args string.

Line 62 formats values directly into a comma-separated string without quoting or escaping. The lm-evaluation-harness parser expects `key1=val1,key2=val2` format, but values containing commas, equals signs, or colons must be quoted (e.g., `key="value,with,comma"`) or escaped. Without this, malformed values will break argument parsing or cause filename errors during evaluation.

Add quoting for values containing special characters:

```python
args_list = [f'{k}={v if "," not in str(v) and "=" not in str(v) else f\'"{v}\"\'}' for k, v in args_dict.items()]
```

Or validate/escape at the point where config values are generated.
diffulex_bench/lm_eval_model.py-299-310 (1)
299-310: Placeholder `loglikelihood` returns may corrupt evaluation metrics.

Returning `(0.0, False)` for all requests will produce incorrect scores for any task that uses log-likelihood evaluation. The warning is good, but consider:

- Raising `NotImplementedError` to fail fast (like `loglikelihood_rolling`), or
- Logging at ERROR level to ensure visibility.
If certain tasks explicitly require this method, the current behavior will silently produce wrong results.
🧹 Nitpick comments (51)
examples/test_sdar_diffulex_gsm8k.py (1)
19-19: Consider making the profiling flag configurable.

Enabling profiling by default in an example script may surprise users with unexpected log files and performance overhead. Consider using an environment variable or command-line argument to control this flag.
🔎 Example using environment variable
```diff
-PROFILE = True
+PROFILE = os.getenv("PROFILE", "false").lower() == "true"
```

diffulex/attention/__init__.py (2)
20-24: Preserve the exception chain using `from e`.

The exception handling fix is correct and allows the error message to include exception details. However, consider preserving the exception chain by using `raise ... from e`, which makes debugging easier by showing the full traceback.

🔎 Proposed improvement to preserve exception chain

```diff
     try:
         from .attn_impl import Attention
         return Attention
     except Exception as e:
-        raise ImportError(f"Failed to import diffulex.attention.attn_impl.Attention: {e}")
+        raise ImportError(f"Failed to import diffulex.attention.attn_impl.Attention: {e}") from e
```
25-26: Consider removing unreachable code.

Since `fetch_attn_metadata` is defined at module level (line 14), normal attribute lookup will always find it before `__getattr__` is called. These lines are unreachable and can be safely removed for clarity.

🔎 Proposed cleanup

```diff
     try:
         from .attn_impl import Attention
         return Attention
     except Exception as e:
         raise ImportError(f"Failed to import diffulex.attention.attn_impl.Attention: {e}")
-    if name == "fetch_attn_metadata":
-        return metadata.fetch_attn_metadata
     raise AttributeError(f"module {__name__} has no attribute {name}")
```

script/d2f_dream_eval_gsm8k.sh (4)
1-1: Consider using bash for better portability.

The script uses `zsh` but doesn't appear to require zsh-specific features. Using `bash` would improve portability across systems where zsh may not be installed.

🔎 Recommended portability improvement

```diff
-#!/usr/bin/zsh
+#!/usr/bin/env bash
```
9-12: Improve path handling and document GPU requirements.The cache path uses
$(pwd), which assumes the script is run from the repository root. The hardcoded 8 GPU configuration may not be available on all systems.🔎 Recommended improvements
+# Ensure script runs from repository root +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" +cd "$REPO_ROOT" || exit 1 + -export HF_HOME="$(pwd)/cache" +export HF_HOME="$REPO_ROOT/cache" export HF_DATASETS_CACHE="$HF_HOME/datasets" export HF_METRICS_CACHE="$HF_HOME/metrics" + +# Configure CUDA devices (adjust based on available GPUs) export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7Additionally, consider documenting the 8-GPU requirement in a comment or checking GPU availability.
1-16: Add error handling for more robust script execution.The script lacks error-handling directives, which means failures may go unnoticed. Adding
setdirectives and prerequisite checks would improve reliability.🔎 Recommended error handling additions
#!/usr/bin/env bash +set -euo pipefail + +# Validate prerequisites +if ! command -v python &> /dev/null; then + echo "Error: python not found in PATH" >&2 + exit 1 +fi + +CONFIG_FILE="custom_configs/d2f_dream_eval_gsm8k.yml" +if [[ ! -f "$CONFIG_FILE" ]]; then + echo "Error: Config file not found: $CONFIG_FILE" >&2 + exit 1 +fi + +# Create log directory if it doesn't exist +mkdir -p log export HF_HUB_OFFLINE=1
14-16: Consider using explicit python3 or version validation.The script uses
pythonwhich may resolve to Python 2 on some systems, whilepyproject.tomlrequires Python >=3.12. Consider usingpython3explicitly or validating the Python version.🔎 Optional improvements
+# Ensure correct Python version +PYTHON_CMD="python3" +PYTHON_VERSION=$($PYTHON_CMD -c 'import sys; print(".".join(map(str, sys.version_info[:2])))') +REQUIRED_VERSION="3.12" +if ! python3 -c "import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)" 2>/dev/null; then + echo "Error: Python >= $REQUIRED_VERSION required, found $PYTHON_VERSION" >&2 + exit 1 +fi + -python -m diffulex_bench.main \ +$PYTHON_CMD -m diffulex_bench.main \ --config custom_configs/d2f_dream_eval_gsm8k.yml \ 2>&1 | tee log/d2f_dream_eval_gsm8k.logdiffulex_bench/README.md (1)
202-209: Output format section could be more detailed.The output section is brief. Consider expanding with actual JSON/file examples to clarify the exact structure of evaluation results and metrics that users should expect to find in the output directory.
CONTRIBUTING.md (3)
26-28: Expand the "Ask Questions" section with guidance on format and resources.This section is minimal. Consider adding guidance on:
- Preferred format for questions (discussion vs. issues)
- Where to ask questions (GitHub Discussions, Discord, etc. if applicable)
- Expected response time
- Links to documentation or FAQ
This will set clearer expectations for new contributors.
104-110: Add target dates or checkpoints for completing TBA sections.The "Build Wheels" and "Documentation" sections are marked as TBA. Consider adding:
- A target date or milestone for completion
- A note on whether these are blocking vs. non-blocking for the v0.0.1 release
- Links to related issues or PRs tracking these sections
This helps contributors understand project status and whether these gaps affect their workflow.
36-37: Add or reference a pull request template to reinforce contribution expectations.The note about including tests and docs is good, but consider creating a
.github/pull_request_template.mdfile to automatically remind contributors of these requirements when they open a PR.profile/d2f_dream_profile.py (4)
15-16: Hardcoded absolute paths reduce portability.These paths won't work outside of the specific development environment. Consider using environment variables, command-line arguments, or configuration files.
Suggested approach using environment variables
+import os + def main(): - model_path = "/data1/ckpts/Dream-org/Dream-v0-Base-7B" - lora_path = "/data1/ckpts/SJTU-Deng-Lab/D2F_Dream_Base_7B_Lora" + model_path = os.environ.get("DREAM_MODEL_PATH", "/data1/ckpts/Dream-org/Dream-v0-Base-7B") + lora_path = os.environ.get("D2F_LORA_PATH", "/data1/ckpts/SJTU-Deng-Lab/D2F_Dream_Base_7B_Lora")
18-19: Unused variable:output_diris created but never used.The directory is created at Line 19 but no output is written to it. Either remove this dead code or add functionality to write profiling results to files.
21-22: Use the new logger infrastructure instead ofprint().This PR introduces a unified logging system (
diffulex.logger), but this script usesprint()statements throughout. For consistency and better observability, migrate to the logger.Suggested migration to logger
from diffulex import Diffulex, SamplingParams +from diffulex.logger import get_logger from transformers import AutoTokenizer +logger = get_logger(__name__) + def main(): # ... - print("Loading model...") + logger.info("Loading model...") # ... - print(f"Model loaded in {model_load_time:.2f} seconds") + logger.info(f"Model loaded in {model_load_time:.2f} seconds")Also applies to: 41-41, 52-52, 63-74, 76-83
58-61: Guard against empty outputs more robustly.Line 60's check
if outputs else 0prevents division by zero, butnum_outputson Line 59 could still be 0 ifoutputsis an empty list. The subsequent loop (Lines 77-83) would simply not execute, but the printed metrics would be misleading (0 prompts, 0 tokens, 0.00 throughput).Add early return for empty outputs
outputs = llm.generate(prompts, sampling_params) inference_time = time.time() - inference_start + if not outputs: + logger.warning("No outputs generated") + return + total_tokens = sum(len(o.get('token_ids', [])) for o in outputs) num_outputs = len(outputs) - avg_diff_steps = sum(o.get('n_diff_steps', 0) for o in outputs) / num_outputs if outputs else 0 + avg_diff_steps = sum(o.get('n_diff_steps', 0) for o in outputs) / num_outputsformat.sh (2)
101-103: Unquoted variable expansion may cause issues with special filenames.While intentional for passing multiple files, filenames with spaces or special characters could break. Consider using arrays or null-delimited processing for robustness.
Alternative using mapfile for safer handling
if [[ -n "${CHANGED_FILES}" ]]; then echo "Running pre-commit on changed files:" echo "${CHANGED_FILES}" - # Convert newline-separated files to space-separated and run pre-commit once - CHANGED_FILES_SPACE="$(echo "${CHANGED_FILES}" | tr '\n' ' ')" - python3 -m pre_commit run --files ${CHANGED_FILES_SPACE} + # Use mapfile to safely handle filenames + mapfile -t FILES_ARRAY <<< "${CHANGED_FILES}" + python3 -m pre_commit run --files "${FILES_ARRAY[@]}" else
133-136:clang_tidy_allonly checkssrc/*.ccfiles.This misses other C/C++ extensions (
.c,.cpp,.h,.hpp) and subdirectories. The changed files function on Lines 141 handles more extensions.Include all C/C++ source extensions and subdirectories
clang_tidy_all() { - run-clang-tidy -j 64 src/*.cc -p build + fd -e c -e cc -e cpp -e h -e hpp . src --exec-batch run-clang-tidy -j 64 -p build }diffulex/logger.py (1)
14-14: Unused imports:Progress,SpinnerColumn,TextColumn,BarColumn,TimeElapsedColumn.These Rich progress-related imports are not used in the module.
Remove unused imports
try: from rich.console import Console from rich.logging import RichHandler from rich.traceback import install as install_rich_traceback - from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TimeElapsedColumn RICH_AVAILABLE = True except ImportError: RICH_AVAILABLE = Falsediffulex_bench/runner.py (1)
100-106: Consider making the initialization wait time configurable.The hardcoded 2-second sleep may not be optimal for all deployment scenarios—faster systems may not need this delay, while slower systems or larger models may require more time.
🔎 Proposed refactor to make wait time configurable
- # Give subprocesses a bit more time to complete initialization - # The main process initialization is synchronous, but subprocesses - # may still be initializing (model loading, warmup, etc.) - # Subprocesses will synchronize via barrier in ModelRunnerBase.__init__ - # So we just need to wait a bit for them to complete initialization - time.sleep(2.0) # Wait a bit for subprocess initialization + # Give subprocesses time to complete initialization + # Subprocesses synchronize via barrier in ModelRunnerBase.__init__ + initialization_wait = getattr(self.llm, 'initialization_wait', 2.0) + time.sleep(initialization_wait)Alternatively, you could add an
initialization_waitparameter to_wait_for_ready:- def _wait_for_ready(self, timeout: float = 300.0, check_interval: float = 0.5): + def _wait_for_ready(self, timeout: float = 300.0, check_interval: float = 0.5, initialization_wait: float = 2.0): """ Wait for the Diffulex engine to be fully initialized and ready Args: timeout: Maximum time to wait in seconds check_interval: Interval between readiness checks in seconds + initialization_wait: Time to wait for subprocess initialization in seconds """diffulex/sampler/base.py (1)
98-100: Optional: Remove redundant warning before exception.The warning at line 99 is immediately followed by raising an exception at line 100, making the warning somewhat redundant. Consider either removing the warning or restructuring the logic if the warning serves a specific observability purpose before the exception propagates.
🔎 Proposed refactor
def _shift_logits(self, logits, last_logit=None): if logits.shape[1] == 0: - logger.warning("Logits sequence length is 0, returning empty logits") raise Exception("logits sequence length is 0")diffulex_bench/configs/dream_d2f_gsm8k.yml (1)
5-5: Document or validate the placeholder model path.The
model_pathcontains a placeholder value"/path/to/dream/model"that users must replace. Ensure this is clearly documented in README or example documentation, or consider adding validation that produces a helpful error message if the placeholder path is used.diffulex/strategy/fast_dllm_v2/engine/sequence.py (1)
24-25: Consider adding status management methods to FDV2SubBlock.
FDV2SubBlocknow has astatusfield but lacks the status transition methods and property accessors thatFDV2Blockprovides (e.g.,to_cache(),in_cache(),is_active,is_to_cache). This asymmetry could lead to inconsistent usage patterns and direct field manipulation instead of controlled state transitions.🔎 Proposed enhancement with status management methods
@dataclass class FDV2SubBlock: sub_block_id: int = 0 status: FDV2SubBlockStatus = FDV2SubBlockStatus.ACTIVE + + def to_dual_cache(self) -> None: + """Transition to TO_DUAL_CACHE status.""" + if self.status == FDV2SubBlockStatus.ACTIVE: + self.status = FDV2SubBlockStatus.TO_DUAL_CACHE + + def in_dual_cache(self) -> None: + """Transition to IN_DUAL_CACHE status.""" + if self.status == FDV2SubBlockStatus.TO_DUAL_CACHE: + self.status = FDV2SubBlockStatus.IN_DUAL_CACHE + + @property + def is_active(self) -> bool: + return self.status == FDV2SubBlockStatus.ACTIVE + + @property + def is_to_dual_cache(self) -> bool: + return self.status == FDV2SubBlockStatus.TO_DUAL_CACHE + + @property + def is_in_dual_cache(self) -> bool: + return self.status == FDV2SubBlockStatus.IN_DUAL_CACHEdiffulex_bench/configs/example.yml (1)
32-35: Enhance documentation for D2F threshold parameters.The threshold parameters (
accept_threshold,complete_threshold,add_new_block_threshold) would benefit from more detailed comments explaining their purpose and impact on D2F decoding behavior.🔎 Proposed enhancement with detailed comments
# D2F-specific configuration - accept_threshold: 0.9 - complete_threshold: 0.95 - add_new_block_threshold: 0.1 + accept_threshold: 0.9 # Confidence threshold for accepting generated tokens (0.0-1.0) + complete_threshold: 0.95 # Threshold for considering a block complete (0.0-1.0) + add_new_block_threshold: 0.1 # Threshold for adding new diffusion blocks (0.0-1.0) diffusion_block_size: 32diffulex_profiler/backends/simple.py (3)
17-20: Consider handling double-start scenario.Calling
start()twice without an interveningstop()will silently overwrite the previous timing session. Consider either raising an exception or logging a warning to help catch potential misuse.🔎 Proposed enhancement with double-start detection
def start(self, name: str) -> None: """Start timing.""" + if self.start_time is not None: + # Either raise an exception or log a warning + import warnings + warnings.warn(f"start() called while timing '{self.current_name}' is still active. Previous timing will be lost.") self.current_name = name self.start_time = time.perf_counter()
38-43: Context manager usage lacks name parameter.The context manager implementation doesn't call
start()in__enter__, socurrent_namewill beNonewhen used with awithstatement. This limits the usefulness of the context manager pattern.Consider either:
- Requiring the name to be passed to
__init__when using as a context manager, or- Accepting an optional name parameter in
__enter__, or- Documenting that users should call
start()explicitly even when using the context manager🔎 Proposed fix: Accept name in __init__ for context manager usage
- def __init__(self): + def __init__(self, name: Optional[str] = None): self.start_time: Optional[float] = None - self.current_name: Optional[str] = None + self.current_name: Optional[str] = name def start(self, name: str) -> None: """Start timing.""" self.current_name = name self.start_time = time.perf_counter() def stop(self) -> Optional[Dict[str, Any]]: """Stop timing and return duration.""" if self.start_time is None: return None duration = time.perf_counter() - self.start_time result = { "duration_sec": duration, "name": self.current_name, } self.start_time = None self.current_name = None return result def __enter__(self): + if self.current_name is not None: + self.start_time = time.perf_counter() return self def __exit__(self, exc_type, exc_val, exc_tb): if self.start_time is not None: self.stop()
39-39: Add return type hint to__enter__.🔎 Proposed fix
- def __enter__(self): + def __enter__(self) -> "SimpleTimerBackend": return selfdiffulex/diffulex.py (1)
1-1: Unused import: Config is no longer referenced.The
Configimport on line 1 is no longer used after the refactoring.🔎 Proposed cleanup
-from diffulex.config import Config from diffulex.engine.dp_worker import DiffulexDPWorker from diffulex.engine.tp_worker import DiffulexTPWorkerdiffulex_bench/arg_parser.py (1)
182-192: Simplify the save-results argument pattern.The combination of
action="store_true"withdefault=Trueon line 184 is redundant. Whendefault=True, the--save-resultsflag becomes unnecessary since the behavior is already enabled by default.🔎 Suggested refactor to remove redundancy
- parser.add_argument( - "--save-results", - action="store_true", - default=True, - help="Save results to file", - ) parser.add_argument( "--no-save-results", dest="save_results", action="store_false", + default=True, help="Do not save results to file", )This pattern is clearer: results are saved by default, and users explicitly opt out with
--no-save-results.diffulex_profiler/exporters/__init__.py (1)
14-18: Consider adding debug logging for the optional import.The silent failure when CSVExporter is unavailable is a standard pattern for optional dependencies. However, given that this PR introduces comprehensive logging infrastructure, consider adding a debug-level log message when the import fails to aid troubleshooting.
🔎 Optional enhancement for observability
+from diffulex.logger import get_logger + +logger = get_logger(__name__) + try: from diffulex_profiler.exporters.csv import CSVExporter __all__.append("CSVExporter") except ImportError: + logger.debug("CSVExporter not available (optional dependency missing)") passdiffulex_profiler/exporters/json.py (1)
27-42: Consider explicit None checks for clarity.Lines 32-33 use truthy checks (
if m.duration,if m.total_tokens) which filter out bothNoneand0values. If0is a valid value that should be included in the sum (e.g., a section with zero duration or zero tokens), this could produce incorrect totals.🔎 Suggested refactor for explicit None handling
- total_duration = sum(m.duration for m in metrics if m.duration) - total_tokens = sum(m.total_tokens for m in metrics if m.total_tokens) + total_duration = sum(m.duration for m in metrics if m.duration is not None) + total_tokens = sum(m.total_tokens for m in metrics if m.total_tokens is not None)This makes the intent clearer: skip
Nonevalues but include0values in the sum.diffulex/config.py (2)
40-40: Minor style improvement: uselistinstead oflambda: [].The
default_factory=lambda: []is functionally correct, butdefault_factory=listis more idiomatic and concise in Python.🔎 Suggested style improvement
- device_ids: list[int] = field(default_factory=lambda: []) + device_ids: list[int] = field(default_factory=list)
70-77: Add error handling for CUDA_VISIBLE_DEVICES parsing.The parsing of
CUDA_VISIBLE_DEVICESon lines 72-73 will raiseValueErrorif the environment variable contains non-numeric values (e.g., due to misconfiguration). Consider adding error handling or validation to provide a clearer error message.🔎 Suggested error handling
if not self.device_ids: import torch - self.device_ids = ( - [int(x) for x in os.environ.get("CUDA_VISIBLE_DEVICES", "").split(",") if x.strip()] - if os.environ.get("CUDA_VISIBLE_DEVICES", "") - else list(range(torch.cuda.device_count())) - ) + cuda_visible = os.environ.get("CUDA_VISIBLE_DEVICES", "") + if cuda_visible: + try: + self.device_ids = [int(x) for x in cuda_visible.split(",") if x.strip()] + except ValueError as e: + raise ValueError( + f"Invalid CUDA_VISIBLE_DEVICES value '{cuda_visible}': {e}" + ) from e + else: + self.device_ids = list(range(torch.cuda.device_count())) logger.info(f"Using CUDA devices: {self.device_ids}")diffulex_profiler/exporters/csv.py (1)
22-29: Minor: Use a set literal for initialization.The initialization
set(["name", ...])can be simplified to a set literal{"name", ...}for better readability.🔎 Proposed fix
- fieldnames = set(["name", "duration_sec", "total_tokens", "throughput_tokens_per_sec"]) + fieldnames = {"name", "duration_sec", "total_tokens", "throughput_tokens_per_sec"}diffulex/engine/dp_worker.py (1)
29-29: Remove or document commented-out code.The commented-out
CUDA_VISIBLE_DEVICESassignment should either be removed or accompanied by a comment explaining why it's preserved.diffulex_profiler/README.md (1)
134-149: Add missing import in PyTorch Profiler example.The example references
ProfilerActivity.CPUandProfilerActivity.CUDAwithout showing the required import statement.🔎 Proposed fix
### PyTorch Profiler Backend For GPU/CPU operation-level profiling. Built into PyTorch. ```python +from torch.profiler import ProfilerActivity + profiler = DiffulexProfiler( config=ProfilerConfig( backend="pytorch", pytorch_profiler_config={ "activities": [ProfilerActivity.CPU, ProfilerActivity.CUDA], "record_shapes": True, "profile_memory": True, } ) )</details> </blockquote></details> <details> <summary>diffulex_profiler/backends/__init__.py (1)</summary><blockquote> `12-23`: **Consider logging when optional backends fail to import.** Silent `pass` on `ImportError` makes it harder to debug when a backend is unexpectedly unavailable. A debug-level log message would help troubleshoot missing dependencies without cluttering normal output. <details> <summary>🔎 Proposed fix</summary> ```diff +import logging +_logger = logging.getLogger(__name__) + # Optional backends try: from diffulex_profiler.backends.viztracer import VizTracerBackend __all__.append("VizTracerBackend") except ImportError: - pass + _logger.debug("VizTracerBackend not available: viztracer not installed") try: from diffulex_profiler.backends.pytorch import PyTorchProfilerBackend __all__.append("PyTorchProfilerBackend") except ImportError: - pass + _logger.debug("PyTorchProfilerBackend not available")diffulex_bench/__init__.py (1)
12-35: Refactor to avoid duplicating__all__list.The
__all__list is duplicated between the try/except branches, violating DRY. Define the base list once and conditionally append.🔎 Proposed fix
+__all__ = [ + "BenchmarkRunner", + "load_benchmark_dataset", + "compute_metrics", + "setup_logger", + "get_logger", + "BenchmarkConfig", + "EngineConfig", + "EvalConfig", +] + # Import lm_eval model to register it try: from diffulex_bench.lm_eval_model import DiffulexLM - __all__ = [ - "BenchmarkRunner", - "load_benchmark_dataset", - "compute_metrics", - "setup_logger", - "get_logger", - "BenchmarkConfig", - "EngineConfig", - "EvalConfig", - "DiffulexLM", - ] + __all__.append("DiffulexLM") except ImportError: - __all__ = [ - "BenchmarkRunner", - "load_benchmark_dataset", - "compute_metrics", - "setup_logger", - "get_logger", - "BenchmarkConfig", - "EngineConfig", - "EvalConfig", - ] + passdiffulex_bench/report.py (1)
30-30: Consider removing the unnecessary lambda or use it consistently.Line 30 defines
append_lineas a lambda, but line 52 directly usesreport_lines.appendinstead. Either remove the lambda and usereport_lines.appendthroughout, or useappend_lineconsistently.🔎 Suggested simplification
Remove the lambda and use
report_lines.appenddirectly:- append_line = lambda line: report_lines.append(line) - append_line("=" * 80) + report_lines.append("=" * 80)Or use
append_lineconsistently at line 52:- report_lines.append(f" Accuracy: {metrics['accuracy']:.4f}") + append_line(f" Accuracy: {metrics['accuracy']:.4f}")Also applies to: 52-52
diffulex_profiler/backends/viztracer.py (1)
53-67: Add validation for the output_file attribute access.Line 59 accesses
self.tracer.output_filewithout verifying that this attribute exists. While VizTracer likely provides this attribute, defensive programming would add a fallback.🔎 Suggested defensive code
self.tracer.stop() - output_file = self.tracer.output_file + output_file = getattr(self.tracer, 'output_file', 'unknown') result = { "backend": "viztracer",diffulex_bench/metrics.py (1)
66-83: Placeholder implementation for HumanEval evaluation.This function is a placeholder and returns
None. The comments correctly note that actual implementation requires a code execution environment.Do you want me to open an issue to track implementing HumanEval evaluation with Docker-based code execution, or would you like suggestions for safe code execution frameworks?
diffulex_bench/main.py (3)
116-155: Consider thread-safety implications of sys.argv manipulation.

The function modifies `sys.argv` globally (lines 120-127), which is not thread-safe. If multiple benchmarks run concurrently in different threads, they could interfere with each other's arguments.

If concurrent benchmark execution is planned, consider:
- Using lm_eval's programmatic API instead of CLI if available
- Adding a lock around sys.argv manipulation
- Running benchmarks in separate processes rather than threads
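A minimal sketch of the second option, serializing and scoping the sys.argv swap; the helper name is hypothetical, and running in separate processes remains the more robust choice.

```python
import sys
import threading
from contextlib import contextmanager

_ARGV_LOCK = threading.Lock()


@contextmanager
def patched_argv(argv: list[str]):
    """Serialize and scope sys.argv swaps so concurrent runs cannot interleave."""
    with _ARGV_LOCK:
        saved = sys.argv
        sys.argv = argv
        try:
            yield
        finally:
            sys.argv = saved  # always restored, even if the wrapped call raises
```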
217-226: Hardcoded defaults in getattr may diverge from actual defaults.Lines 217-226 use
getattrwith hardcoded defaults (e.g.,max_num_batched_tokens=4096). If the argument parser orEngineConfighas different defaults, they could become inconsistent over time.Consider either:
- Defining these defaults in a central location (e.g., a constants module)
- Using the defaults from
EngineConfig.__init__directly- Ensuring arg_parser provides all required arguments with proper defaults
This reduces maintenance burden when defaults change.
182-190: Add error handling for config loading.The config loading from YAML/JSON files (lines 184-186) lacks error handling. If the config file is malformed or the
from_yaml/from_jsonmethods fail, the error won't be user-friendly.🔎 Suggested error handling
if config_path and config_path.exists(): - if config_path.suffix in ['.yaml', '.yml']: - config = BenchmarkConfig.from_yaml(str(config_path)) - elif config_path.suffix == '.json': - config = BenchmarkConfig.from_json(str(config_path)) - else: + try: + if config_path.suffix in ['.yaml', '.yml']: + config = BenchmarkConfig.from_yaml(str(config_path)) + elif config_path.suffix == '.json': + config = BenchmarkConfig.from_json(str(config_path)) + else: + logger.error(f"Unsupported config file format: {config_path.suffix}") + sys.exit(1) + except Exception as e: - logger.error(f"Unsupported config file format: {config_path.suffix}") + logger.error(f"Failed to load config from {config_path}: {e}") sys.exit(1)diffulex_profiler/metrics.py (2)
6-6: Unused import:timeThe
timemodule is imported but not used in this file. It appears to be used inprofiler.pyinstead.🔎 Proposed fix
-import time
79-80: Silent exception swallowing hinders debugging.Bare
except Exception: passblocks silently discard all errors. While graceful degradation is appropriate here, consider logging at DEBUG level to aid troubleshooting when metrics unexpectedly return empty.Also applies to: 95-96, 111-112
diffulex_profiler/profiler.py (1)
188-196: Clarifyrecord_throughputusage timing in docstring.If called before
stop(),self.current_metrics.durationwill be0.0, causing the throughput to not be recorded. This is handled correctly by the guard, but the docstring should clarify that this method is intended to be called afterstop()or with an explicitdurationargument.🔎 Proposed docstring enhancement
def record_throughput(self, tokens: int, duration: Optional[float] = None): - """Record throughput in tokens per second.""" + """Record throughput in tokens per second. + + Args: + tokens: Number of tokens generated. + duration: Duration in seconds. If None, uses the duration from the + current profiling section (requires stop() to have been called). + """diffulex_bench/config.py (2)
47-50:from_dictwill fail on unknown keys.Using
cls(**config_dict)directly will raiseTypeErrorif the dictionary contains keys not defined in the dataclass. This makes config evolution fragile—adding a new field in a YAML file would break older code.Consider filtering to known fields:
🔎 Proposed fix for EngineConfig
@classmethod def from_dict(cls, config_dict: Dict[str, Any]) -> "EngineConfig": """Create engine configuration from dictionary""" - return cls(**config_dict) + valid_fields = {f.name for f in cls.__dataclass_fields__.values()} + filtered = {k: v for k, v in config_dict.items() if k in valid_fields} + return cls(**filtered)Apply the same pattern to
EvalConfig.from_dict.Also applies to: 102-105
143-159: Flat config backward compatibility may silently misroute unknown keys.

Keys not in `engine_fields` are assumed to be eval fields. If a typo or unknown key exists, it will be passed to `EvalConfig.from_dict` and cause a failure there. Consider logging a warning for unrecognized keys.
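A hedged sketch of that warning path as a standalone helper; the actual from_dict routing in this PR differs.

```python
import logging

logger = logging.getLogger(__name__)


def split_flat_config(config_dict: dict, engine_fields: set, eval_fields: set) -> tuple:
    """Route flat config keys, warning on unrecognized ones instead of misrouting them."""
    engine_cfg, eval_cfg = {}, {}
    for key, value in config_dict.items():
        if key in engine_fields:
            engine_cfg[key] = value
        elif key in eval_fields:
            eval_cfg[key] = value
        else:
            # Surface typos and unknown keys instead of letting EvalConfig.from_dict fail later.
            logger.warning("Unrecognized config key ignored: %s", key)
    return engine_cfg, eval_cfg
```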
25-40:__all__includes potentially unavailable symbols.If
VizTracerBackend,PyTorchProfilerBackend, orCSVExporteraren't importable due to missing dependencies, listing them in__all__could be misleading. Consider dynamically building__all__based on what's actually available.diffulex_bench/lm_eval_model.py (2)
5-5: Unused import and variable:loggingmodule andeval_logger.
eval_loggeris created but never used. The class usesself.logger(fromget_logger) instead.🔎 Proposed fix
-import logging import time import json ... -eval_logger = logging.getLogger(__name__)Also applies to: 21-21
67-68: Prefer explicit exceptions overassertfor argument validation.
assertstatements are stripped when Python runs with-O(optimized mode), which could allow invalid arguments to pass silently in production.🔎 Proposed fix
- assert isinstance(pretrained, str) - assert isinstance(batch_size, (int, str)) + if not isinstance(pretrained, str): + raise TypeError(f"pretrained must be str, got {type(pretrained)}") + if not isinstance(batch_size, (int, str)): + raise TypeError(f"batch_size must be int or str, got {type(batch_size)}")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (61)
.editorconfig, .gitattributes, .github/dependabot.yml, .github/workflows/pr-perfbench-bot.yml, .github/workflows/pr-reminder-bot.yml, .gitignore, .pre-commit-config.yaml, .pymarkdown, CODE_OF_CONDUCT.md, CONTRIBUTING.md, Tilelang-failed_test_cases, diffulex/__init__.py, diffulex/attention/__init__.py, diffulex/config.py, diffulex/diffulex.py, diffulex/engine/dp_worker.py, diffulex/engine/model_runner.py, diffulex/engine/tp_worker.py, diffulex/logger.py, diffulex/model/config/dream/configuration_dream.py, diffulex/model/config/fast_dllm_v2/configuration_fast_dllm_v2.py, diffulex/model/config/sdar/configuration_sdar.py, diffulex/sampler/base.py, diffulex/strategy/fast_dllm_v2/engine/sequence.py, diffulex/utils/loader.py, diffulex_bench/README.md, diffulex_bench/__init__.py, diffulex_bench/arg_parser.py, diffulex_bench/config.py, diffulex_bench/configs/__init__.py, diffulex_bench/configs/dream_d2f_gsm8k.yml, diffulex_bench/configs/example.yml, diffulex_bench/datasets.py, diffulex_bench/lm_eval_model.py, diffulex_bench/logger.py, diffulex_bench/main.py, diffulex_bench/metrics.py, diffulex_bench/report.py, diffulex_bench/runner.py, diffulex_profiler/README.md, diffulex_profiler/__init__.py, diffulex_profiler/backends/__init__.py, diffulex_profiler/backends/base.py, diffulex_profiler/backends/pytorch.py, diffulex_profiler/backends/simple.py, diffulex_profiler/backends/viztracer.py, diffulex_profiler/example.py, diffulex_profiler/exporters/__init__.py, diffulex_profiler/exporters/base.py, diffulex_profiler/exporters/csv.py, diffulex_profiler/exporters/json.py, diffulex_profiler/exporters/summary.py, diffulex_profiler/metrics.py, diffulex_profiler/profiler.py, docs/make.bat, examples/test_dream_diffulex_gsm8k.py, examples/test_sdar_diffulex_gsm8k.py, format.sh, profile/d2f_dream_profile.py, pyproject.toml, script/d2f_dream_eval_gsm8k.sh
💤 Files with no reviewable changes (1)
- Tilelang-failed_test_cases
```markdown
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
[leiwang1999@outlook.com](mailto:leiwang1999@outlook.com)
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
```
Replace personal email with a project or organizational email for incident reporting.
Using a personal email address (leiwang1999@outlook.com) for Code of Conduct violation reports exposes the maintainer to privacy and security risks. Additionally, this email may become unavailable if the maintainer leaves the project.
Consider using:
- A dedicated project email address (e.g., conduct@diffulex.dev or similar)
- An organizational email alias or distribution list
- A moderation form/ticketing system (if available)
- A shared mailbox managed by multiple maintainers
This ensures continuity, shared responsibility, and protects individual privacy.
🤖 Prompt for AI Agents
In @CODE_OF_CONDUCT.md around lines 59 - 67, The Enforcement section currently
lists a personal email (leiwang1999@outlook.com); replace that personal address
with a non-personal project/organizational contact such as a dedicated project
email (e.g., conduct@diffulex.dev), an alias/distribution list, or a link to a
moderation/ticketing form, and update the line in the "## Enforcement" block to
use that shared address or form so incident reports are routed to a shared
mailbox or team rather than an individual's personal account.
```python
dataset = load_dataset("gsm8k", "main", split=split)

if limit:
    dataset = dataset[:limit]

results = []
for item in dataset:
    question = item["question"]
    answer = item["answer"]

    if prompt_template:
        prompt = prompt_template(question)
    else:
        # Default template
        prompt = f"Question: {question}\nAnswer:"

    results.append({
        'prompt': prompt,
        'answer': answer,
        'question': question,
    })
```
Bug: Dataset slicing returns batch format, breaking iteration.
When limit is applied, dataset[:limit] returns a dictionary of lists (batch format), not a sliced Dataset. The subsequent for item in dataset will iterate over column names (strings like "question", "answer"), causing incorrect behavior.
Fix using proper Dataset slicing
```diff
 def load_gsm8k(
     split: str = "test",
     limit: Optional[int] = None,
     prompt_template: Optional[Callable[[str], str]] = None,
 ) -> List[Dict[str, Any]]:
     """..."""
     dataset = load_dataset("gsm8k", "main", split=split)
     if limit:
-        dataset = dataset[:limit]
+        dataset = dataset.select(range(min(limit, len(dataset))))
     results = []
     for item in dataset:
```

🤖 Prompt for AI Agents
In @diffulex_bench/datasets.py around lines 26 - 47, The slicing bug arises
because dataset[:limit] returns a batched dict instead of a Dataset; replace
that line with a proper Dataset slice using the huggingface Dataset.select API
(e.g., call dataset.select(range(limit))) so that the subsequent for item in
dataset iterates over dataset examples; update the code where the variable
dataset is reassigned (the block using dataset = dataset[:limit]) to use dataset
= dataset.select(range(limit)) and keep the existing logic for prompt_template
and results unchanged.
```python
dataset = load_dataset("openai/humaneval", split="test")

if limit:
    dataset = dataset[:limit]

results = []
for item in dataset:
    prompt = item["prompt"]
    test = item["test"]
    entry_point = item["entry_point"]

    if prompt_template:
        full_prompt = prompt_template(prompt)
    else:
        full_prompt = prompt

    results.append({
        'prompt': full_prompt,
        'original_prompt': prompt,
        'test': test,
        'entry_point': entry_point,
        'task_id': item.get('task_id', ''),
    })
```
Same slicing bug in load_humaneval.
The dataset[:limit] slicing has the same issue as in load_gsm8k.
Fix using proper Dataset slicing
```diff
 dataset = load_dataset("openai/humaneval", split="test")
 if limit:
-    dataset = dataset[:limit]
+    dataset = dataset.select(range(min(limit, len(dataset))))
 results = []
 for item in dataset:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| dataset = load_dataset("openai/humaneval", split="test") | |
| if limit: | |
| dataset = dataset[:limit] | |
| results = [] | |
| for item in dataset: | |
| prompt = item["prompt"] | |
| test = item["test"] | |
| entry_point = item["entry_point"] | |
| if prompt_template: | |
| full_prompt = prompt_template(prompt) | |
| else: | |
| full_prompt = prompt | |
| results.append({ | |
| 'prompt': full_prompt, | |
| 'original_prompt': prompt, | |
| 'test': test, | |
| 'entry_point': entry_point, | |
| 'task_id': item.get('task_id', ''), | |
| }) | |
| dataset = load_dataset("openai/humaneval", split="test") | |
| if limit: | |
| dataset = dataset.select(range(min(limit, len(dataset)))) | |
| results = [] | |
| for item in dataset: | |
| prompt = item["prompt"] | |
| test = item["test"] | |
| entry_point = item["entry_point"] | |
| if prompt_template: | |
| full_prompt = prompt_template(prompt) | |
| else: | |
| full_prompt = prompt | |
| results.append({ | |
| 'prompt': full_prompt, | |
| 'original_prompt': prompt, | |
| 'test': test, | |
| 'entry_point': entry_point, | |
| 'task_id': item.get('task_id', ''), | |
| }) |
🤖 Prompt for AI Agents
In @diffulex_bench/datasets.py around lines 65 - 88, The current use of Python
slice syntax dataset[:limit] in load_humaneval does not properly slice a
HuggingFace Dataset; replace that with the Dataset.select-based approach (e.g.,
use dataset.select(range(limit)) or dataset.select(list(range(limit)))) so the
dataset is correctly truncated when limit is provided; update the branch that
checks if limit and references dataset[:limit] to call select instead.
with open(results_file, 'r', encoding='utf-8') as f:
    results = json.load(f)
Add error handling for file operations.
The file reading lacks error handling for common exceptions like FileNotFoundError and json.JSONDecodeError, which would cause the function to crash rather than fail gracefully.
🔎 Proposed fix
-with open(results_file, 'r', encoding='utf-8') as f:
-    results = json.load(f)
+try:
+    with open(results_file, 'r', encoding='utf-8') as f:
+        results = json.load(f)
+except FileNotFoundError:
+    raise FileNotFoundError(f"Results file not found: {results_file}")
+except json.JSONDecodeError as e:
+    raise ValueError(f"Invalid JSON in results file: {e}")

📝 Committable suggestion
try:
    with open(results_file, 'r', encoding='utf-8') as f:
        results = json.load(f)
except FileNotFoundError:
    raise FileNotFoundError(f"Results file not found: {results_file}")
except json.JSONDecodeError as e:
    raise ValueError(f"Invalid JSON in results file: {e}")
🤖 Prompt for AI Agents
In @diffulex_bench/report.py around lines 22 - 23, Wrap the open/results
json.load block that reads results_file in a try/except to handle
FileNotFoundError, PermissionError and json.JSONDecodeError: catch these
exceptions around the with open(...) and json.load(...) lines, log a clear error
via the module logger (or import logging if none exists), and return a sensible
fallback (e.g., an empty dict or None) or re-raise a wrapped exception with
context so the function fails gracefully; keep references to the existing
variable results_file and the json.load call so the change is localized.
for result_file in result_files:
    with open(result_file, 'r', encoding='utf-8') as f:
        results = json.load(f)
Add error handling and validation for file operations.
The loop lacks error handling for file operations and doesn't validate that result_files is non-empty. If one file is corrupted or missing, the entire comparison fails without providing partial results.
🔎 Proposed fix with error handling
+if not result_files:
+    raise ValueError("result_files cannot be empty")
+
 for result_file in result_files:
-    with open(result_file, 'r', encoding='utf-8') as f:
-        results = json.load(f)
+    try:
+        with open(result_file, 'r', encoding='utf-8') as f:
+            results = json.load(f)
+    except (FileNotFoundError, json.JSONDecodeError) as e:
+        logger.warning(f"Skipping {result_file}: {e}")
+        continue

Note: This requires importing the logger. Alternatively, you could raise the exception to fail fast.
Committable suggestion skipped: line range outside the PR's diff.
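Taking the two file-handling comments together, a tolerant loader for the comparison path might look like the sketch below; the function name, return type, and module-level logger are assumptions for illustration, not the benchmark's actual API.

```python
# Hypothetical helper: load every result file that parses, warn about the
# rest, and fail only if nothing could be loaded, so one bad file no longer
# kills the whole comparison.
import json
import logging
from typing import Any, List, Sequence

logger = logging.getLogger(__name__)


def load_all_results(result_files: Sequence[str]) -> List[Any]:
    if not result_files:
        raise ValueError("result_files cannot be empty")

    loaded: List[Any] = []
    for result_file in result_files:
        try:
            with open(result_file, "r", encoding="utf-8") as f:
                loaded.append(json.load(f))
        except (FileNotFoundError, PermissionError, json.JSONDecodeError) as e:
            # Skip unreadable files but keep going so partial results survive.
            logger.warning("Skipping %s: %s", result_file, e)

    if not loaded:
        raise RuntimeError("None of the result files could be loaded")
    return loaded
```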
if self.current_metrics.gpu_metrics_start and self.current_metrics.gpu_metrics_end:
    self.current_metrics.gpu_utilization = (
        self.current_metrics.gpu_metrics_end.get("utilization", 0) -
        self.current_metrics.gpu_metrics_start.get("utilization", 0)
    )
GPU utilization delta is semantically incorrect.
GPU utilization is a point-in-time percentage (e.g., "80% busy right now"), not a cumulative value. Subtracting start_utilization from end_utilization doesn't produce meaningful data—it could even be negative.
Consider either:
- Storing both start/end utilization for comparison, or
- Computing average utilization during the profiled section (requires sampling), or
- Simply storing the end utilization as a representative snapshot.
🔎 Proposed fix (store end utilization as snapshot)
 if self.config.collect_gpu_metrics and torch.cuda.is_available():
     self.current_metrics.gpu_metrics_end = collect_gpu_metrics()
-    if self.current_metrics.gpu_metrics_start and self.current_metrics.gpu_metrics_end:
-        self.current_metrics.gpu_utilization = (
-            self.current_metrics.gpu_metrics_end.get("utilization", 0) -
-            self.current_metrics.gpu_metrics_start.get("utilization", 0)
-        )
+    if self.current_metrics.gpu_metrics_end:
+        self.current_metrics.gpu_utilization = (
+            self.current_metrics.gpu_metrics_end.get("utilization", 0)
+        )

📝 Committable suggestion
if self.current_metrics.gpu_metrics_end:
    self.current_metrics.gpu_utilization = (
        self.current_metrics.gpu_metrics_end.get("utilization", 0)
    )
🤖 Prompt for AI Agents
In @diffulex_profiler/profiler.py around lines 165 - 169, The code is
incorrectly computing a delta for GPU utilization by subtracting
self.current_metrics.gpu_metrics_start from gpu_metrics_end; change this to
store a representative snapshot instead: set
self.current_metrics.gpu_utilization to the end snapshot value (e.g.,
self.current_metrics.gpu_metrics_end.get("utilization", 0)) or alternatively
preserve both self.current_metrics.gpu_metrics_start and gpu_metrics_end on the
metrics object for later comparison; update any consumers of gpu_utilization
accordingly to expect a snapshot rather than a delta.
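For the sampling option mentioned above, one possible shape is a background sampler that averages point-in-time readings while the section runs. The sketch assumes the nvidia-ml-py (pynvml) package and an NVIDIA GPU; the class name and usage are illustrative and not part of diffulex_profiler.

```python
# Hedged sketch: average GPU utilization over a profiled section by sampling
# NVML in a background thread. Requires the nvidia-ml-py (pynvml) package.
import threading
import time

import pynvml


class GPUUtilizationSampler:
    def __init__(self, device_index: int = 0, interval_s: float = 0.05):
        self.device_index = device_index
        self.interval_s = interval_s
        self._samples: list[int] = []
        self._stop = threading.Event()

    def __enter__(self):
        pynvml.nvmlInit()
        self._handle = pynvml.nvmlDeviceGetHandleByIndex(self.device_index)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def _run(self):
        while not self._stop.is_set():
            # utilization.gpu is a point-in-time percentage (0-100)
            self._samples.append(
                pynvml.nvmlDeviceGetUtilizationRates(self._handle).gpu
            )
            time.sleep(self.interval_s)

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        pynvml.nvmlShutdown()

    @property
    def average_utilization(self) -> float:
        # Mean of the sampled percentages; 0.0 if nothing was sampled.
        return sum(self._samples) / len(self._samples) if self._samples else 0.0


# Usage (illustrative): wrap the profiled section, then read the average.
# with GPUUtilizationSampler() as sampler:
#     run_profiled_section()
# print(f"avg GPU utilization: {sampler.average_utilization:.1f}%")
```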
def format(self, record):
    log_color = self.COLORS.get(record.levelname, '')
    record.levelname = f"{log_color}{record.levelname}{self.RESET}"
    return super().format(record)
Mutating record.levelname affects all handlers sharing the LogRecord.
The format() method modifies record.levelname in place. If multiple handlers (e.g., console + file) process the same record, the file handler will receive the colorized levelname, corrupting log files with ANSI codes.
Create a copy or use a separate attribute
 def format(self, record):
+    # Create a copy to avoid mutating the original record
+    record = logging.makeLogRecord(record.__dict__)
     log_color = self.COLORS.get(record.levelname, '')
     record.levelname = f"{log_color}{record.levelname}{self.RESET}"
     return super().format(record)

🤖 Prompt for AI Agents
In @diffulex/logger.py around lines 44 - 47, The format() method currently
mutates record.levelname which pollutes other handlers; instead do not modify
record.levelname in place — create a colored string locally (e.g., using
self.COLORS and self.RESET) and either assign it to a new attribute on the
record such as levelname_colored or operate on a shallow copy of the LogRecord,
and update the formatter pattern to use that new attribute (or use the copy) so
other handlers receive the original record.levelname unchanged; make this change
in the format() method (referencing format, self.COLORS, self.RESET, and
record).
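A minimal sketch of the "separate attribute" alternative described in the prompt above: the console formatter colors a dedicated levelname_colored field while the file handler keeps the untouched levelname. The formatter class, format strings, and handler setup are illustrative, not Diffulex's actual logger.

```python
# Sketch: color only a dedicated attribute so handlers that share the same
# LogRecord never see ANSI codes in record.levelname.
import logging


class ColorFormatter(logging.Formatter):
    COLORS = {"DEBUG": "\033[36m", "INFO": "\033[32m",
              "WARNING": "\033[33m", "ERROR": "\033[31m"}
    RESET = "\033[0m"

    def format(self, record):
        color = self.COLORS.get(record.levelname, "")
        # Leave record.levelname untouched; expose a colored variant instead.
        record.levelname_colored = f"{color}{record.levelname}{self.RESET}"
        return super().format(record)


logger = logging.getLogger("color_demo")
logger.setLevel(logging.DEBUG)

console = logging.StreamHandler()
console.setFormatter(ColorFormatter("%(levelname_colored)s %(message)s"))

file_handler = logging.FileHandler("demo.log")
file_handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger.addHandler(console)
logger.addHandler(file_handler)
logger.info("demo.log stays free of ANSI escape codes")
```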
get_merge_base() {
  UPSTREAM_REPO="https://github.com/tile-ai/tilelang"
  if git ls-remote --exit-code "${UPSTREAM_REPO}" main &>/dev/null; then
    # First try to use the upstream repository directly
    MERGE_BASE="$(git fetch "${UPSTREAM_REPO}" main &>/dev/null && git merge-base FETCH_HEAD HEAD)"
  elif git show-ref --verify --quiet refs/remotes/origin/main; then
    # Fall back to origin/main if available
    BASE_BRANCH="origin/main"
    MERGE_BASE="$(git merge-base "${BASE_BRANCH}" HEAD)"
  else
    # Last resort, use local main
    BASE_BRANCH="main"
    MERGE_BASE="$(git merge-base "${BASE_BRANCH}" HEAD)"
  fi
  echo "${MERGE_BASE}"
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the actual remote origin and repository name
echo "=== Git Remotes ==="
git remote -v 2>/dev/null || echo "Not a git repository"
echo -e "\n=== Repository Name (from .git/config) ==="
git config --get remote.origin.url 2>/dev/null || echo "No origin URL configured"
echo -e "\n=== Check for any config files referencing upstream ==="
find . -maxdepth 2 -type f \( -name "*.cfg" -o -name "*.conf" -o -name "*.toml" -o -name "*.yaml" -o -name "*.yml" -o -name ".git" \) 2>/dev/null | head -20
echo -e "\n=== Directory listing to confirm repository name ==="
basename "$(pwd)"
echo -e "\n=== Check for environment variables or documentation ==="
find . -maxdepth 2 -type f \( -name "README*" -o -name "CONTRIBUTING*" -o -name ".env*" \) -exec grep -l "upstream\|tilelang\|diffulex" {} \; 2>/dev/null

Repository: zhijie-group/Diffulex
Length of output: 186
Fix incorrect upstream repository URL in get_merge_base() function.
The hardcoded URL references https://github.com/tile-ai/tilelang, but this repository is zhijie-group/Diffulex. This will cause the merge base detection to fail or use an unintended repository. Update the URL to point to the correct upstream repository.
🤖 Prompt for AI Agents
In @format.sh around lines 57 - 72, The get_merge_base() function uses a
hardcoded UPSTREAM_REPO URL that is incorrect; update the UPSTREAM_REPO variable
to the correct upstream repository (e.g.,
"https://github.com/zhijie-group/Diffulex") so merge-base detection queries the
intended repo, then ensure the rest of the logic still fetches main and computes
MERGE_BASE as before.
if [[ ! -x "$(command -v clang-tidy)" ]]; then
  python3 -m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt" --user
fi
Bug: Invalid pip flag --requirements.
The correct flag is -r or --requirement (singular). This will cause the script to fail when clang-tidy is not installed.
Fix the pip flag
 if [[ ! -x "$(command -v clang-tidy)" ]]; then
-  python3 -m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt" --user
+  python3 -m pip install --upgrade -r "${ROOT}/requirements-lint.txt" --user
 fi

🤖 Prompt for AI Agents
In @format.sh around lines 117 - 119, The pip install call using the invalid
flag `--requirements` must be changed to the correct flag `-r` (or
`--requirement`) so the conditional that installs lint deps when `clang-tidy` is
missing succeeds; update the command in the block that checks `clang-tidy` (the
`python3 -m pip install --upgrade --requirements "${ROOT}/requirements-lint.txt"
--user` line) to use `-r "${ROOT}/requirements-lint.txt"` (or `--requirement`)
instead, keeping the rest of the options (`--upgrade` and `--user`) unchanged.
| "pandas>=2.3.3", | ||
| "tilelang>=0.1.7.post1", | ||
| "rich>=13.0.0", | ||
| "colorama>=0.4.6", | ||
| "lm-eval" |
🧩 Analysis chain
🏁 Script executed:
# First, find and read the pyproject.toml file
find . -name "pyproject.toml" -type f

Repository: zhijie-group/Diffulex
Length of output: 186
🌐 Web query:
lm-eval package PyPI versions history
💡 Result:
lm-eval (PyPI) release history (versions listed on PyPI, earliest → latest):
- 0.0.1 — Sep 2, 2021. [1]
- 0.2.0 — Mar 7, 2022. [1][2]
- 0.3.0 — Dec 8, 2022. [1][0]
- 0.4.0 — Dec 22, 2023. [1]
- 0.4.1 — Jan 31, 2024. [1]
- 0.4.2 — Mar 18, 2024. [1]
- 0.4.3 — Jul 1, 2024. [1]
- 0.4.4 — Sep 5, 2024. [1]
- 0.4.5 — Oct 8, 2024. [1]
- 0.4.7 — Dec 17, 2024. [1]
- 0.4.8 — Mar 5, 2025. [1]
- 0.4.9 — Jun 19, 2025. [1]
- 0.4.9.1 — Aug 4, 2025. [1]
- 0.4.9.2 — Nov 26, 2025. [1]
Sources: PyPI project and release pages for lm-eval. [1] https://pypi.org/project/lm-eval/ (release list) · [2] https://pypi.org/project/lm-eval/0.2.0/ (example version page).
🌐 Web query:
lm-eval 0.4.0 PyPI
💡 Result:
lm-eval 0.4.0 on PyPI was released on December 22, 2023. Key notes on that release (from the PyPI project page) include new Open LLM Leaderboard tasks, internal refactoring, config-based task creation, Jinja2 prompt support, vLLM/MPS/HF speedups, and new tasks (e.g., CoT BIG-Bench-Hard). [1]
Install with:
pip install lm-eval==0.4.0
Sources:
[1] PyPI project page for lm-eval (release history and v0.4.0 notes).
Add a version constraint to the lm-eval dependency.
The lm-eval dependency lacks a version constraint, which can cause reproducibility issues and unpredictable behavior across different environments. All other dependencies in this dependency group specify minimum versions; lm-eval should do the same.
Recommended fix
-    "lm-eval"
+    "lm-eval>=0.4.0"

📝 Committable suggestion
| "pandas>=2.3.3", | |
| "tilelang>=0.1.7.post1", | |
| "rich>=13.0.0", | |
| "colorama>=0.4.6", | |
| "lm-eval" | |
| "pandas>=2.3.3", | |
| "tilelang>=0.1.7.post1", | |
| "rich>=13.0.0", | |
| "colorama>=0.4.6", | |
| "lm-eval>=0.4.0" |
🤖 Prompt for AI Agents
In @pyproject.toml around lines 35 - 39, Update the pyproject.toml dependency
entry for "lm-eval" to include a version constraint so builds are reproducible;
edit the dependency list where "lm-eval" appears and change it to a pinned or
minimum version (for example use a minimum like lm-eval>=X.Y.Z or an exact pin
lm-eval==X.Y.Z) consistent with how other packages are specified.
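As a quick way to see what the unconstrained dependency actually resolved to in a given environment, here is an illustrative check (not part of the PR) using importlib.metadata; the >=0.4.0 floor echoes the suggestion above and is an assumption.

```python
# Illustrative only: report the lm-eval version installed in this environment
# so drift against an intended floor (assumed here to be 0.4.0) is visible.
from importlib.metadata import PackageNotFoundError, version

try:
    installed = version("lm-eval")
except PackageNotFoundError:
    raise SystemExit("lm-eval is not installed in this environment")

print(f"lm-eval {installed} is installed; the suggested constraint is >=0.4.0")
```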
Summary by CodeRabbit
New Features
Documentation
Chores