Basic Information - Models Used
MiniMax-M2
Information about environment and deployment
Environment
OS: Ubuntu 22.04.5 LTS
GPU: H200*8
Python: 3.12.11
PyTorch: 2.8.0+cu129
sglang: 0.5.4.post1
sgl_kernel: 0.3.16.post4
The deployment command for SGLang:
python3 -m sglang.launch_server --model-path /MiniMax-M2 \
--tp-size 8 --ep-size 8 --tool-call-parser minimax-m2 --trust-remote-code \
--reasoning-parser minimax-append-think --nnodes 1 --node-rank 0 \
--host 0.0.0.0 --port 8000 --mem-fraction-static 0.85
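Before running the benchmarks, a minimal smoke test against the launched server can rule out deployment problems. The sketch below assumes the server from the command above is reachable at `127.0.0.1:8000` and serves the OpenAI-compatible `/v1/chat/completions` endpoint that `sglang.launch_server` exposes; the `send` call is left commented out so the payload construction can be checked offline.

```python
import json
import urllib.request

def build_chat_request(prompt, model="MiniMax-M2", max_tokens=64):
    """Build the JSON payload for a single chat-completion smoke test."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output for a sanity check
    }

def send(payload, base_url="http://127.0.0.1:8000"):
    """POST the payload to the (assumed) OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Reply with 'ready' if you are up.")
print(json.dumps(payload, indent=2))
# send(payload)  # uncomment once the server launched above is running
```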
Description
- LiveCodeBench (LCB)
In OpenCompass's LiveCodeBench (LCB) benchmark, the MiniMax-M2 model was evaluated five times independently but consistently failed to reach the official score of 83.
Test Results
lcb_code_generation 29.25
lcb_code_execution 42.38
lcb_test_output 88.46
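For a quick comparison against the official headline number, the three sub-scores above can be aggregated. The equal-weight mean used here is an assumption (OpenCompass may weight or select subtasks differently when reporting a single LCB score), but even this optimistic average falls well short of 83.

```python
# Aggregate the LCB sub-scores reported above. NOTE: the equal-weight mean
# is an assumption -- OpenCompass may combine subtasks differently -- but it
# illustrates the gap to the official score of 83.
sub_scores = {
    "lcb_code_generation": 29.25,
    "lcb_code_execution": 42.38,
    "lcb_test_output": 88.46,
}
mean_score = sum(sub_scores.values()) / len(sub_scores)
print(f"unweighted mean: {mean_score:.2f}")  # 53.36, vs. official 83
```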
- SWE-bench Verified
I evaluated the MiniMax-M2 model using the mini-swe-agent tool, allowing up to 350 steps per test case. The final score was 38, which is lower than the official result.
In multiple test cases, the model produced unexpected code formatting; the agent repeatedly asked it to correct the output structure, but it ultimately failed. Some such examples are attached:
sphinx-doc__sphinx-9591.traj.json
sympy__sympy-20916.traj.json
django__django-13964.traj.json
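A sketch for triaging the attached trajectories, assuming each `.traj.json` holds a list of turns under a `messages` key with `role`/`content` fields (the field names are guesses about the mini-swe-agent trajectory layout, not a documented schema). It counts assistant turns that were immediately followed by a message mentioning the output format, as a rough proxy for "the agent had to correct the output structure".

```python
def count_format_retries(traj, marker="format"):
    """Count assistant turns followed by a turn mentioning `marker` -- a
    rough proxy for format-correction loops. The 'messages'/'role'/'content'
    layout is an assumed schema, not documented mini-swe-agent behavior."""
    messages = traj.get("messages", [])
    retries = 0
    for prev, cur in zip(messages, messages[1:]):
        if prev.get("role") == "assistant" and marker in cur.get("content", "").lower():
            retries += 1
    return retries

# Inline sample standing in for a loaded trajectory file
# (e.g. json.load(open("sphinx-doc__sphinx-9591.traj.json"))):
sample = {
    "messages": [
        {"role": "user", "content": "Fix the bug."},
        {"role": "assistant", "content": "```diff ..."},
        {"role": "user", "content": "Your response format is invalid, retry."},
        {"role": "assistant", "content": "<patch>..."},
    ]
}
print(count_format_retries(sample))  # 1 format correction in the sample
```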