Export dynamic batch size ONNX using ONNX's DeformConv#167
itskyf wants to merge 1 commit into ZhengPeng7:main
Conversation
Thanks a lot! I'll take time to look at it tomorrow; it really helps.

Sure, I've updated the notebook to reduce the modifications.

Thank you so much, @itskyf, for your contribution! Have you had a chance to test whether the execution works with ONNX Runtime?
Hi, @itskyf. Did you successfully export the ONNX model? I tried it but ran into an error, with both `PyTorch==2.0.1+onnxruntime-gpu==1.18.1` and `PyTorch==2.5.1+onnxruntime-gpu==1.20.1`.

Hi @ZhengPeng7, I believe the issue arises because the native DeformConv operator was only added in ONNX opset 19.

@ZhengPeng7 ah, I forgot to mention that we also need to update the onnx package for opset 19.

@itskyf Could you please share the code you used to convert the dynamic-batch model to TensorRT? Thanks in advance!

Thanks for @itskyf's PR. This is exactly what I tested, and it worked. I have a question about this PR for @itskyf: when I tested this way, I found the resulting TRT engine does not work as expected when the batch size used for inference differs from the batch size used when generating the engine, and I figured out the change in #166 should be made. Did you run into the same issue?
Hi, @itskyf, sorry for the late reply; just came back from the Lunar New Year holiday :) As you said above, the DCN is still not supported in ONNX Runtime.

Hi @ZhengPeng7, I might be able to help. To export with opset > 19, you'll need to update your PyTorch version to > 2.4. The provided example uses opset 19 for the converter and opset 20 for the entire model. Regarding execution: you can't run an .onnx file directly with onnxruntime by default, because the operator is not implemented yet. I believe @jhwei is referring to converting the .onnx model to a TensorRT engine for execution. That said, I'm not sure, but maybe you can run it natively with onnxruntime if you specify the TensorRT execution provider.

Hi, @alfausa1. Thanks a lot for the details :)

Hi @ZhengPeng7, you're correct. It might be worth testing whether an onnxruntime session works when you specify the TensorRT execution provider. If that doesn't work, maybe @jhwei can guide us on how to export and use a TensorRT engine, as there are different approaches that involve using CUDA libraries and low-level configuration. I also found this new repo, onnx/onnx-tensorrt, which could be useful to test. Sorry for offering all this information without testing it myself; my GPU resources are currently limited :((

Thank you, @alfausa1. I've tested it, but more errors need to be fixed and more libraries need to be installed. I'll take a deeper look into it when I have spare time.

Haha, I'm at my wit's end. 😭😭😭 I tried adding dynamic image size input to BiRefNet, but it's still stuck on ONNX Runtime not supporting DeformConv2D. Looks like I'll have to hand-code a TensorRT plugin after all. This is so frustrating. This is my current work; I hope future developers can build upon it. "Trust the wisdom of those who come after." https://gist.github.com/ShirasawaSama/c231d83e3c24d10d4b706051c0c2c6f1
Meanwhile, I tried to export an ONNX model in fp16 format and found that the discrepancy was extremely large, which was quite strange. I can provide my code; would anyone be willing to help me?

I submitted a Pull Request for ONNXRuntime to enable support for the DeformConv2D operator on CPU, CUDA, and TensorRT. Next, I will conduct comparative tests using my build and BiRefNet to check for any accuracy degradation. I will post updates here as progress is made.

Thanks for the PR. I'm not an expert on this, but still, many thanks and good luck to you!

Thanks for the effort @ShirasawaSama!! In my opinion, it's strange that the ONNX session performs similarly to, or even worse than, PyTorch; I would honestly expect an improvement in inference times. If I understood correctly, this could be due to the resize operator, right?
In fact, it might simply be because I haven't fully optimized performance yet. I've only ensured it runs on the GPU as much as possible. The model can still be fine-tuned to improve performance, and if I have more time, I'll continue trying to optimize it. |

This PR replaces the usage of deform_conv2d_onnx_exporter with the native DeformConv operator available in ONNX opset 19. The exported ONNX model now supports dynamic batch sizes.
Notes
The symbolic_deform_conv_19() function was generated using OpenAI o1. It works in my testing, but let me know if there are any special requirements to consider.
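For readers without access to the diff, here is a rough sketch of how such a symbolic function could be registered. This is my own reconstruction, not necessarily the PR's exact code; the argument order follows the `torchvision::deform_conv2d` operator schema, and the output maps onto ONNX's native `DeformConv` op (opset 19):

```python
import torch
from torch.onnx import register_custom_op_symbolic
from torch.onnx.symbolic_helper import parse_args

# Reconstruction sketch (NOT necessarily the PR's exact code): map
# torchvision::deform_conv2d onto the native ONNX DeformConv op (opset 19).
@parse_args("v", "v", "v", "v", "v", "i", "i", "i", "i", "i", "i", "i", "i", "b")
def symbolic_deform_conv_19(g, input, weight, offset, mask, bias,
                            stride_h, stride_w, pad_h, pad_w,
                            dilation_h, dilation_w, groups, offset_groups,
                            use_mask):
    # ONNX DeformConv input order is X, W, offset, B, mask; a production
    # version should also handle use_mask=False (empty mask tensor) explicitly.
    return g.op(
        "DeformConv", input, weight, offset, bias, mask,
        dilations_i=[dilation_h, dilation_w],
        group_i=groups,
        offset_group_i=offset_groups,
        pads_i=[pad_h, pad_w, pad_h, pad_w],
        strides_i=[stride_h, stride_w],
    )

# Register for opset 19 so torch.onnx.export(..., opset_version=19) uses it.
register_custom_op_symbolic("torchvision::deform_conv2d",
                            symbolic_deform_conv_19, 19)
```

With the symbolic registered, exporting with `dynamic_axes={"input": {0: "batch"}}` should yield the dynamic batch dimension this PR advertises.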