
Conversation

@blueberrycongee

Summary

Add descriptive documentation to sgemm_sm80.cu explaining that it is actually an FP16xFP16 GEMM (HGEMM) tutorial using CuTe.

Problem

Issue #1686 reports that users cannot find an fp16 GEMM tutorial in the CuTe examples.

Solution

The existing sgemm_sm80.cu already implements FP16 GEMM using cute::half_t, but this was not documented. This PR adds a documentation block clarifying:

  • This example uses FP16 data types despite the "sgemm" filename
  • Key features: Tensor Cores, cp.async, pipelining, swizzled shared memory
  • Usage examples

Fixes #1686
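
For context, below is a hypothetical sketch of the FP16 type configuration this documentation describes. The alias names (TA, TB, TC, TI) and the MMA atom follow the CuTe tutorial style but may not match sgemm_sm80.cu line for line.

```cpp
// Hypothetical sketch: FP16 element types in the CuTe tutorial style.
// The alias names are illustrative and may not match sgemm_sm80.cu exactly.
#include <cute/tensor.hpp>

using TA = cute::half_t;  // element type of A (FP16)
using TB = cute::half_t;  // element type of B (FP16)
using TC = cute::half_t;  // element type of C (FP16)
using TI = cute::half_t;  // type of the alpha/beta scalars

static_assert(sizeof(cute::half_t) == 2, "half_t is a 16-bit type");

// On SM80 these types would dispatch to an FP16 Tensor Core MMA atom,
// e.g. (illustrative):
// auto tiled_mma = cute::make_tiled_mma(cute::SM80_16x8x16_F16F16F16F16_TN{});
```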

@hwu36 (Collaborator) commented Jan 7, 2026

If the accumulation type is FP32, it is SGEMM; if the accumulation type is FP16, it is HGEMM.
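
To make that naming rule concrete, here is a minimal CUDA sketch (illustrative only, not CUTLASS or CuTe API): both inner products take FP16 inputs and differ only in the accumulator type.

```cpp
// Minimal sketch of the naming rule above. Both kernels consume FP16 inputs;
// only the accumulation type differs.
#include <cuda_fp16.h>

// HGEMM-style inner product: FP16 inputs, FP16 accumulation.
__global__ void dot_fp16_acc(const __half* a, const __half* b, __half* out, int k) {
    __half acc = __float2half(0.0f);
    for (int i = 0; i < k; ++i)
        acc = __hfma(a[i], b[i], acc);                   // accumulate in FP16
    *out = acc;
}

// SGEMM-style (by the rule above) inner product: FP16 inputs, FP32 accumulation.
__global__ void dot_fp32_acc(const __half* a, const __half* b, float* out, int k) {
    float acc = 0.0f;
    for (int i = 0; i < k; ++i)
        acc += __half2float(a[i]) * __half2float(b[i]);  // accumulate in FP32
    *out = acc;
}
```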


Development

Successfully merging this pull request may close these issues.

[QST] Is there any fp16xfp16 GEMM sample using CUTE with a performance comparable to cublas?
