I may be slow to respond.
Pinned
-
LancerLab/choreo
Public. An ultra-accessible DSL for high-performance kernel programming
C++ 4
-
LancerLab/kebab
Public. Dissecting Hopper/Ampere performance with microbenchmarks and GEMM best practices.
Cuda 3
-
sparse-gemm-with-hopper-sptc
Public. A minimal MatMul/GEMM case for using WGMMA + structured sparsity on Hopper
Cuda 2
-
Samoyeds
Public. Forked from LancerLab/Samoyeds
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores (EuroSys'25)
Jupyter Notebook

