feat: SM-constraint-GEMM by triton persistent kernel (#982)
Pull request merge
yzh119 pushed 1 commit to main • d7a9234…5751fc6 • 21 hours ago
perf: prefetch page indices for mla kernel (#991)
Pull request merge
yzh119 pushed 1 commit to main • 17ff5a7…d7a9234 • yesterday
misc: fix devcontainer conda path (#989)
Pull request merge
yzh119 pushed 1 commit to main • 72f00bc…17ff5a7 • 2 days ago
ci: add torch 2.6+cu126 wheel (#985)
Pull request merge
misc: update devcontainer (#986)
Pull request merge
yzh119 pushed 1 commit to main • afa9332…31cfe10 • 2 days ago
ci: switch to on-demand instances if spot instance is interrupted (#987)
Pull request merge
yzh119 pushed 1 commit to main • 86da6b8…afa9332 • 2 days ago
misc: Rename output_emitted_token_num -> `output_emitted_draft_toke…
Pull request merge
yzh119 pushed 1 commit to main • bb028cc…86da6b8 • 3 days ago
feat: Allow passing workspace base directory via environment variable (…
Pull request merge
yzh119 pushed 1 commit to main • 893172c…bb028cc • 3 days ago
triton: Triton `rms_norm` kernels (#983)
Pull request merge
yzh119 pushed 1 commit to main • 77ccda8…893172c • 3 days ago
misc: Use environment variable to control JIT verbose flag (#981)
Pull request merge
yzh119 pushed 1 commit to main • 3a69560…77ccda8 • 3 days ago
bugfix: Fix compilation with FP16_QK_REDUCTION enabled. (#962)
Pull request merge
yzh119 pushed 1 commit to main • bc81a59…3a69560 • 4 days ago
release: bump version to v0.2.4 (#980)
Pull request merge
yzh119 pushed 1 commit to main • 60d37b7…bc81a59 • 4 days ago
perf: Use 2WG pipeline design for MLA implementation on Hopper (#952)
Pull request merge
yzh119 pushed 1 commit to main • e19cb7b…60d37b7 • 4 days ago
perf: dual pivot top-p/top-k renorm (#974)
Pull request merge
yzh119 pushed 1 commit to main • 588c2fb…e19cb7b • 6 days ago
benchmark: add sampling.renorm benchmarks (#970)
Pull request merge
yzh119 pushed 1 commit to main • 55a6668…588c2fb • 7 days ago
bugfix: Fix POD JIT bugs (#971)
Pull request merge
yzh119 pushed 1 commit to main • 61e049a…55a6668 • 8 days ago
perf: Fix python API overhead when CUDAGraph is not enabled (#969)
Pull request merge
yzh119 pushed 1 commit to main • f65b93f…61e049a • 9 days ago
feat: Added tvm binding for sampling kernel (#958)
Pull request merge
yzh119 pushed 1 commit to main • 86b12ad…f65b93f • 9 days ago
perf: reduce torch.library dispatch overhead (#968)
Pull request merge
yzh119 pushed 1 commit to main • bb49fac…86b12ad • 11 days ago
doc: remove misleading docstring about `non_blocking` (#966)
Pull request merge
yzh119 pushed 1 commit to main • 034fc18…bb49fac • 11 days ago
bugfix: Fix compilation on cuda 12.2 (#961)
Pull request merge
yzh119 pushed 1 commit to main • 2be9ad7…034fc18 • 14 days ago
ci: improve jenkins (#943)
Pull request merge
yzh119 pushed 1 commit to main • 594febe…2be9ad7 • 15 days ago
misc: Temporarily disable POD from AOT wheels (#956)
Pull request merge
yzh119 pushed 1 commit to main • 30b2838…594febe • 15 days ago
misc: Temporarily disable POD from AOT wheels
yzh119 pushed 1 commit to main • 211dfc6…30b2838 • 16 days ago