Activity

feat: SM-constraint-GEMM by triton persistent kernel (#982)

Pull request merge
yzh119 pushed 1 commit to main • d7a9234…5751fc6 • 21 hours ago

perf: prefetch page indices for mla kernel (#991)

Pull request merge
yzh119 pushed 1 commit to main • 17ff5a7…d7a9234 • yesterday

misc: fix devcontainer conda path (#989)

Pull request merge
yzh119 pushed 1 commit to main • 72f00bc…17ff5a7 • 2 days ago

ci: add torch 2.6+cu126 wheel (#985)

Pull request merge
MasterJH5574 pushed 1 commit to main • 31cfe10…72f00bc • 2 days ago

misc: update devcontainer (#986)

Pull request merge
yzh119 pushed 1 commit to main • afa9332…31cfe10 • 2 days ago

ci: switch to on-demand instances if spot instance is interrupted (#987)

Pull request merge
yzh119 pushed 1 commit to main • 86da6b8…afa9332 • 2 days ago

Deleted branch

yzh119 deleted nandor/norm • 3 days ago

misc: Rename output_emitted_token_num -> `output_emitted_draft_toke…

Pull request merge
yzh119 pushed 1 commit to main • bb028cc…86da6b8 • 3 days ago

feat: Allow passing workspace base directory via environment variable (

Pull request merge
yzh119 pushed 1 commit to main • 893172c…bb028cc • 3 days ago

triton: Triton rms_norm kernels (#983)

Pull request merge
yzh119 pushed 1 commit to main • 77ccda8…893172c • 3 days ago

misc: Use environment variable to control JIT verbose flag (#981)

Pull request merge
yzh119 pushed 1 commit to main • 3a69560…77ccda8 • 3 days ago

Triton rms_norm kernels

nandor created nandor/norm • a65242c • 4 days ago

bugfix: Fix compilation with FP16_QK_REDUCTION enabled. (#962)

Pull request merge
yzh119 pushed 1 commit to main • bc81a59…3a69560 • 4 days ago

release: bump version to v0.2.4 (#980)

Pull request merge
yzh119 pushed 1 commit to main • 60d37b7…bc81a59 • 4 days ago

perf: Use 2WG pipeline design for MLA implementation on Hopper (#952)

Pull request merge
yzh119 pushed 1 commit to main • e19cb7b…60d37b7 • 4 days ago

perf: dual pivot top-p/top-k renorm (#974)

Pull request merge
yzh119 pushed 1 commit to main • 588c2fb…e19cb7b • 6 days ago

benchmark: add sampling.renorm benchmarks (#970)

Pull request merge
yzh119 pushed 1 commit to main • 55a6668…588c2fb • 7 days ago

bugfix: Fix POD JIT bugs (#971)

Pull request merge
yzh119 pushed 1 commit to main • 61e049a…55a6668 • 8 days ago

perf: Fix python API overhead when CUDAGraph is not enabled (#969)

Pull request merge
yzh119 pushed 1 commit to main • f65b93f…61e049a • 9 days ago

feat: Added tvm binding for sampling kernel (#958)

Pull request merge
yzh119 pushed 1 commit to main • 86b12ad…f65b93f • 9 days ago

perf: reduce torch.library dispatch overhead (#968)

Pull request merge
yzh119 pushed 1 commit to main • bb49fac…86b12ad • 11 days ago

doc: remove misleading docstring about non_blocking (#966)

Pull request merge
yzh119 pushed 1 commit to main • 034fc18…bb49fac • 11 days ago

Deleted branch

yzh119 deleted lequn/0317-pod-aot • 11 days ago

Deleted branch

yzh119 deleted remove-non-blocking-docstring • 11 days ago

remove-non-blocking-docstring

yzh119 created remove-non-blocking-docstring • 74f3336 • 11 days ago

bugfix: Fix compilation on cuda 12.2 (#961)

Pull request merge
yzh119 pushed 1 commit to main • 2be9ad7…034fc18 • 14 days ago

ci: improve jenkins (#943)

Pull request merge
yzh119 pushed 1 commit to main • 594febe…2be9ad7 • 15 days ago

misc: Temporarily disable POD from AOT wheels (#956)

Pull request merge
yzh119 pushed 1 commit to main • 30b2838…594febe • 15 days ago

misc: Temporarily disable POD from AOT wheels

abcdabcd987 created lequn/0317-pod-aot • 3af6e10 • 15 days ago

bugfix: bugfix to #949 (#951)

Pull request merge
yzh119 pushed 1 commit to main • 211dfc6…30b2838 • 16 days ago