Issues: kubeflow/trainer
Issues list
[feature] consolidate workflow jobs into one
kind/feature
lifecycle/needs-triage
#2532
opened Mar 16, 2025 by
mahdikhashan
KEP-2170: Add manifest overlays for standalone installation
kind/feature
lifecycle/needs-triage
#2526
opened Mar 16, 2025 by
Doris-xm
Support TrainJob ResourcePerNode in CoScheduling plugin
area/controller
kind/feature
#2525
opened Mar 15, 2025 by
tenzen-y
Fix the Coveralls badge for the Go test coverage
area/testing
good first issue
help wanted
kind/bug
#2519
opened Mar 13, 2025 by
andreyvelich
KEP-2401: Determine the tag for torchtune trainer & Add support for multiple accelerators
area/llm
kind/feature
#2518
opened Mar 13, 2025 by
Electronic-Waste
Create DeepSpeed Runtime with Kubeflow Trainer
area/runtimes
kind/feature
#2517
opened Mar 13, 2025 by
andreyvelich
Get and Use TrainingRuntime ApplyConfiguration throughout KF PipelineFramework
area/controller
kind/feature
#2515
opened Mar 13, 2025 by
tenzen-y
KEP-2401: Create LLM Training Runtimes for Llama 3.2 model family
area/llm
area/runtimes
kind/feature
#2510
opened Mar 12, 2025 by
Electronic-Waste
KEP-2401: Create LLM Training Runtimes for Llama 3.1 model family
area/llm
area/runtimes
kind/feature
#2509
opened Mar 12, 2025 by
Electronic-Waste
KEP-2401: Validate fine-tuning configurations in torch plugin
area/llm
kind/feature
#2508
opened Mar 12, 2025 by
Electronic-Waste
KEP-2401: Complement torch plugin to support torchtune config mutation
area/llm
kind/feature
#2507
opened Mar 12, 2025 by
Electronic-Waste
KEP-2401: Support mutating dataset preprocessing config in SDK
area/llm
area/sdk
kind/feature
#2506
opened Mar 12, 2025 by
Electronic-Waste
KEP-2401: Support LoRA/QLoRA/DoRA fine-tuning in LLM Trainer V2
area/llm
area/sdk
kind/feature
#2505
opened Mar 12, 2025 by
Electronic-Waste
KEP-2401: Add TorchTuneConfig to train() API
area/llm
area/sdk
kind/feature
#2504
opened Mar 12, 2025 by
Electronic-Waste
Add replicatedJobs.replicas validations to TrainingRuntime and ClusterTrainingRuntime Webhook
kind/feature
#2502
opened Mar 12, 2025 by
tenzen-y
Update Kubeflow Pipeline Framework Diagram and Description with PodNetworkPlugin
kind/documentation
kind/feature
#2497
opened Mar 10, 2025 by
tenzen-y
Migrate Trainer to PodSet and RuntimePolicy in runtime package (InternalAPI)
area/controller
kind/cleanup
#2495
opened Mar 10, 2025 by
tenzen-y
Add a workflow for publishing Helm charts
area/deployment
good first issue
help wanted
kind/feature
#2488
opened Mar 7, 2025 by
ChenYi015
Decouple UTs between Framework and Plugins packages
area/controller
kind/feature
#2468
opened Mar 3, 2025 by
tenzen-y
2 of 6 tasks
Explore uv project manager for Kubeflow Python SDK
area/sdk
good first issue
help wanted
kind/discussion
kind/feature
#2462
opened Feb 28, 2025 by
andreyvelich
KEP-2170: Revisit TrainJob Created condition status type
kind/feature
#2459
opened Feb 28, 2025 by
tenzen-y
Distributed training with multiple pods, with multi-gpu in each pod
#2456
opened Feb 28, 2025 by
githubthunder