Modern PyTorch Guide
Triton Kernels
Python-based GPU programming

OpenAI Triton makes GPU kernel development accessible: you write kernels in Python using block-level operations, and the Triton compiler handles low-level concerns such as memory coalescing, shared-memory management, and scheduling that hand-written CUDA would require you to manage yourself.
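As a minimal sketch of what this looks like in practice, here is the canonical element-wise vector-add kernel from the Triton tutorials. Each program instance processes one `BLOCK_SIZE` chunk of the input, with a mask guarding the tail. This assumes a CUDA-capable GPU and the `triton` package installed alongside PyTorch.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask out-of-bounds lanes so the last block doesn't read/write past the end.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    # 1D launch grid: one program per BLOCK_SIZE elements.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out

x = torch.rand(98432, device="cuda")
y = torch.rand(98432, device="cuda")
torch.testing.assert_close(add(x, y), x + y)
```

The `@triton.jit` decorator compiles the function to GPU code at first launch; `BLOCK_SIZE` is a compile-time constant (`tl.constexpr`), so different block sizes produce different specializations of the kernel.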