Modern PyTorch Guide
Foundations
Building Models
Performance
Domains
Production
Advanced
API Reference
Community
Forums
torch.compile
Introduction to torch.compile
Compiler Backends
Graph Breaks
Dynamic Shapes
Inductor
AOTAutograd
Troubleshooting
Compiled Autograd
Regional Compilation
Cache Control
torch.export
Overview
IR Spec
Control Flow
Dynamic Shapes
AOTInductor
Quantization
Quantization Overview
Post-Training Quantization
Quantization-Aware Training
Dynamic Quantization
FX Graph Mode
Backends
Distributed Training
Distributed Training Overview
DistributedDataParallel (DDP)
Fully Sharded Data Parallel (FSDP)
FSDP2
DeepSpeed Integration
Tensor Parallelism
Pipeline Parallelism
Context Parallelism
RPC Framework
Zero Redundancy Optimizer
Communication Hooks
Join Context Manager
Monarch
Hardware Utilization
CUDA Semantics
GPU Memory Management
Multi-GPU Setup
NVLink & InfiniBand
Automatic Mixed Precision (AMP)
Channels Last
MPS Backend
HIP (ROCm)
Intel GPU
Memory Optimization
Gradient Checkpointing
Mixed Precision Deep Dive
CPU/Disk Offloading
Activation Checkpointing
Profiling & Benchmarking
PyTorch Profiler
TensorBoard Profiling
Inductor Profiling
Benchmark Utils
Bottleneck
Flight Recorder
Code Transforms
Overview
Graph Transformations
Fusion Patterns
Experimental