Customers running machine learning models that are sensitive to inference latency and throughput can use Inf1 instances for high-performance, cost-effective inference. For ML models that are less sensitive to inference latency and throughput, customers can use EC2 C5 instances and take advantage of the AVX-512/VNNI instruction set. For ML models that require access to NVIDIA's CUDA, cuDNN, or TensorRT libraries, we recommend using G4 instances.
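The decision logic above can be sketched as a small helper function. This is an illustrative sketch only — the function name and parameters (`recommend_instance_family`, `latency_sensitive`, `needs_cuda_stack`) are hypothetical, not part of any AWS API; the mapping simply encodes the guidance in the paragraph.

```python
def recommend_instance_family(latency_sensitive: bool, needs_cuda_stack: bool) -> str:
    """Suggest an EC2 instance family for ML inference.

    Hypothetical helper encoding the guidance above:
    - models needing NVIDIA CUDA/cuDNN/TensorRT -> G4 (NVIDIA GPU)
    - latency/throughput-sensitive models       -> Inf1 (AWS Inferentia)
    - everything else                           -> C5 (AVX-512/VNNI on CPU)
    """
    if needs_cuda_stack:
        return "g4"
    if latency_sensitive:
        return "inf1"
    return "c5"


# Example: a latency-sensitive model with no CUDA dependency maps to Inf1.
print(recommend_instance_family(latency_sensitive=True, needs_cuda_stack=False))
```

Note the ordering: the CUDA-stack check comes first, because a model that depends on CUDA/cuDNN/TensorRT needs an NVIDIA GPU regardless of its latency profile.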
© Copyright 2018-2020 www.madanswer.com. All rights reserved. Developed by Madanswer.