May 26, 2020 in Amazon Elastic Compute Cloud EC2
Q: How do I take advantage of AWS Inferentia’s NeuronCore Pipeline capability to lower latency in Amazon EC2?

1 Answer

0 votes
May 26, 2020

Inf1 instances with multiple Inferentia chips, such as inf1.6xlarge or inf1.24xlarge, support a fast chip-to-chip interconnect. Using the NeuronCore Pipeline capability, you can split your model and load it into local cache memory across multiple chips. The Neuron compiler uses an ahead-of-time (AOT) compilation technique to analyze the input model and partition it to fit across the on-chip memory of one or more Inferentia chips. This gives the NeuronCores high-speed access to the model without round trips to off-chip memory, keeping latency bounded while increasing overall inference throughput. See the sketch below for how the pipeline core count is passed to the compiler.
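As a minimal sketch, here is how this might look with the PyTorch-Neuron flow: the model is compiled ahead of time and the number of NeuronCores to pipeline across is passed as a compiler argument. This assumes the torch-neuron torch.neuron.trace API and the neuron-cc --neuroncore-pipeline-cores flag; the ResNet-50 model and the core count of 16 (inf1.6xlarge has 4 Inferentia chips with 4 NeuronCores each) are illustrative, and exact flag names can vary by Neuron SDK version.

```python
import torch
import torch_neuron  # registers torch.neuron (AWS Neuron SDK for PyTorch)
from torchvision import models

# Load a pretrained model and switch to inference mode
model = models.resnet50(pretrained=True)
model.eval()

# Example input used by the AOT compiler to trace the model
example = torch.rand(1, 3, 224, 224)

# Compile ahead of time, asking the Neuron compiler to partition the model
# across 16 NeuronCores (illustrative value for inf1.6xlarge: 4 chips x 4 cores)
model_neuron = torch.neuron.trace(
    model,
    example_inputs=[example],
    compiler_args=['--neuroncore-pipeline-cores', '16'],
)

# Save the compiled artifact; load it on the Inf1 instance to run pipelined inference
model_neuron.save('resnet50_neuron_pipeline.pt')
```

At runtime the compiled model keeps its weights resident in the on-chip caches of the pipelined NeuronCores, which is what avoids the off-chip memory accesses described above.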
