Pose Estimation Benchmarks on intelligent edge

Benchmarks on Google Coral, Movidius Neural Compute Stick, Raspberry Pi and others

Introduction

In an earlier article, we covered running PoseNet on Movidius. We saw that we were able to achieve 30FPS with acceptable accuracy. In this article we are going to evaluate PoseNet on the following mix of hardware:

  1. Raspberry Pi 3B
  2. Movidius NCS + RPi 3B
  3. Ryzen 3
  4. GTX1030 + Ryzen 3
  5. Movidius NCS + Ryzen 3
  6. Google Coral + RPi 3B
  7. Google Coral + Ryzen 3
  8. GTX1080 + i7 7th Gen

This is a comparison of PoseNet’s performance across hardware, to help decide which hardware to use for a specific use case, if optimizations can help. It also gives a glimpse into hardware capabilities in the wild. The hardware included a range from baseline prototyping platforms to tailored for edge to production-grade CPUs.

Hardware Choices

  1. Raspberry Pi: The board of choice for prototyping, although low powered, gives a good initial understanding of what to expect and what to choose for production. It may not be able to run the DNN models, but it sure is fun.
  2. Movidius NCS + RPi 3B: Movidius Neural Compute Stick is a promising candidate if the model is to be run on the edge. NCS has Vision Processing Units (VPU) which are optimized to run deep neural networks.
  3. Ryzen 3: AMD’s quad-core CPUs are not a conventional choice for neural networks, but it is worth checking how the networks perform on the platform.
  4. GTX1030 + Ryzen 3: Adding an Nvidia GPU to the rig (granted, it is comparatively old but it is cheap) allows us to benchmark what is possible on older cuDNN versions and GPUs.
  5. Movidius NCS + Ryzen 3: A desktop system allows for better and faster interfacing with the NCS. This setup is preferred during prototyping your edge application. Having a high performance CPU allows rapid application development while NCS gives the ability to run your models on your development laptop.
  6. Google Coral + RPi 3B: Google’s answer to on-edge ML is their Coral board which has TPUs. Tensor Processing Units are used by Google’s gigantic AI systems. Coral puts the compute power of TPUs on small form factor. It has native support for Raspberry Pi too.
  7. Google Coral + Ryzen 3: As we mentioned in Movidius NCS + Ryzen 3 section, it is going to be insightful to see how Coral interfaces with Ryzen 3 based computer.
  8. GTX1080 + i7 7th Gen: Top of the line system with GTX1080 and Intel i7 CPU. This is the highest performing combination in the list.

Repositories and models used:

  1. PoseNet — tfjs version
  • Based on MobileNetV1_050
  • Based on MobileNetV1_075
  • Based on MobileNetV1_100

2. PoseNet — Google Coral version

3. Read our previous blog post to get Movidius versions of PoseNet

Comparing Edge Compute Units

Google Coral’s PoseNet repository provides a model based on MobileNet 0.75 which is optimized specifically for Coral. At the time of writing, the details of the optimizations have not been provided and it is not possible to generate models for MobileNet 0.50 and 1.00.

Google Coral vs Intel Movidius

The optimized Coral model gives an exceptional performance of 77FPS with Ryzen 3 system. However, the same model gives ~9FPS when running on Raspberry Pi.

Movidius shows differences in performance with RPi and Ryzen, with the general pattern being faster on the Ryzen 3 system

Comparing Desktop CPUs and GPUs

The results are aligning with expectations while comparing CPU with GTX 1030 and GTX 1080. The high-end GPU outperforms the other candidates by a huge margin. However, the competition between Ryzen 3 and GTX 1030 is close.

Ryzen vs GTX 1030 vs GTX 1080

Final Thoughts

The following chart shows frames per second for a standard video input:

Frames per second

Google Coral, when paired with a desktop computer outperforms every other platform — including GTX1080.

Other noteworthy results are:

  1. When paired with Raspberry Pi 3, Coral gives ~9FPS. The reason behind the result is not yet explained but is being looked into.
  2. GTX1080 performs almost equally regardless of the model size.
  3. Movidius NCS performs better than GTX1030.
  4. Raspberry Pi is not able to run the models at all.

Different hardware gives a different flavor of performance, and there is scope for model optimization (quantization for example). It may not always be necessary to go with a high-end GPU such as GTX 1080 if your use case allows for a good trade-off between accuracy and speed/latency.

Our analysis shows that choosing the right hardware coupling with a well-optimized neural network is essential and may require in-depth comparative analysis.

Leave a Reply

Your email address will not be published. Required fields are marked *