What is TPU (Tensor Processing Unit) and how is it used?


Google's TensorFlow platform allows users to train AI by providing tools and resources for machine learning. AI engineers have long used traditional CPUs and GPUs to train AI. While these processors can handle a wide variety of machine learning processes, they are still general-purpose hardware, used for a variety of everyday tasks.

To speed up AI training, Google developed an application-specific integrated circuit (ASIC) called the Tensor Processing Unit (TPU). But what exactly is a Tensor Processing Unit, and how does it speed up AI training?

What is Tensor Processing Unit (TPU)?

The Tensor Processing Unit is Google's ASIC for machine learning, used specifically for deep learning to solve complex matrix and vector operations. TPUs are streamlined to perform matrix and vector math at very high speed, but they must be paired with a CPU, which issues the instructions they execute. TPUs can only be used with Google's TensorFlow or TensorFlow Lite platform, whether through the cloud or with a lite version on local hardware.
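In TensorFlow 2, that CPU-TPU pairing is exposed through a distribution strategy: the CPU host locates and initializes the TPU, then replicates work across its cores. Here is a minimal sketch of the setup (the empty tpu='' argument assumes a managed environment such as Colab; on Cloud TPU you would pass your TPU's name):

```python
import tensorflow as tf

# The CPU host resolves the TPU's address, connects to it, and initializes it;
# tpu='' assumes a managed environment (e.g. Colab) that knows its own TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates computation across all available TPU cores
strategy = tf.distribute.TPUStrategy(resolver)
print('TPU cores available:', strategy.num_replicas_in_sync)
```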

Applications for TPU


Google has been using TPUs since 2015. It has confirmed using these processors for text processing in Google Street View, in Google Photos, and in Google search results (RankBrain), as well as to build AlphaGo, the AI that defeated the strongest Go players, and AlphaZero, the system that beat top programs in chess, Go, and shogi.

TPUs can be used in various deep learning applications, such as fraud detection, computer vision, natural language processing, self-driving cars, voice AI, agriculture, virtual assistants, stock trading, e-commerce, and various social predictions.

When to use TPU?

Since TPU is highly specialized hardware for deep learning, it loses a lot of the other functionality you would normally expect from a general-purpose processor like a CPU. With this in mind, there are specific situations where the use of TPU will yield the best results when training AI.

The best time to use a TPU is for models that rely heavily on matrix calculations, like a recommendation system for a search engine. TPUs also deliver great results for models in which the AI analyzes enormous numbers of data points that would otherwise take weeks or months to process. AI engineers also reach for TPUs when no pre-built TensorFlow model exists and they must train from scratch.
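As a rough illustration of a matrix-heavy workload, here is a minimal Keras sketch: each Dense layer below is one large matrix multiplication, which is exactly the operation the TPU accelerates. The layer sizes are arbitrary assumptions, and strategy is the TPUStrategy from the earlier snippet:

```python
import tensorflow as tf

# Built under strategy.scope(), so the model's variables live on the TPU.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4096,)),
        tf.keras.layers.Dense(4096, activation='relu'),  # a 4096x4096 matmul
        tf.keras.layers.Dense(4096, activation='relu'),  # another large matmul
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy')
```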

When not to use TPU?

As stated earlier, the TPU's specialization means these processors work well only on specific workloads. There are therefore cases where a traditional CPU or GPU will yield faster results. These cases include:

  1. Rapid prototyping that requires maximum flexibility
  2. Models limited by the available data points
  3. Simple models that can be trained quickly
  4. Models that are too difficult to change
  5. Models that depend on custom TensorFlow operations written in C++

TPU versions and specifications


Since Google first announced the TPU, it has kept the public updated on the latest TPU versions and their specifications. The following is a list of all TPU versions and their specifications:

                           TPUv1    TPUv2    TPUv3    TPUv4    Edge v1
  Date introduced          2016     2017     2018     2021     2018
  Process node (nm)        28       16       16       7        -
  Die size (mm²)           331      <625     <700     <400     -
  On-chip memory (MiB)     28       32       32       144      -
  Clock speed (MHz)        700      700      940      1050     -
  Memory (GB)              8 DDR3   16 HBM   32 HBM   32 HBM   -
  TDP (watts)              75       280      450      175      2
  TOPS (tera-ops/second)   23       45       90       ?        4
  TOPS/W                   0.3      0.16     0.2      ?        2

As you can see, TPU clock speeds don't look very impressive, especially since modern desktop processors can run 3 - 5 times faster. But look at the bottom two rows of the table: the datacenter TPUs process 23 - 90 tera-operations per second at an efficiency of 0.16 - 0.3 TOPS per watt. Google has estimated that, for neural network inference, the TPU runs 15 - 30 times faster than the contemporary CPUs and GPUs it was benchmarked against.
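The efficiency figure in the last row is simply throughput divided by power draw. A quick sanity check using the table's own numbers:

```python
# TOPS per watt = throughput / power draw, using values from the table above
specs = {
    'TPUv1':   (23, 75),   # (TOPS, TDP in watts)
    'TPUv2':   (45, 280),
    'TPUv3':   (90, 450),
    'Edge v1': (4, 2),
}
for name, (tops, watts) in specs.items():
    print(f'{name}: {tops / watts:.2f} TOPS/W')
# Prints roughly 0.31, 0.16, 0.20 and 2.00, matching the table's last row
```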

With each release, the new TPU shows significant improvements and capabilities. Here are a few highlights for each version.

  1. TPUv1: The first publicly announced TPU. Designed as an 8-bit matrix multiplication engine and limited to integer math only (a quantization sketch follows this list).
  2. TPUv2: Engineers noted that TPUv1 was bandwidth-constrained, so this version doubles the memory bandwidth and carries 16GB of RAM. It also handles floating-point numbers, making it useful for training as well as inference.
  3. TPUv3: Released in 2018, TPUv3 has twice as many processors and is deployed with 4 times as many chips as TPUv2. These upgrades give this version up to 8 times the performance of its predecessor.
  4. TPUv4: The latest version of the TPU, announced on May 18, 2021. Google's CEO announced that this version will have more than twice the performance of TPUv3.
  5. Edge TPU: This TPU version is built for smaller workloads and is optimized to use far less power than the other versions. Despite drawing only 2 watts, the Edge TPU can handle up to 4 tera-operations per second. It is found only in small devices, such as Google's Pixel 4 smartphone.
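TPUv1's integer-only design and the Edge TPU's 8-bit requirement correspond to what TensorFlow Lite calls full-integer quantization. Here is a minimal sketch, assuming model is an already-trained Keras model (such as the one sketched earlier) and that random calibration data stands in for real samples:

```python
import numpy as np
import tensorflow as tf

def representative_data():
    # Sample inputs let the converter calibrate the float-to-int8 value ranges;
    # the (1, 4096) shape matches the hypothetical model sketched earlier.
    for _ in range(100):
        yield [np.random.rand(1, 4096).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force every op to 8-bit integers, the format these TPUs execute natively
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
```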

How to access TPUs? Who can use them?

TPUs are proprietary processing units designed by Google for use with its TensorFlow platform. Third-party access to these processors has been allowed since 2018. Today, TPUs (with the exception of the Edge TPU) can only be accessed through Google's cloud computing services. Edge TPU hardware, on the other hand, can be bought via Google's Pixel 4 smartphone and its prototyping kit, Coral.

Coral is a USB accelerator that uses USB 3.0 Type-C for data and power. It gives your device Edge TPU computing capable of 4 TOPS on 2W of power. The kit runs on Windows 10, macOS, and Debian Linux machines (it also works with a Raspberry Pi).
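Running a model on the Coral accelerator looks like ordinary TensorFlow Lite inference with the Edge TPU delegate loaded. A minimal sketch, assuming the Coral runtime is installed and model_edgetpu.tflite is a hypothetical model already compiled for the Edge TPU:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# load_delegate hands execution to the Edge TPU via Coral's runtime library
interpreter = tflite.Interpreter(
    model_path='model_edgetpu.tflite',
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy uint8 input matching the model's expected shape
dummy = np.zeros(input_details[0]['shape'], dtype=np.uint8)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]['index'])
```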

Other Dedicated AI Accelerators

With artificial intelligence trending over the past decade, Big Tech has been constantly looking for ways to make machine learning as fast and efficient as possible. Although Google's TPU is arguably the most popular ASIC developed for deep learning, other tech companies such as Intel, Microsoft, Alibaba, and Qualcomm have also developed their own AI accelerators. These include Microsoft Brainwave, the Intel Neural Compute Stick, and Graphcore's IPU (Intelligence Processing Unit).

But while more AI hardware is in development, most of it has yet to hit the market, and much of it never will. At the time of writing, if you really want to buy AI-accelerated hardware, the most popular options are the Coral prototyping kit, the Intel NCS, a Graphcore Bow Pod, or the Asus IoT AI Accelerator. If you just want access to specialized AI hardware, you can use Google's cloud TPU services or alternatives such as Microsoft Brainwave.
