Microsoft announces DeepSpeed, a new deep learning library that supports training super-large-scale AI models

Microsoft Research has recently caught the attention of artificial intelligence (AI) researchers by announcing DeepSpeed, a deep learning optimization library that can be used to train huge AI models with up to 100 billion parameters.

In AI training, larger natural language models generally deliver greater accuracy. However, training large natural language models takes a great deal of time and money. DeepSpeed was built to overcome these difficulties across four dimensions: speed, cost, training scale, and usability.

In addition, Microsoft noted that DeepSpeed includes ZeRO (Zero Redundancy Optimizer), a parallelized optimizer that greatly reduces the resources a model needs while increasing the number of parameters that can be trained. Using DeepSpeed and ZeRO together, Microsoft researchers were able to develop the Turing Natural Language Generation (Turing-NLG) model, currently the largest published language model at 17 billion parameters.
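
To give a sense of what enabling ZeRO looks like in practice, below is a minimal sketch of a DeepSpeed configuration with ZeRO stage 1 turned on. The keys follow DeepSpeed's documented configuration format, but the batch size, precision, and optimizer values are illustrative placeholders, not settings from Microsoft's experiments.

```python
# A minimal DeepSpeed configuration sketch with ZeRO stage 1 enabled.
# All numeric values are illustrative placeholders.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},  # mixed-precision training
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4},
    },
    # Stage 1 of ZeRO partitions optimizer states across data-parallel
    # workers instead of replicating them, reducing per-GPU memory use.
    "zero_optimization": {"stage": 1},
}
```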

Some highlights of DeepSpeed:

  1. Scale: Large, state-of-the-art AI models such as OpenAI's GPT-2, NVIDIA's Megatron-LM, and Google's T5 have 1.5 billion, 8.3 billion, and 11 billion parameters, respectively. ZeRO stage 1 in DeepSpeed provides the system support to run models with up to 100 billion parameters, roughly 10 times larger than Google's T5, the largest of these models.
  2. Speed: Measured throughput gains vary with the hardware configuration. On NVIDIA GPU clusters with low-bandwidth interconnects (without NVIDIA NVLink or InfiniBand), DeepSpeed achieves 3.75 times higher throughput than Megatron-LM alone for a standard GPT-2 model with 1.5 billion parameters. On NVIDIA DGX-2 clusters with high-bandwidth interconnects, for models with 20 to 80 billion parameters, DeepSpeed is 3 to 5 times faster.
  3. Cost: These speed improvements translate directly into significantly lower training costs. For example, to train a model with 20 billion parameters, DeepSpeed requires three times fewer resources than usual.
  4. Usability: Only a few minor code changes are needed to migrate existing models to DeepSpeed and ZeRO; DeepSpeed does not require redesigning the code or refactoring the model. A sketch of such a migration follows this list.
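
To illustrate point 4, here is a minimal sketch, assuming a toy PyTorch model, of what such a migration might look like: the model is wrapped once with deepspeed.initialize, and the usual loss.backward() and optimizer.step() calls are replaced by the returned engine's methods. Passing the configuration as a Python dict via the config argument follows recent DeepSpeed releases; such a script is normally launched with the DeepSpeed launcher (e.g. `deepspeed train.py`).

```python
import torch
import torch.nn as nn
import deepspeed

# A toy network standing in for an existing PyTorch model;
# nothing about its definition is DeepSpeed-specific.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
criterion = nn.CrossEntropyLoss()

# Illustrative configuration; see the ZeRO sketch above for more keys.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},
}

# The main migration step: wrap the model once. DeepSpeed then handles
# distributed setup, the optimizer, and ZeRO partitioning internally.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    # Random tensors stand in for a real dataloader.
    inputs = torch.randn(8, 512, device=model_engine.device)
    labels = torch.randint(0, 10, (8,), device=model_engine.device)

    loss = criterion(model_engine(inputs), labels)
    model_engine.backward(loss)   # replaces loss.backward()
    model_engine.step()           # replaces optimizer.step()
```

The wrapping call and the two engine methods marked in the comments are, in essence, the only changes an existing training loop needs, which is what the usability claim above refers to.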

Microsoft has open-sourced both DeepSpeed and ZeRO on GitHub for anyone who wants to try them.
