In fact, whether GPGPUs should be used to augment CPUs is still debated among scientists. Many people accept that adding GPGPUs helps a supercomputer score higher on benchmarks, but in practice a great deal of effort has to go into tuning the software before the power of this combined architecture can actually be squeezed out.
Europe is also developing a supercomputer project that pairs Tegra 3 chips with GeForce GPUs. Specifically, the machine uses 256 quad-core NVIDIA Tegra 3 processors, 256 GeForce 520MX graphics processors and 1TB of DDR3 RAM. With this configuration it can reach 38 teraflops while consuming only 1W of power for every 7.5 gigaflops. NVIDIA calls this the first supercomputer to combine ARM CPUs with its GPUs. Mont-Blanc, the European research collaboration building the machine, promises that it will be four to ten times more energy-efficient than other supercomputers.
Heat dissipation and energy consumption
Over the decades, one of the main problems with centralized supercomputers has been power consumption and heat dissipation. Each supercomputer consumes a huge amount of energy, and most of it is converted into heat. When the system overheats, performance drops dramatically and component life is significantly shortened as well. Scientists have studied many ways to solve this problem, such as pumping Fluorinert coolant through the system, liquid cooling, air cooling, and hybrid liquid-to-air systems. The hybrid approach in particular is common in supercomputers made up of many cabinets. In addition, some supercomputers, such as IBM's Blue Gene, use low-power processors to cope with high temperatures. IBM's Aquasar is even more special: it uses hot-water cooling, and that water is then used to heat the whole building during the cold season.
To get a sense of how much electricity a supercomputer uses, take China's Tianhe-1A as an example: it draws 4.04 megawatts, which means its operator has to spend about 400 USD per hour on electricity, or roughly 3.5 million USD over a whole year. The cost of operating a supercomputer is therefore a serious problem in its own right.
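As a rough sanity check of those figures, the yearly bill follows directly from the power draw once you assume an electricity price; the roughly 0.10 USD per kWh used below is my assumption, since the article only quotes the totals. A minimal sketch:

```c
#include <stdio.h>

int main(void) {
    double power_mw      = 4.04;   /* Tianhe-1A power draw in megawatts (from the article) */
    double price_per_kwh = 0.10;   /* assumed electricity price in USD per kWh */

    double kwh_per_hour  = power_mw * 1000.0;            /* 1 MW sustained = 1000 kWh each hour */
    double cost_per_hour = kwh_per_hour * price_per_kwh;
    double cost_per_year = cost_per_hour * 24.0 * 365.0;

    printf("Cost per hour: %.0f USD\n", cost_per_hour);               /* ~404 USD */
    printf("Cost per year: %.1f million USD\n", cost_per_year / 1e6); /* ~3.5 million USD */
    return 0;
}
```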
To measure the energy efficiency of a supercomputer, people use FLOPS per watt: the higher the figure, the more calculations the machine performs for every watt of electricity it draws. In 2008, IBM's Roadrunner achieved 376 MFLOPS/W. In 2010, the Blue Gene/Q reached 1,684 MFLOPS/W, and in 2011 a Blue Gene system in New York reached 2,097 MFLOPS/W. As for what FLOPS actually is, I will talk about that below.
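The metric itself is nothing more than sustained performance divided by power draw. A minimal sketch with hypothetical numbers (both values below are invented purely to illustrate the unit conversion):

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical machine: 1.0 petaflops sustained at 2.5 megawatts. */
    double sustained_flops = 1.0e15;   /* FLOPS */
    double power_watts     = 2.5e6;    /* W */

    double mflops_per_watt = (sustained_flops / power_watts) / 1.0e6;

    printf("Efficiency: %.0f MFLOPS/W\n", mflops_per_watt);   /* 400 MFLOPS/W */
    return 0;
}
```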
Operating system
Since the end of the 20th century, supercomputer operating systems have gone through many changes, mirroring the changes in supercomputer architecture. The earliest operating systems were custom-built for each supercomputer to maximize speed, but that approach cost a great deal of time, labor and money. The current trend is therefore to use common operating systems such as Linux, rather than the UNIX variants of the 1990s. You can look at the chart right below to see this trend.
Among the 500 fastest computers in the world, Linux is the dominant operating system, and adoption of this open-source OS keeps growing. At the time of writing, Linux accounts for 93.8% of supercomputer market share. UNIX is also used on many supercomputers but is far less popular, holding about 4%. Windows and BSD are present as well, but their share is insignificant because they are not considered as reliable as Linux and UNIX and they carry licensing costs. Apple's OS X has also been used, but mainly in distributed supercomputers.
Software tools
Because of a supercomputer's parallel architecture, special programming techniques are usually needed to get the most out of it. The tools used for this are APIs such as MPI, PVM and VTL, alongside open-source solutions such as Beowulf. In most cases, PVM and MPI are used for clustered systems, while OpenMP targets shared-memory systems. Algorithms also need heavy optimization, because a supercomputer runs not on one but on many CPUs and GPUs, spread across separate cabinets. Scientists also have to minimize the time a CPU sits idle waiting for data to arrive from another node. On GPGPU supercomputers, frameworks such as NVIDIA's CUDA are used to squeeze out extra performance.
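To give a feel for the message-passing style mentioned above, here is a minimal MPI sketch: every process works on its own slice of a problem and the partial results are combined with a collective reduction. It is an illustrative toy, not code taken from any real supercomputer.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in total? */

    /* Each process sums its own slice of 1..1000000. */
    long local = 0;
    for (long i = rank + 1; i <= 1000000; i += size) {
        local += i;
    }

    /* Combine the partial sums on process 0. */
    long total = 0;
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Sum of 1..1000000 = %ld (computed by %d processes)\n", total, size);
    }

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and launched with something like mpirun -np 4, the same binary runs as many cooperating processes, one per core or node; this is essentially how the clustered systems described above are programmed.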
Distributed supercomputers
We've talked about centralized supercomputers for a while now, so let's turn to distributed supercomputers for a bit. In the distributed form, there are two ways to take advantage of many small computers: the opportunistic approach and the quasi-opportunistic approach.
A. The opportunistic approach
This is a form of grid computing. A large number of individual computers voluntarily join together to form a large network and share the computing work. Grid computing has solved many large parallel computing problems, but it has the drawback of being unable to handle some classic tightly coupled tasks such as fluid-flow simulation.
The fastest such network as of March 2012 is Folding@home (run by Stanford University). It delivers 8.1 petaflops of x86 processing power: 5.8 petaflops are "donated" by computers with various kinds of GPUs, 1.7 petaflops come from PlayStation 3 game consoles, and the rest is contributed by ordinary CPUs. In 2011, there was also a BOINC grid with 5.5 petaflops of power contributed by 480,000 computers. Last year, Simon Cox built a supercomputer out of 64 Raspberry Pis. The system, called Iridis-Pi, cost only £2,500 and has 1TB of storage (a 16GB SD card in each Raspberry Pi).
B. The quasi-opportunistic approach
Quasi-opportunistic computing is similar to opportunistic computing, but it offers a higher quality of service by exercising more control over the tasks assigned to each individual machine and over how the distributed resources are used. It also includes an intelligent layer that keeps track of the availability and stability of the member machines. For the quasi-opportunistic approach to work, the computers need a kind of "resource allocation agreement" combined with fault-tolerant communication and error handling.
Performance measurement
A. Capability vs capacity
Supercomputers are designed for complex calculations, i.e. capability computing: the machine's full computing power is brought to bear on a single large problem so it can be solved in the shortest possible time. These calculations are so specialized and so difficult that normal computers cannot handle them, for example nuclear explosion simulation, weather forecasting and quantum research.
When the goal is instead to process a very large amount of data, people tend not to use a supercomputer but a machine called a mainframe. A mainframe can handle huge inputs, but the calculations it runs are not as complex as a supercomputer's; it is better suited to solving many small problems at the same time. We will have a separate article about mainframes, so see you then to talk more about that type of computer.
B. Measuring supercomputer performance
Just as people benchmark a PC, laptop, tablet or smartphone to gauge its power, the same is done for supercomputers. However, the computational power of a supercomputer is measured in FLOPS (FLoating point Operations Per Second, the number of floating point calculations performed per second), whereas normal computers are usually rated in MIPS (million instructions per second, the number of instructions executed per second). FLOPS is combined with SI prefixes such as tera- (TFLOPS, i.e. 10^12 FLOPS, pronounced "teraflops") and peta- (PFLOPS, 10^15 FLOPS).
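As a crude illustration of the unit (and nothing more; real rankings use the LINPACK benchmark discussed below), you can count the floating point operations in a loop and divide by the elapsed time:

```c
#include <stdio.h>
#include <time.h>

int main(void) {
    const long n = 100000000L;          /* 100 million iterations */
    double x = 0.0;

    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        x = x * 1.0000001 + 0.5;        /* one multiply + one add = 2 FLOPs per iteration */
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* Dependent operations like this run far below the chip's peak rate;
       the point is only to show what "operations per second" means. */
    double flops = 2.0 * (double)n / seconds;
    printf("x = %f, roughly %.2f GFLOPS on this single core\n", x, flops / 1e9);
    return 0;
}
```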
Currently, the world's leading supercomputers have crossed the petaflops threshold: IBM's Roadrunner hit 1.105 petaflops in 2008, Fujitsu's K computer reached 10.51 petaflops in 2011, and the most powerful machine, Cray's Titan, delivers 17.59 petaflops. It is predicted that in only about ten more years supercomputers will step into the exaflops range (10^18 FLOPS), because CPU and GPGPU technology is advancing rapidly, prices are falling and power efficiency keeps improving.
So where do these FLOPS numbers come from? They are measured with a benchmark called LINPACK. That said, no single number can fully reflect the overall performance of a computer in general or a supercomputer in particular. Two figures are usually quoted for a supercomputer: the theoretical peak floating point performance of its processors (denoted Rpeak) and the maximum performance actually achieved when running the benchmark (Rmax). Rpeak is almost impossible to reach in real life, while Rmax is what the supercomputer actually attains. All the FLOPS numbers you see above are Rmax.
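Rpeak itself is just arithmetic: number of cores × clock speed × floating point operations each core can issue per cycle. Below is a minimal sketch with made-up hardware figures; every number in it is hypothetical, including the 70% efficiency used to stand in for a measured Rmax.

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical cluster: all figures are illustrative, not from any real machine. */
    double nodes           = 5000;    /* compute nodes */
    double cores_per_node  = 16;      /* CPU cores per node */
    double clock_hz        = 2.2e9;   /* 2.2 GHz */
    double flops_per_cycle = 8;       /* e.g. wide SIMD units: 8 FLOPs per core per cycle */

    double rpeak = nodes * cores_per_node * clock_hz * flops_per_cycle;

    /* Rmax comes from actually running LINPACK; here we simply assume 70% of peak. */
    double rmax = 0.70 * rpeak;

    printf("Rpeak: %.2f petaflops\n", rpeak / 1e15);
    printf("Rmax (assumed 70%% efficiency): %.2f petaflops\n", rmax / 1e15);
    return 0;
}
```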
C. TOP500 list
Since 1993, the world's fastest supercomputers have been ranked on a list called TOP500, based on their benchmark scores. TOP500 has its own website at http://top500.org and provides a lot of useful information. Besides the ranking of the world's leading supercomputers, you can see statistics on the distribution of operating systems among supercomputers, the number of supercomputers per country, and the architectures used (MPP or cluster). The list does not claim to be absolutely accurate or unbiased, but it is one of the most common references people use when comparing supercomputer power at a given point in time.
Supercomputer applications
Some applications of supercomputers today:
Manufacturers that produce and supply supercomputers
As you can see, IBM, HP and Cray are the three leading companies in the supercomputer field today. Dell, Intel, NEC, Lenovo, Acer, Fujitsu and Oracle also participate in this market.
According to Tinhte.