Figure 2: Basic block diagram of a CPU
The dotted line in Figure 2 marks the boundary of the CPU; the RAM is located outside the CPU. The datapath between the RAM and the CPU is usually 64 bits wide (or 128 bits when a dual-channel memory configuration is used) and runs at the memory clock rate or the CPU external clock rate (or the memory bus clock rate, in the case of AMD processors).
All circuits inside the dotted line run at the CPU internal clock rate. Depending on the CPU, some of its units can even run at a higher clock rate. The datapaths between CPU units can also be wider, meaning more bits are transferred per clock cycle (more than 64 or 128). For example, the datapath between the L2 cache and the L1 instruction cache on modern processors is usually 256 bits wide. The datapath between the L1 instruction cache and the CPU fetch unit also varies with the CPU model; 128 bits is a typical value, but at the end of this tutorial we will present a table with the main memory cache specifications of the CPUs available on the market today. The more bits transferred per clock cycle, the faster the transfer (in other words, the higher the transfer rate).
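To make the link between datapath width and transfer rate concrete, the sketch below computes peak bandwidth as bytes per transfer times transfers per second. The function name and all clock rates are hypothetical placeholders chosen for illustration, not specifications of any real CPU or memory module; the `transfers_per_clock` parameter is an assumption added to cover DDR-style memory, which transfers twice per clock cycle.

```python
def peak_bandwidth_gb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak transfer rate in GB/s (10**9 bytes per second).

    Assumes the path moves `width_bits` on every transfer and performs
    `transfers_per_clock` transfers per clock cycle.
    """
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Hypothetical clock rates, for illustration only.
ram_single = peak_bandwidth_gb_s(64, 200, transfers_per_clock=2)   # 64-bit, DDR-style
ram_dual = peak_bandwidth_gb_s(128, 200, transfers_per_clock=2)    # dual channel
l2_to_l1i = peak_bandwidth_gb_s(256, 2000)                         # 256-bit internal path

print(f"RAM, single channel: {ram_single:.1f} GB/s")
print(f"RAM, dual channel:   {ram_dual:.1f} GB/s")
print(f"L2 -> L1 instr.:     {l2_to_l1i:.1f} GB/s")
```

With these made-up numbers the dual-channel path delivers exactly twice the single-channel rate (6.4 versus 3.2 GB/s), and the wide internal path running at the CPU clock reaches 64.0 GB/s.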
In general, modern CPUs have three memory caches: the L2 cache, which is the largest, sits between the RAM and the L1 instruction cache, and holds both instructions and data; the L1 instruction cache, which stores the instructions to be executed by the CPU; and the L1 data cache, which stores data to be written back to memory.
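A minimal sketch of how a read travels through this hierarchy: instruction fetches go to the L1 instruction cache, data loads go to the L1 data cache, and both fall back to the unified L2 cache and then to RAM. The class and method names are hypothetical, and the model deliberately ignores capacity, eviction, and write-back; it only shows the lookup order.

```python
class CacheHierarchy:
    """Toy model: split L1 (instruction/data) backed by a unified L2.

    Caches are plain sets of addresses with no size limit, so this
    illustrates lookup order only, not capacity or eviction.
    """

    def __init__(self):
        self.l1_instr = set()
        self.l1_data = set()
        self.l2 = set()  # unified: holds both instructions and data

    def _access(self, l1, addr):
        if addr in l1:
            return "L1"
        if addr in self.l2:
            l1.add(addr)  # copy the line from L2 into this L1 cache
            return "L2"
        # Miss in both caches: fetch from RAM and fill L2 and L1.
        self.l2.add(addr)
        l1.add(addr)
        return "RAM"

    def fetch_instruction(self, addr):
        return self._access(self.l1_instr, addr)

    def load_data(self, addr):
        return self._access(self.l1_data, addr)


cpu = CacheHierarchy()
print(cpu.fetch_instruction(0x100))  # RAM: first access misses everywhere
print(cpu.fetch_instruction(0x100))  # L1: now in the L1 instruction cache
print(cpu.load_data(0x100))          # L2: found in the unified L2 cache
```

The third access shows why the L2 cache is described as holding both instructions and data: a line brought in through the instruction side is also visible to the data side at the L2 level.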
L1 and L2 stand for 'Level 1' and 'Level 2' and refer to how far each cache is from the CPU core (the execution unit). A common question is why there are three separate memory caches (the L1 data cache, the L1 instruction cache, and the L2 cache).
Building static memory with zero latency is very difficult, especially for CPUs running at very high clock rates. Because near-zero-latency static RAM is so difficult to produce, manufacturers use this type of memory only in the L1 cache. The L2 cache uses static RAM that is not as fast as the memory used in the L1 cache; since it has some latency, the L2 cache is slightly slower than the L1 cache.
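The standard average-memory-access-time formula shows why a somewhat slower L2 cache still pays off: only L1 misses pay the L2 latency, and only L2 misses pay the full RAM latency. The function name, latencies, and miss rates below are hypothetical values chosen for illustration, not measurements of any real CPU.

```python
def average_access_time(l1_time, l1_miss_rate, l2_time, l2_miss_rate, ram_time):
    """Average memory access time (in cycles) for a two-level cache.

    l1_miss_rate: fraction of all accesses that miss in L1.
    l2_miss_rate: fraction of L1 misses that also miss in L2 (local miss rate).
    """
    return l1_time + l1_miss_rate * (l2_time + l2_miss_rate * ram_time)


# Hypothetical latencies (cycles) and miss rates, for illustration only.
with_l2 = average_access_time(l1_time=3, l1_miss_rate=0.05,
                              l2_time=14, l2_miss_rate=0.2, ram_time=200)
without_l2 = 3 + 0.05 * 200  # every L1 miss goes straight to RAM

print(f"with L2:    {with_l2:.1f} cycles")
print(f"without L2: {without_l2:.1f} cycles")
```

With these assumed numbers the average drops from 13.0 to 5.7 cycles, even though the L2 cache itself is several times slower than the L1 cache.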
In Figure 2 you can see that the L1 instruction cache works as an 'input cache', while the L1 data cache works as an 'output cache'. The L1 instruction cache, which is usually smaller than the L2 cache, is especially effective when the program repeats a small part of itself (a loop), because the required instructions will be closer to the fetch unit.
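The following sketch illustrates why a small instruction cache works so well for loops. It simulates a direct-mapped cache and counts misses for a loop body that fits in the cache versus one that does not. The function names, the 4 KB cache size (64 lines of 64 bytes), the 4-byte instruction size, and the start address are all hypothetical parameters; real L1 instruction caches are usually set-associative and use different sizes.

```python
def count_misses(trace, num_lines=64, line_size=64):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines      # which memory line each cache slot holds
    misses = 0
    for addr in trace:
        line = addr // line_size   # memory line containing this address
        slot = line % num_lines    # direct-mapped: each line has one possible slot
        if tags[slot] != line:
            misses += 1
            tags[slot] = line      # bring the line in, evicting the old one
    return misses


def loop_trace(body_bytes, iterations, instr_size=4, start=0x1000):
    """Addresses fetched when a loop of `body_bytes` runs `iterations` times."""
    body = range(start, start + body_bytes, instr_size)
    return [addr for _ in range(iterations) for addr in body]


small = count_misses(loop_trace(1024, 100))  # 1 KB loop fits in the 4 KB cache
large = count_misses(loop_trace(8192, 100))  # 8 KB loop does not fit
print(small, large)
```

The 1 KB loop misses only 16 times (once per line, on the first pass), while the 8 KB loop misses 12,800 times because each pass evicts the lines the next pass needs.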
This is rarely mentioned, but the L1 instruction cache also stores other data besides the instructions to be decoded. Depending on the CPU, it can also store pre-decode data and branching information (in short, control data that speeds up the decoding process), and sometimes the L1 instruction cache is larger than its stated size, because the manufacturer does not include the extra space reserved for this additional information in the announced figure.
On a CPU specification page, the L1 cache can be listed in several different ways. Some manufacturers list the two L1 caches separately (sometimes calling the instruction cache 'I' and the data cache 'D'); some add the two sizes together and write 'separated', so '128 KB, separated' means a 64 KB instruction cache and a 64 KB data cache; and some simply list the combined total, leaving you to guess that it is a total and divide it by two to get the capacity of each cache. The exception is CPUs based on the Netburst architecture, such as the Pentium 4, Pentium D, Pentium 4-based Xeon, and Pentium 4-based Celeron.
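The notations above can be normalized with a small parser. This is a sketch under stated assumptions: the function name and the exact string formats ('64 KB I + 64 KB D', '128 KB, separated', '128 KB') are hypothetical examples of the styles described, not formats taken from any particular vendor, and the even split for a bare total is the guess the text describes. It does not handle the Netburst trace cache.

```python
import re


def l1_sizes(spec):
    """Return (instruction_kb, data_kb) from a spec-sheet L1 cache string.

    Hypothetical notations handled:
      "64 KB I + 64 KB D"  -> sizes listed separately
      "128 KB, separated"  -> combined total, split evenly
      "128 KB"             -> bare total, assumed to be split evenly
    """
    parts = re.findall(r"(\d+)\s*KB\s*([ID])", spec)
    if parts:
        sizes = {kind: int(kb) for kb, kind in parts}
        return sizes["I"], sizes["D"]
    match = re.search(r"(\d+)\s*KB", spec)
    if match is None:
        raise ValueError(f"unrecognized L1 cache spec: {spec!r}")
    half = int(match.group(1)) // 2
    return half, half


print(l1_sizes("128 KB, separated"))  # (64, 64)
print(l1_sizes("32 KB I + 16 KB D"))  # (32, 16)
```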
CPUs based on the Netburst architecture have no L1 instruction cache; instead they have a trace execution cache (also called a trace cache), located between the decode unit and the execution unit, which stores instructions that have already been decoded. So the L1 instruction cache is there, but under a different name and in a different location. We mention this because it is a very common mistake: people often think that Pentium 4 CPUs have no L1 instruction cache. As a result, when comparing the Pentium 4 with other CPUs, they assume its L1 cache is much smaller, because they count only the 8 KB L1 data cache. The trace execution cache of Netburst-based CPUs is 150 KB and should also be taken into account.
L2 Memory Cache on multi-core CPUs
On CPUs with more than one core, the L2 cache architecture varies a lot depending on the CPU.
On the Pentium D and on AMD dual-core CPUs based on the K8 architecture, each CPU core has its own L2 memory cache, so each core works as if it were a completely independent CPU.
On Intel dual-core CPUs based on the Core and Pentium M architectures, there is a single L2 memory cache, which is shared between the two cores.
Intel says this shared architecture performs better because, with separate caches, at some point one core may run out of L2 cache while the other core still has unused space in its own L2 cache. When this happens, the first core must fetch data from the main RAM, even though there is empty space in the other core's L2 cache that could have held that data and kept the first core from accessing RAM. Accessing RAM reduces overall system performance. With the shared approach, on a Core 2 Duo processor with a 4 MB L2 cache, one core may use 3.5 MB while the other uses 0.5 MB, in contrast to the fixed 50%-50% division used on other dual-core CPUs.
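The 3.5 MB / 0.5 MB example can be worked through with a small model that compares a fixed equal split with a shared pool. The function name is hypothetical, and the proportional allocation used when the shared cache is oversubscribed is a simplifying assumption; real caches divide space through their replacement policy, not by a formula.

```python
def ram_overflow_mb(demands_mb, l2_mb, shared):
    """MB of each core's working set that does not fit in L2 and must come from RAM.

    Split: every core gets a fixed, equal slice (50%-50% with two cores).
    Shared: cores draw from one pool; when the pool is oversubscribed,
    space is assumed (simplistically) to be divided in proportion to demand.
    """
    n = len(demands_mb)
    if shared:
        total = sum(demands_mb)
        if total <= l2_mb:
            return [0.0] * n
        alloc = [l2_mb * d / total for d in demands_mb]
    else:
        alloc = [l2_mb / n] * n
    return [max(0.0, d - a) for d, a in zip(demands_mb, alloc)]


demands = [3.5, 0.5]  # MB wanted by core 1 and core 2
print(ram_overflow_mb(demands, 4, shared=False))  # [1.5, 0.0]
print(ram_overflow_mb(demands, 4, shared=True))   # [0.0, 0.0]
```

With a fixed split, core 1 spills 1.5 MB to RAM while 1.5 MB of core 2's slice sits unused; with a shared cache both working sets fit.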
On the other hand, Intel's current quad-core CPUs, such as the Core 2 Extreme QX and Core 2 Quad, use two dual-core chips, meaning that this sharing occurs only between cores 1 and 2 and between cores 3 and 4. Intel plans to release quad-core CPUs built on a single chip; on those, the L2 cache will be shared among all four cores.
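The difference between the two-chip and single-chip quad-core designs comes down to which cores share an L2 cache. The sketch below groups cores by die, assuming (as described above) one L2 cache per die shared by every core on that die; the function names are hypothetical.

```python
def l2_sharing_groups(num_cores, cores_per_die):
    """Group cores (numbered from 1) by the L2 cache they share.

    Assumes one L2 cache per die, shared by all cores on that die.
    """
    cores = list(range(1, num_cores + 1))
    return [cores[i:i + cores_per_die] for i in range(0, num_cores, cores_per_die)]


def share_l2(a, b, cores_per_die):
    """True if cores a and b (numbered from 1) sit on the same die."""
    return (a - 1) // cores_per_die == (b - 1) // cores_per_die


print(l2_sharing_groups(4, 2))  # two dual-core chips: [[1, 2], [3, 4]]
print(l2_sharing_groups(4, 4))  # single quad-core chip: [[1, 2, 3, 4]]
```

On the two-chip design, `share_l2(2, 3, 2)` is `False`: data cached by core 2 cannot be found in core 3's L2 cache.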
Figure 3 compares these L2 memory cache solutions.
Figure 3: Comparison of existing L2 memory cache solutions on multi-core CPUs
AMD processors based on the K10 architecture will have a shared L3 memory cache inside the CPU, a hybrid of the two approaches above: each core keeps its own L2 cache, while the L3 cache is shared by all cores. This is shown in Figure 4. The size of this cache will depend on the CPU model, just as the size of the L2 cache does.
Figure 4: K10 cache architecture
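A toy model of this hybrid layout shows what sharing the L3 cache buys: data one core has brought in can be found by another core in the L3 cache instead of in RAM. The class name is hypothetical, and the model is heavily simplified: unlimited capacity, no eviction, and every level filled on a miss. It does not reproduce K10's actual fill and replacement policies.

```python
class HybridCache:
    """Toy model: private per-core L2 caches plus one shared L3 cache.

    Simplifications: unlimited capacity, no eviction, and every level is
    filled on a miss. Real K10 fill/replacement policies are not modeled.
    """

    def __init__(self, num_cores):
        self.l2 = [set() for _ in range(num_cores)]  # one private L2 per core
        self.l3 = set()                              # shared by all cores

    def read(self, core, addr):
        if addr in self.l2[core]:
            return "L2"
        found_in = "L3" if addr in self.l3 else "RAM"
        self.l3.add(addr)
        self.l2[core].add(addr)
        return found_in


chip = HybridCache(num_cores=4)
print(chip.read(0, 0x40))  # RAM: nobody has this line yet
print(chip.read(1, 0x40))  # L3: core 1 finds core 0's data in the shared L3
print(chip.read(1, 0x40))  # L2: now also in core 1's private L2
```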
Learn how cache works (final part)