Wednesday, January 26, 2011

【Weak Current College】 CPU Knowledge Encyclopedia (2)



10. Instruction set

(1) CISC instruction set

The CISC instruction set, also known as the complex instruction set, takes its name from "Complex Instruction Set Computer". In a CISC microprocessor, a program's instructions are executed serially, one after another, and the individual operations within each instruction are also executed serially. The advantage of sequential execution is that control is simple, but the utilization of the computer's components is low and execution is slow. In practice, CISC means Intel's x86 series CPUs (that is, the IA-32 architecture) and compatible CPUs from vendors such as AMD and VIA. Even the newer x86-64 (also known as AMD64) belongs to the CISC category.
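To make "the operations within each instruction are executed serially" concrete, here is a toy Python model of a single x86-style memory-operand instruction, `ADD [x], EAX`, carried out as three micro-steps one after another. The function name, the dictionaries standing in for memory and registers, and the three-step breakdown are illustrative assumptions, not a description of how any particular CPU decodes the instruction.

```python
def cisc_add_mem_reg(memory: dict[str, int], regs: dict[str, int],
                     addr: str, reg: str) -> list[str]:
    """Toy model of ADD [addr], reg: a read-modify-write done as serial steps.

    `memory` and `regs` are plain dicts standing in for RAM and the register
    file (a hypothetical simplification). Returns a trace of the steps.
    """
    trace = []
    tmp = memory[addr]                        # step 1: fetch the memory operand
    trace.append(f"load  tmp <- [{addr}]")
    tmp = (tmp + regs[reg]) & 0xFFFFFFFF      # step 2: 32-bit add (wraps on overflow)
    trace.append(f"add   tmp <- tmp + {reg}")
    memory[addr] = tmp                        # step 3: write the result back
    trace.append(f"store [{addr}] <- tmp")
    return trace


mem, regs = {"x": 5}, {"EAX": 7}
for step in cisc_add_mem_reg(mem, regs, "x", "EAX"):
    print(step)
print(mem["x"])  # 12
```

A RISC machine would expose the same work as three separate instructions (load, add, store), which is the contrast drawn in the RISC subsection.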

To understand what an instruction set is, it helps to start with today's x86-architecture CPUs. Intel developed the x86 instruction set specifically for its first 16-bit CPU, the i8086, and the CPU in IBM's first PC, launched in 1981, the i8088 (a simplified version of the i8086), also used x86 instructions. To strengthen floating-point processing, the x87 coprocessor was later added, and the x86 and x87 instruction sets are collectively known as the x86 instruction set. As CPU technology advanced, Intel went on to develop newer models, from the i80386 and i80486 to the Pentium II Xeon, Pentium III Xeon and Pentium III, and finally today's Pentium 4 and Xeon series (excluding the Xeon Nocona). However, to ensure that computers can keep running the many applications developed in the past, and so protect and inherit a rich software base, every CPU Intel produces still uses the x86 instruction set and therefore still belongs to the x86 series. Because Intel's x86 CPUs and compatible CPUs (such as the AMD Athlon MP) all use the x86 instruction set, together they form today's large lineup of x86 and x86-compatible CPUs. At present, x86 server CPUs come mainly from Intel and AMD.

(2) RISC instruction set

RISC stands for "Reduced Instruction Set Computing". It was developed on the basis of CISC instruction systems: measurements on CISC machines showed that the most frequently used instructions are relatively simple ones, which make up only about 20 percent of the instruction set yet account for about 80 percent of the instructions executed in programs. A complex instruction set inevitably makes the microprocessor more complex, which lengthens development time and raises cost, and complex instructions require complex operations that slow the computer down. For these reasons, RISC CPUs were born in the 1980s. Compared with CISC CPUs, RISC CPUs not only streamline the instruction set but also adopt a "superscalar and super-pipelined" structure, greatly increasing parallel processing capability. The RISC instruction set is the direction in which high-performance CPUs have developed. Compared with the traditional CISC instruction set, RISC has a uniform instruction format, fewer instruction types and fewer addressing modes, so processing speed is much higher. RISC CPUs are widely used in mid-range and high-end servers, and high-end servers in particular all use RISC CPUs. The RISC instruction set is well suited to UNIX, the operating system of high-end servers, as well as to UNIX-like systems such as Linux. RISC CPUs are incompatible with Intel and AMD CPUs in both software and hardware.
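The 20/80 observation is the kind of result obtained by counting how often each opcode appears in an executed instruction trace. The sketch below, using entirely made-up counts for ten hypothetical opcodes, finds the smallest set of most-frequent opcodes that covers 80 percent of dynamic executions. It illustrates the measurement only; the numbers are not data from any real study.

```python
from collections import Counter


def hot_set(counts: Counter, coverage: float = 0.8) -> list[str]:
    """Smallest list of opcodes (most frequent first) covering `coverage` of executions."""
    total = sum(counts.values())
    chosen, covered = [], 0
    for op, n in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(op)
        covered += n
    return chosen


# Hypothetical dynamic counts from an imaginary trace (illustration only).
counts = Counter({"mov": 500, "add": 300, "cmp": 25, "jcc": 25, "push": 25,
                  "pop": 25, "call": 25, "ret": 25, "imul": 25, "div": 25})
hot = hot_set(counts)
print(hot, len(hot) / len(counts))  # ['mov', 'add'] 0.2
```

With these invented counts, 2 of the 10 opcodes (20 percent) cover 80 percent of executions; a real profiler would feed in counts from an actual trace.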

Currently, the RISC CPUs used in high-end servers fall into the following categories: PowerPC processors, SPARC processors, PA-RISC processors, MIPS processors and Alpha processors.

(3) IA-64

Whether EPIC (Explicitly Parallel Instruction Computing) is the successor to RISC and CISC has long been debated. Considered on its own, EPIC looks more like an important step by Intel processors toward RISC. In theory, on the same host configuration, a CPU designed around EPIC handles Windows application software much better than UNIX-based application software.

Intel's server CPU built on EPIC technology is the Itanium (development code name Merced). It is a 64-bit processor and the first in the IA-64 series. Microsoft also developed an operating system, codenamed Win64, to support it in software. Having used the x86 instruction set, Intel turned to more advanced 64-bit microprocessors because it wanted to leave behind the large and cumbersome x86 architecture and introduce a powerful new instruction set; thus IA-64, built on the EPIC instruction set, was born. IA-64 is superior to x86 in many respects: it breaks through many limitations of the traditional IA-32 architecture and delivers breakthrough improvements in data processing capability, system stability, security and availability.

The biggest flaw of IA-64 microprocessors is their lack of compatibility with x86. To let IA-64 processors run software from both generations, Intel built an x86-to-IA-64 decoder into its IA-64 processors (Itanium, Itanium 2, ...) that translates x86 instructions into IA-64 instructions. This decoder is neither the most efficient decoder nor the best way to run x86 code (the best way is to run x86 code directly on an x86 processor), so Itanium and Itanium 2 perform very poorly when running x86 applications. This became the root cause for the creation of x86-64.

(4) x86-64 (AMD64/EM64T)

Designed by AMD, x86-64 can perform 64-bit integer operations while remaining compatible with the 32-bit x86 architecture. It supports 64-bit logical addressing and offers an option to switch to 32-bit addressing. Data-manipulation instructions default to 32-bit and 8-bit operands, with options to switch to 64-bit and 16-bit operands. In the general-purpose registers, the result of a 32-bit arithmetic operation is extended (zero-extended) to the full 64 bits. Instructions are thus distinguished as executing "directly" or "with conversion", and instruction fields such as immediates are 8 or 32 bits wide, which keeps instructions from becoming too long.
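The register-width rules can be modeled with integer masks. In this Python sketch (the helper `write_reg` is hypothetical), a 32-bit write clears bits 63..32 of the destination register, as AMD64 specifies, while 8-bit and 16-bit writes leave the upper bits untouched. The high-byte registers such as AH are not modeled.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def write_reg(old: int, value: int, width: int) -> int:
    """New 64-bit register contents after writing `value` with an operand of `width` bits."""
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & MASK32              # 32-bit results are zero-extended to 64 bits
    mask = (1 << width) - 1                # 8-/16-bit writes merge into the low bits only
    return (old & ~mask & MASK64) | (value & mask)


rax = MASK64                               # all 64 bits set
print(hex(write_reg(rax, 1, 32)))          # 0x1
print(hex(write_reg(rax, 1, 16)))          # 0xffffffffffff0001
print(hex(write_reg(rax, 1, 8)))           # 0xffffffffffffff01
```

The zero-extension rule means a 32-bit operation never leaves stale data in the upper half of a register, so the full 64-bit value is always well defined.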

x86-64 (also known as AMD64) did not appear out of nowhere: the 32-bit addressing space of x86 processors is limited to 4 GB of memory, and IA-64 processors are not compatible with x86. Taking full account of customers' needs, AMD enhanced the x86 instruction set so that it could also support a 64-bit operating mode, and called the resulting architecture x86-64. Technically, for 64-bit computing AMD introduced the new general-purpose registers R8-R15 as an extension of the existing x86 registers, but these registers are not available in the 32-bit environment. Existing registers such as EAX and EBX were widened from 32 to 64 bits. Eight new registers were added to the SSE unit to support SSE2. The larger number of registers improves performance. To support both 32-bit and 64-bit code and registers, x86-64 lets the processor work in two modes, Long Mode and Legacy Mode, and Long Mode is further divided into two sub-modes (64-bit mode and Compatibility mode). The standard was first introduced in AMD's Opteron server processors.
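The 4 GB ceiling is simple arithmetic: with byte addressing, an n-bit address can name 2^n bytes. The sketch compares 32 and 64 address bits. The 64-bit figure is only a theoretical limit, since actual x86-64 implementations translate fewer address bits; the helper name is illustrative.

```python
GiB = 1 << 30
EiB = 1 << 60


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct bytes reachable with byte-granular addresses of this width."""
    return 1 << address_bits


print(addressable_bytes(32) // GiB)  # 4  -> the 4 GB ceiling of 32-bit x86
print(addressable_bytes(64) // EiB)  # 16 -> 16 EiB theoretical limit of a 64-bit address
```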

Intel has also introduced EM64T technology to support 64-bit computing. Before it was formally named EM64T it was called IA32E; EM64T stands for Intel Extended Memory 64 Technology, a name that distinguishes it from the original x86 instruction set. Intel's EM64T supports a 64-bit sub-mode and, like AMD's x86-64 technology, uses 64-bit flat linear addressing, adds eight new general-purpose registers (GPRs) and adds eight registers supporting SSE instructions. As with AMD, Intel's 64-bit technology is compatible with both IA32 and IA32E, and IA32E is used only when a 64-bit operating system is running. IA32E consists of two sub-modes, a 64-bit sub-mode and a 32-bit sub-mode, and like AMD64 it is backward compatible. Intel's EM64T is fully compatible with AMD's x86-64 technology. The Nocona processor already includes 64-bit technology, and Intel's Pentium 4E processor also supports it.

Although both are 64-bit microprocessor architectures compatible with the x86 instruction set, EM64T and AMD64 still differ in places: for example, the NX bit of AMD64 processors was not provided in Intel's processors at the time.
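In x86-64 page tables, the NX (no-execute) flag is bit 63 of a page-table entry, and it only takes effect once the operating system has enabled it through the NXE bit of the EFER register. The hypothetical helper below checks only the present bit and NX; real permission checks also involve privilege levels and other flags.

```python
PRESENT = 1 << 0   # bit 0 of a page-table entry: page is present
NX = 1 << 63       # bit 63: no-execute (honoured only when EFER.NXE is enabled)


def can_execute(pte: int, nx_enabled: bool = True) -> bool:
    """Simplified check: may code be fetched from the page this entry maps?"""
    if not pte & PRESENT:
        return False
    return not (nx_enabled and pte & NX)


print(can_execute(PRESENT))                         # True
print(can_execute(PRESENT | NX))                    # False
print(can_execute(PRESENT | NX, nx_enabled=False))  # True
```

Marking data pages (such as the stack) NX is what lets the processor refuse to execute injected code there.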

11. Super-pipelining and superscalar

Before explaining super-pipelining and superscalar, we first need to understand the pipeline. Intel first used a pipeline in its 486 chip. A pipeline works like an assembly line in industrial production. In the CPU, five or six circuit units with different functions form an instruction-processing pipeline; an x86 instruction is divided into five or six steps, which these units carry out in turn, so that on average one instruction completes per CPU clock cycle, increasing the CPU's speed. In the classic Pentium, each integer pipeline has five stages (instruction prefetch, two decode stages, execute and write-back), while the floating-point pipeline has eight stages.
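The speedup from pipelining follows from a simple count. With k stages and no stalls, the first instruction needs k cycles to emerge and each later instruction completes one cycle after its predecessor, so n instructions take k + n - 1 cycles instead of n × k. The sketch assumes an ideal pipeline with no hazards and uses the classic Pentium integer stage names (PF, D1, D2, EX, WB) only as labels; the function names are illustrative.

```python
def unpipelined_cycles(n: int, k: int) -> int:
    """Each instruction passes through all k steps before the next one starts."""
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    """k cycles to fill the pipeline, then one instruction finishes per cycle."""
    return k + n - 1 if n > 0 else 0


def occupancy(n: int, stages: list[str]) -> list[list[str]]:
    """occupancy[c][i] is the stage instruction i occupies in cycle c ('' if none)."""
    k = len(stages)
    return [[stages[c - i] if 0 <= c - i < k else "" for i in range(n)]
            for c in range(pipelined_cycles(n, k))]


STAGES = ["PF", "D1", "D2", "EX", "WB"]
for cycle, row in enumerate(occupancy(3, STAGES)):
    print(cycle, row)
print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))  # 500 104
```

Once the pipeline is full, several instructions are in flight at once, each in a different stage, which is why throughput approaches one instruction per cycle.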

Superscalar means building multiple pipelines into the processor so that it can execute several instructions at the same time; in essence it trades space for time. Super-pipelining refines the pipeline into more stages and raises the clock frequency so that one or more operations complete per machine cycle; in essence it trades time for space. For example, the Pentium 4 has a 20-stage pipeline. The more stages a pipeline has, the less work each stage does, so each step completes faster and the CPU can run at a higher clock frequency. But an overly long pipeline has side effects, and a CPU with a higher clock may actually run programs more slowly; one reason is that a mispredicted branch forces a deeper pipeline to discard and refill more partially processed instructions. This happened with Intel's Pentium 4: although its clock exceeded 1.4 GHz, its performance lagged well behind AMD's 1.2 GHz Athlon and the 1.2 GHz Pentium III.

12. Package

CPU packaging encases the CPU chip or CPU module in a specific material to protect it from damage, and a CPU must be packaged before it can be delivered. The package depends on how the CPU is installed and on the device-integration design. Broadly, CPUs that install into a Socket use PGA (Pin Grid Array) packaging, while CPUs that install into a Slot x connector all use SEC (Single Edge Contact cartridge) packaging. There are now also PLGA (Plastic Land Grid Array) and OLGA (Organic Land Grid Array) packaging technologies. As market competition becomes increasingly fierce, CPU packaging technology is currently developing mainly toward lower cost.


13. Multithreading

Simultaneous multithreading (SMT) replicates the processor's architectural state so that multiple threads on the same processor execute concurrently and share the processor's execution resources. This makes the most of wide-issue, out-of-order superscalar processing, improves the utilization of the processor's functional units, and hides the memory-access latency caused by data dependencies or cache misses. When only one thread is available, an SMT processor behaves almost exactly like a traditional wide-issue superscalar processor. SMT's most attractive feature is that it requires only small changes to the processor core design yet can significantly improve performance at almost no extra cost. Multithreading keeps more data ready for the high-speed execution core and reduces the core's idle time, which is undoubtedly very attractive for low-end desktop systems. Starting with the 3.06 GHz Pentium 4, Intel processors support SMT technology.

14. Multi-core

Multi-core, also called the single-chip multiprocessor (Chip Multiprocessor, hereafter CMP), was proposed by Stanford University. The idea is to integrate the SMP (symmetric multiprocessor) approach used in massively parallel processors onto a single chip, with each processor executing different processes in parallel. Compared with CMP, the SMT processor structure is more flexible. However, once semiconductor processes reached 0.18 micron, wire delay exceeded gate delay, requiring microprocessors to be designed by partitioning them into many smaller basic units with better locality. By contrast, because a CMP is already divided into multiple processor cores, each core is relatively simple and easier to optimize, so CMP is more promising. Currently, IBM's Power4 chip and Sun's MAJC5200 chip both use the CMP structure. Multi-core processors can share a cache inside the processor, which improves cache utilization while reducing the complexity of multiprocessor system design.

In the second half of 2005, new processors from Intel and AMD will also adopt the CMP structure. The new Itanium processor, development code name Montecito, is a dual-core design with at least 18 MB of on-chip cache, built on a 90 nm process; it is undoubtedly a challenge to today's chip industry. Each of its cores will have its own independent L1, L2 and L3 cache, and the chip contains more than 1 billion transistors.

15. SMP

SMP (Symmetric Multi-Processing) refers to an architecture in which a group of processors (CPUs) is gathered in one computer and the CPUs share the memory subsystem and bus structure. With this technology, a server system can run multiple processors simultaneously while they share memory and other host resources. Dual Xeon, which we call a two-way system, is one of the most common symmetric multiprocessor configurations (Xeon MP supports up to four-way, and AMD Opteron supports one- to eight-way); a few systems reach 16-way. Generally speaking, however, SMP machines scale poorly: it is difficult to go beyond 100 processors, and typical systems have 8 to 16, which is sufficient for most users. SMP is most common in high-performance server and workstation-class motherboard architectures; some UNIX servers support up to 256 CPUs.

Building an SMP system requires hardware that supports SMP, including the motherboard and CPUs; a system platform (operating system) that supports SMP; and application software that supports SMP.

For an SMP system to perform efficiently, the operating system must support SMP, as 32-bit operating systems such as Windows NT, Linux and UNIX do; that is, it must support multitasking and multithreading. Multitasking means the operating system lets different CPUs perform different tasks at the same time; multithreading means the operating system lets different CPUs work in parallel on the same task.
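Multithreading in this sense (several CPUs cooperating on one task) usually means splitting the task into chunks that workers process in parallel and then combining the results. This Python sketch shows that structure with `concurrent.futures.ThreadPoolExecutor`; the function names are illustrative. In CPython the global interpreter lock prevents threads from running Python bytecode on several CPUs at once, so for CPU-bound work the same structure would use `ProcessPoolExecutor` to actually occupy multiple processors.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(bounds: tuple[int, int]) -> int:
    """Work done by one worker: sum of squares over [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n: int, workers: int) -> int:
    """Split 0..n-1 into one chunk per worker, run the chunks concurrently, combine."""
    if n <= 0:
        return 0
    step = -(-n // workers)                       # ceiling division covers every number
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))


print(parallel_sum_of_squares(100_000, 4) == sum(i * i for i in range(100_000)))  # True
```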

Building an SMP system places high demands on the CPUs chosen. First, each CPU must contain a built-in APIC (Advanced Programmable Interrupt Controller) unit; the APIC is the core of Intel's multiprocessing specification. Second, the CPUs must be the same model with the same core type, running at exactly the same frequency. Finally, they should ideally have the same product serial number (that is, come from the same production batch), because when CPUs from two different batches run as a dual-processor system, one CPU may carry too much of the load while the other carries very little, so performance cannot be maximized, and in the worst case the system may crash.

16. NUMA technology

NUMA (Non-Uniform Memory Access) is a distributed shared-memory technology. A NUMA system consists of independent nodes connected by a high-speed dedicated network, and each node can be a single CPU or an SMP system. There are many solutions for cache coherence in NUMA, and they require support from the operating system and special software. Figure 2 shows a NUMA system from Sequent. Here, three SMP modules are connected by a high-speed dedicated network to form one node, and each node can have 12 CPUs. Systems like Sequent's can reach up to 64 or even 256 CPUs. Clearly, this design builds on SMP and then uses NUMA technology to scale it out, combining the two technologies.

17. Out-of-order execution

Out-of-order execution is a technique in which the CPU dispatches multiple instructions to the corresponding circuit units for processing without following the order specified by the program. Based on the state of each circuit unit and on whether each instruction can be executed early, instructions that are ready are sent immediately to the appropriate unit, so during execution the instructions do not run in program order; afterwards, a reordering unit rearranges the results from the execution units back into program order. The aim of out-of-order execution is to keep the CPU's internal circuits fully loaded and so correspondingly increase the speed at which the CPU runs programs. Branch handling: a branch instruction must wait for the result of an operation. Unconditional branches simply follow the instruction order, whereas conditional branches must wait for the result before deciding whether to continue in the original order.
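The following Python sketch contrasts in-order issue with out-of-order dispatch on a four-instruction toy program. Its assumptions are not from the text: one instruction enters the window per cycle, execution units are unlimited, register renaming has removed false dependencies, and the `Instr` type and register names are hypothetical. The second loop plays the role of the reordering unit, committing results strictly in program order even when they complete out of order.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str                # register written
    srcs: tuple[str, ...]    # registers read
    latency: int             # cycles from start to result


def schedule(prog: list[Instr], out_of_order: bool) -> tuple[list[int], list[int]]:
    """Return (completion cycle, commit cycle) for each instruction."""
    ready: dict[str, int] = {}   # register -> cycle its latest value is available
    complete: list[int] = []
    prev_start = -1
    for i, ins in enumerate(prog):
        operands_ready = max((ready.get(r, 0) for r in ins.srcs), default=0)
        if out_of_order:
            # may start once it has entered the window (cycle i) and its inputs exist
            start = max(i, operands_ready)
        else:
            # in-order: cannot start before the previous instruction has started
            start = max(prev_start + 1, operands_ready)
        prev_start = start
        ready[ins.dest] = start + ins.latency
        complete.append(start + ins.latency)
    commit, last = [], 0         # reorder buffer: commit strictly in program order
    for done in complete:
        last = max(last, done)
        commit.append(last)
    return complete, commit


PROGRAM = [
    Instr("r1", ("r0",), 10),       # load that misses in the cache
    Instr("r2", ("r1",), 1),        # depends on the load
    Instr("r3", ("r4",), 1),        # independent
    Instr("r5", ("r4", "r6"), 1),   # independent
]
print(schedule(PROGRAM, out_of_order=False))  # ([10, 11, 12, 13], [10, 11, 12, 13])
print(schedule(PROGRAM, out_of_order=True))   # ([10, 11, 3, 4], [10, 11, 11, 11])
```

In-order issue commits the last result at cycle 13; out-of-order dispatch lets the two independent instructions finish at cycles 3 and 4, in the shadow of the 10-cycle load, and everything is committed in order by cycle 11.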

18. CPU integrated memory controller

Many applications have complex read patterns (almost random, especially when cache hits are unpredictable) and do not use bandwidth efficiently. Business-processing software is a typical example: even on CPUs with features such as out-of-order execution, it is still limited by memory latency. The CPU must wait until the data an instruction needs has been loaded before it can execute that instruction (whether the data comes from the CPU cache or from main memory). Memory latency in current low-end systems is about 120-150 ns, while CPU clock speeds exceed 3 GHz, so a single memory request can waste several hundred CPU cycles (at 3 GHz, 120-150 ns corresponds to 360-450 cycles). Even when the cache hit rate reaches 99%, the CPU may spend up to 50 percent of its time waiting for memory requests to complete because of memory latency.
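The arithmetic behind these figures can be checked with a few lines of Python. The latency and clock values come from the paragraph; the base CPI of 1 and the assumption that 30 percent of instructions access memory are illustrative guesses, and this simple model charges the full memory latency to every cache miss.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """A clock of f GHz runs f cycles per nanosecond."""
    return latency_ns * clock_ghz


def stall_fraction(base_cpi: float, mem_refs_per_instr: float,
                   hit_rate: float, miss_penalty: float) -> float:
    """Fraction of execution time spent waiting on cache misses (simple CPI model)."""
    stall_cpi = mem_refs_per_instr * (1.0 - hit_rate) * miss_penalty
    return stall_cpi / (base_cpi + stall_cpi)


low, high = latency_in_cycles(120, 3.0), latency_in_cycles(150, 3.0)
print(low, high)                                       # 360.0 450.0
print(round(stall_fraction(1.0, 0.3, 0.99, low), 2))   # 0.52
```

Under these assumptions, a 99 percent hit rate still leaves the core waiting about half the time, consistent with the figure quoted above.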

As can be seen, the latency of the Opteron's integrated memory controller is much lower than that of a chipset-based dual-channel DDR memory controller. Intel also plans to integrate the memory controller into the processor, which will make the North Bridge chip less important. Changing the way the processor accesses main memory helps increase bandwidth, reduce memory latency and improve processor performance.

