Saturday, December 18, 2010
【School】 Understand this and no one will dare fool you about the CPU (2)
11. Superpipelining and superscalar

Before explaining superpipelining and superscalar designs, it helps to understand the pipeline itself. Intel first used a pipeline in the 486 chip. A pipeline works like an assembly line in industrial production. Inside the CPU, 5-6 circuit units with different functions form an instruction-processing pipeline. An x86 instruction is split into 5-6 steps, each handled by one of these units, so that the CPU can complete one instruction per clock cycle, which raises its processing speed. Each integer pipeline of the classic Pentium is divided into four stages (instruction prefetch, decode, execute, and write-back of results), and the floating-point pipeline is divided into eight stages.

A superscalar design builds in multiple pipelines so that several instructions execute at the same time; in essence it trades space for time. A superpipelined design divides the pipeline into finer stages and raises the clock frequency so that one or more operations complete in each machine cycle; in essence it trades time for space. The Pentium 4 pipeline, for example, is 20 stages long. The more stages (steps) a pipeline is divided into, the less work each stage does and the faster each stage completes, which is what lets the CPU run at a higher clock frequency. An overly long pipeline also has side effects, however: a CPU with a higher frequency can end up with lower real computing speed. Intel's Pentium 4 is an example. Although its frequency reaches 1.4 GHz and above, its computing performance falls well short of AMD's 1.2 GHz Athlon, and even of the Pentium III.

12. Package

CPU packaging encloses the CPU chip or CPU module in a specific cured material that protects it from damage. A CPU is generally delivered to users only after it has been packaged.
The type of package depends on how the CPU is installed and on the integration design of the device. Broadly speaking, CPUs installed in a Socket usually use PGA (pin grid array) packaging, while CPUs installed in a Slot x connector all use SEC (single edge contact cartridge) packaging. There are now also packaging technologies such as PLGA (Plastic Land Grid Array) and OLGA (Organic Land Grid Array). As market competition intensifies, CPU packaging technology is currently developing mainly in the direction of lower cost.

13. Multithreading

Simultaneous multithreading is abbreviated SMT. By duplicating the processor's architectural state, SMT lets multiple threads on the same processor execute simultaneously and share the processor's execution resources. This makes the most of wide-issue, out-of-order superscalar processing, raises the utilization of the processor's functional units, and hides the memory access latency caused by data dependences or cache misses. When multiple threads are not available, an SMT processor behaves almost like a traditional wide-issue superscalar processor. What makes SMT most attractive is that it needs only small changes to the processor core design and can improve performance significantly at almost no extra cost. Multithreading can keep more data ready for the high-speed execution core, which reduces the time the core sits idle. This is undoubtedly very attractive for low-end desktop systems. Starting with the 3.06 GHz Pentium 4, all Intel processors will support SMT.

14. Multi-core

Multi-core is also known as the chip multiprocessor (CMP). CMP was proposed by Stanford University in the United States. The idea is to integrate the SMP (symmetric multiprocessor) of massively parallel machines onto a single chip, with each processor executing a different process in parallel.
Compared with CMP, the SMT processor structure stands out for its flexibility. However, once semiconductor processes move beyond 0.18 micron, wire delay exceeds gate delay, which requires a microprocessor design to be partitioned into many smaller basic units with better locality. A CMP structure is already divided into multiple processor cores, each of which is relatively simple and easier to optimize, so it is the more promising approach. Currently, IBM's Power4 chip and Sun's MAJC5200 chip both use the CMP structure. A multi-core processor can share cache inside the processor, which improves cache utilization and simplifies the design of multiprocessor systems.

In the second half of 2005, new processors from Intel and AMD will also adopt the CMP structure. The new Itanium processor, code-named Montecito, uses a dual-core design with at least 18 MB of on-chip cache and is built on a 90 nm process. Its design is undoubtedly a challenge for today's chip industry. Each core has its own independent L1, L2 and L3 cache, and the chip contains about 1 billion transistors.

15. SMP

SMP (Symmetric Multi-Processing), short for symmetric multiprocessing structure, means a group of processors (multiple CPUs) assembled in one computer, with the CPUs sharing the memory subsystem and the bus structure. With this technology, a server system can run multiple processors at once, all sharing memory and other host resources. A dual Xeon system, what we call "2-way", is the most common kind of symmetric multiprocessor system (Xeon MP supports up to 4-way, and AMD Opteron supports 1-way to 8-way). A few systems are 16-way. In general, though, machines with an SMP structure scale poorly, and it is difficult to reach 100 or more processors. The usual range is 8 to 16, which is enough for most users.
SMP is most common in high-performance servers and workstation-class motherboards. UNIX servers, for example, can support systems with up to 256 CPUs.

The necessary conditions for building an SMP system are: hardware that supports SMP, including the motherboard and the CPUs; a system platform that supports SMP; and application software that supports SMP.

For an SMP system to perform efficiently, the operating system must support SMP, as 32-bit operating systems such as Windows NT, Linux, and UNIX do. In other words, it must be capable of multitasking and multithreading. Multitasking means the operating system can have different CPUs complete different tasks at the same time. Multithreading means the operating system can have different CPUs work in parallel on the same task.

Building an SMP system places high demands on the CPUs selected. First, each CPU must have a built-in APIC (Advanced Programmable Interrupt Controller) unit; the core of Intel's multiprocessing specification is the use of Advanced Programmable Interrupt Controllers (APICs). Second, the CPUs must be the same product model, with the same type of core and the same operating frequency. Finally, the product serial numbers should be kept as close as possible. When CPUs from two different production batches run as a dual-processor pair, one CPU may be overloaded while the other carries very little load, so maximum performance is not reached, and in the worst case the system may crash.

16. NUMA technology

NUMA, non-uniform memory access, is a distributed shared-memory technology. A NUMA system is made up of several independent nodes connected by a high-speed dedicated network, and each node can be a single CPU or an SMP system. In NUMA there are several solutions for cache coherence, and they need support from the operating system and from special software. Figure 2 is an example of a NUMA system from Sequent.
Three SMP modules are linked by a high-speed dedicated network to form one node, and each node can have 12 CPUs. A system like Sequent's can reach up to 64 CPUs or even 256 CPUs. Clearly, this builds on SMP and then extends it with NUMA technology; it is a combination of the two.

17. Out-of-order execution

Out-of-order execution is a technique in which the CPU allows multiple instructions to be dispatched to the corresponding circuit units for processing without following the order specified by the program. The CPU analyzes the state of each circuit unit and whether each instruction can be executed early, then immediately sends the instructions that can be executed early to the appropriate circuit units. During this time the instructions do not execute in the prescribed order, and a reorder unit afterwards puts the results from the execution units back into instruction order. The purpose of out-of-order execution is to keep the CPU's internal circuits running at full load and so increase the speed at which the CPU runs programs. Branch technique: a branch instruction has to wait for a result before it can proceed. An unconditional branch only needs to follow the instruction sequence, while a conditional branch must decide, based on the result of earlier processing, whether to continue in the original order.

18. Memory controller inside the CPU

Many applications have complex read patterns (almost random, especially when cache hits are unpredictable) and do not use bandwidth effectively. Business processing software is a typical example. Even with CPU features such as out-of-order execution, it is still limited by memory latency.
The CPU must wait until the operands it needs have been loaded before it can execute the instruction, whether the data comes from the CPU cache or from main memory. Memory latency in current low-end systems is about 120-150 ns, while CPU speeds exceed 3 GHz, so a single memory request may waste 200-300 CPU cycles or more. Even with a cache hit rate of 99%, the CPU may spend 50% of its time waiting for memory requests to finish, purely because of memory latency.

The memory controller integrated into the Opteron has much lower latency than a chipset-based dual-channel DDR memory controller. Intel also plans to integrate the memory controller into the processor, which will make the northbridge chip less important. Changing the way the processor accesses main memory in this way helps increase bandwidth, reduce memory latency, and improve processor performance.
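The latency figures in section 18 can be checked with a little arithmetic. The Python sketch below is an illustration only: the function names, the assumed 0.3 memory references per instruction, and the assumed base cost of one cycle per instruction are not from the text. They are chosen simply to show how a 99% cache hit rate can still leave the CPU waiting about half the time.

```python
def miss_penalty_cycles(latency_ns, clock_ghz):
    """Clock cycles that elapse during one main-memory access.

    A clock running at N GHz ticks N times per nanosecond.
    """
    return latency_ns * clock_ghz


def stall_fraction(hit_rate, penalty_cycles,
                   mem_refs_per_instr=0.3, base_cpi=1.0):
    """Share of execution time spent waiting on cache misses.

    mem_refs_per_instr and base_cpi are illustrative assumptions,
    not measured values.
    """
    # Average stall cycles added to each instruction by cache misses.
    stall_cpi = (1.0 - hit_rate) * penalty_cycles * mem_refs_per_instr
    return stall_cpi / (base_cpi + stall_cpi)


if __name__ == "__main__":
    for latency_ns in (120, 150):
        penalty = miss_penalty_cycles(latency_ns, clock_ghz=3.0)
        share = stall_fraction(hit_rate=0.99, penalty_cycles=penalty)
        print(f"{latency_ns} ns: {penalty:.0f} cycles per miss, "
              f"{share:.0%} of time stalled")
```

At 3 GHz, 120 ns works out to 360 cycles and 150 ns to 450 cycles per miss; the 200-300 cycle figure corresponds to a clock nearer 2 GHz. Under the assumptions above, the stalled share comes out a little over half, which is in line with the 50% claim.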