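Computer Organization, Chapter 2 Notes: Computer Evolution and Performance
These notes are compiled from the lecture slides of Professor Li Chen at the School of Software, Xi'an Jiaotong University. They are for study use only; please do not repost.
Index of this Computer Organization note series: Computer Organization notes, mind maps, and review advice (Qlz's blog, CSDN)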
Contents
- Chapter mind map
- **Brief History of Computer**
- Vacuum tube Computer
- ENIAC
- details
- Von Neumann/Turing machine
- IAS Memory Formats
- IAS Registers
- **IAS Instruction Set**
- UNIVAC
- **Transistor Computer**
- **IC Computer**
- Family concept
- Moore’s Law
- LSI & VLSI Computer
- Designing for Performance
- Microprocessor speed
- Techniques for exploiting CPU speed
- Other critical components lag behind CPU speed
- Solutions
- Multicore, MICs and GPGPUs
- Multicore
- Many Integrated Core (MIC)
- **Graphics Processing Unit (GPU)**
- Embedded System and the ARM
- Embedded System
- Acorn RISC Machine (ARM)
- Performance Assessment
- Clock frequency
- Processor time T
- Processor Speed
- Benchmark suite
- **Desirable characteristics of a benchmark**
- SPEC (Standard Performance Evaluation Corporation)
- Amdahl Law
- Relation between system speed-up and Fe
- Vocabulary
- Key points
Chapter mind map
[Figure: mind map of computer evolution and performance]
Brief History of Computer
1950 ~ 59 Vacuum tube
1960 ~ 68 Transistor
1969 ~ 77 Integrated Circuits
1978 ~ ? Large-scale integration (LSI) and Very-large-scale integration (VLSI)
2009~? Intelligence
Vacuum tube Computer
ENIAC
The first general-purpose electronic computer
Supported conditional jumps and was programmable, which distinguished it from earlier machines
Used for computing artillery firing tables
Started in 1943 and finished in 1946
details
- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes
- 30 tons
- 15,000 square feet
- 140 kW power consumption
- 5,000 additions per second
Von Neumann/Turing machine
Von Neumann/Turing(IAS)
- Stored program concept
- Begun in 1946, but not completed until 1952
- Main memory storing programs and data
- ALU operating on binary data
- Control unit interpreting instructions from memory and executing them
- Input and output equipment operated by control unit
- Built at the Princeton Institute for Advanced Study (IAS)
Structure of the IAS computer
IAS Memory Formats
- The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
- Both data and instructions are stored there
- Numbers are represented in binary form and each instruction is a binary code
IAS Registers
- Memory buffer register (MBR)
  - Contains a word to be stored in memory or sent to the I/O unit, or receives a word from memory or from the I/O unit
  - Instructions and data fetched from memory, and data to be written to memory, all pass through the MBR
- Memory address register (MAR)
  - Specifies the address in memory of the word to be written from or read into the MBR
  - That is, the address being operated on (instruction fetch, data read, or data write)
- Instruction register (IR)
  - Contains the 8-bit opcode of the instruction being executed
- Instruction buffer register (IBR)
  - Temporarily holds the right-hand instruction from a word in memory
- Program counter (PC)
  - Contains the address of the next instruction pair to be fetched from memory
- Accumulator (AC) and multiplier quotient (MQ)
  - Temporarily hold operands and results of ALU operations
  - AC: temporarily stores the result of an ALU computation
Expanded structure of IAS computer
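First, the address of the next instruction to execute is copied from the PC into the MAR. The control unit fetches the word at that address from main memory into the MBR. The MBR separates the left and right instructions: the left instruction goes to the IR and the right instruction to the IBR. The control unit then decodes the instruction in the IR and generates a series of control signals. If the instruction operates on data, the address of the data is placed in the MAR, and the control unit reads the corresponding data from main memory into the MBR. Once the left instruction completes, the right instruction is moved from the IBR into the IR and handled the same way. Any exchange with I/O must also pass its data through the MBR first.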
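The fetch step described above can be sketched in a few lines of Python. This is only an illustrative model, not the actual IAS hardware: it assumes the commonly described layout in which each 40-bit word holds two 20-bit instructions, each made of an 8-bit opcode followed by a 12-bit address. The function names and the sample opcode values are hypothetical.

```python
# Minimal sketch of the IAS instruction fetch, assuming a 40-bit word that
# packs two 20-bit instructions (8-bit opcode + 12-bit address each).
# All names and the sample values below are illustrative assumptions.

def split_word(word):
    """Split a 40-bit word (as held in the MBR) into left and right instructions."""
    left = (word >> 20) & 0xFFFFF   # upper 20 bits: left instruction
    right = word & 0xFFFFF          # lower 20 bits: right instruction, kept in the IBR
    return left, right

def decode(instruction):
    """Split a 20-bit instruction into its 8-bit opcode and 12-bit address."""
    opcode = (instruction >> 12) & 0xFF
    address = instruction & 0xFFF
    return opcode, address

def fetch_pair(memory, pc):
    """Fetch the instruction pair at address pc; return both decoded instructions and the next pc."""
    mar = pc                        # PC -> MAR
    mbr = memory[mar]               # memory[MAR] -> MBR
    left, ibr = split_word(mbr)     # left instruction is decoded first, right waits in the IBR
    return [decode(left), decode(ibr)], pc + 1

# One word holding two made-up instructions: (opcode 0x01, address 0x005)
# and (opcode 0x21, address 0x006).
word = (0x01 << 32) | (0x005 << 20) | (0x21 << 12) | 0x006
memory = {0: word}
pair, next_pc = fetch_pair(memory, 0)
print(pair, next_pc)
```

Because two instructions share one word, the program counter advances once per pair, which is why the PC is described as holding the address of the next instruction pair.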
IAS Instruction Set
- 21 Instructions
- Data Transfer
- Unconditional Branch
- Conditional Branch
- Arithmetic
- Address Modify
UNIVAC
The Universal Automatic Computer
Built by the first computer company, Electronic Control Corp
UNIVAC tasks involve scientific and commercial applications
Transistor Computer
- More complex arithmetic and logic units and control units
- The use of high-level programming languages
- Provision of system software which provided the ability to:
- load programs
- move data to peripherals and libraries
- perform common computations
IC Computer
Integrated Circuits (IC)
- Computers based on SSI and MSI form the third generation of computers
- SSI: small-scale integration
- MSI: medium-scale integration
Family concept
- Similar or identical instruction set
- Similar or identical operating system
- Increasing speed, number of I/O ports, memory size, and cost across the models of a family
Transistors, resistors, and capacitors can all be made from semiconductor material, so together with the whole circuit they can be fabricated on a single silicon wafer
Moore’s Law
Stated in 1965 by Gordon Moore, cofounder of Intel
The number of transistors on a chip will double every year
Since the 1970s development has slowed a little: the number of transistors doubles every 18 months
- Cost of a chip has remained almost unchanged
- Higher packing density means shorter electrical paths, giving higher performance
- Smaller size gives increased flexibility
- Reduced power and cooling requirements
- Fewer interconnections increases reliability
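To get a feel for the two doubling periods mentioned above (every year and every 18 months), the growth factor after a given number of years can be computed directly. This is a small illustrative sketch; the function name is hypothetical and the model is the idealized doubling rule only.

```python
# Idealized Moore's law growth: the transistor count doubles once per doubling period.
# The function name is an illustrative assumption.

def transistor_growth(years, doubling_period_years):
    """Return the factor by which the transistor count grows after `years`."""
    return 2 ** (years / doubling_period_years)

# Ten years of doubling every year versus doubling every 18 months.
print(transistor_growth(10, 1.0))
print(transistor_growth(10, 1.5))
```

Over the same ten years, the yearly doubling gives a factor of 1024, while the 18-month doubling gives roughly a factor of 100.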
LSI & VLSI Computer
Semiconductor memories
The first relatively capacious semiconductor memory was produced in 1970 by Fairchild
Quantities and units in common use
- Bit: b
- Byte: B, 8 bits = $2^3$ bits
- K, kilo (Hz, bytes): $10^{3}$, or $1024 = 2^{10}$
- M, mega (Hz, bytes): $10^{6}$, or $1024^{2} = 2^{20}$
- G, giga (Hz, bytes): $10^{9}$, or $1024^{3} = 2^{30}$
- T, tera (Hz, bytes): $10^{12}$, or $1024^{4} = 2^{40}$
- P, peta (Hz, bytes): $10^{15}$, or $1024^{5} = 2^{50}$
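The decimal and binary readings of each prefix drift apart as the prefix grows. The sketch below, with hypothetical table and function names, prints the relative gap for each prefix listed above.

```python
# Decimal (power-of-ten) versus binary (power-of-two) readings of common prefixes.
# The dictionary and function names are illustrative assumptions.

DECIMAL = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}

def prefix_gap(prefix):
    """Relative amount by which the binary value exceeds the decimal one."""
    return BINARY[prefix] / DECIMAL[prefix] - 1

for p in "KMGTP":
    print(p, BINARY[p], f"{prefix_gap(p):.1%}")
```

The gap is about 2.4% at kilo and grows to more than 12% at peta, which is why it matters whether a quoted capacity uses powers of ten or powers of two.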
Designing for Performance
Microprocessor speed
Techniques for exploiting CPU speed
- Branch prediction
- Data flow analysis
- Speculative execution
Other critical components lag behind CPU speed
- CPU has to wait
- Bottleneck
- Reduce the whole performance
- Especially, main memory
Solutions
Optimize system structure, balancing the whole performance of CPU, memory and I/O
- Improve the interface between CPU and memory
- The interface is the key path responsible for transferring instruction and data
- Increase number of bits retrieved at one time
- Make DRAM “wider” rather than “deeper”
- Change DRAM interface
- Cache
- Reduce frequency of memory access
- More complex cache and cache on chip
- Increase interconnection bandwidth
- High speed buses
- Hierarchy of buses
- Caching and buffering schemes
- Higher-speed interconnection buses and more elaborate interconnection structures
- Use of multiple-processor configurations can aid in satisfying I/O demands
- Increase hardware speed of processor
- Fundamentally due to shrinking logic gate size
- More gates, packed more tightly, increasing clock rate
- Propagation time for signals reduced
- Increase size and speed of caches
- Dedicating part of processor chip
- Cache access times drop significantly
- Change processor organization and architecture
- Increase effective speed of instruction execution
- Parallelism
- Power
- Power density increases with density of logic and clock speed
- Dissipating heat becomes harder
- RC delay
- The speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting the transistors
- Delay increases as the RC product increases
- Wire interconnects become thinner, increasing resistance
- Wires are packed closer together, increasing capacitance
- Memory latency
- Memory speeds lag processor speeds
Multicore, MICs and GPGPUs
Multicore
- The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
- Strategy is to use two simpler processors on the chip rather than one more complex processor
- With two processors, larger caches are justified
- As caches became larger, it made performance sense to create two and then three levels of cache on a chip (multilevel cache)
Many Integrated Core (MIC)
- Offers a leap in performance, along with challenges in developing software to exploit such a large number of cores
- MIC is an architecture for coprocessors
- The multicore and MIC strategy involves a homogeneous collection of general-purpose processors on a single chip
Graphics Processing Unit (GPU)
- Core designed to perform parallel operations on graphics data
- Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
- Used as vector processors for a variety of applications that require repetitive computations
- When a GPU is used for a broad range of non-graphics applications, it is called a general-purpose GPU (GPGPU)
- Example: deep learning
Embedded System and the ARM
Embedded System
- Definition: a combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function.
- In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.
Acorn RISC Machine (ARM)
- Family of RISC-based microprocessors and microcontrollers
- ARM designs microprocessor and multicore architectures and licenses them to manufacturers
- ARM chips are high-speed processors known for their small die size and low power requirements
- Widely used in PDAs and other handheld devices
- Most widely used processor architecture of any kind
Performance Assessment
Clock frequency
- Operations performed by a processor are governed by a system clock
- The speed of a processor is dictated by the pulse frequency of the system clock, measured in cycles per second (Hz); this is the clock frequency
- Clock frequency alone is an imprecise measure of performance
Processor time T
- Time a processor needs to execute a given program
- $T = \text{CPU clock cycles for a program} \times \text{clock cycle time } \tau = \dfrac{\text{CPU clock cycles for a program}}{\text{clock rate}}$
- $T = CPI \times I_C \times \tau = \dfrac{CPI \times I_C}{f}$, where $f = 1/\tau$ is the clock rate
- CPI: average cycles per instruction
- $I_C$: instruction count
- Influenced by the instruction set architecture, compiler technology, processor implementation and memory hierarchy
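The relation $T = CPI \times I_C / \text{clock rate}$ can be checked numerically. The sketch below is illustrative only: the instruction mix and the 1 GHz clock are made-up numbers, and the function names are hypothetical.

```python
# Processor time from an instruction mix: T = CPI * I_C / clock rate.
# The mix and clock rate used here are made-up illustrative numbers.

def average_cpi(mix):
    """mix is a list of (cycles_per_instruction, instruction_count) pairs."""
    total_cycles = sum(cpi * count for cpi, count in mix)
    total_instructions = sum(count for _, count in mix)
    return total_cycles / total_instructions

def processor_time(cpi, instruction_count, clock_rate_hz):
    """Execution time in seconds."""
    return cpi * instruction_count / clock_rate_hz

mix = [(1, 600_000), (2, 400_000)]        # 600k one-cycle and 400k two-cycle instructions
cpi = average_cpi(mix)                    # weighted average cycles per instruction
i_c = sum(count for _, count in mix)      # instruction count I_C
t = processor_time(cpi, i_c, 1e9)         # assumed 1 GHz clock
print(cpi, i_c, t)
```

The average CPI is weighted by how often each instruction type occurs, so a change in the compiler or instruction set that shifts the mix changes $T$ even at a fixed clock rate.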
Processor Speed
- Rate at which instructions are executed, expressed as millions of instructions per second (MIPS)
- Referred to as the MIPS rate, where $f$ is the clock rate
$$\text{MIPS rate} = \frac{I_C}{T \times 10^6} = \frac{f}{CPI \times 10^6}$$
- Floating-point performance is expressed as millions of floating-point operations per second (MFLOPS)
- Used for scientific and game applications
$$\text{MFLOPS rate} = \frac{\text{number of executed floating-point operations in a program}}{\text{execution time} \times 10^6}$$
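The two forms of the MIPS rate should give the same number for the same program. The sketch below checks this with made-up values (a 400 MHz clock, a CPI of 2.24, and 2 million instructions); the function names are hypothetical.

```python
# MIPS and MFLOPS rates, following the formulas above.
# The clock rate, CPI, and counts are made-up illustrative values.

def mips_from_clock(clock_rate_hz, cpi):
    """MIPS rate = f / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

def mips_from_time(instruction_count, exec_time_s):
    """MIPS rate = I_C / (T * 10^6)."""
    return instruction_count / (exec_time_s * 1e6)

def mflops_rate(fp_operations, exec_time_s):
    """MFLOPS rate = floating-point operations / (execution time * 10^6)."""
    return fp_operations / (exec_time_s * 1e6)

f, cpi, i_c = 400e6, 2.24, 2_000_000
t = cpi * i_c / f                          # processor time T in seconds
print(mips_from_clock(f, cpi))
print(mips_from_time(i_c, t))
print(mflops_rate(500_000, t))             # assuming 500k of the operations are floating-point
```

Both MIPS calculations come out near 178.6, as expected, since $T = CPI \times I_C / f$ makes the two expressions algebraically identical.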
Benchmark suite
- A collection of programs, defined in a high-level language
- Attempts to provide a representative test of a computer in a particular application or system programming area
Desirable characteristics of a benchmark
- Written in a high-level language, making it portable across different machines
- Representative of a particular kind of programming style, such as systems programming, numerical programming, or commercial programming
- Measured easily
- Wide distribution
SPEC (Standard Performance Evaluation Corporation)
- An industry consortium
- Defines and maintains the best known collection of benchmark suites
- Performance measurements are widely used for comparison and research purposes
Amdahl Law
- The performance gain from using a faster execution mode is limited by the fraction of total execution time in which the faster mode can be used
- Equivalently, the gain is limited by how often the faster mode is used
- Amdahl's law defines the speedup that can be gained by using a particular technology
$$F_e = \frac{\text{computing time of the part that can be enhanced}}{\text{total computing time before enhancement}} \le 1$$
$$S_e = \frac{\text{computing time of the enhanceable part before enhancement}}{\text{computing time of that part after enhancement}} \ge 1$$
$T_0$: total task execution time before enhancement
Execution time of the total task after enhancement, where the first term is the part that is not enhanced and the second term is the enhanced part:
$$T_n = T_0(1-F_e) + T_0 \times \frac{F_e}{S_e}$$
System speedup after enhancement, the ratio of total time before enhancement to total time after:
$$S_n = \frac{T_0}{T_n} = \frac{1}{(1-F_e)+\frac{F_e}{S_e}}$$
Relation between system speed-up and Fe
- When $F_e = 0$, $S_n = 1$: no part can be enhanced
- When $S_e \to \infty$, $S_n \to 1/(1-F_e)$
- So the system performance enhancement is strongly limited by $F_e$
Example:
Suppose that a task makes extensive use of floating-point operations, with 40% of the time consumed by floating-point operations. With a new hardware design, the floating-point module is sped up by a factor of $K$. What is the overall speedup gained by this enhancement?
Solution:
$F_e = 0.4$, $S_e = K$
$$S_n = \frac{1}{(1-F_e)+\frac{F_e}{S_e}} = \frac{1}{0.6+\frac{0.4}{K}}$$
As $S_e \to \infty$, $S_n \to 1/0.6 \approx 1.67$, which is the limit even if the floating-point module is optimized without bound.
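The worked example can be reproduced numerically. The following is a minimal sketch of the speedup formula, with a hypothetical function name, evaluated for a few values of $K$ including the unbounded case.

```python
# Amdahl's law: S_n = 1 / ((1 - F_e) + F_e / S_e).
# The function name is an illustrative assumption.

def amdahl_speedup(f_e, s_e):
    """Overall speedup when a fraction f_e of the time is sped up by a factor s_e."""
    return 1.0 / ((1.0 - f_e) + f_e / s_e)

# Floating-point work is 40% of the time; try several speedup factors K.
for k in (1, 2, 10, 100, float("inf")):
    print(k, round(amdahl_speedup(0.4, k), 3))
```

Doubling the floating-point speed yields an overall speedup of only 1.25, and no value of $K$ can push the result past $1/0.6 \approx 1.67$, because the remaining 60% of the time is untouched.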
Vocabulary
Pipelining and parallel execution: 流水与并行执行
Speculative execution: 推测执行
Cache: 快速缓存
Decimal: 十进制
Binary: 二进制
General purpose computer: 通用计算机
Von Neumann Machine: 冯-诺依曼计算机
Opcode=operation code: 操作码
Instruction cycle: 指令周期
Fetch cycle: 取(读)周期
Flowchart: 流程图
Conditional branch: 条件转移
Data transfer: 数据传送
Upward compatible: 向上兼容
Multiplexor: 复用器
Bus: 总线
Magnetic-core memory: 磁芯存储器
End user: 端用户
Speech recognition: 语音识别
Videoconferencing: 视频会议
Multimedia authoring: 多媒体编著
Workstation: 工作站
Client-server: 客户机-服务器
DRAM—dynamic random access memory: 动态随机存取存储器
Branch prediction: 转移预测
Throughput: 吞吐率
Trade-off : 折衷
Supercomputer: 超级计算机/巨型机
Parallelism: 并行性
Key points
- What is the first computer in the world?
- What are the features of the von Neumann machine? What is its structure?
- What is Moore's law?
- What are multicore, MIC, and GPU?
- CPI, $I_C$, T, MIPS, MFLOPS
- Amdahl Law