Database Systems

lec1 Overview of Database Systems

1、What Is a Database?
P3

  • Data:
    • Facts and statistics collected together for reference or analysis.
  • Database: Data + Base
    • A very large, structured collection of data.
    • Models some real-world “enterprise”, such as a university.
      • Entities: e.g., students, courses
      • Relationships: e.g., Zhang San is taking the Database Systems course.

2、What Is a Database Management System?
P3

  • A Database Management System (DBMS) is:
    • A software system designed to store, manage, and facilitate queries on databases.
    • Popular DBMSs: Oracle, IBM DB2, Microsoft SQL Server
  • Database System = Databases + DBMS

3、Typical Applications Supported by Database Systems

  • Online Transaction Processing (OLTP)
    • Recording sales data in supermarkets
    • Booking flight tickets
    • Electronic banking
  • Online Analytical Processing (OLAP) and Data Warehousing
    • Business reporting for sales data
    • Customer Relationship Management (CRM)
  • Is the WWW a DBMS?
    • The Web = Surface Web + Deep Web
    • Surface Web: simply the HTML pages
      • Accessed by “search”: pose keywords in a search box.
    • Deep Web: content hidden behind HTML forms
      • Accessed by “query”: fill in query forms.

4、Search vs. Query

  • Search is structure-free.
    • The keywords “database systems” can appear anywhere in an HTML page.
  • Query is structure-aware.
    • Say, we restrict the keywords “database systems” to appear only in the “TITLE” field.
    • I.e., we assume there is an underlying STRUCTURE (of a book).

5、Files vs. a DBMS

  • We can store data in OS files.
    • E.g., Google has its own distributed file system called Google File System (GFS).
  • What are the advantages of a DBMS?
    • Good data modeling
      • Data independence
      • Data integrity and security
    • Simple and efficient ad-hoc queries
      • Reduced application development time
    • Concurrency control
    • Crash recovery

6、A Historical Perspective
P5

  • Integrated Data Store (IDS), by Charles Bachman, early 1960s. Network data model; Bachman received the 1973 Turing Award.
  • Information Management System (IMS), by IBM, late 1960s. Hierarchical data model.
  • Relational data model, by Edgar Codd, 1970. Codd received the 1981 Turing Award.
  • System R, a relational database system by IBM, started in 1974. Developed the Structured Query Language (SQL).
  • INGRES, by UC Berkeley, started in 1974. The name stands for “Interactive Graphics and Retrieval System”.
  • Database transaction processing, mainly by Jim Gray, who received the 1998 Turing Award.
  • Object-relational DBMS, 1990s.
    • Stonebraker, Michael with Moore, Dorothy. Object-Relational DBMSs: The Next Great Wave. 1996.
    • Postgres (UC Berkeley), PostgreSQL.
    • IBM’s DB2, Oracle database, and Microsoft SQL Server
    • Stonebraker received the 2014 Turing Award.
  • Column stores, in-memory databases, big data, 2010s.
    • C-Store, H-Store, SciDB, …

7、From OLTP to OLAP and Data Warehousing

  • OLAP (On-Line Analytical Processing; Codd, 1993)
    • Flexible reporting for business intelligence
  • Characteristics of OLAP applications:
    • Transactions that involve large numbers of records
    • Frequent ad-hoc queries and infrequent updates
    • A few decision-making users
    • Fast response times
  • Data warehouses are designed to facilitate reporting and analysis.
    • Read-mostly DBMSs: C-Store, MonetDB
  • Data Warehousing
    • Integrated data spanning long time periods, often augmented with summary information.
    • Several gigabytes to terabytes are common.
    • Interactive response times are expected for complex queries; ad-hoc updates are uncommon.

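8、Data Mining (DM)

  • DM is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in the data.
  • Association rules
    • E.g., 60% of customers who buy diapers also buy beer.
  • Classification: e.g., spam email
  • Clustering: e.g., grouping Sina Weibo users by similar interests
  • Web page ranking: Google’s PageRank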

9、Big Data

  • Oxford dictionary: data sets that are too large and complex to manipulate or query with standard methods or tools.
  • Data comes from everywhere.
  • Big data spans four dimensions:
    • Volume: terabytes (TB), even petabytes (PB), of information
    • Velocity: sometimes 2 minutes is too late
    • Variety: big data is any type of data, structured and unstructured
    • Veracity: the truthfulness of the data

10、Describing Data: Data Models 描述数据:数据模型
P8

  • A data model is a collection of concepts for describing data. 数据模型是用于描述数据的概念集合。
  • A schema is a description of a particular collection of data, using a given data model. 模式是使用给定数据模型对特定数据集合的描述。
  • The relational data model is the most widely used model today. 关系数据模型是当今应用最广泛的模型。
    • Main concept: relation, basically a table with rows and columns. 关系,基本上是一个包含行和列的表。
    • Every relation has a schema, which describes the columns, or fields (their names, types, constraints, etc.). 每个关系都有一个模式,它描述列或字段(它们的名称、类型、约束等)。

11、Schema in the Relational Data Model
P8

A relation schema is a TEMPLATE for the corresponding relation.

12、Levels of Abstraction in a DBMS 数据库管理系统中的抽象层次
P9

  • Many views describe how users see the data.
    • Personalized access to data.
  • The conceptual schema defines the logical structure,
    • i.e., what relations to store.
  • The physical schema specifies the physical structure,
    • i.e., how the “logical” relations are physically stored on external storage such as disk.

Views (external schemas) -> Conceptual schema -> Physical schema -> DB (on disk)

Example: University Database

  • Conceptual schema:

    • Students(sid: string, name: string, login: string, age: integer, gpa: real)
    • Courses(cid: string, cname: string, credits: integer)
    • Enrolled(sid: string, cid: string, grade: string)
  • Physical schema:
    • Relations stored as unordered files.
    • Index on first column of Students.
  • External Schema (View):
    • Course_info(cid: string, enrollment: integer)
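
The three levels in this example can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than any particular DBMS from the lecture; the sample rows, the index name idx_students_sid, and the definition of Course_info as a per-course enrollment count are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema: the relations themselves.
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema: an index on the first column of Students.
    CREATE INDEX idx_students_sid ON Students (sid);

    -- External schema: a view computed from the conceptual relations
    -- (assumed here to be the number of enrolled students per course).
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment
        FROM Enrolled
        GROUP BY cid;
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("53666", "CS101", "A"), ("53688", "CS101", "B"), ("53666", "MA201", "C")],
)

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)  # [('CS101', 2), ('MA201', 1)]
```

A user of Course_info never sees Enrolled directly, and dropping or adding the index changes no query text; that is the point of separating the levels.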

13、Data Independence 数据独立性
P11

  • Applications are insulated from how data is structured and stored.
  • Logical data independence: protection from changes in the logical structure of data.
  • Physical data independence: protection from changes in the physical structure of data.
  • One of the most important benefits of using a DBMS!

14、Queries in a Relational DBMS 关系数据库管理系统中的查询
P11

  • Specified in a Non-Procedural way 以非程序方式指定

    • Users only specify what data they need; 用户只指定他们需要什么数据;
    • A DBMS takes care to evaluate queries as efficiently as possible. DBMS会尽可能高效地评估查询。
  • a Non-Procedural Query Language: 非过程查询语言:
    • SQL: Structured Query Language 结构化查询语言

15、Concurrent execution of user programs 用户程序的并发执行
P13

Why?

  • Utilize CPU while waiting for disk I/O 在等待磁盘I/O时利用CPU

    • (database programs make heavy use of disk)
  • Avoid short programs waiting behind long ones 避免短程序在等待长程序执行完之后再执行
    • e.g. ATM withdrawal while bank manager sums balance across all accounts

Concurrent execution

  • Interleaving actions of different user programs can lead to inconsistency.

Concurrency Control 并发控制

  • DBMS ensures such problems don’t arise.
  • Users can pretend they are using a single-user system. 用户可以当作使用单用户系统。
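
One way to see how interleaving causes inconsistency is to replay two schedules by hand. The sketch below is a toy simulation, not how a DBMS schedules work: the programs T1 (withdraw 30) and T2 (deposit 50), the starting balance of 100, and the run helper are all hypothetical.

```python
def run(schedule, balance=100):
    """Replay (program, action, amount) steps against one shared balance."""
    db = {"balance": balance}
    local = {}  # each program's private copy of the value it last read
    for prog, action, amount in schedule:
        if action == "read":
            local[prog] = db["balance"]
        elif action == "write":
            db["balance"] = local[prog] + amount
    return db["balance"]

# T1 withdraws 30, T2 deposits 50.
serial = [
    ("T1", "read", 0), ("T1", "write", -30),
    ("T2", "read", 0), ("T2", "write", 50),
]
interleaved = [
    ("T1", "read", 0), ("T2", "read", 0),    # both read the old balance
    ("T1", "write", -30), ("T2", "write", 50),  # T2 overwrites T1's update
]

serial_result = run(serial)
interleaved_result = run(interleaved)
print(serial_result, interleaved_result)  # 120 150
```

In the interleaved schedule T1's withdrawal is lost. Concurrency control exists to rule out schedules like this one while still allowing harmless interleavings.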

16、Key concept: Transaction 关键概念:事务
P12

  • A transaction is an atomic sequence of database actions (reads / writes).
  • Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins. 如果事务开始时DB是一致的,则完全执行的每个事务都必须使DB保持一致状态。

17、Incomplete Transaction and System Crashes 未完成的事务和系统崩溃
P13

  • Incomplete transaction 未完成的事务

    • Canceled by the transaction or DBMS 被事务或DBMS取消
    • Aborted unexpectedly by system crash 由于系统崩溃而意外中止
  • Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of transactions: 在执行一组事务时,保留DBMS执行的所有操作的日志(历史记录)
    • Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol; OS support for this is often inadequate.) 在对数据库进行更改之前,会将相应的日志条目强制放到安全位置(预写式协议;操作系统对此的支持通常是不够的)
    • After a crash, the effects of partially executed transactions are undone using the log. 崩溃后,部分执行的事务的效果将使用日志撤消。
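
The logging idea can be sketched with a toy in-memory store. This is only an illustration under simplifying assumptions: the TinyDB class is hypothetical, a Python list stands in for the log in a safe location, and only undo of uncommitted work is shown.

```python
_MISSING = object()  # marks "this key did not exist before the write"


class TinyDB:
    """Toy key-value store with an undo log (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}  # stands in for the database on disk
        self.log = []   # stands in for the log kept in a safe location

    def write(self, txn, key, value):
        # Write-ahead: log the old value *before* changing the data.
        self.log.append(("update", txn, key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        """After a crash, undo writes of transactions that never committed."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in reversed(self.log):  # newest entry first
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old = rec
                if old is _MISSING:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old


db = TinyDB()
db.write("T1", "A", 10)
db.commit("T1")
db.write("T2", "A", 99)  # T2 never commits: assume the crash happens after these writes
db.write("T2", "B", 5)
db.recover()
print(db.data)  # {'A': 10}
```

Scanning the log backwards restores the oldest pre-transaction value last, so T2's partial effects disappear while T1's committed write survives.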

18、Structure of a DBMS
P14

Databases make these folks happy 数据库让这些人很开心
P15

  • End users and DBMS vendors 终端用户和DBMS供应商
  • DB application programmers 数据库应用程序程序员
    • E.g., smart webmasters 网站管理员
  • Database administrator (DBA) 数据库管理员
    • Designs logical /physical schemas 设计逻辑/物理模式
    • Handles security and authorization 处理安全和授权
    • Data availability, crash recovery 数据可用性、崩溃恢复
    • Database tuning as needs evolve 根据需要调整数据库
    • Must understand how a DBMS works! 必须了解DBMS是如何工作的!

19、Summary 总结

  • DBMS used to maintain, query large datasets. DBMS用于维护、查询大型数据集
  • Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. 好处包括从系统崩溃中恢复、并发访问、快速应用程序开发、数据完整性和安全性。
  • Levels of abstraction give data independence. 抽象级别提供了数据独立性。
  • A DBMS typically has a layered architecture. DBMS通常具有分层体系结构。
  • DBAs hold responsible jobs and are well-paid! DBA拥有负责任的工作,而且薪水很高!
  • DBMS R&D is one of the broadest, most exciting areas in CS. DBMS研发是CS中最广泛、最令人兴奋的领域之一。
  • We focus on Relational DBMS: 我们主要关注关系型DBMS:
    • maintain/query structured data 维护/查询结构化数据

lec2 The Relational Model

1、Definition of the Relational Model
P43-45

  • Relational database: a set of relations.
  • Relation: made up of 2 parts:
    • Schema: specifies the name of the relation, plus the name and type of each column.
    • Instance: a table, with rows and columns.
      • #rows = cardinality
      • #fields = arity (or degree)
  • Can think of a relation as a set of rows or tuples,
    • i.e., all rows are distinct.

2、SQL - A language for Relational DBs
P45

  • SQL (a.k.a. “Sequel”), standard language 标准语言
  • Data Definition Language (DDL) 数据定义语言
    • create, modify, delete relations 创建、修改、删除关系
    • specify constraints 指定约束条件
    • administer users, security, etc. 管理用户、安全等。
  • Data Manipulation Language (DML) 数据操作语言
    • Specify queries to find tuples that satisfy criteria 指定查询以查找满足条件的元组
    • add, modify, remove tuples 添加、修改、删除元组
CREATE TABLE <name> ( <field> <domain>, … )
INSERT INTO <name> (<field names>) VALUES (<field values>)
DELETE FROM <name> WHERE <condition>
UPDATE <name> SET <field name> = <value> WHERE <condition>
SELECT <fields> FROM <name> WHERE <condition>

3、Creating Relations in SQL 创建关系
P45

Creating tables

CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))

Adding and deleting tuples

INSERT INTO Students (sid, name, login, age, gpa)
VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)

DELETE
FROM Students S
WHERE S.name = 'Smith'

4、Keys 键
P47

  • Keys are a way to associate tuples in different relations. 键是在不同关系中关联元组的一种方法。
  • Keys are one form of integrity constraint (IC) 键是完整性约束(IC)的一种形式

5、Primary Keys 主键

  • A set of fields is a superkey if 一组字段是超键:

    • No two distinct tuples can have same values in all key fields 没有两个不同的元组可以在所有键字段中具有相同的值
  • A set of fields is a key for a relation if 一组字段是关系的键:
    • It is a superkey
    • No subset of the fields is a superkey. (i.e., minimal). 字段的任何子集都不是超级键
  • What if a relation has more than one key?
    • One of the keys is chosen (by the DBA) to be the primary key.
    • The other keys are called candidate keys (strictly speaking, every key, including the primary key, is a candidate key).
  • E.g.
    • sid is a key for Students.
    • What about name?
    • The set {sid, gpa} is a superkey.
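
The superkey and key definitions can be checked mechanically against a table instance. Keep in mind that a key is a constraint on every legal instance, so a check on one instance can only refute a claim, never prove it. The helper names and the sample rows below are hypothetical.

```python
from itertools import combinations


def is_superkey(rows, fields):
    """True if no two rows agree on all of the given fields."""
    seen = set()
    for row in rows:
        value = tuple(row[f] for f in fields)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, fields):
    """A key is a superkey none of whose non-empty proper subsets is a superkey."""
    if not is_superkey(rows, fields):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )


students = [
    {"sid": "53666", "name": "Jones", "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "gpa": 3.8},
]

print(is_key(students, ["sid"]))              # True
print(is_superkey(students, ["name"]))        # False: two students named Smith
print(is_superkey(students, ["sid", "gpa"]))  # True
print(is_key(students, ["sid", "gpa"]))       # False: sid alone already suffices
```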

6、Primary and Candidate Keys in SQL SQL中的主键和候选键
P47

  • Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
  • Keys must be used carefully!
  • “For a given student and course, there is a single grade.”
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid, cid))
  • VS. “Students can take only one course, and no two students in a course receive the same grade.”
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid),
UNIQUE (cid, grade))
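
To see how differently the two Enrolled declarations behave, the same rows can be inserted under each one. This is a sketch using Python's sqlite3 module; the try_inserts helper and the three sample rows are made up for the demonstration.

```python
import sqlite3


def try_inserts(ddl, rows):
    """Create Enrolled with the given DDL and report which inserts succeed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    outcome = []
    for row in rows:
        try:
            conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
            outcome.append("ok")
        except sqlite3.IntegrityError:
            outcome.append("rejected")
    conn.close()
    return outcome


rows = [
    ("53666", "CS101", "A"),
    ("53666", "MA201", "B"),  # same student, second course
    ("53688", "CS101", "A"),  # different student, same course and grade
]

composite_key = (
    "CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid))"
)
overly_strict = (
    "CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid), UNIQUE (cid, grade))"
)

composite_outcome = try_inserts(composite_key, rows)
strict_outcome = try_inserts(overly_strict, rows)
print(composite_outcome)  # ['ok', 'ok', 'ok']
print(strict_outcome)     # ['ok', 'rejected', 'rejected']
```

The second declaration rejects perfectly reasonable enrollments, which is why the choice of key has to match the intended real-world rule.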

7、Foreign Keys 外键 vs. Referential Integrity 参照完整性
P48

  • Foreign key: a set of fields in one relation that is used to “refer” to a tuple in another relation.
    • Must correspond to the primary key of the other relation.
    • Like a “logical pointer”.
  • If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references).

  • E.g. Only students listed in the Students relation should be allowed to enroll for courses. 只有学生关系中列出的学生才允许注册课程。

    • sid is a foreign key referring to Students: sid是指向学生关系的外键
CREATE TABLE Enrolled
(sid CHAR(20),cid CHAR(20),grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students )

8、Enforcing Referential Integrity 强制引用完整性
P51

  • sid in Enrolled: foreign key referencing Students. sid是指向学生关系的外键
  • Scenarios:
    • Insert Enrolled tuple with non-existent student id? 向Enrolled中插入不存在的学生学号的元组
    • Delete a Students tuple? 删除一个学生元组
      • Also delete Enrolled tuples that refer to it? (Cascade) 级联
      • Disallow if referred to? (No Action) 不采取行动
      • Set sid in referring Enrolled tuples to a default value? (Set Default) 设置默认值
      • Set sid in referring Enrolled tuples to null, denoting “unknown” or “inapplicable”? (Set NULL)
  • Similar issues arise if primary key of Students tuple is updated. 如果更新Students元组的主键,也会出现类似的问题
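
Two of these scenarios, the dangling insert and the cascading delete, can be observed directly. The sketch assumes SQLite through Python's sqlite3 module, where foreign key enforcement has to be switched on with PRAGMA foreign_keys; other systems enforce it by default. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.executescript("""
    CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20));
    CREATE TABLE Enrolled (
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students (sid) ON DELETE CASCADE
    );
    INSERT INTO Students VALUES ('53666', 'Jones');
    INSERT INTO Enrolled VALUES ('53666', 'CS101', 'A');
""")

# Scenario 1: insert an Enrolled tuple with a non-existent student id.
dangling_rejected = False
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'CS101', 'B')")
except sqlite3.IntegrityError:
    dangling_rejected = True

# Scenario 2: delete a Students tuple; CASCADE removes the referring rows too.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
remaining = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]

print(dangling_rejected, remaining)  # True 0
```

Replacing ON DELETE CASCADE with ON DELETE NO ACTION, SET DEFAULT, or SET NULL selects the other policies listed above.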

9、Integrity Constraints (ICs) 完整性约束
P46

  • IC: condition that must be true for any instance of the database 对于数据库的任何实例都必须为真的条件

    • e.g., domain constraints. 域约束。
    • ICs are specified when schema is defined. 在定义模式时指定ICs
    • ICs are checked when relations are modified. 修改关系时会检查ICs
  • A legal instance of a relation is one that satisfies all specified ICs. 一个关系的合法实例是满足所有指定ICs的实例
    • DBMS should not allow illegal instances. DBMS不应该允许非法实例
  • If the DBMS checks ICs, stored data is more faithful to real-world meaning. 如果DBMS检查ICs,则存储的数据更符合真实世界的含义
    • Avoids data entry errors, too! 避免数据也输入错误

10、Where do ICs Come From?

  • Semantics 语义学 of the real world!
  • Key and foreign key ICs are the most common 键和外键ICs是最常见的
  • More general ICs supported too. 也支持更通用的ICs

11、Relational Query Languages 关系查询语言

  • Feature: Simple, powerful ad hoc querying 简单、功能强大的即时查询
  • Declarative languages 说明性语言
    • Queries precisely specify what to return 查询精确地指定要返回的内容
    • DBMS is responsible for efficient evaluation (how). DBMS负责有效的评估
    • Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change. 允许优化器广泛地重新排序操作,并且仍然确保答案不变。
      • Key to data independence! 数据独立性关键

The SQL Query Language SQL查询语言

  • The most widely used relational query language.

    • The standard cited here is SQL:2008 (later revisions such as SQL:2016 and SQL:2023 have since been published); SQL92 is a basic subset.
  • To find all 18-year-old students, we can write:
SELECT *
FROM Students S
WHERE S.age=18
  • To find just names and logins, replace the first line:
SELECT S.name, S.login

Querying Multiple Relations 查询多个关系

  • What does the following query compute?
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='A'

12、Semantics of a Query 查询的语义

  • A conceptual evaluation method for the previous query: 之前的查询的概念评估方法:

    1. do FROM clause: compute cross-product of Students and Enrolled 计算学生和注册课程的叉积
    2. do WHERE clause: Check conditions, discard tuples that fail 检查条件,丢弃不要的元组
    3. do SELECT clause: Delete unwanted fields 删除不需要的字段
  • Remember, this is conceptual. Actual evaluation will be much more efficient, but must produce the same answers. 记住,这是概念性的。实际评估将更加有效,但必须得出相同的答案。
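
The three conceptual steps can be followed literally in a few lines of Python. The rows are hypothetical and the lists stand in for the Students and Enrolled relations; a real DBMS would not materialize the cross-product like this.

```python
from itertools import product

students = [
    {"sid": "53666", "name": "Jones"},
    {"sid": "53688", "name": "Smith"},
]
enrolled = [
    {"sid": "53666", "cid": "CS101", "grade": "A"},
    {"sid": "53688", "cid": "CS101", "grade": "B"},
    {"sid": "53688", "cid": "MA201", "grade": "A"},
]

# 1. FROM: cross-product of Students and Enrolled.
cross = list(product(students, enrolled))

# 2. WHERE: keep only pairs that satisfy S.sid = E.sid AND E.grade = 'A'.
qualifying = [(s, e) for s, e in cross
              if s["sid"] == e["sid"] and e["grade"] == "A"]

# 3. SELECT: keep only S.name and E.cid.
answer = [(s["name"], e["cid"]) for s, e in qualifying]

print(len(cross))  # 6
print(answer)      # [('Jones', 'CS101'), ('Smith', 'MA201')]
```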

13、Summary 总结

  • A tabular representation of data, simple and intuitive, currently the most widely used 数据的表格表示,简单直观,目前使用最广泛

    • Object-relational features in most products 大多数产品中的对象关系特性
  • Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations. 完整性约束可以由DBA根据应用程序语义指定。DBMS检查违规行为。
    • Two important ICs: primary and foreign keys 主键和外键
    • In addition, we always have domain constraints. 域约束
  • Powerful query languages exist.
    • SQL is the standard commercial one SQL是一种标准商用语言

      • DDL - Data Definition Language 数据定义语言
      • DML - Data Manipulation Language 数据操作语言

lec3 Relational Algebra

1、Formal Relational Query Languages 形式关系查询语言
P74

  • Relational Algebra: More operational, very useful for representing execution plans. 关系代数:更具操作性,对于表示执行计划非常有用。
  • Relational Calculus: Describe what you want, rather than how to compute it. (Non-procedural, declarative.) 关系演算:描述你想要什么,而不是如何计算它。(非程序性、说明性。)

Preliminaries

  • A query is applied to relation instances 查询应用于关系实例
  • The result of a query is also a relation instance. 查询的结果也是一个关系实例。
    • Schemas of input relations for a query are fixed 查询的输入关系模式是固定的
    • Schema for the result of a query is also fixed. 查询结果的模式也是固定的。
      • determined by the query language constructs 由查询语言构造确定
  • Positional vs. named-field notation 位置与命名字段表示法:
    • Positional notation easier for formal definitions 位置表示法便于形式化定义
    • Named-field notation more readable. 命名字段表示法更具可读性。
    • Both used in SQL 两者都用于SQL
      • Though positional notation is discouraged 虽然不鼓励使用位置表示法

2、Relational Algebra: 5 Basic Operations 关系代数:5种基本运算

  • Selection ( σ )
    • Selects a subset of rows (horizontal).
  • Projection ( π )
    • Retains only desired columns (vertical).
  • Cross-product ( × )
    • Allows us to combine two relations.
  • Set-difference ( − )
    • Tuples in r1, but not in r2.
  • Union ( ∪ )
    • Tuples in r1 or in r2.
  • Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)

Projection ( π ) 投影
P76

  • Example:
  • Retains only attributes that are in the “projection list”. 仅保留“投影列表”中的属性。
  • Schema of result: 结果模式
    • the fields in the projection list 投影列表中的字段
    • with the same names that they had in the input relation. 与输入关系中的名称相同
  • Projection operator has to eliminate duplicates 必须消除重复项
    • Note: real systems typically don’t do duplicate elimination 实际系统通常不进行重复消除
    • Unless the user explicitly asks for it. 除非用户明确要求
    • (Why not?)

Selection ( σ ) 选择
P76

  • Selects rows that satisfy selection condition. 选择满足选择条件的行
  • Result is a relation. 结果是一种关系
    • Schema of result is same as that of the input relation. 结果的模式与输入关系的模式相同
  • Do we need to do duplicate elimination?

Union and Set-Difference 并差
P77

  • Both of these operations take two input relations, which must be union-compatible: 这两种操作都采用两种输入关系,它们必须是并集兼容的

    • Same number of fields. 相同数量的字段
    • ‘Corresponding’ fields have the same type. “对应”字段具有相同的类型
  • For which, if any, is duplicate elimination required?

Cross-Product 叉积
P78

  • S1 × R1:
    • Each row of S1 is paired with each row of R1.
  • Q: How many rows are in the result?
  • Result schema has one field per field of S1 and R1,
    • Field names are “inherited” if possible.
    • Naming conflict: S1 and R1 have a field with the same name.
    • Can use the renaming operator ρ.

Renaming ρ
P78
Compound Operator: Intersection 交
P77

  • On top of the 5 basic operators, there are several additional “Compound Operators”.
    • These add no computational power to the language.
    • Useful shorthand.
    • Can be expressed solely with the basic operators.
  • Intersection takes two input relations, which must be union-compatible.
  • Q: How to express it using basic operators?
    • R ∩ S = R − (R − S)
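
The identity is easy to confirm with Python sets standing in for two union-compatible relations. The tuples below are hypothetical sample data.

```python
# Two union-compatible relations, modeled as sets of (id, name) tuples.
R = {(22, "dustin"), (31, "lubber"), (58, "rusty")}
S = {(31, "lubber"), (58, "rusty"), (44, "guppy")}

only_in_R = R - S                 # tuples of R that are not in S
via_difference = R - only_in_R    # R − (R − S)

print(via_difference == (R & S))  # True
print(sorted(via_difference))     # [(31, 'lubber'), (58, 'rusty')]
```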

Compound Operator: Join 连接
P78

  • Involves cross-product, selection, and (sometimes) projection.
  • Most common type of join: “natural join”
    • R ⋈ S conceptually is:
      • Compute R × S.
      • Select rows where attributes appearing in both relations have equal values.
      • Project all unique attributes and one copy of each of the common ones.
  • Note: usually done much more efficiently than this.
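
The conceptual definition of the natural join translates almost word for word into code. The natural_join function and the sample relations are hypothetical, relations are modeled as lists of dicts, and the nested loop is the naive strategy, not what an optimizer would choose.

```python
def natural_join(r, s):
    """Naive natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    common = [attr for attr in r[0] if attr in s[0]]
    result = []
    for x in r:          # cross-product: every row of r with every row of s
        for y in s:
            # selection: common attributes must have equal values
            if all(x[attr] == y[attr] for attr in common):
                # projection: merging the dicts keeps one copy of each common attribute
                result.append({**x, **y})
    return result


students = [
    {"sid": "53666", "name": "Jones"},
    {"sid": "53688", "name": "Smith"},
]
enrolled = [
    {"sid": "53666", "cid": "CS101"},
    {"sid": "53688", "cid": "MA201"},
    {"sid": "53700", "cid": "CS101"},  # no matching student
]

joined = natural_join(students, enrolled)
print(joined)
```

Here the only common attribute is sid, so the result has the fields sid, name, and cid, and the unmatched Enrolled row drops out.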

Other Types of Joins

  • Condition Join (or “theta-join”): R ⋈_c S = σ_c(R × S)
    P79
    • Result schema is the same as that of the cross-product.
    • May have fewer tuples than the cross-product.
  • Equi-Join: special case in which condition c contains only a conjunction of equalities.

Examples
P81

Summary 总结

  • Relational Algebra: a small set of operators mapping relations to relations 关系代数:将关系映射到关系的一小组运算符

    • Operational, in the sense that you specify the explicit order of operations 可操作性,即指定操作的显式顺序
  • A closed set of operators! Can mix and match. 一组闭合的运算符!可以混搭。
  • Basic ops include: σ, π, ×, ∪, −
  • Important compound ops: ∩, ⋈

lec3 Storing Data: Disks and Files 存储数据:磁盘和文件

Block diagram of a DBMS 数据库管理系统的框图

Disks, Memory, and Files 磁盘、内存和文件

Disks and Files 磁盘和文件

  • DBMS stores information on disks. DBMS将信息存储在磁盘上。

    • Tapes are also used. 还使用磁带。
  • Major implications for DBMS design! DBMS设计的主要含义!
    • READ: transfer data from disk to main memory (RAM). 读取:将数据从磁盘传输到主存储器(RAM)。
    • WRITE: transfer data from RAM to disk. 写入:将数据从RAM传输到磁盘。
    • Both high-cost relative to memory references 两者都比内存引用成本高
      • Can/should plan carefully! 可以/应该仔细计划!

Why Not Store Everything in Main Memory? 为什么不把所有东西都存储在主内存中呢?

  • Costs too much. For ~$1000, PCConnection will sell you either
    • ~80GB of RAM (unrealistic)
    • ~400GB of Flash USB keys (unrealistic)
    • ~180GB of Flash solid-state disk (serious)
    • ~7.7TB of disk (serious)
  • Main memory is volatile.
    • Want data to persist between runs. (Obviously!)

The Storage Hierarchy 存储层次结构
P231

  • Main memory (RAM) for currently used data. 用于当前使用数据的主存储器(RAM)。
  • Disk for main database (secondary storage). 主数据库磁盘(辅助存储)
  • Tapes for archive (tertiary storage). 用于存档的磁带(第三级存储)
  • The role of Flash (SSD) still unclear 闪存(SSD)的作用仍不清楚

Disks 磁盘
P231

  • Still the secondary storage device of choice. 仍然是首选的辅助存储设备。
  • Main advantage over tape: 与磁带相比的主要优势:
    • random access vs. sequential. 随机存取与顺序存取。
  • Fixed unit of transfer 固定转移单位
    • Read/write disk blocks or pages (8K) 读/写磁盘块或页(8K)
  • Not “random access” (vs. RAM) 非“随机存取”(与RAM相比)
    • Time to retrieve a block depends on location 检索块的时间取决于位置
    • Relative placement of blocks on disk has major impact on DBMS performance! 块在磁盘上的相对位置对DBMS性能有重大影响!

Components of a Disk 磁盘组件

  • The platters spin (say, 120 rps).
  • The arm assembly is moved in or out to position a head on a desired track. Tracks under the heads make a cylinder (imaginary!).
  • Only one head reads/writes at any one time.
  • Block size is a multiple of sector size (which is fixed).

Accessing a Disk Page 访问磁盘页

  • Time to access (read/write) a disk block: 访问(读/写)磁盘块的时间

    • seek time (moving arms to position disk head on track) 寻道时间(移动臂将磁头定位在磁道上)
    • rotational delay (waiting for block to rotate under head) 旋转延迟(等待块在头部下方旋转)
    • transfer time (actually moving data to/from disk surface) 传输时间(实际将数据移动到磁盘表面或从磁盘表面移动数据)
  • Seek time and rotational delay dominate. 寻道时间和旋转延迟占主导地位。
    • Seek time varies from 0 to 10msec 寻道时间从0到10毫秒不等
    • Rotational delay varies from 0 to 3msec 旋转延迟从0到3毫秒不等
    • Transfer time is around 0.02 msec per 8K block
  • Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions? 降低I/O成本的关键:减少寻道/旋转延迟!硬件与软件解决方案?
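
A rough calculation shows why seek and rotational delay dominate. The sketch assumes average seek and rotational delays at the midpoints of the ranges above (5 msec and 1.5 msec), and assumes a sequential read pays them only once; both are simplifications for illustration.

```python
AVG_SEEK_MS = 5.0       # assumed midpoint of the 0 to 10 msec range
AVG_ROTATION_MS = 1.5   # assumed midpoint of the 0 to 3 msec range
TRANSFER_MS = 0.02      # per 8K block, as stated in the notes


def random_read_ms(num_blocks):
    """Every block pays its own seek, rotational delay, and transfer."""
    return num_blocks * (AVG_SEEK_MS + AVG_ROTATION_MS + TRANSFER_MS)


def sequential_read_ms(num_blocks):
    """One seek and one rotational delay, then blocks stream off the disk."""
    return AVG_SEEK_MS + AVG_ROTATION_MS + num_blocks * TRANSFER_MS


n = 1000
print(f"random:     {random_read_ms(n):.1f} msec")
print(f"sequential: {sequential_read_ms(n):.1f} msec")
```

Under these assumptions, 1000 randomly placed blocks take about 6.5 seconds while 1000 adjacent blocks take about 26.5 msec, which is why block placement matters so much.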

Arranging Pages on Disk 在磁盘上排列页面

  • “Next” block concept:
    • blocks on same track, followed by
    • blocks on same cylinder, followed by
    • blocks on adjacent cylinder
  • Blocks in a file should be arranged sequentially on disk (by “next”), to minimize seek and rotational delay.
  • For a sequential scan, pre-fetching several pages at a time is a big win!

Disk Space Management 磁盘空间管理

  • Lowest layer of DBMS, manages space on disk 数据库管理系统的最底层,管理磁盘上的空间
  • Higher levels call upon this layer to: 更高级别要求该层
    • allocate/de-allocate a page 分配/取消分配页面
    • read/write a page 读/写一页
  • A request for a sequence of pages is best satisfied by pages stored sequentially on disk!
    • Responsibility of disk space manager. 磁盘空间管理器的职责
    • Higher levels don’t know how this is done, or how free space is managed. 更高的级别不知道如何做到这一点,也不知道如何管理可用空间
    • Though they may make performance assumptions! 尽管他们可能会做出性能假设
      • Hence disk space manager should do a decent job. 因此,磁盘空间管理器应该做得很好

Context 环境

Files of Records 文件记录

  • Blocks are the interface for I/O, but… 块是I/O的接口,但是…
  • Higher levels of DBMS operate on records, and files of records. 更高级别的DBMS对记录和记录文件进行操作。
  • FILE: A collection of pages, each containing a collection of records. Must support: 文件:页面的集合,每个页面包含一组记录。必须支持:
    • insert/delete/modify record 插入/删除/修改记录
    • fetch a particular record (specified using record id) 获取特定记录(使用记录id指定)
    • scan all records (possibly with some conditions on the records to be retrieved) 扫描所有记录(可能对要检索的记录具有某些条件)
  • Typically implemented as multiple OS “files” 通常实现为多个操作系统“文件”
    • Or “raw” disk space 或“原始”磁盘空间

Unordered (Heap) Files 无序(堆)文件

  • Collection of records in no particular order. 不按特定顺序收集记录。
  • As file shrinks/grows, disk pages (de)allocated 随着文件的缩小/增长,磁盘页(不)被分配
  • To support record level operations, we must: 为了支持记录级操作,我们必须:
    • keep track of the pages in a file 跟踪文件中的页面
    • keep track of free space on pages 跟踪页面上的可用空间
    • keep track of the records on a page 跟踪页面上的记录
  • There are many alternatives for keeping track of this. 有很多方法可以跟踪这一点。
    • We’ll consider two. 我们考虑两个。

Heap File Implemented as a List 作为链表实现的堆文件

  • The header page id and Heap file name must be stored someplace. 头页id和堆文件名必须存储在某个位置。

    • Database “catalog” 数据库“目录”
  • Each page contains 2 “pointers” plus data.
  • One disadvantage 一个缺点
    • Virtually all pages will be on the free list if records are of variable length, i.e., every page may have some free bytes if we like to keep each record in a single page. 如果记录长度可变,则几乎所有页面都将位于空闲列表中,即,如果我们希望将每个记录保留在单个页面中,则每个页面可能都有一些空闲字节。

Heap File Using a Page Directory 使用页面目录堆文件

  • The directory is itself a collection of pages; each page can hold several entries. 目录本身就是一个页面集合;每页可以容纳多个条目。
  • The entry for a page can include the number of free bytes on the page. 页面的条目可以包括页面上的可用字节数。
  • To insert a record, we can search the directory to determine which page has enough space to hold the record. 要插入记录,我们可以搜索目录以确定哪个页面有足够的空间来保存记录。
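
A page directory can be sketched as a list of free-byte counts, one entry per page. Everything here is hypothetical: the HeapFile class, the 4096-byte page size, and the first-fit search; real systems keep the directory on disk pages and account for per-page headers.

```python
PAGE_SIZE = 4096  # bytes per page; an assumed value for this sketch


class HeapFile:
    """Toy heap file whose directory records the free bytes on each page."""

    def __init__(self):
        self.pages = []      # page id -> list of records stored on that page
        self.directory = []  # page id -> number of free bytes on that page

    def insert(self, record: bytes):
        """Store the record on the first page with room; return its (page, slot) id."""
        need = len(record)
        if need > PAGE_SIZE:
            raise ValueError("record does not fit on a single page")
        for page_id, free in enumerate(self.directory):
            if free >= need:
                break
        else:  # no existing page has enough space: allocate a new one
            page_id = len(self.pages)
            self.pages.append([])
            self.directory.append(PAGE_SIZE)
        self.pages[page_id].append(record)
        self.directory[page_id] -= need
        return (page_id, len(self.pages[page_id]) - 1)


hf = HeapFile()
rid1 = hf.insert(b"x" * 3000)
rid2 = hf.insert(b"y" * 2000)  # does not fit in the 1096 bytes left on page 0
rid3 = hf.insert(b"z" * 1000)  # fits on page 0 again
print(rid1, rid2, rid3)        # (0, 0) (1, 0) (0, 1)
print(hf.directory)            # [96, 2096]
```

Only the directory is consulted to choose a page, so the data pages themselves are not read during the search.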

Indexes (a sneak preview) 索引(预览)

  • A Heap file allows us to retrieve records: 堆文件允许我们检索记录

    • by specifying the rid (record id), or 通过指定rid(记录id),或
    • by scanning all records sequentially 按顺序扫描所有记录
  • Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g., 有时,我们希望通过在一个或多个字段中指定值来检索记录,例如
    • Find all students in the “CS” department 查找“CS”系的所有学生
    • Find all students with a gpa > 3 查找gpa>3的所有学生
  • Indexes are file structures that enable us to answer such value-based queries efficiently. 索引是一种文件结构,使我们能够高效地回答此类基于值的查询。

Record Formats: Fixed Length 记录格式:固定长度

  • Information about field types same for all records in a file; stored in system catalogs. 关于文件中所有记录的相同字段类型的信息;存储在系统目录中。
  • Finding i’th field done via arithmetic. 通过运算找到第i个字段

Record Formats: Variable Length 记录格式:可变长度

  • Two alternative formats (# fields is fixed): 两种可选格式(#字段是固定的)

    • Fields Delimited by Special Symbols 由特殊符号分隔的字段
    • Array of Field Offsets 字段偏移数组
  • Second offers direct access to i’th field, efficient storage of nulls (special don’t know value); small directory overhead. 第二个提供了对第i个字段的直接访问,有效地存储空值(特殊的未知值);目录开销小。
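
The array-of-offsets format can be sketched with Python's struct module. The layout is an assumption chosen for illustration: n + 1 little-endian 16-bit offsets followed by the field bytes, with a null encoded as a zero-length field (so this toy version cannot tell a null from an empty string).

```python
import struct


def encode(fields):
    """Pack fields (bytes, or None for null) behind an array of n + 1 offsets."""
    n = len(fields)
    header_size = 2 * (n + 1)  # n + 1 offsets, 2 bytes each
    offsets, body = [], b""
    for field in fields:
        offsets.append(header_size + len(body))  # where this field starts
        body += field or b""                     # a null contributes no bytes
    offsets.append(header_size + len(body))      # end of the last field
    return struct.pack(f"<{n + 1}H", *offsets) + body


def get_field(record, i):
    """Read field i directly, without scanning the fields before it."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return None if start == end else record[start:end]


rec = encode([b"53666", None, b"jones@cs"])
print(get_field(rec, 0))  # b'53666'
print(get_field(rec, 1))  # None
print(get_field(rec, 2))  # b'jones@cs'
```

Field i is located with two offset reads regardless of how long the earlier fields are, and a null costs only its offset entry.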

Summary 总结

  • Disks provide cheap, non-volatile storage. 磁盘提供廉价的非易失性存储

    • Better random access than tape, worse than RAM 随机存取比磁带好,比RAM差
    • Arrange data to minimize seek and rotation delays. 安排数据以最小化寻道和旋转延迟
      • Depends on workload! 取决于工作量
  • Buffer manager brings pages into RAM. 缓冲区管理器将页面带入RAM
    • Page pinned in RAM until released by requestor. 页面固定在RAM中,直到请求者释放
    • Dirty pages written to disk when frame replaced (sometime after requestor unpins the page). 当帧被替换时(请求者解除页面锁定后的某个时间),脏页被写入磁盘
    • Choice of frame to replace based on replacement policy. 根据更换策略选择要更换的帧
    • Tries to pre-fetch several pages at a time. 尝试一次预取几页
  • DBMS vs. OS File Support DBMS与OS文件支持
    • DBMS needs non-default features DBMS需要非默认特性
    • Careful timing of writes, control over prefetch 仔细安排写入时间,控制预取
  • Variable length record format 可变长度记录格式
    • Direct access to i’th field and null values. 直接访问第i个字段和空值
  • Slotted page format 分槽页格式
    • Variable length records and intra-page reorg 可变长度记录和页面内重新排序
  • DBMS “File” tracks collection of pages, records within each. DBMS“文件”跟踪每个文件中的页面和记录的集合
    • Pages with free space identified using linked list or directory structure 使用链表或目录结构标识具有可用空间的页面
  • Indexes support efficient retrieval of records based on the values in some fields. 索引支持根据某些字段中的值高效检索记录
  • Catalog relations store information about relations, indexes and views. 目录关系存储有关关系、索引和视图的信息

lec4 SQL: The Query Language

Review

  • Relational Algebra (Operational Semantics) 关系代数(操作语义)

    • Given a query, how to mix and match the relational algebra operators to answer it 给定一个查询,如何混合和匹配关系代数运算符来解答它
    • Used for query optimization 用于查询优化
  • Relational Calculus (Declarative Semantics) 关系演算(说明语义)
    • Given a query, what do I want my answer set to include? 给定一个查询,我希望我的答案集包括什么?
  • Algebra and safe calculus are simple and powerful models for query languages for relational model 代数和安全演算是关系模式查询语言的简单而强大的模型
    • Have same expressive power 有同样的表现力
  • SQL can express every query that is expressible in relational algebra/calculus. (and more) SQL可以表达每一个可以用关系代数/演算表达的查询。(及更多)

Relational Query Languages 关系查询语言

  • Two sublanguages: 两个子语言:

    • DDL – Data Definition Language 数据定义语言

      • Define and modify schema (at all 3 levels) 定义和修改架构(在所有3个级别)
    • DML – Data Manipulation Language 数据操作语言
      • Queries can be written intuitively. 可以直观地编写查询。
  • DBMS is responsible for efficient evaluation. DBMS负责有效的评估。
    • The key: precise semantics for relational queries. 关键:关系查询的精确语义。
    • Optimizer can re-order operations 优化器可以重新排序操作
      • Won’t affect query answer. 不会影响查询答案。
    • Choices driven by “cost model” 由“成本模型”驱动的选择

The SQL Query Language SQL查询语言
P97

  • The most widely used relational query language. 最广泛使用的关系查询语言
  • Standardized 标准化
    • (although most systems add their own “special sauce”, including PostgreSQL)
  • We will study SQL92 – a basic subset 我们将研究SQL92——一个基本子集

Example Database
P98

Conceptual Evaluation 概念评估

  • The cross-product of relation-list is computed, tuples that fail qualification are discarded, “unnecessary” fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.
  • One answer tuple is generated per qualifying group. 每个符合条件的组生成一个答案元组。

Conceptual Evaluation 概念评估

  • Form groups as before. 像以前一样分组。
  • The group-qualification is then applied to eliminate some groups. 然后应用组限定来消除某些组。
    • Expressions in group-qualification must have a single value per group! 组限定中的表达式每个组必须有一个值!
    • That is, attributes in group-qualification must be arguments of an aggregate op or must also appear in the grouping-list. (SQL does not exploit primary key semantics here!) 也就是说,组限定中的属性必须是聚合op的参数,或者也必须出现在分组列表中。(SQL在此不利用主键语义!)
  • One answer tuple is generated per qualifying group. 每个符合条件的组生成一个答案元组。
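
The conceptual evaluation steps above can be sketched in plain Python. This is only an illustration: the toy `sailors` rows, the `age >= 18` qualification and the `COUNT(*) >= 2` group-qualification are assumptions made for the example, not data from the lecture, and a real DBMS would not evaluate a query this way.

```python
from collections import defaultdict

# Hypothetical toy instance: (sid, sname, rating, age).
sailors = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
    (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0),
    (74, "horatio", 9, 35.0),
]

def min_age_per_rating(rows, min_group_size=2):
    # 1. FROM: a single relation here, so the cross-product is just the rows.
    # 2. WHERE: discard tuples that fail the qualification (age >= 18).
    qualifying = [r for r in rows if r[3] >= 18]
    # 3. GROUP BY rating: partition the tuples, keeping only the needed fields.
    groups = defaultdict(list)
    for _sid, _sname, rating, age in qualifying:
        groups[rating].append(age)
    # 4. HAVING COUNT(*) >= min_group_size: eliminate groups that fail.
    # 5. SELECT rating, MIN(age): one answer tuple per surviving group.
    return sorted((rating, min(ages)) for rating, ages in groups.items()
                  if len(ages) >= min_group_size)

result = min_age_per_rating(sailors)
```

Both expressions used after grouping (`COUNT(*)` and `MIN(age)`) have a single value per group, which is exactly the restriction stated above.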

Two more important topics 还有两个重要的话题

  • Constraints 约束条件
  • SQL embedded in other languages 嵌入其他语言的SQL

Integrity Constraints (Review) 完整性约束(回顾)

  • An IC describes conditions that every legal instance of a relation must satisfy. IC描述了关系的每个合法实例必须满足的条件。

    • Inserts/deletes/updates that violate IC’s are disallowed. 不允许违反IC的插入/删除/更新。
    • Can ensure application semantics (e.g., sid is a key), or prevent inconsistencies (e.g., sname has to be a string, age must be < 200) 可以确保应用程序语义(例如,sid是一个键),或防止不一致(例如,sname必须是一个字符串,年龄必须小于200)
  • Types of IC’s: Domain constraints, primary key constraints, foreign key constraints, general constraints. IC的类型:域约束、主键约束、外键约束、一般约束。

General Constraints 一般约束
P50

  • Useful when more general ICs than keys are involved. 当涉及比键更通用的IC时非常有用。
  • Can use queries to express constraint. 可以使用查询来表示约束。
  • Checked on insert or update. 在插入或更新时检查。
  • Constraints can be named. 可以命名约束。
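
A minimal sketch of a named constraint that is checked on insert, using Python's built-in `sqlite3` module. The constraint name `rating_range` and its bounds are assumptions chosen for illustration. SQLite does not allow subqueries inside `CHECK`, so this shows only the simple case, not a constraint expressed as a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sailors (
        sid    INTEGER PRIMARY KEY,
        sname  TEXT,
        rating INTEGER,
        age    REAL,
        CONSTRAINT rating_range CHECK (rating >= 1 AND rating <= 10)
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Sailors VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((31, "lubber", 8, 55.5))
rejected = try_insert((58, "rusty", 11, 35.0))   # violates rating_range
row_count = conn.execute("SELECT COUNT(*) FROM Sailors").fetchone()[0]
```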

Summary 总结

  • Relational model has well-defined query semantics 关系模型具有定义良好的查询语义
  • SQL provides functionality close to basic relational model SQL提供了接近基本关系模型的功能
    (some differences in duplicate handling, null values, set operators, …) (在重复处理、空值、集合运算符等方面存在一些差异)
  • Typically, many ways to write a query 通常,有许多方法可以编写查询
    • DBMS figures out a fast way to execute a query, regardless of how it is written. DBMS找到了一种执行查询的快速方法,而不管它是如何编写的。

lec5 Tree-Structured Indexes 树结构索引

Review: Files, Pages, Records 回顾:文件、页面、记录

  • Abstraction of stored data is “files” with “pages” of “records”. 存储数据的抽象是“文件”和“记录”的“页面”。

    • Records live on pages 记录在页面上
    • Physical Record ID (RID) = <page#, slot#> 物理记录ID(RID)=<page#,slot#>
    • Records can have fixed length or variable length. 记录可以具有固定长度或可变长度。
  • Files can be unordered (heap), sorted, or kind of sorted (i.e., “clustered”) on a search key. 在搜索键上,文件可以是无序(堆)、排序或某种排序(即“聚簇”)。
  • Indexes can be used to speed up many kinds of accesses. (i.e., “access paths”) 索引可用于加速多种访问。(即“访问路径”)

Tree-Structured Indexes: Introduction 树索引介绍

  • Selections of form: field <op> constant 选择形式:字段 <op> 常量
  • Equality selections (op is =) 相等选择
    • Either “tree” or “hash” indexes help here. “树”或“散列”索引在这里都有帮助
  • Range selections (op is one of <, >, <=, >=, BETWEEN) 范围选择
    • “Hash” indexes don’t work for these. “散列”索引对这些不起作用
  • More complex selections (e.g. spatial containment) 更复杂的选择(如空间包容)
    • There are fancier trees that can do this… 有更奇特的树可以做到这一点
  • Tree-structured indexing techniques support both range selections and equality selections. 树结构索引技术支持范围选择和相等选择
    • ISAM: static structure; early index technology. 静态结构;早期索引技术
    • B+ tree: dynamic, adjusts gracefully under inserts and deletes. 动态,在插入和删除发生时优雅地调整

Range Searches 范围搜索
P254

  • “Find all students with gpa > 3.0” “查找所有gpa > 3.0的学生”

    • If data is in sorted file, do binary search to find first such student, then scan to find others. 若数据在已排序的文件中,则进行二分搜索以查找第一个此类学生,然后进行扫描以查找其他这种学生
    • Cost of binary search in a database can be quite high. 在数据库中进行二分搜索的成本可能相当高
      • Why???
  • Simple idea: Create an “index” file, and then do binary search on (smaller) index file. 简单的想法:创建一个“索引”文件,然后对(较小的)索引文件进行二分搜索

ISAM (Indexed Sequential Access Method) 索引顺序存储方法
P255

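  • Index entries: <search key value, page id>; they direct the search to the data entries in the leaves. 索引项:<search key value, page id>,它们把搜索引导到叶子中的数据项。
  • The example assumes each node can hold 2 entries. 示例中每个节点可以容纳2个条目。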

ISAM is a STATIC Structure ISAM是一种静态结构
P256

  • File creation: 文件创建

    • Leaf (data) pages allocated sequentially, sorted by search key 按顺序分配的叶子(数据)页,按搜索键排序
    • then index pages 然后索引页面
    • then overflow pages. 然后溢出页
  • Search: Start at root; use key comparisons to go to leaf. 搜索:从根开始;使用键比较转到叶
  • Cost = log_F N 成本 = log_F N
    • F = # entries/page (i.e., fanout) F=#条目/页面(即扇出,一个节点中指向子节点的指针数量)
    • N = # leaf pages N=#叶子页数
    • no need for “next-leaf-page” pointers. (Why?) 不需要“下一叶子页”指针(为什么?)
  • Insert: Find leaf that data entry belongs to, and put it there. Overflow page if necessary. 插入:找到数据项所属的叶,并将其放在那个里面。如有必要,增加溢出页。
  • Delete: Seek and destroy! If deleting a tuple empties an overflow page, de-allocate it and remove from linked-list. 删除:寻找并销毁!若删除元组会清空溢出页面,则取消分配该页面并将其从链表中删除。

Example: insert 23*, 48*, 41*, 42*, then delete 42*, 51*, 97* 例子:插入23*,48*,41*,42*,然后删除42*,51*,97*
P256

B+ Tree Structure (1) B+树结构
P257

  • The ROOT node contains between 1 and 2d index entries. 根节点包含1到2d个索引项。

    • The parameter d is called the order of the tree. 参数d称为树的阶(秩)。
    • An index entry is a pair of <key, page id> 索引项是一对<key,page id>
    • the ROOT is a leaf or has at least two children. 根是一片叶子或至少有两个孩子。
  • Each internal node contains m (d ≤ m ≤ 2d) index entries. 每个内部节点包含m个(d ≤ m ≤ 2d)索引项。
    • Each internal node has m + 1 children. 每个内部节点都有m + 1个子节点。
  • Each leaf node contains m (d ≤ m ≤ 2d) data entries 每个叶节点包含m个(d ≤ m ≤ 2d)数据项
    • A data entry is one of <key, record> or <key, RID> or <key, list of RIDs> 数据项是<key,record>或<key,RID>或<key,list of RID>
  • Each path from the ROOT to any leaf has the same length. 从根到任何叶子的每条路径都具有相同的长度。
    • Length is the number of nodes in a path. Length是路径中的节点数。
  • Supports equality and range-searches efficiently. 有效地支持相等和范围搜索。

B+ Tree Equality Search B+树相等搜索

  • Search begins at root, and key comparisons direct it to a leaf. 搜索从根开始,键比较将其指向叶。
  • Search for 15*…

B+ Tree Range Search B+树范围搜索

  • Search all records whose ages are in [15,28]. 搜索在[15,28]的所有记录

    • Equality search 15*. 和搜索15一样
    • Follow sibling pointers. 沿着兄弟指针

B+ Trees in Practice 实践中的B+树

  • Typical order: 100. Typical fill-factor: 67%. 典型阶数:100 典型填充系数:67%

    • average fanout = 133 平均扇出:133
  • Can often hold top levels in buffer pool:
    • Level 1 = 1 page = 8 KB
    • Level 2 = 133 pages = 1 MB
    • Level 3 = 17,689 pages = 145 MB
    • Level 4 = 2,352,637 pages = 19 GB
  • With 1 MB buffer, can locate one record in 19 GB (or 0.3 billion records) in two I/Os! 使用1 MB缓冲区,可以在两个I/O中定位19 GB(或3亿条记录)中的一条记录!
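
The level sizes above follow from the fanout alone. A short check in Python, assuming 8 KB pages and a fill factor of 2/3 (the notes round this to 67%); the constant names are introduced here for illustration only.

```python
PAGE_BYTES = 8 * 1024      # 8 KB pages, as in the list above
ORDER = 100                # d: a node holds at most 2d = 200 entries
FILL_FACTOR = 2 / 3        # about 67% occupancy

fanout = round(2 * ORDER * FILL_FACTOR)   # average number of children per node

def pages_at_level(level):
    """Number of pages at a level, counting the root as level 1."""
    return fanout ** (level - 1)

def level_megabytes(level):
    """Approximate size of one level in MB (10**6 bytes)."""
    return pages_at_level(level) * PAGE_BYTES / 1e6

# If level 4 is the leaf level, each leaf holds about `fanout` data entries.
records_reachable = pages_at_level(4) * fanout

for level in range(1, 5):
    print(level, pages_at_level(level), f"{level_megabytes(level):.1f} MB")
```

Levels 1 and 2 together need about 1 MB, so with those pages cached only the level-3 page and the leaf page have to be read from disk, which is the "two I/Os" claim.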

Inserting a Data Entry into a B+ Tree 向B+树中插入一个数据项
P261

  • Find correct leaf L. 找到正确的叶L
  • Put data entry onto L. 把数据放进到L
    • If L has enough space, done! 如果L有足够的空间,完成
    • Else, must split L (into L and a new node L2) 否则,必须拆分L(分为L和新节点L2)
      • Redistribute entries evenly, copy up middle key. 均匀地重新分配条目,向上复制中间键
      • Insert index entry pointing to L2 into parent of L. 将指向L2的索引项插入L的父节点
  • This can happen recursively 这可能会递归发生
    • To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.) 要分裂索引节点,请均匀地重新分配条目,但向上推中间键。(与叶节点分裂形成对比。)
  • Splits “grow” tree; root split increases height. 分裂“生长”树;根分裂增加高度。
    • Tree growth: gets wider or one level taller at top. 树生长:顶部变宽或高一级。
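
The difference between copying up and pushing up the middle key can be sketched as two small functions. This is a simplification under stated assumptions: nodes are plain sorted Python lists of keys, child pointers are omitted, and the function names are hypothetical.

```python
def split_leaf(entries):
    """Split an overfull, sorted leaf (2d+1 data entries).

    Returns (left, right, separator). The separator is COPIED up: it also
    stays in the right leaf, because every data entry must remain at the
    leaf level.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_index(keys):
    """Split an overfull, sorted index node (2d+1 keys).

    Returns (left, right, separator). The separator is PUSHED up: it is
    removed from this level, since an index key only guides the search.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

# With d = 2, both nodes below hold 2d + 1 = 5 keys and must split.
leaf_left, leaf_right, copied = split_leaf([2, 3, 5, 7, 8])
idx_left, idx_right, pushed = split_index([5, 13, 17, 24, 30])
```

After either split, each half holds at least d entries, so minimum occupancy is preserved.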

Example B+ Tree – Inserting 8*
P261

Redistribution could be used to avoid a split, but it is usually not done in practice. 可以使用重分布避免分裂,但在实践中通常不会使用。

Data vs. Index Page Split (from previous example of inserting “8*”) 数据页和索引页分裂对比

  • Observe how minimum occupancy is guaranteed in both leaf and index pg splits. 观察如何在叶子页和索引页拆分中保证最低占用率。
  • Note difference between copy-up and push-up; be sure you understand the reasons for this. 注意向上复制(copy-up)和向上推(push-up)之间的区别,并确保理解其原因

Deleting a Data Entry from a B+ Tree 从B+树中删除一个数据项
P263

  • Start at root, find leaf L where entry belongs. 从根开始,找到条目所属的叶L。
  • Remove the entry. 删除条目
    • If L is at least half-full, done! 如果L至少有半满,完成。
    • If L has only d-1 entries, 如果L只有d-1个条目
      • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). 尝试重分布,从兄弟节点(与L具有相同父节点的相邻节点)借用
      • If re-distribution fails, merge L and sibling. 如果重分布失败,则合并L和兄弟结点
  • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. 如果发生合并,则必须从L的父节点中删除(指向L或兄弟结点的)条目
  • Merge could propagate to root, decreasing height. 合并可能会传播到根,从而降低高度

Example Tree (including 8*) Delete 19* and 20* …
P264

Redistribution 重分布
P265

Bulk Loading of a B+ Tree B+树的块加载
P268

  • Given: large collection of records 给定:大量的记录集合
  • Desire: B+ tree on some field 希望:在某个字段上建立B+树
  • Bad idea: repeatedly insert records 坏主意:重复插入记录
    • Slow, and poor leaf space utilization . Why? 速度慢,叶空间利用率低。
  • Bulk Loading can be done much more efficiently. 块加载可以更有效地完成
  • Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page. 初始化:对所有数据项进行排序,在新(根)页中插入指向第一(叶)页的指针。

  • Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.) 叶子页的索引项总是插入到叶子层级上方最右边的索引页中。当这个填满时,它就会分裂。(拆分可能会沿最右边的路径到达根。)
  • Much faster than repeated inserts. 比重复插入快得多。

Summary of Bulk Loading 块加载总结

  • Option 1: multiple inserts. 多个插入

    • Slow. 缓慢的
    • Does not give sequential storage of leaves. 不提供叶子的顺序存储
  • Option 2: Bulk Loading 块加载
    • Fewer I/Os during build. 在构建过程中更少的I/O次
    • Leaves will be stored sequentially (and linked, of course). 叶子将按顺序存储(当然还有链接)
    • Can control “fill factor” on pages. 可以控制页面上的“填充因子”

A Note on “Order” 关于“阶”的说明

  • Order (d) makes little sense with variable-length entries 对于可变长度的条目,阶数(d)没有什么意义
  • Use a physical criterion in practice (“at least half-full”). 在实践中使用物理标准(“至少半满”)。
    • Index pages often hold many more entries than leaf pages. 索引页通常比叶子页包含更多的条目。
    • Variable sized records and search keys: 可变大小的记录和搜索键
      • different nodes have different numbers of entries. 不同的节点具有不同的条目数
    • Even with fixed-length fields, Alternative (3) gives variable-length data entries 即使使用固定长度字段,备选方案(3)也会产生可变长度的数据项
  • Many real systems are even sloppier than this — only reclaim space when a page is completely empty. 许多真正的系统甚至比这更草率 — 只在页面完全为空时回收空间

Summary 总结

  • Tree-structured indexes are ideal for range-searches, also good for equality searches. 树结构索引非常适合范围搜索,也适用于相等搜索。
  • ISAM is a static structure. ISAM是一种静态结构。
    • Only leaf pages modified; overflow pages needed. 只修改叶子页;需要溢出页面
    • Overflow chains can degrade performance unless size of data set and data distribution stay constant. 溢出链会降低性能,除非数据集和数据分布的大小保持不变
  • B+ tree is a dynamic structure. B+树是一种动态结构
    • Inserts/deletes leave tree height-balanced; log_F N cost. 插入/删除保持树高平衡;成本为log_F N
    • High fanout (F) means depth rarely more than 3 or 4. 高扇出(F)表明深度很少超过3或4
    • Typically, 67% occupancy on average. 通常,平均填充系数为67%
    • Usually preferable to ISAM; adjusts to growth gracefully. 通常优于ISAM;优雅地适应成长
    • If data entries are data records, splits can change rids! 如果数据项就是数据记录,分裂可能改变记录的RID!
  • Key compression increases fanout, reduces height. 键压缩增加扇出,降低高度
  • Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set. 对于在大型数据集上创建B+树,块加载比重复插入快得多
  • B+ tree widely used because of its versatility. B+树因其多功能性而被广泛使用
    • One of the most optimized components of a DBMS. 数据库管理系统中最优化的组件之一

lec6 External Sorting 外部排序

Why Sort?
P315

  • A classic problem in computer science!
  • Data requested in sorted order 按排序顺序请求的数据
    • e.g., find students in increasing gpa order
  • First step in bulk loading B+ tree index. 块加载B+树索引的第一步
  • Useful for eliminating duplicates (Why?) 用于消除重复项
  • Useful for summarizing groups of tuples 用于汇总元组的组
  • Sort-merge join algorithm involves sorting. 排序合并连接算法涉及排序
  • Problem: sort 100 GB of data with 1 GB of RAM. 问题:使用1 GB内存对100 GB数据进行排序
    • why not virtual memory? 为什么不是虚拟内存

2-Way Sort: Requires 3 Buffers 2路排序:需要3个缓冲区
P317

  • Pass 0: Read a page, sort it, write it. 第0趟:每次从文件中读取一个数据页,读入数据后,对其中的数据进行排序,写回磁盘

    • only one buffer page is used.
    • each sorted page (or subfiles) is called a run.
  • Pass 1, 2, 3, …, etc.: 从之前处理的输出中读入一对有序段并进行归并,生成两倍长的段
    • requires 3 buffer pages
    • merge pairs of runs into runs twice as long
    • three buffer pages used.

  • Each pass we read + write each page in file. 每一趟读入一个数据页,进行处理然后写回磁盘,对每个数据页读写磁盘两次
  • N pages in the file => the number of passes is ⌈log_2 N⌉ + 1 文件有N个页面时,处理趟数为 ⌈log_2 N⌉ + 1
  • So total cost is: 2N(⌈log_2 N⌉ + 1) 因此总成本为:2N(⌈log_2 N⌉ + 1)
  • Idea: Divide and conquer: sort subfiles and merge 分治法:对子文件排序然后合并。

General External Merge Sort 常用外部合并排序
P318

  • To sort a file with N pages using B buffer pages: 使用B个可用主存页排序有N个数据页的文件

    • Pass 0: use B buffer pages. Produce ⌈N / B⌉ sorted runs of B pages each. 第0趟:每次读入B个数据页,在主存内排序后生成⌈N / B⌉个长为B个数据页的段
    • Pass 1, 2, …, etc.: merge B-1 runs. 用B-1个缓存作输入,剩余的一个缓存作输出,同时归并B-1个有序段
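
The two phases can be simulated in memory as a rough sketch. The assumptions are loud ones: Python lists stand in for disk pages and runs, `heapq.merge` stands in for the (B-1)-way merge through one output buffer, and no real I/O takes place.

```python
import heapq

def external_merge_sort(pages, num_buffers):
    """Sort a 'file' given as a list of pages (each page is a list of keys).

    Returns (sorted_keys, number_of_passes).
    """
    assert num_buffers >= 3
    # Pass 0: read B pages at a time, sort them in memory, write one run.
    runs = []
    for i in range(0, len(pages), num_buffers):
        chunk = [key for page in pages[i:i + num_buffers] for key in page]
        runs.append(sorted(chunk))
    passes = 1
    # Passes 1, 2, ...: merge B-1 runs at a time (one buffer holds output).
    fan_in = num_buffers - 1
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes

keys, passes = external_merge_sort([[9, 4], [7, 1], [8, 2], [6, 3], [5, 0]],
                                   num_buffers=3)
```

With 5 pages and 3 buffers, Pass 0 produces 2 runs and a single 2-way merge pass finishes the sort.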

Cost of External Merge Sort 外部排序花费
P319

  • Number of passes: 1 + ⌈log_{B-1} ⌈N / B⌉⌉ 趟数为 1 + ⌈log_{B-1} ⌈N / B⌉⌉
  • Cost = 2N * (# of passes) 2N *趟数
  • E.g., with 5 buffer pages, to sort 108 page file: 用5个缓冲区排序包含108个数据页的文件
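
The pass count and cost formulas can be evaluated for this 108-page example with a small helper. It uses integer ceilings rather than a floating-point logarithm; the function name and return shape are assumptions for illustration.

```python
from math import ceil

def external_sort_cost(num_pages, num_buffers):
    """Return (passes, total_io, runs_after_each_pass) for external merge sort."""
    runs = ceil(num_pages / num_buffers)        # runs produced by Pass 0
    history = [runs]
    while runs > 1:
        runs = ceil(runs / (num_buffers - 1))   # each pass merges B-1 runs
        history.append(runs)
    passes = len(history)
    return passes, 2 * num_pages * passes, history

passes, total_io, history = external_sort_cost(108, 5)
```

For N = 108 and B = 5 the run counts go 22, 6, 2, 1, so there are 4 passes and 2 * 108 * 4 = 864 page I/Os, in agreement with 1 + ⌈log_4 22⌉ = 4.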

Blocked I/O for External Merge Sort

  • Do I/O a page at a time 一次I/O一页

    • Not one I/O per record 不是每个记录一个I/O
  • In fact, read a block (chunk) of pages sequentially! 事实上,按顺序读取一块页面!
  • Suggests we should make each buffer (input/output) be a block of pages. 建议将每个缓冲区(输入/输出)设为一个页面块。
    • But this will reduce fan-in during merge passes! 但这将减少合并过程中的扇入!
    • In practice, most files still sorted in 2-3 passes. 实际上,大多数文件仍按2-3遍排序。
  • Theme: Amortize a random I/O across more data read. 主题:将随机I/O分摊到更多读取的数据中。
  • But pay for it in memory footprint 但要在内存占用中付出消耗

Double Buffering 双缓冲
P323

  • Goal: reduce wait time for I/O requests during merge 目标:减少合并期间I/O请求的等待时间
  • Idea: 2 blocks RAM per run, disk reader fills one while sort merges the other 想法:每个有序段(run)使用2块内存,磁盘读取器填充其中一块,同时排序过程归并另一块
    • Potentially, more passes; in practice, most files still sorted in 2-3 passes. 潜在的,更多的趟;实际上,大多数文件仍按2-3遍排序。
  • Theme: overlap I/O and CPU activity via read-ahead (prefetching) 主题:通过预读(预取)重叠I/O和CPU活动

Using B+ Trees for Sorting 使用B+树来排序

  • Scenario: Table to be sorted has B+ tree index on sorting column(s). 场景:要排序的表在排序列上有B+树索引。
  • Idea: Can retrieve records in order by traversing leaf pages. 想法:可以通过遍历叶子页按顺序检索记录。
  • Is this a good idea? 这是个好主意吗?
  • Cases to consider:
    • B+ tree is clustered B+树是聚簇的
    • B+ tree is not clustered B+树不是聚簇的

Clustered B+ Tree Used for Sorting 使用聚簇索引的B+树来排序
P324

  • Cost: root to the left-most leaf, then retrieve all leaf pages (Alternative 1) 成本:从根开始到最左边的叶子,然后检索所有叶子页(备选方案1)
  • If Alternative 2 is used? Additional cost of retrieving data records: each page fetched just once. 如果使用备选方案2?检索数据记录的额外成本:每个页面只提取一次。
  • Always better than external sorting! 永远比外排序好

Unclustered B+ Tree Used for Sorting 使用非聚簇索引的B+树来排序
P324

  • Alternative (2) for data entries; each data entry contains rid of a data record. In general, one I/O per data record! 数据输入的备选方案(2);每个数据项都包含一条数据记录。通常,每个数据记录一个I/O!

Summary 总结

  • External sorting is important 外排序很重要
  • External merge sort minimizes disk I/O cost: 外部合并排序将磁盘I/O成本降至最低
    • Pass 0: Produces sorted runs of size B (# buffer pages). Later passes: merge runs. 第0趟:生成大小为B(缓冲页数)的有序段。之后各趟:归并有序段。
    • # of runs merged at a time depends on B, and block size. 一次归并的有序段数取决于B和块大小。

    • Larger block size means less I/O cost per page. 较大的块大小意味着每页的I/O成本较低。
    • Larger block size means smaller # runs merged. 较大的块大小意味着一次归并的有序段数较少。
    • In practice, # of passes rarely more than 2 or 3. 在实践中,趟数很少超过2或3。
  • Choice of internal sort algorithm may matter: 选择内部排序算法可能很重要:
    • Quicksort: Quick! 快速排序:快!
    • Heap/tournament sort 堆/锦标赛排序
  • The best sorts are wildly fast: 最好的排序非常快:
    • Despite 40+ years of research, still improving!
  • Clustered B+ tree is good for sorting; unclustered tree is usually very bad. 聚簇B+树有利于排序;非聚簇的树通常很糟糕。

lec7 Hash-Based Indexes

Introduction 介绍

  • As for any index, 3 alternatives for data entries k*: 对于任何索引,数据项k*有3个备选方案

    • Data record with key value k 具有键值k的数据记录
    • <k, rid of data record with search key value k> <k,搜索键值为k的数据记录的RID>
    • <k, list of rids of data records with search key k> <k,具有搜索键k的数据记录的RID列表>
    • Choice orthogonal to the indexing technique 与索引技术正交的选择
  • Hash-based indexes are best for equality selections. Cannot support range searches. 基于哈希的索引最适合于相等选择。无法支持范围搜索。
  • Static and dynamic hashing techniques exist; trade-offs similar to ISAM vs. B+ trees. 存在静态和动态哈希技术;类似于ISAM与B+树的权衡。

Static Hashing 静态哈希
P278

  • The number of primary pages is fixed. 主页面的数量是固定的。
  • Primary pages are allocated sequentially, never de-allocated; 主页面按顺序分配,从不取消分配;
    • overflow pages if needed. 如果需要,创建溢出页面。
  • h(k) mod N = bucket to which data entry with key k belongs. (N = number of buckets) h(k)mod N=带有k键的数据输入所属的存储桶。(N=桶的数量)

  • Buckets contain data entries. 桶包含数据条目。
  • Hash function works on search key field of record r. Must distribute values over range 0 … N-1. 哈希函数作用于记录r的搜索键字段。必须将值分布在范围0 … N-1上。
    • h(key) = (a * key+ b) usually works well. h(key) = (a * key+ b)通常工作正常。
    • a and b are constants; lots known about how to tune h. a和b是常数;关于如何调整h已有大量研究。
  • Long overflow chains can develop and degrade performance. 长溢出链可能会伸长并降低性能。
  • Extendible and Linear Hashing: Dynamic techniques to fix this problem. 可扩展和线性哈希:解决此问题的动态技术。

Extendible Hashing 可扩展哈希
P279

  • Situation: Bucket (primary page) becomes full. Why not re-organize file by doubling the number of buckets? 情况:存储桶(主页)已满。为什么不通过加倍存储桶的数量来重新组织文件?

    • Reading and writing all pages is expensive! 读和写所有页面都很昂贵!
  • Idea of Extendible Hashing: 可扩展哈希的概念:
    • Use directory of pointers to buckets, double the number of buckets by doubling the directory, 使用指向存储桶的指针目录,通过加倍目录将存储桶的数量加倍,
    • splitting just the bucket that overflowed! 将满溢的桶分开!
  • Directory is much smaller than file, so doubling it is much cheaper. 目录比文件小得多,所以翻倍要便宜得多。
  • Only one page of data entries is split. No overflow page! 只拆分一页数据条目。没有溢出页面!
  • Trick lies in how hash function is adjusted! 诀窍在于如何调整哈希函数!

例子
P279

Points to Note

  • 20 = binary 10100. Last 2 bits (00) tell us r belongs in A or A2. Last 3 bits needed to tell which. 最后2位(00)告诉我们r属于A或A2。最后3位告诉属于哪个

    • Global depth of directory: Max number of bits needed to tell which bucket an entry belongs to. 全局目录深度:告诉条目属于哪个bucket所需的最大位数。
    • Local depth of a bucket: number of bits used to determine if an entry belongs to this bucket. 桶的局部深度:用于确定条目是否属于该桶的位数。
  • When does bucket split cause directory doubling? 什么时候桶分裂会导致目录加倍
    • Before insert, local depth of bucket = global depth. Insert causes local depth to become > global depth. 插入前,桶的局部深度=全局深度。插入使桶的局部深度大于全局深度。
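
A compact sketch of extendible hashing that follows these points. It assumes integer keys hashed by their low-order bits, a bucket capacity of 4, and hypothetical class and method names; deletion is not handled, and more than `capacity` identical keys would recurse without end.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=4, global_depth=2):
        self.capacity = capacity
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _slot(self, key):
        # Directory index = last global_depth bits of the key.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Directory doubling: only pointers are copied, no data pages.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split just the full bucket, using one more bit to tell entries apart.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(key)   # retry; splits again if the bucket is still full

table = ExtendibleHash(capacity=4, global_depth=2)
for key in [4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19, 20]:
    table.insert(key)
```

Inserting 20 (binary 10100) finds the bucket for bits 00 full with local depth equal to global depth, so the directory doubles and the third bit separates 32 and 16 (000) from 4, 12 and 20 (100).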

Directory Doubling 目录翻倍

Equality Search in Extendible Hashing 可扩展哈希中的相等搜索
P283

  • If directory fits in memory, equality search answered with one disk access; else two. 若目录能放入内存,相等搜索只需一次磁盘访问;否则需要两次。

    • 100MB file, 100 bytes/rec, 4K pages contains 1,000,000 records (as data entries) and 25,000 directory elements; 100MB文件,每个数据项是100字节,4KB大小的页面包含1000000条记录(作为数据项)和25000个目录元素
    • chances are high that directory will fit in memory. 目录很有可能放入内存中。

Delete in Extendible Hashing 可扩展哈希中的删除
P282

  • If removal of data entry makes a bucket empty, the bucket can be merged with its “split image”. 如果删除数据条目使桶为空,则该桶可以与其“分裂映像”合并。
  • If each directory element points to same bucket as its split image, we can halve the directory. 如果每个目录元素指向与其分裂映像相同的桶,我们可以将目录减半。

Linear Hashing (LH) 线性哈希
P283

  • This is another dynamic hashing scheme, an alternative to Extendible Hashing. 这是另一个动态哈希方案,是可扩展哈希的替代方案。
  • LH handles the problem of long overflow chains without using a directory, and handles duplicates. LH在不使用目录的情况下处理长溢出链的问题,并且能够处理重复值。
    • What problem will duplicates cause in Extendible Hashing? 在可扩展散列中,重复会导致什么问题?

The Idea of Linear Hashing 线性哈希的想法
P283

  • Use a family of hash functions h_0, h_1, h_2, …, where h_{i+1} doubles the range of h_i (similar to directory doubling) 使用一族哈希函数h_0, h_1, h_2, …,其中h_{i+1}的值域是h_i的两倍(类似于目录加倍)

    • h_i(key) = h(key) mod (2^i N); N = initial # buckets h_i(key) = h(key) mod (2^i N);N为桶的初始数量
    • h is some hash function (range is not 0 to N-1) h是某个哈希函数(值域不限于0到N-1)
    • If N = 2^d0, for some d0, h_i consists of applying h and looking at the last d_i bits, where d_i = d0 + i. 如果对某个d0有N = 2^d0,则h_i相当于应用h并查看最后d_i位,其中d_i = d0 + i
  • Directory avoided in LH by using overflow pages, and choosing bucket to split round-robin. LH通过使用溢出页并以轮转(round-robin)方式选择要分裂的桶,从而避免了目录。
    • Splitting proceeds in “rounds”. 分裂按“轮”进行
    • Round ends when all N_R initial (for round R) buckets are split. 当第R轮初始的全部N_R个桶都已分裂时,该轮结束
    • Buckets 0 to Next-1 have been split; Next to N_R - 1 yet to be split. 桶0到Next-1已分裂;桶Next到N_R - 1尚未分裂。
    • Current round number is Level. 当前轮数记为Level

Bucket Split 桶分裂
P284

  • A split can be triggered by 分裂可以通过以下方式触发

    • the addition of a new overflow page 添加新的溢出页
    • conditions such as space utilization 空间利用率的条件限制
  • Whenever a split is triggered, 无论何时分裂被触发
    • the Next bucket is split, Next指向的桶将被分裂
    • and hash function h_{Level+1} redistributes entries between this bucket (say bucket number b) and its split image; 哈希函数h_{Level+1}将在该桶(设桶号为b)和它的分裂映像之间重新分布项
    • the split image is therefore bucket number b + N_Level. 因此,分裂映像的桶号是b + N_Level
    • Next <- Next + 1. 分裂完一个桶后,Next的值加一
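
The split rule above can be sketched as a small class. The assumptions: integer keys with h taken as the identity, each bucket stored as one Python list that stands for its primary page plus overflow pages, and a split triggered only when an insert forces a new overflow page. Class and attribute names are hypothetical.

```python
class LinearHash:
    def __init__(self, n_initial=4, capacity=4):
        self.n0 = n_initial        # N: initial number of buckets
        self.capacity = capacity   # entries per page
        self.level = 0
        self.next = 0
        self.buckets = [[] for _ in range(n_initial)]

    def _h(self, i, key):
        return key % (2 ** i * self.n0)   # h_i(key) = h(key) mod (2^i N)

    def _bucket_no(self, key):
        b = self._h(self.level, key)
        if b < self.next:                 # bucket b was already split this round
            b = self._h(self.level + 1, key)
        return b

    def insert(self, key):
        bucket = self.buckets[self._bucket_no(key)]
        bucket.append(key)
        # A new overflow page was just allocated: trigger one split.
        if len(bucket) > self.capacity and (len(bucket) - 1) % self.capacity == 0:
            self._split()

    def _split(self):
        n_level = 2 ** self.level * self.n0
        old, self.buckets[self.next] = self.buckets[self.next], []
        self.buckets.append([])           # split image: bucket Next + N_Level
        for k in old:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == n_level:          # all N_Level buckets split: round ends
            self.level += 1
            self.next = 0

lh = LinearHash(n_initial=4, capacity=4)
for key in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11, 43]:
    lh.insert(key)
```

Note that inserting 43 overflows bucket 3, yet it is bucket 0 (the Next bucket) that gets split; bucket 3 keeps its overflow page until Next reaches it.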

Linear Hashing Example 线性哈希例子
P284

Extendible VS. Linear Hashing 可扩展哈希和线性哈希对比

  • Imagine that we also have a directory in LH with elements 0 to N-1. 假设在LH中有一个目录,其中包含元素0到N-1。

    • The first split is at bucket 0, and so we add directory element N. 第一次分裂在桶0处,因此添加了目录元素N。
    • Imagine directory being doubled at this point, but elements <1,N+1>, <2,N+2>, … are the same. So, we can avoid copying elements from 1 to N-1. 设想目录在这一点上翻了一番,但是元素<1,N+1>,<2,N+2>,…都是一样的。因此,我们可以避免将元素从1复制到N-1。
    • We process subsequent splits in the same way, 以同样的方式处理后续拆分,
    • And at the end of the round, all the original N buckets are split, and the directory is doubled in size. 在这一轮结束时,所有原始的N个桶都被拆分,目录的大小增加了一倍。
  • i.e., LH doubles the imaginary directory gradually. LH逐渐将虚拟目录加倍

Summary 总结

  • Hash-based indexes: best for equality searches, cannot support range searches. 基于哈希的索引:最适合等值搜索,不支持范围搜索。
  • Static Hashing can lead to long overflow chains. 静态哈希可能导致长溢出链。
  • Extendible Hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. 可扩展哈希通过在添加新数据项时分裂一个完整的桶来避免溢出页面。
  • Linear Hashing avoids directory by splitting buckets round-robin, and using overflow pages. 线性哈希通过循环分割存储桶和使用溢出页来避免目录

lec8 Implementation of Relational Operations 关系操作的实现

Introduction

  • Next topic: QUERY PROCESSING 下一个主题:查询处理
  • Some database operations are EXPENSIVE 有些数据库操作很昂贵
  • Huge performance gained by being “smart” “聪明”的做法能带来巨大的性能提升
    • We’ll see 1,000,000x over naïve approach
  • Main weapons are:
    • clever implementation techniques for operators 运算符的巧妙实现技术
    • exploiting relational algebra “equivalences” 利用关系代数的“等价性”
    • using statistics and cost models to choose 使用统计数据和成本模型进行选择

Simple SQL Refresher 简单SQL复习

SELECT <list-of-fields>
FROM <list-of-tables>
WHERE <condition>

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='A'

A Really Bad Query Optimizer 一个非常糟糕的查询优化器

  • For each Select-From-Where query block 对于每个Select-From-Where查询块

    • Create a plan that: 创建一个计划

      • Forms the cross product of the FROM clause 形成FROM子句的叉积
      • Applies the WHERE clause 应用WHERE子句
  • (Then, as needed: 然后,根据需要:
    • Apply the GROUP BY clause 应用GROUP BY子句
    • Apply the HAVING clause 应用HAVING子句
    • Apply any projections and output expressions 应用任何投影和输出表达式
    • Apply duplicate elimination and/or ORDER BY) 应用重复消除和/或ORDER BY

Cost-based Query Sub-System 基于成本的查询子系统

The Query Optimization Game 查询优化博弈

  • Goal is to pick a “good” plan 目标是选择一个“好”的计划

    • Good = low expected cost, under cost model 好=在成本模型下预期成本低
    • Degrees of freedom: 自由度
      • access methods 访问方法
      • physical operators 物理运算符
      • operator orders 运算符顺序
  • Roadmap for this topic: 本主题的路线图:
    • First: implementing individual operators 首先:实现单个操作符
    • Then: optimizing multiple operators 然后:优化多个操作符

Relational Operations 关系操作

  • We will consider how to implement:

    • Selection ( σ ) 选择 Select a subset of rows. 选择行的子集
    • Projection ( π ) 投影 Remove unwanted columns. 移除不需要的列
    • Join ( ⋈ ) 连接 Combine two relations. 组合两个关系
    • Set-difference ( − ) 差 Tuples in reln. 1, but not in reln. 2. 在关系1中但不在关系2中的元组
    • Union ( ∪ ) 并 Tuples in reln. 1 or in reln. 2. 在关系1中或在关系2中的元组
  • Q: What about Intersection?

Schema for Examples 模式例子
P329

Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

  • Sailors: 水手

    • Each tuple is 50 bytes long, 80 tuples per page, 500 pages. 每个元组50字节,每一页可以容纳80个Sailors元组,共有500个这样的页
    • [S]=500, pS=80.
  • Reserves: 预约
    • Each tuple is 40 bytes, 100 tuples per page, 1000 pages. 每个元组40字节,每一页可以容纳100个Reserves元组,共有1000个这样的页
    • [R]=1000, pR=100.

Simple Selections 简单选择
P329

  • How best to perform? Depends on: 怎样最好的执行?取决于:

    • what indexes are available 有哪些索引可用
    • expected size of result 预期结果大小
  • Size of result approximated as 结果的大小近似为
    (size of R) * selectivity (R的大小)*选择性
  • selectivity estimated via statistics – we will discuss shortly. 通过统计数据估计的选择性–我们将很快讨论。

Our options … 我们的选择
P329

  • If no appropriate index exists: 如果不存在适当的索引:
    Must scan the whole relation 必须扫描整个关系
  • cost = [R]. For “reserves” = 1000 I/Os. 成本=[R]。对于“reserves”=1000 I/O。

P331

  • With index on selection attribute: 在选择属性上使用索引:

      1. Use index to find qualifying data entries 使用索引查找符合条件的数据项
      2. Retrieve corresponding data records 检索相应的数据记录
  • Total cost = cost of step 1 + cost of step 2 总成本=步骤1的成本+步骤2的成本
    • For “reserves”, if selectivity = 10% (100 pages, 10000 tuples): 对于“reserves”,如果选择性=10%(100页,10000元组):
    • If clustered index, cost is a little over 100 I/Os; 如果使用聚簇索引,则成本略高于100 I/O;
    • If unclustered, could be up to 10000 I/Os! … unless … 如果使用非聚簇索引,则可能多达10000次I/O!…除非…

Refinement for unclustered indexes 非聚簇索引的优化

  1. Find qualifying data entries. 查找符合条件的数据条目。
  2. Sort the rids of the data records to be retrieved. 对要检索的数据记录的RID进行排序。
  3. Fetch rids in order. 按顺序取出RID
    Each data page is looked at just once (though # of such pages likely to be higher than with clustering). 每个数据页只被读取一次(尽管这类页面的数量可能比使用聚簇索引时多)。
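
The selection costs discussed above can be put side by side in a rough estimator. It is only a sketch: the cost of traversing the index itself is ignored, and the function and key names are made up for the example.

```python
def selection_costs(num_pages, tuples_per_page, selectivity):
    """Rough page-I/O estimates for one selection on a relation."""
    matching_tuples = round(num_pages * tuples_per_page * selectivity)
    return {
        # No usable index: read every page.
        "file_scan": num_pages,
        # Clustered index: matching tuples sit on adjacent pages.
        "clustered_index": round(num_pages * selectivity),
        # Unclustered index, worst case: one I/O per matching tuple.
        "unclustered_worst": matching_tuples,
        # Unclustered with rids sorted first: each data page read at most once.
        "unclustered_sorted_rids_max": min(matching_tuples, num_pages),
    }

costs = selection_costs(num_pages=1000, tuples_per_page=100, selectivity=0.10)
```

For Reserves at 10% selectivity, the worst-case unclustered plan (10000 I/Os) is ten times more expensive than a plain file scan (1000 I/Os), which is why sorting the rids first matters.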

General Selection Conditions 一般的选择条件
P331

(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3

  • First, convert to conjunctive normal form (CNF): 首先,转换为合取范式

    • (day<8/9/94 OR bid=5 OR sid=3 ) AND (rname=‘Paul’ OR bid=5 OR sid=3)
  • We only discuss the case with no ORs 我们只讨论这个没有or的例子
  • Terminology: 术语
    • A B-tree index matches terms that involve only attributes in a prefix of the search key. e.g.: B树索引匹配只涉及搜索键前缀中属性
    • Index on <a, b, c> matches a=5 AND b= 3, but not b=3. <a,b,c>上的索引匹配a=5和b=3,但不匹配b=3。
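
The prefix rule can be written as a small check. This is a simplified sketch: it looks only at which attributes appear in the conjuncts, it does not distinguish equality from range terms for tree indexes, and both function names are hypothetical.

```python
def btree_matched_prefix(search_key, attrs):
    """Return the longest prefix of the search key whose attributes all
    appear in the selection; an empty list means the index does not match."""
    matched = []
    for attr in search_key:
        if attr not in attrs:
            break
        matched.append(attr)
    return matched

def hash_matches(search_key, equality_attrs):
    """A hash index needs an equality term on every search-key attribute."""
    return set(search_key) <= set(equality_attrs)

on_a_b = btree_matched_prefix(("a", "b", "c"), {"a", "b"})
on_b = btree_matched_prefix(("a", "b", "c"), {"b"})
```

The second call returns an empty prefix because `b` alone skips the leading attribute `a`, so the tree cannot narrow the search.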

2 Approaches to General Selections 一般选择的2种方法

Approach I: 方法一

    1. Find the cheapest access path 找到最便宜的访问路径
    2. Retrieve tuples using it 使用它检索元组
    3. Apply any remaining terms that don’t match the index 应用与索引不匹配的任何剩余条件项

Cheapest access path: An index or file scan that we estimate will require the fewest page I/Os. 最便宜的访问路径:我们估计需要最少页面I/O的索引或文件扫描。

Cheapest Access Path - Example 花费最少的访问路径 - 例子
P332

query: day < 8/9/94 AND bid=5 AND sid=3

some options:
B+tree index on day; check bid=5 and sid=3 afterward. 利用属性域day上的B+树索引,找出满足条件的元组标识
hash index on <bid, sid>; check day<8/9/94 afterward. 哈希索引,找出满足条件的元组标识

How about a B+tree on <rname, day>?
How about a B+tree on <day, rname>?
How about a Hash index on <day, rname>?

Approach II: use 2 or more matching indexes. 方法二:使用2个或更多匹配索引。

    1. From each index, get set of rids 从每个索引中,获取一组RID
    2. Compute intersection of rid sets 计算rid集的交集
    3. Retrieve records for rids in intersection 检索交集中RID的记录
    4. Apply any remaining terms 应用任何剩余的条件项

EXAMPLE: day<8/9/94 AND bid=5 AND sid=3

  • Suppose we have an index on day, and another index on sid. 假设在day有一个索引,在sid上有另一个索引。
  • Get rids of records satisfying day<8/9/94. 获取满足日期<8/9/94的记录的RID。
  • Also get rids of records satisfying sid=3. 还要获取满足sid=3的记录的RID。
  • Find intersection, then retrieve records, then check bid=5. 找到交集,然后检索记录,然后检查bid=5。
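
The rid-intersection steps of this example can be sketched with Python sets. All rids and records below are invented for illustration; a rid is modelled as a (page#, slot#) pair.

```python
# Hypothetical rid sets returned by the two indexes.
rids_day = {(1, 0), (1, 3), (2, 1), (4, 2), (7, 0)}   # day < 8/9/94
rids_sid = {(1, 3), (3, 5), (4, 2), (9, 1)}           # sid = 3

# Intersect, then sort so that data pages are visited in order.
candidates = sorted(rids_day & rids_sid)

# Hypothetical record store keyed by rid: (sid, bid, day, rname).
records = {
    (1, 3): (3, 5, "7/1/94", "Paul"),
    (4, 2): (3, 9, "6/2/94", "Ann"),
}

# Retrieve only the candidate records and apply the remaining term bid = 5.
answer = [records[rid] for rid in candidates if records[rid][1] == 5]
```

Only two records are fetched here, instead of the five or four that either index alone would have required.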

Projection 投影
P334

SELECT DISTINCT R.sid, R.bid
FROM Reserves R
  • Issue is removing duplicates. 问题是删除重复项
  • Use sorting!! 使用排序
      1. Scan R, extract only the needed attributes 扫描R,仅提取所需的属性
      2. Sort the resulting set 对结果集进行排序
      3. Remove adjacent duplicates 删除相邻的重复项
    • Cost: 费用
      • Ramakrishnan/Gehrke writes to temp table at each step! 在每一步都写入临时表
      • Reserves with size ratio 0.25 = 250 pages. 大小比为0.25的Reserves=250页
      • With 20 buffer pages can sort in 2 passes, so: 1000 + 250 + 2 * 2 * 250 + 250 = 2500 I/Os 由于有20个缓冲页,可以在2趟内完成排序,因此:1000 + 250 + 2 * 2 * 250 + 250 = 2500次I/O

Projection – improved 投影 – 优化
P335

  • Avoid the temp files, work on the fly: 避免临时文件,动态工作

    • Modify Pass 0 of sort to eliminate unwanted fields. 优化第0趟以消除不需要的字段。
    • Modify Passes 1+ to eliminate duplicates. 修改第1趟及之后各趟以消除重复项。
    • Cost:
      • Reserves with size ratio 0.25 = 250 pages.
      • With 20 buffer pages can sort in 2 passes, so:
          1. Read 1000 pages 读取1000页
          2. Write 250 (in runs of 40 pages each) = 7 runs 写出250页(每个有序段40页)= 7个有序段
          3. Read and merge runs (20 buffers, so 1 merge pass!) 读取并归并有序段(20个缓冲区,因此只需1趟归并!)
      • Total cost = 1000 + 250 +250 = 1500.
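
Both projection cost figures can be reproduced with a few lines of arithmetic. The variable names are assumptions; the run length of 40 pages (about 2B) is taken from the list above.

```python
from math import ceil

R_PAGES = 1000        # [R] for Reserves
SIZE_RATIO = 0.25     # projected tuple size / original tuple size
BUFFERS = 20

temp_pages = round(R_PAGES * SIZE_RATIO)   # pages left after projection

# Sort-based projection with a temp table between steps:
# scan R, write temp, 2-pass sort of temp (read + write per pass), final read.
naive_cost = R_PAGES + temp_pages + 2 * 2 * temp_pages + temp_pages

# Improved: projection folded into Pass 0, duplicate elimination into the merge.
runs = ceil(temp_pages / (2 * BUFFERS))    # runs of about 40 pages each
one_merge_pass = runs <= BUFFERS - 1       # all runs fit in one merge
improved_cost = R_PAGES + temp_pages + temp_pages
```

Working on the fly saves 1000 I/Os here because the 250 projected pages are written once and read once, rather than being written to a temp table and then sorted separately.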

Other Projection Tricks
P337

  • If an index search key contains all wanted attributes: 如果索引搜索键包含所有需要的属性

    • Do index-only scan 只扫描索引

      • Apply projection techniques to data entries (much smaller!) 将投影技术应用于数据条目(小得多!)
  • If a B+Tree index search key prefix has all wanted attributes: 如果B+树索引搜索键前缀具有所有需要的属性
    • Do in-order index-only scan 按顺序只进行索引扫描

      • Just retrieve the data entries in order; 只需按顺序检索数据条目
      • Discarding unwanted fields; 丢弃不需要的字段
      • Compare adjacent tuples on the fly to check for duplicates. 动态比较相邻元组以检查重复项

Joins 连接
P338

SELECT  *
FROM     Reserves R1, Sailors S1
WHERE  R1.sid=S1.sid
  • Joins are very common. 连接非常常见
  • R × S is large; so, R × S followed by a selection is inefficient. R × S很大,因此先计算R × S再做选择是低效的
  • Many approaches to reduce join cost. 许多降低连接成本的方法
  • Join techniques we will cover today:
      1. Nested-loops join 嵌套循环连接
      2. Index-nested loops join 索引嵌套循环连接
      3. Sort-merge join 排序合并连接

Block Nested Loops Join 块嵌套循环连接
P339

Hash-Join 哈希连接
P345

Memory Requirements of Hash-Join 对内存的需求
P346

Cost of Hash-Join 哈希连接的花费

  • In partitioning phase, read+write both relns; 2(M+N). In matching phase, read both relns; M+N I/Os. 在分区阶段,读+写两个reln;2(M+N)。在匹配阶段,读取两个reln;M+N I/O。
  • In our running example, this is a total of 4500 I/Os. 在我们正在运行的示例中,总共有4500个I/O。
  • Sort-Merge Join vs. Hash Join: 排序合并连接与哈希连接:
    • Given a minimum amount of memory (what is this, for each?) both have a cost of 3(M+N) I/Os. Hash Join superior if relation sizes differ greatly (e.g., if one reln fits in memory). Also, Hash Join shown to be highly parallelizable. 给定最小内存量(两种方法各自需要多少?),两者的成本都是3(M+N)次I/O。如果关系大小差异很大(例如,其中一个关系能放入内存),则哈希连接更优。此外,哈希连接已被证明具有很高的可并行性。
    • Sort-Merge less sensitive to data skew; result is sorted. 排序合并对数据倾斜不太敏感;结果已排序。
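
The two phases of hash join and the 3(M+N) cost can be sketched as follows. This is an in-memory illustration under stated assumptions: lists of tuples stand in for relations on disk, Python's `hash` plays the role of the partitioning function h, and a dictionary plays the role of the in-memory table built with h2. The function names are hypothetical.

```python
from collections import defaultdict

def hash_join(r_tuples, s_tuples, r_key, s_key, num_partitions=4):
    """Equi-join two lists of tuples on r[r_key] == s[s_key]."""
    # Partitioning phase: hash both relations into matching partitions.
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for r in r_tuples:
        r_parts[hash(r[r_key]) % num_partitions].append(r)
    for s in s_tuples:
        s_parts[hash(s[s_key]) % num_partitions].append(s)
    # Matching phase: build a table on one partition, probe with the other.
    out = []
    for p, r_part in r_parts.items():
        table = defaultdict(list)
        for r in r_part:
            table[r[r_key]].append(r)
        for s in s_parts.get(p, []):
            out.extend(r + s for r in table.get(s[s_key], []))
    return out

def hash_join_io(m_pages, n_pages):
    """Partitioning reads and writes both relations; matching reads both."""
    return 2 * (m_pages + n_pages) + (m_pages + n_pages)

reserves = [(22, 101), (58, 103), (22, 102)]             # (sid, bid)
sailors = [(22, "dustin"), (31, "lubber"), (58, "rusty")]  # (sid, sname)
joined = sorted(hash_join(reserves, sailors, 0, 0))
total_io = hash_join_io(1000, 500)
```

Tuples can only match within the same partition, so each pair of partitions is joined independently; this is also what makes the method easy to parallelize.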

Set Operations 集合操作
P349

  • Intersection and cross-product as special cases of join. 交集和叉积作为连接的特例
  • Union (Distinct) and Except similar; we’ll do union.
  • Sorting based approach to union: 基于排序的联合方法
    • Sort both relations (on combination of all attributes). 对两个关系进行排序(根据所有属性的组合)
    • Scan sorted relations and merge them. 扫描已排序的关系并合并它们
    • Alternative: Merge runs from Pass 0 for both relations. 备选方案:归并两个关系在第0趟生成的有序段
  • Hash based approach to union: 基于哈希的联合方法
    • Partition R and S using hash function h. 使用哈希函数h划分R和S
    • For each S-partition, build in-memory hash table (using h2), scan corresponding R-partition and add tuples to table while discarding duplicates. 对于每个S分区,构建内存哈希表(使用h2),扫描相应的R分区并向表中添加元组,同时丢弃重复的元组。

General Join Conditions 一般连接条件
P348

  • Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname): 多个属性上的等式(例如,R.sid=S.sid和R.rname=S.sname):

    • For Index NL, build index on <sid, sname> (if S is inner); or use existing indexes on sid or sname. 对于索引NL,在<sid,sname>上构建索引(如果S是内部的);或者使用sid或sname上的现有索引。
    • For Sort-Merge and Hash Join, sort/partition on combination of the two join columns. 对于排序合并和哈希连接,根据两个连接列的组合进行排序/分区。
  • Inequality conditions (e.g., R.rname < S.sname): 不等式条件
    • For Index NL, need (clustered!) B+ tree index. 对于索引NL,需要(聚簇!)B+树索引。

      • Range probes on inner; # matches likely to be much higher than for equality joins. 在内部探查范围;#匹配可能远高于相等连接的匹配
    • Hash Join, Sort Merge Join not applicable! 哈希联接、排序合并联接不适用!
    • Block NL quite likely to be the best join method here. 块NL很可能是这里最好的连接方法。

Aggregate Operations (AVG, MIN, etc.) 聚集操作(平均值、最小值等)
P350

Example:
SELECT AVG(S.age)
FROM    Sailors S
  • Without grouping: 不分组

    • In general, requires scanning the relation. 通常,需要扫描关系
    • Given a tree index whose search key includes all attributes in the SELECT or WHERE clauses, can do index-only scan. 给定一个树索引,其搜索键包含SELECT或WHERE子句中的所有属性,则只能执行索引扫描。
  • With grouping: 分组
    • Sort on group-by attributes, then scan relation and compute aggregate for each group. (Better: combine sorting and aggregate computation.) 按分组属性排序,然后扫描关系并计算每个组的聚合。(更好:将排序和聚合计算结合起来。)
    • Similar approach based on hashing on group-by attributes. 也可以采用类似的方法,改为对分组属性做哈希。
    • Given a tree index whose search key includes all attributes in SELECT, WHERE and GROUP BY clauses, can do index-only scan; 给定一个树索引,若其搜索键包含SELECT、WHERE和GROUP BY子句中的所有属性,则可以执行仅索引扫描;
      • if group-by attributes form prefix of search key, can retrieve data entries/tuples in group-by order. 若分组属性构成搜索键的前缀,则可以按分组顺序检索数据项/元组。
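
The sort-based and hash-based grouping strategies can be compared side by side. This is a minimal in-memory sketch: the function names, the column positions, and the sample Sailors rows are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_group_avg(rows, key_idx, val_idx):
    """AVG per group: keep a running (sum, count) per group in a hash table."""
    running = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = running[row[key_idx]]
        acc[0] += row[val_idx]
        acc[1] += 1
    return {key: total / count for key, (total, count) in running.items()}


def sort_group_avg(rows, key_idx, val_idx):
    """AVG per group: sort on the group-by column, then scan group by group."""
    averages = {}
    ordered = sorted(rows, key=itemgetter(key_idx))
    for key, group in groupby(ordered, key=itemgetter(key_idx)):
        values = [row[val_idx] for row in group]
        averages[key] = sum(values) / len(values)
    return averages


# Hypothetical Sailors rows: (sid, sname, rating, age)
sailors = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
    (64, "horatio", 7, 35.0),
]

# SELECT S.rating, AVG(S.age) FROM Sailors S GROUP BY S.rating
print(hash_group_avg(sailors, key_idx=2, val_idx=3))
print(sort_group_avg(sailors, key_idx=2, val_idx=3))
```

The hash version needs only one pass and a small amount of state per group, which is why running values such as (sum, count) are kept instead of the tuples themselves.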

Summary 总结

  • Queries are composed of a few basic operators; 查询由几个基本运算符组成

    • The implementation of these operators can be carefully tuned (and it is important to do this!). 可以仔细调整这些操作符的实现(这一点很重要!)
  • Many alternative implementation techniques for each operator; no universally superior technique for most. 针对每个操作符的许多替代实施技术;对于大多数来说,没有普遍优越的技术
  • Must consider alternatives for each operation in a query and choose best one based on statistics, etc. 必须考虑查询中每个操作的备选实现,并根据统计信息等选择最佳方案
  • This is part of the broader task of Query Optimization, which we will cover next! 这是查询优化这一更广泛任务的一部分,我们将在下一步介绍它

lec9 Relational Query Optimization 关系查询优化

Query Optimization Overview

  • Query can be converted to relational algebra 查询可以转换为关系代数
  • Relational Algebra converts to tree, joins form branches 关系代数转换为树,连接形成分支
  • Each operator has implementation choices 每个操作符都有实现选项
  • Operators can also be applied in different order! 运算符也可以按不同顺序应用!

  • Plan: Tree of Relational Algebra operations (and some others) with choice of algorithm for each operation. 计划:关系代数操作树(以及其他一些操作),为每个操作选择算法。
  • Three main issues: 三个主要问题:
    • For a given query, what plans are considered? 对于给定的查询,考虑哪些计划?
    • How is the cost of a plan estimated? 如何估算计划的成本?
    • How do we “search” in the “plan space”? 我们如何在“计划空间”中“搜索”?
  • Ideally: Want to find best plan. 理想情况下:想找到最好的计划。
  • Reality: Avoid worst plans! 现实:避免最糟糕的计划!

Cost-based Query Sub-System 基于花费的查询子系统

Usually there is a heuristics-based rewriting step before the cost-based steps. 通常在基于成本的步骤之前有一个基于启发式的重写步骤。

Schema for Examples 示例所用的模式

Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

  • Reserves: 预约

    • Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. 每个元组40字节,每页可容纳100个Reserves元组,共1000页。
    • Assume there are 100 boats. 假设共有100艘船。
  • Sailors: 水手
    • Each tuple is 50 bytes long, 80 tuples per page, 500 pages. 每个元组50字节,每页可容纳80个Sailors元组,共500页。
    • Assume there are 10 different ratings. 假设共有10种不同的rating。
  • Assume we have 5 pages in our buffer pool! 假设缓冲池中有5个页面!

Motivating Example

SELECT  S.sname
FROM  Reserves R, Sailors S
WHERE  R.sid=S.sid AND R.bid=100 AND S.rating>5
  • Cost: 500+500*1000 I/Os
  • By no means the worst plan!
  • Misses several opportunities:
    • selections could be “pushed” down
    • no use made of indexes
  • Goal of optimization: Find faster plans that compute the same answer.
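
The plan figure is missing from these notes, so the following reading is an assumption: the quoted cost 500+500*1000 matches the page-oriented nested loops formula (outer pages + outer pages × inner pages) with Sailors (500 pages) as the outer relation and Reserves (1000 pages) as the inner one. A small check of the arithmetic, with a hypothetical helper name:

```python
def page_nested_loops_cost(outer_pages, inner_pages):
    """I/O cost of page-oriented nested loops: read the outer once,
    and scan the whole inner once per outer page."""
    return outer_pages + outer_pages * inner_pages


SAILORS_PAGES = 500     # from the example schema
RESERVES_PAGES = 1000   # from the example schema

print(page_nested_loops_cost(SAILORS_PAGES, RESERVES_PAGES))   # 500500
print(page_nested_loops_cost(RESERVES_PAGES, SAILORS_PAGES))   # 501000
```

Swapping the outer and inner relations barely changes this cost, which suggests that the real savings have to come from pushing selections down and using indexes.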

Alternative Plans – Push Selects (No Indexes)

Summing up 总结

  • There are lots of plans 有很多计划

    • Even for a relatively simple query 即使是相对简单的查询
  • People tend to think they can pick good ones by hand 人们往往认为自己可以手工挑出好的计划
    • MapReduce is based on that assumption MapReduce就是基于这个假设
  • Not so clear that’s true! 这一点未必成立!
    • Machines are better at enumerating options than people 机器比人更擅长列举选项
    • But we will see soon how optimizers make simplifying assumptions 但我们很快会看到优化器如何做出简化假设

What is Needed for Optimization? 优化需要什么

  • A closed set of operators 闭合操作符集

    • Relational ops (table in, table out) 关系操作(表输入、表输出)
    • Encapsulation (e.g. based on iterators) 封装(例如,基于迭代器)
  • Plan space 计划空间
    • Based on relational equivalences, different implementations 基于关系等价性,不同的实现
  • Cost Estimation, based on 成本估算,基于
    • Cost formulas 成本公式
    • Size estimation, in turn based on 结果大小估算,其又基于
      • Catalog information on base tables 基表的目录信息
      • Selectivity (Reduction Factor) estimation 选择率(缩减因子)估算
  • A search algorithm: To sift through the plan space and find lowest cost option! 一个搜索算法:在计划空间中筛选并找到成本最低的选项!

Query Optimization 优化查询

  • Will focus on “System R” (Selinger) style optimizers 将重点关注“System R”(Selinger)风格的优化器

Highlights of System R Optimizer System R优化器的亮点

  • Impact:

    • Most widely used currently; works well for 10-15 joins. 目前使用最广泛;适用于10-15个连接。
  • Cost estimation: 成本估算:
    • Very inexact, but works OK in practice. 非常不精确,但在实践中效果良好。
    • Statistics in system catalogs are used to estimate cost of operations and result sizes. 系统目录中的统计信息用于估计操作成本和结果大小。
    • Considers combination of CPU and I/O costs. 考虑CPU和I/O成本的组合。
    • System R’s scheme has been improved since that time. 自那时以来,System R的方案得到了改进。
  • Plan Space: Too large, must be pruned. 计划空间:太大,必须修剪。
    • Many plans share common, “overpriced” subtrees 许多计划共享相同的“代价过高”的子树

      • ignore them all! 把它们全部忽略!
    • In some implementations, only the space of left-deep plans is considered. 在一些实现中,只考虑左深计划(left-deep plans)的空间。
    • Cartesian products avoided in some implementations. 在某些实现中避免使用笛卡尔积。

Query Blocks: Units of Optimization 查询块:优化单元

  • Break query into query blocks 将查询分解为查询块
  • Optimized one block at a time 一次优化一个块
  • Uncorrelated nested blocks are computed once 非相关嵌套块只计算一次
  • Correlated nested blocks are treated like function calls 相关嵌套块的处理方式类似函数调用
    • But sometimes can be “decorrelated” 但有时可以“去相关”(decorrelate)
    • Beyond the scope of introductory course! 超出介绍课程的范围!
  • For each block, the plans considered are: 对于每个块,考虑计划为
    • All available access methods, for each relation in FROM clause. FROM子句中每个关系的所有可用访问方法。
    • All left-deep join trees 所有左深连接树
      • right branch always a base table 右分支始终是基表
      • consider all join orders and join methods 考虑所有连接顺序和连接方法
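
To make the shape of the left-deep plan space concrete, the sketch below lists every left-deep join order for a set of relations. It is a brute-force illustration under assumptions: the relation names `A`, `B`, `C` and the function name are hypothetical, join methods and access methods are ignored, and, as the notes say, a real optimizer prunes this space instead of listing it exhaustively.

```python
from functools import reduce
from itertools import permutations


def left_deep_trees(relations):
    """Yield every left-deep join tree as nested (left, right) pairs.

    In each pair the right branch is always a base table; the left
    branch is either a base table or the result of earlier joins.
    """
    for order in permutations(relations):
        yield reduce(lambda left, right: (left, right), order)


trees = list(left_deep_trees(["A", "B", "C"]))
print(len(trees))   # 6, i.e. 3! join orders
print(trees[0])     # (('A', 'B'), 'C')
```

With n relations there are n! join orders even before join methods are chosen, which is why the search has to be pruned.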

Schema for Examples

Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

  • Reserves: 预约

    • Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. 100 distinct bids. 每个元组40字节,每页可容纳100个Reserves元组,共1000页,有100个不同的bid
  • Sailors: 水手
    • Each tuple is 50 bytes long, 80 tuples per page, 500 pages. 10 ratings, 40,000 sids. 每个元组50字节,每页可容纳80个Sailors元组,共500页,有10种rating、40000个sid

Translating SQL to Relational Algebra 将SQL翻译为关系代数
P359

Relational Algebra Equivalences 关系代数等价

  • Allow us to choose different join orders and to “push” selections and projections ahead of joins. 允许我们选择不同的连接顺序,并把选择和投影“下推”到连接之前。

More Equivalences 更多等价关系

  • A projection commutes with a selection that only uses attributes retained by the projection. 若选择只用到投影所保留的属性,则投影与该选择可交换。
  • Selection between attributes of the two arguments of a cross-product converts cross-product to a join. 在叉积的两个参数的属性之间进行选择将叉积转换为连接。
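
One way to see why pushing selections ahead of a join is safe is to run both forms on a small example and compare the results. The sketch below uses made-up rows and hypothetical helper names (`select`, `join_on_sid`); it checks a single instance and is not a proof of the equivalence.

```python
def select(rows, predicate):
    """Relational selection over a list of rows."""
    return [row for row in rows if predicate(row)]


def join_on_sid(reserves, sailors):
    """Equi-join on sid, implemented as a plain nested loop."""
    return [(r, s) for r in reserves for s in sailors if r["sid"] == s["sid"]]


# Hypothetical sample data for the example schema.
reserves = [
    {"sid": 22, "bid": 100},
    {"sid": 31, "bid": 101},
    {"sid": 58, "bid": 100},
]
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 8},
    {"sid": 58, "sname": "rusty", "rating": 3},
]

# Selections applied after the join.
late = select(
    join_on_sid(reserves, sailors),
    lambda pair: pair[0]["bid"] == 100 and pair[1]["rating"] > 5,
)

# Selections pushed below the join.
early = join_on_sid(
    select(reserves, lambda r: r["bid"] == 100),
    select(sailors, lambda s: s["rating"] > 5),
)

print(late == early)
print([s["sname"] for _, s in early])
```

Both forms return the same rows, but the pushed-down form feeds far fewer tuples into the join.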

Cost Estimation 花费估计
P360

  • For each plan considered, must estimate total cost: 对于考虑的每个计划,必须估算总成本:

    • Must estimate cost of each operation in plan tree. 必须估计计划树中每个操作的成本。

      • Depends on input cardinalities. 取决于输入基数。
      • We’ve already discussed this for various operators 我们已经为不同的运算符讨论过这个问题
        • sequential scan, index scan, joins, etc. 顺序扫描、索引扫描、连接等。
    • Must estimate size of result for each operation in tree! 必须估计树中每个操作的结果大小!
      • Use information about the input relations. 使用有关输入关系的信息
      • For selections and joins, assume independence of predicates. 对于选择和连接,假设谓词独立
    • In System R, cost is boiled down to a single number consisting of #I/O + CPU-factor * #tuples 在System R中,成本被归结为一个由#I/O+CPU因子*#元组组成的单个数字
  • Q: Is “cost” the same as estimated “run time”? 问:“成本”与估计的“运行时间”相同吗?
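
The independence assumption means the estimated result size is the product of the input cardinalities and the reduction factors of all predicates. The sketch below applies it to the example schema. Several things are assumptions not stated in these notes: the two reduction-factor rules (1/NKeys for `column = value`, and 1/max(NKeys) for `column1 = column2`, the usual System R style heuristics), the idea that Reserves.sid has no more distinct values than Sailors.sid, the value passed as `cpu_factor`, and all function names.

```python
from fractions import Fraction


def rf_equals_constant(n_keys):
    """Reduction factor for `column = value`, assuming uniformly spread values."""
    return Fraction(1, n_keys)


def rf_equijoin(n_keys_left, n_keys_right):
    """Reduction factor for `column1 = column2`."""
    return Fraction(1, max(n_keys_left, n_keys_right))


def estimated_result_size(cardinalities, reduction_factors):
    """Product of input cardinalities times all reduction factors
    (this is where predicate independence is assumed)."""
    size = Fraction(1)
    for cardinality in cardinalities:
        size *= cardinality
    for factor in reduction_factors:
        size *= factor
    return size


def plan_cost(n_ios, n_tuples, cpu_factor):
    """Single-number cost in the style of System R: #I/O + CPU-factor * #tuples."""
    return n_ios + cpu_factor * n_tuples


reserves_tuples = 100 * 1000   # 100 tuples per page, 1000 pages
sailors_tuples = 80 * 500      # 80 tuples per page, 500 pages

# Reserves JOIN Sailors on sid, with R.bid = 100 (100 distinct bids, 40,000 sids)
size = estimated_result_size(
    [reserves_tuples, sailors_tuples],
    [rf_equals_constant(100), rf_equijoin(40_000, 40_000)],
)
print(size)   # 1000
```

Exact fractions are used so that the printed estimate is not affected by floating-point rounding; a real catalog would normally store and combine these as approximate numbers.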

P362

P366

P370

Summary 总结

  • Optimization is the reason for the lasting power of the relational system 优化是关系系统持久强大的原因
  • But it is primitive in some ways 但它在某些方面是原始的
  • New areas: many! 新领域:很多!
    • Smarter summary statistics (fancy histograms and “sketches”) 更智能的汇总统计(精美的直方图和“草图”)
    • Auto-tuning statistics, 自动调整统计信息,
    • Adaptive runtime re-optimization (e.g. eddies), 自适应运行时重新优化(例如涡流),
    • Multi-query optimization, 多查询优化,
    • And parallel scheduling issues, etc. 以及并行调度问题等。

lec9 Physical DB Design 物理数据库设计

Physical DB Design
P483

  • Query optimizer does what it can to use indices, clustering etc. 查询优化器尽其所能使用索引、聚簇等。
  • DataBase Administrator (DBA) is expected to set up physical design well. 数据库管理员(DBA)应做好物理设计。
  • Good DBAs understand query optimizers very well. 优秀的DBA非常了解查询优化器。

One Key Decision: Indexes 一个关键决定:索引
P485

  • Which tables 哪些表格
  • Which field(s) should be the search key? 哪些字段应为搜索关键字?
  • Multiple indexes? 多重索引?
  • Clustering? 聚簇?

Index Selection 索引选择

  • One approach: 一种方法:

    • Consider most important queries in turn. 依次考虑最重要的查询。
    • Consider best plan using the current indexes 使用当前索引考虑最佳方案
    • See if better plan is possible with an additional index. 看看是否有可能用一个额外的索引来制定更好的计划。
    • If so, create it. 如果是这样,创建它。
  • But consider impact on updates! 但是考虑更新的影响!
    • Indexes can make queries go faster, updates slower. 索引可以使查询更快,更新更慢。
    • Require disk space, too. 也需要磁盘空间。
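
The approach above can be read as a greedy loop over the workload. The sketch below is only one possible reading, with everything in it hypothetical: the function `choose_indexes`, the callback parameters `query_cost` and `update_penalty`, the candidate index names, and the toy cost numbers. A real DBA would take these estimates from the optimizer, not from a lookup table.

```python
def choose_indexes(queries, candidates, query_cost, update_penalty):
    """Greedy index selection.

    queries        -- queries, most important first
    candidates     -- set of candidate index names
    query_cost     -- function(query, indexes) -> estimated cost of best plan
    update_penalty -- function(index) -> estimated extra cost of maintaining it
    """
    chosen = set()
    for query in queries:
        current = query_cost(query, chosen)   # best plan with current indexes
        best_index, best_gain = None, 0
        for index in sorted(candidates - chosen):
            gain = current - query_cost(query, chosen | {index}) - update_penalty(index)
            if gain > best_gain:
                best_index, best_gain = index, gain
        if best_index is not None:            # create it only if it pays off
            chosen.add(best_index)
    return chosen


# Toy cost model: a query costs 10 with its helpful index, 1000 without it.
HELPFUL = {"q_bid": "Reserves.bid", "q_rating": "Sailors.rating"}


def toy_query_cost(query, indexes):
    return 10 if HELPFUL[query] in indexes else 1000


picked = choose_indexes(
    queries=["q_bid", "q_rating"],
    candidates={"Reserves.bid", "Sailors.rating", "Sailors.age"},
    query_cost=toy_query_cost,
    update_penalty=lambda index: 50,
)
print(sorted(picked))
```

The `update_penalty` term is how the sketch reflects the warning about updates: an index is added only when the query savings outweigh its maintenance cost.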

Issues to Consider in Index Selection 索引选择中应考虑的几个问题