This is part one of my thesis that “all joins are nested loop joins – it’s just the startup overheads that vary”; there will be a note on “Joins – HJ” and “Joins – MJ” to follow. (For a quick reference list of URLs to all three articles in turn, see: Joins.)

In some ways, the claim is trivially obvious – a join simply takes two row sources and compares rows from one row source with rows from the other, reporting cases where some test of column values is true. Until the age of quantum computing, when everything will happen everywhere at once, we are stuck with Turing machines, and hidden somewhere in the process we are bound to see a sequence of steps similar to:

for each interesting row in rowsource X loop
     for each related row in rowsource Y loop
         report required columns from X and required columns from Y
     end loop
end loop

This, of course, is the basic theme of the nested loop join – we have two loop constructs, one inside the other (hence nested), and we can understand intuitively what we mean by the outer loop and the inner loop and therefore extend the concept to the “outer table” and “inner table” of a traditional nested loop join.
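The loop logic above can be written almost word for word as a Python generator. This is only an illustrative sketch: the rowsources are assumed to be in-memory lists of dicts, and `nested_loop_join`, `tabx` and `taby` are hypothetical names, not anything Oracle provides. The inner rowsource is typed as a `Sequence` so that it can be re-scanned once per outer row.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def nested_loop_join(
    outer: Iterable[X],
    inner: Sequence[Y],
    matches: Callable[[X, Y], bool],
) -> Iterator[Tuple[X, Y]]:
    """Yield each (x, y) pair for which matches(x, y) is true."""
    for x in outer:              # for each interesting row in rowsource X
        for y in inner:          # for each related row in rowsource Y
            if matches(x, y):
                yield x, y       # report required columns from X and Y


# Hypothetical rowsources: TABX.n1 joins to TABY.id
tabx = [{"id": 1, "n1": 10}, {"id": 2, "n1": 20}]
taby = [{"id": 10, "v": "a"}, {"id": 20, "v": "b"}, {"id": 30, "v": "c"}]

pairs = list(nested_loop_join(tabx, taby, lambda x, y: x["n1"] == y["id"]))
print([(x["id"], y["v"]) for x, y in pairs])  # [(1, 'a'), (2, 'b')]
```

In this naive form the inner rowsource is scanned in full once per outer row: with two outer rows and three inner rows the `matches` test runs six times to find two results. That repeated inner work is what precise access paths on the inner table are meant to keep small.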

Looking at this from an Oracle perspective, we typically think of a nested loop join as a mechanism for examining a small volume of data using high-precision access methods, so the loop logic above might turn into an execution plan such as:

------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     6 |
|   1 |  NESTED LOOPS                |       |     6 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TABX  |     3 |
|*  3 |    INDEX RANGE SCAN          | TX_I1 |     3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TABY  |     2 |
|*  5 |    INDEX RANGE SCAN          | TY_I1 |     2 |
------------------------------------------------------

In this case we use a precise index to pick up just a few rows from table TABX and, for each of those rows, a precise index to pick up the matching rows from table TABY. When thinking about the suitability of this (or any) join method we need to look at the startup costs and the potential for wasted effort.

By startup costs I mean the work we have to do before we can produce the first item in the result rowsource – and in this case the startup costs are effectively non-existent: there is no preparatory work we do before we start generating results. We fetch a row from TABX and we are immediately ready to fetch a row from TABY, combine, and deliver.
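Here is a rough sketch of the plan above, assuming Python dicts stand in for the B-tree indexes TX_I1 and TY_I1 and list positions stand in for rowids; the column names `grp` and `ref`, and the helper names, are hypothetical. The indexes are built before the join starts, just as real indexes exist before the query runs, so the join itself has nothing to prepare: it is a generator, and the first joined row is available after one outer probe and one inner probe.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]
Index = Dict[object, List[int]]


def build_index(rows: List[Row], column: str) -> Index:
    """Map each value of `column` to the list positions ("rowids") holding it."""
    index: Index = defaultdict(list)
    for rowid, row in enumerate(rows):
        index[row[column]].append(rowid)
    return index


def indexed_nested_loop(
    tabx: List[Row], tx_i1: Index, x_value: object,
    taby: List[Row], ty_i1: Index, join_col: str,
) -> Iterator[Tuple[Row, Row]]:
    # Outer loop: range scan of TX_I1, then fetch each TABX row by rowid
    for x_rowid in tx_i1.get(x_value, []):
        x = tabx[x_rowid]
        # Inner loop: range scan of TY_I1 using the join value from this
        # TABX row, then fetch each matching TABY row by rowid
        for y_rowid in ty_i1.get(x[join_col], []):
            yield x, taby[y_rowid]


# Hypothetical data: TABX filtered on grp, joined to TABY on ref
tabx = [{"id": i, "grp": i % 3, "ref": i % 5} for i in range(9)]
taby = [{"ref": r, "val": f"y{r}_{k}"} for r in range(5) for k in range(2)]

# The indexes exist before the query runs, so they are not join startup work
tx_i1 = build_index(tabx, "grp")
ty_i1 = build_index(taby, "ref")

joined = indexed_nested_loop(tabx, tx_i1, 0, taby, ty_i1, "ref")
first = next(joined)  # available after one outer probe and one inner probe
rest = list(joined)
print(first[1]["val"], 1 + len(rest))  # y0_0 6
```

The row counts (three TABX rows, two TABY rows per probe, six in total) were chosen to mirror the Rows column of the plan; they are invented data, not output from Oracle.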

What about wasted effort? This example has been engineered to be very efficient, but in a more general case we might have multi-column indexes and predicates involving several (but not all) columns in those indexes; we might have predicates involving columns in the tables that are not in the indexes; and, of course, we might have other users accessing the database at the same time. So we should consider the possibility that we visit some blocks that don’t hold data we’re interested in, visit some blocks many times rather than just once, and have to compete with other processes to latch, pin, and unpin (some of) the blocks we examine. Given sufficiently poor precision in our indexing we may also have to think about the number of blocks we will have to read from disk, and how many times we might have to re-read them if we don’t have a large enough cache to keep them in memory between visits. It is considerations like these that can make us look for alternative strategies for acquiring the data we need: can we find a way to invest resources to “prime” the nested loop join before we actually run the loops?
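A toy model can make the point about revisiting and re-reading blocks concrete. It is not how Oracle's buffer cache works: it simply assumes a fixed number of rows per block, counts every inner-table probe as one block visit, and uses a tiny LRU cache (a made-up stand-in for the buffer cache) to decide whether a visit needs a disk read. `count_block_visits` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Iterable, Set, Tuple


def count_block_visits(
    rowids: Iterable[int], rows_per_block: int, cache_blocks: int
) -> Tuple[int, int, int]:
    """Return (block visits, distinct blocks, simulated disk reads).

    Each inner-table probe is one block visit. A small LRU cache holding at
    most `cache_blocks` blocks decides whether a visit needs a disk read.
    """
    cache: "OrderedDict[int, None]" = OrderedDict()
    distinct: Set[int] = set()
    visits = reads = 0
    for rowid in rowids:
        block = rowid // rows_per_block
        visits += 1
        distinct.add(block)
        if block in cache:
            cache.move_to_end(block)        # cache hit: mark most recently used
        else:
            reads += 1                      # cache miss: read from "disk"
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict least recently used block
    return visits, len(distinct), reads


# 100 inner rows, 10 per block; probes arrive in block order or scattered
in_order = list(range(100))
scattered = [b * 10 + r for r in range(10) for b in range(10)]
print(count_block_visits(in_order, 10, 2))   # (100, 10, 10)
print(count_block_visits(scattered, 10, 2))  # (100, 10, 100)
```

The same 100 probes against the same 10 blocks cost 10 simulated reads when they arrive in block order but 100 when they are scattered, because with only two cached blocks each block has been evicted before the loop comes back to it.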

I’ll answer that question in the next two notes – but before then I’d like to leave you with a concrete example of a nested loop join. This was run on 10.2.0.3 with an 8KB blocksize in a tablespace using freelist management and 1MB uniform extents.

create cluster hash_cluster
     (
         hash_col number(6)
     )                       -- the hash key is numeric
     single table            -- promise to hold just one table
     hashkeys 1000           -- we want 1000 different hash values
     size 150                -- each key, with its data, needs 150 bytes
     hash is hash_col        -- the table column will supply the hash value
;
create table hashed_table(
     id          number(6)       not null,
     small_vc    varchar2(10),
     padding     varchar2(100)   default(rpad('X',100,'X'))
)
cluster hash_cluster(id)
;
alter table hashed_table
add constraint ht_pk primary key(id)
;
begin
     for r1 in 1..1000 loop
         insert into hashed_table values(
             r1,
             lpad(r1,10,'0'),
             rpad('x',100,'x')
         );
     end loop;
end;
/
commit;
create table t1
as
select
     rownum                      id,
     dbms_random.value(1,1000)   n1,
     rpad('x',50)                padding
from
     all_objects
where
     rownum <= 10000
;
alter table t1 add constraint t1_pk primary key(id);
begin
     dbms_stats.gather_table_stats(
         ownname          => user,
         tabname          => 'T1',
         estimate_percent => 100,
         method_opt       => 'for all columns size 1',
         cascade          => true
     );
     dbms_stats.gather_table_stats(
         ownname          => user,
         tabname          => 'hashed_table',
         estimate_percent => 100,
         method_opt       => 'for all columns size 1',
         cascade          => true
     );
end;
/
set autotrace traceonly explain
select
     substr(t1.padding,1,10),
     substr(ht.padding,1,10)
from
     t1,
     hashed_table    ht
where
     t1.id between 1 and 2
and ht.id = t1.n1
and ht.small_vc = 'a'
;
set autotrace off

I’ve created one of my two tables in a single-table hash cluster, given it a primary key which is also the hash key, and ensured that I get no hash collisions between rows in the table (i.e. no two rows in the table hash to the same hash value). With my particular setup the optimizer has decided to access the table by hash key rather than by primary key index. Here’s the execution path.

---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |   191 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |              |     1 |   191 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1           |     2 |   152 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_PK        |     2 |       |     2   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS HASH          | HASHED_TABLE |     1 |   115 |            |          |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
    3 - access("T1"."ID">=1 AND "T1"."ID"<=2)
    4 - access("HT"."ID"="T1"."N1")
        filter("HT"."SMALL_VC"='a')

This is an important lead-in to hash joins, and to a different way of thinking about WHY we might want to use a hash join rather than a nested loop join.
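The TABLE ACCESS HASH step can be sketched as a toy model in which the location of each HASHED_TABLE row is calculated from its key when the row is inserted, so each probe from T1 goes straight to one bucket instead of walking an index. The model assumes bucket = key mod hashkeys and uses integer `n1` values for simplicity (in the test case above `n1` comes from dbms_random.value and is not an integer); `HashCluster` and `nlj_with_hash_probe` are hypothetical names, and Oracle's actual hash function and block layout are not modelled.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


class HashCluster:
    """Toy single-table hash cluster: each row's bucket is computed from its key.

    Assumption: bucket = key mod hashkeys. Oracle's real hash function and
    block layout are not modelled.
    """

    def __init__(self, hashkeys: int) -> None:
        self.hashkeys = hashkeys
        self.buckets: List[List[Row]] = [[] for _ in range(hashkeys)]

    def insert(self, row: Row) -> None:
        self.buckets[int(row["id"]) % self.hashkeys].append(row)

    def probe(self, key: int) -> List[Row]:
        # Like TABLE ACCESS HASH: compute the bucket and go straight to it
        # (no index), then recheck the key in case several keys share a bucket
        return [r for r in self.buckets[key % self.hashkeys] if r["id"] == key]


def nlj_with_hash_probe(
    t1_rows: Iterable[Row], ht: HashCluster, small_vc: str
) -> Iterator[Tuple[Row, Row]]:
    for t1 in t1_rows:                          # outer rows from T1
        for row in ht.probe(int(t1["n1"])):     # access("HT"."ID"="T1"."N1")
            if row["small_vc"] == small_vc:     # filter("HT"."SMALL_VC"='a')
                yield t1, row


ht = HashCluster(hashkeys=1000)
for i in range(1, 1001):
    ht.insert({"id": i, "small_vc": str(i).zfill(10)})  # like lpad(r1,10,'0')

t1_rows = [{"id": 1, "n1": 42}, {"id": 2, "n1": 7}]    # hypothetical integer n1
print(len(list(nlj_with_hash_probe(t1_rows, ht, "a"))))           # 0
print(len(list(nlj_with_hash_probe(t1_rows, ht, "0000000042"))))  # 1
```

With keys 1 to 1000 and 1000 buckets, every bucket ends up holding exactly one row, which mirrors the collision-free arrangement described for the real test case.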

To be continued …

Comments (13)

  1. Hi Jonathan,

    I’ve a question regarding NULL in logical operations. Although it’s off topic, it’s driving me crazy, so I’ve posted it here.

    My problem is: why do the two queries below return different results?

    SELECT * FROM test WHERE sal > 100 AND sal > NULL;
    SELECT * FROM test WHERE sal > NULL AND sal > 100;

    This implies that TRUE AND UNKNOWN = TRUE, but UNKNOWN AND TRUE = UNKNOWN.
    Supporting code:
    CREATE TABLE test (sal NUMBER);
    INSERT INTO test VALUES (100);
    INSERT INTO test VALUES (200);
    INSERT INTO test VALUES (300);
    INSERT INTO test VALUES (NULL);

    Comment by Manish (August 10, 2010 @ 6:40 am BST)

    • Manish,

      This is a bug (I can reproduce it in 10.2; it seems to be fixed in 11.2). The correct results are:
      TRUE and UNKNOWN : UNKNOWN
      UNKNOWN and TRUE : UNKNOWN

      see
      http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/conditions004.htm#i1052219
      http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/conditions004.htm#i1052219

      Comment by Sokrates (August 10, 2010 @ 8:41 am BST)

      • >seems to be fixed in 11.2
        and in 10.2.0.5 too

        Comment by Timur Akhmadeev (August 10, 2010 @ 12:04 pm BST)

    • Manish,

      This isn’t the right place, and you haven’t given a version number.
      I’ve reproduced this on 11.1.0.6 – the first query erroneously returns two rows, the second correctly returns none – it’s a bug, raise an SR.

      Looking at the 10053, Oracle seems to have “lost” the sal > null predicate in part of the logic for calculating single table cardinalities in the case where that predicate is the last for the table. The error doesn’t occur if the null is supplied as a bind variable.

      Comment by Jonathan Lewis (August 10, 2010 @ 8:52 am BST)

      • Thanks Jonathan.

        BTW the db was 10.2.0.4.

        Comment by Manish (August 10, 2010 @ 12:42 pm BST)
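For readers following this side discussion, the truth table that Sokrates quotes can be sketched in Python, using None as a stand-in for UNKNOWN (an assumption of the sketch, not how Oracle represents it). A WHERE clause keeps a row only when its predicate is TRUE, so under three-valued logic both orderings of the predicates should return no rows, which is why the two-row result is a bug. The function names are hypothetical.

```python
from typing import Callable, List, Optional

# None stands for SQL's UNKNOWN (the result of comparing anything with NULL)
Truth = Optional[bool]


def sql_gt(a: Optional[float], b: Optional[float]) -> Truth:
    """SQL '>': any comparison involving NULL yields UNKNOWN."""
    if a is None or b is None:
        return None
    return a > b


def sql_and(p: Truth, q: Truth) -> Truth:
    """Three-valued AND: FALSE dominates, otherwise UNKNOWN dominates."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def where(rows: List[Optional[float]], predicate: Callable[[Optional[float]], Truth]):
    """A WHERE clause keeps a row only when the predicate is TRUE."""
    return [r for r in rows if predicate(r) is True]


test_rows = [100, 200, 300, None]
q1 = where(test_rows, lambda sal: sql_and(sql_gt(sal, 100), sql_gt(sal, None)))
q2 = where(test_rows, lambda sal: sql_and(sql_gt(sal, None), sql_gt(sal, 100)))
print(q1, q2)  # [] []
```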

  2. Jonathan,

    As you have written specifically about nested loop joins, would it be possible for you to describe how Oracle processes NL joins in the following releases?
    a) 8i (plan as mentioned above)
    b) 9i and 10g (introduced table prefetch)
    c) 11g (don’t know what it is called but plan changes a bit)
    I have not managed to find these details in a single document/post anywhere else.

    Comment by Narendra (August 10, 2010 @ 8:25 am BST)

  3. Narendra,

    I may find some time to comment on this one day. In the meantime, I thought Christian Antognini and Tanel Poder had made various comments on the changes. Search for “nlj_batching” on their web sites, or on the “Oak Table Safe Search”.

    Comment by Jonathan Lewis (August 10, 2010 @ 9:19 am BST)

    • Well, Oak Table did not find any entries.
      http://www.oaktable.net/search/node/nlj%20batching

      Comment by Narendra (August 10, 2010 @ 9:40 am BST)

      • Narendra,

        True, I just tried it myself. Even a full Google search shows little more than a couple of passing references I’ve made to the feature and lists of hints from 11g. It’s possible that I’m remembering a discussion from Christian’s book (Troubleshooting Oracle Performance).

        Comment by Jonathan Lewis (August 10, 2010 @ 9:44 am BST)

  4. [...] In the second note on my thesis that “all joins are nested loop joins with different startup costs” I want to look at hash joins, and I’ll start by going back to the execution plan I posted on “Joins – NLJ”. [...]

    Pingback by Joins – HJ « Oracle Scratchpad (August 10, 2010 @ 6:46 pm BST)

  5. [...] Lewis started a series about joins. Jonathan is the master of building clear and excellent test cases, and this post is a good example [...]

    Pingback by Log Buffer #199, A Carnival of the Vanities for DBAs | The Pythian Blog (August 14, 2010 @ 8:53 pm BST)

  6. […] this post I am writing about Nested loop joins based on the blog article from Jonathan Lewis – NLJ. Typical pseudocode of a nested loop is similar to […]

    Pingback by Nested Loop Joins | jagdeepsangwan (June 4, 2014 @ 10:33 am BST)

  7. […] There are only three join mechanisms used by Oracle: merge join, hash join and nested loop join. […]

    Pingback by Joins | Oracle Scratchpad (June 5, 2014 @ 8:02 am BST)
