2021SC@SDUSC

Upper-level operation functions

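The previous post only gave a broad overview of the upper-level operation functions. This time we analyze their source code in detail. We start with indexam.c, whose header comment lists more than a dozen INTERFACE ROUTINES; these are the upper-level operation functions. We begin with index_open, and my understanding and analysis are written as comments inside the code.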

index_open

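// index_open
Relation
index_open(Oid relationId, LOCKMODE lockmode)
{
    Relation    r;      /* declare a variable r of type Relation */

    /*
     * Open the relation by its OID with relation_open(), declared in
     * access/relation.h.  Unless lockmode is NoLock, the requested lock is
     * acquired on the relation.
     */
    r = relation_open(relationId, lockmode);

    /*
     * relation_open() accepts any kind of relation, so check through the
     * relcache entry that what we opened really is an index (a regular
     * index or a partitioned index).
     */
    if (r->rd_rel->relkind != RELKIND_INDEX &&
        r->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("\"%s\" is not an index",     /* otherwise raise an error */
                        RelationGetRelationName(r))));

    return r;
}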

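PostgreSQL is a relational database, and Relation is one of the most central structures in its implementation. A Relation is a pointer to the relation cache entry (RelationData) of any relation listed in pg_class: an ordinary table, but also an index, a sequence, a view, and so on. That is why index_open can reuse relation_open and only needs the relkind check to make sure the result is an index.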
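The open-then-check pattern of index_open can be modelled outside PostgreSQL. The standalone program below is only a toy sketch, assuming a small in-memory array in place of the relcache; ToyRelation, toy_relation_open and toy_index_open are hypothetical names, and printing a message and returning NULL stands in for ereport(ERROR). The relkind letters 'r', 'i' and 'I' are the ones pg_class uses for tables, indexes and partitioned indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* relkind letters as used in pg_class */
#define TOY_RELKIND_RELATION          'r'
#define TOY_RELKIND_INDEX             'i'
#define TOY_RELKIND_PARTITIONED_INDEX 'I'

/* Toy relation descriptor: one type describes tables and indexes alike. */
typedef struct {
    unsigned int oid;
    const char  *name;
    char         relkind;
} ToyRelation;

/* Hypothetical stand-in for the relcache. */
static ToyRelation toy_catalog[] = {
    {16384, "orders", TOY_RELKIND_RELATION},
    {16390, "orders_pkey", TOY_RELKIND_INDEX},
    {16400, "events_ts_idx", TOY_RELKIND_PARTITIONED_INDEX},
};

/* Generic open (role of relation_open): finds any relation by OID. */
static ToyRelation *toy_relation_open(unsigned int oid)
{
    for (size_t i = 0; i < sizeof toy_catalog / sizeof toy_catalog[0]; i++)
        if (toy_catalog[i].oid == oid)
            return &toy_catalog[i];
    return NULL;
}

/* Index open (role of index_open): generic open plus a relkind check.
 * Prints an error and returns NULL instead of calling ereport(ERROR). */
static ToyRelation *toy_index_open(unsigned int oid)
{
    ToyRelation *r = toy_relation_open(oid);

    if (r == NULL)
        return NULL;
    if (r->relkind != TOY_RELKIND_INDEX &&
        r->relkind != TOY_RELKIND_PARTITIONED_INDEX) {
        fprintf(stderr, "\"%s\" is not an index\n", r->name);
        return NULL;
    }
    return r;
}
```

For example, toy_index_open(16390) returns the index entry, while toy_index_open(16384) is rejected because that OID names a table.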

index_beginscan()

Next, let us look at index_beginscan().

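// index_beginscan()
IndexScanDesc                                   /* returns the scan descriptor IndexScanDesc */
index_beginscan(Relation heapRelation,          /* input parameters */
                Relation indexRelation,
                Snapshot snapshot,
                int nkeys, int norderbys)
{
    IndexScanDesc scan;

    /*
     * index_beginscan_internal() is a static helper declared earlier in
     * indexam.c.  It mainly calls the access method's ambeginscan callback,
     * which allocates the descriptor and sets up the scan parameters.
     */
    scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot, NULL, false);

    /* Save the parameters that only the caller knows into the descriptor. */
    scan->heapRelation = heapRelation;
    scan->xs_snapshot = snapshot;

    /*
     * table_index_fetch_begin() prepares to fetch the table (heap) tuples
     * that the index entries point to.
     */
    scan->xs_heapfetch = table_index_fetch_begin(heapRelation);

    return scan;
}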

The IndexScanDesc type used by index_beginscan is declared in genam.h:

/* struct definitions appear in relscan.h */
typedef struct IndexScanDescData *IndexScanDesc;

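Tracing it further, IndexScanDesc is a pointer type to struct IndexScanDescData. genam.h only needs this incomplete struct declaration, so code that merely passes scan descriptors around does not have to include relscan.h, which keeps header dependencies small; functions also receive a pointer instead of copying the large struct on every call. The full definition of IndexScanDescData is in relscan.h: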

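typedef struct IndexScanDescData
{
    /* scan parameters */
    Relation    heapRelation;           /* heap relation */
    Relation    indexRelation;          /* index relation */
    struct SnapshotData *xs_snapshot;   /* snapshot */
    int         numberOfKeys;           /* number of index qualifier conditions */
    int         numberOfOrderBys;       /* number of ordering operators */
    struct ScanKeyData *keyData;
    struct ScanKeyData *orderByData;
    bool        xs_want_itup;           /* does the caller request index tuples? */
    bool        xs_temp_snap;           /* unregister the snapshot at scan end? */

    /* signaling to the index AM about killing index tuples */
    bool        kill_prior_tuple;
    bool        ignore_killed_tuples;
    bool        xactStartedInRecovery;

    /* private state of the index access method */
    void       *opaque;

    /*
     * In an index-only scan, a successful amgettuple call must fill either
     * xs_itup (and xs_itupdesc) or xs_hitup (and xs_hitupdesc) to provide
     * the data returned by the scan.  It can fill both, in which case the
     * heap format will be used.
     */
    IndexTuple  xs_itup;
    struct TupleDescData *xs_itupdesc;
    HeapTuple   xs_hitup;
    struct TupleDescData *xs_hitupdesc;

    ItemPointerData xs_heaptid;         /* result */
    bool        xs_heap_continue;       /* must we keep walking for further results? */
    IndexFetchTableData *xs_heapfetch;

    bool        xs_recheck;

    /*
     * When fetching with an ordering operator, the values of the ORDER BY
     * expressions of the last returned tuple, according to the index.  If
     * xs_recheckorderby is true, these need to be rechecked just like the
     * scan keys, and the values returned here are a lower bound on the
     * actual values.
     */
    Datum      *xs_orderbyvals;
    bool       *xs_orderbynulls;
    bool        xs_recheckorderby;
    struct ParallelIndexScanDescData *parallel_scan;
}           IndexScanDescData;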
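The split between genam.h (which only declares the pointer type) and relscan.h (which defines the struct) is the usual C opaque-handle idiom. Below is a minimal standalone sketch of it in which every name is hypothetical: the first half plays the role of a header that exposes only ToyScanDesc, the second half plays the role of the file that knows the layout, and the padding array merely stands in for the many fields of the real struct.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- "genam.h"-like part: only an incomplete struct type ---- */
typedef struct ToyScanDescData *ToyScanDesc;

ToyScanDesc toy_scan_begin(int nkeys);
int         toy_scan_nkeys(ToyScanDesc scan);
void        toy_scan_end(ToyScanDesc scan);

/* A caller that only passes the handle around; it compiles against the
 * declarations above without ever seeing the field list. */
static int toy_caller(int nkeys)
{
    ToyScanDesc scan = toy_scan_begin(nkeys);
    int         n = toy_scan_nkeys(scan);

    toy_scan_end(scan);
    return n;
}

/* ---- "relscan.h"-like part: the full definition ---- */
struct ToyScanDescData {
    int    numberOfKeys;
    double padding[32];         /* stands in for the many other fields */
};

ToyScanDesc toy_scan_begin(int nkeys)
{
    ToyScanDesc scan = malloc(sizeof(struct ToyScanDescData));

    if (scan == NULL)
        abort();
    scan->numberOfKeys = nkeys;
    return scan;
}

int toy_scan_nkeys(ToyScanDesc scan)
{
    return scan->numberOfKeys;
}

void toy_scan_end(ToyScanDesc scan)
{
    free(scan);
}
```

Only a pointer-sized handle is passed between functions, and callers that never dereference it do not depend on the layout, so adding a field does not touch them.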

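The comments in the source show what each field means. index_beginscan() also illustrates how cleanly the PostgreSQL source is layered: the generic code in indexam.c only fills in the fields that every index type shares, while the index-type-specific setup happens behind the IndexScanDesc pointer. Related functions such as index_beginscan_bitmap share the helper index_beginscan_internal; their roles were described in the previous post, so they are not analyzed one by one here.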
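The division of labour in index_beginscan can be modelled in a few lines. This is a simplified sketch under stated assumptions, not PostgreSQL code: toy_index_beginscan plays index_beginscan (together with index_beginscan_internal), toy_btree_beginscan plays an access method's ambeginscan callback such as btbeginscan, toy_get_index_scan plays RelationGetIndexScan, and plain integers stand in for relations and snapshots.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified scan descriptor. */
typedef struct ToyScan {
    unsigned int heap_oid;   /* set by the generic layer (like heapRelation) */
    unsigned int index_oid;  /* set by the shared allocator (like indexRelation) */
    int          snapshot;   /* set by the generic layer (like xs_snapshot) */
    int          nkeys;      /* set by the shared allocator (like numberOfKeys) */
    int          am_private; /* set by the AM callback (stands in for opaque) */
} ToyScan;

/* Shared helper used by every AM, in the role of RelationGetIndexScan. */
static ToyScan *toy_get_index_scan(unsigned int index_oid, int nkeys)
{
    ToyScan *scan = calloc(1, sizeof *scan);    /* zero-filled: "not set yet" */

    if (scan == NULL)
        abort();
    scan->index_oid = index_oid;
    scan->nkeys = nkeys;
    return scan;
}

/* One AM's begin-scan callback, in the role of btbeginscan. */
static ToyScan *toy_btree_beginscan(unsigned int index_oid, int nkeys)
{
    ToyScan *scan = toy_get_index_scan(index_oid, nkeys);

    scan->am_private = 1;       /* AM-specific initialisation */
    return scan;
}

/* Callback type, in the spirit of ambeginscan_function. */
typedef ToyScan *(*toy_beginscan_function)(unsigned int index_oid, int nkeys);

/* Generic layer, in the role of index_beginscan: call the AM's callback,
 * then store the parameters that only the caller knows. */
static ToyScan *toy_index_beginscan(unsigned int heap_oid, unsigned int index_oid,
                                    toy_beginscan_function ambeginscan,
                                    int snapshot, int nkeys)
{
    ToyScan *scan = ambeginscan(index_oid, nkeys);

    scan->heap_oid = heap_oid;
    scan->snapshot = snapshot;
    return scan;
}
```

Swapping toy_btree_beginscan for another callback changes only the AM-specific part; toy_index_beginscan stays untouched, which is the point of the layering.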

index_create()

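Next, let us look at the functions in index.c. These functions mainly deal with the system catalogs, which is why the file lives in the catalog directory. Among them is index_create(), an important function of more than 400 lines; we pick out its most important fragments for analysis.
First, the parameters the function takes; my analysis is written in the code comments: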

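Oid
index_create(Relation heapRelation,          // table to build the index on
             const char *indexRelationName,  // user-visible name of the index
             Oid indexRelationId,            // normally InvalidOid, so that a new, unique OID is generated for the index
             Oid parentIndexRelid,           // when creating an index partition, the OID of the parent index
             Oid parentConstraintId,         // when creating a constraint on a partition, the OID of the parent's constraint
             Oid relFileNode,                // normally InvalidOid, meaning new storage is created
             IndexInfo *indexInfo,           // the same information the executor uses to insert into the index
             List *indexColNames,            // column names to use for the index
             Oid accessMethodObjectId,       /* OID of the index access method; important for the later analysis.
                                                Without a USING clause, CREATE INDEX passes btree (nbtree) here */
             Oid tableSpaceId,               // OID of the tablespace to use
             Oid *collationObjectId,         // collation OIDs, one per index column; they govern how values are compared
             Oid *classObjectId,             // operator class OIDs, one per index column
             int16 *coloptions,              // per-column options (e.g. DESC, NULLS FIRST)
             Datum reloptions,               // access-method-specific options
             bits16 flags,
             bits16 constr_flags,
             bool allow_system_table_mods,
             bool is_internal,
             Oid *constraintId)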

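Next, here is the fragment of this function that checks each key column's operator class against its collation; the analysis is in the code: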

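for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
{
    Oid         collation = collationObjectId[i];   // collation OID of this key column
    Oid         opclass = classObjectId[i];         // operator class OID of this key column

    if (collation)
    {
        /*
         * Is the opclass one of the btree pattern_ops opclasses, and is the
         * collation nondeterministic?  The equality operator of these
         * opclasses (text_eq) only agrees with the opclass's comparison
         * function under a deterministic collation.
         */
        if ((opclass == TEXT_BTREE_PATTERN_OPS_OID ||
             opclass == VARCHAR_BTREE_PATTERN_OPS_OID ||
             opclass == BPCHAR_BTREE_PATTERN_OPS_OID) &&
            !get_collation_isdeterministic(collation))
        {
            HeapTuple   classtup;

            // look up the pg_opclass row in the system cache (needed for the opclass name)
            classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclass));
            if (!HeapTupleIsValid(classtup))
                elog(ERROR, "cache lookup failed for operator class %u", opclass);  // internal error: lookup failed
            /*
             * Refuse to create the index: the error names the operator class
             * that cannot be combined with a nondeterministic collation.
             */
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("nondeterministic collations are not supported for operator class \"%s\"",
                            NameStr(((Form_pg_opclass) GETSTRUCT(classtup))->opcname))));
            ReleaseSysCache(classtup);
        }
    }
}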
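To see the rule in isolation, here is a standalone sketch of the same loop. It is only a model: the opclass and collation OIDs are made-up constants (the real ones come from PostgreSQL's catalog headers), toy_collation_is_deterministic is a hypothetical stand-in for get_collation_isdeterministic(), and the function returns the offending column instead of raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;       /* as in PostgreSQL */

/* Made-up OIDs, for illustration only. */
enum { TOY_TEXT_PATTERN_OPS = 1, TOY_VARCHAR_PATTERN_OPS = 2,
       TOY_BPCHAR_PATTERN_OPS = 3, TOY_TEXT_OPS = 4 };
enum { TOY_C_COLLATION = 100, TOY_ICU_CI_COLLATION = 101 };

/* Pretend that collation 101 is nondeterministic (e.g. case-insensitive). */
static bool toy_collation_is_deterministic(Oid collation)
{
    return collation != TOY_ICU_CI_COLLATION;
}

/* Returns the index of the first key column whose opclass/collation pair
 * would be rejected, or -1 if all are acceptable.  Collation 0 means the
 * column has no collation, mirroring the `if (collation)` test. */
static int toy_check_opclasses(int nkeys, const Oid *collations, const Oid *opclasses)
{
    for (int i = 0; i < nkeys; i++) {
        Oid  collation = collations[i];
        Oid  opclass = opclasses[i];
        bool is_pattern_ops;

        if (collation == 0)
            continue;
        is_pattern_ops = opclass == TOY_TEXT_PATTERN_OPS ||
                         opclass == TOY_VARCHAR_PATTERN_OPS ||
                         opclass == TOY_BPCHAR_PATTERN_OPS;
        if (is_pattern_ops && !toy_collation_is_deterministic(collation)) {
            fprintf(stderr, "nondeterministic collations are not supported "
                            "for operator class %u\n", opclass);
            return i;
        }
    }
    return -1;
}
```

For instance, a varchar_pattern_ops column with the nondeterministic collation is reported, whereas text_pattern_ops with a deterministic collation passes.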

indexcmds.c

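The rest of index_create is a series of checks and operations on the system catalogs, which are not covered in detail here. The upper-level index functions are clearly divided by purpose. I also used gdb to find the files and functions involved in index_build: I set a breakpoint on index_build, created an index in the database from another terminal, and then inspected the call stack.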

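The stack showed that the layer above index.c is indexcmds.c (the CREATE INDEX path goes through its DefineIndex function), so I analyzed that file as well. It sits one level above the upper-level operation functions and deals directly with user commands, which again shows how clearly PostgreSQL separates its layers.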

Lower-level interface functions

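The lower-level interface functions are the operation functions tied directly to the individual index types; their supporting code is mainly in amapi.c and genam.c. To verify this, I used gdb to set a breakpoint on the index-scan function in genam.c and looked at the resulting call stack.
Execution stopped first in genam.c; the next frame was in nbtree.c, and the one after that in indexam.c, which holds the upper-level operation functions. This confirms the calling relationship between the layers: indexam.c calls into the B-tree access method in nbtree.c, which in turn uses the shared helpers in genam.c.
We now analyze the code of the lower-level interface functions in detail.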

IndexScanDescData

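The basic definitions, namely the function-pointer types of the interface, are in amapi.h; the data structures these types use are declared in genam.h, which in turn refers to data structures in relscan.h and other headers. Take the ambeginscan_function type in amapi.h as an example: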

/* prepare for index scan */
typedef IndexScanDesc (*ambeginscan_function) (Relation indexRelation,int nkeys,int norderbys);

It uses the IndexScanDesc type, so we look up its definition in genam.h:

/* struct definitions appear in relscan.h */
typedef struct IndexScanDescData *IndexScanDesc;

This finally leads to the definition of IndexScanDescData in relscan.h:

typedef struct IndexScanDescData
{
    /* scan parameters */
    Relation    heapRelation;           /* heap relation (table) */
    Relation    indexRelation;          /* index relation */
    struct SnapshotData *xs_snapshot;   /* snapshot used for visibility */
    int         numberOfKeys;           /* number of index qualifier conditions */
    int         numberOfOrderBys;       /* number of ordering operators */
    struct ScanKeyData *keyData;        /* array of numberOfKeys scan keys */
    struct ScanKeyData *orderByData;    /* array of numberOfOrderBys ordering keys */
    bool        xs_want_itup;           /* does the caller request index tuples? */
    bool        xs_temp_snap;           /* unregister the snapshot at scan end? */

    /* signaling to the index AM about killing index tuples */
    bool        kill_prior_tuple;
    bool        ignore_killed_tuples;
    bool        xactStartedInRecovery;

    /* private state of the index access method */
    void       *opaque;                 /* access-method-specific info */

    IndexTuple  xs_itup;                /* index tuple returned by AM */
    struct TupleDescData *xs_itupdesc;
    HeapTuple   xs_hitup;               /* index data returned by AM, as HeapTuple */
    struct TupleDescData *xs_hitupdesc;

    ItemPointerData xs_heaptid;
    bool        xs_heap_continue;
    IndexFetchTableData *xs_heapfetch;

    bool        xs_recheck;

    Datum      *xs_orderbyvals;
    bool       *xs_orderbynulls;
    bool        xs_recheckorderby;

    /* parallel index scan information, in shared memory */
    struct ParallelIndexScanDescData *parallel_scan;
}           IndexScanDescData;

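It defines a large number of scan-related fields; the analysis is in the comments. One point worth noting is that amapi.h declares the interface almost entirely as typedef'd function-pointer types (returning bool, bytea *, char *, IndexScanDesc and so on). Function pointers let each index type plug its own implementation into a common interface, and the typedefs keep the many declarations that use these types short and readable, which makes the code easier to maintain and port.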

/* parse index reloptions */
typedef bytea *(*amoptions_function) (Datum reloptions,
                                      bool validate);

/* report AM, index, or index column property */
typedef bool (*amproperty_function) (Oid index_oid, int attno,
                                     IndexAMProperty prop, const char *propname,
                                     bool *res, bool *isnull);

/* name of phase as used in progress reporting */
typedef char *(*ambuildphasename_function) (int64 phasenum);

/* validate definition of an opclass for this AM */
typedef bool (*amvalidate_function) (Oid opclassoid);
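The readability gain from these typedefs is clearest when a function returns a function pointer. In the standalone sketch below, all names are hypothetical: toy_validate_function plays the role of a type such as amvalidate_function, toy_pick_validator uses the typedef, and toy_pick_validator_raw declares exactly the same signature without it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback type in the style of amvalidate_function. */
typedef bool (*toy_validate_function)(unsigned int opclassoid);

/* Two toy validators with deliberately trivial rules. */
static bool toy_btree_validate(unsigned int opclassoid) { return opclassoid != 0; }
static bool toy_hash_validate(unsigned int opclassoid)  { return opclassoid % 2 == 1; }

/* With the typedef: "a function of int that returns a validator". */
static toy_validate_function toy_pick_validator(int am)
{
    return am == 0 ? toy_btree_validate : toy_hash_validate;
}

/* The same signature written without the typedef. */
static bool (*toy_pick_validator_raw(int am))(unsigned int)
{
    return toy_pick_validator(am);
}
```

Both functions have the same type; the typedef version simply reads left to right, which matters in a header like amapi.h that declares dozens of such types.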

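amapi.c contains the glue used to reach an access method through this interface, for example GetIndexAmRoutine(), which calls an access method's handler function to obtain its IndexAmRoutine; it needs no deeper analysis here. genam.c contains functions shared by all index types, including helpers for scanning the system catalogs; they perform the initial setup of common parameters on behalf of the lower-level functions. The code is long, so we pick typical functions and annotate them in the code.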

RelationGetIndexScan

IndexScanDesc
RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys)
{
    IndexScanDesc scan;

    /*
     * This function performs the initial setup for a scan; further setup is
     * left to the individual index types.
     */
    scan = (IndexScanDesc) palloc(sizeof(IndexScanDescData));

    /* initial values for the fields of IndexScanDescData */
    scan->heapRelation = NULL;              /* may be set later */
    scan->xs_heapfetch = NULL;
    scan->indexRelation = indexRelation;
    scan->xs_snapshot = InvalidSnapshot;    /* caller must initialize this */
    scan->numberOfKeys = nkeys;
    scan->numberOfOrderBys = norderbys;

    /* scan-key workspace for the access method; it is filled in later by amrescan */
    if (nkeys > 0)
        scan->keyData = (ScanKey) palloc(sizeof(ScanKeyData) * nkeys);
    else
        scan->keyData = NULL;
    if (norderbys > 0)
        scan->orderByData = (ScanKey) palloc(sizeof(ScanKeyData) * norderbys);
    else
        scan->orderByData = NULL;

    scan->xs_want_itup = false;             /* may be set later */

    /*
     * Settings for killing index tuples: initially no tuple is marked dead.
     * A transaction that started during recovery must still return entries
     * marked as killed (ignore_killed_tuples = false), because kill hints
     * set on the primary may not be valid for the standby's snapshots.
     */
    scan->kill_prior_tuple = false;
    scan->xactStartedInRecovery = TransactionStartedDuringRecovery();
    scan->ignore_killed_tuples = !scan->xactStartedInRecovery;

    scan->opaque = NULL;

    scan->xs_itup = NULL;
    scan->xs_itupdesc = NULL;
    scan->xs_hitup = NULL;
    scan->xs_hitupdesc = NULL;

    return scan;
}
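The allocation pattern of RelationGetIndexScan can be reproduced outside the backend. The sketch below is a simplification with explicit assumptions: malloc/calloc replace palloc, the hypothetical in_recovery parameter replaces TransactionStartedDuringRecovery(), and ToyScanKeyData keeps only two fields of the real ScanKeyData.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in; the real ScanKeyData has more fields. */
typedef struct { int sk_attno; int sk_argument; } ToyScanKeyData;

typedef struct {
    int             numberOfKeys;
    int             numberOfOrderBys;
    ToyScanKeyData *keyData;        /* NULL when there are no keys */
    ToyScanKeyData *orderByData;    /* NULL when there are no ORDER BY operators */
    bool            kill_prior_tuple;
    bool            xactStartedInRecovery;
    bool            ignore_killed_tuples;
    void           *opaque;         /* left for the access method to fill */
} ToyIndexScanDescData;

/* Same shape as RelationGetIndexScan: allocate the descriptor, allocate key
 * workspace only when needed, derive ignore_killed_tuples from recovery. */
static ToyIndexScanDescData *toy_get_index_scan(int nkeys, int norderbys, bool in_recovery)
{
    ToyIndexScanDescData *scan = malloc(sizeof *scan);

    if (scan == NULL)
        abort();
    scan->numberOfKeys = nkeys;
    scan->numberOfOrderBys = norderbys;
    /* workspace only; the real code fills it later in amrescan */
    scan->keyData = nkeys > 0 ? calloc((size_t) nkeys, sizeof(ToyScanKeyData)) : NULL;
    scan->orderByData = norderbys > 0 ? calloc((size_t) norderbys, sizeof(ToyScanKeyData)) : NULL;
    if ((nkeys > 0 && scan->keyData == NULL) ||
        (norderbys > 0 && scan->orderByData == NULL))
        abort();
    scan->kill_prior_tuple = false;
    scan->xactStartedInRecovery = in_recovery;
    scan->ignore_killed_tuples = !in_recovery;
    scan->opaque = NULL;
    return scan;
}

static void toy_free_index_scan(ToyIndexScanDescData *scan)
{
    free(scan->keyData);        /* free(NULL) is a no-op */
    free(scan->orderByData);
    free(scan);
}
```

Allocating the key arrays only when nkeys or norderbys is positive means that scans without keys or ORDER BY operators carry NULL pointers, exactly as the original code does.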

systable_beginscan

// scans of system catalogs
SysScanDesc
systable_beginscan(Relation heapRelation,
                   Oid indexId,
                   bool indexOK,
                   Snapshot snapshot,
                   int nkeys, ScanKey key)
{
    SysScanDesc sysscan;
    Relation    irel;

    /*
     * Decide between an index scan and a heap scan: the index is used only
     * if the caller allows it, system indexes are not being ignored, and the
     * index is not currently being rebuilt by REINDEX.
     */
    if (indexOK &&
        !IgnoreSystemIndexes &&
        !ReindexIsProcessingIndex(indexId))
        irel = index_open(indexId, AccessShareLock);
    else
        irel = NULL;

    sysscan = (SysScanDesc) palloc(sizeof(SysScanDescData));

    sysscan->heap_rel = heapRelation;
    sysscan->irel = irel;
    sysscan->slot = table_slot_create(heapRelation, NULL);

    if (snapshot == NULL)
    {
        Oid         relid = RelationGetRelid(heapRelation);

        snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
        sysscan->snapshot = snapshot;
    }
    else
    {
        /* Caller is responsible for any snapshot. */
        sysscan->snapshot = NULL;
    }

    if (irel)
    {
        int         i;

        /* With an index, turn the keys' heap column numbers into index column numbers. */
        for (i = 0; i < nkeys; i++)
        {
            int         j;

            for (j = 0; j < IndexRelationGetNumberOfAttributes(irel); j++)
            {
                if (key[i].sk_attno == irel->rd_index->indkey.values[j])
                {
                    key[i].sk_attno = j + 1;    // index columns are numbered 1, 2, 3, ...
                    break;
                }
            }
            if (j == IndexRelationGetNumberOfAttributes(irel))
                elog(ERROR, "column is not in index");
        }

        sysscan->iscan = index_beginscan(heapRelation, irel,
                                         snapshot, nkeys, 0);
        index_rescan(sysscan->iscan, key, nkeys, NULL, 0);
        sysscan->scan = NULL;
    }
    else
    {
        // heap scan: buffer access strategy allowed, synchronized scan disabled
        sysscan->scan = table_beginscan_strat(heapRelation, snapshot,
                                              nkeys, key,
                                              true, false);
        sysscan->iscan = NULL;
    }

    return sysscan;
}
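The attribute-number conversion in the index branch is easy to isolate. This sketch assumes a plain int array in place of irel->rd_index->indkey.values and a boolean result in place of elog(ERROR); ToyScanKey and toy_remap_keys are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int sk_attno; } ToyScanKey;

/* indkey[j] is the heap (table) column number stored in index column j+1,
 * as in pg_index.indkey.  Each key's heap column number is rewritten into
 * an index column number; returns false if a key column is not indexed
 * (the real code raises "column is not in index"). */
static bool toy_remap_keys(ToyScanKey *keys, int nkeys, const int *indkey, int natts)
{
    for (int i = 0; i < nkeys; i++) {
        int j;

        for (j = 0; j < natts; j++) {
            if (keys[i].sk_attno == indkey[j]) {
                keys[i].sk_attno = j + 1;   /* index columns are numbered from 1 */
                break;
            }
        }
        if (j == natts)
            return false;
    }
    return true;
}
```

For an index whose indkey is {2, 3}, a key on heap column 3 becomes a key on index column 2, and a key on heap column 2 becomes index column 1.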

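From this we can see that genam.c sets up common index-related parameters in the service of the lower-level functions. amapi.h, in turn, declares the function-pointer types of the many lower-level functions and gathers them in the struct IndexAmRoutine; the role of each function was described in the previous post and is not repeated here.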

typedef struct IndexAmRoutine
{
    NodeTag     type;

    /*
     * Total number of strategies (operators) by which we can traverse/search
     * this AM.  Zero if AM does not have a fixed set of strategy assignments.
     */
    uint16      amstrategies;
    /* total number of support functions that this AM uses */
    uint16      amsupport;
    /* does AM support ORDER BY indexed column's value? */
    bool        amcanorder;
    /* does AM support ORDER BY result of an operator on indexed column? */
    bool        amcanorderbyop;
    /* does AM support backward scanning? */
    bool        amcanbackward;
    /* does AM support UNIQUE indexes? */
    bool        amcanunique;
    /* does AM support multi-column indexes? */
    bool        amcanmulticol;
    /* does AM require scans to have a constraint on the first index column? */
    bool        amoptionalkey;
    /* does AM handle ScalarArrayOpExpr quals? */
    bool        amsearcharray;
    /* does AM handle IS NULL/IS NOT NULL quals? */
    bool        amsearchnulls;
    /* can index storage data type differ from column data type? */
    bool        amstorage;
    /* can an index of this type be clustered on? */
    bool        amclusterable;
    /* does AM handle predicate locks? */
    bool        ampredlocks;
    /* does AM support parallel scan? */
    bool        amcanparallel;
    /* does AM support columns included with clause INCLUDE? */
    bool        amcaninclude;
    /* type of data stored in index, or InvalidOid if variable */
    Oid         amkeytype;

    /*
     * If you add new properties to either the above or the below lists, then
     * they should also (usually) be exposed via the property API (see
     * IndexAMProperty at the top of the file, and utils/adt/amutils.c).
     */

    /* interface functions */
    ambuild_function ambuild;
    ambuildempty_function ambuildempty;
    aminsert_function aminsert;
    ambulkdelete_function ambulkdelete;
    amvacuumcleanup_function amvacuumcleanup;
    amcanreturn_function amcanreturn;   /* can be NULL */
    amcostestimate_function amcostestimate;
    amoptions_function amoptions;
    amproperty_function amproperty; /* can be NULL */
    ambuildphasename_function ambuildphasename; /* can be NULL */
    amvalidate_function amvalidate;
    ambeginscan_function ambeginscan;
    amrescan_function amrescan;
    amgettuple_function amgettuple; /* can be NULL */
    amgetbitmap_function amgetbitmap;   /* can be NULL */
    amendscan_function amendscan;
    ammarkpos_function ammarkpos;   /* can be NULL */
    amrestrpos_function amrestrpos; /* can be NULL */

    /* interface functions to support parallel index scans */
    amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
    aminitparallelscan_function aminitparallelscan; /* can be NULL */
    amparallelrescan_function amparallelrescan; /* can be NULL */
} IndexAmRoutine;

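Packaging the lower-level functions this way greatly reduces code duplication. Each access method fills in one IndexAmRoutine in its handler function (for B-tree, bthandler in nbtree.c), and the upper layer calls through these function pointers without knowing which index type it is working with. Together with the layered nesting of functions and structs, this keeps the PostgreSQL code robust and clearly structured.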
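The handler-plus-routine-struct design can be sketched in plain C. The model below is hypothetical and heavily trimmed: ToyIndexAmRoutine keeps two capability flags and two callbacks, and the handlers return the struct by value, whereas the real bthandler allocates an IndexAmRoutine node and returns a pointer to it. The one PostgreSQL fact it mirrors is that an access method may leave amgettuple NULL; BRIN does this and supports only bitmap scans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*toy_gettuple_function)(int *tid_out);
typedef long (*toy_getbitmap_function)(void);

/* Trimmed-down, hypothetical analogue of IndexAmRoutine. */
typedef struct {
    bool                   amcanorder;
    bool                   amcanunique;
    toy_gettuple_function  amgettuple;   /* can be NULL */
    toy_getbitmap_function amgetbitmap;  /* can be NULL */
} ToyIndexAmRoutine;

/* A fake B-tree-like AM that returns three tuple ids, then stops. */
static int  toy_bt_next = 0;
static bool toy_bt_gettuple(int *tid_out)
{
    if (toy_bt_next >= 3)
        return false;
    *tid_out = toy_bt_next++;
    return true;
}
static long toy_bt_getbitmap(void)   { return 3; }

/* A fake bitmap-only AM. */
static long toy_brin_getbitmap(void) { return 128; }

/* "Handler" functions fill in the routine struct, as bthandler does. */
static ToyIndexAmRoutine toy_bthandler(void)
{
    ToyIndexAmRoutine r = { .amcanorder = true, .amcanunique = true,
                            .amgettuple = toy_bt_gettuple,
                            .amgetbitmap = toy_bt_getbitmap };
    return r;
}

static ToyIndexAmRoutine toy_brinhandler(void)
{
    ToyIndexAmRoutine r = { .amcanorder = false, .amcanunique = false,
                            .amgettuple = NULL,
                            .amgetbitmap = toy_brin_getbitmap };
    return r;
}

/* Generic code only inspects the struct: without amgettuple, an AM
 * cannot drive a plain tuple-at-a-time index scan. */
static bool toy_supports_plain_scan(const ToyIndexAmRoutine *am)
{
    return am->amgettuple != NULL;
}
```

Adding a new index type in this model means writing one more handler; toy_supports_plain_scan and any other generic code stay unchanged.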

postgreSQL Source Code Analysis: Index Creation and Use, Management and Operation of the Various Index Types (2)