1. Overview

The original plan was to walk through the source code inside the previous analysis, but it became clear that would make for an enormous chapter. So the source code has been split out on its own: this chapter analyzes the four kinds of in-memory data structures. While reading the code, compare it against the earlier explanations and the earlier log analysis; the comparison pays off.

2. Buffer Pool

As usual, start with the code defining the data structure:

struct buf_pool_t {
  /** @name General fields */
  /** @{ */

  /** protects (de)allocation of chunks:
  - changes to chunks, n_chunks are performed while holding this latch,
  - reading buf_pool_should_madvise requires holding this latch for any
    buf_pool_t,
  - writing to buf_pool_should_madvise requires holding these latches
    for all buf_pool_t-s */
  BufListMutex chunks_mutex;

  /** LRU list mutex */
  BufListMutex LRU_list_mutex;

  /** free and withdraw list mutex */
  BufListMutex free_list_mutex;

  /** buddy allocator mutex */
  BufListMutex zip_free_mutex;

  /** zip_hash mutex */
  BufListMutex zip_hash_mutex;

  /** Flush state protection mutex */
  ib_mutex_t flush_state_mutex;

  /** Zip mutex of this buffer pool instance, protects compressed only pages
  (of type buf_page_t, not buf_block_t) */
  BufPoolZipMutex zip_mutex;

  /** Array index of this buffer pool instance */
  ulint instance_no;

  /** Current pool size in bytes */
  ulint curr_pool_size;

  /** Reserve this much of the buffer pool for "old" blocks */
  ulint LRU_old_ratio;

#ifdef UNIV_DEBUG
  /** Number of frames allocated from the buffer pool to the buddy system.
  Protected by zip_hash_mutex. */
  ulint buddy_n_frames;
#endif

  /** Allocator used for allocating memory for the "chunks" member. */
  ut_allocator<unsigned char> allocator;

  /** Number of buffer pool chunks */
  volatile ulint n_chunks;

  /** New number of buffer pool chunks */
  volatile ulint n_chunks_new;

  /** buffer pool chunks */
  buf_chunk_t *chunks;

  /** old buffer pool chunks to be freed after resizing buffer pool */
  buf_chunk_t *chunks_old;

  /** Current pool size in pages */
  ulint curr_size;

  /** Previous pool size in pages */
  ulint old_size;

  /** Size in pages of the area which the read-ahead algorithms read if
  invoked */
  page_no_t read_ahead_area;

  /** Hash table of buf_page_t or buf_block_t file pages,
  buf_page_in_file() == TRUE, indexed by (space_id, offset). page_hash is
  protected by an array of mutexes. */
  hash_table_t *page_hash;

  /** Old pointer to page_hash to be freed after resizing buffer pool */
  hash_table_t *page_hash_old;

  /** Hash table of buf_block_t blocks whose frames are allocated to the zip
  buddy system, indexed by block->frame */
  hash_table_t *zip_hash;

  /** Number of pending read operations. Accessed atomically */
  std::atomic<ulint> n_pend_reads;

  /** number of pending decompressions. Accessed atomically. */
  std::atomic<ulint> n_pend_unzip;

  /** when buf_print_io was last time called. Accesses not protected. */
  ib_time_monotonic_t last_printout_time;

  /** Statistics of buddy system, indexed by block size. Protected by
  zip_free mutex, except for the used field, which is also accessed
  atomically */
  buf_buddy_stat_t buddy_stat[BUF_BUDDY_SIZES_MAX + 1];

  /** Current statistics */
  buf_pool_stat_t stat;

  /** Old statistics */
  buf_pool_stat_t old_stat;

  /* @} */

  /** @name Page flushing algorithm fields */
  /** @{ */

  /** Mutex protecting the flush list access. This mutex protects flush_list,
  flush_rbt and bpage::list pointers when the bpage is on flush_list. It also
  protects writes to bpage::oldest_modification and flush_list_hp */
  BufListMutex flush_list_mutex;

  /** "Hazard pointer" used during scan of flush_list while doing flush list
  batch. Protected by flush_list_mutex */
  FlushHp flush_hp;

  /** Entry pointer to scan the oldest page except for system temporary */
  FlushHp oldest_hp;

  /** Base node of the modified block list */
  UT_LIST_BASE_NODE_T(buf_page_t) flush_list;

  /** This is true when a flush of the given type is being initialized.
  Protected by flush_state_mutex. */
  bool init_flush[BUF_FLUSH_N_TYPES];

  /** This is the number of pending writes in the given flush type. Protected
  by flush_state_mutex. */
  ulint n_flush[BUF_FLUSH_N_TYPES];

  /** This is in the set state when there is no flush batch of the given type
  running. Protected by flush_state_mutex. */
  os_event_t no_flush[BUF_FLUSH_N_TYPES];

  /** A red-black tree is used exclusively during recovery to speed up
  insertions in the flush_list. This tree contains blocks in order of
  oldest_modification LSN and is kept in sync with the flush_list. Each
  member of the tree MUST also be on the flush_list. This tree is relevant
  only in recovery and is set to NULL once the recovery is over. Protected
  by flush_list_mutex */
  ib_rbt_t *flush_rbt;

  /** A sequence number used to count the number of buffer blocks removed
  from the end of the LRU list; NOTE that this counter may wrap around at 4
  billion! A thread is allowed to read this for heuristic purposes without
  holding any mutex or latch. For non-heuristic purposes protected by
  LRU_list_mutex */
  ulint freed_page_clock;

  /** Set to false when an LRU scan for free block fails. This flag is used
  to avoid repeated scans of LRU list when we know that there is no free
  block available in the scan depth for eviction. Set to TRUE whenever we
  flush a batch from the buffer pool. Accessed protected by memory
  barriers. */
  bool try_LRU_scan;

  /** Page Tracking start LSN. */
  lsn_t track_page_lsn;

  /** Maximum LSN for which write io has already started. */
  lsn_t max_lsn_io;

  /* @} */

  /** @name LRU replacement algorithm fields */
  /** @{ */

  /** Base node of the free block list */
  UT_LIST_BASE_NODE_T(buf_page_t) free;

  /** base node of the withdraw block list. It is only used during shrinking
  buffer pool size, not to reuse the blocks will be removed. Protected by
  free_list_mutex */
  UT_LIST_BASE_NODE_T(buf_page_t) withdraw;

  /** Target length of withdraw block list, when withdrawing */
  ulint withdraw_target;

  /** "hazard pointer" used during scan of LRU while doing LRU list batch.
  Protected by buf_pool::LRU_list_mutex */
  LRUHp lru_hp;

  /** Iterator used to scan the LRU list when searching for a replaceable
  victim. Protected by buf_pool::LRU_list_mutex. */
  LRUItr lru_scan_itr;

  /** Iterator used to scan the LRU list when searching for a single page
  flushing victim. Protected by buf_pool::LRU_list_mutex. */
  LRUItr single_scan_itr;

  /** Base node of the LRU list */
  UT_LIST_BASE_NODE_T(buf_page_t) LRU;

  /** Pointer to the about LRU_old_ratio/BUF_LRU_OLD_RATIO_DIV oldest blocks
  in the LRU list; NULL if LRU length less than BUF_LRU_OLD_MIN_LEN; NOTE:
  when LRU_old != NULL, its length should always equal LRU_old_len */
  buf_page_t *LRU_old;

  /** Length of the LRU list from the block to which LRU_old points onward,
  including that block; see buf0lru.cc for the restrictions on this value; 0
  if LRU_old == NULL; NOTE: LRU_old_len must be adjusted whenever LRU_old
  shrinks or grows! */
  ulint LRU_old_len;

  /** Base node of the unzip_LRU list. The list is protected by the
  LRU_list_mutex. */
  UT_LIST_BASE_NODE_T(buf_block_t) unzip_LRU;

  /** @} */

  /** @name Buddy allocator fields
  The buddy allocator is used for allocating compressed page frames and
  buf_page_t descriptors of blocks that exist in the buffer pool only in
  compressed form. */
  /** @{ */

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
  /** Unmodified compressed pages */
  UT_LIST_BASE_NODE_T(buf_page_t) zip_clean;
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

  /** Buddy free lists */
  UT_LIST_BASE_NODE_T(buf_buddy_free_t) zip_free[BUF_BUDDY_SIZES_MAX];

  /** Sentinel records for buffer pool watches. Scanning the array is
  protected by taking all page_hash latches in X. Updating or reading an
  individual watch page is protected by a corresponding individual page_hash
  latch. */
  buf_page_t *watch;

  /** A wrapper for buf_pool_t::allocator.allocate_large which also advises
  the OS that this chunk should not be dumped to a core file if that was
  requested. Emits a warning to the log and disables @@global.core_file if
  advising was requested but could not be performed, but still returns true
  as the allocation itself succeeded.
  @param[in]      mem_size  number of bytes to allocate
  @param[in,out]  chunk     mem and mem_pfx fields of this chunk will be
  updated to contain information about the allocated memory region
  @return true iff allocated successfully */
  bool allocate_chunk(ulonglong mem_size, buf_chunk_t *chunk);

  /** A wrapper for buf_pool_t::allocator.deallocate_large which also advises
  the OS that this chunk can be dumped to a core file. Emits a warning to the
  log and disables @@global.core_file if advising was requested but could not
  be performed.
  @param[in]  chunk  mem and mem_pfx fields of this chunk will be used to
  locate the memory region to free */
  void deallocate_chunk(buf_chunk_t *chunk);

  /** Advises the OS that all chunks in this buffer pool instance can be
  dumped to a core file. Emits a warning to the log if it could not succeed.
  @return true iff succeeded, false if no OS support or failed */
  bool madvise_dump();

  /** Advises the OS that all chunks in this buffer pool instance should not
  be dumped to a core file. Emits a warning to the log if it could not
  succeed.
  @return true iff succeeded, false if no OS support or failed */
  bool madvise_dont_dump();

#if BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN
#error "BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN"
#endif
};

The comment preceding this structure's definition already warns that it is an internal structure which outside code must not use. Now look at the member definitions:
It opens with several mutexes for concurrency control over different parts of the buffer: the chunks, the LRU list, the compressed pages, and so on. Then comes an allocator whose basic allocation unit is the unsigned byte, followed by the chunk-related members, then the hash table members, and finally the LRU-related structures and lists. Compared with the other definitions in this chapter, this structure is relatively simple, but do not underestimate it.
Several data structures cooperate to manage the whole cache. buf_pool_t implements one Buffer Pool instance; buf_block_t and buf_page_t manage and track individual data pages; buf_chunk_t is the basic allocation unit of the buffer pool. In plain words: MySQL allocates the buffer pool in units of buf_chunk_t, and each chunk manages an array of buf_block_t (each of which embeds one buf_page_t); together the chunks form one buffer pool instance, and the whole buffer pool consists of several such instances. Now look at the related definitions one by one:

/** A chunk of buffers. The buffer pool is allocated in chunks. */
struct buf_chunk_t {
  ulint size;           /*!< size of frames[] and blocks[] */
  unsigned char *mem;   /*!< pointer to the memory area which was
                        allocated for the frames */
  ut_new_pfx_t mem_pfx; /*!< Auxiliary structure, describing "mem". It is
                        filled by the allocator's alloc method and later
                        passed to the deallocate method. */
  buf_block_t *blocks;  /*!< array of buffer control blocks */

  /** Get the size of 'mem' in bytes. */
  size_t mem_size() const { return (mem_pfx.m_size); }

  bool madvise_dump();
  bool madvise_dont_dump();

  bool contains(const buf_block_t *ptr) const {
    return std::less_equal<const buf_block_t *>{}(blocks, ptr) &&
           std::less<const buf_block_t *>{}(ptr, blocks + size);
  }
};
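The chunk → block → page containment described above can be sketched with simplified mock types (these are illustrative stand-ins, not the real InnoDB definitions, which use raw arrays carved out of `mem`):

```cpp
#include <cstddef>
#include <vector>

// Mock of the ownership hierarchy: a pool owns chunks, a chunk owns an
// array of blocks, and every block embeds exactly one page descriptor.
struct page_t  { unsigned space_id; unsigned page_no; };
struct block_t { page_t page; unsigned char *frame; };

struct chunk_t {
  std::size_t size;            // number of blocks in this chunk
  std::vector<block_t> blocks; // the real code uses a raw array in `mem`
};

struct pool_t {
  std::vector<chunk_t> chunks; // cf. buf_pool_t::chunks / n_chunks
};

// Count all page descriptors reachable from one pool instance.
std::size_t count_pages(const pool_t &pool) {
  std::size_t n = 0;
  for (const chunk_t &c : pool.chunks) n += c.blocks.size();
  return n;
}
```

Resizing the buffer pool then amounts to adding or withdrawing whole chunks, which is why buf_pool_t keeps both `chunks` and `chunks_old` during a resize.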

Next, buf_page_t:

class buf_page_t {
 public:
  /** Copy constructor.
  @param[in] other  Instance to copy from. */
  buf_page_t(const buf_page_t &other)
      : id(other.id),
        size(other.size),
        buf_fix_count(other.buf_fix_count),
        io_fix(other.io_fix),
        state(other.state),
        flush_type(other.flush_type),
        buf_pool_index(other.buf_pool_index),
#ifndef UNIV_HOTBACKUP
        hash(other.hash),
#endif /* !UNIV_HOTBACKUP */
        list(other.list),
        newest_modification(other.newest_modification),
        oldest_modification(other.oldest_modification),
        LRU(other.LRU),
        zip(other.zip)
#ifndef UNIV_HOTBACKUP
        ,
        m_flush_observer(other.m_flush_observer),
        m_space(other.m_space),
        freed_page_clock(other.freed_page_clock),
        access_time(other.access_time),
        m_version(other.m_version),
        m_dblwr_id(other.m_dblwr_id),
        old(other.old)
#ifdef UNIV_DEBUG
        ,
        file_page_was_freed(other.file_page_was_freed),
        in_flush_list(other.in_flush_list),
        in_free_list(other.in_free_list),
        in_LRU_list(other.in_LRU_list),
        in_page_hash(other.in_page_hash),
        in_zip_hash(other.in_zip_hash)
#endif /* UNIV_DEBUG */
#endif /* !UNIV_HOTBACKUP */
  {
#ifndef UNIV_HOTBACKUP
    m_space->inc_ref();
#endif /* !UNIV_HOTBACKUP */
  }

And one more:

struct buf_block_t {
  /** @name General fields */
  /** @{ */

  /** page information; this must be the first field, so that
  buf_pool->page_hash can point to buf_page_t or buf_block_t */
  buf_page_t page;

#ifndef UNIV_HOTBACKUP
  /** read-write lock of the buffer frame */
  BPageLock lock;
#endif /* UNIV_HOTBACKUP */

  /** pointer to buffer frame which is of size UNIV_PAGE_SIZE, and aligned
  to an address divisible by UNIV_PAGE_SIZE */
  byte *frame;

  /** node of the decompressed LRU list; a block is in the unzip_LRU list if
  page.state == BUF_BLOCK_FILE_PAGE and page.zip.data != NULL. Protected by
  both LRU_list_mutex and the block mutex. */
  UT_LIST_NODE_T(buf_block_t) unzip_LRU;

#ifdef UNIV_DEBUG
  /** TRUE if the page is in the decompressed LRU list; used in debugging */
  bool in_unzip_LRU_list;

  bool in_withdraw_list;
#endif /* UNIV_DEBUG */

  /** hashed value of the page address in the record lock hash table;
  protected by buf_block_t::lock (or buf_block_t::mutex in
  buf_page_get_gen(), buf_page_init_for_read() and buf_page_create()) */
  uint32_t lock_hash_val;
  /** @} */

  /** @name Hash search fields (unprotected)
  NOTE that these fields are NOT protected by any semaphore! */
  /** @{ */

  /** Counter which controls building of a new hash index for the page */
  uint32_t n_hash_helps;

  /** Recommended prefix length for hash search: number of bytes in an
  incomplete last field */
  volatile uint32_t n_bytes;

  /** Recommended prefix length for hash search: number of full fields */
  volatile uint32_t n_fields;

  /** true or false, depending on whether the leftmost record of several
  records with the same prefix should be indexed in the hash index */
  volatile bool left_side;
  /** @} */

  /** @name Hash search fields
  These 5 fields may only be modified when: we are holding the appropriate
  x-latch in btr_search_latches[], and one of the following holds:
  (1) the block state is BUF_BLOCK_FILE_PAGE, and we are holding an s-latch
  or x-latch on buf_block_t::lock, or
  (2) buf_block_t::buf_fix_count == 0, or
  (3) the block state is BUF_BLOCK_REMOVE_HASH.

  An exception to this is when we init or create a page in the buffer pool
  in buf0buf.cc.

  Another exception for buf_pool_clear_hash_index() is that assigning
  block->index = NULL (and block->n_pointers = 0) is allowed whenever
  btr_search_own_all(RW_LOCK_X).

  Another exception is that ha_insert_for_fold_func() may decrement
  n_pointers without holding the appropriate latch in btr_search_latches[].
  Thus, n_pointers must be protected by atomic memory access.

  This implies that the fields may be read without race condition whenever
  any of the following hold:
  - the btr_search_latches[] s-latch or x-latch is being held, or
  - the block state is not BUF_BLOCK_FILE_PAGE or BUF_BLOCK_REMOVE_HASH,
    and holding some latch prevents the state from changing to that.

  Some use of assert_block_ahi_empty() or assert_block_ahi_valid() is prone
  to race conditions while buf_pool_clear_hash_index() is executing (the
  adaptive hash index is being disabled). Such use is explicitly
  commented. */
  /** @{ */

#if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
  /** used in debugging: the number of pointers in the adaptive hash index
  pointing to this frame; protected by atomic memory access or
  btr_search_own_all(). */
  std::atomic<ulint> n_pointers;

#define assert_block_ahi_empty(block) ut_a((block)->n_pointers.load() == 0)
#define assert_block_ahi_empty_on_init(block)                        \
  do {                                                               \
    UNIV_MEM_VALID(&(block)->n_pointers, sizeof(block)->n_pointers); \
    assert_block_ahi_empty(block);                                   \
  } while (0)
#define assert_block_ahi_valid(block) \
  ut_a((block)->index || (block)->n_pointers.load() == 0)
#else                                         /* UNIV_AHI_DEBUG || UNIV_DEBUG */
#define assert_block_ahi_empty(block)         /* nothing */
#define assert_block_ahi_empty_on_init(block) /* nothing */
#define assert_block_ahi_valid(block)         /* nothing */
#endif /* UNIV_AHI_DEBUG || UNIV_DEBUG */

  /** prefix length for hash indexing: number of full fields */
  uint16_t curr_n_fields;

  /** number of bytes in hash indexing */
  uint16_t curr_n_bytes;

  /** TRUE or FALSE in hash indexing */
  bool curr_left_side;

  /** true if block has been made dirty without acquiring X/SX latch as the
  block belongs to temporary tablespace and block is always accessed by a
  single thread. */
  bool made_dirty_with_no_latch;

  /** Index for which the adaptive hash index has been created, or NULL if
  the page does not exist in the index. Note that it does not guarantee that
  the index is complete, though: there may have been hash collisions, record
  deletions, etc. */
  dict_index_t *index;
  /** @} */

#ifndef UNIV_HOTBACKUP
#ifdef UNIV_DEBUG
  /** @name Debug fields */
  /** @{ */
  /** In the debug version, each thread which bufferfixes the block acquires
  an s-latch here; so we can use the debug utilities in sync0rw */
  rw_lock_t debug_latch;
  /** @} */
#endif /* UNIV_DEBUG */
#endif /* !UNIV_HOTBACKUP */

  /** @name Optimistic search field */
  /** @{ */

  /** This clock is incremented every time a pointer to a record on the page
  may become obsolete; this is used in the optimistic cursor positioning: if
  the modify clock has not changed, we know that the pointer is still valid;
  this field may be changed if the thread (1) owns the LRU list mutex and
  the page is not bufferfixed, or (2) the thread has an x-latch on the
  block, or (3) the block must belong to an intrinsic table */
  uint64_t modify_clock;
  /** @} */

  /** mutex protecting this block: state (also protected by the buffer pool
  mutex), io_fix, buf_fix_count, and accessed; we introduce this new mutex
  in InnoDB-5.1 to relieve contention on the buffer pool mutex */
  BPageMutex mutex;

  /** Get the page number and space id of the current buffer block.
  @return page id of the current buffer block. */
  const page_id_t &get_page_id() const { return page.id; }

  /** Get the page number of the current buffer block.
  @return page number of the current buffer block. */
  page_no_t get_page_no() const { return (page.id.page_no()); }

  /** Get the next page number of the current buffer block.
  @return next page number of the current buffer block. */
  page_no_t get_next_page_no() const {
    return (mach_read_from_4(frame + FIL_PAGE_NEXT));
  }

  /** Get the prev page number of the current buffer block.
  @return prev page number of the current buffer block. */
  page_no_t get_prev_page_no() const {
    return (mach_read_from_4(frame + FIL_PAGE_PREV));
  }

  /** Get the page type of the current buffer block.
  @return page type of the current buffer block. */
  page_type_t get_page_type() const {
    return (mach_read_from_2(frame + FIL_PAGE_TYPE));
  }

  /** Get the page type of the current buffer block as string.
  @return page type of the current buffer block as string. */
  const char *get_page_type_str() const noexcept
      MY_ATTRIBUTE((warn_unused_result));
};

In short, the buffer pool manages buf_page_t through buf_chunk_t: each chunk holds an array of buf_block_t, and each buf_block_t embeds exactly one buf_page_t as its first member, which is why a pointer to one can be treated as a pointer to the other. To put it even more plainly, it is like an enterprise: there are companies, companies have departments, and departments have people; even a department with a single person is still a department.
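The interchangeability of block and page pointers follows directly from buf_page_t being the first field of buf_block_t ("this must be the first field, so that buf_pool->page_hash can point to buf_page_t or buf_block_t"). A minimal sketch with mock types (not the real InnoDB definitions) shows why the cast is safe:

```cpp
#include <cassert>
#include <cstddef>

// Mock types mirroring the layout constraint: the page descriptor is the
// FIRST member of the block, so both objects start at the same address.
struct buf_page_like  { int state; };
struct buf_block_like {
  buf_page_like page;  // must stay the first field, as the real
                       // buf_block_t comment demands
  unsigned char *frame;
};

buf_page_like *block_to_page(buf_block_like *b) {
  return &b->page;  // same address as b itself
}

buf_block_like *page_to_block(buf_page_like *p) {
  // Valid only because `page` sits at offset 0; this is the trick that
  // lets page_hash store buf_page_t* for both kinds of entries.
  return reinterpret_cast<buf_block_like *>(p);
}
```

Compressed-only pages exist as bare buf_page_t objects without an enclosing block, which is why the code must check the page state before casting upward.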

3. Change Buffer

Now look at the Change Buffer, which exists to reduce the I/O cost of modifications:

/** Default value for maximum on-disk size of change buffer in terms
of percentage of the buffer pool. */
#define CHANGE_BUFFER_DEFAULT_SIZE (25)

#ifndef UNIV_HOTBACKUP
/* Possible operations buffered in the insert/whatever buffer. See
ibuf_insert(). DO NOT CHANGE THE VALUES OF THESE, THEY ARE STORED ON DISK. */
typedef enum {
  IBUF_OP_INSERT = 0,
  IBUF_OP_DELETE_MARK = 1,
  IBUF_OP_DELETE = 2,

  /* Number of different operation types. */
  IBUF_OP_COUNT = 3
} ibuf_op_t;

/** Combinations of operations that can be buffered.
@see innodb_change_buffering_names */
enum ibuf_use_t {
  IBUF_USE_NONE = 0,
  IBUF_USE_INSERT,             /* insert */
  IBUF_USE_DELETE_MARK,        /* delete */
  IBUF_USE_INSERT_DELETE_MARK, /* insert+delete */
  IBUF_USE_DELETE,             /* delete+purge */
  IBUF_USE_ALL                 /* insert+delete+purge */
};

/** Operations that can currently be buffered. */
extern ulong innodb_change_buffering;

/** The insert buffer control structure */
extern ibuf_t *ibuf;

/** Insert buffer struct */
struct ibuf_t {
  ulint size;          /*!< current size of the ibuf index tree, in pages */
  ulint max_size;      /*!< recommended maximum size of the ibuf index
                       tree, in pages */
  ulint seg_size;      /*!< allocated pages of the file segment containing
                       ibuf header and tree */
  bool empty;          /*!< Protected by the page latch of the root page of
                       the insert buffer tree
                       (FSP_IBUF_TREE_ROOT_PAGE_NO). true if and only if
                       the insert buffer tree is empty. */
  ulint free_list_len; /*!< length of the free list */
  ulint height;        /*!< tree height */
  dict_index_t *index; /*!< insert buffer index */

  std::atomic<ulint> n_merges; /*!< number of pages merged */

  std::atomic<ulint> n_merged_ops[IBUF_OP_COUNT];
  /*!< number of operations of each type merged to index pages */

  std::atomic<ulint> n_discarded_ops[IBUF_OP_COUNT];
  /*!< number of operations of each type discarded without merging due to
  the tablespace being deleted or the index being dropped */
};

Then look at the related macros, including the per-page bits of the insert buffer bitmap:

/** @name Offsets to the per-page bits in the insert buffer bitmap */
/** @{ */
#define IBUF_BITMAP_FREE 0     /*!< Bits indicating the amount of free
                               space */
#define IBUF_BITMAP_BUFFERED 2 /*!< TRUE if there are buffered changes
                               for the page */
#define IBUF_BITMAP_IBUF 3     /*!< TRUE if page is a part of the ibuf
                               tree, excluding the root page, or is in
                               the free list of the ibuf */
/** @} */

#define IBUF_REC_FIELD_SPACE 0    /*!< in the pre-4.1 format, the page
                                  number. later, the space_id */
#define IBUF_REC_FIELD_MARKER 1   /*!< starting with 4.1, a marker
                                  consisting of 1 byte that is 0 */
#define IBUF_REC_FIELD_PAGE 2     /*!< starting with 4.1, the page number */
#define IBUF_REC_FIELD_METADATA 3 /* the metadata field */
#define IBUF_REC_FIELD_USER 4     /* first user field */

/* Various constants for checking the type of an ibuf record and extracting
data from it. For details, see the description of the record format at the
top of this file. */

/** @name Format of the IBUF_REC_FIELD_METADATA of an insert buffer record
The fourth column in the MySQL 5.5 format contains an operation
type, counter, and some flags. */
#define IBUF_REC_INFO_SIZE 4 /*!< Combined size of info fields at the
                             beginning of the fourth field */
#if IBUF_REC_INFO_SIZE >= DATA_NEW_ORDER_NULL_TYPE_BUF_SIZE
#error "IBUF_REC_INFO_SIZE >= DATA_NEW_ORDER_NULL_TYPE_BUF_SIZE"
#endif

/* Offsets for the fields at the beginning of the fourth field */
#define IBUF_REC_OFFSET_COUNTER 0 /*!< Operation counter */
#define IBUF_REC_OFFSET_TYPE 2    /*!< Type of operation */
#define IBUF_REC_OFFSET_FLAGS 3   /*!< Additional flags */

/* Record flag masks */
#define IBUF_REC_COMPACT 0x1 /*!< Set in IBUF_REC_OFFSET_FLAGS if the user
                             index is in COMPACT format or later */

/** The mutex used to block pessimistic inserts to ibuf trees */
static ib_mutex_t ibuf_pessimistic_insert_mutex;

/** The mutex protecting the insert buffer structs */
static ib_mutex_t ibuf_mutex;

/** The mutex protecting the insert buffer bitmaps */
static ib_mutex_t ibuf_bitmap_mutex;

/** The area in pages from which contract looks for page numbers for merge */
const ulint IBUF_MERGE_AREA = 8;

/** Inside the merge area, pages which have at most 1 per this number less
buffered entries compared to the maximum volume that can be buffered for a
single page are merged along with the page whose buffer became full */
const ulint IBUF_MERGE_THRESHOLD = 4;

/** In ibuf_contract at most this number of pages is read to memory in one
batch, in order to merge the entries for them in the insert buffer */
const ulint IBUF_MAX_N_PAGES_MERGED = IBUF_MERGE_AREA;

/** If the combined size of the ibuf trees exceeds ibuf->max_size by this
many pages, we start to contract it in connection to inserts there, using
non-synchronous contract */
const ulint IBUF_CONTRACT_ON_INSERT_NON_SYNC = 0;

/** If the combined size of the ibuf trees exceeds ibuf->max_size by this
many pages, we start to contract it in connection to inserts there, using
synchronous contract */
const ulint IBUF_CONTRACT_ON_INSERT_SYNC = 5;

/** If the combined size of the ibuf trees exceeds ibuf->max_size by this
many pages, we start to contract it with synchronous contract, but do
not insert */
const ulint IBUF_CONTRACT_DO_NOT_INSERT = 10;

Note that the Change Buffer is configurable: you can choose which operations it buffers, or disable it entirely.
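A sketch (not the real ibuf code) of how such a setting could gate which operations get buffered. The enumerator order mirrors ibuf_use_t above, and the comments on that enum ("insert", "delete", "insert+delete", "delete+purge", "insert+delete+purge") give the mapping; the helper function here is a hypothetical illustration:

```cpp
// Mirrors ibuf_op_t / ibuf_use_t above, with shortened mock names.
enum ibuf_op { OP_INSERT, OP_DELETE_MARK, OP_DELETE };
enum ibuf_use {
  USE_NONE,                // buffer nothing
  USE_INSERT,              // insert
  USE_DELETE_MARK,         // delete
  USE_INSERT_DELETE_MARK,  // insert+delete
  USE_DELETE,              // delete+purge
  USE_ALL                  // insert+delete+purge
};

// Hypothetical predicate: may this operation be buffered under the
// current innodb_change_buffering-style setting?
bool may_buffer(ibuf_use use, ibuf_op op) {
  switch (use) {
    case USE_NONE:               return false;
    case USE_INSERT:             return op == OP_INSERT;
    case USE_DELETE_MARK:        return op == OP_DELETE_MARK;
    case USE_INSERT_DELETE_MARK: return op == OP_INSERT ||
                                        op == OP_DELETE_MARK;
    case USE_DELETE:             return op == OP_DELETE_MARK ||
                                        op == OP_DELETE;
    case USE_ALL:                return true;
  }
  return false;
}
```

Whatever the setting, an operation is only buffered when the target page is not already in the buffer pool; otherwise it is applied directly.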

4. Adaptive Hash Index

With the two important buffers covered, now the hash-related structures:

/* The hash table structure */
struct hash_table_t {
  enum hash_table_sync_t type; /*!< type of hash_table. */
#if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
#ifndef UNIV_HOTBACKUP
  ibool adaptive;     /* TRUE if this is the hash table of the adaptive
                      hash index */
#endif /* !UNIV_HOTBACKUP */
#endif /* UNIV_AHI_DEBUG || UNIV_DEBUG */
  ulint n_cells;      /* number of cells in the hash table */
  hash_cell_t *cells; /*!< pointer to cell array */
#ifndef UNIV_HOTBACKUP
  ulint n_sync_obj; /* if sync_objs != NULL, then either the number of
                    mutexes or the number of rw_locks depending on the
                    type. Must be a power of 2 */
  union {
    ib_mutex_t *mutexes; /* NULL, or an array of mutexes used to protect
                         segments of the hash table */
    rw_lock_t *rw_locks; /* NULL, or an array of rw_locks used to protect
                         segments of the hash table */
  } sync_obj;

  mem_heap_t **heaps; /*!< if this is non-NULL, hash chain nodes for
                      external chaining can be allocated from these
                      memory heaps; there are then n_mutexes many of
                      these heaps */
#endif /* !UNIV_HOTBACKUP */
  mem_heap_t *heap;
#ifdef UNIV_DEBUG
  ulint magic_n;
#define HASH_TABLE_MAGIC_N 76561114
#endif /* UNIV_DEBUG */
};

On top of it there is a wrapper structure:

/** The hash index system */
struct btr_search_sys_t {
  hash_table_t **hash_tables; /*!< the adaptive hash tables, mapping
                              dtuple_fold values to rec_t pointers on
                              index pages */
};

There is also a per-index structure that tracks the conditions deciding when to build AHI entries during query processing:

/** The search info struct in an index */
struct btr_search_t {
  ulint ref_count; /*!< Number of blocks in this index tree that have a
                   search index built, i.e. block->index points to this
                   index. Protected by the search latch except during
                   initialization in btr_search_info_create(). */

  /** @{ The following fields are not protected by any latch.
  Unfortunately, this means that they must be aligned to the machine
  word, i.e., they cannot be turned into bit-fields. */
  buf_block_t *root_guess; /*!< the root page frame when it was last time
                           fetched, or NULL */
  ulint hash_analysis;     /*!< when this exceeds
                           BTR_SEARCH_HASH_ANALYSIS, the hash analysis
                           starts; this is reset if no success noticed */
  ibool last_hash_succ;    /*!< TRUE if the last search would have
                           succeeded, or did succeed, using the hash
                           index; NOTE that the value here is not exact:
                           it is not calculated for every search, and the
                           calculation itself is not always accurate! */
  ulint n_hash_potential;  /*!< number of consecutive searches which
                           would have succeeded, or did succeed, using
                           the hash index; the range is
                           0 .. BTR_SEARCH_BUILD_LIMIT + 5 */
  /** @} */

  /**---------------------- @{ */
  ulint n_fields;  /*!< recommended prefix length for hash search:
                   number of full fields */
  ulint n_bytes;   /*!< recommended prefix: number of bytes in an
                   incomplete field
                   @see BTR_PAGE_MAX_REC_SIZE */
  ibool left_side; /*!< TRUE or FALSE, depending on whether the leftmost
                   record of several records with the same prefix should
                   be indexed in the hash index */
  /*---------------------- @} */
#ifdef UNIV_SEARCH_PERF_STAT
  ulint n_hash_succ; /*!< number of successful hash searches thus far */
  ulint n_hash_fail; /*!< number of failed hash searches */
  ulint n_patt_succ; /*!< number of successful pattern searches thus far */
  ulint n_searches;  /*!< number of searches */
#endif /* UNIV_SEARCH_PERF_STAT */
#ifdef UNIV_DEBUG
  ulint magic_n; /*!< magic number @see BTR_SEARCH_MAGIC_N */
/** value of btr_search_t::magic_n, used in assertions */
#define BTR_SEARCH_MAGIC_N 1112765
#endif /* UNIV_DEBUG */
};

In other words, searches are first tracked through btr_search_t; once the trigger condition is met (hash_analysis must exceed a threshold, 17 in the source), btr_search_sys_t is used to maintain the resulting entries in the hash_table_t structure.
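A hedged sketch of that trigger logic, using mock types rather than the real btr0sea API: btr_search_t keeps a per-index hash_analysis counter, and only after it exceeds the threshold (BTR_SEARCH_HASH_ANALYSIS, 17 in the 8.0 source) does the search system start analyzing whether to build hash entries:

```cpp
// Value from btr0sea.h; searches below this count are not analyzed.
constexpr unsigned BTR_SEARCH_HASH_ANALYSIS = 17;

// Mock of the relevant btr_search_t fields.
struct search_info {
  unsigned hash_analysis = 0;    // searches since the last reset
  unsigned n_hash_potential = 0; // consecutive searches AHI would serve
};

// Hypothetical helper: returns true once this search should start
// updating the AHI recommendation fields (n_fields, n_bytes, left_side).
bool should_analyze(search_info &info) {
  ++info.hash_analysis;
  return info.hash_analysis > BTR_SEARCH_HASH_ANALYSIS;
}
```

The counter is reset when the analysis shows no benefit, so a steady pattern of similar lookups is required before any hash entries are actually built.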

5. Log Buffer

The redo log is a critically important logging system, and it likewise uses a log buffer. Look at the related definitions:

/** Logging modes for a mini-transaction */
enum mtr_log_t {
  /** Default mode: log all operations modifying disk-based data */
  MTR_LOG_ALL = 0,

  /** Log no operations and dirty pages are not added to the flush list */
  MTR_LOG_NONE = 1,

  /** Don't generate REDO log but add dirty pages to flush list */
  MTR_LOG_NO_REDO = 2,

  /** Inserts are logged in a shorter form */
  MTR_LOG_SHORT_INSERTS = 3,

  /** Last element */
  MTR_LOG_MODE_MAX = 4
};

/** @name Log item types
The log items are declared 'byte' so that the compiler can warn if val
and type parameters are switched in a call to mlog_write_ulint. NOTE!
For 1 - 8 bytes, the flag value must give the length also! @{ */
enum mlog_id_t {
  /** if the mtr contains only one log record for one page, i.e.,
  write_initial_log_record has been called only once, this flag is ORed
  to the type of that first log record */
  MLOG_SINGLE_REC_FLAG = 128,

  /** one byte is written */
  MLOG_1BYTE = 1,

  /** 2 bytes ... */
  MLOG_2BYTES = 2,

  /** 4 bytes ... */
  MLOG_4BYTES = 4,

  /** 8 bytes ... */
  MLOG_8BYTES = 8,

  /** Record insert */
  MLOG_REC_INSERT = 9,

  /** Mark clustered index record deleted */
  MLOG_REC_CLUST_DELETE_MARK = 10,

  /** Mark secondary index record deleted */
  MLOG_REC_SEC_DELETE_MARK = 11,

  /** update of a record, preserves record field sizes */
  MLOG_REC_UPDATE_IN_PLACE = 13,

  /*!< Delete a record from a page */
  MLOG_REC_DELETE = 14,

  /** Delete record list end on index page */
  MLOG_LIST_END_DELETE = 15,

  /** Delete record list start on index page */
  MLOG_LIST_START_DELETE = 16,

  /** Copy record list end to a new created index page */
  MLOG_LIST_END_COPY_CREATED = 17,

  /** Reorganize an index page in ROW_FORMAT=REDUNDANT */
  MLOG_PAGE_REORGANIZE = 18,

  /** Create an index page */
  MLOG_PAGE_CREATE = 19,

  /** Insert entry in an undo log */
  MLOG_UNDO_INSERT = 20,

  /** erase an undo log page end */
  MLOG_UNDO_ERASE_END = 21,

  /** initialize a page in an undo log */
  MLOG_UNDO_INIT = 22,

  /** reuse an insert undo log header */
  MLOG_UNDO_HDR_REUSE = 24,

  /** create an undo log header */
  MLOG_UNDO_HDR_CREATE = 25,

  /** mark an index record as the predefined minimum record */
  MLOG_REC_MIN_MARK = 26,

  /** initialize an ibuf bitmap page */
  MLOG_IBUF_BITMAP_INIT = 27,

#ifdef UNIV_LOG_LSN_DEBUG
  /** Current LSN */
  MLOG_LSN = 28,
#endif /* UNIV_LOG_LSN_DEBUG */

  /** this means that a file page is taken into use and the prior
  contents of the page should be ignored: in recovery we must not trust
  the lsn values stored to the file page.
  Note: it's deprecated because it causes crash recovery problem in bulk
  create index, and actually we don't need to reset page lsn in
  recv_recover_page_func() now. */
  MLOG_INIT_FILE_PAGE = 29,

  /** write a string to a page */
  MLOG_WRITE_STRING = 30,

  /** If a single mtr writes several log records, this log record ends
  the sequence of these records */
  MLOG_MULTI_REC_END = 31,

  /** dummy log record used to pad a log block full */
  MLOG_DUMMY_RECORD = 32,

  /** log record about creating an .ibd file, with format */
  MLOG_FILE_CREATE = 33,

  /** rename a tablespace file that starts with (space_id,page_no) */
  MLOG_FILE_RENAME = 34,

  /** delete a tablespace file that starts with (space_id,page_no) */
  MLOG_FILE_DELETE = 35,

  /** mark a compact index record as the predefined minimum record */
  MLOG_COMP_REC_MIN_MARK = 36,

  /** create a compact index page */
  MLOG_COMP_PAGE_CREATE = 37,

  /** compact record insert */
  MLOG_COMP_REC_INSERT = 38,

  /** mark compact clustered index record deleted */
  MLOG_COMP_REC_CLUST_DELETE_MARK = 39,

  /** mark compact secondary index record deleted; this log record type
  is redundant, as MLOG_REC_SEC_DELETE_MARK is independent of the record
  format. */
  MLOG_COMP_REC_SEC_DELETE_MARK = 40,

  /** update of a compact record, preserves record field sizes */
  MLOG_COMP_REC_UPDATE_IN_PLACE = 41,

  /** delete a compact record from a page */
  MLOG_COMP_REC_DELETE = 42,

  /** delete compact record list end on index page */
  MLOG_COMP_LIST_END_DELETE = 43,

  /** delete compact record list start on index page */
  MLOG_COMP_LIST_START_DELETE = 44,

  /** copy compact record list end to a new created index page */
  MLOG_COMP_LIST_END_COPY_CREATED = 45,

  /** reorganize an index page */
  MLOG_COMP_PAGE_REORGANIZE = 46,

  /** write the node pointer of a record on a compressed non-leaf B-tree
  page */
  MLOG_ZIP_WRITE_NODE_PTR = 48,

  /** write the BLOB pointer of an externally stored column on a
  compressed page */
  MLOG_ZIP_WRITE_BLOB_PTR = 49,

  /** write to compressed page header */
  MLOG_ZIP_WRITE_HEADER = 50,

  /** compress an index page */
  MLOG_ZIP_PAGE_COMPRESS = 51,

  /** compress an index page without logging its image */
  MLOG_ZIP_PAGE_COMPRESS_NO_DATA = 52,

  /** reorganize a compressed page */
  MLOG_ZIP_PAGE_REORGANIZE = 53,

  /** Create an R-tree index page */
  MLOG_PAGE_CREATE_RTREE = 57,

  /** create an R-tree compact page */
  MLOG_COMP_PAGE_CREATE_RTREE = 58,

  /** this means that a file page is taken into use.
  We use it to replace MLOG_INIT_FILE_PAGE. */
  MLOG_INIT_FILE_PAGE2 = 59,

  /** Table is being truncated. (Marked only for file-per-table) */
  /* MLOG_TRUNCATE = 60,  Disabled for WL6378 */

  /** notify that an index tree is being loaded without writing redo log
  about individual pages */
  MLOG_INDEX_LOAD = 61,

  /** log for some persistent dynamic metadata change */
  MLOG_TABLE_DYNAMIC_META = 62,

  /** create a SDI index page */
  MLOG_PAGE_CREATE_SDI = 63,

  /** create a SDI compact page */
  MLOG_COMP_PAGE_CREATE_SDI = 64,

  /** Extend the space */
  MLOG_FILE_EXTEND = 65,

  /** Used in tests of redo log. It must never be used outside unit
  tests. */
  MLOG_TEST = 66,

  /** biggest value (used in assertions) */
  MLOG_BIGGEST_TYPE = MLOG_TEST
};

From the enum above you can see that there are sixty-odd log record types, so there is no need to analyze every single one: once one of them is understood thoroughly, the rest are essentially clear as well.
The header file log0types.h defines a large number of log-related data structures; only a portion of them is quoted here:

typedef uint64_t lsn_t;

/** Print format for lsn_t values, used in functions like printf. */
#define LSN_PF UINT64PF

/** Alias for atomic based on lsn_t. */
using atomic_lsn_t = std::atomic<lsn_t>;

/** Type used for sn values, which enumerate bytes of data stored in the log.
Note that these values skip bytes of headers and footers of log blocks. */
typedef uint64_t sn_t;

/** Alias for atomic based on sn_t. */
using atomic_sn_t = std::atomic<sn_t>;

/** Type used for checkpoint numbers (consecutive checkpoints receive
a number which is increased by one). */
typedef uint64_t checkpoint_no_t;

/** Type used for counters in log_t: flushes_requested and flushes_expected.
They represent number of requests to flush the redo log to disk. */
typedef std::atomic<int64_t> log_flushes_t;

/** Function used to calculate checksums of log blocks. */
typedef std::atomic<uint32_t (*)(const byte *log_block)> log_checksum_func_t;

/** Clock used to measure time spent in redo log (e.g. when flushing). */
using Log_clock = std::chrono::high_resolution_clock;

/** Time point defined by the Log_clock. */
using Log_clock_point = std::chrono::time_point<Log_clock>;

/** Supported redo log formats. Stored in LOG_HEADER_FORMAT. */
enum log_header_format_t {
  /** The MySQL 5.7.9 redo log format identifier. We can support recovery
  from this format if the redo log is clean (logically empty). */
  LOG_HEADER_FORMAT_5_7_9 = 1,

  /** Remove MLOG_FILE_NAME and MLOG_CHECKPOINT, introduce MLOG_FILE_OPEN
  redo log record. */
  LOG_HEADER_FORMAT_8_0_1 = 2,

  /** Allow checkpoint_lsn to point any data byte within redo log (before
  it had to point the beginning of a group of log records). */
  LOG_HEADER_FORMAT_8_0_3 = 3,

  /** Expand ulint compressed form. */
  LOG_HEADER_FORMAT_8_0_19 = 4,

  /** The redo log format identifier corresponding to the current
  format version. */
  LOG_HEADER_FORMAT_CURRENT = LOG_HEADER_FORMAT_8_0_19
};

/** The state of a log group */
enum class log_state_t {
  /** No corruption detected */
  OK,
  /** Corrupted */
  CORRUPTED
};

/** The recovery implementation. */
struct redo_recover_t;

struct Log_handle {
  lsn_t start_lsn;
  lsn_t end_lsn;
};

/** Redo log - single data structure with state of the redo log system.
In future, one could consider splitting this to multiple data structures. */
struct alignas(ut::INNODB_CACHE_LINE_SIZE) log_t {/**************************************************/ /**@name Users writing to log buffer*******************************************************//** @{ */#ifndef UNIV_HOTBACKUP/** Event used for locking sn */os_event_t sn_lock_event;#ifdef UNIV_PFS_RWLOCK/** The instrumentation hook */struct PSI_rwlock *pfs_psi;
#endif /* UNIV_PFS_RWLOCK */
#ifdef UNIV_DEBUG/** The rw_lock instance only for the debug info list *//* NOTE: Just "rw_lock_t sn_lock_inst;" and direct minimum initializationseem to hit the bug of Sun Studio of Solaris. */rw_lock_t *sn_lock_inst;
#endif /* UNIV_DEBUG *//** Current sn value. Used to reserve space in the redo log,and used to acquire an exclusive access to the log buffer.Represents number of data bytes that have ever been reserved.Bytes of headers and footers of log blocks are not included.Its highest bit is used for locking the access to the log buffer. */MY_COMPILER_DIAGNOSTIC_PUSH()MY_COMPILER_CLANG_WORKAROUND_REF_DOCBUG()/**@see @ref subsect_redo_log_sn */MY_COMPILER_DIAGNOSTIC_PUSH()alignas(ut::INNODB_CACHE_LINE_SIZE) atomic_sn_t sn;/** Intended sn value while x-locked. */atomic_sn_t sn_locked;/** Mutex which can be used for x-lock sn value */mutable ib_mutex_t sn_x_lock_mutex;/** Padding after the _sn to avoid false sharing issues forconstants below (due to changes of sn). */alignas(ut::INNODB_CACHE_LINE_SIZE)/** Pointer to the log buffer, aligned up to OS_FILE_LOG_BLOCK_SIZE.The alignment is to ensure that buffer parts specified for file IO writeoperations will be aligned to sector size, which is required e.g. onWindows when doing unbuffered file access.Protected by: locking sn not to add. */aligned_array_pointer<byte, OS_FILE_LOG_BLOCK_SIZE> buf;/** Size of the log buffer expressed in number of data bytes,that is excluding bytes for headers and footers of log blocks. */atomic_sn_t buf_size_sn;/** Size of the log buffer expressed in number of total bytes,that is including bytes for headers and footers of log blocks. */size_t buf_size;alignas(ut::INNODB_CACHE_LINE_SIZE)/** The recent written buffer.Protected by: locking sn not to add. */Link_buf<lsn_t> recent_written;/** Used for pausing the log writer threads.When paused, each user thread should write log as in the former version. */std::atomic_bool writer_threads_paused;/** Some threads waiting for the ready for write lsn by closer_event. */lsn_t current_ready_waiting_lsn;/** current_ready_waiting_lsn is waited using this sig_count. 
*/int64_t current_ready_waiting_sig_count;alignas(ut::INNODB_CACHE_LINE_SIZE)/** The recent closed buffer.Protected by: locking sn not to add. */Link_buf<lsn_t> recent_closed;alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Users <=> writer*******************************************************//** @{ *//** Maximum sn up to which there is free space in both the log bufferand the log files. This is limitation for the end of any write to thelog buffer. Threads, which are limited need to wait, and possibly theyhold latches of dirty pages making a deadlock possible.Protected by: writer_mutex (writes). */atomic_sn_t buf_limit_sn;/** Up to this lsn, data has been written to disk (fsync not required).Protected by: writer_mutex (writes). */MY_COMPILER_DIAGNOSTIC_PUSH()MY_COMPILER_CLANG_WORKAROUND_REF_DOCBUG()/*@see @ref subsect_redo_log_write_lsn */MY_COMPILER_DIAGNOSTIC_POP()alignas(ut::INNODB_CACHE_LINE_SIZE) atomic_lsn_t write_lsn;alignas(ut::INNODB_CACHE_LINE_SIZE)/** Unaligned pointer to array with events, which are used fornotifications sent from the log write notifier thread to user threads.The notifications are sent when write_lsn is advanced. User threadswait for write_lsn >= lsn, for some lsn. Log writer advances thewrite_lsn and notifies the log write notifier, which notifies all usersinterested in nearby lsn values (lsn belonging to the same log block).Note that false wake-ups are possible, in which case user threadssimply retry waiting. */os_event_t *write_events;/** Number of entries in the array with writer_events. */size_t write_events_size;/** Approx. number of requests to write/flush redo since startup. */alignas(ut::INNODB_CACHE_LINE_SIZE)std::atomic<uint64_t> write_to_file_requests_total;/** How often redo write/flush is requested in average.Measures in microseconds. Log threads do not spin whenthe write/flush requests are not frequent. 
*/alignas(ut::INNODB_CACHE_LINE_SIZE)std::atomic<uint64_t> write_to_file_requests_interval;/** This padding is probably not needed, left for convenience. */alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Users <=> flusher*******************************************************//** @{ *//** Unaligned pointer to array with events, which are used fornotifications sent from the log flush notifier thread to user threads.The notifications are sent when flushed_to_disk_lsn is advanced.User threads wait for flushed_to_disk_lsn >= lsn, for some lsn.Log flusher advances the flushed_to_disk_lsn and notifies thelog flush notifier, which notifies all users interested in nearby lsnvalues (lsn belonging to the same log block). Note that falsewake-ups are possible, in which case user threads simply retrywaiting. */os_event_t *flush_events;/** Number of entries in the array with events. */size_t flush_events_size;/** This event is in the reset state when a flush is running;a thread should wait for this without owning any of redo mutexes,but NOTE that to reset this event, the thread MUST own the writer_mutex */os_event_t old_flush_event;/** Padding before the frequently updated flushed_to_disk_lsn. */alignas(ut::INNODB_CACHE_LINE_SIZE)/** Up to this lsn data has been flushed to disk (fsynced). */atomic_lsn_t flushed_to_disk_lsn;/** Padding after the frequently updated flushed_to_disk_lsn. */alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Log flusher thread*******************************************************//** @{ *//** Last flush start time. Updated just before fsync starts. */Log_clock_point last_flush_start_time;/** Last flush end time. Updated just after fsync is finished.If smaller than start time, then flush operation is pending. */Log_clock_point last_flush_end_time;/** Flushing average time (in microseconds). 
*/double flush_avg_time;/** Mutex which can be used to pause log flusher thread. */mutable ib_mutex_t flusher_mutex;alignas(ut::INNODB_CACHE_LINE_SIZE)os_event_t flusher_event;/** Padding to avoid any dependency between the log flusherand the log writer threads. */alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Log writer thread*******************************************************//** @{ *//** Space id for pages with log blocks. */space_id_t files_space_id;/** Size of buffer used for the write-ahead (in bytes). */uint32_t write_ahead_buf_size;/** Aligned pointer to buffer used for the write-ahead. It is aligned tosystem page size (why?) and is currently limited by constant 64KB. */aligned_array_pointer<byte, 64 * 1024> write_ahead_buf;/** Up to this file offset in the log files, the write-aheadhas been done or is not required (for any other reason). */uint64_t write_ahead_end_offset;/** Aligned buffers for file headers. */aligned_array_pointer<byte, OS_FILE_LOG_BLOCK_SIZE> *file_header_bufs;
#endif /* !UNIV_HOTBACKUP *//** Some lsn value within the current log file. */lsn_t current_file_lsn;/** File offset for the current_file_lsn. */uint64_t current_file_real_offset;/** Up to this file offset we are within the same current log file. */uint64_t current_file_end_offset;/** Number of performed IO operations (only for printing stats). */uint64_t n_log_ios;/** Size of each single log file (expressed in bytes, includingfile header). */uint64_t file_size;/** Number of log files. */uint32_t n_files;/** Total capacity of all the log files (file_size * n_files),including headers of the log files. */uint64_t files_real_capacity;/** Capacity of redo log files for log writer thread. The log writerdoes not to exceed this value. If space is not reclaimed after 1 secwait, it writes only as much as can fit the free space or crashes ifthere is no free space at all (checkpoint did not advance for 1 sec). */lsn_t lsn_capacity_for_writer;/** When this margin is being used, the log writer decides to increasethe concurrency_margin to stop new incoming mini-transactions earlier,on bigger margin. This is used to provide adaptive concurrency margincalculation, which we need because we might have unlimited threadconcurrency setting or we could miss some log_free_check() calls.It is just best effort to help getting out of the troubles. */lsn_t extra_margin;/** True if we haven't increased the concurrency_margin since we entered(lsn_capacity_for_margin_inc..lsn_capacity_for_writer] range. This allowsto increase the margin only once per issue and wait until the issue becomesresolved, still having an option to increase margin even more, if new issuecomes later. */bool concurrency_margin_ok;/** Maximum allowed concurrency_margin. We never set higher, even when weincrease the concurrency_margin in the adaptive solution. */lsn_t max_concurrency_margin;#ifndef UNIV_HOTBACKUP/** Mutex which can be used to pause log writer thread. 
*/mutable ib_mutex_t writer_mutex;alignas(ut::INNODB_CACHE_LINE_SIZE)os_event_t writer_event;/** Padding after section for the log writer thread, to avoid anydependency between the log writer and the log closer threads. */alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Log closer thread*******************************************************//** @{ *//** Event used by the log closer thread to wait for tasks. */os_event_t closer_event;/** Mutex which can be used to pause log closer thread. */mutable ib_mutex_t closer_mutex;/** Padding after the log closer thread and before the memory usedfor communication between the log flusher and notifier threads. */alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Log flusher <=> flush_notifier*******************************************************//** @{ *//** Event used by the log flusher thread to notify the log flushnotifier thread, that it should proceed with notifying user threadswaiting for the advanced flushed_to_disk_lsn (because it has beenadvanced). */os_event_t flush_notifier_event;/** The next flushed_to_disk_lsn can be waited using this sig_count. */int64_t current_flush_sig_count;/** Mutex which can be used to pause log flush notifier thread. */mutable ib_mutex_t flush_notifier_mutex;/** Padding. */alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Log writer <=> write_notifier*******************************************************//** @{ *//** Mutex which can be used to pause log write notifier thread. */mutable ib_mutex_t write_notifier_mutex;alignas(ut::INNODB_CACHE_LINE_SIZE)/** Event used by the log writer thread to notify the log writenotifier thread, that it should proceed with notifying user threadswaiting for the advanced write_lsn (because it has been advanced). 
*/os_event_t write_notifier_event;alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Maintenance*******************************************************//** @{ *//** Used for stopping the log background threads. */std::atomic_bool should_stop_threads;/** Event used for pausing the log writer threads. */os_event_t writer_threads_resume_event;/** Used for resuming write notifier thread */atomic_lsn_t write_notifier_resume_lsn;/** Used for resuming flush notifier thread */atomic_lsn_t flush_notifier_resume_lsn;/** Number of total I/O operations performed when we printedthe statistics last time. */mutable uint64_t n_log_ios_old;/** Wall time when we printed the statistics last time. */mutable time_t last_printout_time;/** @} *//**************************************************/ /**@name Recovery*******************************************************//** @{ *//** Lsn from which recovery has been started. */lsn_t recovered_lsn;/** Format of the redo log: e.g., LOG_HEADER_FORMAT_CURRENT. */uint32_t format;/** Corruption status. */log_state_t state;/** Used only in recovery: recovery scan succeeded up to this lsn. */lsn_t scanned_lsn;#ifdef UNIV_DEBUG/** When this is set, writing to the redo log should be disabled.We check for this in functions that write to the redo log. */bool disable_redo_writes;/** DEBUG only - if we copied or initialized the first block in buffer,this is set to lsn for which we did that. We later ensure that we startthe redo log at the same lsn. Else it is zero and we would crash whentrying to start redo then. 
*/lsn_t first_block_is_correct_for_lsn;#endif /* UNIV_DEBUG */alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Fields protected by the log_limits mutex.Related to free space in the redo log.*******************************************************//** @{ *//** Mutex which protects fields: available_for_checkpoint_lsn,requested_checkpoint_lsn. It also synchronizes updates of:free_check_limit_sn, concurrency_margin and dict_persist_margin.It also protects the srv_checkpoint_disabled (together with thecheckpointer_mutex). */mutable ib_mutex_t limits_mutex;/** A new checkpoint could be written for this lsn value.Up to this lsn value, all dirty pages have been added to flushlists and flushed. Updated in the log checkpointer thread bytaking minimum oldest_modification out of the last dirty pagesfrom each flush list. However it will not be bigger than thecurrent value of log.buf_dirty_pages_added_up_to_lsn.Read by: user threads when requesting fuzzy checkpointRead by: log_print() (printing status of redo)Updated by: log_checkpointerProtected by: limits_mutex. */MY_COMPILER_DIAGNOSTIC_PUSH()MY_COMPILER_CLANG_WORKAROUND_REF_DOCBUG()/**@see @ref subsect_redo_log_available_for_checkpoint_lsn */MY_COMPILER_DIAGNOSTIC_POP()lsn_t available_for_checkpoint_lsn;/** When this is larger than the latest checkpoint, the log checkpointerthread will be forced to write a new checkpoint (unless the new latestcheckpoint lsn would still be smaller than this value).Read by: log_checkpointerUpdated by: user threads (log_free_check() or for sharp checkpoint)Protected by: limits_mutex. 
*/lsn_t requested_checkpoint_lsn;/** Maximum lsn allowed for checkpoint by dict_persist or zero.This will be set by dict_persist_to_dd_table_buffer(), which shouldbe always called before really making a checkpoint.If non-zero, up to this lsn value, dynamic metadata changes have beenwritten back to mysql.innodb_dynamic_metadata under dict_persist->mutexprotection. All dynamic metadata changes after this lsn have tobe kept in redo logs, but not discarded. If zero, just ignore it.Updated by: DD (when persisting dynamic meta data)Updated by: log_checkpointer (reset when checkpoint is written)Protected by: limits_mutex. */lsn_t dict_max_allowed_checkpoint_lsn;/** If should perform checkpoints every innodb_log_checkpoint_every ms.Disabled during startup / shutdown. Enabled in srv_start_threads.Updated by: starting thread (srv_start_threads)Read by: log_checkpointer */bool periodical_checkpoints_enabled;/** Maximum sn up to which there is free space in the redo log.Threads check this limit and compare to current log.sn, when theyare outside mini-transactions and hold no latches. The formula usedto compute the limitation takes into account maximum size of mtr andthread concurrency to include proper margins and avoid issues withrace condition (in which all threads check the limitation and thenall proceed with their mini-transactions). Also extra margin isthere for dd table buffer cache (dict_persist_margin).Read by: user threads (log_free_check())Updated by: log_checkpointer (after update of checkpoint_lsn)Updated by: log_writer (after increasing concurrency_margin)Updated by: DD (after update of dict_persist_margin)Protected by (updates only): limits_mutex. */atomic_sn_t free_check_limit_sn;/** Margin used in calculation of @see free_check_limit_sn.Read by: page_cleaners, log_checkpointerUpdated by: log_writerProtected by (updates only): limits_mutex. 
*/atomic_sn_t concurrency_margin;/** Margin used in calculation of @see free_check_limit_sn.Read by: page_cleaners, log_checkpointerUpdated by: DDProtected by (updates only): limits_mutex. */atomic_sn_t dict_persist_margin;alignas(ut::INNODB_CACHE_LINE_SIZE)/** @} *//**************************************************/ /**@name Log checkpointer thread*******************************************************//** @{ *//** Event used by the log checkpointer thread to wait for requests. */os_event_t checkpointer_event;/** Mutex which can be used to pause log checkpointer thread.This is used by log_position_lock() together with log_buffer_x_lock(),to pause any changes to current_lsn or last_checkpoint_lsn. */mutable ib_mutex_t checkpointer_mutex;/** Latest checkpoint lsn.Read by: user threads, log_print (no protection)Read by: log_writer (under writer_mutex)Updated by: log_checkpointer (under both mutexes)Protected by (updates only): checkpointer_mutex + writer_mutex. */MY_COMPILER_DIAGNOSTIC_PUSH()MY_COMPILER_CLANG_WORKAROUND_REF_DOCBUG()/**@see @ref subsect_redo_log_last_checkpoint_lsn */MY_COMPILER_DIAGNOSTIC_POP()atomic_lsn_t last_checkpoint_lsn;/** Next checkpoint number.Read by: log_get_last_block (no protection)Read by: log_writer (under writer_mutex)Updated by: log_checkpointer (under both mutexes)Protected by: checkpoint_mutex + writer_mutex. */std::atomic<checkpoint_no_t> next_checkpoint_no;/** Latest checkpoint wall time.Used by (private): log_checkpointer. */Log_clock_point last_checkpoint_time;/** Aligned buffer used for writing a checkpoint header. 
It is aligned similarly to log.buf.
Used by (private): log_checkpointer, recovery code */
aligned_array_pointer<byte, OS_FILE_LOG_BLOCK_SIZE> checkpoint_buf;

/** @} */

/**************************************************/ /**
@name Fields considered constant, updated when log system
is initialized (log_sys_init()) and not assigned to
particular log thread.
*******************************************************/

/** @{ */

/** Capacity of the log files available for log_free_check(). */
lsn_t lsn_capacity_for_free_check;

/** Capacity of log files excluding headers of the log files.
If the checkpoint age exceeds this, it is a serious error,
because in such case we have already overwritten redo log. */
lsn_t lsn_real_capacity;

/** When the oldest dirty page age exceeds this value, we start
an asynchronous preflush of dirty pages. */
lsn_t max_modified_age_async;

/** When the oldest dirty page age exceeds this value, we start
a synchronous flush of dirty pages. */
lsn_t max_modified_age_sync;

/** When checkpoint age exceeds this value, we write checkpoints
if lag between oldest_lsn and checkpoint_lsn exceeds max_checkpoint_lag. */
lsn_t max_checkpoint_age_async;

/** @} */

/** true if redo logging is disabled. Read and write with writer_mutex */
bool m_disable;

/** true, if server is not recoverable. Read and write with writer_mutex */
bool m_crash_unsafe;

/** start LSN of first redo log file. */
lsn_t m_first_file_lsn;

#endif /* !UNIV_HOTBACKUP */
};

The redo log is ultimately stored in files on the file system; log records reach it in the form of mini-transactions (mtr) and are then flushed to disk. The central data structure here is mtr_t:

/** Mini-transaction handle and buffer */
struct mtr_t {/** State variables of the mtr */struct Impl {/** memo stack for locks etc. */mtr_buf_t m_memo;/** mini-transaction log */mtr_buf_t m_log;/** true if mtr has made at least one buffer pool page dirty */bool m_made_dirty;/** true if inside ibuf changes */bool m_inside_ibuf;/** true if the mini-transaction modified buffer pool pages */bool m_modifications;/** true if mtr is forced to NO_LOG mode because redo logging isdisabled globally. In this case, mtr increments the global counterat ::start and must decrement it back at ::commit. */bool m_marked_nolog;/** Shard index used for incrementing global counter at ::start. We needto use the same shard while decrementing counter at ::commit. */size_t m_shard_index;/** Count of how many page initial log records have beenwritten to the mtr log */ib_uint32_t m_n_log_recs;/** specifies which operations should be logged; defaultvalue MTR_LOG_ALL */mtr_log_t m_log_mode;/** State of the transaction */mtr_state_t m_state;/** Flush Observer */FlushObserver *m_flush_observer;#ifdef UNIV_DEBUG/** For checking corruption. */ulint m_magic_n;#endif /* UNIV_DEBUG *//** Owning mini-transaction */mtr_t *m_mtr;};#ifndef UNIV_HOTBACKUP/** mtr global logging */class Logging {public:/** mtr global redo logging state.Enable Logging  :[ENABLED] -> [ENABLED_RESTRICT] -> [DISABLED]Disable Logging :[DISABLED] -> [ENABLED_RESTRICT] -> [ENABLED_DBLWR] -> [ENABLED] */enum State : uint32_t {/* Redo Logging is enabled. Server is crash safe. */ENABLED,/* Redo logging is enabled. All non-logging mtr are finished with thepages flushed to disk. Double write is enabled. Some pages could bestill getting written to disk without double-write. Not safe to crash. */ENABLED_DBLWR,/* Redo logging is enabled but there could be some mtrs still runningin no logging mode. 
Redo archiving and clone are not allowed to start.No double-write */ENABLED_RESTRICT,/* Redo logging is disabled and all new mtrs would not generate any redo.Redo archiving and clone are not allowed. */DISABLED};/** Initialize logging state at server start up. */void init() {m_state.store(ENABLED);/* We use sharded counter and force sequentially consistent countingwhich is the general default for c++ atomic operation. If we try tooptimize it further specific to current operations, we could useRelease-Acquire ordering i.e. std::memory_order_release during countingand std::memory_order_acquire while checking for the count. However,sharding looks to be good enough for now and we should go for non defaultmemory ordering only with some visible proof for improvement. */m_count_nologging_mtr.set_order(std::memory_order_seq_cst);Counter::clear(m_count_nologging_mtr);}/** Disable mtr redo logging. Server is crash unsafe without logging.@param[in]   thd server connection THD@return mysql error code. */int disable(THD *thd);/** Enable mtr redo logging. Ensure that the server is crash safebefore returning.@param[in]   thd server connection THD@return mysql error code. */int enable(THD *thd);/** Mark a no-logging mtr to indicate that it would not generate redo logand system is crash unsafe.@return true iff logging is disabled and mtr is marked. */bool mark_mtr(size_t index) {/* Have initial check to avoid incrementing global counter for regularcase when redo logging is enabled. */if (is_disabled()) {/* Increment counter to restrict state change DISABLED to ENABLED. */Counter::inc(m_count_nologging_mtr, index);/* Check if the no-logging is still disabled. At this point, if wefind the state disabled, it is no longer possible for the state moveback to enabled till the mtr finishes and we unmark the mtr. */if (is_disabled()) {return (true);}Counter::dec(m_count_nologging_mtr, index);}return (false);}/** unmark a no logging mtr. 
*/void unmark_mtr(size_t index) {ut_ad(!is_enabled());ut_ad(Counter::total(m_count_nologging_mtr) > 0);Counter::dec(m_count_nologging_mtr, index);}/* @return flush loop count for faster response when logging is disabled. */uint32_t get_nolog_flush_loop() const { return (NOLOG_MAX_FLUSH_LOOP); }/** @return true iff redo logging is enabled and server is crash safe. */bool is_enabled() const { return (m_state.load() == ENABLED); }/** @return true iff redo logging is disabled and new mtrs are not goingto generate redo log. */bool is_disabled() const { return (m_state.load() == DISABLED); }/** @return true iff we can skip data page double write. */bool dblwr_disabled() const {auto state = m_state.load();return (state == DISABLED || state == ENABLED_RESTRICT);}/* Force faster flush loop for quicker adaptive flush response when loggingis disabled. When redo logging is disabled the system operates faster withdirty pages generated at much faster rate. */static constexpr uint32_t NOLOG_MAX_FLUSH_LOOP = 5;private:/** Wait till all no-logging mtrs are finished.@return mysql error code. */int wait_no_log_mtr(THD *thd);private:/** Global redo logging state. */std::atomic<State> m_state;using Shards = Counter::Shards<128>;/** Number of no logging mtrs currently running. */Shards m_count_nologging_mtr;};/** Check if redo logging is disabled globally and markthe global counter till mtr ends. */void check_nolog_and_mark();/** Check if the mtr has marked the global no log counter andunmark it. */void check_nolog_and_unmark();
#endif /* !UNIV_HOTBACKUP */mtr_t() {m_impl.m_state = MTR_STATE_INIT;m_impl.m_marked_nolog = false;m_impl.m_shard_index = 0;}~mtr_t() {
#ifdef UNIV_DEBUGswitch (m_impl.m_state) {case MTR_STATE_ACTIVE:ut_ad(m_impl.m_memo.size() == 0);ut_d(remove_from_debug_list());break;case MTR_STATE_INIT:case MTR_STATE_COMMITTED:break;case MTR_STATE_COMMITTING:ut_error;}
#endif /* UNIV_DEBUG */
#ifndef UNIV_HOTBACKUP/* Safety check in case mtr is not committed. */if (m_impl.m_state != MTR_STATE_INIT) {check_nolog_and_unmark();}
#endif /* !UNIV_HOTBACKUP */}

Note the definition of the mtr log buffer type used above: typedef dyn_buf_t<DYN_ARRAY_DATA_SIZE> mtr_buf_t;
Next comes the log_t data structure again (this listing is taken from a newer source tree than the excerpt earlier in this section, so some fields differ):

struct alignas(ut::INNODB_CACHE_LINE_SIZE) log_t {
#ifndef UNIV_HOTBACKUP/**************************************************/ /**@name Users writing to log buffer*******************************************************//** @{ *//** Event used for locking sn */os_event_t sn_lock_event;#ifdef UNIV_PFS_RWLOCK/** The instrumentation hook */struct PSI_rwlock *pfs_psi;
#endif /* UNIV_PFS_RWLOCK */
#ifdef UNIV_DEBUG/** The rw_lock instance only for the debug info list *//* NOTE: Just "rw_lock_t sn_lock_inst;" and direct minimum initializationseem to hit the bug of Sun Studio of Solaris. */rw_lock_t *sn_lock_inst;
#endif /* UNIV_DEBUG *//** Current sn value. Used to reserve space in the redo log,and used to acquire an exclusive access to the log buffer.Represents number of data bytes that have ever been reserved.Bytes of headers and footers of log blocks are not included.Its highest bit is used for locking the access to the log buffer. */alignas(ut::INNODB_CACHE_LINE_SIZE) atomic_sn_t sn;/** Intended sn value while x-locked. */atomic_sn_t sn_locked;/** Mutex which can be used for x-lock sn value */mutable ib_mutex_t sn_x_lock_mutex;/** Aligned log buffer. Committing mini-transactions write thereredo records, and the log_writer thread writes the log buffer todisk in background.Protected by: locking sn not to add. */alignas(ut::INNODB_CACHE_LINE_SIZE)ut::aligned_array_pointer<byte, LOG_BUFFER_ALIGNMENT> buf;/** Size of the log buffer expressed in number of data bytes,that is excluding bytes for headers and footers of log blocks. */atomic_sn_t buf_size_sn;/** Size of the log buffer expressed in number of total bytes,that is including bytes for headers and footers of log blocks. */size_t buf_size;/** The recent written buffer.Protected by: locking sn not to add. */alignas(ut::INNODB_CACHE_LINE_SIZE) Link_buf<lsn_t> recent_written;/** Used for pausing the log writer threads.When paused, each user thread should write log as in the former version. */std::atomic_bool writer_threads_paused;/** Some threads waiting for the ready for write lsn by closer_event. */lsn_t current_ready_waiting_lsn;/** current_ready_waiting_lsn is waited using this sig_count. */int64_t current_ready_waiting_sig_count;/** The recent closed buffer.Protected by: locking sn not to add. */alignas(ut::INNODB_CACHE_LINE_SIZE) Link_buf<lsn_t> recent_closed;/** @} *//**************************************************/ /**@name Users <=> writer*******************************************************//** @{ *//** Maximum sn up to which there is free space in both the log bufferand the log files. 
  This is limitation for the end of any write to the
  log buffer. Threads, which are limited need to wait, and possibly they
  hold latches of dirty pages making a deadlock possible.
  Protected by: writer_mutex (writes). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) atomic_sn_t buf_limit_sn;

  /** Up to this lsn, data has been written to disk (fsync not required).
  Protected by: writer_mutex (writes). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) atomic_lsn_t write_lsn;

  /** Unaligned pointer to array with events, which are used for
  notifications sent from the log write notifier thread to user threads.
  The notifications are sent when write_lsn is advanced. User threads
  wait for write_lsn >= lsn, for some lsn. Log writer advances the
  write_lsn and notifies the log write notifier, which notifies all users
  interested in nearby lsn values (lsn belonging to the same log block).
  Note that false wake-ups are possible, in which case user threads
  simply retry waiting. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t *write_events;

  /** Number of entries in the array with writer_events. */
  size_t write_events_size;

  /** Approx. number of requests to write/flush redo since startup. */
  alignas(ut::INNODB_CACHE_LINE_SIZE)
      std::atomic<uint64_t> write_to_file_requests_total;

  /** How often redo write/flush is requested in average.
  Measures in microseconds. Log threads do not spin when
  the write/flush requests are not frequent. */
  alignas(ut::INNODB_CACHE_LINE_SIZE)
      std::atomic<std::chrono::microseconds> write_to_file_requests_interval;
  static_assert(decltype(write_to_file_requests_interval)::is_always_lock_free);

  /** @} */

  /**************************************************/ /**
   @name Users <=> flusher
  *******************************************************/

  /** @{ */

  /** Unaligned pointer to array with events, which are used for
  notifications sent from the log flush notifier thread to user threads.
  The notifications are sent when flushed_to_disk_lsn is advanced.
  User threads wait for flushed_to_disk_lsn >= lsn, for some lsn.
  Log flusher advances the flushed_to_disk_lsn and notifies the
  log flush notifier, which notifies all users interested in nearby lsn
  values (lsn belonging to the same log block). Note that false
  wake-ups are possible, in which case user threads simply retry
  waiting. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t *flush_events;

  /** Number of entries in the array with events. */
  size_t flush_events_size;

  /** This event is in the reset state when a flush is running;
  a thread should wait for this without owning any of redo mutexes,
  but NOTE that to reset this event, the thread MUST own the writer_mutex */
  os_event_t old_flush_event;

  /** Up to this lsn data has been flushed to disk (fsynced). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) atomic_lsn_t flushed_to_disk_lsn;

  /** @} */

  /**************************************************/ /**
   @name Log flusher thread
  *******************************************************/

  /** @{ */

  /** Last flush start time. Updated just before fsync starts. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) Log_clock_point last_flush_start_time;

  /** Last flush end time. Updated just after fsync is finished.
  If smaller than start time, then flush operation is pending. */
  Log_clock_point last_flush_end_time;

  /** Flushing average time (in microseconds). */
  double flush_avg_time;

  /** Mutex which can be used to pause log flusher thread. */
  mutable ib_mutex_t flusher_mutex;

  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t flusher_event;

  /** @} */

  /**************************************************/ /**
   @name Log writer thread
  *******************************************************/

  /** @{ */

  /** Size of buffer used for the write-ahead (in bytes). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) uint32_t write_ahead_buf_size;

  /** Aligned buffer used for some of redo log writes. Data is copied
  there from the log buffer and written to disk, in following cases:
  - when writing ahead full kernel page to avoid read-on-write issue,
  - to copy, prepare and write the incomplete block of the log buffer
  (because mini-transactions might be writing new redo records to
  the block in parallel, when the block is being written to disk) */
  ut::aligned_array_pointer<byte, LOG_WRITE_AHEAD_BUFFER_ALIGNMENT>
      write_ahead_buf;

  /** Up to this file offset in the log files, the write-ahead
  has been done or is not required (for any other reason). */
  os_offset_t write_ahead_end_offset;

  /** File within which write_lsn is located, so the newest file in m_files
  in the same time - updates are protected by the m_files_mutex. This field
  exists, because the log_writer thread needs to locate offsets each time
  it writes data blocks to disk, but we do not want to acquire and release
  the m_files_mutex for each such write, because that would slow down the
  log_writer thread a lot. Instead of that, the log_writer uses this object
  to locate the offsets.
  Updates of this field require two mutexes: writer_mutex and m_files_mutex.
  Its m_id is updated only when the write_lsn moves to the next log file. */
  Log_file m_current_file{m_files_ctx, m_encryption_metadata};

  /** Handle for the opened m_current_file. The log_writer uses this handle
  to do writes (protected by writer_mutex). The log_flusher uses this handle
  to do fsyncs (protected by flusher_mutex). Both these threads might use
  this handle in parallel. The required synchronization between writes and
  fsyncs will happen on the OS side. When m_current_file is repointed to
  other file, this field is also updated, in the same critical section.
  Updates of this field are protected by: writer_mutex, m_files_mutex
  and flusher_mutex acquired all together. The reason for flusher_mutex
  is to avoid a need to acquire / release m_files_mutex in the log_flusher
  thread for each fsync. Instead of that, the log_flusher thread keeps the
  log_flusher_mutex, which is released less often, but still prevents from
  updates of this field. */
  Log_file_handle m_current_file_handle{m_encryption_metadata};

  /** True iff the log writer has entered extra writer margin and still
  hasn't exited since then. Each time the log_writer enters that margin,
  it pauses all user threads at log_free_check() calls and emits warning
  to the log. When the writer exits the extra margin, notice is emitted.
  Protected by: log_limits_mutex and writer_mutex. */
  bool m_writer_inside_extra_margin;

#endif /* !UNIV_HOTBACKUP */

  /** Number of performed IO operations (only for printing stats). */
  uint64_t n_log_ios;

#ifndef UNIV_HOTBACKUP

  /** Mutex which can be used to pause log writer thread. */
  mutable ib_mutex_t writer_mutex;

#ifdef UNIV_DEBUG
  /** THD used by the log_writer thread. */
  THD *m_writer_thd;
#endif /* UNIV_DEBUG */

  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t writer_event;

  /** A recently seen value of log_consumer_get_oldest()->get_consumed_lsn().
  It serves as a lower bound for future values of this expression, because it
  is guaranteed to be monotonic in time: each individual consumer can only go
  forward, and new consumers must start at least from checkpoint lsn, and the
  checkpointer is always one of the consumers.
  Protected by: writer_mutex. */
  lsn_t m_oldest_need_lsn_lowerbound;

  /** @} */

  /**************************************************/ /**
   @name Log closer thread
  *******************************************************/

  /** @{ */

  /** Event used by the log closer thread to wait for tasks. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t closer_event;

  /** Mutex which can be used to pause log closer thread. */
  mutable ib_mutex_t closer_mutex;

  /** @} */

  /**************************************************/ /**
   @name Log flusher <=> flush_notifier
  *******************************************************/

  /** @{ */

  /** Event used by the log flusher thread to notify the log flush
  notifier thread, that it should proceed with notifying user threads
  waiting for the advanced flushed_to_disk_lsn (because it has been
  advanced). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t flush_notifier_event;

  /** The next flushed_to_disk_lsn can be waited using this sig_count. */
  int64_t current_flush_sig_count;

  /** Mutex which can be used to pause log flush notifier thread. */
  mutable ib_mutex_t flush_notifier_mutex;

  /** @} */

  /**************************************************/ /**
   @name Log writer <=> write_notifier
  *******************************************************/

  /** @{ */

  /** Mutex which can be used to pause log write notifier thread. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) mutable ib_mutex_t write_notifier_mutex;

  /** Event used by the log writer thread to notify the log write
  notifier thread, that it should proceed with notifying user threads
  waiting for the advanced write_lsn (because it has been advanced). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t write_notifier_event;

  /** @} */

  /**************************************************/ /**
   @name Log files management
  *******************************************************/

  /** @{ */

  /** Mutex protecting set of existing log files and their meta data. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) mutable ib_mutex_t m_files_mutex;

  /** Context for all operations on redo log files from log0files_io.h. */
  Log_files_context m_files_ctx;

  /** The in-memory dictionary of log files.
  Protected by: m_files_mutex. */
  Log_files_dict m_files{m_files_ctx};

  /** Number of existing unused files (those with _tmp suffix).
  Protected by: m_files_mutex. */
  size_t m_unused_files_count;

  /** Size of each unused redo log file, to which recently all unused
  redo log files became resized. Expressed in bytes. */
  os_offset_t m_unused_file_size;

  /** Capacity limits for the redo log. Responsible for resize.
  Mutex protection is decided per each Log_files_capacity method. */
  Log_files_capacity m_capacity;

  /** True iff log_writer is waiting for a next log file available.
  Protected by: m_files_mutex. */
  bool m_requested_files_consumption;

  /** Statistics related to redo log files consumption and creation.
  Protected by: m_files_mutex. */
  Log_files_stats m_files_stats;

  /** Event used by log files governor thread to wait. */
  os_event_t m_files_governor_event;

  /** Event used by other threads to wait until log files governor finished
  its next iteration. This is useful when some sys_var gets changed to wait
  until log files governor re-computed everything and then check if the
  concurrency_margin is safe to emit warning if needed (the warning would
  still belong to the sys_var's SET GLOBAL statement then). */
  os_event_t m_files_governor_iteration_event;

  /** False if log files governor thread is allowed to add new redo records.
  This is set as intention, to tell the log files governor about what it is
  allowed to do. To ensure that the log_files_governor is aware of what has
  been told, user needs to wait on @see m_no_more_dummy_records_promised. */
  std::atomic_bool m_no_more_dummy_records_requested;

  /** False if the log files governor thread is allowed to add new dummy redo
  records. This is set to true only by the log_files_governor thread, and
  after it observed @see m_no_more_dummy_records_requested being true.
  It can be used to wait until the log files governor thread promises not to
  generate any more dummy redo records. */
  std::atomic_bool m_no_more_dummy_records_promised;

#ifdef UNIV_DEBUG
  /** THD used by the log_files_governor thread. */
  THD *m_files_governor_thd;
#endif /* UNIV_DEBUG */

  /** Event used for waiting on next file available. Used by log writer
  thread to wait when it needs to produce a next log file but there are
  no free (consumed) log files available. */
  os_event_t m_file_removed_event;

  /** Buffer that contains encryption meta data encrypted with master key.
  Protected by: m_files_mutex */
  byte m_encryption_buf[OS_FILE_LOG_BLOCK_SIZE];

#endif /* !UNIV_HOTBACKUP */

  /** Encryption metadata. This member is passed to Log_file_handle objects
  created for redo log files. In particular, the m_current_file_handle has
  a reference to this field. When encryption metadata is updated, it needs
  to be written to the redo log file's header. Also, each write performed
  by the log_writer thread needs to use m_encryption_metadata (it's passed
  by reference to the m_current_file_handle) and the log_writer does not
  acquire m_files_mutex for its writes (it is a hot path and it's better to
  keep it shorter). Therefore it's been decided that updates of this field
  require both m_files_mutex and writer_mutex.
  Protected by: m_files_mutex, writer_mutex */
  Encryption_metadata m_encryption_metadata;

#ifndef UNIV_HOTBACKUP

  /** @} */

  /**************************************************/ /**
   @name Consumers
  *******************************************************/

  /** @{ */

  /** Set of registered redo log consumers. Note, that this object
  is not responsible for freeing them (does not claim to be owner).
  If you wanted to register or unregister a redo log consumer, then
  please use following functions: @see log_consumer_register() and
  @see log_consumer_unregister(). The details of implementation
  related to redo log consumers can be found in log0consumer.cc.
  Protected by: m_files_mutex (unless it is the startup phase or
  the shutdown phase). */
  ut::unordered_set<Log_consumer *> m_consumers;

  /** @} */

  /**************************************************/ /**
   @name Maintenance
  *******************************************************/

  /** @{ */

  /** Used for stopping the log background threads. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) std::atomic_bool should_stop_threads;

  /** Event used for pausing the log writer threads. */
  os_event_t writer_threads_resume_event;

  /** Used for resuming write notifier thread */
  atomic_lsn_t write_notifier_resume_lsn;

  /** Used for resuming flush notifier thread */
  atomic_lsn_t flush_notifier_resume_lsn;

  /** Number of total I/O operations performed when we printed
  the statistics last time. */
  mutable uint64_t n_log_ios_old;

  /** Wall time when we printed the statistics last time. */
  mutable time_t last_printout_time;

  /** @} */

  /**************************************************/ /**
   @name Recovery
  *******************************************************/

  /** @{ */

  /** Lsn from which recovery has been started. */
  lsn_t recovered_lsn;

  /** Format of the redo log: e.g., Log_format::CURRENT. */
  Log_format m_format;

  /** Log creator name */
  std::string m_creator_name;

  /** Log flags */
  Log_flags m_log_flags;

  /** Log UUID */
  Log_uuid m_log_uuid;

  /** Used only in recovery: recovery scan succeeded up to this lsn. */
  lsn_t m_scanned_lsn;

#ifdef UNIV_DEBUG
  /** When this is set, writing to the redo log should be disabled.
  We check for this in functions that write to the redo log. */
  bool disable_redo_writes;

  /** DEBUG only - if we copied or initialized the first block in buffer,
  this is set to lsn for which we did that. We later ensure that we start
  the redo log at the same lsn. Else it is zero and we would crash when
  trying to start redo then. */
  lsn_t first_block_is_correct_for_lsn;
#endif /* UNIV_DEBUG */

  /** @} */

  /**************************************************/ /**
   @name Fields protected by the log_limits_mutex.
         Related to free space in the redo log.
  *******************************************************/

  /** @{ */

  /** Mutex which protects fields: available_for_checkpoint_lsn,
  requested_checkpoint_lsn. It also synchronizes updates of:
  free_check_limit_sn, concurrency_margin, dict_persist_margin.
  It protects reads and writes of m_writer_inside_extra_margin.
  It also protects the srv_checkpoint_disabled (together with the
  checkpointer_mutex). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) mutable ib_mutex_t limits_mutex;

  /** A new checkpoint could be written for this lsn value.
  Up to this lsn value, all dirty pages have been added to flush
  lists and flushed. Updated in the log checkpointer thread by
  taking minimum oldest_modification out of the last dirty pages
  from each flush list. However it will not be bigger than the
  current value of log.buf_dirty_pages_added_up_to_lsn.
  Read by: user threads when requesting fuzzy checkpoint
  Read by: log_print() (printing status of redo)
  Updated by: log_checkpointer
  Protected by: limits_mutex. */
  lsn_t available_for_checkpoint_lsn;

  /** When this is larger than the latest checkpoint, the log checkpointer
  thread will be forced to write a new checkpoint (unless the new latest
  checkpoint lsn would still be smaller than this value).
  Read by: log_checkpointer
  Updated by: user threads (log_free_check() or for sharp checkpoint)
  Protected by: limits_mutex. */
  lsn_t requested_checkpoint_lsn;

  /** Maximum lsn allowed for checkpoint by dict_persist or zero.
  This will be set by dict_persist_to_dd_table_buffer(), which should
  be always called before really making a checkpoint.
  If non-zero, up to this lsn value, dynamic metadata changes have been
  written back to mysql.innodb_dynamic_metadata under dict_persist->mutex
  protection. All dynamic metadata changes after this lsn have to
  be kept in redo logs, but not discarded. If zero, just ignore it.
  Updated by: DD (when persisting dynamic meta data)
  Updated by: log_checkpointer (reset when checkpoint is written)
  Protected by: limits_mutex. */
  lsn_t dict_max_allowed_checkpoint_lsn;

  /** If should perform checkpoints every innodb_log_checkpoint_every ms.
  Disabled during startup / shutdown. Enabled in srv_start_threads.
  Updated by: starting thread (srv_start_threads)
  Read by: log_checkpointer */
  bool periodical_checkpoints_enabled;

  /** If checkpoints are allowed. When this is set to false, neither new
  checkpoints might be written nor lsn available for checkpoint might be
  updated. This is useful in recovery period, when neither flush lists can
  be trusted nor DD dynamic metadata redo records might be reclaimed.
  This is never set from true to false after log_start(). */
  std::atomic_bool m_allow_checkpoints;

  /** Maximum sn up to which there is free space in the redo log.
  Threads check this limit and compare to current log.sn, when they
  are outside mini-transactions and hold no latches. The formula used
  to compute the limitation takes into account maximum size of mtr and
  thread concurrency to include proper margins and avoid issues with
  race condition (in which all threads check the limitation and then
  all proceed with their mini-transactions). Also extra margin is
  there for dd table buffer cache (dict_persist_margin).
  Read by: user threads (log_free_check())
  Updated by: log_checkpointer (after update of checkpoint_lsn)
  Updated by: log_writer (after pausing/resuming user threads)
  Updated by: DD (after update of dict_persist_margin)
  Protected by (updates only): limits_mutex. */
  atomic_sn_t free_check_limit_sn;

  /** Margin used in calculation of @see free_check_limit_sn.
  Protected by (updates only): limits_mutex. */
  atomic_sn_t concurrency_margin;

  /** True iff current concurrency_margin isn't truncated because of too
  small redo log capacity.
  Protected by (updates only): limits_mutex. */
  std::atomic<bool> concurrency_margin_is_safe;

  /** Margin used in calculation of @see free_check_limit_sn.
  Read by: page_cleaners, log_checkpointer
  Updated by: DD
  Protected by (updates only): limits_mutex. */
  atomic_sn_t dict_persist_margin;

  /** @} */

  /**************************************************/ /**
   @name Log checkpointer thread
  *******************************************************/

  /** @{ */

  /** Event used by the log checkpointer thread to wait for requests. */
  alignas(ut::INNODB_CACHE_LINE_SIZE) os_event_t checkpointer_event;

  /** Mutex which can be used to pause log checkpointer thread.
  This is used by log_position_lock() together with log_buffer_x_lock(),
  to pause any changes to current_lsn or last_checkpoint_lsn. */
  mutable ib_mutex_t checkpointer_mutex;

  /** Latest checkpoint lsn.
  Read by: user threads, log_print (no protection)
  Read by: log_writer (under writer_mutex)
  Updated by: log_checkpointer (under both mutexes)
  Protected by (updates only): checkpointer_mutex + writer_mutex. */
  atomic_lsn_t last_checkpoint_lsn;

  /** Next checkpoint header to use.
  Updated by: log_checkpointer
  Protected by: checkpointer_mutex */
  Log_checkpoint_header_no next_checkpoint_header_no;

  /** Event signaled when last_checkpoint_lsn is advanced by
  the log_checkpointer thread. */
  os_event_t next_checkpoint_event;

  /** Latest checkpoint wall time.
  Used by (private): log_checkpointer. */
  Log_clock_point last_checkpoint_time;

  /** Redo log consumer which is always registered and which is responsible
  for protecting redo log records at lsn >= last_checkpoint_lsn. */
  Log_checkpoint_consumer m_checkpoint_consumer{*this};

#ifdef UNIV_DEBUG
  /** THD used by the log_checkpointer thread. */
  THD *m_checkpointer_thd;
#endif /* UNIV_DEBUG */

  /** @} */

#endif /* !UNIV_HOTBACKUP */
};

To prevent holes in the log, a data structure called Link_buf is used:

template <typename Position = uint64_t>
class Link_buf {
 public:
  /** Type used to express distance between two positions.
  It could become a parameter of template if it was useful.
  However there is no such need currently. */
  typedef Position Distance;

  /** Constructs the link buffer. Allocated memory for the links.
  Initializes the tail pointer with 0.
  @param[in] capacity    number of slots in the ring buffer */
  explicit Link_buf(size_t capacity);

  Link_buf();

  Link_buf(Link_buf &&rhs);

  Link_buf(const Link_buf &rhs) = delete;

  Link_buf &operator=(Link_buf &&rhs);

  Link_buf &operator=(const Link_buf &rhs) = delete;

  /** Destructs the link buffer. Deallocates memory for the links. */
  ~Link_buf();

  /** Add a directed link between two given positions. It is user's
  responsibility to ensure that there is space for the link. This is
  because it can be useful to ensure much earlier that there is space.
  @param[in] from    position where the link starts
  @param[in] to      position where the link ends (from -> to) */
  void add_link(Position from, Position to);

  /** Add a directed link between two given positions. It is user's
  responsibility to ensure that there is space for the link. This is
  because it can be useful to ensure much earlier that there is space.
  In addition, advances the tail pointer in the buffer if possible.
  @param[in] from    position where the link starts
  @param[in] to      position where the link ends (from -> to) */
  void add_link_advance_tail(Position from, Position to);

  /** Advances the tail pointer in the buffer by following connected
  path created by links. Starts at current position of the pointer.
  Stops when the provided function returns true.
  @param[in] stop_condition  function used as a stop condition;
                             (lsn_t prev, lsn_t next) -> bool;
                             returns false if we should follow
                             the link prev->next, true to stop
  @param[in] max_retry       max fails to retry
  @return true if and only if the pointer has been advanced */
  template <typename Stop_condition>
  bool advance_tail_until(Stop_condition stop_condition,
                          uint32_t max_retry = 1);

  /** Advances the tail pointer in the buffer without additional
  condition for stop. Stops at missing outgoing link.
  @see advance_tail_until()
  @return true if and only if the pointer has been advanced */
  bool advance_tail();

  /** @return capacity of the ring buffer */
  size_t capacity() const;

  /** @return the tail pointer */
  Position tail() const;

  /** Checks if there is space to add link at given position.
  User has to use this function before adding the link, and
  should wait until the free space exists.
  @param[in] position    position to check
  @return true if and only if the space is free */
  bool has_space(Position position);

  /** Validates (using assertions) that there are no links set
  in the range [begin, end). */
  void validate_no_links(Position begin, Position end);

  /** Validates (using assertions) that there no links at all. */
  void validate_no_links();

 private:
  /** Translates position expressed in original unit to position
  in the m_links (which is a ring buffer).
  @param[in] position    position in original unit
  @return position in the m_links */
  size_t slot_index(Position position) const;

  /** Computes next position by looking into slots array and
  following single link which starts in provided position.
  @param[in]  position    position to start
  @param[out] next        computed next position
  @return false if there was no link, true otherwise */
  bool next_position(Position position, Position &next);

  /** Deallocated memory, if it was allocated. */
  void free();

  /** Capacity of the buffer. */
  size_t m_capacity;

  /** Pointer to the ring buffer (unaligned). */
  std::atomic<Distance> *m_links;

  /** Tail pointer in the buffer (expressed in original unit). */
  alignas(ut::INNODB_CACHE_LINE_SIZE) std::atomic<Position> m_tail;
};

The log subsystem is fairly complex: there is the relationship between the redo log and the LSN (log sequence number), the parallel operations introduced in 8.0, and so on. Making redo log commits lock-free means the log buffer is written concurrently, and Link_buf was introduced precisely to handle the holes that such concurrent writes to the log buffer can leave behind. This feels somewhat similar to how writes are ordered in consensus protocols.

六、Summary

This article analyzed the basic data structures behind InnoDB's in-memory buffers; the code for the memory-usage workflow and data-structure management will be covered in the next article. As seen in the earlier analysis of MEM_ROOT, MEM_ROOT handles memory management at the Server layer, including shared and per-thread memory, whereas what we analyzed here is memory management at the InnoDB storage engine layer. Underneath, however, both call the same memory allocation interfaces (sbrk, mmap, etc.). Many of these data structures can be hard to tell apart when first encountered, but a careful look at the design level makes their respective application scenarios clear. They are all turning screws, so to speak; some are assembling televisions, others refrigerators.
Knowing this also explains why the memory used by the MySQL server itself is far larger than the configured size of the InnoDB buffer pool.
Tracing things back to their source and untangling the threads is the simplest and most effective way to read code.
