AbiWord Documentation

[Repost]

Introduction

One of the major bits of AbiWord word processing code is the Piece Table.

PieceTable

Todo:
Add more class names / links to sources.

Introduction

A pt_PieceTable is the data structure used to represent the document. It presents an interface to access the document content as a sequence of (Unicode) characters. It includes an interface to access document structure and formatting information. It provides efficient editing operations, complete undo, and crash recovery.

Overview

The PieceTable consists of the following classes:

  1. InitialBuffer -- This is a read-only character array consisting of the entire character content of the document and initially read from the disk. (All XML tags and other non-content items are omitted from this buffer.)
  2. ChangeBuffer -- This is an append-only character array consisting of all character content inserted into the document during the editing session.
  3. InitialAttrPropTable -- This is a read-only table of Attribute/Property structures extracted from the original document.
  4. ChangeAttrPropTable -- This is an append-only table of Attribute/Property structures that are created during the editing session.

    In AbiWord, the items above are mainly defined in the pt_VarSet class.

  5. Piece -- This class represents a piece of the sequence of the document; that is, a contiguous sub-sequence having the same properties, such as a span of text or an object (such as an in-line image). It contains links to the previous and next Pieces in the document. Pieces are created in response to editing and formatting commands. In AbiWord, a Piece corresponds to the abstract class pf_Frag; a Frag can be text, an image, and so on.
    1. TextPiece -- This subclass represents a span of contiguous text in one of the buffers. All text within the span has the same (CSS) properties. A TextPiece is not necessarily the longest contiguous span; it is possible to have adjacent (both in order and in buffer position) TextPieces with the same properties. A TextPiece contains a buffer offset and length for the location and size of the text and a flag to indicate which buffer. A TextPiece contains (or contains a link to) the text formatting information. Note that the buffer offset only gives the location of the content of the span in one of the buffers; it does not specify the absolute position of the span in the document.

      In AbiWord this is the pf_Frag_Text class. All text in one such fragment must have the same CSS properties.

    2. ObjectPiece -- This subclass represents an in-line object or image. It has no references to the buffers, but does provide a place-holder in the sequence.

      In AbiWord this is the pf_Frag_Object class. It can be an image, among other things.

    3. StructurePiece -- This subclass represents a section or paragraph. It has no references to the buffers, but does provide (CSS) style information and a place-holder in the sequence. There are no links between StructurePieces or between other Pieces and their (containing) StructurePieces.

      In AbiWord this is the pf_Frag_Strux class. It covers paragraphs, sections, and so on.

  6. PieceList -- This is a doubly-linked list of Pieces. They are linked in document order. A forward traversal of this list will reveal the entire content of the document; in doing so, it may wildly jump around both of the buffers, but that is not an issue.

    In AbiWord this is the pf_Fragments class. It is the container for all pf_Frag objects, implemented as a simple doubly-linked list.

  7. PX_ChangeRecord -- Each editing and formatting change is represented as a ChangeRecord. A ChangeRecord represents an atomic change that was made to one or more pieces. This includes offset/length changes to a TextPiece and changes to the PieceList.
  8. ChangeVector -- This is a vector of ChangeRecords. This is used like a stack. ChangeRecords are appended to the vector (pushed onto the stack) as they are created in response to editing and formatting commands. The undo operation takes the last ChangeRecord in the vector and un-does its effect. A redo operation re-applies the ChangeRecord. The ChangeVector holds the complete information to undo all editing back to the initial document. The index of the current position in the ChangeVector is maintained. ChangeRecords are not removed from the vector until the redo is invalidated. When a ChangeRecord is removed from the vector, it is deleted.

    In AbiWord this is the px_ChangeHistory class. It performs operations such as undo and redo.
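
To make the relationship between the two buffers and the PieceList concrete, here is a minimal C++ sketch. It is not AbiWord's code: the names Piece and MiniPieceTable are invented for illustration, std::string stands in for the UT_UCSChar buffers, and formatting, ObjectPieces and StructurePieces are left out. It only shows that a forward traversal of the pieces reconstructs the document text.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical names for illustration; these are not AbiWord's classes.
struct Piece {
    bool inInitialBuffer;   // which of the two buffers holds the span
    std::size_t offset;     // start of the span within that buffer
    std::size_t length;     // number of characters in the span
};

struct MiniPieceTable {
    std::string initialBuffer;    // treated as read-only after loading
    std::string changeBuffer;     // treated as append-only while editing
    std::list<Piece> pieceList;   // pieces in document order

    explicit MiniPieceTable(std::string original)
        : initialBuffer(std::move(original)) {
        // A freshly loaded document is one piece covering the whole buffer.
        if (!initialBuffer.empty())
            pieceList.push_back(Piece{true, 0, initialBuffer.size()});
    }

    // Forward traversal: concatenate the span that each piece refers to.
    std::string text() const {
        std::string out;
        for (const Piece& p : pieceList) {
            const std::string& buf = p.inInitialBuffer ? initialBuffer : changeBuffer;
            assert(p.offset + p.length <= buf.size());
            out.append(buf, p.offset, p.length);
        }
        return out;
    }
};
```

For example, constructing a MiniPieceTable from "Hello world", appending "big " to changeBuffer, and replacing the single piece with the three pieces {initial, 0, 6}, {change, 0, 4}, {initial, 6, 5} makes text() return "Hello big world" while initialBuffer stays unchanged.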

Operations

  1. Insert(position,bAfter,c) -- To insert one or more characters c into the document (either before or after) the absolute document position position, we do the following:

    1. Append the character(s) to the ChangeBuffer.
    2. Find the TextPiece that spans the document position.
      • If the document position is in the middle of a TextPiece (p1), we split it into two TextPieces (p1a and p1c) and create a third TextPiece (p1b). p1a and p1c contain the left and right portions referenced in p1; p1b spans the newly-inserted character(s). The PieceList is updated so that the sequence p1a, p1b, p1c replaces p1 in the list.
      • If the document position is at the end of a TextPiece and the buffer position in either buffer is contiguous with the buffer and position referenced in the TextPiece and the formatting is the same, we may avoid the three part split and simply update the offset/length in the TextPiece. This case is very likely when the user is composing text or is undoing a delete.
      • If the document position is between Pieces, a new TextPiece is created and inserted into the PieceList.
    3. Create a ChangeRecord and append it to the ChangeVector. For an insert, we construct a ChangeRecord of type InsertSpan.
      • cr.span.m_documentOffset contains the document position of the insertion.
      • cr.span.m_span marks the buffer position of the text that was inserted.
      • cr.span.m_bAfter remembers whether the insertion was before or after the document position.
  2. Delete(position,bAfter,length) -- To delete one or more characters from the document (either before or after) the absolute document position position, we do the following:
    1. Find the TextPiece that spans the document position.

      • If the length of characters is contained within the TextPiece (p1), we split it into two TextPieces (p1a and p1b). The offsets and lengths are set in the new TextPieces such that the deleted sequence is not in either piece. (The deleted text is not actually deleted from the buffer; there are just no references to it from the PieceList.)
      • If the document position is at the beginning or end of a TextPiece, we can just adjust the offset/length, rather than doing the split.
      • If the deletion extends over multiple Pieces, we iterate over each piece in the range and perform a delete on the sub-sequence. This will result in a multi-step ChangeRecord.
      • TODO what about non-TextPieces??
    2. Create a ChangeRecord and append it to the ChangeVector. For a delete, we construct a ChangeRecord of type DeleteSpan.
      • cr.span.m_documentOffset contains the document position of the deletion.
      • cr.span.m_span marks the buffer position of the text that was deleted.
      • cr.span.m_bAfter remembers whether the deletion was before or after the document position.
  3. InsertFormatting()
  4. ChangeFormatting()
  5. Undo -- This can be implemented using the information in the ChangeVector. If the CurrentPosition in the ChangeVector is greater than zero, we have undo information. The information in the ChangeRecord prior to the CurrentPosition is used to undo the editing operation. After an undo the CurrentPosition is decremented.
    • If the ChangeRecord is of type InsertSpan: we perform a delete operation using cr.span.m_documentOffset, cr.span.m_span.m_length, and cr.span.m_bAfter.
    • If the ChangeRecord is of type DeleteSpan: we perform an insert operation using cr.span.m_documentOffset, cr.span.m_span, and cr.span.m_bAfter.
    • If the ChangeRecord is of type ChangeFormatting:
    • If the ChangeRecord is of type InsertFormatting:
  6. Redo -- This can be implemented using the information in the ChangeVector. If the CurrentPosition in the ChangeVector is less than the length of the ChangeVector, the redo has not been invalidated and may be applied. The information in the ChangeRecord at the CurrentPosition provides complete information to describe the editing operation to be redone. After a redo the CurrentPosition is advanced.
  7. Autosave -- This can be implemented by periodically writing the ChangeBuffer, ChangeVector, and the ChangeAttrPropTable to temporary files. After a crash, the original document and the temporary files could be used to replay the editing operations and reconstruct the modified document.
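
The Insert and Delete steps above can be sketched as follows. This is an illustrative simplification, not AbiWord's implementation: SketchTable and splitAt are invented names, std::string replaces the Unicode buffers, positions always mean "insert before", and no ChangeRecords are produced. Instead of handling each case separately, the sketch funnels every case through one helper that makes the document position a piece boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Hypothetical sketch; names and layout are assumptions, not AbiWord's code.
struct Piece {
    bool inInitial;        // true: InitialBuffer, false: ChangeBuffer
    std::size_t offset;    // buffer offset of the span
    std::size_t length;    // length of the span
};

class SketchTable {
public:
    explicit SketchTable(const std::string& original) : initial_(original) {
        if (!original.empty()) pieces_.push_back(Piece{true, 0, original.size()});
    }

    // Insert s before document position pos.
    void insert(std::size_t pos, const std::string& s) {
        if (s.empty()) return;
        const std::size_t bufOffset = change_.size();
        change_ += s;                      // 1. append to the ChangeBuffer
        auto it = splitAt(pos);            // 2. p1 becomes p1a and p1c if needed
        if (it != pieces_.begin()) {
            Piece& prev = *std::prev(it);
            // Typing case: new text is contiguous with the previous piece,
            // so only its length is updated (no formatting in this sketch).
            if (!prev.inInitial && prev.offset + prev.length == bufOffset) {
                prev.length += s.size();
                return;
            }
        }
        pieces_.insert(it, Piece{false, bufOffset, s.size()});   // p1b
    }

    // Delete len characters starting at document position pos.
    void erase(std::size_t pos, std::size_t len) {
        auto first = splitAt(pos);
        auto last = splitAt(pos + len);
        pieces_.erase(first, last);   // buffers untouched; references dropped
    }

    std::string text() const {
        std::string out;
        for (const Piece& p : pieces_)
            out.append(p.inInitial ? initial_ : change_, p.offset, p.length);
        return out;
    }

    std::size_t pieceCount() const { return pieces_.size(); }

private:
    // Ensure a piece boundary exists at pos; return the piece starting there
    // (or end() when pos is the end of the document).
    std::list<Piece>::iterator splitAt(std::size_t pos) {
        std::size_t start = 0;
        for (auto it = pieces_.begin(); it != pieces_.end(); ++it) {
            if (pos == start) return it;
            if (pos < start + it->length) {
                const std::size_t left = pos - start;
                Piece right{it->inInitial, it->offset + left, it->length - left};
                it->length = left;
                return pieces_.insert(std::next(it), right);
            }
            start += it->length;
        }
        assert(pos == start && "position past end of document");
        return pieces_.end();
    }

    std::string initial_;
    std::string change_;
    std::list<Piece> pieces_;
};
```

Starting from "Hello world", insert(6, "big ") yields three pieces; a following insert(10, "red ") lands at the end of the change piece and is absorbed by extending its length, so the piece count stays at three. A deletion that covers several pieces simply removes every piece between the two boundaries.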

Observations

  1. The content of the original file is never modified. Pieces in the PieceList describe the current document; the original content is referenced in a random access fashion. For systems with small memory or for very large documents, it may be worth demand loading blocks of the original content rather than loading it completely into the InitialBuffer.
  2. Document content data (in the two buffers) are never moved once written. Insert and delete operations change the Pieces in the PieceList, but do not move or change the contents of the two buffers.
  3. The result of an undo operation must produce the identical document structure and content. Since consecutive Pieces in the PieceList may have the same formatting properties and may refer to contiguous buffer locations (there is no requirement to coalesce them), an undo operation may produce a different PieceList than we originally had prior to doing the operation that was undone.
    • TODO Check this. Whether the PieceList should be identical or equivalent.
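
The difference between an identical and an equivalent PieceList can be illustrated with a small sketch. The functions coalesce, identical and equivalent are hypothetical helpers, not part of AbiWord, and formatting is ignored: two lists are treated as equivalent here when they reference the same buffer locations in the same order once adjacent contiguous pieces are merged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; formatting is ignored.
struct Piece {
    bool inInitial;
    std::size_t offset;
    std::size_t length;
};
using PieceList = std::vector<Piece>;

// Merge neighbours that refer to contiguous locations in the same buffer.
PieceList coalesce(const PieceList& in) {
    PieceList out;
    for (const Piece& p : in) {
        if (p.length == 0) continue;   // empty pieces contribute nothing
        if (!out.empty() && out.back().inInitial == p.inInitial &&
            out.back().offset + out.back().length == p.offset) {
            out.back().length += p.length;
        } else {
            out.push_back(p);
        }
    }
    return out;
}

// Same pieces, piece for piece.
bool identical(const PieceList& a, const PieceList& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].inInitial != b[i].inInitial || a[i].offset != b[i].offset ||
            a[i].length != b[i].length) {
            return false;
        }
    }
    return true;
}

// Same buffer references after merging contiguous neighbours.
bool equivalent(const PieceList& a, const PieceList& b) {
    return identical(coalesce(a), coalesce(b));
}
```

With this definition, the single piece {initial, 0, 11} and the pair {initial, 0, 6}, {initial, 6, 5} that an insert followed by an undo might leave behind are equivalent but not identical.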

Issues

  1. TextPieces represent spans of text that are convenient for the structure of the document and a result of the sequence of editing operations. They are not optimized for layout or display.

    • We can provide access methods to return a const char * into the buffers along with a length, which the caller could use in text drawing or measuring calls. These would not be C-style, zero-terminated strings.
  2. Mapping an absolute document position to a Piece involves a linear search of the PieceList to compute the absolute document position and find the correct Piece. The number of Pieces in a document is a function of the number of editing operations that have been performed in the session and of the complexity of the structure and formatting of the original document. A linear search might be painfully slow.
    • TODO We will find a tree-structure to use instead of the doubly-linked list that will give us O(log(n)) searching.
    • TODO Consider caching the last few lookup results so that we can avoid doing a search if possible. This should have a high hit-rate when the user is composing text.
  3. We provide a complete, but first-order, undo with redo. That is, we do not record the undo operation itself in the undo history (as Emacs does).
  4. TODO The before and after stuff on insert and delete is a bit of a hand-wave.
  5. TODO Need to add multi-step-undo so that delete operations which span multiple pieces can be presented to the user as a single operation.
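
The lookup cost and the caching idea from item 2 can be sketched as follows. This is only one possible reading of the TODO: PieceLocator, Hit and the single-entry cache are assumptions made for illustration, and pieces carry only a length because that is all the position search needs.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of a linear lookup with a one-entry cache.
struct Piece {
    std::size_t length;
};

struct Hit {
    std::size_t index;   // index of the piece in the list
    std::size_t start;   // absolute document position where it starts
};

class PieceLocator {
public:
    explicit PieceLocator(std::vector<Piece> pieces) : pieces_(std::move(pieces)) {}

    // Returns true and fills hit with the piece containing pos.
    bool find(std::size_t pos, Hit& hit) {
        if (cacheValid_ && contains(cache_, pos)) {
            ++cacheHits_;            // no search needed
            hit = cache_;
            return true;
        }
        std::size_t start = 0;       // linear search from the first piece
        for (std::size_t i = 0; i < pieces_.size(); ++i) {
            if (pos < start + pieces_[i].length) {
                cache_ = Hit{i, start};
                cacheValid_ = true;
                hit = cache_;
                return true;
            }
            start += pieces_[i].length;
        }
        return false;                // pos is past the end of the document
    }

    // Any edit to the piece list must drop the cached result.
    void invalidate() { cacheValid_ = false; }

    std::size_t cacheHits() const { return cacheHits_; }

private:
    bool contains(const Hit& h, std::size_t pos) const {
        return pos >= h.start && pos < h.start + pieces_[h.index].length;
    }

    std::vector<Piece> pieces_;
    Hit cache_{0, 0};
    bool cacheValid_ = false;
    std::size_t cacheHits_ = 0;
};
```

Consecutive lookups that fall in the same piece, as happens while the user is composing text, are answered from the cache; a lookup elsewhere falls back to the linear search. A tree keyed by cumulative length would be the way to reach the O(log(n)) bound mentioned above, which this sketch does not attempt.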

Code

class PT_PieceTable
{
    const UT_UCSChar * m_InitialBuffer;
    const UT_UCSChar * m_ChangeBuffer;
    pt_PieceList * m_pieceList;
    pt_AttrPropTable m_InitialAttrPropTable;
    pt_AttrPropTable m_ChangeAttrPropTable;
    ...
};

class pt_Piece
{
    enum PieceType { TextPiece, ObjectPiece, StructurePiece };
    PieceType m_pieceType;
    <linked-list or tree pointers>
    ...
};

class pt_Span
{
    UT_Bool m_bInInitialBuffer;
    UT_uint32 m_offset;
    UT_uint32 m_length;
};

class pt_TextPiece : public pt_Piece
{
    pt_Span m_span;
    pt_AttrPropReference m_apr;
    ...
};

class pt_ObjectPiece : public pt_Piece
{
    ...
};

class pt_StructurePiece : public pt_Piece
{
    pt_AttrPropReference m_apr;
    ...
};

class pt_PieceList
{
    <container for linked-list or tree structure>
    ...
};

class pt_AttrPropReference
{
    UT_Bool m_bInInitialTable;
    UT_uint32 m_index;
    ...
};

class pt_AttrProp
{
    UT_HashTable * m_pAttributes;
    UT_HashTable * m_pProperties;
    ...
};

class pt_AttrPropTable
{
    UT_vector<pt_AttrProp *> m_Table;
    ...
};

class pt_ChangeRecord
{
    UT_Bool m_bMultiStepStart;
    UT_Bool m_bMultiStepEnd;
    enum ChangeType { InsertSpan, DeleteSpan, ChangeFormatting, InsertFormatting, ... };
    struct {
        UT_uint32 m_documentOffset;
        UT_Bool m_bAfter;
        pt_Span m_span;
    } span;
    struct {
        UT_uint32 m_documentOffset1;
        UT_uint32 m_documentOffset2;
        pt_AttrPropReference m_apr;
    } fmt;
    ...
};

class pt_ChangeVector
{
    UT_vector m_vecChangeRecords;
    UT_uint32 m_undoPosition;
    ...
};
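
The way a ChangeVector and its undo position behave under Undo and Redo can be tried out with a small self-contained sketch. It makes several simplifying assumptions that are not in the design above: a plain std::string stands in for the piece table, each record stores the affected text itself rather than a buffer span, and the names UndoSketch and ChangeRecord are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a plain std::string stands in for the piece table.
struct ChangeRecord {
    enum Type { InsertSpan, DeleteSpan } type;
    std::size_t documentOffset;
    std::string text;   // a real record would hold a buffer span instead
};

class UndoSketch {
public:
    void insert(std::string& doc, std::size_t pos, const std::string& s) {
        assert(pos <= doc.size());
        record(ChangeRecord{ChangeRecord::InsertSpan, pos, s});
        doc.insert(pos, s);
    }

    void erase(std::string& doc, std::size_t pos, std::size_t len) {
        assert(pos <= doc.size());
        record(ChangeRecord{ChangeRecord::DeleteSpan, pos, doc.substr(pos, len)});
        doc.erase(pos, len);
    }

    // Undo the record just before the current position, then step back.
    bool undo(std::string& doc) {
        if (undoPosition_ == 0) return false;
        apply(doc, records_[--undoPosition_], true);
        return true;
    }

    // Re-apply the record at the current position, then step forward.
    bool redo(std::string& doc) {
        if (undoPosition_ == records_.size()) return false;
        apply(doc, records_[undoPosition_++], false);
        return true;
    }

private:
    // A new change invalidates any pending redo: drop the records after
    // the current position before pushing the new one.
    void record(ChangeRecord cr) {
        records_.erase(records_.begin() + static_cast<std::ptrdiff_t>(undoPosition_),
                       records_.end());
        records_.push_back(std::move(cr));
        undoPosition_ = records_.size();
    }

    static void apply(std::string& doc, const ChangeRecord& cr, bool inverse) {
        const bool doInsert = (cr.type == ChangeRecord::InsertSpan) != inverse;
        if (doInsert) {
            doc.insert(cr.documentOffset, cr.text);
        } else {
            doc.erase(cr.documentOffset, cr.text.size());
        }
    }

    std::vector<ChangeRecord> records_;
    std::size_t undoPosition_ = 0;
};
```

Starting from "Hello world", inserting "big " at position 6 and then deleting the first six characters gives "big world"; two undos restore "Hello world", one redo returns to "Hello big world", and any new edit at that point discards the record that could otherwise have been redone.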

Reposted from: https://www.cnblogs.com/songtzu/p/3539739.html
