A Brief Analysis of the Android ART Oat File Format (Part 2)

In Part 1, we saw that OatFile's begin_ and end_ members are set to the locations given by the symbols oatdata and oatlastword. So what exactly is the data between them? That is what this article looks at.

Let's start with the implementation of OatFile::Setup:

bool OatFile::Setup() {
  if (!GetOatHeader().IsValid()) {
    LOG(WARNING) << "Invalid oat magic for " << GetLocation();
    return false;
  }
  ......

GetOatHeader? What is that? Here is the implementation of OatFile::GetOatHeader:

const OatHeader& OatFile::GetOatHeader() const {
  return *reinterpret_cast<const OatHeader*>(Begin());
}

Simple enough: it takes the memory that begin_ points to and casts it to an OatHeader. And what is an OatHeader? Let's dig out its definition, looking only at the member variables:

 private:
  uint8_t magic_[4];
  uint8_t version_[4];
  uint32_t adler32_checksum_;
  InstructionSet instruction_set_;
  uint32_t dex_file_count_;
  uint32_t executable_offset_;
  uint32_t interpreter_to_interpreter_bridge_offset_;
  uint32_t interpreter_to_compiled_code_bridge_offset_;
  uint32_t jni_dlsym_lookup_offset_;
  uint32_t portable_resolution_trampoline_offset_;
  uint32_t portable_to_interpreter_bridge_offset_;
  uint32_t quick_resolution_trampoline_offset_;
  uint32_t quick_to_interpreter_bridge_offset_;
  uint32_t image_file_location_oat_checksum_;
  uint32_t image_file_location_oat_data_begin_;
  uint32_t image_file_location_size_;
  uint8_t image_file_location_data_[0];  // note variable width data at end

  DISALLOW_COPY_AND_ASSIGN(OatHeader);
};

Let's compare this against the contents of a real file, again using the system Boot Oat (system@framework@boot.oat) as the example. The value of oatdata is 0x60a9d000, and the Program Header shows that the start of the ELF file is mapped at 0x60a9c000.

So the location oatdata points to sits at file offset 0x60a9d000 - 0x60a9c000 = 0x1000. A binary editor shows what is stored there.
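The subtraction above can be written as a small helper. This is only a sketch: the function and constant names are made up for illustration, and the segment file offset of 0 is an assumption based on the statement that the start of the ELF file is what gets mapped at 0x60a9c000.

```cpp
#include <cassert>
#include <cstdint>

// Converts a virtual address inside a mapped ELF segment to a file offset.
// segment_vaddr and segment_file_offset stand for the p_vaddr and p_offset
// values of the program header that covers the address.
inline uint32_t VaddrToFileOffset(uint32_t vaddr,
                                  uint32_t segment_vaddr,
                                  uint32_t segment_file_offset) {
  assert(vaddr >= segment_vaddr);  // the address must lie inside the segment
  return vaddr - segment_vaddr + segment_file_offset;
}

// Values taken from the boot.oat example in the text.
constexpr uint32_t kOatDataVaddr = 0x60a9d000;
constexpr uint32_t kSegmentVaddr = 0x60a9c000;
// Assumption: the segment starts at the very beginning of the file.
constexpr uint32_t kSegmentFileOffset = 0;
```

Calling VaddrToFileOffset(kOatDataVaddr, kSegmentVaddr, kSegmentFileOffset) yields 0x1000, the offset to look at in the binary editor.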

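Suddenly everything becomes clear: the so-called oat file is really a sub-file hidden inside an ELF file, with its own header and data format. Let's go through the actual data field by field:

1) First comes the oat file's magic code:

const uint8_t OatHeader::kOatMagic[] = { 'o', 'a', 't', '\n' };

2) Then the oat file's version number:

const uint8_t OatHeader::kOatVersion[] = { '0', '0', '7', '\0' };

3) Next is a checksum (adler32_checksum_). It is not important for now, so we skip it;

4) Next is the CPU instruction set the oat file targets. The possible values are:

enum InstructionSet {
  kNone,
  kArm,
  kThumb2,
  kX86,
  kMips
};

The value in the actual file is 2, which corresponds to the Thumb2 instruction set, one of the instruction sets supported by ARM processors.

5) Next is the number of Dex files. The oat file in our example contains 15 Dex files;

Isn't this supposed to be an oat file? Why are Dex files showing up? The answer is that an oat file is produced by converting Dex files, and the oat file also contains a complete copy of each original Dex file. This explains why a Dex file becomes much larger after being converted to an oat file. We will leave it at that for now and look at the details when we run into them later.

6) The offset of the executable part, whose value is 0x18CE000. Note that this offset is relative to the oat header, not to the ELF header. As mentioned earlier, oatdata is 0x60a9d000, so 0x60A9D000 + 0x18CE000 = 0x6236B000, which is exactly the address that oatexec points to.

This virtual address, 0x6236B000, is also exactly the start address of the segment described by the third Program Header entry, whose attributes are readable and executable (RE). Nice, everything lines up.

7) Next comes a batch of further offsets, which we will explain when we encounter them;

8) Right after those offsets comes some information about the Image. In the example file these fields are all 0, so we skip them.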
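The first few header fields walked through above can be read from a raw buffer roughly as follows. This is a minimal sketch, not ART code: OatHeaderPrefix, ReadOatHeaderPrefix and OatExecAddress are hypothetical names, and the sketch assumes that InstructionSet is stored as 4 bytes and that the host has the same byte order as the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the first six OatHeader fields listed above.
struct OatHeaderPrefix {
  uint8_t magic[4];
  uint8_t version[4];
  uint32_t adler32_checksum;
  uint32_t instruction_set;   // assumed to be a 4-byte enum value
  uint32_t dex_file_count;
  uint32_t executable_offset; // relative to the start of the oat data
};

// Copies the prefix out of `data` and checks the magic.
// Returns false if the buffer is too short or the magic does not match.
inline bool ReadOatHeaderPrefix(const uint8_t* data, size_t size,
                                OatHeaderPrefix* out) {
  assert(data != nullptr && out != nullptr);
  if (size < sizeof(OatHeaderPrefix)) {
    return false;
  }
  std::memcpy(out, data, sizeof(OatHeaderPrefix));
  static const uint8_t kMagic[4] = {'o', 'a', 't', '\n'};
  return std::memcmp(out->magic, kMagic, sizeof(kMagic)) == 0;
}

// Address of the executable part: oatdata plus the header's offset.
inline uint32_t OatExecAddress(uint32_t oatdata_vaddr,
                               const OatHeaderPrefix& header) {
  return oatdata_vaddr + header.executable_offset;
}
```

With the example values (oatdata at 0x60a9d000 and an executable offset of 0x18CE000), OatExecAddress returns 0x6236B000, matching the oatexec address above.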

Good. With OatHeader explained, let's go back to OatFile::Setup and keep reading:

  const byte* oat = Begin();
  oat += sizeof(OatHeader);
  if (oat > End()) {
    LOG(ERROR) << "In oat file " << GetLocation() << " found truncated OatHeader";
    return false;
  }

  oat += GetOatHeader().GetImageFileLocationSize();
  if (oat > End()) {
    LOG(ERROR) << "In oat file " << GetLocation() << " found truncated image file location: "
               << reinterpret_cast<const void*>(Begin())
               << "+" << sizeof(OatHeader)
               << "+" << GetOatHeader().GetImageFileLocationSize()
               << "<=" << reinterpret_cast<const void*>(End());
    return false;
  }
  ......

This code is easy to follow. The local variable oat starts at the beginning of the Oat header, is advanced by the size of the OatHeader struct, and is then advanced again by the length of the string that records the Image file location. At this point oat points just past the whole OatHeader, which should be the start of the data area. Let's continue:

  for (size_t i = 0; i < GetOatHeader().GetDexFileCount(); i++) {
    size_t dex_file_location_size = *reinterpret_cast<const uint32_t*>(oat);
    if (dex_file_location_size == 0U) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " with empty location name";
      return false;
    }
    oat += sizeof(dex_file_location_size);
    if (oat > End()) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " truncated after dex file location size";
      return false;
    }

    const char* dex_file_location_data = reinterpret_cast<const char*>(oat);
    oat += dex_file_location_size;
    if (oat > End()) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " with truncated dex file location";
      return false;
    }

    std::string dex_file_location(dex_file_location_data, dex_file_location_size);

    uint32_t dex_file_checksum = *reinterpret_cast<const uint32_t*>(oat);
    oat += sizeof(dex_file_checksum);
    if (oat > End()) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " for "<< dex_file_location
                 << " truncated after dex file checksum";
      return false;
    }

    uint32_t dex_file_offset = *reinterpret_cast<const uint32_t*>(oat);
    if (dex_file_offset == 0U) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " for "<< dex_file_location
                 << " with zero dex file offset";
      return false;
    }
    if (dex_file_offset > Size()) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " for "<< dex_file_location
                 << " with dex file offset" << dex_file_offset << " > " << Size();
      return false;
    }
    oat += sizeof(dex_file_offset);
    if (oat > End()) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " for "<< dex_file_location
                 << " truncated after dex file offset";
      return false;
    }
    ......
  }

The loop iterates over the Dex file count obtained earlier. To make it easier to follow, let's again look at the values in the actual file.

The first field is the size of the dex file location string. Here it is 0x21, that is, 33 characters. Right after it comes the location itself, here "/system/framework/core-libart.jar"; count them and you get exactly 33 characters. The next 4 bytes are the checksum of this dex file. After that comes the offset of the dex file relative to the oat header, 0xD1B0 in this example. Since that offset is relative to the oat header, and the oat header itself sits 0x1000 past the ELF header, the data should be at file offset 0xE1B0, so that is where we look.
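The per-dex record just described (location size, location string, checksum, dex file offset) can be parsed with a bounds-checked cursor as sketched below. OatDexRecord, ReadU32 and ReadOatDexRecord are hypothetical names, not ART's; the sketch mirrors the checks in the loop above and assumes the host byte order matches the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical holder for the fixed part of one per-dex record.
struct OatDexRecord {
  std::string location;
  uint32_t checksum = 0;
  uint32_t dex_file_offset = 0;  // relative to the start of the oat data
};

// Reads a uint32_t at *pos and advances *pos; returns false on truncation.
inline bool ReadU32(const uint8_t* data, size_t size, size_t* pos,
                    uint32_t* out) {
  if (*pos > size || size - *pos < sizeof(uint32_t)) {
    return false;
  }
  std::memcpy(out, data + *pos, sizeof(uint32_t));
  *pos += sizeof(uint32_t);
  return true;
}

// Parses one record starting at *pos inside the oat data [data, data + size).
inline bool ReadOatDexRecord(const uint8_t* data, size_t size, size_t* pos,
                             OatDexRecord* out) {
  assert(data != nullptr && pos != nullptr && out != nullptr);
  uint32_t location_size = 0;
  if (!ReadU32(data, size, pos, &location_size) || location_size == 0) {
    return false;  // truncated, or empty location name
  }
  if (size - *pos < location_size) {
    return false;  // truncated location string
  }
  out->location.assign(reinterpret_cast<const char*>(data + *pos),
                       location_size);
  *pos += location_size;
  if (!ReadU32(data, size, pos, &out->checksum)) {
    return false;
  }
  if (!ReadU32(data, size, pos, &out->dex_file_offset)) {
    return false;
  }
  // A zero offset or one beyond the oat data is rejected, as in Setup().
  return out->dex_file_offset != 0 && out->dex_file_offset <= size;
}
```

For the example record, the cursor advances by 4 + 33 + 4 + 4 = 45 bytes and ends up at the start of the offsets array discussed next.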

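See that? Doesn't it look very familiar? As the old verse goes, after searching high and low you turn around and find it right there in the dim lantern light: this is where the original, pre-conversion dex file lives. Now let's analyze the rest of the code:

    const uint8_t* dex_file_pointer = Begin() + dex_file_offset;
    if (!DexFile::IsMagicValid(dex_file_pointer)) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " for "<< dex_file_location
                 << " with invalid dex file magic: " << dex_file_pointer;
      return false;
    }
    if (!DexFile::IsVersionValid(dex_file_pointer)) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " for "<< dex_file_location
                 << " with invalid dex file version: " << dex_file_pointer;
      return false;
    }
    const DexFile::Header* header = reinterpret_cast<const DexFile::Header*>(dex_file_pointer);
    const uint32_t* methods_offsets_pointer = reinterpret_cast<const uint32_t*>(oat);
    oat += (sizeof(*methods_offsets_pointer) * header->class_defs_size_);
    if (oat > End()) {
      LOG(ERROR) << "In oat file " << GetLocation() << " found OatDexFile # " << i
                 << " for "<< dex_file_location
                 << " with truncated method offsets";
      return false;
    }

    oat_dex_files_.Put(dex_file_location, new OatDexFile(this,
                                                         dex_file_location,
                                                         dex_file_checksum,
                                                         dex_file_pointer,
                                                         methods_offsets_pointer));
  }

At this point the local variable dex_file_pointer points to the dex file. The code first validates the dex file's magic code and version number. Then dex_file_pointer is cast to a DexFile::Header pointer and stored in the local variable header, which represents the dex file header. Another local variable, methods_offsets_pointer, points to the data immediately following the dex file offset field. Its name suggests an array of method offsets, and the number of elements in that array appears to be tied to class_defs_size_ in the dex header. Odd, isn't it? What is this? And what does class_defs_size_ mean? Anyone familiar with the dex format will know, but briefly: it is the number of classes defined in the dex file. Putting that together with the variable name, my guess is that this pointer refers to an array with one element per class defined in the dex file, that each element is an offset, and that each offset leads to a group of methods, presumably the methods defined in that class. Talk is cheap, so let's do a quick check, starting with how many classes the corresponding dex file has.

I won't explain here why the value is found at this location. In any case, there are 0x853 classes in total, and methods_offsets_pointer points at file offset 0x106D. A quick calculation gives 0x106D + 4*0x853 = 0x31B9. So what is at that position? Let's take a look.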
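The numbers above can be cross-checked with a little arithmetic. The sketch below uses made-up names, and the header size of 0x40 is an assumption: it counts the 16 four-byte OatHeader fields listed earlier (with InstructionSet taken as 4 bytes) and an empty image location string, which matches the Image fields being 0 in the example.

```cpp
#include <cassert>
#include <cstdint>

// File offset just past an offsets table holding `count` uint32_t entries.
inline uint32_t EndOfOffsetTable(uint32_t table_file_offset, uint32_t count) {
  // Guard against 32-bit overflow in the multiplication and addition.
  assert(count <= (UINT32_MAX - table_file_offset) / sizeof(uint32_t));
  return table_file_offset + count * static_cast<uint32_t>(sizeof(uint32_t));
}

// Values from the boot.oat example in the text.
constexpr uint32_t kOatDataFileOffset = 0x1000;
// Assumption: 16 four-byte header fields and an empty image location.
constexpr uint32_t kOatHeaderSize = 0x40;
constexpr uint32_t kLocationSize = 0x21;
constexpr uint32_t kClassDefsSize = 0x853;

// Location size field + location string + checksum + dex file offset.
constexpr uint32_t kFirstTableFileOffset =
    kOatDataFileOffset + kOatHeaderSize + 4 + kLocationSize + 4 + 4;
```

Under these assumptions kFirstTableFileOffset comes out as 0x106D, and EndOfOffsetTable(kFirstTableFileOffset, kClassDefsSize) gives 0x31B9, which agrees with the figures in the text.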

See? It is exactly the beginning of the next dex file's record. So what does each offset in the array actually point to? Not so fast; let's keep going. Only the last part of OatFile::Setup remains:

    oat_dex_files_.Put(dex_file_location, new OatDexFile(this,
                                                         dex_file_location,
                                                         dex_file_checksum,
                                                         dex_file_pointer,
                                                         methods_offsets_pointer));
  }

  return true;
}

An entry is inserted into the member variable oat_dex_files_. What is being inserted? A newly created OatDexFile object. And what is that? Can it answer the earlier question? Let's look at the function OatFile::OatDexFile::GetOatClass:

const OatFile::OatClass* OatFile::OatDexFile::GetOatClass(uint16_t class_def_index) const {
  uint32_t oat_class_offset = oat_class_offsets_pointer_[class_def_index];

  const byte* oat_class_pointer = oat_file_->Begin() + oat_class_offset;
  CHECK_LT(oat_class_pointer, oat_file_->End()) << oat_file_->GetLocation();
  mirror::Class::Status status = *reinterpret_cast<const mirror::Class::Status*>(oat_class_pointer);

  const byte* methods_pointer = oat_class_pointer + sizeof(status);
  CHECK_LT(methods_pointer, oat_file_->End()) << oat_file_->GetLocation();

  return new OatClass(oat_file_,
                      status,
                      reinterpret_cast<const OatMethodOffsets*>(methods_pointer));
}

This function looks up a so-called OatClass in the Oat file. Here oat_class_offsets_pointer_ is the offset array mentioned above, so the function gives us a rough idea of what those offsets point to. First, the offset is looked up by the class definition index passed in. Adding the oat header address to it gives an absolute address, which is assigned to oat_class_pointer, literally an "oat class pointer". And what is that? The data it points to is then read as a mirror::Class::Status, which is an enum:

  enum Status {
    kStatusError = -1,
    kStatusNotReady = 0,
    kStatusIdx = 1,  // Loaded, DEX idx in super_class_type_idx_ and interfaces_type_idx_.
    kStatusLoaded = 2,  // DEX idx values resolved.
    kStatusResolved = 3,  // Part of linking.
    kStatusVerifying = 4,  // In the process of being verified.
    kStatusRetryVerificationAtRuntime = 5,  // Compile time verification failed, retry at runtime.
    kStatusVerifyingAtRuntime = 6,  // Retrying verification at runtime.
    kStatusVerified = 7,  // Logically part of linking; done pre-init.
    kStatusInitializing = 8,  // Class init in progress.
    kStatusInitialized = 9,  // Ready to go.
  };

These values presumably indicate the current state of the class. After skipping the size of the mirror::Class::Status enum, what follows is an array of OatMethodOffsets structs, defined as follows:

class PACKED(4) OatMethodOffsets {
 public:
  ......
  uint32_t code_offset_;
  uint32_t frame_size_in_bytes_;
  uint32_t core_spill_mask_;
  uint32_t fp_spill_mask_;
  uint32_t mapping_table_offset_;
  uint32_t vmap_table_offset_;
  uint32_t gc_map_offset_;
};

It consists of 7 variables, the first of which, judging by its name, is the code offset.
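The layout that GetOatClass relies on, a status value followed by an array of OatMethodOffsets, can be read from a raw buffer along the following lines. This is a sketch under assumptions: OatMethodOffsetsView and ReadOatMethodOffsets are hypothetical names, the Status enum is assumed to occupy 4 bytes, and the caller is assumed to know that method_index is valid, since the record as described does not store a method count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the seven OatMethodOffsets fields listed above.
struct OatMethodOffsetsView {
  uint32_t code_offset;
  uint32_t frame_size_in_bytes;
  uint32_t core_spill_mask;
  uint32_t fp_spill_mask;
  uint32_t mapping_table_offset;
  uint32_t vmap_table_offset;
  uint32_t gc_map_offset;
};
static_assert(sizeof(OatMethodOffsetsView) == 28, "seven 4-byte fields");

// Reads the class status and the method_index-th OatMethodOffsets entry of
// the class record at oat_class_offset (relative to oat_begin).
inline bool ReadOatMethodOffsets(const uint8_t* oat_begin, size_t oat_size,
                                 uint32_t oat_class_offset,
                                 size_t method_index, int32_t* status,
                                 OatMethodOffsetsView* out) {
  assert(oat_begin != nullptr && status != nullptr && out != nullptr);
  if (oat_class_offset > oat_size) {
    return false;
  }
  // Assumption: the status enum is stored as a 4-byte value.
  const size_t entry =
      sizeof(int32_t) + method_index * sizeof(OatMethodOffsetsView);
  const size_t remaining = oat_size - oat_class_offset;
  if (remaining < entry || remaining - entry < sizeof(OatMethodOffsetsView)) {
    return false;  // the requested entry would run past the oat data
  }
  std::memcpy(status, oat_begin + oat_class_offset, sizeof(int32_t));
  std::memcpy(out, oat_begin + oat_class_offset + entry,
              sizeof(OatMethodOffsetsView));
  return true;
}
```

The function copies the data out with memcpy instead of casting the pointer, which avoids alignment concerns when the buffer comes from an arbitrary file read.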

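Continuing with the same example, let's follow the first offset and see what is there.

It looks like this class has already been initialized (the status value is 9, kStatusInitialized). The code offset of the first method is 0x18CE045, and the code offset of the second method is 0x18CE0D4.

That is about as far as this analysis needs to go. To wrap up, here is a brief summary of what an oat file is and some of its properties:

1) An oat file is actually contained inside an ELF file. The symbols oatdata and oatlastword mark the start and end of the oat file within the ELF file, and the symbol oatexec points to the executable section;

2) For an ELF file that contains an oat file, if it is the Boot Oat, it must be loaded at a fixed address, specifically right after the Image file. The oat file of an ordinary application, on the other hand, can be loaded anywhere in memory;

3) An oat file has its own header and format, and it contains complete dex files inside it.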
