1. Getting familiar with the basic format of scATAC-seq data


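In the scATAC-seq data, the reads that get sequenced are laid out as shown in the figure.
According to the reference documentation, four reads are sequenced in total:
(1) The paired-end reads Read 1N and Read 2N, which cover the insert. ==> These correspond to R1 and R3.
(2) The index sequence: 8 bp. ==> This corresponds to I1.
(3) The 10X barcode & spacer: 16 bp + 8 bp = 24 bp. ==> This corresponds to R2.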
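
As a minimal sketch of item (3), the 24 bp R2 read can be cut into an 8 bp piece and a 16 bp piece. Which piece comes first is an assumption: in the two R2 records printed in the FASTQ excerpt of this section, the first 8 bases are identical (CAGACGCG), which suggests that the spacer precedes the barcode in this file. The function name `split_r2` and its parameters are illustrative and not part of any 10x tool.

```python
def split_r2(r2_seq, spacer_len=8, barcode_len=16):
    """Split a 24 bp R2 read into (spacer, barcode).

    Assumes the spacer comes first, as the constant CAGACGCG prefix
    of the R2 records in this run suggests; swap the two slices if a
    library is laid out the other way round.
    """
    expected = spacer_len + barcode_len
    if len(r2_seq) != expected:
        raise ValueError(f"expected {expected} bp, got {len(r2_seq)}")
    return r2_seq[:spacer_len], r2_seq[spacer_len:]


# The two R2 records from the FASTQ excerpt in this section.
for seq in ("CAGACGCGATAACGGTGGTGCTGT", "CAGACGCGTACTTGCACGATAGAA"):
    spacer, barcode = split_r2(seq)
    print(spacer, barcode)
```

If the assumption holds, only the 16 bp part varies between cells, so that part alone would be enough as a cell label.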

Let's see what this looks like at the sequence level:

(base) [xxzhang@mu02 V1C]$ head ATAC_S1_20210413NA_S2_L002_R1_001.fastq
@A00928:207:HYLCHDSXY:2:1101:1072:1031 1:N:0:TTCTACAG
GNGAGAGAGAGAGACCAATGGATCAGGTTCATTTAGCACCTGAAAGGGGGTGGTGTTTGGGGACAGAGAGACCTTTGGAGTTCCAGCTTAAGGGTATCAGCCTCCCTGGCTGATGTAAGTCAGAGGCCTCTTATACCCACTTTGATGAGGA
+
F#FFF,FFF,FFF:FFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00928:207:HYLCHDSXY:2:1101:1452:1031 1:N:0:TTCTACAG
CNCCATGGCAGCAACTGGCTGCTGAGTCAGATCCATGCCCACAGAAGCTCATCTTGAGCCCTTGGGTGCCTGATTTACATTCAGGAAGACATTGGCATTAGGGACTGTCTCTTATACACATCTCCGAGCCCACGAGACTTCTACAGATCGC
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,,
# R1 length: 151 bp
(base) [xxzhang@mu02 V1C]$ head ATAC_S1_20210413NA_S2_L002_R2_001.fastq
@A00928:207:HYLCHDSXY:2:1101:1072:1031 2:N:0:TTCTACAG
CAGACGCGATAACGGTGGTGCTGT
+
FFFFFFFFFFFFFFFFFFFFFFFF
@A00928:207:HYLCHDSXY:2:1101:1452:1031 2:N:0:TTCTACAG
CAGACGCGTACTTGCACGATAGAA
+
FFFFFFFFFFFFFFFFFF:FFFFF
# R2 length: 24 bp
(base) [xxzhang@mu02 V1C]$ head ATAC_S1_20210413NA_S2_L002_R3_001.fastq
@A00928:207:HYLCHDSXY:2:1101:1072:1031 3:N:0:TTCTACAG
GGTAGGTTCCAGTATCTGAGGGGTGGACGTCAGTCAGTTTCAGTGTGGCCACCCCCACTGTGGGGGGGTTCTGAAGCAGGCTGACCCGCTTTGACTTAGAACCAGTTGGATACAGATGGCCATTGGTGAAGTACAGGATCTGGGGAGAGGC
+
FFFFFFF:FFFFF:FFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFF::FFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F
@A00928:207:HYLCHDSXY:2:1101:1452:1031 3:N:0:TTCTACAG
TCCCTAATGCCAATGTCTTCCTGAATGTAAATCAGGCACCCAAGGGCTCAAGATGAGCTTCTGTGGGCATGGATCTGACTCAGCAGCCAGTTGCTGCCATGGAGCTGTCTCTTATACACATCTGACGCTGCCGACGACAGACGCGTACTTG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
# R3 length: 151 bp
(base) [xxzhang@mu02 V1C]$ head ATAC_S1_20210413NA_S2_L002_I1_001.fastq
@A00928:207:HYLCHDSXY:2:1101:1072:1031 1:N:0:TTCTACAG
TTCTACAG
+
FFFFFFFF
@A00928:207:HYLCHDSXY:2:1101:1452:1031 1:N:0:TTCTACAG
TTCTACAG
+
FFFFFFFF
@A00928:207:HYLCHDSXY:2:1101:1470:1031 1:N:0:AGACTTTC
AGACTTTC
# I1 length: 8 bp

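Seen this way, the layout is clear.
I still have a few questions about these reads, though.
1. The reads are stored in separate files (the barcode and the insert sequence are kept apart). So how do we know that the two insert reads belong to the same cell?
Answer: my current guess is that they are linked through the header line that starts with @. Let's verify: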

(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1101:1072:1031 1:N:0:TTCTACAG" ATAC_S1_20210413NA_S2_L002_R1_001.fastq ATAC_S1_20210413NA_S2_L002_R2_001.fastq ATAC_S1_20210413NA_S2_L002_R3_001.fastq ATAC_S1_20210413NA_S2_L002_I1_001.fastq
ATAC_S1_20210413NA_S2_L002_R1_001.fastq:@A00928:207:HYLCHDSXY:2:1101:1072:1031 1:N:0:TTCTACAG
ATAC_S1_20210413NA_S2_L002_I1_001.fastq:@A00928:207:HYLCHDSXY:2:1101:1072:1031 1:N:0:TTCTACAG
# The R2 and R3 headers differ: they end in 2:N:0:TTCTACAG and 3:N:0:TTCTACAG respectively, so the full header only matches R1 and I1.
######################################################################################################
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1101:1072:1031" ATAC_S1_20210413NA_S2_L002_R1_001.fastq ATAC_S1_20210413NA_S2_L002_R2_001.fastq ATAC_S1_20210413NA_S2_L002_R3_001.fastq ATAC_S1_20210413NA_S2_L002_I1_001.fastq
ATAC_S1_20210413NA_S2_L002_R1_001.fastq:@A00928:207:HYLCHDSXY:2:1101:1072:1031 1:N:0:TTCTACAG
ATAC_S1_20210413NA_S2_L002_R2_001.fastq:@A00928:207:HYLCHDSXY:2:1101:1072:1031 2:N:0:TTCTACAG
ATAC_S1_20210413NA_S2_L002_R3_001.fastq:@A00928:207:HYLCHDSXY:2:1101:1072:1031 3:N:0:TTCTACAG
ATAC_S1_20210413NA_S2_L002_I1_001.fastq:@A00928:207:HYLCHDSXY:2:1101:1072:1031 1:N:0:TTCTACAG
# The test shows that reads from the same source are tagged by the leading part of the header (everything before the space). So splitting the data by cell should not be particularly hard.
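
The grep check can be turned into a small sketch that walks R1, R2 and R3 in step and attaches the R2 read to each insert pair. It assumes that the three files list their records in the same order (not verified here beyond the first records shown), and the function names are illustrative.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (x.rstrip("\n") for x in record)
        # '@A00928:...:1031 1:N:0:TTCTACAG' -> 'A00928:...:1031'
        read_id = header[1:].split(" ", 1)[0]
        yield read_id, seq, qual


def tag_pairs_with_barcode(r1_lines, r2_lines, r3_lines):
    """Yield (read_id, r2_seq, r1_seq, r3_seq), checking that the IDs agree."""
    triples = zip(read_fastq(r1_lines), read_fastq(r2_lines), read_fastq(r3_lines))
    for (id1, s1, _), (id2, s2, _), (id3, s3, _) in triples:
        if not (id1 == id2 == id3):
            raise ValueError(f"files out of sync: {id1} {id2} {id3}")
        yield id1, s2, s1, s3
```

Each argument can be an open file handle, for example `open("ATAC_S1_20210413NA_S2_L002_R1_001.fastq")`, since file objects iterate line by line.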

2. How ATAC-seq works

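Important but not urgent; I'll come back to this tomorrow. I plan to stay until 9:30 tonight (I may not run into my senior labmates anyway, and after today's embarrassment it is probably better if I don't).
Right now I need to organize the literature. I must understand this before the weekend!! Time to read papers, papers, papers!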

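Let me now summarize, in my own words, how I understand the principle of ATAC-seq. As I see it, the technique uses the transposition activity of a transposase to capture all sequences in the nucleus that are in an open state at a given moment. The captured sequences may be related to the biological state of the cell, for example a particular gene being expressed. Judging from the figure above, a captured fragment can either span a nucleosome, extending on both sides of it, or lie between two nucleosomes.
This builds on what we already know about nucleosomes.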

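Basic structure of the nucleosome:
(1) A histone octamer, highly conserved across species: two copies each of H2A, H2B, H3 and H4 (like types stacked one above the other)
(2) A 146 bp DNA segment wrapped about 1.75 turns around the histone core
(3) Histone H1, less conserved, which binds about 20 bp of DNA and locks the entry and exit sites of the DNA
(4) Linker DNA between adjacent nucleosomes, typically about 60 bp; across species its length ranges from 0 to 80 bp.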

So here is my question: what are the 151 bp that we sequenced? Are they simply the roughly 146 bp of a single nucleosome?
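
One observation that bears on this question: 151 is the number of sequencing cycles, so every R1 and R3 read is 151 bases long no matter how long the captured fragment was. When the fragment is shorter than the read, the sequencer runs on into the adapter. The second R1 record in the excerpt above contains CTGTCTCTTATACACATCT, the reverse complement of the 19 bp Tn5 mosaic-end sequence, partway through the read. A minimal sketch that uses this to estimate the insert length of one read follows; the function name is illustrative, and only exact matches are searched for, which a real adapter trimmer would not be limited to.

```python
ME_RC = "CTGTCTCTTATACACATCT"  # reverse complement of the Tn5 mosaic end


def insert_length(read_seq, adapter=ME_RC):
    """Return the number of bases before the adapter, or None if it is absent.

    None means the fragment is at least as long as the read, or the
    adapter copy in the read contains sequencing errors.
    """
    pos = read_seq.find(adapter)
    return pos if pos >= 0 else None


# Synthetic example: a 20 bp insert followed by read-through into the adapter.
short_fragment = "ACGT" * 5 + ME_RC + "CCGAG"
long_fragment = "ACGT" * 10
print(insert_length(short_fragment), insert_length(long_fragment))
```

Applied to all reads, the distribution of these lengths (together with the mapped fragment sizes for longer inserts) would show how the fragments relate to the 146 bp nucleosome size.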


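This figure actually shows the library-construction process, but there is something I don't quite understand: what are the green and red segments? And why do they appear as two separate pieces at the top of panel C?
So I need to understand how the Tn5 transposase works.
Reference link: https://max.book118.com/html/2018/1008/7112135201001151.shtm
The red and green parts are our adapters (the Read 1N and Read 2N sequences in the topmost figure; as I understand it, these sequences help the transposase bind), and they get attached at random positions within the open chromatin regions. But do the red and green parts come apart on their own?
===> For the sake of a "chance encounter", I'll hold out until 10 pm after all. Sob...
I still don't get it.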

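Although this figure shows the gap-filling step, it does not explain the place where the green and orange parts are broken apart (or perhaps the figure is just not drawn very rigorously). I'll discuss it with my senior labmate tomorrow.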

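Summary: overall, what we sequenced this time is DNA from open regions (possibly regulatory regions, transcribed gene regions, or even DNA from nucleosomes so loose that they came off). In any case, this technique indirectly detects the open state of chromatin in the cell and thereby extracts the open sequences.
==> If necessary, discuss this with my senior labmate tomorrow.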

  • Mainly, I don't understand how the red and green parts get separated.
  • And what exactly do these 151 bp cover? Is it really a 146 bp nucleosome?

3. Strategy 1: align all reads

(base) [xxzhang@cu05 ~]$ cd workplace/RepeatAnnoation/
(base) [xxzhang@cu05 RepeatAnnoation]$ bowtie2 -q -p 32 --local  --no-unal -x HumanRepeat -1 ./fastq/V1C/ATAC_S1_20210413NA_S2_L002_R1_001.fastq -2 ./fastq/V1C/ATAC_S1_20210413NA_S2_L002_R3_001.fastq -S result.sam

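Aligning this way gives alignment results for all reads, but it loses the information about which cell each read came from. It does, however, let us count how many reads align to the repeat sequences we are interested in.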

Alignment result:

691467535 reads; of these:
691467535 (100.00%) were paired; of these:
669605746 (96.84%) aligned concordantly 0 times
6590225 (0.95%) aligned concordantly exactly 1 time
15271564 (2.21%) aligned concordantly >1 times
----
669605746 pairs aligned concordantly 0 times; of these:
1164262 (0.17%) aligned discordantly 1 time
----
668441484 pairs aligned 0 times concordantly or discordantly; of these:
1336882968 mates make up the pairs; of these:
1323335466 (98.99%) aligned 0 times
2941959 (0.22%) aligned exactly 1 time
10605543 (0.79%) aligned >1 times
4.31% overall alignment rate
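
As a worked check of the summary above, the overall alignment rate can be recomputed from the individual counts. This sketch assumes the usual reading of the bowtie2 paired-end summary: each aligned pair (concordant or discordant) contributes two aligned mates, and the remaining mates are counted individually. The variable names are illustrative.

```python
# Counts copied from the bowtie2 summary above.
total_pairs = 691467535
concordant_once = 6590225
concordant_multi = 15271564
discordant_once = 1164262
mates_once = 2941959
mates_multi = 10605543

# Every aligned pair contributes two mates; leftover mates align singly.
aligned_pairs = concordant_once + concordant_multi + discordant_once
aligned_mates = 2 * aligned_pairs + mates_once + mates_multi
total_mates = 2 * total_pairs

rate = aligned_mates / total_mates
print(f"{rate:.2%}")  # agrees with the 4.31% reported by bowtie2
```

So roughly 4% of all mates hit the repeat reference, which is the pool from which any per-subfamily counts would be drawn.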

This is the result so far.

# The segment in front of the gene aligned to the AluYb1 subfamily?
hg38_rmsk_AluY  0       AluYb1  1       2       84M10I2M2I28M12I136M2D17M7S     *       0       0       GGCCAGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCTGATCGTGAGGTCAGGAGATTGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAAAAAAAAAAAAATTAGCCGGGCATGGTGGTGGGCACCTTTAGTCCCAGCTACTCGGGAGGCTGAGATAGGAGAATGGCGTGAACCCGGGAGGCGGAGGTTGCAGTAAGCCGAGATCGCGCTGCTGCACTCCAGCCTGGGCGACAGCGAGACTCCATCTCAAAAAAAA   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:308     XS:i:289        XN:i:0  XM:i:16 XO:i:4  XG:i:26 NM:i:42 MD:Z:4G52G4A0C13C51G6C4G3G26G0C30C7G14C0A21^AG11G5      YT:Z:UU
# Judging from our preliminary results, reads that align to the AluYb1 subfamily can be found as well? And each of our reads is 151 bp long.
A00928:207:HYLCHDSXY:2:1102:3821:10488  83      AluYb1  99      2       8M1I69M1I72M    =       91      -175    TCTCTACTAAAAAATACAAAAATTAGCTGGGCGTGGTGGCACGTGCCTGTGATCCCACTTACTTGGGAGGCTGAGGTAGGGAGAACTGCTTGAACCCGGGAGGTGGAGGTTGCAGTGAGTGGAGATCGTGTCACTGCACTCCAGCCTGGGC      FFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:150        XS:i:127        XN:i:0  XM:i:19 XO:i:2       XG:i:2  NM:i:21 MD:Z:26C12G0G1C6A0G5G0C4C12C7T0G2G13C4C10C0C7C1C20      YS:i:115        YT:Z:CP
A00928:207:HYLCHDSXY:2:1102:3821:10488  163     AluYb1  91      2       18S16M1I69M1I40M6S      =       99      175     AGCCTGGCCAACATGGCAAAACCACATCTCTACTAAAAAATACAAAAATTAGCTGGGCGTGGTGGCACGTGCCTGTGATCCCACTTACTTGGGAGGCTGAGGTAGGGAGAACTGCTTGAACCCGGGAGGTGGAGGTTGCAGTGAGTGGAGA      FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF: AS:i:115        XS:i:92 XN:i:0  XM:i:17 XO:i:2       XG:i:2  NM:i:19 MD:Z:5C1G26C12G0G1C6A0G5G0C4C12C7T0G2G13C4C10   YS:i:150        YT:Z:CP
A00928:207:HYLCHDSXY:2:1102:5014:10488  83      AluYb1  94      28      2S149M  =       94      152     GGCCCCATCTCTACTAAAAATACAAAAATTAGCTAGGCGTGGTGGCATGGGCCTGCAATCCCAGCTACTCAGGAGGCTGAGGCAGAAGAATTGCTTGAACCTGGGAGATGGAGGTTGCAGTGAGCCGAGACCGTACCACTGTACTCCAGCC      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:160        XN:i:0  XM:i:20 XO:i:0  XG:i:0  NM:i:20 MD:Z:4G26C0G11G0G1C5T1G12G14G5G2G6C5G0C4C16T2C0G6C9  YS:i:158        YT:Z:CP
A00928:207:HYLCHDSXY:2:1102:5014:10488  163     AluYb1  94      28      1S149M1S        =       94      -152    GCCCCATCTCTACTAAAAATACAAAAATTAGCTAGGCGTGGTGGCATGGGCCTGCAATCCCAGCTACTCAGGAGGCTGAGGCAGAAGAATTGCTTGAACCTGGGAGATGGAGGTTGCAGTGAGCCGAGACCGTACCACTGTACTCCAGCCC      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:158        XN:i:0  XM:i:20 XO:i:0  XG:i:0  NM:i:20      MD:Z:4G26C0G11G0G1C5T1G12G14G5G2G6C5G0C4C16T2C0G6C9     YS:i:160        YT:Z:CP
A00928:207:HYLCHDSXY:2:1102:31376:10488 73      AluYb1  228     0       18S42M91S       =       228     0       GTCCCTCTGCCAGCACCACACAGCACTCCAGCCTGGGCGACAGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAGCTTTGTAATAAAATTTGCAATCCCGTAGAATGTTTTCTTCCACGTTATTTATTTAGTTTTTGGTAAAAATGGGAT      F,FFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFF,FFFFFF,FFFFFFFFFFFFFF:FFFFF,F,F,F:FFFFFFF,FFFF:FF:,,F:F,,:F:FF,FF,F,,F,,,:FF,F,F,FFF,,FF:F,F:FFF,F,:F:,F AS:i:70 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:3T23C14 YT:Z:UP
A00928:207:HYLCHDSXY:2:1102:26223:10488 73      AluYb1  15      2       48M2I10M3I4M1D4M7I47M26S        =       15      0       GCTCACACCTGTAATCCCAGCATTTTGGGAGGCCAAGGCGGGTGGATCGTTTGAGGTTAGTTCAAGACCAGCCTGGCTAACATGGCGAAACCCCATCTCTACTAAAAATACAAAAATTAGCTGGGTGGTAGTGGCAAACGCCCCTGTAATC      FFFFFFFFFFFFF:FFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF AS:i:71 XN:i:0  XM:i:14 XO:i:4  XG:i:13      NM:i:27 MD:Z:6G15C11G7C5A0C5C2G3^T1G3C2C2T8G26C3        YT:Z:UP
A00928:207:HYLCHDSXY:2:1102:15049:10488 83      AluYb1  97      2       3S105M2D2M1D41M =       90      -170    CCTCGTCTCTACCAAAAATATAAAAATTAGCCGGGCGTGGTGGCGCGGGCCTGTGGTTCCAGCTACTCGGGAGGCTGAGGCAGGAAAATTCCTTGAACCCGGGAGGAGGCTGCAGTGAGCCAAGACCACGCCACTGCACTCCAACCTGGGC      FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:172        XS:i:171        XN:i:0  XM:i:15 XO:i:2       XG:i:3  NM:i:18 MD:Z:9T7C24G1C6A2C27G3G0G1G13C1^GA2^T11G3T1G15G7        YS:i:154        YT:Z:CP
A00928:207:HYLCHDSXY:2:1102:15049:10488 163     AluYb1  90      2       12S112M2D2M1D25M        =       97      170     GGCCAATATGGCGAAACCTCGTCTCTACCAAAAATATAAAAATTAGCCGGGCGTGGTGGCGCGGGCCTGTGGTTCCAGCTACTCGGGAGGCTGAGGCAGGAAAATTCCTTGAACCCGGGAGGAGGCTGCAGTGAGCCAAGACCACGCCACT      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:154        XS:i:149        XN:i:0  XM:i:15      XO:i:2  XG:i:3  NM:i:18 MD:Z:6C9T7C24G1C6A2C27G3G0G1G13C1^GA2^T11G3T1G7 YS:i:172        YT:Z:CP
A00928:207:HYLCHDSXY:2:1102:12120:10488 99      AluYb1  88      2       27M2I122M       =       174     237     GTGAAACCCTGTCTCTACTAAAAACACAAAAAAATCAGCCGGGCATGGTGGCGGGCGTTTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCAGGAGGCAGAATTTGCAGTAAGCCGAGATCAAGCCACTGCACT      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFF:FFF::FFFFFFF,FF,:FFFFFFFFFFFFF:FFFFFF:FFFFFF:FFFFF:FFFFF:FFFFFFFFFFFFF:FF:FFFF:FF:FFFFFFFFF, AS:i:192        XS:i:183        XN:i:0  XM:i:14 XO:i:1       XG:i:2  NM:i:16 MD:Z:9C14T8T8G12C0C16C32G6G2G0C7G10G0C11        YS:i:121        YT:Z:CP
A00928:207:HYLCHDSXY:2:1102:12120:10488 147     AluYb1  174     2       90M61S  =       88      -237    CAGGAGAATGGCGTGAACCCCGGAGGCAGAATTTGCAGTAAGCCGAGATCAAGCCACTGCACTCTAGCCTAGGCGACAGAGCGAGACTCCATCACAAAAAAAAAAGGAAAAGAAAAAAAGAAAGGCTGGGGGCAGTGGCTCACACCTGTAA      ::F:,,:,,FF,:F:FFFF:,FF,F::F:FFFF:FF,FFF::FFF:,FF,F,:F:FFFFFFF,FFFFF:FFFFF:FF:,,FF,,:F:,FFF,FF,FFFFFFFFFFFFF:,,FF:FFF:F:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF AS:i:121        XS:i:121        XN:i:0  XM:i:9  XO:i:0  XG:i:

Reference link: https://genome.sph.umich.edu/wiki/SAM
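
To read the second column of the records above, the FLAG value can be decoded bit by bit. The bit values follow the SAM specification; the short labels in the dictionary are my own naming and are only meant as a reading aid.

```python
# SAM FLAG bits (values from the SAM specification; labels are informal).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# FLAG values that occur in the records above.
for flag in (83, 163, 99, 147, 73):
    print(flag, decode_flag(flag))
```

For example, 83 and 163 are the two mates of a properly aligned pair, while 73 marks a first mate whose partner did not align.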
Preliminary results:
(1) Reads that align to the subfamily we are interested in can be found.

grep "AluYb1" result.sam
A00928:207:HYLCHDSXY:2:1101:8621:13228  163     AluYb1  15      2       50M2I21M11I1M1I28M1I36M =       87      197     GCTCACGCCTGTAGTCCCAGCACTTTGGGAGGCTGAGGCCGGTGGATCACCTGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACCTGGTGAAACCCTGTCTCTACTAAAAATACAAAAAATTAGCCGGCTGTGGTGATGGGCGCCTGTAGT      FFFFFFFFFFFFFFFFFFFFFF:::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFF:FFFF::FFFFFFFFFFFFFFFF,FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:130        XS:i:126        XN:i:0  XM:i:11      XO:i:4  XG:i:15 NM:i:26 MD:Z:13A19C5G2C18A1C18C30G0C6G0C13      YS:i:199        YT:Z:CP
A00928:207:HYLCHDSXY:2:1101:3043:13244  99      AluYb1  93      2       22M1I52M76S     =       93      -226    ACCCCATCTCTACTAAAAATACAAAAAATTAGCTGGGCGTGGTGGCACGCGCCTGTAGTCCCAGGTACTCGGGAGCTGTCTCTTATACACATCTCCGAGCCCACGAGACGATGCAGTATCTCGTATGCCGTCTTCTGCTTGAAAAGGGGGG      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:FF,FFF,F::FFF:F:::F:,,,,F,F,,:,FF AS:i:105        XS:i:102        XN:i:0  XM:i:5  XO:i:1       XG:i:1  NM:i:6  MD:Z:5G26C12G0G16C10    YS:i:105        YT:Z:CP
A00928:207:HYLCHDSXY:2:1101:3043:13244  147     AluYb1  93      2       76S22M1I52M     =       93      226     CGACCACCGAGATCTACACTTATGTGGTACACCGCCGCGTCTGTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACCCCATCTCTACTAAAAATACAAAAAATTAGCTGGGCGTGGTGGCACGCGCCTGTAGTCCCAGGTACTCGGGAG      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:105        XS:i:102        XN:i:0  XM:i:5  XO:i:1       XG:i:1  NM:i:6  MD:Z:5G26C12G0G16C10    YS:i:105        YT:Z:CP
A00928:207:HYLCHDSXY:2:1101:7636:13244  83      AluYb1  116     14      75M1D59M1I16M   =       88      -179    AAAATTAGCCGGGCATGGTGGTGCGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCTTGAACCAGGAGGCAGAGGTTGCAGTGAGCCAAGATCGCGCCACTGCACTCCAGCCTGGGCGACAAGAGTGAGACTGCGTC      FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:201        XS:i:186        XN:i:0  XM:i:12 XO:i:2       XG:i:2  NM:i:14 MD:Z:14G6C1G1C7G36G4^C2G6G3C12G36C6C4   YS:i:217        YT:Z:CP
A00928:207:HYLCHDSXY:2:1101:7636:13244  163     AluYb1  88      14      103M1D48M       =       116     179     GTGAAACCCCATCTCTACTAAAAATACAAAAATTAGCCGGGCATGGTGGTGCGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCTTGAACCAGGAGGCAGAGGTTGCAGTGAGCCAAGATCGCGCCACTGCACTCCA      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:217        XS:i:194        XN:i:0  XM:i:11 XO:i:1       XG:i:1  NM:i:12 MD:Z:10G31G6C1G1C7G36G4^C2G6G3C12G21    YS:i:201        YT:Z:CP
A00928:207:HYLCHDSXY:2:1101:29062:13213 99      AluYb1  134     2       2S55M1I81M12S   =       156     174     GTTGGCGTGCACCTGTAATCCCAGCTACTTAGGAGGCTGAGGCAGGAGAATCGCTTGAAAGCCGGCAGGTGGAGGTTGCAGTGAGCCGAGATCATGCCACTGCACTCCAGCCTGGGCAACAGAGCAAGACTCTGTCTCAAAAAAAAAAAAA      FFFFFFFFFFFFFFFFF:FFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFF:FFFFF AS:i:155        XS:i:147        XN:i:0  XM:i:16 XO:i:1       XG:i:1  NM:i:17 MD:Z:5G2G6G11C0G20G2G4C4G3C4C18G0C22G7G6C6      YS:i:133        YT:Z:CP
A00928:207:HYLCHDSXY:2:1101:29062:13213 147     AluYb1  156     2       33M1I81M36S     =       134     -174    CTACTTAGGAGGCTGAGGCAGGAGAATCGCTTGAAAGCCGGCAGGTGGAGGTTGCAGTGAGCCGAGATCATGCCACTGCACTCCAGCCTGGGCAACAGAGCAAGACTCTGTCTCAAAAAAAAAAA

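(2) My advisor wants to go further and count the reads for each subfamily. However, because of PCR amplification there may be amplification bias, in which case such counts would not be accurate.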
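
A minimal sketch of the counting in item (2), under stated assumptions: it tallies primary mapped alignments per reference name (column 3 of the SAM file), so it counts mates rather than fragments and performs no duplicate removal. It also assumes real tab-separated SAM lines; the records pasted in this post show the tabs as spaces. The function name is illustrative.

```python
from collections import Counter


def count_by_reference(sam_lines):
    """Count primary mapped alignments per reference name (RNAME)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # truncated or malformed record
            continue
        flag, rname = int(fields[1]), fields[2]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800):
            continue
        counts[rname] += 1
    return counts
```

Usage would be along the lines of `count_by_reference(open("result.sam"))`, followed by `most_common()` on the result to list the subfamilies with the most alignments.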

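  • Verify that PCR amplification bias exists, i.e. that multiple reads share the same R2 value (barcode + spacer).
grep "CAGACGCGATAACGGTGGTGCTGT" -B 1 ATAC_S1_20210413NA_S2_L002_R2_001.fastq # output:
# Judging from the output, this barcode + spacer occurs many times, i.e. it is not unique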
@A00928:207:HYLCHDSXY:2:1133:32145:24533 2:N:0:GATGCAGT
CAGACGCGATAACGGTGGTGCTGT
--
@A00928:207:HYLCHDSXY:2:1133:14100:24596 2:N:0:CCGAGGCA
CAGACGCGATAACGGTGGTGCTGT
--
@A00928:207:HYLCHDSXY:2:1133:25997:24721 2:N:0:TTCTACAG
CAGACGCGATAACGGTGGTGCTGT
--
@A00928:207:HYLCHDSXY:2:1133:25120:24799 2:N:0:CCGAGGCA
CAGACGCGATAACGGTGGTGCTGT
--
@A00928:207:HYLCHDSXY:2:1133:12581:25128 2:N:0:CCGAGGCA
CAGACGCGATAACGGTGGTGCTGT
--
@A00928:207:HYLCHDSXY:2:1133:19623:25645 2:N:0:CCGAGGCA
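CAGACGCGATAACGGTGGTGCTGT
############################
grep "CAGACGCGATAACGGTGGTGCTGT" -B 1 ATAC_S1_20210413NA_S2_L002_R2_001.fastq | wc -l
428522 # This is 428522 lines of output, not 428522 reads: with -B 1 every hit prints a header line, the matching line and a "--" separator (none after the last hit), so it corresponds to (428522 + 1) / 3 = 142841 reads; grep -c would count the reads directly
# If all of these came from PCR amplification, that many reads would have to be deleted, which I find hard to believe
grep "CAGACGCGTACTTGCACGATAGAA" -B 1 ATAC_S1_20210413NA_S2_L002_R2_001.fastq # output: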
@A00928:207:HYLCHDSXY:2:1115:29026:24612 2:N:0:TTCTACAG
CAGACGCGTACTTGCACGATAGAA
--
@A00928:207:HYLCHDSXY:2:1115:9923:25066 2:N:0:GATGCAGT
CAGACGCGTACTTGCACGATAGAA
--
@A00928:207:HYLCHDSXY:2:1115:25093:25316 2:N:0:CCGAGGCA
CAGACGCGTACTTGCACGATAGAA
--
@A00928:207:HYLCHDSXY:2:1115:7346:25363 2:N:0:GATGCAGT
CAGACGCGTACTTGCACGATAGAA
--
@A00928:207:HYLCHDSXY:2:1115:1398:27618 2:N:0:GATGCAGT
CAGACGCGTACTTGCACGATAGAA
--
@A00928:207:HYLCHDSXY:2:1115:8919:29653 2:N:0:TTCTACAG
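CAGACGCGTACTTGCACGATAGAA
grep "CAGACGCGTACTTGCACGATAGAA" -B 1 ATAC_S1_20210413NA_S2_L002_R2_001.fastq |wc -l
# output:
207449 # output lines; under -B 1 each hit prints three lines (header, match, "--"), so this is (207449 + 1) / 3 = 69150 reads
# After discussing this with my senior labmate just now, I understand. The barcode + spacer labels the cell, so PCR bias does not need to be considered here.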
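
Since the R2 value labels the cell, the two grep counts above are really reads per cell. A minimal sketch that produces this tally for every barcode in one pass, instead of one grep per barcode, is shown below; it assumes plain four-line FASTQ records, and the function name is illustrative.

```python
from collections import Counter


def reads_per_barcode(r2_lines):
    """Count how many reads carry each R2 sequence (barcode + spacer)."""
    counts = Counter()
    for i, line in enumerate(r2_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[line.strip()] += 1
    return counts
```

With the real file this would be `reads_per_barcode(open("ATAC_S1_20210413NA_S2_L002_R2_001.fastq"))`; barcodes with very few reads are then easy to tell apart from those that look like real cells.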
  • Another point to confirm: are there duplicate reads within the same cell?
(base) [xxzhang@mu02 V1C]$ grep "CTCCTCTGCTCCTCTTAATCCCTACACCCGTCATTCCAGAAAGGCATTTATCTGCCTGCTGGGAGCCCTTAACTGAGGGTACATTAGCATCCTCCTTTTTATTAACTTTGTCCAAAAATTGAGCTTTTCCCTTTAAGAAACCCTAGCACAG" -B 5 ATAC_S1_20210413NA_S2_L002_R1_001.fastq
#########################################################
# Duplicate sequences exist
# But the read IDs differ
@A00928:207:HYLCHDSXY:2:1104:13937:10003 1:N:0:CCGAGGCA
CTCCTCTGCTCCTCTTAATCCCTACACCCGTCATTCCAGAAAGGCATTTATCTGCCTGCTGGGAGCCCTTAACTGAGGGTACATTAGCATCCTCCTTTTTATTAACTTTGTCCAAAAATTGAGCTTTTCCCTTTAAGAAACCCTAGCACAG
--
@A00928:207:HYLCHDSXY:2:1115:9923:25066 1:N:0:GATGCAGT
CTCCTCTGCTCCTCTTAATCCCTACACCCGTCATTCCAGAAAGGCATTTATCTGCCTGCTGGGAGCCCTTAACTGAGGGTACATTAGCATCCTCCTTTTTATTAACTTTGTCCAAAAATTGAGCTTTTCCCTTTAAGAAACCCTAGCACAG
#########################################################
# Check whether R3 matches as well.
# As the output shows, for these duplicates both the R1 and the R3 sequence of one read ID are identical to those of the other read ID.
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1104:13937:10003" -A 5 ATAC_S1_20210413NA_S2_L002_R3_001.fastq
@A00928:207:HYLCHDSXY:2:1104:13937:10003 3:N:0:CCGAGGCA
TCCCTGGACTTGGGCAGAAAGAGGTAGATCTGATGTTCTGTGCTAGGGTTTCTTAAAGGGAAAAGCTCAATTTTTGGACAAAGTTAATAAAAAGGAGGATGCTAATGTACCCTCAGTTAAGGGCTCCCAGCAGGCAGATAAATGCCTTTCT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFF,FFF
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1115:9923:25066" -A 5 ATAC_S1_20210413NA_S2_L002_R3_001.fastq
@A00928:207:HYLCHDSXY:2:1115:9923:25066 3:N:0:GATGCAGT
TCCCTGGACTTGGGCAGAAAGAGGTAGATCTGATGTTCTGTGCTAGGGTTTCTTAAAGGGAAAAGCTCAATTTTTGGACAAAGTTAATAAAAAGGAGGATGCTAATGTACCCTCAGTTAAGGGCTCCCAGCAGGCAGATAAATGCCTTTCT
+
F,FF,F:FFFFF:FFFFFFFFFFFF:FFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
How should the duplicate sequences be removed here?
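Method 1:
(1) First extract the reads that belong to the same cell? Organize them into two files?
(2) Then compare the reads with each other and keep only one copy of any duplicates?
(3) Finally, count these non-redundant reads? Or:
Method 2:
(1) First extract the reads that belong to the same cell? Organize them into two files?
(2) Then align them to the reference sequences?
(3) Then remove redundancy from the alignment result (identical reads should, I think, produce exactly the same alignment) ==> this can also be verified against the raw read sequences
# For now I think Method 2 may be simpler and more feasible.
# Try it on a small batch of data first!
# Wait! I don't think I looked at the index data! Could it be the index that uniquely labels the reads?
# Indeed, I seem to have missed a very important piece of information.
# The two reads I took for duplicates seem to have different indexes. So what are they? Were they produced during PCR amplification?
# The index is added before bridge amplification.
# So the current inference is: if these are duplicates from PCR amplification, the index and the cell barcode should both be identical.
# So we are back to the R1 sequences
grep "CAGACGCGTACTTGCACGATAGAA" -B 1 ATAC_S1_20210413NA_S2_L002_R2_001.fastq
@A00928:207:HYLCHDSXY:2:1115:12264:10332 2:N:0:TTCTACAG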
CAGACGCGTACTTGCACGATAGAA
--
@A00928:207:HYLCHDSXY:2:1115:15673:11068 2:N:0:TTCTACAG
CAGACGCGTACTTGCACGATAGAA
--
@A00928:207:HYLCHDSXY:2:1115:27751:12727 2:N:0:TTCTACAG
CAGACGCGTACTTGCACGATAGAA
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1115:12264:10332" -A 1 ATAC_S1_20210413NA_S2_L002_R1_001.fastq
@A00928:207:HYLCHDSXY:2:1115:12264:10332 1:N:0:TTCTACAG
GTCCAGTGTCTTCCTGAAACTGAGGCAACATAGATTATAGGGCTTTGAAGTTATTATGAGTTTAAATCCTGACTCTGACACTAACTCTATGACCTCAAGCAACTCTCCAACTTCAGTTTACTTATCTGTAAAATGGAGAGTAAAAATCATC
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1115:15673:11068" -A 1 ATAC_S1_20210413NA_S2_L002_R1_001.fastq
@A00928:207:HYLCHDSXY:2:1115:15673:11068 1:N:0:TTCTACAG
TGCTATGAACAAGCATATCAGCTGTGACTCTCTGTATCAGCCTGTCTCTCCAGATTTGGAGATGGCAATTTGCCTCATAATGTCATCTCATTGACAGATTCAACAAAAGTCATTGATTTTCCTTTTGTCCATTTTTTTCTTGTTGCAAGAA
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1115:27751:12727" -A 1 ATAC_S1_20210413NA_S2_L002_R1_001.fastq
@A00928:207:HYLCHDSXY:2:1115:27751:12727 1:N:0:TTCTACAG
CCACGAGCATCCGCCTCCCAAAGTGATGGGATTACAGGTTATAGGCGTGAGCCACCGCTTCCCGCGGTGTAGTATTAAGAGCTCAAGCTCTAAAGATATTTTGTATGTTTTATTTTATTTTTTTTTTTTTTTTTTTTTTTAAAATTTTGAT
# Once again this is not what I guessed. Although the index is the same, the R1 sequences differ.
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1115:12264:10332" -A 1 ATAC_S1_20210413NA_S2_L002_I1_001.fastq
@A00928:207:HYLCHDSXY:2:1115:12264:10332 1:N:0:TTCTACAG
TTCTACAG
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1115:15673:11068" -A 1 ATAC_S1_20210413NA_S2_L002_I1_001.fastq
@A00928:207:HYLCHDSXY:2:1115:15673:11068 1:N:0:TTCTACAG
TTCTACAG
(base) [xxzhang@mu02 V1C]$ grep "@A00928:207:HYLCHDSXY:2:1115:27751:12727" -A 1 ATAC_S1_20210413NA_S2_L002_I1_001.fastq
@A00928:207:HYLCHDSXY:2:1115:27751:12727 1:N:0:TTCTACAG
TTCTACAG

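So now I am facing a rather tricky situation:
(1) same index, different sequences
(2) same sequence, different indexes
(3) a single barcode carries tens of thousands of reads or more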
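
The first idea considered earlier (collapse identical reads within one cell before counting) can be sketched as follows. The assumptions are mine: a duplicate is defined as the same barcode plus exactly identical R1 and R3 sequences, and the sample index is deliberately left out of the key because the duplicate pair found above carried two different indexes. The function name is illustrative.

```python
def collapse_identical_pairs(records):
    """Keep the first occurrence of every (barcode, r1_seq, r3_seq) triple.

    `records` is an iterable of (barcode, r1_seq, r3_seq) tuples.
    Only exact sequence identity is detected; copies that differ by a
    single sequencing error are kept as separate entries.
    """
    seen = set()
    kept = []
    for barcode, r1_seq, r3_seq in records:
        key = (barcode, r1_seq, r3_seq)
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Holding every key in memory is fine for a small test batch, but for the full run it would probably be more practical to remove duplicates per cell, or after alignment.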

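(3) At this point I want to step back and ask why my advisor wants the counts at all. (It just occurred to me that duplicate removal can actually be handled as part of the alignment workflow.) I think the counting here should be similar to peak finding in ChIP-seq.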

cp -R /home/fastq/outs/ /home/xxzhang/workplace/RepeatAnnoation/

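According to tutorials online, sambamba can be used to remove duplicates. If two sequences are completely identical, do they align to the same place on the reference? That is our basic premise.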

(4) Removing duplicates.

samtools view -b -o result.bam result.sam
[W::sam_hdr_create] Duplicated sequence 'MER61I'
[W::sam_hdr_create] Duplicated sequence 'MER77'
[W::sam_hdr_create] Duplicated sequence 'ORSL'
[W::sam_hdr_create] Duplicated sequence 'MER9'
[W::sam_hdr_create] Duplicated sequence 'THE1BR'
[W::sam_hdr_create] Duplicated sequence 'LINE2'
[W::sam_hdr_create] Duplicated sequence 'HERVK22I'
[W::sam_hdr_create] Duplicated sequence 'MamSINE1'
[E::sam_hrecs_update_hashes] Duplicate entry "MER61" in sam header
samtools view: failed to add PG line to the header

This fails with an error.


Just when I thought everything was going smoothly, I ran into a remarkable error:
>[W::sam_hdr_create] Duplicated sequence ‘MER61’
[W::sam_hdr_create] Duplicated sequence ‘ZOMBI_A’
[W::sam_hdr_create] Duplicated sequence ‘MARE2’
[W::sam_hdr_create] Duplicated sequence ‘MER21B’
[W::sam_hdr_create] Duplicated sequence ‘HERVK3I’
[W::sam_hdr_create] Duplicated sequence ‘THE1BR’
[W::sam_hdr_create] Duplicated sequence ‘L1MD_5’
[W::sam_hdr_create] Duplicated sequence ‘MER66I’
[W::sam_hdr_create] Duplicated sequence ‘L1PAXX_5’
[W::sam_hdr_create] Duplicated sequence ‘MER88’
[W::sam_hdr_create] Duplicated sequence ‘L1MB7’
[W::sam_hdr_create] Duplicated sequence ‘MLT1G2’
[W::sam_hdr_create] Duplicated sequence ‘L1MCA_5’
[W::sam_hdr_create] Duplicated sequence ‘ZOMBI_B’
[W::sam_hdr_create] Duplicated sequence ‘GOLEM_A’
[W::sam_hdr_create] Duplicated sequence ‘MER72’
[W::sam_hdr_create] Duplicated sequence ‘MLT1J’
[W::sam_hdr_create] Duplicated sequence ‘L1MB7’
[W::sam_hdr_create] Duplicated sequence ‘MER70I’
[W::sam_hdr_create] Duplicated sequence ‘MARE2’
[W::sam_hdr_create] Duplicated sequence ‘L1M6_5’
[W::sam_hdr_create] Duplicated sequence ‘MER68B’
[W::sam_hdr_create] Duplicated sequence ‘L1MD_5’
[W::sam_hdr_create] Duplicated sequence ‘LINE2’
[W::sam_hdr_create] Duplicated sequence ‘MER4E’
[W::sam_hdr_create] Duplicated sequence ‘TIGGER2’
[W::sam_hdr_create] Duplicated sequence ‘LTR10’
[W::sam_hdr_create] Duplicated sequence ‘ZOMBI’
[W::sam_hdr_create] Duplicated sequence ‘MSTAR’
[W::sam_hdr_create] Duplicated sequence ‘MER67C’
[W::sam_hdr_create] Duplicated sequence ‘L1PBA_5’
[W::sam_hdr_create] Duplicated sequence ‘PABL_A’
[W::sam_hdr_create] Duplicated sequence ‘MER77’
[W::sam_hdr_create] Duplicated sequence ‘L1P2_5’
[W::sam_hdr_create] Duplicated sequence ‘MER69B’
[W::sam_hdr_create] Duplicated sequence ‘MER63C’
[W::sam_hdr_create] Duplicated sequence ‘HERV4_I’
[W::sam_hdr_create] Duplicated sequence ‘HERV17’
[W::sam_hdr_create] Duplicated sequence ‘LTR29’
[W::sam_hdr_create] Duplicated sequence ‘MER68A’
[W::sam_hdr_create] Duplicated sequence ‘L1MC5’
[W::sam_hdr_create] Duplicated sequence ‘L1MC3’
[W::sam_hdr_create] Duplicated sequence ‘MER31’
[W::sam_hdr_create] Duplicated sequence ‘MER77’
[W::sam_hdr_create] Duplicated sequence ‘MER4I’
[W::sam_hdr_create] Duplicated sequence ‘GOLEM_B’
[W::sam_hdr_create] Duplicated sequence ‘MER63C’
[W::sam_hdr_create] Duplicated sequence ‘HERVS71’
[W::sam_hdr_create] Duplicated sequence ‘MER80’
[W::sam_hdr_create] Duplicated sequence ‘MER97’
[W::sam_hdr_create] Duplicated sequence ‘MER104’
[W::sam_hdr_create] Duplicated sequence ‘MER68B’
[W::sam_hdr_create] Duplicated sequence ‘MER34’
[W::sam_hdr_create] Duplicated sequence ‘HERVS71’
[W::sam_hdr_create] Duplicated sequence ‘MARE1’
[W::sam_hdr_create] Duplicated sequence ‘L1M3_5’
[W::sam_hdr_create] Duplicated sequence ‘MER68A’
[W::sam_hdr_create] Duplicated sequence ‘MER60’
[W::sam_hdr_create] Duplicated sequence ‘LTR20’
[W::sam_hdr_create] Duplicated sequence ‘L1MA9’
[W::sam_hdr_create] Duplicated sequence ‘L1MA5’
[W::sam_hdr_create] Duplicated sequence ‘MER61I’
[W::sam_hdr_create] Duplicated sequence ‘MER77’
[W::sam_hdr_create] Duplicated sequence ‘ORSL’
[W::sam_hdr_create] Duplicated sequence ‘MER9’
[W::sam_hdr_create] Duplicated sequence ‘THE1BR’
[W::sam_hdr_create] Duplicated sequence ‘LINE2’
[W::sam_hdr_create] Duplicated sequence ‘HERVK22I’
[W::sam_hdr_create] Duplicated sequence ‘MamSINE1’
[E::sam_hrecs_update_hashes] Duplicate entry “MER61” in sam header
[E::sam_parse1] failed to parse header
[W::sam_read1] Parse error at line 387
[bam_rmdup_core] failed to read input file

In other words, our reference sequences are the problem: their names are not unique.

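First, delete every sequence in the reference that is not labelled as human.
Then remove the duplicates from the resulting list.
I removed them by hand. I hate repetitive work.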
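
The manual clean-up could be scripted roughly as follows. The sketch assumes that a sequence name is the first whitespace-delimited token of a FASTA header line, and that records sharing a name are true duplicates so that keeping the first one is acceptable; neither was checked against the actual reference file. The function names are illustrative.

```python
from collections import Counter


def header_name(line):
    """Return the first token of a FASTA header line ('' if it is empty)."""
    parts = line[1:].split()
    return parts[0] if parts else ""


def duplicated_names(fasta_lines):
    """List the sequence names that occur in more than one header."""
    counts = Counter(header_name(l) for l in fasta_lines if l.startswith(">"))
    return sorted(name for name, n in counts.items() if n > 1)


def drop_repeated_records(fasta_lines):
    """Yield the lines of the first record for each name, skipping repeats."""
    seen = set()
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            name = header_name(line)
            keep = name not in seen
            seen.add(name)
        if keep:
            yield line
```

Running `duplicated_names` first shows which names would be affected before anything is dropped, which makes it easier to confirm that the repeated records really are the same sequence.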


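I realized that my foolish efforts had been going around in circles. It turned out that I had used the wrong reference: instead of trying to be clever and using humapp.ref, I should have used humrep.ref.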


This post is long enough already; I'll continue in a new one.
