文章目录

  • 格式说明
  • 实例演练
    • 判断fastq序列编码是Phred33(Illumina1.8+) or Phred64(Illumina1.3+)
    • fastq转换fasta格式
    • Linux 操作fastq
      • 获取数据
      • 统计reads_1.fq文件中共有多少条序列信息
      • 输出reads_1.fq文件中的标识符(即以@开头的那一行)
      • 输出reads_1.fq文件中所有序列的信息(即每个序列的第二行)
      • 输出reads_1.fq文件中‘+’及其后面的描述信息(即每个序列的第三行)
      • 输出reads_1.fq文件中质量值信息(即每个序列的第四行)
      • 计算reads_1.fq文件含有N碱基的reads个数
      • 计算reads_1.fq文件里面的序列碱基总数
      • 计算reads_1.fq文件中所有reads的N碱基总数
      • 计算reads_1.fq中测序碱基质量值恰好为Q20的个数
      • 计算reads_1.fq中测序碱基质量值恰好为Q30的个数
      • 计算reads_1.fq中所以序列的第一位碱基的ATCGNactg分布情况
      • 将reads_1.fq转为reads_1.fa文件(即将fastq转化为fasta)
      • 统计上述reads_1.fa文件中共有多少条序列
      • 计算reads_1.fa文件中总的碱基序列的GC数量
      • 删除reads_1.fa文件中每条序列的N碱基
      • 删除reads_1.fa文件中含有N碱基的序列
      • 删除reads_1.fa文件中短于65bp的碱基序列
      • 删除reads_1.fa文件每条序列的前后五个碱基
      • 删除reads_1.fa文件中的长于125bp的序列
      • 查看reads_1.fq中每条序列的第一位碱基的质量值的平均值

格式说明

FASTQ文件中每个序列通常有四行:

1.第一行:必须以“@”开头,后面跟着唯一的序列ID标识符,然后跟着可选的序列描述内容,标识符与描述内容用空格分开;

2.第二行:序列字符(核酸为[AGCTN]+,蛋白为氨基酸字符);

3.第三行:必须以“+”开头,后面跟着可选的ID标识符和可选的描述内容,如果“+”后面有内容,该内容必须与第一行“@”后的内容相同;

4.第四行:碱基质量字符,每个字符对应第二行相应位置碱基或氨基酸的质量,该字符可以按一定规则转换为碱基质量得分,碱基质量得分可以反映该碱基的错误率。这一行的字符数与第二行中的字符数必须相同。

查看fatsq文件

!cat ./data/test1.fq
@HWI-ST1223:80:D1FMTACXX:2:1101:1243:2213 1:N:0:AGTCAA
TCTGTGTAGCCNTGGCTGTCCTGGAACTCACTTTGTAGACCAGGCTGGCATGCACCACCACNNNCGGCTCATTTGTCTTTNNTTTTTGTTTTGTTCTGTA
+
BCCFFFFFFHH#4AFHIJJJJJJJJJJJJJJJJJIJIJJJJJGHIJJJJJJJJJJJJJIIJ###--5ABECFFDDEEEEE##,5=@B8?CDD<AD>C:@>
@HWI-ST1223:80:D1FMTACXX:2:1101:1375:2060 1:N:0:AGTCAA
NTGCTGAGCCACGACAAGGATCCCAGAGGGCCNAGCCCTGCATCTTGTATGGACCAGTTACNCATCAAAAGAGACTACTGTAGGCACCATCAATCAGATC
+
#1:DDDD;?CFFHDFEEIGIIIIIIG;DHFGG#)0?BFBDHBFF<FCFEFD;@DD@A=7?E#,,,;=(>3;=;;C>ACCC@CCCCCBBBCCAACCCCCCC
@HWI-ST1223:80:D1FMTACXX:2:1101:1383:2091 1:N:0:AGTCAA
NGTTCGTGTGGAACCTGGCGCTAAACCATTCGTAGACGACCTGCTTCTGGGTCGGGGTTTCGTACGTAGCAGAGCAGCTCCCTCGCTGCGATCTATTGAA
+
#1=DDFDFHHHHHJGJJJJJJJJJJJJJJJIJIGDHIHIGIJJJJJJJIIIGHHFDD3>BDDBDDDDDDDDDDBDCCBDDDDDDDDDDDBBDDDDEEACD
@HWI-ST1223:80:D1FMTACXX:2:1101:1452:2138 1:N:0:AGTCAA
NTCTAGGAGGTCTAGAAAGCCCAGGCCACCGGTACAAACATCAAGGGTGTTACGGATGTGCCGCTCTGAACCTCCAGGACGACTTTGATTTCAACTACAA
+
#4=DFFEFHHHHHJJJJJIJJJJHIIJGJJJJ@GIIJJJJJJIJJJJFGHIIIJJHHHDFFFFDDDDDDDDDDDDCDDDDDDDDDDDCCCEDEDDDDDDD

每个碱基的质量值与错误率之间的关系:

Q=−10log10PQ = -10 log_{10}P Q=−10log10​P

其中P代表该碱基被测序错误的概率,如果该碱基测序出错的概率为0.001,则Q应该为30,那么30+33=63,那么63对应的ASCii码为“?”,则在第四行中该碱基对应的质量代表值即为“?”

print(ord("?"))
63

一般地,碱基质量从0-40,既ASCii码为从 “!”(0+33)到“I”(40+33)。以上是sanger中心采用记录read测序质量的方法

对于每个碱基的质量编码标示,不同的软件采用不同的方案,目前有5种方案:

  • Sanger,Phred quality score,值的范围从0到92,对应的ASCII码从33到126,但是对于测序数据(raw read data)质量得分通常小于60,序列拼接或者mapping可能用到更大的分数。

  • Solexa/Illumina 1.0, Solexa/Illumina quality score,值的范围从-5到63,对应的ASCII码从59到126,对于测序数据,得分一般在-5到40之间;

  • Illumina 1.3+,Phred quality score,值的范围从0到62对应的ASCII码从64到126,低于测序数据,得分在0到40之间;

  • Illumina 1.5+,Phred quality score,但是0到2作为另外的标示,详见http://solexaqa.sourceforge.net/questions.htm#illumina

  • Illumina 1.8+

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-6lozhwm1-1583370774136)(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAArMAAAGDCAYAAAA4dZgrAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JQAAgIMAAPn/AACA6QAAdTAAAOpgAAA6mAAAF2+SX8VGAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4gcXCyIxEo8K5AAASutJREFUeNrt3X2MFNed//tPz1MPDM8DY2yM8RjwYsfsjR92fW0nFrveH443YiMvkJvlOpGQEi+S/7CFN0Ls/mGsaMNFWXwn0s0KO5ZQNhaOFibWxvIvMb/Fv9nExLsRjr0764R4sNs2EGCGYYBhgJqZ7r5/jLvprqnH7qququ73S0JM96mHc06dqvr2qVNVKcMw8gIAAAASqCnqDAAAAACVIpgFAABAYhHMAgAAILEIZgEAAJBYBLMAAABILIJZAAAAJBbBLAAAABKrJagF7UzvLPu83dhOGmmkkRZ6GgCgsQUSzO5M77Q98ZBGGmmkhZUGAEAowwycek1II4000sJKAwA0nkCC2e3GdtveEtJII420sNIAAEgZhpEPamGFE45VzwlppJFGWlhpAIDGFWgwW2Ae40YaaaSRVos0AEDj4dFcAAAASKxAglmn8WykkUYaaWGlpdNppdNp0kgjjTTSEpxWrVCeM2u+BEgaaaSRFlYaAKCxhTJmFgAAAKgFxswCAAAgsQhmAQAAkFgEswAAAEgsglkAAAAkFsEsAAAAEotgFgAAAIlFMAsAAIDEIpgFAABAYgX2BjCgnqR3XnvlnrHdKPtcijTSSCOtXtOM7YaAJKBnFrBQOIhb/U8aaaSR1ghpQFIQzAI2Snsp0jvTZQd30kgjjbRGSAOSgGAWAAAAiZUyDCMfdSaAOHPqqSCNNNJIa4Q0IM7omQUcFA7uVjdMkEYaaaQ1QhoQdwSzAAAASC7DMPLV/pOUl0QaafWVtkO2n0kjjTTSGiItTsdk0uo2rdp/gYyZTafThcCYNNLqIs3qWYtWl99II4000uo1zdhuxOaYTFp9p1WLG8AAAACQWIyZBQAAQGIRzAIAACCxCGYBAACQWASzAAAASCyCWQAAACQWwSwAAAASi2AWAAAAiUUwCwAAgMRqiToDANyl023TvjOMcdJII420UNOAJOANYECCpNNtticZ0kgjjbSw0oA4Y5gBAAAAEotgFgAAAIlFMAsAAIDEIpgFAABAYhHMAgAAILECCWbT6bTS6TRppJEWUhoARCFOx0HS6jetWjxnFkiA0udAFv62ekYkaaSRRlqQaUAS8JxZAAAAJBZjZgEAAJBYBLMAAABILIJZAAAAJBbBLAAAABKLYBYAAACJRTALAACAxCKYBQAAQGIRzAIAACCxeAMYUMfaDli/OnB8g2GbThpppJFWmgbEHW8AAxpA4URld3JqO5AmjTTSSHNMA+KKYQYAAABILIJZAAAAJBbBLAAAABKLYBYAAACJRTALAACAxAokmE2n00qn06SRRlrM0gCgWnE6ppFWv2nVoGcWAAAAiRVIMGsYhgzDII000mKU1nYgbfvShEI6aaSRRppTmhSfYxpp9Z1WDd4ABtQ5pxcmkEYaaaS5pQFxxxvAAAAAkFiMmQUAAEBiEcwCAAAgsQhmAQAAkFgEswAAAEgsglkAAAAkFsEsAAAAEiuy58weuPPO4t8b3nmHNNJIIy2wNABA44isZ7Zw8rE6CZFGGmmkVZMGAGgcDDMAAABAYhHMAgAAILEIZgEAAJBYBLMAAABILIJZAAAAJFYgweyBO+8se0yO1zSraUkjjTTSgkhLp9NKp9OW05FGGmmkkRaftGpF/pxZqxMTaaSRRlo1aQCAxpEyDCMfdSYAAACASjBmFgAAAIlFMAsAAIDEIpgFAABAYhHMAgAAILEIZgEAAJBYBLMAgIbi93mXYT4fE0iqSvaJsPajmgWzhQL4LUiQBfdzQCqdLskHsSTnHQAKx20/x263B7cbhiHDMDwv0+/0QD2pZP+zSyvdj4LcnwINZr0UwDAM23nM89sV2C3QtMuHed1O5SidNqkHMXM53Bqk21s77Ora73JL57Ga18u6zNN4WV6Y27CSH0mVrIMeIjSiQjBZ4LQPmKe1Srf622s+vApqX61kOUEFH+bpvKZ7OSdwLEsWr23fy75nFw9WI/Q3gDkFsn7ms/s/qPUFVc64Ks2fVZDrtVxuy6m2wTst3zxf6Y8dqx9JVu0gjO3kt9xO05duC6d8O5XZall29W6exuqHm5e24vSDz2l+83xun83LtEvzsgy3+nCbz2pbFb6rdDuEsf3s+Cmf+Xu3dK915qWMTsuK8zE3qE4Qv8txO4aGlU8/y3Taf9EYgt53Qx9mUBqJlyr9VVbpL7RKA+RKprHbOZ16jZ2+qyW3nma3noxq111NPksFEZQGfeCuJB9ObamwLcyXYkoP/k7BhFV9WS3Tan1281kFbV7msyuH03x2/1v97ZTmdVl2dWb1vV17NAcOXurMbjtUOp9V3qqtF7s6d1qPn2Va5T/OwWkSBVmfXoJilIv63N9IIrsBzOkEZTd9EOsMoxxOPYRRCSovXi/X+c1DtUMXwqivSi7l2f1IcxtKUUlQ7VaffpZZSW+yU7qX+aPcN8Jef6VDk4Ic0uQWEIZZfqfjoF1aXI6VAJIv9GEGTpwuh3jteXK7xFyYDtVzGtoh+avnIIcuuF0at8prteV26sH3Uwd2J/cogr9K1uclwA5yfZUIui5rHZzbDT0JIx9ehnKFVcZKr3DE4fhut438pJnTvd6HYHXJ3u6Y7HacdBvu4nT8rzTdKX9W5XM6t7vVi5/tYDckxm3YlNO2MC/DKs3qs/l787Lc/va7jiSKNJh1U6hs8wnfy3ja0mWEyUseCtN5yVM1PWDmPNWKVSDqlIdqAxy3QDLoHbjWO3fQ2y+M/cHriTaoZYaRTydO+3IQeanmR49Vmt9lhsFLPu2CnajzXg23H+B+frg7LdNtfW7HXrfjs126U1DoNd3rNnY6r1vFA146QqzW72U7lKb7vUfBayedVfBpFevY1a95Wrvv7T475dWunF5E8UMzsmC29LJT6XdeDoRBnmCq5TaOzqlcXpYXF5U0TC/bNoy8OfV4BsXpqoDTPE75DqIe3NYb9PATLz0pbnm122eC4OXHVa1Us363QLFWZXQK8KPo0Y0DL+3dz3K8COP8V6vjc1zz6Xed5kA0qONqpfuR3TxOy/Ta+RTG/UlBCzSYrabiK7m87LbsWorLwTuMRlTJLzanbVkvJzung4ObSnqu7cYeRlm/lfZoJz3oCfoYE5feVT9lDiKv1baBuLQjtx60oFVS3ih68ePS6RSWOP1YbnQ1ec5s0PxcsvCbL7dehyDLUOn4Tb/zmcvkpyxOl1KCFtaPEC9lr3R7VJJnt/o358WqB6DaunIbCuM3v2HUU5Dc1h9Er1ol+5jVZVQvP2aCyLOXPHkV1JWzJAY/QZ0bwq4z83CEWgjiWBVXXoav+VmGeVl+elgLy/Iy1tYqD7X8IRaW2I6ZteuFsprG6iAat7FkTuWrVR68DJr3Msjdrgx2O0rhf7fxPV6X6aUu7ZZZSVtwO1n5OflXeuKzK6O5fu2G7/ipa/MyrNqC3Q8+pzJ6WZ9TmezyZFfPfurF6Tur+axO0nbtzG+9uM1nd/Kx26+9jNurhp982n2XZHY/DN22kZ/jYOkyneZz2nej3P+c6sTqnG23PKd8urVrP3XtlEcvy7SaxilOqfYHkVP787qdki5lGEY+6kw4iar73ukyblzLxqWOcFCv8VSLfTSKMknR/wCPE68Bg5cAv5GFPRwpzvtgNVcj60U15XIK5u0C/FrXYeyD2XpVrzsMUCsELQiKnxs3kyyMMc9BLi/qMiehPGGVvdLlx6WeCGYBAACQWJG9AQwAAACoFsEsAAAAEotgFgAAAImVyOfMArUUp3Ydp7xQdqC+sI+hFsJoZ/TMAgAAILEIZgEAAJBYBLMAAABILIJZAAAAJBbBLAAAABKLYBYAAACJRTALAACAxCKYBQAAQGIRzAIAACCxCGYBAACQWASzAAAASCyCWQAAACQWwSwAAAASK2UYRj7qTAAAAACVoGcWAAAAiUUwCwAAgMQimAUAAEBiEcwCAAAgsQhmAQAAkFgtfiZOp9PFvw3D8JwGxFVpuy0otN8o2nRhnU7rS6fTdbePOW0Hc3q9lR0Imt0xolGPLwiHWzuT7GPFoNuY52DWnOnSz05pQNzZ7Yy1btNO+1QjsAteOb4A3lj9KCxNa+TjC4LjtZ2ZP4fZBhlmAHhgGIbjDlwt845tt5OHmYcoEawC1TMMw9OPc7vjWb0eXxAsu3bmxOs5rlKee2adVsyJB0kWp0vYbpdnwg6q4yjqbQI0gkY9viBYbsfrsM63vsbMlmak0vE4QNx46RWsxQGey+nOOL4AQHK4ndOCPMf5DmadxjswHgdJY9dOzT0Uteix4KYM53JyfAHCwT6FWgizjfkOZoFGEbeDu9NNUQBQDY4vCFKt25DnG8Dc7l4Dkigpbbcw4L5wcGikE01SthGQVI18fEHwovgxxNMMABu17KngxgsAYTEfX+h5RVisxsVK4Z/jUoZh5P1ksoCXJqBeRPGA50ry4iU96Sp5CDeAKV5fPuLl0X/sZ7Dj9WVD5jRzepBtzFcwCwAAAMQJwwwAAACQWASzAAAASCyCWQAAACQWwSwAAAASi2AWAAAAiUUwCwAAgMQimAUAAEBiBRbM7kzv1M70zqjLA/hWaLtW7bfW7dopL4X0eheH7QAASI6WIBayM71T243t0/4G4s7cXu3aci3atVteGhXHFwCAk6p7Zs0nl+3G9oY+8aI+WLXrKG03tkeeh1owHzs4vgAA3DBmFg3NLUB0u+xfy7zUO3pdAQCVCGSYAVAPnC7zW30OMx8SwS0AAF7QMwvYiCqYLAwpaKTL6fTKAgAqRTALiGAqDkqHczRSIA8AqA7BLBpeXALZRg7gCr3Rhe0Qh+0BAEiGqoNZ8+VQq3GGTs/MJI20qNPM7VVyv2s+jLxUKm71GWTZ3Y4vAAAEcgNY6QmHEw2Sxi7AMgdSYbdtp/WZA7pa5CcKpcMMSntp67nMAIDqpAzDyEedCQAAAKASjJkFAABAYhHMAgAAILEIZgEAAJBYBLMAAABILIJZAAAAJBbBLAAAABKLYBYAAACJFchLE4B6lN6Znvadsd2IPD9R5iHKcjdi2YEwpHemp+1LjXp8QX0gmAUcxOXAXnrysToR1StzWRup7ECtNOrxBfWDYQZAzJlPLsZ2w7LXuBE0ctmBIJj3H44vqAcEs4CD9M508R8AJBm9rqhXDDMAHHCJO174UQEAMCOYRcNyCoyM7QaBawyYL3lyCRSoDD/GUc8IZtGwOLAnA9sJCEbpD0GCW9QTxswCNugBBFAvClebCgEsgSzqCcEsUAGnm8KCTjNfWrd6XFWt8hJVWiOXnTTSgkiz43Z8AZKAYQaADavxmnHIS6OdaBq57EDQCvtSadDKPoakSxmGkY86EwAAAEAlGGYAAACAxCKYBQAAQGIRzAIAACCxCGYBAACQWASzAAAASCyCWQAAACQWwSwAAAASi2AWSIh0uk3pdFvU2Yis3I1YdiAMVvsS+xiSjDeAAQmQTrfJMMan/V3vzGVtpLIDtdKoxxfUD3pmgZgzn1wMY7xhe1AauexAEMz7D8cX1AOCWQAAGgC9rqhXBLMAEoMeIwCAGWNmAcSW+ZInl0CBytAri3pGMAsg1jgBA8Eo/SFIcIt6QjALAECd46kgqGeMmQViznxp3epxVXaX3uslrZHLThppQaTZcTu+AElAzyyQAKUnnEY70TRy2YGgFfal0qCVfQxJlzIMIx91JgAAAIBKMMwAAAAAiUUwCwAAgMQimAUAAEBiEcwCAAAgsQhmAQAAkFgEswAAAEgsglkAAAAkFi9NABpY24F08e/xDUbU2UlsHoEkaTuQnrYvFfYz9jEkEcEs0KDMJzSrE1zUkpBHIOlK9yv2MSQRwwwAJMb4BqOspxaAP+b9xxy8so8hiQhmgQZF7wvQWOh1Rb1imAHQ4JI0Vo4eIwCAGcEs0ODiPFbOfMmTS6BAZeK4fwNBIZgFEGucgIFglP4QJLhFPWHMLNCg6OEEGsf4BqP4r/AZqBcEswCmaTuQtg12o0gr/dv8qK445ZM00uKaZsc8dIceWyQRwwyABmU1HjWOSvMZ1zwCSVHYl0qDVvYxJF3KMIx81JkAAAAAKsEwAwAAACQWwSwAAAASi2AWAAAAiUUwCwAAgMQimAUAAEBiEcwCAAAgsQhmAQAAkFiBBbMH7rxTB+68s+L5KpkXCIJTG6x123TbH9hP7FFnANCYAnkD2IE779SGd96Z9ref+fzOCwTBqQ1W2q7Dygv8oc4AoDFU3TNrPgFveOcdTiJIPKt2HaUN77wTeR7izOqYQ50BQGOIdMwsJxpEza0N1nIYDPtDZbiiAwCNLZBhBkHghISouQ17qVUbLQTO7A8AALjjaQaAjaiCycLlcYbruONHMAAgFsEsJyREjTaYXKXDQPgBAACNJ/JgliACUYtLGyQQ86/Qi13YfnHYjgCA2qo6mDVfDrUaZ+j0/EfztF7nI420oNKs2qDbZf4w8lKpuNVnLdMAAAjkBrDSE7/fnhFOUoiaXRs0B7Rh9/o5rc/8Q68W+UmS0mEGVs/mpc4AoH6lDMPIR50JAAAAoBKRj5kFAAAAKkUwCwAAgMQimAUAAEBiEcwCQMKNjY1N+7x3714NDg5OmzaTyai3t9dyPrvvMIW6A+KJG8AAIGJPPvmkuru7tXTpUr3yyit67rnn1NXVZTntW2+9pdHR0bLvdu/erfvuu0/f/OY3JU0FVUuXLtW+ffu0fv36smkfe+wx7d+/X7t27dKLL76or3/968W0TCajQ4cO6dVXX1V3d7fl+gvB3Pr16/Xwww/rkUce0VNPPVWTespkMvqXf/kXz9Nv27atrA76+/u1a9cu3XPPPY7rOHr0qH74wx9O2wZJrjugntEzCwAxkMlkNHv2bL377ruu046OjmrdunUaHR3V7t27tWrVKt1///06fPiw7r333mIP4ezZs8vm6+np0bvvvqu+vj7dfvvtGhgY0Je+9CVt2rRJmzZt0oMPPqiBgQF1dHRYrre/v1+bNm3SCy+8IEm67777NHfu3Kirbppt27bp/Pnz2rVrV9n3s2bN0v79+3X+/Pnidy+++KKOHDlS/Dw8PKyTJ09OW2aj1B2QRIE8ZxYA4M/g4KD27dsnSTp69KgWLVqk1157TZK0b98+HTlyRPfcc8+0nrv77ruvOHzg85//vF544QXNnz9fv/nNb7R06VINDAzo0qVL09bX29urF198Ud///vd19OhRXbhwQZLKejozmYxjnv/mb/5GK1eu1A9/+ENJ0v3336+tW7fqy1/+sm0QF6Tu7m7XnszBwUFt27ZN69ev1+rVq8vSCnksTfvpT3+qRx99tNh729vbq6GhobJe2XqoO6CeEcwCQATOnDmjbdu2Tes9LNi/f/+0y+GFy+SdnZ2SpL//+7/XyZMndeDAAS1ZssR2Xb29vfr5z3+uN954Q++++662bNmilStXSpK+9KUvFYOpX/ziF9qzZ4/lMvbu3au+vj719fUVA70HHnhAkvSd73xHO3bsqFnd9ff369ChQ8XPmUxGw8PDev755z3N39vbW5z/5MmTeuWVV3T8+HFJKuulrce6A+oRwSwAROC6666TJG3atEmZTEbd3d26/fbbdfToUT311FPatm2bbr/99rJ5Vq9ereeff16HDx/Wnj179Hd/93c6evRocezlwYMH1dfXN21dX/jCF7R+/XoNDg5q9+7dWrNmjR5//HFt2rTJU+/iwYMHtWXLFu3bt0/33Xdf8fuOjg4999xzWrdunUZGRvTtb3+7Jr2M77//ftkPgeHhYU/DMypRb3UH1COCWQCI0L59+3T06FENDw8rk8no5MmT6unpsZ2+o6NDr732mjZu3DjtBqXZs2dr48aNmjVr1rR5JOkf//Ef1dfXp6NHj2pgYEAbN27UQw89VJzu1KlTGh4e1i9+8YviZfeenh5t27ZNGzdu1PHjx6flrTD+dM+ePdqzZ4927dqlb3zjGzUJzApDDnp6ejQ0NKSOjg7HJwoU0tyGGRTGtdZz3QH1hBvAACBB3nrrLR06dEjbtm2bljZr1iwNDQ0Vg6bSG8B6enp04MCB4ud/+qd/0j333KNDhw4V/61bt06dnZ3q7+8vBn4//elPdfToUT366KOSpnogCzdYSdKxY8e0cuVK20vscVIYS3zo0CH19PSop6enOMyg8PmVV16ZNh91B8QbPbMAECGnYQZm/f39+sY3vqGXX3652LP4la98Rbt379bcuXO1efNm7dmzRz/4wQ+0ffv24mXtvXv36siRI3r22We1adMmDQwM6Gtf+1rZsksf93XgwIFi7+Xrr78uScXHTe3du1eS9M1vflMdHR3FXtHNmzdr8+bNNa27QtBuHudq59SpU77XUa91B9QTemYBoIYGBwfV09NTfJJBYZjBkSNH9Nprr5UNM9i9e7fS6XTx+aSzZs3SG2+8oVtuuUWDg4MaHBzUmjVrNDAwoAsXLmhwcFCnT5/WW2+9VfZ82S9/+ct66aWXij21o6OjGh0d1datW6c9huqzn/2snnvuuWlDFQp+9KMfacuWLcVL4efPn9eqVauirlZPTp48qZUrV+qpp54q/luyZIkeffTR4udCLyp1ByQHPbMAEJFdu3YVe2C3bNmi7u5uLVmypOzmpkceeaQ4faGHr6enR+fPn9e8efOKaaW9k4888ojef//9Yu+teQzm5z//eXV1den48ePavXu3Xn311eI0S5Ys0dq1ay3z29vbq76+Pv3DP/xD8btjx45pxYoVkdRfYczswYMHPU1/6NChshcdeFGvdQfUE4JZAKihrq4uPfXUU8pkMtqyZYsklb2launSpVq0aJG2bdum7du3Fy9Jlyp91mrhuarmZ9IWXmdr9yYxaerRUtu2bdPp06e1fPlyx3z39vZq06ZN2rdvn2bNmqWenh7de++92r9/f83HfK5fv16GYRQ/r127ttjjbae/v1/79++fNnyj0Lt68OBB/eY3v/E8ZCGpdQfUI4JZAKixvXv3asuWLdqyZcu016auX79e69ev1z333KNt27Zp9erVxUDX6XWuR44cKQZzmUxGe/bs0Zo1a/TjH/+4GAybX4Pb3d2t48ePa+vWrRoaGpI0/a1hhWfbvvvuu3r11Ve1du1aZTKZ4s1MkrRmzZqoq7SokKfCo88KXnzxRe3atWvaixQGBgaKZc5kMtq/f3/xR0apRqg7IKlShmHko84EADSS3t5e3XrrrdMCK6vp7rrrruLwAmnqaQZuPYGlzG+yOn78uOVbtPbu3asLFy5MS+vv79eRI0csb1Dq7e0tXnaPi4MHD+r666+fVre9vb1l44jt7N271/KtXI1Qd0BSEcwCAAAgsXiaAQAAABKLYBYAAACJRTALAACAxCKYBQAAQGIRzAIAACCxCGYBAACQWASzABAjO9M7o85CovLl5rUdr+nt3rclSUOZIb3w8Avamd6pA08e8LyMt3vf1s70Tsv5DvUcKqYd6jkkSbo4eFEvPfaShjJDURcfaAgEs6hI4eAdp3UUpreaxykNzpJQb0HmL8p2tDO9U9uN7ZbrDGt9dmU0225sj307MHttx2u6MnJFd6+/W8aYoX3r9unux+/W1nNbde7ouWKQ6+RE/wm9+cyb+vrRr+uJ40/o5KGTOrz3sKSpIHfx7Yu13diuv3j1L/Srbb/SUGZIc7rm6LZHb9O+dft0cfBi1NUA1D1fr7M1H8jMB92g1Xp9KGd3Qiv8H/aJzc86zEFA6WenNLhLYhBTbXkl5/Zfq/pwasf1sL4wvXfwPY0cG9FjLz0mSfrvn/23JOnu9XdP/f/43XrzmTeLn+38Z+9/asWGFVrUvUiS9LlnP6c3n3lT93z5nrJ5Vzywomy+QtpPtv6kmAcA4fDcM1s4sBX+ha3W68N0pYFr4V8cgpo45AGVC2P71cMxwu1HV9D7n3lZXtYXl2OAF4d3H9adX7uz+Pl0/2kteWhJ8fNNd92kywOXXYcCjBwbUfu89uLn6269TpcHLpf1uF4cvKj9f71fD+55sBj0SlMB7fC7w/rgrQ+irg6grlU8zKDWJ496OFkhPE7tg7YDlEt6r6ubocyQhvuGy3pLR46NaH73/OLndEdakmRcMhyX1d7ZrlNHTtmmn+g/oe8t/Z6O7z+uqxeuTktfsWGFjr5+NOoqAeqa52EGhV/ktToAuq2vtHfAfOmvtPfAavyZeT1elum23EZmrhMvdWZV11Zp1ebFa1rQdeG37Oa267WtBVnXfrZDJUOASuex2letPlvxWja3fLrN64dTGThW1N4nv/5E0rWAtRrdD3br4KaD+uCJD7T8vuX63aHfSZLmdM2RJN24+kZtN7br8N7D+vmWn2ve0nllww/mLZunzKFM1FUC1DVfY2b9BnNOJyUv8zudnK0+F6a3G/flZxxl6TILn+u9N8OOW8BjrlO3OvM7vtWcB6c8OY33q8VYwErLbhfEueUz6Lp22w5uaV7zanU52zwe20s+ncaw+i1DNW3CrQxxEtd8Ba1zTWfZ5/kr5uvq+Ws9pxfPTA0TSM9yDnjvXn+3Mlsy+uc1/6zONZ2auWim/njXH08LlB/Y/IAyhzI6f/x82fftc9p1eehy1NUB1DVfwazkLyAI4oBptT63mzAqXa+fHqFGUs0lfC89637zkJSTsVUew7yByG9dV5oX84/MoPbz0h7pMLav3Y+eIJcfdhmC4hTY14vhvuGyz4tXL9abz7wp7Zj6fOb9M5q5cmbZGFc7G767Qfru1NjYH331R/qjTX9kOd38FfPVPre97LurF69q5qKZUVcHUNd8B7NxYe658cKtZ7keD+hxFXRduw1JidO2raTtVsNtmEIleamHO95rvR3iIok/DP266a6bJE0Fn4XhAHd84Q69+cyberv3ba38/Eq9/cLb+tyzn/O8zLd739abz7yp9S+vLy6z1In+E/qvnf+lJ44/Ufb9+Y/Pa/6K+V5XA6ACvp5mUEtu4/UqOQCbL4PHqbyNLMq6rvUzVJ3abi169oLokfM7xtWrsHs0a9EbWU0Z3MYPWw2NCKNn2amOkhL8LupepM41nTr+7vHid+mOtNa/vF5vPvOmvrf0e1r+yPLi2NbDew9rZ3qn5VMHCi9FyPw8o02vbtKNq28spr2247Xidujb1aevHvnqtED32IFjWvXwqqirBKhrKcMw8l4mjNszZr0MCfByQ5fdMr08a7Le+akrv0M0/N545PcHiN/1BbVNveTd7fmlfgIiv+vzmxertGqPBX63hdebuNzSvZa9kjpxK4OfurHLp90NbkEMq7Kqs6DXV2vvHXxPh3cf1uOvP+5p+qHMkN7ve18PbH4gsDy83fu2fvvKb3nOLBAyz8FsPeDh+bBCO4iPJPcGBpVfP/PXsm6Sth2kqZ7TkWMj2vj8RscnG7x38D199MuP9Gff/LNAnoBQWOa/bv1XbX5js+WwBADBaahgVgr2cTxIviSeoOtNpT3qcRJFPmm73hzqOTTtcVlhuzh4UT/Z+hM9/K2HPd1gBqA6DRfMAgAAoH5U/AYwAAAAIGoEswAAAEgsglkAAAAkFsEsAAAAEotgFgBiJL0zmEdDkWcAjaLunmZQelA1thtRZ6dhxW07FPLjlJf0znQs8ho3XuouakFuO6e2G3a79lMOqwAyyDxZ5cWpLbD/AIhKi5eJzAfwWgUq5oO1l3UVpqGnIDhJ3w6lJ1lOuP6Z9/l659R249SuS/NTC0ndj3a8tkOrF6/W+rvXW6YffO+g1v1kXfHz8SeOq2tOl6dlZ4Yy6vm3Hu0Z2KNdf7xLTz30lPpP9Otv/uffqG+4T2s612jPo3vUvahbPYd6NLd9rjY/sDnqKgHqjqdhBoWDlt3/YSgcLAv/EI2kbQdzkGE+6dqVIS7BSSMIo66T0DbdxClA9LIfWf3Ijdt+tOO1HRq5MmIbyErSa799Tee2nise47wGsv0n+rXqxVWaP2O+jj9xXE899JQk6cV/f1E//MoPdfyJ45KkLa9skSRt+qNNOpQ5pL2H90ZdLUDd8TxmtvRAFcVBNy4H+UaX1O2Q3pku/rNKS2q50Nic2nU1y6yH/eHgewd1bOSYvrvhu7bT9J/o14PdD6oj3eFr2YMXB3XPD+/Rq3/xqnZ8cUcxAO4/0a+v3PkVdc3pUtecLj39wNPqG+5TZiijrjlden7j8/rRez/SWx+8FXX1AHXF0zCDKBSCZy89aX4OvHbzlY4FsxoXFrcxoLVSD9vBnP8wT9Z2Q3LcymAut928Qa3Pb5rTet3yaDWP1Xa2+mzFT3twymeQ+3RUQ7Fq1a6TaPfh3Xr6gacdp+n9z17t/K+d2vjbjXr0tkcde3BL/eMv/lErZ67U7sO7te4n67Rl5RZ9d8N3tfrG1WXTXT/3+rLPHekOPX734/reW9/Tfcvvi7qKgLrhK5h1C2zMnE5KXsdd2p2cKzmIO83ndAKqZTAUR3HfDlaXRM3595qXIOqqsFy7G2isymAXxLnlrdL1VZLmtEy/dWN1ydrcBrzk02kMq98yVNMO3MpQCbfjZ9Q3oMVZZiijvuE+/XjFjx2n2/HFHfrmn31T33/z+9p0cJP2XN3jOqZ1zBjTzv/aWQxgC2NuH3z7wWnB8CXjktZ0rlH3ou7id3fddJc2Hdykbw19q+x7AJXzFcyWBg9+bgKqRtA3HLidIKzWE7ebPqIQ5+3gFvD6yUuQJ3Or5YTZlvyOB640L+YfN0Ht536PL9XUTxjbIegyRBVUhrU/1NKvP/m1JHkaPtCR7tBTDz2le2++V2v+eY2+uPqLjuNmBy8OSpK+/n9+XZK09jNrtfGdjfp55ufTgtkfvfMj/cOf/0PZd4Vln754mmAWCEhshxmEpdIDs7mnCNWJy3aIose91m3Ja8+0n7wk8a72oMoeF2HUfT1dgVrTuabs88MvPKy+4T7LskrSfcvv08qZK3Xm4hnPN4EV3HP9PTpy6kjZd71v9+rB7genDT0oBNijV0ejriKgbni+AczpUnAYanFysbtj1+1O3kaS9O1QTVsN+sYatzLUoncyiF43v2NcvQqz3EGVPcwyBFmXQbddc968DGmJWmngKkmvP/66pyezXDfnOsflFgLd98+8X/zu/NXzWjF/RfFz/4l+Xbx60XIM7pgxJkma3T476ioC6oanlya43bASBrdxj243ENnNG8SNR3Z5qkf1sh383ChUTe+Ul/y5Pb/UT0Dkd31+82KVVsnNX37y45bHINqZn/J5qRO3MvipmyBupAwiH071GfeXJmSGMlr14ipfz4z18xzYvYf3aveR3Xpj8xsaM8a06sVVOvLVI1p942r1n+jXod8dKj6qa8wY0z8f+eficgt5O/r1owwzAAJSd28AA4IUpxN0I0hCj18lZajl/EEvJ27r8urhFx7W0w88rbWfWWuZPmaM6S9/8JfFHtx9a/cVe1LHjDEteG6BNi7dqJcee8ly/icPPKk9A3skSa/+xata+5m16j/Rr3t+eM+0afu+3Fd8ekHv27165bev2C4XgH8Es4CNOJ6g61GUr4+tRRmiyk8c8hGlg+8d1O7Du/X6469XvIwdr0097cDvc2jtFALoHQ/t4NFcQIAIZgEAdWnHazt0bOSYnt/4vK+AdMwY03f+9Tu6/+b7bXt2/RozxvTX+/9aD3U/xCttgYARzAIA6lbPoR4tnbfU8wsRwsyH1zG5APwhmAUAAEBieX40FwAAABA3BLMAAABILIJZAIiRdLot6iwkKl8AkMhgNp1uK/7zkxbEemtdziDrJui8xW0dXqbnhOxfLbZ1EHkMury1Pr4Ulm8Y45brDGt9dmU0M4zx2LcDAI2pxeuEdge3sDitr/C/0zRJPuhWmnfzidDqxFhtPkq3Qdh17GcdpWWtttwo12hBTJyOL7Vu1+xHAJLIc8+sYYyXBTJhH+RqvT6vearVesJal5+TsLn+4xLUmPNgPuna1V0c8o5wtkMcjg/VcvsxGvT+52U/Mk8Tl2OAHzt2tKi3t1mSlMmk9PDDrUqn2/Tkk577cop6e5t1xx1TPdeDg6ni8gu92T09U8scHEzpscdalcmkoi4+0BASOcwA1urhhF4Nt8vDjV4/QEGj7A87drRoZERavz6rsTFp3bpWPf54TufOjevo0VQxyPXiySdb9MILTXr22awMY1xdXXkdPNik1avzMoxx7duX1bZtTTp4sEldXXk9+mhO69a1FoNeAOHx/9M0oQoBjvkyoV1Pg7mHxOn70mVYjXezmreWZa31+sxltsqPU71UOlY4yCEW1daF37Kb25JTmwpifX7TnNbrlkereez2Py89f17L5pbPIPdNpzI0QtAYRwcPNunYsZReemlCkvSzn00FruvXZyVJjz+e0zPPNBc/O+npadHwcEo//vGEOkpeJHbyZEqbN2eLy12zpkm//GWT1q7NFZe7dWtLMQ8AwhFqz2xpT5n5X62ZTyhWl6XtxoPaXfY3jy0zz1f6fS0vz5nXZ653v9uhdHqrgMe8vtL6Kk33Ui/mNKs8WH12C/iiGCLip+x2+XPLd5B17WU7uKV5zav5b7v/3Ya6OOXBTxmq3TfdyhInjdIru3t3s772tWuBan9/Sg89lCt+vuuunAYG5DoUIJNJadu2qVPlggVtuuOONvX3T30uBLIFq1aVv4No/fqs3n03pbfe4iIoEKZQ97DSk4X5Xz1xC0SivhvcXO9+t4PT9F6CLSt+A+pqy2AOhGvB7QdQ0PnwW9eV5qUQ/AV5BcD84yuMY4TVD9ggt0MtyhCUKPaHWspkUurrS+mBB64Fr8eOpdTdfW2aQg/rpUvOwWxfX5NWrpS2bcvq3Llxffazef3VX7VobGz6tMPDKd1/f67suw0bcnr9dYJZIEzsYSGz6i0Lc11JUcsfN1aBcJRq2SbM5XfqoaxkmUlqc2a13g5xEbf9IQy//vXUqa10SEClPv54qkd39eqcOjqkJ57IamBA+vDD8tNnf3+TOjvzWru2PJhdtiyvY8cYNwuEqWGD2VqchMPonYm6lzcMQVzireZxZrWsT6c2YR4vG0aQYTWUoJplBBnQhlnuoMoeZhncxg9bjQsPsu26rS/MegvDmjXll/xXrMjr/Plrn8+cmQowZ83K+1iqtHz51PSXLl37bmxM2rWrWX/3d9PH386ZIw0NRV0bQH2r6DmztbjByGl9laaZxyi6nQzMN+KYl2m+pGj+bHWi93IyqLSuvdx44mebOV0ydSq71XROj/xxexyQl0dveSm7VXnCfNyRl7Lb1W9Y6/OaF7d8VnuDk99jiNtNmFbLrLTslR5fwuA0rrratuu2PyQlaLXT11feG7p6dV7PPNOsHTumPr//fkorV0rd3c7B7LJleR040Kyxsame3sLwgsWLr833t3/boueem1RX1/RlXbwoLVoUdW0A9S1lGIa/n6VAnUlSb1O9S3pvYBD59TN/LesmSdshk0lp1apWHT8+UQwwx8ake+9t07PPZvX5z+f01a+26PHHc65PMxgcTOlP/7RVTz+d1ebN2eLjvr773UlJU4/seuqpbDEo7u1t1l135Yqfd+xo+fT/yairBahbDTvMAJCSdYKuV25Pyyj9P66cyuBXHANZP/mKg+7uvNasyevdd6/1znZ0SC+/PKlnnmnW0qWteuSRfDGQ3bu3Wel0m+VTB7q68nr55Unt3j01zbFjKX3729cC2T17mrRqVWtx+7/ySlNZb++BA016+OGcAISHnlkAQN05eLBJu3c36/XXvT3jdeoJCE3THrdVjd7eZr3yShPPmQVCRs8sAKDurF2b03335fXYY62Wj9EqdfBgk37wg2Z9+cvBBbIHDzbpmWea9dxzDC8AwkbPLACgbvX0tGjp0rynN30FZXAwpa1bW/Stb0263mAGoHoEswAAAEgshhkAAAAgsQhmASBG2g6ko85CovIFAIkcZlB6UB3fYHhOC2K9QS8zyPVZnWzCym9hXWHWh991eJm+1tuwHtRiWweRx6DyF9Xxxa4ctT6mOW1v9h8AceT5DWC1DJTc1lf432maJPciVJP3ILeJ2zYIu479rKP0JMsJN1i12NZxEqfji7kth92263E/avnVDuXnr1Z25XqlLmTU8u9blBrtU+76LZp84Lu+ltU80Kvm3z4jjQ9o4s+PKz+zSy2/2qGmT3ZKknIrdmnys08pdXlQLf+xVZN3fUv5ud1RVwFQ9zwPMxjfYJQFMmEf5Gq9Pq95qtV6wu5R9VPeQn7iEtSY82A+6drVXRzyjnC2QxyOD9WK4sqP0/qt9ve4HAO8avnVDmliRNmV66WJMbX+2zrlbnlc4186p9Slo2oe6PW+rMNPqunDF5S97VmNbzCUn9mlpo8PKj9/tcY3GMr+H/vUdGzb1Hczu5S78VG1/ts6pS4PRl0NQN1jzGydaTuQLv5rNE5lr5deJjSWMH/U1vv+0PTxQaXGjhV7X5s/+pkkTQW2rR3K3fL4VC+rBy3v9ig1PqyJP/3x1PyfSo2dLH7Orlyv/Ow1ajrzy+Ln7G3PquU/tkZdFUDd8zzMIOnM48BKP7sNobAbs2a1jFqPsXPLdy3WaVe3TnXjdVyinzxEUXa7uvBbdnNbcmpTQazPb5rTet3yaDWP3f7npefPa9nc8hnkvulUhiDaYRLGLcdN89Hdyq56uvg5NdKvXOdDxc+5rrvU/J8DSl3IOA4FSF3IqOnYNuU7N6rtXxZIbSs1ef/Lyi1creztm8umzc9aVfY5u3K9mn/7jJp+/5ZyN9wXdZUAdSvUYNbppFTrg7LTSdMcQJT+X5pXu0tuduPMah1geb287vdE6xbwmMtb+tnuBhO7erFKcyuDWzlqGdhWWnYvP6qCXF+l28EtzWterS5nm8dje8mn0xhWv2Wopo24laFaQY5jbYRe2dSFzNS42Bt+fO27sWPKz7/n2kStHVPfT1yS013QTSf7pLaVyq7eptzc59X65l+r5Zd/pfFH/qO4jOI6xoeVvfGLZd/lFm9Q04nXCWaBEIUazNb7AdOtnHG5Ga3ak7bT9F6DLTO/deJUBi/LcgpkwmK1jjDbhN+6rjQv5h+CQdSl3Q/JsOonjO1QizIEJYr9oZaaBn899Ycp2KxE6tLHynU+pNzC1ZKk7B88oZZfrlHThQ+L30lS09l+5ds6lVu2tmz+/KxlajpzKOoqAepawwwziIq5h6lW64q7WuYzDsMPStWyTZjLH1Re6uGO91pvh2rzGJS47Q9hyc9eU/65Y4U0fr74OXX5zNT3rbP8LXfe8qk/xi9d+3JiTM39uzR573PTZ2idI40PRV0dQF1r2BvAanHyCuNEUY83d1VbnmrusK51fTq1iVr07AXRI+d3jKtXYfdo1qI3spoy+K3LoNuuef1ehq3EWWq0r+xzfv5qNZ0+cC195H2pbaXro7Pys5apafiQNDE29cWn/+c7FhenafnV32ry3ueUn9k1fQETF6W2RVFXB1DXfL80odY3FZWq9PmPbjdtlU5ndfON3TK93PRRyThIL+Xzckk56JtanNL83LTjZ9l+yuLnZqCweqe85N1t+/oJiPyuz29erNIqufnLT37c8ug2xKSaGw3dxi57re9K25PflyZUuy6nOkvySxNSFzJq/V+ris+ClSRNjKntp/cqe9uzyi35vFp+8VXlbnm87OkElsu6PKjWf/1TZVc8reztm4uP+yo8JaHl8JPK3vFUMShuHuhVruuu4ueWX+2QJE3+8Y6oqwWoW56C2Xp8kDZQQJuOj6T3BgaRXz/z17JukrYdWl9/WNlVT5eNYW0626+WX/6VND5QfMGBJDX/Zq+af7NFk/f3Wd6oVTpfvnOjJj73vNTaMfXs2VN7yqbNd27UxJ+8dK3efnKHJu/5PjeAASHyHMwWJOlgBrhJ2gm6HkX5+thalCHMdca5TqLW9PFBNR/drYmHX/c0fepCRk0n+6Y9bqsazQO9ajrxSllwCyB4vocZAACQBC2/2qHU2LFiT6qdpo8PqunMLzV55zcDeQJCYZkt/7lVE3/2hvVYWgCBIZgFANStlnd7lO9Y6jo2Nkipy4Nq+Y+tmrzrW643mAGoHsEsAAAAEqthH80FAACA5COYBQAAQGIRzAIAUIeifMFOvb3cB/GW6GC2lm/Lqac3bwVdFurFfllxqpu45afeeNnm1H+5KPcTu/XWyzYyPx++2rckus1vnibItwMCblqizkCtVPvK03p6aURQB5l6O1AFWS9+3zBW6VvPwixftW/6iju3t+n55fSWsCjLZ85fnFRaZ0HsL40UbFVTVq/bqJHqE/GT6J5Zp4NXkD1spethh71mfIMRyxNkEhXqsVCnUbezQrtPyjauJBiKop5rccwyly/o7ReH419puYIuYxjtvdZ1FpdOl6iPY2gcDdMzGwannTQOB5KoNHq92JUxyfUS9/z5YfcDtV7KWE9lAQAvYhnMll7+KxyYzd8VOL3+0u4yYlCXF730spgv0diVxWo6u7w6XT70Wjdhol7Kl13ag+Rlei/5rDTNK7cAz8/67HpnvJTBqU2Y5wliv/bbzryuq5JjltP4w0ouqYdV13b5rLbOKuFWBnNe3OrEaT4v5XNrn07bwa0c1dSPW734WRc9r4iDWAazbic/pzE8bmNbazX+1e0gXxgsb86L20nMajymU5pdWaM6AFEv3urH6kTjpwxBlM8pMPGyPnMgX/rD1FxOu2U6tYnSfAW1L1fSzryst5JjlrmOgupBDrquzTcZ+Wm7QXMrg9P+4LSNKj2+OLVPp+3gVJ/V1o/V+ivdRnE7fqJxxTKYDVOQB4ZqLxtXkhdzgBFV7yv14q0evB7ovVz2tps/zBuPahn0J2EYRq1v8rIL/mtRxlrPG5cy+F1+LfY/AM4aLpgNUlQHGqdf+3E4+FEv4eTBbfhGLQMtu7xUc1k0im1UyXprXdeIP9oEEK3YPs0gih4Jp3wUVJqXah4LZvfZ7U5Rr+uM6jmP9VovQfRi+ilDGPtHpY/wMl8qrcX6/So9pvgZG1xavlryewy0ajsEWMEKe/+rVlyeIhDHukF9ShmGkY86E3bsgtlKbiap9maKanpxrNbplhe3G6O8ltsuzWp9lZbNKT/Ui7d6cStrtWWoJj/V1KfddnO6yc/pOy832gR5U5ZbPu3yVsk2cMtPNe3QzzbyW9fVlK3SY4jXY4bXMbFeylHp/ue1/NXsr3Z15FTXQW2joI+fQCViHcyidjjoWKNe/KnlzT5B5lmKxxAdpzzGOX+IpyjbDW0WtUQwCw46NqiXytTqMUz1jnoEAG8IZgEAAJBYsb0BDAAAAHBDMAsAAIDEIpgFAABAYhHMAgAAILESE8zG4QHQQJho4wAA+JeIYJZHJPln9/aqpARMUeczireixeWtPQAAJElL1Bnwq9o3eUWZ31rmkcCoOtQfAADJEPueWXOvrDkYLHz2EyTWKkgpff97XALtuOSjXvIZRrkJogEA8C72wWycVHvpuVEDNAAAgLAkbphBQaEHyypAtBt6UPq91fvYgx6y4JRHt/VVmhe7YNtteU7DIazSzPXnVp9e0/zk02rdVvnwU2d+toNTXsz58VNGAADgXWKDWTvm4LH0c2kwYRWY2M1XUEngYRdgOa3PS168lt2cd7ubwpyCfqs0q0DNy3xB57MwhKPQa+60jb3Wmd2wFi/byPwDxvy50m0LAACs1V0wW42wxio6BTK1zotd/kqDVKfgymvg5bRMP+vzW45K5wmyvksDWLsfTgAAIBiJDmaDvlmm1j1kQQSNQeclyJ5Cp2WGsb5KmXtOwxZ1eQEAqCexvwEsqru77S5z+8lLUNOGXX6n4QJe82L+3m0Igtf1hS2sYNptvLRVXTjlJ4rn3gLVyOdzgU4HSLQXWEsZhpGPOhNu7HrOvFy+r/QGMaf5vAY/bs+YrdUNYHYBo11vZCX5NA8dqOTmL6f8O+Wl2mcPe60zv9vI7w2KXuahVxdJcfLkx1q8+EY1NzcHMh0g0V5gLRHBrBSPy9FAmGjjyZfLZTU4eEqLF98YdVYiZRhXNTIyrMWLl/iaLpvN6vLlS5o9e66n9WSzkzp//pyy2ay6uq6Putg1d+XKZTU3N6utrTGu2nhtV3HB8aB2EjNmlpM86h1tPNmuXLmswcHfa2JiPOqsRG54+Iy6um7wNd3Y2CUNDv5ekjwFs8PDg8rncxoZGfYc/NaLbDar4eEzGho6rWXLVjRMMOu1XcUBx4Paiv2YWQBIghkzZqqzs8v3fGNjo4FOF7XxcUNNTc1qaWn1NV1HxyzNn7/Q83o6O7u0cOHihgtkJam5uVldXTconW6POisV89uevbaruKj0eIDKEMwCQERyuZzOnTsb2HRxcPbsGXV2XhfYdKg/lbRn2gucJGaYAQDUk8nJCZ0+fcL17myv08XBxMS4UqmUWltbA5kO9aeS9kx7gRuCWQAImGFc1eDg75XP5zUxMa4FCxaVXUK/cGFEExPjyuXyMgxDZ8+ekSS1tLRo3rxOX9NdunRRp06dUCol3XDDMg0PDyqVSmlyckLz5nVq3rwFZXkbHx/XyMiQmptbijdQTUyM6zOfuVOpVPnFusnJCX3wwVHNmNGhm266xbXcZ8+e0cKF3nplnaYzjKs6d25IuVxWhmFo3rxOLVjgfQhCqYmJcQ0OnlJra5tSqanyL1q0eNo4U8O4quHhM2pra9fk5IRyuZxaWlp07txZLV3arY6O2b7WW7j5p6mpSU1NTbp69aoWLFiomTNnVbQ9JibGdfr0SaVSUlNTk9rayocYXL16Rb///SfK5bK66ablGho6LSmv8fFxLViwUHPnLvCS7YqXl8vldObMSeVyOaVSKWWzk+rqul7p9IziNF7bvZ/24lZ/Y2OXdObMSaVSKXV33ypJyufzGhz8vYaGTmv58ts0Y8bMCsrrbfsW2pbT8cBL/fnNn9f2XMhbc3Ozmptb1NaWVi6X08yZszRjxkzP9Vdw7tyQJiYmik+bMIyrWrjwuuKQmKDbaQHBLAAEaHJyQmfPntGSJcvU1NSsiYkJDQz8t2bPnqeWlqlD7ty58yVNHfilvO2J2st0s2bN0eLFS3Ty5EcaHb2gpUu7JU3dJPThh0eVSqWKy5GkEycyWrZsRfFks2jR9cpkfqfx8fFpYzBzubwmJyc1Oel+E8vk5ITy+bxaW9uqmi6bndS5c2e1ePGNnwblkzp27Ddqa2vTrFlzfG6LSWUy72vZshXFso2PG/roowHdcssfFMdf5nI5ffLJB+ruvrX43VTvYV5Ll3aXBWRe5PN5ffTRgBYs6Cr+mLhy5bIymfe1atVqNTVde6yUl+2RzWaVybyvG264qVgHV69e0eDgqeJy2ttnqKvrep04kdHZs2d0/fVL1dTUpMnJSQ0M/LdmzZqj5mbvp3w/y8vn8/r44wEtWLCoGIxcvXpFmcz7Wr78tuK29tru/bQXt/rr6JilBQsWamRkuDhPKpXSddctKfvOb3m9bl8vxwMv9ecnf17b8/j4uD7++FjZ/jE6ekEff3xMy5ffJkme60+SBgdP6fLlS7r55pXF7y5duqiTJz/SLbesCqWdFjBmFgAClEo16YYbbiqe0FpbW9XS0irDuOpp/gN33lnBOlNqamrWddddu9O7ublZCxYs0tmzp4vf5fN5Xb16WalUqvhdU1OTrr/+JsvltrW16bbb/lDd3X/gmq+gemXHxw11dV1fzGNLS4vmzJmnkRH/Y4bPnDmp2bPnlgXpbW1pzZkzT2fO/L743eXLl9TU1FR2c9GsWXM0OnpBHR2zi0GHV+fOnVUuly/rFS8sP5u9dnnd6/Y4d26q57E0mG9vn6G2tukBXirVVAwQCvXnp/1VsrwLF0aUy+XKetXa22doxoyOTwPXyjm1F+/tOWU5fypl9Z17eb1u38Ly3I4HfurPS/68tufBwd9r9ux5ZfvH7NlzLW6y81Z/Uz84Z5u+S+vq1SsV1bMfieyZLT2obnjnHc9pQaw36GUGuT7zySbMvBbWFad1RNUu6l0ttnUQeQwqf9W2o+bm5rKTqzR1ovMyRtCqHAfuvFNtt92qz/4/33act3BSKJVOt+vKlWsnklQqpTlz5uvDD49q/vxF6vviOq3/93/XzJkdtvkonIQ3vPOObT1PTk4om826PiLKy3QzZnRMeyB+Ot1eUVB08eKIliy5edr3M2fO0smTH2nJkmWFmlE+P/2R6+ZhF16Njp6fdlJPp9t1662fMS2/fHvMmzdfzc0t07bHpUsXPQ9zsGp/qVRKuVxlY669LG909IJmzOiYNm9bW7suXx6raL2Se3vxWn/Bl9fb9rVbnvl44Kf+vG1fb+15bOyi5TNwU9axq6vCUJGJiXGNjY1qfNxQLpezzEvQ7dRTMBtFIGDVC1BYd+F/p2kq6d2Ii0rybncSrGZ7uW2DsOvYzzrMZS397JQGd7XY1nESp+NLoa2eOzekf9+1Uzftes7X/FOX6/LK5XLFYPfGG2/WxYvnNTw8qNn/9wZ98smHuu66qcc8VVqus2cHPfbKeptuejmalctlfc2Ty+WUzWYtbxqa6kHLFutl1qzZamlp1aVLF4u9nyMjwxXlVZoa3uD1EVKl2+P06ROaPXtucXsUZLOTvnuHaymbnVQ2O6nh4cGy7/P5XFWPD/PSXrzUX9D8bN8o6s9re56cnKzokr59vUzo5MmP1dLS8ukQjE6Nj49PK1cYPP3sNAeRtbDhnXfK1huH4KNWeQizvH5OVub6j0tQE4c8oHJhbL84HB+q5faj644nnvBdd9nspFKppmm9tnPmzNPbf/l/6dKBn6itrU0ffvg7GcYV12OP1TFg6kQ84XrS9Tqd9bxZ3yfdqRtzmi17erLZyeKNO9LU5eqpXuwxDQ2d1tDQac2f31k21tiP5uZmZbOTnqefM2eeurtv1cqVt5dtj2vLa/EdzNdSc3Oz2ttnqLOzq+zfDTfcpBtuuKmiZfppL271F0Z5/WzfWtef1/bc3Nxi2WtaqU8++VDpdLuWLLm57KawWmDMLOqC2wkYiKOmpuZpJxOrh8l7nc7sypUxdXRcu7t6dPSCLl0aLQbKuYujWrz4Rs2f36mLFy9YLiObzTqe8M6eHfT4XFlv01kNxzCMq5Z3ibuZPXuurly5bFkvs2fPK/s8Njaq2bPnqrOzS4sWLba82Syfz+vy5UuuAUBHxyyNjV2yTCsdLlHYHgVtbWnL7TFjRoflWMJstrIA12s5vOromG1Zz4W6NfPSnr20F6/119w8fX35fH7a+Fbv5fW2fcOqPzde2/PMmR2WQX8uV15XXupvcnJCly9f0pw5lf0ArFZ8r1sEzDzur/Sz0+XE0mntvi9dRlRjNa3KUeshIVaXYO3qxuu4xGrz4jUt6LrwW3ZzW3JqU0Gsz2+a03q91mnpPHb7n5fef69lc8tnkPtmYVktN96gO/7f73hednv7DJ05c6V4orh4cWTaI5dKp1NTyna6q1evamxstDi2cnzc0LlzQ1q6dHlxmqampqkbn0w9tfl83nKc4fj4uAYG3tOMGTOKdyOXymazmpgYV3u7893+XqeTpCtXrmh83CiOk5yYGNfo6IXiY4H8uO66Jfrkk2OaO3d+8W74bHZSFy6c1803ryhON2NGh1pb0/rgg98pn8+ptbVV6fQMdXTMVmdnV7EH9/TpExoenrr0bTXWsKCzs0vnzp3V6Oj5YtBceJxR6SOoCtujo+PWsvGD5u3R2blIH3zwW129eqVYh+fODWlycrKi5w97LYdX8+Z1anh4SOfPnyu7Kers2TOWb2grbfepVGpae/baXrzWX3v7DBnGFU1MjKu1ta24LaR8RfXndfuGVX9uvLbnhQsX6+TJjzRv3sLiMJaRkbPTrgJ4qb/m5ha1tLToypWxsrofG7tY/Pv8+eGK6seLUINZp5NSrXvLnE6a5gDCfKOD3Ti5wnR24zOjGKtplVdzvv2exN0CHnN5Sz97GcvrVmd+ymC3HLe0MOrfT9m9/KgKcn2Vbge3NK95NW8z8/7mNZ9OY1j9lqGaNrHhnXf04z/9E935/31Xly5d1Jr/9TP1/Y8vaMM77+js2TPFZ2oWTioF7e0zNH9+pzKZ32nGn3xO2WzO8gRemK5j3Rd04kRGHR1zpk3X3t6uq1evaHT0onK5rMbHDd14Y3fZiaWlpVUf/Ox/6g9f/oFOnTqh9B9+RoODp9TePsPyBqOmppSam+1fITo8POjplZ1ep5OkG29cpgsXzimbzSmXyyqbndRNNy0vuwkon89/Og4vr9HRC8rn85+ejFPq7OwqBjZtbW1auvQWDQ6eUktL66ePAJrQsmXLyx71dOHCiNra2nT77Z8tPtvWMK5oZGRY4+NG8UaxdLrd8hmvZs3NLbrlllU6c+akRkaG1dLSqnw+r/nzF5ZdNm9padXly5c0MPCeZs+ep9bWVuVyuWnbo7W1TTffvFKnT59Qc3Ozmpqai88BPX9+RC0tbWpqSml4eFCGcbXsCQBW7c9LOQzjiuflNTU16ZZbbtXp0yd14cK5YnuZP7/TcphAabtva0tPa89e24uf+rvuuiX66KOBT9tBsxYu7NLo6AWdO3e2OLbca3m9bN+g68/P8ry255kzO7R48Y06fvxDtbZOtaFZs+ZOG5/tpf7S6XYtWzbVRg3jarE3d/bseZoxo0MnT36s+fMX+iqHH6EGs41yedeunLW8WcQpQKj2pF3NJXy7dL91ksQbuKzyGGab8FvXlebF/EMwiG1h90MyrPoJYzv85Rv/27IMCxde53gTS1fXDerqukG//t9vav5znY7Tjf3LT3XjDvunGrgFAOl0uy58/5/0X9//p+J3P3/4z23ru6WlVatW/aFl2lTAfFUzZlzvuE6v00ma9oIHO6lUqlinCxcudi3ztacWWBsaOqWbblquVCr16d3wU3fEt7Wlyx7htWDBIi1YsMhTHqcC6W7XvN1xx92eltfePrPs+Z3SVLBTatmyFdPms2p/XsqRTs/wvDxpKoB3q+dShXZv5qe9+Kk/qzKvWHF7RfUnuW/foOvPz/K8tmdpaiiOuff31KlPKqq/GTNmWl5BueWWP6i4nr1qmGEGUTH3MGFKLR+bFrfgt9Ztwm2YQiV5qUUvd9jqad/0M/YxqB+G+bwsg5FKp4tSe/sMXbp0cVov4pUrlzVnzryos9dQktBe4q7a9jx1OAnuxrBaqPgGsKQf/KPoLQ1qmUmve6syRbnuWq7fqU3UonfS6VJ7JcsI8gkXYZY7qLKHWQa38cNWQyMO3Hmnrly5rHPnhnT16tRrKSu9KchrHRby0Nzc7PpcWT/TRWnJkpuVzWZ1+vSJ4t3fZ86clKSqeovgXxLaS9xV2p4nJyd05szvNTk5oaGh06E+ESJoKcMwXMNvt8uUYfD7jEcvaaXpdjdMWd18Y7dMpxOP1/xUWna/N7tUuw3cbhzyc9OOn2X7KUs1NzqF8cD9SsrnN5j1uz6/ebFKq7ad+d0WXm/ickv3WvZqji92ZfBTN3b5tNvnq1mX3/IluSceQP3yFMwC9YwTdHyE8fKPOJQhrPlrWTdJ2w4AGgfBLBoaJ+jo1cNriKN6S2Kc6wQAaoVgFgAAAInFG8AAAACQWASzAAAASCyCWQAAACQWwSwAAAASi2AWAAAAiUUwCwAAgMQimAUAAEBi/f8VysC5JnB3/gAAACV0RVh0ZGF0ZTpjcmVhdGUAMjAxOC0wNy0yM1QxMTozNDo0OSswODowMHAGS/8AAAAldEVYdGRhdGU6bW9kaWZ5ADIwMTgtMDctMjNUMTE6MzQ6NDkrMDg6MDABW/NDAAAAAElFTkSuQmCC)]

ASCII值小于等于58(相应的质量得分小于等于25)对应的字符只有在Phred+33的编码中被使用,所有Phred+64所使用的字符的ASCII值都大于等于59。在通常情况下,ASCII值大于等于74的字符只出现在Phred+64中。利用这些信息即可在程序中进行判断

实例演练

判断fastq序列编码是Phred33(Illumina1.8+) or Phred64(Illumina1.3+)

def fq_phred_check(inputfile): """判断fastq序列编码"""fastq_file = open(inputfile,'r',encoding='utf-8') count = 1 for line in fastq_file: line_strip = line.strip() if count % 4 == 0: for each in line_strip: ASCII_each = ord(each) if ASCII_each > 75: phred_value = 64breakelif ASCII_each < 58:phred_value = 33break      count += 1 fastq_file.close()  return phred_valueinputfile = './data/test1.fq'
phred_value = fq_phred_check(inputfile)
print(phred_value)
33

fastq转换fasta格式

def fastq_fasta(inputfile): """将fastq转换成fasta序列格式""" fastq_file = open(inputfile,'r',encoding='utf-8') out_file_name = inputfile.strip().split('/')[-1].split('.')[0] + '.fasta' output_fasta = open(out_file_name,'w',encoding='utf-8') i = 0 for line in fastq_file: if i % 4 == 0: line_new = line.strip().split('\n')[0].replace('@','>') output_fasta.write(line_new + '\n') elif (i - 1) % 4 == 0: output_fasta.write(line)i = i + 1 print("fastq transform fasta success", out_file_name)fastq_file.close() output_fasta.close() def main(): inputfile = './data/test1.fq' fastq_fasta(inputfile) if __name__ == '__main__': main()
fastq transform fasta success test1.fasta

Linux 操作fastq

获取数据

mkdir -p ~/biosoft
cd ~/biosoft
wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.4.3/bowtie2-2.3.4.3-linux-x86_64.zipunzip bowtie2-2.3.4.3-linux-x86_64.zip
cd ~/biosoft/bowtie2-2.3.4.3-linux-x86_64/example/reads

统计reads_1.fq文件中共有多少条序列信息

[sunchengquan 08:07:20 ~/local/app/bowtie2-2.2.4/example/reads]
$ ll
总用量 8.4M
-rwxrwx--- 1 sunchengquan sunchengquan 4.0M 10月 23 2014 longreads.fq
-rwxrwx--- 1 sunchengquan sunchengquan 2.2M 10月 23 2014 reads_1.fq
-rwxrwx--- 1 sunchengquan sunchengquan 2.2M 10月 23 2014 reads_2.fq
-rwxrwx--- 1 sunchengquan sunchengquan 9.1K 10月 23 2014 simulate.pl
[sunchengquan 08:07:27 ~/local/app/bowtie2-2.2.4/example/reads]
$ wc reads_1.fq40000   40000 2285692 reads_1.fq
[sunchengquan 08:07:42 ~/local/app/bowtie2-2.2.4/example/reads]
$ wc -l reads_1.fq
40000 reads_1.fq[sunchengquan 09:06:02 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo $[`wc -l reads_1.fq |cut -d' ' -f1`/4]
10000
[sunchengquan 09:06:52 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo $((`wc -l reads_1.fq |cut -d' ' -f1`/4))
10000
[sunchengquan 09:07:10 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo $(expr `wc -l reads_1.fq |cut -d' ' -f1`/4)
40000/4
[sunchengquan 09:07:35 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo $(expr `wc -l reads_1.fq |cut -d' ' -f1` / 4)
10000
[sunchengquan 09:12:17 ~/local/app/bowtie2-2.2.4/example/reads]
$ let c=`wc -l reads_1.fq |cut -d' ' -f1`/4
[sunchengquan 09:12:36 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo $c
10000
[sunchengquan 09:16:11 ~/local/app/bowtie2-2.2.4/example/reads]
$ a=`wc -l reads_1.fq |cut -d' ' -f1`
[sunchengquan 09:16:23 ~/local/app/bowtie2-2.2.4/example/reads]
$ b=4
[sunchengquan 09:16:27 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo `expr $a / $b`
10000
[sunchengquan 09:18:03 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo `expr $a / 4`
10000

输出reads_1.fq文件中的标识符(即以@开头的那一行)

[sunchengquan 09:18:22 ~/local/app/bowtie2-2.2.4/example/reads]
$ grep '^@' reads_1.fq |head -3
@r1
@r2
@r3[sunchengquan 09:21:59 ~/local/app/bowtie2-2.2.4/example/reads]
$ grep '^@' reads_1.fq |cut -f1 |cut -c1|head -3
@
@
@[sunchengquan 09:22:38 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fq |paste - - - - |wc10000   40000 2285692
[sunchengquan 09:21:51 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fq |paste - - - - |cut -f1|cut -c1|head -3
@
@
@[sunchengquan 09:27:19 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==1) {print $0}}' reads_1.fq|head -3
@r1
@r2
@r3[sunchengquan 09:28:47 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==1) print $0}' reads_1.fq|head -3
@r1
@r2
@r3
[sunchengquan 09:31:38 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==1){print}}' reads_1.fq|head -3
@r1
@r2
@r3
[sunchengquan 09:32:28 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{print NR}' reads_1.fq|head -3
1
2
3

输出reads_1.fq文件中所有序列的信息(即每个序列的第二行)

[sunchengquan 09:34:51 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print $0 }}' reads_1.fq|head -1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG

输出reads_1.fq文件中‘+’及其后面的描述信息(即每个序列的第三行)

[sunchengquan 09:35:06 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==3){print $0 }}' reads_1.fq|head -3
+
+
+

输出reads_1.fq文件中质量值信息(即每个序列的第四行)

[sunchengquan 09:36:24 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==0){print $0 }}' reads_1.fq|head -1
+"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>

计算reads_1.fq文件含有N碱基的reads个数

[sunchengquan 09:39:13 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print $0 }}' reads_1.fq|grep N|wc6429    6429  782897[sunchengquan 09:41:03 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print $0 }}' reads_1.fq|grep -c N
6429

计算reads_1.fq文件里面的序列碱基总数

[sunchengquan 12:46:57 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print}}' reads_1.fq |grep -o '[ATGCN]' |wc
1088399 1088399 2176798[sunchengquan 12:47:12 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print length}}' reads_1.fq |head -1
122
[sunchengquan 12:49:36 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print length($0)}}' reads_1.fq |head -1
122
[sunchengquan 12:58:27 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print length($0)}}' reads_1.fq |paste -s -d +|bc
1088399

计算reads_1.fq文件中所有reads的N碱基总数

[sunchengquan 12:58:50 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print}}' reads_1.fq |grep -o '[N]' |wc26001   26001   52002

计算reads_1.fq中测序碱基质量值恰好为Q20的个数

[sunchengquan 17:14:00 ~/local/app/bowtie2-2.2.4/example/reads]
$ python -c 'print (chr(33+20))'
5
[sunchengquan 17:15:32 ~/local/app/bowtie2-2.2.4/example/reads]
$ python -c 'print (chr(33+30))'
?[sunchengquan 17:15:43 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==0){print}}' reads_1.fq |grep -o '[5]'|wc -l
21369
[sunchengquan 17:17:18 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==0){print}}' reads_1.fq |grep -o '5'|wc -l
21369

计算reads_1.fq中测序碱基质量值恰好为Q30的个数

[sunchengquan 17:17:27 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==0){print}}' reads_1.fq |grep -o '?'|wc -l
21574

计算reads_1.fq中所以序列的第一位碱基的ATCGNactg分布情况

[sunchengquan 17:26:36 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==2){print}}' reads_1.fq |cut -c1|sort |uniq -c2184 A2203 C2219 G1141 N2253 T

将reads_1.fq转为reads_1.fa文件(即将fastq转化为fasta)

[sunchengquan 17:33:13 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==1){print ">"$0} else if(NR%4==2){print}}' reads_1.fq |head -2
>@r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
[sunchengquan 17:45:57 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==1){print ">"substr($0,2)} else if(NR%4==2){print}}' reads_1.fq |head -2
>r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG[sunchengquan 17:46:36 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fq |paste - - - - |cut -f1,2|tr '\t' '\n'|head -2
@r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
[sunchengquan 17:48:14 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fq |paste - - - - |cut -f1,2|tr '\t' '\n'|tr '@' '>'|head -2
>r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG

统计上述reads_1.fa文件中共有多少条序列

[sunchengquan 17:50:29 ~/local/app/bowtie2-2.2.4/example/reads]
$ echo $[`wc -l reads_1.fa |cut -d' ' -f1`/2]
10000

计算reads_1.fa文件中总的碱基序列的GC数量

[sunchengquan 18:02:31 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fa |paste - - |cut -f2|grep -o '[GCgc]' |wc -l
529983

删除reads_1.fa文件中每条序列的N碱基

[sunchengquan 18:04:46 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fa |tr -d "N" |grep N

删除reads_1.fa文件中含有N碱基的序列

[sunchengquan 18:09:32 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fa |paste - - |grep -v N|tr '\t' '\n'|less

删除reads_1.fa文件中短于65bp的碱基序列

[sunchengquan 18:18:44 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fa |paste - - |awk '{if(length($2)>=65){print $1"\n"$2}}'|tail -2
>r9999
TATCGCGCTGTGACGATGCTAATCCCAAACCTTACCCAACCCACCTGGTCACGGACTGTTAAGCCGCTGTATGACGCTCTGGTGGTGCAATGCCACAAAGAANAGTCAATC

删除reads_1.fa文件每条序列的前后五个碱基

echo "AGJKLOIUYTKIOFYTFFLHHJWERDFCVBNM" |awk '{print substr($0,6,length($0)-10)}'
OIUYTKIOFYTFFLHHJWERDF[sunchengquan 18:28:20 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%2==0){print substr($0,6,length($0)-10)}}' reads_1.fa|head -1
GCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTT

删除reads_1.fa文件中的长于125bp的序列

[sunchengquan 18:28:57 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat reads_1.fa |paste - - |awk '{if(length($2)<=125){print $1"\n"$2}}'|tail -2
>r10000
GGTGATGCGCGGCTCCGTGCCGCCAAAGCCGTCCGGCACTGACTNGTCGCAG

查看reads_1.fq中每条序列的第一位碱基的质量值的平均值

[sunchengquan 19:09:46 ~/local/app/bowtie2-2.2.4/example/reads]
$ awk '{if(NR%4==0) print}' reads_1.fq |cut -c1 >tmp[sunchengquan 19:36:19 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat tmp |awk 'BEGIN{for(i=0;i<256;++i) ord[sprintf("%c",i)]=i}BEGIN{count=0}{ count+=ord[$1]-30}END{print count,count/NR}'
193621 19.3621
[sunchengquan 19:36:46 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat tmp |awk 'BEGIN{for(i=0;i<256;++i) ord[sprintf("%c",i)]=i}BEGIN{count=0}{ count+=(ord[$1]-30)}END{print count,count/NR}'
193621 19.3621
[sunchengquan 19:38:57 ~/local/app/bowtie2-2.2.4/example/reads]
$ cat tmp |awk 'BEGIN{for(i=0;i<256;++i) ord[sprintf("%c",i)]=i}BEGIN{count=0}{ count+=(ord[$1])}END{print count,count/NR}'
493621 49.3621

生物信息数据格式:fastq格式相关推荐

  1. fasta和fastq格式文件的shell小练习 http://www.bio-info-trainee.com/3575.html

    其次完成生物信息学数据格式的习题(blast/blat/fa-fq/sam-bam/vcf/bed/gtf-gff),收集这些格式的说明书. fasta和fastq格式文件的shell小练习 http ...

  2. linux怎么查看fastq格式文件,fastq格式文件处理大全(一)

    从计算机的角度来说,生物的序列属于一种字符串,也是一种文本,因此生物信息分析属于文本处理范畴.文本存储为固定格式文件,生物信息的工作就是各种文本文件之间格式的转换,例如通过序列拼接将fastq转换为f ...

  3. SRA到fastq格式的批量转换

    生物信息分析人员一般会接触到从NCBI等网站下载的SRA数据,之前也介绍了下载SRA数据的几种方式.下面,我就简单介绍一下如何将下载的sra格式数据转换成为常用的fastq等格式. 1.fastq-d ...

  4. 关于illumina产生的测序源文件bcl转换成fastq格式的问题

    由于连接测序仪的服务器不知道哪里抽了风,无法直接的生成fastq格式的文件,好久都无解,经过一段时间仍无法解决,所以采用曲线救国的方法,看能不能利用三方软件将bcl转换成fastq文件 google以 ...

  5. linux怎么查看fastq格式文件,2020-01-11 了解FASTQ格式并处理FASTQ文件

    FASTQ文件格式是测序仪展示数据的标准格式,可以看成FASTA文件的变种(FASTA+Q),因为其包含了对序列中每个碱基的Qualify Measurement.(如:碱基A出错的可能性是1/100 ...

  6. RNA-seq流程学习笔记(4)-使用FastQC软件对fastq格式的数据进行质量控制

    今天开始学习使用FastQC软件对范例SRA测序文件的质量进行分析. 主要参考文章: RNA-seq(3):sra到fastq格式转换并进行质量控制 转录组入门(3):了解fastq测序数据 用Fas ...

  7. 医学图像数据格式和格式转换

    医学图像数据格式和格式转换 本文转载自:http://blog.csdn.net/kingmicrosoft/article/details/35798249 由于最近碰到了数据格式的问题,重建不出效 ...

  8. fq,fa,fna,ffn,faa都是什么鬼,与fasta,fastq格式有什么关系?终于1分钟搞懂了

    fasta与fastq的区别: fasta格式(格式缩写为fa)是一种存储核酸或氨基酸序列的文本格式 ,允许在序列前定义名称和编写注释. 已成为生物信息学的标准格式,格式简单,多种文本处理工具和 Py ...

  9. NGS基础---Fasta/Fastq格式记录

    Fasta/Fastq格式记录 时间:2020-10-21 生信中,常用到Fasta和Fastq格式,这两种是比较基础和常见的序列保存文件.通过wiki和网上资料,对这两种格式进行说明和记录. 1. ...

  10. **生信自学记录1——获取Fastq格式的反向互补序列**

    ` 生信自学记录1--获取Fastq格式的反向互补序列 总共分为三步 1.读取基因序列的str格式,返回反向互补序列str 2.打开fastq格式的文本提取基因序列,返回互补序列list 3.读取互补 ...

最新文章

  1. iptables如何开放被动模式的FTP服务
  2. 2016 、12 、11本周
  3. GPU算力免费用?百度AI Studio两周年惊喜活动开启
  4. c语言printout函数,只使用处理I/O的PrintDigit函数,编写一个过程以输出任意实数...
  5. 颠覆:链表在删除和插入的效率一定优于数组吗?
  6. spring中aop事务
  7. java共享租车信息管理系统jsp源码
  8. python使用协程_Python使用协程进行爬虫
  9. 刷屏代码·稳 from林凯
  10. 穷举法 解决用3个水桶等分8升水 python实现
  11. 移动网络怎么修改服务器地址,移动宽带怎么修改wifi密码?
  12. 模块划分-1 功能划分
  13. Nginx重写功能——location/rewrite
  14. CryEngine 渲染流程
  15. JavaScript 移动端点击事件延迟问题
  16. DNS欺骗与钓鱼网站
  17. 2020中科大计算机分数线,2020年中国科学技术大学强基计划入围分数线,录取分数线,中国科大强基计划笔试、面试...
  18. 最新云开秒赞系统公益版网站源码
  19. MySQL必知必会学习历程(一)
  20. API对接网关 code review

热门文章

  1. 阿里云上做二级、三级等保的基础概念、方案以及价格
  2. caxa画图怎么倒角_CAXA怎么画倒角和圆角?
  3. jsmind(Jsmind数据格式)
  4. 攻防世界misc解题(一)
  5. IBM GPFS并行文件系统
  6. Redis过期策略及内存淘汰机制
  7. android vlc m3u8,Exoplayer播放m3u8文件Android
  8. 锐浪报表数据源access_C# 锐浪报表 示例源码
  9. 下载pyboard的flash中的驱动程序_如何安装爱普生打印机驱动程序
  10. PHP开票接口,云增值税发票API详情