Converting a VCF file to PLINK bed/bim/fam format on Linux (2021-03-17)
Here is the problem I am currently facing:
I need to convert a VCF file to PLINK format.
Method 1: vcftools
[lyc@200server ~]$ vcftools --vcf Rice.recode.vcf --plink --out output
This is what went wrong:
VCFtools - 0.1.17
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf Rice.recode.vcf
--out output
--plink
Warning: Expected at least 2 parts in FORMAT entry: ID=RNC,Number=2,Type=Character,Description="Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, I = gVCF input site is non-called, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present, 1 = site is Monoallelic, no assertion about presence of REF or ALT allele">
After filtering, kept 141 out of 141 Individuals
Writing PLINK PED and MAP files ...
Unrecognized values used for CHROM: Chr1 - Replacing with 0.
Unrecognized values used for CHROM: Chr2 - Replacing with 0.
Expected at least 2 parts in FORMAT entry: ID=RNC,Number=2,Type=Character,Description="Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, I = gVCF input site is non-called, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present, 1 = site is Monoallelic, no assertion about presence of REF or ALT allele">
Unrecognized values used for CHROM: Chr3 - Replacing with 0.
Unrecognized values used for CHROM: Chr4 - Replacing with 0.
Unrecognized values used for CHROM: Chr5 - Replacing with 0.
Unrecognized values used for CHROM: Chr6 - Replacing with 0.
Unrecognized values used for CHROM: Chr7 - Replacing with 0.
Unrecognized values used for CHROM: Chr8 - Replacing with 0.
Unrecognized values used for CHROM: Chr9 - Replacing with 0.
Unrecognized values used for CHROM: Chr10 - Replacing with 0.
Unrecognized values used for CHROM: Chr11 - Replacing with 0.
Unrecognized values used for CHROM: Chr12 - Replacing with 0.
Unrecognized values used for CHROM: ChrUn - Replacing with 0.
Unrecognized values used for CHROM: ChrSy - Replacing with 0.
Unrecognized values used for CHROM: chrC - Replacing with 0.
Done.
After filtering, kept 7186300 out of a possible 7186300 Sites
Run Time = 1647.00 seconds
[lyc@200server ~]$ ls
file.log output.ped prettify
file-temporary.bed output-temporary.bed Rice.recode.vcf
file-temporary.bim output-temporary.bim test
file-temporary.fam output-temporary.fam toy.map
LICENSE plink toy.ped
output.log plink_linux_x86_64_20201019.zip
output.map ppp
[lyc@200server ~]$ clear
[lyc@200server ~]$ Expected at least 2 parts in FORMAT entry: ID=RNC,Number=2,Type=Character,Description="Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, I = gVCF input site is non-called, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present, 1 = site is Monoallelic, no assertion about presence of REF or ALT allele">
-bash: syntax error near unexpected token `newline'
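The "Unrecognized values used for CHROM ... Replacing with 0" messages in the vcftools log above mean that every site ended up on chromosome 0 in output.map, because names such as Chr1 or ChrUn are not codes vcftools recognizes. The vcftools manual describes a --chrom-map option for non-human species that takes a tab-delimited file mapping each chromosome name to an integer. Below is a minimal sketch for building such a file. The function name make_chrom_map and the file name chrom_map.txt are my own, and I have not run the resulting vcftools command on this data set.

```shell
# Build a tab-delimited "chromosome name <TAB> integer" table from a VCF.
# Names are numbered in order of first appearance, so Chr1..Chr12 map to 1..12
# only if the VCF is already sorted that way.
make_chrom_map() {
    # $1: path to an uncompressed VCF
    grep -v '^#' "$1" | cut -f1 | awk '!seen[$0]++ { n++; print $0 "\t" n }'
}

# Hypothetical usage (untested here):
#   make_chrom_map Rice.recode.vcf > chrom_map.txt
#   vcftools --vcf Rice.recode.vcf --plink --chrom-map chrom_map.txt --out output
```

Scaffolds such as ChrUn, ChrSy and chrC also receive an integer this way, so it may be worth editing the table by hand before using it.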
Method 2: plink
plink --vcf Rice.recode.vcf --recode --out file
plink --vcf Rice.recode.vcf --recode --out output --double-id
plink --vcf Rice.recode.vcf --recode --out output --const-fid family_id
All three commands give the same result:
773821 MB RAM detected; reserving 386910 MB for main workspace.
Error: Line 38 of .vcf file has a GT half-call.
Use --vcf-half-call to specify how these should be processed.
I was speechless.
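A GT half-call is a genotype in which one allele is called and the other is missing, for example 0/. or .|1. To see how widespread they are before choosing a --vcf-half-call mode, the VCF can be scanned directly. This is a minimal sketch, assuming an uncompressed VCF whose sample columns start at column 10 and whose first FORMAT subfield is GT. The function name count_half_calls is my own.

```shell
# Print "<file line number> <TAB> <number of half-call genotypes>" for every
# VCF line that contains at least one half-call such as 0/. or .|1 .
count_half_calls() {
    # $1: path to an uncompressed VCF
    awk -F'\t' '
        /^#/ { next }                      # skip header lines (NR still counts them)
        {
            n = 0
            for (i = 10; i <= NF; i++) {   # sample columns start at column 10
                split($i, f, ":")          # GT is assumed to be the first subfield
                if (f[1] ~ "^(\\.[/|][0-9]+|[0-9]+[/|]\\.)$") n++
            }
            if (n > 0) print NR "\t" n
        }' "$1"
}

# Hypothetical usage: count_half_calls Rice.recode.vcf | head
```

Because the line number includes the header lines, the first line reported should correspond to the "Line 38" in the PLINK error above if the assumptions hold.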
Before, my directory contained these files:
file.log LICENSE plink Rice.recode.vcf
file-temporary.bed output.log plink_linux_x86_64_20201019.zip test
file-temporary.bim output.map ppp toy.map
file-temporary.fam output.ped prettify toy.ped
Now it contains these:
file.log output.ped prettify
file-temporary.bed output-temporary.bed Rice.recode.vcf
file-temporary.bim output-temporary.bim test
file-temporary.fam output-temporary.fam toy.map
LICENSE plink toy.ped
output.log plink_linux_x86_64_20201019.zip
output.map ppp
Then I came across the following in the PLINK 1.9 documentation:
https://www.cog-genomics.org/plink/1.9/input
So I reinstalled PLINK, this time as plink2:
http://www.cog-genomics.org/plink/2.0/
Installing PLINK on Linux:
https://blog.csdn.net/qq_40605470/article/details/108882992
But then I ran into a new problem:
[lyc@200server ~]$ plink2 --vcf Rice.recode.vcf --recode --out ccc
PLINK v2.00a3LM 64-bit Intel (2 Mar 2021) www.cog-genomics.org/plink/2.0/
(C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ccc.log.
Options in effect:
--export ped
--out ccc
--vcf Rice.recode.vcf
Start time: Thu Mar 18 20:20:58 2021
773821 MiB RAM detected; reserving 386910 MiB for main workspace.
Using up to 80 threads (change this with --threads).
--vcf: 7146k variants scanned.
Error: Invalid chromosome code 'ChrUn' on line 7146662 of --vcf file.
(Use --allow-extra-chr to force it to be accepted.)
End time: Thu Mar 18 20:21:32 2021
Then I noticed that, once again, only temporary files had been written:
[lyc@200server ~]$ ls
ccc.log plink2
ccc-temporary.psam plink2_linux_x86_64_20210302.zip
file.log plink2.log
file-temporary.bed plink_linux_x86_64_20201019.zip
file-temporary.bim ppp
file-temporary.fam prettify
LICENSE Rice.recode.vcf
output.log test
output.map toy.map
output.ped toy.ped
output-temporary.bed transform.log
output-temporary.bim transform-temporary.bed
output-temporary.fam transform-temporary.bim
plink transform-temporary.fam
which means the run failed again.
So now I apparently have to deal with the invalid chromosome code on that line.
I therefore added --allow-extra-chr:
[lyc@200server ~]$ plink2 --vcf Rice.recode.vcf --recode --out ccc --allow-extra-chr
PLINK v2.00a3LM 64-bit Intel (2 Mar 2021) www.cog-genomics.org/plink/2.0/
(C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ccc.log.
Options in effect:
--allow-extra-chr
--export ped
--out ccc
--vcf Rice.recode.vcf
Start time: Fri Mar 19 16:31:21 2021
773821 MiB RAM detected; reserving 386910 MiB for main workspace.
Using up to 80 threads (change this with --threads).
--vcf: 7186300 variants scanned.
Error: Line 38 of --vcf file has a GT half-call.
Use --vcf-half-call to specify how these should be processed.
End time: Fri Mar 19 16:32:26 2021
Ha, back to square one.
But as I understand it, plink2 should not have this problem.
[lyc@200server ~]$ plink2 --vcf Rice.recode.vcf --recode --out ccc --allow-extra-chr --vcf-half-call
PLINK v2.00a3LM 64-bit Intel (2 Mar 2021) www.cog-genomics.org/plink/2.0/
(C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ccc.log.
Options in effect:
--allow-extra-chr
--export ped
--out ccc
--vcf Rice.recode.vcf
--vcf-half-call
Start time: Fri Mar 19 16:33:16 2021
Error: Missing --vcf-half-call argument.
For more info, try "plink2 --help " or "plink2 --help | more".
[lyc@200server ~]$ plink --vcf Rice.recode.vcf --allow-extra-chr --recode --vcf-half-call'missing' --out eee
PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to eee.log.
Options in effect:
--allow-extra-chr
--out eee
--recode
--vcf Rice.recode.vcf
--vcf-half-callmissing
Error: Unrecognized flag ('--vcf-half-callmissing').
For more information, try "plink --help " or "plink --help | more".
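It turned out the flag was not recognized because a space was missing between --vcf-half-call and its argument.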
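The reason is that the shell joins adjacent quoted and unquoted text into a single word before plink ever sees it, so --vcf-half-call'missing' arrives as the single argument --vcf-half-callmissing. A small illustration using a helper of my own (count_args is not part of plink):

```shell
# Print how many separate arguments the shell passes to a command.
count_args() {
    echo "$#"
}

count_args --vcf-half-call'missing'     # no space: the shell passes one word
count_args --vcf-half-call 'missing'    # with a space: flag and value are two words
```

The quotes around the mode name are optional. Writing --vcf-half-call missing behaves the same as the second call.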
[lyc@200server ~]$ plink --vcf Rice.recode.vcf --allow-extra-chr --recode --vcf-half-call 'haploid' --out eee
PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to eee.log.
Options in effect:
--allow-extra-chr
--out eee
--recode
--vcf Rice.recode.vcf
--vcf-half-call haploid
773821 MB RAM detected; reserving 386910 MB for main workspace.
--vcf: eee-temporary.bed + eee-temporary.bim + eee-temporary.fam written.
7186300 variants loaded from .bim file.
141 people (0 males, 0 females, 141 ambiguous) loaded from .fam.
Ambiguous sex IDs written to eee.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 141 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.878065.
7186300 variants and 141 people pass filters and QC.
Note: No phenotypes present.
--recode ped to eee.ped + eee.map ... done.
[lyc@200server ~]$ plink --file eee --allow-extra-chr --make-bed --out rice
PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to rice.log.
Options in effect:
--allow-extra-chr
--file eee
--make-bed
--out rice
773821 MB RAM detected; reserving 386910 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (7186300 variants, 141 people).
--file: rice-temporary.bed + rice-temporary.bim + rice-temporary.fam written.
7186300 variants loaded from .bim file.
141 people (0 males, 0 females, 141 ambiguous) loaded from .fam.
Ambiguous sex IDs written to rice.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 141 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.878065.
7186300 variants and 141 people pass filters and QC.
Note: No phenotypes present.
--make-bed to rice.bed + rice.bim + rice.fam ... done.
This time it finally worked.
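The two-step route above (VCF to .ped/.map, then .ped/.map to .bed/.bim/.fam) can probably be collapsed into a single call, since --make-bed can be combined with --vcf directly. Failed runs in this post left only *-temporary.* files behind, so it also seems worth checking that the final files exist. A minimal sketch, assuming plink 1.9 is on PATH. The function name vcf_to_bed and the default mode are my own choices, and the default simply mirrors the haploid mode used above.

```shell
# Convert a VCF straight to PLINK binary format and verify the outputs.
vcf_to_bed() {
    # $1: input VCF, $2: output prefix, $3: --vcf-half-call mode (default: haploid)
    local vcf="$1" out="$2" mode="${3:-haploid}" ext
    plink --vcf "$vcf" --allow-extra-chr --vcf-half-call "$mode" \
          --make-bed --out "$out" || return 1
    # A failed run may leave only <prefix>-temporary.* files, so check the real ones.
    for ext in bed bim fam; do
        if [ ! -s "$out.$ext" ]; then
            echo "expected output $out.$ext is missing or empty" >&2
            return 1
        fi
    done
}

# Hypothetical usage: vcf_to_bed Rice.recode.vcf rice haploid
```

This avoids writing the large intermediate .ped file, though I have only run the two-step version on this data.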
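However, I converted using the first mode (haploid), and I am not sure which mode is actually the right one.
I simply went ahead and used it.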
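As far as I can tell from the PLINK 1.9 documentation, --vcf-half-call accepts four modes: haploid (treat the half-call as haploid/homozygous), missing (treat it as missing), reference (treat the missing allele as the reference allele), and error (the default, which stops with the message seen above). The sketch below only illustrates my reading of those modes on a single genotype string. It is not PLINK's code, resolve_half_call is a name I made up, and it ignores phasing by always printing "/".

```shell
# Show what a genotype would become under each --vcf-half-call mode
# (my interpretation of the documented behaviour, for illustration only).
resolve_half_call() {
    # $1: GT string such as "0/." or ".|1"; $2: haploid | missing | reference | error
    local gt="$1" mode="$2" a b called
    a=${gt%%[/|]*}      # allele before the separator
    b=${gt##*[/|]}      # allele after the separator
    # Fully called or fully missing genotypes are not half-calls: pass them through.
    if [ "$a" = "." ] && [ "$b" = "." ]; then echo "$gt"; return 0; fi
    if [ "$a" != "." ] && [ "$b" != "." ]; then echo "$gt"; return 0; fi
    if [ "$a" = "." ]; then called=$b; else called=$a; fi
    case "$mode" in
        haploid)   echo "$called/$called" ;;   # stored like a homozygous call
        missing)   echo "./." ;;
        reference) if [ "$a" = "." ]; then echo "0/$b"; else echo "$a/0"; fi ;;
        *)         echo "half-call $gt not allowed in mode '$mode'" >&2; return 1 ;;
    esac
}

resolve_half_call './1' haploid     # prints 1/1
resolve_half_call './1' missing     # prints ./.
resolve_half_call './1' reference   # prints 0/1
```

If half-calls are treated as unreliable, missing looks like the more conservative choice, while haploid keeps the called allele at the cost of asserting homozygosity.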
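I found that plink2 still shows the same problem. One answer I read seemed to say that my version is not new enough, and that the improved build released early this year might fix it. (Note that the banner in the log below reports PLINK v1.90b6.21, so this particular run may actually have used the 1.9 binary.)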
[lyc@200server ~]$ plink2 --vcf Rice.recode.vcf --allow-extra-chr --make-bed --out test
PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to test.log.
Options in effect:
--allow-extra-chr
--make-bed
--out test
--vcf Rice.recode.vcf
773821 MB RAM detected; reserving 386910 MB for main workspace.
Error: Line 38 of .vcf file has a GT half-call.
Use --vcf-half-call to specify how these should be processed.