Majority of SNPs have allele frequency = 0.000000 #2321
Without seeing the VCF, it is hard to say whether the problem is with the data or the program. I suggest you take a small region and check if the …
Attached is a small VCF file (in .txt format), along with the result I got from running bcftools stats on it. The VCF file contains 11 SNPs: 10 are present in 1 of the 118 individuals, and 1 is present in 67 individuals. Doing the math myself, I would expect an allele frequency of 1/118 = 0.008475 for each of the 10 rare SNPs and 67/118 = 0.567797 for the remaining SNP. The stats file, however, gives this result:
Thanks for your help!
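The expected values above follow from AF = AC/AN, where AC is the alternate allele count and AN the number of called alleles. Assuming every one of the 118 haploid individuals has a called genotype at each site (so AN = 118):

```latex
\mathrm{AF} = \frac{\mathrm{AC}}{\mathrm{AN}}, \qquad
\frac{1}{118} \approx 0.008475, \qquad
\frac{67}{118} \approx 0.567797
```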
I see what you mean now. The output should be interpreted differently: there are 10 variants in the first allele frequency bin. There are 100 bins by default, and 0.008475 falls in the first bin, (0, 0.01). The option …
I see the output is a bit confusing. In the first case … This could be improved, but hopefully it sheds some light on what the output means.
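A minimal sketch of the binning described above, assuming 100 equal-width bins over [0, 1], AF = AC/AN, and each bin reported by its lower edge (which would explain the 0.000000 label). `af_bin` and `bin_label` are hypothetical helper names, not bcftools code, and bcftools' exact edge handling may differ.

```python
# Sketch of allele-frequency binning as described in this thread.
# Assumptions (not taken from bcftools source): 100 equal-width bins over [0, 1],
# AF = AC / AN, and each bin labelled by its lower edge.


def af_bin(ac: int, an: int, n_bins: int = 100) -> int:
    """Return the 0-based bin index for AF = ac / an.

    Integer arithmetic avoids floating-point surprises at bin edges
    (e.g. 0.29 * 100 evaluating to 28.999...).
    """
    if an <= 0:
        raise ValueError("AN must be positive")
    # floor(ac * n_bins / an), clamped so AF == 1.0 lands in the last bin
    return min(ac * n_bins // an, n_bins - 1)


def bin_label(index: int, n_bins: int = 100) -> str:
    """Format a bin's lower edge with six decimals."""
    return f"{index / n_bins:.6f}"


if __name__ == "__main__":
    an = 118  # 118 haploid samples, assuming no missing genotypes
    for ac in (1, 67):
        idx = af_bin(ac, an)
        print(f"AC={ac} AF={ac / an:.6f} bin={idx} label={bin_label(idx)}")
```

Under these assumptions the singleton SNPs (AF 0.008475) land in bin 0 and are labelled 0.000000, while the 67/118 SNP lands in bin 56, labelled 0.560000.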
Thank you for explaining that! In the future, I think I will use bcftools +fill-tags to add the AF tag to my marker sets and then sort them into bins myself.
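A minimal sketch of that plan, assuming an uncompressed VCF in which the AF tag has already been added (for example with bcftools +fill-tags). `af_histogram` and `parse_info` are hypothetical names; only the first ALT allele's AF is used, and a value lying exactly on a bin edge may shift by one bin because of floating-point rounding.

```python
# Sketch: histogram of INFO/AF values into fixed-width bins.
# Assumes AF is already present in INFO and the VCF is uncompressed text.
import math
from collections import Counter
from typing import Iterable


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; flag fields map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def af_histogram(vcf_lines: Iterable[str], n_bins: int = 100) -> Counter:
    """Count records per AF bin, keyed by the bin's lower edge."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]
        af_field = parse_info(info).get("AF")
        if not isinstance(af_field, str) or af_field.split(",")[0] in ("", "."):
            continue  # no usable AF value on this record
        af = float(af_field.split(",")[0])  # first ALT allele only
        # floor(af * n_bins), clamped so AF == 1.0 lands in the last bin
        index = min(math.floor(af * n_bins), n_bins - 1)
        counts[round(index / n_bins, 6)] += 1
    return counts
```

For example, `sorted(af_histogram(open("with_af.vcf")).items())` (the file name is hypothetical) lists each bin's lower edge with its record count.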
Hi,
I'm using bcftools to call and filter markers for a haploid biparental population of 118 individuals. I have Illumina sequencing data for all progeny. After indexing my reference genome (one of the parents of the biparental population) and converting my .fq files to .sam and then to sorted .bam files, I called markers using:
bcftools mpileup -Ou -f genomic.fasta *.bam | bcftools call -mv -Ob --ploidy 1 --threads 4 -o calls1.bcf
and converted the output to .vcf using: bcftools view -Ov calls1.bcf > test.vcf
I then ran bcftools stats calls1.bcf > stats.txt.
I don't understand why, out of the 3.2 million SNPs in my .bcf file, roughly 1.9 million have an allele frequency of 0.000000. Additionally, when I run grep -c "AC=0" calls1.vcf, it returns 0, so I'm not sure how to find these markers that are supposedly present at such a low frequency. Nor do I understand why markers would be called if they truly aren't present in the population.
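One way to check what these sites actually are is to tabulate AC by exact key instead of grepping for a substring. Under the binning explanation given in this thread, and assuming AN = 118 (no missing genotypes), only AC = 1 falls into the first bin: 1/118 ≈ 0.0085 is below 0.01, while 2/118 ≈ 0.0169 is not. The sites labelled 0.000000 would therefore be expected to show up as AC=1, which is also consistent with grep finding no "AC=0". A minimal sketch (`ac_counts` is a hypothetical helper; only the first ALT's count is tallied):

```python
# Sketch: tabulate INFO/AC per record instead of grepping for "AC=0".
# Assumes an uncompressed VCF with AC in INFO (comma-separated per ALT).
from collections import Counter
from typing import Iterable


def ac_counts(vcf_lines: Iterable[str]) -> Counter:
    """Return a Counter mapping AC (first ALT) to the number of records."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]
        for item in info.split(";"):
            key, _, value = item.partition("=")
            if key == "AC":  # exact key match, unlike a substring grep
                counts[int(value.split(",")[0])] += 1
                break
    return counts
```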
Here's a portion of the data from bcftools stats:
I can get rid of these by filtering on minor allele frequency, but I'd like to know why they were called in the first place.
Thanks!
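For the filtering step, bcftools view has a `-q/--min-af` option that accepts a `:minor` qualifier (check the bcftools view documentation for the exact syntax). The same idea as a minimal Python sketch, assuming biallelic records with AC and AN in INFO as written by bcftools call; `filter_maf` and `minor_af` are hypothetical names, and records lacking AC/AN or with AN = 0 are dropped.

```python
# Sketch: minor-allele-frequency filter on an uncompressed VCF.
# Assumes biallelic records whose INFO holds AC and AN.
from typing import Iterable, Iterator


def minor_af(ac: int, an: int) -> float:
    """Minor allele frequency of a biallelic site: min(AF, 1 - AF)."""
    af = ac / an
    return min(af, 1.0 - af)


def filter_maf(vcf_lines: Iterable[str], min_maf: float) -> Iterator[str]:
    """Yield header lines unchanged, plus data lines with MAF >= min_maf."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the VCF header intact
            continue
        if not line.strip():
            continue  # skip blank lines
        info = {}
        for item in line.rstrip("\n").split("\t")[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
        if "AC" not in info or "AN" not in info:
            continue  # cannot compute MAF without both counts
        ac = int(info["AC"].split(",")[0])
        an = int(info["AN"])
        if an > 0 and minor_af(ac, an) >= min_maf:
            yield line
```

With a threshold of 0.05, a singleton (1/118 ≈ 0.0085) is removed, while the 67/118 SNP (minor AF 51/118 ≈ 0.432) is kept.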