Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcftools csq Error: the fasta reference does not match the VCF REF allele at h2tg000007l:1000944 #2323

Closed
leon945945 opened this issue Nov 24, 2024 · 15 comments

Comments

@leon945945
Copy link

Hi, as the title says, bcftools csq throws an error, but the wrong site h2tg000007l:1000944 did't exist in my vcf file.
What is the probable problem and how to fix it.
Thanks very much.

@pd3
Copy link
Member

pd3 commented Dec 3, 2024

Any chance you could narrow the problem down and create a small test case to reproduce the error? The error message seems clear enough, it's hard to tell what's going on. A test case can be created with the help of these two scripts

bcftools/test/csq/make-csq-test
bcftools/test/csq/make-csq-test-by-region

@leon945945
Copy link
Author

Hi @pd3 ,

I generated test case by make-csq-test-by-region -f hap1.fa -g hap1.bcftools.gff3 -v hap1.sv.filter.vcf -r h1tg000002l:1010000-1020000 -o csq-test, and three files were produced csq-test.fa, csq-test.gff and csq-test.fa.vcf.

Then, I used bcftools csq --force -p m -B 1 -s - -f csq-test.fa -g csq-test.gff -o csq-test.csq.vcf csq-test.vcf to perform the variation annotation, and errors were emerged:

Parsing csq-test.gff ...
Warning: Ignoring transcript with unknown biotype .. h1tg000002l AnnotationFinalizer five_prime_utr 100 523 . + . Parent=transcript:g1392.t1;
Indexed 4 transcripts, 0 exons, 16 CDSs, 0 UTRs
Warning: 16 warnings were supressed, run with --verbose 2 to see them all
Calling...
[W::vcf_parse] Contig 'h1tg000001l' is not defined in the header. (Quick workaround: index the file with tabix.)
Error: the chromosome "h1tg000001l" is not present in csq-test.fa

No idea what happened, I will follow your instructions if I did something wrong.

@pd3
Copy link
Member

pd3 commented Dec 4, 2024

Not sure how to navigate you more, apparently the script is not clever enough. If not too big, perhaps you could share the original data?

@leon945945
Copy link
Author

Hi, the files are not too big, but github only allows 25Mb to be uploaded, how could I share the data to you.

@pd3
Copy link
Member

pd3 commented Dec 5, 2024

Can you use https://wetransfer.com or simlar?

@leon945945
Copy link
Author

Hi @pd3 , the genome, gff and vcf files were uploaded to here
Looking forward for your inspection to problem.

@pd3 pd3 closed this as completed in d6cf97b Dec 11, 2024
pd3 added a commit that referenced this issue Dec 11, 2024
This modifies the commit d6cf97b, it is better to include
such transcript and modify their boundaries instead of omitting
the transcripts entirely

Resolves #2323
@pd3
Copy link
Member

pd3 commented Dec 11, 2024

Thank you for the test case. The problem was caused by invalid transcript boundaries in the GFF file, some of the exons are not fully contained which, I think, is a bug in the GFF. See for example

h1tg000001l GUSHR   mRNA    76535   165638  .   -   .   ID=transcript:g16.t1;Parent=gene:g16;biotype=protein_coding;
h1tg000001l AUGUSTUS    CDS 165856  168762  1   -   0   Parent=transcript:g16.t1;

The parser now checks this and skips features which are not fully contained within the transcripts and extends the transcript boundaries.

@leon945945
Copy link
Author

Thanks very much, I will try your newest version!

@leon945945
Copy link
Author

Hi, to use the newest bcftools, I git clone the bcftools repository
the gcc version is gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

git clone --recurse-submodules https://github.com/samtools/htslib.git
git clone https://github.com/samtools/bcftools.git

I encountered compile error:

gcc -Wall -g -O2 -I. -I../htslib -c -o bam2bcf.o bam2bcf.c
bam2bcf.c: In function ‘bcf_call2bcf’:
bam2bcf.c:1245:17: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int k = 1; k < rec->n_allele; k++) {
^
bam2bcf.c:1245:17: note: use option -std=c99 or -std=gnu99 to compile your code
make: *** [bam2bcf.o] Error 1

according to the tips, I added the -std=C99 parameter in command line:
gcc -Wall -g -O2 -I. -I../htslib -c -o bam2bcf.o bam2bcf.c -std=c99

Another error occured:

gcc -Wall -g -O2 -I. -I../htslib -c -o bam2bcf.o bam2bcf.c -std=c99
bam2bcf.c: In function ‘calc_mwu_bias’:
bam2bcf.c:801:48: error: ‘M_PI’ undeclared (first use in this function)
return mann_whitney_1947(na,nb,U) * sqrt(2M_PIvar2);
^
bam2bcf.c:801:48: note: each undeclared identifier is reported only once for each function it appears in
bam2bcf.c: In function ‘calc_mwu_biasZ’:
bam2bcf.c:878:55: error: ‘M_PI’ undeclared (first use in this function)
return mann_whitney_1947_(na, nb, U) * sqrt(2M_PIvar2);
^
bam2bcf.c: In function ‘calc_mwu_bias’:
bam2bcf.c:802:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
bam2bcf.c: In function ‘calc_mwu_biasZ’:
bam2bcf.c:881:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

I wanted to use the newest bcftools csq, please give me a hand. Thanks.

@leon945945
Copy link
Author

Hi @pd3 ,

could you please release the newest bcftools version on conda.

Thanks.

@jmarshall
Copy link
Member

You would have to request that of the bioconda maintainers, but note that they generally only build packages when there has been a new upstream release.

@jkbonfield
Copy link
Contributor

jkbonfield commented Dec 12, 2024

Hi, to use the newest bcftools, I git clone the bcftools repository the gcc version is gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

git clone --recurse-submodules https://github.com/samtools/htslib.git git clone https://github.com/samtools/bcftools.git

I encountered compile error:

gcc -Wall -g -O2 -I. -I../htslib -c -o bam2bcf.o bam2bcf.c
bam2bcf.c: In function ‘bcf_call2bcf’:
bam2bcf.c:1245:17: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int k = 1; k < rec->n_allele; k++) {

Firstly, this OS is officially no longer supported by RedHat. You should consider upgrading as it won't be getting any security fixes.

Secondly, to fix your problem, just enable C99 mode as the error hints at (but without disabling the other GNUisms). You can do this with make CC="gcc -std=gnu99".
It's well well past time for us to start using C99 features (it's 25 year old for heaven's sake), but we ought to update our INSTALL instructions to explain to people using ancient tooling how they can compile the software.

Edit: or better yet do the -std=gnu99 while running ./configure.

@leon945945
Copy link
Author

Bad news. Maybe I choose to fix the gff file.

@jkbonfield
Copy link
Contributor

Did it not build with -std=gnu99? It would be useful to know so we can update the notes or edit the autoconf file, but the OS is now so out of date that you can't even install it any more (the package servers have been removed).

@leon945945
Copy link
Author

Hi, @jkbonfield ,I can not build with configure CC="gcc -std=gnu99".

Here is the error info:

gcc -std=gnu99 -rdynamic -o bcftools main.o vcfindex.o tabix.o vcfstats.o vcfisec.o vcfmerge.o vcfquery.o vcffilter.o filter.o vcfsom.o vcfnorm.o vcfgtcheck.o vcfview.o vcfannotate.o vcfroh.o vcfconcat.o vcfcall.o mcall.o vcmp.o gvcf.o reheader.o convert.o vcfconvert.o tsv2vcf.o vcfcnv.o vcfhead.o HMM.o consensus.o ploidy.o bin.o hclust.o version.o regidx.o smpl_ilist.o csq.o vcfbuf.o mpileup.o bam2bcf.o bam2bcf_indel.o bam2bcf_iaux.o bam2bcf_edlib.o read_consensus.o bam_sample.o vcfsort.o cols.o extsort.o dist.o abuf.o ccall.o em.o prob1.o kmin.o str_finder.o gff.o edlib.o mpileup2/mpileup.o vcfplugin.o ../htslib/libhts.a -lz -lm -lbz2 -llzma -lcurl -lm -lz -ldl -lpthread
/usr/bin/ld: mpileup2/mpileup.o: unable to initialize decompress status for section .debug_info
/usr/bin/ld: mpileup2/mpileup.o: unable to initialize decompress status for section .debug_info
mpileup2/mpileup.o: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status
make: *** [bcftools] Error 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants