bcftools merge crash (develop branch) #1990

Closed
jkbonfield opened this issue Sep 1, 2023 · 2 comments

@jkbonfield
Contributor

jkbonfield commented Sep 1, 2023

I cannot get merge to work. Starting from a 1000genomes file:

$ bcftools view -Oz --write-index -o _NA21143.vcf.gz -s NA21143 g1k_chr20.vcf.gz 20:10m-11m
$ bcftools view -Oz --write-index -o _NA21144.vcf.gz -s NA21144 g1k_chr20.vcf.gz 20:10m-11m
$ bcftools merge -Oz -o _m.vfz.gz _NA21143.vcf.gz _NA21144.vcf.gz
Segmentation fault

It's dying in some place with minimal debugging information, which is a little odd.

(gdb) where
#0  0x00005555555ddd43 in vcmp_set_ref ()
#1  0x0000555555582ce9 in can_merge ()
#2  0x0000555555585005 in merge_vcf ()
#3  0x0000555555585bc7 in main_vcfmerge ()
#4  0x00007ffff62a3c87 in __libc_start_main (main=0x555555564140 <main>, argc=7, argv=0x7fffffffca88, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffca78)
    at ../csu/libc-start.c:310
#5  0x000055555556404a in _start ()

The input data will be available to you locally. It's just a symlink to here:

lrwxrwxrwx 1 jkb team117 108 Sep 1 14:44 g1k_chr20.vcf.gz -> /nfs/users/nfs_j/jkb/scratch/data/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

I tried both 1.18 and the develop branch. I don't get why no arguments to the function are shown in gdb. I even get this with -g and no optimisation. I'll try address sanitizer, but I'm adding it here in case I'm just doing something wrong.

I don't know how to use merge, and I couldn't understand what the usage statement means for -g with -|REF.fa. I don't think I have GVCF, as I start from a merged VCF and am splitting it out in order to remerge (as I want to experiment with local-alleles). However, it's clearly failing in reference comparison, but there doesn't appear to be any way of specifying a reference for non-vcf.

Either way, incorrect usage shouldn't cause a crash.

@jkbonfield
Contributor Author

Ah figured out the debugging oddity. Something in my htslib was disagreeing with gdb. That had been built by clang, but gdb on the bcftools binary was throwing its hands up in horror:

Reading symbols from bcftools...Dwarf Error: Cannot handle DW_FORM_<unknown> in DWARF reader

Anyway, rebuilding that with the system gcc gave me a binary that gdb could swallow again. The error is:

Program received signal SIGSEGV, Segmentation fault.
0x000055555561484e in vcmp_set_ref (vcmp=0x0, ref1=0x555555c22840 "C", ref2=0x555555c23b00 "C") at vcmp.c:57
57	    vcmp->ndref = 0;

This is from args->vcmp, which only appears to be set when I do -m none to set args->collapse = COLLAPSE_NONE. I don't know enough about what I'm doing here, as I'm not a user - just trying to evaluate the local alleles code. -m none cures the SEGV, but clearly the default should work. Why is it using vcmp when the collapse mode is not none?

What I'm doing seems to be the most naive thing possible. Split out two samples and then merge them back together again.

pd3 closed this as completed in 7a00a28 Sep 7, 2023
@pd3
Member

pd3 commented Sep 7, 2023

Duh, that's just a silly bug; your usage is alright. Fixed now.
