null terminated bytestrings? #483

hasufell · 2022-02-04T19:04:03Z

This is more of a discussion/question than an issue.

I was looking into calling the libc function strpbrk, because it is much faster than any equivalent built on findIndex could be.

But then I noticed that Haskell ByteStrings are not null-terminated. Adding the terminator would require copying the whole buffer (a full memcpy), which largely defeats the purpose when you are optimizing.
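For reference, this is roughly what the copying route looks like with today's API. It is a minimal sketch, not code from the thread: strpbrkIndex is a hypothetical helper name, and the FFI import assumes strpbrk is available from the C library's string.h. Data.ByteString.useAsCString is the real bytestring function that copies a ByteString into a temporary NUL-terminated buffer. Calling unsafePerformIO here assumes that strpbrk behaves as a pure function of its inputs.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Foreign.C.String (CString)
import Foreign.Ptr (minusPtr, nullPtr)
import System.IO.Unsafe (unsafePerformIO)

-- char *strpbrk(const char *s, const char *accept);
-- Returns a pointer to the first byte of s that occurs in accept, or NULL.
foreign import ccall unsafe "string.h strpbrk"
  c_strpbrk :: CString -> CString -> IO CString

-- Hypothetical helper: index of the first byte of haystack contained in accept.
-- BS.useAsCString allocates length+1 bytes, copies the payload and writes a
-- trailing NUL, so every call pays an O(n) copy of the haystack.
-- C also stops at the first embedded NUL, so results can differ from
-- BS.findIndex (`BS.elem` accept) when the input contains 0x00 bytes.
strpbrkIndex :: BS.ByteString -> BS.ByteString -> Maybe Int
strpbrkIndex accept haystack = unsafePerformIO $
  BS.useAsCString haystack $ \hs ->
    BS.useAsCString accept $ \acc -> do
      r <- c_strpbrk hs acc
      pure (if r == nullPtr then Nothing else Just (r `minusPtr` hs))
```

With an embedded 0x00, strpbrkIndex (BS.pack [44]) (BS.pack [97,0,44,98]) yields Nothing, while BS.findIndex finds the comma at index 2. This is the divergence mentioned in the next paragraph.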

So I wondered:

What if ByteStrings were null-terminated internally, without changing any of the external API? They could then be passed to C functions that expect null-terminated strings without copying. Yes, I'm aware that a ByteString can contain null bytes anywhere, so strpbrk (which stops at the first null byte) and a "pure Haskell implementation" could potentially diverge.
What if there were a separate module enforcing this invariant, maybe via a newtype?
Are there other tricks that could be employed? Lazy ByteStrings obviously don't help here. Could Text be an alternative? The main reason I use ByteString for this task is that it has very fast elemIndex functions implemented via memchr.

I suspect there are many more C functions like this that haven't been reimplemented for ByteString for exactly that reason.


Bodigrim · 2022-02-04T19:10:25Z

You cannot retain constant-time slicing for null-terminated strings.
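Here is why that holds. On a strict ByteString, take returns a view into the parent's buffer without copying, so the byte just past the slice is whatever the parent stores there and not a terminator. Writing 0x00 at that position would corrupt the parent, so a NUL-terminated take would have to copy. The sketch below is an illustration only: byteAfterTake is a hypothetical name, and it deliberately reads one byte past the slice's length, which is safe here only because that byte still lies inside the parent buffer.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Storable (peekByteOff)
import System.IO.Unsafe (unsafePerformIO)

-- | Take the first n bytes (O(1), shares the parent's buffer) and return the
-- byte that sits immediately after the slice in memory.
-- Requires 0 < n < length: n = 0 yields an empty ByteString that does not
-- point into the parent, and n >= length would read past the parent buffer.
byteAfterTake :: Int -> BS.ByteString -> Word8
byteAfterTake n bs
  | n <= 0 || n >= BS.length bs = error "byteAfterTake: need 0 < n < length"
  | otherwise = unsafePerformIO $
      -- unsafeUseAsCString gives a pointer to the slice start without copying.
      BU.unsafeUseAsCString (BS.take n bs) $ \p -> peekByteOff p n
```

For BS.pack [104,101,108,108,111] ("hello") and n = 3, the byte after the slice is 108 ('l'), not 0.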

hasufell · 2022-02-04T19:17:11Z

> You cannot retain constant-time slicing for null-terminated strings.

My suggestion wasn't to change the internal representation of ByteString. Instead, there could be a separate module that exposes only functions for which appending the null byte is cheap (e.g. fromString does it once at construction) and that then maintains the invariant across all operations.
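Here is a rough sketch of that idea. It uses a hypothetical NulTerminated newtype, and every name in it is illustrative rather than part of bytestring. It also shows where the trade-off lands. Construction pays one copy. drop stays O(1), because every suffix of a NUL-terminated buffer is still NUL-terminated. take, however, must copy in order to append a fresh terminator.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BU
import Foreign.C.String (CString)

-- | Hypothetical newtype. Invariant: the wrapped buffer ends in exactly one
-- 0x00 byte and contains no other NULs, so it can be handed to C as-is.
newtype NulTerminated = NulTerminated BS.ByteString
  deriving (Eq, Show)

-- | Construction pays a single O(n) copy (snoc) to append the terminator.
-- Inputs with embedded NULs are rejected, since C would treat them as the end.
mkNulTerminated :: BS.ByteString -> Maybe NulTerminated
mkNulTerminated bs
  | 0 `BS.elem` bs = Nothing
  | otherwise      = Just (NulTerminated (BS.snoc bs 0))

-- | The payload without the terminator (O(1) slice).
contents :: NulTerminated -> BS.ByteString
contents (NulTerminated bs) = BS.init bs

-- | O(1): any suffix of a NUL-terminated buffer is NUL-terminated.
-- Never drops the terminator itself.
dropNT :: Int -> NulTerminated -> NulTerminated
dropNT n (NulTerminated bs) = NulTerminated (BS.drop (min n (BS.length bs - 1)) bs)

-- | O(n): the byte after a prefix is not 0, so a new terminator means a copy.
takeNT :: Int -> NulTerminated -> NulTerminated
takeNT n (NulTerminated bs) =
  NulTerminated (BS.snoc (BS.take (max 0 n) (BS.init bs)) 0)

-- | Zero-copy hand-off to a C function expecting a NUL-terminated string.
withNT :: NulTerminated -> (CString -> IO a) -> IO a
withNT (NulTerminated bs) = BU.unsafeUseAsCString bs
```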

Bodigrim · 2022-02-04T19:23:49Z

The trade-offs and API would be vastly different from bytestring, so I don't really see it fitting in here.

We can discuss adding a cbits implementation of findIndex instead.

hasufell · 2022-02-04T19:29:38Z

> We can discuss adding a cbits implementation of findIndex instead.

It wouldn't even need to be a full findIndex; something like this would do:

findIndex' :: [Word8]
           -> ByteString
           -> Maybe (Int, Word8)
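A straightforward pure Haskell version pins down the intended semantics and gives a baseline that any faster implementation should agree with, including on embedded 0x00 bytes, which strpbrk would stop at. This is a minimal sketch built only from existing Data.ByteString functions. The name findIndex' comes from the signature above, and the body is an assumption about what it should do.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Reference semantics (assumed): position and value of the first byte of the
-- input that belongs to the given set, or Nothing if there is none.
-- Each byte is checked against the list with `elem`, so this costs
-- O(n * k) for a haystack of length n and k needle bytes.
findIndex' :: [Word8] -> BS.ByteString -> Maybe (Int, Word8)
findIndex' ws bs = do
  i <- BS.findIndex (`elem` ws) bs
  pure (i, BS.index bs i)
```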

I don't have a clear idea of how to implement this efficiently, though.

Bodigrim · 2022-02-17T19:49:22Z

Something like this could be implemented very efficiently, even without cbits:

newtype Mask = Mask ByteString -- 256 bits = 32 bytes
findIndexInMask :: Mask -> ByteString -> Maybe (Int, Word8)
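One way to read this suggestion, sketched under assumptions: the mask is a 256-bit bitmap with one bit per possible byte value. Membership is then a single indexed load plus a bit test, no matter how many bytes are being searched for, and the scan itself reuses Data.ByteString.findIndex. The helpers mkMask and member are hypothetical. The bit layout (bit w .&. 7 of byte w `shiftR` 3) is my choice, because the comment does not specify one.

```haskell
import Data.Bits (setBit, shiftR, testBit, (.&.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)

-- | 256-bit set of bytes stored in 32 bytes: byte w lives at
-- bit (w .&. 7) of byte (w `shiftR` 3). Must be exactly 32 bytes long,
-- which mkMask guarantees; member does no bounds checking.
newtype Mask = Mask BS.ByteString -- 256 bits = 32 bytes

-- | Hypothetical constructor: build the bitmap from the bytes to search for.
mkMask :: [Word8] -> Mask
mkMask ws = Mask (BS.pack [ byteFor i | i <- [0 .. 31] ])
  where
    byteFor :: Word8 -> Word8
    byteFor i = foldl setBit 0 [ fromIntegral (w .&. 7) | w <- ws, w `shiftR` 3 == i ]

-- | O(1) membership test: one indexed load and one bit test.
member :: Mask -> Word8 -> Bool
member (Mask m) w =
  testBit (BU.unsafeIndex m (fromIntegral (w `shiftR` 3))) (fromIntegral (w .&. 7))

-- | First position whose byte is in the mask, together with that byte.
findIndexInMask :: Mask -> BS.ByteString -> Maybe (Int, Word8)
findIndexInMask mask bs = do
  i <- BS.findIndex (member mask) bs
  pure (i, BU.unsafeIndex bs i)
```

Because the mask is an ordinary ByteString, it can be built once and reused across many searches. The per-byte cost is independent of the number of needle bytes, unlike an `elem` check against a list.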
