Overhead per key #30

Closed
ahmad-masad opened this issue Oct 31, 2023 · 2 comments

Comments

@ahmad-masad

Hi! We currently anticipate a 5% storage overhead per key in the KV store. We were planning to confirm this estimate by running some tests to get an empirical approximation but thought to ask first in case an estimate has already been measured. This will help us make decisions related to sharding and instance sizing. Thank you!

@peiwenhu
Collaborator

Hi! What are the stats of your keys/values? I assume we're looking at the in-RAM storage overhead (rather than the persistent blob storage). How big are the keys and values at the 50th and 90th percentiles, and what is the total number of keys?
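
As a rough illustration of the stats asked for above, here is a minimal sketch that summarizes key and value sizes from a sample of pairs. The function names (`nearest_rank_percentile`, `summarize_sizes`) and the in-memory list of byte strings are assumptions for illustration only and are not part of the KV service; the percentile uses the simple nearest-rank definition.

```python
def nearest_rank_percentile(sorted_sizes, pct):
    """Return the nearest-rank percentile (pct is an integer 1..100) of a sorted list."""
    if not sorted_sizes:
        raise ValueError("need at least one size")
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = (pct * len(sorted_sizes) + 99) // 100
    return sorted_sizes[max(rank, 1) - 1]


def summarize_sizes(pairs):
    """Summarize sizes in bytes for a list of (key, value) byte strings.

    Hypothetical helper: the input format is an assumption, not the
    service's actual data format.
    """
    key_sizes = sorted(len(key) for key, _ in pairs)
    value_sizes = sorted(len(value) for _, value in pairs)
    return {
        "total_keys": len(pairs),
        "key_p50": nearest_rank_percentile(key_sizes, 50),
        "key_p90": nearest_rank_percentile(key_sizes, 90),
        "value_p50": nearest_rank_percentile(value_sizes, 50),
        "value_p90": nearest_rank_percentile(value_sizes, 90),
    }
```

Running this over a representative sample of the production data should be enough to produce the numbers requested here.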

For the in-RAM storage there is some fixed overhead to store metadata, so the percentage changes depending on the k/v sizes. Right now it's a 64-bit integer for the timestamp (https://github.com/privacysandbox/fledge-key-value-service/blob/release-0.13/components/data_server/cache/key_value_cache.h#L93) plus some internal overhead required by the data structures, like unique_ptr and the hash map, and probably some overhead incurred by the TEE. This may change, as we may optimize the storage implementation in the future.
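
To illustrate why a fixed per-entry overhead turns into a different percentage for different k/v sizes, here is a minimal sketch. Only the 8-byte timestamp comes from the description above; `ASSUMED_STRUCTURE_BYTES` is a made-up placeholder for the unique_ptr, hash map, and TEE overhead, which has not been measured here, and both function names are hypothetical.

```python
TIMESTAMP_BYTES = 8            # 64-bit timestamp, as described above
ASSUMED_STRUCTURE_BYTES = 56   # placeholder guess, NOT a measured value


def overhead_percent(key_bytes, value_bytes, fixed_overhead_bytes):
    """Fixed per-entry overhead as a percentage of the key + value payload."""
    payload = key_bytes + value_bytes
    if payload <= 0:
        raise ValueError("payload must be positive")
    return 100.0 * fixed_overhead_bytes / payload


def min_payload_for_target(fixed_overhead_bytes, target_percent):
    """Smallest key + value payload (bytes) that keeps overhead at or below the target."""
    if target_percent <= 0:
        raise ValueError("target_percent must be positive")
    return 100.0 * fixed_overhead_bytes / target_percent


fixed = TIMESTAMP_BYTES + ASSUMED_STRUCTURE_BYTES  # 64 bytes under the assumption above
small_entry = overhead_percent(16, 48, fixed)      # small entry: overhead equals payload
large_entry = overhead_percent(100, 1180, fixed)   # larger entry: overhead shrinks
```

Under these assumed numbers, a 5% estimate would only hold for entries of about 1280 bytes or more; the real threshold depends on the actual per-entry overhead, which is why measuring it empirically matters.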

That said, our findings so far suggest that the cloud provider's TEE implementation could have more impact on sharding and instance sizing decisions than the per-key implementation overhead. For example, on AWS there are certain limitations on what can be used from a machine for the TEE. (aws/aws-nitro-enclaves-cli#263)

We don't have an analysis ready to share, but there is work in progress to test our sharding capabilities. With the big caveat above, I recommend that you go ahead with the empirical approach. If you share the data stats with us, we might also be able to take a look along with our own testing.

@ahmad-masad
Author

Apologies for the lack of response; I was conferring with my team about this. We will share this info with our contact in a private shared doc.
