I'm looking for best practices or approaches for data where, as in the image below, the low-value bins are by far the most common, but you are interested in the whole distribution.
Here is a visual of the raw data:
When we take the log of y, we can see there are potentially more interesting behaviors in the higher values. But now the axes make interpretation more difficult.
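To make concrete what I mean by taking the log of y, here is a minimal sketch that bins values and reports both the raw count and log10 of the count per bin. It uses a synthetic Pareto sample as a stand-in, not the petitions data, and the function name `log_count_histogram` and the bin width are just illustrative choices.

```python
import math
import random


def log_count_histogram(values, bin_width):
    """Bin values into equal-width bins.

    Returns a list of (bin_start, count, log10_count) rows, sorted by bin.
    Empty bins are skipped, because log10(0) is undefined.
    """
    counts = {}
    for v in values:
        idx = int(v // bin_width)  # index of the bin that v falls into
        counts[idx] = counts.get(idx, 0) + 1
    rows = []
    for idx in sorted(counts):
        c = counts[idx]
        rows.append((idx * bin_width, c, math.log10(c)))
    return rows


# Synthetic heavy-tailed sample (assumption: a stand-in, NOT the real dataset).
random.seed(0)
sample = [random.paretovariate(1.2) for _ in range(10_000)]

rows = log_count_histogram(sample, bin_width=5.0)
for start, count, log_count in rows[:10]:
    print(f"[{start:7.1f}, {start + 5.0:7.1f})  count={count:5d}  log10(count)={log_count:.2f}")
```

On the raw count scale the first bin dominates everything else, while on the log10 scale the differences between the sparse high-value bins become visible, which is the effect described above.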
It's quite a broad question, but
- how can we model such distributions?
- can we fit multiple distributions to the log histogram?
The original dataset can be found here: https://www.kaggle.com/datasets/wilomentena/uk-government-petitions. The plot is for non-rejected petitions.