Partition sizes of 1 GB on the multi-tenant plan and 10 GB on the dedicated plan are much too small for many big data and IoT use cases. Please support much larger partitions.
The partition sizes let you estimate how long consumers can be offline (e.g. for maintenance) before data in Message Hub is lost.
If you distribute data round-robin across all partitions, you can estimate the maximum maintenance/downstream offline time as:
max_offline (h) = num_partitions * max_partition_size (GB) / ingest_rate_into_mh (GB/h)
For example: 100 partitions * 1 GB (multi-tenant) / 20 GB/h = 5 hours
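The estimate above can be sketched as a small helper; the function and parameter names are illustrative, not part of any Message Hub API:

```python
def max_offline_hours(num_partitions: int,
                      max_partition_size_gb: float,
                      ingest_rate_gb_per_hour: float) -> float:
    """Hours consumers can be offline before a size-bounded partition
    is overwritten, assuming ingest is spread evenly (round-robin)
    across all partitions."""
    total_retained_gb = num_partitions * max_partition_size_gb
    return total_retained_gb / ingest_rate_gb_per_hour

# Multi-tenant example from the text: 100 partitions x 1 GB at 20 GB/h.
print(max_offline_hours(100, 1, 20))  # -> 5.0
```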
In this scenario, if your downstream consumers/persistence stores are offline for more than 5 hours, you may lose data in Kafka because it will be overwritten.
If you are not using round-robin partitioning and your data is not evenly spread across all partitions, the time to fill a partition and overwrite data will be shorter.
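With a skewed distribution, the hottest partition sets the limit: it fills first, so the safe offline window shrinks. A rough sketch under that assumption (names are illustrative):

```python
def max_offline_hours_skewed(max_partition_size_gb: float,
                             total_ingest_gb_per_hour: float,
                             hottest_partition_share: float) -> float:
    """Offline window when one partition receives `hottest_partition_share`
    (a fraction of total ingest). That partition fills and wraps first."""
    hottest_rate = total_ingest_gb_per_hour * hottest_partition_share
    return max_partition_size_gb / hottest_rate

# Even spread across 100 partitions (share = 1/100) reproduces the 5 h estimate:
print(max_offline_hours_skewed(1, 20, 0.01))  # -> 5.0

# If one hot key routes 5% of traffic to a single 1 GB partition,
# the window drops to 1 hour:
print(max_offline_hours_skewed(1, 20, 0.05))  # -> 1.0
```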