Thanks @sipa for clarifying. Its natural that if probability of a bucket being chosen is P(bucket) and it has n addresses, then probability of a particular address being picked from a particular bucket is P(bucket) / n. That satisfies your example above.
But to me, that doesn't seem like satisfying your above statement, " if a bucket has fewer entries, its entries are supposed to be picked more frequently."
Because bucket choices are uniformly random, for m non-empty buckets, probability of a particular bucket being chosen is 1/m. And, then because the search ensures that a non-empty bucket will always yield an address, so probability of getting any address from that bucket is 1.
Following this logic, we have, for m buckets, probability that any address from a particular bucket to be chosen is always 1/m, uniformly distributed over all the buckets, and independent of size of buckets.
So mapping back to your example, it would say, probability of choosing an address from Bucket A == Probability of choosing an address from Bucket B == 50%. The address count doesn't play a role anymore, because we will always choose an address from a bucket if it has any.
To compare against the old code:
In the old code, the probability is uniformly distributed across all addresses, irrespective of their buckets.
In the new code the probability is now uniformly distributed over all buckets, irrespective of their contents.
Once the bucket is chosen, then there is no probability anymore, as we will always get an address from it if its not empty.
Thus to me it seems like the probability is independent of bucket size.