Here at Lumber Labs, we've been spending a lot of time around credit cards, and we noticed something interesting: the distribution of digits in credit card numbers does not appear to be uniform. Below is the percentage distribution of each digit:
We can make two observations. First, the slight spikes at 4 and 5 are consistent with this being credit card data. Visa card numbers start with 4 and MasterCard numbers with 5, so those two networks are likely responsible for the extra 4s and 5s. Second, the chart somewhat resembles Benford's Law. That law states that in many real-world datasets (like lengths of rivers and street addresses), the leading digit is 1 roughly 30% of the time (more precisely, log10(2) ≈ 30.1%). In our data the digit 1 occurs only 18% of the time, but that is still well above the 10% one would expect if all ten digits were equally likely. One explanation could be how credit card companies choose card numbers: Benford's Law is known to emerge when numbers are drawn from a mix of several different distributions.
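For anyone who wants to try this on their own data, here is a minimal sketch of the comparison described above. It assumes card numbers are available as strings. It counts every digit in every position and compares the resulting percentages with a uniform 10% baseline and with Benford's leading-digit probabilities, P(d) = log10(1 + 1/d). The function names and the two sample numbers are illustrative only, not real card data and not the code we used. Strictly speaking, Benford's Law describes leading digits only, so setting it beside all-digit frequencies is a loose analogy, just as in the text.

```python
import math
from collections import Counter

# Each of the ten digits 0-9 equally likely.
UNIFORM_PERCENT = 100 / 10


def digit_percentages(card_numbers):
    """Percentage of each digit 0-9 across all positions of all card numbers.

    Hypothetical helper: spaces, dashes and other non-digit characters are ignored.
    """
    counts = Counter()
    for number in card_numbers:
        counts.update(ch for ch in number if ch.isdigit())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no digits found in input")
    return {d: 100 * counts[str(d)] / total for d in range(10)}


def benford_leading_digit(d):
    """Benford's Law: probability that the leading digit equals d (d in 1..9)."""
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Illustrative sample numbers only (one Visa-style, one MasterCard-style).
    sample = ["4111 1111 1111 1111", "5555 5555 5555 4444"]
    pct = digit_percentages(sample)
    for d in range(10):
        # Benford's Law has no prediction for a leading 0.
        benford = f"{100 * benford_leading_digit(d):5.1f}%" if d else "  n/a"
        print(f"{d}: observed {pct[d]:5.1f}%  uniform {UNIFORM_PERCENT:.1f}%  Benford {benford}")
```

With a large, real set of card numbers, the "observed" column is what the chart above plots.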
If you have a good explanation for this phenomenon, let us know! Or, even better, come work with us.
[Update: MarkMC points out that there are a number of real-world issues at play here, including BIN number assignment and other pre-set number ranges for third-party compatibility. Thanks, Mark!]