Here at Lumber Labs, we've been spending a lot of time around credit cards, and we noticed something interesting: the distribution of digits in credit card numbers does not appear to be uniform. Below is the percentage distribution of each digit:

We can make two observations. First, the chart is consistent with this being credit card data: Visa and MasterCard are likely responsible for the slight spikes in the 4's and 5's (Visa card numbers start with 4, and MasterCard numbers with 5). Second, the chart somewhat resembles Benford's Law, which states that in many real-world datasets (like lengths of rivers and street addresses), the leading digit is 1 roughly 30% of the time. Although in our data the digit 1 occurs only 18% of the time, that is well above the 10% one would expect if all ten digits were equally likely. One explanation could be how credit card companies choose card numbers: Benford's Law is known to emerge when numbers are drawn from a mixture of many different distributions.
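For reference, Benford's Law gives the probability that the leading digit is d as log10(1 + 1/d), which is where the "roughly 30%" figure for the digit 1 comes from. Below is a minimal Python sketch of that formula; the function name `benford_probability` is our own, not from any library.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading significant digit is d under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Print the expected share of each leading digit, from 1 down to 9.
    for d in range(1, 10):
        print(f"{d}: {benford_probability(d):6.1%}")
```

Run as a script, it prints about 30.1% for the digit 1 and about 4.6% for the digit 9. The nine probabilities sum to exactly 1, because the terms telescope to log10(10).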

If you have a good explanation for this phenomenon, let us know! Or, even better, come work with us.

[Update: MarkMC points out that a number of real-world issues are at play here, including BIN (bank identification number) assignment and other preset number ranges for third-party compatibility. Thanks, Mark!]

Hi Mike,

First off, great product, and a great product announcement! Congratulations.

Great insights can be had using Benford's Law! The results of your study are fascinating. I believe you are seeing the effect of the way payment card numbers are encoded: they use the Luhn formula, also known as the modulus 10 algorithm.

The formula verifies a number against its included check digit, which is usually appended to a partial account number to generate the full account number. This account number must pass the following test:

1. Counting from the check digit, which is the rightmost, and moving left, double the value of every second digit.
2. Sum the digits of the products (e.g., 10 → 1 + 0 = 1, 14 → 1 + 4 = 5) together with the undoubled digits from the original number.
3. If the total modulo 10 is equal to 0 (that is, if the total ends in zero), then the number is valid according to the Luhn formula; otherwise it is not valid.
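The three steps above translate almost line for line into code. This is a minimal Python sketch, assuming the card number arrives as a string of ASCII digits; the function names are illustrative and not taken from any particular library.

```python
def luhn_total_mod10(number: str) -> int:
    """Return the Luhn total modulo 10 for a string of decimal digits."""
    total = 0
    # Walk from the rightmost digit (the check digit, position 0) leftward.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:  # every second digit, counting from the check digit
            digit *= 2
            if digit > 9:
                # Subtracting 9 equals summing the product's two digits (16 -> 1 + 6 = 7).
                digit -= 9
        total += digit
    return total % 10


def is_luhn_valid(number: str) -> bool:
    """True if the number consists of ASCII digits and its Luhn total ends in 0."""
    return number.isascii() and number.isdigit() and luhn_total_mod10(number) == 0
```

For example, `is_luhn_valid("49927398716")` returns `True`, while changing the last digit to 7 makes it return `False`.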

For example, take the account number "4992739871", which will have a check digit x appended, making it 4992739871x:

Account number: 4 9 9 2 7 3 9 8 7 1 x

Double every other digit: 4 18 9 4 7 6 9 16 7 2 x

Sum all the digits: 4 + (1+8) + 9 + 4 + 7 + 6 + 9 + (1+6) + 7 + 2 + x = 64 + x

To make the sum divisible by 10, we set the check digit (x) to 6, making the full account number 49927398716.
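Generating the check digit is the same computation run in reverse. A minimal Python sketch, assuming the partial account number is a string of digits; the name `luhn_check_digit` is hypothetical. Because the check digit will occupy the rightmost position, the rightmost digit of the partial number is the first one to be doubled.

```python
def luhn_check_digit(partial: str) -> int:
    """Compute the check digit to append to a partial account number."""
    total = 0
    # Once the check digit is appended, the rightmost digit of `partial`
    # lands in the first doubled position, so double positions 0, 2, 4, ... here.
    for position, ch in enumerate(reversed(partial)):
        digit = int(ch)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    # Choose the digit that brings the total up to the next multiple of 10.
    return (10 - total % 10) % 10
```

For the example above, the digit total is 64, so `luhn_check_digit("4992739871")` returns 6 and the full number is `"4992739871" + "6"`, i.e. 49927398716.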

And thus there is a non-uniform distribution of digits, with a bias toward 0 and 1.
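One way to probe this claim is to tally digit frequencies over numbers that carry a Luhn check digit, the same kind of per-digit percentage shown in the chart. The sketch below assumes 16-digit numbers whose first 15 digits are uniformly random, which real card numbers are not (BIN prefixes and issuer ranges fix many leading digits), so it isolates the effect of the check digit alone. All function names here are hypothetical.

```python
import random
from collections import Counter


def luhn_check_digit(partial: str) -> int:
    """Compute the Luhn check digit for a partial account number."""
    total = 0
    for position, ch in enumerate(reversed(partial)):
        digit = int(ch)
        if position % 2 == 0:  # these positions are doubled once the check digit is appended
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def random_luhn_numbers(count: int, length: int = 16, seed: int = 0) -> list[str]:
    """Generate `count` Luhn-valid numbers with uniformly random payload digits (an assumption)."""
    rng = random.Random(seed)
    numbers = []
    for _ in range(count):
        partial = "".join(rng.choice("0123456789") for _ in range(length - 1))
        numbers.append(partial + str(luhn_check_digit(partial)))
    return numbers


def digit_distribution(numbers: list[str]) -> dict[str, float]:
    """Percentage of all digits, across all numbers, that are each of 0-9."""
    counts = Counter(ch for n in numbers for ch in n)
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in "0123456789"}
    return {d: 100 * counts[d] / total for d in "0123456789"}


if __name__ == "__main__":
    dist = digit_distribution(random_luhn_numbers(20_000))
    for d, pct in dist.items():
        print(f"{d}: {pct:5.2f}%")
```

Comparing this output with the distribution of real card numbers would help separate what the check digit contributes from what issuer number assignment contributes.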

Hope this helps!

∞Brian

http://www.quora.com/Brian-Roemmele

The account number 49927398716 can be validated as follows:

1. Double every second digit, starting from the second digit from the right: (1×2) = 2, (8×2) = 16, (3×2) = 6, (2×2) = 4, (9×2) = 18.
2. Sum all the individual digits (digits in parentheses come from the products in step 1): 6 + (2) + 7 + (1+6) + 9 + (6) + 7 + (4) + 9 + (1+8) + 4 = 70.
3. Take the sum modulo 10: 70 mod 10 = 0, so the account number passes the Luhn check.

Hello! Thank you for sharing this information on credit cards and Benford's Law.
