Through the course of my day job, I’ve come to learn a fair bit about the United States Social Security Number (SSN). There are numerous sources out there that each provide small bits of information, but I wanted to take this opportunity to write my own post about what I’ve learned. With any luck, this post will be a useful resource for someone else seeking to better understand Social Security Numbers.
For a much longer history of the Social Security Number, I highly recommend “The Story of the Social Security Number”, published by the Social Security Administration. But I’ll capture a few of the things I find especially interesting or important here.
The Social Security Number was created in 1936 to track a worker’s earnings throughout their lifetime and, later, to determine the amount of Social Security benefit paid out in retirement. Within the first 4 months of the registration campaign, almost 26 million SSNs were assigned. 1 Back in those days, SSNs were assigned by simply asserting that you should be given one.
In 1943, Executive Order 9397 was introduced, which required federal agencies to use the SSN to identify individuals in any new record systems. This is likely the start of the SSN’s journey from humble earnings-tracking identifier to national person identifier.
In 1972, it became necessary to provide evidence that establishes age, identity, and citizenship or alien status in order to receive an SSN.
From its inception until June 25th, 2011, Social Security Numbers were issued in a structured way (continue reading for more detail). Social Security Numbers issued after June 25th, 2011 use a randomized process 2, which eliminates geographical significance and the significance of the “High Group” – we’ll get to that soon.
Social Security Numbers can be broken up into 3 chunks, which are also represented in the typical Social Security Number format of XXX-XX-XXXX. These three chunks are called the “Area” number, the “Group” number, and the “Serial” number.
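The three chunks are easy to pull apart in code. Here is a minimal sketch, assuming the input is either `XXX-XX-XXXX` or nine bare digits; `SSNParts` and `split_ssn` are names I made up for illustration, not anything the SSA publishes.

```python
from typing import NamedTuple


class SSNParts(NamedTuple):
    area: str    # first three digits
    group: str   # fourth and fifth digits
    serial: str  # last four digits


def split_ssn(ssn: str) -> SSNParts:
    """Split an SSN written as XXX-XX-XXXX or XXXXXXXXX into its three chunks."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected nine digits, optionally hyphenated")
    return SSNParts(area=digits[0:3], group=digits[3:5], serial=digits[5:9])


parts = split_ssn("123-45-6789")
print(parts.area, parts.group, parts.serial)  # 123 45 6789
```

This only splits the string. It says nothing about whether the number could actually have been issued.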
Prior to the Social Security Number Randomization, each of these numbers held a certain significance, outlined here:
The Area Number is the first three digits of the SSN and represents the geographic region that issued the SSN. These regions began in the northeast and moved westward, so people born on the east coast typically have lower area numbers than those in western states. There were, of course, exceptions to this rule – several Area Numbers were assigned to specific regions, such as American Samoa and Puerto Rico, and there was even a range of Area Numbers reserved for railroad workers for a period of time.
Area Numbers also have, or rather, previously had, some limitations:
There are some resources out there to help determine which states were issued which Area Number prefixes. Here are a couple:
Group Number is the second group of numbers, the fourth and fifth digits, and roughly represents a period of time in which the social security number was issued. It’s not a direct translation, but through something called the High Group list 3, you can roughly determine when an SSN was issued. I haven’t been able to find High Group lists prior to November 2003 online, so this method is probably not super reliable for the nearly 70 years before 2003. But, based on how the Group Number is assigned, one could reasonably make certain assumptions when comparing two SSNs – e.g. someone whose group number was issued earlier is probably older than someone whose group number was issued later, assuming the Area Number remains the same. Keep in mind that groups were not issued in plain numerical order: within each area, the SSA issued odd groups 01-09 first, then even groups 10-98, then even groups 02-08, and finally odd groups 11-99.
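Comparing two Group Numbers is a little less obvious than comparing the integers, because the SSA did not hand groups out in numerical order. The sketch below assumes the pre-randomization issuance sequence the SSA documented (odd 01-09, then even 10-98, then even 02-08, then odd 11-99). `GROUP_ISSUE_ORDER` and `group_issue_rank` are hypothetical names of mine.

```python
# Pre-2011 issuance sequence of Group Numbers within a single Area Number.
GROUP_ISSUE_ORDER = (
    [1, 3, 5, 7, 9]              # odd groups 01-09
    + list(range(10, 99, 2))     # even groups 10-98
    + [2, 4, 6, 8]               # even groups 02-08
    + list(range(11, 100, 2))    # odd groups 11-99
)


def group_issue_rank(group: str) -> int:
    """Return the 1-based position of a Group Number in the issuance sequence.

    Raises ValueError for "00", which was never assigned.
    """
    return GROUP_ISSUE_ORDER.index(int(group)) + 1


# Group 98 was issued before group 02, even though 98 > 2 numerically.
print(group_issue_rank("98") < group_issue_rank("02"))  # True
```

The rank is only meaningful when both SSNs share an Area Number and were issued before randomization, and even then it gives an ordering, not a date.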
The Serial Number is the easiest of the 3 numbers, because it never held any significance. It was a straight numerical series from 0001-9999. 0000 was never assigned in any group, but otherwise this likely just reflected the order of the envelopes on a processor’s desk when they were assigning SSNs at work that day.
For all Social Security Numbers issued after June 25th, 2011, Area Number and Group Number have lost their significance. As the Social Security Administration had been on computers for decades by this time, it became trivial to simply issue a random social security number out of the remaining pool. There are still certain restrictions, however:
Area Numbers 000, 666, and 900-999 are not assignable 2
Group Number 00 will not be assigned 4
Serial Number 0000 will not be assigned 4
Validating SSNs can be tricky, depending on how you’d like to do it. There are obviously certain technical controls we can implement to look for incorrect SSNs, based on the criteria we’ve outlined already. A sample Python attempt at performing this validation is provided below, though I’d probably consider more robust options than these simple format-checking solutions.
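def is_valid_ssn(ssn: str) -> bool:
    # Accepts ssn in the format of `123456789` for simplicity's sake
    if len(ssn) != 9 or not ssn.isdigit():
        return False
    # Area Numbers 900-999 are never assigned
    if ssn[0] == "9":
        return False
    # Area Numbers 000 and 666 are never assigned
    if ssn[0:3] in ["000", "666"]:
        return False
    # Group Number 00 is never assigned
    if ssn[3:5] == "00":
        return False
    # Serial Number 0000 is never assigned
    if ssn[-4:] == "0000":
        return False
    return True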
But even this simple solution fails to account for things like the person’s date of birth and whether or not the Group Number is within the bounds of the High Group list (assuming you’re only dealing with adult Social Security Numbers at the time of writing this – we’ve got until 2029 before randomized SSNs become commonplace for adults). So it can tell you if an SSN is formatted correctly, but it can’t tell you anything else, really.
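If you also want to accept the hyphenated form, the same format rules can be squeezed into one regular expression. This is just a sketch under my own assumptions: it accepts either `123456789` or `123-45-6789` (but not a mix of the two), and `SSN_PATTERN` and `looks_like_ssn` are names I invented. Like the function above, it only checks format.

```python
import re

SSN_PATTERN = re.compile(
    r"(?!000|666|9)\d{3}"  # Area: not 000, not 666, not 900-999
    r"(-?)"                # optional hyphen, remembered as group 1
    r"(?!00)\d{2}"         # Group: not 00
    r"\1"                  # second hyphen must match the first (both or neither)
    r"(?!0000)\d{4}",      # Serial: not 0000
    re.ASCII,              # keep \d restricted to 0-9
)


def looks_like_ssn(ssn: str) -> bool:
    """Return True if the string passes the published format rules."""
    return SSN_PATTERN.fullmatch(ssn) is not None


print(looks_like_ssn("123-45-6789"))  # True
print(looks_like_ssn("666-45-6789"))  # False
```

The backreference is what rejects half-hyphenated input like `123-456789`.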
If you just need to validate a small number of Social Security Numbers, such as when validating someone’s I-9 documentation to prove their eligibility to work in the US, manual verification of SSN authenticity can be done via the SSNVS, a service offered by the Social Security Administration. However, it has numerous limitations on authorized use.
There is additionally a service called Consent Based Social Security Number Verification (CBSV) that government agencies and certain private entities can enroll in to do online instant verification. Using this system requires an EIN (issued by the IRS, and perhaps a topic for another post), and requires filling out an enrollment form, submitting a $5,000 enrollment fee, and executing a legally binding agreement with the Social Security Administration. 5
Once you’ve got access to this system, you can submit a Name, Social Security Number, and Date of Birth combination, and you’ll get back yes or no, depending on whether or not the combination is a match for what the SSA has on file. This means you can’t just submit a Name and SSN, nor can you just submit an SSN and DOB. You need the Name and SSN as it appears on the Social Security card, as well as the date of birth the SSA has on file.
If you’re a business that is dealing with Social Security Numbers and PII matching, this service is probably something worth looking into. But bear in mind that it will force rigid name matching – “John” can’t pass validation if “Jonathan” is on file with the Social Security Administration.
The Social Security Administration has issued something like 450 million social security numbers, putting us just about halfway to exhaustion. The SSA has never reissued an SSN, and so if a person has died, their SSN should probably not be considered valid for all sorts of business and identity verification purposes. But how do we know if a person has died?
Enter the Death Master File (what an ominous name). https://www.ssa.gov/dataexchange/request_dmf.html
This is a file that contains information about SSNs whose bearer has been determined to be deceased. It contains the SSN, first name, middle name, surname, date of birth, and date of death.
Unfortunately, access to this file also costs money and involves government bureaucracy, but perhaps that is for the best. If you’re a business, you should probably have access to the file, or use a service that has access to it, in order to verify the identities you’re dealing with and make sure they aren’t dead.
Based on some rough math about social security numbers, I estimate that we’re a little over halfway through assignable social security numbers. Given the previously established rules in which no SSN will start with 000, 666, or 9, no SSN will have 00 in the middle two digits, and no SSN will end in 0000, there are about 889 million valid SSNs (898 Area Numbers × 99 Group Numbers × 9,999 Serial Numbers). The Social Security Administration also estimates that they are assigning approximately 5.5 million social security numbers every year. This leaves us roughly 80 years of social security numbers in their current format, before we have to either extend the SSN or start to reuse numbers.
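The rough math can be checked with a few lines of Python. The 450 million issued and 5.5 million per year figures are the estimates quoted in this post, and the function names are only for illustration. Applying all three exclusion rules (Area, Group, and the 0000 Serial) gives a little under 889 million assignable numbers. Applying only the Area Number rule gives 898 million.

```python
def total_assignable() -> int:
    """Count SSNs that survive the Area, Group, and Serial exclusion rules."""
    areas = sum(1 for a in range(1, 900) if a != 666)  # 001-899 minus 666
    groups = 99                                        # 01-99
    serials = 9_999                                    # 0001-9999
    return areas * groups * serials


def years_until_exhaustion(issued: int, per_year: int) -> float:
    """Years left at a constant issuance rate, with no reuse of numbers."""
    return (total_assignable() - issued) / per_year


print(total_assignable())  # 888931098
print(round(years_until_exhaustion(450_000_000, 5_500_000), 1))  # 79.8
```

A constant issuance rate is obviously a simplification, so treat the result as an order-of-magnitude estimate.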
I’m personally hoping that in 80 years we’ll have a durable technology that can last a person’s lifetime and be used as a cryptographic assertion of identity, rather than “here’s my most important number” anytime someone wants to know who you are. But who knows.
There are many cases where an SSN may not be available to a person but they otherwise need a legally trackable identification number. Well, the Social Security Administration tapped out of this conversation and the Internal Revenue Service stepped up. If anyone wants to make sure they can track individuals, it’s the folks who want to make sure they are getting paid, I suppose.
A Taxpayer Identification Number (TIN) is simply a nine-digit number used by the IRS to track an individual. If you have a Social Security Number, then this is also your TIN, according to the IRS. If you’re an individual without an SSN, you’ll get something called an Individual Taxpayer Identification Number (ITIN). This is formatted the same way as an SSN, but always begins with a 9. Note that this means ITIN and SSN can never have an overlap, since the Social Security Administration assures us that SSNs will never start with a 9.
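Because the two ranges never overlap, telling an ITIN from an SSN comes down to the first digit. Here is a minimal sketch using only the leading-9 rule described above; real ITINs have further constraints that I’m not modelling, and `classify_tin` is a hypothetical helper name.

```python
def classify_tin(tin: str) -> str:
    """Label an individual's nine-digit TIN as "ITIN" or "SSN".

    Uses only the leading-digit rule: ITINs begin with 9, SSNs never do.
    """
    digits = tin.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected nine digits, optionally hyphenated")
    return "ITIN" if digits[0] == "9" else "SSN"


print(classify_tin("912-70-1234"))  # ITIN
print(classify_tin("123-45-6789"))  # SSN
```

This labels the range a number falls in. It doesn’t confirm that the number was ever issued.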
If you’re dealing with an entity, like a corporation, trust, or estate, you’ll have something called an Employer Identification Number (EIN). This is also a nine-digit number issued by the IRS, but this one is formatted as XX-XXXXXXX. Similar to SSNs, these used to have geographic significance for the bearer – prior to 2001, the two-digit prefix represented the geographic region of the business. But since 2001, the prefix only represents the “campus” that issues the EIN – and one of those “campuses” is “Internet”. 6
I find it particularly amusing that EINs and SSNs are indistinguishable from one another if you strip the dashes out. Load bearing hyphens. Though I guess in most cases an SSN by itself isn’t valuable, and it’s probably not often that you’re asked to provide your EIN and Date of Birth.
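To see just how load-bearing those hyphens are, here is a small sketch that renders the same nine digits both ways. The helper names are mine, and nothing here checks whether either result is a number that could really be issued.

```python
def _require_nine_digits(digits: str) -> str:
    """Return the input unchanged if it is exactly nine ASCII digits."""
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly nine digits")
    return digits


def format_as_ssn(digits: str) -> str:
    """Render nine digits in the SSN layout XXX-XX-XXXX."""
    d = _require_nine_digits(digits)
    return f"{d[:3]}-{d[3:5]}-{d[5:]}"


def format_as_ein(digits: str) -> str:
    """Render nine digits in the EIN layout XX-XXXXXXX."""
    d = _require_nine_digits(digits)
    return f"{d[:2]}-{d[2:]}"


raw = "123456789"
print(format_as_ssn(raw))  # 123-45-6789
print(format_as_ein(raw))  # 12-3456789
```

If you store these identifiers as bare digits, it’s probably worth storing which kind they are alongside them.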