Falsehoods Programmers Believe About “Falsehoods Programmers Believe About Names”
Preface: If you haven’t read the excellent essay Falsehoods Programmers Believe About Names by Patrick McKenzie, or haven’t read it recently, I strongly suggest that you do so before reading this article. This article will brutally undermine the original’s efficiency by overanalysing its points, and in my opinion, it’s good to have your own thoughts about it before you hear mine.
Falsehoods Programmers Believe About Names has, at this point, become a meme in several relatively niche spaces. Since the original post, several “Falsehoods Programmers Believe” articles have been written about various topics. The most common one besides the original is Falsehoods Programmers Believe About Time by Noah Sussman, which I personally believe already begins to miss the point of the original post.
To me, the point of the original Falsehoods post is extremely clear, and yet I’m shocked that every time I share it with another person, they say “okay, but surely…”
The point of Falsehoods isn’t that we can’t possibly cover every case in every system, and that that’s okay. The point is that although not every system that stores names needs to account for every falsehood, for each falsehood there will be some system which believes it and shouldn’t. And most of these are easy enough to fix that you should just fix them anyway.
This becomes abundantly clear, in my opinion, from the very first paragraph of the article where McKenzie says:
…anything someone tells you is their name is — by definition — an appropriate identifier for them.
From the start, we know that storing names is hence impossible; we can’t hope to account for every case, because every rule will have its own counterexamples. In order to store names, we must compromise. But compromises suck, and we shouldn’t just toss our hands in the air and reject any name we don’t like.
So, exhaustively, I’m going to go through every single falsehood listed in Falsehoods Programmers Believe About Names and explain the cases where each one applies, and which systems shouldn’t believe it.
It’s also worth noting: I haven’t spoken with McKenzie about the essay, and although I took a clear message away from it, I can’t speak for the author themself. My only hope is that after reading this analysis, you’ll ponder the points made a bit more and take those thoughts with you when you think about the way the systems in our world are designed.
1. People have exactly one canonical full name.
This is the falsehood that McKenzie explicitly mentions before listing them, as it applies directly to McKenzie themself. Patrick McKenzie is a person with a very non-Japanese name living in Tōkyō, Japan, where most systems expect a person’s name to be Japanese.
You could argue that in most contexts, the term “name” in these systems refers to the name on a person’s legal paperwork, which in Japan is very strictly defined and must be written in Japanese; but on this point, I refer you back to the first paragraph of Falsehoods, which states:
…anything someone tells you is their name is — by definition — an appropriate identifier for them.
And hopefully, by now, we realise the crux of this particular falsehood. Since no system can possibly hope to cover all cases for names, each set of compromises manifests itself as yet another canonical name for any given person. Often, different sets of compromises will end up at the same name, but not always.
In McKenzie’s case, three possible names are already apparent just from information listed in the article:
Patrick McKenzie, the English name.
パトリック・ミッケンジー, the Japanese name.
patio11, the online alias.
Depending on the system you interact with, you may see any one of these names, or a different one. As such, we must clarify what kind of identifier we would like when designing a system that requires names.
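To make that concrete, here’s a minimal sketch in Python, assuming a hypothetical `Person` record and context labels I’ve made up for illustration (`english`, `japanese`, `online_alias`). Instead of one “canonical” name field, each name is stored against the context it’s used in, and code has to say which context it wants.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Person:
    # Hypothetical model: no single "canonical" name, just names per context.
    names: dict[str, str] = field(default_factory=dict)

    def name_for(self, context: str) -> str | None:
        # Callers must say which context they mean; None means we never asked.
        return self.names.get(context)


patrick = Person(names={
    "english": "Patrick McKenzie",
    "japanese": "パトリック・ミッケンジー",
    "online_alias": "patio11",
})
```

Asking for `patrick.name_for("legal")` here returns `None`, which is the honest answer: we never asked for that name, so we shouldn’t pretend to have it.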
2. People have exactly one full name which they go by.
This is a continuation of the first falsehood, and it’s the natural response by many people to the first falsehood. “Okay, I know that we’ve all got multiple names, but what’s your real name?”
A lot of people like to avoid having to fully clarify their requirements for names, and instead jump to one of several vague ideas of a name and say “that one.” Like “oh, I just want your legal name” or “what do most people call you?”
You might think “okay, well, you might go by multiple names, but I want the one on your ID.” My first response to this is “which one?”
I personally changed my legal name a few years ago, and I can tell you first-hand that you do not have a singular legal name. Each legal entity you interact with has its own copy of your legal documents, and when you legally change your name, all you get is an additional document which gives you permission to change your name from a given former name to a new, current name. I personally had to change my name on my Driver’s License (commonly used as a photo ID in the US), my passport, my birth certificate, my student records at university, my bank account, my lease, and others. Depending on who you ask, any one of these sources is sufficient for a legal name, and although it’s considered bad practice to have separate names for each of these sources, it is possible.
One common way names “drift” between sources is when people “fix” their names by shortening or lengthening them at any point in the process. One recent example I’ve seen is how comedian Phil Jamesson often sees people “correct” his surname to Jameson, assuming that Jamesson is a typo when it is not. Another example: a friend of mine from university has the given name “Nate” and often sees people “correct” his name to Nathan, assuming that he accidentally put in a nickname instead of his formal name. Even if people formally have a shorter or longer “version” of their name they use in one place, most places will often accept the alternative version, leading to situations where people can have essentially multiple valid legal names.
Often, people ask for a colloquial name instead of a legal name: “the kind your friends call you,” as some might put it. The most common example here is transgender people, but online aliases also apply.
Although it’s not specifically limited to trans people, most trans people will decide to change their name at some point to better reflect their gender identity. Depending on how comfortable a trans person is with someone, they may in fact be using a newer name around that person, or an older one. Sometimes, people change their names multiple times; sometimes, they might use a new name only around close friends to “try it out” before using it around everyone else.
Another very common example is online aliases; people online often think of alternative aliases for themselves due to an existing name being “taken” or just because they think it’s cool. Sometimes, people who interact together mostly online will even use these names if they meet each other in person. Sometimes, these names aren’t even “obviously” online aliases; for example, the gamer-scientist CarlSagan42 will often go by “Carl” even among friends in person, despite the fact that his given name is Andy.
Even outside of these contexts, there are people who might just use different names in different contexts. Growing up, I knew a person who was given the name Steve to match his father, but eventually adopted the name Alex so they could easily differentiate between him and his father at home. At school, most people called him Alex, but at one point he decided to switch to using Steve because his name on the school roster was still Steve, and he was tired of explaining why he went by Alex to every new teacher he had.
Hopefully, by this point, you realise that it’s important to state the context in which a name is used to help constrain which name a person might use to identify themself. If you plan to verify someone’s name with an external source, which source are you using? If you plan to identify a person by their colloquial name, with whom are you sharing this name, and in what context?
3. People have, at this point in time, exactly one canonical full name.
This is also 4. People have, at this point in time, one full name which they go by and 7. People’s names do not change.
The comments for 2 should cover this, but I think it’s worth mentioning anyway. Many people will look at the contexts where a name might be ambiguous and think “okay, well, people might change their name, but over time it’ll settle down.” It won’t.
Because names can change over time, systems should be able to change names, period. A name that was accurate at one point in time may not be accurate at another.
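A minimal sketch of what that might look like, assuming a hypothetical `NameHistory` class: rather than overwriting a single field, keep every name alongside the date it took effect, and look up whichever one applies at the time you care about. The names loosely follow the Steve/Alex story from earlier; the dates are made up.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date


@dataclass
class NameHistory:
    # Hypothetical append-only record of (effective date, name) pairs.
    entries: list[tuple[date, str]] = field(default_factory=list)

    def change(self, effective: date, name: str) -> None:
        # Any name can change, at any time, any number of times.
        self.entries.append((effective, name))
        self.entries.sort(key=lambda entry: entry[0])

    def as_of(self, when: date) -> str | None:
        # The most recent name that had taken effect on or before `when`.
        current = None
        for effective, name in self.entries:
            if effective > when:
                break
            current = name
        return current


history = NameHistory()
history.change(date(2000, 1, 1), "Steve")
history.change(date(2010, 9, 1), "Alex")
history.change(date(2016, 9, 1), "Steve")  # names can go "back", too
```

Note that nothing here assumes a name only ever moves forward to something new.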
5. People have exactly N names, for any value of N.
This is an attempt to discretise the contexts from 2. Surely, if we count up the names someone uses, there will be an upper bound, and we can simply ask for that many names and cover everything.
Even if you can, don’t try. In addition to context not being discrete, the point isn’t to collect all the names someone might use and then list them all so someone knows what you go by. Names serve the specific purpose of identifying a person in a given context, and you shouldn’t try to collect names outside their context.
In other words, define your use case, and only your use case. Don’t try and cover all cases in an attempt to not define one.
6. People’s names fit within a certain defined amount of space.
I know that some folks will think they’re clever and say “okay, well, if I let you type 1000 characters in your name, that surely should work for everyone,” and to that I’ll say “congratulations; you missed the point.”
So many systems will look at very large sets of names and try to find a “reasonable maximum length” for them. You may at some point have filled out a form where the field for your name seemed too small, and you had to write especially small to fit. You may have filled out a “scannable” form where the number of character boxes for your name was just one too few. Someone might have thought “well, surely, 20 characters for a name should be enough” and then found someone whose name required 21.
The point here is that riding up on the boundary of what’s allowed is always going to give you exceptions, and you should take what you think is more than you need and multiply it by ten, or something like that.
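Part of the problem is that “20 characters” is itself ambiguous. The sketch below, assuming only Python’s standard `unicodedata` module and a hypothetical `lengths` helper, measures the same name in a few units a limit might actually be counted in. A form validator counting code points and a storage layer counting bytes can disagree about whether a name fits.

```python
from __future__ import annotations
import unicodedata


def lengths(name: str) -> dict[str, int]:
    # The same visible name, measured in three units a limit might use.
    nfc = unicodedata.normalize("NFC", name)  # accents composed where possible
    nfd = unicodedata.normalize("NFD", name)  # accents as separate combining marks
    return {
        "code_points_nfc": len(nfc),
        "code_points_nfd": len(nfd),
        "utf8_bytes": len(nfc.encode("utf-8")),
    }
```

For example, `lengths("Nguyễn")` gives 6 code points composed, 8 decomposed, and 8 UTF-8 bytes, while `lengths("田中太郎")` gives 4 code points but 12 bytes.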
8. People’s names change, but only at a certain enumerated set of events.
This is a continuation of 3, 4, and 7, but it’s worth clarifying. A lot of people will think about the cases where names change and try to make changing them easier in those situations. This seems nice, but in practice it can end up being really annoying for the people whose cases you didn’t consider.
Here’s a more concrete example: a lot of people know that someone’s family name may change when they get married. So, you’ll find sites where the family name can be changed by the user and the given name cannot, because these sites assume that the given name does not change. If you’re going to let people change their name on their own, you should let them change their name in full whenever they want, and not just when you think it’s reasonable.
9. People’s names are written in ASCII.
It’s truly upsetting how many systems can’t even handle non-English European names, let alone names written in non-Latin scripts. If you have to remove the accents from someone’s name to enter it in your system, your system is garbage.
In the US in particular, we have a “soft” character requirement on names. This means that by law, systems are required to render plain alphabetic names correctly, but that all other characters are offered “as-is.” In practice, this means that people are often given the ability to write their name with characters beyond the basic 26 letters, but that their names may be utterly mangled or destroyed when transferred between systems.
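To see what that mangling looks like, here’s a small sketch of the “sanitising” step a plain-ASCII system effectively performs, using Python’s standard `unicodedata` module. `strip_to_ascii` is a hypothetical helper for illustration, not something you should ship.

```python
import unicodedata


def strip_to_ascii(name: str) -> str:
    # Split accented letters into base letter + combining mark (NFKD),
    # then silently drop everything that isn't ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")
```

With this, “José” becomes “Jose”, “Nguyễn” becomes “Nguyen”, “Weiß” loses a letter entirely (becoming “Wei”, since ß has no ASCII decomposition), and a name written entirely in kanji, like 田中太郎, becomes an empty string.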
11. People’s names are all mapped in Unicode code points.
This is also 10. People’s names are written in any single character set.
As you might expect, this is one of the falsehoods most programmers object to; in fact, it’s the main inspiration for writing this article. Surely, it must be fine to encode names using the system whose principal goal is encoding all human languages? Usually, yeah.
However, as you can imagine, Unicode has not yet achieved its goal. Every version adds new scripts to Unicode’s ever-expanding list, and although the most commonly used languages are all supported, there are many that aren’t. Some have only just arrived: Unicode 14, the latest version at the time of writing, added the script used to write the critically endangered Toto language, spoken by the Toto people in northeast India.
Toto is officially classified as a critically endangered language because it’s spoken by only around 1500 people (by latest estimates), and most Toto people also speak Bengali, a language with much more widespread Unicode support. The term “endangered” is used because, quite literally, the language could die with its people.
As you can imagine, Unicode support often decides whether languages get supported by systems on the web. Facebook, a system notorious for its absurdly strict rules on names, continues to spread to areas whose languages are not yet supported by Unicode. At the time of writing, Facebook also supports Bengali but not Toto.
Forcing users to use specific languages on systems when their native languages aren’t in Unicode can easily drive languages past the threshold to extinction. The fact that Toto can now be encoded in Unicode may, in fact, help prevent it from going fully extinct.
Unicode isn’t complete, and it might not ever be. For most people, we just have to deal with it, but in some cases, designing your own encoding for a language before it has Unicode support is definitely the right thing to do.
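One practical consequence: a system’s idea of which characters exist is tied to the Unicode version its libraries were built against. Python exposes this as `unicodedata.unidata_version`, and characters its database doesn’t know about (for example, Toto letters on a runtime built against a version older than Unicode 14) report the General Category `Cn`, “unassigned”. Characters from the Private Use Area, where ad-hoc encodings for not-yet-supported scripts typically live, report `Co`. The hypothetical helper below just finds such characters, on the assumption that a system should store them untouched rather than reject the name.

```python
from __future__ import annotations
import unicodedata


def unrecognised_characters(name: str) -> list[str]:
    # "Cn": unassigned, as far as this runtime's Unicode database knows.
    # "Co": Private Use Area, meaningful only by private agreement.
    # Either way, the safe move is to keep the character, not strip it.
    return [ch for ch in name if unicodedata.category(ch) in ("Cn", "Co")]


# The Unicode version this Python's database was built from, e.g. "14.0.0".
RUNTIME_UNICODE_VERSION = unicodedata.unidata_version
```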
13. People’s names are case insensitive.
This is also 12. People’s names are case sensitive; 16. People’s names are not written in ALL CAPS; 17. People’s names are not written in all lower case letters; and 30. There exists an algorithm which transforms names and can be reversed losslessly.
Yes, all of these points can apply at the same time.
Some people’s names are case-sensitive, some aren’t. Some names don’t have case; for example, names written using Chinese characters have no sense of “uppercase” and “lowercase;” the concept of case doesn’t even make sense when talking about these names.
When comparing names, you might be tempted to normalise case to ensure that uppercase and lowercase versions of a name are the same, but chances are that you’re not even doing that correctly. Unicode refers to these sorts of operations as “folding,” where you “fold” multiple ways of writing things into a common format. Specifically for case folding, it’s not as simple as converting everything to uppercase or to lowercase; a good counterexample is the German ß and ẞ (sharp S), which may be converted to SS or ss when converting between cases.
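Here’s a short Python sketch of the difference, assuming the built-in `str.casefold()` (which applies Unicode’s full case folding) and the standard `unicodedata` module. The `caseless_key` and `names_match` helpers are hypothetical; `caseless_key` follows the Unicode Standard’s recipe for canonical caseless matching (normalise to NFD, case fold, normalise to NFD again), so that differently composed accents don’t defeat the comparison either.

```python
import unicodedata


def caseless_key(name: str) -> str:
    # NFD first, so composed and decomposed accents compare equal;
    # casefold() for full case folding (e.g. "ß" and "ẞ" both become "ss");
    # then NFD again, because folding can produce text that needs renormalising.
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


def names_match(a: str, b: str) -> bool:
    # Compare folded keys only; always store and display the original strings.
    return caseless_key(a) == caseless_key(b)

# Plain lower() is not enough: "Weiß".lower() is "weiß", but "WEISS".lower() is "weiss".
```

Even this isn’t universally right: `casefold()` is not locale-aware, so Turkish dotted and dotless i, for instance, still fold the way English expects rather than the way Turkish does.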
In Japanese, two names can be written exactly the same and be pronounced completely differently, or be pronounced the same and spelled completely differently; how do we hope to account for that? Usually, we just ask users to say how a person’s name is pronounced in addition to how it’s spelled.
And finally, there’s the question of which case names should be written in; a recent example is the late bell hooks, who explicitly lowercased her name, against the usual conventions. Capitalising her name is hence wrong; bell hooks and Bell Hooks are two different people.
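The usual way systems break this is a well-meaning “tidy-up” step. In Python, for instance, `str.title()` rewrites both of the names below, not just bell hooks. The sketch uses only built-in string methods, and `display_name` is a hypothetical helper.

```python
entered = ["bell hooks", "Patrick McKenzie"]

# A common "tidy-up" step that silently rewrites people's names.
tidied = [name.title() for name in entered]
# tidied == ["Bell Hooks", "Patrick Mckenzie"]


def display_name(entered_name: str) -> str:
    # Show the name exactly as given; if you need case-insensitive matching,
    # compare on a separate folded key and never overwrite the original.
    return entered_name
```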
When deciding how to deal with case in names, again, it depends a lot on context. Always be explicit about how your system handles differently cased names.
14. People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
I hopefully shouldn’t have to say too much on this one. Anyone who’s learned about European monarchs knows that the number at the end is important. Speaking of which…
15. People’s names do not contain numbers.
A lot of people try to limit the types of characters allowed in names, and ultimately, you’re always going to be missing someone if you leave anything out.
The “proper” way to filter characters in Unicode is to check the Unicode Character Database, usually the “Script” property of each character. For most European names, allowing only Latin characters is usually okay. Oh, and don’t forget the “Common” script, which includes things like spaces; super important. Except that “Common” also includes stuff like emoji and mathematical symbols, and maybe you aren’t looking to include those in names specifically. Probably.
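Python’s standard `unicodedata` module doesn’t expose the Script property, so the sketch below uses General Category as a rough stand-in. The allow-list regex is a hypothetical example of the kind of filter people write, not a recommendation.

```python
from __future__ import annotations
import re
import unicodedata

# A typical hand-rolled "letters and spaces only" filter (ASCII letters, single spaces).
NAIVE_NAME = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)*")


def naive_accepts(name: str) -> bool:
    return NAIVE_NAME.fullmatch(name) is not None


def categories(name: str) -> list[str]:
    # Unicode General Category per character: "Lu"/"Ll"/"Lo" letters,
    # "Mn" combining marks, "Nd" digits, "Po" punctuation, "So" other symbols.
    return [unicodedata.category(ch) for ch in name]
```

`naive_accepts` lets “Patrick McKenzie” through but rejects “José”, “O’Brien”, and “田中太郎”. And `categories("O'Brien")` shows the apostrophe is punctuation (`Po`), so even a Unicode-aware letters-only rule would still reject it, while an emoji reports `So`.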
Ultimately, limiting the characters a name can have is limiting the inclusion of real names in your system. Often, the simple answer is not the correct one, at least in this case.
18. People’s names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
Remember what I said about Japanese names? Two identically spelled names can be pronounced differently, and two identically pronounced names can be spelled differently. Do you include the spelling as part of the name, in this definition?
What about case-folded names? Does that help ensure proper sorting for Latin-script names?
If you’re still unconvinced, I highly recommend checking out the Unicode Common Locale Data Repository (CLDR) and looking at the data on collation, which accounts for ordering of text in different locales. Do you think that every locale will have a sorting system that distinctly orders every single name?
In French, the word café explicitly comes after the word cafe; what about カフェ?
No matter what you do, the way you sort names will be some sort of compromise. Often, giving users locale choices to determine collation helps, but doesn’t fully mitigate the problem. Expect headaches.
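To show how quickly this turns into a pile of compromises, here’s a hypothetical sort key in Python that ignores case and accents, then breaks ties on the original string. It’s nowhere near real locale-aware collation (for that, you’d want CLDR data via a library such as ICU); it only uses the standard `unicodedata` module.

```python
from __future__ import annotations
import unicodedata


def loose_key(name: str) -> tuple[str, str]:
    # Primary: case-folded text with every character that has a non-zero
    # combining class (most accents) removed after NFD decomposition.
    # Secondary: the original string, so "cafe" and "café" get a fixed order.
    stripped = "".join(
        ch for ch in unicodedata.normalize("NFD", name.casefold())
        if not unicodedata.combining(ch)
    )
    return (stripped, name)


names = [unicodedata.normalize("NFC", n)
         for n in ["Zoe", "Émile", "anna", "café", "cafe", "カフェ"]]
by_code_point = sorted(names)             # raw code point order
by_loose_key = sorted(names, key=loose_key)
```

Neither order is “correct”: code point order puts “Zoe” before “anna” and “Émile” after every unaccented Latin name, and the looser key still puts カフェ last purely because of where katakana sit in Unicode.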
19. People’s first names and last names are, by necessity, different.
Based upon what I’ve already said, I think you should be able to determine why this one is wrong.
20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
One common case where this doesn’t apply is online aliases; we haven’t yet reached the point where usernames are passed down from parent to child, and I hope we never do.
However, there are lots of cases where people don’t have the same “family name” as their relatives. If someone gets married and takes the surname of their spouse, their surname no longer matches their direct relatives. If someone is forced to change their name for one reason or another, their surname could also be different from their relatives.
I really, really hope I don’t have to explain any more why assuming relation based upon surname is a bad idea, especially when “no relation” is an actually common thing people say when asked if they share a surname with their relatives.
21. People’s names are globally unique.
Does your organisation assign emails based upon people’s names?
22. People’s names are almost globally unique.
Maybe it’s not a problem now…
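For instance, here’s a hypothetical handle-assignment scheme of the kind many organisations use for email addresses. Even this short sketch has to handle collisions, and it still quietly assumes that splitting on spaces and case folding produce something sensible for every name.

```python
from __future__ import annotations


def assign_handle(name: str, taken: set[str]) -> str:
    # Hypothetical scheme: "Wei Wang" -> "wei.wang", then "wei.wang2", "wei.wang3", ...
    base = ".".join(name.casefold().split())
    handle = base
    suffix = 2
    while handle in taken:
        handle = f"{base}{suffix}"
        suffix += 1
    taken.add(handle)
    return handle
```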
23. Alright alright but surely people’s names are diverse enough such that no million people share the same name.
Here’s a quote from Wikipedia’s page on Chinese surnames:
Around 2,000 Han Chinese surnames are currently in use, but the great proportion of Han Chinese people use only a relatively small number of these surnames; 19 surnames are used by around half of the Han Chinese people, while 100 surnames are used by around 87% of the population.
And it’s not just China. There are all sorts of common names that will be used by very, very large numbers of people.
24. My system will never have to deal with names from China.
Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
That Klingon Empire thing was a joke, right?
Again, every time you make a compromise like this, you should accept it for what it is: a compromise. Forcing a person to use a version of their name that’s sanitised for a particular culture is just one way of saying you think a particular culture doesn’t matter.
On that last point: sometimes, people will choose names for themselves based upon fictional worlds instead of existing ones, and I think that we shouldn’t be the ones to intervene if it doesn’t take a lot of effort to support that.
29. Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
31. I can safely assume that this dictionary of bad words contains no people’s names in it.
…anything someone tells you is their name is — by definition — an appropriate identifier for them.
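This is usually called the Scunthorpe problem. A minimal sketch of why substring matching against a blocklist goes wrong, assuming a deliberately tiny, hypothetical `BLOCKLIST`:

```python
# Hypothetical, deliberately tiny blocklist.
BLOCKLIST = {"hell"}


def naive_is_offensive(name: str) -> bool:
    # Substring matching: the classic way to reject real people's names.
    folded = name.casefold()
    return any(word in folded for word in BLOCKLIST)
```

This flags “Michelle” and “Shelly”. Switching to whole-word matching fixes those two, but still rejects anyone whose actual name happens to be on the list.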
32. People’s names are assigned at birth.
OK, maybe not at birth, but at least pretty close to birth.
Alright, alright, within a year or so of birth.
You’re kidding me, right?
Not everyone is assigned a name, ever. Not everyone has a name.
As a transgender person in particular, I’ve seen several cases where someone has shown up somewhere and said “I don’t know what I want to use as a name yet, but I don’t like my given name, so just call me ‘you’.” And it’s okay to just accept that these people don’t have names, but are still people.
Does your system really need to have names at all?
37. Two different systems containing data about the same person will use the same name for that person.
I actually mentioned this when talking about legal names.
38. Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
If a system has to make compromises, it’s hard to say that it’s well-designed.
In that sense, no system that requires names is well-designed. It’s just… designed, hopefully a respectable amount.
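One concrete way two careful operators produce different strings: “Zoë” can be stored as a single precomposed ë or as e followed by a combining diaeresis, depending on how it was entered. A minimal sketch using Python’s standard `unicodedata` module; the variable names and the `normalised` helper are hypothetical.

```python
import unicodedata

typed_by_operator_a = "Zo\u00eb"    # "ë" as one precomposed code point (U+00EB)
typed_by_operator_b = "Zoe\u0308"   # "e" followed by U+0308 COMBINING DIAERESIS


def normalised(name: str) -> str:
    # NFC composes canonically equivalent sequences into one standard form.
    return unicodedata.normalize("NFC", name)
```

The two raw strings differ, but `normalised` makes them identical. Normalisation only covers this particular kind of difference, though; it won’t reconcile “Nate” and “Nathan”, stray spaces, or a romanised versus native spelling.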
39. People whose names break my system are weird outliers. They should have had solid, acceptable names, like 田中太郎.
Maybe. But to those people, you’re an asshole. I personally don’t find it agreeable to be an asshole.
40. People have names.
Does your system even need names? If you’re signing up for a newsletter, should you be required to provide your name? Even if it’s not required, should it be an option at all?
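If the answer is no, the data model can simply say so. A minimal sketch, assuming a hypothetical `NewsletterSignup` record where the name is optional and nothing downstream depends on it:

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class NewsletterSignup:
    # Hypothetical model: an email address is all a newsletter needs.
    email: str
    name: str | None = None  # optional, and stored exactly as given

    def greeting(self) -> str:
        # Fall back to a greeting that needs no name at all.
        return f"Hi {self.name}," if self.name else "Hi there,"
```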
Maybe the solution to the problem is to not create the problem at all.
Hopefully you keep that in mind.