
Nomenclature


Charon117


This is a letter I sent to one of my colleagues about nomenclature (C++).

 

Quote

We need to talk. Whenever you are ready.

 

Following your suggestions about nomenclature, and your criticism that you could not tell the intention of a variable, I want to formulate the following thesis based on my experience.

 

In response to your initial criticism that you could not tell what lFileA is, apart from hoping that it contains a file, I renamed it mergeFileList. I felt it was necessary to add the "List" so that I, as a programmer, know what I am dealing with. Otherwise I would have to ask myself: is it a file as a string, a vector, a list, an array...?
This name already has a ton of problems in it, but I will point them out as we go.

The next thing to rename would be itLAUppDelim, which is an abbreviation for "iterator List A Upper Delim". I felt this abbreviation contained all the information you need as both the writer and the reader of a program, while being short and distinctive. But it was obviously not OK, for several reasons. First, type inclusion ages VERY badly. Enough literature has been written on why encoding types into names is bad and makes code maintenance a nightmare, so I won't go into that. Second, "List A" is an identifier that obviously refers back to lFileA, but that cannot be known without the context.

This preamble leads us to the question: "What is a good nomenclature in programming?"

The first thing we have to ask is: what is "good"?

The best definition that I can currently come up with is:

good = that, which drives you closer towards your (practical) goal
bad = that, which drives you away from your (practical) goal

This definition raises the question of which practical goal you are trying to reach. Code doesn't have to be the best code if you go back a thousand years, and it doesn't necessarily have to be the best code a thousand years from now, i.e. in 3020. What is "good" and what is "bad" always depends on your goals, and your goals keep changing all the time, which makes "the" best nomenclature unreachable. A good nomenclature always changes with the goals and the environment.
The logical next step is to ask which (pragmatic) goals we are currently trying to reach.

First of all, the nomenclature needs to make me want to work on the program I am currently writing. Programming conventions can change over the span of 10 years, but that doesn't change the fact that right now I am working with limited knowledge, limited skills, and a tool that can't do everything and is very much limited in what it can do. While I don't want to mislead programmers who read the program in 10 years with a nomenclature that no longer makes sense to them, I want to know where I stand given my current skill level, my current knowledge, and the current tool, which together limit what I can do in a very real sense.
 

Second, readability. What is "readability"? Readability is the ease with which meaning reveals itself to the reader. What is a "reader"? A reader is somebody who has no prior knowledge of what the code is trying to achieve. This can be a random person, your coworker, or yourself reading code you wrote 5 months, 2 weeks, or 5 days ago.
The best readability is when a reader can read a random line of code and deduce what it is trying to do without knowing any context.
The worst readability is when the reader has to read the entire program in order to understand what a single line does.
And then, of course, there are infinitely many variations in between.
 

As you can see, there are not infinitely many goals, but only two very concrete people we have to satisfy: the Writer and the Reader. Too many people insist on styles that do not satisfy both, and plenty of people describe their style as "the best of both worlds", which might be true for them but hardly transfers to other people who have different cultures and context knowledge.

With this in mind, let's take a look at itLAUppDelim and lFileA. The cure for lFileA seems to be mergeFileList. First, it eliminates the data type, and second, it describes what it is, namely the file being merged.
What is itLAUppDelim? "iterator List A Upper Delimiter". That's obviously not good, so how about mergeUppDelimIterator? [merge] says that it belongs to mergeFileList, and [UppDelimIterator] describes in a very real sense what it is, namely an upper delimiter iterator.

The first problem I have with this is that while we remove the data type from the name, we kind of sneak it back in with [Iterator] and [--List], which the Writer needs in order to orient himself. The second problem is that while mergeFileList makes it easy to deduce that it means the file being merged, mergeUppDelimIterator is rather ambiguous. Are we trying to merge the upper delimiter with something else? Who knows? "BUT WAIT," you are going to say, "we only added [merge] because it belongs to mergeFileList. [merge] is not an action, adjective, or noun; it is an identifier. This identifier logically links two variables, functions, macros, classes, or templates, implying that they have some kind of connection that sets them apart from other code." OK, valid point.

This is why I want to introduce the Identifier into the nomenclature. Identifiers are a means of orientation for the Writer and Reader alike. Once the Reader and Writer know what a variable, function, macro, class, or template is trying to achieve, the Identifier gives an approximate orientation of where that function, ... belongs and where it has its place in the code.

An example would be a23File. Nobody needs to know what [a23] means; all you know is that the variable is a file in some form and that it is connected with other things that are also tagged [a23]. Identifiers address a lot of pragmatic problems that crop up during development. For instance, if you use vectors and lists, you will necessarily have a lot of iterators. Most of them will be named very similarly, and if you process data deeply, most of them will be named even more alike. Deriving identifiers from the meaning of other variables quickly leads to confusion, especially over prolonged development, because the Writer and Reader read the identifier as something meaningful, while it only tries to show a connection between functions, ... etc.
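To make the iterator problem concrete, here is a small C++ sketch. The function name mergeSorted and the tags [a] and [b] are hypothetical, chosen in the spirit of a23File: two sorted inputs are merged, and the single-letter identifiers only record which input an iterator walks over, without pretending to carry any meaning of their own.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: the inputs are tagged with the identifiers [a] and [b].
// The tags say nothing about meaning; they only show which iterator belongs
// to which input, so otherwise identical names stay distinguishable.
std::vector<std::string> mergeSorted(const std::vector<std::string>& aLines,
                                     const std::vector<std::string>& bLines) {
    std::vector<std::string> merged;
    merged.reserve(aLines.size() + bLines.size());

    auto aCurrent = aLines.begin();  // walks over the [a] input
    auto bCurrent = bLines.begin();  // walks over the [b] input

    // Repeatedly take the smaller front element; ties are taken from [a].
    while (aCurrent != aLines.end() && bCurrent != bLines.end()) {
        if (*bCurrent < *aCurrent) {
            merged.push_back(*bCurrent);
            ++bCurrent;
        } else {
            merged.push_back(*aCurrent);
            ++aCurrent;
        }
    }

    // Append whatever is left in either input.
    merged.insert(merged.end(), aCurrent, aLines.end());
    merged.insert(merged.end(), bCurrent, bLines.end());
    return merged;
}
```

std::merge from `<algorithm>` does the same job; the loop is written out here only so that the tagged iterators are visible.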

Skipping a lot of the practical problems and going straight to the point: because of the points mentioned above, I suggest the following system.

[Identifier][Meaning]_[Orientation for the Writer]

 

[Identifier]: The Identifier consists only of lowercase letters or digits and is meant to be read like a deep directory path. For instance, mFile is a file that belongs to the identifier m, and miManager is a process that belongs to m and then i, in that order. mimClock is a function, ... that belongs to m, then i, and then m again. If you are reading mimProcess, you know that there are 3 additional layers where something interesting might be going on. Each "directory" level is a single letter or digit, so m ≠ mi ≠ mim.
The Identifier compresses a lot of context information that the Reader does not need in order to understand a random line of code, while revealing the connection to the Reader and Writer as soon as more context is acquired.
It is important to note that the Identifier doesn't have to be dead accurate in hindsight. It is not meant as a replacement for checking the code, but as a guideline for which code to check first. The Identifier does not replace context knowledge; it is just an arrow pointing in the direction where context knowledge is most likely to be found first.

[Meaning]: Simply put, this is what you are trying to achieve in its purest form, without being burdened by having to hint at connections to other functions, ... or by having to orient yourself by adding [--List] and the like. Write it as if you were Shakespeare. Write it the way you want to read it in 5 days, 2 weeks, and 5 months. Verbosity is encouraged; keep it as short as possible, but don't compromise on the meaning. Be wild, be yourself. [Meaning] also uses CamelCase starting with a capital letter, so as soon as you reach the first capital letter, you know you have arrived at the [Meaning] of the function, ....

[Orientation for the Writer]: The underscore separates the shared orientation and the meaning from the orientation that is exclusive to the Writer. The reason is this: every person is in a unique constellation of personal skill level, knowledge, coworkers, environment, IDE, language limitations, and countless other influences, which limits in a very meaningful way what is currently important to the Writer. Write in lowercase single-letter and digit abbreviations whatever is important to you, the Writer. If you currently need to keep in your head whether you are dealing with 16-bit or 32-bit encoding, then write mimFile_32b. If it is important for you to know the data type, then write mimFile_ls, or maybe mFile_liststring. If you need to single out any information that you currently have to know, write it down as a short lowercase abbreviation that you, and only you, have to understand.
Functions, ... don't have to have this; the [WriterOrientation] can also be omitted entirely.
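Because the three parts are separated purely by character class (lowercase/digits, then the first capital letter, then the first underscore), a name can be split mechanically. The following C++ sketch shows that; the struct NameParts and the functions splitName and isWithin are hypothetical helpers written for illustration, and they assume the name already follows the convention (no validation is done).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: the parts of a name in the proposed
// [Identifier][Meaning]_[Orientation for the Writer] scheme.
struct NameParts {
    std::string identifier;   // lowercase letters/digits, one per "directory" level
    std::string meaning;      // CamelCase part, starts at the first capital letter
    std::string orientation;  // optional writer notes after the first underscore
};

// Splits e.g. "mimFile_32b" into {"mim", "File", "32b"}.
// Assumes a well-formed name; no validation is done.
NameParts splitName(const std::string& name) {
    NameParts parts;
    std::size_t i = 0;
    // [Identifier]: everything before the first capital letter (or underscore).
    while (i < name.size() &&
           !std::isupper(static_cast<unsigned char>(name[i])) &&
           name[i] != '_') {
        ++i;
    }
    parts.identifier = name.substr(0, i);

    // [Meaning]: from the first capital letter up to the first underscore.
    // [Orientation for the Writer]: everything after that underscore, if any.
    std::size_t underscore = name.find('_', i);
    if (underscore == std::string::npos) {
        parts.meaning = name.substr(i);
    } else {
        parts.meaning = name.substr(i, underscore - i);
        parts.orientation = name.substr(underscore + 1);
    }
    return parts;
}

// True if identifier `child` lies inside identifier `parent` in the
// directory-like hierarchy, e.g. "mim" is inside "mi", which is inside "m".
bool isWithin(const std::string& child, const std::string& parent) {
    return child.size() >= parent.size() &&
           child.compare(0, parent.size(), parent) == 0;
}
```

Because each identifier level is a single character, "belongs to" reduces to a prefix check: isWithin("mim", "mi") holds, while isWithin("m", "mi") does not.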

 

Applying these rules to the examples above, we get the following.

Our original lFileA, by way of mergeFileList, becomes mMergeFile_ls.

itLAUppDelim, by way of mergeUppDelimIterator, consequently becomes mUpperDelim_lsit.

I find this naming convention more appropriate and a lot easier to read, as the Reader can start reading the [Meaning] immediately at the first capital letter, while the [WriterOrientation] doesn't have to be correct, meaningful, or even looked at by anybody but the Writer of the code. The Identifier provides additional context direction for both the Writer and the Reader, while not claiming to be correct at any given point. Also, names in the main() function have an empty Identifier, e.g. File_lsit.
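In C++ the renamed variables might be used as in the following sketch. It assumes that mMergeFile_ls is a std::list&lt;std::string&gt; (which is what the writer note _ls suggests) and, purely for illustration, that the "upper delimiter" is the first line equal to a given delimiter string; the letter does not say what the delimiter actually is, and the function mCountLinesAboveUpperDelim is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Hypothetical example of the renamed variables in use:
//   lFileA       -> mergeFileList         -> mMergeFile_ls
//   itLAUppDelim -> mergeUppDelimIterator -> mUpperDelim_lsit
// [m] groups everything belonging to the merge, the CamelCase part is the
// meaning, and _ls / _lsit are writer notes ("list of strings",
// "iterator into a list of strings").

// Assumption for illustration: the "upper delimiter" is the first line equal
// to `delimiter`. Returns how many lines come before it, or the total number
// of lines if the delimiter is absent.
std::size_t mCountLinesAboveUpperDelim(const std::list<std::string>& mMergeFile_ls,
                                       const std::string& delimiter) {
    std::list<std::string>::const_iterator mUpperDelim_lsit =
        std::find(mMergeFile_ls.begin(), mMergeFile_ls.end(), delimiter);
    return static_cast<std::size_t>(
        std::distance(mMergeFile_ls.begin(), mUpperDelim_lsit));
}
```

The iterator type is spelled out here only to keep it visible; with `auto` the type disappears from the declaration, and the _lsit note is then the only thing telling the Writer what the variable holds.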

With this we satisfy the Reader and the Writer alike: the Reader doesn't need to look at information he/she would need more context to understand, while the Writer can orient himself, always knowing that neither the [Identifier] nor the [WriterOrientation] is a substitute for checking the code.

 

Another point: some people get stuck in naming paralysis, which means they don't yet know what their function, ... will do and therefore can't come up with a fitting name for it. That is completely natural and can be avoided 100% by simply not naming functions, ... before you write the code. Put in a placeholder name that approximately describes what it should do, write the code, then update the name accordingly. Ideally, you put a placeholder name down, write code until you go out of scope, then read what your code does, and then find the most fitting name for it. Most IDEs should have no problem with context-aware renaming, and even with standard text editors this shouldn't be a problem. In the worst case, you can use custom delimiters like xxx_IDontKnowWhatThisDoes_xxx, which are easy to search for and replace.

 

These rules are not set in stone, are subject to change, and are meant to be used in conjunction with, and in addition to, any existing ruleset. I would like to know what you think about this, and I hope my message finds you in the best of health,
[Charon]

[overly official :p]

 
