Monday, 1 July 2013

How much code documentation is sufficient? Can we unit-test the documentation?


Today I stumbled upon a blog post which states that one page of external documentation (i.e. not counting in-code comments) is sufficient. I can see the appeal of such an idea but I find it somewhat simplistic. I recently had a project at the end of which we wrote more than 30 A4 pages of documentation covering 10 man-months of coding. There were some graphs and diagrams, so the text itself was probably only slightly more than 20 pages or, roughly, a page for every 2 man-weeks of coding.

Is this too much or too little? 


It depends. For most of the developers who wrote it, this was a rather tedious task. But they happened to be interns who were about to leave our team. The quality of the code was not stellar, as can be expected from programmers with little experience, but, more importantly, there was a discontinuity: anyone working on the codebase later would not be able to ask the authors directly. Thus, in my opinion, the volume of the documentation, although it took several man-days to produce, was not really excessive.

But the question regarding the quality and quantity of documentation is a rather deep one and cannot be easily answered with a simplistic metric like the number of pages. The problem is that we currently have no objective metrics for the completeness of the documentation (something akin to "code coverage"), nor for its quality (i.e. how easy it is to understand and whether it correctly describes what the code does). These are rather involved issues and I don't expect the community to come up anytime soon with a widely agreed metric (e.g. number of words per line of code) that would resolve them in an objective manner (*). Nor do I expect such a metric to be verifiable automatically, the way static code analysis is done by tools such as SonarQube.
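To make the "coverage" analogy concrete, here is a minimal sketch in Python of what a crude completeness number could look like. It assumes the documentation lives in docstrings and simply counts how many functions and classes have one; the function name `docstring_coverage` and the sample source are invented for illustration. Note that it only detects the presence of documentation and says nothing about its quality, which is exactly the part that is hard to measure.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the fraction of functions and classes in `source` that have a docstring."""
    tree = ast.parse(source)
    documentable = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    nodes = [node for node in ast.walk(tree) if isinstance(node, documentable)]
    if not nodes:
        # Nothing to document, so nothing is missing.
        return 1.0
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return documented / len(nodes)


# Hypothetical sample: one documented and one undocumented function.
SAMPLE = '''
def documented():
    """Explain what this does."""
    return 1

def undocumented():
    return 2
'''

print(docstring_coverage(SAMPLE))  # 0.5
```

A docstring that says "TODO" would count as full coverage here, so a number like this can at best flag what is missing; it cannot tell us whether what is there is any good.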

Why not test the documentation?


These days we assume that we should test all or most of the code we deliver. Why not do the same for the documentation? I suggest that we combine the concept of testing with another familiar concept, that of pair programming. What I mean is that we could go for peer-documenting. For example, programmer A writes the documentation for the code written by programmer B. Then the documentation produced by A is tested by giving it to another team member, C. C would have to assess the completeness and clarity of the documentation. If he/she needs to ask any questions, then the documentation is not sufficient in quantity or quality (i.e. it does not pass the test), so A would have to amend it until C is happy with it.
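The pass criterion above is simple enough to write down. The following Python sketch is only an illustration of the idea, not a proposed tool: the `DocReview` class, the `rounds_until_pass` helper and the sample questions are all made up. Each review round records the questions the reviewer had to ask, and the documentation passes in the first round where that list is empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocReview:
    """One review round: who read the documentation and what they had to ask."""
    reviewer: str
    questions: List[str] = field(default_factory=list)

    def passed(self) -> bool:
        # The documentation passes only if the reviewer needed to ask nothing.
        return not self.questions


def rounds_until_pass(reviews: List[DocReview]) -> Optional[int]:
    """Return the 1-based round in which the documentation first passed, or None."""
    for round_number, review in enumerate(reviews, start=1):
        if review.passed():
            return round_number
    return None


# Hypothetical history: reviewer C asks fewer questions after each amendment.
history = [
    DocReview("C", ["What does the retry flag do?", "Where is the config file read?"]),
    DocReview("C", ["Which units does the timeout use?"]),
    DocReview("C"),
]

print(rounds_until_pass(history))  # 3
```

The number of rounds and the questions themselves could double as a rough record of where the documentation was weakest.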

I reckon there will be some resistance, as developers find writing documentation boring. But I remember that a while ago not everyone was happy with the idea that we should unit-test most of the code. Nowadays unit-testing is considered good practice and most programmers do it. I think that documentation testing is likely to prove its worth, too. I also think that the combination of testing and peer-documenting would inject even more rigour into the process.


===========

(*) The problem is somewhat related to the completeness or the clarity of a mathematical proof. At university some professors would thoroughly write down every step of a derivation, while others gave only proof sketches, leaving everything else "as an exercise for the reader". Needless to say, some of these exercises were anything but trivial and I occasionally found such hand-waving quite frustrating.