We began with definitions last time. Let's add a few more before we press on.
Complexity
The English sources at my disposal underwhelm, so let's instead take a stab at the Latin beneath. Plecto means "I weave" or "I twist."1 Com- is the familiar prefix denoting "with". Therefore we may not sin too much calling complexity the measure of how weaved-with an item is, either in itself or with its surroundings.2 In software, there are two kinds of complexity worth mentioning, accidental and essential.3 Essential complexity is that which cannot be reduced without changing your algorithm. Accidental complexity is an accumulation of human error, of engineering tradeoffs and other failings.4 Given the depth-first-search toposort, the essential complexity is linear with the number of vertices and edges, O(V + E). Notice that nowhere do you find terms for whether you used a recursive or iterative solution, whether your recursive solution was implemented on a system with tail-recursion optimization, or whether your program's process was suspended while the operating system received an interrupt. In those invisible terms lie the accidental complexity, and they comprise most of the hulking mass of modern computing.
Ownership
The old language had an entire declension for ownership.5 To own something is for that thing to be an extension of yourself, for your causes to flow through it. It means the object is ready to your hand. What does it mean when you say "my country"? Does the country move when you move? What do you mean by "my computer"? Does your light shine out the other side of it, amplified, or does it confound you, complicate your actions, and bury you in ever-increasing cost?
We're a few lost lifetimes away from a functional Linux system yet. Let's continue.
Libc
We're building an operating system around the Linux kernel, which exposes an interface of system calls to user-space programs. We could conceivably implement these system calls ourselves, along with the myriad utility functions present in most libc libraries, but we'd be treading ground already trod at great cost. Of the available options, I'll discuss these two.
musl
This thing is slim, performs well, and has good compatibility with POSIX6. See here for a comparison with other libc libraries. Moreover, consider this from the musl FAQ:
When will it be finished? When there's nothing left to remove.
musl-1.1.24 cloc .
2521 text files. 2331 unique files. 247 files ignored. github.com/AlDanial/cloc v 1.70 T=9.07 s (252.8 files/s, 11741.6 lines/s) ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- C 1448 6314 7295 48270 C/C++ Header 537 4316 273 30931 Assembly 299 490 539 6485 Bourne Shell 4 111 160 618 awk 3 56 74 301 make 1 68 10 157 sed 1 0 6 9 ------------------------------------------------------------------------------- SUM: 2293 11355 8357 86771 -------------------------------------------------------------------------------
After weighing the thing, I'm truly in love.
glibc
Meanwhile, lets strap a loathsome pig down and take a few measurements.
glibc-2.30 cloc . 16983 text files. 16253 unique files. Complex regular subexpression recursion limit (32766) exceeded at /usr/bin/cloc line 7327. 3020 files ignored. github.com/AlDanial/cloc v 1.70 T=69.28 s (201.9 files/s, 25976.2 lines/s) --------------------------------------------------------------------------------------- Language files blank comment code --------------------------------------------------------------------------------------- C 8781 114862 172458 683447 C/C++ Header 2882 38763 64340 282848 Assembly 1821 39782 93919 235315 Bourne Shell 82 1991 2558 15567 make 203 2498 2476 11080 Python 44 1173 2376 9013 TeX 1 813 3697 7162 m4 50 349 144 3467 awk 38 291 422 2054 Windows Module Definition 17 190 0 2024 C++ 32 258 310 1238 Perl 3 87 142 632 Korn Shell 1 55 68 435 yacc 1 56 36 290 Pascal 7 82 326 182 JSON 2 0 0 91 Expect 8 0 0 77 Bourne Again Shell 1 8 31 77 sed 11 8 15 70 --------------------------------------------------------------------------------------- SUM: 13985 201266 343318 1255069 ---------------------------------------------------------------------------------------
86k vs 1.23m lines. I get physically ill looking at wanton messes like these. The utter contempt for human comprehension is abhorrent.7
If somebody else solves the same problem as you sparing over an order of magnitude, you congratulate them and get out of their way.
We still don't have that blinking cursor, but we're getting closer. There are still explosions of accidental complexity to avoid in the days ahead. See you tomorrow.
- And because the language is so damned orderly, also "I punish". [↩]
- Latin would happily allow you to disambiguate the two if you wanted! [↩]
- These are much better discussed here http://curtclifton.net/papers/MoseleyMarks06a.pdf, and I encourage you to go give that a read before continuing. [↩]
- Consider that mathematical truths are discovered, while products of beings-an-engineer are built. [↩]
- The genitive or "begetting" form. [↩]
- POSIX, first codified in 1988, is a standard for the APIs and utility programs of UNIX-like operating systems. Adhering to the standard means that a huge amount of software designed for POSIX can be ported to a given operating system. [↩]
- Yes, we glossed over the plump, moo-eyed smiles of the Linux kernel and gcc because we have no alternatives. That's no excuse. [↩]
Tiny nitpick/clarification re "Essential complexity is that which cannot be reduced without changing your algorithm." - according to the paper you ref, afaik this would be so if you mean there users' "algorithm" aka the problem's own (as seen/presumed known -and that's a big presumption- by the users). Was this what you had in mind behind those (hard to notice at the time really) prods re users of TMSR-OS? Because I realised this only now that I read this article of yours.
(I have experienced that physical sickness @ the sudden grasp of the size&spread of a code-mess of such proportions)
Comment by Diana Coman — 2019/12/30 @ 2:21 p.m.
I must say I'm actually quite happy with the complexity definitions.
> The utter contempt for human comprehension is abhorrent
It's not just the plain size, either. Musl comments are 10%. glibc comments 2-something%. And this isn't even merely as bad as the straight comparison indicates, but significantly worse : a sub 100k libc getting away with 10% comment by mass is very different from a >1mn libc getting away with anything like HALF commentary by mass -- much in the vein of how my coffee maker may come with a single page of printed manual, 1 ppm of the entire packaging, but my BMW'd better come with complete fucking docs, because guess what, if something goes awry I have no way to intuit the correct approach to significantly longer, utterly more complex mechanics.
As a rule of thumb I would propose that every thousand lines of code added to a codebase DOUBLES the required PERCENTAGE of commentary included. So a 1001 LOC codepile'd better include at least TWO comment lines, a 10k LOC codepile'd better include 10% commentary by line count, and a 1mn codepile can not possibly exist, because 21000 exceeds 106
Comment by Mircea Popescu — 2020/01/02 @ 10:23 a.m.
Thanks for the definitions and measurements.
I'll link in spyked's piece On Intellectual Ownership to supplement.
Comment by Robinson Dorion — 2020/01/12 @ 1:33 p.m.