trinque

2020/01/20

A Republican OS - Part 3

Filed under: Uncategorized — trinque @ 10:58 p.m.

Before we continue our enumeration of components, let's discuss how a Linux system boots on an ATX-compatible computer. We'll focus on BIOS/MBR for now.[1]

First, the power button is pressed, which causes the PSU to turn on. Meanwhile, the motherboard begins sending "reset" to the CPU. The PSU, having established voltages within acceptable ranges, sets voltage high on the "PWR_OK" pin. This causes the motherboard's chipset to release the CPU from reset. The CPU then executes the BIOS from system ROM.[2] The BIOS performs a power-on self test, and barring any terminal errors continues the boot process. This may involve executing the BIOS of video cards, hard drives, etc.[3] After several lifetimes of initialization, which we'll elide for now, the computer is on and it's time to load an operating system from an attached hard drive.

The BIOS initiates this process via interrupt 0x19, the handler of which makes successive attempts to load a master boot record to memory address 0x7C00 and execute it.[4] The first sector of a bootable hard drive[5] contains the MBR.[6] This contains executable machine code,[7] a unique-ish drive identifier,[8] two null bytes,[9] the partition table, which reserves 16 bytes for each of four partitions,[10] and the terminating two-byte "signature" of 0xAA55.[11] Once a valid MBR has been identified, execution jumps to that memory address. Given the significant storage space restrictions in the MBR, the so-called first-stage boot-loader does little more than locate, load, and execute a larger second-stage boot-loader located somewhere else on the drive. The second-stage boot-loader in turn loads and executes an operating system kernel, in this case the Linux kernel.
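To make the MBR layout concrete, here's a minimal sketch in Python that slices a 512-byte sector along the offsets given in the footnotes. The function name `parse_mbr` and the dictionary keys are made up for illustration. The field order inside each 16-byte partition entry (status, CHS start, type, CHS end, LBA start, sector count) is the conventional one and isn't spelled out above, so treat that part as an assumption to check against your own drive.

```python
import struct

SECTOR_SIZE = 512


def parse_mbr(sector: bytes) -> dict:
    """Split a classic 512-byte MBR into the fields described above."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")

    # The last two bytes, read as a little-endian 16-bit integer, must be
    # 0xAA55. On disk (and in a hex dump) they appear as 55 AA.
    (signature,) = struct.unpack_from("<H", sector, 0x1FE)
    if signature != 0xAA55:
        raise ValueError("missing 0xAA55 boot signature")

    # 0x1b8-0x1bb: the unique-ish drive identifier.
    (disk_id,) = struct.unpack_from("<I", sector, 0x1B8)

    # 0x1be-0x1fd: four partition entries of 16 bytes each.
    partitions = []
    for i in range(4):
        start = 0x1BE + 16 * i
        entry = sector[start:start + 16]
        # Assumed conventional entry layout: status (1 byte), CHS start (3,
        # skipped), type (1), CHS end (3, skipped), LBA start (4), count (4).
        status, ptype, lba_start, count = struct.unpack("<B3xB3xII", entry)
        partitions.append({
            "bootable": status == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "sectors": count,
        })

    return {
        "code": sector[:0x1B8],  # 0x000-0x1b7: first-stage boot code
        "disk_id": disk_id,
        "partitions": partitions,
    }
```

Fed the contents of the `mbr.bin` file produced by the `dd` command in the footnotes, this should return the drive identifier and the four partition slots, with empty slots showing a type of zero.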

The initialization process of the Linux kernel will require a series of articles all its own. For now, we'll move on. When kernel initialization is complete, the final stage of the boot process involves handing control over to a user-space program called init. This program controls the rest of the boot process, and is also responsible for graceful shutdown and reboot. There are customary features and behaviors for an init program, but all that's strictly necessary is that it never exit, for an exit from init will result in a kernel panic. Here we'll need to make another decision, so here we'll resume our analysis of options.
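The "never exit" requirement can be sketched in a few lines. What follows is an illustration only, not a proposal: Python is used here for brevity (a real init would more likely be a small C program), the names `reap_children` and `run_init` are hypothetical, and the default of starting `/bin/sh` is an assumption. Reaping exited children is one of the customary duties mentioned above rather than something the kernel strictly demands.

```python
import os
import time


def reap_children() -> list:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # there are no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped


def run_init(startup=("/bin/sh",)) -> None:
    """Start one program, then loop forever; returning would mean init exits."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(startup[0], list(startup))
        finally:
            os._exit(127)  # only reached if the exec failed
    while True:
        reap_children()  # orphaned processes get reparented to PID 1
        time.sleep(1)
```

Note that `run_init` has no return path at all, which is the whole point: everything else an init does is decoration around that infinite loop.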

Busybox

This software is a damned treat. It was written with embedded systems and initramfs in mind, providing a thorough and useful set of user-land software. The size constraints of embedded systems have kept it from ballooning uncontrollably, as the Linux utilities more commonly used on desktop systems have. For not much more than the cost of bash[12] you get an init, device node management, networking utilities, archivers, text editors... indeed, all the shell commands you might expect.[13] If there's a compelling case that Busybox is an insufficient base system, I'd love to hear it.

busybox-1.31.1 cloc .
    3726 text files.
    3114 unique files.
    1727 files ignored.

github.com/AlDanial/cloc v 1.70  T=13.75 s (145.6 files/s, 24038.0 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                               675          27959          60252         180543
C/C++ Header                   1115           1415           2721          31012
Bourne Shell                    179           1556           1468           8658
HTML                             10           1853             32           7324
C++                               1            166             62           1197
make                              6            307            451            911
yacc                              1             93             20            570
Perl                              3            100            185            337
lex                               1             40             12            303
NAnt script                       1             84              0            260
Bourne Again Shell                5             51             94            234
Python                            1             12             13            110
Qt Project                        1              0              0             33
awk                               1              2              8             30
bc                                1             10              0             25
diff                              1              1              8             15
--------------------------------------------------------------------------------
SUM:                           2002          33649          65326         231562
--------------------------------------------------------------------------------

Not-busybox

Assuming we reject busybox, we'll still need to assemble much of what it provides. We'll make some conservative selections going forward, and see where we end up.

sysvinit

Here's a simple standalone init program. Aside from init, several other redundant programs are provided, including one for killing everything but init,[14] and another which provides a rudimentary logger.[15]

sysvinit-2.95 cloc .
      71 text files.
      66 unique files.
      38 files ignored.

github.com/AlDanial/cloc v 1.70  T=0.30 s (108.8 files/s, 40593.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               21           1262           2218           8004
C/C++ Header                     9             51            205            258
make                             2             55             25            202
Bourne Shell                     1              3              3             28
-------------------------------------------------------------------------------
SUM:                            33           1371           2451           8492
-------------------------------------------------------------------------------

coreutils

This is the GNU incarnation of the standard UNIX utilities. It's heavier than all of busybox, and doesn't even give you the shell to run the 76k lines of shell script.

coreutils-8.31 cloc .
    2878 text files.
    2858 unique files.
     484 files ignored.

github.com/AlDanial/cloc v 1.70  T=11.84 s (202.2 files/s, 40526.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                              870          29351          36181         151337
Bourne Shell                   637          24014          20200          75874
C/C++ Header                   373           7551          12925          40036
m4                             423           2459           2498          36820
make                            16           3502           3068           7920
Perl                            70           1819           2409           7671
TeX                              1            811           3695           7163
yacc                             1            279            309           1840
Python                           1             12              9             48
sed                              2              0              0             16
-------------------------------------------------------------------------------
SUM:                          2394          69798          81294         328725
-------------------------------------------------------------------------------

bash

So here's the shell then, too. We're up to 536k lines by now, and we're still missing quite a lot.

bash-5.0 cloc .
    1251 text files.
    1219 unique files.
     755 files ignored.

github.com/AlDanial/cloc v 1.70  T=5.80 s (85.6 files/s, 46369.2 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
C                                      258          20944          20498         107295
HTML                                     3           3854             38          26338
Bourne Shell                            36           3333           3388          20114
Windows Module Definition               44           2581             11          15096
C/C++ Header                           111           2821           3506           7617
TeX                                      1            821           3462           6762
yacc                                     2            824            968           5398
m4                                       4            478            439           4742
Perl                                     2            535            834           4229
Bourne Again Shell                      27            235            346            994
make                                     3             48             36            110
Assembly                                 2             11             20             48
awk                                      1              8             15             24
sed                                      2              0              0             16
---------------------------------------------------------------------------------------
SUM:                                   496          36493          33561         198783
---------------------------------------------------------------------------------------

net-tools

We'll need these to manipulate our network interfaces if we pass on busybox. 549,362 and counting.

net-tools-1.60 cloc .
     152 text files.
     131 unique files.
      69 files ignored.

github.com/AlDanial/cloc v 1.70  T=0.26 s (314.7 files/s, 65320.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               67           1767           1571          12322
C/C++ Header                    11            133            116            680
make                             4             87            122            257
Bourne Shell                     1             14             56            103
-------------------------------------------------------------------------------
SUM:                            83           2001           1865          13362
-------------------------------------------------------------------------------

gzip

Busybox comes with its own implementations of several archive formats in about 14k lines. Meanwhile, a standalone GNU gzip alone is about four times fatter. That's 609,918 lines of code to know, own, and understand, and we don't even have a text editor yet.

gzip-1.3.14 cloc .
     362 text files.
     352 unique files.
      70 files ignored.

github.com/AlDanial/cloc v 1.70  T=0.70 s (419.6 files/s, 123880.3 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
Bourne Shell                            19           4519           3206          21230
C                                       86           2697           4765          15743
m4                                     106            666            742          10168
C/C++ Header                            69           1530           2666           5905
TeX                                      1            731           2941           5619
make                                     6            628            493           1677
Assembly                                 1             21             38            179
DOS Batch                                1              0              0             18
Perl                                     1              1              2             12
Windows Module Definition                2              0              0              5
---------------------------------------------------------------------------------------
SUM:                                   292          10793          14853          60556
---------------------------------------------------------------------------------------

eudev

Our machine needs to make new device nodes when new hardware is connected.[16] Let's select one of the cheaper device management daemons, from Gentoo. 638,677 lines.

eudev cloc .
     205 text files.
     203 unique files.
      60 files ignored.

github.com/AlDanial/cloc v 1.70  T=0.39 s (371.4 files/s, 101693.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               71           4844           3961          22185
C/C++ Header                    43            690            853           2124
Perl                             2            108             36           1753
XML                              4            120             16           1542
make                            16            121              8            490
m4                               2             74             10            380
Python                           2             38             46            225
Bourne Shell                     3             10              6             25
Markdown                         1              8              0             21
YAML                             1              0              0             14
-------------------------------------------------------------------------------
SUM:                           145           6013           4936          28759
-------------------------------------------------------------------------------
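The running totals quoted along the way are easy to check. Here's a small sketch that sums the cloc "code" column from each SUM row above and compares the result against busybox. The numbers are copied from the tables; the names `BUSYBOX`, `ALTERNATIVES`, and `running_totals` are just for illustration.

```python
# Lines of code from the SUM row of each cloc table above.
BUSYBOX = 231562
ALTERNATIVES = {
    "sysvinit": 8492,
    "coreutils": 328725,
    "bash": 198783,
    "net-tools": 13362,
    "gzip": 60556,
    "eudev": 28759,
}


def running_totals(counts: dict) -> list:
    """Return (name, cumulative line count) pairs in insertion order."""
    totals = []
    total = 0
    for name, lines in counts.items():
        total += lines
        totals.append((name, total))
    return totals


for name, total in running_totals(ALTERNATIVES):
    # Ratio of the accumulated non-busybox stack to all of busybox.
    print(f"{name:10} {total:8} {total / BUSYBOX:.2f}x busybox")
```

The final pair is the full non-busybox stack so far, and the last ratio is the multiple of busybox's line count it represents.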

We're not nearly done selecting the components that would fill the rest of the gaps in our non-busybox system, and we're already at nearly three times the line count. If there are not significant gains to be had from the additional features found in the heftier implementations of these utilities, it would be difficult to justify their presence in a system which must be understood by its maintainers, instead of merely excreted by them. As we proceed, we must remain cognizant of how much complexity we've chosen to accumulate beneath us. Until next time.

  1. Note that GPT partitioning does not require use of UEFI. This was a point of confusion on my part, and perhaps yours!
  2. This is mapped to the memory address 0xFFFFFFF0 on x86.
  3. Yes, your computer is made of computers, which are made of... aw hell.
  4. If anyone knows what hysterical raisins came up with this address, let me know.
  5. Go on, take a look at your own. `dd if=/dev/sda of=mbr.bin bs=512 count=1; hexdump mbr.bin`
  6. 512 bytes due to historic hard drive sector size. These days sector sizes of 4k are common.
  7. 0x000-0x1b7
  8. 0x1b8-0x1bb
  9. 0x1bc-0x1bd
  10. 0x1be-0x1fd
  11. Note that your hex editor will display this the other direction on a little-endian architecture.
  12. !!!!!!!!!!!
  13. Not to mention, a full-featured shell that will happily pretend to be bash for you, if you like.
  14. Surely this can be accomplished with a simple shell loop and `kill`?
  15. Aside which you probably intend to install a syslog daemon, so why have two?
  16. A USB drive or peripheral, perhaps.