Saturday, August 29, 2009

Design problems in Linux

Having played around with Linux as a server and desktop environment for about 8 years, I've developed a kind of love-hate relationship with it. I decided to list a few things Linux should do to become a much better operating system. These are important issues from a practical system administrator's point of view. I don't think all of these points are necessarily beneficial, and all of them would definitely need closer analysis, but I've come across some ideas from other disciplines, and I'd like to share them.

1. Centralized package management

In short, Linux servers hold two kinds of information: information you produced, which you have to maintain manually, and information someone else needs to update. In a perfect world, we would never have to do something the computer could do for us. The less need there is for manual maintenance, the more efficient the system.

The package management system should be the master of all programs. Usually, after some time, my Linux installations have been full of libraries, programs and other cruft that was only there because I had used some software for a moment and it happened to have several dozen dependencies. Flagging packages as automatically installed solves this issue: the package manager installs, updates and removes your software without you having to do any of the heavy lifting.

That is great, but the bad news is that there is a lot of software that does not use it. Scripting environments like PHP, Perl, Ruby and Python all have their own package managers. Having an outdated PEAR or CPAN library can mean a serious security flaw. Updating each of them separately is extra maintenance that is easily forgotten but can be disastrous for the system.

The biggest dilemma here is that these language libraries sometimes have packages in the distribution's own system. Perl has dozens, if not hundreds, of library packages in Debian's repositories, and some programs depend on them. That's not a big issue until you have to install custom Perl libraries. I once managed to destroy my Perl libraries completely just by installing CPAN and updating it over the Debian-flavored Perl libraries.

This is a complex problem, because PHP, Perl and the rest have to support several platforms. Ideally, a solution would be open-source, multi-platform, and able to serve distributions and individual programs alike. In the best case, it would integrate with the system when necessary. This means some kind of open API between language-specific package managers and the operating system's package management system. Implemented correctly, it could have great benefits.
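
To make the idea concrete, here is a minimal sketch in Python of what such an adapter API might look like. Every name in it is hypothetical; the point is only that if each package manager (APT, CPAN, PEAR, RubyGems and so on) exposed the same small interface, the OS-level manager could detect overlapping file ownership before anything gets clobbered, like my CPAN-versus-Debian incident above.

    # Hypothetical adapter interface; none of these names are real APIs.
    from dataclasses import dataclass

    @dataclass
    class Package:
        name: str
        version: str
        files: list  # absolute paths owned by this package

    class PackageProvider:
        """One of these per package manager (APT, CPAN, PEAR, ...)."""

        def list_installed(self) -> list:
            raise NotImplementedError

    def find_conflicts(providers: list) -> list:
        """Report files claimed by packages from different managers."""
        owner, conflicts = {}, []
        for provider in providers:
            for pkg in provider.list_installed():
                for path in pkg.files:
                    if path in owner and owner[path] is not provider:
                        conflicts.append((path, owner[path], provider))
                    owner[path] = provider
        return conflicts

A real version would also need to handle versioned dependencies across managers, but even this much shared bookkeeping would have saved my Perl installation.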

2. /etc is very disorganized

My main complaint against /etc is that there are way too many configuration files and syntaxes. Editing configuration files in text mode is primitive and tiresome for several reasons:
  1. Syntax errors. They are dangerous and, by design, avoidable
  2. Program integration. Web-based management programs can corrupt /etc completely
  3. Usability. A GUI can be a lot more intuitive than configuration files, for obvious reasons
  4. Configuration updates. Having to diff and merge configuration files on every upgrade is just bad by design
The fundamental problem is that there is a lot of structured information of similar form that is not structured in a uniform manner. This leads to inefficiencies, because software developers and system administrators have to take similar steps over and over to manage the information. Most config files are key-value stores, but implementing that with 100 different syntaxes is not good.

Thus we need one syntax that works for almost everything. The syntax could be XML or anything flexible enough to support these features (a sketch in code follows these lists):
  1. Keys and values
  2. Hierarchy
  3. Virtual versions
  4. Versioning
  5. Default values
  6. User comments
  7. Documentation
The benefits of such a system would be substantial:
  1. Fewer design problems and less maintenance for software developers
  2. Less maintenance and manual repair for system administrators
  3. Useful extra features provided at minimal cost
And all of this would be done with one strict, common syntax. Whether it's Apache, BIND, MySQL, Samba, Vim, X, an FTP server or LDAP, you would have the same syntax.
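
To make this concrete, here is a sketch in Python of what one uniform configuration model could carry for every program. The field names are my own invention for illustration; the point is that each setting becomes structured data instead of a line in yet another ad-hoc syntax, which gives you keys and values, hierarchy, versioning, defaults, user comments and documentation almost for free.

    # A made-up model of a universal setting; not any existing format.
    from dataclasses import dataclass, field

    @dataclass
    class Setting:
        key: str
        value: object = None      # None means "use the default"
        default: object = None    # shipped by the program package
        doc: str = ""             # documentation from the developer
        user_comment: str = ""    # the admin's own note
        version: int = 1          # bumped on every change

        def effective(self):
            return self.default if self.value is None else self.value

    @dataclass
    class Section:
        name: str
        settings: dict = field(default_factory=dict)
        children: dict = field(default_factory=dict)  # hierarchy

    # The same model would describe an FTP server or Apache alike:
    ftp = Section("ftpd")
    ftp.settings["max_clients"] = Setting(
        "max_clients", value=50, default=30,
        doc="Maximum number of simultaneous clients.",
        user_comment="Raised before the autumn migration.")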

Each configuration file should be split into a program-controlled part and a user-customized part. The package management system would update the program-controlled part, setting new defaults for variables the user never touched, while keeping the customized parts customized.
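
A toy illustration of that split, with plain Python dicts standing in for the two parts of the file:

    # The package ships defaults; the user owns only the overrides.
    def effective_config(defaults: dict, overrides: dict) -> dict:
        merged = dict(defaults)   # start from the program-controlled part
        merged.update(overrides)  # user-set keys always win
        return merged

    old_defaults = {"max_clients": 30, "timeout": 60}
    user = {"max_clients": 50}  # the only thing the admin ever edited
    new_defaults = {"max_clients": 40, "timeout": 120}  # after an upgrade

    assert effective_config(old_defaults, user)["timeout"] == 60
    assert effective_config(new_defaults, user)["timeout"] == 120     # new default applies
    assert effective_config(new_defaults, user)["max_clients"] == 50  # customization kept

An upgrade replaces the defaults layer without ever touching the user layer, so there is nothing to diff and nothing to merge by hand.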

This would make it easier to do a lot of administrative tasks.

But the main benefit would be that one dedicated configuration program could be used to change any setting. Simply run superconf ftp and you get a kernel-menuconfig-like ncurses view. You can scroll to a setting, choose whether to use its default, check the comment you made two years ago and, with a click, see a short description of what it does.

The standardization would also let GUI and web-based server configuration software have a field day, because they would have one simple syntax for editing settings. This would not require any extra steps from the program's author.

Above all, pre-made program logic (which requires updating) should not be in /etc. It's not that I want init scripts to stop being editable; I just don't think /etc should be cluttered with files that 95% of people never edit. When making a configuration dialog for a GUI program, UI designers usually think twice about which features are actually needed and what can be left to be customized some other way.

Fundamentally, I think the file system has its limits and Linux is pushing them too hard. It may be better to implement the whole of /etc as an object database with a static-file fallback (imagine a MySQL database with an auto-updated SQLite copy).
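
As a toy version of that idea, Python's standard sqlite3 module is enough to show the shape of it. The paths and table layout here are made up for the example:

    # Settings live in a small database; a readable fallback file is
    # regenerated from it so the system still works if the tooling dies.
    import sqlite3

    db = sqlite3.connect("/tmp/etc.sqlite")  # stand-in path
    db.execute("""CREATE TABLE IF NOT EXISTS config
                  (program TEXT, key TEXT, value TEXT,
                   PRIMARY KEY (program, key))""")
    db.execute("INSERT OR REPLACE INTO config VALUES (?, ?, ?)",
               ("ftpd", "max_clients", "50"))
    db.commit()

    with open("/tmp/etc-fallback.conf", "w") as fallback:
        for program, key, value in db.execute(
                "SELECT program, key, value FROM config ORDER BY program, key"):
            fallback.write(f"{program}.{key} = {value}\n")

Queries replace a hundred hand-written parsers, and the static file keeps a human-readable escape hatch.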

3. The filesystem should be virtualized

After destroying my Perl environment and half of the server with that CPAN installation, I realized several design problems caused it. The first is the shady area between program-specific package managers and OS-level ones. The second is that the filesystem permits this to happen at all. It couldn't happen if the filesystem were "virtualized".

This applies to filesystems that contain program code and any assets related to those programs.

With a virtualized filesystem, the kernel creates the real filesystem by reading and combining the contents of packages from a given directory. I'll leave the practical implementation to people wiser than me, but the main idea is that you add layers of new files on top of other layers instead of modifying the existing ones.
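
Here is a minimal Python sketch of the bookkeeping behind that idea. It is a model, not a kernel filesystem, and the directory layout is assumed:

    # Each installed package is a directory of files (a "layer"); the
    # visible tree is the union of the layers, later layers winning.
    import os

    def build_view(package_dirs: list) -> dict:
        """Map each relative path to the layer that provides it."""
        view = {}
        for layer in package_dirs:  # order = installation order
            for root, _dirs, files in os.walk(layer):
                for name in files:
                    full = os.path.join(root, name)
                    view[os.path.relpath(full, layer)] = full
        return view

    # view = build_view(["/usr/pkg/apache-2.2", "/usr/pkg/apachetweaks-2.2"])

Removing a package means dropping its layer and rebuilding the view; the files underneath were never modified, so there is nothing to restore.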

The benefits of such a system would be huge. It would be practically impossible to destroy the contents of /usr. If you wanted to install Apache, APT would copy the package file to a directory like /usr/pkg and then generate the contents of /usr from the packages there. If you want to try out ApacheTweaks-2.2, you can just install it with APT (which keeps all the useful installation records), and if it doesn't work out for you, you can just remove it.

The same idea is used in some Quake 3-based and Steam-based games. If you want to try custom models, you don't move files around manually and back up the old ones; you copy the package into a directory and the final data directory structure is generated from it. It also makes it safe to download files from game servers without corrupting the data directories of other mods.

4. Source installation is madness

Quite frankly, source installation is flawed by design for anything other than engineering purposes:
  1. Compilation depends on a lot of conditions being correct. Failed builds are likely, and for any production environment that is just unacceptable
  2. Compiling is a process with diminishing returns yet high costs
  3. Updating is a risky and tiresome process
  4. Not updating can kill you, especially if it's a network service
  5. It will have dependencies that get installed and forgotten
  6. You'll litter your filesystem with extra files that may have to be cleaned up manually
I try to avoid source installation at all costs. The real question to ask is why source installation is needed at all. Most of the answers reduce to some defect of the package management system.

5. Convention over configuration

Web frameworks have realized this: the importance of convention over configuration. In usability, Linux is what Apache Lenya is compared to Ruby on Rails. Not everything has to be customizable, at least not up front. Make sensible defaults and leave the stupid stuff out. Part of the problem is the lack of a common syntax in /etc, which would solve most of our configuration nightmares, but the principle applies to everyday usage as well.

6. Kernel configuration is unnecessarily complex

I think configuring the Linux kernel should take much less time. Windows is able to detect hardware and install drivers for it. Why not Linux?

First of all, I think the package manager should be able to install kernel modules, and the necessary packages should be auto-detected and installed if the system is configured to allow it. There are security concerns, of course, and this obviously doesn't work for everyone, but I find it inefficient to go through a million options just to flip one switch in the kernel.
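
There is already something to build this on: every device the kernel sees exports a modalias string under /sys, which maps to a kernel module. Below is a rough Python sketch of the detection half; the alias-to-package table is invented, since in reality it would come from the distribution's package database.

    # List hardware identifiers and guess which packages they need.
    import glob

    def detected_modaliases():
        aliases = set()
        for path in glob.glob("/sys/bus/*/devices/*/modalias"):
            try:
                with open(path) as f:
                    aliases.add(f.read().strip())
            except OSError:
                continue  # device vanished or unreadable; skip it
        return aliases

    # Hypothetical mapping; a real tool would query the distro's database.
    ALIAS_PREFIX_TO_PACKAGE = {"pci:v00008086": "intel-driver-modules"}

    for alias in sorted(detected_modaliases()):
        for prefix, package in ALIAS_PREFIX_TO_PACKAGE.items():
            if alias.startswith(prefix):
                print(f"{alias} -> install {package}")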

Maybe the whole kernel could be just another program controlled by the package management system, with a normal configuration file.

If you removed all the options that can somehow (either up front or later) be auto-detected, how many options would be left? Not many. There are important choices, like which I/O and process schedulers to use, and that is exactly what kernel configuration is for. Sometimes you need a realtime kernel, sometimes a grsec kernel. Now imagine a situation where installing a grsec kernel with a custom I/O scheduler were a matter of installing the correct packages, editing a configuration file and maybe executing one command after that.

This is not such a big issue, since updating the kernel is not an everyday problem, but I still think the process could be simplified.

7. Lack of a price system

This has more to do with economics, but changing prices have always been the fastest way to transmit information. I don't think Linux should be commercial, but the very lack of a direct customer relationship is what causes this dire usability situation. A price system forces people to think about what users want.

Just as you don't want to fill in five paper forms every time you purchase a PC component, you don't want to write a shell script for anything that could be done more efficiently. Efficiency is the very foundation of Western society: we need less time, fewer people and fewer resources to get something done.

Getting sound to work in Windows is hardly ever an issue, thanks to the price system; you don't have this mess there. Whoever designed the Linux sound stack should really think twice about what's wrong with it and how we ended up in this situation in the first place. Or maybe the question should be: who didn't design it?

8. Too many distributions

There shouldn't be so many distributions. Think about all the redundant work that would be eliminated by cooperating on one distribution:
  1. Package managers - they all do the same job
  2. Package repositories - a simple program can exist in 100 different packaged versions
  3. Different configuration files and directory paths add practically nothing of value!
This is a problem derived from all the other problems, and it will be fixed only when one distribution is good enough to displace the others. Because no single distribution is good enough, people split across different distributions, which leads to all kinds of further problems, one of them being the lack of binaries for recent software releases.

Obviously, different systems have different needs, but I'd like Linux distributions to share more in common.

Quite frankly, this is a kind of Prisoner's Dilemma.

9. Modularity is important

I believe the problem with, for example, Debian's package manager is that it assumes only one version per piece of software, when it should be much more modular. One workaround is full virtualization with separate virtual machines, but that's inefficient for most purposes. On Windows I sometimes have several copies of the same software. On Linux this is a problem, because source installation is, in my opinion, very impractical, and the package manager doesn't support several versions of the same software.

A good example of this is RVM. I love the program. It makes it possible to try out all kinds of different Rubies without cluttering the filesystem. Now take that idea and apply it system-wide: the system package manager could easily manage several copies of the same software along with all their data.
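
A minimal sketch of that trick in Python, with made-up paths: every version gets its own directory, and a "current" symlink selects the active one, which is essentially what RVM does for Ruby.

    # Keep each version side by side; switching never deletes anything.
    import os

    PREFIX = "/opt/myapp"  # hypothetical install prefix

    def installed_versions():
        try:
            return sorted(d for d in os.listdir(PREFIX) if d != "current")
        except FileNotFoundError:
            return []

    def use_version(version: str):
        """Atomically repoint the 'current' symlink."""
        target = os.path.join(PREFIX, version)
        if not os.path.isdir(target):
            raise ValueError(f"{version} is not installed")
        tmp = os.path.join(PREFIX, ".current.tmp")
        if os.path.lexists(tmp):
            os.remove(tmp)  # clean up a previous failed switch
        os.symlink(target, tmp)
        os.replace(tmp, os.path.join(PREFIX, "current"))  # atomic swap

    # use_version("2.2")  # /opt/myapp/current -> /opt/myapp/2.2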

Fundamentally, this is a problem with the idea of a filesystem in which different directories hold different types of data: it more or less assumes one instance per type of data. The filesystem could be cleaned up somewhat like this:
  1. /etc should have roughly one file per program, with naming conventions for different versions
  2. /var could have a directory for each program version, holding its data files
  3. /bin and /lib are, in my opinion, just obsolete legacy
  4. /usr could be virtualized as explained previously (and stuff like games, X11 etc. removed)
  5. /opt becomes obsolete with virtualized packages
  6. /dev could probably be organized in a better fashion
Some of these may be superficial changes, but I like elegant and simple design. I prefer 5 files over 50. It is like having a clean desk: much easier to focus on the important information.

10. Design carefully

I think a lot of these issues could be avoided if Linux developers thought twice before building things. They are not only creating a dreadful experience for system administrators; their bad design decisions also create extra work for their fellow developers.

Every time a new system is created or implemented, some of the following questions should be asked and answered:
  1. How much work is needed to install, maintain and remove the system?
  2. Are there repetitive tasks that could be avoided?
  3. Does it share common characteristics with other systems? If yes, could they be integrated or implemented in a similar fashion?
  4. How modular is it if something needs to be customized?
Designing a good system takes a bit of thinking, but it saves a lot of headaches. The law of unintended consequences should be understood here: as in law, simplicity and minimalism are important, because interactions between complex rules can have unexpected results.

Final Words

Most of the code behind Linux is good, but the usability is dreadful, mostly because of design flaws which, if fixed, would save a lot of development and sysadmin hours. Obviously any system can always be improved, but I have had too many headaches that were avoidable.

The famous DRY principle should be the yardstick for the next Linux.

If you find this article interesting, please share it with your friends, tweet it or something. I really want some of these ideas to happen, but I'm not sure I have the personal time and skills to implement them. And I think that, if implemented, everyone would benefit.

After posting this I realized I should have thought some of my solutions through more thoroughly, but this blog post had been lying around for a year or two, and I wanted to get it published.
