The Linux Gene Network
May 5, 2010 9:19 PM
Yale scientists analogize the Linux call graph with the E. coli gene regulatory network in an open access PNAS article. Carl Zimmer explores the implications of network design versus evolution, suggesting that a more modular architecture in bacteria leads to a rugged (i.e. robust) system that does not "crash" like a computer.
It's been several years since classes where I've needed to read something from the Proceedings of the National Academy of Sciences, but it still gets me every time...
PNAS
hah!
posted by phunniemee at 9:35 PM on May 5, 2010 [4 favorites]
Maybe bacteria do sometimes crash -- it's not like we're watching all of them, all the time. They only need 'uptime' of a couple of hours to reproduce, after all...
posted by Malor at 9:39 PM on May 5, 2010 [1 favorite]
Living things can be killed; thus went smallpox. The idea is, can you name a component of Linux that you could corrupt and cause the whole system to crash? It's easier to do than in bacteria, which can withstand perturbations. (Up to a point- of course, if your perturbation is targeted just right, the organism dies.)
posted by jjray at 9:40 PM on May 5, 2010
Life is all about RISC.
posted by wobh at 9:52 PM on May 5, 2010 [1 favorite]
Insert joke about Linux being resistant to viruses here. Begin flame war.
posted by Chuffy at 10:04 PM on May 5, 2010
jjray: "Living things can be killed; thus went smallpox. The idea is, can you name a component of Linux that you could corrupt and cause the whole system to crash? It's easier to do than in bacteria, which can withstand perturbations. (Up to a point- of course, if your perturbation is targeted just right, the organism dies.)"
If I flipped a single bit among 4GB of RAM (non-ECC) within a Linux computer, what are the odds that this would crash the computer?
posted by pwnguin at 10:04 PM on May 5, 2010 [1 favorite]
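A note on the "non-ECC" qualifier: ECC memory stores extra check bits so that a single flipped bit can be located and flipped back. The sketch below is the textbook Hamming(7,4) code in Python, offered only to illustrate the principle. Real ECC DIMMs generally use wider SECDED codes over 64-bit words, and the function names here are made up for the example.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Positions 1..7 hold [p1, p2, d1, p3, d2, d3, d4]; each parity bit
    covers the positions whose 1-based index has the matching bit set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit; return (data_bits, error_position).

    error_position is 0 when no error was detected, otherwise the
    1-based index of the bit that was flipped back.
    """
    c = list(c)  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the position of the bad bit
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    damaged = hamming74_encode(data)
    damaged[4] ^= 1  # simulate a single-bit memory error at position 5
    print(hamming74_decode(damaged))  # ([1, 0, 1, 1], 5)
```

Without the check bits (the non-ECC case in the question), nothing in hardware notices the flip; what happens next depends entirely on what the flipped word was being used for.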
If I flipped a single bit among 4GB of RAM (non-ECC) within a Linux computer, what are the odds that this would crash the computer?
Crash the computer? Pretty low odds. Crash a program? Far too many variables to guess. In the pathological case it could be almost guaranteed to crash, and at the other end it could be almost guaranteed not to.
But either way you're asking the wrong question because almost all of those 4GB will be occupied by userspace data (and why stop at 4GB? Linux can address far, far more). The better question is: what are the odds that flipping a bit in the kernel image would crash the computer at some point? Or what are the odds that flipping a bit in the source code would lead to a crash or even failure to compile? My guess is pretty high odds, especially if you stripped the comments out of the source code first. Source code is extremely fragile stuff.
posted by jedicus at 10:23 PM on May 5, 2010 [2 favorites]
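The "flip a bit in the source code" question can be probed directly on a small scale. This is a rough sketch, assuming Python and an arbitrary two-line snippet (`SNIPPET` is invented for the example): it flips every bit of the source once and asks Python's own parser whether each mutant still compiles. Compiling is a low bar, since a mutant that compiles may still be wrong at runtime, and the rate depends heavily on the snippet, so no particular number should be read into it.

```python
SNIPPET = b"def area(w, h):\n    return w * h\n"


def mutants(source: bytes):
    """Yield every variant of `source` with exactly one bit flipped."""
    for i in range(len(source)):
        for bit in range(8):
            mutated = bytearray(source)
            mutated[i] ^= 1 << bit
            yield bytes(mutated)


def still_compiles(source: bytes) -> bool:
    """True if the bytes decode as UTF-8 and Python's parser accepts them."""
    try:
        compile(source.decode("utf-8"), "<mutant>", "exec")
    except (UnicodeDecodeError, SyntaxError, ValueError):
        # ValueError covers null bytes in the source on older Pythons;
        # newer versions report that case as SyntaxError instead.
        return False
    return True


def survival_rate(source: bytes) -> float:
    """Fraction of single-bit mutants that still compile."""
    results = [still_compiles(m) for m in mutants(source)]
    return sum(results) / len(results)


if __name__ == "__main__":
    print(f"{survival_rate(SNIPPET):.1%} of single-bit mutants still compile")
```

Appending a comment line to `SNIPPET` and rerunning should push the survival rate up, since most flips inside a comment leave it a comment, which fits the remark about stripping comments out first.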
Source code is extremely fragile stuff.
MD_Update(&m, buf, j);
posted by tracert at 10:34 PM on May 5, 2010
If Linux represents bacteria, then does Microsoft represent humanity? I mean, the theory of Linux as an organism is intriguing...and the organization of Microsoft OSes sort of looks like how we organize ourselves socially, no?
posted by Chuffy at 10:35 PM on May 5, 2010
Source code is extremely fragile stuff.
There are interesting parallels in how redundancy and failure tolerance evolve in complex systems of information: for example, verification of source through MD5 checksums and chemical self/non-self verification of DNA through methylation, or redundancy through DNA repeats, the codon code (several codons for the same amino acid), or multiple, independent methods for doing something with, say, an OpenGL framework.
posted by Blazecock Pileon at 10:49 PM on May 5, 2010
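For the MD5 side of that parallel, a minimal sketch using Python's standard `hashlib`: compute a digest, compare it with a published one, and note that a single flipped bit makes the check fail. The tarball contents are a placeholder. MD5 still catches accidental corruption, but it is no longer considered safe against deliberate tampering, so SHA-256 (`hashlib.sha256`) is the usual choice today.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of `data`."""
    return hashlib.md5(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Compare a freshly computed digest against a published one."""
    return md5_hex(data) == expected_hex.lower()


if __name__ == "__main__":
    tarball = b"hypothetical release tarball contents"
    published = md5_hex(tarball)  # stand-in for the value on a download page

    corrupted = bytearray(tarball)
    corrupted[10] ^= 0x01  # flip one bit "in transit"

    print(verify(tarball, published))           # True
    print(verify(bytes(corrupted), published))  # False
```

Like methylation-based self/non-self checks, this only detects damage; it does nothing to repair it.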
jedicus: "
Crash the computer? Pretty low odds. Crash a program? Far too many variables to guess. In the pathological case it could be almost guaranteed to crash, and at the other end it could be almost guaranteed not to.
But either way you're asking the wrong question because almost all of those 4GB will be occupied by userspace data (and why stop at 4GB? Linux can address far, far more)."
Because 4GB is roughly the size of the human genome, the only genome whose size I'm actually familiar with. In the above scenario I argue that the kernel is closer to the ribosomes, which are highly conserved in bacteria, and that the other parts of RAM, the userspace part, are like the heat stress transcription factor and other things useful for competitive advantage but not strictly required for survival.
Not that I'm volunteering to do this, but a true test of robustness that would compare computers with bacteria would be to build an Ubuntu server with a default install configured to boot from PXE, have it compile all the source code it embodies (the kernel, and the compiler as a matter of necessity), and then serve PXE to the next computer. Perturb that process and see how it copes.
This thought experiment would probably still not compare well, except maybe for the code in Perl. Optimizing compilers treat ambiguity as a failure and do substantial static analysis, whereas ribosomes are GIGO. If we turned off all the unit testing and made the build as lenient as possible, you might have a shot, but it would be terrible code. And that's really what I think genomes are: terrible code, binary-patched to hell, with wicked side effects in multiple transcriptions when you change any one damn thing.
posted by pwnguin at 11:06 PM on May 5, 2010
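The "ribosomes are GIGO" contrast can be shown with a toy translator. This sketch assumes a deliberately tiny subset of the standard genetic code (only the codons it needs) and reports any codon outside that subset as `X`, the usual placeholder for an unknown residue, rather than failing. It shows the three common outcomes of a single-base change: silent (thanks to codon redundancy), a single amino-acid substitution, or an early stop. None of them is a "compile error".

```python
# Deliberately partial subset of the standard genetic code (DNA codons).
CODONS = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "TGG": "W",
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(dna: str) -> str:
    """Translate codon by codon until a stop codon; never raise.

    Codons missing from the toy table come out as 'X' instead of an
    error: garbage in, garbage out.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODONS.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def point_mutation(dna: str, index: int, base: str) -> str:
    """Return `dna` with the base at `index` replaced."""
    return dna[:index] + base + dna[index + 1:]


if __name__ == "__main__":
    gene = "ATGGCTAAAGAATGGTAA"
    print(translate(gene))                           # MAKEW
    print(translate(point_mutation(gene, 5, "C")))   # MAKEW (silent: GCT -> GCC)
    print(translate(point_mutation(gene, 9, "A")))   # MAKKW (missense: GAA -> AAA)
    print(translate(point_mutation(gene, 13, "A")))  # MAKE  (nonsense: TGG -> TAG)
```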
But if one bacterium crashes, it's just that one bacterium. Boo hoo. There's a bazillion more. Do computers (programs) need to multiply as quickly as possible so that at least one of them will ... finish the computation? Or find the answer to an unasked question?
Not just bacteria, but all biological cells have many complex regulatory pathways running on feedback loops, both positive and negative. (MAP Kinase Kinase Kinase amuses me... but so does Buffalo Buffalo buffalo buffalo Buffalo Buffalo buffalo.) The balance of kinase and phosphatase activity in pre-/post-synaptic neuronal systems is actually really intriguing.
I'd argue that a mammalian cell has much better error correction and stability than most bacterial cells. Bacteria don't have to be perfect; hell, it's an advantage for bacteria to be lossy and prone to error, "just in case the error (er, bug) is actually a feature."
posted by porpoise at 11:15 PM on May 5, 2010
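As a toy illustration of a kinase/phosphatase balance (not a model of any real synaptic pathway): assume one substrate whose phosphorylated fraction `p` is raised by a kinase at rate `k_kin * (1 - p)` and lowered by a phosphatase at rate `k_phos * p`, with no saturation and no feedback onto the rates. The fraction then settles where the two fluxes cancel, at `k_kin / (k_kin + k_phos)`. The sketch integrates this with a simple Euler step and compares against that fixed point.

```python
def simulate(k_kin: float, k_phos: float, p0: float = 0.0,
             dt: float = 0.01, steps: int = 5000) -> float:
    """Euler-integrate dp/dt = k_kin*(1 - p) - k_phos*p; return the final p.

    p is the phosphorylated fraction of a single toy substrate; both
    enzymes act linearly (no saturation), a deliberate simplification.
    The step is stable as long as dt * (k_kin + k_phos) < 2.
    """
    p = p0
    for _ in range(steps):
        p += dt * (k_kin * (1.0 - p) - k_phos * p)
    return p


def steady_state(k_kin: float, k_phos: float) -> float:
    """Fixed point of the same equation: where the two fluxes cancel."""
    return k_kin / (k_kin + k_phos)


if __name__ == "__main__":
    for k_kin, k_phos in [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]:
        print(k_kin, k_phos, round(simulate(k_kin, k_phos), 4),
              steady_state(k_kin, k_phos))
```

Scaling both rates by the same factor leaves the steady state unchanged; in this toy model only the ratio of kinase to phosphatase activity matters.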
Do computers (programs) need to multiply as quickly as possible so that at least one of them will ... finish the computation?
This is not a bad description of how render farms work.
posted by rodgerd at 1:35 AM on May 6, 2010
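The arithmetic behind "multiply so that at least one of them finishes": if each replica fails independently with probability `p`, the chance that at least one of `n` succeeds is `1 - p**n`. A short sketch with a seeded Monte Carlo check follows. Independence is the key assumption, and it rarely holds for software, where identical copies tend to share the same bugs.

```python
import random


def p_at_least_one(p_fail: float, n: int) -> float:
    """Probability that at least one of n independent replicas succeeds."""
    return 1.0 - p_fail ** n


def simulate(p_fail: float, n: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity (seeded for repeatability)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # A replica succeeds when its draw lands at or above p_fail.
        if any(rng.random() >= p_fail for _ in range(n)):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, p_at_least_one(0.5, n), simulate(0.5, n))
```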
Linus Torvalds has explicitly stated that evolution is superior to deliberate design, even in the context of large software projects.
posted by a snickering nuthatch at 4:29 AM on May 6, 2010
Crap comparison IMHO, because they're missing dependencies that are not part of their "call graph" and they're ignoring a lot of the biological mechanism. They're mixing completely different metaphors within the two models, not to mention comparing a self-replicating system with one that is not.
For example, their concept of "workhorses" and "managers" is faulty because they're ignoring the lowest level of the genetic mechanism, which is the bit where transcription occurs and proteins are synthesized - a very very small number of "workhorses" synthesizing a bunch of different stuff from all those different individual genes. Have an error there and it's pretty catastrophic, just like it's catastrophic to have an error in an IDE driver on Linux.
But sure, designed systems don't look much like evolved systems. Duh.
posted by polyglot at 4:36 AM on May 6, 2010
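One way to make the "error in a workhorse is catastrophic" point concrete: everything that transitively calls a broken node is potentially affected by it. The graph below is a hypothetical toy (the names are invented, not taken from the paper or the kernel); the sketch reverses the edges and walks them breadth-first.

```python
from collections import deque

# Hypothetical toy call graph: caller -> callees.
CALLS = {
    "app_save": ["write_file"],
    "app_open": ["read_file"],
    "write_file": ["disk_io"],
    "read_file": ["disk_io"],
    "log_msg": ["write_file"],
    "disk_io": [],
}


def callers_of(graph):
    """Reverse the edges: callee -> list of direct callers."""
    rev = {node: [] for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev


def blast_radius(graph, broken):
    """All nodes that transitively call `broken` (breadth-first search)."""
    rev = callers_of(graph)
    seen, queue = set(), deque([broken])
    while queue:
        for caller in rev.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


if __name__ == "__main__":
    for node in CALLS:
        print(node, sorted(blast_radius(CALLS, node)))
```

Breaking the bottom-level `disk_io` reaches every other node, while breaking a top-level action like `app_save` reaches nothing, which is the asymmetry being described.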
Also fodder for whether flipping a random bit in RAM would cause the computer to crash: it depends on whether the bit being flipped is in a word (a grouping of bits) that will be interpreted in a more code-like or data-like way.
If the word will be interpreted in a very code-like way (e.g. actual machine code, text source code for a script that is about to be run, configuration information) then a crash is very likely. Some words are interpreted in a way between code and data: data that directs the interpretation of other data (e.g. a number that describes how large a buffer is in memory). Flipping a bit in one of those words is also likely to cause a crash. If the word is interpreted in a very data-like way (e.g. the text in a text editor's window, an amplitude value in an uncompressed sound file, or a pixel's color in an uncompressed image) then a crash is unlikely.
posted by a snickering nuthatch at 4:42 AM on May 6, 2010
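A small sketch of the code-like versus data-like distinction, assuming an invented record format (a 4-byte big-endian length header followed by the payload, packed with Python's `struct` module). A bit flip in the payload just changes a value; the same flip in the length header changes how everything after it is read.

```python
import struct


def make_record(payload: bytes) -> bytes:
    """Hypothetical format: 4-byte big-endian length, then the payload."""
    return struct.pack(">I", len(payload)) + payload


def parse_record(buf: bytes) -> bytes:
    """Return the payload, or raise ValueError if the header is inconsistent."""
    if len(buf) < 4:
        raise ValueError("truncated header")
    (length,) = struct.unpack(">I", buf[:4])
    if 4 + length > len(buf):
        raise ValueError(f"header claims {length} bytes, only {len(buf) - 4} present")
    return buf[4:4 + length]


def flip(buf: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of `buf` with one bit flipped."""
    out = bytearray(buf)
    out[byte_index] ^= 1 << bit
    return bytes(out)


if __name__ == "__main__":
    record = make_record(bytes([200, 201, 202, 203]))  # e.g. four pixel values

    # Data-like word: one pixel value changes slightly, parsing still works.
    print(list(parse_record(flip(record, 5, 0))))  # [200, 200, 202, 203]

    # Word that directs interpretation: the length becomes 65540.
    try:
        parse_record(flip(record, 1, 0))
    except ValueError as err:
        print("parse failed:", err)
```

Flipping bit 2 of the last header byte instead turns the length from 4 into 0. Nothing raises, and the parser silently returns an empty payload, which is arguably worse than a crash.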
This discussion about flipping random bits is interesting, but so is the article. The actual image shows the basic idea. Computer programming seems to be about making many high-level actions depend on just a few low-level operations. E coli seems to be the opposite.
But of course, E coli doesn't have a lot of high-level actions. If we moved up to a worm or something, would the graph shape remain the same or would the low-level operation stay the same size with only the high-level action expanding?
Also, the reason programmers use a few low-level operations to support everything else is not exactly robustness. It's for understandability and predictability. If you have 104 different ways to add two numbers, you are never going to remember which ones to use. And you'll have so many test cases you won't be sure that any two random numbers will add correctly.
posted by DU at 4:55 AM on May 6, 2010
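The "many high-level actions, few low-level operations" shape can be measured as fan-in: how many distinct functions call each one. The call graph here is a made-up example, and counting direct callers is just one simple measure, not necessarily the hierarchy analysis used in the paper.

```python
from collections import Counter

# Hypothetical call graph: each function lists what it calls.
CALLS = {
    "open_document": ["read_file", "parse", "alloc"],
    "save_document": ["serialize", "write_file", "alloc"],
    "print_document": ["serialize", "alloc"],
    "search": ["parse", "alloc"],
    "read_file": ["alloc"],
    "write_file": ["alloc"],
    "parse": ["alloc"],
    "serialize": ["alloc"],
    "alloc": [],
}


def fan_in(graph):
    """Number of distinct direct callers of each function."""
    counts = Counter({name: 0 for name in graph})
    for caller, callees in graph.items():
        for callee in set(callees):  # count each caller once
            counts[callee] += 1
    return counts


if __name__ == "__main__":
    counts = fan_in(CALLS)
    top = [name for name, n in counts.items() if n == 0]
    print("never called (top-level actions):", sorted(top))
    print("most reused:", counts.most_common(2))
```

If each high-level action carried its own private allocator instead, that fan-in of 8 would be spread over eight separate functions, each needing its own tests, which is the understandability argument above.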
From Zimmer's blog post: "and major computer companies like Microsoft and Dell began to support the system."
I think he might want to double check his facts.
posted by atbash at 7:51 AM on May 6, 2010
I think he might want to double check his facts.
Microsoft doesn't write a lot of software for Linux, but it does write a little. For example, the Subsystem for UNIX-based Applications (SUA, formerly Services For UNIX), includes a binary for RedHat for performing password synchronization between Windows and Linux servers.
posted by jedicus at 8:05 AM on May 6, 2010
Microsoft doesn't write a lot of software for Linux, but it does write a little. For example, the Subsystem for UNIX-based Applications (SUA, formerly Services For UNIX), includes a binary for RedHat for performing password synchronization between Windows and Linux servers.
I'm familiar with the fact that they've got this, but I'd argue it's a far cry from "supporting" Linux.
posted by atbash at 8:17 AM on May 6, 2010
It's an interesting paper, but I think it largely ignores that the various ways genetic code fails have many traits that aren't as common in the ways that computer programs fail. For example, many genetic failures are places where the code is damaged (such as transpositions), which actually helps create the more divergent graph they're showing, whereas computer programs don't (as) often fail by randomly calling the wrong code*, and when they do, it doesn't change future invocations of the program.
The reason for this is fundamental - computer programs are (generally) copied from their storage and then executed, and self-modifying code is largely frowned upon. If something does go wrong and the executing code is altered, the original is not damaged. Gene-based programs are executed in place, and if something goes wrong and causes a change in the code, the original is what is modified.
* I am aware of security exploit techniques that intentionally cause this very behavior
posted by atbash at 8:27 AM on May 6, 2010
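The copy-then-execute versus execute-in-place difference can be sketched in a few lines. `STORED_PROGRAM` and both functions are hypothetical stand-ins: one run corrupts only a working copy, so the stored original is intact for the next launch; the other corrupts the only copy, so the damage carries forward.

```python
STORED_PROGRAM = b"\x10\x20\x30\x40"  # stand-in for a binary on disk (immutable bytes)


def run_from_copy(stored: bytes, fault_at: int) -> bytes:
    """Load into a working copy, corrupt the copy, return what ran."""
    working = bytearray(stored)  # the "load into memory" step makes a copy
    working[fault_at] ^= 0xFF    # a fault during execution hits the copy only
    return bytes(working)


def run_in_place(genome: bytearray, fault_at: int) -> bytes:
    """Corrupt the only copy there is, as when the DNA itself is damaged."""
    genome[fault_at] ^= 0xFF
    return bytes(genome)


if __name__ == "__main__":
    print(run_from_copy(STORED_PROGRAM, 1).hex())  # 10df3040
    print(STORED_PROGRAM.hex())                    # 10203040 (next launch is clean)

    genome = bytearray(STORED_PROGRAM)
    run_in_place(genome, 1)
    print(genome.hex())                            # 10df3040 (damage is inherited)
```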
It's kind of annoying that Zimmer calls Linux "brittle". It would certainly be brittle in the face of a mutation, but that rarely happens to computer memory.
But of course, E coli doesn't have a lot of high-level actions. If we moved up to a worm or something, would the graph shape remain the same or would the low-level operation stay the same size with only the high-level action expanding?
They're only looking at cellular interactions. And who knows if we have a complete protein interaction graph for humans or other eukaryotes. I think it's unlikely.
posted by delmoi at 5:51 PM on May 6, 2010
The thought just occurred to me, a day later, that large-scale organisms most definitely crash. Humans do it all the time. It's called cancer.
posted by Malor at 8:18 AM on May 7, 2010
posted by leviathan3k at 9:29 PM on May 5, 2010