Illustration by David S. Goodsell: yellow DNA, purple ribosomes scanning white mRNA, and a jolly green flagellum.




The Virtualization of Biology

Fred Hapgood

In our time one of the great walls dividing the sciences has come down. Before computers, the sciences were divided into the experimental and the observational: physics and chemistry are instances of the former; astronomy and geology, of the latter. Thanks to digital simulations, today all sciences are experimental to some degree, even cosmology.

Yet while one wall was coming down, another has gone up. Even fields with a rich history of experiment would prefer to use simulations, at least some of the time, if they could. A cutting-edge hardware procedure in (for instance) cell biology might cost $50,000 or more, take weeks or even months, and be perpetually at risk of being wrecked by an absentminded undergraduate misreading some instruction about nutrient dosage. Simulations can execute a virtual experiment in minutes for pennies. They are infinitely easier to control. Replicability and general access, the twin essences of the scientific method, could not be simpler to manage.

Models make an unrivaled publication medium (even for data derived from hardware) because they can be queried dynamically and as often as necessary. A researcher stumbling over a difficult point in a conventional paper has to try to reach the experimenter directly. That can be delicate, especially if the latter sees the former as a competitor. In short, in the 21st century a science without a powerful simulation of its subject matter risks becoming the professional equivalent of naked-eye astronomy.

Cell biology in particular is increasingly being tugged toward questions that hardware-based experiments can't answer for any amount of time or money. In vitro lab work has gotten frighteningly good at breaking a cell open and spilling out huge numbers of parts. However, parts lists don't communicate what all these biodevices do, which is obviously what matters. Understanding function requires tracing the career of a macromolecule through its interactions with all the other parts of the cell. That is difficult to impossible in vitro, but trivially easy with a simulation.

Biologists have understood all this since at least 1987, when Harold Morowitz of Yale lofted the vision of a cell simulator at Santa Fe's seminal Matrix of Biological Knowledge workshop. Still, for all this clarity of vision, not much has gotten done. For a cell simulation to be useful, a cell or organism would first have to be "scanned" into it to define the default state against which changes would be made. "Scanning" here means specifying the geometric, electrical, mechanical, and chemical properties of every macromolecule of biological significance in the reference cell (a macromolecule is an assemblage of molecules, like a ribosome or hemoglobin), including what would happen in any plausible interaction between or among them. While hopefully not every single macromolecule would need to be modeled explicitly, no one would be at all surprised if a useful model turned out to require coding 100,000 simultaneous data processes.
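
To get a feel for what a "scan" would have to capture, here is a minimal sketch in Python. Every field name is invented for illustration; no official schema of this kind exists yet.

    from dataclasses import dataclass, field

    @dataclass
    class MacromoleculeRecord:
        """One hypothetical entry in the scanned reference cell."""
        name: str              # e.g. "30S ribosomal subunit"
        copies_per_cell: int   # typical abundance in the reference strain
        geometry: dict         # shape and dimensions
        electrical: dict       # charge distribution
        mechanical: dict       # stiffness, binding forces
        chemistry: dict        # substrates, products, reaction rates
        # Rules for every plausible encounter with the cell's other parts:
        interactions: list = field(default_factory=list)

Multiply a record like that by something on the order of the 100,000 interacting processes mentioned above, and the scale of the undertaking comes into focus.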

It gets worse. Some research on these macromolecules has been done, but almost none of it has been conducted with the idea of contributing data to a simulation. Much of it is qualitative, or spread across different cell strains, or just not spelled out in sufficient detail for the purpose. So a cell simulation project would not just be an enormous IT infrastructure project and an enormous computer modeling project; it would also have to do huge amounts of original research.

Finally, there are political and cultural issues to worry about. The project would require the combined efforts of at least hundreds, and probably thousands, of scientists spread all over the globe, all of whom would have to be willing to subject themselves to a centralized research standardization process of unprecedented detail and invasiveness, specifying everything from the exact composition of the nutrient media to the amount of vibration experienced by the bacteria. (Otherwise you couldn't be certain that data from different laboratories working on different cell systems was interoperable.)

This would be Big Science with a vengeance. Measured in man-hours, it would dwarf the Human Genome Project. However, as Michael Ellison, Director of the Institute of Biomolecular Design at the University of Alberta, observes, the sole example of the HGP to one side, microbiology has always been an artisan business. "Cell science researchers are not accustomed to being lectured to by world committees on how to do their work."

You can see why nothing happened for all these years. However, in December 2001, the balance of stakes and costs tipped far enough for the profession to cross its collective fingers and push off, though many of these questions were still unresolved. "The technologies had begun to come together enough to make a comprehensive picture possible," says Mark Hermodson, a professor of biochemistry at Purdue. Given how useful the tool would be, "possible" was good enough for a launch.

The organism with the largest pre-existing research community, a specific strain of E. coli, got the nod to be the first uploaded living creature. The time-to-completion estimate most often heard for the project is about ten years, though of course no one knows. The organization shepherding the effort is called the International E. coli Consortium, though in recognition of the funding realities each country has its own sub-project. In this country the Consortium is headquartered at Purdue University.

Once the simulation is complete, any researcher, probably eventually down to and including high school students, will be able to log on and make a new organism by introducing a modification into the reference beast. He or she will be able to cut and paste genes at will, reengineer metabolic pathways, add or subtract adaptations, alter the creature's environment, whatever, press 'enter,' and see what happens.
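
In code, that workflow might look something like the sketch below. The ecoli_sim module and every call on it are hypothetical, invented here purely to show the shape of the interaction.

    import ecoli_sim  # hypothetical library; nothing like it exists yet

    cell = ecoli_sim.load_reference("E. coli reference strain")  # the default state
    mutant = cell.copy()
    mutant.knock_out("lacZ")                              # cut out a gene
    mutant.environment.update(glucose=0.0, lactose=2.0)   # alter the environment
    result = ecoli_sim.run(mutant, hours=8)               # press 'enter'
    print(result.growth_rate)                             # see what happens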

Among other fruits, rational drug design would be enormously simplified. Indeed, with enough RAM you could just describe what you wanted at a general level ("find a cure for dysentery"), have the program throw candidate molecules at colonies of virtual organisms until one worked, and have it email you when it did. The price of drug discovery would drop from hundreds of millions of dollars to a few thousand bucks, most of which would be absorbed by the patent application.
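
A toy version of that screening loop, again with every name invented for illustration (ecoli_sim as in the sketch above), might read:

    def screen(candidates, target_strain):
        """Try each candidate molecule against a virtual colony; return the first hit."""
        for molecule in candidates:
            colony = ecoli_sim.colony(target_strain, size=10_000)
            colony.expose(molecule, hours=24)
            if colony.survival_rate() < 0.01:    # the candidate "worked"
                notify_by_email("Hit found: " + molecule.name)  # also invented
                return molecule
        return None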

An outsider might reflect that all this would be true only for people working in E. coli, and in just one among dozens of strains of the bacterium; for that matter, in a strain that, thanks to the governing protocol, has had an upbringing very different even from others of its own kind. Those researchers might benefit, but there are thousands of cell types of interest out there. Most of them are animal cells, which are very different from bacteria. What would the E. coli project do for them?

The organizers reply that this project is just the first rung on the ladder. It is really about setting standards and templates for all the cell simulation projects that will follow. After this project has unfolded for a few years, the entire profession will know how to do the research required to make a virtual organism, how to code up data so it is compatible with everyone else's, how to build search and sort tools that work smoothly across petabytes of information, what the interfaces should look like, and so on. "A good analogy is the process of establishing internet protocols like FTP, HTTP, TCP/IP, etc.," says Professor George M. Church, Director of the Lipper Center for Computational Genetics at Harvard Medical School. By the time the project is over, Church says, "I would be surprised if the cost of obtaining accurate models is not reduced by a few powers of ten relative to current methods."
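
The flavor of what "compatible data" means, reduced to a cartoon: every measurement would travel with enough standardized context for any other lab's model to reuse it. The field names below are invented and the values are placeholders, not a real Consortium format.

    import json

    measurement = {
        "strain": "E. coli (reference strain)",
        "quantity": "lacZ mRNA abundance",
        "value": None,                    # the measured number would go here
        "units": "molecules per cell",
        "growth_medium": "standardized medium, per Consortium protocol",
        "temperature_C": 37,
        "protocol_id": "TBD",             # placeholder
        "lab": "TBD",                     # placeholder
    }
    print(json.dumps(measurement, indent=2))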

So in a decade or two we might have hundreds of simulations, ranging from dozens of bacterial species through single-celled organisms (like amoebae), up to multicellular structures like tissues, organ systems, and whole metazoa. Eventually, in fifty years or so, these simulations will probably extend to every creature we know of today: dogs, cows, whales, and so on.

At some point society will have to face the problems of whether these simulations are alive (from the point of view of humane-treatment restrictions), whether to make virtual humans, and what to do with them if we do. The argument for life is that by definition all these models pass the duck test (the Turing test for animals), and what other criterion is there? You can't appeal to simple physicality, because at least in theory the model could be downloaded into a robot duck, making it as physical as any creature.

Harvard's Church sees an important distinction between "the complexity of the environment needed to replicate an algorithm (i.e. computers, factories, power plants, humans, farms) and the simplicity of the inorganic substances (CO2, NH3, H2O, PO4, SO4, KCl) required to replicate a cell". The autonomous intelligence needed to be alive is more integral to a creature built out of biochemicals than to one built out of bits, and that difference controls the ethical decisions.

On the other hand Mark Taylor, Professor of Religion and Science at Williams College, thinks Church's distinction is one step too subtle. So far as he is concerned, the same ethical restrictions that apply to wet animals ought to apply to treatment of virtual ones. "If you can't cut the leg off one, you shouldn't be able to cut the leg off the other," he says.

I asked him if he would give the vote to a virtual human. "That's a tough one," he said, and that was the best I could get.