The Human Genome Project was a milestone in the history of human civilization at the turn of the century, often called the "moon landing" of the life sciences. Accordingly, the 21st century is called the post-genome era. In this new era we should, and can, think about life and the sciences that study it from a new philosophical vantage point. Philosophical thinking about life has always run along two enduring themes: the ontological theme, what life is; and the methodological theme, how life is to be understood.
Life is a high-level unity of material, form, and environment
Throughout the history of human inquiry into what life is, two main views have competed. One, "reductionism", holds that organisms do not differ in essence from non-living bodies and that life can form from simple non-living substances. The opposing view, "vitalism", holds that a sharp boundary separates the living world from the inorganic world: life has a special property that non-living bodies lack, a "vital force".
Viewing life through the dialectical unity of material and form
Living things have another name, "organism". The idea behind the term traces back to the ancient Greek philosopher Aristotle, who held that a living thing grows dynamically out of a "seed" or "embryo". Aristotle used the Greek word "entelecheia" (entelechy) for the "vitality" present in an organism. The word means "realization": it names the special property that allows a living form to achieve its end of self-completion. "Entelechy" later became synonymous with vitalism.
In the early 19th century, chemists recognized that carbon-containing compounds are the basic substances that make up organisms, and they called the chemistry of these compounds "organic chemistry". In the eyes of early organic chemists, an insurmountable boundary separated organic compounds from inorganic compounds of non-living origin. Only organisms endowed with "vitality" could synthesize organic compounds; researchers could only extract such compounds from animals, plants, and other organisms, and could not synthesize them from inorganic compounds in the laboratory. In 1828, the German chemist F. Wöhler converted the inorganic compound ammonium cyanate into the organic compound urea in the laboratory for the first time. This experiment broke down the artificial "partition wall" between organic and inorganic compounds. The original meaning of "organic chemistry" has been a matter of history ever since, and vitalism gradually declined.
The artificial synthesis of insulin in the middle of the 20th century further showed that biologically active proteins can also be produced in the laboratory. On this view, complex living matter is nothing more than simple small-molecule compounds assembled in particular physical and chemical ways. In the post-genome era, scientists have achieved more, and greater, results in synthesizing life. On May 20, 2010, the American biologist C. Venter announced the birth of the first artificial life form: researchers chemically synthesized a complete bacterial genome of 1.08 million bases, then created an artificial bacterium containing only this genome, which directed self-replication and other life activities. The researchers particularly emphasized: "The properties of the cells controlled by the artificial genome are the same as if the whole cell had been produced synthetically (that is, the DNA software builds its own hardware)". This landmark experiment led people to believe that life can be synthesized in the laboratory. In its report on the work, the US magazine "Newsweek" put Venter's portrait on its cover under the headline "Playing God".
These studies clearly reflect the reductionist perspective on the essence of life: life can be reduced to its constituent substances or materials, and it does not differ fundamentally from non-living matter. However, if life is defined only by its constituent materials, then Aristotle, the originator of vitalism, can also be classed as a reductionist. Aristotle classified the more than 520 animals known at the time according to their modes of reproduction, arranged them in six levels from lower to higher, and claimed that the lowest animals are generated spontaneously from the soil.
The crux of the problem is that the constituent materials of life are not the same thing as life. In Aristotle's view, all objects, including living ones, are composed of both "form" and "material": "Form is that which is constituted, and material is that which constitutes; form determines the essence of the object". In other words, to regard something as alive we must consider not only its constituent materials but also the specific form that life has and inorganic bodies lack. This understanding of the essence of life clearly goes beyond the simple dispute between reductionism and vitalism. In the past, that dispute rested on traditional philosophy's either-or between materialism and idealism, which split apart the material and the form of life: reductionists tended to equate life with the material that constitutes it, while vitalists held that life possesses a non-material "vitality" independent of its constituent materials. In fact, life consists of both material and form, which are interdependent and equally indispensable. Material is the "potential" of life, and form is the realization of that potential. In its original sense, Aristotle's "entelechy" refers to this unity of opposites between material and form within the organism. In this sense, life presents itself as an indivisible whole. This is why vitalism can often be equated with another concept, "holism".
The completion of the Human Genome Project provides strong support for understanding life from the perspective of "holism". The human genome contains more than 20,000 protein-coding genes, and these interact extensively with one another. Each life activity depends not only on the corresponding genes, proteins, and other constituent elements, but also on the interaction network that these elements form. In multicellular organisms, this network of biomolecular interactions exists not only at the cellular level but also spans higher levels such as tissues and organs. Not long ago, U.S. scientists drew on systems biology theory and big-data analysis to propose a new model, the "omnigenic model", to explain how genes control complex traits. For a given trait there are not only core genes that affect it directly, but also a far larger number of peripheral genes that interact with the core genes and thereby affect the trait indirectly. According to the model, because genes are so extensively connected and interact so widely, every complex trait of an organism may be affected, to a greater or lesser degree, by every gene in the genome.
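The core/peripheral distinction can be made concrete with a toy calculation. The sketch below is only an illustration of the idea as described above, not the published model: the gene names, effect sizes, and regulatory weights are all invented, and the assumption that a peripheral gene's contribution is simply "regulatory weight times core effect" is a deliberate simplification.

```python
# Toy illustration of core vs. peripheral genes; every number here is made up.
def trait_contributions(core_effects, regulation):
    """Return each gene's total contribution to a trait.

    core_effects: dict mapping core gene -> direct effect on the trait
    regulation:   dict mapping peripheral gene -> {core gene: regulatory weight}
    """
    contributions = dict(core_effects)  # core genes act on the trait directly
    for peripheral, targets in regulation.items():
        # A peripheral gene acts only through the core genes it regulates.
        contributions[peripheral] = sum(
            weight * core_effects[core] for core, weight in targets.items()
        )
    return contributions


# Two hypothetical core genes with large direct effects.
core_effects = {"core_A": 1.0, "core_B": 0.5}

# One hundred hypothetical peripheral genes, each weakly regulating one core gene.
regulation = {
    f"periph_{i}": {("core_A" if i % 2 == 0 else "core_B"): 0.03125}
    for i in range(100)
}

contrib = trait_contributions(core_effects, regulation)
core_total = sum(contrib[g] for g in core_effects)
peripheral_total = sum(contrib[g] for g in regulation)
largest_peripheral = max(contrib[g] for g in regulation)

print(f"core total:         {core_total}")
print(f"peripheral total:   {peripheral_total}")
print(f"largest peripheral: {largest_peripheral}")
```

In this made-up example each peripheral gene contributes far less than either core gene, yet the peripheral genes together contribute more than the core genes do, which is the qualitative point of the model.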
Viewing life through the dialectical unity of material and environment
In the post-genome era, scientists are working to understand the essence of life by treating life as a complex system. M. Kirschner, the founding chair of the Department of Systems Biology at Harvard University in the United States, wrote that it is worth asking to what extent the "post-genomic" view of contemporary biology would convince a 19th-century vitalist that the nature of life is now understood. He then put forward "molecular vitalism" from the perspective of systems biology: at the turn of the 21st century, he argued, we take a last look back at vitalism only to underline the need to move beyond the genomic analysis of the protein and RNA components of the cell (an analysis that will soon be a thing of the past) and to turn to investigating the "vitalistic" properties of molecular, cellular, and organismal function.
Not long ago, Professor Bai Shunong, a biologist at Peking University, and his colleagues proposed a distinctive view of the essence of life, building on the negative entropy of the Austrian physicist E. Schrödinger and the dissipative structure theory of the Belgian chemist I. Prigogine. On this view, the defining characteristic of life is a special interaction among specific components under particular environmental conditions. The interaction arises from the coupling and cycling of two different chemical processes: one is the spontaneous self-organization of biological components into molecular complexes, which consumes Gibbs free energy; the other is the thermodynamic dissociation of those complexes, driven by free energy supplied by the environment. They regard this coupled cyclic process as the first feature that distinguishes living from non-living bodies: metabolism.
It should be noted that Professor Bai Shunong and his colleagues explicitly treat the specific energy-providing environment as the source of life's "vitality". They write that this cyclic process can be defined as the first sign of "being alive" or of "living matter", and that one should not regard the material alone as living but should recognize that the material takes part in a dynamic cycle driven by external energy. This view led the author to a new idea about the nature of life: the environment should be regarded as an indispensable part of what constitutes life. The formation, survival, reproduction, and evolution of life all depend on specific environments.
In the past, the controversy between reductionism and vitalism focused on life itself and gave little consideration to the significance of the environment for life. In fact, there is little point in discussing what life means apart from its environment. Is a virus alive? Before it encounters a suitable host, a virus is merely an assembly of biological macromolecules such as proteins and nucleic acids. Only after it encounters a host can it become a "live" virus, displaying its various characteristics under the specific environmental conditions the host provides. This is the true meaning of Aristotle's "entelechy": the materials that constitute life provide the "potential" of life, and specific environmental conditions allow this potential to be realized. The author calls the environment that allows life's potential to be realized the "vital environment". In genetics, which studies the relationship between genetic material and traits, there is a well-known formula:
phenotype = genotype + environment
Life can also be given a similar definition:
life = biological material + vital environment
This formula offers answers to many long-disputed questions. For example, is the "prion", which causes mad cow disease by inducing abnormal protein conformations, alive? In terms of its constituent material, a prion is nothing more than a simple protein and cannot be called life. However, once it enters the specific "vital environment" of the mammalian brain, its potential to induce abnormal protein conformations is realized, and it becomes a living entity capable of causing disease. To take another example, is a computer virus alive? Although it can replicate itself and "infect" within a computer environment, it has no biological constituent materials such as nucleic acids or proteins, so it cannot be called life.
From these arguments an important philosophical proposition can be derived, concerning the relationship between "existence" and "essence". As far as life is concerned, existence precedes essence: the material that constitutes life can exist stably without showing any life activity. As mentioned above, Professor Venter chemically synthesized the complete nucleic acid sequence of a bacterial genome, but the nucleic acid material of this artificial genome did not in itself show the characteristics of life. Only when the researchers placed it in a "vital environment", a bacterial cell from which the natural genome had been removed, did the artificial genome display the life characteristics of self-replication and metabolic regulation. It can be further deduced that the existence of life can be separated from its essence. For example, cells or individuals stored at low temperature are merely material; they show signs of life again only when revived under suitable conditions, that is, in a "vital environment". In other words, the material that constitutes life is only a necessary condition for life, the specific "vital environment" is the sufficient condition, and neither can be dispensed with.
Data-driven open life science research paradigm
Molecular biology, born in the middle of the 20th century, is founded on reductionism, which holds that life activities follow the basic laws of physics and chemistry. As Schrödinger pointed out in his famous book "What Is Life?", the events that occur within an organism must obey strict physical laws. In other words, in the eyes of molecular biologists, life is a "machine" that operates according to deterministic laws; the researcher's task is to put forward scientific hypotheses and then, through research, to understand and reveal these laws. However, research in the post-genome era has revealed that life is not such a simple deterministic "machine", and the Human Genome Project has given rise to a research paradigm different from hypothesis-driven research: the data-driven research paradigm.
Uncertainty of life science
Life sciences grounded in deterministic reductionism often carry an implicit conviction: given sufficient knowledge and sufficiently accurate information, we can understand and control all life activities and eradicate every disease that afflicts humanity. The enthusiasm of modern life science for determining the three-dimensional structures of biological macromolecules such as nucleic acids and proteins is a prominent expression of this deterministic view. Researchers try to explain the functions and interactions of biological macromolecules with atomic-level precision, and from there to discover the molecular mechanisms at work in the organism. In other words, molecular biology rests on the idea that structure determines function.
However, the biological macromolecules in organisms are enormous in both variety and number. Even a simple single-celled prokaryote such as Escherichia coli contains as many as 2.5 million protein molecules in total, and roughly 30% of the cell volume is occupied by biological macromolecules. Inside the cell, these macromolecules are therefore usually in an extremely crowded and disordered arrangement. More importantly, biological macromolecules such as proteins and nucleic acids are mutually impenetrable and cannot diffuse freely through solution the way small inorganic molecules do, which greatly reduces the space actually accessible to any one macromolecule. This is the "excluded volume effect". The crowded intracellular environment and the excluded volume effect lead to quite complex interactions between biological macromolecules, one of which is called "phase separation". Biological phase separation means that specific proteins, RNA, and other macromolecules can, under certain conditions, organize into "droplets" containing high concentrations of particular molecules, much as oil droplets separate from water. Unlike protein interactions in the traditional sense, proteins capable of phase separation often rely on interactions between "intrinsically disordered regions" (IDRs), which have no defined three-dimensional structure [7].
Biological macromolecules are not only organized in structurally disordered ways; their synthesis is also subject to many random fluctuations. These fluctuations are usually called biological noise. They appear mainly in gene transcription and protein translation: for example, variation in how quickly a promoter switches between active and inactive states during transcription, or differences between the rates of protein synthesis and degradation. Researchers have found that in prokaryotic cells noise has little effect on gene transcription and mainly affects the level of protein synthesis, whereas in eukaryotic cells noise affects not only protein synthesis but also, significantly, the level of gene expression.
One important consequence of biological noise is that gene expression levels and protein synthesis levels are not strongly correlated. In the past, the relationship between the abundance of a gene's transcripts and that of its protein was thought to be linear: the more mRNA copies transcription produces, the more of the corresponding protein there is, and the fewer the former, the fewer the latter. However, quantitative analyses of the transcriptomes and proteomes of different kinds of organisms, such as yeast cells and animal cells, have found no good correlation between mRNA abundance and the abundance of the corresponding protein. A single-molecule study of E. coli found that, under the influence of biological noise, the relationship between the expression of a gene and that of its protein was random, leading the researchers to conclude: "For a given gene, there is no correlation between protein copy number and mRNA copy number in a single cell".
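How random synthesis and degradation events produce cell-to-cell variation can be illustrated with a small stochastic simulation. The sketch below uses the Gillespie algorithm on a textbook two-stage model (mRNA is made and degraded; protein is made from mRNA and degraded). It is not the method of the studies cited above, and every rate constant, the simulated time, and the number of cells are arbitrary assumptions chosen only so the example runs quickly.

```python
import math
import random


def simulate_cell(rng, t_end, k_m=1.0, g_m=0.2, k_p=1.0, g_p=0.02):
    """Gillespie simulation of one cell; returns (mRNA, protein) counts at t_end.

    k_m: mRNA synthesis rate, g_m: mRNA degradation rate per molecule,
    k_p: protein synthesis rate per mRNA, g_p: protein degradation rate per
    molecule. All values are illustrative assumptions.
    """
    t, m, p = 0.0, 0, 0
    while True:
        rates = (k_m, g_m * m, k_p * m, g_p * p)
        total = sum(rates)  # always positive because k_m > 0
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t >= t_end:
            return m, p
        pick = rng.random() * total  # choose a reaction in proportion to its rate
        if pick < rates[0]:
            m += 1
        elif pick < rates[0] + rates[1]:
            m -= 1
        elif pick < rates[0] + rates[1] + rates[2]:
            p += 1
        elif p > 0:  # guard against a floating-point edge case when p == 0
            p -= 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable
cells = [simulate_cell(rng, t_end=150.0) for _ in range(100)]
mrna, protein = zip(*cells)
r = pearson(mrna, protein)

print(f"mean mRNA per cell:    {sum(mrna) / len(mrna):.1f}")
print(f"mean protein per cell: {sum(protein) / len(protein):.1f}")
print(f"mRNA-protein correlation across cells: {r:.2f}")
```

Every simulated cell has identical parameters, so all variation between cells comes from the randomness of individual reaction events. In this kind of model the protein is assumed to live much longer than the mRNA, so a cell's protein count reflects its transcription history rather than its current mRNA count, which is one commonly offered explanation for weak snapshot correlations.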
Random "noise" is usually assumed to harm life, for instance by perturbing important life activities such as gene transcription and protein translation in unpredictable ways. From this point of view, noise is bad for life and should be eliminated. However, a growing number of studies show that noise in living systems is not only hard to eliminate but also has a positive side, often serving important biological functions. For example, biological noise often triggers random mutations during DNA replication, providing raw material for the evolution of life; in cell signal transduction, biological noise can exploit the cell's positive feedback mechanisms to amplify signals and help the cell make decisions. In addition, biological noise not only generates differences between cells but can also maintain and reinforce the individual characteristics of cells and influence the development and growth of the individual. A study published in Nature Methods in 2020 showed that, as mouse bone marrow regulates blood cell development, noise in the expression of transcription factor genes takes part in transitions between cell states and thereby affects the fate of these cells.
Research on biological noise and its effects on life activities is now becoming a scientific hotspot; some researchers even call it "noise biology". This research shows that organisms, as open, nonlinear complex systems, carry various kinds of inherent random noise on the one hand and live in an external environment full of uncertainty on the other. It can be said that the evolution of life on Earth is driven by chance, and that chance is what has allowed life to develop from the simplest prokaryotic cells into today's complex and diverse plants and animals. If the living world were really governed by certainty, the life that exists on Earth today would probably still consist only of simple single-celled organisms like Escherichia coli.
The modern life science promoted by reductionists follows a "hypothesis-driven" research paradigm. For determinists, all things arise and develop according to definite laws, and every cause has its effect; the main goal of life science research is therefore usually to verify or falsify a hypothesis about some causal relationship. The famous American tumor biologist R. Weinberg stated this clearly in an article entitled "Hypothesis First": "In the 20th century, biology changed from a traditional descriptive science into a hypothesis-driven experimental science. Closely tied to this is the dominance of reductionism, namely the idea that complex living systems can be understood by taking them apart into their components and studying these one at a time". Rejecting the deterministic picture of the living world requires researchers to re-examine this "hypothesis-driven" research paradigm.
Data-driven life sciences
The Human Genome Project gave birth to a new research paradigm: data-driven research. A human genome has more than 3 billion base pairs, equivalent to more than 3 GB of data. The individual genome sequences now stored worldwide have reached the scale of a million people. At the same time, genome sequencing has become a basic goal of health and medical research. For example, in 2006 the National Institutes of Health took the lead in launching an international cancer genome project, "The Cancer Genome Atlas" (TCGA). By the end of 2017, the project had analyzed tumor samples from more than 32,000 patients, covering 38 cancer types and their subtypes in 60 tissues or organs, and had detected more than 3.11 million genetic variants, accumulating more than 20 PB (1 PB = 10^15 bytes) of tumor genome data. In addition, many other kinds of omics data, such as transcriptome, proteome, and metabolome data, are being produced in large quantities. Biomedical big data is completely changing the research paradigm of the life sciences and medicine. As the first US strategic report on "precision medicine", published in 2011, pointed out: "The motivation for this study is that molecular data related to the human body, especially data related to individual patients, are growing explosively, and this has created a huge, untapped opportunity: how to use these molecular data to improve human health." UNESCO made a similar prediction in a science report: by 2030, science will not only conduct research on the basis of data, but the basic output of any scientific discovery will also be data. In other words, the post-genome era is an era of big data. Big data has reshaped life science research: researchers can continue to work within the hypothesis-driven paradigm, and they can also work within a new data-driven paradigm.
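The data volumes quoted above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes), counts only the bare sequence of one genome copy, and uses the lower bounds given in the text (3 billion bases, 20 PB); the 2-bit packing is an added assumption for comparison, and real sequencing data with reads and quality scores is much larger than either figure.

```python
# Back-of-the-envelope storage arithmetic using the figures quoted in the text.
BASES_PER_GENOME = 3_000_000_000  # "more than 3 billion base pairs" (lower bound)
BYTES_PER_GB = 10**9              # decimal gigabyte (assumption)
BYTES_PER_PB = 10**15             # 1 PB = 10^15 bytes, as stated in the text

# One byte per base (plain text, one character per nucleotide).
text_bytes = BASES_PER_GENOME * 1
# Two bits per base (A, C, G, T packed four to a byte); an assumed alternative.
packed_bytes = BASES_PER_GENOME * 2 // 8

# One million genomes at one byte per base.
million_genomes_pb = 1_000_000 * text_bytes / BYTES_PER_PB

# How many bare genome sequences would fit in 20 PB.
tcga_bytes = 20 * BYTES_PER_PB
genome_equivalents = tcga_bytes // text_bytes

print(f"one genome, 1 byte/base: {text_bytes / BYTES_PER_GB} GB")
print(f"one genome, 2 bits/base: {packed_bytes / BYTES_PER_GB} GB")
print(f"one million genomes:     {million_genomes_pb} PB")
print(f"20 PB in genome units:   {genome_equivalents:,}")
```

The first figure reproduces the "3 GB per genome" equivalence in the text, which implicitly stores one byte per base.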
The data-driven research paradigm differs fundamentally from the hypothesis-driven one. The first difference is in the "original intent" of the research. Data-driven research does not require a hypothesis and does not take solving a specific scientific problem as its task; its main purpose is to obtain information about the object of study. The Human Genome Project is a typical example: its original intent was to determine the entire nucleotide sequence of the human genome. In traditional life science research, researchers usually fix on a particular gene on the basis of some hypothesis and then study its function or regulatory mechanism in depth. In the Human Genome Project, by contrast, researchers determined the whole genome sequence in order to find the more than 20,000 genes hidden within it. Data-driven research is therefore often called "discovery science".
Although the hypothesis-driven research paradigm has played an important role in the emergence and development of the modern life sciences and has become the mainstream of scientific research, it has an inborn defect. The British philosopher of science A. F. Chalmers, in his famous book "What Is This Thing Called Science?", pointed out that science under the hypothesis-driven paradigm is held to be "derived from the facts". In his view, the key question is how the "facts" are obtained. "One difficulty is that perceptual experience is influenced to some extent by the background and expectations of the observer, so that what is an observable fact for one person need not be so for another. The second difficulty stems from the fact that judgments about the truth of observation statements depend to some extent on what is already known or assumed, which makes the observable facts as fallible as the presuppositions underlying them. Both difficulties suggest that the observable basis of science may not be as straightforward and secure as is widely and traditionally supposed."
Since data-driven research does not rely on hypotheses, it can avoid this kind of subjective selection and judgment of "facts". The American biologist T. Golub stated this clearly in an article entitled "Data First": "Without comprehensive tumor genome data, it will be difficult to distinguish signal from noise. Although hypothesis-driven experimental science remains at the center of the field, unbiased measurement of tumor genomes will provide unprecedented opportunities to generate new ideas". In other words, a data-driven research paradigm can not only avoid researchers' possible subjective biases but also help them discover new knowledge beyond the scope of their hypotheses or of existing theories. The point can be extended further. Classical philosophy of science holds that scientific research must be carried out within a framework built from a series of hypotheses and theories. T. Kuhn called this guiding framework a "paradigm", and I. Lakatos called it a "research programme". One of the strengths of the data-driven research paradigm is that it can free itself from the constraints of the existing research framework.
A second important difference between the data-driven and hypothesis-driven paradigms lies in research strategy. Data-driven research typically decomposes the research goal into several secondary goals and carries out the corresponding work on each; it then improves repeatedly on earlier results, approaching the overall goal step by step over multiple rounds of study. Each such round is called an "iteration". This iterative strategy means that any single piece of work may be incomplete, and that partial or non-optimal interim results are acceptable. Hypothesis-driven research, by contrast, pursues completeness: as far as possible, it aims to answer a scientific question fully, or to prove a scientific hypothesis, in a single piece of work.
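The logic of an iterative strategy can be shown with a deliberately simple loop. In the sketch below, the function name, the idea that each round resolves a fixed fraction of what is still unresolved, and the numbers 0.5 and 0.99 are all assumptions made for illustration; real projects do not progress at a constant rate.

```python
def iterate_toward_goal(resolve_fraction, target, max_rounds=50):
    """Return the coverage reached after each round of a toy iterative project.

    Each round resolves a fixed fraction of whatever is still unresolved,
    and every interim result is kept even though it is incomplete.
    """
    coverage, history = 0.0, []
    for _ in range(max_rounds):
        if coverage >= target:
            break  # good enough: stop iterating
        coverage += (1.0 - coverage) * resolve_fraction
        history.append(coverage)
    return history


history = iterate_toward_goal(resolve_fraction=0.5, target=0.99)
for round_number, coverage in enumerate(history, start=1):
    print(f"round {round_number}: {coverage:.2%} resolved")
```

Every entry in `history` is a usable but incomplete result, and no single round answers the whole question, which is the contrast with one-shot hypothesis testing drawn above.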
Many life science research projects in the post-genome era clearly have this iterative character, and the most representative example is still the Human Genome Project. Although the project's ultimate goal was to reveal the entire nucleotide sequence of the human genome, the milestone results published in February 2001 were only a "draft" covering 90% of the genome's nucleotide sequence. In April 2003, the International Human Genome Sequencing Consortium officially announced the completion of the full map of the human genome, yet the corresponding paper, published in the weekly journal "Nature" in October 2004, reported only about 99% of the nucleotide sequence of the euchromatic regions. Many highly repetitive regions of the human genome (such as centromeres) therefore remained undetermined. In September 2020, researchers finally published in "Nature" the first complete sequence of a human X chromosome with no sequencing gaps (leaving 22 chromosomes still to be completed). Not long ago, researchers proposed the "Human Cell Atlas" (HCA), a project even more ambitious than the Human Genome Project. Its basic goal is to identify, through specific molecular expression profiles, all the cell types among the 40 to 60 trillion cells of the human body, and its main research strategy is likewise iterative.
The iterative model of the data-driven research paradigm belongs neither to the "empirical research" of "observation, induction, verification" nor to the "falsification research" of "problem, conjecture, refutation". Its results can be neither confirmed nor falsified. Throughout the process of data iteration, the results of any single study are neither decisive nor complete. For example, the "full map" of the human genome published in 2004 neither confirms nor refutes the "draft" published in 2001. More importantly, the data-driven research paradigm, as a "discovery science" that goes beyond existing theoretical frameworks, does not use traditional inductive methods to pursue causal relationships between things; instead it uses algorithms and models to explore correlations within the data. It can be argued that the data-driven paradigm overcomes the hypothesis-driven paradigm's reliance on determinism and causality, and thereby forms a new epistemological system of open research.