Tuesday, 13 June 2017

The Scientific Method


If you find what follows condescending, even insulting - well, in 35 years of university-level research and teaching in science I have met only one person among all my colleagues and students at Newcastle University able to hold their head high in this matter. I refer to the scientific method, which we all think we use in our research. I know there are many examples of beautiful experiments. The careful and inspirational experiments of Richard Lenski and colleagues with Escherichia coli, and those of John Endler and colleagues with guppies, come to mind (described by Dawkins, 2009, pp. 116-139). But, in my experience, these are exceptions to the great groundswell of misunderstanding of, or even scorn for, the method. In his essay Induction and Intuition in Scientific Thought, Peter Medawar makes the same point with the eloquence we expect of him (Medawar, 1984, p. 78), but concludes that the method dwells principally in the minds of philosophers. My contention is that this is mistaken. I believe that a conscious knowledge and application of the method greatly eases a scientific investigation.

So, how did I escape the general malaise over the scientific method? Many years ago, while browsing in the Newcastle University library, I came across Paul Weisz's textbook for first-year biology students (Weisz, 1959). Its final chapter, on the scientific method, had an immediate resonance for me. I supplemented what I learned from Paul Weisz with the excellent book by Beveridge (1950), given to me by my father-in-law. An appreciation of the method does not come easily but demands thought and practice. So why is the method not part of all undergraduate courses in science? Why did my own realisation have to wait some 20 years after my PhD? Part of the answer may be that many investigations do not actually benefit from the application of the method. It is true that the method comes into its own only with the design of experiments. Some subjects, for example ecology, a traditionally descriptive science, largely escape the need for experimental rigour. Consider preliminary ecological investigations asking, as in my case, what is living in the mud of the newly flooded Lakes Kariba and Chilwa in tropical Africa (McLachlan, 1974, 1979). This investigation could have been framed as a hypothesis such as 'mud samples will reveal something interesting about the colonisation of a newly flooded lake', but to no great benefit. Later I became interested in mating systems, specifically those of chironomid midges, whose larvae inhabit the mud of lakes in great numbers. It was in studying the mating systems of these midges that I was to learn the value of the method.

Why did I find colleagues and students so uninterested in, even hostile to, this indispensable tool? Since discovering Paul Weisz I have made the four steps of the scientific method (observation, hypothesis, prediction, test) the subject of laboratory classes and field courses from the first year on. Examples of the application of the protocol are given below. I cannot say I recall any enthusiasm for this approach from students. Not infrequently, when setting out on a final-year research project, I met with the question, "Dr McLachlan - do we really have to do all that scientific method stuff?" But I know from personal experience how difficult it often is to get started on a problem without the conscious effort required to create testable, falsifiable hypotheses. To illustrate the point, what follows takes the form of an undergraduate lecture. I consider two famous cases of scientific investigation from history and one from my own research. All three suffer from flawed methods. My hope is that these examples will be found instructive. I do not dwell here on the philosophy of the hypothetico-deductive method that underlies the scientific method. Karl Popper's axiom that any hypothesis must be falsifiable is taken as given (Popper, 1959 edition). A proper experiment requires one or more treatments designed to test an hypothesis. A control, which omits the treatment, and a treatment control, which monitors side effects of the treatment procedure itself, are both mandatory.

First, consider a typical experiment from somewhere around 1500 AD addressing the apparently inexplicable accuracy achieved by rifled firearms (Trench, 1972, pp. 107-108; Calabi, Helsley and Sanger, 2013, p. 46). This example captures the spirit of experiments conducted around that time. Here an hypothesis was created to explain a widely made observation concerning accuracy. The hypothesis was that the devil sits astride the ball fired from a rifled firearm to guide it. To test the hypothesis, a set of experimental balls was deeply engraved with the sign of the cross. The assumption here is that the sign of the cross would repel the devil, leading to the prediction that the experimental balls would no longer fly true. Control was provided by an equal number of balls, identical to the experimental set but not treated in any way. The devil would therefore be allowed access to these control balls, which were predicted to fly true. That is exactly what the results showed: the control balls, and only those, flew true to the target. All well and good. Here we have an elegant, controlled experiment. It is perhaps not immediately easy to see the flaw, but there is one: a treatment control is lacking. What is needed is a third set of balls, deeply engraved with a similar sign other than the cross. And indeed, musket balls in such a treatment control were subsequently shown to fly erratically, just like the experimental balls, leading to the rejection of the devil hypothesis. The unbalancing effect of engraving musket balls is quite sufficient on its own to cause erratic flight. The devil is not required.

Here is another example from history. Aristotle postulated an idea called spontaneous generation, which purported to explain that abiding mystery, the origin of life. This hypothesis is readily tested by experiment. It was widely known that flies emerge from rotting meat, evidence, it seemed, of the spontaneous generation of life in the form of flies. To test the hypothesis of spontaneous generation, rotting meat was placed in a bottle screened to eliminate the possibility of flies arriving from outside. The control was a bottle with decomposing meat, but left unscreened. The prediction here is that flies would appear in both experiment and control, a prediction fully borne out by the result. Flies appeared in all bottles, even when screening denied egg-laying flies access from outside, thus apparently confirming the hypothesis of spontaneous generation. But to ensure the absence of organisms at the outset, a better experimental design would require the initial sterilisation of the meat, eliminating the possibility that flies had laid eggs before the start of the experiment. It took over a hundred years for the compelling idea of spontaneous generation to be finally laid to rest by the careful experiments of Louis Pasteur, using heat sterilisation of experimental treatments (Fenchel, 2002). Under a heat treatment regime, flies appeared only in the unscreened containers. At last the persistent hypothesis of spontaneous generation was demolished. There are many variants of both these classical experiments.

To give an example from my own work: my collaborators and I spent many years studying the extraordinary fly larvae inhabiting rain pools in tropical Africa (McLachlan and Ladle, 2001). There are interesting problems of adaptation associated with such extremely ephemeral habitats. One is that each pool harbours essentially one, and only one, species, a situation that would appear to persist over geological time. This is a surprising and rare situation for an ecologist to encounter. I wanted to know whether a pool would always harbour the same species. The experiments involved a transplant, that is, the removal of all inhabitants from a pool, followed either by their replacement with a species never before encountered there, or by refilling the pool with tap water and awaiting natural invasion by ovipositing females. These experiments were rather vague. While they posed some interesting further questions (McLachlan, 1985; McLachlan and Cantrell, 1980), they would have benefited from a more formal experimental design. I would do it differently now.

A formal experimental design to test one possible hypothesis might look like this:
Observation: Each pool is inhabited by the larvae of a single species, always the same one. This is an extraordinary situation worthy of attention.
Hypothesis: The species present is determined by chemical characteristics of the pool water conditioned by previous populations of larvae.
Prediction: Conditioned water will determine the species invading a pool, so pools refilled with their own filtered water should be invaded by the same species as before.
Test:
    Experiment: Remove all water and occupants from a set of replicate pools. Filter the water and return it to the pools. Leave to allow oviposition by females.
    Control: Filter fresh rain water and add it to a second replicate set of pools after removal of the original water and occupants. Leave to allow oviposition by females.
    Treatment control: Add unfiltered fresh rain water to a third replicate set of pools after removal of the original water and occupants. Leave to allow oviposition by females.
A series of further experiments is required to test other specific hypotheses.

 


The final point I wish to make concerns the vexing matter of the reproducibility of experiments carried out by different people. Here we have an unresolved difficulty at the heart of the scientific method, one that much occupies the minds of the scientific community at present.

The situation has been brought to prominence by some high-profile cases, for example that of the physicist Jan Hendrik Schön (Reich, 2009). Schön was an astonishingly prolific innovator, but no one could repeat his experiments. He claimed this was not his fault, but that others lacked the skill necessary to succeed. And this is indeed the crux of the matter. The scientific method is not like a recipe for making a cake that anyone can follow to a successful outcome. It is an art, more like poetry or painting (Medawar, 1984). If no one can reproduce the Mona Lisa, should we conclude that Leonardo da Vinci was a fraud? Furthermore, the Methods section of a scientific paper rarely contains all the minutiae that could bring an independent repetition closer to the original. The cartoon above by Gary Larson encapsulates the difficulty perfectly: the victim's inquisitors are never going to make fire with the method provided. Such difficulties worry us, as shown by the frequent appearance of papers concerning replication (Baker, 2016; Editorial, 2015, 2016a, 2016b; Kneebone, Schlegel and Spivey, 2019; Nuzzo, 2015; Reardon, 2016; Sarewitz, 2015). The list shows no sign of easing off. Indeed, quite the reverse: it is accelerating.

References

Baker, M. (2016). Statisticians issue warning on P values. Statement aimed to halt missteps in the quest for certainty. Nature, 531, 151.
Beveridge, W. I. B. (1950). The Art of Scientific Investigation. William Heinemann Ltd, London.
Calabi, S., Helsley, S., and Sanger, R. (2013). The Gun Book for Girls. Shooting Sportsman Books.
Dawkins, R. (2009). The Greatest Show on Earth. Bantam Press, London.
Editorial. (2015). It's good to talk. Nature, 523, 382.
Editorial. (2016a). Repetitive flaws. Strict guidelines to improve reproducibility of experiments are a welcome move. Nature, 529, 256.
Editorial. (2016b). Reproducibility call. Nature, 529, 261.
Fenchel, T. (2002). The Origin and Early Evolution of Life. Oxford: Oxford University Press.
Kneebone, R., Schlegel, C. and Spivey, A. (2019). Science in hand: how craft informs lab work. Nature, 564, 188-189. 
McLachlan, A. J. (1974). Development of Some Lake Ecosystems in Tropical Africa, with Special Reference to the Invertebrates. Biological Reviews, 49, 365-397.
McLachlan, A. J. (1979). Decline and Recovery of the Benthic Invertebrate Communities. In M. Kalk, A. J. McLachlan and C. Howard-Williams (Eds.), Lake Chilwa. Studies of Change in a Tropical Ecosystem. London: W. Junk Publishers.
McLachlan, A. J. (1985). What determines the species present in a rain-pool? Oikos, 45, 1-7.
McLachlan, A. J., and Cantrell, M. A. (1980). Survival Strategies in Tropical Rain Pools. Oecologia, 47, 344-351.
McLachlan, A. J., and Ladle, R. (2001). Life in the puddle: behavioural and life-cycle adaptations in the Diptera of tropical rain pools. Biological Reviews, 76, 377-388.
Medawar, P. (1984). Pluto's Republic. Oxford: Oxford University Press.
Nuzzo, R. (2015). Fooling ourselves. Nature, 526, 182-185.
Popper, K. (1959 edition). The Logic of Scientific Discovery. Routledge Classics.
Reardon, S. (2016). A mouse's home may ruin studies. Environmental factors lie behind many irreproducible rodent experiments. Nature, 530, 264.
Reich, E. S. (2009). Plastic Fantastic. How the Biggest Fraud in Physics Shook the Scientific World. New York: Palgrave MacMillan.
Sarewitz, D. (2015). Reproducibility will not cure what ails science. Nature, 525, 159.
Trench, C. C. (1972). A History of Marksmanship. Norwich: Longman.
Weisz, P. B. (1959). The Science of Biology. (2nd ed.). New York: McGraw-Hill.