The Crowd and the Cosmos: Adventures in the Zooniverse


  lab experiments I had to set up a computer simulation. Using a

  few simple equations (and someone else’s code) it was possible

  to set up an arrangement where light was assumed to have

  emerged from a young star of a particular size, mass, age, and

  brightness, to scatter from a first, dense cocoon of dust, becom-

  ing linearly polarized in the process. Then, we want the light in

  our computer simulation to encounter a second dusty struc-

  ture—perhaps a disc—and to calculate the degree of circular

  polarization resulting from this second scattering. It’s at this

  point we have to make choices. We have to decide on the size of

  the disc, and its shape. The strong and tempestuous winds blow-

  How Science Is Done 25

  ing from the young star have likely cleared all the material away

  from its immediate surroundings, producing a gap at the centre

  of the disc, so we need to decide on the size of this region too.

  The dust grains themselves need a composition—are they car-

  bon, or silicon, or a mix of the two?—and we need to work out

  whether they have ice or whether they are bare. They need a

  shape. Are they round? Or needle shaped? If the latter, how elon-

  gated are they? That matters because nicely needled grains can

  become aligned in the presence of a magnetic field, and such

  alignments may lead to further polarization. How strong is the

  magnetic field? These and many other questions need answering

  if we are to make progress, and we’re already a long way from the

  simple plant experiment that could be reduced to a single test.

  We can, if we choose, run a large number of simulations. Each

  time, we can keep almost all of the different parameters fixed,

  altering only one thing for each run. That might work here, for

  even complicated astronomical questions reduce to reasonably

  small sets of equations and variables, but for more complex sys-

  tems this approach will break down. If you’ve ever been frus-

  trated by a weather forecast, then one of the reasons is that even

  with some of the most powerful supercomputers in the world it’s

  simply not possible to build a model of the Earth’s atmosphere

  accurate enough to account for everything that observations tell

  us must be happening in this very complicated system. In the

  case of our light-scattering dust disc, we also have to deal with

  the opposite problem of creating a model so complicated that it

  can explain pretty much any set of observations.
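
  To see why changing one thing at a time stays manageable while a full exploration does not, here is a rough sketch that simply counts the runs each strategy needs. The parameter names and values are hypothetical placeholders loosely based on the choices listed above, not the settings of any real light-scattering code.

```python
from itertools import product

# Hypothetical parameter choices for the dusty-disc simulation (illustrative only).
grid = {
    "disc_radius_au": [50, 100, 200],
    "gap_radius_au": [1, 5, 10],
    "grain_shape": ["round", "needle"],
    "grain_coating": ["bare", "icy"],
    "field_strength": [0.0, 0.5, 1.0],
}

# The baseline run uses the first value listed for every parameter.
baseline = {name: values[0] for name, values in grid.items()}


def one_at_a_time(grid, baseline):
    """Baseline run plus one extra run per alternative value of each parameter."""
    runs = [dict(baseline)]
    for name, values in grid.items():
        for value in values:
            if value != baseline[name]:
                runs.append({**baseline, name: value})
    return runs


def full_grid(grid):
    """Every combination of every parameter value."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*grid.values())]


oat_runs = one_at_a_time(grid, baseline)
all_runs = full_grid(grid)
print(len(oat_runs), "runs changing one thing at a time")  # 9
print(len(all_runs), "runs for the full grid")             # 108
```

  The one-at-a-time count grows by addition as parameters are added, while the full grid grows by multiplication, which is one reason the approach breaks down for more complex systems.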

  This phenomenon, known as over-fitting, is a serious worry in

  cases where our ability to think of variables to fiddle with far out-

  strips our ability to gather observations to test the worlds created

  inside our computers. The starkest example in the astronomical

  world is in the argument, now twenty years old, about how to

  build a computer model large enough and detailed enough to

  allow us to study the evolution of large-scale structure in the

  Universe. Building such a cosmological model from first prin-

  ciples, pinpointing and tracking the position of every atom

  within a cosmologically significant volume of space, is for all

  intents and purposes impossible. Yet we don’t in most cases have

  the luxury of treating the galaxies as simple point particles, inter-

  acting only via gravity, because to compare the results of a model

  to the real Universe requires including messy phenomena like

  star formation (what cosmologists like to call, somewhat dismis-

  sively, ‘gastrophysics’) which depend on the behaviour of indi-

  vidual atoms. A computer model, no matter how beautiful, will

  fail to match what we can see if it can’t predict the formation of

  the stars whose light we observe and so, instead of building a

  simulation that would require a computer the size of the Solar

  System, there is a whole industrial complex of scientists spend-

  ing their careers building what are called semi-analytic models.

  The game here is to guess at a set of simple rules that match,

  even while they don’t explain, the behaviour of the system being

  studied. Maybe a galaxy converts 10 per cent of its gas to stars

  every billion years. Maybe it’s 5 per cent, or 2 per cent. Maybe it is 10 per cent after all, but the process occurs only when the

  galaxy has more than ten billion solar masses of gas on hand. Or

  maybe when it has more than a billion. Maybe a galaxy can con-

  vert a certain percentage of its mass to stars, but after 500 million years activity associated with gas falling into the black hole at the galaxy’s centre heats up the gas and prevents star formation. Or

  maybe that happens after a billion years, not 500 million.
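
  A toy version of such a rule set, written out, might look like the sketch below. The function name, its arguments, the time step, and the starting gas mass are all my own illustrative assumptions; real semi-analytic models are far more elaborate.

```python
def evolve_galaxy(gas, efficiency=0.10, threshold=0.0, quench_after=None,
                  total_gyr=5.0, dt=0.1):
    """Toy rule-based galaxy (hypothetical, for illustration only).

    gas          -- initial gas mass in solar masses
    efficiency   -- fraction of the gas turned into stars per billion years
    threshold    -- stars form only while the gas mass exceeds this value
    quench_after -- time in Gyr after which star formation is switched off
    Returns the remaining gas mass and the mass of stars formed.
    """
    stars = 0.0
    for step in range(round(total_gyr / dt)):
        elapsed = step * dt
        if quench_after is not None and elapsed >= quench_after:
            break  # black-hole heating has shut star formation down
        if gas > threshold:
            formed = efficiency * dt * gas
            gas -= formed
            stars += formed
    return gas, stars


# Each variant is one 'maybe' from the text: same galaxy, different rules.
variants = {
    "10 per cent per Gyr": {},
    "5 per cent per Gyr": {"efficiency": 0.05},
    "needs 1e10 solar masses of gas": {"threshold": 1e10},
    "quenched after 0.5 Gyr": {"quench_after": 0.5},
    "quenched after 1 Gyr": {"quench_after": 1.0},
}
for label, rules in variants.items():
    gas_left, stars = evolve_galaxy(1e11, **rules)
    print(f"{label}: {stars:.3g} solar masses of stars")
```

  Every keyword argument here is a knob, and each new rule adds at least one more.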

  With each additional complication, both the list of rules and

  the list of things that can be altered to provide a better fit to the observations grow. What starts as a simple set of rules quickly

  becomes a long list of variables, of parameters that can be tweaked

  to match the computer model to the real Universe. Need more star

  formation? Turn the knob on the left. Need your galaxies to stop

  growing earlier in the Universe’s history? Push the red button.

  I’m being slightly unfair, and I think most would agree that

  semi-analytic models do a good job of accounting for the obser-

  vations of the Universe we have today (there are a few interesting

  exceptions, as we’ll see in Chapter 2), but deciding when to add

  more complexity to the model is a difficult problem. As you

  make what started out as simple rules more complex, then you

  should, almost by definition, always do a better job of matching

  to any given set of observations without necessarily gaining any

  new insights. This kind of work, where the skills needed involve

  deep statistical insight and a good gut feeling for the status of

  your model, is a long way from the science fair vision of a unify-

  ing scientific method with a single hypothesis being tested by a

  single experiment. The best I can do in writing down a simple

  hypothesis for a semi-analytic model of galaxy formation is

  something like ‘There exists a model I can make from rules which

  explains the observations we have of the large-scale structure of

  the Universe’, which is hardly satisfying. It’s a long way from

  what I’d actually choose in studying whether light is sufficiently

  polarized around a single young star to influence the chemistry.

  I think that the process being followed here is so different from

  science fair procedure that you can think of the computer mod-

  elling that’s become increasingly important in lots of sciences as

  a whole new way of doing science.
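
  The claim that a more complex model will always match a given set of observations at least as well can be checked with a toy example. The sketch below is not a galaxy model: it fits made-up noisy data with a step function, doubling the number of free parameters each time, and compares the error on the fitted data with the error on fresh data drawn from the same hypothetical relationship. The data, the noise level, and the step-function model are all assumptions chosen for simplicity.

```python
import random


def fit_bins(xs, ys, k):
    """Step-function model: sort the data, split it into k equal-sized
    bins, and use one free parameter (the mean of y) per bin."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // k
    edges, means = [], []
    for i in range(k):
        chunk = pairs[i * size:(i + 1) * size]
        edges.append(chunk[0][0])  # smallest x in this bin
        means.append(sum(y for _, y in chunk) / len(chunk))
    return edges, means


def predict(model, x):
    """Return the mean of the last bin whose lower edge is at or below x."""
    edges, means = model
    value = means[0]
    for edge, mean in zip(edges, means):
        if x >= edge:
            value = mean
    return value


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(rng, n):
    """Made-up 'observations': a straight line plus Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_data(rng, 16)    # the observations we fit
fresh_x, fresh_y = make_data(rng, 200)   # new observations, never fitted

train_errors, fresh_errors = [], []
for k in (1, 2, 4, 8, 16):
    model = fit_bins(train_x, train_y, k)
    train_errors.append(mean_squared_error(model, train_x, train_y))
    fresh_errors.append(mean_squared_error(model, fresh_x, fresh_y))
    print(f"{k:2d} parameters: fitted error {train_errors[-1]:.3f}, "
          f"fresh error {fresh_errors[-1]:.3f}")
```

  Because each set of bins subdivides the previous one, the error on the fitted data can never go up, and with sixteen parameters for sixteen points it reaches zero. The error on the fresh data carries no such guarantee, which is the over-fitting worry in miniature.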

  I imagine a band of stereotypical scientists. One sits, dressed

  perhaps in a Greek toga or covered in chalk at a blackboard, scrib-

  bling equations before writing QED in big letters under some

  world-shattering conclusion. They’re a theorist, looking for the

  mathematical underpinnings of the Universe. A second, wearing

  a lab coat, is surrounded by bubbling test tubes and complex

  glassware. Their life is spent weighing things, in adding this to

  that and occasionally putting the resulting compounds into

  machines that go ‘beep’ and which spit out graphs. They’re an

  experimenter, testing the theories the other comes up with.

  To this motley crew I think we should add a third character.

  They sit in a darkened room in front of a desk with four or five

  computer screens on it. Green numbers scroll upwards on at

  least one of the screens, and they type in a staccato fashion, caus-

  ing a complex three-dimensional visualization of something to

  rotate on yet another screen. They are a computational scientist,

  and modern science needs them as much as it does the other

  two. (It also needs them to talk to the others, which is perhaps a

  much harder problem. But that’s another story.)

  Understanding this change is key to following some of the

  most high-profile scientific debates of the moment. Our inability

  to model each atom of the Earth’s atmosphere means that belief

  in the reality of climate change essentially relies on a prediction

  from a semi-analytic model of the Earth’s atmosphere; every

  time you hear someone claiming that the science of climate

  change is falsified by the cooling of part of the Antarctic Ocean,

  or by an exceptionally cold winter they’re enduring, then you’re

  hearing confusion about how these categories of scientific

  thought are interacting.

  This picture isn’t yet complete. Computer models, though

  they produce worlds which can be explored, observed, and

  experimented upon, are really a way of doing theory that suits

  our digital age. The equivalent observational mode lies in the

  freeform exploration of large data sets. Take the Sloan Digital

  Sky Survey, for example. In some sense it was a traditional

  experiment, with the goal of plotting accurately the positions of

  galaxies and thus measuring the expansion of the Universe. Yet

  if you go to the survey website, for each galaxy caught in its gaze

  you can download maybe a hundred pieces of information.

  These include sizes, shapes, colours, and brightnesses, and

  plenty more can be deduced about each system. Is it a member

  of a cluster? Has it recently interacted with a neighbour? Is its

  massive central black hole actively feeding on gas, dust, and

  stars? We can force these questions into ‘traditional’ experi-

  ments, or we can start not with a hypothesis, but by looking in

  the data for correlations, discovering for example that the most

  massive galaxies are reddest or that feeding black holes are bad

  news for star formation. This mode of discovery could be

  uniquely powerful. Done right, it holds out the promise of not

  only providing answers to our questions but of guiding us to the

  right questions in the first place.

  This is the kind of promise that gets magazine articles and

  even books written, and data-driven discovery was labelled the

  ‘fourth paradigm’ of scientific discovery as far back as 2009, in a

  collection of essays under that title published by Microsoft

  Research to commemorate the life of pioneering computer sci-

  entist Jim Gray. The twin ideas of data exploration and ‘big data’

  have attracted plenty of hype, but they are useful in illustrating

  quite how science is changing.

  Imagine, for example, that you’re an astronomer at the turn

  of the nineteenth and twentieth centuries, interested in stars.

  Through careful observation, your colleagues have assembled a

  catalogue of observations of many of the brightest stars in the

  sky. Despite their diligent work, there’s not much to go on. Look

  carefully at the night sky with the naked eye or with a small pair

  of binoculars, and it is easy to see that stars have different col-

  ours. Try, for example, looking at the two brightest stars in the

  easily recognized constellation of Orion, Betelgeux and Rigel.

  While Rigel is blue or white, Betelgeux, an enormous star which

  would engulf Jupiter were it placed in the centre of our Solar

  System, appears orange or even red to the naked eye. As well as

  colour, we can easily measure the apparent brightness of the

  stars.

  The breakthrough came when astronomers realized they

  could use a variety of methods to measure distances to at least

  the nearest stars. One simple method relies on an apparent

  shift—a parallax—in the position of a star relative to a more dis-

  tant background as the Earth moves around its orbit, just as you

  can make a finger held in front of your face at arm’s length jump

  from side to side by looking at it first through one eye and then

  another. What measurements like these allowed for the first

  time was the conversion of the apparent brightness of a star—

  how bright it appears to be—into an intrinsic luminosity which

  reflects how powerful the stars actually are. So with colour, and

  luminosity, we have a data set we can explore.
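
  The conversion from apparent brightness to luminosity can be sketched with the standard textbook relations: a star's distance in parsecs is one divided by its parallax in arcseconds, and its absolute magnitude follows from its apparent magnitude and distance. The example star is hypothetical, the function names are mine, and the Sun's absolute visual magnitude is taken to be roughly 4.83.

```python
import math

SUN_ABSOLUTE_MAGNITUDE = 4.83  # approximate visual value, assumed here


def distance_parsecs(parallax_arcsec):
    """Distance from parallax: d (parsecs) = 1 / p (arcseconds)."""
    return 1.0 / parallax_arcsec


def absolute_magnitude(apparent_magnitude, distance_pc):
    """Brightness the star would have if moved to a distance of 10 parsecs."""
    return apparent_magnitude - 5.0 * math.log10(distance_pc) + 5.0


def luminosity_in_suns(abs_magnitude):
    """Luminosity relative to the Sun; five magnitudes is a factor of 100."""
    return 10.0 ** (0.4 * (SUN_ABSOLUTE_MAGNITUDE - abs_magnitude))


# Hypothetical star: parallax 0.05 arcseconds, apparent magnitude 1.0.
d = distance_parsecs(0.05)
M = absolute_magnitude(1.0, d)
L = luminosity_in_suns(M)
print(f"distance {d:.0f} pc, absolute magnitude {M:.2f}, "
      f"luminosity {L:.1f} times the Sun's")
```

  With a luminosity in hand for each star, all that is left is to plot it against colour.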

  Perhaps there’s a relationship between the two. In fact, if you plot

  luminosity against colour on what’s now called the Hertzsprung–

  Russell diagram after two of the first scientists to do this system-

  atically, you find that many stars lie on a rough line, known as the

  main sequence. Stars which are bluer tend to be more luminous.

  Those which are red tend to be less luminous, with the Sun sit-

  ting on the main sequence somewhere between the two. Once

  you realize that the colour of a star reflects its temperature this

  makes more sense; a blue star like Bellatrix at the shoulder of Orion,

  thousands of times more luminous than the Sun, has a surface

  temperature of about 22,000 degrees Celsius—pretty hot, espe-

  cially compared to the Sun’s 6,000 degrees. On the other hand,

  some of the coolest stars known, puny brown dwarfs, can have

  surface temperatures which are mild even compared to room

  temperature (Plate 4).

  That this relationship exists therefore reveals that the source of

  a star’s luminosity must also be responsible for setting its

  temperature, but more importantly the fact that the main sequence

  exists at all reveals that the stars that lie upon it must share a

  source of power. In fact, all stars on the main sequence are fusing

  hydrogen together in their cores to form helium, releasing energy

  in the process, and those which do not lie on the sequence are

  either protostars still in the process of getting to the point where

  they can sustain this sort of stable nuclear fusion, or else those

  which have graduated to other sources of energy, such as the

  fusion of helium into other, heavier elements. In this discovery

  from more than a hundred years ago, there is clear evidence of

  the fourth paradigm at work, as the exploration of stellar data

  pointed researchers in the direction of the correct theory for stel-

  lar fusion. Of course, the full story of how astronomers came to

  understand how stars are fuelled is more interesting and complicated

  than the simple version given above, and worthy of a book

  in its own right. What is important for my purposes is that the

  discovery of the main sequence provided powerful support for

  the idea of a single energy source for stars at very different tem-

  peratures and with very different histories.

  These days, astronomers studying stars have much more

  information at their fingertips. Most of the objects captured by

  the Sloan Digital Sky Survey were not galaxies at all, but stars,

  and a data set with hundreds of pieces of information about each

  and every one of them is available to researchers worldwide. This

  rich resource, and those from more targeted surveys, open up

  the prospect of new insights into the processes of stellar evolu-

  tion, but they also make the challenge of data-driven science

  apparent. We know, because of the work of Hertzsprung, Russell,

  and a century of astrophysics, that the ‘right’ thing to do is to plot temperature (or its proxy, colour) against luminosity. Coming in

  blind, that’s not so obvious; Alex Szalay at Johns Hopkins, a bril-

  liant collaborator and a man responsible for much of the data

  processing that sits behind the Sloan Digital Sky Survey’s power,

  ran an entire research programme with the sole aim of redis-

  covering the Hertzsprung–Russell diagram among this data. The

  catch was that Alex’s group wanted to do so with their hands off

  the wheel, trusting in automated searches to identify the signal

  among the noise. Trying to discover the cutting-edge science of

  yesteryear among the modern data deluge sounds like a fool’s

  errand, but it’s surprisingly tough, emphasizing that new tech-

  niques are critical if we’re to make the most of the data that we

  have.

  And what a lot of data it is. Sloan seemed overwhelming to

  astronomers a few years ago, but what’s coming down the pipe is

  truly scary. I had my first glimpse of this future a few years ago

  shortly after walking onto the pitch of the University of Arizona’s

  football stadium. College football is, in Arizona as in much of the

  US, something of a big deal, and the stadium is impressive,

  immaculately tended and seating more than 50,000 fans of the

  Wildcats. Its real beauty, though, lies underneath the stands,