lab experiments I had to set up a computer simulation. Using a
few simple equations (and someone else’s code) it was possible
to set up an arrangement where light was assumed to have
emerged from a young star of a particular size, mass, age, and
brightness, to scatter from a first, dense cocoon of dust, becom-
ing linearly polarized in the process. The light in our computer
simulation then encounters a second dusty structure—perhaps
a disc—and we calculate the degree of circular polarization
resulting from this second scattering. It’s at this
point we have to make choices. We have to decide on the size of
the disc, and its shape. The strong and tempestuous winds blow-
ing from the young star have likely cleared all the material away
from its immediate surroundings, producing a gap at the centre
of the disc, so we need to decide on the size of this region too.
The dust grains themselves need a composition—are they car-
bon, or silicon, or a mix of the two?—and we need to work out
whether they are coated in ice or bare. They need a
shape. Are they round? Or needle shaped? If the latter, how elon-
gated are they? That matters because nicely needled grains can
become aligned in the presence of a magnetic field, and such
alignments may lead to further polarization. How strong is the
magnetic field? These and many other questions need answering
if we are to make progress, and we’re already a long way from the
simple plant experiment that could be reduced to a single test.
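To give a flavour of how quickly these choices stack up, here is a minimal sketch in Python. Every parameter name and value below is an invented stand-in, not the radiative-transfer code referred to above, and the toy 'model' exists only so that the one-run-per-change strategy discussed in the next paragraph has something to loop over.

```python
# Hypothetical parameters for the dust-scattering set-up; names and values
# are illustrative stand-ins, not those of any real radiative-transfer code.
baseline = {
    "disc_radius_au": 200.0,          # outer size of the dusty disc
    "inner_gap_au": 20.0,             # central region cleared by stellar winds
    "grain_composition": "silicate",  # or "carbon", or some mix of the two
    "ice_mantle": False,              # bare grains versus ice-coated grains
    "grain_elongation": 1.0,          # 1.0 = round, larger = needle-shaped
    "b_field_strength_uG": 10.0,      # field strength aligning elongated grains
}

def run_scattering_model(params):
    """Stand-in for the real simulation: returns a made-up degree of
    circular polarization so the sweep below has something to report."""
    shape_factor = params["grain_elongation"]
    coating_factor = 1.0 if params["ice_mantle"] else 0.5
    return 0.01 * shape_factor * coating_factor

# Vary one thing per run, holding everything else at its baseline value.
sweeps = {
    "grain_elongation": [1.0, 2.0, 5.0],
    "inner_gap_au": [5.0, 20.0, 50.0],
}

for name, values in sweeps.items():
    for value in values:
        params = dict(baseline, **{name: value})
        result = run_scattering_model(params)
        print(f"{name} = {value}: circular polarization ~ {result:.4f}")
```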
We can, if we choose, run a large number of simulations. Each
time, we can keep almost all of the different parameters fixed,
altering only one thing for each run. That might work here, for
even complicated astronomical questions reduce to reasonably
small sets of equations and variables, but for more complex sys-
tems this approach will break down. If you’ve ever been frus-
trated by a weather forecast, then one of the reasons is that even
with some of the most powerful supercomputers in the world it’s
simply not possible to build a model of the Earth’s atmosphere
accurate enough to account for everything that observations tell
us must be happening in this very complicated system. In the
case of our light-scattering dust disc, we also have to deal with
the opposite problem of creating a model so complicated that it
can explain pretty much any set of observations.
This phenomenon, known as over-fitting, is a serious worry in
cases where our ability to think of variables to fiddle with far out-
strips our ability to gather observations to test the worlds created
inside our computers. The starkest example in the astronomical
world is in the argument, now twenty years old, about how to
build a computer model large enough and detailed enough to
allow us to study the evolution of large-scale structure in the
Universe. Building such a cosmological model from first prin-
ciples, pinpointing and tracking the position of every atom
within a cosmologically significant volume of space, is for all
intents and purposes impossible. Yet we don’t in most cases have
the luxury of treating the galaxies as simple point particles, inter-
acting only via gravity, because to compare the results of a model
to the real Universe requires including messy phenomena like
star formation (what cosmologists like to call, somewhat dismis-
sively, ‘gastrophysics’) which depend on the behaviour of indi-
vidual atoms. A computer model, no matter how beautiful, will
fail to match what we can see if it can’t predict the formation of
the stars whose light we observe and so, instead of building a
simulation that would require a computer the size of the Solar
System, there is a whole industrial complex of scientists spend-
ing their careers building what are called semi-analytic models.
The game here is to guess at a set of simple rules that match,
even while they don’t explain, the behaviour of the system being
studied. Maybe a galaxy converts 10 per cent of its gas to stars
every billion years. Maybe it’s 5 per cent, or 2 per cent. Maybe it is 10 per cent after all, but the process occurs only when the
galaxy has more than ten billion solar masses of gas on hand. Or
maybe when it has more than a billion. Maybe a galaxy can con-
vert a certain percentage of its mass to stars, but after 500 million years activity associated with gas falling into the black hole at the galaxy’s centre heats up the gas and prevents star formation. Or
maybe that happens after a billion years, not 500 million.
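As a rough illustration only, a toy version of such a rule set might look like the sketch below. The efficiency, gas threshold, and quenching time are exactly the sort of adjustable knobs described above, and none of the numbers comes from a real semi-analytic model.

```python
# Toy 'semi-analytic' recipe: a handful of simple rules with tunable knobs.
# All parameter values are illustrative, not taken from any published model.
def evolve_galaxy(gas_mass, total_time_gyr,
                  efficiency_per_gyr=0.10,  # fraction of gas turned into stars per billion years
                  gas_threshold=1e10,       # solar masses of gas needed before stars form
                  quench_time_gyr=0.5,      # black-hole heating halts star formation after this
                  step_gyr=0.1):
    """Step a single galaxy forward in time, returning (stellar mass, remaining gas)."""
    stars, t = 0.0, 0.0
    while t < total_time_gyr:
        if gas_mass > gas_threshold and t < quench_time_gyr:
            new_stars = gas_mass * efficiency_per_gyr * step_gyr
            gas_mass -= new_stars
            stars += new_stars
        t += step_gyr
    return stars, gas_mass

# Each different choice of knobs is a different 'model' to compare with the data.
print(evolve_galaxy(5e10, total_time_gyr=3.0))
print(evolve_galaxy(5e10, total_time_gyr=3.0, efficiency_per_gyr=0.02, quench_time_gyr=1.0))
```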
With each additional complication, both the list of rules and
the list of things that can be altered to provide a better fit to the observations grow. What starts as a simple set of rules quickly
becomes a long list of variables, of parameters that can be tweaked
to match the computer model to the real Universe. Need more star
formation? Turn the knob on the left. Need your galaxies to stop
growing earlier in the Universe’s history? Push the red button.
I’m being slightly unfair, and I think most would agree that
semi-analytic models do a good job of accounting for the obser-
vations of the Universe we have today (there are a few interesting
exceptions, as we’ll see in Chapter 2), but deciding when to add
more complexity to the model is a difficult problem. As you
make what started out as simple rules more complex, you will,
almost by definition, do a better job of matching any given set of
observations, without necessarily gaining any new insights. This
kind of work, where the skills needed involve
deep statistical insight and a good gut feeling for the status of
your model, is a long way from the science fair vision of a unify-
ing scientific method with a single hypothesis being tested by a
single experiment. The best I can do in writing down a simple
hypothesis for a semi-analytic model of galaxy formation is
something like ‘There exists a model I can make from rules which
explains the observations we have of the large-scale structure of
the Universe’, which is hardly satisfying. It’s a long way from
what I’d actually choose in studying whether light is sufficiently
polarized around a single young star to influence the chemistry.
I think that the process being followed here is so different from
science fair procedure that you can think of the computer mod-
elling that’s become increasingly important in lots of sciences as
a whole new way of doing science.
I imagine a band of stereotypical scientists. One sits, dressed
perhaps in a Greek toga or covered in chalk at a blackboard, scrib-
bling equations before writing QED in big letters under some
world-shattering conclusion. They’re a theorist, looking for the
mathematical underpinnings of the Universe. A second, wearing
a lab coat, is surrounded by bubbling test tubes and complex
glassware. Their life is spent weighing things, in adding this to
that and occasionally putting the resulting compounds into
machines that go ‘beep’ and which spit out graphs. They’re an
experimenter, testing the theories the other comes up with.
To this motley crew I think we should add a third character.
They sit in a darkened room in front of a desk with four or five
computer screens on it. Green numbers scroll upwards on at
least one of the screens, and they type in a staccato fashion, caus-
ing a complex three-dimensional visualization of something to
rotate on yet another screen. They are a computational scientist,
and modern science needs them as much as it does the other
two. (It also needs them to talk to the others, which is perhaps a
much harder problem. But that’s another story.)
Understanding this change is key to following some of the
most high-profile scientific debates of the moment. Our inability
to model each atom of the Earth’s atmosphere means that belief
in the reality of climate change essentially relies on a prediction
from a semi-analytic model of the Earth’s atmosphere; every
time you hear someone claiming that the science of climate
change is falsified by the cooling of part of the Antarctic Ocean,
or by an exceptionally cold winter they’re enduring, then you’re
hearing confusion about how these categories of scientific
thought are interacting.
This picture isn’t yet complete. Computer models, though
they produce worlds which can be explored, observed, and
experimented upon, are really a way of doing theory that suits
our digital age. The equivalent observational mode lies in the
freeform exploration of large data sets. Take the Sloan Digital
Sky Survey, for example. In some sense it was a traditional
experiment, with the goal of plotting accurately the positions of
galaxies and thus measuring the expansion of the Universe. Yet
if you go to the survey website, for each galaxy caught in its gaze
you can download maybe a hundred pieces of information.
These include sizes, shapes, colours, and brightnesses, and
plenty more can be deduced about each system. Is it a member
of a cluster? Has it recently interacted with a neighbour? Is its
massive central black hole actively feeding on gas, dust, and
stars? We can force these questions into ‘traditional’ experi-
ments, or we can start not with a hypothesis, but by looking in
the data for correlations, discovering for example that the most
massive galaxies are reddest or that feeding black holes are bad
news for star formation. This mode of discovery could be
uniquely powerful. Done right, it holds out the promise of not
only providing answers to our questions but of guiding us to the
right questions in the first place.
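For illustration, starting from the data rather than from a hypothesis can be as simple as the sketch below. The toy catalogue and its two columns are invented (the colour-mass trend is planted by hand); a real survey offers a hundred or so columns per object to trawl through in the same way.

```python
import numpy as np

# An invented toy catalogue: one stellar mass and one colour per galaxy.
# The trend (more massive galaxies are redder) is deliberately built in here.
rng = np.random.default_rng(42)
log_mass = rng.uniform(9.0, 12.0, size=1000)                      # log10(solar masses)
colour = 0.3 * log_mass - 2.0 + rng.normal(0.0, 0.2, size=1000)   # larger value = redder

# Hypothesis-free first pass: simply ask which quantities move together.
r = np.corrcoef(log_mass, colour)[0, 1]
print(f"Correlation between mass and colour: {r:+.2f}")
```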
This is the kind of promise that gets magazine articles and
even books written, and data-driven discovery was labelled the
‘fourth paradigm’ of scientific discovery as far back as 2009, in a
collection of essays under that title published by Microsoft
Research to commemorate the life of pioneering computer sci-
entist Jim Gray. The twin ideas of data exploration and ‘big data’
have attracted plenty of hype, but they are useful in illustrating
quite how science is changing.
Imagine, for example, that you’re an astronomer at the turn
of the nineteenth and twentieth centuries, interested in stars.
Through careful observation, your colleagues have assembled a
catalogue of observations of many of the brightest stars in the
sky. Despite their diligent work, there’s not much to go on. Look
carefully at the night sky with the naked eye or with a small pair
of binoculars, and it is easy to see that stars have different col-
ours. Try, for example, looking at the two brightest stars in the
easily recognized constellation of Orion, Betelgeux and Rigel.
While Rigel is blue or white, Betelgeux, an enormous star which
would engulf Jupiter were it placed in the centre of our Solar
System, appears orange or even red to the naked eye. As well as
colour, we can easily measure the apparent brightness of the
stars as well.
The breakthrough came when astronomers realized they
could use a variety of methods to measure distances to at least
the nearest stars. One simple method relies on an apparent
shift—a parallax—in the position of a star relative to a more dis-
tant background as the Earth moves around its orbit, just as you
can make a finger held in front of your face at arm’s length jump
from side to side by looking at it first through one eye and then
the other. What measurements like these allowed for the first
time was the conversion of the apparent brightness of a star—
how bright it appears to be—into an intrinsic luminosity which
reflects how powerful the stars actually are. So with colour, and
luminosity, we have a data set we can explore.
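In numbers, the conversion works as in the sketch below: a parallax of p arcseconds puts a star at 1/p parsecs, and the luminosity follows from spreading the measured flux over a sphere of that radius. The flux quoted for Sirius is only approximate.

```python
import math

PARSEC_IN_METRES = 3.0857e16

def distance_pc(parallax_arcsec):
    """A parallax of p arcseconds corresponds to a distance of 1/p parsecs."""
    return 1.0 / parallax_arcsec

def luminosity_watts(flux_w_per_m2, dist_pc):
    """Intrinsic luminosity from apparent brightness: L = 4 * pi * d^2 * F."""
    d_metres = dist_pc * PARSEC_IN_METRES
    return 4.0 * math.pi * d_metres**2 * flux_w_per_m2

d = distance_pc(0.379)            # Sirius shifts by about 0.38 arcseconds
L = luminosity_watts(1.2e-7, d)   # approximate total flux of Sirius at Earth, in W per square metre
print(f"{d:.2f} parsecs, luminosity ~ {L:.1e} W (roughly 25 Suns)")
```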
Perhaps there’s a relationship between the two. In fact, if you plot
luminosity against colour on what’s now called the Hertzsprung–
Russell diagram after two of the first scientists to do this system-
atically, you find that many stars lie on a rough line, known as the
main sequence. Stars which are bluer tend to be more luminous.
Those which are red tend to be less luminous, with the Sun sit-
ting on the main sequence somewhere between the two. Once
you realize that the colour of a star reflects its temperature this
makes more sense; a blue star like Bellatrix, which marks one of Orion’s shoulders,
thousands of times more luminous than the Sun, has a surface
temperature of about 22,000 degrees Celsius—pretty hot, espe-
cially compared to the Sun’s 6,000 degrees. On the other hand,
some of the coolest stars known, puny brown dwarfs, can have
surface temperatures which are mild even compared to room
temperature (Plate 4).
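Making the plot described here takes only a few lines. In the sketch below the stellar sample is invented and the main-sequence trend (bluer means more luminous) is planted by hand, so it is a picture of the idea rather than of real data.

```python
import numpy as np
import matplotlib.pyplot as plt

# An invented sample of stars: a colour index (small = blue, large = red)
# and a luminosity with the main-sequence trend deliberately built in.
rng = np.random.default_rng(0)
colour = rng.uniform(-0.3, 1.8, size=500)
luminosity = 10 ** (2.5 - 2.0 * colour + rng.normal(0.0, 0.3, size=500))  # solar units

plt.scatter(colour, luminosity, s=5)
plt.yscale("log")                  # luminosities span many orders of magnitude
plt.xlabel("colour index (blue stars on the left)")
plt.ylabel("luminosity (Suns)")
plt.title("A toy Hertzsprung-Russell diagram")
plt.show()
```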
That this relationship exists therefore reveals that the source of
a star’s luminosity must also be responsible for setting its
temperature, but more importantly the fact that the main sequence
exists at all reveals that the stars that lie upon it must share a
source of power. In fact, all stars on the main sequence are fusing
hydrogen together in their cores to form helium, releasing energy
in the process, and those which do not lie on the sequence are
either protostars still in the process of getting to the point where
they can sustain this sort of stable nuclear fusion, or else those
which have graduated to other sources of energy, such as the
fusion of helium into other, heavier elements. In this discovery
from more than a hundred years ago, there is clear evidence of
the fourth paradigm at work, as the exploration of stellar data
pointed researchers in the direction of the correct theory for stel-
lar fusion. Of course, the full story of how astronomers came to
understand how stars are fuelled
is more interesting and compli-
cated than the simple version given above, and worthy of a book
in its own right. What is important for my purposes is that the
discovery of the main sequence provided powerful support for
the idea of a single energy source for stars at very different tem-
peratures and with very different histories.
These days, astronomers studying stars have much more
information at their fingertips. Most of the objects captured by
the Sloan Digital Sky Survey were not galaxies at all, but stars,
and a data set with hundreds of pieces of information about each
and every one of them is available to researchers worldwide. This
rich resource, and those from more targeted surveys, open up
the prospect of new insights into the processes of stellar evolu-
tion, but they also make the challenge of data-driven science
apparent. We know, because of the work of Hertzsprung, Russell,
and a century of astrophysics, that the ‘right’ thing to do is to plot temperature (or its proxy, colour) against luminosity. Coming in
blind, that’s not so obvious; Alex Szalay at Johns Hopkins, a bril-
liant collaborator and a man responsible for much of the data
processing that sits behind the Sloan Digital Sky Survey’s power,
ran an entire research programme with the sole aim of redis-
covering the Hertzsprung–Russell diagram among this data. The
catch was that Alex’s group wanted to do so with their hands off
the wheel, trusting in automated searches to identify the signal
among the noise. Trying to discover the cutting-edge science of
yesteryear among the modern data deluge sounds like a fool’s
errand, but it’s surprisingly tough, emphasizing that new tech-
niques are critical if we’re to make the most of the data that we
have.
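Nothing of Alex’s actual pipeline is reproduced here, but the ‘hands off the wheel’ idea can be sketched roughly as follows: hand a density-based clustering routine a cloud of points in the colour-luminosity plane (again invented, with a main-sequence ridge planted in it) and see whether it reports the structure without being told what to look for.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Invented 'main sequence': 500 stars along a ridge in colour versus log-luminosity.
colour = rng.uniform(-0.3, 1.8, size=500)
log_lum = 2.5 - 2.0 * colour + rng.normal(0.0, 0.3, size=500)
ridge = np.column_stack([colour, log_lum])

# Plus 200 unstructured 'noise' stars scattered over the same plane.
noise = rng.uniform(low=[-0.5, -2.0], high=[2.0, 4.0], size=(200, 2))

points = np.vstack([ridge, noise])

# Density-based clustering finds the ridge without being told it exists.
labels = DBSCAN(eps=0.15, min_samples=10).fit_predict(points)
print(f"dense structures found: {labels.max() + 1}; "
      f"points left as noise: {np.sum(labels == -1)}")
```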
And what a lot of data it is. Sloan seemed overwhelming to
astronomers a few years ago, but what’s coming down the pipe is
truly scary. I had my first glimpse of this future a few years ago
shortly after walking onto the pitch of the University of Arizona’s
football stadium. College football is, in Arizona as in much of the
US, something of a big deal, and the stadium is impressive,
immaculately tended and seating more than 50,000 fans of the
Wildcats. Its real beauty, though, lies underneath the stands,