The Crowd and the Cosmos: Adventures in the Zooniverse

lab experiments I had to set up a computer simulation. Using a

few simple equations (and someone else’s code) it was possible

to set up an arrangement where light was assumed to have

emerged from a young star of a particular size, mass, age, and

brightness, to scatter from a first, dense cocoon of dust, becom-

ing linearly polarized in the process. Then, we want the light in

our computer simulation to encounter a second dusty struc-

ture—perhaps a disc—and to calculate the degree of circular

polarization resulting from this second scattering. It’s at this

point we have to make choices. We have to decide on the size of

the disc, and its shape. The strong and tempestuous winds blow-

How Science iS Done 25

ing from the young star have likely cleared all the material away

from its immediate surroundings, producing a gap at the centre

of the disc, so we need to decide on the size of this region too.

The dust grains themselves need a composition—are they car-

bon, or silicon, or a mix of the two?—and we need to work out

whether they have ice or whether they are bare. They need a

shape. Are they round? Or needle shaped? If the latter, how elon-

gated are they? That matters because nicely needled grains can

become aligned in the presence of a magnetic field, and such

alignments may lead to further polarization. How strong is the

magnetic field? These and many other questions need answering

if we are to make progress, and we’re already a long way from the

simple plant experiment that could be reduced to a single test.

We can, if we choose, run a large number of simulations. Each

time, we can keep almost all of the different parameters fixed,

altering only one thing for each run. That might work here, for

even complicated astronomical questions reduce to reasonably

small sets of equations and variables, but for more complex sys-

tems this approach will break down. If you’ve ever been frus-

trated by a weather forecast, then one of the reasons is that even

with some of the most powerful supercomputers in the world it’s

simply not possible to build a model of the Earth’s atmosphere

accurate enough to account for everything that observations tell

us must be happening in this very complicated system. In the

case of our light-scattering dust disc, we also have to deal with

the opposite problem of creating a model so complicated that it

can explain pretty much any set of observations.

This phenomenon, known as over-fitting, is a serious worry in

cases where our ability to think of variables to fiddle with far out-

strips our ability to gather observations to test the worlds created

inside our computers. The starkest example in the astronomical

world is in the argument, now twenty years old, about how to

build a computer model large enough and detailed enough to

allow us to study the evolution of large-scale structure in the

Universe. Building such a cosmological model from first prin-

ciples, pinpointing and tracking the position of every atom

within a cosmologically significant volume of space, is for all

intents and purposes impossible. Yet we don’t in most cases have

the luxury of treating the galaxies as simple point particles, inter-

acting only via gravity, because to compare the results of a model

to the real Universe requires including messy phenomena like

star formation (what cosmologists like to call, somewhat dismis-

sively, ‘gastrophysics’) which depend on the behaviour of indi-

vidual atoms. A computer model, no matter how beautiful, will

fail to match what we can see if it can’t predict the formation of

the stars whose light we observe and so, instead of building a

simulation that would require a computer the size of the Solar

System, there is a whole industrial complex of scientists spend-

ing their careers building what are called semi-analytic models.

The game here is to guess at a set of simple rules that match,

even while they don’t explain, the behaviour of the system being

studied. Maybe a galaxy converts 10 per cent of its gas to stars

every billion years. Maybe it’s 5 per cent, or 2 per cent. Maybe it is 10 per cent after all, but the process occurs only when the

galaxy has more than ten billion solar masses of gas on hand. Or

maybe when it has more than a billion. Maybe a galaxy can con-

vert a certain percentage of its mass to stars, but after 500 million years activity associated with gas falling into the black hole at the galaxy’s centre heats up the gas and prevents star formation. Or

maybe that happens after a billion years, not 500 million.

With each additional complication, both the list of rules and

the list of things that can be altered to provide a better fit to the observations grow. What starts as a simple set of rules quickly

becomes a long list of variables, of parameters that can be tweaked

to match the computer model to the real Universe. Need more star

formation? Turn the knob on the left. Need your galaxies to stop

growing earlier in the Universe’s history? Push the red button.
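
To make the knob-turning concrete, here is a minimal sketch of a toy rule set of the kind described above. It is not taken from any real semi-analytic code: the names `Rules` and `evolve`, the one-billion-year time step, and the starting gas mass are all assumptions made for illustration. Only the example settings (10 or 5 per cent per billion years, a ten-billion-solar-mass threshold, a 500-million-year shut-off) come from the text.

```python
from dataclasses import dataclass
import math


@dataclass
class Rules:
    """Hypothetical tunable 'knobs' for a toy galaxy; not a real model."""
    efficiency: float = 0.10       # fraction of gas turned into stars per Gyr
    gas_threshold: float = 0.0     # solar masses of gas needed before stars form
    quench_time: float = math.inf  # Gyr after which star formation is shut off


def evolve(gas_mass: float, rules: Rules,
           total_time: float = 10.0, dt: float = 1.0) -> float:
    """Return the stellar mass formed after total_time Gyr, stepping in dt Gyr."""
    gas = gas_mass
    stars = 0.0
    for step in range(int(round(total_time / dt))):
        t = step * dt
        # Stars form only while there is enough gas and quenching has not begun.
        if gas > rules.gas_threshold and t < rules.quench_time:
            formed = rules.efficiency * gas * dt
            gas -= formed
            stars += formed
    return stars


if __name__ == "__main__":
    start_gas = 2e10  # assumed starting gas mass in solar masses
    knob_settings = (
        Rules(),                     # 10 per cent per Gyr, no other rules
        Rules(efficiency=0.05),      # 5 per cent instead
        Rules(gas_threshold=1e10),   # needs more than ten billion solar masses
        Rules(quench_time=0.5),      # shut off after 500 million years
    )
    for rules in knob_settings:
        print(rules, f"stellar mass = {evolve(start_gas, rules):.3e}")
```

Each extra field in `Rules` is one more parameter that can be tuned to match the observations, which is exactly how a short list of rules turns into a long list of variables.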

I’m being slightly unfair, and I think most would agree that

semi-analytic models do a good job of accounting for the obser-

vations of the Universe we have today (there are a few interesting

exceptions, as we’ll see in Chapter 2), but deciding when to add

more complexity to the model is a difficult problem. As you

make what started out as simple rules more complex, then you

should, almost by definition, always do a better job of matching

to any given set of observations without necessarily gaining any

new insights. This kind of work, where the skills needed involve

deep statistical insight and a good gut feeling for the status of

your model, is a long way from the science fair vision of a unify-

ing scientific method with a single hypothesis being tested by a

single experiment. The best I can do in writing down a simple

hypothesis for a semi-analytic model of galaxy formation is

something like ‘There exists a model I can make from rules which

explains the observations we have of the large-scale structure of

the Universe’, which is hardly satisfying. It’s a long way from

what I’d actually choose in studying whether light is sufficiently

polarized around a single young star to influence the chemistry.

I think that the process being followed here is so different from

science fair procedure that you can think of the computer mod-

elling that’s become increasingly important in lots of sciences as

a whole new way of doing science.
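
The earlier point that a more complex model always matches a given set of observations at least as well can be illustrated with a small numerical sketch. Everything here is an assumption chosen for illustration: the 'truth' is a straight line, the noise level and random seed are arbitrary, and polynomials of increasing degree stand in for models with more and more rules. The helper names (`solve`, `fit_poly`, `mse`) are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def mse(coeffs, xs, ys):
    """Mean squared difference between the polynomial and the data."""
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


def truth(x):
    return 2.0 * x + 1.0  # assumed underlying relationship


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x = [i / 11 for i in range(12)]
train_y = [truth(x) + rng.gauss(0, 0.2) for x in train_x]
test_x = [(i + 0.5) / 11 for i in range(11)]   # fresh 'observations'
test_y = [truth(x) + rng.gauss(0, 0.2) for x in test_x]

train_err, test_err = {}, {}
for degree in range(1, 6):
    coeffs = fit_poly(train_x, train_y, degree)
    train_err[degree] = mse(coeffs, train_x, train_y)
    test_err[degree] = mse(coeffs, test_x, test_y)
    print(degree, f"fit: {train_err[degree]:.4f}", f"new data: {test_err[degree]:.4f}")
```

The error on the data used for fitting can only go down as the degree rises, because each polynomial contains the previous one as a special case. The error on the fresh points carries no such guarantee, which is why a better fit alone does not amount to new insight.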

I imagine a band of stereotypical scientists. One sits, dressed

perhaps in a Greek toga or covered in chalk at a blackboard, scrib-

bling equations before writing QED in big letters under some

world-shattering conclusion. They’re a theorist, looking for the

mathematical underpinnings of the Universe. A second, wearing

a lab coat, is surrounded by bubbling test tubes and complex

glassware. Their life is spent weighing things, in adding this to

that and occasionally putting the resulting compounds into

machines that go ‘beep’ and which spit out graphs. They’re an

experimenter, testing the theories the other comes up with.

To this motley crew I think we should add a third character.

They sit in a darkened room in front of a desk with four or five

computer screens on it. Green numbers scroll upwards on at

least one of the screens, and they type in a staccato fashion, caus-

ing a complex three-dimensional visualization of something to

rotate on yet another screen. They are a computational scientist,

and modern science needs them as much as it does the other

two. (It also needs them to talk to the others, which is perhaps a

much harder problem. But that’s another story.)

Understanding this change is key to following some of the

most high-profile scientific debates of the moment. Our inability

to model each atom of the Earth’s atmosphere means that belief

in the reality of climate change essentially relies on a prediction

from a semi-analytic model of the Earth’s atmosphere; every

time you hear someone claiming that the science of climate

change is falsified by the cooling of part of the Antarctic Ocean,

or by an exceptionally cold winter they’re enduring, then you’re

hearing confusion about how these categories of scientific

thought are interacting.

This picture isn’t yet complete. Computer models, though

they produce worlds which can be explored, observed, and

experimented upon, are really a way of doing theory that suits

our digital age. The equivalent observational mode lies in the

freeform exploration of large data sets. Take the Sloan Digital

Sky Survey, for example. In some sense it was a traditional

experiment, with the goal of plotting accurately the positions of

galaxies and thus measuring the expansion of the Universe. Yet

if you go to the survey website, for each galaxy caught in its gaze

you can download maybe a hundred pieces of information.

These include sizes, shapes, colours, and brightnesses, and

plenty more can be deduced about each system. Is it a member

of a cluster? Has it recently interacted with a neighbour? Is its

massive central black hole actively feeding on gas, dust, and

stars? We can force these questions into ‘traditional’ experi-

ments, or we can start not with a hypothesis, but by looking in

the data for correlations, discovering for example that the most

massive galaxies are reddest or that feeding black holes are bad

news for star formation. This mode of discovery could be

uniquely powerful. Done right, it holds out the promise of not

only providing answers to our questions but of guiding us to the

right questions in the first place.

This is the kind of promise that gets magazine articles and

even books written, and data-driven discovery was labelled the

‘fourth paradigm’ of scientific discovery as far back as 2009, in a

collection of essays under that title published by Microsoft

Research to commemorate the life of pioneering computer sci-

entist Jim Gray. The twin ideas of data exploration and ‘big data’

have attracted plenty of hype, but they are useful in illustrating

quite how science is changing.

Imagine, for example, that you’re an astronomer at the turn

of the nineteenth and twentieth centuries, interested in stars.

Through careful observation, your colleagues have assembled a

catalogue of observations of many of the brightest stars in the

sky. Despite their diligent work, there’s not much to go on. Look

carefully at the night sky with the naked eye or with a small pair

of binoculars, and it is easy to see that stars have different col-

ours. Try, for example, looking at the two brightest stars in the

easily recognized constellation of Orion, Betelgeux and Rigel.

While Rigel is blue or white, Betelgeux, an enormous star which

would engulf Jupiter were it placed in the centre of our Solar

System, appears orange or even red to the naked eye. As well as

colour, we can easily measure the apparent brightness of the

stars.

The breakthrough came when astronomers realized they

could use a variety of methods to measure distances to at least

the nearest stars. One simple method relies on an apparent

shift—a parallax—in the position of a star relative to a more dis-

tant background as the Earth moves around its orbit, just as you

can make a finger held in front of your face at arm’s length jump

from side to side by looking at it first through one eye and then

another. What measurements liked these allowed for the first

time was the conversion of the apparent brightness of a star—

how bright it appears to be—into an intrinsic luminosity which

reflects how powerful the stars actually are. So with colour, and

luminosity, we have a data set we can explore.
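
The conversion from apparent brightness to intrinsic luminosity can be sketched in a few lines. This is a worked illustration under stated assumptions, not a description of how any particular catalogue was built: it uses the standard relations that a parallax of p arcseconds corresponds to a distance of 1/p parsecs, and that absolute magnitude is the apparent magnitude a star would have at 10 parsecs. The magnitude scale, the value 4.83 for the Sun's visual absolute magnitude, and the function names are additions for illustration. Dust extinction and light emitted outside the visual band are ignored.

```python
import math

SUN_ABS_MAG = 4.83  # approximate visual absolute magnitude of the Sun (assumed)


def distance_parsecs(parallax_arcsec: float) -> float:
    """Distance in parsecs from a parallax angle in arcseconds."""
    if parallax_arcsec <= 0:
        raise ValueError("parallax must be positive")
    return 1.0 / parallax_arcsec


def absolute_magnitude(apparent_mag: float, distance_pc: float) -> float:
    """Magnitude the star would have if moved to a distance of 10 parsecs."""
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)


def luminosity_solar(abs_mag: float) -> float:
    """Luminosity in units of the Sun's, from an absolute magnitude."""
    return 10.0 ** (0.4 * (SUN_ABS_MAG - abs_mag))


if __name__ == "__main__":
    # Two hypothetical stars that look equally bright but lie at different distances.
    for parallax in (0.1, 0.01):
        d = distance_parsecs(parallax)
        lum = luminosity_solar(absolute_magnitude(4.83, d))
        print(f"parallax {parallax} arcsec: {d:.0f} pc, {lum:.1f} solar luminosities")
```

Two stars of identical apparent brightness come out with very different luminosities once their distances are known, and it is that intrinsic quantity which belongs on the plot against colour.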

Perhaps there’s a relationship between the two. In fact, if you plot

luminosity against colour on what’s now called the Hertzsprung–

Russell diagram after two of the first scientists to do this system-

atically, you find that many stars lie on a rough line, known as the

main sequence. Stars which are bluer tend to be more luminous.

Those which are red tend to be less luminous, with the Sun sit-

ting on the main sequence somewhere between the two. Once

you realize that the colour of a star reflects its temperature this

makes more sense; a blue star like Bellatrix at the shoulder of Orion,

thousands of times more luminous than the Sun, has a surface

temperature of about 22,000 degrees Celsius—pretty hot, espe-

cially compared to the Sun’s 6,000 degrees. On the other hand,

some of the coolest stars known, puny brown dwarfs, can have

surface temperatures which are mild even compared to room

temperature (Plate 4).

That this relationship exists therefore reveals that the source of

a star’s luminosity must also be responsible for setting its

temperature, but more importantly the fact that the main sequence

exists at all reveals that the stars that lie upon it must share a

source of power. In fact, all stars on the main sequence are fusing

hydrogen together in their cores to form helium, releasing energy

in the process, and those which do not lie on the sequence are

either protostars still in the process of getting to the point where

they can sustain this sort of stable nuclear fusion, or else those

which have graduated to other sources of energy, such as the

fusion of helium into other, heavier elements. In this discovery

from more than a hundred years ago, there is clear evidence of

the fourth paradigm at work, as the exploration of stellar data

pointed researchers in the direction of the correct theory for stel-

lar fusion. Of course, the full story of how astronomers came to

understand how stars are fuelled is more interesting and complicated

than the simple version given above, and worthy of a book

in its own right. What is important for my purposes is that the

discovery of the main sequence provided powerful support for

the idea of a single energy source for stars at very different tem-

peratures and with very different histories.

These days, astronomers studying stars have much more

information at their fingertips. Most of the objects captured by

the Sloan Digital Sky Survey were not galaxies at all, but stars,

and a data set with hundreds of pieces of information about each

and every one of them is available to researchers worldwide. This

rich resource, and those from more targeted surveys, opens up

the prospect of new insights into the processes of stellar evolu-

tion, but they also make the challenge of data-driven science

apparent. We know, because of the work of Hertzsprung, Russell,

and a century of astrophysics, that the ‘right’ thing to do is to plot temperature (or its proxy, colour) against luminosity. Coming in

blind, that’s not so obvious; Alex Szalay at Johns Hopkins, a bril-

liant collaborator and a man responsible for much of the data

processing that sits behind the Sloan Digital Sky Survey’s power,

ran an entire research programme with the sole aim of redis-

covering the Hertzsprung–Russell diagram among this data. The

catch was that Alex’s group wanted to do so with their hands off

the wheel, trusting in automated searches to identify the signal

among the noise. Trying to discover the cutting-edge science of

yesteryear among the modern data deluge sounds like a fool’s

errand, but it’s surprisingly tough, emphasizing that new tech-

niques are critical if we’re to make the most of the data that we

have.

And what a lot of data it is. Sloan seemed overwhelming to

astronomers a few years ago, but what’s coming down the pipe is

truly scary. I had my first glimpse of this future a few years ago

shortly after walking onto the pitch of the University of Arizona’s

football stadium. College football is, in Arizona as in much of the

US, something of a big deal, and the stadium is impressive,

immaculately tended and seating more than 50,000 fans of the

Wildcats. Its real beauty, though, lies underneath the stands,
