Your assumptions are your windows on the world (Isaac Asimov)
Apparently there is a significant correlation between the number of people who drowned in a pool and the number of films starring Nicolas Cage. Maybe these people really did not enjoy his movies, although most probably this is just a spurious relationship. That is, these two events are indeed associated, but not in a causal way. Watching a movie with Mr. Cage won’t make you drown. And there are many such relationships. In this case, luckily enough, whether we interpret it as causal or not, it will probably have no relevant consequences. Except, perhaps, for Cage’s finances.
But suppose instead that we want to research the effects of exposure to a chemical, like a possible carcinogen or a new drug. Now our interpretation of the results might have far more catastrophic consequences. A chemical deemed safe might actually cause serious health effects, or a drug that did not show any “statistically significant” results might in fact be beneficial. In both cases, there is a concrete possibility that a large proportion of the population will suffer. Establishing causality is not just recommended, it’s imperative.
Interestingly, most epidemiologists are aware of the phrase “correlation does not imply causation”, nowadays recited almost as a mantra. Yet most of us are almost scared of stating the causal objective of our scientific efforts. Indeed, if you randomly pick an epidemiological paper, there’s a good chance that the authors concluded the manuscript by writing something along the lines of “Due to the observational nature of our study, we cannot establish causality”. And unfortunately, in some cases, the authors did not make much of an effort to establish it. As Miguel A. Hernán writes: “Associational questions are easy to formulate and straightforward to answer when data are available”.
A common cause of spurious associations is what we call confounding. A confounder can be described as an event or a variable that is associated with both the exposure (e.g., our chemical) and the outcome (e.g., cancer). Confounders can distort our results, and in some cases they can even change the direction of an effect. The good news is that there exist methods to “control” for these variables, thus reducing their influence on the effect of interest. The bad news is that selecting the right confounders, when available, for our specific question is difficult, to say the least. There is no fancy method to identify them automatically: you need subject-specific knowledge. This requires time and money, both extremely valuable in academia.
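To make the idea of a confounder flipping the direction of an effect more concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: Z is a hypothetical confounder that makes exposure X more likely and also raises the risk of the outcome Y, while X itself is assumed to lower that risk by 5 percentage points. The "adjusted" estimate uses simple standardization over Z, which is only one of several ways to control for a confounder, and it recovers the true effect here only because Z is, by construction, the sole confounder.

```python
# Hypothetical population: every number below is made up for illustration.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z), where Z is the confounder
P_X_GIVEN_Z = {0: 0.2, 1: 0.8}    # P(X = 1 | Z = z), where X is the exposure


def risk(x, z):
    """P(Y = 1 | X = x, Z = z): Z raises the risk, X lowers it by 0.05."""
    return 0.10 + 0.30 * z - 0.05 * x


def p_x_given_z(x, z):
    """P(X = x | Z = z) for a binary exposure."""
    return P_X_GIVEN_Z[z] if x == 1 else 1.0 - P_X_GIVEN_Z[z]


def crude_risk(x):
    """P(Y = 1 | X = x), ignoring the confounder entirely."""
    joint = {z: P_Z[z] * p_x_given_z(x, z) for z in P_Z}  # P(X = x, Z = z)
    p_x = sum(joint.values())                             # P(X = x)
    return sum(risk(x, z) * joint[z] / p_x for z in P_Z)


def standardized_risk(x):
    """Risk if everyone had X = x, averaged over the distribution of Z."""
    return sum(risk(x, z) * P_Z[z] for z in P_Z)


crude_rd = crude_risk(1) - crude_risk(0)
adjusted_rd = standardized_risk(1) - standardized_risk(0)

print(f"crude risk difference:    {crude_rd:+.2f}")     # +0.13
print(f"adjusted risk difference: {adjusted_rd:+.2f}")  # -0.05
```

In this toy population the naive comparison suggests that the exposure is harmful (+0.13), while the comparison that accounts for Z shows the protective effect that was built in (-0.05). Note that the code can only adjust for Z because we told it that Z is the confounder. With real data, that knowledge has to come from the subject matter, not from the data.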
Things get even more complicated, but surely more relevant and interesting, when we wish to study the health effects of perhaps hundreds or even thousands of exposures simultaneously. In fact, our health is determined by many aspects of our environment. The sum of all these non-genetic determinants of health is now known as the exposome, and interest in this field of research has exploded in recent years. Many actors across the world have expressed interest in this innovative concept. ISGlobal is currently one of the leading institutions in this field, being part of large European exposome projects like ATHLETE, Equal-Life, EXPANSE and EPHOR. These projects will collect massive amounts of data to link chemical, social and urban exposures to molecular responses and health outcomes. Big data are necessary to answer these complex questions, but they are not sufficient to establish causality. Big data cannot, and never will, replace careful thinking and domain-specific knowledge. And if things were complicated for one exposure and one outcome, we can only imagine the difficulties that will arise when we try to identify the confounders needed to establish causality for these complex effects. It is a daunting task, but it is also a necessary one.
Let’s now assume that you came up with an interesting causal question. Let’s further assume that domain-specific knowledge is available, and that all the relevant confounders were identified. How do you answer this question? Unfortunately, the standard statistical models that epidemiologists commonly use, while easy to implement and interpret, are not up to the task. Luckily enough, statisticians have developed some “smart” solutions over the past few decades. These modern statistical methods, while not always easy to understand and implement, allow us to integrate, analyze, and interpret large amounts of data, and to obtain precise estimates of the target quantities. Applied researchers working with non-experimental data can no longer pretend that these questions cannot be answered. And we can do more. It is fairly common nowadays to get data from multiple sources, often independent of one another. We can take advantage of this and, perhaps by using different modeling strategies, triangulate the evidence in the hope of reducing bias and getting closer to the truth. For instance, within the OBERON project, of which ISGlobal is also a partner, we are trying to study the health impacts of a certain class of chemicals based on in vitro, in silico, and epidemiological evidence.
To conclude, the output of these large exposome research projects has the potential to provide the understanding necessary to prevent the effects of a multitude of environmental hazards, starting from the earliest stages of life. You can learn more about the exposome and the leading role of ISGlobal in this field here.