Identification and Estimation of Causal Effects

Lorenzo Fabbri

July 5, 2024

1 Resources

G-computation, IPTW, and EIF-based estimators for when:
- There are more than two time points.
- There is more than one exposure.
- The exposures are continuous (IPTW).
Do the terms f(a|l) go away because we are setting the exposure A to a, or are they incorporated with f(l) into f(a,l)? If they are removed, where does the summation over a come from for natural-value based interventions?
What is the difference between a^*, a, and a^{+g}?
𝔼[Y^g | l, a^{+g}] f(a^{+g} | a): what if A is continuous? Just multiply?
Document FF; GitHub discussion; Stack-Exchange posts.

2 Time-invariant Exposures

2.1 Deterministic Treatment Regimes

The rule assigns each treatment value with probability either 0 or 1.

2.1.1 Deterministic Static Treatment Regimes

The rule for assigning treatment does not depend on past treatment or covariates.

Figure 1: A SWIG representing a static treatment regime (nodes: L, A | a, Y^a).

The joint density is:

$$
f(y, l, a) = f(y \mid l, a)\, f(a \mid l)\, f(l). \tag{1}
$$

After intervening on the exposure A, we have:

$$
f^G(y, l) = f(y \mid l, a)\, f(l). \tag{2}
$$

Thus, the expected value of the outcome Y is:

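\begin{align}
\mathbb{E}^G[Y] &= \sum_y y\, f^G(y) \tag{3}\\
&= \sum_y y \sum_l f^G(y, l) \tag{4}\\
&= \sum_y \sum_l y\, f(y \mid l, a)\, f(l) \tag{5}\\
&= \sum_l \mathbb{E}[Y \mid A = a, L = l]\, f(L = l). \tag{6}
\end{align}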

Algorithms One estimator of 𝔼^G[Y] is the parametric g-computation formula, which relies on an outcome model alone. For the simple case of a deterministic static intervention with one exposure and a single time point, the pseudo-algorithm reads as follows:

Fit a regression model with dependent variable Y and independent variables A and L.
Predict Ŷ^a from the model fitted in the previous step, after setting each unit's exposure to the value a assigned by the intervention rule G.
Take the average of Ŷ^a over the empirical distribution of the confounders L.
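A minimal Python sketch of the three steps above, assuming NumPy, simulated data with a binary exposure, and a linear outcome model with no A × L interaction. The data-generating process and the helper names `simulate` and `gcomp_static` are hypothetical and chosen only for illustration; in the simulation the true values are 𝔼[Y^1] = 3 and 𝔼[Y^0] = 1.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n=100_000, seed=0):
    """Hypothetical DGP: L confounds A -> Y; true E[Y^a] = 1 + 2a."""
    rng = np.random.default_rng(seed)
    L = rng.normal(size=n)
    A = rng.binomial(1, expit(0.8 * L))
    Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)
    return L, A, Y

def gcomp_static(L, A, Y, a):
    one = np.ones_like(L)
    # Step 1: outcome model E[Y | A, L], assumed linear and additive here.
    beta, *_ = np.linalg.lstsq(np.column_stack([one, A, L]), Y, rcond=None)
    # Step 2: predict with every unit's exposure set to the static value a.
    y_hat_a = np.column_stack([one, np.full_like(L, a), L]) @ beta
    # Step 3: average the predictions over the empirical distribution of L.
    return y_hat_a.mean()

L, A, Y = simulate()
effect = gcomp_static(L, A, Y, 1) - gcomp_static(L, A, Y, 0)  # approximately 2.0
```

The linear model is correctly specified only because the simulated outcome is linear in A and L; with real data the outcome model has to be chosen to fit the data at hand.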

In the case of multiple exposures (e.g., if A is actually a vector of variables), the g-formula remains the same¹, but the pseudo-algorithm must be modified so that the intervention rule is applied to every exposure in A.

A second estimator of 𝔼^G[Y] is based on modeling the exposure mechanism rather than the outcome, and is referred to as the inverse probability of treatment weighting (IPTW) estimator. It can be derived by noting that the g-formula for 𝔼^G[Y] can be rewritten as follows:

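\begin{align}
\mathbb{E}^G[Y] &= \sum_y \sum_l y\, f(y \mid l, a)\, f(l) \tag{7}\\
&= \sum_y \sum_l y\, f(y, a \mid l)\, \frac{f(l)}{f(a \mid l)} \tag{8}\\
&= \sum_l \mathbb{E}\!\left[\frac{I(A = a)\, Y}{f(A \mid L)} \,\middle|\, L = l\right] f(l) \tag{9}\\
&= \mathbb{E}\!\left[\frac{I(A = a)\, Y}{f(A \mid L)}\right]. \tag{10}
\end{align}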

The pseudo-algorithm for a binary exposure A reads as follows (for simplicity, we do not consider the censoring mechanism C here):

Fit a model with dependent variable A and independent variables L.
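Denoting by p_a the predicted probability P(A = a | L) from this model, compute the weights w_a = I(A = a)/p_a.
Take the sample average of w_a × Y over all observations, which averages over the distribution of the confounders L (Equation 10).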
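The IPTW steps can be sketched in the same kind of simulated setting. The logistic exposure model is fitted with a hand-written Newton-Raphson routine only to keep the example free of dependencies beyond NumPy; `fit_logistic`, `iptw_static`, and the data-generating process are hypothetical. As in the g-computation setting, the true values are 𝔼[Y^1] = 3 and 𝔼[Y^0] = 1.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n=100_000, seed=1):
    """Hypothetical DGP: L confounds A -> Y; true E[Y^a] = 1 + 2a."""
    rng = np.random.default_rng(seed)
    L = rng.normal(size=n)
    A = rng.binomial(1, expit(0.8 * L))
    Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)
    return L, A, Y

def fit_logistic(X, y, n_iter=25):
    """Unpenalised logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def iptw_static(L, A, Y, a):
    X = np.column_stack([np.ones_like(L), L])
    # Step 1: exposure model for P(A = 1 | L).
    p1 = expit(X @ fit_logistic(X, A))
    # Step 2: p_a = P(A = a | L) and weights w_a = I(A = a) / p_a.
    p_a = p1 if a == 1 else 1.0 - p1
    w_a = (A == a) / p_a
    # Step 3: sample average of w_a * Y (Equation 10).
    return np.mean(w_a * Y)

L, A, Y = simulate()
```

Unlike g-computation, this estimator never models Y; its validity rests on the exposure model being correct, and large weights (small p_a) inflate its variance.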

2.1.2 Deterministic Dynamic Treatment Regimes

The rule for assigning treatment depends on past treatment or covariates.

Figure 2: A SWIG representing a dynamic treatment regime (nodes: L, A | a^g, Y^g).

The joint density is:

$$
f(y, l, a, a^g) = f(y \mid l, a^g)\, f(a^g \mid l)\, f(a \mid l)\, f(l). \tag{11}
$$

After intervening on the exposure A, we have:

$$
f^G(y, l, a^g) = f(y \mid l, a^g)\, f(a^g \mid l)\, f(l). \tag{12}
$$

Thus, the expected value of the outcome Y is:

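\begin{align}
\mathbb{E}^G[Y] &= \sum_y y\, f^G(y) \tag{13}\\
&= \sum_y y \sum_l \sum_{a^g} f^G(y, l, a^g) \tag{14}\\
&= \sum_y \sum_l \sum_{a^g} y\, f(y \mid l, a^g)\, f(a^g \mid l)\, f(l) \tag{15}\\
&= \sum_l \sum_{a^g} \mathbb{E}[Y \mid A = a^g, L = l]\, f(a^g \mid L = l)\, f(L = l). \tag{16}
\end{align}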
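For a deterministic dynamic rule, f(a^g | l) equals 1 when a^g = g(l) and 0 otherwise, so Equation 16 reduces to averaging 𝔼[Y | A = g(l), L = l] over the distribution of L. A minimal sketch, assuming the hypothetical rule g(l) = I(l > 0), simulated data, and a linear outcome model; under this data-generating process the true value is 1 + 2 P(L > 0) = 2.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def g_rule(l_value):
    """Hypothetical dynamic rule: assign a^g = 1 when l > 0, else a^g = 0."""
    return (l_value > 0).astype(float)

def gcomp_dynamic(L, A, Y, rule):
    one = np.ones_like(L)
    # Outcome model E[Y | A, L], assumed linear and additive.
    beta, *_ = np.linalg.lstsq(np.column_stack([one, A, L]), Y, rcond=None)
    # Predict with each unit's exposure set to its own rule value a^g = g(l).
    y_hat_g = np.column_stack([one, rule(L), L]) @ beta
    # Average over the empirical distribution of L.
    return y_hat_g.mean()

# Simulated (hypothetical) data: L confounds A -> Y.
rng = np.random.default_rng(3)
n = 100_000
L = rng.normal(size=n)
A = rng.binomial(1, expit(0.8 * L))
Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)

estimate = gcomp_dynamic(L, A, Y, g_rule)
```

The only change from the static case is that the assigned exposure varies across units through g(L).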

2.1.3 Deterministic Natural Treatment Regimes

The rule for assigning treatment depends on its natural value.

Figure 3: A SWIG representing a natural treatment regime (nodes: L, A → a^g, Y^g).

The joint density is:

$$
f(y, l, a, a^g) = f(y \mid l, a^g)\, f(a^g \mid l, a)\, f(a \mid l)\, f(l). \tag{17}
$$

After intervening on the exposure A, we have:

$$
f^G(y, l, a, a^g) = f(y \mid l, a^g)\, f(a^g \mid l, a)\, f(a \mid l)\, f(l). \tag{18}
$$

Thus, the expected value of the outcome Y is:

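\begin{align}
\mathbb{E}^G[Y] &= \sum_y y\, f^G(y) \tag{19}\\
&= \sum_y y \sum_l \sum_a \sum_{a^g} f^G(y, l, a, a^g) \tag{20}\\
&= \sum_y \sum_l \sum_a \sum_{a^g} y\, f(y \mid l, a^g)\, f(a^g \mid l, a)\, f(a \mid l)\, f(l) \tag{21}\\
&= \sum_l \sum_a \sum_{a^g} \mathbb{E}[Y \mid A = a^g, L = l]\, f(a^g \mid a, L = l)\, f(a \mid l)\, f(L = l). \tag{22}
\end{align}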

Algorithms One estimator of 𝔼^G[Y ] is the parametric g-computation formula. For the simple case of a deterministic intervention that depends on the natural value of a single exposure and a single time point, it suffices to notice that Equation 22 is equivalent to:

$$
\mathbb{E}^G[Y] = \sum_l \sum_a \sum_{a^g} \mathbb{E}[Y \mid A = a^g, L = l]\, f(a^g, a, l). \tag{23}
$$

The pseudo-algorithm then reads as follows:

Fit a regression model with dependent variable Y and independent variables A and L.
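Predict Ŷ^g from the model fitted in the previous step, after replacing each unit's observed (natural) exposure A with the value a^g = g(A, L) assigned by the intervention rule G.
Take the average of Ŷ^g over the empirical joint distribution of the natural exposure A and the confounders L.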
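When the rule is deterministic, f(a^g | a, l) is an indicator for a^g = g(a, l), so Equation 23 amounts to averaging 𝔼[Y | A = g(a, l), L = l] over the joint distribution of (A, L). The sketch below assumes a continuous exposure and the hypothetical rule g(a, l) = a − 0.5 (lower every unit's natural exposure by 0.5); the simulated data and the linear outcome model are illustrative assumptions. Under this data-generating process the true 𝔼^G[Y] is 0, while the observed mean of Y is about 1.

```python
import numpy as np

def g_rule(a, l_value):
    """Hypothetical natural-value rule: lower each unit's own exposure by 0.5.

    l_value is accepted because a natural-value rule may also depend on L,
    but this particular rule ignores it.
    """
    return a - 0.5

def gcomp_natural(L, A, Y, rule):
    one = np.ones_like(L)
    # Outcome model E[Y | A, L], assumed linear and additive.
    beta, *_ = np.linalg.lstsq(np.column_stack([one, A, L]), Y, rcond=None)
    # a^g is computed from each unit's natural value A (and possibly L).
    a_g = rule(A, L)
    y_hat_g = np.column_stack([one, a_g, L]) @ beta
    # Average over the empirical joint distribution of (A, L).
    return y_hat_g.mean()

# Simulated (hypothetical) data with a continuous natural exposure.
rng = np.random.default_rng(4)
n = 100_000
L = rng.normal(size=n)
A = 0.5 * L + rng.normal(size=n)
Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)

estimate = gcomp_natural(L, A, Y, g_rule)
```

The averaging runs over the observed (A, L) pairs rather than over L alone because the assigned exposure depends on each unit's natural value. Shifts like this one also need the shifted values to stay within the support of A given L (positivity).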

In the case of multiple exposures (e.g., if A is actually a vector of variables), the g-formula remains the same², but the pseudo-algorithm must be modified so that the intervention rule is applied to every exposure in A.

2.1.4 Modified Treatment Policies

2.2 Random Treatment Regimes

The rule assigns treatment at random: the probability of receiving a given treatment value may lie strictly between 0 and 1.

3 Time-varying Exposures

3.1 Deterministic Treatment Regimes

The rule assigns each treatment value with probability either 0 or 1.

3.1.1 Deterministic Static Treatment Regimes

The rule for assigning treatment does not depend on past treatment or covariates.

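Formally, the regime is deterministic static if $f^{\text{int}}(a_k \mid \bar a_{k-1}, D_k = 0)$ is either 0 or 1 for each $\bar a_k$ and for $k = 0, \dots, K$. In particular, given the regime $g = (g_0, \dots, g_K)$, $f^{\text{int}}(a_k \mid \bar a_{k-1}^g, D_k = 0) = 1$ if $a_k = a_k^g$ and 0 otherwise, with $a_s^g = g_s(\bar a_{s-1}^g)$.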

Figure 4: A SWIG representing a static treatment regime with two time points (nodes: L₀, A₀ | a₀, L₁^{a₀}, A₁^{a₀} | a₁, Y^{a₀,a₁}).

The joint density is:

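\begin{align}
f(y, l_0, l_1, a_0, a_1) = {} & f(y \mid l_0, l_1, a_0, a_1) \times \tag{24}\\
& f(a_1 \mid l_0, l_1, a_0) \times f(l_1 \mid l_0, a_0) \times \tag{25}\\
& f(a_0 \mid l_0) \times f(l_0). \tag{26}
\end{align}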

After intervening on the exposure A at both time points, we have:

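\begin{align}
f^G(y, l_0, l_1, a_0, a_1) = {} & f(y \mid l_0, l_1, a_0, a_1) \times \tag{27}\\
& f(a_1 \mid l_0, l_1, a_0) \times f(l_1 \mid l_0, a_0) \times \tag{28}\\
& f(a_0 \mid l_0) \times f(l_0). \tag{29}
\end{align}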

Thus, the expected value of the outcome Y is:

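\begin{align}
\mathbb{E}^G[Y] &= \sum_y y\, f^G(y) \tag{30}\\
&= \sum_y y \sum_{l_0} \sum_{l_1} \sum_{a_0} \sum_{a_1} f^G(y, l_0, l_1, a_0, a_1) \tag{31}\\
&= \sum_{l_0} \sum_{l_1} \sum_{a_0} \sum_{a_1} \mathbb{E}[Y \mid L_0 = l_0, L_1 = l_1, A_0 = a_0, A_1 = a_1] \times f(a_1 \mid l_0, l_1, a_0) \times f(l_1 \mid l_0, a_0) \times f(a_0 \mid l_0) \times f(l_0). \tag{32}
\end{align}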

Algorithms One estimator of 𝔼^G[Y ] is the parametric g-computation formula. For the case of a deterministic static intervention with one exposure and two time points, we can rewrite Equation 32 so that it corresponds to a series of conditional expectations:

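\begin{align}
\mathbb{E}^G[Y] &= \sum_{l_0} \sum_{l_1} \sum_{a_0} \sum_{a_1} \mathbb{E}[Y \mid L_0 = l_0, L_1 = l_1, A_0 = a_0, A_1 = a_1] \times f(a_1 \mid l_0, l_1, a_0) \times f(l_1 \mid l_0, a_0) \times f(a_0 \mid l_0) \times f(l_0) \tag{33}\\
&= \sum_{l_0} \sum_{l_1} \sum_{a_0} \sum_{a_1} \mathbb{E}[Y \mid L_0 = l_0, L_1 = l_1, A_0 = a_0, A_1 = a_1] \times f(a_1, l_1 \mid l_0, a_0) \times f(a_0, l_0). \tag{34}
\end{align}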

Equation 34 suggests a different form of the parametric g-computation formula, usually called the iterated conditional expectation (ICE) g-computation formula in the literature. The pseudo-algorithm then reads as follows (Ā_t denotes the history of A up to time t):

Fit a regression model with dependent variable Y and independent variables Ā₁ = (A₀, A₁) and L̄₁ = (L₀, L₁).
Predict Ŷ^{ā₁} from the model fitted in the previous step, setting the exposure A₁ according to the intervention rule G.
Fit a regression model with dependent variable Ŷ^{ā₁} and independent variables A₀ and L₀.
Predict Ŷ^{ā₀} from the model fitted in the previous step, setting the exposure A₀ according to the intervention rule G.
Take the average of Ŷ^{ā₀} over the empirical distribution of the confounders L₀.
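A minimal Python sketch of the ICE steps for two time points and the static regime (a₀, a₁), assuming simulated data in which the confounder L₁ is affected by A₀, binary exposures, and linear regressions at each step. The data-generating process and the helper names `simulate_two_period` and `ice_gcomp` are hypothetical. Under this process the true value of 𝔼[Y^{a₀,a₁}] is 1 + 1.8a₀ + a₁, because A₀ also acts on Y through L₁.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate_two_period(n=100_000, seed=2):
    """Hypothetical DGP with a time-varying confounder L1 affected by A0."""
    rng = np.random.default_rng(seed)
    L0 = rng.normal(size=n)
    A0 = rng.binomial(1, expit(0.5 * L0))
    L1 = 0.5 * L0 + 0.8 * A0 + rng.normal(size=n)
    A1 = rng.binomial(1, expit(0.5 * L1))
    Y = 1.0 + A0 + A1 + L1 + 0.5 * L0 + rng.normal(size=n)
    return L0, A0, L1, A1, Y

def ice_gcomp(L0, A0, L1, A1, Y, a0, a1):
    one = np.ones_like(L0)
    # Steps 1-2: regress Y on the full history, then predict with A1 set to a1.
    b2 = ols(np.column_stack([one, A0, A1, L0, L1]), Y)
    q2 = np.column_stack([one, A0, np.full_like(L0, a1), L0, L1]) @ b2
    # Steps 3-4: regress the predictions on (A0, L0), then predict with A0 set to a0.
    b1 = ols(np.column_stack([one, A0, L0]), q2)
    q1 = np.column_stack([one, np.full_like(L0, a0), L0]) @ b1
    # Step 5: average over the empirical distribution of L0.
    return q1.mean()

L0, A0, L1, A1, Y = simulate_two_period()
```

No model for the density of L₁ is fitted anywhere; the second regression absorbs the effect of A₀ on L₁.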

With more than two time points, repeat the fit-and-predict steps backwards in time, from the last time point down to t = 0. The ICE g-computation formula is appealing because, unlike the parametric g-computation formula based on Equation 32, it does not require models for the densities of the time-varying confounders at each time point.

In the case of multiple exposures (e.g., if A_t is actually a vector of variables), the g-formula remains the same³, but the pseudo-algorithm must be modified so that the intervention rule is applied to every exposure in A_t.

3.1.2 Deterministic Dynamic Treatment Regimes

The rule for assigning treatment depends on past treatment or covariates.

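Formally, the regime is deterministic dynamic if $f^{\text{int}}(a_k \mid l_k, \bar a_{k-1}, D_k = 0)$ is either 0 or 1 for each $(\bar a_k, l_k)$ and for $k = 0, \dots, K$. In particular, given the regime $g = (g_0, \dots, g_K)$, $f^{\text{int}}(a_k \mid l_k, \bar a_{k-1}^g, D_k = 0) = 1$ if $a_k = a_k^g$ and 0 otherwise, with $a_s^g = g_s(l_s, \bar a_{s-1}^g)$.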
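To make the definition concrete, the sketch below represents each g_k as a Python function of (l_k, ā_{k−1}^g) and f^int as the corresponding indicator. The rule `start_and_stay` (start treatment once l_k exceeds 1, then never stop) and the helper names are hypothetical, and the censoring indicator D_k is ignored.

```python
from typing import Callable, Sequence, Tuple

# A rule g_k maps (l_k, assigned treatment history) to the assigned a_k^g.
Rule = Callable[[float, Tuple[int, ...]], int]

def start_and_stay(l_k: float, a_prev: Tuple[int, ...]) -> int:
    """Hypothetical rule: start treatment once l_k > 1, then never stop."""
    already_treated = len(a_prev) > 0 and a_prev[-1] == 1
    return 1 if already_treated or l_k > 1.0 else 0

def regime_history(l_hist: Sequence[float], rules: Sequence[Rule]) -> Tuple[int, ...]:
    """Assigned treatments a_k^g = g_k(l_k, history of a^g), k = 0..K."""
    a_hist: Tuple[int, ...] = ()
    for l_k, g_k in zip(l_hist, rules):
        a_hist = a_hist + (g_k(l_k, a_hist),)
    return a_hist

def f_int(a_k: int, l_k: float, a_prev: Tuple[int, ...], g_k: Rule) -> float:
    """Degenerate intervention density: 1 if a_k equals the rule's value, else 0."""
    return 1.0 if a_k == g_k(l_k, a_prev) else 0.0

history = regime_history([0.5, 1.2, 0.3], [start_and_stay] * 3)  # (0, 1, 1)
```

Because each assigned value is fed back into the next rule, the third time point stays treated even though l₂ = 0.3 is below the threshold.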

3.1.3 Deterministic Natural Treatment Regimes

The rule for assigning treatment depends on its natural value.

Figure 5: A SWIG representing a natural treatment regime with two time points (nodes: L₀, A₀ → A₀^{+g}, L₁, A₁ → A₁^{+g}, Y).

The joint density is:

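\begin{align}
f(y, l_0, l_1, a_0, a_0^g, a_1, a_1^g) = {} & f(y \mid l_0, l_1, a_0^g, a_1^g) \times \tag{35}\\
& f(a_1^g \mid l_0, l_1, a_0^g, a_1) \times \tag{36}\\
& f(a_1 \mid l_0, l_1, a_0^g) \times \tag{37}\\
& f(l_1 \mid l_0, a_0^g) \times \tag{38}\\
& f(a_0^g \mid l_0, a_0) \times \tag{39}\\
& f(a_0 \mid l_0)\, f(l_0). \tag{40}
\end{align}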

After intervening on the exposure A, we have:

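\begin{align}
f^G(y, l_0, l_1, a_0, a_0^g, a_1, a_1^g) = {} & f(y \mid l_0, l_1, a_0^g, a_1^g) \times \tag{41}\\
& f(a_1^g \mid l_0, l_1, a_0^g, a_1) \times \tag{42}\\
& f(a_1 \mid l_0, l_1, a_0^g) \times \tag{43}\\
& f(l_1 \mid l_0, a_0^g) \times \tag{44}\\
& f(a_0^g \mid l_0, a_0) \times \tag{45}\\
& f(a_0 \mid l_0)\, f(l_0). \tag{46}
\end{align}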

Thus, the expected value of the outcome Y is:

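\begin{align}
\mathbb{E}^G[Y] &= \sum_y y\, f^G(y) \tag{47}\\
&= \sum_y y \sum_{l_0} \sum_{l_1} \sum_{a_0} \sum_{a_0^g} \sum_{a_1} \sum_{a_1^g} f^G(y, l_0, l_1, a_0, a_0^g, a_1, a_1^g) \tag{48}\\
&= \sum_{l_0} \sum_{l_1} \sum_{a_0} \sum_{a_0^g} \sum_{a_1} \sum_{a_1^g} \mathbb{E}[Y \mid L_0 = l_0, L_1 = l_1, A_0 = a_0^g, A_1 = a_1^g] \notag\\
&\qquad \times f(a_1^g \mid l_0, l_1, a_0^g, a_1) \times f(a_1 \mid l_0, l_1, a_0^g) \times f(l_1 \mid l_0, a_0^g) \notag\\
&\qquad \times f(a_0^g \mid l_0, a_0) \times f(a_0 \mid l_0)\, f(l_0). \tag{49}
\end{align}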