# How do we remove the “healthy vaccinee” bias? Part 2

I did not think there would be a Part 2 to my previous article on the topic, and so soon, but there is something fascinating about science. You think about a problem for a long time, consider it from many angles, and eventually decide that you have the right viewpoint, that you “got it right”.

Then, at some later moment, you understand that you missed something. You see a new angle that you had not thought about. It is a creative discovery for you, except that it means that you were wrong (in part). On the positive side, you also discover new questions. So, what do you do? Let someone else make the point?

Nope, you accept and inform. Scientists make mistakes, which sometimes lead to new questions. That’s part of the job.

I think I understand the logic of the two-step method as used in the study from Hungary. I think I understand the “healthy vaccinee” bias in their data. I think I understand the logic of dividing two adjusted rate ratios. My viewpoint about the “correction factor” was valid, but there is another viewpoint which is equally valid.

It is easier to follow if we switch the order and talk about the “sick unvaccinated bias”, instead of the “healthy vaccinee bias”. That is, unvaccinated are sicker than vaccinated, on average.

Suppose that unvaccinated have twice the risk of non-Covid death because they are sicker (risk ratio = 2).

Suppose that unvaccinated have ten times the risk of Covid death (risk ratio = 10). That ratio is biased, of course, because they have twice the risk of any death to begin with.

How do we correct the biased risk ratio of Covid death?

Divide: 10/2 = 5.

They truly have only five times the risk of Covid death, beyond their “baseline” elevated risk.

Which means that the effect of the vaccine is the inverse: 1/5 = 0.2.

We can get the same result if we compute everything using the inverse:

Corrected vaccine effect: (1/10) / (1/2) = 0.2
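The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch of the toy example: the function name `corrected_ratio` is my own label, and the numbers 10 and 2 are the hypothetical risk ratios from the text, not estimates from any study. Exact fractions are used so that the two routes can be compared without rounding.

```python
from fractions import Fraction


def corrected_ratio(covid_rr, non_covid_rr):
    """Divide the biased Covid-death risk ratio by the non-Covid-death risk ratio.

    Both ratios must be expressed in the same direction
    (unvaccinated vs. vaccinated, or vaccinated vs. unvaccinated).
    """
    return Fraction(covid_rr) / Fraction(non_covid_rr)


# Route 1: unvaccinated vs. vaccinated, then invert.
unvaccinated_corrected = corrected_ratio(10, 2)   # 10/2 = 5
vaccine_effect_route_1 = 1 / unvaccinated_corrected  # 1/5

# Route 2: compute everything with the inverse ratios.
vaccine_effect_route_2 = corrected_ratio(Fraction(1, 10), Fraction(1, 2))

print(unvaccinated_corrected)    # 5
print(vaccine_effect_route_1)    # 1/5
print(vaccine_effect_route_2)    # 1/5
```

Both routes give 1/5 = 0.2, which is the point: correcting the “sick unvaccinated” ratio and then inverting is the same as correcting the “healthy vaccinee” ratio directly.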

So, if you look at my figure with math notation in the previous article, you will see that the ratio on the right has a logic of its own. It is not just a mathematical derivation from what I (correctly) wrote on the left (correction factor).

That’s actually interesting. Two expressions, each having its own valid rationale, can be derived from each other, mathematically.

Next. There are two ways to define the healthy vaccinee bias, one of which refers to the pseudo effect that remains after adjustment for some set of variables. That’s what they did in the study from Hungary. Notice that it’s a vague definition because “some set of variables” could be “age”, “age and sex”, “age, sex, socioeconomic status, blood pressure”, and an unlimited number of other sets. Any “leftover” bias qualifies.

Another way to define the bias is simply to consider the pseudo protective effect against non-Covid death that we compute from a comparison of vaccinated with unvaccinated. This is called the “crude” association. That was my approach. That was the approach in the letter in *The New England Journal of Medicine*.

The authors of the study from Hungary (and the critic of the letter in *The New England Journal of Medicine*) chose the first definition. Accordingly, the healthy vaccinee bias in the Hungarian study was quantified by the adjusted hazard ratio in the non-epidemic period. It was 0.384. That’s the “leftover” bias after adjusting for *their* set of variables. In the absence of the “healthy vaccinee bias”, the ratio should have been 1. In epidemiological jargon, the “leftover” bias is called “residual confounding”. There are other confounders which were not included in the set of variables they adjusted for.
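To make the “leftover” logic concrete, here is a minimal sketch of dividing an adjusted Covid-death hazard ratio by the adjusted non-Covid hazard ratio. Only the value 0.384 comes from the text above. The adjusted Covid-death hazard ratio of 0.2 is a made-up placeholder, not a figure from the Hungarian study, and the function name is hypothetical.

```python
def remove_leftover_bias(adjusted_covid_hr, adjusted_non_covid_hr):
    """Divide the adjusted Covid-death hazard ratio by the adjusted
    non-Covid hazard ratio, which quantifies the residual confounding."""
    if adjusted_non_covid_hr <= 0:
        raise ValueError("hazard ratios must be positive")
    return adjusted_covid_hr / adjusted_non_covid_hr


LEFTOVER_BIAS_HR = 0.384       # adjusted hazard ratio, non-epidemic period (from the text)
HYPOTHETICAL_COVID_HR = 0.2    # placeholder only; NOT a result from the study

corrected_hr = remove_leftover_bias(HYPOTHETICAL_COVID_HR, LEFTOVER_BIAS_HR)
print(f"corrected hazard ratio: {corrected_hr:.3f}")

# With no leftover bias (ratio = 1), the correction changes nothing.
unchanged_hr = remove_leftover_bias(HYPOTHETICAL_COVID_HR, 1.0)
print(f"no-bias case: {unchanged_hr:.3f}")
```

The second call illustrates the statement that, in the absence of the bias, the non-Covid ratio should be 1: dividing by 1 leaves the Covid estimate as it was.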

My approach, which was also the approach of the authors of the letter in *The New England Journal of Medicine*, relied on a simple comparison of vaccinated with unvaccinated. The pseudo effect on non-Covid death from that comparison is the “healthy vaccinee” bias. We do not reduce the bias to “remaining” bias after adjusting for some set of variables.

So, we have two approaches for removing the bias. Both seem valid, and they give different results. Which is better?

Is the “leftover” approach necessarily superior? Not at all. In fact, the domain of causal diagrams, with which I am familiar, suggests otherwise. For instance, adjustment can sometimes increase the bias (when we mistakenly add colliders to the model). In addition, rate ratios from different regression models are not strictly comparable, even when they rely on the same observations and the same independent variables, because the outcomes are different (Covid death versus non-Covid death). Is there any counter-argument against our model-free approach?

If a single answer can be provided, it should come from a new field of methodological research that involves theoretical work, empirical studies, and simulations. Since we have only about three articles on methods to remove the “healthy vaccinee” bias, we do not know yet.

What I do know is that those who thought that the “leftover” approach is necessarily superior should do one of the following: try to explain why they are right, or revise their viewpoint, as I did here.