How do we remove the “healthy vaccinee” bias?

Eyal Shahar
Aug 6, 2023

For reasons that are not relevant here, people who get vaccinated are healthier, on average, than people who don’t. Therefore, their risk of death from any cause, including Covid, is inherently lower. So, even if they had been injected with a placebo, they would have had a lower risk of Covid death than their unvaccinated counterparts, and we would mistakenly claim that the placebo is an effective vaccine… This is called the “healthy vaccinee” bias. It belongs to a class of biases called “confounding”.

You can read an elaborated explanation here.


Back in 2021, the year of Covid vaccines, I sent a letter to The Lancet, a prestigious journal, in which I tried to expose the healthy vaccinee bias in a study from Israel of the booster effect on Covid death.

Having been aware of heavy censorship of threats to the “safe and effective” narrative, I refrained from using the term and wrote about the possibility of “residual confounding”. I asked the authors to abide by their own methodological standards and check if the Pfizer vaccine had an effect on non-Covid death, which is a method to detect the healthy vaccinee bias. I pointed out that one of the co-authors published an article on this topic in the context of flu vaccines and recommended that approach.

I thought I had a winning hand. I was certain the results would show that the Pfizer vaccine “protects” against death from other causes, which is evidence of healthy vaccinee bias. If so, part, or all, of the so-called protection against Covid death is due to the fact that the vaccinated are healthier than the unvaccinated, on average.

Nope. I was not successful. The journal rejected the letter, without subject matter explanation, as commonly done. They guarded the gates. There was no scientifically compelling reason to reject my letter. You can read the exchange here.

Time has passed and another high-profile journal — The New England Journal of Medicine — was not careful enough. They recently accepted a letter to the editor on another booster study from Israel and opened a can of worms. In response to the letter, the authors of that study (Arbel et al.) exposed the “effect” of the Pfizer vaccine on non-Covid death.

It was not null. In fact, it was identical to the effect of the Pfizer vaccine on Covid death, which means that the true booster effectiveness was zero…

That letter to the editor has attracted a lot of attention. I know firsthand how much attention that journal gets. You can find my name on its pages during my professional career [here, here, here, and here], just to let you know that I have some academic credentials, even though I now publish on Medium.

But the story gets more interesting.

For many months, I have been looking for data that would allow me to expose the healthy vaccinee bias in studies of Covid vaccines and find out what we get after correction. I had a method of correction in mind.

It was not easy to find such an article. Studies of vaccine effectiveness have been hiding data on non-Covid death.

I found one such study recently, analyzed the data, and published an article, shortly before that letter to the editor was published in The New England Journal of Medicine… Let me assure you that it was a coincidence. I did not know about their upcoming letter, they referred to another study of the booster effect, and they surely submitted the letter long before I wrote my article.

I published my article here, and it was re-posted on Brownstone and The Daily Sceptic. If you take the time to read it, you should have a full understanding of the topic and the importance of the bias.

With all due respect to Medium, Brownstone, and The Daily Sceptic, my article largely went unnoticed. Fair enough.

Preamble to “how to remove the bias”

Biases are a key issue in observational studies, and you will find hundreds (thousands?) of methodological articles on the topic. I spent many years studying and writing on various biases, mostly from the perspective of causal diagrams.

You will not find, however, many articles on the healthy vaccinee bias, and you will find close to nothing on methods to remove it.

As far as I know (corrections are welcome), there are three articles that directly proposed a method to remove the bias. One was an article from Hungary. Two are mine. [If you are reading this piece, I assume that you have some trust in my writing, regardless of where it is published.]

The content from now on belongs in a scientific journal, but I have no interest in lining up at the peer-review queue. So, excuse me for switching to more technical text. As I used to tell my students: You don’t have to understand everything in order to understand the essence.

A simple, one-step correction

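Suppose, hypothetically, that the risk of Covid death is 0.01 in vaccinated and 0.02 in unvaccinated, and that the risk of non-Covid (other) death is 0.1 in vaccinated and 0.2 in unvaccinated.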

The biased risk ratio for Covid death is 0.01/0.02 = 0.5

Why is it biased?

Because vaccinated are healthier than unvaccinated. Their risk of any (other) death is lower (0.1 versus 0.2), so a priori their risk of Covid death is lower, too.

Notice that the risk ratio of 0.5 for non-Covid (other) deaths is also biased. A Covid vaccine should protect against Covid, not against other causes of death.

To get the correct risk ratio for Covid death, we need to adjust that 0.01 upward. We need to make the vaccinated comparable to the unvaccinated, before computing the risk ratio. To that end, the following question should be answered:

What would have been the risk of Covid death in vaccinated — if they were just as unhealthy as their unvaccinated counterparts?

For sure, it would not have been 0.01, but higher. How much higher?

That’s easy. We know that unvaccinated have a two-fold risk of other deaths (0.2/0.1 = 2). That’s the correction factor (or the bias factor).

So, if vaccinated were just as unhealthy, their risk of Covid death would also have been twice as high: 0.01 × 2 = 0.02

We now have a valid comparison:

The risk of Covid death in vaccinated, if they were just as unhealthy: 0.02

The risk of Covid death in “unhealthy” unvaccinated: 0.02

Corrected RR: 0.02/0.02 = 1. The vaccine had no effect on Covid death.
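The arithmetic of this hypothetical example can be traced in a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the four risks are the made-up values used in the example, not real data.

```python
# Hypothetical risks from the worked example (not real data).
risk_covid_vax = 0.01     # risk of Covid death, vaccinated
risk_covid_unvax = 0.02   # risk of Covid death, unvaccinated
risk_other_vax = 0.1      # risk of non-Covid (other) death, vaccinated
risk_other_unvax = 0.2    # risk of non-Covid (other) death, unvaccinated

# Biased (crude) risk ratio for Covid death.
biased_rr = risk_covid_vax / risk_covid_unvax

# Bias factor: how many times higher the risk of other death is in unvaccinated.
bias_factor = risk_other_unvax / risk_other_vax

# Risk of Covid death in vaccinated, had they been "just as unhealthy".
adjusted_risk_covid_vax = risk_covid_vax * bias_factor

# Corrected risk ratio: adjusted vaccinated risk versus unvaccinated risk.
corrected_rr = adjusted_risk_covid_vax / risk_covid_unvax

print(f"biased RR       = {biased_rr:.2f}")
print(f"bias factor     = {bias_factor:.2f}")
print(f"adjusted risk   = {adjusted_risk_covid_vax:.2f}")
print(f"corrected RR    = {corrected_rr:.2f}")
```

With these inputs the biased risk ratio is 0.5, the bias factor is 2, and the corrected risk ratio is 1.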

I have been using the phrase “just as unhealthy”. What does it mean? What kind of unhealthiness is accounted for? Which variables are we adjusting for by multiplying 0.01 by 2?

Theoretically, everything that affects non-Covid death, not only disease-related variables, but also age, sex, socioeconomic status, functional status, and so on. We take care of a long list of variables, some of which are not even known to us.

Does that single multiplier of 2 indeed capture everything? That is a generic question that may be asked about any method of adjustment, including any standard multivariable regression model. We never know the answer. Nonetheless, the method is valid. It is simple, logical, intuitive. As rudimentary as it is, I cannot think of any argument that invalidates the method.

Let’s extend the example, using generic notation.
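One way to write the correction in generic notation is sketched below. The symbols are labels assumed here for illustration: R denotes a risk, the subscripts C and O denote Covid death and other (non-Covid) death, and V and U denote vaccinated and unvaccinated.

```latex
\[
RR_{\text{corrected}}
  = \frac{R_{C,V} \times \left[ \dfrac{R_{O,U}}{R_{O,V}} \right]}{R_{C,U}}
  = \frac{R_{C,V} / R_{C,U}}{R_{O,V} / R_{O,U}}
  = \frac{RR^{\text{biased}}_{\text{Covid}}}{RR^{\text{biased}}_{\text{other}}}
\]
```

In the hypothetical example, the bracketed ratio is 0.2/0.1 = 2, and the ratio on the right is 0.5/0.5 = 1.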

As you see above, the crucial step is multiplication of the risk of Covid death in vaccinated by the ratio in brackets. That’s the correction, the logic of which was explained above. We adjust the risk of Covid death in vaccinated upward to make them comparable to those unhealthy unvaccinated.

It turns out, however, that a little math gets us from the correction step to the ratio on the right. To derive a corrected risk ratio, we simply need to divide the biased risk ratio for Covid death by the biased risk ratio for non-Covid (other) death. For instance, divide 0.5 by 0.5 in the hypothetical example at the beginning of this section.
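The equivalence of the two routes can also be checked numerically. The sketch below uses hypothetical function names and made-up inputs; the first tuple is the example from the beginning of this section, and the other two are arbitrary values chosen only to exercise the identity.

```python
import math

def corrected_rr_via_step(r_cv, r_cu, r_ov, r_ou):
    """Adjust the Covid risk in vaccinated upward, then take the ratio."""
    bias_factor = r_ou / r_ov          # unvaccinated vs vaccinated, other death
    return (r_cv * bias_factor) / r_cu

def corrected_rr_via_ratios(r_cv, r_cu, r_ov, r_ou):
    """Divide the biased RR for Covid death by the biased RR for other death."""
    return (r_cv / r_cu) / (r_ov / r_ou)

# Hypothetical inputs, in the order:
# (Covid risk vaccinated, Covid risk unvaccinated,
#  other-death risk vaccinated, other-death risk unvaccinated)
examples = [
    (0.01, 0.02, 0.1, 0.2),
    (0.004, 0.02, 0.05, 0.08),
    (0.003, 0.01, 0.06, 0.12),
]

for risks in examples:
    step = corrected_rr_via_step(*risks)
    ratios = corrected_rr_via_ratios(*risks)
    # The two routes agree up to floating-point rounding.
    assert math.isclose(step, ratios)
    print(f"{risks}: step = {step:.3f}, ratio of ratios = {ratios:.3f}")
```

The two functions differ only in the order of the multiplications and divisions, which is why they return the same number for any positive risks.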

Back to that letter in The New England Journal of Medicine.

That’s what they found (their data, my table). A biased risk ratio of 0.05 (equivalent to 95% vaccine effectiveness) has turned into 1 (0% vaccine effectiveness) after correction for the healthy vaccinee bias. The authors did not provide a complete explanation as I offered here, likely due to constraints on the length of letters to the editor.

A two-step method

Shortly after the letter was published, someone argued that the response of the authors of the original article allows us to compute a corrected risk ratio differently. According to that method, the corrected risk ratio is 0.43, which means vaccine effectiveness close to 60%. That’s much lower than 90%, as mistakenly estimated, but still substantial.

The critic took the risk ratio (hazard ratio, here) of Covid death from one multivariable regression model and divided it by the risk ratio of non-Covid death from another multivariable regression model.

Corrected risk ratio: 0.10/0.23 = 0.43 (vaccine effectiveness: 57%).
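For readers who want to reproduce the critic's arithmetic, here is a minimal sketch. It assumes the usual definition of vaccine effectiveness as (1 − RR) × 100, which matches the conversions used in this article (a risk ratio of 0.05 corresponds to 95%, and a risk ratio of 1 to 0%). The function and variable names are illustrative.

```python
def vaccine_effectiveness(rr):
    """Vaccine effectiveness in percent, taken as (1 - RR) x 100."""
    return (1.0 - rr) * 100.0

# Adjusted hazard ratios quoted by the critic.
hr_covid_death = 0.10
hr_non_covid_death = 0.23

# Two-step "corrected" ratio: one adjusted ratio divided by the other.
two_step_rr = hr_covid_death / hr_non_covid_death

print(f"two-step corrected RR = {two_step_rr:.2f}")
print(f"vaccine effectiveness = {vaccine_effectiveness(two_step_rr):.0f}%")
```

The sketch only reproduces the computation; it says nothing about whether dividing two adjusted ratios is a valid correction.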

Notice that he replicated the one-step method, but there was an extra step. Instead of dividing “crude” (unadjusted) risk ratios, he divided “adjusted” risk ratios. There was an intermediary step in which regression models were run, and adjusted risk ratios were computed.

You may think about his logic as follows.

First, we try to remove as much confounding as possible, by classical regression adjustment. Then, we take care of any “remaining” healthy vaccinee bias.

It seems to me that the method is not sound — for a simple reason. As I showed earlier, the simplified computation, in which one biased risk ratio is divided by another, has no logic of its own. It was a mathematical derivation from the correction step, the logic of which was clear: We adjusted the risk of Covid death in vaccinated upward to make them comparable to their unvaccinated counterparts.

There is no comparable logic for the “remaining” bias approach. Why does dividing two adjusted risk ratios take care of remaining bias? There is no upward correction of any rate here. There is no explanation of why anything informative is achieved by that division. It is one of those instances where a computation seems to be doing something right, but it does not.

How did the critic come up with the method? Where was it published first?

It was published in a study from Hungary, which he mentioned. I will review that study next.

Two spoilers:

First, the authors of that study do not explain why the two-step method is valid. We still have no rationale for the computation.

Second, their application of the method to the Pfizer vaccine generated a senseless result. The method proved to be faulty in their own data.

The Hungarian Study

The design of the study from Hungary is somewhat different. They did not have data on Covid deaths and non-Covid deaths, but rather data on all-cause deaths at two periods: an epidemic period and a non-epidemic period. Comparing all-cause deaths — vaccinated versus unvaccinated — during the non-epidemic period informs us about the healthy vaccinee bias.

Numerous Covid vaccines have been used in Hungary and the authors analyzed them collectively and individually. I will examine only the Pfizer vaccine, which was most common there. That’s the mRNA vaccine that was used in vaccine effectiveness studies from Israel.

Look at this table, which I created from Table S1 (supplement) of the article. (Key data are often hidden in supplements.) Except for the distribution of sex (women are at lower risk of death), all other key variables indicate that those who received the Pfizer vaccine were sicker than unvaccinated.

Sicker, not healthier. And they were also older.

I have no explanation, since the healthy vaccinee phenomenon seems universal. It is seen in the US, the UK, and Israel, for example.

At any rate, there is no “healthy vaccinee bias” in the group that received the Pfizer vaccine. Still, they corrected for a nonexistent bias, by the two-step method, without citing a reference for the method and without explaining why it is a valid approach. Even their description of the correction is inaccurate. They write about “difference of hazard ratios”, when in fact they compute ratios.

Here is my summary of their results.

Before any adjustment, the rate ratio of all-cause death was 0.845. Then, they adjusted for a set of variables (like those in my table above) and obtained a stronger “effect”: a risk ratio of 0.197. That was expected, since vaccinated were sicker. We have here a phenomenon called “negative confounding”, the opposite of the healthy vaccinee bias. Adjustment reveals a concealed effect.
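Negative confounding can be illustrated with a small sketch. All numbers below are made up and are not taken from the Hungarian study; the two strata and their shares are assumptions chosen only to show how a crude ratio can be weaker than the ratio within each stratum when the vaccinated are sicker.

```python
# Hypothetical strata; none of these numbers come from the Hungarian study.
# Each entry: (share of vaccinated in stratum, share of unvaccinated in stratum,
#              risk of death in vaccinated, risk of death in unvaccinated)
strata = {
    "sicker":    (0.8, 0.2, 0.02, 0.10),
    "healthier": (0.2, 0.8, 0.004, 0.02),
}

# Crude risks: stratum-specific risks weighted by each group's own composition.
crude_risk_vax = sum(share_v * risk_v for share_v, _, risk_v, _ in strata.values())
crude_risk_unvax = sum(share_u * risk_u for _, share_u, _, risk_u in strata.values())
crude_rr = crude_risk_vax / crude_risk_unvax

# Stratum-specific risk ratios (what adjustment tries to recover).
stratum_rr = {name: risk_v / risk_u for name, (_, _, risk_v, risk_u) in strata.items()}

for name, rr in stratum_rr.items():
    print(f"{name}: stratum-specific RR = {rr:.2f}")
print(f"crude RR = {crude_rr:.2f}")
```

In this made-up example the ratio is about 0.2 within each stratum, but the crude ratio is closer to 1 because the vaccinated group is mostly drawn from the sicker stratum.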

So, even though there is no healthy vaccinee bias in the data, they proceeded to the next step. Remember, that step was used by the critic of the letter in The New England Journal of Medicine. Now, they get a “corrected” risk ratio of 0.51, accounting for healthy vaccinee bias that did not exist in the data…

That’s senseless math. Dividing two adjusted rate ratios produced a number that is uninterpretable.

To summarize, not only do we lack any rationale for the two-step method, but it also proved meaningless in its first empirical application.

To remove the healthy vaccinee bias we need to work with “crude” (unadjusted) rate ratios, not with adjusted rate ratios. The authors of the letter to the editor of The New England Journal of Medicine were right. The booster had no effect on Covid death. The same was likely true of the original 2-dose series as well.

Added on August 7: Please read revised thoughts in Part 2.



Eyal Shahar

Professor Emeritus of Public Health (University of Arizona); MD (Tel-Aviv University, Israel); MPH, Epidemiology (University of Minnesota)