The Sensitivity trap

Sensitivity is the proportion of people who actually have the thing you are looking for whose test comes back positive.  In medicine, this is usually a disease.  We like sensitive tests in medicine, because a negative result from a highly sensitive test lets you rule the disease out.

The flip side of the same coin is the specificity of a test: the proportion of people without the disease whose test comes back negative.  A highly specific test, when it is positive, allows you to confidently assert that the patient has the disease.  We like specific tests in medicine too; we really like them in Emergency Medicine.

Both of these things can be defined numerically.  That helps when thinking about a result, because while sensitivity and specificity are properties of the test itself, what a given result actually means for your patient (the predictive value) depends heavily on the prevalence of the thing you are looking for.

The more common a disease (the higher the prevalence), the more likely a positive result is to be a true positive, so false positives matter less.  In a low-prevalence population, even a good test generates mostly false positives.  Generally speaking, as you calibrate a test, pushing the sensitivity towards 100% (fewer false negatives) makes the specificity fall (more false positives).
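The arithmetic behind this is worth making concrete.  Here is a minimal sketch (the 95%/90% test characteristics and the prevalence figures are invented for illustration, not taken from any real assay) of how the positive predictive value of a fixed test collapses as prevalence falls:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative test)
    return ppv, npv

# The same hypothetical test (95% sensitive, 90% specific)
# applied to progressively lower-prevalence populations:
for prevalence in (0.30, 0.05, 0.01):
    ppv, npv = predictive_values(0.95, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At 1% prevalence, fewer than one in ten positives is a true positive, even though the test itself has not changed at all.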

So what?

Well, when we study tests we define the population we are looking for really tightly, and exclude people who might not fit, or might generate confusing results (like women – seriously the data gap here is terrifying), or people we might not be able to follow up well.

We then use this published evidence to inform our search for diseases, and often we will not look back at how or why we are using the test again. 

Then the test starts getting used for more and more indications, more and more tangentially, to the point where we really have no idea how it is actually performing.  This is happening with procalcitonin right now, and happened with BNP before it.

All very well you may say, that’s just academic vs ‘the real world’. There is always going to be indication creep, test biases and testing creep as the people that sell us the assay find more and more ‘useful’ uses for it.

But as we creep, we create a problem for ourselves as doctors in general, and our speciality in particular. This leads to poor care, overcrowding, and exhaustion.

This is the sensitivity trap, and I’m not sure we are talking about it.

We work in a regulatory, governance and legal environment that is pushing for, and demanding, close to 100% sensitivity.  We as a profession have bowed our heads and nodded, because this is absolutely the best thing for patients, isn't it?  We need to find everything.  Smarter doctors, more subtle findings, earlier diagnosis: so much better for patients.

But is it?

100% sensitivity is a wonderful goal, but for many clinical conditions it is impossible to attain.  Yet we are treating many tests as if they are 100% sensitive.  This has dangerous consequences for our systems.

If we aim for higher and higher sensitivity, our specificity drops.  This means we fail to rule out more and more patients, who then need further, more invasive, tests.  Those tests carry risks for the patients themselves, but also for the next patient, because we delay investigation where it is genuinely needed; the infrastructure to manage 100%-sensitivity medicine hasn't been built (at least in this country – we could look to the US to see where this thought experiment leads).
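The trade-off at a test threshold can be sketched with a toy example.  The biomarker values below are invented purely for illustration (they are not real troponin data, and the cutoffs are arbitrary); the point is only that each step down in the positive cutoff buys sensitivity at the direct expense of specificity:

```python
# Hypothetical biomarker values for a diseased and a healthy group
# (invented numbers, purely illustrative)
diseased = [30, 45, 60, 80, 120, 200]
healthy = [5, 10, 15, 25, 35, 50]

def performance(cutoff):
    """Sensitivity and specificity if values >= cutoff are called positive."""
    sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
    specificity = sum(v < cutoff for v in healthy) / len(healthy)
    return sensitivity, specificity

for cutoff in (53, 40, 28):
    sens, spec = performance(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Lowering the cutoff catches more true disease and mislabels more healthy patients; there is no calibration that improves one without costing the other.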

As we fall into this cognitive trap, we assume that all false negatives are preventable and related to poor clinical care: the doctor or nurse was stupid, they didn't notice a sign.  So we ratchet up the sensitivity yet further, make more senior clinicians responsible for an ever higher density of test interpretation decisions, or create pathways which generate an ever decreasing amount of pathology, like an exhausted mining seam.  Just to pick up that one diagnosis a year that we 'missed'.

NICE performs relatively exhaustive technology assessments of new drugs and interventions to make sure they are cost effective.  Trusts and ICBs do not do the same for their pathways: is this SDEC working?  How many heart attacks has the rapid access chest pain clinic prevented?  Instead, governance systems, specialty interests, HM Coroner and now the HSIB push institutions and clinicians to abandon specificity in the interests of sensitivity, because false negatives are seen as more important than false positives.

This serves to flood your GP clinic, SDEC, ED, AMU and 2 week wait clinic with more false positives, because we aren't allowed to miss a single false negative.  This is overwhelming us, and it leads, I think, to a change in clinical reasoning.

How many times have you heard the phrase “I cannot rule it out…”?

We have found ourselves in the position that if a constellation of symptoms or history could potentially fit with an unusual presentation of a clinical emergency (in particular), then even if it is felt to be incredibly unlikely, an onward referral is made or further testing is arranged.

Yet at no point have we had a discussion about the infrastructure required by this approach.  Trusts, ICBs and organisations are discussing the utility of 24-hour MRI lumbar spine testing because of this very problem.  Cauda Equina Syndrome is incredibly difficult to diagnose, but because of some exceptionally ill-informed case law, we must not miss a single case.  This results in many referrals a day to spinal services, which doesn't result in a demonstrable increase in pick-up of CES, or better management for people with lumbar back pain.  A bit like where we are with CTPA now: rates of PE are static, while rates of CTPA, especially in countries like the US, are sky high, with yields as low as 1–5% [1].  We do a bit better in the UK, with the RCR wanting us to achieve a yield above 15% [2]; we hit that (just!) where I am.

This is the tension between providing what we think of as gold standard care for an individual (because remember we don’t care about the harm of false positives) versus the good of the entire population.

We could, for example, trust a negative CT head for SAH in a patient who is GCS 15, and not admit them for a confirmatory LP (SHED study).  We could trust a single 3-hour high-sensitivity troponin of under 5 [3] as sensitive enough to rule out an MI, or we could accept a less risk-averse calibration and go for 40.  How many MIs would we miss?  Doing these things would reduce short stay admissions a little and reduce time in the department a little, and would we really miss that many MIs?  Would we miss any important SAHs?  Probably not, but we would have to turn to His Majesty's Coroner, and our peers, and say 'this is good enough'.

There is periodically a push, via campaigns like Choosing Wisely, for tests and treatments that are of no benefit to patients to be stopped.  Do we need to think about choosing more wisely for our systems too?

  1. https://www.ahajournals.org/doi/10.1161/CIRCOUTCOMES.119.005753
  2. https://www.rcr.ac.uk/career-development/audit-quality-improvement/auditlive-radiology/appropriateness-of-usage-of-computed-tomography-pulmonary-angiography-ctpa-investigation-of-suspected-pulmonary-embolism/
  3. https://pubmed.ncbi.nlm.nih.gov/24428678/
