The general standard of physicians in Canada is pretty good… but we are not perfect. While the AFMC proudly assumes that 99.5% of those admitted to medical schools will become good physicians, our experience as senior teachers and assessors of trainees leaves us with a more jaundiced view. For some of us, nearing retirement and contemplating an increasing reliance on good healthcare as we age, this becomes worrying. It is a greater challenge still for the Medical Regulatory Authorities (MRAs), who are charged with protecting the public from poorly performing physicians.
This is not a new issue, but the challenge of longitudinal performance assessment of physicians remains largely unsolved. In particular, it is important to recognize that this should not be a program to reassure the general public that the average physician is ‘a good chap’, a limitation that has blunted the effectiveness of some previous measures such as the CPSA’s PAR program. We need to focus on selecting out those marginal physicians who are not good enough.
Partly, this is a measurement problem: collecting the right performance metrics (which is not always that easy, with unintended consequences arising from such measures, as demonstrated by some of the NHS initiatives in the UK); and deciding where to set the bar – what is ‘good enough’.
Some of the new initiatives by MRAs are taking a more holistic look at this problem, using practice-based data analytics to get some sense of which physicians perform well, and which less so. There is a concern that these initiatives are once again taking the stance of demonstrating that the majority of physicians are good. This has a function but does not address the marginal performer.
As with all systems that depend on customer surveys and secondary performance metrics, there is a tendency to ‘game the system’. Physicians are smart people and when subjected to such measures, they will adapt. This adaptation lessens the effectiveness of such measures, again as demonstrated by various NHS quality initiatives.
The commercial world long ago recognized that customer surveys tell you what the customer thinks you want to hear. There is a reason why customer tracking programs such as Air Miles, branded as customer loyalty schemes but in truth customer-monitoring programs, have become so popular with vendors. Their impact on loyalty is trivial, but the value of tracking consumer behaviours is huge: it has been estimated that this consumer data is worth approximately $1200 per person to Google, yet costs only $3 per year to collect – a massive markup in value.
Part of what has made this data so effective for large corporations is the network effect: looking at how various behaviours and linkages are connected in observational data. No matter how often the research positivists trot out that tired dictum ‘correlation is not causation’, these corporations have recognized the corollary that ‘without correlation, there is no causation’, and have instead focused on analyzing correlations that individually carry a much lower degree of certainty but whose sheer volume remains persuasive.
A toolset that has become enormously popular in analyzing such network effects is founded on the graph database. We have taken for granted the old maxim of six degrees of separation, based on Milgram’s postcard study from the 60s, but Facebook has demonstrated through its vast graph database of account connections, Friends, Likes etc, that this number has now dropped below 4.
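The degrees-of-separation idea is, at bottom, a shortest-path query over a connection graph, which is exactly the kind of traversal graph databases are optimized for. A minimal sketch of that query in Python, over a hypothetical toy friendship graph (account names invented for illustration):

```python
from collections import deque

def degrees_of_separation(graph, start, goal):
    """Breadth-first search: number of hops between two accounts
    in a friendship graph given as an adjacency-list dict."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for friend in graph.get(node, ()):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None  # the two accounts are not connected at all

# A toy friendship graph (hypothetical accounts)
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}

print(degrees_of_separation(friends, "alice", "erin"))  # → 3
```

Facebook's figure of fewer than four degrees comes from averaging exactly this kind of path length over billions of account pairs; a dedicated graph database simply makes such traversals fast at that scale.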
Who we connect with, how we interact, where we go and what we do: all of this has been a powerful source of information for these large corporations. And yet we in the academic and regulatory worlds have largely ignored it.
Such data has been shown to be predictive of a wide range of behaviours. The degree of certainty is low, and this is important to recognize, lest we repeat the unfortunate pattern associations generated by some US security agencies in their zeal to pursue malfeasance. But these are sensitive markers.
We already know that certain markers of maladaptive behaviours, as expressed in various studies around the Conscientiousness Index, are both easy to gather and sensitive. The MRAs are also well aware that marginally performing physicians have a tendency to be migratory, a tendency magnified by the difficulties in tracking marginal behaviours across multiple jurisdictions. Privacy concerns are oft cited but, surely, the public good is of greater importance in detecting such behaviours.
For those who are senior teachers, and who may be wondering about this migratory tendency, we encourage you to think back on some of the troubled learners you have encountered over the years and Google them. For our part, it has been illuminating to find that such migratory tendencies are quite striking.
How then could we use such information? Simply being migratory is not in itself necessarily a bad thing. Those in very senior positions have often gained their experience through just such migrations between different medical schools, provinces and practice environments. We need to look beyond just ‘moving around lots’.
What factors might be considered in such a Vagrancy Index? Solo practice is a known indicator, already watched quietly by MRAs. Are there other homophilies that can be detected? Birds of a feather and all that. Isolated practice, disconnection from teaching networks and narcotic prescribing patterns have all been explored. But there are many other possible markers: perhaps referral patterns, investigation patterns, record-keeping habits. In such an endeavour, it is notable that exploring such connections in a graph often brings to light a greater number of unanticipated factors.
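To make the idea concrete, such an index might start life as nothing more than a weighted sum of observed markers. A purely illustrative sketch, in which every marker name and weight is invented for illustration and none is a validated measure:

```python
# Hypothetical markers and weights for an illustrative Vagrancy Index.
# These values are invented; a real index would need empirical validation.
MARKER_WEIGHTS = {
    "solo_practice": 2.0,                # yes/no (0 or 1)
    "jurisdiction_moves_5yr": 1.5,       # weight applied per move
    "disconnected_from_teaching": 1.0,   # yes/no
    "atypical_narcotic_prescribing": 2.5,  # yes/no
    "poor_record_keeping": 1.0,          # yes/no
}

def vagrancy_index(markers):
    """Weighted sum of observed markers; a higher score suggests
    closer review is warranted, not a conclusion in itself."""
    return sum(MARKER_WEIGHTS[name] * value
               for name, value in markers.items())

# A hypothetical physician profile
profile = {
    "solo_practice": 1,
    "jurisdiction_moves_5yr": 3,
    "disconnected_from_teaching": 1,
    "atypical_narcotic_prescribing": 0,
    "poor_record_keeping": 1,
}
print(vagrancy_index(profile))  # 2.0 + 4.5 + 1.0 + 0 + 1.0 = 8.5
```

The point of the sketch is the shape of the instrument, not the numbers: a linear score is easy to compute and audit, and its thresholds can be set deliberately, rather than emerging opaquely from a black box.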
Graph databases and graph neural networks are powerful tools for exploring such associations. As noted above, while this may be a sensitive approach (in more ways than one), there will also be a lot of noise and false positives. But it is a powerful starting point. The large corporations will certainly have such data, but it is unlikely that they would be willing to share it without strong assurances, and even then they will likely regard it as proprietary.
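The simplest form of such association mining is guilt-by-proximity counting: for each physician, how many of their direct connections are already flagged? A toy sketch on a hypothetical referral network (all names invented), which also makes the false-positive problem visible:

```python
# Toy referral network as an undirected adjacency list (hypothetical names).
referrals = {
    "dr_a": {"dr_b", "dr_c"},
    "dr_b": {"dr_a", "dr_d"},
    "dr_c": {"dr_a", "dr_d", "dr_e"},
    "dr_d": {"dr_b", "dr_c"},
    "dr_e": {"dr_c"},
}
flagged = {"dr_b", "dr_c"}  # physicians already under review

def flagged_neighbour_count(graph, flagged):
    """For each unflagged physician, count links to flagged ones.
    A sensitive but noisy signal: high counts mean 'look closer',
    never 'guilty'."""
    return {node: len(neighbours & flagged)
            for node, neighbours in graph.items()
            if node not in flagged}

print(flagged_neighbour_count(referrals, flagged))
# {'dr_a': 2, 'dr_d': 2, 'dr_e': 1}
```

Note that dr_a and dr_d score identically here despite potentially very different circumstances; this is precisely the noise and false-positive rate referred to above, which is why such a query can only ever be a starting point for human review.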
But what are the alternatives? We have largely demonstrated over the past three decades that our usual measures (customer surveys, board exams, QI performance metrics) have little power to detect such marginal performance.
It is also important to consider the effort and invasiveness required by such QI programs. We are all suffering from survey fatigue. Yet, as the customer loyalty programs show, we seem blithely willing to hand over our purchasing data and privacy for a 1% discount (the effective value of most programs) and convenience, despite the warnings of GDPR and similar regulation. The public have spoken: convenience is everything. So let’s look at convenient ways of collecting performance data.