Activity data vs clinical data extraction from the EMR

Premise

What is the difference between exposing clinical data and the underlying activity data in an Electronic Medical Record (EMR)? Why does Change Tracking data help?

Background

Medical Education Research and clinical outcomes

Use of Workplace Tools

Exposing Patient Data

Cross-Platform Integration

Activity tracking in the EMR

Change Tracking

History of Change Tracking

Accessibility of CT data

What does CT data say?

Why use a Learning Record Store?

Activity metrics for education

What can an LRS do?

Why not just store the activity data in-house?

What do you mean by unstructured data?

Process mining and data complexity

Data cleaning and CT data

Synonyms and semantic indexing

Background

Medical Education Research and clinical outcomes

There are many challenges in medical education research, but chief among these is the lack of connection to clinical outcomes or changes in health workforce behaviours. (Magraw, Fox, & Weston, 1978; Whitcomb, 2002) Educational interventions, despite considerable variety and innovation, and despite being based on useful educational and learning theories, are rarely measured in terms of behaviour change in either the patient or the healthcare provider being studied. Studies that do extend to health system changes or patient outcomes, even as secondary measures, are rarely based on interventions that have sound educational approaches.

This gulf has been commented upon but has been difficult to address. Outcome measures, when available, often appear to be based on patient or learner satisfaction. While we like to have happier patients and learners, it would be better if they were healthier.

Use of Workplace Tools

In addition to this, we are seeing survey fatigue. None of us is particularly keen to complete yet more questionnaires or feedback forms. We need to make better use of data that is inherent in our workplace tools. Most of what we do is captured by a computer system somewhere. For the clinical workplace, the obvious workplace tool is the Electronic Medical Record (EMR). (See Glossary for similar terms)

Using the EMR to examine patient progress and outcomes is not a new endeavour. CPCSSN is a notable pan-Canadian primary care endeavour to explore some of these issues. However, this and most other efforts to extract data from the EMR have focused on the clinical content. This approach faces the challenges of data coding vs natural language text entry and interpretation, but significant progress has been made in these areas.

Exposing Patient Data

Examining such clinical data often raises concerns about security and patient confidentiality. EMR and health system database vendors are appropriately sensitive about exposing the contents of their databases. REBs are tentative about the release of such data, even in aggregate form.

In particular, there is also a gulf between educational and clinical systems. There are many barriers to integrating data across these systems. Indeed, even at their roots, the prevailing attitudes of network security teams at educational institutions (sharing, access, openness, collaboration) are at odds with the attitudes at healthcare institutions (privacy, confidentiality, controlled access, audit trails). Both groups are right but often struggle to view the problems from the perspective of “the other side”.

Cross-Platform Integration

In educational research, the challenge of integrating data across multiple platforms is not new. In the past, there have been notable attempts to address this, such as SCORM and IMS-LTI. (Dodds, 2004; “IMS Global: Learning Tools Interoperability (LTI),” 2012; Topps, Ellaway, & Greer, 2019) With recent advances in AI, machine learning and big data analytics, we now have huge data storage capacity and powerful cognitive computing platforms to help us analyze our educational data. But what is becoming clear is that past data capture in Learning Management Systems (LMSs) has mostly focused on administrative needs: did the student show up for the course, and did they complete it (whatever that means). There has been little record of what the student, or teacher, did within those systems. Activity metrics and the Learning Record Store (LRS) are a promising new approach, capturing much richer data about what people actually do, rather than what they say they do, or their teachers say they do.

There has been a similar challenge in EMR and health system data. Much of what is captured relates to administrative needs or coded diagnostic assumptions (which are mostly entered for billing purposes, and poorly correlated with actual patient presentations). Even the prevalent coding system, ICD-9, was originally designed by pathologists for their purposes, and has notable gaps and redundancies. Projects like the Alberta Physician Learning Program have often struggled because the data items that they would like to examine are not captured in any of the multiple health system databases. Projects like CPCSSN have tried to explore questions such as how long the resident spends with the patient in various clinical encounters, but these factors have not been accessible to their data analysts.

Activity tracking in the EMR

There is a potential solution to the above challenges. Everything you do in an EMR is tracked. Each task, each prescription, each visit is documented, with “who did what when”. Such audit trails and monitoring are intrinsic to all healthcare data systems. Anyone who doubts this should simply go and access their own records on the EMR. This is a proscribed activity in healthcare systems. But this is not just a guideline or admonition from administrative services. If you do this, you will receive an awkward phone call within a few days asking about such access. Such professionalism breaches are quite common, and a source of concern to data security monitoring teams. So, how do they know?

Change Tracking

Enterprise-level databases, such as EMRs, are required to track “who did what when”. This is sometimes known as an audit trail, or in-stream non-invasive audit trail. The term “in-stream” refers to the fact that the database inherently tracks this; it is not something that the database administrator or architect has to consider in the design of their data schema. The term “non-invasive” partly refers to the fact that this runs in the background, not getting in the way of regular database operations; but it also refers to the fact that it cannot be edited by a user or administrator. It is a true record that cannot be modified after the fact.

In Microsoft SQL Server, there are two mechanisms that support this: SQL Server Change Tracking and SQL Server Change Data Capture. The latter is more complete and includes the actual changes in content. For the purposes of activity metrics, the simpler approach of Change Tracking provides the record of “who did what when” that we need.

History of Change Tracking

For those who are interested in why this is widely implemented, we have to thank Enron! As a result of the corporate malfeasance demonstrated in vast quantities by the company and their auditors, the Sarbanes-Oxley (SOX) Act was implemented in the USA. (DeFond & Francis, 2005) This requires that all enterprise-level databases support change tracking and inviolable audit trails. SOX itself is not Canadian but our databases were built in the USA — we benefit from this regulation. We have had Change Tracking since 2003 but have made little use of it.

Accessibility of CT data

Although Change Tracking (CT) data is automatically captured by the database engine, it is not generally visible to database analysts. For example, when exploring with CPCSSN and PLP some of the questions around work processes and “who did what when”, our researchers were usually told that such data is not available.

As noted earlier, health system administrators are appropriately averse to providing access to their data stores. CT data requires a deeper level of access than is usually granted, and that level of access is certainly forbidden to educational researchers. We need to find a mechanism that exposes such CT data, without compromising patient confidentiality.

What does CT data say?

In essence, CT data is quite simple. It states that this actor made a change to this data record at this time on this date: “who did what when”. It does not even generally record what the change in data was. To find that out, you need to access the more complex Change Data Capture service, mentioned above. In Activity Metrics for educational purposes, this can work to our advantage: we can examine workflows and perform process mining operations, without getting into the sensitive clinical content of the EMR.
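As a rough illustration of how little a CT-style record needs to hold, the sketch below models one as a small immutable structure. The class and field names (`ChangeEvent`, `actor_id`, `operation`, `table`, `record_key`, `changed_at`) are hypothetical and are not taken from any particular EMR or database engine. Note that there is deliberately no field for the changed content itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical 'who did what when' record; no clinical content is held."""
    actor_id: str         # who: an account identifier (assumed field name)
    operation: str        # what: "insert", "update" or "delete"
    table: str            # which table was touched
    record_key: str       # which row was touched (key only, not its contents)
    changed_at: datetime  # when the change happened


def summarize(event: ChangeEvent) -> str:
    """Render the event as a plain 'who did what when' sentence."""
    return (f"{event.actor_id} performed {event.operation} on "
            f"{event.table}/{event.record_key} at {event.changed_at.isoformat()}")


# Illustrative values only; they do not come from a real system.
example = ChangeEvent(
    actor_id="resident-042",
    operation="update",
    table="prescriptions",
    record_key="rx-98121",
    changed_at=datetime(2019, 10, 19, 14, 30, tzinfo=timezone.utc),
)
print(summarize(example))
```

Because the structure is frozen, an event cannot be altered once created, which loosely mirrors the after-the-fact immutability described for audit trails.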

Why use a Learning Record Store?

Activity metrics for education

Activity streams are central to big data analytics. Companies like Google and Amazon have learned that customer surveys and questionnaires are much less informative than simply watching what consumers do. This is a multi-billion dollar industry.

In education, the most promising current approach to capturing activity metrics is via the Learning Record Store (LRS). How and why you should use an LRS is well documented elsewhere. (Kitto, Cross, Waters, & Lupton, 2015; Lindert & Su, 2016; Mueller, Dikke, & Dahrendorf, 2014) Previous approaches to integrating learner data across systems, using SCORM and similar mechanisms, have been much heralded but have not seen much actual use across systems. Pretty much any educational information system says that it supports SCORM. Very few institutions have actually implemented this because it is really quite difficult and demanding.

Statements regarding an activity performed by a learner are transmitted to the LRS using the Experience API (xAPI), also known as the Tin Can API. (Advanced Distributed Learning (ADL), 2014) The structure of an xAPI statement can be summarized as Actor-Verb-Object or “Bob Did This”.

It will not have escaped you how similar this is to the “who did what when” that is captured in the Change Tracking (CT) described above.
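To make the parallel concrete, here is a minimal sketch of how a “who did what when” tuple might be laid out in the Actor-Verb-Object shape of an xAPI statement. The top-level properties (`actor`, `verb`, `object`, `timestamp`) follow the xAPI specification; the function name, the argument names, and every IRI under `example.org` are placeholders assumed for illustration, not part of any published profile.

```python
import json


def to_xapi_statement(actor_id, operation, table, record_key, changed_at_iso):
    """Map a 'who did what when' tuple onto xAPI's Actor-Verb-Object shape.

    All IRIs under example.org are placeholders, not a published profile.
    """
    return {
        "actor": {
            "objectType": "Agent",
            # An account-based identifier avoids exposing an email address.
            "account": {"homePage": "https://emr.example.org", "name": actor_id},
        },
        "verb": {
            "id": f"https://example.org/xapi/verbs/{operation}",
            "display": {"en-US": operation},
        },
        "object": {
            "objectType": "Activity",
            "id": f"https://emr.example.org/{table}/{record_key}",
        },
        "timestamp": changed_at_iso,  # ISO 8601 string
    }


statement = to_xapi_statement(
    "resident-042", "update", "prescriptions", "rx-98121",
    "2019-10-19T14:30:00+00:00",
)
print(json.dumps(statement, indent=2))
```

The statement carries an identifier for the record that was touched, but nothing about what the record contains.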

What can an LRS do?

This is also better described elsewhere (Downes, 2015) but there are a number of things that an LRS is good at, which are relevant to our needs:

High capacity (volume)
Absorb data at high rates (velocity)
Accept data from multiple systems at once (variety)
Accept data in a relatively unstructured format (variability)
Secure data transmission end to end
Secure federated search
Secure action triggers

Some will notice that these properties are similar to those advocated in the V’s of big data analytics. (Kobielus, 2013)

Why not just store the activity data in-house?

For each database system, the simplest approach to ensuring security and access control is to store the data “in-house”, using the same controls as for the regular data. Indeed, many systems do just that, and the CT data described above is doing exactly that.

But if you want to share such data with other systems, how do you make it accessible? The common approach of exporting a data dump to an Excel or CSV file might be fine for small projects or one-time analysis. However, can you be sure how the data will be stored and protected? And if you want to do this again, the redundancies are often prohibitive.

The LRS can act as an intermediary, neutral yet secure data store. Using a common and easily modifiable set of protocols, as provided by the xAPI specification, (Advanced Distributed Learning (ADL), 2014) it is relatively easy to adapt current systems to provide their CT data in a format that can be understood by the LRS. This also opens up the variety of analytics and visualization tools that can be applied to the activity stream data.

And yet, by storing the activity data outside the EMR system, you can make it more accessible, but keep the more confidential content and patient data securely behind its firewalls and access control layers.
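One way to picture this separation is a small export step that copies only an allow-list of activity fields and replaces the record key with a keyed hash, so that nothing clinical leaves the EMR. This is a sketch under stated assumptions: the source does not describe pseudonymization, and the field names, the `EXPORTABLE_FIELDS` list and the `prepare_for_export` helper are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical allow-list: only these activity fields may leave the EMR.
EXPORTABLE_FIELDS = ("actor_id", "operation", "table", "changed_at")


def pseudonymize(value: str, secret: bytes) -> str:
    """Return a keyed SHA-256 digest (truncated) in place of the raw value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def prepare_for_export(row: dict, secret: bytes) -> dict:
    """Copy allow-listed fields and swap the record key for a pseudonym."""
    exported = {name: row[name] for name in EXPORTABLE_FIELDS}
    exported["record_ref"] = pseudonymize(row["record_key"], secret)
    return exported


# Illustrative row; 'note_text' stands in for confidential clinical content.
row = {
    "actor_id": "resident-042",
    "operation": "update",
    "table": "prescriptions",
    "record_key": "rx-98121",
    "changed_at": "2019-10-19T14:30:00+00:00",
    "note_text": "confidential clinical detail",
}
print(prepare_for_export(row, secret=b"demo-secret-kept-inside-the-EMR"))
```

The same record key always maps to the same pseudonym for a given secret, so activity on one record can still be grouped in the LRS without revealing which record it was.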

What do you mean by unstructured data?

As mentioned above, the LRS has an architecture which is designed to handle relatively unstructured or semi-structured data. While it is not a requirement for the underlying database to be NoSQL, the JSON statement syntax is oriented towards that premise.

The differences between SQL and NoSQL databases are well described elsewhere. (Adkins, 2018; Ronk, 2014) For our purposes, relational databases need you to anticipate the relations and the questions you want to ask of your data. In contrast, NoSQL appears inefficient at first glance but offers faster data throughput, with more flexible queries that do not pre-suppose existing relationships.

Process mining and data complexity

The simple data structures that are inherent to activity statements and CT data have many advantages in data storage and retrieval. This belies the complexity of the activities and context that surrounds the generation of this data. In a clinic, healthcare providers are often juggling multiple priorities and workflows in trying to address the problems of their patients. Processes are rarely linear or repeatable or consistent.

This makes process mining particularly challenging. Attempts to streamline or make healthcare delivery more efficient have often relied on simple measures or interventions. Trying to assess whether such interventions make a difference is usually confounded by external factors. Sounds just like educational research.

Pouring more data into the mix just sounds like a recipe for disaster. And such complexity often overwhelms human capacity for sense-making. (Cukier, 2014) This is where the approaches made available by big data analytics, machine learning and AI come to the fore. There are many examples of where these new approaches have made connections, where none were suspected, and have afforded some insights that have led to practical solutions or a deeper dive into the underlying factors at play. (Cukier, 2014)

Data cleaning and CT data

One of the biggest challenges that arise from data integration across systems, and in big data analytics generally, is the classic problem of cleaning the data. The approaches used are more robust and tolerant of noise in the data streams than traditional database analytics, but it remains a problem.

In the approaches discussed here, we have a huge advantage: humans do not input the data. They might generate the data by what they do, but the systems themselves create the data statements, both with CT data and activity metrics. Because these data streams are “non-invasive”, this also helps to ensure that the data is more valid (another big data V).

Synonyms and semantic indexing

In pulling data from multiple systems into an LRS, using a very flexible protocol such as xAPI, there is a risk of subtle differences in what is meant by similar statement structures from different systems that use a different glossary of terms.

Having predefined xAPI Profiles, which describe what certain terms mean in a particular context or system, does help to mitigate these challenges. However, there is a further refinement that can be applied. The triplet structure of xAPI statements, Actor-Verb-Object, is similar (by design) to the Subject-Predicate-Object structure of the RDF triples that are typical of semantic indexing. The two constructs merge well and there are efforts afoot to apply RDF semantic indexing directly to xAPI Profiles. Some LRSs already have semantic indexing incorporated into their architecture. This provides easier disambiguation of conflicting terminology between linked systems.
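The sketch below illustrates the idea with plain Python tuples rather than a real RDF library: each xAPI statement is reduced to a (subject, predicate, object) triple, and verb IRIs from two systems that mean the same thing are mapped to one canonical IRI. The synonym table, all of the IRIs, and the way the subject IRI is built from the actor's account are assumptions made for illustration; a real deployment would take its terms from an xAPI Profile.

```python
# Hypothetical synonym table: verb IRIs from two source systems mapped to one
# canonical IRI. A real deployment would draw these from an xAPI Profile.
VERB_SYNONYMS = {
    "https://emr-a.example.org/verbs/signed-off": "https://example.org/xapi/verbs/signed",
    "https://emr-b.example.org/verbs/chart-signed": "https://example.org/xapi/verbs/signed",
}


def canonical_verb(verb_id: str) -> str:
    """Return the canonical IRI for a verb, or the verb itself if unmapped."""
    return VERB_SYNONYMS.get(verb_id, verb_id)


def statement_to_triple(statement: dict) -> tuple:
    """Reduce an xAPI statement to a (subject, predicate, object) triple."""
    account = statement["actor"]["account"]
    # Assumed convention: build a subject IRI from the account home page and name.
    subject = f'{account["homePage"]}/{account["name"]}'
    predicate = canonical_verb(statement["verb"]["id"])
    return (subject, predicate, statement["object"]["id"])


def make_statement(home_page: str, name: str, verb_id: str, object_id: str) -> dict:
    """Build a minimal statement holding only the parts used in this sketch."""
    return {
        "actor": {"account": {"homePage": home_page, "name": name}},
        "verb": {"id": verb_id},
        "object": {"id": object_id},
    }


from_system_a = make_statement(
    "https://emr-a.example.org", "dr-lee",
    "https://emr-a.example.org/verbs/signed-off",
    "https://emr-a.example.org/charts/1001",
)
from_system_b = make_statement(
    "https://emr-b.example.org", "dr-kim",
    "https://emr-b.example.org/verbs/chart-signed",
    "https://emr-b.example.org/charts/2002",
)

triple_a = statement_to_triple(from_system_a)
triple_b = statement_to_triple(from_system_b)
print(triple_a)
print(triple_b)
```

Both triples end up with the same predicate, so a query for chart sign-offs would find activity from either system.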

Summary

To summarize the key points afforded by these tools and approaches to extracting activity data from the EMR:

The data is already there
Change Tracking (CT) data is automatically generated
CT data is more valid than human data entry
CT data and activity metrics have the same simple structures
The LRS is well designed to accommodate external data
Security mechanisms are robust throughout
Confidential content remains at home in the EMR
Complex processes submit to AI and big data approaches

References

Adkins, C. (2018). MySQL vs MongoDB. Retrieved October 19, 2019, from https://www.upguard.com/articles/mysql-vs-mongodb

Advanced Distributed Learning (ADL). (2014). Experience API v1.0.1. Retrieved June 13, 2019, from https://github.com/adlnet/xAPI-Spec/blob/master/xAPI.md

Cukier, K. (2014). Big data is better data | TED Talk. TED.com. Retrieved from https://www.ted.com/talks/kenneth_cukier_big_data_is_better_data?language=en

DeFond, M. L., & Francis, J. R. (2005). Audit Research after Sarbanes‐Oxley. AUDITING: A Journal of Practice & Theory, 24(s-1), 5–30. https://doi.org/10.2308/aud.2005.24.s-1.5

Dodds, P. (2004). SCORM Overview. Retrieved from https://www.adlnet.gov/adl-research/scorm/

Downes, A. (2015). Learning Record Store – Tin Can API. Retrieved May 29, 2017, from http://tincanapi.com/learning-record-store/

IMS Global: Learning Tools Interoperability (LTI). (2012). Retrieved July 16, 2013, from http://www.webcitation.org/6arGC9Thy

Kitto, K., Cross, S., Waters, Z., & Lupton, M. (2015). Learning analytics beyond the LMS. Proceedings of the Fifth International Conference on Learning Analytics And Knowledge – LAK ’15, 11–15. https://doi.org/10.1145/2723576.2723627

Kobielus, J. (2013). The Four V’s of Big Data. Retrieved June 18, 2016, from http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Lindert, L., & Su, B. (2016). The Evolution of SCORM to Tin Can API: Implications for Instructional Design. Educational Technology. Educational Technology Publications, Inc. https://doi.org/10.2307/44430478

Magraw, R. M., Fox, D. M., & Weston, J. L. (1978). Health professions education and public policy: a research agenda. Journal of Medical Education, 53(7), 539–546. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/671495

Mueller, N., Dikke, D., & Dahrendorf, D. (2014). Experience API vs SCORM – How xAPI Benefits Technology-Enhanced Learning. EDULEARN14 Proceedings, 1276–1284.

Ronk, J. (2014). Structured, semi structured and unstructured data | Jeremy Ronk. Retrieved October 19, 2019, from https://jeremyronk.wordpress.com/2014/09/01/structured-semi-structured-and-unstructured-data/

Topps, D., Ellaway, R., & Greer, G. (2019). PiHPES Project: broadened perspectives on learning analytics. Calgary. https://doi.org/10.5683/SP2/VDGGG3

Whitcomb, M. E. (2002). Research in medical education: what do we know about the link between what doctors are taught and what they do? Academic Medicine, 77(11), 1067–1068. Retrieved from http://www2.tulane.edu/som/ome/upload/What-do-we-know-the-link-between-What-Doctors-are-Taught-and-What-they-do.pdf

Examples

These are important principles but it is sometimes easier to relate to these when you apply them to specific examples.

QuRE and Process Mining: https://olab.ca/qure-and-process-mining/

PiHPES Project xAPI EMR Profile:

See https://olab.ca/xapi-emr-profile/ for more info about xAPI in this context.

Use Cases

Patient Complexity: academic teaching clinics often have lower patient throughput and higher staff ratios than private clinics. Is this solely based on the need to teach while providing patient care? Teaching clinics often claim to carry more complex patient loads. Complicated patients offer more opportunity and challenge for the learner. But how do we measure such complexity? This has financial implications. AHS pays more for patients with a ‘complex care code’. But you have to document that the patient is complex. This is meta-documentation: documenting the documented. The EMR already intrinsically “knows” that the patient is more complex and generates more activity. Rather than relying on further human input, why not just ask the system which patients create more workplace activity? And if you can use the rich data streams from CT data, you can start to explore more finely whether the additional activities are useful, without burdening the staff with more forms (which becomes a circuitous process).
We know that underperforming healthcare providers tend to fall behind on routine tasks such as chart sign-off, consult generation, hospital discharge summaries. This is variable in pattern and, as with other studies of Conscientiousness Index related behaviours, is highly variable over time. Most of us have left tasks uncompleted if we have to rush off to a family commitment. But trends over time are predictive. They can be indicative of overwork, burnout, stress, substance abuse, boredom. The most important factor is early detection, especially in learners who rotate through clinical situations frequently. Underperformers are usually picked up late: remedial rotations are mostly in the last quarter of the curriculum; poor practice is more often detected by patient safety incidents, rather than through early warning systems. Imagine if a variety of activity trends can be easily dashboarded. The concept is not new but the monitors tend to focus on activities that are easy to measure with a single marker, rather than looking for processes that are better indicators of significant performance. Being able to combine activity streams from several workplace processes across the EMR gives better power and sensitivity to such monitoring systems.
Healthcare provision is a team effort. A team can be very busy and this is usually regarded as a good thing. But does this just mean they are inefficient, or plagued by inefficient processes? Asking them has been the usual approach to data collection but this adds more work to an already busy team, and is fraught with biases. Remember also that many teams are actually dynamic or virtual, defined by their roles and the context of the task, rather than by who is playing the role. And, of course, just as with a sports team, the team performance can be radically altered by changing the player in a single position. Being able to extract CT or activity stream data easily, based on actual workflow activities rather than self-reporting, provides many opportunities for quality and process improvement, or avoidance of patient safety issues.
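For the patient complexity use case above, “asking the system which patients create more workplace activity” can be as simple as counting events per patient. The following is a minimal sketch, assuming each activity event has already been reduced to a (patient reference, actor, operation) tuple; the tuple layout and all of the values are hypothetical, and a raw event count is only a crude stand-in for any real measure of complexity.

```python
from collections import Counter


def activity_per_patient(events):
    """Count activity events per patient reference.

    Each event is assumed to be a (patient_ref, actor_id, operation) tuple.
    """
    return Counter(patient_ref for patient_ref, _actor_id, _operation in events)


# Illustrative events only; patient references would be pseudonymized in practice.
events = [
    ("patient-A", "resident-042", "update"),
    ("patient-A", "nurse-007", "insert"),
    ("patient-B", "resident-042", "update"),
    ("patient-A", "dr-lee", "update"),
    ("patient-C", "nurse-007", "insert"),
]

counts = activity_per_patient(events)
# List patients from most to least activity.
for patient_ref, total in counts.most_common():
    print(patient_ref, total)
```

The same counting approach could be grouped by actor or by team role instead of by patient to explore the other two use cases.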

Glossary

In this document, we have mostly referred to the Electronic Medical Record or EMR. There are several other terms, which some groups regard as synonymous while others strongly argue their differences. For the general topic at hand, using Change Tracking and activity metrics data from these workplace tools, they can be regarded as synonymous because the underlying principles and techniques apply to the vast majority of these tools.

They all use similar enterprise-level SQL database engines and all have in-stream non-invasive audit trails, as noted in the Sarbanes-Oxley reference above. The term Change Tracking is somewhat Microsoft-centric but all of these clinical workplace tools have their equivalents.

We recognize that there are differences between EMRs, EHRs, hospital information systems, etc., and we provide this short glossary for reference:

CPOE: Computerized Physician Order Entry, usually hospital and doctor specific
EHR: Electronic Health Record – a broader term, not discipline specific, often of broad geographic and system scope
EMR: Electronic Medical Record – an information system found generally in outpatient clinics that are physician focused
PHR: Personal Health Record – an information system whose custodian is the patient, providing personal control over what information is contained and who has access to it

There are other glossaries out there, which provide similar descriptions. Canada Health Infoway offers this one: https://www.infoway-inforoute.ca/en/what-we-do/blog/digital-health-records/7017-emr-ehr-and-phr-and-now-aemr-and-h-his-what-s-with-these-systems
