We DO NOT Have Duplicate Records

Duplicate records have existed since the first use of electronic health record (EHR) systems. The health information management (HIM) industry has struggled with this issue ever since and has yet to make real progress in preventing these data errors from recurring. Part of the reason is how the healthcare industry describes the problem of “duplicate” records.


The term “duplicate medical records” is a true misnomer for at least two reasons: it is both imprecise and, more importantly, factually incorrect. Using “duplicates” to describe a patient who has more than one medical record in a single master patient index (MPI) is misleading because it obscures the real problem, which makes it harder for the industry to effectively address the resulting patient safety risks and revenue cycle leaks.

The term “duplicate records” is a misnomer: the records are not duplicates, and often there are more than two of them

– Jim Hoover

More than Two Records

Patients often have more than two records in one MPI; these data errors would be more accurately called “triplets,” “quadruplets,” “quintuplets,” and so forth. A more precise term for these data errors might be “multiples” instead of duplicates or any other term indicating an exact number of records.

Unfortunately, the techniques, protocols, and algorithms used for “multiple” medical record remediation have not advanced much since the early days of EHRs. There is a palpable resistance to change, and “we have always done it this way” is a common refrain in the HIM industry.

Many consultants, patient-matching vendors, and EHR systems only consider pairs of records when analyzing MPI data errors. Interestingly, most in the industry know that patients can have three or more records, yet they continue to review only pairs of records. Old habits die hard, as most patient-matching software and algorithms were designed to handle only duplicate pairs. Further, most consulting companies quote prices for MPI error remediation at a “per pair” rate. For all these reasons and others, the extent of multiples in MPIs is not well known across the industry.
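A short calculation shows how a pair-only view can distort the picture. If a pair-wise review compares every combination of one patient's records, a patient with n records generates n(n-1)/2 pairs, so four records become six pairs while still describing a single patient. A minimal Python sketch of that arithmetic; the function name is hypothetical and not taken from any matching product:

```python
from math import comb


def pairs_for_records(n_records: int) -> int:
    """Number of distinct record pairs for one patient with n_records records,
    assuming every combination of records is compared."""
    # comb(n, 2) == n * (n - 1) // 2
    return comb(n_records, 2)


# Illustrative values only: one patient with 2, 3, 4, or 5 records.
for n in (2, 3, 4, 5):
    print(f"{n} records -> {pairs_for_records(n)} pairs")
```

Because the pair count grows much faster than the record count, counting patients with more than two records says more about the MPI than counting pairs.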

However, the HIM industry should understand this aspect of MPIs more deeply and track the number of patients with more than two records. A deeper analysis may illuminate systemic causes of the multiple records, such as language translation issues or EHR tool limitations. Specific patient populations, such as homeless patients, behavioral health patients, or even aging patients, may have trouble communicating the unique identifying data needed for staff to correctly recognize them as returning patients.
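Tracking this measure is straightforward once records have been resolved to patients. The following is a minimal sketch, assuming a hypothetical `record_to_patient` mapping from each MPI record number to the patient it was resolved to (for example, the output of a matching or clustering step). The function names and sample data are illustrative only:

```python
from collections import Counter


def records_per_patient_histogram(record_to_patient: dict[str, str]) -> Counter:
    """Map 'number of records per patient' -> 'number of patients with that many'."""
    # First count how many MPI records were resolved to each patient...
    records_per_patient = Counter(record_to_patient.values())
    # ...then count how many patients have 1, 2, 3, ... records.
    return Counter(records_per_patient.values())


def patients_with_multiples(record_to_patient: dict[str, str], threshold: int = 2) -> int:
    """Number of patients with more than `threshold` records (default: more than two)."""
    histogram = records_per_patient_histogram(record_to_patient)
    return sum(count for size, count in histogram.items() if size > threshold)


# Hypothetical, made-up data: MPI record number -> resolved patient ID.
sample = {
    "MRN001": "P1", "MRN002": "P1", "MRN003": "P1",                  # three records
    "MRN004": "P2", "MRN005": "P2",                                  # two records
    "MRN006": "P3",                                                  # one record
    "MRN007": "P4", "MRN008": "P4", "MRN009": "P4", "MRN010": "P4",  # four records
}
print(sorted(records_per_patient_histogram(sample).items()))
print(patients_with_multiples(sample))
```

Reporting the full histogram, rather than a single “duplicate rate,” keeps triplets, quadruplets, and larger groups visible over time.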

Data clustering is a well-known mathematical technique for grouping similar data, and it is very useful for MPI data error analysis, such as grouping together all the records that belong to a patient with more than two records. Clustering can help reveal why a patient has multiple records by letting HIM professionals examine what the records in each cluster have in common. Sometimes, the answer might be as simple as which clinic, department, or individual registered the patient. Clustering multiple records and examining their common factors will help reduce the creation of future multiple-record data errors.
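One simple form of clustering, sketched below in Python, groups records into connected components of pairwise match links using a union-find (disjoint-set) structure, then tallies which department registered the records in clusters of three or more. This is an illustrative sketch, not a specific vendor's algorithm: the match links, the `registered_by` field, and all function names are hypothetical assumptions.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set structure used to merge linked records into clusters."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        # Register an unseen record as its own single-record cluster.
        self.parent.setdefault(x, x)
        # Path halving: point nodes closer to the root while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_records(match_pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group records into clusters: the connected components of the match links."""
    uf = UnionFind()
    for a, b in match_pairs:
        uf.union(a, b)
    clusters: dict[str, set[str]] = {}
    for record in list(uf.parent):
        clusters.setdefault(uf.find(record), set()).add(record)
    return list(clusters.values())


def common_registration_sources(
    clusters: list[set[str]], registered_by: dict[str, str], min_size: int = 3
) -> Counter:
    """Tally the registering department of every record in clusters of min_size or more."""
    tally: Counter = Counter()
    for cluster in clusters:
        if len(cluster) >= min_size:
            tally.update(registered_by[record] for record in cluster)
    return tally


# Hypothetical match links, as a pair-based tool might report them.
pairs = [("MRN001", "MRN002"), ("MRN002", "MRN003"), ("MRN004", "MRN005")]
# Hypothetical registering department for each record.
registered_by = {
    "MRN001": "Emergency", "MRN002": "Emergency", "MRN003": "Clinic A",
    "MRN004": "Clinic B", "MRN005": "Clinic B",
}
clusters = cluster_records(pairs)
print(sorted(sorted(c) for c in clusters))
print(common_registration_sources(clusters, registered_by))
```

Here the two pairs (MRN001, MRN002) and (MRN002, MRN003) collapse into one three-record cluster that a pair-by-pair review would treat as two separate problems. Connected components assume every match link is correct; a single false link can chain two different patients together, so clusters should still be reviewed before records are merged.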

Incomplete Records

Second, even if there are only two medical records per patient, the term “duplicate” inaccurately describes the data error. According to Oxford Languages, the publisher of the Oxford English Dictionary (Oxford, n.d.), a duplicate is “exactly like something else, especially through having been copied.” This is not the state of “duplicate” medical records. Instead, we have “partial records.” Again, we can look to Oxford for the definition, “partial: existing only in part; incomplete.”

English is very precise, and using a more accurate word can help us better understand the problem. In the case of “duplicate” records, we do not have two exact copies of one record. Instead, “duplicate” medical records are better described as incomplete or partial records. When a clinician reviews only a partial record, the patient is at risk because the rest of their medical or episode history remains unknown. Partial records also affect the revenue cycle, because providers may order redundant tests when clinicians are unaware of earlier test results stored in another medical record that they did not review during the encounter.

Accurately Describing the Problem Helps in Finding Solutions

The term “multiple medical records” is more accurate than “duplicate medical records” to describe the MPI error condition where one patient has more than one medical record. The term “duplicates” obfuscates the MPI error state, whereas the terms “partial” or “incomplete” medical records better connote the state of the record being reviewed. That is, the record is missing information. The missing patient information is in another medical record within the same MPI, and that data could be critical to a patient’s safety or contain previous test results. Further, ensuring that patients do not have multiple records will reduce the need for unnecessary tests and other expenditures for which payers will not reimburse providers.

The HIM industry needs to move beyond doing only pair-wise comparisons. HIM professionals remediating patients with multiple records should update their software and analysis techniques to solve the problem more efficiently. Clustering and other techniques will help HIM professionals advance the science of patient matching.
