The Referent Tracking Paradigm
The data integration problem
Current efforts to achieve data integration across the bioinformatics/clinical divide rest on approaches that each operate at a single level of abstraction, on data stored in heterogeneous and distributed sources. A unifying approach that works across levels is still lacking.
The traditional federated database approach seeks application-level integration on the assumption that databases can easily be wrapped in some form of middleware. Wrapping alone, however, does not guarantee semantic integration unless additional measures are taken. Other approaches seek to integrate information at the raw-data level, either through pointers from one database to another or simply by accumulating several smaller databases into one big data warehouse.
What all these approaches have in common, however, is:
- the failure to integrate data at different levels of abstraction and granularity, for example to link databases containing raw, instance-based bioscience data to a general, class-based resource, such as linking the DNA data in a specific patient's electronic health record (EHR) to the human genome database; and
- the failure to provide access to the enormous amount of bio-information available in natural language documents, i.e. in unstructured form, to which traditional database mapping techniques do not apply.
The EHR reference problem
EHRs consist primarily of descriptions of a patient's medical condition, the treatments administered, and the outcomes obtained. These descriptions are about concrete entities in reality, such as the particular pain that the patient experienced in his chest on a specific day, or the particular pacemaker that was implanted. Yet current EHRs contain very few explicit references to such entities. Instead, they consist primarily of expressions in generic terms, either in natural language or taken from terminologies or ontologies.
This has some obvious consequences. When a patient suffers from the same type of disease and exhibits the same kinds of symptoms on two successive occasions, the descriptions of these conditions using codes from a terminology will be identical. When another patient in turn suffers from the same type of disease and exhibits similar symptoms, the resulting descriptions will also be identical to those relating to the first patient. But note that the use of the same code in two such records does not allow one to assume that they refer to two distinct entities. When a fracture code is used in relation to two distinct patients, numerically different fractures are involved. But when a code is used to add the information that the fractures resulted from an accident in a swimming pool, the very same, probably dangerous, swimming pool might be involved.
One also cannot assume that if two different codes are used in an EHR, they refer to different entities. First, the most specific code is not always used when the same entity is referred to on successive occasions: a colon polyp might simply be referred to as an intestinal polyp, or just a polyp, and thus be associated with different codes on different occasions. Second, the polyp might have become malignant, in which case it will be assigned the code for malignant neoplasm of colon. Clearly, the entity in question, the polyp, has undergone changes. But it is still the same entity: its identity has not changed. A third reason why different general codes cannot automatically be taken to refer to different particular instances is that a single code may not suffice to describe a given instance appropriately. If, for example, one wants to use SNOMED-CT (v0301) to code a closed pedicular fracture of the fifth cervical vertebra, no single code is available; to give a faithful description one must combine several codes. If, however, these codes are not entered in the EHR in such a way that it is clear that they refer to the same particular entity, their presence might incorrectly be taken to refer to two different fractures.
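To make the colon polyp example concrete, the following Python sketch contrasts counting entities by code with counting them by IUI. The `Entry` structure, the dates and the IUI `#501` are hypothetical, and the codes are plain-text labels rather than actual terminology codes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """A hypothetical EHR entry: a generic code plus, optionally, the IUI of the particular described."""
    date: str
    code: str                  # label of a generic term (not a real SNOMED-CT code)
    iui: Optional[str] = None  # IUI of the particular entity the entry is about


def count_by_code(entries: List[Entry]) -> int:
    """Naive count: treats every distinct code as a distinct entity."""
    return len({e.code for e in entries})


def count_by_iui(entries: List[Entry]) -> int:
    """Referent-tracking count: distinct particulars, ignoring entries without an IUI."""
    return len({e.iui for e in entries if e.iui is not None})


# The same colon polyp, described on three occasions with three different codes.
code_only = [
    Entry("2004-01-10", "colon polyp"),
    Entry("2004-06-02", "intestinal polyp"),
    Entry("2005-03-15", "malignant neoplasm of colon"),
]

# The same entries, each also carrying the (hypothetical) IUI of that one polyp.
tracked = [Entry(e.date, e.code, iui="#501") for e in code_only]

print(count_by_code(code_only))  # 3: read naively as three different lesions
print(count_by_iui(tracked))     # 1: a single particular that has changed over time
```

The codes still carry the clinical information about what kind of thing the polyp was at each moment; the IUI only adds what the codes cannot express, namely that all three entries are about one and the same entity.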
Clarity through Referent Tracking
Referent Tracking is a paradigm under which, through the assignment of unique identifiers, it is possible to refer explicitly to all of the concrete individual entities relevant to the accurate description of each patient's condition, therapies, and outcomes. Such an identifier is called an IUI, for Instance Unique Identifier. This means that not only does the patient receive an IUI, but so also does the particular fracture he is suffering from, the particular bone that is fractured, and even, if the clinician finds this important, the particular pain the patient is experiencing in a certain time period or the particular document in which the pain is first recorded.
IUIs refer to the real entities themselves, out there in reality, and not to data about these entities. IUIs are the means whereby those constellations of particular entities in reality that are relevant to clinical care can be represented in an EHR in the same direct way in which the corresponding classes are already represented by clinical coding systems. Equally, IUIs are not the entities themselves. This might seem obvious, but use-mention confusions ('Swimming is healthy and contains eight letters'), in which an entity in reality and its digital representation are confounded, are abundant in the literature on knowledge representation in general and on concept-based terminology systems in particular.
The referent tracking paradigm distinguishes between IUI assignment, which is possible only in relation to entities that exist or have existed in the past, and IUI reservation, which is a provision made for entities, such as an X-ray ordered for tomorrow, that are expected to come into existence in the future. The order itself can already be assigned an IUI today, but for the resulting image one can only reserve an IUI at the time of ordering.
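The distinction between assignment and reservation can be sketched as a small registry. This is a hypothetical illustration, not an existing referent tracking system: the class and method names are invented, it assumes that a reserved IUI is converted into an assignment once the entity comes into existence, and it uses a random UUID as a stand-in for whatever uniqueness algorithm a real system would use.

```python
import uuid


class ReferentRegistry:
    """Hypothetical in-memory registry separating IUI assignment from IUI reservation."""

    def __init__(self) -> None:
        self.assigned: dict = {}  # IUI -> note on an entity that exists or has existed
        self.reserved: dict = {}  # IUI -> note on an entity expected to exist in the future

    @staticmethod
    def _new_iui() -> str:
        # Assumption: a random version-4 UUID serves as a meaningless, practically unique IUI.
        return str(uuid.uuid4())

    def assign(self, note: str) -> str:
        """Assign an IUI to an entity that exists now or existed in the past."""
        iui = self._new_iui()
        self.assigned[iui] = note
        return iui

    def reserve(self, note: str) -> str:
        """Reserve an IUI for an entity that is only expected to come into existence."""
        iui = self._new_iui()
        self.reserved[iui] = note
        return iui

    def fulfil(self, iui: str) -> None:
        """Convert a reservation into an assignment once the entity actually exists."""
        if iui not in self.reserved:
            raise KeyError(f"{iui} is not a reserved IUI")
        self.assigned[iui] = self.reserved.pop(iui)


registry = ReferentRegistry()
order_iui = registry.assign("X-ray order placed today")
image_iui = registry.reserve("X-ray image expected tomorrow")
# ...tomorrow, once the image has actually been produced:
registry.fulfil(image_iui)
```

A UUID is used here only because it needs no central counter; a system such as the one in the scenario below could equally issue consecutive numbers from a single authority.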
IUI assignment or reservation does not by itself entail any assertion as to the class (or, since we take a position grounded in realism as a philosophical theory, the universal) of which the particular in question is an instance. Thus we might assign an IUI to some syndrome of a given patient before we have any clear idea what sort of syndrome we are dealing with. This facility, too, has no analogue in code-based EHR systems as currently constituted, which resort to inventing new classes such as 'unknown syndrome'.
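One way to picture this separation is to keep instance-of assertions apart from the IUI itself, so that a particular can exist in the record with no universal attached. The sketch below is hypothetical: the store layout, the function names, the IUI `#7001` and the diagnosis used are all illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store: IUI -> list of (universal, date asserted), oldest first.
instance_of: Dict[str, List[Tuple[str, str]]] = {}


def assign_iui(iui: str) -> None:
    """Record that a particular exists, without saying what universal it instantiates."""
    instance_of.setdefault(iui, [])


def assert_instance_of(iui: str, universal: str, date: str) -> None:
    """Add a separate, possibly much later, assertion about the particular's universal."""
    instance_of[iui].append((universal, date))


def current_universal(iui: str) -> Optional[str]:
    """Most recently asserted universal, or None while nothing has been asserted."""
    assertions = instance_of[iui]
    return assertions[-1][0] if assertions else None


assign_iui("#7001")                # some syndrome of a patient; its kind is not yet known
print(current_universal("#7001"))  # None, with no placeholder class like 'unknown syndrome'
assert_instance_of("#7001", "Marfan syndrome", "2006-05-12")  # illustrative diagnosis
print(current_universal("#7001"))  # Marfan syndrome
```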
Referent Tracking Systems
Part of the RTU's activities consists in implementing the Referent Tracking paradigm so that it can be put into practice as a software system.
To give the referent tracking paradigm concrete form, the following requirements are addressed:
- a mechanism for generating IUIs that are guaranteed to be unique strings;
- a procedure for deciding what particular entities should receive IUIs;
- protocols for determining whether or not a particular has already been assigned an IUI;
- practices governing the use of IUIs in the EHR and in clinical documentation and research in general (issues concerning the syntax and semantics of statements containing IUIs);
- methods for determining the truth values of propositions that are expressed through descriptions in which IUIs are used;
- methods for correcting errors in the assignment of IUIs, and for investigating the results of assigning alternative IUIs to problematic cases;
- methods for taking account of changes in the reality to which IUIs get assigned, for example when particulars merge or split.
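The requirements above are stated without prescribing an implementation. As one possible reading of the requirement on correcting errors, the sketch below keeps statements in an append-only log in which mistakes are withdrawn by later retractions instead of being deleted, so that the history of what was asserted remains inspectable. The `Statement` and `StatementLog` names, the free-text statement format and the IUIs `#40` and `#41` are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    sid: int                        # serial number of the statement itself
    author: str
    text: str                       # free text here; a real system would need a formal syntax
    retracts: Optional[int] = None  # sid of an earlier statement this one withdraws


class StatementLog:
    """Hypothetical append-only log: errors are corrected by retraction, never by deletion."""

    def __init__(self) -> None:
        self._log: List[Statement] = []
        self._ids = itertools.count(1)

    def add(self, author: str, text: str, retracts: Optional[int] = None) -> int:
        statement = Statement(next(self._ids), author, text, retracts)
        self._log.append(statement)
        return statement.sid

    def current(self) -> List[str]:
        """Texts of assertions that no later statement has retracted."""
        withdrawn = {s.retracts for s in self._log if s.retracts is not None}
        return [s.text for s in self._log if s.retracts is None and s.sid not in withdrawn]

    def history(self) -> List[Statement]:
        """Full record, including retracted statements and the retractions themselves."""
        return list(self._log)


log = StatementLog()
wrong = log.add("clinician", "#40 and #41 denote different polyps")
log.add("clinician", "retraction: #40 and #41 denote the same polyp", retracts=wrong)
log.add("clinician", "#41 is a duplicate IUI for the polyp denoted by #40")
print(log.current())       # ['#41 is a duplicate IUI for the polyp denoted by #40']
print(len(log.history()))  # 3
```

Keeping the retracted statement in the history is what would allow one to investigate, after the fact, which conclusions were drawn while the erroneous assignment was in force.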
Expected benefits
Our hypothesis is that, once the right infrastructure is in place, the burden on clinicians and nurses (or on whoever is assigned the task of registering patient data) will not be significantly greater than under existing strategies for data entry, but that the benefits, in terms of semantic interoperability of computer systems, patient management, cost containment, epidemiology and disease control, and the advance of science in the domain of biomedicine, can be enormous.
Referencing instead of coding: a scenario
John, a professional dancer, has for a few days been suffering from a very disturbing pain in his left foot. It started as some discomfort while dancing, but gradually evolved into constant pain, even at rest. So he decides to visit a modern hospital that has installed one of the new fancy EHR systems (EHRS), permanently connected to a nationwide referent tracking system (RTS). That system allows careful and explicit reference to all his various problems and to the different kinds of entities associated with them.
Because of his account of what happened to his foot, and because of the pain the two attending physicians are able to induce by palpating his forefoot, both agree that there is something wrong. That "something wrong" is given the IUI #234, a meaningless consecutive number assigned automatically by the RTS (and guaranteed to be unique according to some algorithm). At the same time, the system generates two statements recording the assignment of #234 to that particular by each of the two physicians. These statements are backed by a high degree of positive evidence, since the referent-tracking database allows automatic checking to verify the absence of any previously existing disorder of which John's current problem might be a continuation. The check did find referent #15, the left first metatarsal base fracture that he suffered two years ago, but this fracture had since ceased to exist, as witnessed by X-ray image #98, taken half a year after the initial diagnosis. The physicians also have good evidence that the referent-tracking database is complete in all relevant respects, since they know that John has never sought treatment elsewhere. The physicians' statements concerning the assignment are each time-stamped both for the occasion of utterance and for the point of appearance in the EHRS. Note that these time-stamps do not necessarily imply any assertion about when #234 itself began to exist. Also, at this stage, no statement is made about which universal disorder #234 is an instance of.
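The bookkeeping in this step can be sketched as follows: a check for earlier disorders that still exist, followed by one assignment statement per physician carrying the two time-stamps. The IUIs come from the scenario; the class and function names are hypothetical, and the dates and times are invented because the scenario gives none.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class KnownDisorder:
    iui: str
    description: str
    ceased_per: Optional[str] = None  # evidence that it ceased to exist, or None if ongoing


@dataclass(frozen=True)
class AssignmentStatement:
    author: str
    iui: str
    uttered_at: datetime  # when the physician made the statement
    entered_at: datetime  # when the statement appeared in the EHRS
    # Deliberately no field for a universal: none is asserted at this stage.


def possible_continuations(history: List[KnownDisorder]) -> List[KnownDisorder]:
    """Prior disorders that still exist and could therefore continue into the new problem."""
    return [d for d in history if d.ceased_per is None]


# John's earlier disorder as recorded in the RTS (IUIs from the scenario).
john_history = [KnownDisorder("#15", "left first metatarsal base fracture", ceased_per="X-ray #98")]
print(possible_continuations(john_history))  # [] -> a new IUI (#234) is warranted

# Illustrative time-stamps; the scenario gives no actual times.
statements = [
    AssignmentStatement(author, "#234",
                        uttered_at=datetime(2006, 3, 1, 10, 15),
                        entered_at=datetime(2006, 3, 1, 10, 17))
    for author in ("physician A", "physician B")
]
```

Neither time-stamp says anything about when #234 began to exist; a separate statement, with its own evidence, would be needed for that.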
The physicians order, and receive a few hours later, three X-ray photographs of John's foot taken from different angles. They both look at the first (identified by the EHRS as #235 and stated to be an instance of the universal referred to by SNOMED-CT as "257444003: photograph"), but they see nothing abnormal. Of course, they see an image of John's left first metatarsal bone, this image being identified as #286 (they do not bother to look for a SNOMED-CT code for such an image, knowing from experience that they would find nothing that comes close). At the same time, they are aware that entity #286 is clearly different from entity #221, which is John's left first metatarsal bone itself, and which they declare to be (i) an instance of the universal referred to by the SNOMED-CT concept "182121005: entire first metatarsal", further annotated with the side-modifier "left", and (ii) a part of #2, which is John.
On the second photograph (#236), both see a thin hypodense line appearing towards the top of John's left first metatarsal bone. They assign the label #287 to that line in the image, and both state it to be the image of some corresponding particular #288, thereby agreeing on the existence of #288 but disagreeing as to which universal it is an instance of: one sees it as a fracture line, the other as just a normal part of the bone, somewhat less dense than the surrounding bony material. They agree, however, that #287 is not a radiographic artifact, i.e. that it does indeed correspond to something in John's body. On the third photograph (#237), both see a clear fracture line, indisputably the image of a real fracture, which they identify with particular #288. They thereupon assert that #234, i.e. the "something wrong" previously identified, is in fact an instance of the universal left first metatarsal base fracture.
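The statements made while reading the three photographs can be sketched as simple attributed triples, which keeps the image (#286, #287) apart from what it depicts (#221, #288) and lets the two physicians disagree about #288 without losing the shared claim that it exists. The triple layout, the relation names and the helper functions are hypothetical; the IUIs and SNOMED-CT labels come from the scenario.

```python
from typing import Dict, List, Tuple

# Hypothetical statement store: (author, subject IUI, relation, object), in order of utterance.
triples: List[Tuple[str, str, str, str]] = [
    ("A", "#235", "instance-of", "257444003: photograph"),
    ("A", "#286", "image-of", "#221"),  # the image of the bone is not the bone
    ("A", "#221", "instance-of", "182121005: entire first metatarsal (left)"),
    ("A", "#221", "part-of", "#2"),
    ("A", "#287", "image-of", "#288"),  # both agree that #288 exists...
    ("B", "#287", "image-of", "#288"),
    ("A", "#288", "instance-of", "fracture"),  # ...but not on what it is
    ("B", "#288", "instance-of", "normal, less dense part of bone"),
]


def latest_universal_by_author(iui: str) -> Dict[str, str]:
    """Each author's most recent instance-of assertion about the given particular."""
    latest: Dict[str, str] = {}
    for author, subject, relation, obj in triples:
        if subject == iui and relation == "instance-of":
            latest[author] = obj  # a later assertion by the same author supersedes earlier ones
    return latest


def agreed(iui: str) -> bool:
    """True when two or more authors have asserted a universal and their latest views coincide."""
    views = latest_universal_by_author(iui)
    return len(views) >= 2 and len(set(views.values())) == 1


print(agreed("#288"))  # False after photograph #236

# After photograph #237, physician B revises the view; the earlier statement stays on record.
triples.append(("B", "#288", "instance-of", "fracture"))
triples += [(a, "#234", "instance-of", "left first metatarsal base fracture") for a in ("A", "B")]
print(agreed("#288"), agreed("#234"))  # True True
```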