The majority of electronic clinical documentation is stored as “free text” rather than as structured, coded data. An advantage of free text is that it gives clinical authors autonomy in expressing their thoughts. However, the variety of ways information is expressed in text means that, although these data are rich and descriptive, they remain locked away, unusable for computerized research and decision support. To leverage text data, we employ a variety of natural language processing (NLP) methods to extract the concepts, context, and relationships found in narrative text.
While NLP is not a “solved” science, there are many tasks that NLP can perform very reliably, such as extracting concepts (symptoms, diseases, medications) and values (ejection fraction values, lab values, vital signs) stored in the text. More complex tasks, such as determining what caused an event of interest or why a patient discontinued a medication, can be conducted to answer specific study questions.
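To make the value-extraction task concrete, here is a minimal sketch of rule-based extraction of an ejection fraction value and a blood pressure reading from free text. The note text, pattern names, and function are all illustrative assumptions, not Anolinx's actual tooling; production systems use far more robust pattern sets.

```python
import re

# Illustrative clinical note (not real patient data).
NOTE = "Echo today. LVEF 35%. BP 128/82, HR 74. Started lisinopril."

# Hypothetical patterns for two common value types.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection fraction)\s*(?:of|is|:)?\s*(\d{1,2})\s*%",
    re.IGNORECASE,
)
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})/(\d{2,3})\b")

def extract_values(note: str) -> dict:
    """Pull ejection fraction and blood pressure values out of free text."""
    out = {}
    if m := EF_PATTERN.search(note):
        out["ejection_fraction"] = int(m.group(1))
    if m := BP_PATTERN.search(note):
        out["bp_systolic"] = int(m.group(1))
        out["bp_diastolic"] = int(m.group(2))
    return out
```

Running `extract_values(NOTE)` on the sample note returns the ejection fraction (35) and the systolic/diastolic blood pressure (128/82) as structured fields that can feed downstream analysis.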
Anolinx uses advanced NLP technology. In our projects, specific NLP tools are developed, tested, and optimized for accuracy and reliability according to the definitions of the cohort criteria and/or outcomes of interest.
Use of NLP in our projects often includes:
- (a) defining each symptom/outcome in detail by a qualified clinician specialist and NLP expert
- (b) developing the NLP tools based on the detailed definitions
- (c) training the NLP tools using information from the clinical notes
- (d) testing the NLP tools for accuracy & reliability against a manually annotated sample of the clinical notes
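One recurring piece of step (b) is determining the context of a concept mention, since a note saying “denies chest pain” must not be counted as a positive finding. The following toy negation check, loosely in the spirit of NegEx-style rules, is a sketch under stated assumptions; the cue list, window size, and tokenization are simplified illustrations, not a production algorithm.

```python
# Hypothetical negation cues; real systems use much larger, validated lists.
NEGATION_CUES = ("no", "denies", "without", "negative for")

def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Return True if `concept` appears in `sentence` preceded by a negation cue
    within `window` tokens. Crude whitespace tokenization for illustration only."""
    words = sentence.lower().split()
    target = concept.lower().split()
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            # Pad with spaces so cues match whole tokens, not substrings.
            preceding = " " + " ".join(words[max(0, i - window):i]) + " "
            return any(f" {cue} " in preceding for cue in NEGATION_CUES)
    return False
```

For example, `is_negated("Patient denies chest pain", "chest pain")` is true, while `is_negated("Patient reports chest pain", "chest pain")` is false; this kind of context handling is exactly what gets tuned during the training and testing steps above.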
The gold standard for validating NLP tools is manual chart review. This process yields a validated tool that accurately and reliably identifies patients and outcomes of interest.
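The comparison against manual chart review can be summarized with standard agreement metrics. The sketch below, with invented toy data, shows one common way to score NLP output against a manually annotated gold standard as precision (positive predictive value), recall (sensitivity), and F1; the `(patient_id, concept)` representation is an assumption for illustration.

```python
# Gold-standard annotations from manual chart review (toy data).
gold = {(1, "heart failure"), (2, "heart failure"), (3, "diabetes")}
# Concepts the NLP tool identified (toy data).
pred = {(1, "heart failure"), (3, "diabetes"), (4, "heart failure")}

tp = len(gold & pred)   # NLP hits confirmed by chart review
fp = len(pred - gold)   # NLP hits the reviewers did not confirm
fn = len(gold - pred)   # true cases the NLP missed

precision = tp / (tp + fp)              # positive predictive value
recall = tp / (tp + fn)                 # sensitivity
f1 = 2 * precision * recall / (precision + recall)
```

Reporting these figures on a held-out annotated sample is what allows a tool to be called validated for a given cohort definition.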