Pestian Lab
Software Research and Development

Research shows that all neurological and psychiatric diseases are interconnected. For example, someone might sustain a traumatic brain injury and develop epilepsy, which in turn can lead to depression or even suicidal ideation. We plan to deconstruct neuropsychiatric disorders into many data points and have machines determine which data points are most relevant.

This would be a hopeless endeavor if each data point had to be hand-typed by a clinician or a patient. Instead, we’ve created special software that takes unstructured text (written or spoken) and extracts discrete values from it. Examples include a machine that extracts emotions from suicide notes and one that extracts epilepsy diagnostic test results from epilepsy progress notes.

Once this data is forwarded into a disease-specific data mart, clinicians analyze it to improve neuropsychiatric patient care.


Web-based software that visualizes the relationships between medical concepts as an undirected graph.

Medicine is founded on millions of specialized concepts. Even for an expert, the relationships between those terms are not obvious. During a literature review, it’s helpful to see the relations between the terms being searched. We developed the Unified Medical Language System Visualizer to display a graph representation of how concepts relate to each other, which helps expand or focus a search.
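As a rough illustration of the undirected-graph idea, the sketch below stores concept relationships as an adjacency map and lists the concepts within a given number of hops of a search term. It is not the Visualizer's code: the function names `build_concept_graph` and `neighbors_within` are hypothetical, and the sample relations simply reuse the example chain from the introduction rather than real Unified Medical Language System data.

```python
from collections import defaultdict


def build_concept_graph(relations):
    """Build an undirected adjacency map from (concept_a, concept_b) pairs."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)  # undirected: store the edge in both directions
    return graph


def neighbors_within(graph, start, max_hops):
    """Return the concepts reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Expand one hop outward, skipping concepts already visited.
        frontier = {n for c in frontier for n in graph.get(c, ()) if n not in seen}
        seen |= frontier
    seen.discard(start)
    return seen


# Illustrative relations only (assumed for this sketch, not taken from UMLS).
relations = [
    ("Traumatic brain injury", "Epilepsy"),
    ("Epilepsy", "Depression"),
    ("Depression", "Suicidal ideation"),
]
graph = build_concept_graph(relations)

# One hop keeps a search focused; two hops expands it.
print(sorted(neighbors_within(graph, "Epilepsy", 1)))
print(sorted(neighbors_within(graph, "Epilepsy", 2)))
```

Raising `max_hops` widens the set of related terms and lowering it narrows the set, which mirrors the "expand or focus" use described above.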


Prospective suicide risk research web application 

We want to study psychiatric disorders by looking at what patients say. To gather the correct data, we needed a universal set of questions that would categorize patients into different psychiatric disorders or even give us insight into terminally ill pediatric patients. We explored whether such a set of questions existed that could be asked of a person regardless of age and disease. We did not find a satisfactory question set, so we developed the five ubiquitous questions.

Epilepsy is a confounding disease; on the surface you expect to capture the electrophysiological reaction of the brain, but on closer examination it’s not always clear what is causing the seizure. Clinicians capture large volumes of data to describe patients accurately. We built a web application to make the data easily accessible.

The application captures data in three different ways:

  • Manually entering data.
  • Importing the data from heterogeneous sources.
  • Populating data by interpreting the language of epilepsy progress notes.

Epilepsy data collection application.

It is rare that a human reader interprets every single word of a text presented to them. Usually it’s a small portion of text that makes a note worthwhile. Given that a large portion of our research relies on machines that understand natural language, we wanted to apply that ability to the correct interpretation of text such as suicide notes or epilepsy progress notes.

To achieve this, we developed an annotation process. We first asked clinical experts which text is valuable and what they would look for in it: for example, specific emotions in suicide notes or diagnosis information in epilepsy progress notes. We then asked them to bind the two together, linking each valuable passage to what it represents.

The training data captured with this method can then be used to teach a machine the same task. We have developed four separate web applications that use this annotation method. Our goal is to collect data for training an information extraction machine that will correctly identify markers for suicide or epilepsy.
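One way to picture the binding step is as a record that ties a span of text to the label an expert assigned, which can then be exported as training pairs. The sketch below is an assumption about how such data might be represented, not a description of the four applications; the `Annotation` class, the helper names, the sample sentence and the `eeg_finding` label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Binds a character span in a note to the label an expert assigned."""
    start: int
    end: int
    label: str


def annotate(text, phrase, label):
    """Create an annotation for the first occurrence of `phrase` in `text`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found in text: {phrase!r}")
    return Annotation(start, start + len(phrase), label)


def to_training_pairs(text, annotations):
    """Turn annotations into (span text, label) pairs for a learner."""
    return [(text[a.start:a.end], a.label) for a in annotations]


# Invented sentence and label, for illustration only.
note = "EEG showed left temporal spikes."
annotations = [annotate(note, "left temporal spikes", "eeg_finding")]
print(to_training_pairs(note, annotations))
```

Storing character offsets rather than copied text keeps each annotation tied to its exact position in the note, so the same phrase appearing twice can carry different labels.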

Interpreting the emotions of the writer of a suicide note is extremely difficult. We asked people and machines to put themselves in the writer's shoes and tell us what emotion they see in that person. We knew that this would be a challenging task, so we proposed it to various international institutions, asking them to create a machine that extracts emotions from text. We created a new data capture system that tracked downloaded data and machine performance.

This web site monitors challenge participants and tracks who downloads data and how often. It also allows participants to upload their results and score them.
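The text does not say which metric the site uses to score uploads. As a minimal sketch, assuming submissions are sets of (sentence id, emotion) pairs compared against expert annotations, a micro-averaged precision, recall and F1 score could be computed as follows. The function name `score_submission` and the sample labels are hypothetical.

```python
def score_submission(gold, predicted):
    """Return (precision, recall, f1) over sets of (sentence_id, emotion) pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0  # nothing matched; avoid dividing by zero
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented annotations and predictions, for illustration only.
gold = {(1, "hopelessness"), (1, "love"), (2, "guilt")}
predicted = {(1, "love"), (2, "guilt"), (2, "anger")}
precision, recall, f1 = score_submission(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scoring pairs rather than whole sentences lets a sentence carry more than one emotion and gives partial credit when only some of them are found.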


Suicide Notes Data Set and Shared Task Definition

Emotions are subjective, as is their interpretation, which leads to a great deal of variation in how people interpret them. We investigated whether machines could reach human-level competency in spotting emotions in text. We also wanted to upgrade mainstream sentiment analysis from simple binary positive-negative classification to multilevel classification with 6 positive, 7 negative and 2 neutral emotions. We learned that ensemble classifiers can indeed reach human competency, which is roughly 60% accuracy.
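The text does not describe how the ensembles were built. One common approach is majority voting, where several classifiers each propose an emotion and the most frequent proposal wins. The sketch below illustrates that approach only; the three keyword-based classifiers and their labels are toy stand-ins assumed for the example, not the systems from the shared task.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the earliest-listed classifier."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def ensemble_predict(classifiers, text):
    """Ask every classifier for a label and combine the answers by vote."""
    return majority_vote([clf(text) for clf in classifiers])


# Toy classifiers (hypothetical keyword rules, for illustration only).
def clf_a(text):
    return "love" if "love" in text.lower() else "information"


def clf_b(text):
    return "guilt" if "sorry" in text.lower() else "information"


def clf_c(text):
    return "love" if "dear" in text.lower() else "information"


print(ensemble_predict([clf_a, clf_b, clf_c], "Dear all, I love you and I am sorry."))
```

With three voters, a single classifier's mistake is outvoted whenever the other two agree, which is the intuition behind combining weak classifiers.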

Radiology Reports Data Set and Shared Task Definition

Installing electronic medical records replaces free text with structured drop-down boxes. When Cincinnati Children’s started using EPIC software, paper forms were eliminated along with their free-text fields. There is a great deal of knowledge in free text and, therefore, a need to figure out how to extract it. We are attempting to overcome this obstacle by using natural language processing (NLP). Specifically, we are focused on developing and implementing neuro-cognitive algorithms that enable computers to understand the concepts and semantic relationships within clinical text. We have developed a tool that anonymizes free text and have used it to create a radiology corpus to support NLP research. Our next steps include further annotating the existing corpus, developing a second corpus and using these corpora to train new, memory-based text processing algorithms.