A breakthrough in machine learning provides early identification of suicidal behavior. But moving such a product beyond the lab is no simple task.

by Tom O’Neill


John Pestian, PhD, MBA, led a team that converted the world’s largest known collection of suicide notes into a data set to begin teaching a machine learning program how to recognize signs of suicidal thought in a person’s speech, voice and expression. 

John Pestian, PhD, MBA, Tracy Glauser, MD, and Michael Sorter, MD, have thousands of reasons to feel a profound sense of urgency in getting their latest innovation into the hands of as many caregivers as possible.

In 2016 alone, nearly 45,000 Americans took their own lives, according to the U.S. Centers for Disease Control and Prevention. Of those, more than 6,000 were young people aged 10 to 25—and the rate of adolescent suicide has been rising steadily since 2007.

The suicide rate for girls ages 15 to 19 doubled between 2007 and 2015, reaching its highest point in 40 years. The suicide rate for boys ages 15 to 19 increased 30 percent over the same period. Now, suicide is the second-leading cause of death for those aged 15 to 24 and the third-leading cause of death among children aged 10 to 14.

As nationally known experts in artificial intelligence, computational medicine, psychiatry and brain science, the co-inventors know these statistics better than most. But one number is especially powerful: 1,319.

That is the number of suicide notes used in this project to teach computers how to understand the language of suicide. It’s the world’s largest known collection of its kind. Pestian and 30 colleagues from around the world devoted years to collecting and analyzing these notes.

From this exhaustive research, Pestian and colleagues reported a breakthrough in 2012 that to this day seems mind-boggling: it is possible to teach a computer to do a better job than a trained human mental-health professional at determining how serious a person actually is about taking their own life.

The potential benefit of such a tool is hard to overstate. Imagine having the chance to step in before things go too far. Imagine avoiding the anguish, the overwhelming guilt, the crushed hopes that flow from so many premature funerals. Imagine instead the many contributions to our world that could come from budding young lives getting their chance to blossom.

Real Progress, Real Potential


Screenshot of Spreading Activation Mobile (SAM)

In 2010, Pestian and colleagues recruited 163 volunteers who had personal experiences with suicidal thoughts to help annotate the collection of suicide notes. They mapped words and sentences to emotions and related categories, such as abuse, anger, blame, fear, guilt, hopelessness, sorrow, forgiveness, happiness, peacefulness, hopefulness, love, pride, thankfulness, instructions, and information.
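One way to picture the annotation step is as a multi-label data set: each sentence carries zero or more of the categories listed above. The sketch below is a minimal illustration under that assumption; the `AnnotatedSentence` record and the `label_counts` helper are hypothetical, and the placeholder sentences stand in for real text, which is not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass, field

# Categories named in the article's description of the annotation scheme.
CATEGORIES = frozenset({
    "abuse", "anger", "blame", "fear", "guilt", "hopelessness", "sorrow",
    "forgiveness", "happiness", "peacefulness", "hopefulness", "love",
    "pride", "thankfulness", "instructions", "information",
})


@dataclass
class AnnotatedSentence:
    """Hypothetical record: one sentence plus the labels annotators assigned."""
    text: str
    labels: set = field(default_factory=set)

    def __post_init__(self):
        # Reject labels outside the agreed category scheme.
        unknown = self.labels - CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")


def label_counts(sentences):
    """Count how often each category was assigned across the data set."""
    counts = Counter()
    for s in sentences:
        counts.update(s.labels)
    return counts


# Placeholder sentences only; no real note text is used.
data = [
    AnnotatedSentence("<sentence 1>", {"instructions"}),
    AnnotatedSentence("<sentence 2>", {"love"}),
    AnnotatedSentence("<sentence 3>", {"thankfulness", "love"}),
]
print(label_counts(data)["love"])  # 2
```

Because one sentence can express several emotions at once, a set of labels per sentence fits this kind of scheme better than a single label.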

The result was a database of suicide information that the researchers agreed could become the basis of a computerized scoring system.

In 2012, Pestian was the lead author on a paper describing the potential of the machine learning approach, published in Biomedical Informatics Insights as “Sentiment Analysis of Suicide Notes: A Shared Task.” That paper described the work of 106 scientists from 24 teams from North America, Europe and Asia who volunteered to explore the best ways to digitally capture the emotions conveyed through suicide notes.

They concluded that “human-like performance on this task is within the reach of currently available technologies.”

Pestian then set out to redefine “currently available.” His lab developed algorithms—complex sequences of instructions for a computer—to identify individuals at risk of suicide, depression, and bipolar and anxiety disorders.

The wording of suicide notes was just the beginning. Since 2012, the research team has added audio and video data from counseling sessions, capturing differences in how people at high risk of self-harm smile and speak. From this combined pool of data, the team developed a highly advanced “app” that uses natural language processing to determine, in real time, how closely a person’s communication reflects a language of suicide that human ears may not always hear.
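The article does not describe how SAM’s models produce a score, so this sketch sets that question aside. It assumes some trained model (the hypothetical `score_utterance` function) returns a score between 0 and 1 for each utterance, and shows only one common way to keep a continuously updated estimate as a conversation unfolds: an exponentially weighted moving average. This is an illustrative stand-in, not SAM’s method.

```python
from typing import Callable, Iterable, Iterator


def running_scores(
    utterances: Iterable[str],
    score_utterance: Callable[[str], float],
    alpha: float = 0.3,
) -> Iterator[float]:
    """Yield an exponentially weighted moving average of per-utterance scores.

    `score_utterance` is a hypothetical placeholder for a trained model that
    scores a single utterance. `alpha` sets how strongly each new utterance
    moves the running estimate.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    estimate = None
    for text in utterances:
        score = score_utterance(text)
        # The first utterance initializes the estimate; later ones blend in.
        if estimate is None:
            estimate = score
        else:
            estimate = alpha * score + (1 - alpha) * estimate
        yield estimate


# Toy placeholder scorer with fixed values so the arithmetic is easy to follow.
fake_scores = {"a": 0.0, "b": 1.0, "c": 1.0}
print([round(x, 3) for x in running_scores(["a", "b", "c"], fake_scores.get, alpha=0.5)])
# [0.0, 0.5, 0.75]
```

Because the function is a generator, a caller can display an updated value after every utterance instead of waiting for the session to end.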

“When some people talk about suicide, they might express sadness, or being upset. They might say, ‘My girlfriend left me, so I’m going to stand in front of a train.’ But these are predominantly impulsive factors that may not reflect underlying chronic issues that also can lead to suicide,” Pestian explains.

“When we talk about machine learning, we’re talking about measurable features of the language used, the characteristics in the data that can predict outcomes. We learned, for example, that one of the most important factors to the machine was the ratio of nouns to pronouns. Also, through video and acoustic recordings, we found differences in cadence in the suicidal person.”
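A feature like the noun-to-pronoun ratio is simple to compute once text has been part-of-speech tagged. The sketch below assumes tokens already carry Penn Treebank tags from some tagger (NLTK and spaCy both can produce them). Which tags count as nouns and pronouns, and how to handle text with no pronouns, are choices made here for illustration, not the lab’s published definition.

```python
from typing import Iterable, Tuple

# Penn Treebank noun tags, and personal, possessive and wh-pronoun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PRONOUN_TAGS = {"PRP", "PRP$", "WP", "WP$"}


def noun_pronoun_ratio(tagged: Iterable[Tuple[str, str]]) -> float:
    """Return (#nouns) / (#pronouns) for a sequence of (token, tag) pairs.

    Returns float("inf") when nouns occur but pronouns do not, and 0.0 for
    text with neither, so the feature is always defined.
    """
    nouns = pronouns = 0
    for _token, tag in tagged:
        if tag in NOUN_TAGS:
            nouns += 1
        elif tag in PRONOUN_TAGS:
            pronouns += 1
    if pronouns == 0:
        return float("inf") if nouns else 0.0
    return nouns / pronouns


# "I left my keys on the table": 2 nouns (keys, table), 2 pronouns (I, my).
tagged = [("I", "PRP"), ("left", "VBD"), ("my", "PRP$"), ("keys", "NNS"),
          ("on", "IN"), ("the", "DT"), ("table", "NN")]
print(noun_pronoun_ratio(tagged))  # 1.0
```

In a real pipeline this value would be one column among many features fed to a classifier, not a decision rule on its own.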

While a person might be able to mask their emotional state from a human counselor for a few minutes, it is harder to hide it from the machine across an entire interview session. Patients seriously considering suicide not only tend to use specific words and phrases but also tend to talk more slowly. When they smile, they don’t show as much of their teeth.
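Talking more slowly can be turned into a number by measuring speaking rate. This sketch assumes a speech recognizer has already produced word-level timestamps in seconds (a hypothetical `(word, start, end)` input format). It computes words per minute over speaking time only, treating silences longer than `max_gap` seconds as pauses. The 2-second default is an arbitrary illustrative choice.

```python
from typing import Sequence, Tuple

Word = Tuple[str, float, float]  # (word, start_seconds, end_seconds)


def speaking_rate_wpm(words: Sequence[Word], max_gap: float = 2.0) -> float:
    """Words per minute, counting only time spent speaking.

    Gaps between consecutive words of up to `max_gap` seconds count as part
    of speech; longer gaps are treated as pauses and excluded.
    """
    if not words:
        return 0.0
    speaking_time = 0.0
    for i, (_word, start, end) in enumerate(words):
        speaking_time += end - start
        if i > 0:
            gap = start - words[i - 1][2]
            if 0 < gap <= max_gap:
                speaking_time += gap
    if speaking_time <= 0:
        raise ValueError("timestamps give no speaking time")
    return len(words) * 60.0 / speaking_time


words = [("we", 0.0, 0.5), ("went", 0.5, 1.0),
         ("home", 1.0, 1.5), ("today", 5.0, 5.5)]
print(speaking_rate_wpm(words))  # 120.0 (the 3.5 s pause is excluded)
```

Excluding long pauses separates how fast someone speaks from how often they stop; both could be tracked as separate features.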

“We can measure that with machines,” Pestian says.

The latest version of the product is called Spreading Activation Mobile (SAM). With consent from therapists and patients, SAM records sessions and interprets words and inflections in real time. These “thought markers” include voice cadence, language, hand motion, body language and facial expression.

Pestian’s algorithms received their first patent in 2015. Results of a pilot study were published in Suicide and Life-Threatening Behavior in 2015 and 2016.

More than 370 patients, including some known to be suicidal, others who were diagnosed as mentally ill but not suicidal, and some who were neither, completed standardized behavioral rating scales and participated in semi-structured interviews. After extracting and analyzing verbal and non-verbal language, the machine learning algorithms discerned the differences between the groups with up to 93 percent accuracy.
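The article does not say which learning algorithm the study used, so this sketch swaps in one of the simplest multiclass methods, a nearest-centroid classifier. It only shows what telling three groups apart and reporting accuracy mean: each group gets an average feature vector, each new interview is assigned to the closest average, and accuracy is the share labeled correctly. The group names follow the study design described above; the two-feature vectors are made-up toy numbers, not study data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def fit_centroids(X: List[Vector], y: List[str]) -> Dict[str, List[float]]:
    """Average the feature vectors of each group into one centroid per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Vector) -> str:
    """Assign x to the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up feature vectors (two normalized features per interview).
X_train = [[0.9, 0.2], [0.8, 0.3], [0.5, 0.5], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]
y_train = ["suicidal", "suicidal", "psychiatric", "psychiatric", "control", "control"]
X_test = [[0.85, 0.2], [0.5, 0.45], [0.1, 0.95], [0.6, 0.4]]
y_test = ["suicidal", "psychiatric", "control", "suicidal"]

centroids = fit_centroids(X_train, y_train)
preds = [predict(centroids, x) for x in X_test]
print(accuracy(y_test, preds))  # 0.75 (the last test case is misclassified)
```

A real evaluation would hold out interviews the model never saw during fitting, as this toy split does, and would typically report more than accuracy alone.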

Nearly two years later, Pestian asks a simple question: With mounting evidence that the algorithm works, why is it taking so long to convert this breakthrough into a product that can start saving lives?

Redoubling the Effort

Scientists like Pestian have three choices for disseminating their discoveries: release them into the public domain, publish them in academic journals, or bring them to market.

Pestian is no stranger to using the market. He, along with three co-inventors, developed a genetic testing tool, now called GeneSight, that has helped caregivers determine ideal medication doses for more than 750,000 patients with psychiatric conditions. Assurex Health, the company formed to market the tests, was acquired in 2016 by Myriad Genetics.

Pestian and colleagues see similar potential in the suicide risk identification app. The machine learning approach that led to the first app may also serve as a platform for apps addressing other conditions. So the team opted to take the commercial route.

Early collaboration included work with CincyTech, a regional business incubator, and commercialization staff at Cincinnati Children’s. Now Pestian is working with the reorganized and renamed Innovation Ventures group at Cincinnati Children’s to continue advancing the product.

The science behind the innovation has been described in 10 peer-reviewed publications. The work also has attracted news coverage from The Wall Street Journal, the Washington Post, NPR, USA Today, and others.

So far, Pestian and colleagues have secured two patents and formed a spinoff company. The product also has received two Innovation Fund grants from Cincinnati Children’s.

Innovation Ventures and CincyTech have selected Pestian’s machine learning algorithm as one of a handful of discoveries slated to receive accelerated development support. That means devoting more expertise and resources to developing business plans, supporting the start-up company, pursuing investors and more.

The market is much less forgiving than the grant review process, so even brilliant ideas with powerful life-transforming potential have no guarantee of commercial success.

Andrew Wooten, MST, MTM, Vice President of Innovation Ventures, says the research world and the commercial world spin on different axes. Lining up those worlds is part art, part science, and takes time.

“This is like trying to build a car while driving it,” Wooten says. “The old approach was to just put a discovery out there and find someone to do the product development. That can be a good model if you get people to take it. But this project is an early-stage innovation. Many investors will say ‘I’m not taking the risk.’ ”

Unlike new medications, which follow a difficult but well-understood development pathway, digital healthcare products remain a tough sell, especially when the subject matter carries the societal taboos of suicide and the technology involved is still emerging.

“With software and algorithms, you can patent some of the software involved, but there are other protections too, like a copyright, or the kind of patents that protect a business process,” says Matthew Wortman, a portfolio manager who oversees the digital health and care delivery asset class for Innovation Ventures. “Then it becomes a question of, who is going to buy it? And who is going to pay? The insurance company? The hospital? Parents?”

Miles To Go

It remains too early to predict how, or even if, SAM will emerge as a fully commercialized product. One hopeful sign: The start-up company’s CEO is Don Wright, who helped launch the Assurex start-up. Wright has a particular passion for this project because his own son, 29, died by suicide last year.

Regardless of commercial success, supporting the effort is core to Cincinnati Children’s non-profit mission, Wooten says.

“Our loyalty is to children,” Wooten says. “Getting things to market that help children.”


A Growing Pipeline of Digital Assets