4 Ways John Snow Labs is Raising the Bar for NLP in Healthcare

David Talby, CTO, John Snow Labs

Natural Language Processing (NLP) has been a technology to watch over the past few years, but in 2020 it emerged as a mission-critical tool for businesses. With the advent of dispersed workforces, closed brick-and-mortar stores, and overwhelmed healthcare systems, NLP has helped enable solutions from virtual assistants and customer service chatbots, to automating mundane tasks and aiding in vaccine development—and the enterprise is taking notice. In fact, 53% of respondents from a recent global survey reported their NLP budget was at least 10% higher this year compared to 2019, with 31% stating their budget was at least 30% higher than the previous year.

The investment in NLP in the wake of the global pandemic, a time when IT budgets have been on a steady downturn, speaks volumes about just how important this technology is. To help healthcare and life science organizations take full advantage of all NLP has to offer, John Snow Labs has been hard at work enhancing its technology, improving accuracy, and sharing knowledge with the NLP community to usher in the next wave of NLP adoption. Here are four ways the company has raised the bar for NLP this year:

1.) Putting Accuracy at the Forefront

At only four years old, John Snow Labs is still in its infancy, but its Spark NLP offering might lead you to believe otherwise. Half of all respondents to the aforementioned survey said they used at least one of the top two libraries: Spark NLP and spaCy. More specifically, a third of all respondents stated they use Spark NLP, making it the most popular NLP library in the survey. That share rises to 54% among healthcare organizations. These figures are unlikely to hold still for long, though: the company has seen 9x growth in downloads since January, and the numbers keep climbing.

The hype around Spark NLP stems from a deep commitment to offering users the most accurate solution on the market. Accuracy is a huge pain point for NLP users, with more than 40% of users reporting accuracy as the most important criterion they use to evaluate NLP libraries. Here, accuracy refers to the performance of the pre-trained models used in multi-stage NLP pipelines. These models let users input text to complete common tasks like named entity recognition, sentiment analysis, spell checking, document classification, and toxic content detection. Spark NLP models beat state-of-the-art academic accuracy on multiple public benchmarks, and more importantly, the software keeps improving as new research results get published and productized by the team.

2.) Constant Enhancements to the Technology

John Snow Labs released 26 new versions of Spark NLP in 2019 and another 26 this year, with the most recent being Spark NLP for Healthcare 2.7. The most significant feature in this release is Text to SQL. Other upgrades include more accurate entity resolution and clinical named entity recognizers, a new PICO classifier for evidence-based medicine, new biomedical named entity recognizers, and new clinical and traffic accident NER models in German. These models are pre-trained with clinical BioBERT-based embeddings, the most powerful contextual language model in the clinical domain today, making the release an easy-to-use, best-in-class solution for clinical and biomedical NLP projects.

In addition to enhancements in the technology, John Snow Labs has also improved upon its multilingual offerings, making NLP available to more users worldwide. Historically, highly accurate NLP software was built largely for English and Chinese. Now, Spark NLP offers models in 46 languages, democratizing the technology for data scientists around the globe. Language support has been a big barrier to adoption, but recent advances such as language-agnostic sentence embeddings, zero-shot learning, and the public availability of multilingual embeddings are making the technology more accessible.

3.) Addressing Mission-Critical Healthcare Applications

Adverse Drug Events (ADE) account for nearly 700,000 emergency department visits and 100,000 hospitalizations in the US alone. They are one of the most common types of inpatient errors, and put a big burden on the healthcare system. To address this, John Snow Labs recently announced updates in areas such as Named Entity Recognition (NER) and Classification, making it easier to achieve more timely and accurate results. The ADE NER model enables data scientists to extract ADE and drug entities from a given text, and its ADE classifiers are trained to automatically decide if a given sentence is, in fact, a description of an ADE.

The combination of NER and classifier models, together with the availability of a pre-trained clinical pipeline for ADE tasks in the Spark NLP library, saves users from building such models and pipelines from scratch and lets them go into production immediately. The pre-trained models that come with Spark NLP for Healthcare are the most accurate available today, improving on the previous SOTA results by Giorgi et al. (2019), while going beyond an academic research result to deliver a production-grade codebase.
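To make the NER-plus-classifier idea concrete, here is a minimal sketch in plain Python. It is not the Spark NLP API and does not use the pre-trained models described above: the word lists, the trigger pattern, and the function names (`extract_entities`, `is_ade_sentence`, `run_pipeline`) are all hypothetical stand-ins chosen for illustration. It only shows the shape of a two-stage pipeline, in which one stage tags drug and ADE mentions and a second stage decides whether the sentence describes an ADE at all.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical toy lexicons; a real system would use trained models instead.
DRUG_TERMS = {"warfarin", "ibuprofen", "metformin"}
ADE_TERMS = {"rash", "nausea", "bleeding", "dizziness"}
# Hypothetical cue words suggesting a causal link between drug and event.
TRIGGER_PATTERN = re.compile(r"\b(after|developed|caused|following)\b", re.IGNORECASE)


@dataclass
class Entity:
    text: str
    label: str  # "DRUG" or "ADE"


def extract_entities(sentence: str) -> list[Entity]:
    """Stage 1 (stand-in for an NER model): tag known drug and ADE terms."""
    entities = []
    for token in re.findall(r"[A-Za-z]+", sentence):
        lowered = token.lower()
        if lowered in DRUG_TERMS:
            entities.append(Entity(token, "DRUG"))
        elif lowered in ADE_TERMS:
            entities.append(Entity(token, "ADE"))
    return entities


def is_ade_sentence(sentence: str, entities: list[Entity]) -> bool:
    """Stage 2 (stand-in for a classifier): require a drug, an event, and a cue word."""
    labels = {entity.label for entity in entities}
    return {"DRUG", "ADE"} <= labels and bool(TRIGGER_PATTERN.search(sentence))


def run_pipeline(sentences: list[str]) -> list[tuple[str, list[Entity]]]:
    """Keep only sentences flagged as ADE descriptions, with their entities."""
    results = []
    for sentence in sentences:
        entities = extract_entities(sentence)
        if is_ade_sentence(sentence, entities):
            results.append((sentence, entities))
    return results


if __name__ == "__main__":
    notes = [
        "The patient developed a rash after taking ibuprofen.",
        "Metformin was prescribed for diabetes.",
    ]
    for sentence, entities in run_pipeline(notes):
        print(sentence, [(e.text, e.label) for e in entities])
```

In this toy version only the first note passes both stages. In a production setting, each stage would be replaced by a trained model, which is the work a pre-trained pipeline is meant to save.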

4.) Knowledge Sharing with the Community

This year, John Snow Labs hosted the first-ever NLP Summit, a gathering of more than 6,500 practitioners to hear more than 50 sessions from industry thought leaders in the AI and NLP space. The four-day virtual event dedicated an entire day to healthcare use cases, and due to its popularity, has sparked the creation of a new Healthcare NLP Summit, which will kick off in April 2021. With speakers from Google, Kaiser Permanente, Roche, Intel, Merck, DocuSign, and more, we look forward to seeing what next year’s events will bring.

In addition to community-building and knowledge-sharing, John Snow Labs is committed to putting its AI and healthcare expertise into practice for good. To help in the fight against the Coronavirus, the company partnered with data.world, the cloud-native data catalog, to make 2,224 current, linked, and expert-curated healthcare datasets available for free to researchers and data scientists studying the disease. John Snow Labs is also providing free software licenses to researchers fighting COVID-19.

It’s been a year of tremendous innovation and growth for NLP, and industries beyond healthcare and life science are feeling its impact. With exciting advancements and capabilities that will make this technology more accessible to users, 2021 is poised to be another significant year for NLP.