Conference Program - Technical Track

The Technical Track focuses on the latest text analytics techniques and methods – how to develop text analytics foundations, incorporate the latest advances, and explore new approaches. This track will appeal both to anyone just starting to develop text analytics and to those looking to enhance their current efforts.

View the Text Analytics Forum 2019 Advance Program PDF


Wednesday, Nov 6

Track 1, Wednesday: Technical


Knowledge Graphs—Deep Dive


Wednesday, November 6: 11:45 a.m. - 12:30 p.m.

Make Beautiful Metadata: Using Knowledge Graphs to Improve Relevance in Automated Tagging
11:45 a.m. - 12:30 p.m.

This talk describes the use of an enterprise knowledge graph as the semantic backbone of a text analytics application. Going beyond traditional hierarchical taxonomies, we demonstrate how we use a knowledge graph to enhance entity resolution, boost signal detection, and improve relevance scoring. We examine a use case where graph-informed tagging adds business value by surfacing connections between different facets of content and by driving personalization and user experience through precise metadata.
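A minimal sketch of the graph-informed tagging idea the abstract describes, using an invented toy graph (not the speaker's actual system): candidate tags that are connected to other candidates in the knowledge graph get a relevance boost, so related concepts outrank isolated keyword matches.

```python
# Toy knowledge graph: concept -> set of related concepts (invented data).
GRAPH = {
    "machine learning": {"neural networks", "data science"},
    "neural networks": {"machine learning"},
    "data science": {"machine learning"},
    "cooking": set(),
}

def score_tags(candidates):
    """Base score 1.0 per candidate tag, +0.5 for each other candidate
    it is connected to in the graph (graph-informed relevance)."""
    scores = {}
    for tag in candidates:
        related = GRAPH.get(tag, set())
        boost = 0.5 * sum(1 for other in candidates if other in related)
        scores[tag] = 1.0 + boost
    return scores

scores = score_tags(["machine learning", "neural networks", "cooking"])
# "machine learning" and "neural networks" reinforce each other; "cooking" does not.
```

Real systems would replace the boost heuristic with graph distance or edge-type weights, but the principle, letting graph connectivity rerank flat tag candidates, is the same.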


Corporate Taxonomist, IBM


Knowledge Graphs—Deep Dive


Wednesday, November 6: 1:30 p.m. - 2:15 p.m.

Knowledge Graph Enrichment via Text Analytics
1:30 p.m. - 2:15 p.m.

This presentation covers text analytics and text processing techniques used in creating several interesting text-based knowledge graphs. One example is the Noam Chomsky Knowledge Graph, which incorporates hundreds of articles and numerous books that Chomsky has authored about linguistics, mass media, politics, and war. Another example covers health effects of ingredients in foods and beauty products. We show how a combination of AI techniques and knowledge graphs can transform text-heavy applications into an interactive response system usable by scientists, technologists, politicians, and scholars, as well as by smart applications, intelligent chatbots, question-answering machines, and other AI and data systems.
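The core pattern behind a text-derived knowledge graph can be illustrated with a tiny hand-built example (invented mini-corpus, not the actual Chomsky graph): facts extracted from text become subject-predicate-object triples, which can then be queried by pattern matching.

```python
# Triples as (subject, predicate, object) tuples -- a toy stand-in for a
# triple store such as the one the talk describes at much larger scale.
triples = [
    ("Noam Chomsky", "authored", "Syntactic Structures"),
    ("Noam Chomsky", "field", "linguistics"),
    ("Syntactic Structures", "topic", "linguistics"),
]

def query(subject=None, predicate=None, obj=None):
    """Return triples matching all non-None fields (a tiny triple-pattern match)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# "What did Noam Chomsky author?"
works = [o for _, _, o in query("Noam Chomsky", "authored")]
```

A production graph would use a triple store and SPARQL rather than Python tuples, but the question-answering step is the same triple-pattern matching shown here.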


CEO, Franz Inc. - AllegroGraph


AI & Analytics


Wednesday, November 6: 2:30 p.m. - 3:15 p.m.

This session looks at how AI and data science may shape the world of project delivery, particularly highly complex projects, but only if mediated by human sense-making and decision support. Our experienced and popular speaker takes work from counter-terrorism (his DARPA and other projects) and applies it to project management using a multi-methods approach. Get in on the ground floor and grab new ways of thinking about AI and analytics.


Chief Scientific Officer, Cognitive Edge


Thursday, Nov 7

Track 1, Thursday Morning: Technical


Text Analytics in Government


Thursday, November 7: 10:15 a.m. - 11:00 a.m.

Measuring the Effects of Information Operations Through Specialized Sentiment Analysis
10:15 a.m. - 11:00 a.m.

Government influence operations (IO) have been conducted throughout recorded history. In recent times, they have commonly been referred to as propaganda, active measures, or psychological operations (PSYOPS). More than a century of Russian “Chekist” tradition has culminated in a force that can mobilize thousands of humans augmented by unlimited numbers of bots. As documented in congressional testimony, this force has repeatedly seized control of foreign news cycles, inserting sentiments or wholly fictional stories. A simple positive/neutral/negative axis is not as applicable to the IO mission as one specific to the operation in question, such as entity stability/instability, trustworthiness, or advocacy of violence. Given an IO action, such as promotion of an embarrassing story, the operator wants to measure the effect as change in sentiment, such as distrust of the now-discredited entity.
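The mission-specific axis the abstract mentions can be sketched with a tiny invented lexicon (not Basis Technology's method): documents are scored along a single trustworthiness axis, and the operation's effect is measured as the change in that score before and after.

```python
# Invented trustworthiness lexicon: +1 trust-building, -1 trust-eroding words.
TRUST_LEXICON = {"reliable": 1, "honest": 1, "corrupt": -1,
                 "discredited": -1, "fraud": -1}

def trust_score(text):
    """Mean lexicon value over matched words; 0.0 if nothing matches."""
    hits = [TRUST_LEXICON[w] for w in text.lower().split() if w in TRUST_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

before = trust_score("analysts call the agency reliable and honest")
after = trust_score("the discredited agency faces fraud claims")
effect = after - before  # a negative shift indicates loss of trust
```

Operational systems would use trained classifiers rather than a word list, but the measurement design, a custom axis plus a before/after delta, is what distinguishes this from generic positive/negative sentiment.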


SVP, Global Public Sector, Basis Technology

Federal Solutions Engineer, Basis Technology

Text Analytics for Federal Regulations Public Commentary
10:15 a.m. - 11:00 a.m.

The federal government's online rulemaking portal was launched in 2003 to provide the public with access to federal regulatory content and the ability to submit comments on federal regulations. Manually reading thousands of comments is time-consuming and labor-intensive. It is also difficult for multiple reviewers to assess content, themes, stakeholder identity, and sentiment accurately and consistently. In response, text analytics can be used to develop transparent and accurate text models, and visual analytics can quantify, summarize, and present the results of that analysis. This talk addresses public commentary submitted in response to new product regulations by the U.S. Food and Drug Administration.


Systems Engineer, SAS


Building Training Sets


Thursday, November 7: 11:15 a.m. - 12:00 p.m.

Accelerate Custom Text Analytics With Semi-Supervised Learning
11:15 a.m. - 12:00 p.m.

Text analytics cloud services are maturing toward enabling customers to tailor entity recognition to the specifics of their business domain. A major challenge customers face is the creation of training datasets tailored for each application. This talk describes a semi-supervised approach that accelerates the creation of such datasets and gives customers a handle to control the trade-off between the effort of preparing training data and the accuracy of the resulting model.
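One common semi-supervised pattern, shown here as a toy self-labeling sketch with invented data (the talk's actual approach may differ), grows a training set by letting a few seed entity labels annotate unlabeled sentences automatically; the strictness of the match criterion is the effort-versus-accuracy knob the abstract mentions.

```python
# Seed entities supplied by a human (small effort); everything else is automatic.
seed_entities = {"acme corp"}
unlabeled = [
    "acme corp announced a merger",
    "the merger involves globex ltd and acme corp",
    "weather was mild today",
]

def self_label(sentences, entities):
    """Auto-label sentences that contain a known entity via exact substring
    match (high precision, low recall); returns (sentence, entity) pairs."""
    labeled = []
    for sentence in sentences:
        for entity in entities:
            if entity in sentence:
                labeled.append((sentence, entity))
    return labeled

training_set = self_label(unlabeled, seed_entities)
```

Loosening the matcher (fuzzy or embedding-based similarity) yields more training pairs at the cost of noisier labels, which is exactly the trade-off a customer would want to control.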

NLP & Rule-Based Approach for Fact Extraction: Launchpad for Machine Learning Techniques
11:15 a.m. - 12:00 p.m.

One fundamental obstacle to using machine learning (ML) to accurately extract facts from free-text documents is that it requires huge amounts of pre-categorized data for training. Manual annotation is not a viable option, as it would entail enormous amounts of human analyst time. In this presentation, we outline an innovative rule-based approach for the automated generation of pre-categorized data that can then be used for training ML models. This approach relies on writing queries in a powerful pattern definition language that fully exploits the results of the underlying natural language processing (NLP): deep linguistic, semantic, and statistical analysis of documents. The sequential application of rule-based and ML techniques yields highly accurate results. An example project illustrating this technology focuses on the automated extraction of clinical information from patient medical records.
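The rules-generate-training-data pipeline can be sketched with plain regular expressions and invented clinical patterns (Megaputer's actual pattern language is far richer): rules label spans in free text, producing (value, label) pairs that a downstream ML model could train on.

```python
import re

# Hypothetical extraction rules: each regex captures a value for one fact type.
RULES = [
    (re.compile(r"\b(\d+)\s*mg\b"), "DOSAGE"),
    (re.compile(r"\bblood pressure\s+(\d+/\d+)\b"), "BP_READING"),
]

def extract_facts(text):
    """Apply every rule to the text and emit (matched value, label) pairs,
    i.e. automatically pre-categorized data for ML training."""
    facts = []
    for pattern, label in RULES:
        for match in pattern.finditer(text):
            facts.append((match.group(1), label))
    return facts

facts = extract_facts("Patient given 50 mg daily; blood pressure 120/80 stable.")
```

Running such rules over a large corpus yields a labeled dataset with no manual annotation; the ML model trained on it can then generalize to phrasings the rules miss.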


CEO, Megaputer

Senior Computational Linguist, Megaputer



Track 1, Thursday Afternoon: Technical


Auto-Categorization: People & Structure


Thursday, November 7: 1:00 p.m. - 1:45 p.m.

Supervised vs. Unsupervised Automated Categorization
1:00 p.m. - 1:45 p.m.

AI promises to categorize all types of content with reliable results, but the reality is much more complex. Most applications won’t work with a meat-grinder approach, where you pour a huge amount of content in one end and a perfectly organized collection comes out the other. Effective automated categorization depends on defining a process workflow and assembling a stack of methods to process different types of content in different ways. Designing and validating a content processing workflow requires human judgments, so good-quality categorization applications often depend on making the best use of people. This presentation provides a reality check on unsupervised automated categorization and discusses a case study in which performance was suitable for editorial review and approval, but not for unsupervised processing of a large collection.


Founder and Principal, Taxonomy Strategies

Content Structure Models—A Technique to Improve Both Machine Learning & Rules-Based Categorization
1:00 p.m. - 1:45 p.m.

There is no such thing as unstructured text—even tweets have some structure: words, clauses, phrases, even the occasional paragraph. Techniques that treat documents as undifferentiated bags of words have never achieved high enough accuracy to build good auto-categorization, whether using machine learning (ML) or rules. However, by going beyond bags of words and utilizing the structures found in “unstructured” text, it is possible to achieve dramatically improved accuracy. This talk, using multiple examples from recent projects, presents how to build content structure models and content structure rules that can be used for both rules-based and ML categorization. We conclude with a method for combining rules and ML in a variety of ways for the best of both worlds.
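A minimal illustration of the structure-over-bag-of-words idea, with invented zone weights (not the speaker's actual models): the document is treated as named zones, and category terms are weighted by where they occur, so a term in the title counts for more than the same term buried in the body.

```python
# Hypothetical zone weights: title matches matter most, body matches least.
ZONE_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}

def zone_score(doc, category_terms):
    """Sum zone weight for every occurrence of a category term in each zone,
    instead of counting all occurrences equally (bag of words)."""
    score = 0.0
    for zone, text in doc.items():
        weight = ZONE_WEIGHTS.get(zone, 1.0)
        for term in category_terms:
            score += weight * text.lower().split().count(term)
    return score

doc = {"title": "taxonomy design basics",
       "body": "notes on taxonomy and design and metadata"}
score = zone_score(doc, ["taxonomy"])
```

The same zone features can feed either side of the talk's comparison: as inputs to an ML classifier, or as conditions in hand-written categorization rules.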


Chief Knowledge Architect, KAPS Group, USA


Machine Learning & Rules—Either/Or/And


Thursday, November 7: 2:00 p.m. - 2:45 p.m.

Indexing Psychological Content: Striving for Consistency & Specificity
2:00 p.m. - 2:45 p.m.

The American Psychological Association’s PsycINFO databases release around 3,000 records per month. In June 2017, a plan was created to bring machine-aided indexing (MAI) back to the APA’s PsycINFO databases. Since then, MAI has been implemented across three of the databases, including PsycARTICLES. Pearson discusses the strategy used to build the rule base and integrate the software into the production system. He also looks at some of the challenges faced along the way and explores future goals and further deployment plans.


Machine-Aided Indexing Specialist, American Psychological Association

Use Semantic Models to Provide Context for Machine Learning
2:00 p.m. - 2:45 p.m.

The world is looking for a fast solution to analyzing text, and certainly we've made progress with AI and machine learning (ML) tools. But those options still tend to yield broader, or more general, results than enterprises need. Layering ML techniques with semantic models can give users highly precise classifications for search, recommendations, and data management by providing context for data. In this session, see how semantic models drive better text analysis—either by creating a training dataset for machine learning operations or by using the models on their own.
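The first use mentioned above, letting a semantic model generate ML training data, can be sketched with an invented two-concept taxonomy (purely illustrative): each concept's preferred labels and synonyms are flattened into (text, label) pairs a classifier could train on, so the model learns the enterprise's own precise vocabulary rather than generic categories.

```python
# Hypothetical mini-taxonomy: concept label -> known phrases/synonyms.
TAXONOMY = {
    "Retail Banking": ["checking account", "savings account", "debit card"],
    "Wealth Management": ["portfolio", "estate planning"],
}

def training_pairs(taxonomy):
    """Flatten concept -> synonyms into (text, label) training examples."""
    return [(phrase, label)
            for label, phrases in taxonomy.items()
            for phrase in phrases]

pairs = training_pairs(TAXONOMY)
```

Even this trivial seed set anchors an ML classifier to the taxonomy's exact category names, which is what makes the resulting classifications precise enough for enterprise search and data management.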


Principal, Bacon Tree Consulting


Knowledge Graphs: A Foundation for Search & Beyond


Thursday, November 7: 3:00 p.m. - 3:45 p.m.

Lessons Learned Building a Knowledge Graph for IDB
3:00 p.m. - 3:45 p.m.

Like many enterprises, the Inter-American Development Bank (IDB) has multiple information sources isolated in different systems, with no links between them that would make these resources accessible outside their native systems. It is not possible to relate distinct kinds of resources that share characteristics, e.g., to find a course about the same topic as a publication. To address this, IDB implemented a system that automatically extracts entities and concepts from its systems, covering both structured and unstructured data, semantically enhances the data, and makes it accessible in a knowledge graph. Hernandez and Marino share lessons learned from this project that can help attendees start from a baseline of best practices for their own projects, saving valuable time and money.


Senior Project Manager, Inter-American Development Bank

Senior Consultant, Enterprise Knowledge

Using Text Analytics & Knowledge Graphs for Natural Language Search
3:00 p.m. - 3:45 p.m.

Products like Amazon Alexa and Google Home are changing expectations of how search should work. Searchers now expect voice-driven search solutions that provide answers, not just a list of links. This talk shares how knowledge graphs enable natural language search and how text analytics, along with machine learning, can be used to populate these powerful constructs. We explain how to architect these solutions and provide real-world examples of how many of our clients have taken advantage of these powerful tools.


Principal, Enterprise Knowledge LLC
