Text Analytics Forum 2018

Keynotes: Thursday

Keynote - Intra-Preneurship & Learning: Tips & Tools

Thursday, November 8: 8:45 a.m. - 9:30 a.m.

Organizations can use game design techniques to fully engage customers, partners, and employees. When well implemented, gamification can transform a work culture by cultivating deep emotional connections, high levels of active participation, and long-term relationships that drive knowledge sharing, learning, and business value. Enterprises can use strategy games, simulation games, and role-playing games as a means to teach, drive operational efficiencies, and innovate. Find out how organizations have embraced social collaboration using playful design to reap tremendous value, grab tips and tools to build a learning culture, and learn how to engage your community!

Speaker:

Phaedra Boinodiris, Global Leader for Responsible AI, IBM Consulting, & 2026 Women in AI Honoree

Keynote - Semantic AI

Thursday, November 8: 9:30 a.m. - 9:45 a.m.

Semantic-enhanced artificial intelligence is based on the fusion of semantic technologies and machine learning. A leader in the field, Blumauer discusses six core aspects of semantic-enhanced AI and why semantics should be a fundamental element of any AI strategy. He walks through concrete examples and shares how semantic enrichment can increase the precision of machine learning tasks. Semantic AI is the next generation of artificial intelligence. Understand how machine learning (ML) can help extend knowledge graphs and, in return, how knowledge graphs can help improve ML algorithms. This integrated approach ultimately leads to systems that work like self-optimizing machines after an initial setup phase, while being transparent to the underlying knowledge models.

Speaker:

Andreas Blumauer, SVP Growth, Graphwise

Keynote - Intelligent Answers: NLP & Machine Learning for Customer & Employee Satisfaction

Thursday, November 8: 9:45 a.m. - 10:00 a.m.

Answers are the key exchange between customer and provider in support, service, and sales, yet that intersection is fraught with friction when information isn’t readily available, context is unknown, and time is of the essence. AI-driven technologies such as natural language processing, machine learning, and text analytics can help reduce that friction and create more satisfying experiences for both customer and vendor across any touchpoint, so that the most precise answer is delivered every time. Dwan explores how these technologies do this and shares real-world outcomes from Fortune 1000 companies.

Speaker:

Gerard Dwan, Director of Customer Engagement, Attivio

Track 1, Thursday AM: Technical

New Techniques in Text Analytics

Thursday, November 8: 10:15 a.m. - 11:00 a.m.

Don’t Stop at Stopwords: Function Words in Text Analytics

Most text analysis methods remove stopwords, which largely overlap with the linguistic category of function words, as part of pre-processing. While this makes sense in most use cases, function words can be extremely powerful. Research in the field of language psychology, largely centered on Linguistic Inquiry and Word Count (LIWC), has shown that function words are indicative of a range of cognitive, social, and psychological states. This makes an understanding of function words vital to making appropriate decisions in text analytics. In model design, the differences between the expected distributions of function words and those of content words affect feature engineering. For instance, methods that take as input the presence or absence of a word within a text segment will produce no usable signal when applied to function words, while methods that are sensitive to deviations from expected frequency within a given language context will be highly successful. When interpreting results, analysts must account for differences in the way function and content words are processed neurologically. As awareness of the utility of function words rises within the text analytics community, it is increasingly important to cultivate a nuanced understanding of their nature.
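As a toy illustration of the feature-engineering point above, the sketch below contrasts binary presence features with frequency-deviation features for a few function words. The `REFERENCE_RATES` values are hypothetical placeholders (not LIWC norms), and the crude tokenizer and function names are assumptions for illustration; this is not Receptiviti's method.

```python
from collections import Counter

# Hypothetical expected rates (occurrences per token) for a few English
# function words; a real system would estimate these from a baseline corpus.
REFERENCE_RATES = {"i": 0.02, "we": 0.005, "the": 0.05}
PUNCT = ".,!?;:\"'"


def tokenize(text: str) -> list[str]:
    """Crude lowercase whitespace tokenizer, for illustration only."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def presence_features(text: str) -> dict[str, int]:
    """Binary features: 1 if the word occurs at least once in the segment."""
    vocab = set(tokenize(text))
    return {w: int(w in vocab) for w in REFERENCE_RATES}


def deviation_features(text: str) -> dict[str, float]:
    """Observed rate minus expected rate for each function word."""
    tokens = tokenize(text)
    if not tokens:
        return {w: 0.0 for w in REFERENCE_RATES}
    counts = Counter(tokens)
    n = len(tokens)
    return {w: counts[w] / n - rate for w, rate in REFERENCE_RATES.items()}


a = "I think I should go, and I know the plan is fine."
b = "The team reviewed the plan and I agreed with the outcome."
print(presence_features(a) == presence_features(b))  # True: no signal
print(deviation_features(a)["i"] > deviation_features(b)["i"])  # True
```

Both segments contain "i" and "the", so their presence features are identical, while the deviation features separate the first-person-heavy segment from the other.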

Speakers:

Kiki Adams, Head of Science, Receptiviti

Shayna Gardiner, Computational Linguist & Data Scientist, Receptiviti

Using Structure in Taxonomy-Based Auto-Categorization

The basic premise of taxonomy and text analytics work is to impose structure on—or reveal structure in—unstructured content. Despite being called “unstructured,” much workplace information can be described as semi-structured, as there is always some level of organization in even the most basic content formats. For example, in a workplace document you will likely find titles, headers, sentences, and paragraphs, or at least a clear indicator of the beginning and end of a large block of text. Similarly, taxonomies and ontologies are artificial constructs which may reflect the information they describe or be imposed as a form of ordering on semi-structured content. In this session, attendees hear case studies about using the contextual structure of taxonomies and ontologies and the various structural indicators in text to perform taxonomy-based content auto-categorization and information extraction.

Speaker:

Ahren Lehnert, Senior Taxonomist, Genentech

Entity Extraction

Thursday, November 8: 11:15 a.m. - 12:00 p.m.

The Frontier of Named Entity Recognition

This presentation serves as an overview of current issues with named entity recognition in text analytics, focusing on work done beyond the categories of people, place, organization, and other elements that are (relatively) easily extracted through current processes. It covers areas of ongoing research, issues, and ideas about their potential benefits to taxonomy and ontology development.

Speaker:

Brian Goss, Taxonomist, EBSCO Information Services

Beyond NER: Concept Extraction Using Semantic Structure

Traditional approaches to concept and relationship extraction focus either on pure statistical techniques or on detecting and extending noun phrases. This talk outlines an alternative approach that identifies multiword concepts and the relationships between them, without requiring any predefined knowledge about the text’s subject. We demonstrate a number of capabilities built using this approach, including ontology learning, intelligent browsing, semantic search, and text categorization.

Speakers:

Jeff Fried, Director, Platform Strategy & Innovation, InterSystems

Dirk Van Hyfte, Senior Advisor, Biomedical Informatics, InterSystems

Track 2, Thursday AM: Applications

Practical AI

Thursday, November 8: 10:15 a.m. - 11:00 a.m.

Assisted Intelligence: Bridging the Gap Between Technology & Humans to Improve the Bottom Line

AI hype is rapidly reaching the C-suites of organizations around the world, and with good reason: the promise is compelling. The convergence of AI, robotic process automation (RPA), machine learning, and cognitive platforms allows employees to focus on exception processing and higher-value work while digital labor handles low-value, repetitive tasks. While the debate over whether digital labor will add or eliminate jobs is ongoing, what matters in today’s enterprise is how digital and human labor can be integrated to improve efficiency and drive innovation. Using real-world examples, this session covers how machine processing, when guided by human knowledge, curation, and control, provides assisted intelligence (AI) to organizations that want to streamline processes, reduce operating costs, and improve ROI.

Speaker:

Jeremy Bentley, Head, Strategy, MarkLogic

Leveraging AI, NLP, & Text Analysis to Deliver a Better Customer Experience

Traditional knowledge platforms are not capable of effectively understanding unstructured information due to the complexity of language and the lack of structure. Therefore, they cannot effectively organize disparate sources of knowledge (marketing material, customer service content, emails, chat logs, social media chatter, customer response surveys, internal documentation, etc.) in any meaningful way. Addressing the complexity of language requires more than keywords, spending weeks and months manually tagging data, or locating topic-specific content in an effort to train machine learning algorithms. This presentation explains the concepts behind an AI/cognitive computing platform, what makes it work, and how it can be deployed as a smart infrastructure to support a variety of business objectives. It will include a demonstration of an English-language knowledge graph (ontology), a customer support self-service mobile solution, and a smart content navigation portal.

Speaker:

Bryan Bell, Regional Vice President of Sales, Lucidworks

Structuring Documents

Thursday, November 8: 11:15 a.m. - 12:00 p.m.

Using NLP, ML & Graph Databases to Automate a Documents Review Process

FINRA receives hundreds of thousands of documents of various kinds each year from stockbrokers and investors, to be reviewed and analyzed by our investigators. The investigators look for information about the who, what, where, when, and how contained in these documents, which is labor-intensive work. Our solution was to develop a system that leverages NLP, machine learning, and graph databases. An enhanced NER model combined with our custom entity resolution algorithms allowed us to extract individuals, organizations, and FINRA-specific entities and to map these entities into FINRA’s business systems. Entities were loaded into a Titan graph database that supported navigation between documents, individuals, and organizations, visually highlighting hard-to-see patterns and insights. In addition, our NLP process allowed us to generate document summaries. This system significantly improved the effectiveness and comprehensiveness of investigators’ document reviews.

Speakers:

Dmytro Dolgopolov, Senior Director, Financial Industry Regulatory Authority (FINRA)

Greg Wolff, Enterprise Software Architect, Financial Industry Regulatory Authority (FINRA)

Content Analytics for Duplicative Research Detection

To reduce waste in DoD research, DTIC is developing a document similarity application to identify forms of fraud, waste, and abuse, such as equivalent work being done by different services. The document similarity tool will give DTIC the capability to apply content analytics against a large collection of documents, including requests for proposals, proposals, technical reports, and project descriptions. The challenge goes beyond identifying simple copy-and-paste duplication. In this presentation, we discuss our hybrid approach to evaluating document similarity, which combines multiple methods, including vector space models, semantic similarity, and a novel approach to text analytics called trains-of-thought analysis. We also demonstrate the web-based application, including real-time document similarity analysis and data visualizations that speed the finding and assessment of similar content.
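A minimal sketch of the vector-space component only, using raw term-frequency vectors and cosine similarity. The sample texts are invented, the function names are hypothetical, and DTIC's semantic similarity and trains-of-thought components are not represented; a production system would more likely use TF-IDF weighting and a tuned threshold for flagging pairs.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector over lowercased, punctuation-stripped tokens."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return Counter(t for t in tokens if t)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = term_vector("Develop a hypersonic propulsion test facility")
doc2 = term_vector("A test facility for hypersonic propulsion research")
doc3 = term_vector("Survey of battlefield medical logistics")
print(round(cosine_similarity(doc1, doc2), 3))  # 0.772
print(cosine_similarity(doc1, doc3))  # 0.0: no shared terms
```

Because this measure only compares word counts, it catches overlapping vocabulary but misses paraphrases that use different terms, which is one reason to combine it with semantic methods.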

Speakers:

Hany Mohammed, Senior Information Architect, Defense Technical Information Center (DTIC)

Lowell Vizenor, CTO, Defense Technical Information Center (DTIC)

Keynotes

Luncheon & Keynote - AI & the Future of Knowledge

Thursday, November 8: 12:00 p.m. - 1:00 p.m.

AI is on the highest rung of the IT agenda. But how does it support professionals’ needs for insights in decision-making? Mayer looks at text analytics, the particular strand of AI that deals with language, the essential vehicle for professional knowledge. Through examples of its impact in insurance, media and the sciences, he illustrates “the art of the possible” and how you can make AI part of your knowledge practice’s roadmap.

Speaker:

Daniel Mayer, CMO, Expert System Enterprise

Track 1, Thursday PM: Technical

Modeling Tacit Knowledge & People

Thursday, November 8: 1:00 p.m. - 1:45 p.m.

Text Analytics & KM—From Expertise Location to Community Support

The Inter-American Development Bank is a multilateral public sector institution committed to improving lives in Latin America and the Caribbean. Human capital may be the institution’s most important resource for realizing its vision: the knowledge of its roughly 5,000 employees is spread across offices in 29 countries throughout the Americas, Europe, and Asia. The IDB’s knowledge management division led an 8-week proof of concept (POC) that used natural language processing techniques to create explicit representations of its employees’ tacit knowledge and make those representations searchable. Identifying and representing people’s knowledge is a complex task, partly because the variables used to determine knowledge have ambiguous definitions. These and other considerations are what make this POC so different from a simple skills database or profile search. This presentation details our experience with the project and how NLP allowed us to successfully create approximations of IDB personnel knowledge and turn them into machine-searchable knowledge entities.

Speakers:

Kyle Strand, Lead Knowledge Management Specialist and Head of Library, Inter-American Development Bank (IDB)

Daniela Collaguazo, Text Analytics Consultant, Knowledge Innovation Communication Department, Inter-American Development Bank

Text vs. Anomalous States of Knowledge

Information retrieval can be seen as matching the intellectual content represented in documents to a knowledge gap in the mental map of a searcher. For decades, most of the focus of information retrieval research, whether in academia or in commercial systems, has been on improving the representation of documents, or collections of documents. Less attention has been paid to representing the searcher’s information need, or knowledge gap. This knowledge gap was characterized by Belkin, Brooks, and Oddy as an Anomalous State of Knowledge. This talk will describe the theory and practice of this concept and how it can be utilized to enhance information retrieval.

Speaker:

Paul Thompson, Instructor, Geisel Medical School, Dartmouth College

Auto-Tagging Methods

Thursday, November 8: 2:00 p.m. - 2:45 p.m.

The New Generation of Text Analytics

Advances in machine learning have led to an evolution in the field of text analytics. As these and other AI technologies are incorporated into business processes at organizations around the world, there’s an expectation that intelligent automation will lead to improvements like increased operational efficiency, enriched customer engagements and faster detection of emerging issues. How will technology meet that demand? How can we combine the expertise of humans with the speed and power of machines to analyze unstructured text that’s being generated at an unprecedented rate? Find out in this talk from Mary Beth Moore, who will share stories about text analytics being used to augment regulatory analysis, improve product quality and fight financial crimes.

Speaker:

Mary Beth Moore, Global Product Marketing Manager for AI, Text Analytics, SAS

Should We Consign All Taxonomies to the Dustbin?

The advent of unsupervised machine-learning algorithms makes it possible for content owners to index their content without a taxonomy. This means that publishers face a challenge: Do you maintain your existing taxonomies or replace them with a full ML approach? Or is there a way to combine the two? This talk looks at case studies that have implemented different solutions, including publishers with private taxonomies used by organizations and the use of large-scale public controlled vocabularies such as MeSH.

Speaker:

Michael Upshall, Head of Business Development, UNSILO, Denmark

The Human Factor in Machine Learning

Thursday, November 8: 3:00 p.m. - 3:45 p.m.

Optimizing Hand-Annotated Training Data for Machine Learning

Machine learning models often depend on large amounts of training data for supervised learning tasks. This data may be expensive to collect, especially if it requires human labeling, which raises particular quality issues: How do you ensure that agreement among human annotators is high, and what do you do if it is not? And when your data is expensive to tag, how do you ensure that you have the smallest possible set that is representative of all your features? This talk addresses these and other issues associated with gathering hand-coded datasets for supervised machine-learning models, especially models run on textual data.
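The abstract does not name an agreement measure; one common choice for two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch, assuming categorical labels from exactly two annotators over the same items; the function name and sample labels are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined, and this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


ann_1 = ["pos", "pos", "neg", "neg"]
ann_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(ann_1, ann_2))  # 0.5
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, which is a cue to revisit the annotation guidelines before collecting more labels.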

Speaker:

Leslie Barrett, Senior Software Engineer, Bloomberg, LP

Triage First, Analysis Second

A U.S. intelligence community researcher recently declared, “Analytics is my second priority.” We have long passed the point where even “medium data” projects exceed the capacity of human analysts to actually read the corpus. Yet “human in the loop” is essential to ensuring quality in machine analytics. Thus his first priority, and ours, becomes effective triage: determining which text warrants human attention, which should be condensed by automated means, and which may best be disregarded as valueless or actively malign. We model the text analytic process as a succession of tiered steps, each with its own accuracy rates. While we classically think of text analytic accuracy in favorable terms as “precision and recall,” their inverses are “false positives and false negatives.” We explore how the initial high-volume, automated steps can best tune their accuracy trade-offs to optimize the later, human-moderated steps.
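To make the tiered model concrete, here is a small sketch that propagates expected counts of relevant and irrelevant documents through two automated tiers before human review. The tier names, rates, and corpus sizes are invented for illustration, and treating each tier's rates as independent is a simplifying assumption; this is not Basis Technology's model.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    recall: float  # fraction of relevant items this tier passes on
    false_positive_rate: float  # fraction of irrelevant items it passes on


def run_tiers(relevant: float, irrelevant: float, tiers: list[Tier]):
    """Propagate expected item counts through successive filtering tiers.

    Assumes each tier's rates apply independently to whatever reaches it.
    Returns (relevant, irrelevant, precision) for the items left for humans.
    """
    for tier in tiers:
        relevant *= tier.recall
        irrelevant *= tier.false_positive_rate
    total = relevant + irrelevant
    precision = relevant / total if total else 0.0
    return relevant, irrelevant, precision


tiers = [
    Tier("keyword/language filter", recall=0.99, false_positive_rate=0.05),
    Tier("trained classifier", recall=0.90, false_positive_rate=0.10),
]
rel, irr, prec = run_tiers(1_000, 999_000, tiers)
print(f"{rel:.0f} relevant, {irr:.0f} irrelevant, precision {prec:.3f}")
# 891 relevant, 4995 irrelevant, precision 0.151
```

End-to-end recall is the product of the tier recalls (0.99 × 0.90 = 0.891 here), so a relevant item dropped at an early tier can never be recovered downstream, while early false positives only cost human reading time later.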

Speaker:

Christopher Biow, SVP, Global Public Sector, Basis Technology

Track 2, Thursday PM: Applications

Question & Answering Systems

Thursday, November 8: 1:00 p.m. - 1:45 p.m.

Automated Question-Answering System on Clinical Drug Trials Using NLP

Using NLP and linguistics, Saini presents Sapient’s work to develop a question-answering system based on unsupervised learning. This talk showcases a demo on real-life data and explains the process of building such automated QnA systems. It also discusses the shortcomings of chatbot systems and how they can be integrated with QnA systems to make them scalable.

Speaker:

Anuj Saini, Architect NLP, Sapient Corp.

Sherlock Holmes: A Question & Answering Machine

Sanjani introduces a semantic search and browse tool that aims to help researchers at the IMF with their studies by finding relevant papers, authors, and concepts.

Speaker:

Marzie Taheri Sanjani, US Head of Quantitative Macro Research, Global Macro Advisers and SPX

Applications From Space to Human Trafficking

Thursday, November 8: 2:00 p.m. - 2:45 p.m.

Understanding International Crew Perspective Following Long Duration Missions

Meza presents a discussion on how the use of an analytical framework in conjunction with the current human interface improved the understanding of the International Space Station crew perspective data and shortened the analysis time, allowing for more informed decisions and rapid development improvements.

Speaker:

David Meza, Chief Knowledge Architect, NASA Johnson Space Center

Developing the Emerging Chemical Hazard Intelligence Platform for CFSAN’s Signal Detection Program

The Center for Food Safety and Applied Nutrition at the U.S. Food & Drug Administration (CFSAN/FDA) has been piloting a new process to identify, prioritize, and address potential emerging chemical hazards of concern that may be associated with CFSAN-regulated products. The objective has been to develop a business solution that enables analysts to identify predictors that are indicative of emerging chemical hazards associated with CFSAN-regulated products. This presentation reviews how CFSAN leverages SAS capabilities such as text analytics, entity extraction, predictive modeling, and business intelligence, combined with access to a variety of data sources, in our approach to build the Emerging Chemical Hazard Intelligence Platform (ECHIP), CFSAN’s solution for identifying emerging chemical hazards. We discuss how we developed an integrated solution that enables our analysts to quickly filter, visualize, and identify trends in reports that are indicative of potential chemical hazards.

Speaker:

Emily McRae, Systems Engineer, SAS

Using Text Analytics to Assess International Human Trafficking Patterns

This presentation showcases a strategy of applying text analytics to explore the Trafficking in Persons (TIP) reports and apply new layers of structured information. Specifically, it identifies common themes across the reports, uses topic analysis to identify structural similarities across reports, identifies source and destination countries involved in trafficking, and uses a rule-building approach to extract these relationships from free-form text. We subsequently depict these trafficking relationships across multiple countries using a geographic network diagram that covers the types of trafficking as well as whether the countries involved are invested in addressing the problem. This ultimately provides decision makers with big-picture information about how best to combat human trafficking internationally.

Speaker:

Tom Sabo, Advisory Solutions Architect, SAS Institute Inc.

All the Use Cases

Thursday, November 8: 3:00 p.m. - 3:45 p.m.

Current Methodologies for Supervised Machine Learning in E-Discovery

Since supervised machine learning gained court acceptance for use in e-discovery 6 years ago, best practices have evolved. This talk describes the special circumstances of e-discovery and the best approaches that are currently in use. How robust is the Continuous Active Learning (CAL) approach? How much impact does the choice of seed documents have? What are SCAL and TAR 3.0?

Speaker:

Bill Dimm, Founder & CEO, Hot Neuron LLC

What a Mess: From OCR to Structured Text (Case Study)

This short case study describes a recent project with a special collection from a major university library that posed a fascinating challenge: Given scanned images (and OCR) of 14,000 typewritten and handwritten Cuban catalog cards, how can we extract structured text, index the content, and build XML records from this source data? Using a variety of text analytics techniques, including both Boolean and Bayesian approaches, we were able to identify, extract, and structure the targeted elements accurately enough to create a dataset that required minimal manual cleanup.

Speaker:

Bob Kasenchak, Information Architect, Factor, USA

Closing Keynotes

Keynote - Engage Employees With CoPs in Office 365

Thursday, November 8: 4:00 p.m. - 4:15 p.m.

The intersection of knowledge sharing and new ways of learning and training is affecting how connected your employees feel to your organization at large. Moneypenny demonstrates how using video, social networks, and content collaboration together empowers knowledge practitioners, experts, and people across the organization to engage with each other. Learn how to foster a culture of curiosity and share learning and best practices while improving employee experience.

Speaker:

Naomi Moneypenny, Director, Product Development, Microsoft Viva, Microsoft

Closing Keynote - KMWorld Conversations With Leading Thinkers

Thursday, November 8: 4:15 p.m. - 5:00 p.m.

What are the chances of three thought leaders meeting in the same room, at the same terminal, in the same airport, in the same city by coincidence? Hear their story and many more as they discuss the impact of social media, organizational culture, machine learning, demographics and more!

Speakers:

Dave Snowden, Founder & Chief Scientist, The Cynefin Company

Tom Stewart, Executive Director, National Center for the Middle Market, Fisher College of Business, The Ohio State University

Leif Edvinsson, World's First Professor Emeritus on Intellectual Capital, Lund University and Hong Kong Polytechnic University and formerly with Skandia & Author, Intellectual Capital: Realizing Your Company’s True Value by Finding Its Hidden Brainpower

Conference Program - Day 2: Thursday, November 8, 2018
