Increasing the accessibility of NLP techniques for Defence and Security using a web-based tool

Paxton-Fear, Katie

doi:10.17862/cranfield.rd.10066229.v1

Paxton-Fear_Poster 7_DSDS19.pdf (3.24 MB)

poster

posted on 2019-11-19, 15:39 authored by Katie Paxton-Fear

As machine learning becomes more common in defence and security, there is a real risk that the low accessibility of techniques to non-specialists will hinder the process of operationalising the technologies. This poster presents a tool that supports a variety of Natural Language Processing (NLP) techniques, including the management of corpora (data sets of documents used for NLP tasks), the creation and training of models, and the visualisation of model output. The aim of this tool is to allow non-specialists to exploit complex NLP techniques to understand the content of large volumes of reports.

NLP techniques are the mechanisms by which a machine can process and analyse text written by humans. These methods can be used for a range of tasks including categorising documents, translation and summarising text. For many of these tasks the ability to process and analyse large corpora of text is key. With current methods, the management of corpora is rarely considered; researchers and practitioners are instead left to do this manually in their file system. To train models, researchers typically write ad hoc scripts or code, then compile them or run them through an interpreter. These approaches can be a challenge when working in multidisciplinary fields, such as defence and security and cyber security. This is even more salient when delivering research where outputs may be operationalised, because accessibility can be a limiting factor in their deployment and use.

We present a web interface that uses an asynchronous service-based architecture to enable non-specialists to easily manage multiple large corpora and to create and operationalise a variety of different models. At this early stage we have focussed on one NLP technique: topic models.
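As a rough illustration of what an asynchronous, service-style workflow can look like, the sketch below registers a corpus, submits a training job that returns an identifier immediately, and lets the caller poll for completion. It is not the tool's implementation: the class `ModelService`, its methods, and the corpus name `incidents` are all hypothetical, and the "training" step is a simple word count standing in for a real topic model.

```python
import asyncio
import uuid
from collections import Counter


class ModelService:
    """Hypothetical in-process stand-in for an asynchronous training service."""

    def __init__(self):
        self.corpora = {}    # corpus name -> list of document strings
        self.jobs = {}       # job id -> {"status": ..., "result": ...}
        self._tasks = set()  # hold references so background tasks are not collected

    def add_corpus(self, name, documents):
        self.corpora[name] = list(documents)

    def submit_training(self, corpus_name):
        """Register a job and return its id at once; the work runs in the background."""
        if corpus_name not in self.corpora:
            raise KeyError(corpus_name)
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"status": "queued", "result": None}
        task = asyncio.create_task(self._train(job_id, corpus_name))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _train(self, job_id, corpus_name):
        # Placeholder for model training: counts words, NOT a topic model.
        self.jobs[job_id]["status"] = "running"
        counts = Counter()
        for doc in self.corpora[corpus_name]:
            counts.update(doc.lower().split())
            await asyncio.sleep(0)  # yield so other requests could be served
        self.jobs[job_id]["result"] = counts.most_common(3)
        self.jobs[job_id]["status"] = "done"

    def status(self, job_id):
        return self.jobs[job_id]["status"]

    async def wait(self, job_id, poll_interval=0.01):
        """Poll until the job finishes, then return its result."""
        while self.jobs[job_id]["status"] != "done":
            await asyncio.sleep(poll_interval)
        return self.jobs[job_id]["result"]


async def demo():
    service = ModelService()
    service.add_corpus(
        "incidents",
        ["staff access misuse", "trusted access abused", "access revoked late"],
    )
    job_id = service.submit_training("incidents")
    status_at_submit = service.status(job_id)  # job has not started yet
    result = await service.wait(job_id)
    return status_at_submit, service.status(job_id), result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the pattern is that submitting a job never blocks the interface: a web front end could show the job as queued or running and fetch the result later, which matters when corpora are large and training is slow.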

This tool support has been created as part of a project considering the use of NLP to better understand reports of insider threat attacks. These are security incidents where the attacker is a member of staff or another trusted individual. Insider threat attacks are particularly difficult to defend against due to the level of access these individuals gain during the regular course of their employment. The wider use of NLP techniques would generate greater impact both tactically, in defending against these attacks, and strategically, in developing policy and procedures. Tools are available; however, they are often complex and perform a single task, limiting their use. To generate maximum impact from our research we have developed this web-based software to make the tools more accessible, especially to non-specialist researchers, customers and potential users.