General Architecture for Text Engineering

Submitted by kbontcheva on Mon, 07/22/2019 - 11:59

GATE provides models and algorithms for social media analysis, Information Extraction (IE), machine learning for IE, knowledge graphs and semantic annotation, and Natural Language Processing (NLP) as a service. In total, the infrastructure and all its components and models comprise over 350,000 lines of code.

The GATE infrastructure is unique in offering both researchers and companies a comprehensive NLP platform-as-a-service (GATE Cloud to …).

In the past year it attracted over 290 registered users, who used the services over 37,000 times. Large-scale IE is a particular strength: GATE is used as the platform for extracting information from the web, newswires, scientific papers, and legal and medical documents.

Selected example users include: Matrixware, Austria, which developed robust, scalable IE from patents (1 million in funding); the BBC, which used GATE in 2012 to automate its coverage of the Olympic Games through semantic annotation; and WHO epidemiologists, who found the first gene-disease association linking lung cancer and smoking. Other users in the UK include Garlik (from the founders of Egg Plc), to fight identity theft; Innovantage, for intelligent recruiting; Fizzback, for analysing customer feedback; the UK National Archives; the Press Association; the Financial Times; the Stationery Office; Nesta; TechCity UK; Synaptica; Text Mining Solutions; BuzzFeed UK; and Public Health England.

The most recent innovation objective has been work with King’s College London and the South London and Maudsley NHS Trust on an NHS Text Analytics Platform. Current active users are the following NHS Foundation Trusts: South London and Maudsley, King’s College Hospital, University College London Hospitals, Oxford Health, and Camden and Islington, together with Swansea Medical School and the Connected Health Cities project. Various MRC and other fellowships also use GATE at other sites. Past users include Public Health England, and discussions are ongoing with University Hospitals Birmingham, NHS Digital, and Sheffield Teaching Hospitals.

Another key goal is to continue offering some of the best-performing social media analysis tools. These currently include part-of-speech tagging and named entity recognition, as well as groundbreaking machine learning methods for analysing rumours on social media.


Biological Sciences, Health & Food
E-Infrastructure & Data
Physical Sciences & Engineering
Social Sciences, Arts & Humanities
Natural language processing
Artificial intelligence
Misinformation analysis
Social media analysis



Contact Us

Telephone Number
0114 222 1930
Host Organisation
University of Sheffield
Address

Regent Court, Department of Computer Science
University of Sheffield
Sheffield
S1 4DP
United Kingdom
