Introduction

The previous post covered how to use the software to extract entities from unstructured text and classify them as persons, locations, organizations and others. However, the software is still far from perfect, so some misrecognitions and misclassifications have to be fixed manually. The following steps explain how to do it.

Click on Analyzer > Review and fix entities.

Option used to fix entities
Click on Load available corpora.

Looking for the corpus we are working on
Click on the corpus we want to fix.

Selecting the corpus whose entities will be fixed

After clicking on the corpus, you will see all the entities that belong to it.

Entities that belong to the selected corpus

Fixing misrecognitions

This step deletes words that have been recognized as entities by mistake. To delete these wrong entities, select them by clicking their check-boxes (located on the right-hand side) and then click on Delete selected entities.
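
For readers who prefer to see the operation as code, here is a minimal sketch of what deleting the selected entities amounts to. It is not the software's actual implementation: the `Entity` class, the `delete_selected` function and the "PER" and "LOC" labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for one row of the entity table."""
    entity_id: int
    name: str
    entity_type: str  # assumed labels such as "PER", "LOC", "ORG"


def delete_selected(entities, selected_ids):
    """Return the entities whose id is not among the ticked check-boxes."""
    selected = set(selected_ids)
    return [e for e in entities if e.entity_id not in selected]


entities = [
    Entity(1, "Lima", "LOC"),
    Entity(2, "However", "PER"),    # an ordinary word picked up by mistake
    Entity(3, "Yesterday", "ORG"),  # another false positive
]

# Ticking the check-boxes of rows 2 and 3, then deleting them.
remaining = delete_selected(entities, selected_ids=[2, 3])
print([e.name for e in remaining])
```

The function returns a new list instead of changing the original one, so the deletion could be reviewed before it is saved.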

Selecting entities to be deleted

If you are not sure whether an item is really an entity, just double-click on its name to read the sentence where it appears.
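
The lookup behind the double click can be approximated outside the tool as well. The sketch below assumes a naive sentence splitter and a whole-word match. The function name `sentences_mentioning` and the sample text are made up for illustration, and the software may well do this differently.

```python
import re


def sentences_mentioning(text, entity_name):
    """Return the sentences of `text` that contain `entity_name`."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Whole-word match, so "Lima" does not match inside a longer word.
    pattern = re.compile(r"\b" + re.escape(entity_name) + r"\b")
    return [s for s in sentences if pattern.search(s)]


text = (
    "The meeting took place in Lima. However, nobody took notes. "
    "The delegates left the next day."
)
for sentence in sentences_mentioning(text, "However"):
    print(sentence)
```

Reading the sentence makes it clear that "However" is an ordinary word here and can be deleted safely. The splitter is deliberately simple and will stumble on abbreviations such as "U.S.".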

Getting the sentence where this instance appears

Fixing misclassifications

Here we fix entities that have been misclassified. For example, in the following picture a Peruvian party has been classified as a location. To fix it, click the cell under “Entity type” and choose the proper type, which in this case is “ORG”, then click on Update modified entities.
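
The same correction can be pictured as a small update over the entity table. This is only a sketch under assumptions: `update_entity_types`, the dictionary layout, the label set other than "ORG", and the placeholder name "Example Party" are all hypothetical and do not come from the software.

```python
ALLOWED_TYPES = {"PER", "LOC", "ORG", "OTHER"}  # assumed label set


def update_entity_types(entities, changes):
    """Apply {entity name: new type} edits and return the updated names."""
    updated = []
    for entity in entities:
        new_type = changes.get(entity["name"])
        if new_type is None or new_type == entity["type"]:
            continue  # untouched row, nothing to update
        if new_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown entity type: {new_type!r}")
        entity["type"] = new_type
        updated.append(entity["name"])
    return updated


entities = [
    {"name": "Example Party", "type": "LOC"},  # placeholder for the party
    {"name": "Lima", "type": "LOC"},
]

# Choosing "ORG" in the "Entity type" cell, then updating the modified rows.
changed = update_entity_types(entities, {"Example Party": "ORG"})
print(changed)
print(entities[0])
```

Only rows whose type actually changed are reported, which mirrors the idea of updating just the modified entities.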

Fixing a miss-classified entity


Conclusion

This post has shown how to fix misrecognitions and misclassifications. Even though this is a cumbersome process, it is still better than a purely manual one. The next post is coming soon and will be about clustering entities.