
spaCy KeyError: [E022]

Debugging is an art, and troubleshooting errors builds a deeper understanding of technical nuances than any tutorial ever could. Yet, at times, the error-resolution process can be both exhausting and seemingly endless.

While training a blank spaCy model (spacy.blank("en")), I encountered the following error:

KeyError                                  Traceback (most recent call last)
Cell In[135], line 34, in train_model_with_epochs(num_epochs, output_dir)
     30 #optimizer = nlp.begin_training()
     31 optimizer = nlp.create_optimizer()
     32 examples.append(example)
     33 print(examples)
---> 34 nlp.update(examples, sgd=optimizer, losses=losses)
     35 timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
     36 file.write("{} - Iteration : {} - Losses: {}\n".format(timestamp, _, losses))

KeyError: "[E022] Could not find a transition with the name 'O' in the NER model."

While working to understand and resolve this error, I came across diverse explanations for what could cause it. However, none of these solutions were consolidated in one place, which made troubleshooting time-consuming. This blog consolidates the potential causes and their respective resolutions for the benefit of others facing similar challenges.

Here is an illustration of a training example and the corresponding labels list:

example = {
    'doc_annotation': {
        'cats': {},
        'entities': ['O', 'O', 'O', 'B-org', 'L-org', 'O', 'O', 'O'],
        'spans': {},
        'links': {}
    },
    'token_annotation': {
        'ORTH': ['A', 'N', 'NUM', 'Kashmir', 'University', 'counter', 'near', 'NUM'],
        'SPACY': [True, True, True, True, True, True, True, False],
        'TAG': ['', '', '', '', '', '', '', ''],
        'LEMMA': ['', '', '', '', '', '', '', ''],
        'POS': ['', '', '', '', '', '', '', ''],
        'MORPH': ['', '', '', '', '', '', '', ''],
        'HEAD': [0, 1, 2, 3, 4, 5, 6, 7],
        'DEP': ['', '', '', '', '', '', '', ''],
        'SENT_START': [1, 0, 0, 0, 0, 0, 0, 0]
    }
}

labels = ['O', 'org']
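Before passing such examples to nlp.update, a quick sanity check (plain Python, no spaCy required) can confirm that every token carries exactly one BILOU tag; a length mismatch here is a common precursor to alignment errors:

```python
# Values copied from the training example above
doc_annotation = {'entities': ['O', 'O', 'O', 'B-org', 'L-org', 'O', 'O', 'O']}
token_annotation = {'ORTH': ['A', 'N', 'NUM', 'Kashmir', 'University',
                             'counter', 'near', 'NUM']}

# Each token must carry exactly one BILOU tag
assert len(doc_annotation['entities']) == len(token_annotation['ORTH'])
print("tags and tokens are aligned")
```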
  1. Disparity in labels: Inconsistencies can arise from issues in the labels list itself, such as a missing comma, or from discrepancies between the label names and the NER annotations. For instance, the NER annotation uses 'Aircraft_type' while the labels list contains 'Aircraft type':
labels = ['Aircraft type', 'Temperature', 'Wind',
          'Sky Cover', 'Overcast Sky Cover'  # missing comma
          'Haze Flight visibility']
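A small helper (plain Python; the function name is my own) can surface such disparities before training starts, by extracting every entity name used in the BILOU annotations and comparing it against the labels list:

```python
def find_missing_labels(ner_tags, labels):
    """Return entity names used in BILOU tags but absent from the labels list."""
    used = set()
    for tag in ner_tags:
        if tag == 'O':
            continue
        # BILOU tags look like 'B-org', 'I-org', 'L-org', 'U-org'
        used.add(tag.split('-', 1)[1])
    return sorted(used - set(labels))

# 'Aircraft_type' in the annotations does not match 'Aircraft type' in labels
tags = ['O', 'B-Aircraft_type', 'L-Aircraft_type', 'O']
labels = ['Aircraft type', 'Temperature', 'Wind']
print(find_missing_labels(tags, labels))  # → ['Aircraft_type']
```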
  2. Version incompatibility: The spaCy version you are using may support a tagging scheme different from the one your code expects. The common schemes are BIO, BIOES, and BILOU.

BILOU Method/Schema

| Tag   | Description        |
| ------|--------------------|
| BEGIN | The first token of a multi-token entity |
| IN    | An inner token of a multi-token entity |
| LAST  | The final token of a multi-token entity |
| UNIT  | A single-token entity |
| OUT   | A non-entity token |

BIOES - Begin, Inside, Outside, End, Single

BIO - Begin, Inside and Outside. Words tagged with O are outside of named entities
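To make the relationship between the schemes concrete, here is a minimal sketch (the function name is my own) that converts BILOU tags to BIO by folding L- into I- and U- into B-:

```python
def bilou_to_bio(tags):
    """Convert BILOU tags to BIO: L- becomes I-, U- becomes B-."""
    converted = []
    for tag in tags:
        if tag.startswith('L-'):
            converted.append('I-' + tag[2:])
        elif tag.startswith('U-'):
            converted.append('B-' + tag[2:])
        else:
            converted.append(tag)  # 'O', 'B-', and 'I-' tags are unchanged
    return converted

print(bilou_to_bio(['O', 'B-org', 'L-org', 'U-loc', 'O']))
# → ['O', 'B-org', 'I-org', 'B-loc', 'O']
```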

In certain scenarios, errors arise when your code expects None for tokens without annotations but encounters an 'O' instead, leading to confusion and processing difficulties.

  3. Label definition: If the error message reads "Could not find a transition with the name 'B-AnyLabelName' in the NER model", inspect your labels list. Print the list to confirm that 'AnyLabelName' is present; if it is absent, use the following code to add it to the NER component:
ner = nlp.get_pipe("ner")
ner.add_label("AnyLabelName")
  4. Training data headers: When converting a pandas DataFrame into spaCy-specific training data, the DataFrame headers may persist in the training data as a bogus first example. When annotations are then looked up, errors occur because the first entry (the headers) contains none. Inspect the first few training examples to rule this out.
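As a sketch of that check (plain Python; the data and function name are hypothetical), you can filter out any record whose annotation dict does not hold a real entity list:

```python
def drop_header_rows(train_data):
    """Keep only records whose annotation dict holds a real entity list."""
    return [(text, ann) for text, ann in train_data
            if isinstance(ann.get('entities'), list)]

# Hypothetical data: the DataFrame header row slipped into the first record
train_data = [
    ('text', {'entities': 'entities'}),  # header row, no real annotations
    ('Kashmir University counter', {'entities': [(0, 18, 'org')]}),
]

clean = drop_header_rows(train_data)
print(len(clean))  # → 1
```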

  5. Optimizer initialization: Appropriate optimizer initialization is crucial and depends on the type of model employed, whether pre-trained or blank.

# Initializing the optimizer

if model is None:
    optimizer = nlp.begin_training()
else:
    optimizer = nlp.entity.create_optimizer()

What do these mean?

If model is None, the code calls nlp.begin_training() to initialize the optimizer: you are starting with a blank model and training it from the ground up. begin_training() is commonly used in spaCy for exactly this purpose. Conversely, if model is not None, meaning it has already been pre-trained or initialized, the code calls nlp.entity.create_optimizer() to set up the optimizer. This indicates that the model already carries a named entity recognition (NER) component and that you want an optimizer tailored for fine-tuning or training the entity-recognition part.

These were some typical error scenarios we've encountered, along with their respective solutions. Beyond these, errors may also stem from other issues in your code. Please feel free to share your own experiences. We hope this information proves valuable.

Happy learning:)
