ORCA2 NER/RE Experiment

Completes task: https://openmbee.atlassian.net/browse/AI4-20

Hypothesis

The Orca2 LLM can be an effective Named Entity Recognition and Relation Extraction tool for aircraft system descriptions.

Setup

For this experiment, I am using the Q5_K_M quantization of the Orca2 7B GGUF model. It runs quickly with the accelerate package, and the author recommends it for better fidelity and performance.

I decided to use LangChain to simplify working with the model so that I can focus on the experiment itself.

Location

GitHub Location: ai4safety/MikeWorkspace/Goal1/TestNotebook.ipynb

Model Config

'context_length': 8000, 'max_new_tokens': 4096, 'gpu_layers': 32, 'temperature': 0.3, 'top_p': 0.45, 'top_k': 18, 'repetition_penalty': 1.1,
  • A low temperature is preferred so that the model does not invent data.

  • top_p, top_k and repetition_penalty were set based on randomized trials over many runs.

  • context_length is set to an arbitrarily large number. This can affect performance, but it has not caused problems so far.

  • max_new_tokens is set to the value recommended for llama2.

  • gpu_layers is set to what my RTX 3070 Ti Mobile can handle without locking up.
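The settings above can be collected into one dictionary and sanity-checked before the model is loaded. The sketch below is not taken from the notebook: `prompt_token_budget` is a hypothetical helper, and it assumes that the prompt and the generated tokens share the same context window. It shows how many tokens remain for the prompt once `max_new_tokens` is reserved.

```python
# Generation settings copied from the Model Config section above.
MODEL_CONFIG = {
    "context_length": 8000,
    "max_new_tokens": 4096,
    "gpu_layers": 32,
    "temperature": 0.3,
    "top_p": 0.45,
    "top_k": 18,
    "repetition_penalty": 1.1,
}


def prompt_token_budget(config: dict) -> int:
    """Return how many context tokens are left for the prompt.

    Hypothetical helper: assumes the context window must hold both the
    prompt and the generated tokens, so the prompt can use at most
    context_length - max_new_tokens tokens.
    """
    budget = config["context_length"] - config["max_new_tokens"]
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return budget


print(prompt_token_budget(MODEL_CONFIG))  # 3904
```

The key names look like the `config` options of the ctransformers library, which LangChain wraps as `CTransformers`. That pairing is an assumption, since the notebook itself is not reproduced here.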

System Prompt

The system prompt is created using LangChain prompt templates with two input variables, {input} and {json}: the system description to be analyzed and the example JSON output, respectively.
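As a rough illustration of how the two variables are filled, the sketch below uses plain `str.format` as a stand-in for a LangChain prompt template, so it runs without LangChain installed. The template text is a shortened placeholder, and `EXAMPLE_JSON`, `build_prompt`, and the sample sentence are hypothetical, since the real example JSON is not reproduced on this page.

```python
import json

# Shortened stand-in for the real prompt; it keeps the same two input
# variables, {json} and {input}.
TEMPLATE = (
    "Example text: The engine connects to the fuel pump.\n"
    "Results in the json: {json}\n"
    "### Input: {input}\n"
    "### Response:"
)

# Hypothetical example output; the actual JSON structure may differ.
EXAMPLE_JSON = {"entities": ["engine", "fuel pump"]}


def build_prompt(system_description: str) -> str:
    """Fill the two template variables with the example JSON and the input."""
    return TEMPLATE.format(
        json=json.dumps(EXAMPLE_JSON),
        input=system_description,
    )


# Made-up input sentence, used only to show the call.
print(build_prompt("The hydraulic pump powers the landing gear."))
```

With LangChain, `PromptTemplate.from_template(TEMPLATE).format(json=..., input=...)` should give the same string. Braces inside the substituted JSON are safe in both cases, because only the template text is scanned for variables.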

The system prompt tested for NER is:

You are an aircraft systems expert. Perform the task of Named Entity Recognition (NER) on the input text. You must format your output as a JSON vlaue that adheres to the example below. Your output will be parsed and type-checked! Example text: The engine connects to the fuel pump. The Avionics depends on the Electrical System. Results in the json: {json} ### Input: {input} ### Response: Valid JSON that adheres the requirements above and format of the example json output.

The system prompt used for RE is:

You are an aircraft systems expert. Perform the task of Relation Extraction (RE) on the input text. You must format your output as a JSON vlaue that adheres to the example below. Your output will be parsed and type-checked! Example text: The engine connects to the fuel pump. The Avionics depends on the Electrical System. Results in the json: {json} ### Input: {input} ### Response: Valid JSON that adheres the requirements above and format of the example json output.
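Both prompts tell the model that its output will be parsed and type-checked. A minimal sketch of such a check for the NER case is shown below. It assumes a hypothetical schema, `{"entities": ["name", ...]}`, because the actual JSON structure is not reproduced on this page, and `parse_ner_output` is an illustrative name rather than a function from the notebook.

```python
import json


def parse_ner_output(raw: str) -> list[str]:
    """Parse a model response and type-check it against the assumed schema.

    Assumed (hypothetical) schema: {"entities": ["name", ...]}.
    Raises ValueError when no valid, well-typed JSON object is found.
    """
    # Models often wrap JSON in extra text, so keep only the span from
    # the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")

    # json.JSONDecodeError is a subclass of ValueError, so callers only
    # need to catch ValueError.
    data = json.loads(raw[start : end + 1])

    entities = data.get("entities")
    if not isinstance(entities, list) or not all(
        isinstance(e, str) for e in entities
    ):
        raise ValueError("'entities' must be a list of strings")
    return entities


# Made-up model response, used only to show the call.
print(parse_ner_output('Sure! {"entities": ["engine", "fuel pump"]}'))
```

An RE check could follow the same pattern with a different schema, for example a list of subject, relation, object records. That shape is also an assumption.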

 

JSON Structure

 

Runs Using Data From Real System Descriptions

Source Data: Confluence

#

Orca2 Output

P1

NER:

RE:

P2

NER:

RE:

P3

NER:

RE:

P4

NER:

RE:

P5

NER:

RE:

P6

NER:

RE:

P7

NER:

RE:

P8

NER:

RE:

P9

NER:

RE:

P10

NER:

RE: