ORCA2 NER/RE Experiment
Completes task: https://openmbee.atlassian.net/browse/AI4-20
Hypothesis
The Orca2 LLM can be an effective Named Entity Recognition and Relation Extraction tool for aircraft system descriptions.
Setup
For this experiment, I am using the Orca2 7B GGUF model with Q5_K_M quantization. It runs quickly with the accelerate package and is recommended by the author for better fidelity and performance.
I decided to use LangChain to simplify working with the model so that I can focus on the experiment itself.
Location
GitHub location: ai4safety/MikeWorkspace/Goal1/TestNotebook.ipynb
Model Config
'context_length': 8000,
'max_new_tokens': 4096,
'gpu_layers': 32,
'temperature': 0.3,
'top_p': 0.45,
'top_k': 18,
'repetition_penalty': 1.1,
A low temperature is preferred so that the model does not invent data.
top_p, top_k and repetition_penalty were set based on randomized trials over many runs.
context_length is set to an arbitrarily large number. This can affect performance, but it has not caused problems so far.
max_new_tokens is set to the Llama 2 recommended value.
gpu_layers is set to what my RTX 3070 Ti Mobile can handle without locking up.
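As a rough illustration of how these values relate to each other, the sketch below collects the settings into a Python dict and checks how many tokens remain for the prompt once the full generation budget is reserved. The helper names (prompt_token_budget, check_config) are hypothetical, and the sketch assumes the backend counts prompt and generated tokens against the same context_length, which should be confirmed for the loader actually used.

```python
# Settings copied from the Model Config section above.
MODEL_CONFIG = {
    "context_length": 8000,
    "max_new_tokens": 4096,
    "gpu_layers": 32,
    "temperature": 0.3,
    "top_p": 0.45,
    "top_k": 18,
    "repetition_penalty": 1.1,
}


def prompt_token_budget(config: dict) -> int:
    """Tokens left for the prompt if max_new_tokens is fully reserved.

    Assumption: prompt and generated tokens share one context window.
    """
    return config["context_length"] - config["max_new_tokens"]


def check_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    if prompt_token_budget(config) <= 0:
        problems.append("max_new_tokens leaves no room for the prompt")
    if not 0.0 <= config["top_p"] <= 1.0:
        problems.append("top_p must be in [0, 1]")
    if config["temperature"] < 0:
        problems.append("temperature must be non-negative")
    return problems


if __name__ == "__main__":
    print("prompt budget:", prompt_token_budget(MODEL_CONFIG))
    print("problems:", check_config(MODEL_CONFIG))
```

Under that assumption, the system prompt, the example JSON and the system description together need to fit in the remaining budget, which is worth keeping in mind for longer descriptions.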
System Prompt
The system prompt is built with LangChain prompt templates and takes two input variables: {input}, the system description to be analyzed, and {json}, the example JSON output.
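The substitution of the two variables can be sketched with the Python standard library alone. This is a stand-in for illustration, not the notebook's LangChain code: the shortened TEMPLATE, the helper names (template_variables, render) and the placeholder JSON string are all hypothetical.

```python
from string import Formatter

# Shortened, hypothetical version of the prompt; only the two variables matter here.
TEMPLATE = (
    "Perform the task of Named Entity Recognition (NER) on the input text.\n"
    "Results in the json:\n"
    "{json}\n"
    "### Input:\n"
    "{input}\n"
    "### Response:\n"
)


def template_variables(template: str) -> set[str]:
    """Names of the {placeholders} that appear in the template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, **values: str) -> str:
    """Fill the template, failing loudly if a variable was not supplied."""
    missing = template_variables(template) - set(values)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return template.format(**values)


if __name__ == "__main__":
    prompt = render(
        TEMPLATE,
        input="The engine connects to the fuel pump.",
        json='{"placeholder": "example JSON output goes here"}',
    )
    print(prompt)
```

Braces inside the substituted JSON string are left alone, because only the template itself is scanned for placeholders.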
The system prompt tested for NER is:
You are an aircraft systems expert.
Perform the task of Named Entity Recognition (NER) on the input text.
You must format your output as a JSON vlaue that adheres to the example below.
Your output will be parsed and type-checked!
Example text:
The engine connects to the fuel pump.
The Avionics depends on the Electrical System.
Results in the json:
{json}
### Input:
{input}
### Response:
Valid JSON that adheres the requirements above and format of the example json output.
The system prompt used for RE is:
You are an aircraft systems expert.
Perform the task of Relation Extraction (RE) on the input text.
You must format your output as a JSON vlaue that adheres to the example below.
Your output will be parsed and type-checked!
Example text:
The engine connects to the fuel pump.
The Avionics depends on the Electrical System.
Results in the json:
{json}
### Input:
{input}
### Response:
Valid JSON that adheres the requirements above and format of the example json output.
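Both prompts tell the model that its output will be parsed and type-checked. A minimal sketch of such a check is shown below. The payload shape, an object with an "entities" list of strings, is purely hypothetical because the actual JSON structure is not reproduced on this page, and the function names (extract_json, check_entities) are illustrative only.

```python
import json


def extract_json(text: str):
    """Decode the first JSON object found in raw model output.

    Models often wrap the JSON in extra prose, so scan for an opening
    brace and try to decode from each candidate position.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _ = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def check_entities(value) -> list[str]:
    """Type-check a hypothetical NER payload: {"entities": [str, ...]}."""
    if not isinstance(value, dict):
        raise TypeError("top-level JSON value must be an object")
    entities = value.get("entities")
    if not isinstance(entities, list) or not all(isinstance(e, str) for e in entities):
        raise TypeError('"entities" must be a list of strings')
    return entities


if __name__ == "__main__":
    raw = 'Sure! {"entities": ["engine", "fuel pump"]} Done.'
    print(check_entities(extract_json(raw)))
```

A check along these lines makes it possible to count how often a run produces usable output, separately from whether the extracted entities or relations are correct.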
JSON Structure
Runs Using Data From Real System Descriptions
Source Data: Confluence
# | Orca2 Output |
---|---|
P1 | NER: RE: |
P2 | NER: RE: |
P3 | NER: RE: |
P4 | NER: RE: |
P5 | NER: RE: |
P6 | NER: RE: |
P7 | NER: RE: |
P8 | NER: RE: |
P9 | NER: RE: |
P10 | NER: RE: |