Hypothesis
The Orca2 LLM can be an effective Named Entity Recognition and Relation Extraction tool for aircraft system descriptions.
Setup
For this experiment, I am using ORCA2 7B GGUF model Q5_K_M. It tends to work quickly using the accelerate package and is recommended by the author for better fidelity and performance.
I decided to use Langchain to simplify use of the model so I can focus on the actual experiment.
Location
Github Location: ai4safety/MikeWorkspace/Goal1/TestNotebook.ipynb
Model Config
'context_length': 8000, 'max_new_tokens': 4096, 'gpu_layers': 32, 'temperature': 0.3, 'top_p': 0.45, 'top_k': 18, 'repetition_penalty': 1.1,
Low temperature is preferred as to not create any creative data.
top_p, top_k and repitition_penalty were set based on randomized trials over many runs.
context_length is set to an arbitrarily large number, this can affect performance, but so far so good.
max_new_tokens is set to the llama2 recommended value.
gpu_layers is with respect to what my RTX 3070Ti Mobile can handle without locking up.
System Prompt
The system prompt is created using langchain prompt templates, using two input variables, {input} and {json}, which will be the system description to be analyzed, and the example JSON output.
The system prompt tested for NER is:
You are an aircraft systems expert. Perform the task of Named Entity Recognition (NER) on the input text. You must format your output as a JSON vlaue that adheres to the example below. Your output will be parsed and type-checked! Example text: The engine connects to the fuel pump. The Avionics depends on the Electrical System. Results in the json: {json} ### Input: {input} ### Response: Valid JSON that adheres the requirements above and format of the example json output. """
The system prompt used for RE is:
You are an aircraft systems expert. Perform the task of Relation Extraction (RE) on the input text. You must format your output as a JSON vlaue that adheres to the example below. Your output will be parsed and type-checked! Example text: The engine connects to the fuel pump. The Avionics depends on the Electrical System. Results in the json: {json} ### Input: {input} ### Response: Valid JSON that adheres the requirements above and format of the example json output.
JSON Structure
# output json structures example_NER_json = { "entities": [ { "name": "Engine", "type": "SystemComponent", }, { "name": "Fuel Pump", "type": "SystemComponent", }, { "name": "Avionics", "type": "System", }, { "name": "Electrical System", "type": "System", } ] } example_RE_json = [ { "entity1": "Engine", "entity2": "Fuel Pump", "relation_type": "Connects To" }, { "entity1": "Avionics", "entity2": "Electrical System", "relation_type": "Depends On" } ]
Runs Using Data From Real System Descriptions
Source Data: https://openmbee.atlassian.net/l/cp/c2Qmroq2
# | Orca2 Output |
---|---|
P1 | |
P2 | |
P3 | |
P4 | |
P5 | |
P6 | |
P7 | |
P8 | |
P9 | |
P10 |