huaiyu-zhu committed on
Commit cde002d · 1 Parent(s): ee3a7d7

Add answer relevance model cards

.gitignore CHANGED
@@ -2,3 +2,4 @@
 # No MacOS image cache files, please.
 **/.DS_Store
 
+.venv/
answer_relevance_classifier/README.md ADDED
@@ -0,0 +1,363 @@
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: peft
---

# Intrinsics for Answer Relevance Classifier

## Model Summary
This is a RAG-specific intrinsic for the answer relevance classification task.
The model takes as input a multi-turn conversation ending with an assistant response,
and provides a classification of whether the assistant's response is relevant to the
user's final inquiry, as well as a categorization of the relevance and the reasoning behind the conclusion.

We provide two intrinsics implemented as LoRA adapters (LoRA/aLoRA) trained over
Granite-3.3-2b-instruct and Granite-3.3-8b-instruct.

- **Developer:** IBM Research
- **Model type:** LoRA and aLoRA adapter for
[ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct),
[ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Intended use
This RAG-specific intrinsic is intended to post-process a generated assistant response.

- The binary relevance classification can be used to determine whether the assistant response is suitable
to be given to the user, or whether a rewrite into a more relevant response is necessary.
- The category and the analysis explaining the conclusion can be
incorporated into the prompt for the answer relevance rewriter, indicating specific directions
the rewrite must take to overcome the perceived deficiency in relevance.

**Model input**: The input to the answer relevance classifier intrinsic is an
OpenAI-compatible chat completion request, containing a list of conversation
turns that can alternate between the `user` and `assistant` roles and ending with
an `assistant` turn.

**Model output**: The output of the answer relevance classifier intrinsic is the result of the
original chat completion request, formatted as a JSON object with the following schema:

    {
        answer_relevance_analysis: <Free-text analysis of whether and in which ways the assistant response is relevant or not>
        answer_relevance_category: <One of a set of labels>
        answer_relevance_likelihood: <float between 0.0 and 1.0>
    }

The set of labels for `answer_relevance_category` is:
"Pertinent",
"Pertinent with relevant extra",
"Excessive unnecessary information",
"Unduly restrictive",
"Too vague or generic",
"Contextual misalignment",
"Misinterpreted inquiry",
"No attempt"

Please see the code snippets in the Quickstart Example section below for
examples that illustrate the intrinsic's input/output.

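For illustration, a parsed classifier result for the sample conversation used in the Quickstart below might look like the following (shown as a Python dictionary; the analysis text, category, and likelihood mirror the classifier output embedded in the companion answer relevance rewriter card and are illustrative rather than guaranteed model output):

    # Hypothetical parsed result for the Quickstart conversation, in which
    # "Who attended the meeting?" is answered with "Many people attended the meeting."
    example_result = {
        "answer_relevance_analysis": (
            "The inquiry asks for the attendees of the meeting. The response provides "
            "a vague and non-specific answer that does not address the inquiry."
        ),
        "answer_relevance_category": "No attempt",
        "answer_relevance_likelihood": 0.0,
    }
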
## Quickstart Example

To run the answer relevance classifier intrinsic through granite-common, you can either (a)
use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging
Face transformers library. We provide instructions for each of the two
approaches below. Note that running inference using vLLM or another scalable
OpenAI-compatible inference backend should be significantly faster than using
the Hugging Face transformers library directly.

### Using an OpenAI-Compatible Inference Backend

To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
follow the steps below.

1. Install the granite-common library:

        pip install git+https://github.com/ibm-granite/granite-common.git
        pip install granite_common[nltk]

2. Install the Hugging Face CLI:

        pip install -U "huggingface_hub[cli]"

3. Install vLLM:

        pip install vllm

4. Download the intrinsics library:

        hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib

5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh`
   using your favorite editor:

   Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the
   base model on which the desired LoRA adapter has been trained. Optionally,
   edit the constant `PORT` to change the port on which vLLM will run. Save the
   modified file and exit the editor.

6. Start vLLM through the startup script. The first time you run the script,
   you may have to change the permissions to allow execution:

        cd rag-intrinsics-lib
        chmod u+x ./run_vllm.sh
        ./run_vllm.sh &

7. Run the following code snippet:

        import json
        import openai
        import granite_common

        intrinsic_name = "answer_relevance_classifier"

        # Change the following constant to select a different base model
        base_model_name = "granite-3.3-8b-instruct"

        # Change the following constants as needed to reflect the location of the vLLM server.
        # The selected port should be identical to the one you specified in the vLLM startup script.
        openai_base_url = "http://localhost:55555/v1"
        openai_api_key = "rag_intrinsics_1234"

        # Fetch IO configuration file from Hugging Face Hub
        io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
            intrinsic_name, base_model_name
        )

        # Instantiate input/output processors
        rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
        result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)

        # Sample request
        request_json = {
            "messages": [
                {
                    "role": "user",
                    "content": "Who attended the meeting?"
                },
                {
                    "role": "assistant",
                    "content": "Many people attended the meeting."
                }
            ],
            "extra_body": {
                "documents": [
                    {
                        "doc_id": "1",
                        "text": "Meeting attendees: Alice, Bob, Carol."
                    },
                    {
                        "doc_id": "2",
                        "text": "Meeting time: 9:00 am to 11:00 am."
                    }
                ]
            }
        }

        # Add other parameters
        request_json["model"] = intrinsic_name
        request_json["temperature"] = 0.0

        # Apply input processor
        intrinsic_kwargs = {}
        rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)

        # Run inference
        client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
        chat_completion = client.chat.completions.create(**rewritten_request.model_dump())

        # Apply output processor
        processed_chat_completion = result_processor.transform(
            chat_completion, rewritten_request
        )

        # Verify that the contents of the completion are valid JSON and pretty-print the JSON.
        parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
        print("JSON output:")
        print(json.dumps(parsed_contents, indent=2))

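As described under Intended use, an application will typically gate on `answer_relevance_likelihood` to decide whether the generated response can be returned as-is or should be sent to the answer relevance rewriter. Below is a minimal sketch of such a gate, continuing from `parsed_contents` in the snippet above; the threshold value and the helper name are illustrative assumptions, not part of the intrinsic itself.

    # Assumed application-specific cut-off; this model card does not prescribe a value.
    RELEVANCE_THRESHOLD = 0.5

    def needs_rewrite(classifier_result: dict, threshold: float = RELEVANCE_THRESHOLD) -> bool:
        """Return True when the assistant response should be sent to the rewriter."""
        return classifier_result["answer_relevance_likelihood"] < threshold

    # Continuing from the snippet above, where `parsed_contents` holds the classifier output:
    if needs_rewrite(parsed_contents):
        # The category and analysis explain the deficiency and can be passed to the rewriter.
        print("Rewrite needed:", parsed_contents["answer_relevance_category"])
        print("Reason:", parsed_contents["answer_relevance_analysis"])
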
### Using the Hugging Face Transformers Library

To run the intrinsic using the Hugging Face transformers library directly,
follow the steps below.

1. Install the granite-common library:

        pip install git+https://github.com/ibm-granite/granite-common.git
        pip install granite_common[nltk]

2. Install the Hugging Face CLI:

        pip install -U "huggingface_hub[cli]"

3. Install PEFT:

        pip install peft

4. Install xgrammar:

        pip install xgrammar

5. Run the following code snippet:

        import json
        import granite_common.util
        import peft

        intrinsic_name = "answer_relevance_classifier"

        # Change the following constant to select a different base model
        base_model_name = "granite-3.3-8b-instruct"

        use_cuda = True  # Set to False to use the default PyTorch device for this machine + model

        # Fetch IO configuration file from Hugging Face Hub
        io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
            intrinsic_name, base_model_name
        )

        # Fetch LoRA directory from Hugging Face Hub
        lora_dir = granite_common.intrinsics.util.obtain_lora(
            intrinsic_name, base_model_name
        )

        # Instantiate input/output processors
        rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
        result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)

        # Sample request
        request_json = {
            "messages": [
                {
                    "role": "user",
                    "content": "Who attended the meeting?"
                },
                {
                    "role": "assistant",
                    "content": "Many people attended the meeting."
                }
            ],
            "extra_body": {
                "documents": [
                    {
                        "doc_id": "1",
                        "text": "Meeting attendees: Alice, Bob, Carol."
                    },
                    {
                        "doc_id": "2",
                        "text": "Meeting time: 9:00 am to 11:00 am."
                    }
                ]
            }
        }

        # Add additional parameters
        request_json["model"] = intrinsic_name
        request_json["temperature"] = 0.0

        # Apply input processor
        intrinsic_kwargs = {}
        rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)

        # Load the base model and merge LoRA weights
        model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
        if use_cuda:
            model = model.cuda()

        # Convert the chat completion request into the Transformers library's proprietary
        # format.
        generate_input, other_input = (
            granite_common.util.chat_completion_request_to_transformers_inputs(
                rewritten_request,
                tokenizer,
                model,
            )
        )

        # Use the Transformers library's APIs to generate one or more completions,
        # then convert those completions into OpenAI-compatible chat completion responses.
        responses = granite_common.util.generate_with_transformers(
            tokenizer, model, generate_input, other_input
        )

        # Apply output processor
        transformed_responses = result_processor.transform(responses, rewritten_request)

        # Verify that the contents of the completion are valid JSON and pretty-print the JSON.
        parsed_contents = json.loads(transformed_responses.choices[0].message.content)
        print("JSON output:")
        print(json.dumps(parsed_contents, indent=2))

## Training Details

### Training Data

The training data was created by the following process:
1. Take the synthetic rag-data-granite dataset, consisting of conversations between a user and an assistant.
2. Replace the assistant responses by running granite-3.2-intrinsics at temperature 1.0.
3. Produce the answer_relevance_classifier target output using mixtral-large with prompts containing in-context examples.

The conversations created in steps 1 and 2 are used as the training input. The JSON string from step 3
is used as the training target output.

#### Training Hyperparameters

The LoRA adapter was fine-tuned using PEFT under the following regime: rank =
32, learning rate = 3.0e-06, number of epochs = 50.

## Evaluation

### Answer Relevance Classifier

We evaluated the model on a test data set generated by the same procedure as the training data,
using GPT-4o as the judge.

The following table presents results comparing baselines and frontier models
on the answer relevance classification task. The LoRAs perform on par with frontier models
of much larger size and outperform frontier models of comparable size.

| | Not relevant | | | Relevant | | |
|:------------------------|:----------|:-------|:------|:----------|:-------|:------|
| | precision | recall | f1 | precision | recall | f1 |
| mixtral-8x22b-v0.1 | 0.934 | 0.592 | 0.725 | 0.886 | 0.880 | 0.883 |
| llama-3.3-70b | 0.895 | 0.829 | 0.861 | 0.898 | 0.939 | 0.918 |
| gpt-oss-20b | 0.747 | 0.745 | 0.746 | 0.969 | 0.782 | 0.865 |
| gpt-4o | 0.775 | 0.945 | 0.852 | 0.974 | 0.690 | 0.808 |
| gpt-4o-mini | 0.818 | 0.921 | 0.866 | 0.948 | 0.872 | 0.908 |
| | | | | | | |
| granite-3.3-2b/lora | 0.743 | 0.861 | 0.798 | 0.909 | 0.806 | 0.855 |
| granite-3.3-2b/alora | 0.761 | 0.821 | 0.790 | 0.894 | 0.833 | 0.862 |
| granite-3.3-8b/lora | 0.783 | 0.900 | 0.837 | 0.931 | 0.842 | 0.884 |
| granite-3.3-8b/alora | 0.793 | 0.879 | 0.834 | 0.919 | 0.856 | 0.886 |

### Comparing the Answer Relevance Classifier Intrinsics vs. Vanilla Granite Models

We compare the performance of Granite 3.3 2b Instruct and Granite 3.3 8b Instruct
against the answer relevance classifier intrinsics implemented as LoRA adapters.
The LoRAs significantly outperform the base models.

| | Not relevant | | | Relevant | | |
|:------------------------|:----------|:-------|:------|:----------|:-------|:------|
| | precision | recall | f1 | precision | recall | f1 |
| granite-3.3-2b | | | | | | |
| granite-3.3-2b/lora | 0.743 | 0.861 | 0.798 | 0.909 | 0.806 | 0.855 |
| granite-3.3-2b/alora | 0.761 | 0.821 | 0.790 | 0.894 | 0.833 | 0.862 |
| | | | | | | |
| granite-3.3-8b | 0.798 | 0.542 | 0.646 | 0.813 | 0.770 | 0.791 |
| granite-3.3-8b/lora | 0.783 | 0.900 | 0.837 | 0.931 | 0.842 | 0.884 |
| granite-3.3-8b/alora | 0.793 | 0.879 | 0.834 | 0.919 | 0.856 | 0.886 |

## Model Card Authors

[Huaiyu Zhu](mailto:[email protected])

### Framework versions

- PEFT 0.14.0
answer_relevance_rewriter/README.md ADDED
@@ -0,0 +1,380 @@
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: peft
---

# Intrinsics for Answer Relevance Rewriter

## Model Summary
This is a RAG-specific intrinsic for the answer relevance rewrite task.
The model takes as input the chat completion output of the answer relevance classifier,
consisting of the conversation as well as the answer relevance classification, together with the grounding documents,
and provides a rewritten assistant response that is more relevant to the user's final inquiry.

We provide two intrinsics implemented as LoRA adapters (LoRA/aLoRA) trained over
Granite-3.3-2b-instruct and Granite-3.3-8b-instruct.

- **Developer:** IBM Research
- **Model type:** LoRA and aLoRA adapter for
[ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct),
[ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Intended use
This RAG-specific intrinsic is intended to post-process a generated assistant response.
It should be used after the answer relevance classifier intrinsic, and should be applied to
the cases where the `answer_relevance_likelihood` is below a certain threshold chosen according to application criteria.

For cases where the assistant answer is deemed not relevant (where `answer_relevance_likelihood` is below a
given threshold), the answer relevance rewriter intrinsic can be used to rewrite the assistant response
into a more relevant response. It takes as input the chat completion
output of the answer relevance classifier and the grounding documents. Its output is of the form

    {
        answer_relevance_rewrite: <Rewritten response>
    }

The rewriter is instructed to correct only the deficiencies in relevance identified by the classifier,
and to ensure the rewritten response is grounded in the conversation and the given documents.

**Model input**: The input to the answer relevance rewriter intrinsic is an
OpenAI-compatible chat completion request whose list of conversation turns is made up of:
- a conversation that can alternate between the `user` and `assistant` roles and ends with an assistant response,
- an additional `user` turn with content "answer_relevance", and
- an additional `assistant` turn whose content is the JSON output of the answer relevance classifier.

**Model output**: The output of the answer relevance rewriter intrinsic is the result of the
original chat completion request, formatted as a JSON object with the following schema:

    {
        answer_relevance_rewrite: <Rewritten response>
    }

Please see the code snippets in the Quickstart Example section below for
examples that illustrate the intrinsic's input/output.

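As a concrete illustration of this input layout, the sketch below builds a rewriter request from a finished conversation, its grounding documents, and the classifier's JSON output. The threshold value, variable names, and helper function are assumptions for illustration; the message layout follows the sample request in the Quickstart below.

    import json

    # Assumed application-specific cut-off; this model card does not prescribe a value.
    RELEVANCE_THRESHOLD = 0.5

    def build_rewriter_request(conversation: list, documents: list, classifier_json: str):
        """Append the classifier verdict to the conversation in the layout the rewriter
        expects: a user turn "answer_relevance" followed by an assistant turn carrying
        the classifier's JSON output. Returns None when no rewrite is needed."""
        verdict = json.loads(classifier_json)
        if verdict["answer_relevance_likelihood"] >= RELEVANCE_THRESHOLD:
            return None
        return {
            "messages": conversation
            + [
                {"role": "user", "content": "answer_relevance"},
                {"role": "assistant", "content": classifier_json},
            ],
            "extra_body": {"documents": documents},
        }
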
## Quickstart Example

To run the answer relevance rewriter intrinsic through granite-common, you can either (a)
use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging
Face transformers library. We provide instructions for each of the two
approaches below. Note that running inference using vLLM or another scalable
OpenAI-compatible inference backend should be significantly faster than using
the Hugging Face transformers library directly.

### Using an OpenAI-Compatible Inference Backend

To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
follow the steps below.

1. Install the granite-common library:

        pip install git+https://github.com/ibm-granite/granite-common.git
        pip install granite_common[nltk]

2. Install the Hugging Face CLI:

        pip install -U "huggingface_hub[cli]"

3. Install vLLM:

        pip install vllm

4. Download the intrinsics library:

        hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib

5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh`
   using your favorite editor:

   Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the
   base model on which the desired LoRA adapter has been trained. Optionally,
   edit the constant `PORT` to change the port on which vLLM will run. Save the
   modified file and exit the editor.

6. Start vLLM through the startup script. The first time you run the script,
   you may have to change the permissions to allow execution:

        cd rag-intrinsics-lib
        chmod u+x ./run_vllm.sh
        ./run_vllm.sh &

7. Run the following code snippet:

        import json
        import openai
        import granite_common

        intrinsic_name = "answer_relevance_rewriter"

        # Change the following constant to select a different base model
        base_model_name = "granite-3.3-8b-instruct"

        # Change the following constants as needed to reflect the location of the vLLM server.
        # The selected port should be identical to the one you specified in the vLLM startup script.
        openai_base_url = "http://localhost:55555/v1"
        openai_api_key = "rag_intrinsics_1234"

        # Fetch IO configuration file from Hugging Face Hub
        io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
            intrinsic_name, base_model_name
        )

        # Instantiate input/output processors
        rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
        result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)

        # Sample request
        request_json = {
            "messages": [
                {
                    "role": "user",
                    "content": "Who attended the meeting?"
                },
                {
                    "role": "assistant",
                    "content": "Many people attended the meeting."
                },
                {
                    "role": "user",
                    "content": "answer_relevance"
                },
                {
                    "role": "assistant",
                    "content": "{\"answer_relevance_analysis\": \"The inquiry asks for the attendees of the meeting. The response provides a vague and non-specific answer that does not address the inquiry.\", \"answer_relevance_category\": \"No attempt\", \"answer_relevance_likelihood\": 0.0}"
                }
            ],
            "extra_body": {
                "documents": [
                    {
                        "doc_id": "1",
                        "text": "Meeting attendees: Alice, Bob, Carol."
                    },
                    {
                        "doc_id": "2",
                        "text": "Meeting time: 9:00 am to 11:00 am."
                    }
                ]
            }
        }

        # Add other parameters
        request_json["model"] = intrinsic_name
        request_json["temperature"] = 0.0

        # Apply input processor
        intrinsic_kwargs = {}
        rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)

        # Run inference
        client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
        chat_completion = client.chat.completions.create(**rewritten_request.model_dump())

        # Apply output processor
        processed_chat_completion = result_processor.transform(
            chat_completion, rewritten_request
        )

        # Verify that the contents of the completion are valid JSON and pretty-print the JSON.
        parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
        print("JSON output:")
        print(json.dumps(parsed_contents, indent=2))

+ ### Using the Hugging Face Transformers Library
190
+
191
+ To run the intrinsic using the Hugging Face transformers library directly,
192
+ follow the steps below.
193
+
194
+ 1. Install the granite-common library:
195
+
196
+ pip install git+https://github.com/ibm-granite/granite-common.git
197
+ pip install granite_common[nltk]
198
+
199
+ 2. Install the Hugging Face CLI:
200
+
201
+ pip install -U "huggingface_hub[cli]"
202
+
203
+ 3. Install PEFT:
204
+
205
+ pip install peft
206
+
207
+ 4. Install xgrammar:
208
+
209
+ pip install xgrammar
210
+
211
+ 5. Run the following code snippet:
212
+
213
+ import json
214
+ import granite_common.util
215
+ import peft
216
+
217
+ intrinsic_name = "answer_relevance_classifier"
218
+
219
+ # Change the following constant to select a different base model
220
+ base_model_name = "granite-3.3-8b-instruct"
221
+
222
+ use_cuda = True # Set to False to use default PyTorch device for this machine + model
223
+
224
+ # Fetch IO configuration file from Hugging Face Hub
225
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
226
+ intrinsic_name, base_model_name
227
+ )
228
+
229
+ # Fetch LoRA directory from Hugging Face Hub
230
+ lora_dir = granite_common.intrinsics.util.obtain_lora(
231
+ intrinsic_name, base_model_name
232
+ )
233
+
234
+ # Instantiate input/output processors
235
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
236
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
237
+
238
+ # Sample request
239
+ request_json = {
240
+ "messages": [
241
+ {
242
+ "role": "user",
243
+ "content": "Who attended the meeting?"
244
+ },
245
+ {
246
+ "role": "assistant",
247
+ "content": "Many people attended the meeting."
248
+ },
249
+ {
250
+ "role": "user",
251
+ "content": "answer_relevance"
252
+ },
253
+ {
254
+ "role": "assistant",
255
+ "content": "{\"answer_relevance_analysis\": \"The inquiry asks for the attendees of the meeting. The response provides a vague and non-specific answer that does not address the inquiry.\", \"answer_relevance_category\": \"No attempt\", \"answer_relevance_likelihood\": 0.0}"
256
+ }
257
+ ],
258
+ "extra_body": {
259
+ "documents": [
260
+ {
261
+ "doc_id": "1",
262
+ "text": "Meeting attendees: Alice, Bob, Carol."
263
+ },
264
+ {
265
+ "doc_id": "2",
266
+ "text": "Meeting time: 9:00 am to 11:00 am."
267
+ }
268
+ ]
269
+ }
270
+ }
271
+
272
+ # Add additional parameters
273
+ request_json["model"] = intrinsic_name
274
+ request_json["temperature"] = 0.0
275
+
276
+ # Apply input processor
277
+ intrinsic_kwargs = {}
278
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
279
+
280
+ # Load the base model and merge LoRA weights
281
+ model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
282
+ if use_cuda:
283
+ model = model.cuda()
284
+
285
+ # Convert the chat completion request into a the Transformers library's proprietary
286
+ # format.
287
+ generate_input, other_input = (
288
+ granite_common.util.chat_completion_request_to_transformers_inputs(
289
+ rewritten_request,
290
+ tokenizer,
291
+ model,
292
+ )
293
+ )
294
+
295
+ # Use the Transformers library's APIs to generate one or more completions,
296
+ # then convert those completions into OpenAI-compatible chat completion
297
+ responses = granite_common.util.generate_with_transformers(
298
+ tokenizer, model, generate_input, other_input
299
+ )
300
+
301
+ # Apply output processor
302
+ transformed_responses = result_processor.transform(responses, rewritten_request)
303
+
304
+ # Verify that the contents of the completion is valid JSON and pretty-print the JSON.
305
+ parsed_contents = json.loads(transformed_responses.choices[0].message.content)
306
+ print("JSON output:")
307
+ print(json.dumps(parsed_contents, indent=2))
308
+
309
+ ## Training Details
310
+
311
+ ### Training Data
312
+
313
+ The training data is created in the following process
314
+ 1. Take the synthetic rag-data-granite dataset, consisting of conversations between user and assistant.
315
+ 2. Replace the assistant response by running granite-3.2-intrinsics at temperature 1.0.
316
+ 3. Produce answer_relevance_rewriter target output using mixtral-large with prompts with in-context examples.
317
+ The conversation created in steps 1 and 2 are taken as training input. The json string from step 3
318
+ is taken as train target output.
319
+
320
+ #### Training Hyperparameters
321
+
322
+ The LoRA adapter was fine-tuned using PEFT under the following regime: rank =
323
+ 32, learning rate = 1.0e-04, number of epochs = 5.
324
+
325
+ ## Evaluation
326
+
327
+ ### Answer Relevance Rewriter
328
+
329
+ We evaluated the model on test data set generated by the same procedure as the training process,
330
+ using GPT-4o as judge.
331
+
332
+
333
+ The following table presents results comparing baselines and frontier models
334
+ on the answer relevance rewrite task. The data sets consists of those classified as irrelevant by
335
+ mixtral-large. The evaluations are first divided into two parts, those that are truly irrelevant,
336
+ for which we measure the rate of rewrite becoming relevant, and those that are false irrelevant,
337
+ for which we measure the rate of rewrite becoming irrelevant. Then the overall rate flipping
338
+ irrelevant to relevant and flipping relevant to irrelevant are calculated, and well as the net gain
339
+ of relevance and resulting final relevance.
340
+
341
+ The LoRAs out perform the best of frontier models
342
+
343
+ | | True irrelevant <br> flip to relevant | False irrelevant <br> flip to irrelevant| Overall <br> flip irrelevant <br> to relevant | Overall <br> flip relevant <br> to irrelevant| net gain | Result <br>relevance |
344
+ |:---------------------|:--------------|:---------|:------------------------------|:---------|:---------|:--------------|
345
+ | mixtral-8x22b-v0.1 | 0.416 | 0.101 | 0.286 | 0.032 | 0.254 | 0.566 |
346
+ | llama-3.3-70b | 0.804 | 0.041 | 0.554 | 0.013 | 0.541 | 0.853 |
347
+ | gpt-oss-20b | 0.902 | 0.034 | 0.621 | 0.011 | 0.610 | 0.922 |
348
+ | gpt-4o | 0.960 | 0.014 | 0.661 | 0.004 | 0.657 | 0.968 |
349
+ | gpt-4o-mini | 0.758 | 0.027 | 0.522 | 0.008 | 0.514 | 0.825 |
350
+ | | | | | | | |
351
+ | granite-3.3-2b/lora | 0.972 | 0.027 | 0.669 | 0.008 | 0.661 | 0.973 |
352
+ | granite-3.3-2b/alora | 0.972 | 0.007 | 0.669 | 0.002 | 0.667 | 0.979 |
353
+ | granite-3.3-8b/lora | 0.969 | 0.014 | 0.667 | 0.004 | 0.663 | 0.975 |
354
+ | granite-3.3-8b/alora | 0.966 | 0.027 | 0.665 | 0.008 | 0.657 | 0.968 |
355
+ | | | | | | | |
356
+
357
+ ### Comparing the Answer Relevance Rewriter Intrinsics vs. Vanilla Granite Models
358
+
359
+ We compare the performance of Granite 3.3-2b, Granite 3.3-8b Instruct
360
+ vs. answer relevance rewriter intrinsics implemented as LoRA adapters.
361
+ It is seen that the LoRAs significantly out perform the base models.
362
+ | | True irrelevant <br> flip to relevant | False irrelevant <br> flip to irrelevant| Overall <br> flip irrelevant <br> to relevant | Overall <br> flip relevant <br> to irrelevant| net gain | Result relevance |
363
+ |:---------------------|:--------------|:---------|:------------------------------|:---------|:---------|:--------------|
364
+ | granite-3.3-2b | 0.346 | 0.169 | 0.238 | 0.053 | 0.185 | 0.497 |
365
+ | granite-3.3-2b/lora | 0.972 | 0.027 | 0.669 | 0.008 | 0.661 | 0.973 |
366
+ | granite-3.3-2b/alora | 0.972 | 0.007 | 0.669 | 0.002 | 0.667 | 0.979 |
367
+ | | | | | | | |
368
+ | granite-3.3-8b | 0.266 | 0.277 | 0.183 | 0.086 | 0.097 | 0.408 |
369
+ | granite-3.3-8b/lora | 0.969 | 0.014 | 0.667 | 0.004 | 0.663 | 0.975 |
370
+ | granite-3.3-8b/alora | 0.966 | 0.027 | 0.665 | 0.008 | 0.657 | 0.968 |
371
+ | | | | | | | |
372
+
373
+
374
+ ## Model Card Authors
375
+
376
+ [Huaiyu Zhu](mailto:[email protected])
377
+
378
+ ### Framework versions
379
+
380
+ - PEFT 0.14.0