# Code generation with RAG and self-correction

AlphaCodium presented an approach for code generation that uses control flow.

Main idea: [construct an answer to a coding question iteratively](https://x.com/karpathy/status/1748043513156272416?s=20).

[AlphaCodium](https://github.com/Codium-ai/AlphaCodium) iteratively tests and improves an answer on public and AI-generated tests for a particular question.

We will implement some of these ideas from scratch using [LangGraph](https://langchain-ai.github.io/langgraph/):

1. We start with a set of documentation specified by a user
2. We use a long context LLM to ingest it and perform RAG to answer a question based upon it
3. We will invoke a tool to produce a structured output
4. We will perform two unit tests (check imports and code execution) prior to returning the solution to the user

![Screenshot 2024-05-23 at 2.17.42 PM.png](attachment:67b615fe-0c25-4410-9d58-835982547001.png)
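
The four steps above amount to a generate, test, retry loop. The following is a minimal plain-Python sketch of that loop, not the LangGraph implementation built below. `fake_llm` is a hypothetical stand-in that returns canned candidates instead of calling a model, and `run_loop`, `CANDIDATES`, and the error-message wording are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical stand-in for model output: canned (imports, code) candidates.
# The first candidate imports a module that does not exist.
CANDIDATES = [
    ("import module_that_does_not_exist_xyz", "result = 1"),
    ("import math", "result = math.floor(2.7)"),
]


def fake_llm(messages: List[Tuple[str, str]], attempt: int) -> Tuple[str, str]:
    """Pretend to generate a solution; a real model would read `messages`."""
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]


def run_loop(question: str, max_iterations: int = 3) -> dict:
    messages: List[Tuple[str, str]] = [("user", question)]
    imports, code, error = "", "", "yes"
    iterations = 0
    while error == "yes" and iterations < max_iterations:
        imports, code = fake_llm(messages, iterations)
        iterations += 1
        try:
            exec(imports + "\n" + code, {})  # run in a throwaway namespace
            error = "no"
        except Exception as e:
            # Feed the error back so the next attempt can react to it
            messages.append(("user", f"Your solution failed: {e!r}"))
    return {
        "imports": imports,
        "code": code,
        "error": error,
        "iterations": iterations,
        "messages": messages,
    }


outcome = run_loop("Round 2.7 down")
print(outcome["error"], outcome["iterations"], len(outcome["messages"]))
```

Here the first candidate fails, its error is appended to the message list, and the second candidate passes on the second iteration.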

In [6]:
! pip install -U langchain_community langchain-openai langchain-anthropic langchain langgraph bs4


## Docs

Load [LangChain Expression Language](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel) (LCEL) docs as an example.

In [7]:
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

# LCEL docs
url = "https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
    url=url, max_depth=20, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()

# Sort the list based on the URLs and get the text
d_sorted = sorted(docs, key=lambda x: x.metadata["source"])
d_reversed = list(reversed(d_sorted))
concatenated_content = "\n\n\n --- \n\n\n".join(
    [doc.page_content for doc in d_reversed]
)
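
To make the sort-reverse-join step concrete without hitting the network, here is a small sketch using made-up pages. `Doc` is a hypothetical stand-in for the loaded document objects (it only mimics the `page_content` and `metadata` attributes used above), and the example URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document (page text plus metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


docs = [
    Doc("Page B", {"source": "https://example.com/b"}),
    Doc("Page C", {"source": "https://example.com/c"}),
    Doc("Page A", {"source": "https://example.com/a"}),
]

SEPARATOR = "\n\n\n --- \n\n\n"

# Sort by source URL, reverse the order, then join page texts with the separator
d_sorted = sorted(docs, key=lambda d: d.metadata["source"])
d_reversed = list(reversed(d_sorted))
concatenated = SEPARATOR.join(d.page_content for d in d_reversed)

print(concatenated.split(SEPARATOR))
```

The result is one long string in reverse URL order, which is what gets stuffed into the `{context}` slot of the prompt.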

## LLMs

### Code solution

Try OpenAI and [Claude3](https://docs.anthropic.com/en/docs/about-claude/models) with function calling.

Create `code_gen_chain` with either OpenAI or Claude and test it here.

In [8]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

### OpenAI

# Code generation prompt
code_gen_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a coding assistant with expertise in LCEL, LangChain expression language. \n 
    Here is a full set of LCEL documentation: \n ------- \n {context} \n ------- \n Answer the user 
    question based on the above provided documentation. Ensure any code you provide can be executed \n 
    with all required imports and variables defined. Structure your answer with a description of the code solution. \n
    Then list the imports. And finally list the functioning code block. Here is the user question:""",
        ),
        ("placeholder", "{messages}"),
    ]
)


# Data model
class code(BaseModel):
    """Code output"""

    prefix: str = Field(description="Description of the problem and approach")
    imports: str = Field(description="Code block import statements")
    code: str = Field(description="Code block not including import statements")
    description = "Schema for code solutions to questions about LCEL."


expt_llm = "gpt-4-0125-preview"
llm = ChatOpenAI(temperature=0, model=expt_llm)
code_gen_chain = code_gen_prompt | llm.with_structured_output(code)
question = "How do I build a RAG chain in LCEL?"
# solution = code_gen_chain.invoke({"context": concatenated_content, "messages": [("user", question)]})

In [9]:
# Test
question = "How do I build a RAG chain in LCEL?"
solution = code_gen_chain.invoke(
    {"context": concatenated_content, "messages": [("user", question)]}
)
solution

code(prefix='Build a RAG Chain in LCEL', imports='from langchain import LCEL\nfrom langchain.retrievers import YourRetriever\nfrom langchain.llms import YourLLM\nfrom langchain.output_parsers import YourOutputParser', code='# Define your retriever\nretriever = YourRetriever(...)\n\n# Define your LLM\nllm = YourLLM(...)\n\n# Define your output parser (optional)\noutput_parser = YourOutputParser(...)\n\n# Build the RAG chain\nrag_chain = LCEL.chain(retriever | llm | output_parser)\n\n# Example usage\nresult = rag_chain.invoke("Your query here")\nprint(result)', description="This code snippet demonstrates how to build a Retrieval Augmented Generation (RAG) chain using the LangChain Expression Language (LCEL). The process involves defining a retriever, a language model (LLM), and optionally an output parser. These components are then chained together using the `LCEL.chain` method to create the RAG chain. The `invoke` method is used to execute the chain with a query, and the result is print

## State 

Our state is a dict that will contain keys (errors, question, code generation) relevant to code generation.

In [10]:
from typing import List, TypedDict


class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        error : Binary flag for control flow to indicate whether test error was tripped
        messages : With user question, error messages, reasoning
        generation : Code solution
        iterations : Number of tries
    """

    error: str
    messages: List
    generation: str
    iterations: int
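
Each node defined below returns a dict containing only the keys it wants to write. A rough mental model of how those partial updates accumulate, assuming the default behavior in which a returned key simply overwrites the stored value (no reducer functions are declared on `GraphState`): `apply_update` is a hypothetical helper written for this illustration, not a LangGraph API, and `total=False` is added here only so the sketch can start from a partial state.

```python
from typing import List, TypedDict


class GraphState(TypedDict, total=False):
    error: str
    messages: List
    generation: str
    iterations: int


def apply_update(state: GraphState, update: dict) -> GraphState:
    """Hypothetical helper: overwrite only the keys a node returned."""
    merged = dict(state)
    merged.update(update)
    return merged


# Initial input, then two node-style partial updates
state: GraphState = {"messages": [("user", "question")], "iterations": 0}
state = apply_update(state, {"generation": "candidate", "iterations": 1})
state = apply_update(state, {"error": "yes"})

print(sorted(state.keys()))
```

Keys that a node does not return (here `messages`) are carried forward unchanged.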

## Graph 

Our graph lays out the logical flow shown in the figure above.

In [11]:
from langchain_core.pydantic_v1 import BaseModel, Field

### Parameter

# Max tries
max_iterations = 3
# Reflect
# flag = 'reflect'
flag = "do not reflect"

### Nodes


def generate(state: GraphState):
    """
    Generate a code solution

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation
    """

    print("---GENERATING CODE SOLUTION---")

    # State
    messages = state["messages"]
    iterations = state["iterations"]
    error = state.get("error")  # may be absent on the first pass

    # We have been routed back to generation with an error
    if error == "yes":
        messages += [
            (
                "user",
                "Now, try again. Invoke the code tool to structure the output with a prefix, imports, and code block:",
            )
        ]

    # Solution
    code_solution = code_gen_chain.invoke(
        {"context": concatenated_content, "messages": messages}
    )
    messages += [
        (
            "assistant",
            f"{code_solution.prefix} \n Imports: {code_solution.imports} \n Code: {code_solution.code}",
        )
    ]

    # Increment
    iterations = iterations + 1
    return {"generation": code_solution, "messages": messages, "iterations": iterations}


def code_check(state: GraphState):
    """
    Check code

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, error
    """

    print("---CHECKING CODE---")

    # State
    messages = state["messages"]
    code_solution = state["generation"]
    iterations = state["iterations"]

    # Get solution components
    imports = code_solution.imports
    code = code_solution.code

    # Check imports
    try:
        exec(imports)
    except Exception as e:
        print("---CODE IMPORT CHECK: FAILED---")
        error_message = [("user", f"Your solution failed the import test: {e}")]
        messages += error_message
        return {
            "generation": code_solution,
            "messages": messages,
            "iterations": iterations,
            "error": "yes",
        }

    # Check execution
    try:
        exec(imports + "\n" + code)
    except Exception as e:
        print("---CODE BLOCK CHECK: FAILED---")
        error_message = [("user", f"Your solution failed the code execution test: {e}")]
        messages += error_message
        return {
            "generation": code_solution,
            "messages": messages,
            "iterations": iterations,
            "error": "yes",
        }

    # No errors
    print("---NO CODE TEST FAILURES---")
    return {
        "generation": code_solution,
        "messages": messages,
        "iterations": iterations,
        "error": "no",
    }


def reflect(state: GraphState):
    """
    Reflect on errors

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation
    """

    print("---REFLECTING ON ERRORS---")

    # State
    messages = state["messages"]
    iterations = state["iterations"]
    code_solution = state["generation"]

    # Prompt for a reflection and add it to the message history
    reflections = code_gen_chain.invoke(
        {"context": concatenated_content, "messages": messages}
    )
    messages += [("assistant", f"Here are reflections on the error: {reflections}")]
    return {"generation": code_solution, "messages": messages, "iterations": iterations}


### Edges


def decide_to_finish(state: GraphState):
    """
    Determines whether to finish.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """
    error = state["error"]
    iterations = state["iterations"]

    if error == "no" or iterations == max_iterations:
        print("---DECISION: FINISH---")
        return "end"
    else:
        print("---DECISION: RE-TRY SOLUTION---")
        if flag == "reflect":
            return "reflect"
        else:
            return "generate"
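
The two tests in `code_check` can be tried in isolation. Below is a small standalone sketch with one deliberate difference from the node above: each `exec` call receives a fresh dictionary as its namespace, so names defined by the candidate code do not leak into the caller's globals. `run_checks` and its return convention are illustrative assumptions, not part of the notebook.

```python
from typing import Optional, Tuple


def run_checks(imports: str, code: str) -> Tuple[str, Optional[str]]:
    """Return (stage, error): stage is 'import', 'execution', or 'passed'."""
    # Stage 1: do the import statements resolve?
    try:
        exec(imports, {})
    except Exception as e:
        return "import", repr(e)

    # Stage 2: does the full program (imports + body) run?
    try:
        exec(imports + "\n" + code, {})
    except Exception as e:
        return "execution", repr(e)

    return "passed", None


ok = run_checks("import json", "json.dumps({'a': 1})")
bad_import = run_checks("import module_that_does_not_exist_xyz", "")
bad_code = run_checks("import json", "json.loads('not json')")

print(ok[0], bad_import[0], bad_code[0])
```

Returning the failing stage alongside the error text mirrors the two distinct feedback messages the node sends back to the model.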

In [12]:
from langgraph.graph import END, StateGraph, START

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("generate", generate)  # generate solution
workflow.add_node("check_code", code_check)  # check code
workflow.add_node("reflect", reflect)  # reflect

# Build graph
workflow.add_edge(START, "generate")
workflow.add_edge("generate", "check_code")
workflow.add_conditional_edges(
    "check_code",
    decide_to_finish,
    {
        "end": END,
        "reflect": "reflect",
        "generate": "generate",
    },
)
workflow.add_edge("reflect", "generate")
app = workflow.compile()
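
To see which path the conditional edge produces, the routing can be traced without calling any model. This sketch re-implements the decision rule from `decide_to_finish` as a pure function and walks the edges for a scripted list of check results. `route` and `trace` are hypothetical helpers for illustration, not LangGraph APIs.

```python
from typing import List


def route(error: str, iterations: int, max_iterations: int = 3,
          flag: str = "do not reflect") -> str:
    """Same decision rule as decide_to_finish, without the prints."""
    if error == "no" or iterations == max_iterations:
        return "end"
    return "reflect" if flag == "reflect" else "generate"


def trace(check_results: List[str], flag: str = "do not reflect",
          max_iterations: int = 3) -> List[str]:
    """Walk generate -> check_code -> ... for scripted check outcomes."""
    path: List[str] = []
    iterations = 0
    for error in check_results:
        path.append("generate")
        iterations += 1  # generate increments the counter
        path.append("check_code")
        next_node = route(error, iterations, max_iterations, flag)
        if next_node == "end":
            path.append("END")
            break
        if next_node == "reflect":
            path.append("reflect")  # reflect always leads back to generate
    return path


retry_path = trace(["yes", "no"])
reflect_path = trace(["yes", "yes", "yes"], flag="reflect")
print(retry_path)
print(reflect_path)
```

With three consecutive failures the walk still terminates, because the iteration cap routes to `END` regardless of the error flag.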

In [13]:
question = "How can I directly pass a string to a runnable and use it to construct the input needed for my prompt?"
app.invoke({"messages": [("user", question)], "iterations": 0})

---GENERATING CODE SOLUTION---
---CHECKING CODE---
---CODE IMPORT CHECK: FAILED---
---DECISION: RE-TRY SOLUTION---
---GENERATING CODE SOLUTION---
---CHECKING CODE---
---CODE IMPORT CHECK: FAILED---
---DECISION: RE-TRY SOLUTION---
---GENERATING CODE SOLUTION---
---CHECKING CODE---
---CODE BLOCK CHECK: FAILED---
---DECISION: FINISH---


{'error': 'yes',
 'messages': [('user',
 'How can I directly pass a string to a runnable and use it to construct the input needed for my prompt?'),
 ('assistant',
 'Passing a string directly to a runnable in LCEL \n Imports: from langchain_core.prompts import PromptTemplate\nfrom langchain_core import Runnable \n Code: # Define a custom runnable class that accepts a string and constructs the input for a prompt\nclass StringInputRunnable(Runnable):\n def __init__(self, prompt_template):\n self.prompt_template = prompt_template\n\n async def invoke(self, input_data):\n # Assuming input_data is a string, use it to construct the prompt input\n prompt_input = {\'text\': input_data}\n # Generate the prompt using the provided template\n generated_prompt = self.prompt_template.invoke(prompt_input)\n return generated_prompt\n\n# Example usage\nprompt_template = PromptTemplate.from_template("Your prompt here with {text}")\nstring_input_runnable = StringInputRunnable(prompt_template)\n\n# Example

## Eval

[Here](https://smith.langchain.com/public/326674a6-62bd-462d-88ae-eea49d503f9d/d) is a public dataset of LCEL questions. 

I saved this as `test-LCEL-code-gen`.

You can also find the csv [here](https://github.com/langchain-ai/lcel-teacher/blob/main/eval/eval.csv).

In [14]:
import langsmith

client = langsmith.Client()

Creating the client requires a LangSmith API key. Without one, the call above fails with `LangSmithUserError: API key must be provided when using hosted LangSmith API`.

In [None]:
# Clone the dataset to your tenant to use it
public_dataset = (
    "https://smith.langchain.com/public/326674a6-62bd-462d-88ae-eea49d503f9d/d"
)
client.clone_public_dataset(public_dataset)

Custom evals.

In [None]:
from langsmith.schemas import Example, Run


def check_import(run: Run, example: Example) -> dict:
    imports = run.outputs.get("imports")
    try:
        exec(imports)
        return {"key": "import_check", "score": 1}
    except Exception:
        return {"key": "import_check", "score": 0}


def check_execution(run: Run, example: Example) -> dict:
    imports = run.outputs.get("imports")
    code = run.outputs.get("code")
    try:
        exec(imports + "\n" + code)
        return {"key": "code_execution_check", "score": 1}
    except Exception:
        return {"key": "code_execution_check", "score": 0}
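
Each evaluator returns a dict with a `key` and a 0/1 `score`, so per-example results can be averaged into a pass rate. A sketch of that aggregation, assuming plain dicts in place of `Run` objects: the local evaluator functions, `sample_outputs`, and `pass_rate` are illustrative, use made-up data, and make no LangSmith calls.

```python
from collections import defaultdict


def check_import(outputs: dict) -> dict:
    try:
        exec(outputs["imports"], {})
        return {"key": "import_check", "score": 1}
    except Exception:
        return {"key": "import_check", "score": 0}


def check_execution(outputs: dict) -> dict:
    try:
        exec(outputs["imports"] + "\n" + outputs["code"], {})
        return {"key": "code_execution_check", "score": 1}
    except Exception:
        return {"key": "code_execution_check", "score": 0}


# Made-up outputs for illustration only
sample_outputs = [
    {"imports": "import math", "code": "math.sqrt(4)"},
    {"imports": "import math", "code": "math.sqrt(-1)"},  # ValueError at run time
    {"imports": "import module_that_does_not_exist_xyz", "code": "pass"},
    {"imports": "import json", "code": "json.dumps([1, 2])"},
]


def pass_rate(outputs_list, evaluators) -> dict:
    """Average each evaluator's 0/1 scores across all outputs."""
    scores = defaultdict(list)
    for outputs in outputs_list:
        for evaluator in evaluators:
            result = evaluator(outputs)
            scores[result["key"]].append(result["score"])
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


rates = pass_rate(sample_outputs, [check_import, check_execution])
print(rates)
```

An output can pass the import check and still fail execution, which is why the two scores are tracked separately.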

Compare LangGraph to Context Stuffing.

In [None]:
def predict_base_case(example: dict):
    """Context stuffing"""
    # code_gen_chain already returns a structured `code` object
    solution = code_gen_chain.invoke(
        {"context": concatenated_content, "messages": [("user", example["question"])]}
    )
    return {"imports": solution.imports, "code": solution.code}


def predict_langgraph(example: dict):
    """LangGraph"""
    graph = app.invoke({"messages": [("user", example["question"])], "iterations": 0})
    solution = graph["generation"]
    return {"imports": solution.imports, "code": solution.code}

In [None]:
from langsmith.evaluation import evaluate

# Evaluator
code_evalulator = [check_import, check_execution]

# Dataset
dataset_name = "test-LCEL-code-gen"

In [None]:
# Run base case
experiment_results_ = evaluate(
    predict_base_case,
    data=dataset_name,
    evaluators=code_evalulator,
    experiment_prefix=f"test-without-langgraph-{expt_llm}",
    max_concurrency=2,
    metadata={
        "llm": expt_llm,
    },
)

In [None]:
# Run with langgraph
experiment_results = evaluate(
    predict_langgraph,
    data=dataset_name,
    evaluators=code_evalulator,
    experiment_prefix=f"test-with-langgraph-{expt_llm}-{flag}",
    max_concurrency=2,
    metadata={
        "llm": expt_llm,
        "feedback": flag,
    },
)

`Results:`

* `LangGraph outperforms base case`: adding a re-try loop improves performance
* `Reflection did not help`: reflecting prior to the re-try led to a regression vs. just passing errors directly back to the LLM
* `GPT-4 outperforms Claude3`: Claude3 had 3 runs (Opus) and 1 run (Haiku) fail due to tool-use errors

https://smith.langchain.com/public/78a3d858-c811-4e46-91cb-0f10ef56260b/d