{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Welcome to Lab 3 for Week 1 Day 4\n",
    "\n",
    "Today we're going to build something with immediate value!\n",
    "\n",
    "In the folder `me` I've put a single file `linkedin.pdf` - it's a PDF download of my LinkedIn profile.\n",
    "\n",
    "Please replace it with yours, and make sure the path passed to `PdfReader` in the code below matches your file name.\n",
    "\n",
    "I've also made a file called `summary.txt` in the same folder. Replace its contents with a short summary of your own background.\n",
    "\n",
    "We're not going to use Tools just yet - we're going to add the tool tomorrow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table style=\"margin: 0; text-align: left; width:100%\">\n",
    "    <tr>\n",
    "        <td style=\"width: 150px; height: 150px; vertical-align: middle;\">\n",
    "            <img src=\"../assets/tools.png\" width=\"150\" height=\"150\" style=\"display: block;\" />\n",
    "        </td>\n",
    "        <td>\n",
    "            <h2 style=\"color:#00bfff;\">Looking up packages</h2>\n",
    "            <span style=\"color:#00bfff;\">In this lab, we're going to use the wonderful Gradio package for building quick UIs, \n",
    "            and we're also going to use the popular pypdf PDF reader (the successor to PyPDF2). You can get guides to these packages by asking \n",
    "            ChatGPT or Claude, and you can find most open-source Python packages on the repository <a href=\"https://pypi.org\">https://pypi.org</a>.\n",
    "            </span>\n",
    "        </td>\n",
    "    </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If you don't know what any of these packages do - you can always ask ChatGPT for a guide!\n",
    "\n",
    "from dotenv import load_dotenv\n",
    "from openai import OpenAI\n",
    "from pypdf import PdfReader\n",
    "import gradio as gr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "load_dotenv(override=True)\n",
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Ignoring wrong pointing object 6 0 (offset 0)\n",
      "Ignoring wrong pointing object 8 0 (offset 0)\n",
      "Ignoring wrong pointing object 10 0 (offset 0)\n",
      "Ignoring wrong pointing object 13 0 (offset 0)\n",
      "Ignoring wrong pointing object 22 0 (offset 0)\n",
      "Ignoring wrong pointing object 23 0 (offset 0)\n"
     ]
    }
   ],
   "source": [
    "reader = PdfReader(\"me/Arnav_Agrawal_Resume_2025.pdf\")\n",
    "linkedin = \"\"\n",
    "for page in reader.pages:\n",
    "    text = page.extract_text()\n",
    "    if text:\n",
    "        linkedin += text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Arnav\tAgrawal\t\n",
      "Jersey\tCity,\tNJ\t|\tEmail\t|\tLinkedIn\t|\t+1\t908-525-6248\t\n",
      "OBJECTIVE\t\n",
      "ProBicient\tanalytics\tprofessional\tfocused\ton\tdelivering\tbusiness\timpact\tthrough\tend-to-end\tdata\tinitiatives.\tSkilled\tin\t\n",
      "building\tinnovative,\tinsight-driven\tsolutions\tthat\tachieve\tmeasurable\tresults.\t\n",
      "EDUCATION\t \t\n",
      "Stevens\tInstitute\tof\tTechnology,\tHoboken,\tNJ\t\t \t \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tDecember\t2022\t\n",
      "Master\tof\tScience,\tData\tScience\t \t\n",
      "GPA:\t3.87\tRelated\tCourses:\tOptimization\tMethods,\tWeb\tMining,\tDeep\tLearning,\tApplied\tMachine\tLearning,\tStatistical\t\n",
      "Methods,\tTime\tSeries\tAnalysis,\tPattern\tRecognition\t&\tClassiBication,\tGraph\tTheory\t\n",
      "Manipal\tInstitute\tof\tTechnology,\tManipal,\tIndia\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tJ u l y \t2021\t\n",
      "Bachelor\tof\tTechnology,\tElectronics\tand\tCommunication\tEngineering,\tMinor\tin\tComputational\tMathematics\t\n",
      "CGPA:\t8.83\tRelated\tCourses:\tComputer\tVision,\tLinux\tand\tShell\tScripting,\tSQL,\tand\tDatabase\tManagement\t\n",
      "SKILLS\t \t\n",
      "• Language\t&\tTools:\tPython,\tSQL,\tC++,\tC#,\tR,\tSpark,\tTableau,\tMS\tOf@ice,\tHTML/CSS\t\n",
      "• Frameworks\t&\tPlatforms:\tGit,\tHuggingFace\tTransformers,\tLangChain,\tCI/CD\tPipeline\t\n",
      "• Techniques:\tGenerative\tAI\t(LLMs,\tPrompt\tEngineering,\tAgentic\tAI),\tNLP ,\tDeep\tLearning,\tMachine\tLearning,\tData\t\n",
      "Analytics,\tFeature\tEngineering,\tModel\tEvaluation\t\n",
      "• CertiQications:\tNVIDIA\tCerti@ied\tAssociate:\tGenerative\tAI\tLLMs,\tIBM\tData\tScience\tSpecialization\t\n",
      "EXPERIENCE\t \t\n",
      "EXL\tService\tAnalytics,\tNew\tYork\tCity,\tUSA\t(Banking\t&\tFinancial\tAnalytics)\t\t \t\t\t\t\t\t\t\t\t\t\t \t\t\t\t\t\t\t\t\t\t\t\t\t\tMay\t2022\t–\tPresent\t\n",
      "Data\tScientist\t(Consultant\tto\tLeading\tU.S.\tBank)\t\n",
      "Suspicious\tActivity\tReport\t(SAR)\tReview\tProcess\tAutomation\t\n",
      "• Streamlined\tcase\treviews\tby\tautomating\tworkBlows,\treducing\tmanual\teffort\tand\taccelerating\tturnaround\ttime\t\n",
      "• Implemented\ta\t.NET\tCore\tUI\tand\tbackend\tsystem\tto\tautomate\tdata\taggregation\tvia\tSQL\tand\tAPIs,\tdelivered\t\n",
      "using\tAgile\tpractices\tand\tCI/CD\tpipelines\t\n",
      "• Elevated\tworkBlow\tefBiciency\tand\tlaid\tthe\tfoundation\tfor\tGenAI\tintegration\tto\tauto-generate\tCase\tNarratives\t\n",
      "and\tDigests,\tenabling\tfaster,\tintelligence-driven\tdecision-making\t\n",
      "• Spearheaded\ta\t7-member\tteam,\toverseeing\tdelivery,\tstakeholder\talignment,\tand\tongoing\tfeature\tplanning\t\n",
      "Conversational\tBI\t\n",
      "• Conceptualized\ta\tnatural\tlanguage\tinterface\tto\tdemocratize\tcredit\tcard\tdata\tinsights\tfor\tbusiness\tusers\t\n",
      "• Designed\tan\tNLP-powered\tchatbot\twith\ttext-to-SQL\tcapabilities\tfor\treal-time\tdata\tquerying\tand\texploration\t\n",
      "• Delivered\tdynamic\tinsights,\tcharts,\ttables,\tand\tSQL\toutputs\tvia\tconversational\tinputs,\tboosting\tself-serve\t\n",
      "analytics\tadoption\t\n",
      "• Drove\tkey\tsolution\tphases—from\tarchitecture\tand\tprototyping\tto\tdeployment\tand\tstakeholder\tdemonstrations\t\n",
      "Risk\tDecisioning-as-a-Service\t(RDaaS)\t\n",
      "• Architected\ta\tdigital\tlending\tsolution\tto\thelp\tbanks\toptimize\tapproval\trates\twhile\tmaintaining\trisk\tthresholds\t\n",
      "• Orchestrated\ta\tscalable\tML\tpipeline\tfor\tdata\tingestion,\tmodeling,\tand\texplainability\ton\tcloud\tinfrastructure\t\n",
      "• Accelerated\ttime-to-decision\tby\t4–6x\tand\tenabled\tup\tto\t40%\tincrease\tin\tvolume\twith\tno\tadded\tcredit\texposure\t\n",
      "• Presented\toutcomes\tand\tplatform\tcapabilities\tto\tsenior\tleadership\tto\tdrive\tadoption\tand\tstrategic\talignment\t\n",
      "Virtuous\tTransactional\tAnalytics,\tNoida,\tIndia\t(Healthcare\tAnalytics)\t\t \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tJanuary\t2021\t–\tJuly\t2021\t\n",
      "Data\tScience\tIntern\t-\tAutomation\tof\tcase-intake\tfor\tclassifying\tDrug\tSafety\tInformation\t(Pharmacovigilance)\t\n",
      "• Administered\tNLP\ttechniques\t(NER,\tcoreference,\trelation\textraction)\tto\tbiomedical\tliterature,\tachieving\t90%\t\n",
      "accuracy\tin\tdrug\tsafety\tcase\tintake\t\n",
      "• Deployed\ta\t99%-accurate\tspam\tBilter\tand\tUI\tto\tautomate\tgeneration\tof\tIndividual\tCase\tSafety\tReports\t\n",
      "Solytics\tPartners,\tPune,\tIndia\t(Banking\t&\tFinancial\tAnalytics)\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tMay\t2020\t–\tAugust\t2020\t\n",
      "Data\tScience\tIntern\t-\tAnti-Money\tLaundering\t(AML)\t\n",
      "• Developed\tML\tmodels\tfor\tfraud\tdetection\tand\ta\tcredit\trisk\tscorecard\tusing\tWOE/IV\tfor\taccurate\tcredit\tscoring\t\n",
      "• Created\ta\tweb-based\tAutoML\tinterface\tthat\tenabled\tmodel\tconBiguration\tand\toutput\tvisualization\t\n",
      "PROJECTS\t\n",
      "Post-OCR\tCorrection\t\t \t \t \t\t\t\t\t\t\t \t \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tJanuary\t2022\t–\tMay\t2022\t\n",
      "• Implemented\ta\tBERT-based\tcontext-aware\terror\tcorrection\tsystem\tusing\tHuggingFace\tNeuspell;\timproved\tOCR\t\n",
      "output\taccuracy\tto\t97%\t\n",
      "Ted\tTalk\tRecommender\tSystem\t\t \t \t \t \t\t\t\t\t\t \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tSeptember\t2021\t–\tDecember\t2021\t\n",
      "• Built\ta\tcontent-based\trecommender\tusing\ttranscript\tsentiment\tanalysis\tand\ttopic\tmodeling;\tachieved\t83%\t\n",
      "recommendation\taccuracy\t\n",
      "Smart\tFarming\tusing\tConvolutional\tNeural\tNetworks\t\t\t \t \t \t\t\t\t\t\t\t\t\t\t\t\t\t\tMarch\t2019\t–\tOctober\t2019\t\n",
      "• Engineered\tan\tArduino-powered\tdevice\twith\tCNNs\tfor\treal-time\tcrop\thealth\tmonitoring;\treached\t96%\taccuracy\t\n",
      "using\tlow-cost\tsensors\n"
     ]
    }
   ],
   "source": [
    "print(linkedin)"
   ]
  },
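  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The extracted text above is full of tab characters and long tab runs, which carry no meaning but still take up space once the text is pasted into a prompt. As a minimal sketch (not part of the original lab), the hypothetical helper `normalize_whitespace` below collapses each run of tabs and spaces into a single space and drops empty lines. It is shown on a small sample string; whether to apply it to `linkedin` is your call."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def normalize_whitespace(text: str) -> str:\n",
    "    \"\"\"Collapse runs of tabs/spaces into one space, trim each line, drop empty lines.\"\"\"\n",
    "    lines = [re.sub(r\"[ \\t]+\", \" \", line).strip() for line in text.splitlines()]\n",
    "    return \"\\n\".join(line for line in lines if line)\n",
    "\n",
    "# Sample input imitating the tab-heavy output of extract_text()\n",
    "sample = \"EDUCATION\\t \\t\\nMaster\\tof\\tScience,\\tData\\tScience\\t\\t\\t2022\\t\\n\"\n",
    "print(normalize_whitespace(sample))"
   ]
  },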
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"me/summary.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    summary = f.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "name = \"Arnav Agrawal\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = f\"You are acting as {name}. You are answering questions on {name}'s website, \\\n",
    "particularly questions related to {name}'s career, background, skills and experience. \\\n",
    "Your responsibility is to represent {name} for interactions on the website as faithfully as possible. \\\n",
    "You are given a summary of {name}'s background and LinkedIn profile which you can use to answer questions. \\\n",
    "Be professional and engaging, as if talking to a potential client or future employer who came across the website. \\\n",
    "If you don't know the answer, say so.\"\n",
    "\n",
    "system_prompt += f\"\\n\\n## Summary:\\n{summary}\\n\\n## LinkedIn Profile:\\n{linkedin}\\n\\n\"\n",
    "system_prompt += f\"With this context, please chat with the user, always staying in character as {name}.\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"You are acting as Arnav Agrawal. You are answering questions on Arnav Agrawal's website, particularly questions related to Arnav Agrawal's career, background, skills and experience. Your responsibility is to represent Arnav Agrawal for interactions on the website as faithfully as possible. You are given a summary of Arnav Agrawal's background and LinkedIn profile which you can use to answer questions. Be professional and engaging, as if talking to a potential client or future employer who came across the website. If you don't know the answer, say so.\\n\\n## Summary:\\nMy name is Arnav Agrawal. I'm an data scientist, business analyst. I'm originally from India, but I moved to NYC in 2021.\\nA highly motivated, diligent, focused and adaptable individual. A Data Science Enthusiast with experience in Data Analytics, Machine Learning, Natural Language Processing and Web Development. I have worked in Financial and Healthcare sector. I have graduated from Stevens Institute of Technology by completing Master’s in Data Science. 
I would like to connect with different people to know moreabout the industry world and how to make a career in it.\\n\\n## LinkedIn Profile:\\nArnav\\tAgrawal\\t\\nJersey\\tCity,\\tNJ\\t|\\tEmail\\t|\\tLinkedIn\\t|\\t+1\\t908-525-6248\\t\\nOBJECTIVE\\t\\nProBicient\\tanalytics\\tprofessional\\tfocused\\ton\\tdelivering\\tbusiness\\timpact\\tthrough\\tend-to-end\\tdata\\tinitiatives.\\tSkilled\\tin\\t\\nbuilding\\tinnovative,\\tinsight-driven\\tsolutions\\tthat\\tachieve\\tmeasurable\\tresults.\\t\\nEDUCATION\\t \\t\\nStevens\\tInstitute\\tof\\tTechnology,\\tHoboken,\\tNJ\\t\\t \\t \\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t \\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tDecember\\t2022\\t\\nMaster\\tof\\tScience,\\tData\\tScience\\t \\t\\nGPA:\\t3.87\\tRelated\\tCourses:\\tOptimization\\tMethods,\\tWeb\\tMining,\\tDeep\\tLearning,\\tApplied\\tMachine\\tLearning,\\tStatistical\\t\\nMethods,\\tTime\\tSeries\\tAnalysis,\\tPattern\\tRecognition\\t&\\tClassiBication,\\tGraph\\tTheory\\t\\nManipal\\tInstitute\\tof\\tTechnology,\\tManipal,\\tIndia\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tJ u l y \\t2021\\t\\nBachelor\\tof\\tTechnology,\\tElectronics\\tand\\tCommunication\\tEngineering,\\tMinor\\tin\\tComputational\\tMathematics\\t\\nCGPA:\\t8.83\\tRelated\\tCourses:\\tComputer\\tVision,\\tLinux\\tand\\tShell\\tScripting,\\tSQL,\\tand\\tDatabase\\tManagement\\t\\nSKILLS\\t \\t\\n• Language\\t&\\tTools:\\tPython,\\tSQL,\\tC++,\\tC#,\\tR,\\tSpark,\\tTableau,\\tMS\\tOf@ice,\\tHTML/CSS\\t\\n• Frameworks\\t&\\tPlatforms:\\tGit,\\tHuggingFace\\tTransformers,\\tLangChain,\\tCI/CD\\tPipeline\\t\\n• Techniques:\\tGenerative\\tAI\\t(LLMs,\\tPrompt\\tEngineering,\\tAgentic\\tAI),\\tNLP 
,\\tDeep\\tLearning,\\tMachine\\tLearning,\\tData\\t\\nAnalytics,\\tFeature\\tEngineering,\\tModel\\tEvaluation\\t\\n• CertiQications:\\tNVIDIA\\tCerti@ied\\tAssociate:\\tGenerative\\tAI\\tLLMs,\\tIBM\\tData\\tScience\\tSpecialization\\t\\nEXPERIENCE\\t \\t\\nEXL\\tService\\tAnalytics,\\tNew\\tYork\\tCity,\\tUSA\\t(Banking\\t&\\tFinancial\\tAnalytics)\\t\\t \\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t \\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMay\\t2022\\t–\\tPresent\\t\\nData\\tScientist\\t(Consultant\\tto\\tLeading\\tU.S.\\tBank)\\t\\nSuspicious\\tActivity\\tReport\\t(SAR)\\tReview\\tProcess\\tAutomation\\t\\n• Streamlined\\tcase\\treviews\\tby\\tautomating\\tworkBlows,\\treducing\\tmanual\\teffort\\tand\\taccelerating\\tturnaround\\ttime\\t\\n• Implemented\\ta\\t.NET\\tCore\\tUI\\tand\\tbackend\\tsystem\\tto\\tautomate\\tdata\\taggregation\\tvia\\tSQL\\tand\\tAPIs,\\tdelivered\\t\\nusing\\tAgile\\tpractices\\tand\\tCI/CD\\tpipelines\\t\\n• Elevated\\tworkBlow\\tefBiciency\\tand\\tlaid\\tthe\\tfoundation\\tfor\\tGenAI\\tintegration\\tto\\tauto-generate\\tCase\\tNarratives\\t\\nand\\tDigests,\\tenabling\\tfaster,\\tintelligence-driven\\tdecision-making\\t\\n• Spearheaded\\ta\\t7-member\\tteam,\\toverseeing\\tdelivery,\\tstakeholder\\talignment,\\tand\\tongoing\\tfeature\\tplanning\\t\\nConversational\\tBI\\t\\n• Conceptualized\\ta\\tnatural\\tlanguage\\tinterface\\tto\\tdemocratize\\tcredit\\tcard\\tdata\\tinsights\\tfor\\tbusiness\\tusers\\t\\n• Designed\\tan\\tNLP-powered\\tchatbot\\twith\\ttext-to-SQL\\tcapabilities\\tfor\\treal-time\\tdata\\tquerying\\tand\\texploration\\t\\n• Delivered\\tdynamic\\tinsights,\\tcharts,\\ttables,\\tand\\tSQL\\toutputs\\tvia\\tconversational\\tinputs,\\tboosting\\tself-serve\\t\\nanalytics\\tadoption\\t\\n• Drove\\tkey\\tsolution\\tphases—from\\tarchitecture\\tand\\tprototyping\\tto\\tdeployment\\tand\\tstakeholder\\tdemonstrations\\t\\nRisk\\tDecisioning-as-a-Service\\t(RDaaS)\\t\\n• 
Architected\\ta\\tdigital\\tlending\\tsolution\\tto\\thelp\\tbanks\\toptimize\\tapproval\\trates\\twhile\\tmaintaining\\trisk\\tthresholds\\t\\n• Orchestrated\\ta\\tscalable\\tML\\tpipeline\\tfor\\tdata\\tingestion,\\tmodeling,\\tand\\texplainability\\ton\\tcloud\\tinfrastructure\\t\\n• Accelerated\\ttime-to-decision\\tby\\t4–6x\\tand\\tenabled\\tup\\tto\\t40%\\tincrease\\tin\\tvolume\\twith\\tno\\tadded\\tcredit\\texposure\\t\\n• Presented\\toutcomes\\tand\\tplatform\\tcapabilities\\tto\\tsenior\\tleadership\\tto\\tdrive\\tadoption\\tand\\tstrategic\\talignment\\t\\nVirtuous\\tTransactional\\tAnalytics,\\tNoida,\\tIndia\\t(Healthcare\\tAnalytics)\\t\\t \\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tJanuary\\t2021\\t–\\tJuly\\t2021\\t\\nData\\tScience\\tIntern\\t-\\tAutomation\\tof\\tcase-intake\\tfor\\tclassifying\\tDrug\\tSafety\\tInformation\\t(Pharmacovigilance)\\t\\n• Administered\\tNLP\\ttechniques\\t(NER,\\tcoreference,\\trelation\\textraction)\\tto\\tbiomedical\\tliterature,\\tachieving\\t90%\\t\\naccuracy\\tin\\tdrug\\tsafety\\tcase\\tintake\\t\\n• Deployed\\ta\\t99%-accurate\\tspam\\tBilter\\tand\\tUI\\tto\\tautomate\\tgeneration\\tof\\tIndividual\\tCase\\tSafety\\tReports\\t\\nSolytics\\tPartners,\\tPune,\\tIndia\\t(Banking\\t&\\tFinancial\\tAnalytics)\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMay\\t2020\\t–\\tAugust\\t2020\\t\\nData\\tScience\\tIntern\\t-\\tAnti-Money\\tLaundering\\t(AML)\\t\\n• Developed\\tML\\tmodels\\tfor\\tfraud\\tdetection\\tand\\ta\\tcredit\\trisk\\tscorecard\\tusing\\tWOE/IV\\tfor\\taccurate\\tcredit\\tscoring\\t\\n• Created\\ta\\tweb-based\\tAutoML\\tinterface\\tthat\\tenabled\\tmodel\\tconBiguration\\tand\\toutput\\tvisualization\\t\\nPROJECTS\\t\\nPost-OCR\\tCorrection\\t\\t \\t \\t \\t\\t\\t\\t\\t\\t\\t \\t 
\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tJanuary\\t2022\\t–\\tMay\\t2022\\t\\n• Implemented\\ta\\tBERT-based\\tcontext-aware\\terror\\tcorrection\\tsystem\\tusing\\tHuggingFace\\tNeuspell;\\timproved\\tOCR\\t\\noutput\\taccuracy\\tto\\t97%\\t\\nTed\\tTalk\\tRecommender\\tSystem\\t\\t \\t \\t \\t \\t\\t\\t\\t\\t\\t \\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tSeptember\\t2021\\t–\\tDecember\\t2021\\t\\n• Built\\ta\\tcontent-based\\trecommender\\tusing\\ttranscript\\tsentiment\\tanalysis\\tand\\ttopic\\tmodeling;\\tachieved\\t83%\\t\\nrecommendation\\taccuracy\\t\\nSmart\\tFarming\\tusing\\tConvolutional\\tNeural\\tNetworks\\t\\t\\t \\t \\t \\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMarch\\t2019\\t–\\tOctober\\t2019\\t\\n• Engineered\\tan\\tArduino-powered\\tdevice\\twith\\tCNNs\\tfor\\treal-time\\tcrop\\thealth\\tmonitoring;\\treached\\t96%\\taccuracy\\t\\nusing\\tlow-cost\\tsensors\\n\\nWith this context, please chat with the user, always staying in character as Arnav Agrawal.\""
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "system_prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "def chat(message, history):\n",
    "    messages = [{\"role\": \"system\", \"content\": system_prompt}] + history + [{\"role\": \"user\", \"content\": message}]\n",
    "    response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)\n",
    "    return response.choices[0].message.content"
   ]
  },
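  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `type=\"messages\"`, Gradio passes `history` to `chat` as a list of message dictionaries. Depending on the Gradio version, these may carry extra keys beyond `role` and `content` (an assumption worth checking on your install), and some OpenAI-compatible endpoints reject unknown keys. A minimal sketch of a hypothetical helper, `clean_history`, that keeps only the two standard keys. It could be applied as `history = clean_history(history)` at the top of `chat`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def clean_history(history):\n",
    "    \"\"\"Return a copy of history with only the role and content of each message.\"\"\"\n",
    "    return [{\"role\": m[\"role\"], \"content\": m[\"content\"]} for m in history]\n",
    "\n",
    "# The extra key here is illustrative, not a claim about what Gradio sends\n",
    "example = [{\"role\": \"user\", \"content\": \"Hi\", \"metadata\": None}]\n",
    "print(clean_history(example))"
   ]
  },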
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "* Running on local URL:  http://127.0.0.1:7862\n",
      "* To create a public link, set `share=True` in `launch()`.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"http://127.0.0.1:7862/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gr.ChatInterface(chat, type=\"messages\").launch()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A lot is about to happen...\n",
    "\n",
    "1. Be able to ask an LLM to evaluate an answer\n",
    "2. Be able to rerun if the answer fails evaluation\n",
    "3. Put this together into one workflow\n",
    "\n",
    "All without any Agentic framework!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a Pydantic model for the Evaluation\n",
    "\n",
    "from pydantic import BaseModel\n",
    "\n",
    "class Evaluation(BaseModel):\n",
    "    is_acceptable: bool\n",
    "    feedback: str\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "evaluator_system_prompt = f\"You are an evaluator that decides whether a response to a question is acceptable. \\\n",
    "You are provided with a conversation between a User and an Agent. Your task is to decide whether the Agent's latest response is of acceptable quality. \\\n",
    "The Agent is playing the role of {name} and is representing {name} on their website. \\\n",
    "The Agent has been instructed to be professional and engaging, as if talking to a potential client or future employer who came across the website. \\\n",
    "The Agent has been provided with context on {name} in the form of their summary and LinkedIn details. Here's the information:\"\n",
    "\n",
    "evaluator_system_prompt += f\"\\n\\n## Summary:\\n{summary}\\n\\n## LinkedIn Profile:\\n{linkedin}\\n\\n\"\n",
    "evaluator_system_prompt += \"With this context, please evaluate the latest response, replying with whether the response is acceptable and your feedback.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluator_user_prompt(reply, message, history):\n",
    "    # Build the evaluator's user prompt from the conversation so far, the latest question and the candidate reply\n",
    "    user_prompt = f\"Here's the conversation between the User and the Agent: \\n\\n{history}\\n\\n\"\n",
    "    user_prompt += f\"Here's the latest message from the User: \\n\\n{message}\\n\\n\"\n",
    "    user_prompt += f\"Here's the latest response from the Agent: \\n\\n{reply}\\n\\n\"\n",
    "    user_prompt += \"Please evaluate the response, replying with whether it is acceptable and your feedback.\"\n",
    "    return user_prompt"
   ]
  },
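  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`evaluator_user_prompt` interpolates `history` directly, so the evaluator sees the Python repr of a list of dictionaries. That works, but a plain transcript may be easier for a model to read. A minimal sketch, assuming each history entry has `role` and `content` keys; `format_history` is a hypothetical helper, not part of the original lab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def format_history(history):\n",
    "    \"\"\"Render a list of message dicts as a simple 'User: ... / Agent: ...' transcript.\"\"\"\n",
    "    labels = {\"user\": \"User\", \"assistant\": \"Agent\"}\n",
    "    # Unknown roles fall back to the raw role name\n",
    "    lines = [f\"{labels.get(m['role'], m['role'])}: {m['content']}\" for m in history]\n",
    "    return \"\\n\".join(lines)\n",
    "\n",
    "example = [\n",
    "    {\"role\": \"user\", \"content\": \"Hi\"},\n",
    "    {\"role\": \"assistant\", \"content\": \"Hello! How can I help?\"},\n",
    "]\n",
    "print(format_history(example))"
   ]
  },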
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "gemini = OpenAI(\n",
    "    api_key=os.getenv(\"GOOGLE_API_KEY\"), \n",
    "    base_url=\"https://generativelanguage.googleapis.com/v1beta/openai/\"\n",
    ")"
   ]
  },
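  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The client above reads `GOOGLE_API_KEY` with `os.getenv`, which returns `None` when the variable is unset, so a missing key may only surface later as an authentication error. A minimal sketch of a fail-fast check; `missing_env_vars` is a hypothetical helper, and the variable names listed are the ones this notebook appears to rely on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "def missing_env_vars(names):\n",
    "    \"\"\"Return the names that are unset or empty in the environment.\"\"\"\n",
    "    return [n for n in names if not os.getenv(n)]\n",
    "\n",
    "missing = missing_env_vars([\"OPENAI_API_KEY\", \"GOOGLE_API_KEY\"])\n",
    "if missing:\n",
    "    print(\"Missing environment variables:\", \", \".join(missing))\n",
    "else:\n",
    "    print(\"All expected API keys are set\")"
   ]
  },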
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(reply, message, history) -> Evaluation:\n",
    "    # Ask Gemini to judge the reply; response_format=Evaluation has the SDK parse the answer into our Pydantic model\n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": evaluator_system_prompt},\n",
    "        {\"role\": \"user\", \"content\": evaluator_user_prompt(reply, message, history)},\n",
    "    ]\n",
    "    response = gemini.beta.chat.completions.parse(model=\"gemini-2.0-flash\", messages=messages, response_format=Evaluation)\n",
    "    return response.choices[0].message.parsed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "messages = [{\"role\": \"system\", \"content\": system_prompt}] + [{\"role\": \"user\", \"content\": \"do you hold a patent?\"}]\n",
    "response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)\n",
    "reply = response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'As of now, I do not hold any patents. My focus has been primarily on data science and analytics projects, particularly in the areas of machine learning and natural language processing. If you have any questions about my work or experience, feel free to ask!'"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reply"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Evaluation(is_acceptable=True, feedback=\"The response is acceptable. It's a straightforward and honest answer, and it directs the conversation back to Arnav's area of expertise.\")"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "evaluate(reply, \"do you hold a patent?\", messages[:1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rerun(reply, message, history, feedback):\n",
    "    # Try again, appending the rejected answer and the evaluator's feedback to the system prompt\n",
    "    updated_system_prompt = system_prompt + \"\\n\\n## Previous answer rejected\\nYou just tried to reply, but the quality control rejected your reply\\n\"\n",
    "    updated_system_prompt += f\"## Your attempted answer:\\n{reply}\\n\\n\"\n",
    "    updated_system_prompt += f\"## Reason for rejection:\\n{feedback}\\n\\n\"\n",
    "    messages = [{\"role\": \"system\", \"content\": updated_system_prompt}] + history + [{\"role\": \"user\", \"content\": message}]\n",
    "    response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "def chat(message, history):\n",
    "    # Test hook: force a bad reply when the message mentions \"patent\", so we can watch the evaluator reject it\n",
    "    if \"patent\" in message:\n",
    "        system = system_prompt + \"\\n\\nEverything in your reply needs to be in pig latin - \\\n",
    "              it is mandatory that you respond only and entirely in pig latin\"\n",
    "    else:\n",
    "        system = system_prompt\n",
    "    messages = [{\"role\": \"system\", \"content\": system}] + history + [{\"role\": \"user\", \"content\": message}]\n",
    "    response = openai.chat.completions.create(model=\"gpt-4o-mini\", messages=messages)\n",
    "    reply = response.choices[0].message.content\n",
    "\n",
    "    # Have the second model judge the reply before it is shown to the user\n",
    "    evaluation = evaluate(reply, message, history)\n",
    "\n",
    "    if evaluation.is_acceptable:\n",
    "        print(\"Passed evaluation - returning reply\")\n",
    "    else:\n",
    "        print(\"Failed evaluation - retrying\")\n",
    "        print(evaluation.feedback)\n",
    "        reply = rerun(reply, message, history, evaluation.feedback)\n",
    "    return reply"
   ]
  },
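  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In `chat` above, a rejected reply is regenerated once and the second attempt is returned without being evaluated again. A minimal sketch of a bounded retry loop that re-checks each attempt follows. `answer_with_retries`, `fake_generate` and `fake_judge` are hypothetical names, and the two `fake_` functions are offline stand-ins for the LLM calls so that the control flow can be run without API keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def answer_with_retries(generate, judge, message, max_attempts=3):\n",
    "    \"\"\"Generate a reply, regenerating with the judge's feedback until accepted or out of attempts.\"\"\"\n",
    "    feedback = None\n",
    "    reply = None\n",
    "    for attempt in range(1, max_attempts + 1):\n",
    "        reply = generate(message, feedback)\n",
    "        accepted, feedback = judge(reply)\n",
    "        if accepted:\n",
    "            break\n",
    "    # Returns the last reply (accepted or not) and how many attempts were used\n",
    "    return reply, attempt\n",
    "\n",
    "# Stand-in for the answering model: replies in pig latin until it receives feedback\n",
    "def fake_generate(message, feedback):\n",
    "    return \"Hello!\" if feedback else \"Ellohay!\"\n",
    "\n",
    "# Stand-in for the evaluator: accepts only the plain-English reply\n",
    "def fake_judge(reply):\n",
    "    return (reply == \"Hello!\", \"Please answer in plain English.\")\n",
    "\n",
    "print(answer_with_retries(fake_generate, fake_judge, \"hi\"))"
   ]
  },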
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "* Running on local URL:  http://127.0.0.1:7863\n",
      "* To create a public link, set `share=True` in `launch()`.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"http://127.0.0.1:7863/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Passed evaluation - returning reply\n",
      "Failed evaluation - retrying\n",
      "The response is not acceptable because it speaks in gibberish. This is not professional, helpful, or engaging.\n"
     ]
    }
   ],
   "source": [
    "gr.ChatInterface(chat, type=\"messages\").launch()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}