Daniel Halwell — Full Life Story (First‑Person) — Digital CV Narrative (LLM‑Ready)
Last updated: 22 September 2025
[Metadata]
Exec summary: Opening context introducing you as a scientist transitioning into AI engineering, highlighting passion for coding and ethical, results-driven principles.
Keywords: introduction, AI engineer journey, ethics, automation, passion for coding, momentum
Questions:
1. Who are you and what is your current career focus?
2. How long have you been coding and why did you start?
3. What types of work make you happiest day-to-day?
[/Metadata]
Hi, I’m Daniel. I’m a scientist‑turned‑AI engineer who likes building apps, automating processes and AI systems. I've been coding for the last
six years, after a colleague suggested I try it. It's now my favourite hobby, and becoming an AI engineer is my main career objective.
I’m happiest when I’m writing Python, wiring up data, and shipping small, useful tools that unblock people. I care about
ethics, clarity, and momentum — make the right thing the easy thing, and prove it with results.
[Metadata]
Exec summary: Expands on your broader interests in AI, data science, and mathematics, anchoring your transition from analytical chemistry to tech.
Keywords: data science, automation, problem solving, kaggle, mathematics, analytical chemistry transition
Questions:
1. What other technical domains beyond AI interest you?
2. How do you engage with problem solving outside of work?
3. In what ways does your background in analytical chemistry support your move into data science?
[/Metadata]
It's not just AI systems I like; I enjoy data science and general automation too. I really like solving problems, so looking at coding competitions
on Kaggle or just trying to come up with solutions is genuinely enjoyable. I like (trying) to learn all the underpinning mathematics, as it fascinates me.
As an analytical chemist, working with lots of data has been part of my day job for some time, so making the leap to data science and then AI felt pretty natural.
—
Core identity & working values
—
[Metadata]
Exec summary: Bullet-point overview of your principles, emphasising human-centred tech, ethics, iterative building, communication style, and evidence-based mindset.
Keywords: working values, ethics, human-first, builder mindset, mentoring, communication style, evidence-driven
Questions:
1. What foundational values steer your approach to technology and product delivery?
2. How do you prefer to translate ideas into execution?
[/Metadata]
• Human‑first technologist. Tools are only useful if they help people make better decisions faster.
• Ethics matter. I avoid work tied to fossil fuels, weapons, surveillance, or anything that harms people.
• Builder’s mindset. I prefer to get an idea down in a flow diagram first; I like visual representations. Once I'm clear on the vision, I start small and iterate quickly.
• Teaching & clarity. Notebooks, diagrams, docstrings, and handovers. Mentoring is part of the job.
• Plain English. I'm not the most over-the-top person; pretty laid back, I just want to get stuff done and enjoy life. Pretty sarcastic, and I love some dark humour.
• Evidence over adjectives. Numbers, before-and-after, and real bottlenecks solved.
—
[Metadata]
Exec summary: Personal origin note grounding your background in Devon and acknowledging relocation for career opportunities.
Keywords: origin story, Devon, relocation, personal background, career move motivation
Questions:
1. Where are you originally from?
[/Metadata]
I'm originally from the south west of England, in the lovely county of Devon. It was a great place to grow up, even if it is a bit out of the way.
It's a shame to move away, but you gotta go where the jobs are. I lived there up until the age of 18, when I moved to Guyana, South America, for a year and
then went on to university. My first work experiences came there, and it was nice to grow up in a quiet place where a lot of
people knew each other, close to the beach in summer, with great pubs and bars.
—
Early graft (pre‑uni): where my work ethic came from
—
[Metadata]
Exec summary: Details your early jobs across hospitality, retail, and manufacturing, highlighting the foundation of your work ethic, QA mindset, and preference for night shifts.
Keywords: early career, bar cleaning, retail experience, factory work, quality assurance, work ethic, night owl
Questions:
1. What types of jobs did you hold before university?
2. How did your roles shape your understanding of quality assurance?
3. How did these early experiences influence later method development skills?
[/Metadata]
I started working around 13, cleaning a bar on Saturday mornings — bottles, floors, stocking the bar, etc. By 16 I was at
Summerfields, working across produce and dairy: receiving deliveries and stacking shelves mainly, though I did do some checkout work.
I tried a few different shift patterns there (nights, evenings, days) and learnt that I'm a bit of a night owl.
I also did a stint in a small component factory, making parts by hand from SOPs —
counting coil turns, trimming, testing. It wasn't the most exciting job, but it was a good earner.
This was really my first foray into QA, where checking work was a priority so that things worked;
that focus on process and repeatability fed straight into my later method development work.
I also worked in an Indian restaurant, mostly behind the bar: taking orders over the phone,
making drinks, and occasionally serving drinks to the table and clearing up tables. I've always loved Indian cuisine, and
working there meant I ate quite a lot of curry. It was amazing.
I also worked in a nightclub on weekends, which was a pretty late one to be honest; I used to start around 10pm and work through until
about 3am. I did quite a few jobs there: coat check, the kitchen (making burgers and lattice fries, mainly) and
the bar. Pretty hectic.
—
Gap year in Guyana: teaching and learning to adapt
—
[Metadata]
Exec summary: Chronicles your gap-year teaching experience in Guyana, emphasizing adaptability, instructional skills, and exposure to challenging environments.
Keywords: Guyana, Project Trust, teaching, adaptability, resilience, user-centered design inspiration
Questions:
1. What program enabled you to teach in Guyana and what subjects did you cover?
2. How did you handle unexpected challenges during your placement?
3. Which teaching lessons do you carry into your user-facing design work?
[/Metadata]
Before university, I spent a year teaching in Guyana through Project Trust. I trained on the Isle of Coll, then flew out
with a cohort and split off into schools. When my volunteer roommate had to return home due to illness, I moved schools and
started again with some new roommates.
I taught maths, science, and PE to students roughly 11–16. The big lessons:
• Teaching is hard: you have to be prepared, things surprise you, and you have to be quick on your feet
• Learning isn't the same for everybody; you have to adapt to the individual
• Be clear and concise in your delivery; you gotta be fine-tuned
Those ideas still shape how I design anything user‑facing — dashboards, APIs, or agentic assistants.
This time wasn't without its challenges. My roommate getting ill was a big deal; when you're the only person around
to help in a medical emergency, it's quite a challenge. Thankfully things turned out well on that occasion, but there were
many challenges living in a country so different from your own. You get some key perspective on your own situation when
seeing the kind of poverty that some people will never see in a lifetime.
—
Loughborough & Mars Petcare: chemistry + sensors + software
—
[Metadata]
Exec summary: Narrates your MChem journey, industrial placement at Mars Petcare, development of analytical methods, and growing interest in statistics and food science.
Keywords: Loughborough MChem, Mars Petcare, LC-MS, GC-MS, method development, maillard reaction, statistics, sensory science
Questions:
1. Why did you choose Loughborough and pursue an industrial placement at Mars?
2. Which analytical instruments and methods did you master during the placement?
3. How did your work with the Maillard reaction influence your interests?
4. What statistical techniques did you apply in flavour development projects?
5. What publication resulted from your work in this period?
[/Metadata]
I’d already accepted a place for MChem at Loughborough. I wanted industry experience in the degree, so I took an
industrial placement at Mars Petcare in Verden, Germany. I trained on LC‑MS, GC‑MS, GC‑FID; moved from sample prep
to method development; migrated a tricky amino‑acid analysis from LC to GC with derivatisation; added additional amino
acids; and demonstrated linearity and accuracy. First taste of method development and optimisation — and I loved it.
Living in Germany was a great experience and definitely one of the best places I've ever lived.
I worked on flavour development of cat food, running feeding trials with recipes that I put together. This is where I started
to get more involved and interested in statistics. I set up design‑of‑experiments trials to determine the optimum concentration
of food additives to increase food uptake by the cats (they are quite picky, after all). This involved making pilot‑scale batches on plant,
running analysis and interpreting the data. All in all it was an amazing experience.
The main focus of my project there was the Maillard reaction. The Maillard reaction is a non-enzymatic browning reaction between
amino acids and reducing sugars that generates the complex flavours and brown colours associated with roasted, baked, and fried foods.
It proceeds through a cascade of steps (Schiff base → Amadori/Heyns → fragmentation and Strecker degradation → melanoidins) and is
accelerated by heat, dryness, and alkaline conditions. It made me really interested in food, and in how small changes to cooking can make
big differences in flavour profiles.
Back in the UK, I returned to Mars for a summer project near Loughborough on umami perception. I set up macros and
a software workflow so sensory panelists could record peak perception while we swabbed to quantify concentrations, and
we correlated the curves. That work was presented at a flavour symposium. It was instrumentation + sensory science +
just‑enough software — a pattern I’ve repeated since in other domains. It turned into my first publication:
“Relationship between Human Taste Perception and the Persistence of Umami Compounds in the Mouth”, Flavour Science:
Proceedings from XIII Weurman Flavour Research Symposium, 2014, pages 487–491.
Side note: the animal care standards and the environment were excellent. It mattered to me that the work respected
the animals — that balance between scientific rigour and humanity set a tone for my career.
—
A practical reset: labouring on a building site
—
[Metadata]
Exec summary: Highlights your post-graduation labouring work, underscoring appreciation for tangible progress and parallels to iterative software development.
Keywords: labouring, construction, tangible progress, iteration, motivation, work ethic
Questions:
1. What work did you do immediately after graduating?
2. How did labouring influence your appreciation for visible progress?
3. In what way do you connect physical labour to your software development mindset?
4. Why do iterative build-test cycles resonate with you?
[/Metadata]
After graduating, I worked as a labourer in Devon while job‑hunting — hauling materials through houses to back gardens,
mixing cement for brickwork and infill, clearing waste. Tangible progress at the end of each day is addictive. I still chase
that in my day‑to‑day, which really pointed me towards a career in programming; I just didn't know it yet. It's great to see progress
when you finish at the end of the day, and those small iterative cycles of build, debug, test, repeat keep me addicted to writing code.
—
Sanofi → Recipharm (2012–2021): analytical specialist in a regulated world
—
[Metadata]
Exec summary: Summarises nearly a decade of analytical chemistry work at Sanofi and Recipharm, covering E&L leadership, method transfers, investigations, and cross-functional support in regulated environments.
Keywords: Sanofi, Recipharm, analytical specialist, extractables and leachables, method validation, cGxP, investigations, manufacturing support, statistics
Questions:
1. What were your primary responsibilities at Sanofi and Recipharm?
2. How did you lead extractables and leachables studies?
3. What statistical methods did you apply during method transfers and validations?
4. How did you contribute to troubleshooting and manufacturing support?
5. How did this period strengthen your commitment to data integrity and Python/ML?
[/Metadata]
I spent nearly a decade across Sanofi and Recipharm, moving from routine QC to Analytical Specialist. My centre of gravity:
non‑routine analysis, method transfers, validations, and inspection support (MHRA, FDA, etc.). This is also when I started coding in Python.
[Metadata]
Exec summary: Enumerates your day-to-day responsibilities at Sanofi and Recipharm, covering E&L leadership, method transfers, investigations, and manufacturing support.
Keywords: responsibilities, extractables and leachables, method transfer, validation, investigations, manufacturing support
Questions:
1. What specific analytical tasks did you handle in this role?
2. How did you contribute to extractables and leachables programmes?
3. In what ways did you support method transfers and validations?
4. How did you engage in investigations and CAPA activities?
5. What types of cross-functional manufacturing collaboration did you perform?
[/Metadata]
What I did:
• Extractables & leachables (E&L). Subject‑matter lead for E&L studies, scoping and interpreting chromatographic &
spectroscopic data for materials such as plastics and elastomers. I worked with suppliers performing testing
on our behalf, drew up protocols and reports, and kept up to date on the latest advancements.
• Method transfers & validation. Equivalence testing, t‑tests, TOST, precision/accuracy studies, technical reports,
and document control in a cGxP environment (a minimal TOST sketch follows this list). This was another stage in my career where statistics was pushing me
towards data science and AI; I didn't quite know it yet, but I loved maths more than I thought I did.
I was one of the technical experts when we transferred around 60 methods to Germany following potential rule
changes after Brexit. That made me a key contact for troubleshooting, acceptance‑criteria setting and result interpretation,
and I travelled to Germany to train staff: a bit of everything.
• Investigations & CAPA. Practical Problem Solving (PPS), root‑cause analysis across engineering, manufacturing,
and quality.
• Manufacturing support. Collaborated with scientists, engineers, and microbiologists on urgent issues — from chemical
impurities to microbial contamination — often building or adapting analytical methods on the fly. I'd be testing effluent
one day and have my head in a metered‑dose‑inhaler formulation vessel the next.
• I worked in a routine QC environment for quite a few years, analysing nasal products, metered dose inhalers, and the
packaging and raw materials that went into them.
• During this time I gained expertise in HPLC, GC and Karl Fischer titration, and became a super user in GC specifically.
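To make the equivalence testing concrete, here's a minimal TOST sketch in Python with statsmodels. The sample values, sizes, and the ±2.0% margin are illustrative assumptions, not real transfer data.
```python
# Minimal TOST (two one-sided tests) sketch for a method-transfer comparison.
# Illustrative numbers only; assumes statsmodels is installed.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(42)
sending_site = rng.normal(loc=99.8, scale=0.6, size=12)    # % label claim
receiving_site = rng.normal(loc=100.1, scale=0.7, size=12) # % label claim

# Hypothetical equivalence margin: site means must agree within +/- 2.0 %.
p_value, lower, upper = ttost_ind(sending_site, receiving_site, low=-2.0, upp=2.0)
print(f"TOST p-value: {p_value:.4f}")  # p < 0.05 -> equivalent within the margin
```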
[Metadata]
Exec summary: Highlights the business impact and personal growth outcomes from your Sanofi/Recipharm tenure, including cost savings, data integrity ethos, and development of statistical expertise.
Keywords: impact, cost savings, data integrity, statistics, practical problem solving, career inflection
Questions:
1. What quantified business result did you deliver during this period?
2. How did the work reinforce your commitment to data integrity?
3. In what way did statistics influence your transition toward Python and ML?
4. What experience did you gain with the PPS tool and complex investigations?
[/Metadata]
Why it mattered:
• We resolved a critical impurity issue that delivered real cost savings (and a lot of learning).
• I developed deep respect for data integrity and traceability: if it isn’t documented, it didn’t happen.
• Statistics became second‑nature and nudged me towards Python and, later, machine learning and AI.
• I gained invaluable experience in practical problem solving (PPS). I worked on an extensive investigation using the
PPS tool to solve extremely complex issues, including multivariate root causes that made the true root cause difficult to isolate.
—
AstraZeneca (May 2021 – Present, Macclesfield): chemistry meets code
—
[Metadata]
Exec summary: Captures your hybrid analytical science and AI engineering role at AstraZeneca, focusing on nitrosamine investigations, automation, and major achievements across RAG assistants, Bayesian optimisation, agentic workflows, and platform reliability.
Keywords: AstraZeneca, analytical chemistry, nitrosamine, automation, RAG assistant, Bayesian optimisation, agentic workflows, data pipelines, mentorship
Questions:
1. What core responsibilities define your work at AstraZeneca?
2. How do you apply automation and AI to analytical challenges such as nitrosamine detection?
3. What impact did the RAG laboratory assistant deliver, and how was it developed?
4. How did Bayesian optimisation change method development practices?
5. What agentic workflow innovations have you introduced?
6. Which tooling and platforms do you regularly use in this role?
7. How do you support community and mentoring within AstraZeneca?
[/Metadata]
This is where everything clicked. I stayed rooted in analytical science — including trace/nitrosamine risk investigations
where timelines are tight — but I worked increasingly like a data scientist / engineer.
One of my key tasks is method development at extremely low concentrations. Nitrosamines have to be monitored at such low levels
that it requires specific methods and equipment; we're talking around a billionth of a gram. The one thing we can always do
is automate better, which is what I love to do. Whether it's processing ticket requests or extracting instrument usage from logs,
this is where my programming knowledge really started to make an impact.
[Metadata]
Exec summary: Bullet list of standout initiatives at AstraZeneca showing your impact across GenAI, optimisation, automation, and data tooling.
Keywords: key achievements, RAG assistant, Bayesian optimisation, agentic workflows, data pipelines, dashboards, platform correctness, chromatographic prediction, mentoring
Questions:
1. What major projects illustrate your contributions at AstraZeneca?
2. How did you leverage GenAI and optimisation to improve lab processes?
3. Which data engineering and dashboard efforts reduced friction for colleagues?
4. How did you ensure platform correctness and predictive modelling capability?
5. In what ways do you support mentoring and community building at work?
[/Metadata]
Key achievements and strands of work:
• RAG‑based laboratory assistant (GenAI). I led the build of a retrieval‑augmented assistant with a multi‑disciplinary
team (SMEs, AI engineers, front/back‑end). We took it from PoC through risk assessments, evaluation vs expected
outputs, and UAT. It reduced troubleshooting lead times by ~20% and made internal knowledge more discoverable.
• Bayesian optimisation for method development. We matched a historical method‑development context and reached
the same optimum with ~50% fewer experiments by applying Bayesian optimisation. That moved from a promising study
to an adopted practice in real projects. This was a great team of individuals with expert knowledge of automation,
Python, Bayesian optimisation (using BayBE), gas chromatography and HRMS. I also developed a RAG chatbot for
writing PAL script code for managing CTC rails.
• Agentic workflows. I’m actively developing agentic patterns (tool‑use, MCP) to cut manual
coordination and reduce method‑development effort. In targeted scopes, we’ve seen up to ~80% reductions in the
human loops required to get to “good enough to ship” (the point is fewer trips round the houses, not magic).
• Data pipelines & APIs. I engineered pipelines in SQL (Snowflake) and Python; launched FastAPI services so downstream
tools could call data cleanly; and used those services as foundations for GenAI tools via tool‑use/MCP.
• Dashboards that people actually use. I built Power BI and Streamlit tooling that gives a clean view of support tickets,
instrument utilisation, and self‑serve data portals.
• Worked with large‑scale databases to retrieve and clean data, and worked with external partners to improve the data pipeline.
• Developed and deployed Streamlit web apps for various purposes.
• Chromatographic prediction. From fingerprints + XGBoost baselines to neural approaches and, later, attention‑based
graph models. I pre‑trained on a large open dataset (~70k injections).
• Mentoring & community. I contribute to the internal Coding Network, support colleagues learning Python, and sit on the
programming expert panel. I like turning tacit know‑how into repeatable templates.
[Metadata]
Exec summary: Enumerates the primary tools, languages, and platforms you rely on within AstraZeneca projects.
Keywords: tooling stack, Python, FastAPI, SQL, Power BI, Streamlit, ML frameworks, cloud platforms
Questions:
1. Which languages and frameworks underpin your daily work at AstraZeneca?
2. What visualisation and dashboard tools do you deploy?
3. Which machine learning libraries support your modelling efforts?
4. What GenAI providers and cloud platforms do you integrate with?
[/Metadata]
Tools I use a lot here:
Python, FastAPI, SQL/Snowflake, Power BI, Streamlit/Plotly/Matplotlib, scikit‑learn, XGBoost, PyTorch, PyTorch Geometric,
OpenAI/Anthropic/OpenRouter/Vertex APIs, Docker, GitHub Copilot / Claude Code / Gemini Code Assist, and cloud basics
across Azure/AWS/GCP.
—
CoDHe Labs (Jul 2025 – Present, part‑time): ethical AI that ships
—
[Metadata]
Exec summary: Outlines your part-time independent practice focusing on ethical AI, RAG copilots, agentic automation, and pro bono work for charities, including current projects.
Keywords: CoDHe Labs, independent work, generative AI copilots, dashboards, agentic workflows, pro bono, charity support, automation tools
Questions:
1. What services does CoDHe Labs provide and what principles guide it?
2. Which current initiatives demonstrate your applied skills outside AstraZeneca?
3. How do you balance commercial and pro bono engagements?
4. What technologies and collaborations are involved in the charity project mentioned?
5. What future project do you hint at in this section?
[/Metadata]
Alongside my full‑time role, I formalised my independent work as CoDHe Labs — a small practice focused on:
• Generative AI “copilots” (RAG) that make internal knowledge instantly useful.
• ML‑powered insights dashboards wired to a warehouse.
• Agentic workflow automation that coordinates multi‑step processes via tool‑use.
• Digital uplift for small teams and non‑profits (including light M365/Azure support).
This includes setting up invoicing automation, data entry and storage, and user management.
I also run an “AI for Charities” pro bono strand because capability should compound beyond big budgets.
I'm currently working on a project to help a charity with their IT infrastructure and automations using Excel VBA,
Power Automate and Python. I'm also liaising with external partners to implement additional tools.
I'm also working on an agentic VS Code extension, but more on that at a later date as it's still in development.
[Metadata]
Exec summary: Outlines your scoping methodology for client engagements, focusing on bottleneck identification, rapid prototyping, transparency, and documentation.
Keywords: scoping process, bottleneck analysis, prototyping, documentation, transparency, vendor lock avoidance
Questions:
1. How do you prioritise bottlenecks when starting new work?
2. What approach do you take to rapid prototyping and iteration?
3. How do you handle intellectual property, licensing, and vendor lock concerns?
4. What project management practices (SoWs, documentation) do you emphasise?
[/Metadata]
How I scope work:
• Start with the bottleneck: retrieval? experiment count? brittle handoffs? We pick one.
• Ship a small, working prototype fast; measure; iterate; document; hand over.
• Keep IP and licence usage clean; be transparent; avoid vendor lock‑in where we can.
• Strong SoWs; clean docs; and honest conversations about risk, safety, and fit.
—
Achievements I’m proud of (because they changed behaviour)
—
[Metadata]
Exec summary: Highlights selected achievements demonstrating your competitive performance, hackathon recognition, adoption-driving innovations, and user-focused RAG impact.
Keywords: achievements, Kaggle competition, Modal Labs award, Bayesian optimisation adoption, RAG assistant impact, behaviour change
Questions:
1. Which competition results do you cite as evidence of capability?
2. What recognition did you receive for agentic MCP work and why?
3. How did the Bayesian optimisation project influence team practices?
4. Why do you value the RAG assistant’s impact on colleagues?
5. How do these achievements reflect your focus on behaviour change?
[/Metadata]
• 4th place in a Kaggle binary‑classification competition (Mar 2025) — a nice reminder that fundamentals matter. In this
challenge, we were tasked with predicting rainfall. I enjoyed working on this one, and I learnt a few things as I progressed with submitting
my predictions. I started with the usual EDA and feature engineering before testing a few models. I tend to default to models like
XGBoost because it works really well out of the box, though I usually like to run a random forest as a baseline for most tasks; then I
can iterate and start to build a picture of what works best. My go‑to toolbox is sklearn for models, pandas for data manipulation and seaborn
for visualisation. Sklearn is great for pre-processing data as well as hyperparameter tuning; a quick randomised search CV then grid search CV
is usually my sequence. When performing a task like this, I always like to keep in mind what data I have for predictions, especially whether the dataset
is well balanced; on this occasion, it was quite unbalanced. When this happens, I like to employ SMOTE, generating synthetic data to help balance the scales.
This dramatically improved performance. The final step was something new to me: I used a VotingClassifier to train multiple models, with a
surrogate model picker to evaluate the ensemble (a sketch of the SMOTE + voting‑ensemble pattern follows this list). Initially, I was really impressed with my cross‑validation score, but the test set
on Kaggle didn't look like anything special. However, I decided to stick with the good cross‑val score and it really paid off. I jumped up
hundreds of places in the final leaderboard. I got my free t-shirt, one of my prized possessions, and it was a really proud moment when I started to realise
I'm more than OK at this; I can do this as a job. With no classical training and no full‑time data job, I was surpassing trained data scientists
and Kaggle grandmasters. Just imagine what I could do if I did it full time.
• Modal Labs Choice Award ($5,000) for an agentic Model Context Protocol (MCP) server during a Gradio + Hugging Face
hackathon (Jul 2025). The joy wasn’t the prize — it was proving a lean, useful pattern quickly with real constraints.
This was one of the best achievements of my adult career. Prior to this, I hadn't written a single MCP server; I hadn't even really used MCP that much, but I felt
I wanted the challenge. I worked incredibly hard on this, working very late into the night (I couldn't sleep anyway, lol), and planned out my
application. At first I wanted to build agentic deep research but, the more I thought about it, I wanted something quick with low latency. Shallow Research was born.
Shallow Research was about code: generating tested and validated code. In essence, the MCP took a user input about code, like "how do I perform predictions on a binary classification task?"
The MCP would then begin a linear "agentic" workflow with various agents assigned very specific tasks. There was a research agent
to look up best practice on the internet, then a code generation agent, and then the most critical part: code execution.
The real power of the MCP was the ability to run code in a remote sandbox on the Modal platform, an amazing platform for spinning up CPU or GPU instances.
The Code Runner agent would run the code and make sure it worked. If the sandbox didn't have the right library, no worries: the library would be installed
dynamically. A simple image was set up on Modal to decrease latency; this way it spun up quickly with the core libraries, and other libraries were installed
only when needed. Finally, all of this would be returned to the user, who could see that the code was executed and what the result was. They
could copy and paste the code knowing that it would work. The aim was for this to be a great MCP for those learning to code: it'd give you
working code with good explanations and citations to read more.
• The Bayesian optimisation result at AZ (same optimum, ~50% fewer experiments), because it moved from “cool idea” to
“how we actually work”. It was the first of its type in the department, and I learned a lot from Bayesian experts.
• The RAG assistant, because it reduced real, everyday friction for colleagues hunting for knowledge in complex systems. The
tools I develop are designed with the end user in mind: how can I make someone's day better by helping them solve a problem?
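A minimal sketch of the SMOTE + voting‑ensemble pattern described in the Kaggle write‑up above. The synthetic dataset, model choices, and parameters are illustrative stand‑ins, not the actual competition configuration.
```python
# Imbalanced-classification sketch: SMOTE to rebalance, then a soft-voting
# ensemble. Synthetic data stands in for the rainfall dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Rebalance the training split only; the test set keeps its natural skew.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("xgb", XGBClassifier(eval_metric="logloss", random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",  # average predicted probabilities across members
)
print(cross_val_score(ensemble, X_res, y_res, cv=5, scoring="roc_auc").mean())
ensemble.fit(X_res, y_res)
print(ensemble.score(X_test, y_test))  # sanity check on the untouched split
```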
—
Why AI — and why now
—
[Metadata]
Exec summary: Explains your motivation for pursuing AI, tying together analytical chemistry, statistics, Python, ML, GenAI, and the joy of iterative problem-solving.
Keywords: motivation, AI transition, analytical chemistry influence, statistics, Python, machine learning, generative AI, iteration
Questions:
1. How did analytical chemistry shape your approach to data and decision-making?
2. What role did statistics and Python play in your transition to AI?
3. How do you describe the evolution from ML to GenAI and agentic patterns?
4. Why do you find programming so engaging and time-dissolving?
[/Metadata]
Analytical chemistry immersed me in noisy data and decisions under constraint. Statistics gave me language for uncertainty.
Python gave me leverage — automation, analysis, and APIs. ML stitched it together into predictive systems. GenAI widened
the aperture to text and reasoning, and agentic patterns turn tools into coordinated doers. To be honest, I enjoy the loop:
frame the question, ship a tiny thing, see if it helps, and keep going. Programming is one of those activities I can start and
all of a sudden it's 8 hours later, and that's what I love about it. It tunes my brain; my brain was made to code. It's just a shame
I found out so late.
—
How I work (and sound)
—
[Metadata]
Exec summary: Describes your working style, including goal orientation, iterative focus, accountability, collaboration, documentation habits, and conversational cues.
Keywords: working style, goal setting, iteration, accountability, collaboration, documentation, communication tone
Questions:
1. How do you define success and plan your skill development?
2. What is your approach to starting and finishing projects?
3. How do you handle mistakes and team accountability?
4. What role does documentation play in your delivery process?
5. Which phrases signal your agreement or emphasis during conversations?
[/Metadata]
• “What does success look like in a year? What skill will I need in 6 months?” — then work backwards. I'm always thinking about how things can be done better.
• Start small and see where it goes. If it's good, I'll fixate on it until it's done.
• Nothing is ever good enough; we can always improve on processes.
• I'm honest: if something's my fault, I'll hold my hand up, and I expect the same of others. Pushing blame onto others or nitpicking people doesn't impress me.
• Opinionated defaults, but collaborative. I’ll propose a pattern and then adapt with the team.
• Documentation is part of delivery. If someone can’t pick it up without me, I haven’t finished, and that's hard to follow through on. It's not always easy, but I try my best. In the world of pharma, if you didn't write it down, it never happened.
If an action was performed but never written down, an auditor is not going to like it, and that can get you in serious trouble.
• I’ll say “Yeah, definitely,” when something resonates. I’ll say “to be honest,” when I need to cut through nicely.
—
Technical highlights (deeper cuts)
—
[Metadata]
Exec summary: Introduces deep-dive technical case studies covering RAG assistant, agentic workflows, Bayesian optimisation, chromatographic prediction, and API/ETL improvements.
Keywords: technical highlights, case studies, RAG, agentic workflows, Bayesian optimisation, chromatographic prediction, APIs, ETL correctness
Questions:
1. What advanced technical initiatives do you showcase here?
2. How do these highlights expand on earlier achievements?
3. Which domains (retrieval, optimisation, prediction, API design) do you emphasise?
4. How do these examples demonstrate your end-to-end problem solving?
[/Metadata]
RAG laboratory assistant
[Metadata]
Exec summary: Details the rationale, implementation steps, outcomes, and technical methods behind the RAG laboratory assistant PoC and rollout.
Keywords: RAG assistant, retrieval, embeddings, ChromaDB, prompt augmentation, multimodal troubleshooting, AI governance, evaluation
Questions:
1. Why was the RAG laboratory assistant needed and what problem did it solve?
2. How did you architect the retrieval pipeline, including embeddings and databases?
3. What prompt augmentation strategy did you use to improve retrieval?
4. How did you incorporate multimodal troubleshooting and image context into the solution?
5. What governance, evaluation, and deployment steps were taken to roll out the assistant responsibly?
6. How did the project balance user experience with accuracy and guardrails?
[/Metadata]
Why: People were wasting time on “Who knows X?” and “Where’s that doc?”. Retrieval needed to be first‑class.
How: Light doc loaders; chunking; embeddings; vector DB; retrieval‑augmented prompting; guardrails around sources;
simple UI; risk assessments; evaluation vs expected outputs; UAT with actual users.
Outcome: ~20% reduction in troubleshooting lead times and noticeably faster answers to routine questions.
How I did it: I built a PoC using Streamlit. I used OpenAI embeddings to vectorise manuals and troubleshooting guides that I
selected from the internet, knowing that these ground‑truth documents were great sources. What do you do with embeddings? Put
them in a vector database. Personally I used ChromaDB because it was easy to set up locally, but I have also used Qdrant and Pinecone,
which are great cloud alternatives. Then I had to layer in the LLM calls. To improve accuracy, I employed a prompt augmentation step:
an extra call to an LLM to come up with 3 or 4 questions related to the user query but slightly different. This helps to widen the potential
retrieval of documents, especially if it asks questions the user hadn't thought of; it's all about context. From this you can inject the retrieved chunks into the prompt
and get a grounded answer (although you've got to call set() on those retrieved chunks; you don't want to waste tokens on duplicates, lol).
I also included image‑based troubleshooting, early on in the multimodal landscape. Image embeddings weren't common then, so I used models to explain the issue
in the image and then used this context to perform retrieval. This meant it could be quite dynamic and still give ground‑truth results with references (key).
The other main input into this type of tool is prompt engineering: users don't want to type War and Peace, and by having a specialist RAG tool you can fine‑tune the system
prompt to abstract away some of the more complex prompting skills like chain of thought; it's been done for them already.
That's just the PoC. AI governance is key, and copyright concerns are key. Deployment becomes a collaborative effort with teams all over the world: sprints in Jira,
UAT with SMEs, and AI evaluation rounds with SMEs to make sure responses meet requirements. Finally you get an app out in the wild and people start using it. Feels great!
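A minimal sketch of the multi‑query (prompt‑augmentation) RAG loop described above, assuming ChromaDB locally and OpenAI for augmentation and answering. The document texts, collection name, and model names are illustrative assumptions, not the production system.
```python
# Multi-query RAG sketch: augment the user query, retrieve with each variant,
# dedupe chunks with set(), then answer grounded in the merged context.
import chromadb
from openai import OpenAI

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
collection = chromadb.Client().create_collection("lab_manuals")
collection.add(
    ids=["doc1", "doc2"],
    documents=["GC column bleed troubleshooting steps ...",
               "LC pump pressure fluctuation causes ..."],
)

def augment(query: str) -> list[str]:
    """Ask the model for related rephrasings to widen retrieval."""
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write 3 short search queries related to: {query}"}],
    )
    return [query] + resp.choices[0].message.content.splitlines()

def answer(query: str) -> str:
    hits = collection.query(query_texts=augment(query), n_results=3)
    # Deduplicate chunks retrieved by more than one sub-query (the set() trick).
    context = "\n".join(set(chunk for docs in hits["documents"] for chunk in docs))
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Answer only from the context."},
                  {"role": "user", "content": f"Context:\n{context}\n\nQ: {query}"}],
    )
    return resp.choices[0].message.content

print(answer("Why is my GC baseline drifting?"))
```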
Agentic workflows + MCP/tool‑use
[Metadata]
Exec summary: Summarises your approach to agentic workflows using Model Context Protocol, focusing on reducing manual coordination through secure tool orchestration.
Keywords: agentic workflows, MCP, tool-use, automation, coordination reduction, secure interfaces
Questions:
1. What problem do agentic workflows solve for your teams?
2. How do you apply MCP and tool-use patterns in these workflows?
3. What outcomes have these automations delivered?
4. How does scope control factor into your design choices?
[/Metadata]
Why: Multi‑step, cross‑system tasks were brittle and person‑dependent.
How: Orchestrated tools behind clear interfaces; used MCP/tool‑use patterns so models can call functions securely; kept
scope tight.
Outcome: In the right slices, up to ~80% reduction in human loops to reach usable results.
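To make the MCP/tool‑use pattern concrete, here's a minimal sketch using the official Python SDK's FastMCP helper. The instrument‑utilisation tool is a hypothetical stand‑in for an internal function behind a narrow interface, not a real deployed server.
```python
# Minimal MCP server sketch: expose one narrowly-scoped tool a model can call.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("lab-tools")

@mcp.tool()
def instrument_utilisation(instrument_id: str, days: int = 7) -> str:
    """Return a short utilisation summary for an instrument (stubbed here)."""
    # In a real deployment this would query logs/databases behind a clear,
    # tight interface rather than returning a canned string.
    return f"{instrument_id}: 63% utilised over the last {days} days"

if __name__ == "__main__":
    mcp.run()  # serve over stdio so an MCP-capable client can connect
```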
Bayesian optimisation for method development
[Metadata]
Exec summary: Describes your application of Bayesian optimisation to laboratory method development, reducing experiments while matching historical optimums.
Keywords: Bayesian optimisation, method development, experiment reduction, iterative loop, objective function
Questions:
1. Why was Bayesian optimisation selected for the method development problem?
2. How did you structure the optimisation loop and objective?
3. What comparison baseline validated the approach?
4. What efficiency gains were achieved in experiment count?
[/Metadata]
Why: Parameter spaces are expensive to explore; we needed a principled way to reach “good enough to ship” faster.
How: Replayed a historical development on the same instrument with bounded variables and a clear objective; ran an
iterative loop; compared against the known optimum.
Outcome: Same optimum with ~50% fewer experiments. Clear signal to scale into practice.
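A minimal sketch of this kind of bounded, budgeted optimisation loop. The real work used BayBE; scikit-optimize's gp_minimize is a stand‑in here, and the toy objective replaces an actual experiment run on the instrument.
```python
# Bounded Bayesian-optimisation loop sketch with a Gaussian-process surrogate.
from skopt import gp_minimize
from skopt.space import Real

def objective(params):
    temperature, flow_rate = params
    # Hypothetical "badness" score: real work would run an experiment at these
    # settings and return e.g. negative resolution or a run-time penalty.
    return (temperature - 120) ** 2 / 100 + (flow_rate - 1.2) ** 2

result = gp_minimize(
    objective,
    dimensions=[Real(60, 200, name="temperature_C"),
                Real(0.2, 2.0, name="flow_mL_min")],
    n_calls=20,          # experiment budget: far fewer runs than a grid sweep
    random_state=0,
)
print(result.x, result.fun)  # best settings found and their score
```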
Chromatographic retention‑time prediction
[Metadata]
Exec summary: Covers your progression from baseline models to attention-based graph approaches for predicting chromatographic retention times, leveraging large datasets for pre-training and fine-tuning.
Keywords: chromatographic prediction, retention time, XGBoost, neural networks, graph models, pre-training, fine-tuning, method development
Questions:
1. Why was chromatographic retention-time prediction valuable for your work?
2. Which modelling techniques did you iterate through from baseline to advanced?
3. How did you combine open datasets with internal data for training?
4. What benefits did attention-based graph models provide over earlier approaches?
[/Metadata]
Why: Better priors mean fewer dead‑ends in method development.
How: Start with fingerprints + XGBoost baselines; extend to neural models; then pre‑train a graph model with attention on
~70k open injections; fine‑tune on internal ~30k; evaluate on held‑out chemistries.
Outcome: Stronger generalisation and a reusable domain foundation to build on.
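A minimal sketch of the fingerprint + XGBoost baseline stage, assuming RDKit for Morgan fingerprints. The SMILES strings and retention times below are made up; the ~70k open and ~30k internal datasets are not reproduced here.
```python
# Baseline retention-time model sketch: Morgan fingerprints -> XGBoost regressor.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from xgboost import XGBRegressor

# (SMILES, retention time in minutes) - illustrative values only.
data = [("CCO", 1.9), ("c1ccccc1", 4.2), ("CC(=O)Oc1ccccc1C(=O)O", 5.8),
        ("CCN(CC)CC", 2.7), ("CC(C)Cc1ccc(C)cc1", 7.1)]

def featurise(smiles: str) -> np.ndarray:
    """Convert a SMILES string into a 2048-bit Morgan fingerprint vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    return np.array(fp)

X = np.stack([featurise(s) for s, _ in data])
y = np.array([rt for _, rt in data])

model = XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X, y)
print(model.predict(X[:2]))  # predicted retention times (minutes)
```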
APIs & ETL correctness
[Metadata]
Exec summary: Highlights your focus on API design and ETL integrity, ensuring analysts access clean, typed data via FastAPI services and schema fixes.
Keywords: APIs, ETL correctness, FastAPI, Pydantic, schema flattening, data reliability, analytics enablement
Questions:
1. Why do you emphasise clean APIs and ETL pipelines for analysts?
2. How did you use FastAPI and Pydantic models to improve data access?
3. What schema issues did you identify and resolve, and why did they matter?
4. How did these efforts reduce friction and bespoke scripting across teams?
[/Metadata]
Why: Analysts shouldn’t screen‑scrape or wrestle nested XML‑ish blobs. Clean tables + typed APIs unlock everything.
How: FastAPI with Pydantic models; raised/resolved flattening issues so SQL was sane; wrote small services people
could actually call.
Outcome: Less friction; fewer bespoke scripts; more reliable dashboards and models.
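A minimal sketch of the typed‑API pattern: FastAPI with Pydantic models so downstream tools receive clean, validated records instead of screen‑scraping. The injection schema and in‑memory table are illustrative assumptions, not the real service.
```python
# Typed data-access endpoint sketch: Pydantic validates the shape of every row.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Injections API")

class Injection(BaseModel):
    injection_id: int
    instrument: str
    sample_name: str
    retention_time_min: float

# Stand-in for a warehouse table (e.g. Snowflake) behind the service.
FAKE_TABLE = {1: Injection(injection_id=1, instrument="GC-07",
                           sample_name="batch-42", retention_time_min=3.81)}

@app.get("/injections/{injection_id}", response_model=Injection)
def get_injection(injection_id: int) -> Injection:
    """Return one validated injection record; 404 if it doesn't exist."""
    if injection_id not in FAKE_TABLE:
        raise HTTPException(status_code=404, detail="injection not found")
    return FAKE_TABLE[injection_id]
# Run with: uvicorn this_module:app --reload
```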
—
What’s next
—
[Metadata]
Exec summary: Signals your future focus on expanding agentic workflows, strengthening data contracts, and sharing lightweight automation patterns for charities and SMEs.
Keywords: future plans, agentic workflows, data contracts, knowledge sharing, SMEs, charities
Questions:
1. What future technical areas do you plan to invest in?
2. How do you intend to help charities and SMEs with automation?
3. Why are explicit data contracts a priority for your upcoming work?
4. How does knowledge sharing feature in your outlook?
[/Metadata]
More agentic workflows wired to real systems; more explicit data contracts; and more public sharing of light‑weight tools
and patterns. I want charities and SMEs to have leverage without needing a 50‑person platform team.
—
Contact & links
—
[Metadata]
Exec summary: Provides primary contact information and online presence links for reaching you or exploring your work.
Keywords: contact, email, GitHub, portfolio, LinkedIn, location
Questions:
1. What email addresses can be used to contact you personally or for business?
2. Where can someone review your code and projects?
3. Which portfolio site showcases your broader work?
4. What is your LinkedIn profile and current location?
[/Metadata]
Email: [email protected] (personal) | [email protected] (business)
GitHub: github.com/CodeHalwell
Portfolio: codehalwell.io
LinkedIn: linkedin.com/in/danielhalwell
Location: Northwich, UK
Thanks for reading. If there’s something you want to build — or a process that needs unblocking — I’m happy to chat.
Let’s make the right thing the easy thing.
[Metadata]
Exec summary: Closing invitation encouraging collaboration and reinforcing your philosophy of making the right approach straightforward.
Keywords: closing note, collaboration invite, philosophy, call to action, accessibility
Questions:
1. What offer do you extend to potential collaborators?
2. How do you summarise your approach to solving problems?
3. What tone do you set for prospective conversations?
4. Why do you emphasise making the right thing easy?
[/Metadata]
—
Selected GitHub Repositories (LLM‑Ready Index)
—
[Metadata]
Exec summary: Introduces the curated list of your GitHub repositories with metadata for LLM-ready indexing, highlighting focus areas and inferred summaries.
Keywords: GitHub index, repositories, LLM-ready, project catalogue, focus areas, tags
Questions:
1. What is the purpose of this GitHub repositories section?
2. How are the repositories categorised and described?
3. Which metadata fields accompany each repository listing?
4. How does this section support LLM-friendly retrieval?
[/Metadata]
Daniel Halwell — Repositories Index (LLM-Ready)
[Metadata]
Exec summary: Explains the table-like format used to present repository metadata for quick scanning and indexing.
Keywords: repository format, metadata fields, presentation structure, LLM-ready, quick reference
Questions:
1. How are the repository entries structured for readability?
2. Which metadata columns are included for each repository?
3. Why is a consistent format important for LLM-ready indexing?
4. How does this format help with retrieval tasks?
[/Metadata]
—
Selected GitHub Repositories (Organized by Category)
—
**LLM Utilities & Language Models**
• yamllm - YAML ↔ LLM interaction utilities
• simple_rag - Minimal RAG baseline implementation
• openai-logp-viewer - Log probability inspection and visualization
**Agentic Systems & Automation**
• gradio-mcp-agent-hack - Model Context Protocol experimentation with Gradio
• agents-for-art - Creative agent orchestration tools
• n8n-mcp - n8n integration with Model Context Protocol
• synthetic-data-agent - Automated synthetic data generation
• research-agent - Deep research workflow automation
• coding-agent-cli - Command-line coding assistant
• agentic-ai-engineering - Agent engineering frameworks and patterns
**Web Development & Portfolio**
• CodeHalwell-Portfolio - Personal portfolio site
• portfolio-codehalwell - Alternative portfolio implementation
• WeatherApp - Weather API integration with UI
• web-page-test - Web development experiments
**Data Science & Analytics**
• washing-line-predictor - Weather-informed predictive modeling
• openai-logp-viewer - Data visualization for LLM analysis
• arxiv-scraper - Academic paper collection and processing
**Healthcare & Specialized Domains**
• BabelFHIR - FHIR/HL7 healthcare data processing
**Learning & Coursework**
• ibm-build-genai-apps - IBM watsonx platform exploration
• ibm-python-data-analysis - IBM data analysis certification work
• llm_engineering-course - LLM engineering fundamentals
• LLM101n - Large language model foundations
• DataCamp_DS_Cert - Data science certification projects
• oaqjp-final-project-emb-ai - Embedded AI final project
**Personal Projects & Apps**
• MyPoppet / poppet - Personal assistant experiments
• translator-with-voice-and-watsonx - Voice translation with IBM watsonx
**Utilities & Experiments**
• MyGPT - Quick GPT experimentation
• Grand-Gardens-AI - AI garden management concepts
• Useful_Scripts - General automation scripts
• deep-research - Research workflow tools
• food_review - Food review analysis
• podcast-censoring - Podcast content filtering
• playground_series_september2025 - September 2025 coding experiments
• pallscripting - Scripting utilities
• deep-learning-illustrated - Deep learning visualization
• build_own_chatbot_without_open_ai - Non-OpenAI chatbot implementation
• code_chat_bot - Code-focused chatbot
• neurIPS-open-polymer - Polymer research collaboration
**Repository List for Automation:**
repos = [
    "CodeHalwell/yamllm","CodeHalwell/gradio-mcp-agent-hack","CodeHalwell/CodeHalwell-Portfolio",
    "CodeHalwell/MyGPT","CodeHalwell/agents-for-art","CodeHalwell/Grand-Gardens-AI",
    "CodeHalwell/Useful_Scripts","CodeHalwell/MyPoppet","CodeHalwell/deep-research",
    "CodeHalwell/ibm-build-genai-apps","CodeHalwell/n8n-mcp","CodeHalwell/washing-line-predictor",
    "CodeHalwell/portfolio-codehalwell","CodeHalwell/openai-logp-viewer","CodeHalwell/food_review",
    "CodeHalwell/synthetic-data-agent","CodeHalwell/simple_rag","CodeHalwell/ibm-python-data-analysis",
    "CodeHalwell/podcast-censoring","CodeHalwell/playground_series_september2025","CodeHalwell/poppet",
    "CodeHalwell/arxiv-scraper","RanL703/neurIPS-open-polymer","CodeHalwell/WeatherApp",
    "CodeHalwell/research-agent","CodeHalwell/pallscripting","CodeHalwell/deep-learning-illustrated",
    "quotentiroler/BabelFHIR","CodeHalwell/coding-agent-cli","CodeHalwell/llm_engineering-course",
    "CodeHalwell/agentic-ai-engineering","CodeHalwell/translator-with-voice-and-watsonx",
    "CodeHalwell/build_own_chatbot_without_open_ai","CodeHalwell/oaqjp-final-project-emb-ai",
    "CodeHalwell/LLM101n","CodeHalwell/code_chat_bot","CodeHalwell/DataCamp_DS_Cert",
    "CodeHalwell/web-page-test"
]
[Metadata]
Exec summary: Supplies a Python list of repository identifiers to support scripted ingestion or indexing workflows.
Keywords: repository list, Python array, identifiers, automation, ingestion helper
Questions:
1. What data structure is used to enumerate your repositories for automation?
2. How many repositories are captured in this list and what patterns do they follow?
3. How might this list be used in vector database or indexing pipelines?
4. Why is maintaining a consolidated repository list useful for your digital CV?
[/Metadata]
—
How I Work & Tools (Consolidated)
—
[Metadata]
Exec summary: Consolidated overview of your primary languages, data/ML stack, GenAI tooling, service design experience, orchestration platforms, data platforms, and workplace productivity tools.
Keywords: skills overview, toolchain, programming languages, ML stack, GenAI platforms, orchestration, DevOps, productivity tools
Questions:
1. Which programming languages and core tools do you rely on daily?
2. What data and machine learning libraries form your toolkit?
3. Which GenAI and agent orchestration platforms do you use?
4. What services, APIs, and orchestration methods do you employ?
5. Which data platforms and DevOps tools are integral to your workflow?
6. How do you manage documentation, project tracking, and operations?
[/Metadata]
• Languages & Core: Python (heavy daily use), SQL, TypeScript (portfolio/UI), Bash.
• Data & ML: NumPy, Pandas, scikit-learn, XGBoost, PyTorch, PyTorch Geometric; Power BI, Plotly, Matplotlib.
• GenAI & Agents: OpenAI API, Anthropic, Watsonx; Retrieval (FAISS/Chroma/Qdrant), RAG patterns; tool-use/MCP; CrewAI/AutoGen/SmolAgents; prompt evaluation and structured output with Pydantic/JSON-schema.
• Services & APIs: FastAPI (typed models via Pydantic), Flask (legacy), REST design; LangGraph-style orchestration patterns.
• Orchestration: n8n (daily), lightweight cron, Modal, small Dockerized jobs.
• Data Platforms: Snowflake/SQL; ETL correctness and schema hygiene are non-negotiable.
• DevOps/Infra: Docker, GitHub Actions, Azure/AWS/GCP basics.
• Workplace OS: Notion (docs/CRM/case studies), Linear (projects), Google Workspace, Canva, Miro. Accounting via QuickBooks; banking via Starling (sole trader).
—
Engagement Policy (Ethics & Fit)
—
[Metadata]
Exec summary: Defines your ethical guidelines for client engagements, categorising red, amber, and green domains with default operating principles.
Keywords: ethics, engagement policy, red lines, amber considerations, green projects, governance, transparency
Questions:
1. What types of work do you refuse on ethical grounds?
2. Which project domains require additional governance before engagement?
3. What sectors align well with your ethical stance?
4. What default practices do you implement to maintain ethical standards?
[/Metadata]
Red lines: no fossil fuels, weapons/arms, or harmful surveillance/abusive tech; avoid organisations and conflicts that contradict a people-first stance.
Ambers: ad tech, scraping of private data without consent, high-risk medical claims — require strict scoping, governance and auditability.
Greens: health & life sciences; education & upskilling; charities & non-profits; SMEs doing practical automation; research tooling.
Defaults: minimal lock-in; clear IP/licensing; privacy-by-design; evals and guardrails for GenAI; documented handovers with maintainers named.
—
Teaching & Mentoring
—
[Metadata]
Exec summary: Summarises your mentoring philosophy rooted in practical explanations, co-designed exercises, and fast feedback loops inspired by teaching experiences in Guyana.
Keywords: teaching, mentoring, diagrams, notebooks, handovers, feedback loops, user education
Questions:
1. How do you approach teaching and documentation while building?
2. What mentoring activities do you participate in at work?
3. How did your time in Guyana influence your teaching style?
4. Why do you emphasise tight feedback loops and simple interfaces when mentoring?
[/Metadata]
I explain as I build: diagrams, notebooks, READMEs, and handover sessions. I mentor through internal coding groups,
co-design small exercises, and prefer “show, don’t tell.” My Guyana year made me comfortable teaching under constraints;
at work I apply the same approach — tight feedback loops, simple interfaces, and momentum.
—
Personal Tech Lab (Home Server & Experiments)
—
[Metadata]
Exec summary: Describes your home lab environment, including server setups and experimentation with local LLM fine-tuning under the ChemGemma project.
Keywords: home lab, Mac Mini server, Raspberry Pi, automations, LLM lab, ChemGemma, fine-tuning, GRPO
Questions:
1. What infrastructure do you maintain for personal experiments?
2. Which services run on your home server and why?
3. How do you use local hardware to prototype agents and RAG patterns?
4. What is ChemGemma, and how did you fine-tune it?
[/Metadata]
I tinker. I run a Mac Mini home server (Jellyfin, n8n, Nextcloud, Pi‑hole, Home Assistant, web servers) and keep a Raspberry Pi 4B (SSD boot) for small
automations. It’s where I test agents, RAG patterns, and lightweight services before hardening them for work.
I also have an LLM lab for local models (RTX 3090). I've successfully run fine‑tuning and reinforcement learning on an open‑source Gemma model. I called this
model ChemGemma, having curated a dataset from Hugging Face to perform supervised fine‑tuning and then RL using GRPO to add reasoning.
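For flavour, a minimal sketch of GRPO‑style training with Hugging Face TRL, along the lines of the ChemGemma experiment. The base checkpoint, prompts, reward function, and config values are illustrative assumptions, not the actual ChemGemma recipe.
```python
# GRPO fine-tuning sketch with TRL. Dataset, reward, checkpoint, and config
# values are placeholders for illustration only.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt set; the real run used a curated chemistry dataset from HF.
train_dataset = Dataset.from_dict(
    {"prompt": ["What is the molar mass of water?",
                "Name the functional group present in ethanol."]}
)

def concise_reward(completions, **kwargs):
    """Hypothetical reward: prefer answers near 200 characters."""
    return [-abs(len(c) - 200) / 200 for c in completions]

trainer = GRPOTrainer(
    model="google/gemma-2-2b-it",  # small open Gemma checkpoint
    reward_funcs=concise_reward,
    args=GRPOConfig(
        output_dir="chemgemma-grpo",
        per_device_train_batch_size=2,
        num_generations=2,  # completions sampled per prompt for group scoring
    ),
    train_dataset=train_dataset,
)
trainer.train()
```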
| — | |
| n8n Systems: Daily arXiv Scraper & RAG Pipeline | |
| — | |
| [Metadata] | |
| Exec summary: Details your n8n automations for research monitoring and RAG querying, outlining workflow steps, design choices, and resulting impact on knowledge retrieval. | |
| Keywords: n8n workflows, arXiv scraper, RAG pipeline, automation, Qdrant, structured prompts, evaluation, impact | |
| Questions: | |
| 1. What goals do your n8n workflows achieve for research ingestion and querying? | |
| 2. How is the daily arXiv scraper structured from trigger to vector storage? | |
| 3. What design choices underpin the RAG query pipeline’s accuracy and guardrails? | |
| 4. How do these systems improve your productivity and knowledge sharing? | |
| 5. What impact metrics demonstrate the value of these workflows? | |
| [/Metadata] | |
| I’ve built a set of n8n flows that keep me and my tools up to date with AI/CS/DS research and make that corpus queryable. | |
| 1) Daily arXiv Scraper | |
| Goal: pull fresh research (AI/CS/DS), normalise it with an LLM, store summaries/metadata in Notion, and index the text in a vector store for search and RAG. | |
| High-level steps in the flow: | |
| • Schedule Trigger → RSS Read: runs daily, fetching new arXiv entries from feeds I care about. | |
| • Loop Over Items: iterates over the papers one by one. | |
| • Message a model (Message Model): composes a clean prompt per item with extracted metadata. | |
| • AI Agent (Chat Model + Tools): calls an OpenAI chat model; reaches out via an HTTP Request node when extra info is needed (e.g., to fetch the abstract or PDF link); produces structured JSON (title, authors, abstract, URL, categories, licence hints). | |
| • Structured Output Parser: enforces the schema and catches malformed outputs. | |
| • If (branch): routes by licence/permissiveness or other policy flags. | |
| • Create a database page (Notion): two variants of the Notion writer — one for permissive/common licences, another for restricted — so that only permissive-licence papers are fully “stored” and enriched (restricted ones get a link-only/metadata card). | |
| • Merge: folds both branches back into a single stream. | |
| • Qdrant Vector Store: chunk + embed the permitted text (abstract/fulltext when allowed) using OpenAI embeddings; write vectors and metadata for retrieval later. | |
| Result: a clean, daily-updated Notion knowledge base + vector index of papers I’m allowed to store, with policy-respecting handling of licences. It’s simple, fast to audit, and easy to extend. | |
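| Outside n8n, that final indexing step would look roughly like this in Python (collection name, chunk size, and embedding model are assumptions): | |
| # Sketch: chunk permitted text, embed it with OpenAI, and upsert into Qdrant. | |
| # Assumes the "arxiv_papers" collection already exists with a matching vector size. | |
| from uuid import uuid4 | |
| from openai import OpenAI | |
| from qdrant_client import QdrantClient | |
| from qdrant_client.models import PointStruct | |
| openai_client = OpenAI() | |
| qdrant = QdrantClient(url="http://localhost:6333") | |
| def index_paper(text: str, metadata: dict, chunk_size: int = 1000) -> None: | |
|     # Naive fixed-width chunking; one embedding call covers all chunks. | |
|     chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)] | |
|     embeddings = openai_client.embeddings.create(model="text-embedding-3-small", input=chunks) | |
|     points = [ | |
|         PointStruct(id=str(uuid4()), vector=item.embedding, payload={**metadata, "chunk": chunk}) | |
|         for item, chunk in zip(embeddings.data, chunks) | |
|     ] | |
|     qdrant.upsert(collection_name="arxiv_papers", points=points) | |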
| 2) RAG Query Pipeline | |
| Goal: ask natural-language questions over the paper corpus with transparent retrieval and guardrails. | |
| High-level steps in the flow: | |
| • Webhook: entry-point for a query (from my portal or CLI). | |
| • PromptAugmentation: uses a chat model to clean/expand the user prompt (e.g., add synonyms, normalise acronyms) and emits a structured plan via a Structured Output Parser. | |
| • Code: tiny glue to format search queries and pass control values (k, filters). | |
| • Loop Over Items: if the plan has multiple sub-queries, iterate them. | |
| • AI Agent: coordinates two tools — (a) Qdrant Vector Store search with OpenAI embeddings; (b) a Cohere re-ranker for higher precision. | |
| • Aggregate → Message a model → Respond to Webhook: aggregates top contexts, prompts the model to answer with citations and explicit “what I don’t know,” then returns the response JSON to the caller. | |
| Design choices: | |
| • Retrieval is explicit: top-k, distances/scores, and doc IDs logged. | |
| • Re-ranking improves answer quality without overloading the LLM. | |
| • Style/guardrails: British spelling, direct tone; citations mandatory; no hallucinated claims beyond the retrieved contexts. | |
| • Telegram hook: I wired the RAG pipeline up to Telegram, so I can send it a message, it runs the retrieval, pulls the relevant papers, | |
| and drops the response back as a reply a few minutes later. | |
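| The retrieve-and-rerank core of that pipeline, sketched in Python (collection name, model choices, and top_n are assumptions): | |
| # Sketch: embed the query, search Qdrant, then re-rank the hits with Cohere. | |
| import cohere | |
| from openai import OpenAI | |
| from qdrant_client import QdrantClient | |
| openai_client = OpenAI() | |
| qdrant = QdrantClient(url="http://localhost:6333") | |
| co = cohere.Client()  # assumes the Cohere API key is configured in the environment | |
| def retrieve(query: str, k: int = 20, top_n: int = 5) -> list[str]: | |
|     # Wide first-pass retrieval (top-k), then a re-ranker narrows to top_n. | |
|     vector = openai_client.embeddings.create(model="text-embedding-3-small", input=[query]).data[0].embedding | |
|     hits = qdrant.search(collection_name="arxiv_papers", query_vector=vector, limit=k) | |
|     docs = [hit.payload["chunk"] for hit in hits] | |
|     reranked = co.rerank(model="rerank-english-v3.0", query=query, documents=docs, top_n=top_n) | |
|     return [docs[r.index] for r in reranked.results] | |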
| Impact: | |
| • I don’t waste time manually scanning feeds; new work lands in Notion and the vector store each morning. | |
| • I can query “What’s new on tool-use/MCP for agents?” and get a grounded answer with links. | |
| • The same index powers demos and internal RAG utilities — a single source of truth. | |
| Email Triage: I get lost in a sea of emails and it's so easy to miss things; the answer is to have an agent do it for you. | |
| My triage agent reads incoming emails and uses a tier system to flag items for escalation. If an important email comes through, I get a message and can take a look. | |
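| A minimal sketch of that tiering logic, assuming an LLM classifier; the tier names, model, and notify() hook are illustrative placeholders: | |
| # Sketch: classify each email into a tier and escalate the urgent ones. | |
| import json | |
| from openai import OpenAI | |
| client = OpenAI() | |
| TIERS = ["urgent", "action-needed", "fyi", "ignore"] | |
| def classify(subject: str, body: str) -> str: | |
|     prompt = ( | |
|         f"Classify this email into one of {TIERS}. " | |
|         'Reply with JSON like {"tier": "..."}.' | |
|         f"\n\nSubject: {subject}\n\n{body}" | |
|     ) | |
|     response = client.chat.completions.create( | |
|         model="gpt-4o-mini", | |
|         messages=[{"role": "user", "content": prompt}], | |
|         response_format={"type": "json_object"},  # force valid JSON back | |
|     ) | |
|     return json.loads(response.choices[0].message.content)["tier"] | |
| def notify(message: str) -> None: | |
|     print(message)  # placeholder for the real alert channel (e.g., Telegram) | |
| def triage(subject: str, body: str) -> None: | |
|     if classify(subject, body) == "urgent": | |
|         notify(f"Urgent email: {subject}") | |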
| [Metadata] | |
| Exec summary: Summarises the benefits of your automated research pipelines, highlighting time savings, improved retrieval, and reusable indexes. | |
| Keywords: impact summary, time savings, retrieval, demos, single source of truth | |
| Questions: | |
| 1. How do the automations reduce your manual research effort? | |
| 2. What querying capabilities do the pipelines unlock? | |
| 3. How does the indexed corpus serve multiple applications? | |
| 4. Why is a single source of truth valuable for your workflows? | |
| [/Metadata] | |
| — | |
| What I’m Open To | |
| — | |
| [Metadata] | |
| Exec summary: Lists the types of roles, pro bono work, and collaborations you are interested in pursuing, emphasising AI engineering and mission-aligned projects. | |
| Keywords: opportunities, AI engineer roles, pro bono, collaborations, automation, charities | |
| Questions: | |
| 1. What kinds of full-time or contract roles are you seeking? | |
| 2. What pro bono engagements do you offer to charities? | |
| 3. Which collaborative areas appeal to you for future work? | |
| 4. How does this section help partners understand fit? | |
| [/Metadata] | |
| • Roles: AI Engineer / ML Engineer / Data Scientist (UK-remote or North West hybrid); full-time positions and short build engagements for copilots, RAG, and agentic automations. | |
| • Pro bono: time-boxed PoCs for UK charities and mission-led organisations. | |
| • Collaborations: research tooling, open-source scaffolding, educational content. | |
| — | |
| Why I Love to Code (and the Thing I Obsess About) | |
| — | |
| [Metadata] | |
| Exec summary: Expresses your passion for coding as the fastest route from problem to improvement, highlighting your focus on bottlenecks and iterative solutions. | |
| Keywords: passion for coding, problem solving, bottlenecks, iteration, automation, optimisation | |
| Questions: | |
| 1. Why do you find coding compelling and energising? | |
| 2. How do you describe your approach to identifying and solving bottlenecks? | |
| 3. Which problem types do you align with specific solution strategies (RAG, Bayesian optimisation, APIs)? | |
| 4. How does iteration and feedback drive your projects? | |
| [/Metadata] | |
| I love writing Python and shipping small tools that unblock people — it’s the quickest route from “problem” to “better.” | |
| I tend to obsess about the bottleneck: if it’s retrieval, I build RAG; if it’s too many experiments, I reach for Bayesian optimisation; | |
| if it’s brittle handoffs, I ship a typed API. Solving problems is the through-line for me — discrete questions, clear interfaces, | |
| quick feedback, and steady iteration. | |
| I love helping people in general: if someone is struggling with some Python, I like to solve it, and if they're having issues with Microsoft Copilot Studio flows, | |
| I'm more than happy to take a quick look and see what I can do. So far, people have really appreciated this approach and I get some really good feedback. | |
| — | |
| Sport & Judo | |
| — | |
| [Metadata] | |
| Exec summary: Shares your sporting interests and judo achievements, highlighting discipline, composure, and practical learning skills gained from coaching and competition. | |
| Keywords: sport, judo, black belt, competition, coaching, discipline, composure, learning by doing | |
| Questions: | |
| 1. Which sports do you follow or participate in? | |
| 2. What level of achievement did you reach in judo and at what age? | |
| 3. What competitions and coaching experiences shape your discipline and composure? | |
| 4. How do lessons from judo translate into your daily work habits? | |
| [/Metadata] | |
| I’m into sport — football, rugby, Formula 1, and most things with a scoreboard. As a teenager I was a judoka: | |
| I earned my black belt at 16 (the youngest age you can), won medals across the country, including a bronze at an | |
| international competition in London, and trained as a coach for my local club. It taught me discipline, composure under pressure, | |
| and how to learn by doing — lessons I still apply daily. | |