HallD committed
Commit ad7b82e · verified · 1 Parent(s): 4aafe22

Upload 31 files


Initial upload of project

.gitattributes ADDED
@@ -0,0 +1,40 @@
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.pdf filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.sqlite3 filter=lfs diff=lfs merge=lfs -text
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+
.gitignore ADDED
@@ -0,0 +1,58 @@
+ .env
+ __pycache__/
+ .pytest_cache/
+ .DS_Store
+ chat_log.txt
+ utils/digital-cv.log
+ data/chroma/
+ *.sqlite3
+ *.bin
+ *.log
+ *.lock
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv/
+ .env/
+ venv/
+ env/
+
+ # Environment variables and secrets
+ .env
+ .env.local
+ .env.*.local
+
+ # IDE and editor files
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Operating system files
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+ logs/
+
+ # Temporary files
+ tmp/
+ temp/
+ .tmp/
+
+ # Coverage and testing
+ .coverage
+ .pytest_cache/
+ htmlcov/
+
+ # Gradio temporary files
+ gradio_cached_examples/
+ flagged/
.python-version ADDED
@@ -0,0 +1 @@
+ 3.11
README.md ADDED
@@ -0,0 +1,231 @@
+ ---
+ title: DigitalDan
+ emoji: 📊
+ colorFrom: purple
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: Digital twin of me, Daniel Halwell
+ ---
+
+ # Digital CV - Interactive Personal Assistant
+
+ An AI-powered digital CV that allows visitors to chat with Daniel Halwell through an intelligent conversational interface. Built with Gradio and powered by OpenAI's GPT models, this application provides an interactive way to learn about Daniel's professional background, experience, and capabilities.
+
+ ## 🌟 Features
+
+ - **Interactive Chat Interface**: Natural language conversations about Daniel's experience, skills, and projects
+ - **Intelligent Context Awareness**: Draws from a comprehensive professional summary and LinkedIn profile data
+ - **Contact Recording**: Captures visitor contact information with proper consent
+ - **Professional Presentation**: Clean, responsive UI with custom branding
+ - **Question Tracking**: Logs unknown questions to continuously improve the knowledge base
+
+ ## 🔧 Technology Stack
+
+ - **Frontend**: Gradio (Python-based web UI framework)
+ - **AI/LLM**: OpenAI GPT models with function calling
+ - **Document Processing**: PyPDF for resume parsing
+ - **Notifications**: Pushover integration for contact alerts
+ - **Deployment**: Supports containerized deployment
+ - **Python Version**: 3.11+
+
+ ## 📁 Project Structure
+
+ ```
+ digital-cv/
+ ├── app.py                 # Main Gradio application
+ ├── me/
+ │   ├── Profile.pdf        # LinkedIn profile export
+ │   └── summary.txt        # Comprehensive professional summary
+ ├── utils/
+ │   ├── chat.py            # Core chat functionality and AI integration
+ │   ├── tool_calls.py      # Function calling tools (contact recording, etc.)
+ │   └── logging.py         # Application logging setup
+ ├── assets/
+ │   ├── logo.png           # Application logo
+ │   ├── dan.png            # Avatar image
+ │   └── Logo WO Background.png
+ ├── pyproject.toml         # Project dependencies and metadata
+ ├── .env                   # Environment variables (create this file locally; not included in repo)
+ └── README.md              # This file
+ ```
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+
+ - Python 3.11 or higher
+ - OpenAI API key
+ - (Optional) Pushover account for notifications
+
+ ### Installation
+
+ 1. **Clone the repository**
+    ```bash
+    git clone https://github.com/CodeHalwell/digital-cv.git
+    cd digital-cv
+    ```
+
+ 2. **Install dependencies**
+    ```bash
+    pip install -e .
+    ```
+
+    Or install key dependencies directly:
+    ```bash
+    pip install gradio openai python-dotenv pypdf requests
+    ```
+
+ 3. **Set up environment variables**
+    Create a `.env` file in the root directory:
+    ```env
+    OPENAI_API_KEY=your_openai_api_key_here
+    # Optional — only needed for Pushover notifications:
+    PUSHOVER_TOKEN=your_pushover_token
+    PUSHOVER_USER=your_pushover_user_key
+    PORT=7860
+    ```
+
+ 4. **Run the application**
+    ```bash
+    python app.py
+    ```
+
+ 5. **Access the interface**
+    Open your browser and navigate to `http://localhost:7860`
+
+ ## 🎯 Usage
+
+ ### For Visitors
+
+ - Start a conversation by typing questions about Daniel's experience, skills, or projects
+ - Example prompts:
+   - "Tell me about your last role"
+   - "How do you design a RAG pipeline?"
+   - "What projects have you worked on?"
+   - "Can you scope a small automation?"
+ - Share your contact information if you'd like to connect directly
+ - Use the "Stop" button to interrupt streaming responses
+
+ ### For Developers
+
+ - The chat interface automatically draws context from `me/summary.txt` and `me/Profile.pdf` (see the sketch below)
+ - Function calling enables contact recording and question tracking
+ - All conversations are logged for analytics and improvement
+
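For orientation, the context-loading step lives in `utils/chat.py`, which is not shown in this commit view; it presumably looks something like the following. This is a minimal sketch assuming `pypdf` (listed in the dependencies) and the two files above — the variable and prompt wording are illustrative, not the shipped implementation:

```python
# Sketch of how me/summary.txt and me/Profile.pdf could be folded into the
# system prompt; the real logic lives in utils/chat.py and may differ.
from pypdf import PdfReader

with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

# Extract plain text from every page of the LinkedIn export.
reader = PdfReader("me/Profile.pdf")
profile = "\n".join(page.extract_text() or "" for page in reader.pages)

system_prompt = (
    "You are acting as Daniel Halwell. Answer questions using the "
    f"summary and LinkedIn profile below.\n\n## Summary\n{summary}\n\n"
    f"## LinkedIn Profile\n{profile}"
)
```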
+ ## 🔧 Configuration
+
+ ### Environment Variables
+
+ | Variable | Description | Required |
+ |----------|-------------|----------|
+ | `OPENAI_API_KEY` | OpenAI API key for GPT models | Yes |
+ | `PUSHOVER_TOKEN` | Pushover application token | No |
+ | `PUSHOVER_USER` | Pushover user key | No |
+ | `PORT` | Server port (default: 7860) | No |
+
+ ### Customization
+
+ - **Personal Content**: Update `me/summary.txt` with your professional background
+ - **Profile**: Replace `me/Profile.pdf` with your LinkedIn export
+ - **Branding**: Update images in the `assets/` directory
+ - **Styling**: Modify the `custom_css` variable in `app.py`
+
+ ## 🛡️ Features Deep Dive
+
+ ### AI Chat System
+
+ The chat system uses OpenAI's GPT models with:
+ - **System prompts** that establish Daniel's professional persona
+ - **Function calling** for structured interactions (contact recording, question logging) — sketched below
+ - **Content guardrails** to ensure appropriate conversations
+ - **Context injection** from professional documents
+
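The actual tool definitions live in `utils/tool_calls.py`, which is not shown in this view. As a minimal sketch of the function-calling pattern described above — `record_contact` is a hypothetical tool name and the model string is a placeholder:

```python
# Minimal sketch of OpenAI function calling; the real tool schemas live in
# utils/tool_calls.py and may differ from this illustration.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "record_contact",  # hypothetical tool name
        "description": "Record a visitor's contact details when explicitly shared.",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {"type": "string"},
                "name": {"type": "string"},
                "notes": {"type": "string"},
            },
            "required": ["email"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are Daniel's digital CV."},
        {"role": "user", "content": "My email is jane@example.com — keep in touch!"},
    ],
    tools=tools,
)

# If the model decided to call a tool, dispatch it by name.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(call.function.name, args)
```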
+ ### Contact Management
+
+ When visitors share contact information:
+ - Details are validated and recorded via Pushover notifications (see the sketch below)
+ - Privacy-conscious approach: details are only recorded when explicitly shared
+ - Structured data capture (email, name, context notes)
+
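Pushover's message API is a single authenticated POST. A minimal sketch of the notification step, assuming the `PUSHOVER_TOKEN`/`PUSHOVER_USER` variables from the table above; the helper name is hypothetical and the shipped code in `utils/tool_calls.py` may differ:

```python
# Sketch of a Pushover notification for a newly shared contact.
import os
import requests

def notify_contact(email: str, name: str = "", notes: str = "") -> None:
    """Send a push notification when a visitor shares contact details."""
    requests.post(
        "https://api.pushover.net/1/messages.json",
        data={
            "token": os.environ["PUSHOVER_TOKEN"],
            "user": os.environ["PUSHOVER_USER"],
            "message": f"New contact: {name} <{email}> — {notes}",
        },
        timeout=10,
    )
```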
+ ### Question Tracking
+
+ Unknown or unanswerable questions are:
+ - Automatically detected and logged
+ - Sent via Pushover for manual review
+ - Used to continuously improve the knowledge base
+
+ ## 📊 Monitoring & Analytics
+
+ - Application logs provide detailed interaction tracking
+ - Pushover notifications alert to new contacts and unknown questions
+ - Chat logs can be analyzed for common themes and improvements
+
+ ## 🚀 Deployment
+
+ ### Local Development
+ ```bash
+ python app.py
+ ```
+
+ ### Production Deployment
+ The application is designed for containerized deployment:
+
+ ```dockerfile
+ # Example Dockerfile approach
+ FROM python:3.11-slim
+
+ WORKDIR /app
+ COPY . .
+ RUN pip install -e .
+
+ CMD ["python", "app.py"]
+ ```
+
+ ### Deployment Considerations
+
+ - Set `debug=False` in production (see the sketch below)
+ - Use environment variables for all secrets
+ - Configure appropriate server limits for Gradio
+ - Consider using a reverse proxy (nginx) for production traffic
+
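One way to wire these considerations into the launch call — a sketch assuming a hypothetical `DEBUG` environment flag; the shipped `app.py` currently hard-codes `debug=True`, so treat this as a suggested pattern rather than the repository's behaviour:

```python
# Environment-driven launch configuration (illustrative, not the shipped code).
import os

demo.launch(  # `demo` is the gr.Blocks app built in app.py
    server_name="0.0.0.0",
    server_port=int(os.getenv("PORT", 7860)),
    debug=os.getenv("DEBUG", "0") == "1",  # off unless explicitly enabled
    show_error=False,  # avoid leaking tracebacks to visitors
    max_threads=int(os.getenv("GRADIO_MAX_THREADS", 40)),  # server limit knob
)
```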
+ ## 🤝 Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Make your changes and test thoroughly
+ 4. Update documentation as needed
+ 5. Commit your changes (`git commit -m 'Add amazing feature'`)
+ 6. Push to the branch (`git push origin feature/amazing-feature`)
+ 7. Open a Pull Request
+
+ ### Development Guidelines
+
+ - Follow existing code style and patterns
+ - Test changes with different conversation flows
+ - Update `me/summary.txt` if adding new professional information
+ - Ensure all dependencies are properly documented
+
+ ## 📄 License
+
+ This project is the personal intellectual property of Daniel Halwell. Contact him for usage permissions.
+
+ ## 📞 Contact
+
+ - **Email**: [email protected] (personal) | [email protected] (business)
+ - **GitHub**: [@CodeHalwell](https://github.com/CodeHalwell)
+ - **Portfolio**: [codehalwell.io](https://codehalwell.io)
+ - **LinkedIn**: [linkedin.com/in/danielhalwell](https://linkedin.com/in/danielhalwell)
+ - **Location**: Northwich, UK
+
+ ## 🎨 About the Project
+
+ This digital CV represents a modern approach to professional networking and self-presentation. Rather than a static resume, it offers an interactive experience that showcases both technical capabilities and communication skills. The project demonstrates expertise in:
+
+ - AI/LLM integration and prompt engineering
+ - Modern Python web development with Gradio
+ - User experience design for professional applications
+ - Privacy-conscious data handling
+ - Scalable application architecture
+
+ ---
+
+ *Made with ❤️ — CoDHe Labs*
app.py ADDED
@@ -0,0 +1,788 @@
+ def run_with_watch():
+     from watchfiles import run_process
+
+     logger.info("Starting watch mode on 'app' directory")
+
+     def _run():
+         logger.info("Reloading app.py")
+         main()
+
+     # watchfiles.run_process takes the watched paths positionally,
+     # not as a `path=` keyword.
+     run_process(
+         ".",
+         target=_run,
+         watch_filter=lambda change, path: path.endswith(".py"),
+     )
+ from dotenv import load_dotenv
+ import os
+ import gradio as gr
+ from utils.app_logging import setup_logging
+ from utils.chat import Me
+
+ load_dotenv(override=True)
+ logger = setup_logging()
+
+ logger.info("Starting digital-cv")
+
+ me = Me()
+ logger.info("Me initialized")
+ # Theming and chat styling for embedding
+ theme = gr.themes.Soft(primary_hue="indigo", neutral_hue="slate")
+ initial_assistant_message = (
+     "Hello, nice to meet you! At any time, feel free to give me your name and email; "
+     "I'll make a note and I can get back to you later."
+ )
+
+ chatbot = gr.Chatbot(
+     label=None,
+     avatar_images=("assets/logo.png", "assets/dan.png"),
+     render_markdown=True,
+     type="messages",
+     value=[{"role": "assistant", "content": initial_assistant_message}],
+     elem_id="chatbot",
+ )
+ logger.info("Chatbot initialized")
+ custom_css = """
45
+ html, body, .gradio-container { height: 100%; }
46
+ body {
47
+ margin: 0;
48
+ font-family: "Inter", "SF Pro Display", "Segoe UI", system-ui, -apple-system, sans-serif;
49
+ background: linear-gradient(135deg, #0f172a 0%, #1e293b 25%, #334155 50%, #1e293b 75%, #0f172a 100%);
50
+ background-attachment: fixed;
51
+ color: #f8fafc;
52
+ font-feature-settings: "kern" 1, "liga" 1, "ss01" 1;
53
+ text-rendering: optimizeLegibility;
54
+ -webkit-font-smoothing: antialiased;
55
+ -moz-osx-font-smoothing: grayscale;
56
+ font-weight: 400;
57
+ line-height: 1.6;
58
+ }
59
+ .gradio-container {
60
+ display: flex;
61
+ background: transparent;
62
+ padding: 28px 0 36px;
63
+ }
64
+ #container {
65
+ max-width: 1280px;
66
+ margin: 0 auto;
67
+ padding: 32px 40px 40px;
68
+ display: flex;
69
+ flex-direction: column;
70
+ flex: 1 1 auto;
71
+ min-height: 0;
72
+ border-radius: 32px;
73
+ background: rgba(15, 23, 42, 0.95);
74
+ box-shadow:
75
+ 0 64px 128px rgba(0, 0, 0, 0.4),
76
+ 0 32px 64px rgba(0, 0, 0, 0.2),
77
+ 0 0 0 1px rgba(148, 163, 184, 0.1),
78
+ inset 0 1px 0 rgba(255, 255, 255, 0.05);
79
+ backdrop-filter: blur(24px);
80
+ border: 1px solid rgba(148, 163, 184, 0.15);
81
+ position: relative;
82
+ overflow: hidden;
83
+ }
84
+ #container::before {
85
+ content: '';
86
+ position: absolute;
87
+ top: 0;
88
+ left: 0;
89
+ right: 0;
90
+ height: 2px;
91
+ background: linear-gradient(90deg,
92
+ transparent,
93
+ rgba(59, 130, 246, 0.6),
94
+ rgba(147, 51, 234, 0.6),
95
+ transparent
96
+ );
97
+ z-index: 1;
98
+ }
99
+ #container::after {
100
+ content: '';
101
+ position: absolute;
102
+ top: 0;
103
+ left: 0;
104
+ right: 0;
105
+ bottom: 0;
106
+ background: radial-gradient(circle at 50% 0%, rgba(59, 130, 246, 0.05) 0%, transparent 50%);
107
+ pointer-events: none;
108
+ z-index: 0;
109
+ }
110
+ #header {
111
+ align-items: center;
112
+ gap: 32px;
113
+ justify-content: flex-start;
114
+ margin-bottom: 16px;
115
+ position: relative;
116
+ z-index: 2;
117
+ }
118
+ #logo img {
119
+ max-height: 240px;
120
+ width: auto;
121
+ border-radius: 24px;
122
+ object-fit: contain;
123
+ box-shadow:
124
+ 0 32px 64px rgba(0, 0, 0, 0.3),
125
+ 0 16px 32px rgba(0, 0, 0, 0.2),
126
+ 0 0 0 1px rgba(148, 163, 184, 0.1),
127
+ inset 0 1px 0 rgba(255, 255, 255, 0.1);
128
+ transition: all 0.4s cubic-bezier(0.4, 0, 0.2, 1);
129
+ filter: brightness(1.05) contrast(1.1);
130
+ }
131
+ #logo img:hover {
132
+ transform: translateY(-4px) scale(1.02);
133
+ box-shadow:
134
+ 0 48px 96px rgba(0, 0, 0, 0.4),
135
+ 0 24px 48px rgba(0, 0, 0, 0.3),
136
+ 0 0 0 1px rgba(59, 130, 246, 0.3),
137
+ inset 0 1px 0 rgba(255, 255, 255, 0.15);
138
+ filter: brightness(1.1) contrast(1.15);
139
+ }
140
+ #intro-card {
141
+ border-radius: 24px;
142
+ padding: 32px 36px;
143
+ background: linear-gradient(135deg,
144
+ rgba(30, 41, 59, 0.8) 0%,
145
+ rgba(51, 65, 85, 0.6) 50%,
146
+ rgba(30, 41, 59, 0.8) 100%
147
+ );
148
+ border: 1px solid rgba(148, 163, 184, 0.2);
149
+ box-shadow:
150
+ inset 0 1px 0 rgba(255, 255, 255, 0.1),
151
+ 0 16px 32px rgba(0, 0, 0, 0.2),
152
+ 0 8px 16px rgba(0, 0, 0, 0.1);
153
+ position: relative;
154
+ overflow: hidden;
155
+ backdrop-filter: blur(16px);
156
+ }
157
+ #intro-card::before {
158
+ content: '';
159
+ position: absolute;
160
+ top: 0;
161
+ left: 0;
162
+ right: 0;
163
+ height: 2px;
164
+ background: linear-gradient(90deg,
165
+ transparent,
166
+ rgba(59, 130, 246, 0.5),
167
+ rgba(147, 51, 234, 0.5),
168
+ transparent
169
+ );
170
+ }
171
+ #intro-card::after {
172
+ content: '';
173
+ position: absolute;
174
+ top: 0;
175
+ left: 0;
176
+ right: 0;
177
+ bottom: 0;
178
+ background: radial-gradient(circle at 50% 0%, rgba(59, 130, 246, 0.03) 0%, transparent 70%);
179
+ pointer-events: none;
180
+ }
181
+ #intro-card ul {
182
+ margin: 0.35rem 0 0.7rem;
183
+ padding-left: 1.1rem;
184
+ }
185
+ #intro-card li { margin-bottom: 0.3rem; }
186
+ #title {
187
+ text-align: center;
188
+ margin: 24px 0 32px;
189
+ letter-spacing: 0.08em;
190
+ text-transform: uppercase;
191
+ font-weight: 800;
192
+ color: #f1f5f9;
193
+ font-size: 1.75rem;
194
+ text-shadow:
195
+ 0 4px 12px rgba(0, 0, 0, 0.4),
196
+ 0 2px 6px rgba(0, 0, 0, 0.2);
197
+ position: relative;
198
+ z-index: 2;
199
+ background: linear-gradient(135deg, #f1f5f9 0%, #cbd5e1 100%);
200
+ -webkit-background-clip: text;
201
+ -webkit-text-fill-color: transparent;
202
+ background-clip: text;
203
+ }
204
+ #title::after {
205
+ content: '';
206
+ position: absolute;
207
+ bottom: -12px;
208
+ left: 50%;
209
+ transform: translateX(-50%);
210
+ width: 80px;
211
+ height: 3px;
212
+ background: linear-gradient(90deg,
213
+ transparent,
214
+ rgba(59, 130, 246, 0.8),
215
+ rgba(147, 51, 234, 0.8),
216
+ transparent
217
+ );
218
+ border-radius: 2px;
219
+ box-shadow: 0 2px 8px rgba(59, 130, 246, 0.3);
220
+ }
221
+ #chat-wrapper {
222
+ display: flex;
223
+ flex-direction: column;
224
+ gap: 16px;
225
+ flex: 1 1 auto;
226
+ min-height: 0;
227
+ }
228
+ #chatbot {
229
+ display: flex;
230
+ flex-direction: column;
231
+ min-height: 680px;
232
+ height: clamp(680px, calc(100dvh - 200px), 1200px);
233
+ border-radius: 28px;
234
+ border: 1px solid rgba(148, 163, 184, 0.2);
235
+ background: linear-gradient(135deg,
236
+ rgba(15, 23, 42, 0.95) 0%,
237
+ rgba(30, 41, 59, 0.9) 50%,
238
+ rgba(15, 23, 42, 0.95) 100%
239
+ );
240
+ box-shadow:
241
+ inset 0 1px 0 rgba(255, 255, 255, 0.1),
242
+ 0 48px 96px rgba(0, 0, 0, 0.3),
243
+ 0 24px 48px rgba(0, 0, 0, 0.2),
244
+ 0 0 0 1px rgba(148, 163, 184, 0.1);
245
+ position: relative;
246
+ overflow: hidden;
247
+ backdrop-filter: blur(20px);
248
+ z-index: 2;
249
+ }
250
+ #chatbot::before {
251
+ content: '';
252
+ position: absolute;
253
+ top: 0;
254
+ left: 0;
255
+ right: 0;
256
+ height: 2px;
257
+ background: linear-gradient(90deg,
258
+ transparent,
259
+ rgba(59, 130, 246, 0.6),
260
+ rgba(147, 51, 234, 0.6),
261
+ transparent
262
+ );
263
+ z-index: 1;
264
+ }
265
+ #chatbot::after {
266
+ content: '';
267
+ position: absolute;
268
+ top: 0;
269
+ left: 0;
270
+ right: 0;
271
+ bottom: 0;
272
+ background: radial-gradient(circle at 50% 0%, rgba(59, 130, 246, 0.02) 0%, transparent 70%);
273
+ pointer-events: none;
274
+ z-index: 0;
275
+ }
276
+ #chatbot .wrapper,
277
+ #chatbot .bubble-wrap,
278
+ #chatbot .message-wrap {
279
+ flex: 1 1 auto;
280
+ display: flex;
281
+ min-height: 0;
282
+ }
283
+ #chatbot .bubble-wrap {
284
+ flex-direction: column;
285
+ overflow-y: auto;
286
+ padding: 12px 16px 20px;
287
+ gap: 16px;
288
+ }
289
+ #chatbot label span {
290
+ color: rgba(221, 230, 255, 0.85);
291
+ font-weight: 600;
292
+ letter-spacing: 0.03em;
293
+ }
294
+ #chatbot .message-wrap .message {
295
+ background: rgba(30, 41, 59, 0.9);
296
+ border-radius: 24px;
297
+ border: 1px solid rgba(148, 163, 184, 0.2);
298
+ box-shadow:
299
+ 0 24px 48px rgba(0, 0, 0, 0.2),
300
+ 0 12px 24px rgba(0, 0, 0, 0.1),
301
+ inset 0 1px 0 rgba(255, 255, 255, 0.05);
302
+ backdrop-filter: blur(12px);
303
+ transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1);
304
+ position: relative;
305
+ overflow: hidden;
306
+ padding: 20px 24px;
307
+ margin: 8px 0;
308
+ line-height: 1.6;
309
+ word-wrap: break-word;
310
+ overflow-wrap: break-word;
311
+ hyphens: auto;
312
+ }
313
+ #chatbot .message-wrap .message::before {
314
+ content: '';
315
+ position: absolute;
316
+ top: 0;
317
+ left: 0;
318
+ right: 0;
319
+ height: 1px;
320
+ background: linear-gradient(90deg, transparent, rgba(255, 255, 255, 0.1), transparent);
321
+ }
322
+ #chatbot .message-wrap .message:hover {
323
+ transform: translateY(-2px) scale(1.01);
324
+ box-shadow:
325
+ 0 32px 64px rgba(0, 0, 0, 0.3),
326
+ 0 16px 32px rgba(0, 0, 0, 0.2),
327
+ inset 0 1px 0 rgba(255, 255, 255, 0.1);
328
+ }
329
+ #chatbot .message-wrap .bot .message {
330
+ background: linear-gradient(135deg,
331
+ rgba(30, 41, 59, 0.95) 0%,
332
+ rgba(51, 65, 85, 0.8) 100%
333
+ );
334
+ border-color: rgba(59, 130, 246, 0.3);
335
+ margin-right: 60px;
336
+ margin-left: 8px;
337
+ }
338
+ #chatbot .message-wrap .user .message {
339
+ background: linear-gradient(135deg,
340
+ rgba(30, 41, 59, 0.95) 0%,
341
+ rgba(51, 65, 85, 0.8) 100%
342
+ );
343
+ border-color: rgba(147, 51, 234, 0.3);
344
+ margin-left: 60px;
345
+ margin-right: 8px;
346
+ }
347
+ .suggestion-banner {
348
+ font-weight: 700;
349
+ letter-spacing: 0.06em;
350
+ text-transform: uppercase;
351
+ font-size: 0.95rem;
352
+ color: #cbd5e1;
353
+ margin-bottom: 16px;
354
+ text-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
355
+ position: relative;
356
+ z-index: 2;
357
+ }
358
+ .suggestion-buttons {
359
+ display: flex;
360
+ gap: 16px;
361
+ flex-wrap: wrap;
362
+ justify-content: space-between;
363
+ margin-bottom: 12px;
364
+ position: relative;
365
+ z-index: 2;
366
+ }
367
+ .suggestion-buttons button {
368
+ flex: 1 1 0;
369
+ min-width: 0;
370
+ padding: 16px 20px;
371
+ border-radius: 16px;
372
+ border: 1px solid rgba(148, 163, 184, 0.3);
373
+ background: linear-gradient(135deg,
374
+ rgba(30, 41, 59, 0.9) 0%,
375
+ rgba(51, 65, 85, 0.7) 100%
376
+ );
377
+ color: #f1f5f9;
378
+ font-weight: 600;
379
+ font-size: 0.95rem;
380
+ transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1);
381
+ cursor: pointer;
382
+ position: relative;
383
+ overflow: hidden;
384
+ backdrop-filter: blur(12px);
385
+ box-shadow:
386
+ 0 8px 16px rgba(0, 0, 0, 0.1),
387
+ inset 0 1px 0 rgba(255, 255, 255, 0.1);
388
+ }
389
+ .suggestion-buttons button::before {
390
+ content: '';
391
+ position: absolute;
392
+ top: 0;
393
+ left: -100%;
394
+ width: 100%;
395
+ height: 100%;
396
+ background: linear-gradient(90deg,
397
+ transparent,
398
+ rgba(59, 130, 246, 0.1),
399
+ transparent
400
+ );
401
+ transition: left 0.6s ease;
402
+ }
403
+ .suggestion-buttons button::after {
404
+ content: '';
405
+ position: absolute;
406
+ top: 0;
407
+ left: 0;
408
+ right: 0;
409
+ height: 1px;
410
+ background: linear-gradient(90deg,
411
+ transparent,
412
+ rgba(255, 255, 255, 0.2),
413
+ transparent
414
+ );
415
+ }
416
+ .suggestion-buttons button:hover {
417
+ transform: translateY(-4px) scale(1.02);
418
+ box-shadow:
419
+ 0 32px 64px rgba(0, 0, 0, 0.2),
420
+ 0 16px 32px rgba(0, 0, 0, 0.1),
421
+ inset 0 1px 0 rgba(255, 255, 255, 0.15);
422
+ border-color: rgba(59, 130, 246, 0.5);
423
+ background: linear-gradient(135deg,
424
+ rgba(30, 41, 59, 0.95) 0%,
425
+ rgba(51, 65, 85, 0.8) 100%
426
+ );
427
+ }
428
+ .suggestion-buttons button:hover::before {
429
+ left: 100%;
430
+ }
431
+ .suggestion-buttons button:active {
432
+ transform: translateY(-2px) scale(1.01);
433
+ }
434
+ .gradio-container textarea {
435
+ border-radius: 20px !important;
436
+ min-height: 120px !important;
437
+ background: rgba(30, 41, 59, 0.95) !important;
438
+ border: 1px solid rgba(148, 163, 184, 0.3) !important;
439
+ color: #f1f5f9 !important;
440
+ box-shadow:
441
+ inset 0 1px 0 rgba(255, 255, 255, 0.1),
442
+ 0 16px 32px rgba(0, 0, 0, 0.2),
443
+ 0 8px 16px rgba(0, 0, 0, 0.1) !important;
444
+ font-size: 1rem !important;
445
+ font-weight: 500 !important;
446
+ line-height: 1.6 !important;
447
+ padding: 20px 24px !important;
448
+ transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1) !important;
449
+ backdrop-filter: blur(16px) !important;
450
+ position: relative !important;
451
+ overflow: hidden !important;
452
+ }
453
+ .gradio-container textarea::before {
454
+ content: '';
455
+ position: absolute;
456
+ top: 0;
457
+ left: 0;
458
+ right: 0;
459
+ height: 1px;
460
+ background: linear-gradient(90deg, transparent, rgba(255, 255, 255, 0.1), transparent);
461
+ }
462
+ .gradio-container textarea:focus {
463
+ outline: none !important;
464
+ border-color: rgba(59, 130, 246, 0.6) !important;
465
+ box-shadow:
466
+ 0 0 0 4px rgba(59, 130, 246, 0.2),
467
+ 0 24px 48px rgba(0, 0, 0, 0.3),
468
+ 0 12px 24px rgba(0, 0, 0, 0.2),
469
+ inset 0 1px 0 rgba(255, 255, 255, 0.15) !important;
470
+ background: rgba(30, 41, 59, 0.98) !important;
471
+ transform: translateY(-1px) !important;
472
+ }
473
+ .gradio-container textarea::placeholder {
474
+ color: rgba(203, 213, 225, 0.7) !important;
475
+ font-weight: 500 !important;
476
+ font-style: italic !important;
477
+ }
478
+ #footer {
479
+ text-align: center;
480
+ opacity: 0.8;
481
+ font-size: 0.9rem;
482
+ margin-top: 32px;
483
+ letter-spacing: 0.06em;
484
+ color: rgba(203, 213, 225, 0.8);
485
+ font-weight: 500;
486
+ text-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
487
+ position: relative;
488
+ z-index: 2;
489
+ }
490
+
491
+ /* Professional loading states and animations */
492
+ @keyframes pulse {
493
+ 0%, 100% { opacity: 1; }
494
+ 50% { opacity: 0.6; }
495
+ }
496
+ @keyframes shimmer {
497
+ 0% { transform: translateX(-100%); }
498
+ 100% { transform: translateX(100%); }
499
+ }
500
+ @keyframes fadeInUp {
501
+ from {
502
+ opacity: 0;
503
+ transform: translateY(20px);
504
+ }
505
+ to {
506
+ opacity: 1;
507
+ transform: translateY(0);
508
+ }
509
+ }
510
+ @keyframes scaleIn {
511
+ from {
512
+ opacity: 0;
513
+ transform: scale(0.95);
514
+ }
515
+ to {
516
+ opacity: 1;
517
+ transform: scale(1);
518
+ }
519
+ }
520
+ .loading-message {
521
+ animation: pulse 2s ease-in-out infinite;
522
+ }
523
+ .loading-shimmer {
524
+ position: relative;
525
+ overflow: hidden;
526
+ }
527
+ .loading-shimmer::after {
528
+ content: '';
529
+ position: absolute;
530
+ top: 0;
531
+ left: 0;
532
+ width: 100%;
533
+ height: 100%;
534
+ background: linear-gradient(90deg,
535
+ transparent,
536
+ rgba(59, 130, 246, 0.1),
537
+ transparent
538
+ );
539
+ animation: shimmer 2.5s infinite;
540
+ }
541
+
542
+ /* Professional button styles */
543
+ .gradio-container button {
544
+ border-radius: 16px !important;
545
+ font-weight: 600 !important;
546
+ transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1) !important;
547
+ position: relative !important;
548
+ overflow: hidden !important;
549
+ backdrop-filter: blur(12px) !important;
550
+ box-shadow:
551
+ 0 8px 16px rgba(0, 0, 0, 0.1),
552
+ inset 0 1px 0 rgba(255, 255, 255, 0.1) !important;
553
+ }
554
+ .gradio-container button:hover {
555
+ transform: translateY(-2px) scale(1.02) !important;
556
+ box-shadow:
557
+ 0 16px 32px rgba(0, 0, 0, 0.2),
558
+ 0 8px 16px rgba(0, 0, 0, 0.1),
559
+ inset 0 1px 0 rgba(255, 255, 255, 0.15) !important;
560
+ }
561
+ .gradio-container button:active {
562
+ transform: translateY(-1px) scale(1.01) !important;
563
+ }
564
+
565
+ /* Professional scrollbar styling */
566
+ ::-webkit-scrollbar {
567
+ width: 10px;
568
+ }
569
+ ::-webkit-scrollbar-track {
570
+ background: rgba(30, 41, 59, 0.3);
571
+ border-radius: 6px;
572
+ border: 1px solid rgba(148, 163, 184, 0.1);
573
+ }
574
+ ::-webkit-scrollbar-thumb {
575
+ background: linear-gradient(135deg,
576
+ rgba(59, 130, 246, 0.6) 0%,
577
+ rgba(147, 51, 234, 0.6) 100%
578
+ );
579
+ border-radius: 6px;
580
+ border: 1px solid rgba(148, 163, 184, 0.2);
581
+ box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.1);
582
+ }
583
+ ::-webkit-scrollbar-thumb:hover {
584
+ background: linear-gradient(135deg,
585
+ rgba(59, 130, 246, 0.8) 0%,
586
+ rgba(147, 51, 234, 0.8) 100%
587
+ );
588
+ box-shadow:
589
+ 0 4px 8px rgba(0, 0, 0, 0.2),
590
+ inset 0 1px 0 rgba(255, 255, 255, 0.15);
591
+ }
592
+
593
+ /* Professional responsive design */
594
+ @media (max-width: 1024px) {
595
+ #container {
596
+ padding: 24px 32px;
597
+ border-radius: 28px;
598
+ max-width: 100%;
599
+ }
600
+ #header {
601
+ gap: 24px;
602
+ }
603
+ #logo img {
604
+ max-height: 200px;
605
+ }
606
+ #chatbot {
607
+ height: clamp(620px, calc(100dvh - 180px), 1000px);
608
+ border-radius: 24px;
609
+ }
610
+ #chatbot .message-wrap .bot .message {
611
+ margin-right: 40px;
612
+ }
613
+ #chatbot .message-wrap .user .message {
614
+ margin-left: 40px;
615
+ }
616
+ }
617
+
618
+ @media (max-width: 900px) {
619
+ #container {
620
+ padding: 20px 24px;
621
+ border-radius: 24px;
622
+ }
623
+ #header {
624
+ flex-direction: column;
625
+ text-align: center;
626
+ gap: 24px;
627
+ }
628
+ #logo img {
629
+ max-height: 180px;
630
+ }
631
+ #chatbot {
632
+ height: clamp(580px, calc(100dvh - 160px), 900px);
633
+ border-radius: 20px;
634
+ }
635
+ #chatbot .message-wrap .bot .message {
636
+ margin-right: 20px;
637
+ padding: 16px 20px;
638
+ }
639
+ #chatbot .message-wrap .user .message {
640
+ margin-left: 20px;
641
+ padding: 16px 20px;
642
+ }
643
+ .suggestion-buttons {
644
+ flex-direction: column;
645
+ gap: 12px;
646
+ }
647
+ .suggestion-buttons button {
648
+ min-width: 100%;
649
+ padding: 16px 20px;
650
+ }
651
+ #title {
652
+ font-size: 1.4rem;
653
+ margin: 20px 0 24px;
654
+ }
655
+ #intro-card {
656
+ padding: 24px 28px;
657
+ border-radius: 20px;
658
+ }
659
+ }
660
+
661
+ @media (max-width: 640px) {
662
+ .gradio-container {
663
+ padding: 16px 0 24px;
664
+ }
665
+ #container {
666
+ border-radius: 20px;
667
+ padding: 16px 20px;
668
+ }
669
+ #chatbot {
670
+ height: clamp(520px, calc(100dvh - 140px), 800px);
671
+ border-radius: 18px;
672
+ }
673
+ #chatbot .message-wrap .bot .message {
674
+ margin-right: 12px;
675
+ margin-left: 4px;
676
+ padding: 14px 18px;
677
+ border-radius: 20px;
678
+ }
679
+ #chatbot .message-wrap .user .message {
680
+ margin-left: 12px;
681
+ margin-right: 4px;
682
+ padding: 14px 18px;
683
+ border-radius: 20px;
684
+ }
685
+ #logo img {
686
+ max-height: 160px;
687
+ }
688
+ #intro-card {
689
+ padding: 20px 24px;
690
+ border-radius: 16px;
691
+ }
692
+ .gradio-container textarea {
693
+ min-height: 100px !important;
694
+ padding: 16px 20px !important;
695
+ font-size: 0.95rem !important;
696
+ border-radius: 16px !important;
697
+ }
698
+ #title {
699
+ font-size: 1.2rem;
700
+ margin: 16px 0 20px;
701
+ }
702
+ .suggestion-buttons button {
703
+ padding: 14px 18px;
704
+ border-radius: 14px;
705
+ }
706
+ }
707
+ """
708
+ logger.info("Custom CSS initialized")
709
+ with gr.Blocks(theme=theme, css=custom_css) as demo:
710
+ with gr.Column(elem_id="container"):
711
+ with gr.Row(elem_id="header"):
712
+ with gr.Column(scale=2, min_width=140):
713
+ gr.Image(
714
+ value="assets/Logo WO Background.png",
715
+ height=190,
716
+ show_label=False,
717
+ elem_id="logo",
718
+ )
719
+ with gr.Column(scale=10):
720
+ with gr.Group(elem_id="intro-card"):
721
+ gr.Markdown(
722
+ """
723
+ **Welcome — Chat with Daniel**
724
+
725
+ - **What to ask**: projects, AI/RAG/agents, data pipelines, or career.
726
+ - **Privacy**: if you share an email, I’ll only save it when you ask.
727
+ - **Tip**: streaming is live; use Stop to interrupt and send a follow‑up.
728
+
729
+ Example prompts: “Tell me about your last role”, “How do you design a RAG pipeline?”, “Can you scope a small automation?”
730
+ """,
731
+ )
732
+ gr.Markdown("## Chat with Daniel", elem_id="title")
733
+ with gr.Column(elem_id="chat-wrapper"):
734
+ chat_input = gr.Textbox(
735
+ placeholder="Type your message here…",
736
+ autofocus=True,
737
+ max_lines=5,
738
+ show_copy_button=True,
739
+ container=False,
740
+ scale=1
741
+ )
742
+ chat_iface = gr.ChatInterface(
743
+ me.chat,
744
+ type="messages",
745
+ chatbot=chatbot,
746
+ title="",
747
+ description="Ask about projects, AI workflows, or get in touch.",
748
+ submit_btn="Send",
749
+ stop_btn="Stop",
750
+ textbox=chat_input,
751
+ )
752
+ gr.Markdown("**Need inspiration?** Try asking:", elem_classes="suggestion-banner")
753
+ with gr.Row(elem_classes="suggestion-buttons"):
754
+ examples = [
755
+ "Tell me about your last role and what you do day to day",
756
+ "How would you design a small RAG pipeline for docs?",
757
+ "What Python libraries are you familiar with?",
758
+ ]
759
+ for example in examples:
760
+ gr.Button(
761
+ example,
762
+ variant="secondary",
763
+ size="sm"
764
+ ).click(
765
+ lambda text=example: gr.update(value=text),
766
+ outputs=chat_input,
767
+ )
768
+ gr.Markdown("Made with ❤️ — CoDHe Labs", elem_id="footer")
769
+ logger.info("Blocks app initialized")
770
+
771
+
772
+ def main():
773
+ logger.info("Launching demo")
774
+ demo.launch(
775
+ server_name="0.0.0.0",
776
+ server_port=int(os.getenv("PORT", 7860)),
777
+ favicon_path="assets/logo.png",
778
+ debug=True,
779
+ show_error=True,
780
+ )
781
+
782
+
783
+ if __name__ == "__main__":
784
+ if os.getenv("WATCH_MODE") == "1":
785
+ run_with_watch()
786
+ else:
787
+ main()
788
+ logger.info("Demo launched")
assets/Logo WO Background.png ADDED

Git LFS Details

  • SHA256: d6f3d6a54ea8210e6d669c1c7acd9236025749540b2387a76eab447b1867cac0
  • Pointer size: 132 Bytes
  • Size of remote file: 1.72 MB
assets/dan.png ADDED

Git LFS Details

  • SHA256: 367b6d63fba504c787293e4acbac7edd056b864b6a0e774ddb3cbaaece49a565
  • Pointer size: 131 Bytes
  • Size of remote file: 628 kB
assets/logo.png ADDED

Git LFS Details

  • SHA256: fd4cf842201d6182f070d4d73d6c39945079dd010469eccab0fdd0a22492f758
  • Pointer size: 132 Bytes
  • Size of remote file: 1.3 MB
data/chroma/71809a45-be76-40b0-999a-c4ac152f6a9b/data_level0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19913cb747d4f20ecdb323b45c8e9cc1f007a5d1783888656851a8b1e949c67c
+ size 1242800
data/chroma/71809a45-be76-40b0-999a-c4ac152f6a9b/header.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:871503f03f4549153dc2cf0f77e863e57b9a594b4224b02dda23a9018da3f346
+ size 100
data/chroma/71809a45-be76-40b0-999a-c4ac152f6a9b/length.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a12e561363385e9dfeeab326368731c030ed4b374e7f5897ac819159d2884c5
+ size 400
data/chroma/71809a45-be76-40b0-999a-c4ac152f6a9b/link_lists.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+ size 0
data/chroma/chroma.sqlite3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:789aa36f96343e8bffdc967b155da66a1d59df09b55e2a0658dc75f0e6018f42
+ size 1576960
me/Daniel Halwell Full CV.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b977e3d5f7dcabe33a54e3114917b18156020ec70891686660d64043003350d
+ size 175573
me/Profile.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c66d2e1c9fd4763f5e50bd5df0f7dc0b81e0045a48f2909c0a35ea12163ccce5
+ size 60938
me/summary.txt ADDED
@@ -0,0 +1,911 @@
+
+ Daniel Halwell — Full Life Story (First‑Person) — Digital CV Narrative (LLM‑Ready)
+ Last updated: 22 September 2025
+
+ [Metadata]
+ Exec summary: Opening context introducing you as a scientist transitioning into AI engineering, highlighting passion for coding and ethical, results-driven principles.
+ Keywords: introduction, AI engineer journey, ethics, automation, passion for coding, momentum
+ Questions:
+ 1. Who are you and what is your current career focus?
+ 2. How long have you been coding and why did you start?
+ 3. What types of work make you happiest day-to-day?
+ [/Metadata]
+
+ Hi, I’m Daniel. I’m a scientist‑turned‑AI engineer who likes building apps, automating processes and AI systems. I've been coding for the last
+ 6 years now, after a colleague suggested I try it. Now it's my favourite hobby, and becoming an AI engineer has become my main career objective.
+ I’m happiest when I’m writing Python, wiring up data, and shipping small, useful tools that unblock people. I care about
+ ethics, clarity, and momentum — make the right thing the easy thing, and prove it with results.
+
+ [Metadata]
+ Exec summary: Expands on your broader interests in AI, data science, and mathematics, anchoring your transition from analytical chemistry to tech.
+ Keywords: data science, automation, problem solving, kaggle, mathematics, analytical chemistry transition
+ Questions:
+ 1. What other technical domains beyond AI interest you?
+ 2. How do you engage with problem solving outside of work?
+ 3. In what ways does your background in analytical chemistry support your move into data science?
+ [/Metadata]
+
+ It's not just AI systems I like; I like data science and general automation too. I really like to solve a problem, so having a look at coding competitions
+ on Kaggle or just trying to come up with solutions is really enjoyable. I like (trying) to learn about all the underpinning mathematics, as it fascinates me.
+ As an analytical chemist, working with lots of data has been part of my day job for some time, so making the leap to data science and then AI felt pretty natural.
+
+
+ Core identity & working values
34
+
35
+ [Metadata]
36
+ Exec summary: Bullet-point overview of your principles, emphasising human-centred tech, ethics, iterative building, communication style, and evidence-based mindset.
37
+ Keywords: working values, ethics, human-first, builder mindset, mentoring, communication style, evidence-driven
38
+ Questions:
39
+ 1. What foundational values steer your approach to technology and product delivery?
40
+ 2. How do you prefer to translate ideas into execution?
41
+ [/Metadata]
42
+
43
+ • Human‑first technologist. Tools are only useful if they help people make better decisions faster.
44
+ • Ethics matter. I avoid work tied to fossil fuels, weapons, surveillance, or anything that harms people.
45
+ • Builder’s mindset. Prefer to get an idea down in a flow diagram, like visual representations. Once I'm clear on the vision I like to start small and iterate quickly
46
+ • Teaching & clarity. Notebooks, diagrams, docstrings, and handovers. Mentoring is part of the job.
47
+ • Plain English, not the most over the top person, pretty laid back, just want to get stuff done and enjoy life, pretty sarcastic, love some dark humout
48
+ • Evidence over adjectives. Numbers, before-and-after, and real bottlenecks solved.
49
+
50
+
51
+ --
52
+ [Metadata]
53
+ Exec summary: Personal origin note grounding your background in Devon and acknowledging relocation for career opportunities.
54
+ Keywords: origin story, Devon, relocation, personal background, career move motivation
55
+ Questions:
56
+ 1. Where are you originally from
57
+ [/Metadata]
58
+
59
+ I'm originally from the south west of England in the lovely county of Devon. Great place to grow up even if it is a bit out of the way.
60
+ Shame to move away but you gotta go where the jobs are. I lived here up until the age of 18, when I moved to Guyana South America for a year and
61
+ then on to University. My first work experiences came here and it was nice to grow up in a quiet place where there was a lot of
62
+ people who knew each other, close to the beach in summer and has great pubs and bars.
63
+ ---
64
+
65
+
66
+ Early graft (pre‑uni): where my work ethic came from
+
+ [Metadata]
+ Exec summary: Details your early jobs across hospitality, retail, and manufacturing, highlighting the foundation of your work ethic, QA mindset, and preference for night shifts.
+ Keywords: early career, bar cleaning, retail experience, factory work, quality assurance, work ethic, night owl
+ Questions:
+ 1. What types of jobs did you hold before university?
+ 2. How did your roles shape your understanding of quality assurance?
+ 3. How did these early experiences influence later method development skills?
+ [/Metadata]
+
+ I started working around 13, cleaning a bar on Saturday mornings — bottles, floors, stocking the bar, etc. By 16 I was at
+ Summerfields, mainly stacking shelves, though I did do some checkout work. I tried a few different shift patterns there
+ (nights, evenings, days) across produce and dairy — receiving deliveries and stacking shelves — and learnt that I'm a bit of a night owl.
+ I also did a stint in a small component factory, making parts by hand from SOPs —
+ counting coil turns, trimming, testing. It wasn't the most exciting job, but it was a good earner.
+ This was my first foray into QA really, where checking work was a priority so things worked;
+ that focus on process and repeatability fed straight into my later method development work.
+ I also worked in an Indian restaurant, mostly behind the bar: taking orders over the phone,
+ making drinks, occasionally serving drinks at the table, and clearing up tables. I always loved Indian cuisine, and
+ working there meant I ate quite a lot of curry — it was amazing.
+ I also worked in a nightclub on weekends, which was a pretty late one to be honest; I used to start around 10pm and work through until
+ about 3am. I did quite a few jobs there: coat check, kitchen (making burgers and lattice fries mainly), and
+ the bar. Pretty hectic.
+
+ Gap year in Guyana: teaching and learning to adapt
+
+ [Metadata]
+ Exec summary: Chronicles your gap-year teaching experience in Guyana, emphasising adaptability, instructional skills, and exposure to challenging environments.
+ Keywords: Guyana, Project Trust, teaching, adaptability, resilience, user-centered design inspiration
+ Questions:
+ 1. What program enabled you to teach in Guyana and what subjects did you cover?
+ 2. How did you handle unexpected challenges during your placement?
+ 3. Which teaching lessons do you carry into your user-facing design work?
+ [/Metadata]
+
+ Before university, I spent a year teaching in Guyana through Project Trust. I trained on the Isle of Coll, then flew out
+ with a cohort and split off into schools. When my volunteer roommate had to return home due to illness, I moved schools and
+ started again with some new roommates.
+ I taught maths, science, and PE to students roughly 11–16. The big lessons:
+ • Teaching is hard; you have to be prepared, things surprise you, and you have to be quick on your feet
+ • Learning isn't the same for everybody; you have to adapt to the individual
+ • Be clear and concise in your delivery; you've got to be fine-tuned
+ Those ideas still shape how I design anything user‑facing — dashboards, APIs, or agentic assistants.
+ This time wasn't without its challenges. My roommate getting ill was a big deal; when you're the only person around
+ to help in a medical emergency, it's quite a challenge. Thankfully things turned out well on that occasion, but there were
+ many challenges living in a country so different from your own. You gain some key perspective on your own situation when
+ viewing the kind of poverty that some people will never see in a lifetime.
+
+ Loughborough & Mars Petcare: chemistry + sensors + software
119
+
120
+ [Metadata]
121
+ Exec summary: Narrates your MChem journey, industrial placement at Mars Petcare, development of analytical methods, and growing interest in statistics and food science.
122
+ Keywords: Loughborough MChem, Mars Petcare, LC-MS, GC-MS, method development, maillard reaction, statistics, sensory science
123
+ Questions:
124
+ 1. Why did you choose Loughborough and pursue an industrial placement at Mars?
125
+ 2. Which analytical instruments and methods did you master during the placement?
126
+ 3. How did your work with the Maillard reaction influence your interests?
127
+ 4. What statistical techniques did you apply in flavour development projects?
128
+ 5. What publication resulted from your work in this period?
129
+ [/Metadata]
130
+
131
+ I’d already accepted a place for MChem at Loughborough. I wanted industry experience in the degree, so I took an
132
+ industrial placement at Mars Petcare in Verden, Germany. I trained on LC‑MS, GC‑MS, GC‑FID; moved from sample prep
133
+ to method development; migrated a tricky amino‑acid analysis from LC to GC with derivatisation; added additional amino
134
+ acids; and demonstrated linearity and accuracy. First taste of method development and optimisation — and I loved it.
135
+ Living in Germany was a great experience and definitely one of the best places I've ever lived.
136
+
137
+ I worked on flavour development of cat food, running feeding trials with recipes that I put together. This is where I started
138
+ to get more invloved and interested in statistics. I set up design of experiments trials to determine the optimum concentration
139
+ of food additives to increase food uptake by the cats, they are quite picky after all. This involved making pilot scale batches on plant,
140
+ running analysis and interpreting the data. All in all it was an amazing experience.
141
+
142
+ The main focus of my project there was the maillard reaction. The Maillard reaction is a non-enzymatic browning reaction between
143
+ amino acids and reducing sugars that generates the complex flavours and brown colours associated with roasted, baked, and fried foods.
144
+ It proceeds through a cascade of steps (Schiff base → Amadori/Heyns → fragmentation and Strecker degradation → melanoidins) and is
145
+ accelerated by heat, dryness, and alkaline conditions. It made me really interested in food and how small changes to cooking can make
146
+ big differences in flavour profiles.
147
+
148
+ Back in the UK, I returned to Mars for a summer project near Loughborough on umami perception. I set up macros and
149
+ a software workflow so sensory panelists could record peak perception while we swabbed to quantify concentrations, and
150
+ we correlated the curves. That work was presented at a flavour symposium. It was instrumentation + sensory science +
151
+ just‑enough software — a pattern I’ve repeated since in other domains. This turned into my first publication
152
+ Relationship between Human Taste Perception and the Persistence of Umami Compounds in the Mouth - Flavour Science
153
+ Proceedings from XIII Weurman Flavour Research Symposium
154
+ 2014, Pages 487-491
155
+
156
+ Side note: the animal care standards and the environment were excellent. It mattered to me that the work respected
157
+ the animals — that balance between scientific rigour and humanity set a tone for my career.
158
+
159
+
160
+ A practical reset: labouring on a building site
+
+ [Metadata]
+ Exec summary: Highlights your post-graduation labouring work, underscoring appreciation for tangible progress and parallels to iterative software development.
+ Keywords: labouring, construction, tangible progress, iteration, motivation, work ethic
+ Questions:
+ 1. What work did you do immediately after graduating?
+ 2. How did labouring influence your appreciation for visible progress?
+ 3. In what way do you connect physical labour to your software development mindset?
+ 4. Why do iterative build-test cycles resonate with you?
+ [/Metadata]
+
+ After graduating, I worked as a labourer in Devon while job‑hunting — hauling materials through houses to back gardens,
+ mixing cement for brickwork and infill, clearing waste. Tangible progress at the end of each day is addictive. I still chase
+ that in my day-to-day, and it was really pointing me towards a career in programming — I just didn't know it yet. It's great to see
+ progress when you finish at the end of the day; those small iterative cycles of build, debug, test, repeat keep me addicted to writing code.
+
+ Sanofi → Recipharm (2012–2021): analytical specialist in a regulated world
+
+ [Metadata]
+ Exec summary: Summarises nearly a decade of analytical chemistry work at Sanofi and Recipharm, covering E&L leadership, method transfers, investigations, and cross-functional support in regulated environments.
+ Keywords: Sanofi, Recipharm, analytical specialist, extractables and leachables, method validation, cGxP, investigations, manufacturing support, statistics
+ Questions:
+ 1. What were your primary responsibilities at Sanofi and Recipharm?
+ 2. How did you lead extractables and leachables studies?
+ 3. What statistical methods did you apply during method transfers and validations?
+ 4. How did you contribute to troubleshooting and manufacturing support?
+ 5. How did this period strengthen your commitment to data integrity and Python/ML?
+ [/Metadata]
+
+ I spent nearly a decade across Sanofi and Recipharm, moving from routine QC to Analytical Specialist. My centre of gravity:
+ non‑routine analysis, method transfers, validations, and inspection support (MHRA, FDA, etc.).
+
+ [Metadata]
+ Exec summary: Enumerates your day-to-day responsibilities at Sanofi and Recipharm, covering E&L leadership, method transfers, investigations, and manufacturing support.
+ Keywords: responsibilities, extractables and leachables, method transfer, validation, investigations, manufacturing support
+ Questions:
+ 1. What specific analytical tasks did you handle in this role?
+ 2. How did you contribute to extractables and leachables programmes?
+ 3. In what ways did you support method transfers and validations?
+ 4. How did you engage in investigations and CAPA activities?
+ 5. What types of cross-functional manufacturing collaboration did you perform?
+ [/Metadata]
+
+ This period is also when I started coding in Python.
+
+ What I did:
+ • Extractables & leachables (E&L). Subject‑matter lead for E&L studies, scoping and interpreting chromatographic &
+ spectroscopic data for materials such as plastics and elastomers. I worked with suppliers to perform testing
+ on our behalf, drew up protocols and reports, and kept up to date on the latest advancements.
+ • Method transfers & validation. Equivalence testing, t‑tests, TOST, precision/accuracy studies, technical reports,
+ and document control in a cGxP environment (a small TOST illustration follows this list). This is another stage in my career where statistics was pushing me
+ towards data science and AI. I didn't quite know it yet, but I loved maths more than I thought I did.
+ I was one of the technical experts when we transferred around 60 methods to Germany following potential rule
+ changes after Brexit. This made me a key contact for troubleshooting, acceptance-criteria setting, and result interpretation;
+ I travelled to Germany to train staff — a bit of everything.
+ • Investigations & CAPA. Practical Problem Solving (PPS), root‑cause analysis across engineering, manufacturing,
+ and quality.
+ • Manufacturing support. Collaborated with scientists, engineers, and microbiologists on urgent issues — from chemical
+ impurities to microbial contamination — often building or adapting analytical methods on the fly. I'd be testing effluent
+ one day and have my head in a metered dose inhaler formulation vessel the next.
+ • I worked in a routine QC environment for quite a few years, doing analysis of nasal products, metered dose inhalers, and also
+ the packaging and raw materials that went into them.
+ • During this time I gained expertise in HPLC, GC, and Karl Fischer titration, and became a super user in GC specifically.
+
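A small illustration of the TOST equivalence check mentioned in the list above — an illustrative sketch with made-up numbers and a hypothetical equivalence margin, not taken from the original work:

```python
# Two one-sided tests (TOST) for mean equivalence, the kind of statistic
# used in analytical method transfers. delta is the pre-agreed margin.
import numpy as np
from scipy import stats

def tost_pvalue(a, b, delta):
    """p-value for H1: |mean(a) - mean(b)| < delta (two one-sided t-tests)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    # Test 1: mean difference exceeds -delta.
    p_lower = stats.ttest_ind(a, b - delta, alternative="greater").pvalue
    # Test 2: mean difference falls below +delta.
    p_upper = stats.ttest_ind(a, b + delta, alternative="less").pvalue
    return max(p_lower, p_upper)  # both tests must reject

sending_lab = [99.8, 100.1, 100.4, 99.9, 100.2]    # made-up assay results (%)
receiving_lab = [100.0, 100.3, 100.6, 100.1, 100.4]
print(tost_pvalue(sending_lab, receiving_lab, delta=1.0))  # < 0.05 → equivalent
```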
+ [Metadata]
225
+ Exec summary: Highlights the business impact and personal growth outcomes from your Sanofi/Recipharm tenure, including cost savings, data integrity ethos, and development of statistical expertise.
226
+ Keywords: impact, cost savings, data integrity, statistics, practical problem solving, career inflection
227
+ Questions:
228
+ 1. What quantified business result did you deliver during this period?
229
+ 2. How did the work reinforce your commitment to data integrity?
230
+ 3. In what way did statistics influence your transition toward Python and ML?
231
+ 4. What experience did you gain with the PPS tool and complex investigations?
232
+ [/Metadata]
233
+
234
+ Why it mattered:
235
+ • We resolved a critical impurity issue that delivered real cost savings (and a lot of learning).
236
+ • I developed deep respect for data integrity and traceability: if it isn’t documented, it didn’t happen.
237
+ • Statistics became second‑nature and nudged me towards Python and, later, machine learning and AI.
238
+ • I gained invaluable experience in practical problem solving (PPS). I worked on an extensive investigation using the
239
+ PPS tool to untangle extremely complex issues where multivariate root causes made the true root cause difficult to isolate.
240
+
241
+
242
+ AstraZeneca (May 2021 – Present, Macclesfield): chemistry meets code
243
+
244
+ [Metadata]
245
+ Exec summary: Captures your hybrid analytical science and AI engineering role at AstraZeneca, focusing on nitrosamine investigations, automation, and major achievements across RAG assistants, Bayesian optimisation, agentic workflows, and platform reliability.
246
+ Keywords: AstraZeneca, analytical chemistry, nitrosamine, automation, RAG assistant, Bayesian optimisation, agentic workflows, data pipelines, mentorship
247
+ Questions:
248
+ 1. What core responsibilities define your work at AstraZeneca?
249
+ 2. How do you apply automation and AI to analytical challenges such as nitrosamine detection?
250
+ 3. What impact did the RAG laboratory assistant deliver, and how was it developed?
251
+ 4. How did Bayesian optimisation change method development practices?
252
+ 5. What agentic workflow innovations have you introduced?
253
+ 6. Which tooling and platforms do you regularly use in this role?
254
+ 7. How do you support community and mentoring within AstraZeneca?
255
+ [/Metadata]
256
+
257
+ This is where everything clicked. I stayed rooted in analytical science — including trace/nitrosamine risk investigations
258
+ where timelines are tight — but I worked increasingly like a data scientist / engineer.
259
+ One of my key tasks is method development at extremely low concentrations. Nitrosamines have to be monitored at such low
260
+ concentrations that they require specific methods and equipment; we're talking levels of around a billionth of a gram. The one thing we can always do
261
+ is automate better, which is what I love to do. Whether it's processing ticket requests or extracting instrument usage from logs,
262
+ this is where my programming knowledge really started to make an impact (see the sketch below).
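+
+ As a flavour of the kind of automation I mean: a minimal sketch of pulling instrument usage out of plain-text logs
+ with pandas. The log format, file name, and event names here are hypothetical, purely for illustration:
+
+ import re
+ import pandas as pd
+
+ # Hypothetical log lines like: "2024-03-01 09:15 GC-07 run_started"
+ PATTERN = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}) (?P<instrument>\S+) (?P<event>\w+)")
+
+ rows = []
+ with open("instrument.log") as fh:  # assumed log file name
+     for line in fh:
+         match = PATTERN.match(line)
+         if match:
+             rows.append(match.groupdict())
+
+ df = pd.DataFrame(rows)
+ # Count runs per instrument per day: a crude but useful utilisation signal.
+ usage = df[df["event"] == "run_started"].groupby(["instrument", "date"]).size()
+ print(usage)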
263
+
264
+ [Metadata]
265
+ Exec summary: Bullet list of standout initiatives at AstraZeneca showing your impact across GenAI, optimisation, automation, and data tooling.
266
+ Keywords: key achievements, RAG assistant, Bayesian optimisation, agentic workflows, data pipelines, dashboards, platform correctness, chromatographic prediction, mentoring
267
+ Questions:
268
+ 1. What major projects illustrate your contributions at AstraZeneca?
269
+ 2. How did you leverage GenAI and optimisation to improve lab processes?
270
+ 3. Which data engineering and dashboard efforts reduced friction for colleagues?
271
+ 4. How did you ensure platform correctness and predictive modelling capability?
272
+ 5. In what ways do you support mentoring and community building at work?
273
+ [/Metadata]
274
+
275
+ Key achievements and strands of work:
276
+ • RAG‑based laboratory assistant (GenAI). I led the build of a retrieval‑augmented assistant with a multi‑disciplinary
277
+ team (SMEs, AI engineers, front/back‑end). We took it from PoC through risk assessments, evaluation vs expected
278
+ outputs, and UAT. It reduced troubleshooting lead times by ~20% and made internal knowledge more discoverable.
279
+ • Bayesian optimisation for method development. We matched a historical method‑development context and reached
280
+ the same optimum with ~50% fewer experiments by applying Bayesian optimisation. That moved from a promising study
281
+ to an adopted practice in real projects. This was a great team of individuals with expert knowledge of automation,
282
+ Python, Bayesian optimisation (using BayBE), gas chromatography, and HRMS. I also developed a RAG chatbot for
283
+ writing PAL script code for managing CTC rails.
284
+ • Agentic workflows. I’m actively developing agentic patterns (tool‑use, MCP) to cut manual
285
+ coordination and reduce method‑development effort. In targeted scopes, we’ve seen up to ~80% reductions in the
286
+ human loops required to get to “good enough to ship” (the point is fewer trips round the houses, not magic).
287
+ • Data pipelines & APIs. I engineered pipelines in SQL (Snowflake) and Python; launched FastAPI services so downstream
288
+ tools could call data cleanly; and used those services as foundations for GenAI tools via tool‑use/MCP.
289
+ • Dashboards that people actually use. I built Power BI and Streamlit tooling that gives a clean view of support tickets,
290
+ instrument utilisation, and self‑serve data portals (see the sketch after this list).
291
+ • Worked with large-scale databases to retrieve and clean data, and with external partners to improve the data pipeline.
292
+ • Developed and deployed Streamlit web apps for various purposes.
293
+ • Chromatographic prediction. From fingerprints + XGBoost baselines to neural approaches and, later, attention‑based
294
+ graph models. I pre‑trained on a large open dataset (~70k injections).
295
+ • Mentoring & community. I contribute to the internal Coding Network, support colleagues learning Python, and sit on the
296
+ programming expert panel. I like turning tacit know‑how into repeatable templates.
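+
+ The dashboard sketch referenced above: a minimal Streamlit pattern where the CSV source and column names are
+ hypothetical stand-ins, not the internal schema:
+
+ import pandas as pd
+ import streamlit as st
+
+ st.title("Instrument utilisation")
+
+ # Assumed CSV export with columns: instrument, date, runs
+ df = pd.read_csv("utilisation.csv", parse_dates=["date"])
+
+ instrument = st.selectbox("Instrument", sorted(df["instrument"].unique()))
+ subset = df[df["instrument"] == instrument]
+
+ st.metric("Total runs", int(subset["runs"].sum()))
+ st.bar_chart(subset.set_index("date")["runs"])  # daily runs over time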
297
+
298
+ [Metadata]
299
+ Exec summary: Enumerates the primary tools, languages, and platforms you rely on within AstraZeneca projects.
300
+ Keywords: tooling stack, Python, FastAPI, SQL, Power BI, Streamlit, ML frameworks, cloud platforms
301
+ Questions:
302
+ 1. Which languages and frameworks underpin your daily work at AstraZeneca?
303
+ 2. What visualisation and dashboard tools do you deploy?
304
+ 3. Which machine learning libraries support your modelling efforts?
305
+ 4. What GenAI providers and cloud platforms do you integrate with?
306
+ [/Metadata]
307
+
308
+ Tools I use a lot here:
309
+ Python, FastAPI, SQL/Snowflake, Power BI, Streamlit/Plotly/Matplotlib, scikit‑learn, XGBoost, PyTorch, PyTorch Geometric,
310
+ OpenAI/Anthropic/OpenRouter/Vertex APIs, Docker, GitHub Copilot / Claude Code / Gemini Code Assist, and cloud basics
311
+ across Azure/AWS/GCP.
312
+
313
+
314
+ CoDHe Labs (Jul 2025 – Present, part‑time): ethical AI that ships
315
+
316
+ [Metadata]
317
+ Exec summary: Outlines your part-time independent practice focusing on ethical AI, RAG copilots, agentic automation, and pro bono work for charities, including current projects.
318
+ Keywords: CoDHe Labs, independent work, generative AI copilots, dashboards, agentic workflows, pro bono, charity support, automation tools
319
+ Questions:
320
+ 1. What services does CoDHe Labs provide and what principles guide it?
321
+ 2. Which current initiatives demonstrate your applied skills outside AstraZeneca?
322
+ 3. How do you balance commercial and pro bono engagements?
323
+ 4. What technologies and collaborations are involved in the charity project mentioned?
324
+ 5. What future project do you hint at in this section?
325
+ [/Metadata]
326
+
327
+ Alongside my full‑time role, I formalised my independent work as CoDHe Labs — a small practice focused on:
328
+ • Generative AI “copilots” (RAG) that make internal knowledge instantly useful.
329
+ • ML‑powered insights dashboards wired to a warehouse.
330
+ • Agentic workflow automation that coordinates multi‑step processes via tool‑use.
331
+ • Digital uplift for small teams and non‑profits (including light M365/Azure support).
332
+ This includes setting up invoicing automation, data entry and storage, and user management.
333
+ I also run an “AI for Charities” pro bono strand because capability should compound beyond big budgets.
334
+
335
+ I'm currently working on a project to help a charity with their IT infrastructure and automations using Excel VBA,
336
+ Power Automate and Python. I'm also liaising with external partners to implement additional tools.
337
+ I'm also working on an agentic VS Code extension, but more on that at a later date as it's still in development.
338
+
339
+ [Metadata]
340
+ Exec summary: Outlines your scoping methodology for client engagements, focusing on bottleneck identification, rapid prototyping, transparency, and documentation.
341
+ Keywords: scoping process, bottleneck analysis, prototyping, documentation, transparency, vendor lock avoidance
342
+ Questions:
343
+ 1. How do you prioritise bottlenecks when starting new work?
344
+ 2. What approach do you take to rapid prototyping and iteration?
345
+ 3. How do you handle intellectual property, licensing, and vendor lock concerns?
346
+ 4. What project management practices (SoWs, documentation) do you emphasise?
347
+ [/Metadata]
348
+
349
+ How I scope work:
350
+ • Start with the bottleneck: retrieval? experiment count? brittle handoffs? We pick one.
351
+ • Ship a small, working prototype fast; measure; iterate; document; hand over.
352
+ • Keep IP and licensing clean; be transparent; avoid vendor lock-in where we can.
353
+ • Strong SoWs; clean docs; and honest conversations about risk, safety, and fit.
354
+
355
+
356
+ Achievements I’m proud of (because they changed behaviour)
357
+
358
+ [Metadata]
359
+ Exec summary: Highlights selected achievements demonstrating your competitive performance, hackathon recognition, adoption-driving innovations, and user-focused RAG impact.
360
+ Keywords: achievements, Kaggle competition, Modal Labs award, Bayesian optimisation adoption, RAG assistant impact, behaviour change
361
+ Questions:
362
+ 1. Which competition results do you cite as evidence of capability?
363
+ 2. What recognition did you receive for agentic MCP work and why?
364
+ 3. How did the Bayesian optimisation project influence team practices?
365
+ 4. Why do you value the RAG assistant’s impact on colleagues?
366
+ 5. How do these achievements reflect your focus on behaviour change?
367
+ [/Metadata]
368
+
369
+ • 4th place in a Kaggle binary‑classification competition (Mar 2025) — a nice reminder that fundamentals matter. In this
370
+ challenge, we were tasked with predicting rainfall. I enjoyed working on this one and learnt a few things as I progressed with
371
+ my submissions. I started with the usual EDA and feature engineering before testing a few models. I tend to default to models like
372
+ XGBoost because it works really well out of the box, though I usually fit a random forest as a baseline for most tasks, then
373
+ iterate and start to build a picture of what works best. My go-to toolbox is scikit-learn for models, pandas for data manipulation and seaborn
374
+ for visualisation. Scikit-learn is also great for pre-processing and hyperparameter tuning; a quick RandomizedSearchCV followed by GridSearchCV
375
+ is usually my sequence. On a task like this, I always keep in mind what data I have for predictions, especially whether the dataset
376
+ is well balanced; on this occasion it was quite unbalanced. When that happens, I like to employ SMOTE, generating synthetic minority samples to balance the scales.
377
+ This dramatically improved performance. The final step was something new to me: I used a VotingClassifier to ensemble several models and
378
+ evaluated the combined classifier against the individual ones (a minimal sketch of this pattern follows this list). Initially, I was really impressed with my cross-validation, but the test set
379
+ on Kaggle didn't look like anything special. However, I decided to trust the good cross-val score and it really paid off. I jumped up
380
+ hundreds of places in the final leaderboard. I got my free t-shirt, one of my prized possessions: a really proud moment when I started to realise
381
+ I'm more than OK at this; I can do this as a job. With no classical training and no full-time data job, I was surpassing trained data scientists
382
+ and Kaggle grandmasters. Just imagine what I could do if I did it full time.
383
+ • Modal Labs Choice Award ($5,000) for an agentic Model Context Protocol (MCP) server during a Gradio + Hugging Face
384
+ hackathon (Jul 2025). The joy wasn’t the prize — it was proving a lean, useful pattern quickly with real constraints.
385
+ This was one of the best achievements of my adult career. Prior to this, I hadn't written a single MCP server and hadn't even really used MCP that much, but I felt
386
+ I wanted the challenge. I worked incredibly hard on this, working very late into the night (I couldn't sleep anyway lol), and planned out my
387
+ application. At first, I wanted to build agentic deep research but, the more I thought about it, I wanted something quick with low latency. Shallow Research was born.
388
+ Shallow Research was about code: generating tested and validated code. The MCP server, in essence, took a user input about code, like "how do I perform predictions on a binary classification task?"
389
+ It would then begin a linear "agentic" workflow, where various agents handled very specific tasks. There was a research agent
390
+ to look up best practice on the internet, then a code generation agent, and then the most critical part: code execution.
391
+ The real power of the MCP was the ability to run code in a remote sandbox on the Modal platform, an amazing platform for spinning up CPU or GPU instances.
392
+ The Code Runner agent would run the code and make sure it worked; if the sandbox didn't have the right library, no worries, the library would be installed
393
+ dynamically. A simple image was set up on Modal to decrease latency: it spun up quickly with the core libraries, and other libraries were installed
394
+ only when needed. Finally, all of this would be returned to the user, who could see that the code had been executed and what the result was. They
395
+ could copy and paste the code knowing it would work. The aim was for this to be a great MCP for those learning to code: it'd give you
396
+ working code with good explanations and citations to read more.
397
+ • The Bayesian optimisation result at AZ (same optimum, ~50% fewer experiments) because it moved from “cool idea” to
398
+ “how we actually work”. It was the first of its type in the department and I learned a lot from Bayesian experts.
399
+ • The RAG assistant because it reduced real, everyday friction for colleagues hunting for knowledge in complex systems. The
400
+ tools I develop are designed with the end user in mind: how can I make someone's day better by helping them solve a problem?
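+
+ The sketch referenced in the Kaggle bullet above: a minimal, self-contained version of the SMOTE + VotingClassifier
+ pattern on synthetic data. The real competition pipeline had far more feature engineering and tuning.
+
+ from imblearn.over_sampling import SMOTE
+ from sklearn.datasets import make_classification
+ from sklearn.ensemble import RandomForestClassifier, VotingClassifier
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.metrics import roc_auc_score
+ from sklearn.model_selection import train_test_split
+ from xgboost import XGBClassifier
+
+ # Toy imbalanced binary problem standing in for the rainfall data.
+ X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
+ X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
+
+ # Oversample the minority class on the training split only.
+ X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
+
+ # Soft-voting ensemble over diverse base models.
+ ensemble = VotingClassifier(
+     estimators=[
+         ("rf", RandomForestClassifier(random_state=42)),
+         ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
+         ("lr", LogisticRegression(max_iter=1000)),
+     ],
+     voting="soft",
+ )
+ ensemble.fit(X_res, y_res)
+ print("AUC:", roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1]))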
401
+
402
+
403
+ Why AI — and why now
404
+
405
+ [Metadata]
406
+ Exec summary: Explains your motivation for pursuing AI, tying together analytical chemistry, statistics, Python, ML, GenAI, and the joy of iterative problem-solving.
407
+ Keywords: motivation, AI transition, analytical chemistry influence, statistics, Python, machine learning, generative AI, iteration
408
+ Questions:
409
+ 1. How did analytical chemistry shape your approach to data and decision-making?
410
+ 2. What role did statistics and Python play in your transition to AI?
411
+ 3. How do you describe the evolution from ML to GenAI and agentic patterns?
412
+ 4. Why do you find programming so engaging and time-dissolving?
413
+ [/Metadata]
414
+
415
+ Analytical chemistry immersed me in noisy data and decisions under constraint. Statistics gave me language for uncertainty.
416
+ Python gave me leverage — automation, analysis, and APIs. ML stitched it together into predictive systems. GenAI widened
417
+ the aperture to text and reasoning, and agentic patterns turn tools into coordinated doers. To be honest, I enjoy the loop:
418
+ frame the question, ship a tiny thing, see if it helps, and keep going. Programming is one of those activities I can start and,
419
+ all of a sudden, it's 8 hours later. That's what I love about it: it tunes my brain. My brain was made to code; it's just a shame
420
+ I found out so late.
421
+
422
+
423
+ How I work (and sound)
424
+
425
+ [Metadata]
426
+ Exec summary: Describes your working style, including goal orientation, iterative focus, accountability, collaboration, documentation habits, and conversational cues.
427
+ Keywords: working style, goal setting, iteration, accountability, collaboration, documentation, communication tone
428
+ Questions:
429
+ 1. How do you define success and plan your skill development?
430
+ 2. What is your approach to starting and finishing projects?
431
+ 3. How do you handle mistakes and team accountability?
432
+ 4. What role does documentation play in your delivery process?
433
+ 5. Which phrases signal your agreement or emphasis during conversations?
434
+ [/Metadata]
435
+
436
+ • “What does success look like in a year? What skill will I need in 6 months?” — then work backwards. I'm always thinking about how things can be done better.
437
+ • Start small and see where it goes. If it's good, I'll fixate on it until it's done.
438
+ • Nothing is ever good enough; we can always improve on processes.
439
+ • I'm honest: if something's my fault, I'll hold my hand up, and I expect the same of others. Pushing blame onto others or nitpicking people doesn't impress me.
440
+ • Opinionated defaults, but collaborative. I’ll propose a pattern and then adapt with the team.
441
+ • Documentation is part of delivery. If someone can’t pick it up without me, I haven’t finished, and that's hard to follow through on. It's not always easy but I try my best. In the world of pharma, if you didn't write it down, it never happened.
442
+ If an action was performed but never written down, an auditor is not going to like it, and that can get you in serious trouble.
443
+ • I’ll say “Yeah, definitely,” when something resonates. I’ll say “to be honest,” when I need to cut through nicely.
444
+
445
+
446
+ Technical highlights (deeper cuts)
447
+
448
+ [Metadata]
449
+ Exec summary: Introduces deep-dive technical case studies covering RAG assistant, agentic workflows, Bayesian optimisation, chromatographic prediction, and API/ETL improvements.
450
+ Keywords: technical highlights, case studies, RAG, agentic workflows, Bayesian optimisation, chromatographic prediction, APIs, ETL correctness
451
+ Questions:
452
+ 1. What advanced technical initiatives do you showcase here?
453
+ 2. How do these highlights expand on earlier achievements?
454
+ 3. Which domains (retrieval, optimisation, prediction, API design) do you emphasise?
455
+ 4. How do these examples demonstrate your end-to-end problem solving?
456
+ [/Metadata]
457
+
458
+ RAG laboratory assistant
459
+ [Metadata]
460
+ Exec summary: Details the rationale, implementation steps, outcomes, and technical methods behind the RAG laboratory assistant PoC and rollout.
461
+ Keywords: RAG assistant, retrieval, embeddings, ChromaDB, prompt augmentation, multimodal troubleshooting, AI governance, evaluation
462
+ Questions:
463
+ 1. Why was the RAG laboratory assistant needed and what problem did it solve?
464
+ 2. How did you architect the retrieval pipeline, including embeddings and databases?
465
+ 3. What prompt augmentation strategy did you use to improve retrieval?
466
+ 4. How did you incorporate multimodal troubleshooting and image context into the solution?
467
+ 5. What governance, evaluation, and deployment steps were taken to roll out the assistant responsibly?
468
+ 6. How did the project balance user experience with accuracy and guardrails?
469
+ [/Metadata]
470
+
471
+ Why: People were wasting time on “Who knows X?” and “Where’s that doc?”. Retrieval needed to be first‑class.
472
+ How: Light doc loaders; chunking; embeddings; vector DB; retrieval‑augmented prompting; guardrails around sources;
473
+ simple UI; risk assessments; evaluation vs expected outputs; UAT with actual users.
474
+ Outcome: ~20% reduction in troubleshooting lead times and noticeably faster answers to routine questions.
475
+ How I did it: I built a PoC using Streamlit. I used OpenAI embeddings to vectorise manuals and troubleshooting guides that I
476
+ selected from the internet, knowing that these ground-truth documents were great sources. What do you do with embeddings? Put
477
+ them in a vector database. Personally I used ChromaDB because it was easy to set up locally, but I have also used Qdrant and Pinecone,
478
+ which are great cloud alternatives. Then I had to layer in the LLM calls. To improve accuracy, I employed a prompt augmentation step:
479
+ I make an extra call to an LLM to come up with 3 or 4 questions related to the user query but slightly different. This helps to widen the potential
480
+ retrieval of documents, especially if it asks questions the user hadn't thought of; it's all about context. From this you can inject the retrieved chunks into the prompt
481
+ and get a grounded answer (although you've got to call set() on those retrieved chunks; you don't want to waste tokens on duplicates, lol). A sketch of this step follows below.
482
+ I also included image-based troubleshooting, early on in the multimodal landscape. Image embeddings weren't common then, so I used models to explain the issue
483
+ in the image and then used this context to perform retrieval. This meant it could be quite dynamic and still give ground-truth results with references (key).
484
+ The other main input into this type of tool is prompt engineering. Users don't want to type War and Peace, and by having a specialist RAG tool you can fine-tune the system
485
+ prompt to abstract away some of the more complex prompting skills like chain of thought; it's been done for them already.
486
+ That's just the PoC. AI governance is key; copyright concerns are key. Deployment becomes a collaborative effort with teams all over the world: sprints in Jira,
487
+ UAT with SMEs, and AI evaluation rounds with SMEs to make sure responses meet requirements. Finally you get an app out in the wild, people start using it, and it
488
+ feels great!!
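+
+ A minimal sketch of the prompt-augmentation + deduplication step (illustrative only: the model name, collection
+ name, and prompt wording are assumptions, not the production values):
+
+ import chromadb
+ from openai import OpenAI
+
+ client = OpenAI()
+ collection = chromadb.Client().get_or_create_collection("manuals")  # assumed collection
+
+ def augmented_retrieve(user_query: str, k: int = 4) -> list[str]:
+     # Ask the LLM for a few rephrasings to widen retrieval coverage.
+     resp = client.chat.completions.create(
+         model="gpt-4o-mini",
+         messages=[{"role": "user", "content":
+                    f"Write 3 short search queries related to: {user_query}"}],
+     )
+     variants = [user_query] + resp.choices[0].message.content.splitlines()
+
+     # Query the vector store once per variant, then dedupe with set().
+     chunks: set[str] = set()
+     for q in variants:
+         if not q.strip():
+             continue
+         hits = collection.query(query_texts=[q.strip()], n_results=k)
+         chunks.update(hits["documents"][0])
+     return sorted(chunks)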
489
+
490
+ Agentic workflows + MCP/tool‑use
491
+ [Metadata]
492
+ Exec summary: Summarises your approach to agentic workflows using Model Context Protocol, focusing on reducing manual coordination through secure tool orchestration.
493
+ Keywords: agentic workflows, MCP, tool-use, automation, coordination reduction, secure interfaces
494
+ Questions:
495
+ 1. What problem do agentic workflows solve for your teams?
496
+ 2. How do you apply MCP and tool-use patterns in these workflows?
497
+ 3. What outcomes have these automations delivered?
498
+ 4. How does scope control factor into your design choices?
499
+ [/Metadata]
500
+
501
+ Why: Multi‑step, cross‑system tasks were brittle and person‑dependent.
502
+ How: Orchestrated tools behind clear interfaces; used MCP/tool‑use patterns so models can call functions securely; kept
503
+ scope tight.
504
+ Outcome: In the right slices, up to ~80% reduction in human loops to reach usable results.
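+
+ For flavour, a minimal MCP tool server in the style I mean, using the Python MCP SDK's FastMCP helper. The tool
+ itself is a hypothetical stand-in; real deployments add auth, logging, and tighter scopes:
+
+ from mcp.server.fastmcp import FastMCP
+
+ mcp = FastMCP("lab-tools")  # assumed server name
+
+ @mcp.tool()
+ def instrument_status(instrument_id: str) -> str:
+     """Return a status string for an instrument (stubbed for illustration)."""
+     # In practice this would call a typed internal API rather than a stub.
+     return f"{instrument_id}: idle"
+
+ if __name__ == "__main__":
+     mcp.run()  # serves the tool over stdio for an MCP-capable client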
505
+
506
+ Bayesian optimisation for method development
507
+ [Metadata]
508
+ Exec summary: Describes your application of Bayesian optimisation to laboratory method development, reducing experiments while matching historical optimums.
509
+ Keywords: Bayesian optimisation, method development, experiment reduction, iterative loop, objective function
510
+ Questions:
511
+ 1. Why was Bayesian optimisation selected for the method development problem?
512
+ 2. How did you structure the optimisation loop and objective?
513
+ 3. What comparison baseline validated the approach?
514
+ 4. What efficiency gains were achieved in experiment count?
515
+ [/Metadata]
516
+
517
+ Why: Parameter spaces are expensive to explore; we needed a principled way to reach “good enough to ship” faster.
518
+ How: Replayed a historical development on the same instrument with bounded variables and a clear objective; ran an
519
+ iterative loop; compared against the known optimum.
520
+ Outcome: Same optimum with ~50% fewer experiments. Clear signal to scale into practice.
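+
+ A minimal sketch of the loop shape: a toy Gaussian-process / expected-improvement optimiser with scikit-learn and a
+ stand-in objective. The real work used BayBE with bounded instrument parameters and a chromatography objective:
+
+ import numpy as np
+ from scipy.stats import norm
+ from sklearn.gaussian_process import GaussianProcessRegressor
+
+ def objective(x):                       # hidden "experiment" standing in for a lab run
+     return -(x - 0.3) ** 2
+
+ rng = np.random.default_rng(0)
+ X = rng.uniform(0, 1, (3, 1))           # a few seed experiments
+ y = objective(X).ravel()
+
+ for _ in range(10):                     # iterate: fit surrogate, pick next experiment
+     gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
+     grid = np.linspace(0, 1, 200).reshape(-1, 1)
+     mu, sigma = gp.predict(grid, return_std=True)
+     best = y.max()
+     z = (mu - best) / np.maximum(sigma, 1e-9)
+     ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
+     x_next = grid[np.argmax(ei)].reshape(1, 1)
+     X = np.vstack([X, x_next])
+     y = np.append(y, objective(x_next).ravel())
+
+ print("best parameter:", X[np.argmax(y)][0], "best value:", y.max())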
521
+
522
+ Chromatographic retention‑time prediction
523
+ [Metadata]
524
+ Exec summary: Covers your progression from baseline models to attention-based graph approaches for predicting chromatographic retention times, leveraging large datasets for pre-training and fine-tuning.
525
+ Keywords: chromatographic prediction, retention time, XGBoost, neural networks, graph models, pre-training, fine-tuning, method development
526
+ Questions:
527
+ 1. Why was chromatographic retention-time prediction valuable for your work?
528
+ 2. Which modelling techniques did you iterate through from baseline to advanced?
529
+ 3. How did you combine open datasets with internal data for training?
530
+ 4. What benefits did attention-based graph models provide over earlier approaches?
531
+ [/Metadata]
532
+
533
+ Why: Better priors mean fewer dead‑ends in method development.
534
+ How: Start with fingerprints + XGBoost baselines; extend to neural models; then pre‑train a graph model with attention on
535
+ ~70k open injections; fine‑tune on internal ~30k; evaluate on held‑out chemistries.
536
+ Outcome: Stronger generalisation and a reusable domain foundation to build on.
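+
+ A minimal sketch of the fingerprint + XGBoost baseline stage (illustrative: the SMILES list and retention times
+ below are made-up placeholders, and the real dataset was far larger):
+
+ import numpy as np
+ from rdkit import Chem
+ from rdkit.Chem import AllChem
+ from xgboost import XGBRegressor
+
+ smiles = ["CCO", "CCCC", "c1ccccc1", "CC(=O)O"]        # placeholder molecules
+ retention_times = np.array([1.2, 3.4, 5.1, 2.0])       # placeholder targets (min)
+
+ def fingerprint(smi: str) -> np.ndarray:
+     # 2048-bit Morgan (ECFP4-style) fingerprint as the molecular feature vector.
+     mol = Chem.MolFromSmiles(smi)
+     fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
+     return np.array(fp)
+
+ X = np.vstack([fingerprint(s) for s in smiles])
+ model = XGBRegressor(n_estimators=200).fit(X, retention_times)
+ print(model.predict(X[:1]))                            # predicted retention time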
537
+
538
+ APIs & ETL correctness
539
+ [Metadata]
540
+ Exec summary: Highlights your focus on API design and ETL integrity, ensuring analysts access clean, typed data via FastAPI services and schema fixes.
541
+ Keywords: APIs, ETL correctness, FastAPI, Pydantic, schema flattening, data reliability, analytics enablement
542
+ Questions:
543
+ 1. Why do you emphasise clean APIs and ETL pipelines for analysts?
544
+ 2. How did you use FastAPI and Pydantic models to improve data access?
545
+ 3. What schema issues did you identify and resolve, and why did they matter?
546
+ 4. How did these efforts reduce friction and bespoke scripting across teams?
547
+ [/Metadata]
548
+
549
+ Why: Analysts shouldn’t screen‑scrape or wrestle nested XML‑ish blobs. Clean tables + typed APIs unlock everything.
550
+ How: FastAPI with Pydantic models; raised/resolved flattening issues so SQL was sane; wrote small services people
551
+ could actually call.
552
+ Outcome: Less friction; fewer bespoke scripts; more reliable dashboards and models.
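+
+ A minimal sketch of the typed-API pattern with FastAPI and Pydantic (the model fields and endpoint are
+ hypothetical examples, not the internal schema):
+
+ from fastapi import FastAPI
+ from pydantic import BaseModel
+
+ app = FastAPI()
+
+ class Injection(BaseModel):
+     # Typed response model: callers get a guaranteed, documented schema.
+     sample_id: str
+     instrument: str
+     retention_time_min: float
+
+ @app.get("/injections/{sample_id}", response_model=Injection)
+ def get_injection(sample_id: str) -> Injection:
+     # Stub standing in for a Snowflake/SQL lookup behind the service.
+     return Injection(sample_id=sample_id, instrument="GC-07", retention_time_min=3.42)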
553
+
554
+
555
+ What’s next
556
+
557
+ [Metadata]
558
+ Exec summary: Signals your future focus on expanding agentic workflows, strengthening data contracts, and sharing lightweight automation patterns for charities and SMEs.
559
+ Keywords: future plans, agentic workflows, data contracts, knowledge sharing, SMEs, charities
560
+ Questions:
561
+ 1. What future technical areas do you plan to invest in?
562
+ 2. How do you intend to help charities and SMEs with automation?
563
+ 3. Why are explicit data contracts a priority for your upcoming work?
564
+ 4. How does knowledge sharing feature in your outlook?
565
+ [/Metadata]
566
+
567
+ More agentic workflows wired to real systems; more explicit data contracts; and more public sharing of light‑weight tools
568
+ and patterns. I want charities and SMEs to have leverage without needing a 50‑person platform team.
569
+
570
+
571
+ Contact & links
572
+
573
+ [Metadata]
574
+ Exec summary: Provides primary contact information and online presence links for reaching you or exploring your work.
575
+ Keywords: contact, email, GitHub, portfolio, LinkedIn, location
576
+ Questions:
577
+ 1. What email addresses can be used to contact you personally or for business?
578
+ 2. Where can someone review your code and projects?
579
+ 3. Which portfolio site showcases your broader work?
580
+ 4. What is your LinkedIn profile and current location?
581
+ [/Metadata]
582
+
583
+ Email: [email protected] (personal) | [email protected] (business)
584
+ GitHub: github.com/CodeHalwell
585
+ Portfolio: codehalwell.io
586
+ LinkedIn: linkedin.com/in/danielhalwell
587
+ Location: Northwich, UK
588
+
589
+ Thanks for reading. If there’s something you want to build — or a process that needs unblocking — I’m happy to chat.
590
+ Let’s make the right thing the easy thing.
591
+
592
+ [Metadata]
593
+ Exec summary: Closing invitation encouraging collaboration and reinforcing your philosophy of making the right approach straightforward.
594
+ Keywords: closing note, collaboration invite, philosophy, call to action, accessibility
595
+ Questions:
596
+ 1. What offer do you extend to potential collaborators?
597
+ 2. How do you summarise your approach to solving problems?
598
+ 3. What tone do you set for prospective conversations?
599
+ 4. Why do you emphasise making the right thing easy?
600
+ [/Metadata]
601
+
602
+
603
+ Selected GitHub Repositories (LLM‑Ready Index)
604
+
605
+ [Metadata]
606
+ Exec summary: Introduces the curated list of your GitHub repositories with metadata for LLM-ready indexing, highlighting focus areas and inferred summaries.
607
+ Keywords: GitHub index, repositories, LLM-ready, project catalogue, focus areas, tags
608
+ Questions:
609
+ 1. What is the purpose of this GitHub repositories section?
610
+ 2. How are the repositories categorised and described?
611
+ 3. Which metadata fields accompany each repository listing?
612
+ 4. How does this section support LLM-friendly retrieval?
613
+ [/Metadata]
614
+
615
+ Daniel Halwell — Repositories Index (LLM-Ready)
616
+
617
+ [Metadata]
618
+ Exec summary: Explains the table-like format used to present repository metadata for quick scanning and indexing.
619
+ Keywords: repository format, metadata fields, presentation structure, LLM-ready, quick reference
620
+ Questions:
621
+ 1. How are the repository entries structured for readability?
622
+ 2. Which metadata columns are included for each repository?
623
+ 3. Why is a consistent format important for LLM-ready indexing?
624
+ 4. How does this format help with retrieval tasks?
625
+ [/Metadata]
626
+
627
+
628
+ Selected GitHub Repositories (Organized by Category)
629
+
630
+
631
+ **LLM Utilities & Language Models**
632
+ • yamllm - YAML ↔ LLM interaction utilities
633
+ • simple_rag - Minimal RAG baseline implementation
634
+ • openai-logp-viewer - Log probability inspection and visualization
635
+
636
+ **Agentic Systems & Automation**
637
+ • gradio-mcp-agent-hack - Model Context Protocol experimentation with Gradio
638
+ • agents-for-art - Creative agent orchestration tools
639
+ • n8n-mcp - n8n integration with Model Context Protocol
640
+ • synthetic-data-agent - Automated synthetic data generation
641
+ • research-agent - Deep research workflow automation
642
+ • coding-agent-cli - Command-line coding assistant
643
+ • agentic-ai-engineering - Agent engineering frameworks and patterns
644
+
645
+ **Web Development & Portfolio**
646
+ • CodeHalwell-Portfolio - Personal portfolio site
647
+ • portfolio-codehalwell - Alternative portfolio implementation
648
+ • WeatherApp - Weather API integration with UI
649
+ • web-page-test - Web development experiments
650
+
651
+ **Data Science & Analytics**
652
+ • washing-line-predictor - Weather-informed predictive modeling
653
+ • openai-logp-viewer - Data visualization for LLM analysis
654
+ • arxiv-scraper - Academic paper collection and processing
655
+
656
+ **Healthcare & Specialized Domains**
657
+ • BabelFHIR - FHIR/HL7 healthcare data processing
658
+
659
+ **Learning & Coursework**
660
+ • ibm-build-genai-apps - IBM watsonx platform exploration
661
+ • ibm-python-data-analysis - IBM data analysis certification work
662
+ • llm_engineering-course - LLM engineering fundamentals
663
+ • LLM101n - Large language model foundations
664
+ • DataCamp_DS_Cert - Data science certification projects
665
+ • oaqjp-final-project-emb-ai - Embedded AI final project
666
+
667
+ **Personal Projects & Apps**
668
+ • MyPoppet / poppet - Personal assistant experiments
669
+ • translator-with-voice-and-watsonx - Voice translation with IBM watsonx
670
+
671
+ **Utilities & Experiments**
672
+ • MyGPT - Quick GPT experimentation
673
+ • Grand-Gardens-AI - AI garden management concepts
674
+ • Useful_Scripts - General automation scripts
675
+ • deep-research - Research workflow tools
676
+ • food_review - Food review analysis
677
+ • podcast-censoring - Podcast content filtering
678
+ • playground_series_september2025 - September 2025 coding experiments
679
+ • pallscripting - Scripting utilities
680
+ • deep-learning-illustrated - Deep learning visualization
681
+ • build_own_chatbot_without_open_ai - Non-OpenAI chatbot implementation
682
+ • code_chat_bot - Code-focused chatbot
683
+ • neurIPS-open-polymer - Polymer research collaboration
684
+
685
+ **Repository List for Automation:**
686
+
687
+ repos = [
688
+ "CodeHalwell/yamllm","CodeHalwell/gradio-mcp-agent-hack","CodeHalwell/CodeHalwell-Portfolio",
689
+ "CodeHalwell/MyGPT","CodeHalwell/agents-for-art","CodeHalwell/Grand-Gardens-AI",
690
+ "CodeHalwell/Useful_Scripts","CodeHalwell/MyPoppet","CodeHalwell/deep-research",
691
+ "CodeHalwell/ibm-build-genai-apps","CodeHalwell/n8n-mcp","CodeHalwell/washing-line-predictor",
692
+ "CodeHalwell/portfolio-codehalwell","CodeHalwell/openai-logp-viewer","CodeHalwell/food_review",
693
+ "CodeHalwell/synthetic-data-agent","CodeHalwell/simple_rag","CodeHalwell/ibm-python-data-analysis",
694
+ "CodeHalwell/podcast-censoring","CodeHalwell/playground_series_september2025","CodeHalwell/poppet",
695
+ "CodeHalwell/arxiv-scraper","RanL703/neurIPS-open-polymer","CodeHalwell/WeatherApp",
696
+ "CodeHalwell/research-agent","CodeHalwell/pallscripting","CodeHalwell/deep-learning-illustrated",
697
+ "quotentiroler/BabelFHIR","CodeHalwell/coding-agent-cli","CodeHalwell/llm_engineering-course",
698
+ "CodeHalwell/agentic-ai-engineering","CodeHalwell/translator-with-voice-and-watsonx",
699
+ "CodeHalwell/build_own_chatbot_without_open_ai","CodeHalwell/oaqjp-final-project-emb-ai",
700
+ "CodeHalwell/LLM101n","CodeHalwell/code_chat_bot","CodeHalwell/DataCamp_DS_Cert",
701
+ "CodeHalwell/web-page-test"
702
+ ]
703
+
704
+ [Metadata]
705
+ Exec summary: Supplies a Python list of repository identifiers to support scripted ingestion or indexing workflows.
706
+ Keywords: repository list, Python array, identifiers, automation, ingestion helper
707
+ Questions:
708
+ 1. What data structure is used to enumerate your repositories for automation?
709
+ 2. How many repositories are captured in this list and what patterns do they follow?
710
+ 3. How might this list be used in vector database or indexing pipelines?
711
+ 4. Why is maintaining a consolidated repository list useful for your digital CV?
712
+ [/Metadata]
713
+
714
+
715
+ How I Work & Tools (Consolidated)
716
+
717
+ [Metadata]
718
+ Exec summary: Consolidated overview of your primary languages, data/ML stack, GenAI tooling, service design experience, orchestration platforms, data platforms, and workplace productivity tools.
719
+ Keywords: skills overview, toolchain, programming languages, ML stack, GenAI platforms, orchestration, DevOps, productivity tools
720
+ Questions:
721
+ 1. Which programming languages and core tools do you rely on daily?
722
+ 2. What data and machine learning libraries form your toolkit?
723
+ 3. Which GenAI and agent orchestration platforms do you use?
724
+ 4. What services, APIs, and orchestration methods do you employ?
725
+ 5. Which data platforms and DevOps tools are integral to your workflow?
726
+ 6. How do you manage documentation, project tracking, and operations?
727
+ [/Metadata]
728
+
729
+ • Languages & Core: Python (heavy daily use), SQL, TypeScript (portfolio/UI), Bash.
730
+ • Data & ML: NumPy, Pandas, scikit-learn, XGBoost, PyTorch, PyTorch Geometric; Power BI, Plotly, Matplotlib.
731
+ • GenAI & Agents: OpenAI API, Anthropic, Watsonx; Retrieval (FAISS/Chroma/Qdrant), RAG patterns; tool-use/MCP; CrewAI/AutoGen/SmolAgents; prompt evaluation and structured output with Pydantic/JSON-schema.
732
+ • Services & APIs: FastAPI (typed models via Pydantic), Flask (legacy), REST design; LangGraph-style orchestration patterns.
733
+ • Orchestration: n8n (daily), lightweight cron, Modal, small Dockerized jobs.
734
+ • Data Platforms: Snowflake/SQL; ETL correctness and schema hygiene are non-negotiable.
735
+ • DevOps/Infra: Docker, GitHub Actions, Azure/AWS/GCP basics.
736
+ • Workplace OS: Notion (docs/CRM/case studies), Linear (projects), Google Workspace, Canva, Miro. Accounting via QuickBooks; banking via Starling (sole trader).
737
+
738
+
739
+ Engagement Policy (Ethics & Fit)
740
+
741
+ [Metadata]
742
+ Exec summary: Defines your ethical guidelines for client engagements, categorising red, amber, and green domains with default operating principles.
743
+ Keywords: ethics, engagement policy, red lines, amber considerations, green projects, governance, transparency
744
+ Questions:
745
+ 1. What types of work do you refuse on ethical grounds?
746
+ 2. Which project domains require additional governance before engagement?
747
+ 3. What sectors align well with your ethical stance?
748
+ 4. What default practices do you implement to maintain ethical standards?
749
+ [/Metadata]
750
+
751
+ Red lines: no fossil fuels, weapons/arms, or harmful surveillance/abusive tech; avoid organisations and conflicts that contradict a people-first stance.
752
+ Ambers: ad tech, scraping of private data without consent, high-risk medical claims — require strict scoping, governance and auditability.
753
+ Greens: health & life sciences; education & upskilling; charities & non-profits; SMEs doing practical automation; research tooling.
754
+ Defaults: minimal lock-in; clear IP/licensing; privacy-by-design; evals and guardrails for GenAI; documented handovers with maintainers named.
755
+
756
+
757
+ Teaching & Mentoring
758
+
759
+ [Metadata]
760
+ Exec summary: Summarises your mentoring philosophy rooted in practical explanations, co-designed exercises, and fast feedback loops inspired by teaching experiences in Guyana.
761
+ Keywords: teaching, mentoring, diagrams, notebooks, handovers, feedback loops, user education
762
+ Questions:
763
+ 1. How do you approach teaching and documentation while building?
764
+ 2. What mentoring activities do you participate in at work?
765
+ 3. How did your time in Guyana influence your teaching style?
766
+ 4. Why do you emphasise tight feedback loops and simple interfaces when mentoring?
767
+ [/Metadata]
768
+
769
+ I explain as I build: diagrams, notebooks, READMEs, and handover sessions. I mentor through internal coding groups,
770
+ co-design small exercises, and prefer “show, don’t tell.” My Guyana year made me comfortable teaching under constraints;
771
+ at work I apply the same approach — tight feedback loops, simple interfaces, and momentum.
772
+
773
+
774
+ Personal Tech Lab (Home Server & Experiments)
775
+
776
+ [Metadata]
777
+ Exec summary: Describes your home lab environment, including server setups and experimentation with local LLM fine-tuning under the ChemGemma project.
778
+ Keywords: home lab, Mac Mini server, Raspberry Pi, automations, LLM lab, ChemGemma, fine-tuning, GRPO
779
+ Questions:
780
+ 1. What infrastructure do you maintain for personal experiments?
781
+ 2. Which services run on your home server and why?
782
+ 3. How do you use local hardware to prototype agents and RAG patterns?
783
+ 4. What is ChemGemma, and how did you fine-tune it?
784
+ [/Metadata]
785
+
786
+ I tinker. I run a Mac Mini home server (Jellyfin, n8n, Nextcloud, Pi‑hole, Home Assistant, web servers) and keep a Raspberry Pi 4B (SSD-boot) for small
787
+ automations. It’s where I test agents, RAG patterns, and light-weight services before hardening them for work.
788
+
789
+ I have an LLM lab for local models (RTX 3090). I've successfully run fine-tuning and reinforcement learning using an open-source Gemma model. I called this
790
+ model ChemGemma, having curated a dataset from Hugging Face to perform supervised fine-tuning and then RL using GRPO to add reasoning.
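+
+ A rough sketch of the GRPO stage, assuming Hugging Face TRL's GRPOTrainer; the model id, dataset, and reward
+ function are illustrative stand-ins, not the actual ChemGemma recipe:
+
+ from datasets import load_dataset
+ from trl import GRPOConfig, GRPOTrainer
+
+ def reasoning_reward(completions, **kwargs):
+     # Toy reward: favour completions that show an explicit reasoning section.
+     return [1.0 if "<think>" in c else 0.0 for c in completions]
+
+ dataset = load_dataset("trl-lib/tldr", split="train")   # placeholder prompt dataset
+
+ trainer = GRPOTrainer(
+     model="google/gemma-2-2b-it",                       # stand-in base model
+     reward_funcs=reasoning_reward,
+     args=GRPOConfig(output_dir="chemgemma-grpo", per_device_train_batch_size=2),
+     train_dataset=dataset,
+ )
+ trainer.train()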
791
+
792
+
793
+ n8n Systems: Daily arXiv Scraper & RAG Pipeline
794
+
795
+ [Metadata]
796
+ Exec summary: Details your n8n automations for research monitoring and RAG querying, outlining workflow steps, design choices, and resulting impact on knowledge retrieval.
797
+ Keywords: n8n workflows, arXiv scraper, RAG pipeline, automation, Qdrant, structured prompts, evaluation, impact
798
+ Questions:
799
+ 1. What goals do your n8n workflows achieve for research ingestion and querying?
800
+ 2. How is the daily arXiv scraper structured from trigger to vector storage?
801
+ 3. What design choices underpin the RAG query pipeline’s accuracy and guardrails?
802
+ 4. How do these systems improve your productivity and knowledge sharing?
803
+ 5. What impact metrics demonstrate the value of these workflows?
804
+ [/Metadata]
805
+
806
+ I’ve built a set of n8n flows that keep me and my tools up to date with AI/CS/DS research and make that corpus queryable.
807
+
808
+ 1) Daily arXiv Scraper (image: /mnt/data/db5d6387-16a9-4004-8a7b-6663d15217a2.png)
809
+ Goal: pull fresh research (AI/CS/DS), normalise it with an LLM, store summaries/metadata in Notion, and index the text in a vector store for search and RAG.
810
+
811
+ High-level steps in the flow:
812
+ • Schedule Trigger → RSS Read: runs daily, fetching new arXiv entries from feeds I care about.
813
+ • Loop Over Items: iterates papers.
814
+ • Message a model (Message Model): composes a clean prompt per item with extracted metadata.
815
+ • AI Agent (Chat Model + Tools): calls an OpenAI chat model; reaches out via an HTTP Request node when extra info is needed (e.g., to fetch the abstract or PDF link); produces structured JSON (title, authors, abstract, URL, categories, license hints).
816
+ • Structured Output Parser: enforces the schema and catches malformed outputs.
817
+ • If (branch): routes by licence/permissiveness or other policy flags.
818
+ • Create a database page (Notion): two variants of the Notion writer — one for permissive/common licences, another for restricted — so that only permissive-license papers are fully “stored” and enriched (restricted ones get a link-only/metadata card).
819
+ • Merge: folds both branches back into a single stream.
820
+ • Qdrant Vector Store: chunk + embed the permitted text (abstract/fulltext when allowed) using OpenAI embeddings; write vectors and metadata for retrieval later (see the sketch after this list).
821
+ Result: a clean, daily-updated Notion knowledge base + vector index of papers I’m allowed to store, with policy-respecting handling of licences. It’s simple, fast to audit, and easy to extend.
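+
+ The embedding/upsert step above, sketched in Python (the n8n node does the equivalent; the collection name and
+ payload fields are illustrative):
+
+ from openai import OpenAI
+ from qdrant_client import QdrantClient
+ from qdrant_client.models import PointStruct
+
+ openai_client = OpenAI()
+ qdrant = QdrantClient(url="http://localhost:6333")      # assumed local Qdrant
+
+ def index_abstract(paper_id: int, title: str, abstract: str) -> None:
+     # Embed the abstract, then store vector + metadata for later retrieval.
+     emb = openai_client.embeddings.create(
+         model="text-embedding-3-small", input=abstract
+     ).data[0].embedding
+     qdrant.upsert(
+         collection_name="arxiv",                        # assumed collection
+         points=[PointStruct(id=paper_id, vector=emb, payload={"title": title})],
+     )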
822
+
823
+ 2) RAG Query Pipeline (image: /mnt/data/b503029c-a157-4c48-9a40-5271840d4327.png)
824
+ Goal: ask natural-language questions over the paper corpus with transparent retrieval and guardrails.
825
+
826
+ High-level steps in the flow:
827
+ • Webhook: entry-point for a query (from my portal or CLI).
828
+ • PromptAugmentation: uses a chat model to clean/expand the user prompt (e.g., add synonyms, normalise acronyms) and emits a structured plan via a Structured Output Parser.
829
+ • Code: tiny glue to format search queries and pass control values (k, filters).
830
+ • Loop Over Items: if the plan has multiple sub-queries, iterate them.
831
+ • AI Agent: coordinates two tools — (a) Qdrant Vector Store search with OpenAI embeddings; (b) a Cohere re-ranker for higher precision.
832
+ • Aggregate → Message a model → Respond to Webhook: aggregates top contexts, prompts the model to answer with citations and explicit “what I don’t know,” then returns the response JSON to the caller.
833
+ Design choices:
834
+ • Retrieval is explicit: top-k, distances/scores, and doc IDs logged.
835
+ • Re-ranking improves answer quality without overloading the LLM.
836
+ • Style/guardrails: British spelling, direct tone; citations mandatory; no hallucinated claims beyond the retrieved contexts.
837
+ • I hooked the RAG pipeline up to Telegram; that way, I can put a message in Telegram and it'll start the RAG pipeline, retrieve relevant papers,
838
+ and then drop the response back in a message a few minutes later.
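+
+ The retrieve-then-rerank core of this pipeline, sketched in Python (the n8n flow wires the same calls together;
+ collection and model names are illustrative):
+
+ import cohere
+ from qdrant_client import QdrantClient
+
+ co = cohere.Client()                                    # reads COHERE_API_KEY from env
+ qdrant = QdrantClient(url="http://localhost:6333")
+
+ def search(query_vector, query_text: str, top_n: int = 5):
+     # query_vector: embedding of query_text, produced upstream (e.g. OpenAI embeddings).
+     # 1) Broad vector search for candidate chunks.
+     hits = qdrant.search(collection_name="arxiv", query_vector=query_vector, limit=25)
+     docs = [h.payload.get("title", "") for h in hits]
+     # 2) Re-rank candidates for precision before prompting the LLM.
+     reranked = co.rerank(model="rerank-english-v3.0", query=query_text,
+                          documents=docs, top_n=top_n)
+     return [docs[r.index] for r in reranked.results]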
839
+
840
+ Impact:
841
+ • I don’t waste time manually scanning feeds; new work lands in Notion and the vector store each morning.
842
+ • I can query “What’s new on tool-use/MCP for agents?” and get a grounded answer with links.
843
+ • The same index powers demos and internal RAG utilities — a single source of truth.
844
+
845
+ Email Triage - I get lost in a sea of emails, and it's so easy to miss things. The answer: have an agent do it for you.
846
+ My triage agent reads through emails and uses a tier system to highlight items for escalation. If an important email comes through, I get a message and can take a look.
847
+
848
+ [Metadata]
849
+ Exec summary: Summarises the benefits of your automated research pipelines, highlighting time savings, improved retrieval, and reusable indexes.
850
+ Keywords: impact summary, time savings, retrieval, demos, single source of truth
851
+ Questions:
852
+ 1. How do the automations reduce your manual research effort?
853
+ 2. What querying capabilities do the pipelines unlock?
854
+ 3. How does the indexed corpus serve multiple applications?
855
+ 4. Why is a single source of truth valuable for your workflows?
856
+ [/Metadata]
857
+
858
+
859
+ What I’m Open To
860
+
861
+ [Metadata]
862
+ Exec summary: Lists the types of roles, pro bono work, and collaborations you are interested in pursuing, emphasising AI engineering and mission-aligned projects.
863
+ Keywords: opportunities, AI engineer roles, pro bono, collaborations, automation, charities
864
+ Questions:
865
+ 1. What kinds of full-time or contract roles are you seeking?
866
+ 2. What pro bono engagements do you offer to charities?
867
+ 3. Which collaborative areas appeal to you for future work?
868
+ 4. How does this section help partners understand fit?
869
+ [/Metadata]
870
+
871
+ • Roles: AI Engineer / ML Engineer / Data Scientist (UK-remote or North West hybrid); full time positions and short build engagements for copilots, RAG, and agentic automations.
872
+ • Pro bono: time-boxed PoCs for UK charities and mission-led organisations.
873
+ • Collaborations: research tooling, open-source scaffolding, educational content.
874
+
875
+
876
+ Why I Love to Code (and the Thing I Obsess About)
877
+
878
+ [Metadata]
879
+ Exec summary: Expresses your passion for coding as the fastest route from problem to improvement, highlighting your focus on bottlenecks and iterative solutions.
880
+ Keywords: passion for coding, problem solving, bottlenecks, iteration, automation, optimisation
881
+ Questions:
882
+ 1. Why do you find coding compelling and energising?
883
+ 2. How do you describe your approach to identifying and solving bottlenecks?
884
+ 3. Which problem types do you align with specific solution strategies (RAG, Bayesian optimisation, APIs)?
885
+ 4. How does iteration and feedback drive your projects?
886
+ [/Metadata]
887
+
888
+ I love writing Python and shipping small tools that unblock people — it’s the quickest route from “problem” to “better.”
889
+ I tend to obsess about the bottleneck: if it’s retrieval, I build RAG; if it’s too many experiments, I reach for Bayesian optimisation;
890
+ if it’s brittle handoffs, I ship a typed API. Solving problems is the through-line for me — discrete questions, clear interfaces,
891
+ quick feedback, and steady iteration.
892
+ I love to help people in general. If someone is struggling with some Python, I like to solve it; if they are having issues with Microsoft Copilot Studio flows,
893
+ I'm more than happy to take a quick look and see what I can do. So far, people really appreciate this approach and I get some really good feedback.
894
+
895
+
896
+ Sport & Judo
897
+
898
+ [Metadata]
899
+ Exec summary: Shares your sporting interests and judo achievements, highlighting discipline, composure, and practical learning skills gained from coaching and competition.
900
+ Keywords: sport, judo, black belt, competition, coaching, discipline, composure, learning by doing
901
+ Questions:
902
+ 1. Which sports do you follow or participate in?
903
+ 2. What level of achievement did you reach in judo and at what age?
904
+ 3. What competitions and coaching experiences shape your discipline and composure?
905
+ 4. How do lessons from judo translate into your daily work habits?
906
+ [/Metadata]
907
+
908
+ I’m into sport — football, rugby, Formula 1, and most things with a scoreboard. As a teenager I was a judoka:
909
+ I earned my black belt at 16 (the youngest age you can), won medals across the country, including a bronze at an
910
+ international competition in London, and trained as a coach for my local club. It taught me discipline, composure under pressure,
911
+ and how to learn by doing — lessons I still apply daily.
pyproject.toml ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [project]
2
+ name = "digital-cv"
3
+ version = "0.1.0"
4
+ description = "An AI-powered digital CV that allows visitors to chat with Daniel Halwell through an intelligent conversational interface"
5
+ readme = "README.md"
6
+ requires-python = ">=3.11"
7
+ authors = [
8
+ {name = "Daniel Halwell", email = "[email protected]"},
9
+ ]
10
+ classifiers = [
11
+ "Development Status :: 4 - Beta",
12
+ "Intended Audience :: Developers",
13
+ "License :: Other/Proprietary License",
14
+ "Programming Language :: Python :: 3",
15
+ "Programming Language :: Python :: 3.11",
16
+ "Programming Language :: Python :: 3.12",
17
+ ]
18
+ dependencies = [
19
+ "anthropic>=0.49.0",
20
+ "autogen-agentchat>=0.4.9.2",
21
+ "autogen-ext[grpc,mcp,ollama,openai]>=0.4.9.2",
22
+ "bs4>=0.0.2",
23
+ "chroma>=0.2.0",
24
+ "chromadb>=1.1.0",
25
+ "gradio>=5.22.0",
26
+ "httpx>=0.28.1",
27
+ "ipywidgets>=8.1.5",
28
+ "langchain>=1.0.0a9",
29
+ "langchain-community>=0.3.30",
30
+ "langchain-openai>=0.3.33",
31
+ "lxml>=5.3.1",
32
+ "mcp-server-fetch>=2025.1.17",
33
+ "mcp[cli]>=1.5.0",
34
+ "openai>=1.68.2",
35
+ "openai-agents>=0.0.15",
36
+ "playwright>=1.51.0",
37
+ "plotly>=6.0.1",
38
+ "polygon-api-client>=1.14.5",
39
+ "psutil>=7.0.0",
40
+ "pypdf>=5.4.0",
41
+ "pypdf2>=3.0.1",
42
+ "python-dotenv>=1.0.1",
43
+ "requests>=2.32.3",
44
+ "semantic-kernel>=1.25.0",
45
+ "sendgrid>=6.11.0",
46
+ "setuptools>=78.1.0",
47
+ "smithery>=0.1.0",
48
+ "speedtest-cli>=2.1.3",
49
+ "wikipedia>=1.4.0",
50
+ "watchfiles>=0.24.0",
51
+ "huggingface-hub[cli]>=0.35.1",
52
+ ]
53
+
54
+ [dependency-groups]
55
+ dev = [
56
+ "ipykernel>=6.29.5",
57
+ ]
requirements.txt ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ gradio>=5.22.0
2
+ openai>=1.68.2
3
+ python-dotenv>=1.0.1
4
+ requests>=2.32.3
5
+ chromadb>=1.1.0
6
+ langchain-openai>=0.3.33
7
+ langchain-text-splitters>=0.3.0
8
+ pypdf>=5.4.0
9
+
tests/test_query.py ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ import unittest
2
+ from utils.vector_db import VectorDB
3
+
4
+ class TestQuery(unittest.TestCase):
5
+ def test_query(self):
6
+ vector_db = VectorDB()
7
+ result = vector_db.query("What is my name?")
8
+ # query() returns Chroma-style nested lists; assert the top document mentions the name
+ self.assertIn("Daniel Halwell", result["documents"][0][0])
utils/__pycache__/app_logging.cpython-311.pyc ADDED
Binary file (2.8 kB). View file
 
utils/__pycache__/chat.cpython-311.pyc ADDED
Binary file (20.5 kB). View file
 
utils/__pycache__/logging.cpython-311.pyc ADDED
Binary file (2.8 kB). View file
 
utils/__pycache__/text_processing.cpython-311.pyc ADDED
Binary file (7.38 kB). View file
 
utils/__pycache__/vector_db.cpython-311.pyc ADDED
Binary file (9.17 kB). View file
 
utils/app_logging.py ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import logging
2
+ import os
3
+
4
+
5
+ def setup_logging():
6
+ """Setup logging for the application."""
7
+ global logger
8
+ logger = logging.getLogger(__name__)
9
+ logger.setLevel(logging.INFO)
10
+ # Set common formatter
11
+ _formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
12
+
13
+ # Ensure logs appear in terminal even if root isn't configured
14
+ _has_console = any(isinstance(h, logging.StreamHandler) and not isinstance(h, logging.FileHandler) for h in logger.handlers)
15
+ if not _has_console:
16
+ _console_handler = logging.StreamHandler()
17
+ _console_handler.setLevel(logging.INFO)
18
+ _console_handler.setFormatter(_formatter)
19
+ logger.addHandler(_console_handler)
20
+ logger.propagate = False
21
+
22
+ # Ensure logs are also saved to a file next to this script
23
+ _log_file = os.path.join(os.path.dirname(__file__), "digital-cv.log")
24
+ _has_file = any(isinstance(h, logging.FileHandler) and getattr(h, "baseFilename", "") == _log_file for h in logger.handlers)
25
+ if not _has_file:
26
+ try:
27
+ _file_handler = logging.FileHandler(_log_file)
28
+ _file_handler.setLevel(logging.INFO)
29
+ _file_handler.setFormatter(_formatter)
30
+ logger.addHandler(_file_handler)
31
+ except Exception:
32
+ # If file handler can't be created, continue with console-only logging
33
+ pass
34
+ return logger
35
+
utils/chat.py ADDED
@@ -0,0 +1,424 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ from __future__ import annotations
+
+ from dotenv import load_dotenv
+ from openai import OpenAI
+ import json
+ import os
+ from typing import List, Dict, Any, Optional
+
+ from utils.app_logging import setup_logging
+ from utils.vector_db import VectorDB
+ from utils.tool_calls import record_user_details, record_unknown_question
+
+
+ load_dotenv(override=True)
+ logger = setup_logging()
+
+
+ def chat_log(message, history):
+     """Append each user/assistant exchange to a file after each generation.
+
+     This is not a tool; it is only used for logging.
+
+     Args:
+         message: The latest user message.
+         history: The chat history; the last entry is logged as the assistant reply.
+     """
+     logger.info(f"Saving chat log: {message}")
+     with open("chat_log.txt", "a") as f:
+         f.write(f"User: {message}\n")
+         f.write(f"Assistant: {history[-1]['content']}\n")
+
+
+ record_user_details_json = {
+     "name": "record_user_details",
+     "description": "Use this tool to record that a user is interested in being in touch and provided an email address",
+     "parameters": {
+         "type": "object",
+         "properties": {
+             "email": {
+                 "type": "string",
+                 "description": "The email address of this user",
+             },
+             "name": {
+                 "type": "string",
+                 "description": "The user's name, if they provided it",
+             },
+             "notes": {
+                 "type": "string",
+                 "description": "Any additional information about the conversation that's worth recording to give context",
+             },
+         },
+         "required": ["email"],
+         "additionalProperties": False,
+     },
+ }
+
+ record_unknown_question_json = {
+     "name": "record_unknown_question",
+     "description": "Always use this tool to record any question that couldn't be answered because you didn't know the answer",
+     "parameters": {
+         "type": "object",
+         "properties": {
+             "question": {
+                 "type": "string",
+                 "description": "The question that couldn't be answered",
+             },
+         },
+         "required": ["question"],
+         "additionalProperties": False,
+     },
+ }
+
+ # Responses API-shaped tools schema (name/description/parameters at the top level)
+ tools = [
+     {
+         "type": "function",
+         "name": record_user_details_json["name"],
+         "description": record_user_details_json["description"],
+         "parameters": record_user_details_json["parameters"],
+     },
+     {
+         "type": "function",
+         "name": record_unknown_question_json["name"],
+         "description": record_unknown_question_json["description"],
+         "parameters": record_unknown_question_json["parameters"],
+     },
+ ]
+
+ # Chat Completions-compatible tools schema (nested under "function")
+ chat_tools = [
+     {"type": "function", "function": record_user_details_json},
+     {"type": "function", "function": record_unknown_question_json},
+ ]
+
+
+ class Me:
+     def __init__(self):
+         """Initialize persona context, vector database, and OpenAI client."""
+         self.openai = OpenAI()
+         self.name = "Daniel Halwell"
+         self.vector_db = VectorDB()
+         self.system_context = self._build_system_context()
+         self.email = "[email protected]"
+
+     def _build_system_context(self) -> str:
+         """Render a concise persona context from vector store contents."""
+         try:
+             peek = self.vector_db.collection.peek(5)
+             documents: List[str] = []
+             metadatas: List[Dict[str, Any]] = []
+             if isinstance(peek, dict):
+                 documents = peek.get("documents", []) or []
+                 metadatas = peek.get("metadatas", []) or []
+         except Exception as exc:
+             logger.error(f"Failed to peek vector DB: {exc}")
+             documents, metadatas = [], []
+
+         combined_entries: List[str] = []
+         for text, metadata in zip(documents, metadatas):
+             source = (
+                 metadata.get("source", "unknown")
+                 if isinstance(metadata, dict)
+                 else "unknown"
+             )
+             combined_entries.append(f"Source: {source}\n{text.strip()}")
+
+         if not combined_entries:
+             return (
+                 "You are Daniel Halwell, a scientist-turned-AI engineer who builds"
+                 " practical AI tooling, RAG systems, and automations. Be concise,"
+                 " professional, and acknowledge uncertainty when context is missing."
+             )
+
+         joined = "\n\n".join(combined_entries)
+         return (
+             "You are provided with an indexed knowledge base about Daniel Halwell."
+             " Use it to answer questions faithfully.\n\n" + joined
+         )
+
+     def _compose_retrieval_query(
+         self, message: str, history: Optional[List[Dict[str, Any]]]
+     ) -> str:
+         """Combine the current message with recent user turns for retrieval."""
+         recent_user_msgs: List[str] = []
+         if history:
+             for item in reversed(history):
+                 if not isinstance(item, dict):
+                     continue
+                 if item.get("role") == "user":
+                     content = item.get("content", "") or ""
+                     if content.strip():
+                         recent_user_msgs.append(content.strip())
+                     if len(recent_user_msgs) >= 2:
+                         break
+             recent_user_msgs.reverse()
+         if message.strip():
+             recent_user_msgs.append(message.strip())
+         return "\n\n".join(recent_user_msgs)
+
+     def _build_retrieval_context(
+         self, message: str, history: Optional[List[Dict[str, Any]]]
+     ) -> str:
+         """Retrieve relevant knowledge snippets for the given message."""
+         query = self._compose_retrieval_query(message, history)
+         if not query:
+             return ""
+
+         try:
+             results = self.vector_db.query(
+                 query,
+                 k=4,
+                 include=["documents", "metadatas", "distances"],
+             )
+         except Exception as exc:
+             logger.error(f"Vector DB query failed: {exc}")
+             return ""
+
+         documents = []
+         metadatas = []
+         distances = []
+         if isinstance(results, dict):
+             documents = (results.get("documents") or [[]])[0]
+             metadatas = (results.get("metadatas") or [[]])[0]
+             distances = (results.get("distances") or [[]])[0]
+
+         contexts: List[str] = []
+         for idx, (doc, metadata) in enumerate(zip(documents, metadatas)):
+             if not doc:
+                 continue
+             source = "unknown"
+             if isinstance(metadata, dict):
+                 source = metadata.get("source") or metadata.get("path") or "unknown"
+                 chunk_id = metadata.get("chunk_id")
+                 if chunk_id is not None:
+                     source = f"{source}#chunk-{chunk_id}"
+             score = distances[idx] if idx < len(distances) else None
+             score_str = (
+                 f" (score: {score:.3f})" if isinstance(score, (int, float)) else ""
+             )
+             snippet = doc.strip().replace("\n\n", "\n")
+             contexts.append(f"[{idx + 1}] Source: {source}{score_str}\n{snippet}")
+
+         if not contexts:
+             return ""
+
+         return "Retrieved knowledge snippets:\n" + "\n\n".join(contexts)
+
+     def handle_tool_call(self, tool_calls):
+         """Execute streamed tool calls and return tool result messages.
+
+         Args:
+             tool_calls: Iterable of tool call objects containing name, arguments, and id.
+
+         Returns:
+             A list of tool result message dicts compatible with the OpenAI
+             Chat Completions API.
+         """
+         results = []
+         for tool_call in tool_calls:
+             tool_name = tool_call.function.name
+             arguments = json.loads(tool_call.function.arguments)
+             logger.info(f"Tool called: {tool_name} with arguments: {arguments}")
+             tool = globals().get(tool_name)
+             result = tool(**arguments) if tool else {}
+             results.append(
+                 {
+                     "role": "tool",
+                     "content": json.dumps(result),
+                     "tool_call_id": tool_call.id,
+                 }
+             )
+         return results
+
+     def system_prompt(self):
+         """Construct the system prompt using persona context and vector DB summary."""
+         return f"""
+ You are acting as {self.name}. You are answering questions on {self.name}'s website, particularly questions related to {self.name}'s career, background, skills and experience.
+ Your responsibility is to represent {self.name} for interactions on the website as faithfully as possible.
+ You have access to a retrieval system that stores vetted chunks about {self.name}. Always ground answers in those retrieved contexts.
+ Sound warm, upbeat, and conversational — imagine you are chatting with someone you’d happily grab coffee with. Use friendly acknowledgements (e.g. “Great question,” “Happy to share,” “Thanks for asking”) before giving specifics. Keep explanations concise but encouraging, and invite them to follow up or email you if they want deeper detail.
+ If you cannot answer confidently, log the question via the record_unknown_question tool and gently mention you’ll circle back.
+ Context preview:
+ {self.system_context}
+ """
+
+     def chat_guardrails(self, message, history):
+         """Return True if the user message is appropriate, False otherwise.
+
+         Uses an LLM to classify sentiment and appropriateness without any
+         allow/deny heuristics. Falls back to True on error.
+
+         Args:
+             message: The latest user message string.
+             history: Prior conversation history (unused).
+
+         Returns:
+             Boolean indicating whether the message is appropriate.
+         """
+         system_msg = (
+             "You are a sentiment and safety classifier. First assess sentiment "
+             "(positive, neutral, or negative). Then determine if the message is "
+             "appropriate for a general audience (no PII, hate, harassment, sexual, "
+             "or illegal content). Output only one token: 'True' if appropriate, "
+             "or 'False' if not. Do not output anything else. "
+             "The only exception to PII is email, which is allowed if it's shared "
+             "in the context of the conversation."
+         )
+         try:
+             resp = self.openai.chat.completions.create(
+                 model="gpt-4o",
+                 messages=[
+                     {"role": "system", "content": system_msg},
+                     {"role": "user", "content": message},
+                 ],
+                 temperature=0,
+                 max_tokens=3,
+             )
+             raw = (resp.choices[0].message.content or "").strip()
+             cleaned = "".join(ch for ch in raw if ch.isalpha()).lower()
+             # Default to allowing the message on any unexpected output
+             verdict = cleaned != "false"
+             logger.info(f"Guardrails response: {raw} -> {verdict}")
+             return verdict
+         except Exception as e:
+             logger.error("Guardrails call failed, defaulting to allowing the message")
+             logger.error(f"Exception: {e}")
+             return True
+
+     def chat_guardrails_response(self):
+         """Return a standard response for blocked (inappropriate) messages."""
+         return (
+             "I'm sorry, I can't answer that. Please ask a question that isn't "
+             "about sensitive or inappropriate topics."
+         )
+
+     def chat(self, message, history):
+         """Generator that streams a chat response and handles tool calls.
+
+         Args:
+             message: The latest user message string.
+             history: Prior conversation history as a list of role/content dicts.
+
+         Yields:
+             Progressively longer assistant message strings for streaming UI updates.
+         """
+
+         # Sanitize incoming history to only include role/content pairs
+         def _sanitize(msg):
+             return {"role": msg.get("role"), "content": msg.get("content", "")}
+
+         retrieval_context = self._build_retrieval_context(message, history)
+
+         messages = (
+             [{"role": "system", "content": self.system_prompt()}]
+             + (
+                 [
+                     {
+                         "role": "system",
+                         "content": (
+                             "Use the following retrieved snippets when forming your answer."
+                             " If they are empty, rely on your general knowledge of Daniel Halwell."
+                             " If you don't know the answer, log the question via the"
+                             f" record_unknown_question tool. My email is {self.email}.\n"
+                             + retrieval_context
+                         ),
+                     }
+                 ]
+                 if retrieval_context
+                 else []
+             )
+             + [
+                 _sanitize(m)
+                 for m in (history or [])
+                 if isinstance(m, dict) and m.get("role") in {"user", "assistant"}
+             ]
+             + [{"role": "user", "content": message}]
+         )
+         logger.info(f"User: {message}")
+         if not self.chat_guardrails(message, history):
+             yield self.chat_guardrails_response()
+             return
+
+         while True:
+             stream = self.openai.chat.completions.create(
+                 model="gpt-5-mini",
+                 messages=messages,
+                 tools=chat_tools,
+                 stream=True,
+             )
+
+             content_accumulated = ""
+             streamed_tool_calls = {}
+             finish_reason = None
+
+             for event in stream:
+                 if not getattr(event, "choices", None):
+                     continue
+                 choice = event.choices[0]
+                 delta = getattr(choice, "delta", None)
+                 if delta and getattr(delta, "content", None):
+                     content_accumulated += delta.content
+                     yield content_accumulated
+                 # Collect tool call deltas as they stream in
+                 if delta and getattr(delta, "tool_calls", None):
+                     for tc in delta.tool_calls:
+                         idx = tc.index
+                         if idx not in streamed_tool_calls:
+                             streamed_tool_calls[idx] = {
+                                 "id": getattr(tc, "id", None),
+                                 "name": None,
+                                 "arguments": "",
+                             }
+                         func = getattr(tc, "function", None)
+                         if func and getattr(func, "name", None):
+                             streamed_tool_calls[idx]["name"] = func.name
+                         if func and getattr(func, "arguments", None):
+                             streamed_tool_calls[idx]["arguments"] += func.arguments
+                 if getattr(choice, "finish_reason", None):
+                     finish_reason = choice.finish_reason
+                     break
+
+             # If the model wants tool calls, execute them and continue the loop
+             if finish_reason == "tool_calls" and streamed_tool_calls:
+                 # Build the assistant message that declared the tool calls
+                 assistant_tool_msg = {
+                     "role": "assistant",
+                     "tool_calls": [
+                         {
+                             "id": item.get("id") or f"call_{idx}",
+                             "type": "function",
+                             "function": {
+                                 "name": item["name"],
+                                 "arguments": item.get("arguments", ""),
+                             },
+                         }
+                         for idx, item in sorted(streamed_tool_calls.items())
+                     ],
+                 }
+                 logger.info(f"Assistant tool message: {assistant_tool_msg}")
+
+                 # Lightweight stand-in with the attribute shape handle_tool_call expects
+                 class ToolCall:
+                     def __init__(self, name, arguments, id):
+                         self.function = type("Function", (), {})()
+                         self.function.name = name
+                         self.function.arguments = arguments
+                         self.id = id
+
+                 tool_calls_for_handler = []
+                 for idx, item in sorted(streamed_tool_calls.items()):
+                     logger.info(f"Tool call for handler: {item}")
+                     tool_calls_for_handler.append(
+                         ToolCall(
+                             name=item["name"],
+                             arguments=item.get("arguments", ""),
+                             id=item.get("id") or f"call_{idx}",
+                         )
+                     )
+                 logger.info(f"Tool calls for handler: {tool_calls_for_handler}")
+                 results = self.handle_tool_call(tool_calls_for_handler)
+                 messages.append(assistant_tool_msg)
+                 messages.extend(results)
+                 logger.info(f"Messages: {messages}")
+                 chat_log(message, messages)
+                 continue
+
+             logger.info(f"Assistant final response: {content_accumulated}")
+             return
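A minimal, hypothetical launcher sketch showing one way the streaming chat generator above could be served (the gradio wiring and the app module name are assumptions, not part of this diff):

    # Hypothetical launcher (not in this commit): serve Me().chat via Gradio.
    import gradio as gr

    from app import Me  # assumes the module above is saved as app.py

    me = Me()

    demo = gr.ChatInterface(
        fn=me.chat,       # generator yielding progressively longer strings
        type="messages",  # history arrives as [{"role": ..., "content": ...}] dicts
        title="Chat with Daniel Halwell",
    )

    if __name__ == "__main__":
        demo.launch()

Because chat yields the accumulated reply so far, a streaming-aware UI can render it token by token without extra glue code.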
utils/create_vector_db.py ADDED
@@ -0,0 +1,19 @@
+ import sys
+ from pathlib import Path
+
+ # Ensure project root is on sys.path when running as a script
+ project_root = Path(__file__).resolve().parent.parent
+ if str(project_root) not in sys.path:
+     sys.path.insert(0, str(project_root))
+
+ from utils.text_processing import DocumentProcessing
+
+
+ def main():
+     document_processing = DocumentProcessing()
+     document_processing.create_vector_db_from_directory("me")
+
+
+ if __name__ == "__main__":
+     main()
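The same rebuild can be triggered programmatically. A short sketch, assuming an me/ directory of .pdf/.txt files at the project root and OPENAI_API_KEY set in the environment:

    # Programmatic equivalent of running this script from the project root.
    from utils.text_processing import DocumentProcessing

    processor = DocumentProcessing()
    processor.create_vector_db_from_directory("me")  # indexes me/*.pdf and me/*.txt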
utils/text_processing.py ADDED
@@ -0,0 +1,116 @@
+ from __future__ import annotations
+
+ from typing import Sequence
+
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from langchain_openai import OpenAIEmbeddings
+ from langchain_community.document_loaders.pdf import PyPDFLoader
+ from langchain_community.document_loaders.text import TextLoader
+ import os
+ import dotenv
+ import sys
+ from pathlib import Path
+
+ # Ensure project root is on sys.path when running as a script
+ project_root = Path(__file__).resolve().parent.parent
+ if str(project_root) not in sys.path:
+     sys.path.insert(0, str(project_root))
+
+ from utils.vector_db import VectorDB
+
+ dotenv.load_dotenv()
+
+
+ class DocumentProcessing:
+     def __init__(self):
+         self.text_splitter = RecursiveCharacterTextSplitter(
+             chunk_size=2000, chunk_overlap=200
+         )
+         self.embeddings = OpenAIEmbeddings(
+             model="text-embedding-3-large", api_key=os.getenv("OPENAI_API_KEY")
+         )
+         self.vector_db = VectorDB(embedding_model=self.embeddings)
+
+     def split_text(self, document):
+         """Split a raw string, or a list of loaded documents, into chunks."""
+         if isinstance(document, list):
+             # Handle a list of Document objects returned by the loaders
+             all_texts = []
+             for doc in document:
+                 all_texts.extend(self.text_splitter.split_text(doc.page_content))
+             return all_texts
+         # Handle a raw text string
+         return self.text_splitter.split_text(document)
+
+     def embed_text(self, texts: Sequence[str]):
+         """Generate embeddings for text chunks."""
+         return self.embeddings.embed_documents(list(texts))
+
+     def create_vector_db(self, texts, metadata=None):
+         """Add texts (with optional per-chunk metadata) to the vector database."""
+         if metadata is None:
+             metadata = [{"source": "unknown"} for _ in texts]
+
+         embeddings = self.embed_text(texts)
+
+         documents = list(texts)
+         metadatas = list(metadata)
+         # Offset ids by the current collection size so chunks from different
+         # files do not collide on the same "doc_i" id
+         start = self.vector_db.count()
+         ids = [f"doc_{start + i}" for i in range(len(documents))]
+
+         self.vector_db.add_documents(
+             documents=documents,
+             metadatas=metadatas,
+             ids=ids,
+             embeddings=embeddings,
+         )
+
+     def create_vector_db_from_file(self, file_path):
+         """Process a single file and add its chunks to the vector database."""
+         if not os.path.exists(file_path):
+             raise FileNotFoundError(f"File not found: {file_path}")
+
+         if file_path.endswith(".pdf"):
+             loader = PyPDFLoader(file_path)
+         elif file_path.endswith(".txt"):
+             loader = TextLoader(file_path)
+         else:
+             raise ValueError(f"Unsupported file type: {file_path}")
+
+         documents = loader.load()
+         texts = self.split_text(documents)
+
+         # Create metadata for each chunk
+         metadata = [{"source": file_path, "chunk_id": i} for i in range(len(texts))]
+
+         self.create_vector_db(texts, metadata)
+         return self.vector_db
+
+     def create_vector_db_from_directory(self, directory_path):
+         """Process all supported files in a directory."""
+         if not os.path.exists(directory_path):
+             raise FileNotFoundError(f"Directory not found: {directory_path}")
+
+         supported_extensions = [".pdf", ".txt"]
+         processed_files = 0
+
+         for file in os.listdir(directory_path):
+             file_path = os.path.join(directory_path, file)
+
+             # Skip directories
+             if os.path.isdir(file_path):
+                 continue
+
+             # Only process files with supported extensions
+             if any(file.endswith(ext) for ext in supported_extensions):
+                 try:
+                     self.create_vector_db_from_file(file_path)
+                     processed_files += 1
+                     print(f"Processed: {file}")
+                 except Exception as e:
+                     print(f"Error processing {file}: {str(e)}")
+             else:
+                 print(f"Skipping unsupported file type: {file}")
+
+         print(f"Successfully processed {processed_files} files")
+         return self.vector_db
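To make the splitter settings concrete (2000-character chunks with a 200-character overlap), a standalone sketch using invented filler text:

    # Standalone sketch: how the configured splitter chunks a long string.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
    long_text = "word " * 2000  # roughly 10,000 characters of filler

    chunks = splitter.split_text(long_text)
    print(len(chunks))     # roughly 6 chunks, since consecutive chunks share ~200 chars
    print(len(chunks[0]))  # each chunk is at most 2000 characters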
utils/tool_calls.py ADDED
@@ -0,0 +1,57 @@
+ import os
+
+ import requests
+
+ from utils.app_logging import setup_logging
+
+ logger = setup_logging()
+
+
+ def push(text):
+     """Send a Pushover notification.
+
+     Args:
+         text: The message text to send.
+     """
+     try:
+         logger.info(f"Sending Pushover notification: {text}")
+         requests.post(
+             "https://api.pushover.net/1/messages.json",
+             data={
+                 "token": os.getenv("PUSHOVER_TOKEN"),
+                 "user": os.getenv("PUSHOVER_USER"),
+                 "message": text,
+             },
+             timeout=10,
+         )
+     except Exception as e:
+         # Swallow notification failures so they never impact the chat UX
+         logger.error(f"Failed to send Pushover notification: {e}")
+
+
+ def record_user_details(email, name="Name not provided", notes="not provided"):
+     """Record a user's contact details via push notification.
+
+     Args:
+         email: The user's email address.
+         name: The user's name, if provided.
+         notes: Additional context to record.
+
+     Returns:
+         A dictionary indicating success, e.g., {"recorded": "ok"}.
+     """
+     logger.info(f"Recording {name} with email {email} and notes {notes}")
+     push(f"Recording {name} with email {email} and notes {notes}")
+     return {"recorded": "ok"}
+
+
+ def record_unknown_question(question):
+     """Record an unanswered user question via push notification.
+
+     Args:
+         question: The question that couldn't be answered.
+
+     Returns:
+         A dictionary indicating success, e.g., {"recorded": "ok"}.
+     """
+     logger.info(f"Recording {question}")
+     push(f"Recording {question}")
+     return {"recorded": "ok"}
utils/vector_db.py ADDED
@@ -0,0 +1,194 @@
+ from __future__ import annotations
+
+ import os
+ from pathlib import Path
+ from typing import Any, Iterable, Optional, Sequence
+
+ import chromadb as cdb
+ import dotenv
+ from langchain_openai import OpenAIEmbeddings
+
+
+ dotenv.load_dotenv()
+
+
+ def _default_storage_path() -> str:
+     """Return the on-disk location for the Chroma persistent client."""
+
+     env_path = os.getenv("VECTOR_DB_PATH")
+     if env_path:
+         return env_path
+
+     project_root = Path(__file__).resolve().parent.parent
+     storage_dir = project_root / "data" / "chroma"
+     storage_dir.mkdir(parents=True, exist_ok=True)
+     return str(storage_dir)
+
+
+ # Re-entrancy guard for auto-initialization: DocumentProcessing constructs its
+ # own VectorDB, which would otherwise recurse while the collection is empty
+ _auto_init_in_progress = False
+
+
+ class VectorDB:
+     """Light wrapper around a persistent Chroma collection."""
+
+     def __init__(
+         self,
+         *,
+         collection_name: str = "me_profile",
+         persist_directory: Optional[str] = None,
+         embedding_model: Optional[OpenAIEmbeddings] = None,
+     ) -> None:
+         self.persist_directory = persist_directory or _default_storage_path()
+         self.client = cdb.PersistentClient(path=self.persist_directory)
+
+         try:
+             self.collection = self.client.get_or_create_collection(collection_name)
+         except Exception:
+             # Fallback for older Chroma versions
+             self.collection = self.client.create_collection(collection_name)
+
+         self.embedding_model = embedding_model or OpenAIEmbeddings(
+             model="text-embedding-3-large",
+             api_key=os.getenv("OPENAI_API_KEY"),
+         )
+
+         # Auto-initialize from 'me/' if the collection is empty, the directory
+         # exists, and an OpenAI key is available for embedding
+         global _auto_init_in_progress
+         me_dir = Path(__file__).resolve().parent.parent / "me"
+         if (
+             not _auto_init_in_progress
+             and me_dir.is_dir()
+             and os.getenv("OPENAI_API_KEY")
+         ):
+             try:
+                 if self.collection.count() == 0:
+                     # Imported here rather than at module top to avoid a circular
+                     # import: utils.text_processing itself imports VectorDB
+                     from utils.text_processing import DocumentProcessing
+
+                     _auto_init_in_progress = True
+                     try:
+                         DocumentProcessing().create_vector_db_from_directory(str(me_dir))
+                     finally:
+                         _auto_init_in_progress = False
+             except Exception:
+                 # If the auto-build fails, continue with an empty DB; the app still runs
+                 pass
+
+     # ------------------------------------------------------------------
+     # Document ingestion helpers
+     # ------------------------------------------------------------------
+     def add_documents(
+         self,
+         documents: Sequence[str],
+         *,
+         metadatas: Optional[Sequence[dict[str, Any]]] = None,
+         ids: Optional[Sequence[str]] = None,
+         embeddings: Optional[Sequence[Sequence[float]]] = None,
+     ) -> None:
+         """Add documents to the Chroma collection."""
+
+         documents = list(documents)
+         if not documents:
+             return
+
+         count = len(documents)
+
+         if metadatas is None:
+             metadatas = [{} for _ in range(count)]
+         if ids is None:
+             ids = [f"doc_{i}" for i in range(count)]
+
+         if embeddings is None:
+             embeddings = self.embedding_model.embed_documents(documents)
+
+         self.collection.add(
+             documents=documents,
+             metadatas=list(metadatas),
+             ids=list(ids),
+             embeddings=list(embeddings),
+         )
+
+     # ------------------------------------------------------------------
+     # Query helpers
+     # ------------------------------------------------------------------
+     def query(
+         self,
+         query_texts: Iterable[str],
+         *,
+         k: int = 5,
+         include: Optional[Sequence[str]] = None,
+     ) -> dict[str, Any]:
+         """Query the collection with one or more natural-language strings."""
+
+         if isinstance(query_texts, str):
+             query_texts = [query_texts]
+         else:
+             query_texts = list(query_texts)
+
+         if not query_texts:
+             raise ValueError("query_texts must contain at least one string")
+
+         query_embeddings = self.embedding_model.embed_documents(list(query_texts))
+
+         # Chroma accepts exactly one of query_texts / query_embeddings, so only
+         # the precomputed embeddings are passed; include is forwarded when given
+         kwargs: dict[str, Any] = {
+             "query_embeddings": list(query_embeddings),
+             "n_results": k,
+         }
+         if include is not None:
+             kwargs["include"] = list(include)
+         return self.collection.query(**kwargs)
+
+     # ------------------------------------------------------------------
+     # Thin wrappers around underlying collection methods
+     # ------------------------------------------------------------------
+     def upsert(
+         self,
+         documents: Sequence[str],
+         *,
+         metadatas: Optional[Sequence[dict[str, Any]]] = None,
+         ids: Optional[Sequence[str]] = None,
+         embeddings: Optional[Sequence[Sequence[float]]] = None,
+     ) -> None:
+         if embeddings is None:
+             embeddings = self.embedding_model.embed_documents(list(documents))
+         self.collection.upsert(
+             documents=list(documents),
+             metadatas=list(metadatas) if metadatas is not None else None,
+             ids=list(ids) if ids is not None else None,
+             embeddings=list(embeddings),
+         )
+
+     def delete(self, ids: Sequence[str]) -> None:
+         self.collection.delete(ids=list(ids))
+
+     def update(
+         self,
+         ids: Sequence[str],
+         documents: Optional[Sequence[str]] = None,
+         metadatas: Optional[Sequence[dict[str, Any]]] = None,
+         embeddings: Optional[Sequence[Sequence[float]]] = None,
+     ) -> None:
+         if documents is not None and embeddings is None:
+             embeddings = self.embedding_model.embed_documents(list(documents))
+         self.collection.update(
+             ids=list(ids),
+             documents=list(documents) if documents is not None else None,
+             metadatas=list(metadatas) if metadatas is not None else None,
+             embeddings=list(embeddings) if embeddings is not None else None,
+         )
+
+     def get(self, ids: Sequence[str]) -> dict[str, Any]:
+         return self.collection.get(ids=list(ids))
+
+     def count(self) -> int:
+         return self.collection.count()
+
+     def list(self) -> list[str]:
+         # Chroma collections expose no list() method; return all ids instead
+         return self.collection.get().get("ids", [])
+
+     def delete_all(self) -> None:
+         # Chroma's delete() requires ids or a filter, so fetch every id first
+         ids = self.collection.get().get("ids", [])
+         if ids:
+             self.collection.delete(ids=ids)
+
+     def get_all(self) -> dict[str, Any]:
+         return self.collection.get()
+
+     def get_all_metadata(self) -> list[dict[str, Any]]:
+         return self.collection.get(include=["metadatas"]).get("metadatas", [])
+
+     def get_all_ids(self) -> list[str]:
+         # "ids" is not a valid include value; Chroma always returns ids
+         return self.collection.get().get("ids", [])
+
+     def get_all_texts(self) -> list[str]:
+         return self.collection.get(include=["documents"]).get("documents", [])
+
+     def get_all_embeddings(self) -> list[list[float]]:
+         return self.collection.get(include=["embeddings"]).get("embeddings", [])
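An end-to-end sketch of the wrapper (assumes OPENAI_API_KEY is set; the texts and ids are invented examples, and a throwaway collection name keeps the demo away from me_profile):

    # Sketch: round-tripping a couple of documents through VectorDB.
    from utils.vector_db import VectorDB

    db = VectorDB(collection_name="scratch_demo")  # same data/chroma store, separate collection
    db.add_documents(
        ["Example chunk about RAG tooling.", "Example chunk about automations."],
        metadatas=[{"source": "demo"}, {"source": "demo"}],
        ids=["demo_0", "demo_1"],
    )

    hits = db.query("RAG tooling", k=1, include=["documents", "distances"])
    print(hits["documents"][0])  # best-matching chunk(s) for the query
    print(db.count())            # 2

    db.delete(["demo_0", "demo_1"])  # clean up the scratch entries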
uv.lock ADDED
The diff for this file is too large to render. See raw diff