MoizK committed on
Commit c7dc5b8 · verified · 1 Parent(s): 9979d01

initial commit

Files changed (8)
  1. .dockerignore +30 -0
  2. Dockerfile +47 -0
  3. README.md +106 -0
  4. chainlit.md +11 -0
  5. download_assets.py +51 -0
  6. ingest.py +28 -0
  7. model.py +128 -0
  8. requirements.txt +12 -0
.dockerignore ADDED
@@ -0,0 +1,30 @@
+ # Git
+ .git
+ .gitignore
+ .gitattributes
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ venv/
+ ENV/
+
+ # Environment
+ .env
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # Chainlit
+ .chainlit/
+
+ # Misc
+ .DS_Store
+ *.log
+ README.md
+ LICENSE
Dockerfile ADDED
@@ -0,0 +1,47 @@
+ # Use Python 3.10 slim image as base
+ FROM python:3.10-slim
+
+ # Install system dependencies
+ RUN apt-get update && \
+     apt-get install -y \
+     build-essential \
+     git \
+     poppler-utils \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set working directory
+ WORKDIR /app
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1
+ ENV TRANSFORMERS_CACHE=/app/model_cache
+ ENV HF_HOME=/app/model_cache
+ ENV TORCH_HOME=/app/model_cache
+ ENV CHAINLIT_HOST=0.0.0.0
+ ENV CHAINLIT_PORT=7860
+
+ # Install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Create necessary directories
+ RUN mkdir -p /app/model_cache /app/vectorstore/db_faiss /app/data
+
+ # Copy application files
+ COPY model.py ingest.py chainlit.md download_assets.py ./
+
+ # Download models and cache them
+ RUN python -c "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM; \
+     AutoTokenizer.from_pretrained('google/flan-t5-base'); \
+     AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base'); \
+     from sentence_transformers import SentenceTransformer; \
+     SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
+
+ # Download assets from Hugging Face Hub
+ RUN python download_assets.py
+
+ # Expose the port Chainlit runs on
+ EXPOSE 7860
+
+ # Run the Chainlit application
+ CMD ["chainlit", "run", "model.py", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,106 @@
+ # MindMedic - AI Mental Health Assistant 🧠
+
+ MindMedic is an AI-powered mental health diagnostic assistant built using FLAN-T5 and LangChain. It helps users understand potential mental health concerns by providing evidence-based information and preliminary insights based on trusted mental health resources.
+
+ ## Table of Contents
+
+ - [MindMedic - AI Mental Health Assistant 🧠](#mindmedic---ai-mental-health-assistant-)
+   - [Table of Contents](#table-of-contents)
+   - [Introduction](#introduction)
+   - [Features](#features)
+   - [Prerequisites](#prerequisites)
+   - [Installation](#installation)
+   - [Usage](#usage)
+   - [Important Note](#important-note)
+     - [Emergency Resources:](#emergency-resources)
+   - [Contributing](#contributing)
+
+ ## Introduction
+
+ MindMedic uses a language model and a vector store to answer mental health-related queries. It indexes a curated collection of mental health resources and retrieves relevant passages from them to offer evidence-based information about mental health conditions, symptoms, and general mental wellness topics.
+
+ ## Features
+
+ - 🤖 Powered by Google's FLAN-T5 language model
+ - 📚 Knowledge base built from trusted mental health resources
+ - 💡 Provides evidence-based responses with sources
+ - 🔍 Semantic search capabilities for accurate information retrieval
+ - 💻 User-friendly chat interface powered by Chainlit
+ - 🔒 Runs locally for privacy
+
+ ## Prerequisites
+
+ Before setting up MindMedic, ensure you have:
+
+ - Python 3.10 or higher (matches the `python:3.10-slim` base image in the Dockerfile)
+ - pip (Python package manager)
+ - 4GB+ RAM recommended
+ - CPU with x86_64 architecture
+
+ ## Installation
+
+ 1. Clone this repository:
+    ```bash
+    git clone https://github.com/your-username/MindMedic.git
+    cd MindMedic
+    ```
+
+ 2. Create and activate a virtual environment:
+    ```bash
+    python -m venv venv
+    # On Windows
+    venv\Scripts\activate
+    # On Unix or macOS
+    source venv/bin/activate
+    ```
+
+ 3. Install required packages:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 4. Prepare the knowledge base:
+    ```bash
+    python ingest.py
+    ```
+
+ ## Usage
+
+ 1. Start the MindMedic chatbot:
+    ```bash
+    chainlit run model.py -w
+    ```
+
+ 2. Open your web browser and navigate to `http://localhost:8000`
+
+ 3. Start interacting with MindMedic by asking mental health-related questions
+
+ Example queries:
+ - "What are the common symptoms of anxiety?"
+ - "How can I tell if I'm experiencing depression?"
+ - "What are some coping strategies for stress?"
+ - "Can you explain what panic attacks feel like?"
+
+ ## Important Note
+
+ ⚠️ **Disclaimer**: MindMedic is an AI assistant designed to provide information and general guidance about mental health topics. It is NOT a replacement for professional mental health care. Always consult with qualified mental health professionals for diagnosis and treatment. In case of emergency, contact your local emergency services or mental health crisis hotline immediately.
+
+ ### Emergency Resources:
+ - 988 Suicide & Crisis Lifeline (US): call or text 988
+ - Crisis Text Line: Text HOME to 741741
+ - Find local mental health resources: [NAMI HelpLine](https://www.nami.org/help)
+
+ ## Contributing
+
+ Contributions to improve MindMedic are welcome! To contribute:
+
+ 1. Fork the repository
+ 2. Create a feature branch
+ 3. Make your changes
+ 4. Submit a pull request
+
+ Please ensure your contributions align with mental health best practices and maintain the focus on providing accurate, helpful information.
+
+ ---
+
+ Built with ❤️ for mental health awareness and support. Remember, it's okay to not be okay, and seeking help is a sign of strength.
chainlit.md ADDED
@@ -0,0 +1,11 @@
+ # Welcome to MindMate! 🚀🤖
+
+ Hi there, 👋 and welcome to **MindMate**, your AI-powered mental health support assistant. This bot is designed to help you get reliable, evidence-based answers to questions about mental well-being, whether it’s about managing anxiety, coping strategies for stress, or understanding depression.
+
+ ## Useful Links 🔗
+
+ - **Knowledge Base:** All the mental health guides, fact sheets, and clinical resources we’ve ingested to power MindMate. Explore our source documents here: [Mental Health Knowledge Base](vectorstore/db_faiss) 📚
+ - **Project Repository:** View the code, contribute enhancements, or report issues on GitHub: [Llama2-Medical-Chatbot](https://github.com/AIAnytime/Llama2-Medical-Chatbot) 💻
+
+ Take care of your mind and happy chatting! 🧠😊
+
download_assets.py ADDED
@@ -0,0 +1,51 @@
+ from huggingface_hub import hf_hub_download
+ import os
+
+ def download_assets():
+     """Download necessary assets from Hugging Face Hub"""
+     # Create directories if they don't exist
+     os.makedirs('data', exist_ok=True)
+     os.makedirs('vectorstore/db_faiss', exist_ok=True)
+
+     # Dataset repository ID
+     repo_id = "MoizK/mindmedic-assets"
+
+     # Download PDF files
+     pdf_files = [
+         "71763-gale-encyclopedia-of-medicine.-vol.-1.-2nd-ed.pdf",
+         "Depression-NIM-2024.pdf",
+         "Depression-and-Other-Common-Mental-Disorders-Global-Health-Estimates.pdf",
+         "Doing-What-Matters-in-Times-of-Stress.pdf",
+         "Generalized-Anxiety-Disorder-When-Worry-Gets-Out-of-Control.pdf",
+         "WHO-mhGAP-Intervention-Guide-v2.pdf",
+         "social-anxiety-disorder-more-than-just-shyness.pdf"
+     ]
+
+     for pdf_file in pdf_files:
+         try:
+             hf_hub_download(
+                 repo_id=repo_id,
+                 filename=f"data/{pdf_file}",
+                 local_dir=".",
+                 local_dir_use_symlinks=False
+             )
+             print(f"Downloaded {pdf_file}")
+         except Exception as e:
+             print(f"Error downloading {pdf_file}: {e}")
+
+     # Download FAISS index files
+     index_files = ["index.faiss", "index.pkl"]
+     for index_file in index_files:
+         try:
+             hf_hub_download(
+                 repo_id=repo_id,
+                 filename=f"vectorstore/db_faiss/{index_file}",
+                 local_dir=".",
+                 local_dir_use_symlinks=False
+             )
+             print(f"Downloaded {index_file}")
+         except Exception as e:
+             print(f"Error downloading {index_file}: {e}")
+
+ if __name__ == "__main__":
+     download_assets()
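Because `download_assets()` catches every exception and only prints a message, a failed download does not stop the Docker build, and the problem would surface only when `model.py` tries to load the index. One possible follow-up check is sketched below. The helpers `missing_assets` and `expected_index_paths` are hypothetical and not part of this commit; only the file names are taken from `download_assets.py`.

```python
from pathlib import Path

# File names copied from download_assets.py; the helpers below are hypothetical.
EXPECTED_INDEX_FILES = ["index.faiss", "index.pkl"]


def expected_index_paths():
    """Build the relative paths the download script writes the FAISS index to."""
    return [f"vectorstore/db_faiss/{name}" for name in EXPECTED_INDEX_FILES]


def missing_assets(root, relative_paths):
    """Return the relative paths that do not exist as regular files under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_assets(".", expected_index_paths())
    if missing:
        # A non-zero exit status would make a Docker RUN step fail visibly.
        raise SystemExit(f"Missing assets: {missing}")
```

Run after `download_assets.py`, a script like this would turn a silently incomplete download into a build failure.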
ingest.py ADDED
@@ -0,0 +1,28 @@
+ from langchain_community.embeddings import HuggingFaceEmbeddings
+ from langchain_community.vectorstores import FAISS
+ from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+
+ DATA_PATH = 'data/'
+ DB_FAISS_PATH = 'vectorstore/db_faiss'
+
+ # Create vector database
+ def create_vector_db():
+     loader = DirectoryLoader(DATA_PATH,
+                              glob='*.pdf',
+                              loader_cls=PyPDFLoader)
+
+     documents = loader.load()
+     text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,
+                                                    chunk_overlap=50)
+     texts = text_splitter.split_documents(documents)
+
+     embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
+                                        model_kwargs={'device': 'cpu'})
+
+     db = FAISS.from_documents(texts, embeddings)
+     db.save_local(DB_FAISS_PATH)
+
+ if __name__ == "__main__":
+     create_vector_db()
+
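To make the `chunk_size=500` and `chunk_overlap=50` settings concrete, here is a simplified fixed-window chunker in plain Python. It is only a sketch: `chunk_text` is a hypothetical helper, and `RecursiveCharacterTextSplitter` itself prefers to break at separators such as paragraph breaks and spaces, so its chunks will generally differ from these fixed windows.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(1000))
    print([len(c) for c in chunk_text(sample)])
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of indexing some text twice.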
model.py ADDED
@@ -0,0 +1,128 @@
+ from langchain.prompts import PromptTemplate
+ from langchain_community.embeddings import HuggingFaceEmbeddings
+ from langchain_community.vectorstores import FAISS
+ from langchain_community.llms import HuggingFacePipeline
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
+ from langchain.chains import RetrievalQA
+ import chainlit as cl
+ from dotenv import load_dotenv
+ import torch
+ import os
+
+ load_dotenv()
+
+ DB_FAISS_PATH = 'vectorstore/db_faiss'
+
+ # Prompt Template
+ custom_prompt_template = """Use the following pieces of information to answer the user's question.
+ If you don't know the answer, just say that you don't know, don't try to make up an answer.
+
+ Context: {context}
+ Question: {question}
+
+ Only return the helpful answer below and nothing else.
+ Helpful answer:
+ """
+
+ def set_custom_prompt():
+     prompt = PromptTemplate(template=custom_prompt_template,
+                             input_variables=['context', 'question'])
+     return prompt
+
+ # Create RetrievalQA chain
+ def retrieval_qa_chain(llm, prompt, db):
+     qa_chain = RetrievalQA.from_chain_type(
+         llm=llm,
+         chain_type='stuff',
+         retriever=db.as_retriever(search_kwargs={'k': 2}),
+         return_source_documents=True,
+         chain_type_kwargs={'prompt': prompt}
+     )
+     return qa_chain
+
+ # Load Hugging Face LLM
+ def load_llm():
+     # Load model and tokenizer
+     tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
+     model = AutoModelForSeq2SeqLM.from_pretrained(
+         "google/flan-t5-base",
+         device_map="cpu",
+         torch_dtype=torch.float32
+     )
+
+     # Create a text2text-generation pipeline
+     pipe = pipeline(
+         "text2text-generation",
+         model=model,
+         tokenizer=tokenizer,
+         max_new_tokens=512,
+         repetition_penalty=1.15
+     )
+
+     # Create LangChain wrapper for the pipeline
+     llm = HuggingFacePipeline(pipeline=pipe)
+     return llm
+
+ # Build full chatbot pipeline
+ def qa_bot():
+     embeddings = HuggingFaceEmbeddings(
+         model_name="sentence-transformers/all-MiniLM-L6-v2",
+         model_kwargs={'device': 'cpu'}
+     )
+     db = FAISS.load_local(
+         DB_FAISS_PATH,
+         embeddings,
+         allow_dangerous_deserialization=True
+     )
+
+     llm = load_llm()
+     qa_prompt = set_custom_prompt()
+     qa = retrieval_qa_chain(llm, qa_prompt, db)
+     return qa
+
+ # Run for one query (used internally)
+ def final_result(query):
+     qa_result = qa_bot()
+     response = qa_result.invoke({'query': query})
+     return response
+
+ # Chainlit UI - Start
+ @cl.on_chat_start
+ async def start():
+     chain = qa_bot()
+     msg = cl.Message(content="Starting the bot...")
+     await msg.send()
+     msg.content = "Hi, Welcome to MindMate. What is your query?"
+     await msg.update()
+     cl.user_session.set("chain", chain)
+
+ # Chainlit UI - Handle messages
+ @cl.on_message
+ async def main(message: cl.Message):
+     chain = cl.user_session.get("chain")
+     cb = cl.AsyncLangchainCallbackHandler(
+         stream_final_answer=True, answer_prefix_tokens=["FINAL", "ANSWER"]
+     )
+     cb.answer_reached = True
+
+     # Use invoke with proper query format
+     res = await cl.make_async(chain.invoke)(
+         {"query": message.content},
+         callbacks=[cb]
+     )
+
+     # Extract result and sources from the response
+     answer = res.get("result", "No result found")
+     sources = res.get("source_documents", [])
+
+     # Format sources to show only the content
+     if sources:
+         formatted_sources = []
+         for source in sources:
+             if hasattr(source, 'page_content'):
+                 formatted_sources.append(source.page_content.strip())
+
+         if formatted_sources:
+             answer = f"{answer}\n\nBased on the following information:\n" + "\n\n".join(formatted_sources)
+
+     await cl.Message(content=answer).send()
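The chain in `model.py` retrieves the two most similar chunks (`k=2`) and "stuffs" them into the prompt's `{context}` slot. The sketch below shows that idea in plain Python, without LangChain or FAISS. All names here (`cosine`, `top_k`, `stuff_prompt`, the shortened `PROMPT`) are hypothetical. Cosine similarity stands in for whatever distance metric the FAISS index actually uses, and joining chunks with a blank line is an assumption about the chain's behavior.

```python
import math

# Shortened stand-in for custom_prompt_template in model.py.
PROMPT = "Context: {context}\nQuestion: {question}\nHelpful answer:"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=2):
    """docs is a list of (text, vector) pairs; return the k most similar texts."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def stuff_prompt(question, chunks):
    """Place all retrieved chunks into a single prompt, separated by blank lines."""
    return PROMPT.format(context="\n\n".join(chunks), question=question)


if __name__ == "__main__":
    docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    print(stuff_prompt("Q?", top_k([1.0, 0.1], docs)))
```

With `k=2` and 500-character chunks, the context handed to the model is roughly 1,000 characters, which keeps the prompt short for a model of FLAN-T5 base's size.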
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ pypdf>=3.0.0
+ langchain>=0.1.0
+ torch>=2.0.0
+ transformers>=4.30.0
+ accelerate>=0.20.0
+ bitsandbytes>=0.41.0
+ sentence-transformers>=2.2.0
+ faiss-cpu>=1.7.0
+ chainlit>=0.7.0
+ huggingface-hub>=0.19.0
+ langchain-community>=0.0.10
+ python-dotenv>=1.0.0