# Astra Project Setup Instructions
## Prerequisites
Make sure you have the following installed before proceeding:
- Python 3.12.4
- Git
- Git Large File Storage (LFS)
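You can check what is already installed before continuing; each command below prints a version string if the tool is available (Git LFS is covered in Step 1 if it is missing):
```sh
# Quick prerequisite check; each command prints a version if the tool is installed
python --version   # should report Python 3.12.4
git --version
git lfs version
```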
## Step 1: Install Git LFS
Git LFS (Large File Storage) is required for managing large files in the Astra project. Follow these steps to install Git LFS:
### Windows
1. Download the Git LFS installer from the [Git LFS website](https://git-lfs.github.com/).
2. Run the installer and follow the setup instructions.
3. Open a terminal (Command Prompt or PowerShell) and run:
```sh
git lfs install
```
### macOS
1. Install Git LFS using Homebrew:
```sh
brew install git-lfs
```
2. Initialize Git LFS:
```sh
git lfs install
```
### Linux
1. Install Git LFS using your package manager:
- Debian/Ubuntu:
```sh
sudo apt install git-lfs
```
- Fedora:
```sh
sudo dnf install git-lfs
```
- Arch Linux:
```sh
sudo pacman -S git-lfs
```
2. Initialize Git LFS:
```sh
git lfs install
```
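Regardless of platform, `git lfs install` only needs to be run once per user account. You can confirm the installation succeeded with:
```sh
git lfs version   # should print the installed Git LFS version rather than an error
```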
## Step 2: Install Python (Alternative: pyenv)
Python 3.12.4 is required. If you work with multiple Python versions, or you run into errors while installing the project's dependencies, `pyenv` is the recommended way to manage the interpreter.
### Installing pyenv
#### macOS & Linux:
```sh
curl -fsSL https://pyenv.run | bash
```
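The installer finishes by printing shell-specific setup instructions. For bash, the lines it asks you to add to `~/.bashrc` typically look like the sketch below; use the installer's own output if it differs for your shell:
```sh
# Typical pyenv shell setup for bash; the installer prints the exact lines for your shell
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
```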
After installation, restart your terminal and install Python:
```sh
pyenv install 3.12.4
pyenv global 3.12.4
```
#### Windows:
Use [pyenv-win](https://github.com/pyenv-win/pyenv-win):
```sh
git clone https://github.com/pyenv-win/pyenv-win.git "%USERPROFILE%\.pyenv"
setx PYENV "%USERPROFILE%\.pyenv\pyenv-win"
setx PATH "%USERPROFILE%\.pyenv\pyenv-win\bin;%USERPROFILE%\.pyenv\pyenv-win\shims;%PATH%"
```
`setx` changes only take effect in new sessions, so open a fresh terminal before installing Python:
```sh
pyenv install 3.12.4
pyenv global 3.12.4
```
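On either platform, confirm that the pyenv-managed interpreter is the one being picked up:
```sh
pyenv version      # expected: 3.12.4 (set by your pyenv version file)
python --version   # expected: Python 3.12.4
```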
## Step 3: Clone the Repository
Clone the Astra project repository using Git:
```sh
git clone <repository_url>
cd astra
```
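Large files tracked with Git LFS should download automatically during the clone. If you instead see small pointer text files where the model checkpoints should be (this can happen if LFS was installed after cloning), fetch the LFS objects explicitly:
```sh
git lfs pull   # downloads the actual LFS-tracked files for the current checkout
```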
## Step 4: Install Dependencies
Install all required dependencies from the `requirements.txt` file:
```sh
pip install -r requirements.txt
```
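Installing into a dedicated virtual environment is optional but keeps the project's dependencies isolated from your system Python; a typical setup looks like this:
```sh
python -m venv .venv             # create a virtual environment in .venv
source .venv/bin/activate        # on Windows: .venv\Scripts\activate
pip install -r requirements.txt  # install the project's dependencies into it
```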
## Step 5: Verify Installation
Ensure all dependencies are installed correctly by running:
```sh
python --version
pip list
```
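`python --version` should report 3.12.4, and `pip list` should include the packages from `requirements.txt`. You can also ask pip to report dependency conflicts and confirm that Gradio imports cleanly (assuming Gradio is listed in `requirements.txt`, since the app in the next step uses it):
```sh
pip check                                              # reports broken or missing dependencies, if any
python -c "import gradio; print(gradio.__version__)"   # sanity check that Gradio is importable
```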
## Step 6: Run the Application or Test the Model
You have two options to proceed:
### Option 1: Run the Gradio App
To launch the Gradio app and interact with the model in your web browser, run:
```sh
python app.py
```
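When the server starts, Gradio prints the local URL in the terminal (typically `http://127.0.0.1:7860` by default); open that address in your browser to use the interface.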
### Option 2: Test the Model with a Sample File
To test the fine-tuned model using a sample file, navigate to the root folder of the project and run the following command:
```sh
cd <root_folder>
python new_test_saved_finetuned_model.py \
-workspace_name "ratio_proportion_change3_2223/sch_largest_100-coded" \
-finetune_task "<finetune_task>" \
-test_dataset_path "../../../../fileHandler/selected_rows.txt" \
-finetuned_bert_classifier_checkpoint "ratio_proportion_change3_2223/sch_largest_100-coded/output/highGRschool10/bert_fine_tuned.model.ep42" \
-e 1 \
-b 1000
```
Replace `<finetune_task>` with one of the fine-tuning task values documented below, and make sure `-finetuned_bert_classifier_checkpoint` points to the matching checkpoint. A complete example invocation is sketched after the argument reference.
### Arguments
**`-workspace_name`**
- Description: The folder/workspace name where the project, dataset, and model outputs are organized.
- Example: `"ratio_proportion_change3_2223/sch_largest_100-coded"`
**`-finetune_task`**
- Description: Specifies which fine-tuning strategy was applied to the model.
- Options:
- **ASTRA-FT-HGR** β†’ Fine-tuned with 10% data from schools that have a **High Graduation Rate (HGR)**.
- **ASTRA-FT-FIRST10-WSKILLS**
- Checkpoint: `first10/bert_fine_tuned.model.first10%.wskills.ep24`
- Description: Fine-tuned with 10% of initial problems from both **HGR + LGR schools**, with **Prior Skills encoded** using **Bayesian Knowledge Tracing (BKT)**.
- **ASTRA-FT-FIRST10-WTIME**
- Checkpoint: `first10/bert_fine_tuned.model.first10%.wfaopttime.wttime.wttopttime.wttnoopttime.ep23`
- Description: Fine-tuned with 10% of initial problems from both **HGR + LGR schools**, using **temporal features** measuring student engagement in MATHia.
- **ASTRA-FT-FIRST10-WSKILLS_WTIME**
- Checkpoint: `first10/bert_fine_tuned.model.first10%.wskills.wfaopttime.wttime.wttopttime.wttnoopttime.ep40`
- Description: Fine-tuned with 10% of initial problems from both **HGR + LGR schools**, combining **Prior Skills (BKT) + temporal features**.
**`-test_dataset_path`**
- Description: Path to the test dataset file that you want to use for evaluation.
- Example: `"../../../../fileHandler/selected_rows.txt"`
**`-finetuned_bert_classifier_checkpoint`**
- Description: The path to the saved fine-tuned BERT model checkpoint (specific `.model.epXX` file).
- Example:
`"ratio_proportion_change3_2223/sch_largest_100-coded/output/highGRschool10/bert_fine_tuned.model.ep42"`
- Note: `ep42` means the checkpoint from **epoch 42** during training.
**`-e`**
- Description: Number of epochs to run during testing (or evaluation).
- Example: `-e 1` β†’ run evaluation once.
**`-b`**
- Description: Batch size for testing β€” determines how many test samples are processed together in each forward pass.
- Example: `-b 1000` β†’ each batch will contain **1000 examples**.
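Putting it together, a small wrapper script makes it easier to switch between tasks and checkpoints. The sketch below assumes `ASTRA-FT-HGR` is an accepted `-finetune_task` value and reuses the checkpoint path from the example above; adjust both to the configuration you want to evaluate, keeping the task name and checkpoint consistent with each other:
```sh
#!/usr/bin/env bash
# Sketch of a test run. FINETUNE_TASK and CHECKPOINT are assumptions -- pick a
# matching pair from the -finetune_task options listed above.
FINETUNE_TASK="ASTRA-FT-HGR"
CHECKPOINT="ratio_proportion_change3_2223/sch_largest_100-coded/output/highGRschool10/bert_fine_tuned.model.ep42"

python new_test_saved_finetuned_model.py \
    -workspace_name "ratio_proportion_change3_2223/sch_largest_100-coded" \
    -finetune_task "$FINETUNE_TASK" \
    -test_dataset_path "../../../../fileHandler/selected_rows.txt" \
    -finetuned_bert_classifier_checkpoint "$CHECKPOINT" \
    -e 1 \
    -b 1000
```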
---
βœ… Your Astra project should now be fully set up and ready to use!