Astra Project Setup Instructions

Prerequisites

Make sure you have the following installed before proceeding:

  • Python 3.12.4
  • Git
  • Git Large File Storage (LFS)

Step 1: Install Git LFS

Git LFS (Large File Storage) is required for managing large files in the Astra project. Follow these steps to install Git LFS:

Windows

  1. Download the Git LFS installer from Git LFS Releases.
  2. Run the installer and follow the setup instructions.
  3. Open a terminal (Command Prompt or PowerShell) and run:
    git lfs install
    

macOS

  1. Install Git LFS using Homebrew:
    brew install git-lfs
    
  2. Initialize Git LFS:
    git lfs install
    

Linux

  1. Install Git LFS using your package manager:
    • Debian/Ubuntu:
      sudo apt install git-lfs
      
    • Fedora:
      sudo dnf install git-lfs
      
    • Arch Linux:
      sudo pacman -S git-lfs
      
  2. Initialize Git LFS:
    git lfs install
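
Before moving on, it can help to confirm that Git can actually find the LFS extension. The sketch below is one possible check from Python; the helper name `git_lfs_available` is illustrative and not part of the Astra repository. It relies only on the standard `git lfs version` command.

```python
import subprocess


def git_lfs_available(run=subprocess.run):
    """Return True if `git lfs version` runs and exits with status 0.

    `run` is injectable so the check can be exercised without Git installed.
    """
    try:
        result = run(["git", "lfs", "version"], capture_output=True, text=True)
    except FileNotFoundError:
        # The `git` executable itself is not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if git_lfs_available():
        print("Git LFS is available.")
    else:
        print("Git LFS was not found; repeat the install steps for your OS.")
```

Going through `git lfs version` rather than searching PATH for a `git-lfs` executable avoids false negatives on installs where the extension is reachable only through Git.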
    

Step 2: Install Python (Optionally with pyenv)

The project requires Python 3.12.4. If you work with multiple Python versions, or run into errors while installing dependencies, use pyenv to install and select that exact version.

Installing pyenv

macOS & Linux:

curl https://pyenv.run | bash

After installation, add the pyenv initialization lines printed by the installer to your shell profile, restart your terminal, and install Python:

pyenv install 3.12.4
pyenv global 3.12.4

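Windows:

Use pyenv-win. From Command Prompt, clone it into your user profile and point the environment variables at its pyenv-win subfolder, which holds the bin and shims directories:

git clone https://github.com/pyenv-win/pyenv-win.git "%USERPROFILE%\.pyenv"
setx PYENV "%USERPROFILE%\.pyenv\pyenv-win"
setx PATH "%USERPROFILE%\.pyenv\pyenv-win\bin;%USERPROFILE%\.pyenv\pyenv-win\shims;%PATH%"

Open a new terminal so the updated variables take effect, then install Python:

pyenv install 3.12.4
pyenv global 3.12.4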

Step 3: Clone the Repository

Clone the Astra project repository using Git:

git clone <repository_url>
cd astra
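
If Git LFS was not initialized before cloning, large files can be left in the working tree as small text pointers instead of their real contents. The following is a minimal sketch for spotting that situation; the function names are hypothetical, and the only fact it relies on is that every LFS pointer file begins with the `version https://git-lfs.github.com/spec/v1` line.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root="."):
    """List files under `root` that are still LFS pointers, skipping .git."""
    root = Path(root)
    pointers = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        if is_lfs_pointer(path):
            pointers.append(path)
    return sorted(pointers)
```

If `find_lfs_pointers(".")` returns any paths after cloning, running `git lfs pull` inside the repository should download the real files.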

Step 4: Install Dependencies

Install all required dependencies from the requirements.txt file:

pip install -r requirements.txt

Step 5: Verify Installation

Confirm that the interpreter reports Python 3.12.4 and that the packages from requirements.txt appear in the installed list:

python --version
pip list
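
Reading through `pip list` by eye is easy to get wrong, so the same check can be scripted. This is a rough sketch, not project code: the helper names are made up for illustration, and the parser handles only simple name-based requirement lines (for example `name==1.2`), not URLs or editable installs.

```python
import re
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 12, 4)  # version named in the prerequisites


def python_matches(required=REQUIRED_PYTHON, actual=None):
    """Return True if the interpreter version equals the required one."""
    actual = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    return actual == tuple(required)


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Example use from the repository root:
#   with open("requirements.txt") as handle:
#       print(python_matches(), missing_packages(requirement_names(handle)))
```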

Step 6: Run the Application or Test the Model

You have two options to proceed:

Option 1: Run the Gradio App

To open the Gradio app in your web browser and interact with the application, run:

python app.py

Option 2: Test the Model with a Sample File

To test the fine-tuned model using a sample file, navigate to the root folder of the project and run the following command:

cd <root_folder>
python new_test_saved_finetuned_model.py \
    -workspace_name "ratio_proportion_change3_2223/sch_largest_100-coded" \
    -finetune_task "<finetune_task>" \
    -test_dataset_path "../../../../fileHandler/selected_rows.txt" \
    -finetuned_bert_classifier_checkpoint "ratio_proportion_change3_2223/sch_largest_100-coded/output/highGRschool10/bert_fine_tuned.model.ep42" \
    -e 1 \
    -b 1000

Replace <finetune_task> with the actual fine-tuning task value.
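
When the test has to be repeated for several tasks or checkpoints, assembling the command in Python avoids shell quoting mistakes. The sketch below only reproduces the flags shown in the command above; `build_test_command` is a hypothetical helper, and it uses `sys.executable` in place of `python` so the script runs under the current interpreter.

```python
import sys


def build_test_command(finetune_task, workspace_name, test_dataset_path,
                       checkpoint, epochs=1, batch_size=1000,
                       script="new_test_saved_finetuned_model.py"):
    """Return the argument list for the evaluation command shown above."""
    return [
        sys.executable, script,
        "-workspace_name", workspace_name,
        "-finetune_task", finetune_task,
        "-test_dataset_path", test_dataset_path,
        "-finetuned_bert_classifier_checkpoint", checkpoint,
        "-e", str(epochs),
        "-b", str(batch_size),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the project root; because each argument is its own list element, paths containing characters such as `%` need no extra quoting.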

Arguments

-workspace_name

  • Description: The folder/workspace name where the project, dataset, and model outputs are organized.
  • Example: "ratio_proportion_change3_2223/sch_largest_100-coded"

-finetune_task

  • Description: Specifies which fine-tuning strategy was applied to the model.
  • Options:
    • ASTRA-FT-HGR

      • Description: Fine-tuned with 10% of the data from schools that have a High Graduation Rate (HGR).
    • ASTRA-FT-FIRST10-WSKILLS

      • Checkpoint: first10/bert_fine_tuned.model.first10%.wskills.ep24
      • Description: Fine-tuned with 10% of initial problems from both HGR and LGR (Low Graduation Rate) schools, with Prior Skills encoded using Bayesian Knowledge Tracing (BKT).
    • ASTRA-FT-FIRST10-WTIME

      • Checkpoint: first10/bert_fine_tuned.model.first10%.wfaopttime.wttime.wttopttime.wttnoopttime.ep23
      • Description: Fine-tuned with 10% of initial problems from both HGR and LGR schools, using temporal features measuring student engagement in MATHia.
    • ASTRA-FT-FIRST10-WSKILLS_WTIME

      • Checkpoint: first10/bert_fine_tuned.model.first10%.wskills.wfaopttime.wttime.wttopttime.wttnoopttime.ep40
      • Description: Fine-tuned with 10% of initial problems from both HGR and LGR schools, combining Prior Skills (BKT) and temporal features.
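
The options above can be captured in a small lookup table so that a mistyped task name fails early. This is only a sketch: the dictionary and `checkpoint_for` are illustrative names, the checkpoint strings are copied from the list, and `None` marks the one option for which the list names no checkpoint. Whether the test script validates task names in this way is not stated here.

```python
# Task name -> checkpoint path as listed above (None where none is listed).
FINETUNE_CHECKPOINTS = {
    "ASTRA-FT-HGR": None,
    "ASTRA-FT-FIRST10-WSKILLS":
        "first10/bert_fine_tuned.model.first10%.wskills.ep24",
    "ASTRA-FT-FIRST10-WTIME":
        "first10/bert_fine_tuned.model.first10%.wfaopttime.wttime"
        ".wttopttime.wttnoopttime.ep23",
    "ASTRA-FT-FIRST10-WSKILLS_WTIME":
        "first10/bert_fine_tuned.model.first10%.wskills.wfaopttime.wttime"
        ".wttopttime.wttnoopttime.ep40",
}


def checkpoint_for(task):
    """Return the listed checkpoint for `task`, or raise on an unknown name."""
    if task not in FINETUNE_CHECKPOINTS:
        known = ", ".join(sorted(FINETUNE_CHECKPOINTS))
        raise ValueError(f"Unknown finetune task {task!r}; expected one of: {known}")
    return FINETUNE_CHECKPOINTS[task]
```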

-test_dataset_path

  • Description: Path to the test dataset file that you want to use for evaluation.
  • Example: "../../../../fileHandler/selected_rows.txt"

-finetuned_bert_classifier_checkpoint

  • Description: The path to the saved fine-tuned BERT model checkpoint (specific .model.epXX file).
  • Example:
    "ratio_proportion_change3_2223/sch_largest_100-coded/output/highGRschool10/bert_fine_tuned.model.ep42"
  • Note: ep42 means the checkpoint from epoch 42 during training.
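
Since the epoch is encoded in the file name, it can be recovered programmatically, for example to pick the latest checkpoint in a folder. A minimal sketch, assuming the `.epXX` suffix always comes last as in the examples above (`checkpoint_epoch` is a hypothetical helper):

```python
import re


def checkpoint_epoch(path):
    """Return the epoch number from a trailing `.epXX` suffix, or None."""
    match = re.search(r"\.ep(\d+)$", path)
    return int(match.group(1)) if match else None


checkpoint_epoch("output/highGRschool10/bert_fine_tuned.model.ep42")  # 42
```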

-e

  • Description: Number of epochs to run during testing (or evaluation).
  • Example: -e 1 → run evaluation once.

-b

  • Description: Batch size for testing, which determines how many test samples are processed together in each forward pass.
  • Example: -b 1000 → each batch will contain 1000 examples.
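
To get a feel for how the batch size relates to the number of forward passes, the arithmetic can be sketched as below. It assumes the final, smaller batch is still evaluated rather than dropped, which is a common default but is not confirmed for this script; `num_batches` is an illustrative name.

```python
import math


def num_batches(num_examples, batch_size):
    """Number of forward passes needed when the last partial batch is kept."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_examples / batch_size)


num_batches(2500, 1000)  # 3: two full batches plus one of 500 examples
```

If evaluation runs out of memory, lowering `-b` increases the number of passes but reduces the memory needed for each one.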

✅ Your Astra project should now be fully set up and ready to use!