Update README.md

README.md

- Methodology
---

# NexaSci Family of Models

## Welcome to the NexaSci Repository!

Get ready to supercharge your scientific research with the **NexaSci family of models**! This Hugging Face repository hosts a powerful suite of Mixture-of-Experts (MoE) models designed to generate hypotheses and methodologies across **physics**, **biology**, and **materials science**. Built with efficiency and scalability in mind, the NexaSci family includes the baseline **NexaSci**, the reasoning-enhanced **NEXASci-1-CoT**, and the long-context powerhouse **NEXA-1-Max**. Whether you’re a researcher tackling complex STEM problems, a data scientist exploring scientific ML, or a student learning about domain-specific AI, this repository is your go-to resource for cutting-edge scientific computation.

## Model Overview

The NexaSci family is a 110 million to 2.2 billion parameter architecture that uses a **Semantic Router** to direct queries to domain-specific expert modules (Physics, Biology, Materials Science). It’s optimized for resource-constrained environments, leveraging advanced training strategies, hardware optimizations, and techniques like reinforcement learning and sparse attention. A rough sketch of the routing idea is shown below, followed by the current and planned models.
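
The router implementation is not published in this repository, so the snippet below is only a minimal, hypothetical sketch of the idea: score a query against short descriptions of each expert with an off-the-shelf sentence encoder and pick the best match. The class name, encoder choice, and expert descriptions are placeholders, not NexaSci internals.

```python
# Hypothetical sketch only -- not the actual NexaSci Semantic Router.
from sentence_transformers import SentenceTransformer, util

class SemanticRouter:
    """Route a query to the domain expert whose description it most resembles."""

    def __init__(self, expert_descriptions: dict[str, str]):
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in encoder
        self.experts = list(expert_descriptions)
        self.expert_embs = self.encoder.encode(list(expert_descriptions.values()))

    def route(self, query: str) -> str:
        scores = util.cos_sim(self.encoder.encode(query), self.expert_embs)[0]
        return self.experts[int(scores.argmax())]

router = SemanticRouter({
    "physics": "quantum mechanics, astrophysics, condensed matter",
    "biology": "genomics, protein folding, cell and molecular biology",
    "materials": "alloys, polymers, crystal structures, battery chemistry",
})
print(router.route("Predict the band gap of a novel perovskite"))  # -> "materials"
```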

### 1. NexaSci-1-Mini (in development; indefinite timeline)
- **Parameters**: ~110 million
- **Purpose**: Generates hypotheses and methodological scaffolding for scientific tasks in physics, biology, and materials science.
- **Architecture**:
  - […]
  - **Inference & Validation Pipeline**: Aggregates expert outputs and ensures consistency.
  - **Knowledge Feedback Loop**: Refines routing using reinforcement learning.
- **Training**:
  - Pretrained on ~2B tokens from arXiv, PubMed, and other scientific corpora.
  - Fine-tuned with QLoRA on 500k instruction-style samples.
  - Uses AzureSky Optimizer (Stochastic Approximation + Adam hybrid).
- **Use Cases**:
  - Generate plausible hypotheses (e.g., new material properties).

[…]

  - Integrates with expert modules for structured, logical outputs.
- **Training**:
  - Trained in three stages: Easy (basic logic), Moderate (complex tasks), Hard (advanced reasoning).
  - Uses ~2B tokens.
  - Employs AzureSky Optimizer with reinforcement learning fine-tuning.
- **Use Cases**:
  - Solve multi-step physics problems (e.g., astrophysics simulations).

[…]

  - Includes a **Longform Context Manager** to chunk inputs while preserving semantic coherence (see the chunking sketch after this list).
  - Scales parameters using mixed precision training and gradient checkpointing.
- **Training**:
  - Trained on ~2B tokens, including a Long-Context Corpus of full arXiv papers and NIH grants.
  - Uses AzureSky Optimizer with mixed precision (FP16/BF16) and gradient checkpointing.
- **Use Cases**:
  - Summarize or analyze long scientific papers (e.g., 120K-token preprints).
  - Generate hypotheses from extended contexts (e.g., patent methods).
  - Support multi-query tasks requiring deep document understanding.
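
The Longform Context Manager itself is not released here; as a rough illustration under that caveat, a sentence-aware chunker with a small overlap between chunks might look like the following. The function name, token counting, and overlap heuristic are assumptions, not the actual implementation.

```python
# Illustrative stand-in for the (unpublished) Longform Context Manager.
import re

def chunk_text(text: str, max_tokens: int = 1024, overlap_sentences: int = 2) -> list[str]:
    """Split text into chunks on sentence boundaries, carrying a few sentences
    of overlap between chunks so context is not lost at the seams."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())                       # crude whitespace token count
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]  # carry overlap forward
            count = sum(len(s.split()) for s in current)
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

paper = "Sparse attention reduces memory use. Checkpointing trades compute for memory. " * 200
for i, chunk in enumerate(chunk_text(paper, max_tokens=128)):
    print(f"chunk {i}: {len(chunk.split())} words")
```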

[…]

The NexaSci family is trained on a **tiered token strategy** to maximize efficiency and domain specificity, as outlined in the architecture document:

- **Warm Start Corpus** (100M tokens): General language understanding from FineWeb-Edu, OpenWebMath, Wikipedia, and Aristo Science Questions.
- **Scientific Pretraining Corpus** (1-2B tokens): Domain-specific data from arXiv (physics), PubMed/BioRxiv (biology), and Materials Project/ChemRxiv (materials science).
- **Instruction Fine-Tune Dataset** (500K tokens): 5k high-quality instruction-style samples for hypothesis and method generation.
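
NexaSci-1-Mini is described above as fine-tuned with QLoRA on instruction-style samples. A minimal sketch of such a setup with Hugging Face `transformers` + `peft` is shown below; the repository id, target module names, and hyperparameters are placeholders, not released values.

```python
# Illustrative QLoRA setup; "nexa/nexasci-1-mini" and all hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights (the "Q" in QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "nexa/nexasci-1-mini",                  # hypothetical repo id
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()          # only the small LoRA adapters are updated
```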

**Token Efficiency Strategies**:
- Entropy scoring to remove low-information samples.
- […]
- Routing and filtering to activate only relevant expert paths.
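
The README does not spell out how entropy scoring is computed; one common reading, assumed in the sketch below, is to drop samples whose character-level Shannon entropy falls under a threshold. The threshold value and the character-level granularity are assumptions.

```python
# Assumed interpretation of "entropy scoring": filter out samples with low Shannon entropy.
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; low values suggest repetitive, low-information text."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def filter_low_information(samples: list[str], min_entropy: float = 3.5) -> list[str]:
    # 3.5 bits/char is an arbitrary placeholder threshold, not a published value.
    return [s for s in samples if shannon_entropy(s) >= min_entropy]

corpus = ["aaaaaaaaaaaaaaaa", "Perovskite solar cells degrade under humid conditions."]
print(filter_low_information(corpus))  # keeps only the informative sentence
```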

**Total Token Budget**:
~2B tokens across all models.

**Hardware**:
Currently limited; we are still sourcing additional hardware.

**Optimization Techniques**:
- Sparse attention, mixed precision training, gradient checkpointing.
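
Mixed precision and gradient checkpointing are standard PyTorch techniques; as a brief generic illustration (not the NexaSci training code, and assuming a CUDA device), they can be combined like this:

```python
# Generic PyTorch example of mixed precision + gradient checkpointing; not NexaSci's training loop.
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(), torch.nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        # Recompute this block's activations during backward instead of storing them.
        return x + checkpoint(self.ff, x, use_reentrant=False)

model = Block().cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()                 # loss scaling keeps FP16 gradients stable

x = torch.randn(8, 128, 512, device="cuda")
optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):   # mixed precision region
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```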
