
DistilBERT Model for Crop Recommendation Based on Environmental Parameters

This repository contains a fine-tuned DistilBERT model trained for crop recommendation using structured agricultural data. By converting numerical environmental features into text format, the model leverages transformer-based NLP techniques to classify the most suitable crop type.

🌾 Problem Statement

The goal is to recommend the best crop to cultivate based on parameters such as soil nutrients and weather conditions. Traditional ML models treat this as a tabular classification problem. Here, we explore an alternative approach: applying an NLP model (DistilBERT) to serialized tabular data.


📊 Dataset

  • Source: Crop Recommendation Dataset

  • Features:

    • N: Nitrogen content in soil
    • P: Phosphorus content in soil
    • K: Potassium content in soil
    • Temperature: in Celsius
    • Humidity: %
    • pH: Acidity of soil
    • Rainfall: mm
  • Target: Crop label (22 crop types)

The dataset is preprocessed by concatenating all numeric features into a single space-separated string, making it suitable for transformer-based tokenization.
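A minimal sketch of this serialization step with pandas. It assumes the features live in columns named N, P, K, temperature, humidity, ph and rainfall; these column names, their order, and the serialize_features helper are assumptions, not the original training code.

```python
import pandas as pd

# Assumed column names and order; the order must match the one used at training time
FEATURE_COLUMNS = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]


def serialize_features(df: pd.DataFrame) -> pd.Series:
    """Join each row's feature values into one space-separated string (hypothetical helper)."""
    # astype(str) converts column by column, so integer columns stay "90", not "90.0"
    return df[FEATURE_COLUMNS].astype(str).apply(" ".join, axis=1)


# One illustrative row, using the values from the sample input in this card
df = pd.DataFrame(
    {
        "N": [90], "P": [42], "K": [43],
        "temperature": [20.879744], "humidity": [82.002744],
        "ph": [6.502985], "rainfall": [202.935536],
    }
)
df["text"] = serialize_features(df)
print(df.loc[0, "text"])
```

Converting column by column with astype(str) keeps integer columns such as N as "90" rather than "90.0", so the output matches the format of the sample input in the loading example. A row-wise apply over the raw values would first upcast each mixed int/float row to float.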


🧠 Model Details

  • Architecture: DistilBERT
  • Tokenizer: DistilBertTokenizerFast
  • Model: DistilBertForSequenceClassification
  • Task Type: Multi-Class Classification (22 classes)

🔧 Installation

pip install transformers datasets pandas scikit-learn torch

Loading the Model

from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load model and tokenizer from a local directory containing the saved files
model_path = "model_fp32_dir"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_path)
model = DistilBertForSequenceClassification.from_pretrained(model_path)

# Sample input: N P K temperature humidity pH rainfall, space-separated, in training order
sample_text = "90 42 43 20.879744 82.002744 6.502985 202.935536"
inputs = tokenizer(sample_text, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()
print("Predicted class index:", predicted_class)
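The snippet above returns only the top class index. To see how confident the model is, one option is to apply softmax to the logits and keep the k most probable classes. This is a sketch assuming a batch of one sample, as above; top_k_predictions is a hypothetical helper, not part of Transformers.

```python
import torch


def top_k_predictions(logits: torch.Tensor, k: int = 3) -> list[tuple[int, float]]:
    """Return the k most probable class indices with their softmax probabilities."""
    probs = torch.softmax(logits, dim=-1)
    # topk returns values sorted from most to least probable
    values, indices = torch.topk(probs, k=k, dim=-1)
    # Assumes a batch of size 1, as in the single-sample example
    return [(int(i), float(p)) for i, p in zip(indices[0], values[0])]


# Stand-in logits for one sample and 22 classes (random, for illustration only)
torch.manual_seed(0)
logits = torch.randn(1, 22)
print(top_k_predictions(logits))
```

With the model loaded as above, call top_k_predictions(outputs.logits). If the saved config stores crop names, model.config.id2label maps each index to a name. Otherwise it holds generic names such as LABEL_0.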

📈 Performance Metrics

  • Accuracy: 0.7636
  • Precision: 0.7738
  • Recall: 0.7636
  • F1 Score: 0.7343
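The card does not state how precision, recall and F1 were averaged over the 22 classes. Recall equals accuracy (0.7636), which is always the case for support-weighted recall in single-label classification. A plausible reconstruction therefore uses scikit-learn with average="weighted". Treat that as an assumption. The sketch runs on toy labels, not the model's actual predictions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(y_true, y_pred) -> dict:
    """Accuracy plus support-weighted precision, recall and F1 (averaging mode is assumed)."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = compute_metrics(y_true, y_pred)
print(metrics)
```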

πŸ‹οΈ Fine-Tuning Details

📚 Dataset

The dataset is sourced from the publicly available Crop Recommendation Dataset. It consists of the following structured features:

  • Nitrogen (N)
  • Phosphorus (P)
  • Potassium (K)
  • Temperature (°C)
  • Humidity (%)
  • pH
  • Rainfall (mm)

All numerical features were converted into a single textual input string to be used with the DistilBERT tokenizer. Labels were factorized into class indices for training.
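A minimal sketch of the label encoding, assuming pandas.factorize was used (the card says only that labels were factorized). factorize numbers classes in order of first appearance, so the index-to-crop mapping depends on the row order of the training data and needs to be saved alongside the model. The labels below are toy values.

```python
import pandas as pd

# Toy labels for illustration
labels = pd.Series(["rice", "maize", "rice", "chickpea", "maize"])

# codes: integer class index per row; uniques: class names in order of first appearance
codes, uniques = pd.factorize(labels)

# Mappings between class indices and crop names
id2label = {i: name for i, name in enumerate(uniques)}
label2id = {name: i for i, name in id2label.items()}

print(codes.tolist())
print(id2label)
```

Passing these dictionaries as id2label= and label2id= to from_pretrained when creating the classifier stores them in the model config. Then model.config.id2label[predicted_class] returns a crop name at inference time.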

The dataset was split using an 80/20 ratio for training and testing.
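An 80/20 split can be sketched with scikit-learn's train_test_split. The random_state and stratify settings are assumptions, not documented training settings. Stratifying keeps each crop's share equal in both splits. The DataFrame is a toy stand-in for the serialized dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the serialized dataset: 10 samples, 2 classes
df = pd.DataFrame(
    {
        "text": [f"sample {i}" for i in range(10)],
        "label": [0, 1] * 5,
    }
)

# 80/20 split; random_state and stratify are assumed, not taken from the original run
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)
print(len(train_df), len(test_df))
```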


🔧 Training Configuration

  • Epochs: 3
  • Batch size: 8
  • Learning rate: 2e-5
  • Evaluation strategy: epoch
  • Model Base: DistilBERT (distilbert-base-uncased)
  • Framework: Hugging Face Transformers + PyTorch
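With these settings, the number of optimizer steps follows from the training-set size. The card does not state the dataset size. The sketch assumes the commonly distributed 2,200-row version of the Crop Recommendation Dataset, which an 80/20 split turns into 1,760 training rows. It also assumes no gradient accumulation and that the last partial batch is kept.

```python
import math


def total_training_steps(n_train: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps over all epochs (last partial batch kept, no gradient accumulation)."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * epochs


# Assumed: 80% of 2,200 rows are used for training
n_train = 1760
print(total_training_steps(n_train, batch_size=8, epochs=3))  # 660
```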

🔄 Quantization

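Post-training quantization was applied by casting the model weights to half precision (FP16) with PyTorch’s half() method.
This halves the storage per parameter (2 bytes instead of 4) and can speed up inference on hardware with FP16 support, usually with little impact on accuracy.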
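The effect of half() on storage can be checked with a small stand-in module. The sketch uses an nn.Linear layer in place of the full DistilBERT classifier. param_bytes is a hypothetical helper; the same calls apply to any PyTorch module.

```python
import torch
from torch import nn


def param_bytes(model: nn.Module) -> int:
    """Total storage used by the model's parameters, in bytes."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


# Small stand-in module (not the actual crop model)
model = nn.Linear(768, 22)
fp32_bytes = param_bytes(model)

# Casts floating-point parameters and buffers to torch.float16
model = model.half()
fp16_bytes = param_bytes(model)

print(fp32_bytes, fp16_bytes)  # FP16 uses half the bytes
```

For the full model, calling model.half() and then save_pretrained writes FP16 weights. With about 67M parameters, that is roughly 134 MB instead of 268 MB.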

The quantized model can be loaded with:

model = DistilBertForSequenceClassification.from_pretrained("quantized_model_fp16", torch_dtype=torch.float16)

Repository Structure

.
├── quantized-model/               # Contains the quantized model files
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── vocab.txt
│   └── special_tokens_map.json
└── README.md                      # Model documentation

Limitations

  • Serializing tabular data as text splits numbers into subword tokens, so the model may capture numeric magnitudes and feature interactions less well than a dedicated tabular model.
  • Trained on a specific dataset; may not generalize to different regions or conditions.
  • FP16 quantization may slightly reduce accuracy in rare cases.

Contributing

Feel free to open issues or submit pull requests to improve the model or documentation.

Model size: 67M params (Safetensors, tensor type FP16)