Model Card for chrisdepallan/ner-skills-distilbert
Model Details
Model Description
This model is a fine-tuned version of distilbert-base-uncased, built with Hugging Face's 🤗 Transformers library. It was trained for Named Entity Recognition (NER) on resumes, extracting skills, names, and other relevant entities.
- Developed by: Chris Dennis Pallan
- Funded by: Dr. Bijimol TK, AJCE
- Shared by: Chris Dennis Pallan
- Model type: Transformer-based model (DistilBERT)
- Language(s) (NLP): English, Malayalam
- License: Apache 2.0
- Finetuned from model: distilbert-base-uncased
Model Sources
- Repository: chrisdepallan/ner-skills-distilbert on the Hugging Face Hub
Uses
Direct Use
This model can be used directly for Named Entity Recognition tasks, such as identifying skills, names, and other entities in resumes or similar documents.
Downstream Use
The model can be fine-tuned further for domain-specific NER tasks, such as extracting medical or legal entities.
Out-of-Scope Use
The model is not suitable for tasks outside of text-based entity recognition or for languages other than English.
Training Details
Training Data
The model was fine-tuned on a dataset designed for resume parsing and skill extraction. The dataset includes labeled examples for entities such as SKILL, NAME, LOCATION, and ORGANIZATION.
The first version of the model was trained for 3 epochs. Later versions are expected to improve on its metrics.
Training Procedure
Preprocessing
- Tokenization was performed using the DistilBertTokenizer from Hugging Face.
- The dataset was cleaned and preprocessed to align entity spans and convert the data into a format compatible with Hugging Face's token classification pipeline.
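The card does not describe how entity spans were aligned with tokens. As a minimal sketch, assuming word-level labels and a fast tokenizer whose `word_ids()` output maps each subword to its source word, the alignment could look like the following. The function name `align_labels`, the label ids, and the sample data are illustrative assumptions, not the author's preprocessing code.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level label ids onto subword tokens.

    word_ids: one entry per token; None marks special tokens such as [CLS].
    word_labels: one label id per original word.
    Only the first subword of each word keeps the label; the rest are ignored.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = word_id
    return aligned


# Hand-written word_ids for five tokens: [CLS], word 0, word 1 (two subwords), [SEP]
example_word_ids = [None, 0, 1, 1, None]
example_labels = [1, 0]  # hypothetical ids: 1 = SKILL, 0 = O
print(align_labels(example_word_ids, example_labels))  # [-100, 1, 0, -100, -100]
```

Labeling only the first subword is one common convention; propagating the label to every subword is an equally valid choice as long as training and evaluation agree.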
Training Hyperparameters
- Batch size: 16
- Learning rate: 5e-5
- Epochs: 3
- Optimizer: AdamW
- Mixed Precision Training: Enabled (fp16)
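The card does not say which training loop was used. Assuming the Hugging Face `Trainer` API, the hyperparameters listed above could be expressed roughly as follows. `HYPERPARAMS` and `build_training_args` are illustrative names, not part of the original training code.

```python
# Values copied from the hyperparameter list in this card.
HYPERPARAMS = {
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
    "num_train_epochs": 3,
    "fp16": True,  # mixed precision; requires a CUDA GPU
}


def build_training_args(output_dir: str, **overrides):
    """Create TrainingArguments from HYPERPARAMS, with optional overrides."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import TrainingArguments

    settings = {**HYPERPARAMS, **overrides}
    return TrainingArguments(output_dir=output_dir, **settings)
```

On a machine without a GPU, `build_training_args("./ner-out", fp16=False)` avoids the mixed-precision requirement. `Trainer` uses an AdamW variant by default, so the optimizer needs no explicit setting under this assumption.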
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a test split of the resume dataset.
Results
The model achieved an F1 score of [Add value] on the test set, demonstrating strong performance for Named Entity Recognition tasks.
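The card does not state whether the F1 score is computed per token or per entity. The sketch below assumes entity-level scoring with exact matching, where a prediction counts only if both its span and its label match the gold annotation. The `Span` type, the function name, and the sample spans are illustrative; the numbers are toy data, not results from this model.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start_token, end_token, label)


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Entity-level F1: a prediction counts only if span and label both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, not results from this model.
gold = {(0, 1, "NAME"), (5, 5, "SKILL"), (7, 8, "ORGANIZATION")}
pred = {(0, 1, "NAME"), (5, 5, "SKILL"), (9, 9, "SKILL")}
print(round(entity_f1(gold, pred), 3))  # 0.667
```

Here two of three predictions are correct and two of three gold entities are found, so precision and recall are both 2/3.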
Environmental Impact
- Hardware Type: NVIDIA Tesla V100 GPU
- Hours used: Approximately [Add value from notebook] hours
- Cloud Provider: AWS
- Compute Region: US East (N. Virginia)
- Carbon Emitted: Estimated using the Machine Learning Impact calculator.
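For readers who want to reproduce such an estimate by hand, the arithmetic is roughly energy used multiplied by the carbon intensity of the grid. The sketch below is a simplified version of that idea, not the calculator's exact method. The function name and all input values are placeholders; they are not measurements from this training run.

```python
def estimate_co2_kg(gpu_power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Rough estimate: energy drawn by the GPU times the grid's carbon intensity."""
    energy_kwh = gpu_power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs chosen for easy arithmetic, not figures for this model.
print(estimate_co2_kg(gpu_power_watts=250.0, hours=2.0, kg_co2_per_kwh=0.5))  # 0.25
```

A fuller estimate would also account for data-center overhead and any carbon offsets reported by the cloud provider.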
How to Get Started with the Model
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
ner_pipeline = pipeline(
    "ner",
    model="chrisdepallan/ner-skills-distilbert",
    tokenizer="chrisdepallan/ner-skills-distilbert",
)

# Example usage: a sample resume passed as a single string
text = "John Doe Senior Software Engineer – Tech Solutions Inc. New York, NY - Email me on Indeed: indeed.com/r/John-Doe/1234567890abcdef Passionate software engineer with 6+ years of experience in full-stack development, specializing in web applications and cloud technologies. Seeking a challenging role to leverage my expertise in modern frameworks and system design. WORK EXPERIENCE Senior Software Engineer Tech Solutions Inc. – New York, NY – March 2020 to Present - Leading a team of developers to build scalable web applications. - Designed and optimized RESTful APIs using Node.js and Python. - Developed microservices architecture to improve system performance. Software Engineer InnovateX Corp. – New York, NY – June 2017 to February 2020 - Built and maintained enterprise web applications using React.js and Django. - Integrated cloud services (AWS, Azure) for seamless deployment. - Implemented CI/CD pipelines using Jenkins and Docker. EDUCATION M.S. in Computer Science Columbia University – New York, NY B.S. in Computer Science University of California, Berkeley – Berkeley, CA SKILLS Python (6 years), Java (5 years), JavaScript (6 years), React.js (4 years), AWS (3 years) ADDITIONAL INFORMATION Technical skills: Languages: Python, Java, JavaScript, TypeScript, C++ Web Development: React.js, Angular, Node.js, Django, Flask Databases: PostgreSQL, MySQL, MongoDB Cloud Technologies: AWS (EC2, S3, Lambda), Azure, GCP DevOps: Docker, Kubernetes, Terraform, Jenkins Version Control: Git, GitHub, Bitbucket Testing Frameworks: Selenium, PyTest, Jest https://www.indeed.com/r/John-Doe/1234567890abcdef?isid=rex-download&ikw=download-top&co=US https://www.indeed.com/r/John-Doe/1234567890abcdef?isid=rex-download&ikw=download-top&co=US Certifications: AWS Certified Solutions Architect – Associate Google Cloud Professional Developer Project Details: 'E-Commerce Platform Development' (Client: RetailX Inc.) Front-End: React.js, Redux Back-End: Node.js, Express.js Database: PostgreSQL Duration: 8 months Description: Designed and developed a fully functional e-commerce website with user authentication, payment gateway integration, and order tracking. 'AI-Powered Chatbot for Customer Support' (Company Project – Tech Solutions Inc.) Tools: Python, TensorFlow, Rasa NLP Duration: 6 months Description: Developed an AI-driven chatbot to enhance customer support, reducing response time by 40%. 'Inventory Management System' (B.S. Final Year Project) Language: Java Database: MySQL Operating System: Windows 10 The Inventory Management System is designed to automate stock management and reduce errors in manual tracking."

# Run the pipeline and print the raw token-level predictions
entities = ner_pipeline(text)
print(entities)
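Without an aggregation strategy, the pipeline returns one dictionary per token with keys such as `entity`, `score`, `word`, `start`, and `end`. As a minimal sketch, that raw output could be grouped by label and filtered by confidence as shown below. The helper name `group_by_label`, the `min_score` threshold, and the sample records are assumptions for illustration; the actual label names depend on the model's configuration.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect predicted words under each entity label, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for item in entities:
        if item["score"] >= min_score:
            grouped[item["entity"]].append(item["word"])
    return dict(grouped)


# Hand-written records in the shape the pipeline returns; not real model output.
sample = [
    {"entity": "SKILL", "score": 0.98, "word": "python", "start": 0, "end": 6},
    {"entity": "SKILL", "score": 0.31, "word": "team", "start": 10, "end": 14},
    {"entity": "NAME", "score": 0.95, "word": "john", "start": 20, "end": 24},
]
print(group_by_label(sample))  # {'SKILL': ['python'], 'NAME': ['john']}
```

Passing `aggregation_strategy="simple"` to `pipeline` is an alternative that merges subword tokens into entity groups; in that mode the label key is `entity_group` rather than `entity`.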
Example Usage: Predicting Named Entities
You can use the predict_entities function to predict named entities from a given text. Below is an example of how to use it:
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub
model_id = "chrisdepallan/ner-skills-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
model.eval()

# Assumption: the index-to-label map is stored in the model config
id2tag = model.config.id2label

# Define the function
def predict_entities(text):
    """Predict named entities from the input text."""
    inputs = tokenizer(
        text,
        return_tensors="pt",   # PyTorch tensors
        truncation=True,       # Truncate if longer than max length
        padding="max_length",  # Pad sequences
        max_length=512,        # Max sequence length
    )

    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)

    # Get predicted class indices as a plain list of ints
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=2)[0].tolist()

    # Convert token IDs to actual words
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Convert prediction indices to label names
    predicted_labels = [id2tag[idx] for idx in predictions]

    # Print results
    result = list(zip(tokens, predicted_labels))
    for token, label in result:
        print(f"Token: {token} | Predicted Label: {label}")
    return result
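The function returns WordPiece tokens, so a single word may appear as several pieces prefixed with `##`, alongside special tokens such as `[CLS]` and `[PAD]`. A minimal sketch of a post-processing step, assuming the `(token, label)` pairs returned above, could merge the pieces back into words. The helper name `merge_wordpieces` and the sample pairs are illustrative, not part of the model's code.

```python
from typing import List, Tuple

SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def merge_wordpieces(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Join '##' continuation pieces onto the preceding token and drop special tokens.

    Each merged word keeps the label of its first piece.
    """
    merged: List[Tuple[str, str]] = []
    for token, label in pairs:
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##") and merged:
            word, first_label = merged[-1]
            merged[-1] = (word + token[2:], first_label)
        else:
            merged.append((token, label))
    return merged


# Hand-written pairs for illustration; not real model output.
sample = [
    ("[CLS]", "O"),
    ("kub", "SKILL"),
    ("##ern", "SKILL"),
    ("##etes", "SKILL"),
    ("[SEP]", "O"),
    ("[PAD]", "O"),
]
print(merge_wordpieces(sample))  # [('kubernetes', 'SKILL')]
```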
Example Usage: Extracting Skills from a Resume
The predict_entities function can also be adapted to extract specific entities, such as skills, from a given text. Below is an example that returns only the tokens labeled as skills:
import torch

# Assumes `tokenizer`, `model`, and `id2tag` are already defined

# Define the function
def predict_entities(text):
    """Predict named entities from the input text and extract skills."""
    inputs = tokenizer(
        text,
        return_tensors="pt",  # PyTorch tensors
        truncation=True,
        padding="max_length",
        max_length=512,
    )

    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=2)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    predicted_labels = [id2tag[idx] for idx in predictions]

    # Extract only tokens labeled as "Skills"; adjust the string if the
    # label names in id2tag differ (for example "SKILL")
    skills = [token for token, label in zip(tokens, predicted_labels) if label == "Skills"]
    print("Extracted Skills:", " ".join(skills))
    return skills

# Example usage; `text` is the resume string from the getting-started example
skills = predict_entities(text)
print("Skills:", skills)
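Because the input is truncated to 512 tokens, anything beyond that point in a long resume is never seen by the model. One possible workaround, sketched here under the assumption that the text has already been converted to token ids, is to split the ids into overlapping windows and run the model on each. The function name `sliding_windows` and the default `window` and `stride` values are illustrative choices, not settings used by this model.

```python
from typing import List


def sliding_windows(token_ids: List[int], window: int = 510, stride: int = 384) -> List[List[int]]:
    """Split a long token-id sequence into overlapping windows.

    window: tokens per chunk (510 leaves room for [CLS] and [SEP] within 512).
    stride: step between chunk starts; consecutive chunks share window - stride tokens.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
        start += stride
    return chunks


ids = list(range(1000))  # stand-in for real token ids
chunks = sliding_windows(ids)
print([len(c) for c in chunks])  # [510, 510, 232]
```

Predictions in the overlapping regions appear twice and need to be reconciled, for example by keeping the higher-scoring label. The stride should not exceed the window, otherwise tokens between chunks are skipped.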
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.