LLMSIEM/logem-win

LLMSIEM/logem-win is a language model fine-tuned for Windows Event Log (EVTX) analysis and field extraction. It is built for Windows-centric security operations and SIEM workflows.

Model Details

Model Description

LLMSIEM/logem-win is a domain-specific fine-tune of Qwen3-0.6B, optimized exclusively for parsing Windows XML Event Logs (EVTX format) and extracting structured data from them. It targets the nested XML structures found in Windows Security, System, and Application event logs.

Developed by: Hassan Shehata
Model type: Causal Language Model (Fine-tuned for Windows EVTX)
Language(s): English
License: Apache 2.0
Finetuned from model: Qwen/Qwen3-0.6B
Specialization: Windows XML Event Logs (EVTX)
Model size: ~1.2 GB (FP16), ~396 MB (Q4_K_M quantized)
Parameters: 0.6B

Model Sources

General logem model: LLMSIEM/logem
Research Series: [LinkedIn/Blog Series Link]

Performance Highlights

🪟 Windows EVTX specialist, evaluated on a 13-case Windows EVTX test set:

69.2% perfect matches on Windows EVTX test cases
0.830 F1 score - competitive with general security models
0.846 recall - excellent field detection capability
1.34s average response time for complex XML parsing
Handles nested XML structures in Windows events

Windows Event Log Coverage

Supported Event Categories

🔐 Security Events (Channel: Security)

Logon/Logoff Events (4624, 4634, 4647)
Account Management (4720, 4722, 4726, 4728)
Privilege Use (4672, 4673, 4674)
Process and Object Access (4656, 4658, 4688)
Policy Changes (4719, 4739)
Authentication Events (4768, 4769, 4771)

⚙️ System Events (Channel: System)

Service Control Manager events
System startup and shutdown events
Driver installation and loading
Hardware events and errors
Time service synchronization

📱 Application Events (Channel: Application)

Application errors and crashes
Software installation events
MSI installer logs
.NET runtime events
Custom application event logs

Uses

Direct Use

Ideal for Windows security teams and SIEM engineers who need to:

Parse Windows Event Logs (EVTX format)
Extract structured fields from Windows security events
Automate Windows event log analysis
Normalize Windows events for SIEM ingestion
Analyze Domain Controller and Active Directory events

Example Use Cases

# Example: Parse Windows Security Event 4624 (Successful Logon)
input_text = """Extract fields from this Windows Security Event:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Security-Auditing'/>
    <EventID>4624</EventID>
    <TimeCreated SystemTime='2024-01-15T10:30:45.123456Z'/>
    <Computer>DC01.contoso.com</Computer>
  </System>
  <EventData>
    <Data Name='SubjectUserName'>DC01$</Data>
    <Data Name='TargetUserName'>john.doe</Data>
    <Data Name='LogonType'>2</Data>
    <Data Name='IpAddress'>192.168.1.100</Data>
  </EventData>
</Event>"""

# Model will output structured JSON:
# {
#   "event_id": "4624",
#   "event_type": "successful_logon",
#   "timestamp": "2024-01-15T10:30:45.123456Z",
#   "computer": "DC01.contoso.com",
#   "target_user": "john.doe",
#   "logon_type": "2",
#   "source_ip": "192.168.1.100"
# }
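Because the model generates its output, it can help to check that each extracted value actually occurs in the source event. The following is a minimal sketch of such a cross-check, not part of the model's tooling; the helper names `source_values` and `unsupported_fields` are hypothetical, and the check assumes values are copied verbatim from the XML.

```python
import xml.etree.ElementTree as ET

EVENT_XML = """<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Security-Auditing'/>
    <EventID>4624</EventID>
    <TimeCreated SystemTime='2024-01-15T10:30:45.123456Z'/>
    <Computer>DC01.contoso.com</Computer>
  </System>
  <EventData>
    <Data Name='TargetUserName'>john.doe</Data>
    <Data Name='LogonType'>2</Data>
    <Data Name='IpAddress'>192.168.1.100</Data>
  </EventData>
</Event>"""


def source_values(event_xml: str) -> set:
    """Collect every element text and attribute value present in the event."""
    values = set()
    for element in ET.fromstring(event_xml).iter():
        if element.text and element.text.strip():
            values.add(element.text.strip())
        values.update(element.attrib.values())
    return values


def unsupported_fields(model_output: dict, event_xml: str) -> list:
    """Return the keys whose values do not appear verbatim in the source XML."""
    known = source_values(event_xml)
    return [key for key, value in model_output.items() if str(value) not in known]


model_output = {
    "event_id": "4624",
    "event_type": "successful_logon",  # derived label, not present in the XML
    "timestamp": "2024-01-15T10:30:45.123456Z",
    "computer": "DC01.contoso.com",
    "target_user": "john.doe",
    "logon_type": "2",
    "source_ip": "192.168.1.100",
}
print(unsupported_fields(model_output, EVENT_XML))  # ['event_type']
```

Derived labels such as `event_type` are expected to be flagged; any other flagged key is a candidate for manual review.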

Downstream Use

Windows SIEM Integration: Splunk, Microsoft Sentinel, QRadar
Active Directory Monitoring: Domain controller event analysis
Incident Response: Automated Windows event triage
Compliance Reporting: PCI DSS, SOX, HIPAA Windows event parsing
Threat Hunting: Windows-specific IOC extraction
SOAR Workflows: Windows event enrichment and normalization
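For SIEM ingestion, the model's output keys usually need to be renamed to the target schema. The sketch below maps the keys from the logon example above to Elastic Common Schema field names; the mapping table and the `normalize` helper are illustrative assumptions, not something the model card defines, and other SIEMs would need their own table.

```python
# Hypothetical mapping from the model's output keys to Elastic Common Schema
# (ECS) field names. Adjust it to the schema of the target SIEM.
FIELD_MAP = {
    "event_id": "event.code",
    "timestamp": "@timestamp",
    "computer": "host.name",
    "target_user": "user.name",
    "source_ip": "source.ip",
}


def normalize(model_output: dict) -> dict:
    """Rename known keys; keep unmapped keys under the ECS `labels` field."""
    normalized = {}
    extras = {}
    for key, value in model_output.items():
        target = FIELD_MAP.get(key)
        if target:
            normalized[target] = value
        else:
            extras[key] = value
    if extras:
        normalized["labels"] = extras
    return normalized


record = normalize({
    "event_id": "4624",
    "computer": "DC01.contoso.com",
    "target_user": "john.doe",
    "logon_type": "2",
})
print(record)
```

Keeping unmapped keys instead of dropping them means no extracted field is lost when the mapping table is incomplete.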

Out-of-Scope Use

Non-Windows log formats (use LLMSIEM/logem instead)
Unix/Linux system logs
Network device logs (firewalls, routers)
Web server logs
General text generation tasks

Model Selection Guide

Use Case	Recommended Model
Windows-only environment	logem-win
Mixed Windows/Linux environment	logem (general) + logem-win
Pure Linux/Unix environment	logem (general)
Network security focus	logem (general)
Domain Controller monitoring	logem-win
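In a mixed environment, the table above implies a routing step in front of the two models. A minimal sketch of such a router follows, assuming each record arrives as a single string and that Windows events carry the standard event namespace; `select_model` is a hypothetical helper, and records with a leading XML declaration would need extra handling.

```python
WINDOWS_EVENT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"


def select_model(raw_log: str) -> str:
    """Pick a model name for one raw log record."""
    text = raw_log.lstrip()
    # Windows event XML starts with an <Event> root in the Windows event namespace.
    if text.startswith("<Event") and WINDOWS_EVENT_NS in text:
        return "LLMSIEM/logem-win"
    return "LLMSIEM/logem"


windows_record = "<Event xmlns='%s'><System/></Event>" % WINDOWS_EVENT_NS
syslog_record = "Jan 15 10:30:45 web01 sshd[812]: Failed password for root"
print(select_model(windows_record))
print(select_model(syslog_record))
```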

Bias, Risks, and Limitations

Technical Limitations

Windows-specific: Only optimized for Windows EVTX format
XML complexity: May struggle with heavily nested or malformed XML
Custom events: Performance may vary on non-standard Windows events
Processing time: Slower than the general model (1.34s vs 1.00s average response time), attributed to XML complexity

Security Considerations

Windows event expertise required: Users should understand Windows event log structure
XML validation needed: Malformed EVTX input may produce unexpected results
Context dependency: Some Windows events require additional context for full interpretation

Recommendations

Validate XML structure before processing with the model
Combine with general logem for comprehensive multi-platform coverage
Implement fallbacks for unsupported or malformed EVTX entries
Use alongside Windows expertise for production security operations
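The first and third recommendations can be sketched with the standard library alone: check that the input is well-formed Windows event XML, and fall back to a deterministic extraction when the model cannot be used or its output is rejected. The function names `parse_event` and `fallback_fields` are hypothetical, and the fallback only reads `EventID` and the named `Data` elements.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def parse_event(event_xml: str) -> Optional[ET.Element]:
    """Return the parsed root, or None if the input is not a well-formed event."""
    try:
        root = ET.fromstring(event_xml)
    except ET.ParseError:
        return None
    # Accept only an <Event> root in the Windows event namespace.
    if root.tag != "{%s}Event" % NS["e"]:
        return None
    return root


def fallback_fields(root: ET.Element) -> dict:
    """Deterministic extraction used when the model path is unavailable."""
    fields = {}
    event_id = root.find("e:System/e:EventID", NS)
    if event_id is not None and event_id.text:
        fields["EventID"] = event_id.text.strip()
    for data in root.findall("e:EventData/e:Data", NS):
        name = data.get("Name")
        if name:
            fields[name] = (data.text or "").strip()
    return fields


good = """<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System><EventID>4625</EventID></System>
  <EventData><Data Name='TargetUserName'>john.doe</Data></EventData>
</Event>"""
bad = "<Event><System><EventID>4625</EventID></Event>"

root = parse_event(good)
print(fallback_fields(root) if root is not None else "rejected")
print(parse_event(bad))
```

Rejected inputs can be routed to a dead-letter queue rather than sent to the model.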

How to Get Started with the Model

Using with Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("LLMSIEM/logem-win")
model = AutoModelForCausalLM.from_pretrained("LLMSIEM/logem-win")

# Example: Parse Windows Security Event
prompt = """Extract fields from this Windows Event Log:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <EventData>
    <Data Name='SubjectUserName'>alice.smith</Data>
    <Data Name='NewProcessName'>C:\\Windows\\System32\\cmd.exe</Data>
  </EventData>
</Event>

Extract the following fields as JSON:"""

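# Tokenize the prompt; this returns both input IDs and the attention mask
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,                # pass input_ids and attention_mask together
        max_new_tokens=512,      # budget for the generated JSON, independent of prompt length
        do_sample=False,         # greedy decoding; temperature has no effect without sampling
        pad_token_id=tokenizer.eos_token_id
    )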

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
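The decoded text from a causal model typically contains the prompt followed by the generated continuation, so the JSON object has to be located before it can be used. A minimal sketch, assuming the model emits a single JSON object somewhere after the prompt; `extract_json` is a hypothetical helper, not part of the Transformers API.

```python
import json
from typing import Optional


def extract_json(generated: str, prompt: str = "") -> Optional[dict]:
    """Return the first JSON object found after the prompt, or None."""
    # Drop the echoed prompt when the decoded text starts with it.
    text = generated[len(prompt):] if prompt and generated.startswith(prompt) else generated
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


demo_prompt = "Extract the following fields as JSON:"
demo_output = demo_prompt + '\n{"subject_user": "alice.smith", "process": "cmd.exe"}\nDone.'
print(extract_json(demo_output, demo_prompt))
print(extract_json("no structured output here"))
```

Returning `None` instead of raising makes it straightforward to count and reroute failed extractions.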

Using with Ollama (Recommended for Production)

# Pull the model
ollama pull LLMSIEM/logem-win

# Process Windows Event Log
ollama run LLMSIEM/logem-win "Extract fields from Windows Event ID 4625 failed logon attempt..."
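For programmatic use, a locally running Ollama server can also be called over its HTTP API. The sketch below assumes the default endpoint `http://localhost:11434/api/generate` and that the model is available locally under the name used above; the prompt wording and the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(event_xml: str, model: str = "LLMSIEM/logem-win") -> urllib.request.Request:
    """Build a non-streaming generate request for one event."""
    payload = {
        "model": model,
        "prompt": "Extract fields from this Windows Event Log:\n" + event_xml,
        "stream": False,                 # return one JSON body instead of chunks
        "options": {"temperature": 0},   # keep extraction as deterministic as possible
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_fields(event_xml: str, timeout: float = 30.0) -> str:
    """Send the request and return the model's raw text response."""
    with urllib.request.urlopen(build_request(event_xml), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


request = build_request("<Event>...</Event>")
print(request.get_method(), request.full_url)
```

`extract_fields` needs a running server, so only the request construction is exercised here.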

Training Details

Training Data

The model was fine-tuned on a comprehensive dataset of Windows Event Logs including:

Security Events: Authentication, account management, privilege escalation
System Events: Service control, startup/shutdown, driver events
Application Events: Application crashes, software installation, custom logs

Training Procedure

Base model: Qwen3-0.6B
Training regime: Mixed precision (fp16)
Specialization focus: Windows XML Event Log parsing
Fine-tuning approach: Supervised learning on EVTX-to-JSON extraction

Evaluation

Results

Metric	Score
Perfect Matches	9/13 (69.2%)
Average F1 Score	0.830
Average Precision	0.817
Average Recall	0.846
Average Response Time	1.34s
Complete Failures	2 (complex nested XML)
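The card does not state how these metrics were computed. One common field-level definition is sketched below as an assumption: a predicted field counts as correct when both key and value match the reference, and a perfect match requires the two dictionaries to be identical.

```python
def field_scores(predicted: dict, expected: dict) -> tuple:
    """Return (precision, recall, f1) for one test case at field level."""
    correct = sum(1 for key, value in predicted.items() if expected.get(key) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


expected = {"event_id": "4624", "computer": "DC01", "target_user": "john.doe", "logon_type": "2"}
predicted = {"event_id": "4624", "computer": "DC01", "target_user": "john.doe", "logon_type": "3"}

print(field_scores(predicted, expected))   # (0.75, 0.75, 0.75)
print(predicted == expected)               # False: not a perfect match
print(round(9 / 13 * 100, 1))              # 69.2, the perfect-match rate in the table
```

Under this definition, the reported averages would be means of the per-case scores over the 13 test cases.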

Comparison with General Model

Model	Perfect Matches	F1 Score	Avg Response Time	Use Case
logem-win	69.2%	0.830	1.34s	Windows EVTX
logem (general)	66.7%	0.833	1.00s	Multi-platform

Citation

@misc{llmsiem-logem-win-2025,
  title={LLMSIEM/logem-win: A Windows EVTX Specialized Language Model for Security Log Analysis},
  author={Hassan Shehata},
  year={2025},
  url={https://huggingface.co/LLMSIEM/logem-win},
  note={Fine-tuned from Qwen3-0.6B for Windows Event Log parsing}
}

Model Card Authors

Hassan Shehata/LLMSIEM

Model Card Contact

For questions about this model:

Email: [[email protected]]
LinkedIn: https://www.linkedin.com/in/hassan-shehata-503272172/
GitHub: [Your GitHub Profile]

Part of the LLMSIEM model family. For general security log parsing, see LLMSIEM/logem. For comprehensive Windows security operations, deploy both models together.
