---
language: en
license: other
tags:
- finance
- risk-relation
- retrieval
- encoder
- feature-extraction
- stock-prediction
pipeline_tag: feature-extraction
---

# Financial Risk Identification through Dual-view Adaptation: Encoder

This repository hosts the pretrained encoder from the work **“Financial Risk Identification through Dual-view Adaptation.”** The model is designed to uncover **inter-firm risk relations** from financial text, supporting downstream tasks such as **retrieval**, **relation mining**, and **stock-signal experiments** where relation strength acts as a feature.

> **Files**
> - `pytorch_model.safetensors`: model weights
> - `config.json`: model configuration
> - `README.md` (this file)

---

## ✨ What’s special (Dual-view Adaptation)

The model aligns two complementary “views” of firm relations and adapts them during training:

- **Lexical view (`lex`)**: focuses on token/phrase-level and domain terms common in 10-K and financial news.
- **Temporal view (`time`)**: encourages stability/consistency of relations across reporting periods and evolving events.

A **two-view combination (“Best”)** integrates both signals and yields stronger retrieval quality and more stable risk-relation estimates. Ablations (`lex`, `time`) are also supported for analysis.

---

## 🔧 Intended Use

- **Feature extraction / sentence embeddings** for paragraphs, sections, or documents in financial filings.
- **Retrieval & ranking**: compute similarities between queries (e.g., “supply chain risk for X”) and candidate passages.
- **Risk-relation estimation**: aggregate cross-document similarities to produce pairwise firm relation scores used in downstream analytics.

> ⚠️ Not a generative LLM. Use it as an **encoder** (feature extractor).

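
The risk-relation estimation use case can be sketched as follows. This is a minimal illustration, not the aggregation used in the original work: it assumes each firm is represented by a list of passage embeddings and scores a firm pair by the mean of its `top_k` highest cross-firm cosine similarities. The function names, the `top_k` parameter, the firm labels, and the toy 3-dimensional vectors are all hypothetical; in practice the vectors would be embeddings produced by the encoder.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def firm_relation_score(passages_a, passages_b, top_k=3):
    """Mean of the top_k cross-firm passage similarities (an assumed aggregation)."""
    sims = sorted(
        (cosine(u, v) for u in passages_a for v in passages_b),
        reverse=True,
    )
    top = sims[:top_k]
    return sum(top) / len(top) if top else 0.0


def pairwise_relation_scores(firm_embeddings, top_k=3):
    """Score every unordered firm pair; keys are (firm_a, firm_b) tuples."""
    return {
        (a, b): firm_relation_score(firm_embeddings[a], firm_embeddings[b], top_k)
        for a, b in combinations(sorted(firm_embeddings), 2)
    }


# Toy vectors standing in for encoder outputs (hypothetical data).
toy = {
    "FIRM_A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "FIRM_B": [[1.0, 0.1, 0.0]],
    "FIRM_C": [[0.0, 0.0, 1.0]],
}

scores = pairwise_relation_scores(toy, top_k=1)
# Print firm pairs from strongest to weakest estimated relation.
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 3))
```

Averaging only the strongest matches keeps a few highly related passages from being diluted by many unrelated ones; other aggregations (a plain mean, a maximum, or a thresholded count) may suit a given analysis better.
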
---

## 🚀 Quickstart (Transformers)

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "william0816/Dual_View_Financial_Encoder"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModel.from_pretrained(MODEL_ID)

def mean_pool(last_hidden_state, attention_mask):
    # Mean-pool w.r.t. the attention mask
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts

texts = [
    "The company faces supplier concentration risk due to a single-source vendor.",
    "Management reported foreign exchange exposure impacting Q4 margins."
]

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**enc)
embeddings = mean_pool(outputs.last_hidden_state, enc["attention_mask"])

# Cosine similarity for retrieval
emb_norm = torch.nn.functional.normalize(embeddings, p=2, dim=1)
similarity = emb_norm @ emb_norm.T
print(similarity)
```

## 🖇️ Citation

If you use this model or the dual-view methodology, please cite:

```bibtex
@misc{financial_risk_dualview_2025,
  title        = {Financial Risk Identification through Dual-view Adaptation},
  author       = {Chiu, Wei-Ning and collaborators},
  year         = {2025},
  note         = {Preprint/Project},
  howpublished = {\url{https://huggingface.co/william0816/Dual_View_Financial_Encoder}}
}
```