---
license: mit
language: ja
tags:
  - luke
  - sentiment-analysis
  - japanese
---

# This model is fine-tuned from luke-japanese-base-lite
This model is a fine-tuned version of studio-ousia/luke-japanese-base-lite that classifies Japanese text as positive or negative.
The training data consists of works by Natsume Soseki (Kokoro, Botchan, Sanshiro, etc.), labeled positive or negative using the Japanese Sentiment Polarity Dictionary
( http://www.cl.ecei.tohoku.ac.jp/Open_Resources-Japanese_Sentiment_Polarity_Dictionary.html ).
Given this training data, the model is expected to be more accurate on literary written Japanese than on colloquial Japanese.

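The dictionary-based labeling described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the training data: the four dictionary entries are hypothetical stand-ins for the real polarity dictionary, words are matched by plain substring search instead of morphological analysis, and the function name `label_sentence` is made up for this sketch. Labels follow the convention used by the model output, with 1 for positive and 0 for negative.

```python
# Hypothetical mini dictionary (assumption): real entries would come from the
# Japanese Sentiment Polarity Dictionary. +1 marks a positive word, -1 a negative one.
POLARITY = {"喜び": 1, "幸福": 1, "悲しみ": -1, "不安": -1}


def label_sentence(sentence, polarity=POLARITY):
    """Return 1 (positive), 0 (negative), or None when the score is zero."""
    # Sum the polarity of every dictionary word, counting repeated occurrences.
    score = sum(value * sentence.count(word) for word, value in polarity.items())
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None  # no usable signal; such sentences could be skipped


for s in ["私は喜びを感じた", "不安と悲しみが残った", "今日は雨だ"]:
    print(s, label_sentence(s))
```

Sentences with a zero score are returned as None here so that they can be left out of the training set; that choice is also an assumption of this sketch.
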
# What is LUKE? [1]
LUKE (Language Understanding with Knowledge-based Embeddings) is a new pre-trained contextualized representation of words and entities based on transformer. LUKE treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. LUKE adopts an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores.

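As a rough illustration of the entity-aware self-attention idea described above, the sketch below picks the query projection according to the types of the attending and attended tokens, while keys and values are shared. It is a single-head NumPy toy under stated assumptions, not the LUKE implementation: the function name, the matrix names (`Q_w2w`, `Q_w2e`, `Q_e2w`, `Q_e2e`), and the random inputs are all illustrative.

```python
import numpy as np

QUERY_NAMES = ("Q_w2w", "Q_w2e", "Q_e2w", "Q_e2e")


def entity_aware_attention(x, is_entity, w):
    """Single-head sketch.

    x: (n, d) token vectors; is_entity: n booleans;
    w: dict of (d, d) matrices named K, V and the four names in QUERY_NAMES.
    Returns the attended vectors and the attention weights.
    """
    n, d = x.shape
    keys, values = x @ w["K"], x @ w["V"]  # shared by words and entities
    # One query projection per (attending type, attended type) pair.
    queries = {name: x @ w[name] for name in QUERY_NAMES}
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            src = "e" if is_entity[i] else "w"
            dst = "e" if is_entity[j] else "w"
            scores[i, j] = queries[f"Q_{src}2{dst}"][i] @ keys[j] / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ values, weights


rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(3, d))  # made-up inputs: two words followed by one entity
is_entity = [False, False, True]
w = {name: rng.normal(size=(d, d)) for name in ("K", "V") + QUERY_NAMES}
output, weights = entity_aware_attention(x, is_entity, w)
print(output.shape, weights.sum(axis=1))
```

If all four query matrices are set to the same matrix, the sketch reduces to ordinary scaled dot-product self-attention, which is one way to see that the mechanism is an extension of it.
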
LUKE achieves state-of-the-art results on five popular NLP benchmarks including SQuAD v1.1 (extractive question answering), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), TACRED (relation classification), and Open Entity (entity typing).

luke-japanese is the Japanese version of LUKE, a knowledge-enhanced pre-trained Transformer model of words and entities. For details, see the GitHub repository.

luke-japanese includes Wikipedia entity embeddings, which are not used in typical NLP tasks. For tasks that take only word inputs, use the lite version.

# How to use
The model output "pre" provides "pre.logits", a tensor of the form tensor([[x, y]]).
Applying softmax to its single row, "num = SOFTMAX(pre.logits[0])", turns the two values into probabilities:
num[0] is the probability that the text is negative, and num[1] is the probability that it is positive.

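To make the softmax step concrete, here is a minimal pure-Python sketch of how two logits become a label. It mirrors what applying softmax to the single row of the logits computes; the helper names `softmax` and `to_label` and the example logit values are made up for illustration and are not part of the model.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits, threshold=0.5):
    """logits = [negative_logit, positive_logit], as in one row of the model output."""
    p_negative, p_positive = softmax(logits)
    label = "positive" if p_positive > threshold else "negative"
    return label, p_positive


example_logits = [-1.2, 0.8]  # made-up values, not real model output
label, p = to_label(example_logits)
print(label, round(p, 3))
```

With only two classes, comparing num[1] against 0.5 is the same as asking which of the two logits is larger.
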

```python
import torch
from torch import nn
from transformers import MLukeTokenizer

tokenizer = MLukeTokenizer.from_pretrained('studio-ousia/luke-japanese-base-lite')

# My_luke_model_pn.pth holds the whole pickled model, not just a state dict.
# On PyTorch 2.6 or later, torch.load needs weights_only=False for such files (trusted files only).
model = torch.load('C:\\[directory containing My_luke_model_pn.pth]\\My_luke_model_pn.pth')
model.eval()  # disable dropout for inference

text = input()

encoded_dict = tokenizer.encode_plus(
    text,
    return_attention_mask=True,  # also return the attention mask
    return_tensors='pt',         # return PyTorch tensors
)

with torch.no_grad():  # gradients are not needed for inference
    pre = model(encoded_dict['input_ids'], token_type_ids=None, attention_mask=encoded_dict['attention_mask'])

# pre.logits has shape (1, 2); softmax over its only row gives [P(negative), P(positive)].
SOFTMAX = nn.Softmax(dim=0)
num = SOFTMAX(pre.logits[0])

print(num[1].item())  # probability that the text is positive
if num[1] > 0.5:
    print('positive')
else:
    print('negative')
```

# Citation
[1] @inproceedings{yamada2020luke,
  title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
  author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
  booktitle={EMNLP},
  year={2020}
}