python3: Sentiment Analysis (Dictionary-Based, Chinese)

article/2025/7/19 9:13:36

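A note up front:
Two approaches to sentiment analysis are in common use today: dictionary-based and machine-learning-based. The former needs no labeled training data, so it can be counted as unsupervised; the latter is usually supervised.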

I have only just started learning sentiment analysis, so this post begins with **dictionary-based sentiment analysis**:

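Dictionaries: after searching around, I found several that seem to be commonly used, mainly the National Taiwan University NTUSD Simplified Chinese sentiment dictionary, the Tsinghua University Li Jun Chinese commendatory/derogatory dictionary, BosonNLP_sentiment_score, and HowNet (知网 hownet2007).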
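Since I am just starting out, I designed some fairly basic rules. They operate on the output of text preprocessing, which produces a token list (a "word list vector") for each document.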
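To make the expected input concrete, here is a minimal sketch of that token-list format. It assumes the text has already been segmented by a Chinese tokenizer (jieba is one common choice) and joined with spaces; `to_token_lists`, `STOPWORDS`, and `PUNCT` are hypothetical names used only for illustration, and the stopword set is a tiny stand-in for a real stopword file.

```python
import string

# Hypothetical stopword set; a real project would load a full stopword file.
STOPWORDS = {"的", "了", "是"}
# ASCII punctuation plus a few common Chinese punctuation marks.
PUNCT = set(string.punctuation) | set("，。！？、；：")


def to_token_lists(segmented_docs):
    """Turn space-separated, pre-segmented documents into lists of tokens."""
    token_lists = []
    for doc in segmented_docs:
        tokens = [
            tok for tok in doc.split()
            if tok not in STOPWORDS and tok not in PUNCT
        ]
        token_lists.append(tokens)
    return token_lists


docs = ["这 部 电影 不 好看 。", "服务 是 很 好 的 ！"]
print(to_token_lists(docs))
# [['这', '部', '电影', '不', '好看'], ['服务', '很', '好']]
```

Negation words such as 不 should stay out of the stopword list, otherwise the negation rules have nothing left to match.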

Code implementation:
Loading the dictionaries (I mainly use the NTUSD Chinese sentiment polarity dictionary):

# Define a function that loads a dictionary file (one word per line)
def dict_load(path):
    words = []
    # utf-8-sig strips a leading BOM if the file has one
    with open(path, encoding='utf-8-sig') as f:
        for line in f:
            word = line.strip()
            if word != '':  # skip blank lines
                words.append(word)
    return words

# Start loading the sentiment dictionaries
dict_file_path = 'XXXXXX\\'  # dictionary location; change as needed, and mind the escape characters
# Positive sentiment dictionary
pos_dict = dict_load(dict_file_path + '台湾大学NTUSD简体中文情感词典/ntusd-positive.txt')
# print(pos_dict)
print("==pos_dict loaded successfully==")
# Negative sentiment dictionary
neg_dict = dict_load(dict_file_path + '台湾大学NTUSD简体中文情感词典/ntusd-negative.txt')
# print(neg_dict)
print("==neg_dict loaded successfully==")
# Negation word dictionary
no_dict = dict_load(dict_file_path + '否定词典\\否定.txt')
# print(no_dict)
print("==no_dict loaded successfully==")
# Finished loading the sentiment dictionaries
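The counting step tests every token against these dictionaries, so it may be worth converting the loaded lists to sets before use. The sketch below uses toy lists as stand-ins for the loaded dictionaries (they are not real NTUSD entries) and also checks for words that appear in both polarity lists. Dropping such words is only one possible choice and an assumption on my part, not a rule from this post.

```python
# Toy stand-ins for the lists returned by dict_load (not real NTUSD entries).
pos_dict = ["好", "喜欢", "满意", "一般"]
neg_dict = ["差", "讨厌", "一般"]
no_dict = ["不", "没有"]

# Sets give average O(1) membership tests instead of scanning a list.
pos_set, neg_set, no_set = set(pos_dict), set(neg_dict), set(no_dict)

# Words listed as both positive and negative are ambiguous; inspect them
# before deciding which list (if any) should keep them.
overlap = pos_set & neg_set
print(sorted(overlap))  # ['一般']

# One possible choice (an assumption): drop ambiguous words from both sets.
pos_set -= overlap
neg_set -= overlap
```

If ambiguous words are left in place, a counting function that checks the negative dictionary first will treat them as negative.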

Now code up the rules described above. First settle the input format: the function below takes the token list of a single document.

# Count the positive and negative sentiment words in one document
# =============== sent is a tokenized document (a list or other sequence) ===============
def sent_count(sent, negdict, posdict, nodict):
    pos = 0
    neg = 0
    for i in range(len(sent)):
        if sent[i] in negdict:
            positive = False
        elif sent[i] in posdict:
            positive = True
        else:
            continue  # not a sentiment word
        # Count negation words among the (up to) two preceding tokens.
        # The slice is empty for i == 0, so a sentiment word at the very
        # start of the document is counted with its own polarity.
        negations = sum(1 for w in sent[max(0, i - 2):i] if w in nodict)
        # An odd number of negations flips the polarity:
        #   negation-negative           -> pos
        #   other-negative              -> neg
        #   negation-negation-negative  -> neg
        #   other-negation-negative     -> pos
        #   negation-other-negative     -> pos
        #   other-other-negative        -> neg
        # and symmetrically for positive words.
        if negations % 2 == 1:
            positive = not positive
        if positive:
            pos = pos + 1
        else:
            neg = neg + 1
    return pos, neg
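The negation rules are easier to check by hand on a few toy inputs. The sketch below is a compact restatement of the rule as I read it: a sentiment word's polarity flips when an odd number of the two preceding tokens are negation words. `count_polarity` and the tiny dictionaries are hypothetical names and data for illustration only.

```python
def count_polarity(sent, negdict, posdict, nodict):
    """Count positive and negative hits in one tokenized document.

    A sentiment word's polarity is flipped when an odd number of the
    (up to) two tokens before it are negation words.
    """
    pos = neg = 0
    for i, word in enumerate(sent):
        if word in negdict:
            is_positive = False
        elif word in posdict:
            is_positive = True
        else:
            continue  # not a sentiment word
        negations = sum(1 for w in sent[max(0, i - 2):i] if w in nodict)
        if negations % 2 == 1:
            is_positive = not is_positive
        if is_positive:
            pos += 1
        else:
            neg += 1
    return pos, neg


# Toy dictionaries for illustration only.
posdict = {"好看", "喜欢"}
negdict = {"失望", "难看"}
nodict = {"不", "没有"}

print(count_polarity(["失望"], negdict, posdict, nodict))                # (0, 1)
print(count_polarity(["电影", "不", "好看"], negdict, posdict, nodict))   # (0, 1)
print(count_polarity(["并", "不", "难看"], negdict, posdict, nodict))     # (1, 0)
print(count_polarity(["没有", "不", "喜欢"], negdict, posdict, nodict))   # (1, 0)
print(count_polarity(["不", "是", "不", "喜欢"], negdict, posdict, nodict))  # (0, 1)
```

The last call shows a limit of the two-token window: the first 不 is three tokens away from 喜欢, so the double negation is not detected and the word is counted as negative.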

To analyze a whole corpus (made up of multiple documents) in one call, use the version below, which adds one more outer loop:

# == sents is a list of lists; each sent is a tokenized document (a list or other sequence) ==
# Count the positive and negative sentiment words in each document
# Note: this function shares its name with the single-document version,
# so defining both in one module leaves only this one available.
def sent_count(sents, negdict, posdict, nodict):
    pos_list = []
    neg_list = []
    for sent in sents:
        pos = 0
        neg = 0
        for i in range(len(sent)):
            if sent[i] in negdict:
                positive = False
            elif sent[i] in posdict:
                positive = True
            else:
                continue  # not a sentiment word
            # Count negation words among the (up to) two preceding tokens.
            # The slice is empty for i == 0, so a sentiment word at the very
            # start of the document is counted with its own polarity.
            negations = sum(1 for w in sent[max(0, i - 2):i] if w in nodict)
            # An odd number of negations flips the polarity, for example
            # negation-negative -> pos and negation-negation-negative -> neg
            # (and symmetrically for positive words).
            if negations % 2 == 1:
                positive = not positive
            if positive:
                pos = pos + 1
            else:
                neg = neg + 1
        pos_list.append(pos)
        neg_list.append(neg)
    return pos_list, neg_list
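The function above stops at raw counts. One way to turn them into a per-document label is sketched below. The normalized score `(pos - neg) / (pos + neg)` and the zero threshold are my own assumptions rather than part of the rules in this post, `label_from_counts` is a hypothetical helper name, and the count lists are made-up sample values.

```python
def label_from_counts(pos, neg):
    """Map (pos, neg) counts to a score in [-1, 1] and a coarse label."""
    total = pos + neg
    if total == 0:
        return 0.0, "neutral"  # no sentiment words were found
    score = (pos - neg) / total
    if score > 0:
        return score, "positive"
    if score < 0:
        return score, "negative"
    return score, "neutral"


# Made-up counts in the shape returned by the multi-document function.
pos_list = [3, 0, 1, 0]
neg_list = [1, 2, 1, 0]
results = [label_from_counts(p, n) for p, n in zip(pos_list, neg_list)]
print(results)
# [(0.5, 'positive'), (-1.0, 'negative'), (0.0, 'neutral'), (0.0, 'neutral')]
```

Dividing by the total keeps long and short documents comparable, but it also means a document with a single sentiment word gets an extreme score, so a minimum-count cutoff may be worth considering.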