Reading and writing files in Lua (processing a sensitive-word list)

article/2025/9/19 12:37:49

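Recently I needed to build a new sensitive-word system for our game. I went with the fairly common DFA (deterministic finite automaton) approach. I will leave the algorithm itself aside for now; what matters here is that it needs a matching sensitive-word list.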

When I got the word list, I found that it held more than 8,000 words, including many duplicates and many words in a head-containment relationship.

What is a head-contained word? Consider the following example.

Suppose the list that the DFA algorithm loads contains these two words:

Word 1: "ab"  Word 2: "abc"

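After loading, the sensitive word "ab" no longer exists as an entry of its own; it is overwritten by "abc". Our game does not replace sensitive words in a sentence with other characters (such as **). Instead, if a sentence is found to contain a sensitive word, it simply cannot be sent. So if "ab" is already a sensitive word, there is no need for "abc" to appear in the list at all. I therefore needed to clean up the list as follows: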

1. Keep only one copy of identical words

2. Delete every sensitive word that begins with another sensitive word
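To see why rule 2 loses nothing when messages are blocked rather than masked, here is a minimal sketch of a trie-style check. The names `newNode`, `buildTrie`, and `containsSensitiveWord` are made up for this illustration and are not the game's actual filter. The sketch matches byte by byte, which I assume is acceptable because the word list and the messages use the same encoding.

```lua
-- Hypothetical helper: one trie node with its children and an end-of-word flag
local function newNode()
    return { children = {}, isEnd = false }
end

-- Hypothetical helper: build a trie from a list of words, one byte per level
local function buildTrie(words)
    local root = newNode()
    for _, word in ipairs(words) do
        local node = root
        for i = 1, #word do
            local ch = word:sub(i, i)
            node.children[ch] = node.children[ch] or newNode()
            node = node.children[ch]
        end
        node.isEnd = true -- a sensitive word ends at this node
    end
    return root
end

-- Returns true as soon as any sensitive word is found anywhere in the text
local function containsSensitiveWord(root, text)
    for start = 1, #text do
        local node = root
        for i = start, #text do
            node = node.children[text:sub(i, i)]
            if not node then break end
            if node.isEnd then return true end
        end
    end
    return false
end

-- The list holds only "ab", yet a message containing "abc" is still rejected
local trie = buildTrie({ "ab" })
print(containsSensitiveWord(trie, "xxabcxx")) -- true
print(containsSensitiveWord(trie, "xxacxx"))  -- false
```

Because the check stops at the first hit, any message that contains "abc" has already been rejected when "ab" matched, so keeping "abc" in the list adds nothing.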

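But the existing list has more than 8,000 words, and I could not possibly go through them one by one. So I decided to process the original list with Lua's built-in io library, which saves a great deal of time. The code is as follows: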

local function getNewWord()
    local wordsDataInput  = {}
    local wordsDataOutput = {}

    -- Read the file
    -- Open the input file in read-only mode
    local file_input = assert(io.open("sensitive_words_input.txt", "r"))
    -- Read the file line by line
    local string_l = file_input:read("*l")
    while string_l ~= nil do
        table.insert(wordsDataInput, string_l)
        string_l = file_input:read("*l")
    end
    file_input:close()

    -- Process the data
    -- Head containment: blank out every word that starts with str but is longer than str
    local function ifIsHeadInTable(str)
        if str == "" then return end
        for i = 1, #wordsDataInput do
            -- The 4th argument (true) requests a plain-text search, so characters
            -- such as "." or "%" in a word are not interpreted as a Lua pattern
            local startIndex, endIndex = string.find(wordsDataInput[i], str, 1, true)
            -- A match that starts at index 1 and ends before the end of the
            -- string is a head containment
            if startIndex == 1 and endIndex ~= string.len(wordsDataInput[i]) then
                wordsDataInput[i] = ""
            end
        end
    end

    -- Is the same word already in the output table?
    local function isHasSameInTable(str)
        for _, value in ipairs(wordsDataOutput) do
            if value == str then
                return true
            end
        end
        return false
    end

    -- First remove the head-contained words
    for i = 1, #wordsDataInput do
        ifIsHeadInTable(wordsDataInput[i])
    end

    -- Then remove duplicates (and skip the entries blanked out above)
    for _, value in ipairs(wordsDataInput) do
        if value ~= "" and not isHasSameInTable(value) then
            table.insert(wordsDataOutput, value)
        end
    end

    -- Write the file
    -- Open the output file in write mode ("w" truncates any previous content)
    local file_output = assert(io.open("sensitive_words.txt", "w"))
    for _, word in ipairs(wordsDataOutput) do
        file_output:write(word .. "\n")
    end
    file_output:close()
end

getNewWord()
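The function above compares every word with every other word, which is perfectly adequate for a one-off cleanup of about 8,000 words. If the list were much larger, one possible alternative is to deduplicate with a set and then sort, so that each word only has to be compared with the last word kept. This is only a sketch under a few assumptions: `cleanWords` is a hypothetical helper name, the output order is allowed to change to sorted order, and the default "C" locale is in effect so that `table.sort` compares strings byte by byte.

```lua
-- Hypothetical helper: returns a cleaned, sorted copy of `words`
local function cleanWords(words)
    -- Rule 1: drop duplicates (and empty lines) using a set lookup
    local seen, unique = {}, {}
    for _, word in ipairs(words) do
        if word ~= "" and not seen[word] then
            seen[word] = true
            unique[#unique + 1] = word
        end
    end

    -- With byte-wise ordering, every word that starts with a kept word
    -- sorts directly after it, before any unrelated word
    table.sort(unique)

    -- Rule 2: keep a word only if it does not start with the last kept word
    local result, lastKept = {}, nil
    for _, word in ipairs(unique) do
        if not (lastKept and word:sub(1, #lastKept) == lastKept) then
            result[#result + 1] = word
            lastKept = word
        end
    end
    return result
end

local cleaned = cleanWords({ "abc", "ab", "xy", "ab", "abd", "x" })
print(table.concat(cleaned, ",")) -- ab,x
```

The trade-off is that the original order of the list is lost. If the order matters, the pairwise version above is the simpler choice.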