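Contents

  • Experiment objective
  • Experiment content
  • Experiment steps
    • Program overview
      • Help function
      • Program structure diagram
    • Designing the database
      • Window design and creating a database
      • Listing database files
      • Deleting a database
      • Adding a txt file to a database
      • Deleting a file from a database
      • Adding content to a txt file in a database and saving it
    • Chinese word segmentation
      • Building the window
      • Entering the text
      • Chinese word segmentation
    • Stop-word removal
      • Designing the window and loading documents from txt_file
      • Removing stop words
    • Building the inverted index
      • Creating the window
      • Inverted index for all texts in a database
    • Multi-word joint query for document links
      • Building the window
      • Multi-word search
  • Experiment code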

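Experiment objective

Train students to apply the information retrieval techniques they have learned to the design and development of an information extraction system based on an information model, prompting them to review the theory and strengthening their hands-on skills.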

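Experiment content

(1) Environment: a development environment with Internet access
(2) Content and steps
Implement functions 1-4 below, and design a corresponding database and a suitable interface:
1. Chinese word segmentation of documents
2. Stop-word removal
3. Building the inverted index (term frequency is not considered)
4. Using the inverted index to run a multi-word joint query for document links (term frequency is not considered, only whether a word occurs)
5. Submit the experiment report.
(3) References
1. Interface reference: (screenshot not preserved)
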
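2. Chinese word segmentation algorithm:
Read a Chinese dictionary and segment the text with the forward maximum matching method, which tries to find the longest dictionary match for every word in the sentence.
(1) Starting at the first character of the sentence to be processed, try to match the longest dictionary word against the sentence;
(2) if nothing matches, shorten the candidate one character at a time and look it up in the dictionary again;
(3) repeat until the end of the sentence is reached.
For example, for the sentence “成都是四川省省会” ("Chengdu is the capital of Sichuan Province"):
(1) first look up all dictionary words that start with the character “成”;
(2) then pick the longest of those words that matches the sentence.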
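
The steps above can be sketched as a small standalone function. This is a minimal illustration rather than the program's own method: the name `forward_max_match` and the five-word `toy_dict` are assumptions made up for the example, and a single character is emitted as a fallback when nothing in the dictionary matches.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by forward maximum matching against a set of words."""
    if max_len is None:
        # Longest dictionary word bounds the candidate window.
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)
    words = []
    idx = 0
    while idx < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - idx), 0, -1):
            cand = text[idx:idx + size]
            # A single character is always accepted so the scan keeps moving.
            if size == 1 or cand in dictionary:
                words.append(cand)
                idx += size
                break
    return words


# Hypothetical toy dictionary, only for this example.
toy_dict = {"成都", "四川", "四川省", "省会", "是"}
print(forward_max_match("成都是四川省省会", toy_dict))
# ['成都', '是', '四川省', '省会']
```

Note that “四川省” wins over “四川” because the longer candidate is tried first, which is the defining property of the method.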

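Experiment steps

Note: the pasted functions take self because they are methods of classes in the source code; self was not removed when pasting.

Program overview

Help function

    # The snippets in this report assume these module-level imports:
    #   import os, shutil
    #   import tkinter as tk
    #   from tkinter import END, messagebox, scrolledtext
    def help_info(self):
        root7 = tk.Toplevel()
        root7.geometry("600x500+350+250")
        helpbook = ("Features of this program:\n1. Chinese word segmentation\n2. Stop-word removal"
                    "\n3. Build the inverted index\n4. Multi-word joint query\n5. Create, list and delete databases"
                    "\nWorkflow:\n1. Create a database\n2. Add files to the database\n3. Segment every document in the database\n"
                    "4. Build the inverted index for the database\n5. Run a multi-word joint query")
        scr = scrolledtext.ScrolledText(root7, width=67, height=27, font=("隶书", 12))
        scr.place(x=20, y=10)
        scr.insert(END, helpbook)
        button1 = tk.Button(root7, text="Close", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root7.destroy)
        button1.place(x=520, y=450)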

Program structure diagram

    def database_structure(self):
        global img  # keep a module-level reference so the image is not garbage-collected
        root8 = tk.Toplevel()
        root8.geometry("600x600+350+250")
        button1 = tk.Button(root8, text="Close", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root8.destroy)
        my_text = tk.Text(root8, width=80, height=35)
        my_text.pack(padx=10, pady=10)
        img = tk.PhotoImage(file="database.gif")
        my_text.image_create(END, image=img)
        button1.pack()

Designing the database

Window design and creating a database

    # Create a new database (shares root2 and e1 as globals)
    def database_set(self):
        global root2, e1
        root2 = tk.Toplevel()
        root2.title("Create database")
        root2.geometry("400x350+350+250")
        # Input box
        var = tk.StringVar()
        e1 = tk.Entry(root2, textvariable=var)
        e1.place(x=30, y=40, width=340, height=30)
        Label = tk.Label(root2, text='Enter a name for the new database', font=('黑体', 12, 'bold'))
        e1.insert(0, "")
        # OK and Cancel buttons
        button1 = tk.Button(root2, text=" OK ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=database.database_new)
        button2 = tk.Button(root2, text=" Cancel ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root2.destroy)
        Label.place(x=24, y=12)
        button1.place(x=12, y=300)
        button2.place(x=320, y=300)
        root2.mainloop()

    def database_new(self):
        database_name = e1.get()
        path = path_root + database_name
        exist_name = os.path.exists(path)  # True if the folder already exists
        if exist_name:
            messagebox.showerror('Error', 'A database with this name already exists, please choose another name', parent=root2)
        else:
            # os.makedirs also creates the missing parent directories
            os.makedirs(path_root + database_name + '/txt_file')
            os.makedirs(path_root + database_name + '/daopai_file')
            os.makedirs(path_root + database_name + '/wordcount_file')
            messagebox.showinfo('Message', 'Database created', parent=root2)
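
A "database" here is simply a folder with three subfolders: txt_file for the raw documents, wordcount_file for the segmented text, and daopai_file for the inverted index. The same layout can be created without any GUI, as in the sketch below; `create_database` and the temporary root folder are assumptions for illustration, not names from the program.

```python
import tempfile
from pathlib import Path

SUBDIRS = ("txt_file", "wordcount_file", "daopai_file")


def create_database(root, name):
    """Create <root>/<name>/ with its three subfolders.

    Returns False (and changes nothing) if the database already exists.
    """
    db = Path(root) / name
    if db.exists():
        return False
    for sub in SUBDIRS:
        (db / sub).mkdir(parents=True)
    return True


# Demo in a throwaway folder; "demo" is a hypothetical database name.
with tempfile.TemporaryDirectory() as tmp:
    print(create_database(tmp, "demo"))   # True: created
    print(create_database(tmp, "demo"))   # False: already exists
    print(sorted(p.name for p in (Path(tmp) / "demo").iterdir()))
```

Keeping the filesystem logic in a function that returns a status makes it testable on its own, and the tkinter callback then only has to show the matching message box.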

Listing database files

    # List the files in every database
    def file_name(self, file_dir):
        database_str = []
        database_name = []  # stays empty if file_dir does not exist
        for root, dirs, files in os.walk(file_dir):
            database_name = dirs  # the top-level folders are the databases
            break
        for i in database_name:
            txt_filePath = path_root + i + '/txt_file'
            for root, dirs, files in os.walk(txt_filePath):
                database_str.append([i, files])
        return database_str

    def database_info(self):
        root10 = tk.Toplevel()
        root10.geometry("600x350+350+250")
        scr = scrolledtext.ScrolledText(root10, width=70, height=18, font=("隶书", 12))
        button1 = tk.Button(root10, text="Back", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root10.destroy)
        button5 = tk.Button(root10, text="Exit", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root10.destroy)
        scr.place(x=20, y=10)
        button1.place(x=20, y=305)
        button5.place(x=520, y=305)
        scr.insert(END, "Database\t\tFiles\n")
        for i in self.file_name(path_root):
            database_name = i[0]
            txt_name = "\t".join(str(name) for name in i[1])
            scr.insert(END, database_name + "\t\t" + txt_name + "\n")

Deleting a database

    # Delete a database
    def database_del(self):
        global e2, root3
        root3 = tk.Toplevel()
        root3.title("Delete database")
        root3.geometry("400x350+350+250")
        var = tk.StringVar()
        e2 = tk.Entry(root3, textvariable=var)
        e2.place(x=30, y=40, width=340, height=30)
        e2.insert(0, "")
        Label = tk.Label(root3, text="Enter the name of the database to delete", font=('黑体', 12, 'bold'))
        button1 = tk.Button(root3, text="OK", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=database.database_lost)
        button2 = tk.Button(root3, text="Cancel", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root3.destroy)
        Label.place(x=24, y=12)
        button1.place(x=12, y=300)
        button2.place(x=320, y=300)
        root3.mainloop()

    def database_lost(self):
        database_name = e2.get()
        path = path_root + database_name
        exist_name = os.path.exists(path)
        if not exist_name:
            messagebox.showerror('Error', 'No such database, please check the database name', parent=root3)
        else:
            self.del_dir_all(path)
            shutil.rmtree(path)  # rmtree alone would also remove the whole tree
            messagebox.showinfo('Message', 'Database ' + database_name + ' deleted', parent=root3)

    def del_dir_all(self, path):
        # Empty the folder depth first: delete files, recurse into subfolders.
        # (The original re-entered the same folder after every deleted file,
        # which risks hitting the recursion limit on large folders.)
        for i in os.listdir(path):
            new_dir = os.path.join(path, i)
            if not os.path.isdir(new_dir):
                os.remove(new_dir)
            else:
                self.del_dir_all(new_dir)
                os.rmdir(new_dir)

Adding a txt file to a database

    # Add a document to a database
    def file_add(self):
        global e3, e4, root4
        root4 = tk.Toplevel()
        root4.title("Add file to database")
        root4.geometry("400x350+350+250")
        var1 = tk.StringVar()
        e3 = tk.Entry(root4, textvariable=var1)
        e3.place(x=30, y=40, width=340, height=30)
        e3.insert(0, "")
        Label1 = tk.Label(root4, text="Enter the database name", font=('黑体', 12, 'bold'))
        var2 = tk.StringVar()
        e4 = tk.Entry(root4, textvariable=var2)
        e4.place(x=30, y=120, width=340, height=30)
        e4.insert(0, "")
        Label2 = tk.Label(root4, text="Enter the name of the file to add", font=('黑体', 12, 'bold'))
        button1 = tk.Button(root4, text="OK", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=database.file_add_action)
        button2 = tk.Button(root4, text="Cancel", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root4.destroy)
        Label1.place(x=24, y=12)
        Label2.place(x=24, y=80)
        button1.place(x=12, y=300)
        button2.place(x=320, y=300)
        root4.mainloop()

    def file_add_action(self):
        database_name = e3.get()
        file_name = e4.get()
        path = path_root + database_name + "/txt_file/" + file_name + ".txt"
        exist_name = os.path.exists(path)
        exist_database = os.path.exists(path_root + database_name)
        if exist_name:
            messagebox.showerror('Error', 'A file with this name already exists, please choose another name', parent=root4)
        elif not exist_database:
            messagebox.showerror('Error', 'No such database', parent=root4)
        else:
            # Create an empty file; the with block closes it automatically
            with open(path, "w"):
                pass
            messagebox.showinfo('Message', file_name + ' added\nSegment this file, then rebuild the inverted index for the database', parent=root4)

Deleting a file from a database

    # Delete a file from a database
    def file_del(self):
        global e5, e6, root5
        root5 = tk.Toplevel()
        root5.title("Delete file from database")
        root5.geometry("400x350+350+250")
        var1 = tk.StringVar()
        e5 = tk.Entry(root5, textvariable=var1)
        e5.place(x=30, y=40, width=340, height=30)
        e5.insert(0, "")
        Label1 = tk.Label(root5, text="Enter the database name", font=('黑体', 12, 'bold'))
        var2 = tk.StringVar()
        e6 = tk.Entry(root5, textvariable=var2)
        e6.place(x=30, y=120, width=340, height=30)
        e6.insert(0, "")
        Label2 = tk.Label(root5, text="Enter the name of the file to delete", font=('黑体', 12, 'bold'))
        button1 = tk.Button(root5, text="OK", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=database.files_del)
        button2 = tk.Button(root5, text="Cancel", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root5.destroy)
        Label1.place(x=24, y=12)
        Label2.place(x=24, y=80)
        button1.place(x=12, y=300)
        button2.place(x=320, y=300)
        root5.mainloop()

    def files_del(self):
        database_name = e5.get()
        file_name = e6.get()
        path = path_root + database_name + "/txt_file/" + file_name + ".txt"
        # Hard-coded root: change "D:/" to match your own path_root.
        # (The "/" after wordcount_file was missing in the original.)
        wordcount_path = "D:/" + database_name + "/wordcount_file/" + file_name + ".txt"
        exist_name = os.path.exists(path)
        exist_wordcount = os.path.exists(wordcount_path)
        if not exist_name:
            print(path)
            messagebox.showerror('Error', 'No such file, please check the file name', parent=root5)
        else:
            os.remove(path)
            # Also remove the segmented copy, if there is one
            if exist_wordcount:
                os.remove(wordcount_path)
            messagebox.showinfo('Message', file_name + ' deleted\nPlease rebuild the inverted index for this database', parent=root5)

Adding content to a txt file in a database and saving it

    def file_change(self):
        global e14, e15, scr7, root9
        root9 = tk.Toplevel()
        root9.geometry("600x600+200+150")
        var = tk.StringVar()
        e14 = tk.Entry(root9, textvariable=var)
        e14.place(x=20, y=40, width=400, height=30)
        e14.insert(0, "")
        Label1 = tk.Label(root9, text="Enter the database name", font=('黑体', 12, 'bold'))
        var1 = tk.StringVar()
        e15 = tk.Entry(root9, textvariable=var1)
        e15.place(x=20, y=110, width=400, height=30)
        e15.insert(0, "")
        Label2 = tk.Label(root9, text="Enter the file name", font=('黑体', 12, 'bold'))
        scr7 = scrolledtext.ScrolledText(root9, width=75, height=29, font=("隶书", 10))
        scr7.place(x=20, y=160)
        button1 = tk.Button(root9, text=" Back ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root9.destroy)
        button2 = tk.Button(root9, text=" Open ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=database.file_read)
        button3 = tk.Button(root9, text=" Save ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=database.file_save)
        button5 = tk.Button(root9, text=" Exit ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root9.destroy)
        Label1.place(x=20, y=12)
        Label2.place(x=20, y=80)
        button1.place(x=20, y=550)
        button2.place(x=520, y=18)
        button3.place(x=285, y=550)
        button5.place(x=520, y=550)

    def file_read(self):
        global fileread_path
        scr7.delete(1.0, END)
        database_name = e14.get()
        file_name = e15.get()
        fileread_path = path_root + database_name + "/txt_file/" + file_name + ".txt"
        if not os.path.exists(fileread_path):
            messagebox.showerror('Error', 'No such file, please check the file name', parent=root9)
        else:
            with open(fileread_path, "r", encoding="utf-8-sig") as f:
                text = f.read()
            if text == "":
                scr7.insert(END, file_name + " is empty! Type the content into this text box, then save")
            scr7.insert(END, text)
            scr7.see(END)
            scr7.update()

    def file_save(self):
        # fileread_path is set by file_read(), so a file must be opened first
        filesave_path = fileread_path
        text = scr7.get(1.0, END)
        with open(filesave_path, "w", encoding='utf-8-sig') as f:
            f.write(text)
        messagebox.showinfo('Notice', 'File saved', parent=root9)
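
The read and save functions open files with encoding="utf-8-sig". A likely reason is that text files saved by some Windows editors start with a UTF-8 byte order mark (BOM); "utf-8-sig" strips it on read and writes one on save. The sketch below shows the effect; `save_text`, `load_text` and the file name demo.txt are hypothetical names for this example only.

```python
import os
import tempfile


def save_text(path, text):
    # "utf-8-sig" writes the 3-byte BOM before the text.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


def load_text(path):
    # "utf-8-sig" drops a leading BOM if one is present.
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    save_text(demo_path, "成都是四川省省会")
    with open(demo_path, "rb") as f:
        raw = f.read()
    with open(demo_path, "r", encoding="utf-8") as f:
        plain = f.read()
    print(raw[:3] == b"\xef\xbb\xbf")                # True: file starts with the BOM
    print(load_text(demo_path) == "成都是四川省省会")  # True: BOM stripped on read
    print(plain.startswith("\ufeff"))                # True: plain utf-8 keeps it
```

If the BOM were left in, it would stick to the first character of the document and that first token would never match a dictionary entry during segmentation.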

Chinese word segmentation

Building the window

    def button1_click(self):
        global scr1, scr2, frame1
        frame1 = tk.Frame(root1, height=600, width=800)
        frame1.pack(side='top')
        button1 = tk.Button(frame1, text=" Back ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame1.destroy)
        button2 = tk.Button(frame1, text=" Add ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=button1_tk.wordcount_text)
        button3 = tk.Button(frame1, text=" Start ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=button1_tk.text_get)
        button5 = tk.Button(frame1, text=" Exit ", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame1.destroy)
        Label1 = tk.Label(frame1, text="Text to segment", font=('黑体', 12, 'bold'))
        Label2 = tk.Label(frame1, text="Segmentation result", font=('黑体', 12, 'bold'))
        scr1 = scrolledtext.ScrolledText(frame1, width=108, height=15, font=("隶书", 10), highlightthickness=1, bd=4)
        scr2 = scrolledtext.ScrolledText(frame1, width=108, height=18, font=("隶书", 10), highlightthickness=1, bd=4)
        button1.place(x=20, y=550)
        button2.place(x=700, y=0)
        button3.place(x=700, y=250)
        button5.place(x=700, y=550)
        Label1.place(x=20, y=18)
        Label2.place(x=20, y=280)
        scr1.place(x=20, y=38)
        scr2.place(x=20, y=300)
        # root1.mainloop()

Entering the text

    def wordcount_text(self):
        global e7, e8, root5
        root5 = tk.Toplevel()
        root5.title("Choose the text to segment")
        root5.geometry("400x350+300+200")
        var1 = tk.StringVar()
        e7 = tk.Entry(root5, textvariable=var1)
        e7.place(x=30, y=40, width=340, height=30)
        e7.insert(0, "")
        Label1 = tk.Label(root5, text="Enter the database name", bd=5, font=('黑体', 12, 'bold'))
        var2 = tk.StringVar()
        e8 = tk.Entry(root5, textvariable=var2)
        e8.place(x=30, y=120, width=340, height=30)
        e8.insert(0, "")
        Label2 = tk.Label(root5, text="Enter the name of the text file to segment", bd=5, font=('黑体', 12, 'bold'))
        button1 = tk.Button(root5, text="OK", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=button1_tk.text_chose)
        button2 = tk.Button(root5, text="Cancel", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root5.destroy)
        Label1.place(x=24, y=12)
        Label2.place(x=24, y=80)
        button1.place(x=12, y=300)
        button2.place(x=320, y=300)
        root5.mainloop()

    # Load the text to segment
    def text_chose(self):
        global count_txtSavepath, count_txtSaveflage
        scr1.delete(1.0, END)
        database_name = e7.get()
        file_name = e8.get()
        path = path_root + database_name + "/txt_file/" + file_name + ".txt"
        # The segmented result will be saved under wordcount_file with the same file name
        count_txtSavepath = path_root + database_name + "/wordcount_file/" + file_name + ".txt"
        text_exits = os.path.exists(path)
        if not text_exits:
            messagebox.showerror('Error', 'No such document, please check the database and file names', parent=root5)
        else:
            with open(path, "r", encoding="utf-8-sig") as f:
                text = f.read()
            scr1.insert(END, text)
            scr1.see(END)
            scr1.update()
            count_txtSaveflage = True
            root5.destroy()

Chinese word segmentation

    def text_get(self):
        global count_txtSaveflage, count_txtSavepath
        scr2.delete(1.0, END)
        # Load the dictionary, one word per line; a set makes each lookup O(1)
        with open("知网中文词典.txt", "r", encoding="utf-8-sig") as dic_txt:
            dic = set(line.strip() for line in dic_txt)
        texts = scr1.get(1.0, "end")
        # Segment once and reuse the result for both display and saving
        result = "\t".join(button1_tk.max_match_segment(texts.strip(), dic))
        scr2.insert(END, result)
        scr2.see(END)
        scr2.update()
        if count_txtSaveflage:
            with open(count_txtSavepath, "w", encoding="utf-8-sig") as fpo:
                fpo.write(result)

    # Forward maximum matching
    def max_match_segment(self, text, dic):
        window_size = 5  # longest candidate word, in characters
        chars = text
        words = []
        idx = 0
        while idx < len(text):
            matched = False
            # Try the longest candidate first, then shrink it one character at a time
            for i in range(window_size, 0, -1):
                cand = chars[idx:idx + i]
                if cand in dic:
                    words.append(cand)
                    matched = True
                    break
            if not matched:
                # No dictionary word starts here: emit the single character
                i = 1
                words.append(chars[idx])
            idx += i
        return words

Stop-word removal

Designing the window and loading documents from txt_file

    def button2_click(self):
        global scr3, scr4, scr5, frame2
        frame2 = tk.Frame(root1, height=600, width=800)
        frame2.pack(side='top')
        button1 = tk.Button(frame2, text="Back", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame2.destroy)
        button2 = tk.Button(frame2, text="Add", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=button2_tk.checkWord_text)
        button3 = tk.Button(frame2, text="Start", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=button2_tk.check_word)
        button5 = tk.Button(frame2, text="Exit", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame2.destroy)
        Label1 = tk.Label(frame2, text="Text to filter", font=('黑体', 12, 'bold'))
        Label2 = tk.Label(frame2, text="Result", font=('黑体', 12, 'bold'))
        Label3 = tk.Label(frame2, text="Removed stop words", font=('黑体', 12, 'bold'))
        scr3 = scrolledtext.ScrolledText(frame2, width=108, height=15, font=("隶书", 10))
        scr4 = scrolledtext.ScrolledText(frame2, width=56, height=18, font=("隶书", 10))
        scr5 = scrolledtext.ScrolledText(frame2, width=45, height=18, font=("隶书", 10))
        button1.place(x=20, y=550)
        button2.place(x=700, y=0)
        button3.place(x=700, y=250)
        button5.place(x=700, y=550)
        Label1.place(x=20, y=18)
        Label2.place(x=20, y=280)
        Label3.place(x=450, y=280)
        scr3.place(x=20, y=38)
        scr4.place(x=20, y=300)
        scr5.place(x=450, y=300)
        # frame2.mainloop()

    # Open a document
    def checkWord_text(self):
        global e9, e10, root6
        root6 = tk.Toplevel()
        root6.title("Choose the text to filter")
        root6.geometry("400x350+300+200")
        var1 = tk.StringVar()
        e9 = tk.Entry(root6, textvariable=var1)
        e9.place(x=30, y=40, width=340, height=30)
        Label_1 = tk.Label(root6, text="Enter the database name", bd=5, font=('黑体', 12, 'bold'))
        e9.insert(0, "")
        var2 = tk.StringVar()
        e10 = tk.Entry(root6, textvariable=var2)
        e10.place(x=30, y=120, width=340, height=30)
        Label_2 = tk.Label(root6, text="Enter the text file name", bd=5, font=('黑体', 12, 'bold'))
        e10.insert(0, "")
        button1 = tk.Button(root6, text="OK", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=button2_tk.text1_chose)
        button2 = tk.Button(root6, text="Cancel", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=root6.destroy)
        Label_1.place(x=24, y=12)
        Label_2.place(x=24, y=80)
        button1.place(x=12, y=300)
        button2.place(x=320, y=300)
        root6.mainloop()

Removing stop words

    def text1_chose(self):
        database_name = e9.get()
        file_name = e10.get()
        # Read the already segmented text from wordcount_file
        path = path_root + database_name + "/wordcount_file/" + file_name + ".txt"
        text_exits = os.path.exists(path)
        if not text_exits:
            messagebox.showerror('Error', 'No such document, please check the database and file names', parent=root6)
        else:
            with open(path, "r", encoding="utf-8-sig") as f:
                text = f.read()
            scr3.insert(END, text)
            scr3.see(END)
            scr3.update()
            root6.destroy()

    # Remove stop words
    def check_word(self):
        check_flage = False
        scr4.delete(1.0, END)
        scr5.delete(1.0, END)
        out_txt = []    # tokens that are kept
        find_word = []  # stop words that were removed
        with open("中文停用词表.txt", "r", encoding="utf-8-sig") as dic_txt:
            dic = [line.strip() for line in dic_txt]
        text1 = scr3.get(1.0, "end")
        text1 = text1.strip()
        # Segmented text is tab-separated; otherwise the loop below walks character by character
        if "\t" in text1:
            text1 = text1.split("\t")
            check_flage = True
        for word in text1:
            if word in dic:
                find_word.append(word)
            else:
                out_txt.append(word)
        if check_flage:
            scr4.insert(END, "\t".join(str(i) for i in out_txt))
        else:
            scr4.insert(END, "".join(str(i) for i in out_txt))
        scr5.insert(END, "\t".join(str(i) for i in find_word))
        scr4.see(END)
        scr5.see(END)
        scr4.update()
        scr5.update()
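
Stripped of the GUI, stop-word removal is a single pass that splits the tokens into "kept" and "removed". The sketch below illustrates this under assumptions: `remove_stop_words` is a hypothetical helper, and the two-word stop list stands in for the real 中文停用词表.txt.

```python
def remove_stop_words(tokens, stop_words):
    """Split tokens into (kept, removed), preserving their order."""
    kept, removed = [], []
    for tok in tokens:
        if tok in stop_words:
            removed.append(tok)
        else:
            kept.append(tok)
    return kept, removed


# Tab-separated text, as written by the segmentation step.
segmented = "成都\t是\t四川省\t的\t省会"
stop_words = {"是", "的"}  # hypothetical stop list for this example
kept, removed = remove_stop_words(segmented.split("\t"), stop_words)
print(kept)     # ['成都', '四川省', '省会']
print(removed)  # ['是', '的']
```

Holding the stop list in a set keeps each membership test constant-time, which matters once the list has more than a few hundred entries.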

Building the inverted index

Creating the window

    def button3_click(self):
        global e11, scr6, frame3
        frame3 = tk.Frame(root1, height=600, width=800)
        frame3.pack(side='top')
        var = tk.StringVar()
        e11 = tk.Entry(frame3, textvariable=var)
        e11.place(x=20, y=50, width=650, height=30)
        Label = tk.Label(frame3, text='Enter a database name; an inverted index will be built for all its texts', bd=5, font=('黑体', 12, 'bold'))
        e11.insert(0, "")
        scr6 = scrolledtext.ScrolledText(frame3, width=93, height=35, font=("隶书", 10))
        scr6.place(x=20, y=85)
        button1 = tk.Button(frame3, text="Back", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame3.destroy)
        button2 = tk.Button(frame3, text="OK", bd=5, font=('黑体', 12, 'bold'), command=button3_tk.daopaitext_chose)
        button5 = tk.Button(frame3, text="Exit", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame3.destroy)
        Label.place(x=20, y=20)
        button1.place(x=20, y=550)
        button2.place(x=700, y=40)
        button5.place(x=700, y=550)
        # frame3.mainloop()

Inverted index for all texts in a database

    def daopaitext_chose(self):
        scr6.delete(1.0, END)
        daopai_dic = {}  # word -> list of file names that contain it
        wrodcount_filesName = []
        txt_filesName = []
        database_name = e11.get()
        path = path_root + database_name
        txt_path = path_root + database_name + "/txt_file"
        wrodcount_filePath = path_root + database_name + "/wordcount_file"
        daopai_txtPath = path_root + database_name + "/daopai_file/" + database_name + ".txt"
        for root, dirs, files in os.walk(wrodcount_filePath):
            wrodcount_filesName = files
        for root, dirs, files in os.walk(txt_path):
            txt_filesName = files
        database_exits = os.path.exists(path)
        wrodcount_filenum = len(wrodcount_filesName)
        txt_filesnum = len(txt_filesName)
        if not database_exits:
            messagebox.showerror('Error', 'No such database, please check the database name', parent=frame3)
        elif wrodcount_filesName == []:
            messagebox.showerror('Error', 'No segmented files found to build the inverted index from.\nPlease segment the files in this database first', parent=frame3)
        else:
            if wrodcount_filenum < txt_filesnum:
                messagebox.showwarning('Warning', 'Some files in this database have not been segmented.\nThe inverted index may be incomplete, '
                                       'so later queries may miss those documents', parent=frame3)
            for i in wrodcount_filesName:
                wrodcount_txtPath = wrodcount_filePath + "/" + i
                with open(wrodcount_txtPath, "r", encoding="utf-8-sig") as f:
                    content = f.read().strip().split("\t")
                for j in content:
                    # Record each file at most once per word (term frequency is ignored)
                    if j not in daopai_dic:
                        daopai_dic[j] = [i]
                    elif i not in daopai_dic[j]:
                        daopai_dic[j].append(i)
            scr6.insert(END, "Word" + "\t\t" + "Documents containing it\n")
            # Written as gbk because txt_find() reads the index with encoding="gbk";
            # the original relied on the platform default when appending
            with open(daopai_txtPath, "w", encoding="gbk") as f:
                for key in daopai_dic:
                    values = "\t".join(str(i) for i in daopai_dic[key])
                    f.write(key + "\t\t" + values + "\n")
                    scr6.insert(END, key + "\t\t" + values + "\n")
            scr6.see(END)
            scr6.update()
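
The core of this step is the mapping from each word to the documents that contain it. A compact sketch, independent of the GUI and of the on-disk format, follows; `build_inverted_index` and the two sample documents are assumptions for illustration.

```python
def build_inverted_index(docs):
    """Map each token to the sorted list of documents containing it.

    docs: {file_name: list of tokens}. Term frequency is ignored:
    a document is listed once per token however often the token occurs.
    """
    index = {}
    for name, tokens in docs.items():
        for tok in set(tokens):
            index.setdefault(tok, set()).add(name)
    return {tok: sorted(names) for tok, names in index.items()}


# Hypothetical segmented documents.
docs = {
    "a.txt": ["成都", "省会", "成都"],
    "b.txt": ["四川", "省会"],
}
index = build_inverted_index(docs)
print(index["省会"])  # ['a.txt', 'b.txt']
print(index["成都"])  # ['a.txt']
```

Collecting the postings in sets avoids the "is this file already listed?" check that a list-based version needs for every token.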

Multi-word joint query for document links

Building the window

    def button4_click(self):
        global e12, e13, scr6, frame4
        frame4 = tk.Frame(root1, height=600, width=800)
        frame4.pack(side='top')
        var = tk.StringVar()
        e12 = tk.Entry(frame4, textvariable=var)
        e12.place(x=20, y=40, width=650, height=30)
        Label_1 = tk.Label(frame4, text='Enter several words separated by Chinese commas (,)', bd=5, font=('黑体', 12, 'bold'))
        e12.insert(0, "")
        var1 = tk.StringVar()
        e13 = tk.Entry(frame4, textvariable=var1)
        e13.place(x=20, y=100, width=650, height=30)
        Label_2 = tk.Label(frame4, text='Enter the database name', bd=5, font=('黑体', 12, 'bold'))
        e13.insert(0, "")
        scr6 = scrolledtext.ScrolledText(frame4, width=93, height=30, font=("隶书", 10))
        scr6.place(x=20, y=150)
        button1 = tk.Button(frame4, text="Back", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame4.destroy)
        button2 = tk.Button(frame4, text="Search", bd=5, font=('黑体', 12, 'bold'), command=button4_tk.txt_find)
        button5 = tk.Button(frame4, text="Exit", bd=5, font=('黑体', 12, 'bold'), bg="light grey", command=frame4.destroy)
        Label_1.place(x=20, y=10)
        Label_2.place(x=20, y=70)
        button1.place(x=20, y=550)
        button2.place(x=700, y=100)
        button5.place(x=700, y=550)
        frame4.mainloop()

Multi-word search

    # Multi-word search
    def txt_find(self):
        scr6.delete(1.0, END)
        database_name = e13.get()
        words = e12.get()
        word_list = words.split(",")  # the separator is the full-width Chinese comma
        a = os.path.exists(path_root + database_name)
        b = "," not in words
        if (not a and b):
            messagebox.showerror('Error', 'No such database, and the words are not separated by Chinese commas', parent=frame4)
        elif not a:
            messagebox.showerror('Error', 'No such database', parent=frame4)
        elif b:
            messagebox.showwarning('Warning', 'The words are not separated by Chinese commas', parent=frame4)
        else:
            # Load the inverted index: each line is word, two tabs, then tab-separated file names
            dic = {}
            path = path_root + database_name + "/daopai_file/" + database_name + ".txt"
            with open(path, "r", encoding="gbk") as f:
                lines = f.readlines()
            for line in lines:
                line = line.replace("\t\t", ",")
                line = line.replace("\t", ",")
                line = line.strip()
                line = line.split(",")
                key = line[0]
                values = line[1:]
                dic[key] = values
            scr6.insert(END, "Word\tFile name\t\tFile path\n")
            for i in word_list:
                if i not in dic:
                    messagebox.showerror('Error', 'Word not found in the index: ' + i, parent=frame4)
                else:
                    list_1 = dic[i]
                    count = 0
                    path_len = len(list_1)
                    for path in list_1:
                        # A document qualifies only if every query word occurs in it
                        flage = all(j in dic and path in dic[j] for j in word_list)
                        if flage:
                            scr6.insert(END, i + '\t' + path + "\t\t" + path_root + database_name + "/txt_file/" + path + "\n")
                        else:
                            count += 1
                    if count == path_len:
                        messagebox.showerror('Error', 'No document contains all of the words', parent=frame4)
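
Because only occurrence matters, a joint query amounts to intersecting the posting lists of the query words. The sketch below shows that idea on an in-memory index; `query_all` and the sample index are hypothetical and are not how the program stores its data.

```python
def query_all(index, words):
    """Return the sorted file names that contain every word in words."""
    if not words:
        return []
    result = None
    for w in words:
        postings = set(index.get(w, ()))  # unknown word -> empty posting list
        result = postings if result is None else result & postings
        if not result:
            return []  # no document can match any more
    return sorted(result)


# Hypothetical inverted index: word -> files containing it.
index = {
    "成都": ["a.txt", "c.txt"],
    "省会": ["a.txt", "b.txt"],
}
print(query_all(index, "成都,省会".split(",")))  # ['a.txt']
print(query_all(index, ["成都", "北京"]))          # []
```

Each matching document is reported once, whereas looping over every query word and re-checking its posting list prints the same document once per word.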

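Experiment code

Usage notes:

path_root in the main function must be changed, and so must wordcount_path in files_del() of class DataBase. Create the folders according to your own path_root. As for the Chinese stop-word list (中文停用词表.txt) and the HowNet Chinese dictionary (知网中文词典.txt), you can message me privately for them or find them yourself. The program structure diagram database.gif is shown above.

System Analysis Experiment (Python): experiment code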
