Word Frequency Count: Web Version

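Requirement: port the program to the web platform and accept input by letting the user upload a TXT file. Keeping and maintaining the console version is recommended (but not mandatory), since it helps with testing.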

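An upload control is placed on the page, and the upload is received in a servlet. What the servlet gets is a byte stream, which is converted to characters and then counted by the existing code.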
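The byte-to-character conversion described above can be sketched roughly as follows. This is only an illustration, not the author's code: the class `ByteStreamWordCount`, the method `countWords`, the choice of UTF-8, and the `TreeMap` counter are all assumptions made for this example (the original program builds a character tree instead of a map).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original project.
class ByteStreamWordCount {

    // Decodes the byte stream as UTF-8 (an assumption) and counts runs of letters.
    static Map<String, Integer> countWords(InputStream input) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        StringBuilder word = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isLetter(c)) {
                    word.append((char) c);
                } else if (word.length() > 0) {
                    // A non-letter ends the current word.
                    counts.merge(word.toString(), 1, Integer::sum);
                    word.setLength(0);
                }
            }
        }
        // Count the final word if the input ends with a letter.
        if (word.length() > 0) {
            counts.merge(word.toString(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the uploaded file's bytes.
        byte[] upload = "the cat and the hat".getBytes(StandardCharsets.UTF_8);
        // prints {and=1, cat=1, hat=1, the=2}
        System.out.println(countWords(new ByteArrayInputStream(upload)));
    }
}
```

Wrapping the stream in an `InputStreamReader` with an explicit charset lets the decoder handle multi-byte characters, so the counting code only ever sees whole characters.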

The code for the JSP upload page is as follows:

<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Insert title here</title>
</head>
<body>
<table>
  <tr>
    <td>
      <form action="server/CountWordServlet" method="post" enctype="multipart/form-data">
        Please upload the file to analyze:
        <input type="file" name="sourceFile"/>
        <input type="submit" value="Upload">
      </form>
    </td>
  </tr>
</table>
</body>
</html>

The page that displays the results is as follows:

<%@page import="com.server.servlet.Word"%>
<%@page import="java.util.ArrayList"%>
<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<%ArrayList<Word> list=(ArrayList<Word>)request.getAttribute("list"); %>
<title>Insert title here</title>
</head>
<body>
<table>
<% if (list != null && list.size() != 0) { %>
  <tr><td>Word</td><td>Count</td></tr>
<%   for (int i = 0; i < list.size(); i++) {
       String word = list.get(i).getWord();
       int num = list.get(i).getNum(); %>
  <tr><td><%=word%></td><td><%=num%></td></tr>
<%   }
   } else { %>
  <tr><td>The file contains no words, or the file does not exist.</td></tr>
<% } %>
</table>
</body>
</html>
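The results page relies on a `Word` class that exposes `getWord()` and `getNum()`, but the post does not show that class. A minimal sketch of what it might look like is given below. The fields, the constructor, the helper `WordSortSketch.byFrequency`, and the descending-count ordering are all assumptions made for illustration; only the two getter names come from the page above.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the Word bean; only getWord() and getNum() appear in the post.
class Word {
    private final String word;
    private final int num;

    Word(String word, int num) {
        this.word = word;
        this.num = num;
    }

    public String getWord() { return word; }
    public int getNum() { return num; }
}

// Hypothetical helper showing one way to order the list before display.
class WordSortSketch {

    // Returns a new list ordered by descending count, ties broken alphabetically.
    static List<Word> byFrequency(List<Word> words) {
        List<Word> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> a.getNum() != b.getNum()
                ? Integer.compare(b.getNum(), a.getNum())
                : a.getWord().compareTo(b.getWord()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>();
        words.add(new Word("cat", 1));
        words.add(new Word("the", 2));
        words.add(new Word("and", 1));
        for (Word w : byFrequency(words)) {
            System.out.println(w.getWord() + " " + w.getNum());
        }
    }
}
```

The getters are public because the JSP compiles into a different package than the bean and can only call public members.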

The servlet code is as follows:

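import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class CountWordServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            request.setCharacterEncoding("utf-8");
            ArrayList<Word> list = new ArrayList<>();
            DiskFileItemFactory factory = new DiskFileItemFactory();
            ServletFileUpload upload = new ServletFileUpload(factory);
            // Iterate over the parts of the multipart request as streams.
            FileItemIterator iterator = upload.getItemIterator(request);
            while (iterator.hasNext()) {
                // try-with-resources closes each part's stream after counting.
                try (InputStream input = iterator.next().openStream()) {
                    WordCountFreq wcf = new WordCountFreq();
                    list = (ArrayList<Word>) wcf.sortAndOutput(input);
                    request.setAttribute("list", list);
                }
            }
            System.out.println("Success!");
        } catch (FileUploadException e) {
            e.printStackTrace();
        }
        response.setContentType("text/html;charset=utf-8");
        request.getRequestDispatcher("/show.jsp").forward(request, response);
    }
}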

The key method of the counting process, sortAndOutput(), is shown below:

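// Imports needed by this method (it belongs to the WordCountFreq class).
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public List<Word> sortAndOutput(InputStream input) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(input);
    byte[] buf = new byte[1024];
    int len = -1;
    String temp = "";
    String lastWord = "";
    while ((len = bis.read(buf)) != -1) {
        // Convert the bytes just read into a string (platform default charset).
        String str = new String(buf, 0, len);
        // Prepend the partial word carried over from the previous chunk.
        temp = lastWord + str;
        // Hold back the trailing run of letters: the word may continue in the
        // next chunk. The end > 0 check keeps an all-letter chunk in bounds.
        int end = temp.length();
        while (end > 0 && Character.isLetter(temp.charAt(end - 1))) {
            end--;
        }
        lastWord = temp.substring(end);
        temp = temp.substring(0, end);
        root = generateCharTree(temp);
    }
    // At end of input, lastWord may still hold the final word; it must be counted too.
    // The remainder of the method is not shown in the original post.
}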
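Because the method above reads the input in fixed-size chunks, a word can be cut in two at a chunk boundary, which is why the trailing letters are carried into the next chunk. The sketch below isolates that idea with a deliberately tiny chunk size. The names `ChunkCarrySketch`, `readWords`, and `collect` are hypothetical, and the sketch reads characters through a `Reader` and collects words into a list rather than calling the author's `generateCharTree`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of carrying a partial word across chunk boundaries.
class ChunkCarrySketch {

    // Reads chunkSize chars at a time and returns the words found,
    // carrying an unfinished trailing word into the next chunk.
    static List<String> readWords(Reader reader, int chunkSize) throws IOException {
        List<String> words = new ArrayList<>();
        char[] buf = new char[chunkSize];
        String carry = "";
        int len;
        while ((len = reader.read(buf)) != -1) {
            String text = carry + new String(buf, 0, len);
            // Walk back over the trailing letters; they may continue in the next chunk.
            int end = text.length();
            while (end > 0 && Character.isLetter(text.charAt(end - 1))) {
                end--;
            }
            carry = text.substring(end);
            collect(text.substring(0, end), words);
        }
        // Whatever is still carried at end of input is the last word.
        collect(carry, words);
        return words;
    }

    // Splits on runs of non-letters and appends the non-empty pieces.
    private static void collect(String text, List<String> out) {
        for (String w : text.split("[^\\p{L}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // With 4-char chunks, "hello" and "world" both straddle a boundary.
        // prints [hello, world, again]
        System.out.println(readWords(new StringReader("hello world again"), 4));
    }
}
```

Scanning the combined text (carry plus new chunk) rather than the new chunk alone means a word longer than one chunk stays in one piece.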

Example run (screenshot in the original post).

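Before the web version, the program simply took a file path and processed that file. The one small difficulty in moving to the web version was converting the byte stream into characters for processing, and a quick search resolved it.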

ssh:git@git.coding.net:muziliquan/GUIVersion.git

git:git://git.coding.net/muziliquan/GUIVersion.git

Reposted from: https://www.cnblogs.com/liquan/p/5978546.html
