Word Frequency (LeetCode 192)

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

words.txt contains only lowercase characters and space ' ' characters.
Each word must consist of lowercase characters only.
Words are separated by one or more whitespace characters.
For example, assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

# Read from the file words.txt and output the word frequency list to stdout.
# Earlier variant, kept for reference:
# cat words.txt | tr "\n" " " | sed 's/\s\+/\n/g' | sort | uniq -c | awk '{print $2" "$1}' | sort -rnk 2
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'

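tr -s ' ' '\n': tr first translates every space into a newline. The -s (squeeze-repeats) option then collapses each run of consecutive newlines into a single one, so multiple spaces between words do not produce empty lines and the output has exactly one word per line.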
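To make the squeezing behaviour concrete, here is a minimal sketch. The function name split_words is a hypothetical helper introduced only for this illustration; it is not part of the original solution.

```shell
#!/usr/bin/env bash

# split_words (hypothetical name): reads text on stdin and prints one word
# per line. tr maps each space to a newline, and -s squeezes runs of
# newlines so repeated spaces do not leave empty lines behind.
split_words() {
  tr -s ' ' '\n'
}

# Three spaces after "the" and two after "day" still yield three lines.
printf 'the   day  is\n' | split_words
```

Without -s, the same input would contain empty lines between the words, and those empty lines would later be counted by uniq -c as if they were a word.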

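| sort: sorts the words so that identical words end up on adjacent lines.
| uniq -c: collapses adjacent duplicates and prefixes each word with the number of times it occurs.
| sort -nr: sorts by that count, numerically and in descending order. The -n matters, because a plain sort -r compares the lines as text rather than as numbers.
| awk '{ print $2, $1 }': swaps the first and second columns, because uniq -c prints the count before the word.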
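The stages described above can be put together and tried on the sample input from the problem statement. This is a sketch under a few assumptions: the function name word_freq is hypothetical, and it reads from stdin rather than from words.txt so that it can be run without creating a file.

```shell
#!/usr/bin/env bash

# word_freq (hypothetical name): reads text on stdin and prints
# "word count" lines, most frequent word first.
word_freq() {
  tr -s ' ' '\n' |            # one word per line, repeated separators squeezed
    sort |                    # bring identical words next to each other
    uniq -c |                 # prefix each distinct word with its count
    sort -nr |                # order by count, numerically, descending
    awk '{ print $2, $1 }'    # swap columns: word first, then count
}

# Sample input from the problem statement.
printf 'the day is sunny the the\nthe sunny is is\n' | word_freq
```

For the actual task, the same function could be fed from the file with word_freq < words.txt. Words that share the same count are not ordered in any guaranteed way by this sketch.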
