





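The matrix has 4 rows and 4 columns and is symmetric about the diagonal, so you only need to look at one half of it; reading it by rows or by columns gives the same result. The closer a value is to 1, the more similar the two documents are.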
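To make the symmetry concrete, here is a small sketch with four made-up sentences (they are placeholders for illustration, not the documents from the course). It relies on the fact that `TfidfVectorizer` L2-normalizes each row by default, so multiplying the matrix by its transpose gives cosine similarities. `tfidf.dot(tfidf.T)` is used here as the explicit form of the `tfidf * tfidf.T` product.

```python
# -*- coding: utf-8 -*-
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Four made-up documents, used only for illustration.
docs = [
    "I like apples",
    "I like oranges",
    "the judge blocked the travel ban",
    "a federal judge halted the travel ban",
]

# Each row of tfidf is one document, L2-normalized by default.
tfidf = TfidfVectorizer().fit_transform(docs)

# Row-by-row dot products: entry [i, j] is the similarity of doc i and doc j.
sim = tfidf.dot(tfidf.T).toarray()

# The matrix equals its own transpose, and every document is
# fully similar to itself (1.0 on the diagonal).
symmetric = np.allclose(sim, sim.T)
diag_ones = np.allclose(np.diag(sim), 1.0)

print(sim.round(2))
print(symmetric, diag_ones)
```

Because `sim[i, j]` and `sim[j, i]` are the same number, checking only the upper or lower triangle is enough.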

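We can use this by placing a new article (one that has not been added to the database yet) in the top-left corner and comparing it against the older articles stored in MongoDB. If the similarity exceeds some threshold, say 0.8, the two are highly similar and we treat them as the same story, so the new article in the top-left corner is dropped. If the similarity is low, the article really is new, and we run the command that inserts it into MongoDB. That is the general idea.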
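The decision described above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name `is_duplicate` and the constant `SAME_NEWS_THRESHOLD` are hypothetical, the older articles are passed in as a plain list of strings, and loading them from MongoDB and inserting the new one are left out.

```python
# -*- coding: utf-8 -*-
from sklearn.feature_extraction.text import TfidfVectorizer

# Example cutoff taken from the text; tune it for real data.
SAME_NEWS_THRESHOLD = 0.8


def is_duplicate(new_text, old_texts, threshold=SAME_NEWS_THRESHOLD):
    """Return True if new_text is too similar to any text in old_texts."""
    if not old_texts:
        # Nothing to compare against, so the article counts as new.
        return False

    # Put the new article first so it ends up in row 0 (the top-left corner).
    documents = [new_text] + list(old_texts)
    tfidf = TfidfVectorizer().fit_transform(documents)
    sim = tfidf.dot(tfidf.T).toarray()

    # Row 0 compares the new article with every document; skip column 0,
    # which is the new article compared with itself.
    return bool((sim[0, 1:] > threshold).any())


old_news = ["the judge blocked the travel ban on friday night"]
print(is_duplicate("the judge blocked the travel ban on friday night", old_news))
print(is_duplicate("stock markets rallied after earnings reports", old_news))
```

A caller would drop the article when this returns `True` and insert it into the database otherwise.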



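The text contains special characters; without the encoding comment on the first line, Python 2 fails with a non-ASCII character error.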

# -*- coding: utf-8 -*-
from sklearn.feature_extraction.text import TfidfVectorizer

news1 = """
(CNN)President Donald Trump on Saturday again attacked a federal judge whose decision he disliked, blasting Judge James Robart, a George W. Bush appointee who temporarily stopped his controversial travel ban Friday night.
Trump's increasingly heated responses quickly drew objections from Democrats, who said he was improperly attacking an independent judiciary. By Saturday afternoon, Trump had stepped up his criticism: "Because the ban was lifted by a judge, many very bad and dangerous people may be pouring into our country. A terrible decision."
Shortly after 8 a.m. ET, the President tweeted, "The opinion of this so-called judge, which essentially takes law-enforcement away from our country, is ridiculous and will be overturned."
The opinion of this so-called judge, which essentially takes law-enforcement away from our country, is ridiculous and will be overturned!
— Donald J. Trump (@realDonaldTrump) February 4, 2017
That tweet was one of several Trump issued Saturday morning in which he defended his executive order on immigration, which bars citizens of seven Muslim-majority countries from entering the US for 90 days, all refugees for 120 days and indefinitely halts refugees from Syria.
RELATED: James Robart: 5 things to know about judge who blocked travel ban
"When a country is no longer able to say who can, and who cannot , come in & out, especially for reasons of safety &.security - big trouble," Trump next tweeted.
When a country is no longer able to say who can, and who cannot , come in & out, especially for reasons of safety &.security - big trouble!
— Donald J. Trump (@realDonaldTrump) February 4, 2017
"Interesting that certain Middle-Eastern countries agree with the ban. They know if certain people are allowed in it's death & destruction," he added, though he didn't name any countries.
Interesting that certain Middle-Eastern countries agree with the ban. They know if certain people are allowed in it's death & destruction!
— Donald J. Trump (@realDonaldTrump) February 4, 2017
Saturday afternoon, Trump resumed his criticism, tweeting: "What is our country coming to when a judge can halt a Homeland Security travel ban and anyone, even with bad intentions, can come into U.S.?"
What is our country coming to when a judge can halt a Homeland Security travel ban and anyone, even with bad intentions, can come into U.S.?
— Donald J. Trump (@realDonaldTrump) February 4, 2017
He followed up with, "Because the ban was lifted by a judge, many very bad and dangerous people may be pouring into our country. A terrible decision."
Because the ban was lifted by a judge, many very bad and dangerous people may be pouring into our country. A terrible decision
— Donald J. Trump (@realDonaldTrump) February 4, 2017
And he was still tweeting about it early Saturday evening: "Why aren't the lawyers looking at and using the Federal Court decision in Boston, which is at conflict with ridiculous lift ban decision?"
Why aren't the lawyers looking at and using the Federal Court decision in Boston, which is at conflict with ridiculous lift ban decision?
— Donald J. Trump (@realDonaldTrump) February 4, 2017
Trump was referring to a decision by a federal judge in Boston earlier Friday, a more limited ruling that declined to renew a temporary restraining order in Massachusetts. It would have prohibited the detention or removal of foreign travelers legally authorized to come to the Boston area, and the decision represented the Trump administration's first court victory regarding the order.
Unusual criticism
It is highly unusual for a President to publicly criticize a federal judge, but during the campaign, Trump memorably railed against Judge Gonzalo Curiel, who was overseeing a lawsuit against Trump University. Trump said Curiel, who was born in Indiana, was unable to fairly preside over the lawsuit because of his "Mexican heritage." Trump had introduced plans to build a wall along the Mexican border and take a hard stance on immigration.
Vice President Mike Pence later defended Trump in an interview with ABC News' George Stephanopoulos.
"Is it right for the President to say 'so-called' judge'? Doesn't that undermine the separation of powers in the Constitution?" Stephanopoulos asked Pence on "This Week" in a clip released Saturday afternoon.
"I don't think it does," Pence replied. "I think the American people are very accustomed to this president speaking his mind and speaking very straight with them."
ABC Breaking News | Latest News Videos
But Democrats pounced on Trump's criticism of Robart, with Democratic senators flatly saying the President's comments will factor into the confirmation hearings for Supreme Court nominee Neil Gorsuch.
"Attack on federal judge from POTUS is beneath the dignity of that office. That attitude can lead America to calamity," Washington Gov. Jay Inslee tweeted Saturday.
Attack on federal judge from POTUS is beneath the dignity of that office. That attitude can lead America to calamity.
— Governor Jay Inslee (@GovInslee) February 4, 2017
"The President's attack on Judge James Robart, a Bush appointee who passed with 99 votes, shows a disdain for an independent judiciary that doesn't always bend to his wishes and a continued lack of respect for the Constitution, making it more important that the Supreme Court serve as an independent check on the administration," Senate Minority Leader Chuck Schumer said in a statement.
"With each action testing the Constitution, and each personal attack on a judge, President Trump raises the bar even higher for Judge Gorsuch's nomination to serve on the Supreme Court. His ability to be an independent check will be front and center throughout the confirmation process."
Vermont. Sen. Patrick Leahy, the ranking member of the Judiciary Committee, said Trump's "hostility toward the rule of law is not just embarrassing, it is dangerous."
"We need a nominee for the Supreme Court willing to demonstrate he or she will not cower to an overreaching executive. This makes it even more important that Judge Gorsuch, and every other judge this president may nominate, demonstrates the ability to be an independent check and balance on an administration that shamefully and harmfully seems to reject the very concept."
Robart's order on Friday was a significant setback to Trump's ban and set up the nation for a second straight weekend of confusion about the policy's legality.
The White House said Friday the Department of Justice will challenge the decision. In a statement, White House press secretary Sean Spicer initially called Robart's order "outrageous" before quickly issuing another statement that dropped that word.
Robart has presided in the US District Court for the Western District of Washington state since 2004. He assumed senior status in 2016.
"""

news2 = """
President Donald Trump on Saturday again attacked a federal judge whose decision he disliked, criticizing Judge James Robart, a George W. Bush appointee who temporarily stopped his controversial travel ban Friday night.
President Trump’s attacks quickly drew objections from Democrats, who said he was attacking an independent judiciary. And by Saturday afternoon, President Trump was openly accusing Robart of potentially allowing “many very bad and dangerous people” to flow into the US and warning of dire consequences if the executive order is not enforced.
He also said, “What is our country coming to when a judge can halt a Homeland Security ban and anyone, even with bad intentions, can come into the U.S.?”
What is our country coming to when a judge can halt a Homeland Security travel ban and anyone, even with bad intentions, can come into U.S.?
— Donald J. Trump (@realDonaldTrump) February 4, 2017
Shortly after 8 a.m. ET, the President tweeted, “The opinion of this so-called judge, which essentially takes law-enforcement away from our country, is ridiculous and will be overturned.”
The opinion of this so-called judge, which essentially takes law-enforcement away from our country, is ridiculous and will be overturned!
— Donald J. Trump (@realDonaldTrump) February 4, 2017
The tweet was one of several President Trump issued Saturday morning in which he defended his executive order on immigration, which bars citizens of seven Muslim-majority countries from entering the US for 90 days, all refugees for 120 days and indefinitely halts refugees from Syria.
“When a country is no longer able to say who can, and who cannot , come in & out, especially for reasons of safety &.security – big trouble,” President Trump next tweeted.
When a country is no longer able to say who can, and who cannot , come in & out, especially for reasons of safety &.security – big trouble!
— Donald J. Trump (@realDonaldTrump) February 4, 2017
“Interesting that certain Middle-Eastern countries agree with the ban. They know if certain people are allowed in it’s death & destruction,” he added, though he didn’t name any countries.
Interesting that certain Middle-Eastern countries agree with the ban. They know if certain people are allowed in it's death & destruction!
— Donald J. Trump (@realDonaldTrump) February 4, 2017
Saturday afternoon, President Trump resumed his criticism, tweeting: “What is our country coming to when a judge can halt a Homeland Security travel ban and anyone, even with bad intentions, can come into U.S.?”
He followed up with, “Because the ban was lifted by a judge, many very bad and dangerous people may be pouring into our country. A terrible decision.”
Because the ban was lifted by a judge, many very bad and dangerous people may be pouring into our country. A terrible decision
— Donald J. Trump (@realDonaldTrump) February 4, 2017
It is highly unusual for a President to publicly criticize a federal judge, but during the campaign, President Trump memorably railed against Judge Gonzalo Curiel, who was overseeing a lawsuit against Trump University. President Trump said Curiel, who was born in Indiana, was unable to fairly preside over the lawsuit because of his “Mexican heritage.” President Trump had introduced plans to build a wall along the Mexican border and take a hard stance on immigration.
Democrats pounced on President Trump’s criticism of Robart, with Democratic senators flatly saying the President’s comments will factor into the confirmation hearings for Supreme Court nominee Neil Gorsuch.
“Attack on federal judge from POTUS is beneath the dignity of that office. That attitude can lead America to calamity,” Washington Gov. Jay Inslee tweeted Saturday.
Attack on federal judge from POTUS is beneath the dignity of that office. That attitude can lead America to calamity.
— Governor Jay Inslee (@GovInslee) February 4, 2017
“The President’s attack on Judge James Robart, a Bush appointee who passed with 99 votes, shows a disdain for an independent judiciary that doesn’t always bend to his wishes and a continued lack of respect for the Constitution, making it more important that the Supreme Court serve as an independent check on the administration,” Senate Minority Leader Chuck Schumer said in a statement.
“With each action testing the Constitution, and each personal attack on a judge, President Trump raises the bar even higher for Judge Gorsuch’s nomination to serve on the Supreme Court. His ability to be an independent check will be front and center throughout the confirmation process.”
Vermont. Sen. Patrick Leahy, the ranking member of the Judiciary Committee, said President Trump’s “hostility toward the rule of law is not just embarrassing, it is dangerous.”
“We need a nominee for the Supreme Court willing to demonstrate he or she will not cower to an overreaching executive. This makes it even more important that Judge Gorsuch, and every other judge this president may nominate, demonstrates the ability to be an independent check and balance on an administration that shamefully and harmfully seems to reject the very concept.”
Robart’s order on Friday was a significant setback to President Trump’s ban and set up the nation for a second straight weekend of confusion about the policy’s legality.
The White House said Friday the Department of Justice will challenge the decision. In a statement, White House press secretary Sean Spicer initially called Robart’s order “outrageous” before quickly issuing another statement that dropped that word.
Robart has presided in the US District Court for the Western District of Washington state since 2004. He assumed senior status in 2016.

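"""

documents = [news1, news2]
# Rows are L2-normalized by default, so the product below is cosine similarity.
tfidf = TfidfVectorizer().fit_transform(documents)
pairwise_sim = tfidf * tfidf.T
# Convert the sparse result to a dense array for printing.
print(pairwise_sim.toarray())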


The two articles above are both about Trump: the first is from CNN and the second is from another site.

The similarity comes out quite high, about 0.96.



