Learning computer networking from scratch

The story of how it began

It was midnight on a Friday, my friends were out having a good time, and yet I was nailed to my computer screen typing away.

Oddly, I didn’t feel left out.

I was working on something that I thought was genuinely interesting and awesome.

I was right out of college, and I needed a job. When I left for Seattle, I had a backpack full of college textbooks and some clothes. I could fit everything I owned in the trunk of my 2002 Honda Civic.

I didn’t like to socialize much back then, so I decided to tackle this job-finding problem the best way I knew how. I tried to build an app to do it for me, and this article is about how I did it.

Getting started with Craigslist

I was in my room, furiously building some software that would help me collect, and respond to, postings from people who were looking for software engineers on Craigslist. Craigslist is essentially the marketplace of the Internet, where you can go and find things for sale, services, community posts, and so on.

At that point in time, I had never built a fully fledged application. Most of the things I worked on in college were academic projects that involved building and parsing binary trees, computer graphics, and simple language processing models.

I was quite the “newb.”

That said, I had always heard about this new “hot” programming language called Python. I didn’t know much Python, but I wanted to get my hands dirty and learn more about it.

So I put two and two together, and decided to build a small application using this new programming language.

The journey to build a (working) prototype

For development, I used a secondhand BenQ laptop that my brother had given me when I left for college.

It wasn’t the best development environment by any measure. I was using Python 2.4 and an older version of Sublime Text, yet the process of writing an application from scratch was truly an exhilarating experience.

I didn’t know what I needed to do yet. I was trying various things out to see what stuck, and my first approach was to find out how I could access Craigslist data easily.

I looked up Craigslist to find out if they had a publicly available REST API. To my dismay, they didn’t.

However, I found the next best thing.

Craigslist had an RSS feed that was publicly available for personal use. An RSS feed is essentially a computer-readable summary of updates that a website sends out. In this case, the RSS feed would allow me to pick up new job listings whenever they were posted. This was perfect for my needs.

Next, I needed a way to read these RSS feeds. I didn’t want to go through the RSS feeds manually myself, because that would be a time-sink and that would be no different than browsing Craigslist.
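
Reading a feed programmatically can be sketched with Python's standard library alone. The example below is a minimal illustration written for Python 3 rather than the Python 2.4 mentioned above, and it is not necessarily what the original app did. The sample feed, its `example.org` links, and the `read_items` helper are all hypothetical; a real feed may use a different RSS dialect and would need adjusted tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample standing in for a real feed (assumption).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example job feed</title>
    <item>
      <title>Software engineer wanted</title>
      <link>https://example.org/posts/1.html</link>
    </item>
    <item>
      <title>Python developer, part time</title>
      <link>https://example.org/posts/2.html</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list:
    """Return the title and link of every <item> in an RSS document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Each entry keeps the link to its posting, which is the piece a later step would follow to reach the full listing.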

Around this time, I started to realize the power of Google. There’s a running joke that software engineers spend most of their time Googling for answers. I think there’s definitely some truth to that.

After a little bit of Googling, I found a useful post on Stack Overflow that described how to search through a Craigslist RSS feed. It was a sort of filtering functionality that Craigslist provided for free. All I had to do was pass in a specific query parameter with the keyword I was interested in.

I was focused on searching for software-related jobs in Seattle. With that, I typed up this specific URL to look for listings in Seattle that contained the keyword “software”.

https://seattle.craigslist.org/search/sss?format=rss&query=software

And voilà! It worked beautifully.
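
The same URL can be assembled in code, so that the city and keyword become parameters. This is a small sketch in Python 3 using only the standard library. The `build_search_url` helper is a hypothetical name, the `sss` category and the `format` and `query` parameters are copied from the URL above, and nothing here assumes the feed is still served in that form.

```python
from urllib.parse import urlencode


def build_search_url(city: str, keyword: str, category: str = "sss") -> str:
    """Build an RSS search URL shaped like the one shown above."""
    # urlencode escapes the keyword, so spaces and symbols are safe to pass in.
    query = urlencode({"format": "rss", "query": keyword})
    return f"https://{city}.craigslist.org/search/{category}?{query}"


print(build_search_url("seattle", "software"))
# https://seattle.craigslist.org/search/sss?format=rss&query=software
```

Letting `urlencode` build the query string means a multi-word keyword such as "software engineer" is escaped automatically instead of producing a broken URL.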

The most beautiful soup I’ve ever tasted

I wasn’t convinced, however, that my approach would work.

First, the number of listings was limited. My data didn’t contain all the available job postings in Seattle. The returned results were merely a subset of the whole. I was looking to cast as wide a net as possible, so I needed to know all the available job listings.

Second, I realized that the RSS feed didn’t include any contact information. That was a bummer. I could find the listings, but I couldn’t contact the posters unless I manually filtered through these listings.

I’m a person of many skills and interests, but doing repetitive manual work isn’t one of them. I could’ve hired someone to do it for me, but I was barely scraping by with 1-dollar ramen cup noodles. I couldn’t splurge on this side project.

That was a dead-end. But it wasn’t the end.

Continuous iteration

From my first failed attempt, I learned that Craigslist had an RSS feed that I could filter on, and each posting had a link to the actual posting itself.

Well, if I could access the actual posting, then maybe I could scrape the email address off of it?
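
As a rough illustration of that idea, the sketch below pulls email addresses out of a page's HTML with a regular expression. It is written in Python 3 with the standard library only and is not necessarily the approach the app ended up using. The sample posting, its addresses, and the `find_emails` helper are hypothetical, and the pattern is deliberately simple, so it will miss unusual addresses and any that a site obfuscates.

```python
import re
from html import unescape

# Deliberately simple pattern; it does not cover every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical posting page standing in for a real listing (assumption).
SAMPLE_POSTING = """
<html><body>
  <h2>Software engineer wanted</h2>
  <p>Send your resume to <a href="mailto:jobs@example.com">jobs@example.com</a>.</p>
  <p>Questions? Write to hiring&#64;example.org</p>
</body></html>
"""


def find_emails(html_text: str) -> list:
    """Return the unique email addresses found in a page, in order of appearance."""
    text = unescape(html_text)  # turns entities such as &#64; back into "@"
    found = []
    for address in EMAIL_PATTERN.findall(text):
        if address not in found:
            found.append(address)
    return found


print(find_emails(SAMPLE_POSTING))
# ['jobs@example.com', 'hiring@example.org']
```

In practice the HTML would come from fetching each posting link in the feed, and a dedicated HTML parser would be more robust than a regular expression over raw markup.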
