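The Book of Song, "Biography of Fan Ye" (《宋书·范晔传》), says: "言之皆有实证,非为空谈" (every statement is backed by real evidence, not empty talk). Ziping (子平) fortune-telling has a fairly high hit rate, but without real life histories to back it up, it gets lumped in with the "three religions and nine schools" and written off as "metaphysics", which is a real pity. Westerners put together the MBTI personality test and eagerly call it science, while the experience distilled by wise elders cannot be passed on because people do not understand it. It ends up scattered among the folk, practiced furtively and looked down on, and I find that hard to watch.
Sometimes Westerners strike me as rather clumsy: they survey 70-odd families, follow them for a lifetime, and then publish a paper with their conclusions. Chinese students then decide that this is rigorous, scientific research, without considering that 70 samples are negligible next to the size of the human population. From another angle, does studying a question really require such a clumsy method? Master Liang Xiangrun (梁湘润) and others have said that reading 70% of a person's life correctly is already good. Indeed, that is probably about the human limit; the remaining 20-30% depends on the fortunes of the nation, the environment, and personal cultivation. How could a few hundred pages of Ziping cover an entire life? If they could, living it would be pointless.
I know some programming, and I strongly agree with some of Master Liang's views: most people are ordinary, running around every day to make ends meet and worrying about spouse, wealth, children, career, and longevity. If you want to rise above others, can you really endure what those people went through?
This post collects people's life histories from Baidu Baike; later on they will be combined with Ziping theory for empirical verification.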

1 Collecting person information

# -*- coding: utf-8 -*-
# @time    : 2022/1/22 11:03
# @author  : dzm
# @desc    : Baidu Baike entertainment figures
import re

import scrapy
from pyquery import PyQuery as pq
from sqlalchemy.engine import create_engine

from personspider.items.baidu_person import (
    BaiduPersonItem, BaiduPersonExperienceItem,
    BaiduUrlItem, BaiduPersonRelationItem,
)
from personspider.service.baidu_service import BaiduUrlService
from personspider.settings import MYSQL_CONFIG
from personspider.utils import str_util, person_util

# A complete birthday: year, month and day, written with 年/月/日 or hyphens
BIRTHDAY_PATTERN = r'\d{4}[年\-]\d{1,2}[月\-]\d{1,2}日?'


class yulespider(scrapy.Spider):
    name = 'baidu_yule'

    def start_requests(self):
        # Read the URLs to crawl from the database
        engine = create_engine(
            'mysql+pymysql://{}:{}@{}:3306/{}'.format(
                MYSQL_CONFIG['user'], MYSQL_CONFIG['password'],
                MYSQL_CONFIG['host'], MYSQL_CONFIG['db']),
            connect_args={'charset': 'utf8'},
            pool_size=MYSQL_CONFIG['pool_size'])
        baiduUrlService = BaiduUrlService(engine)
        urls = baiduUrlService.get_urls()
        if urls:
            for url in urls:
                yield scrapy.Request(url=url.url, callback=self.parse, dont_filter=True)

    def parse(self, response):
        cur_url = person_util.get_url(response.request.url)
        soup = pq(response.text)

        # Basic information: each block holds parallel <dt>/<dd> pairs
        item = {}
        for basicInfo_block in soup('.basic-info .basicInfo-block'):
            block = pq(basicInfo_block)
            for i in range(len(block('dt'))):
                key = str_util.clear(block('dt:eq({})'.format(i)).text())
                value = str_util.clear(block('dd:eq({})'.format(i)).text())
                item[key] = value

        person_item = self.pack_person(item, r'百度百科', cur_url)
        # pack_person returns None for an empty infobox, so guard the lookups
        cur_name = person_item.get('cn_name') if person_item else None
        birthday = person_item.get('birthday') if person_item else None
        if birthday:
            print('Name: {}, birthday: {}, url: {}'.format(cur_name, birthday, cur_url))
        else:
            print('Name: {}, url: {}'.format(cur_name, cur_url))

        # Experiences are only collected when the birthday is present
        # and contains a full year, month and day
        valid_person = bool(birthday and re.match(BIRTHDAY_PATTERN, birthday))
        if valid_person:
            person_id = person_item['id']
            yield person_item
            # Life experiences: paragraphs that start with a year but not a full date
            for para in soup('.para'):
                content = pq(para).text()
                if re.match(r'^\d{4}年', content) \
                        and not re.match(r'^\d{4}年\d{1,2}月\d{1,2}日', content):
                    for experience in person_util.get_experience(content) or []:
                        if experience and experience['experience']:
                            exp_item = BaiduPersonExperienceItem()
                            exp_item['id'] = str_util.gen_md5(experience)
                            exp_item['person_id'] = person_id
                            exp_item['year'] = experience['year']
                            if 'month' in experience:
                                exp_item['month'] = experience['month']
                            exp_item['experience'] = experience['experience']
                            yield exp_item

        # Mark the URL being crawled as processed (status '1')
        curl_url_item = BaiduUrlItem()
        curl_url_item['id'] = str_util.gen_md5(cur_url)
        curl_url_item['url'] = cur_url
        curl_url_item['status'] = '1'
        # The original stored the loop variable `name` here, which at this point
        # holds the last infobox label rather than the person's name
        curl_url_item['name'] = cur_name
        yield curl_url_item

        # Related people
        for relation in soup('.relations li'):
            rel = pq(relation)
            href = rel('a').attr('href')
            if not href:
                continue
            url = person_util.get_url('https://baike.baidu.com' + href)
            # The markup of a relation entry varies from page to page
            name = rel('.title').text()
            if name:
                tag = rel('.name').text()
            else:
                name = rel('.name').attr('title') or ''
                tag = rel('.name').text()
                tag = tag[:len(tag) - len(name)]
            if valid_person:
                # Relation between the current person and the linked person
                relation_item = BaiduPersonRelationItem()
                relation_item['one'] = person_id
                relation_item['one_name'] = cur_name
                relation_item['one_url'] = cur_url
                relation_item['two'] = str_util.gen_md5(url)
                relation_item['two_name'] = name
                relation_item['two_url'] = url
                relation_item['relation'] = tag
                yield relation_item
            # Queue the linked page for a later crawl (status '0')
            url_item = BaiduUrlItem()
            url_item['id'] = str_util.gen_md5(url)
            url_item['url'] = url
            url_item['status'] = '0'
            url_item['name'] = name
            yield url_item

    # Infobox labels that map one-to-one onto item fields.
    # The original tested '外文名' a second time for deathday, a branch that
    # could never run; '逝世日期' is assumed to be the intended label.
    FIELD_MAP = {
        '中文名': 'cn_name', '外文名': 'en_name', '性别': 'sex', '国籍': 'nation',
        '出生地': 'birthplace', '逝世日期': 'deathday', '毕业院校': 'school',
        '职业': 'occupation', '主要成就': 'achievements', '代表作品': 'representation',
    }

    def pack_person(self, content, source, url):
        if not content:
            return None
        item = BaiduPersonItem()
        item['source'] = source
        item['url'] = url
        item['id'] = str_util.gen_md5(url)
        for key, value in content.items():
            if key in self.FIELD_MAP:
                item[self.FIELD_MAP[key]] = value
            elif key == '出生日期':
                birthday = re.search(BIRTHDAY_PATTERN, value)
                if birthday:
                    item['birthday'] = birthday.group(0)
            elif key == '身高':
                item['height'] = person_util.get_height(value)
        return item
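The birthday filter in `parse` decides whether any experiences are collected at all, so it is worth checking in isolation. Below is a minimal sketch using only the standard library; the helper name `has_full_birthday` is made up for illustration and is not part of the project.

```python
import re

# Same pattern the spider uses: year, month and day separated by 年/月 or hyphens.
BIRTHDAY_RE = re.compile(r'\d{4}[年\-]\d{1,2}[月\-]\d{1,2}日?')


def has_full_birthday(value):
    """Return True when value starts with a complete year-month-day date.

    Hypothetical helper, not part of the original project.
    """
    return bool(value) and BIRTHDAY_RE.match(value) is not None


if __name__ == '__main__':
    for sample in ('1980年5月12日', '1980-05-12', '1980年', ''):
        print(repr(sample), has_full_birthday(sample))
```

A year-only value such as `1980年` is rejected, which matches the spider's intent: a chart cannot be cast without the month and day.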

2 Extracting content

# personspider/utils/person_util.py
import re

from personspider.utils import str_util


def get_url(url):
    '''Drop the query string from a URL.'''
    index = url.index('?') if '?' in url else None
    if index:
        return url[:index]
    return url


def get_experience(text):
    '''Split a paragraph into per-year (and, where present, per-month) entries.'''
    # The capture group keeps the year tokens in the result:
    # [prefix, year1, body1, year2, body2, ...]
    # The original passed re.S as the third positional argument, which
    # re.split interprets as maxsplit (16), not as flags.
    results = re.split(r'(\d{4}年)', text)
    contents = []
    i = 1
    while i < len(results) - 1:
        year = results[i]
        # Strip surrounding commas (ASCII and full-width), then clean up
        experience = str_util.clear(results[i + 1].strip(',,'))
        if re.search(r'\d{1,2}月', experience):
            # Split again by month; text before the first month is discarded
            months = re.split(r'(\d{1,2}月)', experience)
            j = 1
            while j < len(months) - 1:
                month = months[j]
                month_experience = str_util.clear(months[j + 1].strip(',,'))
                contents.append({'year': year, 'month': month,
                                 'experience': month_experience})
                j += 2
        else:
            contents.append({'year': year, 'experience': experience})
        i += 2
    return contents


def get_height(height):
    '''Return the numeric height (with an optional "cm" suffix), or None.'''
    match = re.search(r'\d{1,4}(cm)?', height)
    return match.group(0) if match else None


# personspider/utils/str_util.py
import hashlib
import re


def gen_md5(item):
    '''Return the MD5 hex digest of str(item).'''
    m = hashlib.md5()
    m.update(str(item).encode('utf-8'))
    return m.hexdigest()


def remove_xa0(value):
    '''Remove \xa0, the non-breaking space (&nbsp;).'''
    return value.replace(u'\xa0', u'')


def remove_quote(value):
    '''Remove citation markers such as [1] or [2-3].'''
    p = re.compile(r'\[[\d\-\]]+')
    return p.sub('', value)


def remove_blank(value):
    return value.replace(' ', '')


def clear(value):
    value = remove_xa0(value)
    value = remove_quote(value)
    value = remove_blank(value)
    return value
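The core trick in `get_experience` is `re.split` with a capturing group, which keeps the delimiters (the year tokens) in the result list so that years and bodies can be paired up. Here is a small standalone sketch of that idea; `split_by_year` is an illustrative name, and the sample sentence is invented. One pitfall worth noting: the third positional argument of `re.split` is `maxsplit`, so flags have to be passed as `flags=...`.

```python
import re


def split_by_year(text):
    """Pair each 'NNNN年' token with the text that follows it.

    Illustrative helper, not part of the original project.
    """
    # With a capturing group, re.split returns
    # [prefix, year1, body1, year2, body2, ...]
    parts = re.split(r'(\d{4}年)', text)
    return [
        (parts[i], parts[i + 1].strip(',,。'))
        for i in range(1, len(parts) - 1, 2)
    ]


if __name__ == '__main__':
    sample = '2005年,出演电视剧。2008年发行专辑。'
    for year, body in split_by_year(sample):
        print(year, body)
```

Anything before the first year token lands in `parts[0]` and is ignored, which is the same behaviour as the project's function.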

3 Item pipeline

from sqlalchemy.engine import create_engine

from personspider.items.baidu_person import (
    BaiduPersonItem, BaiduPersonExperienceItem,
    BaiduUrlItem, BaiduPersonRelationItem,
)
from personspider.service.baidu_service import (
    BaiduPersonService, BaiduPersonExperienceService,
    BaiduPerson, BaiduPersonExperience, BaiduPersonRelation, BaiduUrl,
    BaiduPersonRelationService, BaiduUrlService,
)


class MysqlPipeline(object):

    def __init__(self, engine):
        self.baiduPersonService = BaiduPersonService(engine)
        self.baiduPersonExperienceService = BaiduPersonExperienceService(engine)
        self.baiduUrlService = BaiduUrlService(engine)
        self.baiduPersonRelationService = BaiduPersonRelationService(engine)

    def process_item(self, item, spider):
        # Route each item type to its ORM model and service
        if isinstance(item, BaiduPersonItem):
            self.baiduPersonService.insert(BaiduPerson(**item))
        elif isinstance(item, BaiduPersonExperienceItem):
            self.baiduPersonExperienceService.insert(BaiduPersonExperience(**item))
        elif isinstance(item, BaiduUrlItem):
            self.baiduUrlService.insert(BaiduUrl(**item))
        elif isinstance(item, BaiduPersonRelationItem):
            self.baiduPersonRelationService.insert(BaiduPersonRelation(**item))
        # Scrapy expects process_item to return the item (or raise DropItem)
        return item

    @classmethod
    def from_settings(cls, settings):
        mysql_config = settings.get('MYSQL_CONFIG')
        engine = create_engine(
            'mysql+pymysql://{}:{}@{}:3306/{}'.format(
                mysql_config['user'], mysql_config['password'],
                mysql_config['host'], mysql_config['db']),
            connect_args={'charset': 'utf8'},
            pool_size=mysql_config['pool_size'])
        return cls(engine)
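The pipeline reads `MYSQL_CONFIG` from the Scrapy settings, but the post never shows that file. The following is a rough sketch of what the relevant part of `settings.py` might look like: the keys are the ones the code above reads, while every value, the pipeline's module path, and the `mysql_url` helper are assumptions for illustration.

```python
# settings.py (sketch): every value below is a placeholder
MYSQL_CONFIG = {
    'user': 'spider',
    'password': 'change-me',
    'host': '127.0.0.1',
    'db': 'person',
    'pool_size': 5,
}

# The module path is an assumption; point it at wherever MysqlPipeline lives.
ITEM_PIPELINES = {
    'personspider.pipelines.MysqlPipeline': 300,
}


def mysql_url(cfg):
    """Build the same connection URL the spider and the pipeline format inline.

    Hypothetical helper, not part of the original project.
    """
    return 'mysql+pymysql://{}:{}@{}:3306/{}'.format(
        cfg['user'], cfg['password'], cfg['host'], cfg['db'])


if __name__ == '__main__':
    print(mysql_url(MYSQL_CONFIG))
```

Factoring the URL into one helper would also remove the duplication between `start_requests` and `from_settings`.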

4 Writing to the database

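import datetime

from sqlalchemy.orm import sessionmaker

# Assumption: the post does not show where EmailService is imported from;
# adjust this path to the actual project layout.
from personspider.service.email_service import EmailService

# BaiduPerson, the ORM model, is defined in this module but not shown in the post.


class BaiduPersonService(object):

    def __init__(self, engine):
        self.engine = engine
        Session = sessionmaker(engine)
        self.session = Session()
        self.emailService = EmailService()

    def exist(self, id):
        query = self.session.query(BaiduPerson).filter(BaiduPerson.id == id)
        return query.count() > 0

    def insert(self, record):
        # Skip records that are already stored, so re-crawls stay idempotent
        if not self.exist(record.id):
            try:
                record.create_time = datetime.datetime.now()
                self.session.add(record)
                self.session.commit()
            except Exception as e:
                # Roll back so the session stays usable for the next item
                self.session.rollback()
                title = r'Failed to write {} to the database'.format(record.cn_name)
                content = r'ERROR {}'.format(str(e))
                self.emailService.sendEmail(title, content)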
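The check-then-insert pattern above can be tried without a MySQL server. This sketch swaps in the standard library's `sqlite3` and its `INSERT OR IGNORE` statement instead of SQLAlchemy, purely to illustrate the idea of idempotent writes with a rollback on failure; the table layout and function names are invented for the example.

```python
import sqlite3


def make_conn():
    """Create an in-memory database with a minimal, hypothetical person table."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE baidu_person (id TEXT PRIMARY KEY, cn_name TEXT)')
    return conn


def insert_once(conn, person_id, cn_name):
    """Insert a row unless the id already exists; return True if a row was added."""
    try:
        cur = conn.execute(
            'INSERT OR IGNORE INTO baidu_person (id, cn_name) VALUES (?, ?)',
            (person_id, cn_name),
        )
        conn.commit()
        return cur.rowcount == 1
    except sqlite3.Error:
        # Leave the connection in a clean state before re-raising
        conn.rollback()
        raise


if __name__ == '__main__':
    conn = make_conn()
    print(insert_once(conn, 'abc', 'someone'))  # first write
    print(insert_once(conn, 'abc', 'someone'))  # duplicate id
```

Letting the database enforce uniqueness through the primary key avoids the extra `SELECT` per item and the small race window between the existence check and the insert.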

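5 Sending error emails
If something goes wrong during a crawl, I do not want to sit there watching it the whole time. Having the code email me about the error is far more comfortable.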

import smtplib
from email.header import Header
from email.mime.text import MIMEText

from personspider.settings import EMAIL_CONFIG


class EmailService(object):

    def sendEmail(self, title, content):
        message = MIMEText(content, 'plain', 'utf-8')  # body, subtype, charset
        message['From'] = "{}".format(EMAIL_CONFIG['sender'])
        message['To'] = ",".join(EMAIL_CONFIG['receivers'])
        message['Subject'] = title
        try:
            # Send over SSL; the port is usually 465
            smtpObj = smtplib.SMTP_SSL(EMAIL_CONFIG['smtp']['host'], 465)
            smtpObj.login(EMAIL_CONFIG['smtp']['user'], EMAIL_CONFIG['smtp']['password'])
            smtpObj.sendmail(EMAIL_CONFIG['sender'], EMAIL_CONFIG['receivers'],
                             message.as_string())
            smtpObj.quit()
            print("mail has been sent successfully.")
        except smtplib.SMTPException as e:
            print(e)


def send_email2(SMTP_host, from_account, from_passwd, to_account, subject, content):
    '''Alternative sender over a plain (non-SSL) SMTP connection.'''
    email_client = smtplib.SMTP(SMTP_host)
    email_client.login(from_account, from_passwd)
    # Build the message
    msg = MIMEText(content, 'plain', 'utf-8')
    msg['Subject'] = Header(subject, 'utf-8')
    msg['From'] = from_account
    msg['To'] = to_account
    email_client.sendmail(from_account, to_account, msg.as_string())
    email_client.quit()


if __name__ == '__main__':
    emailService = EmailService()
    title = r'Database error'
    content = r'Lots and lots of problems'
    emailService.sendEmail(title, content)
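`EMAIL_CONFIG` is imported from the settings but never shown. The sketch below spells out the shape that `sendEmail` reads (`sender`, `receivers`, and an `smtp` block) with placeholder values, and builds the message without contacting any server, so the headers can be checked offline. `build_message` is a hypothetical helper, not part of the project.

```python
from email.mime.text import MIMEText

# Placeholder values; only the key names are taken from the code above.
EMAIL_CONFIG = {
    'sender': 'spider@example.com',
    'receivers': ['me@example.com', 'ops@example.com'],
    'smtp': {
        'host': 'smtp.example.com',
        'user': 'spider@example.com',
        'password': 'change-me',
    },
}


def build_message(title, content, cfg=EMAIL_CONFIG):
    """Build the alert email the same way sendEmail does, without sending it."""
    message = MIMEText(content, 'plain', 'utf-8')
    message['From'] = cfg['sender']
    message['To'] = ','.join(cfg['receivers'])
    message['Subject'] = title
    return message


if __name__ == '__main__':
    print(build_message('Database error', 'ERROR example').as_string())
```

Separating message construction from delivery makes the formatting testable and keeps SMTP credentials out of unit tests.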
