GitHub repository: TEST/HttpWebRequest at master · yangwohenmai/TEST · GitHub

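The biggest hurdle in scraping data from Hong Kong Exchanges and Clearing (HKEX) is obtaining the token embedded in the HKEX page. Once you have the token, you can request data from the HKEX endpoint.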

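The C# code below first fetches the HKEX page and extracts the token from it, then requests the quote data. The response resembles JSON but needs light preprocessing (the code strips the first two characters and the last character) before a JSON parser will accept it.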
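The preprocessing step can be made less dependent on fixed character offsets. The following is a minimal sketch, assuming the response is wrapped JSONP-style as `callback(...)`, which is inferred from the `callback` query parameter rather than confirmed by any HKEX documentation. The class name `JsonpHelper` and the method `StripCallbackWrapper` are hypothetical and are not part of the original program.

```csharp
using System;

public static class JsonpHelper
{
    // Hypothetical helper: returns the text between the first '(' and the last ')'.
    // Assumption: the body looks like  callbackName({...})  with an optional trailing ';'.
    // If the body does not look wrapped, it is returned trimmed but otherwise unchanged.
    public static string StripCallbackWrapper(string body)
    {
        if (body == null) throw new ArgumentNullException(nameof(body));

        string trimmed = body.Trim();
        int open = trimmed.IndexOf('(');
        int close = trimmed.LastIndexOf(')');
        if (open < 0 || close <= open) return trimmed;

        // A '{' before the first '(' means the parenthesis belongs to the JSON
        // payload itself, so there is no wrapper to strip.
        int brace = trimmed.IndexOf('{');
        if (brace >= 0 && brace < open) return trimmed;

        return trimmed.Substring(open + 1, close - open - 1);
    }
}
```

The result can then be passed to `JsonConvert.DeserializeObject<JObject>` in place of the fixed-offset `Substring` call, so the parsing step no longer depends on the length of the callback name.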

The full JSON data looks like this:

{"data": {"responsecode": "000","responsemsg": "","quote": {"hi": "74.350","rs_stock_flag": false,"fiscal_year_end": "31 Dec 2018","hist_closedate": "30 May 2019","replication_method": null,"amt_os": "3,856,240,500","primaryexch": "HKEX","ric": "0001.HK","product_subtype": null,"db_updatetime": "31 May 2019 09:36","mkt_cap_u": "B","am_u": "M","ew_sub_right": "","secondary_listing": false,"ew_amt_os_cur": null,"ccy": "HKD","management_fee": "","ew_underlying_code": null,"trdstatus": "N","nav": "","original_offer_price": "","issue": "","asset_class": null,"eps": 10.1109,"inline_upper_strike_price": "","sedol": "BW9P816","am": "697.27","iv": "","ew_strike": "","as": "74.100","geographic_focus": null,"incorpin": "Cayman Islands","etp_baseCur": null,"ew_amt_os": "","bd": "74.050","registrar": "Computershare Hong Kong Investor Services Ltd.","depositary": null,"exotic_type": null,"callput_indicator": null,"primary_market": null,"underlying_index": null,"lot": "500","lo52": "72.800","shares_issued_date": "30 Apr 2019","premium": "","strike_price_ccy": null,"yield": "","vo_u": "M","base_currency": null,"coupon": "","expiry_date": "","chairman": "Li Tzar Kuoi Victor","underlying_ric": "0001.HK","hi52": "92.500","issuer_name": "CK Hutchison Holdings Ltd.","h_share_flag": false,"ew_sub_per_from": "","div_yield": "4.28","interest_payment_date": "-","updatetime": "31 May 2019 16:08","aum_date": "","lo": "73.050","mkt_cap": "285.55","f_aum_hkd": null,"ew_sub_per_to": "","ls": "74.050","nav_date": "","csic_classification": null,"floating_flag": false,"issued_shares_note": null,"eff_gear": "","board_lot_nominal": "","hsic_ind_classification": "Conglomerates - Conglomerates","ew_desc": null,"inception_date": "","nc": "+1.050","aum": "","vo": "9.41","secondary_listing_flag": false,"listing_date": "1 Nov 1972","as_at_label": "as at","ew_amt_os_dat": "","nm": "CK Hutchison Holdings Ltd.","nm_s": "CKH HOLDINGS","sym": "1","inline_lower_strike_price": "","listing_category": "Primary Listing","ew_strike_cur": null,"exotic_warrant_indicator": null,"investment_focus": null,"call_price": "","tck": "0.050","strike_price": "","summary": "CK Hutchison Holdings Limited is an investment holding company mainly engaged in the retail business. Along with subsidiaries, the Company operates its business through five segments: the Retail segment, the Telecommunications segment, the Infrastructure segment, the Ports and Related Services segment, and the Husky Energy segment. The Retail segment is involved in the manufacturing and sale of health and beauty products, as well as consumer electronics and electrical appliances. It also operates supermarkets, as well as manufactures and distributes bottled water and beverage products. The Telecommunications segment provides mobile telecommunications and data services by 3 Group Europe, Hutchison Telecommunications Hong Kong Holdings, and Hutchison Asia Telecommunications. The Infrastructure segment is involved in the energy infrastructure, transportation infrastructure, water infrastructure, waste management, waste-to-energy and infrastructure related businesses.","op": "73.050","aum_u": "","nav_ccy": null,"os": "","wnt_gear": "","transfer_of_listing_date": "","hsic_sub_sector_classification": "Conglomerates","amt_ccy": null,"domicile_country": null,"entitlement_ratio": "","product_type": "EQTY","office_address": "48th Floor<br/>Cheung Kong Center<br/>2 Queen's Road Central<br/>Hong Kong","pc": "+1.44","days_to_expiry": null,"underlying_code": null,"pe": "7.32","eps_ccy": "HKD","hdr": false,"launch_date": "","hc": "73.000","isin": "KYG217651051","moneyness": ""}},"qid": "NULL"}

The program prints a handful of arbitrarily chosen fields:

using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;

namespace HttpWebRequestTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var hp = new HttpRequestClient();
            // Fetch the HKEX home page
            string result = hp.httpGet("https://www.hkex.com.hk/?sc_lang=EN", HttpRequestClient.defaultHeaders);
            // Locate the start of the token string
            int index_head = result.IndexOf("evLtsLs");
            string InitToken = result.Substring(index_head, 100);
            // Locate the end of the token string (the closing quote)
            int index_last = InitToken.IndexOf('"');
            // Extract the token
            string Token = result.Substring(index_head, index_last);
            // Build the request URL
            string link = string.Format("https://www1.hkex.com.hk/hkexwidget/data/getequityquote?sym=1&token={0}&lang=eng&qid=NULL&callback=0", Token);
            // Request the data from the HKEX endpoint
            string data = hp.httpGet(link, HttpRequestClient.defaultHeaders);
            // Strip the first two characters and the last character, then parse the JSON
            JObject JsonData = JsonConvert.DeserializeObject<JObject>(data.Substring(2, data.Length - 3));
            Console.WriteLine("hi:" + JsonData["data"]["quote"]["hi"]);
            Console.WriteLine("fiscal_year_end:" + JsonData["data"]["quote"]["fiscal_year_end"]);
            Console.WriteLine("amt_os:" + JsonData["data"]["quote"]["amt_os"]);
            Console.WriteLine("primaryexch:" + JsonData["data"]["quote"]["primaryexch"]);
            Console.WriteLine("db_updatetime:" + JsonData["data"]["quote"]["db_updatetime"]);
            Console.WriteLine("ric:" + JsonData["data"]["quote"]["ric"]);
            Console.WriteLine("eps:" + JsonData["data"]["quote"]["eps"]);
            Console.ReadLine();
        }
    }

    // zetee
    // Host, Connection, User-Agent, Referer, Range, Content-Type, Content-Length, Expect,
    // Proxy-Connection, If-Modified-Since and similar headers cannot be added through
    // Headers.Add; they must be set through the corresponding request properties.
    public class HttpRequestClient
    {
        static HashSet<String> UNCHANGEHEADS = new HashSet<string>();

        static HttpRequestClient()
        {
            UNCHANGEHEADS.Add("Host");
            UNCHANGEHEADS.Add("Connection");
            UNCHANGEHEADS.Add("User-Agent");
            UNCHANGEHEADS.Add("Referer");
            UNCHANGEHEADS.Add("Range");
            UNCHANGEHEADS.Add("Content-Type");
            UNCHANGEHEADS.Add("Content-Length");
            UNCHANGEHEADS.Add("Expect");
            UNCHANGEHEADS.Add("Proxy-Connection");
            UNCHANGEHEADS.Add("If-Modified-Since");
            UNCHANGEHEADS.Add("Keep-alive");
            UNCHANGEHEADS.Add("Accept");
            ServicePointManager.DefaultConnectionLimit = 1000; // maximum number of connections
        }

        /// <summary>
        /// Default headers, one "Name:value" pair per line
        /// </summary>
        public static string defaultHeaders = @"Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:zh-CN,zh;q=0.8
Cache-Control:no-cache
Connection:keep-alive
Pragma:no-cache
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36";

        /// <summary>
        /// Whether to track cookies
        /// </summary>
        bool isTrackCookies = false;

        /// <summary>
        /// Cookie dictionary
        /// </summary>
        Dictionary<String, Cookie> cookieDic = new Dictionary<string, Cookie>();

        /// <summary>
        /// Average response time in milliseconds
        /// </summary>
        long avgResponseMilliseconds = -1;

        /// <summary>
        /// Average response time in milliseconds
        /// </summary>
        public long AvgResponseMilliseconds
        {
            get
            {
                return avgResponseMilliseconds;
            }
            set
            {
                if (avgResponseMilliseconds != -1)
                {
                    // Average the new sample with the previous value
                    // (the original was missing the parentheses)
                    avgResponseMilliseconds = (value + avgResponseMilliseconds) / 2;
                }
                else
                {
                    avgResponseMilliseconds = value;
                }
            }
        }

        public HttpRequestClient(bool isTrackCookies = false)
        {
            this.isTrackCookies = isTrackCookies;
        }

        /// <summary>
        /// HTTP request
        /// </summary>
        /// <param name="url"></param>
        /// <param name="method">POST or GET</param>
        /// <param name="headers">HTTP headers; the request headers can be copied straight from Chrome</param>
        /// <param name="content">Request body; every key and value must be URL-encoded</param>
        /// <param name="contentEncode">Encoding of the body</param>
        /// <param name="proxyUrl">Proxy URL</param>
        /// <returns></returns>
        public string http(string url, string method, string headers, string content, Encoding contentEncode, string proxyUrl)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = method;
            if (method.Equals("GET", StringComparison.InvariantCultureIgnoreCase))
            {
                request.MaximumAutomaticRedirections = 100;
                request.AllowAutoRedirect = false;
            }
            fillHeaders(request, headers);
            fillProxy(request, proxyUrl);

            #region Add POST parameters
            if (contentEncode == null)
            {
                contentEncode = Encoding.UTF8;
            }
            if (!string.IsNullOrWhiteSpace(content))
            {
                byte[] data = contentEncode.GetBytes(content);
                request.ContentLength = data.Length;
                using (Stream reqStream = request.GetRequestStream())
                {
                    reqStream.Write(data, 0, data.Length);
                    reqStream.Close();
                }
            }
            #endregion

            HttpWebResponse response = null;
            System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
            try
            {
                sw.Start();
                response = (HttpWebResponse)request.GetResponse();
                sw.Stop();
                AvgResponseMilliseconds = sw.ElapsedMilliseconds;
                CookieCollection cc = new CookieCollection();
                string cookieString = response.Headers[HttpResponseHeader.SetCookie];
                if (!string.IsNullOrWhiteSpace(cookieString))
                {
                    var spilit = cookieString.Split(';');
                    foreach (string item in spilit)
                    {
                        var kv = item.Split('=');
                        if (kv.Length == 2)
                            cc.Add(new Cookie(kv[0].Trim().ToString().Replace(",", "|*|"), kv[1].Trim()));
                    }
                }
                trackCookies(cc);
            }
            catch (Exception ex)
            {
                sw.Stop();
                AvgResponseMilliseconds = sw.ElapsedMilliseconds;
                // Note: on failure the exception message is returned in place of the body
                return ex.Message;
            }
            string result = getResponseBody(response);
            return result;
        }

        /// <summary>
        /// POST request
        /// </summary>
        /// <param name="url"></param>
        /// <param name="headers"></param>
        /// <param name="content"></param>
        /// <param name="contentEncode"></param>
        /// <param name="proxyUrl"></param>
        /// <returns></returns>
        public string httpPost(string url, string headers, string content, Encoding contentEncode, string proxyUrl = null)
        {
            return http(url, "POST", headers, content, contentEncode, proxyUrl);
        }

        /// <summary>
        /// GET request
        /// </summary>
        /// <param name="url"></param>
        /// <param name="headers"></param>
        /// <param name="content"></param>
        /// <param name="proxyUrl"></param>
        /// <returns></returns>
        public string httpGet(string url, string headers, string content = null, string proxyUrl = null)
        {
            return http(url, "GET", headers, null, null, proxyUrl);
        }

        /// <summary>
        /// Set the proxy
        /// </summary>
        /// <param name="proxyUri"></param>
        private void fillProxy(HttpWebRequest request, string proxyUri)
        {
            if (!string.IsNullOrWhiteSpace(proxyUri))
            {
                WebProxy proxy = new WebProxy();
                proxy.Address = new Uri(proxyUri);
                request.Proxy = proxy;
            }
        }

        /// <summary>
        /// Track cookies
        /// </summary>
        /// <param name="cookies"></param>
        private void trackCookies(CookieCollection cookies)
        {
            if (!isTrackCookies) return;
            if (cookies == null) return;
            foreach (Cookie c in cookies)
            {
                if (cookieDic.ContainsKey(c.Name))
                {
                    cookieDic[c.Name] = c;
                }
                else
                {
                    cookieDic.Add(c.Name, c);
                }
            }
        }

        /// <summary>
        /// Format the tracked cookies as a Cookie header value
        /// </summary>
        private string getCookieStr()
        {
            StringBuilder sb = new StringBuilder();
            foreach (KeyValuePair<string, Cookie> item in cookieDic)
            {
                if (!item.Value.Expired)
                {
                    if (sb.Length == 0)
                    {
                        sb.Append(item.Key).Append("=").Append(item.Value.Value);
                    }
                    else
                    {
                        // The original appended " = " here, which was inconsistent with the first pair
                        sb.Append("; ").Append(item.Key).Append("=").Append(item.Value.Value);
                    }
                }
            }
            return sb.ToString();
        }

        /// <summary>
        /// Fill in the request headers
        /// </summary>
        /// <param name="request"></param>
        /// <param name="headers"></param>
        private void fillHeaders(HttpWebRequest request, string headers, bool isPrint = false)
        {
            if (request == null) return;
            if (string.IsNullOrWhiteSpace(headers)) return;
            // Accept both CRLF and LF so the header string works regardless of source-file line endings
            string[] hsplit = headers.Split(new String[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string item in hsplit)
            {
                string[] kv = item.Split(':');
                string key = kv[0].Trim();
                string value = string.Join(":", kv.Skip(1)).Trim();
                if (!UNCHANGEHEADS.Contains(key))
                {
                    request.Headers.Add(key, value);
                }
                else
                {
                    #region Set restricted HTTP headers through properties
                    switch (key)
                    {
                        case "Accept":
                            {
                                request.Accept = value;
                                break;
                            }
                        case "Host":
                            {
                                request.Host = value;
                                break;
                            }
                        case "Connection":
                            {
                                if (value == "keep-alive")
                                {
                                    request.KeepAlive = true;
                                }
                                else
                                {
                                    request.KeepAlive = false; // just test
                                }
                                break;
                            }
                        case "Content-Type":
                            {
                                request.ContentType = value;
                                break;
                            }
                        case "User-Agent":
                            {
                                request.UserAgent = value;
                                break;
                            }
                        case "Referer":
                            {
                                request.Referer = value;
                                break;
                            }
                        case "Content-Length":
                            {
                                request.ContentLength = Convert.ToInt64(value);
                                break;
                            }
                        case "Expect":
                            {
                                request.Expect = value;
                                break;
                            }
                        case "If-Modified-Since":
                            {
                                request.IfModifiedSince = Convert.ToDateTime(value);
                                break;
                            }
                        default:
                            break;
                    }
                    #endregion
                }
            }
            CookieCollection cc = new CookieCollection();
            string cookieString = request.Headers[HttpRequestHeader.Cookie];
            if (!string.IsNullOrWhiteSpace(cookieString))
            {
                var spilit = cookieString.Split(';');
                foreach (string item in spilit)
                {
                    var kv = item.Split('=');
                    if (kv.Length == 2)
                        cc.Add(new Cookie(kv[0].Trim(), kv[1].Trim()));
                }
            }
            trackCookies(cc);
            if (!isTrackCookies)
            {
                request.Headers[HttpRequestHeader.Cookie] = "";
            }
            else
            {
                request.Headers[HttpRequestHeader.Cookie] = getCookieStr();
            }

            #region Print headers
            if (isPrint)
            {
                for (int i = 0; i < request.Headers.AllKeys.Length; i++)
                {
                    string key = request.Headers.AllKeys[i];
                    System.Console.WriteLine(key + ":" + request.Headers[key]);
                }
            }
            #endregion
        }

        /// <summary>
        /// Print the response headers
        /// </summary>
        /// <param name="response"></param>
        private void printResponseHeaders(HttpWebResponse response)
        {
            #region Print headers
            if (response == null) return;
            for (int i = 0; i < response.Headers.AllKeys.Length; i++)
            {
                string key = response.Headers.AllKeys[i];
                System.Console.WriteLine(key + ":" + response.Headers[key]);
            }
            #endregion
        }

        /// <summary>
        /// Return the response body
        /// </summary>
        /// <param name="response"></param>
        /// <returns></returns>
        private string getResponseBody(HttpWebResponse response)
        {
            Encoding defaultEncode = Encoding.UTF8;
            string contentType = response.ContentType;
            if (contentType != null)
            {
                if (contentType.ToLower().Contains("gb2312"))
                {
                    defaultEncode = Encoding.GetEncoding("gb2312");
                }
                else if (contentType.ToLower().Contains("gbk"))
                {
                    defaultEncode = Encoding.GetEncoding("gbk");
                }
                else if (contentType.ToLower().Contains("zh-cn"))
                {
                    defaultEncode = Encoding.GetEncoding("zh-cn");
                }
            }
            string responseBody = string.Empty;
            if (response.ContentEncoding.ToLower().Contains("gzip"))
            {
                using (GZipStream stream = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
                {
                    // Pass the detected encoding here too (the original omitted it in the gzip branch)
                    using (StreamReader reader = new StreamReader(stream, defaultEncode))
                    {
                        responseBody = reader.ReadToEnd();
                    }
                }
            }
            else if (response.ContentEncoding.ToLower().Contains("deflate"))
            {
                using (DeflateStream stream = new DeflateStream(response.GetResponseStream(), CompressionMode.Decompress))
                {
                    using (StreamReader reader = new StreamReader(stream, defaultEncode))
                    {
                        responseBody = reader.ReadToEnd();
                    }
                }
            }
            else
            {
                using (Stream stream = response.GetResponseStream())
                {
                    using (StreamReader reader = new StreamReader(stream, defaultEncode))
                    {
                        responseBody = reader.ReadToEnd();
                    }
                }
            }
            return responseBody;
        }

        public static string UrlEncode(string item, Encoding code)
        {
            // Use the encoding passed in (the original always used gb2312 and ignored the parameter)
            return System.Web.HttpUtility.UrlEncode(item.Trim('\t').Trim(), code);
        }

        public static string UrlEncodeByGB2312(string item)
        {
            return UrlEncode(item, Encoding.GetEncoding("gb2312"));
        }

        public static string UrlEncodeByUTF8(string item)
        {
            return UrlEncode(item, Encoding.GetEncoding("utf-8"));
        }

        public static string HtmlDecode(string item)
        {
            return WebUtility.HtmlDecode(item.Trim('\t').Trim());
        }
    }
}

The output looks like this:

74.350
31 May 2019 09:36
3,856,240,500
0001.HK
HKEX

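One caveat: this article uses stock code 00001 only as an example, so a single request returns data for a single stock. To scrape quotes for all Hong Kong stocks in bulk every day, first build a table of all Hong Kong stock codes, then iterate over that table and fetch the data for each code.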
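The loop over the code table might be sketched as follows. This is only an illustration under stated assumptions: `BatchQuotes`, `FetchAll`, the `fetchQuote` callback, and the `delayMs` pause are all hypothetical names introduced here, and the pause between requests is a courtesy choice, not a documented HKEX requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class BatchQuotes
{
    // Hypothetical helper: calls fetchQuote once per stock code and collects the raw responses.
    // fetchQuote is supplied by the caller (for example, a wrapper around an HTTP GET).
    public static Dictionary<string, string> FetchAll(
        IEnumerable<string> stockCodes,
        Func<string, string> fetchQuote,
        int delayMs = 200)
    {
        var results = new Dictionary<string, string>();
        foreach (string code in stockCodes)
        {
            try
            {
                results[code] = fetchQuote(code);
            }
            catch (Exception ex)
            {
                // One failing code should not abort the whole run; log it and continue.
                Console.Error.WriteLine("Failed to fetch " + code + ": " + ex.Message);
            }
            // Assumption: a short pause between requests to avoid hammering the server.
            if (delayMs > 0) Thread.Sleep(delayMs);
        }
        return results;
    }
}
```

With the client from this article, `fetchQuote` could build the URL for the given code and call `hp.httpGet(link, HttpRequestClient.defaultHeaders)`. Since that method returns the exception message as a string on failure, the caller would still need to check that each response actually parses as quote data.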

The core request URL for fetching data is as follows (the token goes where %s appears):

https://www1.hkex.com.hk/hkexwidget/data/getequityquote?sym=1&token=%s&lang=eng&qid=NULL&callback=NULL

In this URL, sym=1 is where the stock code goes. The stock code here is 00001, and the leading zeros must be removed in the URL. Likewise, to fetch data for stock code 00002, write sym=2.

Replace the number after sym each time to fetch the quote data for the corresponding stock.
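The code-to-sym rule can be captured in a small helper. This is a sketch only: `HkexUrl`, `ToSym`, and `BuildQuoteUrl` are hypothetical names, and the URL template is copied from the one quoted above. The token is inserted as-is, the same way the article's program does it.

```csharp
using System;

public static class HkexUrl
{
    // Converts a zero-padded stock code such as "00001" to the sym value "1".
    public static string ToSym(string stockCode)
    {
        if (string.IsNullOrWhiteSpace(stockCode))
            throw new ArgumentException("Stock code must not be empty.", nameof(stockCode));

        string sym = stockCode.Trim().TrimStart('0');
        if (sym.Length == 0)
            throw new ArgumentException("Stock code must contain a non-zero digit.", nameof(stockCode));
        return sym;
    }

    // Builds the quote URL for one stock code, using the URL template from the article.
    public static string BuildQuoteUrl(string stockCode, string token)
    {
        return string.Format(
            "https://www1.hkex.com.hk/hkexwidget/data/getequityquote?sym={0}&token={1}&lang=eng&qid=NULL&callback=NULL",
            ToSym(stockCode), token);
    }
}
```

For example, `HkexUrl.ToSym("00700")` yields `"700"`: only the leading zeros are dropped, and zeros inside or at the end of the code are kept.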

Python version:

Scraping HKEX Stock Quote Data with Python (Python source included)
