http-parser: an HTTP parsing library

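I. Introduction to http-parser

1. Overview

http-parser is an HTTP message parser written in C. It parses both requests and responses and is designed for high-performance HTTP applications. It makes no system calls and no memory allocations, it does not buffer data, and it can be interrupted at any time. Depending on your architecture, it requires only about 40 bytes of data per message stream (in a web server, that means per connection).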

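2. Features

No third-party dependencies
Handles persistent streams (keep-alive)
Decodes chunked encoding
Supports Upgrade
Defends against buffer overflow attacks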

3. Information that can be parsed from HTTP messages

Header fields and values
Content-Length
Request method
Response status code
Transfer-Encoding
HTTP version
Request URL
Message body

II. Usage

1. Download

Official http-parser repository: https://github.com/nodejs/http-parser

Extract the archive:

unzip http-parser-master.zip

2. Build and install

make
make parsertrace
make url_parser
sudo make install

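3. Initialization

One http_parser object is used per TCP connection. Initialize the struct with http_parser_init() and set the callbacks in an http_parser_settings struct, which http_parser_settings_init() zeroes out. The two initialization functions are declared as follows:

void http_parser_init(http_parser *parser, enum http_parser_type type);
void http_parser_settings_init(http_parser_settings *settings);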

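For a request parser, the setup might look like this:

http_parser_settings settings;
http_parser_settings_init(&settings);  /* zero every callback first; unused ones must be NULL */
settings.on_url = my_url_callback;
settings.on_header_field = my_header_field_callback;
/* ... */

http_parser *parser = malloc(sizeof(http_parser));
http_parser_init(parser, HTTP_REQUEST);
parser->data = my_socket;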

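4. Execute and check for errors

When data arrives on the socket, run the parser and check for errors. The execute function is declared as follows:

size_t http_parser_execute(http_parser *parser,
                           const http_parser_settings *settings,
                           const char *data,
                           size_t len);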

Example:

size_t len = 80*1024, nparsed;
char buf[len];
ssize_t recved;

recved = recv(fd, buf, len, 0);

if (recved < 0) {
  /* Handle error. */
}

/* Start up / continue the parser.
 * Note we pass recved==0 to signal that EOF has been received.
 */
nparsed = http_parser_execute(parser, &settings, buf, recved);

if (parser->upgrade) {
  /* handle new protocol */
} else if (nparsed != (size_t)recved) {
  /* Handle error. Usually just close the connection. */
}

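http_parser needs to know where the end of the stream is. For example, some servers send responses without Content-Length and expect the client to consume input (for the body) until EOF. To tell http_parser about EOF, pass 0 as the fourth parameter (len) of http_parser_execute(). Callbacks and errors can still occur during the EOF call, so be prepared to receive them.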

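Scalar-valued message information such as the status code, method, and HTTP version is stored in the parser struct. This data is only held temporarily in http_parser and is reset on each new message. If you need it later, copy it out during the on_headers_complete callback.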

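The parser decodes the transfer encoding of both requests and responses transparently. That is, a chunked encoding is decoded before the data is passed to the on_body callback.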

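III. The special problem of Upgrade

http_parser supports upgrading the connection to a different protocol. An increasingly common example is the WebSocket protocol, which sends a request like:

GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: example.com
Origin: http://example.com
WebSocket-Protocol: sample

To support this, the parser treats the request as a normal HTTP message without a body, issuing both the on_headers_complete and on_message_complete callbacks. However, http_parser_execute() stops parsing at the end of the headers and returns.

After http_parser_execute() returns, the user is expected to check whether parser->upgrade has been set to 1. The non-HTTP data begins in the supplied buffer at the offset given by the return value of http_parser_execute().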

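IV. Callbacks

During the http_parser_execute() call, the callbacks set in http_parser_settings are executed. The parser maintains state and never looks behind, so buffering the data is not necessary. If you need to save certain data for later use, you can do that from the callbacks. There are two types of callbacks:

Notification: typedef int (*http_cb) (http_parser*); Callbacks: on_message_begin, on_headers_complete, on_message_complete, on_chunk_header, on_chunk_complete.
Data: typedef int (*http_data_cb) (http_parser*, const char *at, size_t length); Callbacks: (requests only) on_url, (responses only) on_status, (common) on_header_field, on_header_value, on_body.

Callbacks must return 0 on success. Returning a non-zero value signals an error to the parser and makes it exit immediately.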

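When callbacks need local information passed in, or need to pass local information back out, use the data field of the http_parser object. One example is a thread that handles a socket connection, parses a request, and then sends a response over that socket. By instantiating a thread-local struct containing the relevant data (e.g. the accepted socket, memory allocated for the callbacks to write into, etc.), the parser's callbacks can exchange data between the scope of the thread and the scope of the callback in a thread-safe manner. This allows http_parser to be used in multi-threaded contexts.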

Example:

typedef struct {
  socket_t sock;
  void* buffer;
  int buf_len;
} custom_data_t;

int my_url_callback(http_parser* parser, const char *at, size_t length) {
  /* Access the thread-local custom_data_t struct through parser->data.
   * Use it to save parsed data into the thread-local buffer for later use,
   * or to communicate over the socket.
   */
  custom_data_t *my_data = parser->data;
  ...
  return 0;
}

...

void http_parser_thread(socket_t sock) {
  size_t nparsed = 0;
  /* allocate memory for user data */
  custom_data_t *my_data = malloc(sizeof(custom_data_t));

  /* some information for use by callbacks.
   * achieves thread -> callback information flow */
  my_data->sock = sock;

  /* instantiate a thread-local parser */
  http_parser *parser = malloc(sizeof(http_parser));
  http_parser_init(parser, HTTP_REQUEST); /* initialise parser */
  /* this custom data reference is accessible through the reference to the
   * parser supplied to callback functions */
  parser->data = my_data;

  http_parser_settings settings; /* set up callbacks */
  http_parser_settings_init(&settings); /* zero the unused callbacks */
  settings.on_url = my_url_callback;

  /* execute parser (buf and recved come from recv(), as in the earlier example) */
  nparsed = http_parser_execute(parser, &settings, buf, recved);

  ...
  /* parsed information copied from callback.
   * can now perform action on data copied into thread-local memory from callbacks.
   * achieves callback -> thread information flow */
  my_data->buffer;
  ...
}

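If you parse an HTTP message in pieces (i.e. read() the request line from the socket, parse, read half of the headers, parse, and so on), your data callbacks may be called more than once for the same element. http_parser guarantees that the data pointer is valid only for the lifetime of the callback. If it suits your application, you can also read() into a heap-allocated buffer to avoid copying memory around.

Reading headers can be tricky when they are read and parsed in pieces. Basically, you need to remember whether the last header callback was a field or a value, and apply the following logic: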

(on_header_field and on_header_value shortened to on_h_*)
 ------------------------ ------------ --------------------------------------------
| State (prev. callback) | Callback   | Description/action                         |
 ------------------------ ------------ --------------------------------------------
| nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
|                        |            | into it                                    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_field | New header started.                        |
|                        |            | Copy current name,value buffers to headers |
|                        |            | list and allocate new buffer for new name  |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_field | Previous name continues. Reallocate name   |
|                        |            | buffer and append callback data to it      |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_value | Value for current header started. Allocate |
|                        |            | new buffer and copy callback data to it    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_value | Value continues. Reallocate value buffer   |
|                        |            | and append callback data to it             |
 ------------------------ ------------ --------------------------------------------

Also: http_parser_parse_url() provides a simple zero-copy URL parser. Users of this library may want to use it to parse URLs constructed from the on_url callback.

V. Example

#include <http_parser.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <time.h>

static http_parser *parser;

int on_message_begin(http_parser* _) {
  (void)_;
  printf("\n***MESSAGE BEGIN***\n\n");
  return 0;
}

int on_headers_complete(http_parser* _) {
  (void)_;
  printf("\n***HEADERS COMPLETE***\n\n");
  return 0;
}

int on_message_complete(http_parser* _) {
  (void)_;
  printf("\n***MESSAGE COMPLETE***\n\n");
  return 0;
}

int on_url(http_parser* _, const char* at, size_t length) {
  (void)_;
  printf("Url: %.*s\n", (int)length, at);
  return 0;
}

int on_status(http_parser* _, const char* at, size_t length) {
  (void)_;
  printf("Status: %.*s\n", (int)length, at);
  return 0;
}

int on_header_field(http_parser* _, const char* at, size_t length) {
  (void)_;
  printf("Header field: %.*s\n", (int)length, at);
  return 0;
}

int on_header_value(http_parser* _, const char* at, size_t length) {
  (void)_;
  printf("Header value: %.*s\n", (int)length, at);
  return 0;
}

int on_body(http_parser* _, const char* at, size_t length) {
  (void)_;
  printf("Body: %.*s\n", (int)length, at);
  return 0;
}

int on_chunk_header(http_parser* _) {
  (void)_;
  printf("\n***CHUNK HEADER***\n\n");
  return 0;
}

int on_chunk_complete(http_parser* _) {
  (void)_;
  printf("\n***CHUNK COMPLETE***\n\n");
  return 0;
}

// Callbacks for http_parser. To get at HEADER or BODY data, handle it in these callbacks.
// A leading "." names a member of the struct being initialized (a designated
// initializer), much like object.member. In C the named members may appear in any
// order; C++20 requires declaration order, which this example follows. Without
// names, the values must follow the declaration order.
static http_parser_settings settings_null = {
  .on_message_begin = on_message_begin,    // same as settings_null.on_message_begin = on_message_begin
  .on_url = on_url,
  .on_status = on_status,
  .on_header_field = on_header_field,
  .on_header_value = on_header_value,
  .on_headers_complete = on_headers_complete,
  .on_body = on_body,
  .on_message_complete = on_message_complete,
  .on_chunk_header = on_chunk_header,
  .on_chunk_complete = on_chunk_complete
};

int main(void)
{
  const char* buf;
  float start, end;
  size_t parsed;

  parser = (http_parser *)malloc(sizeof(http_parser));

  buf = "GET http://admin.omsg.cn/uploadpic/2016121034000012.png HTTP/1.1\r\n"
        "Host: admin.omsg.cn\r\n"
        "Accept:*/*\r\n"
        "Connection: Keep-Alive\r\n"
        "\r\n";

  start = (float)clock() / CLOCKS_PER_SEC;
  for (int i = 0; i < 1; i++) {
    // Initialize the parser as a request parser
    http_parser_init(parser, HTTP_REQUEST);
    // Run the parser
    parsed = http_parser_execute(parser, &settings_null, buf, strlen(buf));
  }
  end = (float)clock() / CLOCKS_PER_SEC;

  buf = "HTTP/1.1 200 OK\r\n"
        "Date: Tue, 04 Aug 2009 07:59:32 GMT\r\n"
        "Server: Apache\r\n"
        "X-Powered-By: Servlet/2.5 JSP/2.1\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        "Connection: close\r\n"
        "\r\n"
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">\n"
        "  <SOAP-ENV:Body>\n"
        "    <SOAP-ENV:Fault>\n"
        "       <faultcode>SOAP-ENV:Client</faultcode>\n"
        "       <faultstring>Client Error</faultstring>\n"
        "    </SOAP-ENV:Fault>\n"
        "  </SOAP-ENV:Body>\n"
        "</SOAP-ENV:Envelope>";

  // Initialize the parser as a response parser
  http_parser_init(parser, HTTP_RESPONSE);
  // Run the parser
  parsed = http_parser_execute(parser, &settings_null, buf, strlen(buf));
  // The response has no Content-Length, so signal EOF with a length of 0
  parsed = http_parser_execute(parser, &settings_null, buf, 0);

  free(parser);
  parser = NULL;
  printf("Elapsed %f seconds.\n", (end - start));
  return 0;
}

Compile:

gcc -Wall -Wextra -Werror -Wno-error=unused-but-set-variable -O3 http-parser-master/http_parser.o demo.cpp -o demo

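Running ./demo prints a line for every callback as it fires, first for the request and then for the response, followed by the elapsed time.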

Reference: https://github.com/nodejs/http-parser