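Server Setup Notes: Parsing Simple Requests with an Apache Module

Generally, a server parses every incoming request to determine whether it is valid and how it should be routed. This section explains how to obtain the request data. (If you repost this, please credit breaksoftware's CSDN blog.)

Using the method from "Server Setup Notes: Compiling Apache and Its Modules", we create a Handler project named get_request. The entry function we can work with in this project is: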

static int get_request_handler(request_rec *r)
{
    r->content_type = "text/html";
    /* ... remainder of the handler omitted in this excerpt ... */
}

The only data this entry function gives us directly is r, a pointer to a request_rec structure. Looking it up in the source code, we find its definition:

/**
 * @brief A structure that represents the current request
 */
struct request_rec {
    /** The pool associated with the request */
    apr_pool_t *pool;
    /** The connection to the client */
    conn_rec *connection;
    /** The virtual host for this request */
    server_rec *server;

    /** Pointer to the redirected request if this is an external redirect */
    request_rec *next;
    /** Pointer to the previous request if this is an internal redirect */
    request_rec *prev;

    /** Pointer to the main request if this is a sub-request
     * (see http_request.h) */
    request_rec *main;

    /* Info about the request itself... we begin with stuff that only
     * protocol.c should ever touch...
     */
    /** First line of request */
    char *the_request;
    /** HTTP/0.9, "simple" request (e.g. GET /foo\n w/no headers) */
    int assbackwards;
    /** A proxy request (calculated during post_read_request/translate_name)
     *  possible values PROXYREQ_NONE, PROXYREQ_PROXY, PROXYREQ_REVERSE,
     *                  PROXYREQ_RESPONSE
     */
    int proxyreq;
    /** HEAD request, as opposed to GET */
    int header_only;
    /** Protocol version number of protocol; 1.1 = 1001 */
    int proto_num;
    /** Protocol string, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Host, as set by full URI or Host: */
    const char *hostname;

    /** Time when the request started */
    apr_time_t request_time;

    /** Status line, if set by script */
    const char *status_line;
    /** Status line */
    int status;

    /* Request method, two ways; also, protocol, etc..  Outside of protocol.c,
     * look, but don't touch.
     */

    /** M_GET, M_POST, etc. */
    int method_number;
    /** Request method (eg. GET, HEAD, POST, etc.) */
    const char *method;

    /**
     *  'allowed' is a bitvector of the allowed methods.
     *
     *  A handler must ensure that the request method is one that
     *  it is capable of handling.  Generally modules should DECLINE
     *  any request methods they do not handle.  Prior to aborting the
     *  handler like this the handler should set r->allowed to the list
     *  of methods that it is willing to handle.  This bitvector is used
     *  to construct the "Allow:" header required for OPTIONS requests,
     *  and HTTP_METHOD_NOT_ALLOWED and HTTP_NOT_IMPLEMENTED status codes.
     *
     *  Since the default_handler deals with OPTIONS, all modules can
     *  usually decline to deal with OPTIONS.  TRACE is always allowed,
     *  modules don't need to set it explicitly.
     *
     *  Since the default_handler will always handle a GET, a
     *  module which does *not* implement GET should probably return
     *  HTTP_METHOD_NOT_ALLOWED.  Unfortunately this means that a Script GET
     *  handler can't be installed by mod_actions.
     */
    apr_int64_t allowed;
    /** Array of extension methods */
    apr_array_header_t *allowed_xmethods;
    /** List of allowed methods */
    ap_method_list_t *allowed_methods;

    /** byte count in stream is for body */
    apr_off_t sent_bodyct;
    /** body byte count, for easy access */
    apr_off_t bytes_sent;
    /** Last modified time of the requested resource */
    apr_time_t mtime;

    /* HTTP/1.1 connection-level features */

    /** The Range: header */
    const char *range;
    /** The "real" content length */
    apr_off_t clength;
    /** sending chunked transfer-coding */
    int chunked;

    /** Method for reading the request body
     * (eg. REQUEST_CHUNKED_ERROR, REQUEST_NO_BODY,
     *  REQUEST_CHUNKED_DECHUNK, etc...) */
    int read_body;
    /** reading chunked transfer-coding */
    int read_chunked;
    /** is client waiting for a 100 response? */
    unsigned expecting_100;
    /** The optional kept body of the request. */
    apr_bucket_brigade *kept_body;
    /** For ap_body_to_table(): parsed body */
    /* XXX: ap_body_to_table has been removed. Remove body_table too or
     * XXX: keep it to reintroduce ap_body_to_table without major bump? */
    apr_table_t *body_table;
    /** Remaining bytes left to read from the request body */
    apr_off_t remaining;
    /** Number of bytes that have been read  from the request body */
    apr_off_t read_length;

    /* MIME header environments, in and out.  Also, an array containing
     * environment variables to be passed to subprocesses, so people can
     * write modules to add to that environment.
     *
     * The difference between headers_out and err_headers_out is that the
     * latter are printed even on error, and persist across internal redirects
     * (so the headers printed for ErrorDocument handlers will have them).
     *
     * The 'notes' apr_table_t is for notes from one module to another, with no
     * other set purpose in mind...
     */

    /** MIME header environment from the request */
    apr_table_t *headers_in;
    /** MIME header environment for the response */
    apr_table_t *headers_out;
    /** MIME header environment for the response, printed even on errors and
     * persist across internal redirects */
    apr_table_t *err_headers_out;
    /** Array of environment variables to be used for sub processes */
    apr_table_t *subprocess_env;
    /** Notes from one module to another */
    apr_table_t *notes;

    /* content_type, handler, content_encoding, and all content_languages
     * MUST be lowercased strings.  They may be pointers to static strings;
     * they should not be modified in place.
     */
    /** The content-type for the current request */
    const char *content_type;   /* Break these out --- we dispatch on 'em */
    /** The handler string that we use to call a handler function */
    const char *handler;        /* What we *really* dispatch on */

    /** How to encode the data */
    const char *content_encoding;
    /** Array of strings representing the content languages */
    apr_array_header_t *content_languages;

    /** variant list validator (if negotiated) */
    char *vlist_validator;

    /** If an authentication check was made, this gets set to the user name. */
    char *user;
    /** If an authentication check was made, this gets set to the auth type. */
    char *ap_auth_type;

    /* What object is being requested (either directly, or via include
     * or content-negotiation mapping).
     */

    /** The URI without any parsing performed */
    char *unparsed_uri;
    /** The path portion of the URI, or "/" if no path provided */
    char *uri;
    /** The filename on disk corresponding to this response */
    char *filename;
    /* XXX: What does this mean? Please define "canonicalize" -aaron */
    /** The true filename, we canonicalize r->filename if these don't match */
    char *canonical_filename;
    /** The PATH_INFO extracted from this request */
    char *path_info;
    /** The QUERY_ARGS extracted from this request */
    char *args;

    /**
     * Flag for the handler to accept or reject path_info on
     * the current request.  All modules should respect the
     * AP_REQ_ACCEPT_PATH_INFO and AP_REQ_REJECT_PATH_INFO
     * values, while AP_REQ_DEFAULT_PATH_INFO indicates they
     * may follow existing conventions.  This is set to the
     * user's preference upon HOOK_VERY_FIRST of the fixups.
     */
    int used_path_info;

    /** A flag to determine if the eos bucket has been sent yet */
    int eos_sent;

    /* Various other config info which may change with .htaccess files
     * These are config vectors, with one void* pointer for each module
     * (the thing pointed to being the module's business).
     */

    /** Options set in config files, etc. */
    struct ap_conf_vector_t *per_dir_config;
    /** Notes on *this* request */
    struct ap_conf_vector_t *request_config;

    /** Optional request log level configuration. Will usually point
     *  to a server or per_dir config, i.e. must be copied before
     *  modifying */
    const struct ap_logconf *log;

    /** Id to identify request in access and error log. Set when the first
     *  error log entry for this request is generated.
     */
    const char *log_id;

    /**
     * A linked list of the .htaccess configuration directives
     * accessed by this request.
     * N.B. always add to the head of the list, _never_ to the end.
     * that way, a sub request's list can (temporarily) point to a parent's list
     */
    const struct htaccess_result *htaccess;

    /** A list of output filters to be used for this request */
    struct ap_filter_t *output_filters;
    /** A list of input filters to be used for this request */
    struct ap_filter_t *input_filters;

    /** A list of protocol level output filters to be used for this
     *  request */
    struct ap_filter_t *proto_output_filters;
    /** A list of protocol level input filters to be used for this
     *  request */
    struct ap_filter_t *proto_input_filters;

    /** This response can not be cached */
    int no_cache;
    /** There is no local copy of this response */
    int no_local_copy;

    /** Mutex protect callbacks registered with ap_mpm_register_timed_callback
     * from being run before the original handler finishes running
     */
    apr_thread_mutex_t *invoke_mtx;

    /** A struct containing the components of URI */
    apr_uri_t parsed_uri;
    /**  finfo.protection (st_mode) set to zero if no such file */
    apr_finfo_t finfo;

    /** remote address information from conn_rec, can be overridden if
     * necessary by a module.
     * This is the address that originated the request.
     */
    apr_sockaddr_t *useragent_addr;
    char *useragent_ip;

    /** MIME trailer environment from the request */
    apr_table_t *trailers_in;
    /** MIME trailer environment from the response */
    apr_table_t *trailers_out;
};

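This is a very large structure that covers nearly everything about a request. A beginner will find it hard to understand every field. Our needs are simple, so we list only the fields we are likely to care about.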

    /** First line of request */
    char *the_request;

The first line of the request.

    /** Protocol version number of protocol; 1.1 = 1001 */
    int proto_num;
    /** Protocol string, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Host, as set by full URI or Host: */
    const char *hostname;

The protocol version number, the protocol string, and the host name.
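The comment "1.1 = 1001" suggests that proto_num packs the version as major * 1000 + minor. A minimal sketch of unpacking it, assuming that encoding holds; proto_major and proto_minor are hypothetical helper names, not part of the httpd API:

```c
#include <assert.h>

/* Hypothetical helpers, assuming proto_num = major * 1000 + minor
 * (for example, HTTP/1.1 is stored as 1001). */
static int proto_major(int proto_num)
{
    assert(proto_num >= 0);
    return proto_num / 1000;
}

static int proto_minor(int proto_num)
{
    assert(proto_num >= 0);
    return proto_num % 1000;
}
```

Inside a handler, these could be called as proto_major(r->proto_num) and proto_minor(r->proto_num) to print the version in a readable form.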

    /** Time when the request started */
    apr_time_t request_time;

The time the request started.

    /** The URI without any parsing performed */
    char *unparsed_uri;
    /** The path portion of the URI, or "/" if no path provided */
    char *uri;
    /** The filename on disk corresponding to this response */
    char *filename;

The raw URI exactly as received (not URL-decoded, query string included), the URL-decoded path portion of the URI, and the path of the file on disk that serves the request.
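To make the difference between unparsed_uri and uri concrete, here is a minimal percent-decoding sketch. It is only an illustration of the idea: percent_decode and hex_value are hypothetical names, and httpd's own decoder applies additional checks that this sketch omits.

```c
#include <assert.h>
#include <string.h>

/* Returns the numeric value of a hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes %XX escapes from src into dst.
 * dst must hold at least strlen(src) + 1 bytes.
 * Returns 0 on success, -1 on a malformed escape. */
static int percent_decode(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src) {
        if (*src == '%') {
            int hi = hex_value(src[1]);
            /* Only look at the second digit if the first one was valid,
             * so we never read past the terminating NUL. */
            int lo = (hi < 0) ? -1 : hex_value(src[2]);
            if (hi < 0 || lo < 0) {
                return -1;
            }
            *dst++ = (char)(hi * 16 + lo);
            src += 3;
        }
        else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return 0;
}
```

Decoding the path "/AP%26AC%3aHE" with this function gives "/AP&AC:HE", since %26 is '&' and %3a is ':'.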

    /** The PATH_INFO extracted from this request */
    char *path_info;
    /** The QUERY_ARGS extracted from this request */
    char *args;

The extra path information (PATH_INFO) and the query string of the request.
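The args field holds the query string as a single string such as "a=b". A minimal sketch of looking up one parameter in it, assuming the usual "k1=v1&k2=v2" layout; query_get is a hypothetical helper, and the value it returns is still percent-encoded:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds `key` in a query string of the form
 * "k1=v1&k2=v2" and copies its raw value into out.
 * Returns 1 if found, 0 if the key is absent or the value does not fit. */
static int query_get(const char *args, const char *key,
                     char *out, size_t out_len)
{
    assert(key != NULL && out != NULL);
    if (args == NULL) {
        return 0;               /* treat a missing query string as "not found" */
    }

    size_t key_len = strlen(key);
    const char *p = args;
    while (*p) {
        const char *end = strchr(p, '&');
        size_t pair_len = end ? (size_t)(end - p) : strlen(p);

        /* A match is "key" immediately followed by '='. */
        if (pair_len > key_len &&
            strncmp(p, key, key_len) == 0 && p[key_len] == '=') {
            size_t val_len = pair_len - key_len - 1;
            if (val_len >= out_len) {
                return 0;
            }
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 1;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return 0;
}
```

In a handler this could be called as query_get(r->args, "a", buf, sizeof(buf)); the NULL check is there as a precaution in case the request carries no query string.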

    /** A struct containing the components of URI */
    apr_uri_t parsed_uri;

The detailed result of parsing the URI.

    char *useragent_ip;

The IP address the request came from.

    /** MIME header environment from the request */
    apr_table_t *headers_in;

The HTTP request headers, stored as a table.

For the basic data types, a routine is easy to write:

    if (r->the_request) {
        ap_rprintf(r, "the request : %s\n", r->the_request);
    }
    else {
        ap_rprintf(r, "the request is NULL\n");
    }
    if (r->protocol) {
        ap_rprintf(r, "protocol : %s\n", r->protocol);
    }
    else {
        ap_rprintf(r, "protocol is NULL\n");
    }
    ap_rprintf(r, "proto_num is %d\n", r->proto_num);

The request time is of type apr_time_t; for background, see the module introduction in "Server Setup Notes: Apache Module Development Basics". After reading the source, we can write the following routine:

/* Requires <string.h> for memset() and apr_time.h for the time functions. */
static void print_time(request_rec *r)
{
    /* Without a request there is nowhere to write output, so just return. */
    if (!r) {
        return;
    }

    /* apr_ctime() needs at least APR_CTIME_LEN bytes and formats the
     * time in the server's local time zone. */
    char data_str[128] = {0};
    apr_status_t status = apr_ctime(data_str, r->request_time);
    if (APR_SUCCESS != status) {
        ap_rprintf(r, "apr_ctime error\n");
    }
    else {
        ap_rprintf(r, "ctime\t:\t%s\n", data_str);
    }

    /* apr_time_exp_gmt() expands the time in GMT, so tm_hour can differ
     * from the hour printed by apr_ctime() above. */
    apr_time_exp_t exp_t;
    memset(&exp_t, 0, sizeof(exp_t));
    status = apr_time_exp_gmt(&exp_t, r->request_time);
    if (APR_SUCCESS != status) {
        ap_rprintf(r, "apr_time_exp_gmt error\n");
    }
    else {
        ap_rprintf(r, "exp time\t:\n");
        ap_rprintf(r, "\ttm_usec\t:\t%d\n", exp_t.tm_usec);
        ap_rprintf(r, "\ttm_sec\t:\t%d\n", exp_t.tm_sec);
        ap_rprintf(r, "\ttm_min\t:\t%d\n", exp_t.tm_min);
        ap_rprintf(r, "\ttm_hour\t:\t%d\n", exp_t.tm_hour);
        ap_rprintf(r, "\ttm_mday\t:\t%d\n", exp_t.tm_mday);
        ap_rprintf(r, "\ttm_mon\t:\t%d\n", exp_t.tm_mon);
        ap_rprintf(r, "\ttm_year\t:\t%d\n", exp_t.tm_year);
        ap_rprintf(r, "\ttm_wday\t:\t%d\n", exp_t.tm_wday);
        ap_rprintf(r, "\ttm_yday\t:\t%d\n", exp_t.tm_yday);
        ap_rprintf(r, "\ttm_isdst\t:\t%d\n", exp_t.tm_isdst);
        ap_rprintf(r, "\ttm_gmtoff\t:\t%d\n", exp_t.tm_gmtoff);
    }
}

apr_time_exp_t is defined in apr_time.h.

/**
 * a structure similar to ANSI struct tm with the following differences:
 *  - tm_usec isn't an ANSI field
 *  - tm_gmtoff isn't an ANSI field (it's a BSDism)
 */
struct apr_time_exp_t {
    /** microseconds past tm_sec */
    apr_int32_t tm_usec;
    /** (0-61) seconds past tm_min */
    apr_int32_t tm_sec;
    /** (0-59) minutes past tm_hour */
    apr_int32_t tm_min;
    /** (0-23) hours past midnight */
    apr_int32_t tm_hour;
    /** (1-31) day of the month */
    apr_int32_t tm_mday;
    /** (0-11) month of the year */
    apr_int32_t tm_mon;
    /** year since 1900 */
    apr_int32_t tm_year;
    /** (0-6) days since Sunday */
    apr_int32_t tm_wday;
    /** (0-365) days since January 1 */
    apr_int32_t tm_yday;
    /** daylight saving time */
    apr_int32_t tm_isdst;
    /** seconds east of UTC */
    apr_int32_t tm_gmtoff;
};
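Because tm_year counts from 1900 and tm_mon runs from 0 to 11, the raw fields need a small adjustment before they read as a calendar date. A minimal sketch of that adjustment; struct exp_fields is a hypothetical stand-in for the relevant apr_time_exp_t fields so that the example compiles without APR, and format_exp_time is likewise a made-up name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the apr_time_exp_t fields used here. */
struct exp_fields {
    int tm_year;   /* years since 1900 */
    int tm_mon;    /* 0-11 */
    int tm_mday;   /* 1-31 */
    int tm_hour;   /* 0-23 */
    int tm_min;    /* 0-59 */
    int tm_sec;    /* 0-61 */
};

/* Writes "YYYY-MM-DD hh:mm:ss" into buf and returns snprintf's result. */
static int format_exp_time(const struct exp_fields *t, char *buf, size_t len)
{
    assert(t != NULL && buf != NULL);
    return snprintf(buf, len, "%04d-%02d-%02d %02d:%02d:%02d",
                    t->tm_year + 1900,   /* convert to a calendar year */
                    t->tm_mon + 1,       /* convert to a 1-12 month */
                    t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec);
}
```

For example, tm_year = 115, tm_mon = 1, tm_mday = 16 with a time of 10:20:39 formats as "2015-02-16 10:20:39".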

A routine for the parsed URI structure apr_uri_t is just as simple, so I will not list it here and only paste the structure definition, which is self-explanatory.

/**
 * A structure to encompass all of the fields in a uri
 */
struct apr_uri_t {
    /** scheme ("http"/"ftp"/...) */
    char *scheme;
    /** combined [user[:password]\@]host[:port] */
    char *hostinfo;
    /** user name, as in http://user:passwd\@host:port/ */
    char *user;
    /** password, as in http://user:passwd\@host:port/ */
    char *password;
    /** hostname from URI (or from Host: header) */
    char *hostname;
    /** port string (integer representation is in "port") */
    char *port_str;
    /** the request path (or NULL if only scheme://host was given) */
    char *path;
    /** Everything after a '?' in the path, if present */
    char *query;
    /** Trailing "#fragment" string, if present */
    char *fragment;

    /** structure returned from gethostbyname() */
    struct hostent *hostent;

    /** The port number, numeric, valid only if port_str != NULL */
    apr_port_t port;

    /** has the structure been initialized */
    unsigned is_initialized:1;

    /** has the DNS been looked up yet */
    unsigned dns_looked_up:1;
    /** has the dns been resolved yet */
    unsigned dns_resolved:1;
};

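The troublesome part of these routines is reading apr_table_t. Code that traverses this table is hard to find online, so I referred to the code in apr_table_clone and arrived at the following:

static void print_table(request_rec *r, const apr_table_t *t)
{
    /* A table is backed by an array of apr_table_entry_t key/value pairs. */
    const apr_array_header_t *array = apr_table_elts(t);
    const apr_table_entry_t *elts = (const apr_table_entry_t *)array->elts;
    for (int i = 0; i < array->nelts; i++) {
        ap_rprintf(r, "\t%s : %s\n", elts[i].key, elts[i].val);
    }
}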

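We request the URL http://192.168.191.129/AP%26AC%3aHE?a=b#c

The response is shown below. The fragment (#c) is handled by the browser and is not sent to the server, which is why fragment is NULL in the output.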

headers_in start
    Host : 192.168.191.129
    Connection : keep-alive
    Cache-Control : max-age=0
    Accept : text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    User-Agent : Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
    Accept-Encoding : gzip,deflate,sdch
    Accept-Language : zh-CN,zh;q=0.8
headers_in end
headers_out start
headers_out end
the request : GET /AP%26AC%3aHE?a=b HTTP/1.1
protocol : HTTP/1.1
proto_num is 1001
method : GET
host name : 192.168.191.129
unparsed uri : /AP%26AC%3aHE?a=b
uri : /AP&AC:HE
filename : /usr/local/apache2/htdocs/AP&AC:HE
path info :
args : a=b
user is NULL
log id is NULL
useragent ip : 192.168.191.1
ctime   :   Mon Feb 16 18:20:39 2015
exp time    :
    tm_usec    :   200039
    tm_sec     :   39
    tm_min     :   20
    tm_hour    :   10
    tm_mday    :   16
    tm_mon     :   1
    tm_year    :   115
    tm_wday    :   1
    tm_yday    :   46
    tm_isdst   :   0
    tm_gmtoff  :   0
scheme is NULL
hostinfo is NULL
user is NULL
password is NULL
hostname is NULL
port_str is NULL
path : /AP&AC:HE
query : a=b
fragment is NULL
The sample page from mod_get_request.c
