[Translation] Emiller's Guide to Nginx Module Development (Part 1)

This article is widely regarded as the standard tutorial for Nginx module development, so I translated it in the hope that it helps.

Original article: http://www.evanmiller.org/nginx-modules-guide.html

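This first part covers Nginx basics. It won't do much for developers who already know Nginx, but if you're new to it, I'd recommend reading it carefully, and ideally reading around the topic as well.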

 

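Because of formatting issues, this post may be uncomfortable to read here; you can read my Google Docs directly:

Chinese version       Chinese-English side-by-side version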

Emiller's Guide To Nginx Module Development


By Evan Miller

 

DRAFT: August 13, 2009 (changes)

Bruce Wayne: What's that?

Lucius Fox: The Tumbler? Oh... you wouldn't be interested in that.

To fully appreciate Nginx, the web server, it helps to understand Batman, the comic book character.

Batman is fast. Nginx is fast. Batman fights crime. Nginx fights wasted CPU cycles and memory leaks. Batman performs well under pressure. Nginx, for its part, excels under heavy server loads.

But Batman would be almost nothing without the Batman utility belt.

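The author opens with some banter about how much Nginx resembles Batman: both are fast, Nginx is brilliant at sparing CPU cycles and memory, and it keeps working happily under enormous pressure. But Batman depends on his belt; without it he'd be almost nothing.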

 

Figure 1: The Batman utility belt, gripping Christian Bale's love handles.

Figure 1: Batman's utility belt, and so on.

 

At any given time, Batman's utility belt might contain a lock pick, several batarangs, bat-cuffs, a bat-tracer, bat-darts, night vision goggles, thermite grenades, smoke pellets, a flashlight, a kryptonite ring, an acetylene torch, or an Apple iPhone. When Batman needs to tranquilize, blind, deafen, stun, track, stop, smoke out, or text-message the enemy, you better believe he's reaching down for his bat-belt. The belt is so crucial to Batman's operations that if Batman had to choose between wearing pants and wearing the utility belt, he would definitely choose the belt. In fact, he *did* choose the utility belt, and that's why Batman wears rubber tights instead of pants (Fig. 1).

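If you're curious about Batman's utility belt, go watch the movies. I'm not translating this paragraph, mainly because I couldn't follow it...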

Instead of a utility belt, Nginx has a module chain. When Nginx needs to gzip or chunk-encode a response, it whips out a module to do the work. When Nginx blocks access to a resource based on IP address or HTTP auth credentials, a module does the deflecting. When Nginx communicates with Memcache or FastCGI servers, a module is the walkie-talkie.

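In place of Batman's belt, Nginx has a "module chain" (Translator's note: the chain merely holds the modules; it has little to do with how they are invoked or run). When Nginx needs to gzip or chunk-encode a response (Translator's note: for background on gzip and chunked encoding, see the articles at http://www.w3schools.com), it calls a module to get it done. When Nginx blocks access to a resource based on IP address or HTTP auth credentials, that too is handled by a module on the chain. And when Nginx talks to Memcache or FastCGI servers, a module acts as the communication tool.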

Batman's utility belt holds a lot of doo-hickeys, but occasionally Batman needs a new tool. Maybe there's a new enemy against whom bat-cuffs and batarangs are ineffectual. Or Batman needs a new ability, like being able to breathe underwater. That's when Batman rings up Lucius Fox to engineer the appropriate bat-gadget.

Another paragraph about Batman... untranslated as usual. And I couldn't follow it either...

Figure 2: Bruce Wayne (née Batman) consults with his engineer, Lucius Fox

The purpose of this guide is to teach you the details of Nginx's module chain, so that you may be like Lucius Fox. When you're done with the guide, you'll be able to design and produce high-quality modules that enable Nginx to do things it couldn't do before. Nginx's module system has a lot of nuance and nitty-gritty, so you'll probably want to refer back to this document often. I have tried to make the concepts as clear as possible, but I'll be blunt, writing Nginx modules can still be hard work.

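The gist of this paragraph: once you've read this guide, you should be able to write some fairly impressive extensions that get Nginx to do new things. But Nginx has its share of unwritten rules, and you'll need a few tricks to handle it comfortably, so you'll probably come back to this document again and again. Finally, the author says that writing Nginx modules is still hard work (Translator's note: a standard platitude).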

 

But whoever said making bat-tools would be easy?

(Translator's note: but thanks to this article I really did learn about Batman... thank you, author...)

 

Table of Contents

(Translator's note: the table of contents is left untranslated.)

Prerequisites

High-Level Overview of Nginx's Module Delegation

Components of an Nginx Module

Module Configuration Struct(s)

Module Directives

The Module Context

create_loc_conf

merge_loc_conf

The Module Definition

Module Installation

Handlers

Anatomy of a Handler (Non-proxying)

Getting the location configuration

Generating a response

Sending the header

Sending the body

Anatomy of an Upstream (a.k.a. Proxy) Handler

Summary of upstream callbacks

The create_request callback

The process_header callback

Keeping state

Handler Installation

Filters

Anatomy of a Header Filter

Anatomy of a Body Filter

Filter Installation

Load-Balancers

The enabling directive

The registration function

The upstream initialization function

The peer initialization function

The load-balancing function

The peer release function

Writing and Compiling a New Nginx Module

Advanced Topics

Code References

 

0. Prerequisites

0. Warm-Up (Translator's note: please forgive my blunt rendering of this heading)

 

You should be comfortable with C. Not just "C-syntax"; you should know your way around a struct and not be scared off by pointers and function references, and be cognizant of the preprocessor. If you need to brush up, nothing beats K&R.

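Writing Nginx extensions requires a solid grasp of C. Note that this means more than C syntax: you should be at home with structs, pointers, function references, and the preprocessor. If you're not there yet, read K&R (Translator's note: the author's recommendation; I haven't read it myself).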

 

Basic understanding of HTTP is useful. You'll be working on a web server, after all.

Some knowledge of the HTTP protocol is useful too. You're working on a web server, after all.

 

You should also be familiar with Nginx's configuration file. If you're not, here's the gist of it: there are four contexts (called main, server, upstream, and location) which can contain directives with one or more arguments. Directives in the main context apply to everything; directives in the server context apply to a particular host/port; directives in the upstream context refer to a set of backend servers; and directives in a location context apply only to matching web locations (e.g., "/", "/images", etc.) A location context inherits from the surrounding server context, and a server context inherits from the main context. The upstream context neither inherits nor imparts its properties; it has its own special directives that don't really apply elsewhere. I'll refer to these four contexts quite a bit, so... don't forget them.

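You also need to know the Nginx configuration file well. Here's a quick rundown anyway (Translator's note: skip this if you already know the config file). The configuration file has four contexts (Translator's note: I kept the original word "context" throughout the Chinese version for lack of a better rendering), and each can contain directives with arguments. Directives in the main context apply to everything; directives in the server context apply to a particular host and port; directives in the upstream context refer to a set of backend servers; and directives in a location context apply only to matching web locations ("/", "/images", and so on). A location context inherits from its surrounding server context (Translator's note: the idea is similar to how VirtualHost settings work in Apache), and a server context inherits from the main context. The upstream context inherits nothing and passes nothing on; it has its own directives that are of no use elsewhere. These four contexts come up a lot, so remember them.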
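The inheritance rule described above (location from server, server from main) can be pictured with a small toy model in C. This is only a sketch: the `toy_conf` struct, the `UNSET` marker, and every function name are made up for illustration and are not Nginx identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: UNSET marks a directive that a context did not specify.
 * None of these names come from Nginx itself. */
#define UNSET (-1)

typedef struct {
    int gzip;         /* 1 = on, 0 = off, UNSET = not specified here */
    int buffer_size;  /* bytes, or UNSET */
} toy_conf;

/* A child's own value wins; otherwise it takes the parent's value. */
static int toy_pick(int child, int parent)
{
    return child != UNSET ? child : parent;
}

/* Build the child's effective settings from its parent context. */
static toy_conf toy_inherit(toy_conf parent, toy_conf child)
{
    toy_conf out;
    out.gzip = toy_pick(child.gzip, parent.gzip);
    out.buffer_size = toy_pick(child.buffer_size, parent.buffer_size);
    return out;
}

/* main -> server -> location, in that order. */
static toy_conf toy_effective_location(toy_conf main_c, toy_conf server_c,
                                       toy_conf loc_c)
{
    toy_conf server_eff = toy_inherit(main_c, server_c);
    return toy_inherit(server_eff, loc_c);
}
```

With main set to `{1, 4096}`, server set to `{UNSET, 8192}`, and location set to `{0, UNSET}`, the location ends up with gzip off (its own value) and a buffer size of 8192 (from the server). An upstream context would simply never pass through these functions, since it neither inherits nor imparts anything.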

 

Let's get started.

Let's begin (Translator's note: finally! Tears of joy.)

 

1. High-Level Overview of Nginx's Module Delegation


 

 

  • Nginx modules have three roles we'll cover:
      • handlers process a request and produce output
      • filters manipulate the output produced by a handler
      • load-balancers choose a backend server to send a request to, when more than one backend server is eligible

 

 

Modules do all of the "real work" that you might associate with a web server: whenever Nginx serves a file or proxies a request to another server, there's a handler module doing the work; when Nginx gzips the output or executes a server-side include, it's using filter modules. The "core" of Nginx simply takes care of all the network and application protocols and sets up the sequence of modules that are eligible to process a request. The de-centralized architecture makes it possible for *you* to make a nice self-contained unit that does something you want.

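Almost everything you'd associate with a web server is done by modules: when Nginx serves a file or forwards a request to another server, a handler module does it; when Nginx gzips output or executes an SSI, it calls a filter module. The Nginx "core" looks after all the network and application protocols, plus the order in which the eligible modules process a request. This loosely coupled architecture (Translator's note: the original says "de-centralized architecture"; I rendered it as "loose architecture" in Chinese, which may not be exact, so substitute your preferred term) makes it easy for developers to build a self-contained unit that does what they want (Translator's note: this is my favorite kind of architecture too. One of my Perl projects uses it and is very flexible, and the LotusPHP framework uses it as well, to powerful effect).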

 

Note: Unlike modules in Apache, Nginx modules are not dynamically linked. (In other words, they're compiled right into the Nginx binary.)

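Note: unlike Apache modules, Nginx modules are not dynamically loaded; they must be compiled into the main program (Translator's note: this makes debugging a little painful, though copying the binary around eases it somewhat).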

 

How does a module get invoked? Typically, at server startup, each handler gets a chance to attach itself to particular locations defined in the configuration; if more than one handler attaches to a particular location, only one will "win" (but a good config writer won't let a conflict happen). Handlers can return in three ways: all is good, there was an error, or it can decline to process the request and defer to default handler (typically something that serves static files).

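How does a module get called? Generally, at server startup, each handler finds the locations that belong to it according to the configuration file and attaches itself; if several handlers try to attach to the same location, only one succeeds (a good configuration file won't let that happen) (Translator's note: don't do anything as silly as letting handlers compete for a location, or you will regret it). A handler can return in three ways: everything is fine, an error occurred, or it declines to process the request and defers to the default handler (typically one that serves static files).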
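To make the three outcomes concrete, here is a minimal sketch in C of a dispatcher that falls back to a default handler when the module declines. The `toy_rc` codes, `hello_handler`, `static_file_handler`, and `toy_dispatch` are all hypothetical names for this illustration; the real Nginx types and return codes are covered later in the guide.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes mirroring the three outcomes in the text. */
typedef enum { TOY_OK, TOY_ERROR, TOY_DECLINED } toy_rc;

typedef toy_rc (*toy_handler)(const char *uri);

/* A made-up module handler that only cares about URIs under /hello. */
static toy_rc hello_handler(const char *uri)
{
    if (uri == NULL) {
        return TOY_ERROR;               /* there was an error */
    }
    if (strncmp(uri, "/hello", 6) != 0) {
        return TOY_DECLINED;            /* not ours: defer to the default */
    }
    return TOY_OK;                      /* all is good */
}

/* Stand-in for the default handler (the one that serves static files). */
static toy_rc static_file_handler(const char *uri)
{
    return uri != NULL ? TOY_OK : TOY_ERROR;
}

/* Returns 1 if the module handled the request, 0 if the default handler
 * did, and -1 if the request ended in an error. */
static int toy_dispatch(toy_handler h, const char *uri)
{
    toy_rc rc = h(uri);
    if (rc == TOY_DECLINED) {
        return static_file_handler(uri) == TOY_OK ? 0 : -1;
    }
    return rc == TOY_OK ? 1 : -1;
}
```

Calling `toy_dispatch(hello_handler, "/hello/world")` takes the module path, while `toy_dispatch(hello_handler, "/images/a.png")` is declined and falls through to the static-file stand-in.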

 

If the handler happens to be a reverse proxy to some set of backend servers, there is room for another type of module: the load-balancer. A load-balancer takes a request and a set of backend servers and decides which server will get the request. Nginx ships with two load-balancing modules: round-robin, which deals out requests like cards at the start of a poker game, and the "IP hash" method, which ensures that a particular client will hit the same backend server across multiple requests.

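If the handler is a reverse proxy, a load-balancer gets its chance to run. The load-balancer takes a request and a set of backend servers and decides which server the request goes to. Nginx ships with two load-balancing modules: round-robin (Translator's note: the author compares it to dealing cards; I don't think it needs an analogy), and IP hash, which ensures that requests from the same IP reach the same machine (Translator's note: in performance tuning, IP hash may turn out to be the more useful one).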
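The two strategies are easy to sketch in C. What follows is a toy version under stated assumptions: backends are identified by index only, the client address is an IPv4 address packed into a `uint32_t`, and the hash is a simple multiplicative one chosen for illustration, not the hash Nginx actually uses. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round-robin state: n backends, next index to hand out. */
typedef struct {
    size_t next;
    size_t n;
} toy_rr;

/* Deal requests out to backends 0, 1, ..., n-1, 0, 1, ... in turn. */
static size_t toy_rr_pick(toy_rr *rr)
{
    assert(rr->n > 0);
    size_t chosen = rr->next;
    rr->next = (rr->next + 1) % rr->n;
    return chosen;
}

/* Map a client address to a backend index. The same address always maps
 * to the same backend as long as n does not change. The multiplier is an
 * arbitrary constant for this sketch. */
static size_t toy_ip_hash_pick(uint32_t client_ip, size_t n)
{
    assert(n > 0);
    uint32_t h = client_ip * 2654435761u;
    return (size_t)(h % n);
}
```

Round-robin needs mutable state but spreads load evenly; the hash needs no state at all but depends on the set of backends staying stable, which is one reason the two are offered as separate choices.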

 

If the handler does not produce an error, the filters are called. Multiple filters can hook into each location, so that (for example) a response can be compressed and then chunked. The order of their execution is determined at compile-time. Filters have the classic "CHAIN OF RESPONSIBILITY" design pattern: one filter is called, does its work, and then calls the next filter, until the final filter is called, and Nginx finishes up the response.

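If the handler doesn't report an error, the filters are called. Any number of filters can hook into each location, so a response can be compressed first and then chunked. The order in which filters run is fixed at compile time (Translator's note: I've forgotten how Apache handles this ordering; I'd like to know, so pointers are welcome). Filters follow the classic "CHAIN OF RESPONSIBILITY" design pattern: one filter is called, does its work, and calls the next, until the last filter is called and Nginx finishes the response.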
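A chain of responsibility can be built from nothing more than function pointers. The sketch below is a toy, not Nginx code: each filter remembers whoever was at the top of the chain when it was installed and calls that as its "next". The filter names, `toy_top_filter`, and `toy_calls` are all invented for this example, and the install order is hard-coded to echo the point that ordering is fixed when the program is built.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef int (*toy_filter)(char *buf, size_t len);

static int toy_calls;  /* how many filters have run, for inspection */

/* Last link in the chain: stands in for writing to the client. */
static int toy_write_filter(char *buf, size_t len)
{
    (void)buf;
    (void)len;
    toy_calls++;
    return 0;
}

/* Head of the chain; starts out pointing at the final link. */
static toy_filter toy_top_filter = toy_write_filter;

static toy_filter upper_next;
static int upper_filter(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (char)toupper((unsigned char)buf[i]);
    }
    toy_calls++;
    return upper_next(buf, len);   /* do our work, then pass it on */
}

static toy_filter dash_next;
static int dash_filter(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ' ') {
            buf[i] = '-';
        }
    }
    toy_calls++;
    return dash_next(buf, len);
}

/* Each filter saves the current head as its next, then becomes the head.
 * The filter installed last therefore runs first. */
static void toy_install_filters(void)
{
    upper_next = toy_top_filter;
    toy_top_filter = upper_filter;
    dash_next = toy_top_filter;
    toy_top_filter = dash_filter;
}
```

After `toy_install_filters()`, calling `toy_top_filter` on a buffer holding `hello nginx` runs the dash filter, then the uppercase filter, then the write stand-in, leaving `HELLO-NGINX` in the buffer.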

 

The really cool part about the filter chain is that each filter doesn't wait for the previous filter to finish; it can process the previous filter's output as it's being produced, sort of like the Unix pipeline. Filters operate on buffers, which are usually the size of a page (4K), although you can change this in your nginx.conf. This means, for example, a module can start compressing the response from a backend server and stream it to the client before the module has received the entire response from the backend. Nice!

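The really impressive thing about the filter chain is that a filter doesn't have to wait for the previous filter to finish completely before working on its output, just like a Unix pipeline (Translator's note: the response moves through as a stream rather than one block, so it is not atomic within the filters, which makes them much more efficient). Filters usually operate on a buffer of 4K, though you can change that in nginx.conf. This means, for example, that a module can start compressing part of the data and sending it to the client before the whole response has arrived from the backend. Awesome!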
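The streaming idea can be shown with a toy producer that hands each page-sized buffer to a filter as soon as it is filled, so the filter never sees (or needs to hold) the whole response. This is a sketch under assumptions: `TOY_PAGE`, `toy_stats`, and both functions are hypothetical, and the "filter" here only counts what passes through it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE 4096  /* buffer size used by this sketch */

typedef struct {
    size_t buffers_seen;  /* number of times the filter was called */
    size_t bytes_seen;    /* total bytes that passed through */
    size_t largest;       /* biggest single buffer the filter saw */
} toy_stats;

/* A filter that just records what it is given, one buffer at a time. */
static void toy_counting_filter(toy_stats *st, const char *buf, size_t len)
{
    (void)buf;
    st->buffers_seen++;
    st->bytes_seen += len;
    if (len > st->largest) {
        st->largest = len;
    }
}

/* Produce `total` bytes of output, passing each buffer to the filter
 * as soon as it is filled instead of waiting for the whole response. */
static void toy_stream(size_t total, toy_stats *st)
{
    char page[TOY_PAGE];
    size_t sent = 0;

    while (sent < total) {
        size_t len = total - sent;
        if (len > TOY_PAGE) {
            len = TOY_PAGE;
        }
        memset(page, 'x', len);          /* pretend to generate content */
        toy_counting_filter(st, page, len);
        sent += len;
    }
}
```

Streaming 10000 bytes this way calls the filter three times (4096 + 4096 + 1808 bytes), and no call ever sees more than one page, which is what lets a compressing filter start work before the backend has finished.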

 

So to wrap up the conceptual overview, the typical processing cycle goes:

Client sends HTTP request → Nginx chooses the appropriate handler based on the location config → (if applicable) load-balancer picks a backend server → Handler does its thing and passes each output buffer to the first filter → First filter passes the output to the second filter → second to third → third to fourth → etc. → Final response sent to client

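To sum up the concepts, the typical processing cycle is:

Client sends an HTTP request → Nginx picks a handler based on the location config → (if applicable) the load-balancer picks a backend server → the handler does its job and passes each output buffer to the first filter → the first filter passes the output to the second filter → second to third → and so on down the chain → the final response is sent back to the client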

 

I say "typically" because Nginx's module invocation is extremely customizable. It places a big burden on module writers to define exactly how and when the module should run (I happen to think too big a burden). Invocation is actually performed through a series of callbacks, and there are a lot of them. Namely, you can provide a function to be executed:

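I say "typical" because Nginx module invocation is highly customizable. It gives module developers enormous freedom to define when and how a module runs (sometimes the freedom seems too great) (Translator's note: the original uses the word "burden"; I prefer to see this control over a module's runtime behavior as a freedom). Invocation happens through a large number of callbacks; that is, you can provide functions to be called: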

 

  • Just before the server reads the config file
  • For every configuration directive for the location and server for which it appears
  • When Nginx initializes the main configuration
  • When Nginx initializes the server (i.e., host/port) configuration
  • When Nginx merges the server configuration with the main configuration
  • When Nginx initializes the location configuration
  • When Nginx merges the location configuration with its parent server configuration
  • When Nginx's master process starts (Translator's note: Nginx runs as one master process with a group of worker processes)
  • When a new worker process starts
  • When a worker process exits
  • When the master exits
  • Handling a request
  • Filtering response headers
  • Filtering the response body
  • Picking a backend server
  • Initiating a request to a backend server
  • Re-initiating a request to a backend server
  • Processing the response from a backend server
  • Finishing an interaction with a backend server
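One way to picture a long list of optional callbacks is a struct of function pointers in which a module fills in only the hooks it cares about and leaves the rest NULL. The sketch below is a toy: `toy_hooks` and its field layout are invented here and are not the real Nginx module structures (the names `create_loc_conf` and `merge_loc_conf` are borrowed from the table of contents, but their signatures below are assumptions).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook table covering a handful of the callbacks listed. */
typedef struct {
    int   (*preconfiguration)(void);                /* before config is read */
    void *(*create_loc_conf)(void);                 /* init location config  */
    int   (*merge_loc_conf)(void *parent, void *child);
    int   (*init_process)(void);                    /* worker starts         */
    void  (*exit_process)(void);                    /* worker exits          */
    int   (*handle_request)(const char *uri);       /* handling a request    */
} toy_hooks;

/* A made-up handler returning an HTTP-like status code. */
static int hello_handle(const char *uri)
{
    return uri != NULL ? 200 : 500;
}

/* The module sets a single hook; every other field stays NULL. */
static const toy_hooks hello_module = {
    .handle_request = hello_handle,
};

/* Count how many hooks a module actually provides. */
static size_t toy_count_hooks(const toy_hooks *m)
{
    size_t n = 0;
    n += m->preconfiguration != NULL;
    n += m->create_loc_conf != NULL;
    n += m->merge_loc_conf != NULL;
    n += m->init_process != NULL;
    n += m->exit_process != NULL;
    n += m->handle_request != NULL;
    return n;
}

/* The "core" skips hooks that are NULL and calls the ones that exist. */
static int toy_run_request(const toy_hooks *m, const char *uri)
{
    if (m->init_process != NULL && m->init_process() != 0) {
        return 500;
    }
    return m->handle_request != NULL ? m->handle_request(uri) : 404;
}
```

Here `hello_module` provides one hook out of six, and `toy_run_request(&hello_module, "/")` still produces a response, which is the point: a module can be useful while ignoring most of the available callbacks.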

 

 

Holy mackerel! It's a bit overwhelming. You've got a lot of power at your disposal, but you can still do something useful using only a couple of these hooks and a couple of corresponding functions. Time to dive into some modules.

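(Translator's note: the author is marveling again.) Wow! That's a lot to take in. You have a great deal of power at your disposal, but you can still get useful work done with only a couple of these hooks and their corresponding functions. Time to dig into some modules.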
