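The Linux Kernel Module Programming Guide

Peter Jay Salzman

Michael Burian

Ori Pomerantz

2007-05-18 ver 2.6.4

The Linux Kernel Module Programming Guide is a free book; you may modify and redistribute it under the terms of the Open Software License, version 1.1. A copy of the license is available at http://opensource.org/licenses/osl.php.

The authors hope this book will be useful, but they offer no warranty of any kind, including fitness for a particular purpose or merchantability.

If you modify this book, you must make your changes, including the source files and the new version information, available to the maintainer, Peter Jay Salzman, p@dirac.org, so that he can merge them and provide consistent revisions to the Linux community.

If you publish or distribute this book commercially, the authors and the Linux Documentation Project (LDP) would greatly appreciate donations, royalties, or printed copies. Contributing in this way shows your support for free software and the LDP.

If you have any questions, please send mail to the address above.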

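Foreword

1. Authorship

This guide was originally written by Ori Pomerantz for the 2.2 kernels. Eventually Ori no longer had time to maintain it; after all, the Linux kernel is a fast-moving target.

Peter Jay Salzman took over maintenance and updated the guide for the 2.4 kernels. Peter in turn did not have time to follow the 2.6 kernels, so Michael Burian became a co-maintainer and updated the guide for 2.6.

2. Versioning and Notes

The Linux kernel keeps changing. Whether this guide should drop material about older kernels was an open question for a long time. Michael Burian and I finally decided to create a branch of the guide for each stable kernel series. So the 2.4.x versions of the guide cover only the 2.4 kernels, and the 2.6.x versions cover only 2.6. Information about earlier kernels will no longer be folded in; if you need it, read the matching version of the guide.

The discussion and source code should apply to most architectures, but I make no promises. One exception is Chapter 12, Interrupt Handlers, which applies only to the x86 architecture.

3. Acknowledgements

The following people contributed corrections and suggestions: Ignacio Martin, David Porter, Daniele Paolo Scarpazza, Dimo Velev, Francois Audeon and Horst Schirmeier.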

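Chapter 1. Introduction

1.1. What Is A Kernel Module?

So, you want to write a kernel module. You know C, you have written a few normal programs, and now you want to get to where the real action is: where a single wild pointer can wipe out your file system and a core dump means a reboot.

What exactly is a kernel module? A module is a piece of code that can be loaded into and unloaded from the kernel on demand.

Modules extend the functionality of the kernel without rebooting the system. One type of module, for example, is the device driver, which lets the kernel access hardware connected to the system.

Without modules, we would have to build monolithic kernels and add new functionality directly into the kernel image. Besides producing larger kernels, this has the disadvantage of requiring us to rebuild the whole kernel every time we want new functionality.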

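1.2. How Do Modules Get Into The Kernel?

Run lsmod to see which modules are already loaded into the kernel; it gets its information by reading the file /proc/modules. How do these modules find their way into the kernel? When the kernel needs a feature that is not resident, the kernel module daemon kmod executes modprobe to load the module in. modprobe is passed a string in one of two forms:

· A module name, such as softdog or ppp.

· A more generic identifier, such as char-major-10-30.

If modprobe is handed a generic identifier, it first looks for that string in the file /etc/modprobe.conf.[2] If it finds an alias line such as:

alias char-major-10-30 softdog

it knows that the generic identifier refers to the module softdog.ko.

Next, modprobe looks through the file /lib/modules/version/modules.dep to see whether other modules must be loaded first. This file is created by depmod -a and contains module dependencies. For example, msdos.ko requires fat.ko to be loaded into the kernel first. Module B depends on module A if module A defines symbols (variables or functions) that module B uses.

Lastly, modprobe uses insmod to load any prerequisite modules first and then the requested module. modprobe directs insmod to /lib/modules/version/[3], the standard directory for modules.

insmod knows nothing about where modules live, whereas modprobe knows the default location and knows how to work out the dependencies so that modules are loaded in the right order. For example, there are two ways to load msdos:

insmod /lib/modules/2.6.11/kernel/fs/fat/fat.ko
insmod /lib/modules/2.6.11/kernel/fs/msdos/msdos.ko

or:

modprobe msdos

As you can see, insmod requires the full pathname and leaves it to you to insert the modules in the right order, while modprobe takes just the module name, without any extension, and figures out everything it needs by reading /lib/modules/version/modules.dep.

Linux distributions provide modprobe, insmod and depmod in a package called module-init-tools. In earlier versions that package was called modutils. The two packages can be installed side by side to support both 2.4 and 2.6 kernels. As long as you are running recent versions of these tools, you need not care about the details.

Now you know how modules get into the kernel. There is a bit more to the story if you want to write modules that depend on other modules (called 'stacking modules'), but that is a relatively advanced topic; we need to cover more basics first.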

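1.2.1. Before We Begin

Before we delve into code, there are a few issues to cover.

Everyone's system is different and everyone has their own groove. Getting your first "hello world" program to compile and load can be a little tricky. Rest assured that once you get over that initial hurdle, it is smooth sailing thereafter.

1.2.1.1. Modversioning

A module compiled for one kernel will not load into a different kernel unless CONFIG_MODVERSIONS is enabled in the kernel. We will cover module versioning later. Until then, if your kernel has modversioning turned on, the examples in this guide may not work. Most stock distribution kernels do come with it turned on, however. If you have trouble loading modules because of versioning errors, compile a kernel with modversioning turned off.

1.2.1.2. Using X

It is highly recommended that you type in, compile and load all the examples in this guide, and that you do so from a console rather than from X.

Modules cannot print to the screen the way printf() does, but they can log information and warnings, which end up on your screen only when you are working on a console. Under X you have to look at the log files yourself. To see this information immediately, do all your work from the console.

1.2.1.3. Compiling Issues and Kernel Version

Linux distributions often put their patched kernel source somewhere other than the standard location, which tends to cause trouble.

A more common problem is that some distributions ship incomplete kernel headers, and you need those headers when you compile your module. Murphy's Law states that the missing headers are exactly the ones your module needs.

To avoid both problems, I highly recommend that you download, compile and boot a fresh kernel yourself. See the Linux Kernel HOWTO for details.

Ironically, this too can cause a problem. By default, gcc on your system looks for kernel headers in its default location rather than where you installed your new copy of the kernel (usually under /usr/src).

This can be fixed with gcc's -I switch.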

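Chapter 2. Hello World

2.1. Hello World (part 1): The Simplest Module

When the first caveman programmer chiseled the first program on the walls of the first cave computer, it was a program to print "Hello, world". Roman programming textbooks began with the "Salut, Mundi" program. I don't know what happens to people who break with this tradition, but I think it's safer not to find out. We'll start with a series of hello world programs that demonstrate the different aspects of writing a kernel module.

Here is what may be the simplest possible kernel module. Don't compile it yet; we cover compilation in the next section.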

Example 2-1. hello-1.c

/*
 * hello-1.c - The simplest kernel module.
 */
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */

int init_module(void)
{
	printk(KERN_INFO "Hello world 1.\n");

	/*
	 * A non 0 return means init_module failed; module can't be loaded.
	 */
	return 0;
}

void cleanup_module(void)
{
	printk(KERN_INFO "Goodbye world 1.\n");
}

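Kernel modules must have at least two functions: a "start" (initialization) function called init_module(), which is called when the module is insmoded into the kernel, and an "end" (cleanup) function called cleanup_module(), which is called when the module is rmmoded.

Actually, things have changed starting with kernel 2.3.13: you can now use any name you like for a module's start and end functions. Section 2.3 shows how. In fact, the new method is the preferred one. Many people still use init_module() and cleanup_module() for their start and end functions, however.

Typically, init_module() either registers a handler for something with the kernel, or replaces one of the kernel's functions with its own code (usually code that does something and then calls the original function). cleanup_module() is supposed to undo whatever init_module() did, so the module can be unloaded safely.

Lastly, every kernel module needs to include linux/module.h. We needed linux/kernel.h only for the macro that sets the printk() log level, which is covered in Section 2.1.1.

2.1.1. Introducing printk()

Despite what you might think, printk() is not meant to communicate information to the user, even though we used it for exactly that purpose in hello-1. It is a logging mechanism for the kernel, used to log information or give warnings. Therefore each printk() statement comes with a priority, such as <1> (KERN_ALERT). There are 8 priorities, and the kernel has macros for them, so you don't have to use cryptic numbers; you can view them and their meanings in linux/kernel.h. If you don't specify a priority level, the default priority, DEFAULT_MESSAGE_LOGLEVEL, is used.

If the priority is less than int console_loglevel (a numerically lower level means a more urgent message), the message is printed on your current terminal.

If both syslogd and klogd are running, the message is also appended to /var/log/messages, whether or not it was printed to the console.

To make sure a printk() message reaches the console rather than only the log file, use a high priority such as KERN_ALERT. When you write real modules, use priorities that are meaningful for the situation at hand.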

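2.2. Compiling Kernel Modules

Kernel modules need to be compiled a bit differently from regular userspace programs. Former kernel versions required us to maintain the relevant settings ourselves in Makefiles. Although the Makefiles were organized hierarchically, many redundant settings accumulated in the sublevel Makefiles and made the whole system large and difficult to maintain. Fortunately, kbuild now takes care of this, and the build process for loadable modules is fully integrated into the standard kernel build mechanism. To learn more about compiling modules that are not part of the official kernel (such as the examples in this guide), see linux/Documentation/kbuild/modules.txt.

So, let's look at a simple Makefile for compiling hello-1.c:

Example 2-2. Makefile for a basic kernel module

obj-m += hello-1.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

From a technical point of view only the first line is really necessary; the "all" and "clean" targets were added for convenience.

Now you can compile the module by running make. You should see output resembling the following:

hostname:~/lkmpg-examples/02-HelloWorld# make
make -C /lib/modules/2.6.11/build M=/root/lkmpg-examples/02-HelloWorld modules
make[1]: Entering directory `/usr/src/linux-2.6.11'
  CC [M]  /root/lkmpg-examples/02-HelloWorld/hello-1.o
  Building modules, stage 2.
  MODPOST
  CC      /root/lkmpg-examples/02-HelloWorld/hello-1.mod.o
  LD [M]  /root/lkmpg-examples/02-HelloWorld/hello-1.ko
make[1]: Leaving directory `/usr/src/linux-2.6.11'
hostname:~/lkmpg-examples/02-HelloWorld#

Note that kernel 2.6 introduces a new file naming convention: kernel modules now have a .ko extension (in place of the old .o extension), which distinguishes them from conventional object files. The reason is that .ko files carry additional information about the module. We'll soon see what this information is good for.

Use modinfo hello-*.ko to look at that information.

hostname:~/lkmpg-examples/02-HelloWorld# modinfo hello-1.ko
filename:       hello-1.ko
vermagic:       2.6.11 preempt PENTIUMII 4KSTACKS gcc-3.3
depends:

Nothing spectacular so far. That changes once we run modinfo on one of the later examples, hello-5.ko.

hostname:~/lkmpg-examples/02-HelloWorld# modinfo hello-5.ko
filename:       hello-5.ko
license:        GPL
author:         Peter Jay Salzman
vermagic:       2.6.11 preempt PENTIUMII 4KSTACKS gcc-3.3
depends:
parm:           myintArray:An array of integers (array of int)
parm:           mystring:A character string (charp)
parm:           mylong:A long integer (long)
parm:           myint:An integer (int)
parm:           myshort:A short integer (short)
hostname:~/lkmpg-examples/02-HelloWorld#

That is a lot more useful: the author to send bug reports to, the license, and even a short description of the parameters the module accepts.

Additional details about Makefiles for kernel modules are available in linux/Documentation/kbuild/makefiles.txt. Be sure to read this and the related files before hacking on Makefiles; it will probably save you a lot of time.

Now it is time to insert your freshly compiled module into the kernel with insmod ./hello-1.ko (ignore anything you see about tainted kernels for now; we'll cover that shortly). All modules loaded into the kernel are listed in /proc/modules. Go ahead and cat that file to see that your module is really a part of the kernel. Congratulations, you are now the author of Linux kernel code!

When the novelty wears off, remove your module from the kernel with rmmod hello-1. Take a look at /var/log/messages to see that both actions were recorded in your system log.

Here's another exercise for the reader. See that return statement in init_module() in hello-1? Change the return value to something negative, recompile and load the module again. What happens?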

2.3. Hello World (part 2)

As of Linux 2.4, you can rename the init and cleanup functions of your modules; they no longer have to be called init_module() and cleanup_module(). This is done with the module_init() and module_exit() macros.

These macros are defined in linux/init.h. The only caveat is that your init and cleanup functions must be defined before the macros are used, otherwise you will get compilation errors.

Here is an example of this technique:

Example 2-3. hello-2.c

/*
 * hello-2.c - Demonstrating the module_init() and module_exit() macros.
 * This is preferred over using init_module() and cleanup_module().
 */
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */
#include <linux/init.h>		/* Needed for the macros */

static int __init hello_2_init(void)
{
	printk(KERN_INFO "Hello, world 2\n");
	return 0;
}

static void __exit hello_2_exit(void)
{
	printk(KERN_INFO "Goodbye, world 2\n");
}

module_init(hello_2_init);
module_exit(hello_2_exit);

So now we have two real kernel modules under our belt. Adding another module is as simple as this:

Example 2-4. Makefile for both our modules

obj-m += hello-1.o
obj-m += hello-2.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Now have a look at linux/drivers/char/Makefile for a real world example. As you can see, some things get hardwired into the kernel (obj-y), but where have all those obj-m entries gone?

Those familiar with shell scripts will easily spot them: the obj-$(CONFIG_FOO) entries expand into obj-y or obj-m, depending on whether CONFIG_FOO has been set to y or m. These CONFIG_FOO variables are exactly the ones stored in the linux/.config file, set the last time you ran make menuconfig or something similar.

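2.4. Hello World (part 3): The __init and __exit Macros

This example demonstrates a feature of kernel 2.2 and later. Notice the change in the definitions of the init and cleanup functions.

For built-in drivers, the __init macro causes the init function to be discarded and its memory freed once the function finishes. It has no such effect for loadable modules. If you think about when the init function is invoked, this makes perfect sense.

There is also an __initdata macro, which works like __init but for variables rather than functions.

The __exit macro causes the cleanup function to be omitted when the module is built into the kernel and, like __init, has no effect for loadable modules. Again, if you consider when the cleanup function runs, this makes complete sense: built-in drivers don't need a cleanup function, while loadable modules do.

These macros are defined in linux/init.h and serve to free up kernel memory. When you boot your kernel and see something like "Freeing unused kernel memory: 236k freed", this is precisely what the kernel is freeing.

Example 2-5. hello-3.c

/*
 * hello-3.c - Illustrating the __init, __initdata and __exit macros.
 */
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */
#include <linux/init.h>		/* Needed for the macros */

static int hello3_data __initdata = 3;

static int __init hello_3_init(void)
{
	printk(KERN_INFO "Hello, world %d\n", hello3_data);
	return 0;
}

static void __exit hello_3_exit(void)
{
	printk(KERN_INFO "Goodbye, world 3\n");
}

module_init(hello_3_init);
module_exit(hello_3_exit);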

2.5. Hello World (part 4): Licensing and Module Documentation

If you're running kernel 2.4 or later, you may have noticed something like this when loading a module:

# insmod xxxxxx.o
Warning: loading xxxxxx.ko will taint the kernel: no license
  See http://www.tux.org/lkml/#export-tainted for information about tainted modules
Module xxxxxx loaded, with warnings

Kernel 2.4 and later include a mechanism to identify code licensed under the GPL (and friends), so that people can be warned when code is not open source. This is accomplished by the MODULE_LICENSE() macro, which is demonstrated in the next piece of code.

By setting the license to GPL, you keep the warning from being printed. The license mechanism is defined and documented in linux/module.h:

/*
 * The following license idents are currently accepted as indicating free
 * software modules
 *
 *	"GPL"				[GNU Public License v2 or later]
 *	"GPL v2"			[GNU Public License v2]
 *	"GPL and additional rights"	[GNU Public License v2 rights and more]
 *	"Dual BSD/GPL"			[GNU Public License v2
 *					 or BSD license choice]
 *	"Dual MIT/GPL"			[GNU Public License v2
 *					 or MIT license choice]
 *	"Dual MPL/GPL"			[GNU Public License v2
 *					 or Mozilla license choice]
 *
 * The following other idents are available
 *
 *	"Proprietary"			[Non free products]
 *
 * There are dual licensed components, but when running with Linux it is the
 * GPL that is relevant so this is a non issue. Similarly LGPL linked with GPL
 * is a GPL combined work.
 *
 * This exists for several reasons
 * 1.	So modinfo can show license info for users wanting to vet their setup
 *	is free
 * 2.	So the community can ignore bug reports including proprietary modules
 * 3.	So vendors can do likewise based on their own policies
 */

Similarly, MODULE_DESCRIPTION() describes what the module does, MODULE_AUTHOR() declares the module's author, and MODULE_SUPPORTED_DEVICE() declares what types of devices the module supports.

These macros are all defined in linux/module.h and are not used by the kernel itself. They only store documentation, which can be viewed with a tool like objdump.

As an exercise for the reader, try searching for these macros in linux/drivers to see how module authors use them.

I'd recommend running grep -inr MODULE_AUTHOR * in /usr/src/linux-2.6.x/.

Example 2-6. hello-4.c

/*
 * hello-4.c - Demonstrates module documentation.
 */
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */
#include <linux/init.h>		/* Needed for the macros */

#define DRIVER_AUTHOR "Peter Jay Salzman <p@dirac.org>"
#define DRIVER_DESC   "A sample driver"

static int __init init_hello_4(void)
{
	printk(KERN_INFO "Hello, world 4\n");
	return 0;
}

static void __exit cleanup_hello_4(void)
{
	printk(KERN_INFO "Goodbye, world 4\n");
}

module_init(init_hello_4);
module_exit(cleanup_hello_4);

/*
 * You can use strings, like this:
 */

/*
 * Get rid of taint message by declaring code as GPL.
 */
MODULE_LICENSE("GPL");

/*
 * Or with defines, like this:
 */
MODULE_AUTHOR(DRIVER_AUTHOR);		/* Who wrote this module? */
MODULE_DESCRIPTION(DRIVER_DESC);	/* What does this module do */

/*
 * This module uses /dev/testdevice. The MODULE_SUPPORTED_DEVICE macro might
 * be used in the future to help automatic configuration of modules, but is
 * currently unused other than for documentation purposes.
 */
MODULE_SUPPORTED_DEVICE("testdevice");

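2.6. Passing Command Line Arguments to a Module

Modules can take command line arguments, but not with the argc/argv you might be used to.

To allow arguments to be passed to your module, declare the variables that will hold the command line values as global and then use the module_param() macro (defined in linux/moduleparam.h). At runtime, insmod fills the variables with any command line arguments that are given, as in ./insmod mymodule.ko myvariable=5.

For clarity, the variable declarations and macros should be placed at the beginning of the module. The hello-5.c example later in this section shows the whole arrangement.

The module_param() macro takes three arguments: the name of the variable, its type, and the permissions for the corresponding file in sysfs. Integer types can be signed or unsigned. If you'd like to use arrays of integers or strings, see module_param_array() and module_param_string().

int myint = 3;
module_param(myint, int, 0);

Arrays are supported too, but things are a bit different than they were in the 2.4 days. To keep track of the number of values supplied, you pass a pointer to a count variable as the third parameter. If you are not interested in the count, you can pass NULL instead.

int myintarray[2];
module_param_array(myintarray, int, NULL, 0); /* not interested in count */

short myshortarray[4];
int count;
module_param_array(myshortarray, short, &count, 0); /* put count into "count" variable */

A good use for this is to give the module's variables default values, such as a port or an I/O address. If a variable still holds its default value, perform autodetection (explained elsewhere in this guide); otherwise keep the current value.

Lastly, there is a macro, MODULE_PARM_DESC(), that documents the arguments the module can take. It takes two parameters: a variable name and a free-form string describing that variable.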

Example 2-7. hello-5.c

/*
 * hello-5.c - Demonstrates command line argument passing to a module.
 */
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/stat.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Peter Jay Salzman");

static short int myshort = 1;
static int myint = 420;
static long int mylong = 9999;
static char *mystring = "blah";
static int myintArray[2] = { -1, -1 };
static int arr_argc = 0;

/*
 * module_param(foo, int, 0000)
 * The first param is the parameter's name
 * The second param is its data type
 * The final argument is the permissions bits,
 * for exposing parameters in sysfs (if non-zero) at a later stage.
 */
module_param(myshort, short, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
MODULE_PARM_DESC(myshort, "A short integer");
module_param(myint, int, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
MODULE_PARM_DESC(myint, "An integer");
module_param(mylong, long, S_IRUSR);
MODULE_PARM_DESC(mylong, "A long integer");
module_param(mystring, charp, 0000);
MODULE_PARM_DESC(mystring, "A character string");

/*
 * module_param_array(name, type, num, perm);
 * The first param is the parameter's (in this case the array's) name
 * The second param is the data type of the elements of the array
 * The third argument is a pointer to the variable that will store the number
 * of elements of the array initialized by the user at module loading time
 * The fourth argument is the permission bits
 */
module_param_array(myintArray, int, &arr_argc, 0000);
MODULE_PARM_DESC(myintArray, "An array of integers");

static int __init hello_5_init(void)
{
	int i;

	printk(KERN_INFO "Hello, world 5\n=============\n");
	printk(KERN_INFO "myshort is a short integer: %hd\n", myshort);
	printk(KERN_INFO "myint is an integer: %d\n", myint);
	printk(KERN_INFO "mylong is a long integer: %ld\n", mylong);
	printk(KERN_INFO "mystring is a string: %s\n", mystring);
	for (i = 0; i < (sizeof myintArray / sizeof (int)); i++)
	{
		printk(KERN_INFO "myintArray[%d] = %d\n", i, myintArray[i]);
	}
	printk(KERN_INFO "got %d arguments for myintArray.\n", arr_argc);
	return 0;
}

static void __exit hello_5_exit(void)
{
	printk(KERN_INFO "Goodbye, world 5\n");
}

module_init(hello_5_init);
module_exit(hello_5_exit);

Try playing around with this code:

satan# insmod hello-5.ko mystring="bebop" mybyte=255 myintArray=-1
mybyte is an 8 bit integer: 255
myshort is a short integer: 1
myint is an integer: 20
mylong is a long integer: 9999
mystring is a string: bebop
myintArray is -1 and 420

satan# rmmod hello-5
Goodbye, world 5

satan# insmod hello-5.ko mystring="supercalifragilisticexpialidocious" \
> mybyte=256 myintArray=-1,-1
mybyte is an 8 bit integer: 0
myshort is a short integer: 1
myint is an integer: 20
mylong is a long integer: 9999
mystring is a string: supercalifragilisticexpialidocious
myintArray is -1 and -1

satan# rmmod hello-5
Goodbye, world 5

satan# insmod hello-5.ko mylong=hello
hello-5.o: invalid argument syntax for mylong: 'h'

2.7. Modules Spanning Multiple Files

Sometimes it makes sense to divide a kernel module between several source files. Here is an example:

Example 2-8. start.c

/*
 * start.c - Illustration of multi filed modules
 */
#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */

int init_module(void)
{
	printk(KERN_INFO "Hello, world - this is the kernel speaking\n");
	return 0;
}

Example 2-9. stop.c

/*
 * stop.c - Illustration of multi filed modules
 */
#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */

void cleanup_module(void)
{
	printk(KERN_INFO "Short is the life of a kernel module\n");
}

Example 2-10. Makefile

obj-m += hello-1.o
obj-m += hello-2.o
obj-m += hello-3.o
obj-m += hello-4.o
obj-m += hello-5.o
obj-m += startstop.o
startstop-objs := start.o stop.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

This is the complete Makefile for all the examples so far. The first five lines are nothing special, but the last example needs two lines: one that invents an object name for the combined module, and one that tells make which object files belong to that module.

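2.8. Building modules for a precompiled kernel

We strongly suggest that you recompile your kernel so that you can enable a number of useful debugging features, such as forced module unloading (MODULE_FORCE_UNLOAD): when this option is enabled, you can force a module to unload with rmmod -f module. This feature can save you a lot of time.

Nevertheless, there are cases where you may want to load your module into a kernel that is already running, such as one shipped with a Linux distribution or one you compiled earlier, and where you cannot recompile the kernel or cannot reboot the machine. You can still add modules in that situation. If you are sure you will never face it, skip the rest of this chapter.

Now, suppose you install a kernel source tree, use it to compile your module, and try to insert the module into the kernel. In most cases you will get an error like this:

insmod: error inserting 'poet_atkm.ko': -1 Invalid module format

More information is logged to /var/log/messages:

Jun  4 22:07:54 localhost kernel: poet_atkm: version magic '2.6.5-1.358custom 686
REGPARM 4KSTACKS gcc-3.3' should be '2.6.5-1.358 686 REGPARM 4KSTACKS gcc-3.3'

In other words, your kernel refuses to accept the module because the version strings (more precisely, the version magics) do not match. Incidentally, version magics are stored in the module object as a static string starting with vermagic:. The version data are inserted into your module when it is linked against the init/vermagic.o file. To inspect the version magic and other strings stored in a given module, run modinfo module.ko:

[root@pcsenonsrv 02-HelloWorld]# modinfo hello-4.ko
license:        GPL
author:         Peter Jay Salzman <p@dirac.org>
description:    A sample driver
vermagic:       2.6.5-1.358 686 REGPARM 4KSTACKS gcc-3.3
depends:

The --force-vermagic option would work around the problem, but it is unsafe and unquestionably unacceptable for production modules. Instead, we want to compile the module in an environment identical to the one in which the precompiled kernel was built. The rest of this chapter shows how.

First, make sure a kernel source tree is available with exactly the same version as your running kernel.

Then, find the configuration file that was used to compile the running kernel. It is usually in /boot under a name like config-2.6.x. You may simply want to copy it into your kernel source tree:

cp /boot/config-`uname -r` /usr/src/linux-`uname -r`/.config

Now look again at the error message and at the version magic strings: even with two identical configuration files, the version magics can differ slightly, and that is enough for the kernel to reject the module. The difference, the custom string, appears in the module's version magic but not in the kernel's, and comes from a change that some distributions make to the makefile. Check the makefile in your source tree, /usr/src/linux/Makefile, and make sure its version information matches the running kernel exactly. For example, your makefile could start like this:

VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 5
EXTRAVERSION = -1.358custom
...

In this case, you need to restore the value of EXTRAVERSION to -1.358. We suggest using the copy of the makefile that was used to compile your kernel, which is available in /lib/modules/2.6.5-1.358/build. A simple cp /lib/modules/`uname -r`/build/Makefile /usr/src/linux-`uname -r` should suffice. Additionally, if you already started a kernel build with the unmodified (wrong) makefile, you should rerun make, or directly change the UTS_RELEASE symbol in /usr/src/linux-2.6.x/include/linux/version.h to match the contents of /lib/modules/2.6.x/build/include/linux/version.h.

Then run make to update the configuration and version information:

[root@pcsenonsrv linux-2.6.x]# make
CHK     include/linux/version.h
UPD     include/linux/version.h
SYMLINK include/asm -> include/asm-i386
SPLIT   include/linux/autoconf.h -> include/config/*
HOSTCC  scripts/basic/fixdep
HOSTCC  scripts/basic/split-include
HOSTCC  scripts/basic/docproc
HOSTCC  scripts/conmakehash
HOSTCC  scripts/kallsyms
CC      scripts/empty.o
...

If you do not want to actually recompile the kernel, you can interrupt the build (CTRL-C) just after the SPLIT line, because by then the files you need have been updated. Now go back to your module's directory and compile it: it will be built with exactly the same settings as your running kernel and will load without errors.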

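Chapter 3. Preliminaries

3.1. Modules vs Programs

3.1.1. How modules begin and end

A program usually begins with a main() function, executes a series of instructions and terminates when they are done. Kernel modules work a bit differently. A module begins with either init_module() or the function you specify with module_init(). This is the module's entry function: it tells the kernel what functionality the module provides and sets up the kernel to run the module's functions when they are needed. Once that is done, the entry function returns, and the module does nothing until the kernel needs something the module provides.

All modules end by calling either cleanup_module() or the function you specify with module_exit(). This is the module's exit function: it undoes whatever the entry function set up and unregisters the module's functionality.

Every module must have an entry function and an exit function. Since there is more than one way to specify them, I will try to use the terms "entry function" and "exit function", but if I slip and refer to them as init_module and cleanup_module, you will know what I mean.

3.1.2. Functions available to modules

Programmers use functions they did not define all the time. A prime example is printf(). You call this function, which is provided by the standard C library, libc, but its definition does not actually enter your program until the linking stage, which ensures that the call can be carried out.

Kernel modules are different here, too. In the hello world example, you may have noticed that we used a different function, printk(), but did not include a standard I/O header. That is because modules are object files whose symbols are resolved only when the module is loaded with insmod. The definitions of those symbols come from the kernel itself; the only external functions a module can use are the ones the kernel provides. If you are curious about which symbols your kernel exports, take a look at /proc/kallsyms.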

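One point to keep in mind is the difference between library functions and system calls. Library functions are higher level and run completely in user space; they wrap the functions that do the real work, the system calls, in an interface that is more convenient for the programmer. System calls run in kernel mode and are provided by the kernel itself. The library function printf() looks like a very general printing function, but all it really does is format the data into a string and hand it to the low-level system call write(), which sends the data to standard output.

Would you like to see which system calls printf() makes? It's easy. Compile the following program with gcc -Wall -o hello hello.c:

#include <stdio.h>

int main(void)
{
	printf("hello");
	return 0;
}

Run the executable with strace ./hello. See that? Every line corresponds to a system call. strace[4] is a handy program that shows the system calls a program makes, including their arguments and return values. It is an invaluable tool for figuring out things like which files a program is trying to access. Towards the end, you'll see a line that looks like write(1, "hello", 5hello). There it is: the face behind the printf() mask. You may not be familiar with write(), since most people use library functions for file I/O (like fopen, fputs and fclose). If that's the case, try man 2 write. Section 2 of the manual is devoted to system calls (like kill() and read()); section 3 is devoted to library calls, which you are probably more familiar with (like cosh() and random()).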

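You can even write modules that replace the kernel's system calls, which we will do shortly. Crackers often use this sort of thing for backdoors or trojans, but you can use it for more benign purposes, such as having the kernel write "Tee hee, that tickles!" every time someone tries to delete a file on your system.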

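3.1.3. User Space vs Kernel Space

A kernel is all about access to resources, whether the resource is a video card, a hard drive or memory. Programs often compete for the same resource. As I just saved this document, updatedb started writing to its database, so my vim session and updatedb are both using the hard drive at the same time. The kernel needs to keep things orderly and cannot let users access resources whenever they feel like it. To this end, a CPU can run in different modes, each giving a different level of privilege. The Intel 80386 architecture has 4 of these modes, which are called rings. Unix uses only two of them: ring 0, also known as `supervisor mode', where everything is allowed, and the lowest ring, called `user mode'.

Recall the discussion of library functions vs system calls. Typically, a library function runs in user mode. It may call one or more system calls, and these run in supervisor mode because they are part of the kernel itself. Once the system call completes its task, execution returns to user mode.

3.1.4. Name Space

When you write a small C program, you use variable names that are convenient and make sense to the reader. If, on the other hand, you are writing routines that will be part of a bigger program, your global variables are visible to everyone else's code, and theirs to yours, so names can clash. When a program has lots of global variables that are not meaningful enough to be told apart, you get namespace pollution. In large projects, effort must be made to remember reserved names and to adopt a naming scheme that produces unique names and symbols. When writing kernel code, even the smallest module is linked against the entire kernel, so this is definitely an issue. The best way to deal with it is to declare all your variables as static and to use a well-defined prefix for your symbols. By convention, all kernel prefixes are lowercase. If you don't want to declare everything as static, another option is to declare a symbol table and register it with the kernel. We'll get to this later. The file /proc/kallsyms holds all the symbols the kernel knows about, which are therefore accessible to your modules since they share the kernel's code space.

3.1.5. Code space

Memory management is a very complicated subject: the majority of O'Reilly's `Understanding The Linux Kernel' is devoted to it. We are not setting out to become experts on memory management, but we do need to know a couple of facts before we start writing real modules.

If you have never thought about what a segfault really means, you may be surprised to hear that pointers do not actually point to real physical memory. When a process is created, the kernel sets aside a portion of physical memory and hands it to the process to use for its code, variables, stack, heap and other things. From the process's point of view this memory begins at 0x00000000 and extends as far as it needs. Since the memory of any two processes does not overlap, two processes that access the same address, say 0xbffff978, are actually accessing different locations in physical memory. The address 0xbffff978 is really an offset into the region of memory set aside for that particular process. For the most part, a process like our Hello, World program cannot access the space of another process, although there are ways to do so, which we'll talk about later.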

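The kernel has its own memory space as well. Since a module is code that can be dynamically inserted into and removed from the kernel (as opposed to a semi-autonomous object), it shares the kernel's code space rather than having its own. Therefore, if your module segfaults, the kernel segfaults. And if you overwrite data because of an off-by-one error, you are trampling on kernel data (or code). This is even worse than it sounds, so be careful when writing modules.

By the way, the above is true for any operating system that uses a monolithic kernel. In operating systems built on a microkernel, modules get their own code space. The GNU Hurd and QNX Neutrino are two examples of microkernel systems.

3.1.6. Device Drivers

One class of module is the device driver, which operates a piece of hardware such as a TV card or a serial port. On unix, each piece of hardware is represented by a device file in /dev, which provides the means to communicate with the hardware. The device driver carries out that communication on behalf of user programs. So the es1370.o sound card driver might connect the /dev/sound device file to the Ensoniq IS1370 sound card. A userspace program like mp3blaster can use /dev/sound without knowing what kind of sound card is installed.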

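3.1.6.1. Major and Minor Numbers

Let's look at some device files. Here are the device files that represent three partitions on an IDE hard drive:

# ls -l /dev/hda[1-3]
brw-rw----  1 root  disk  3, 1 Jul  5  2000 /dev/hda1
brw-rw----  1 root  disk  3, 2 Jul  5  2000 /dev/hda2
brw-rw----  1 root  disk  3, 3 Jul  5  2000 /dev/hda3

Notice the two columns of numbers separated by a comma? The first number is the device's major number; the second is its minor number. The major number tells you which driver manages the hardware. Each driver is assigned a unique major number, and all device files with the same major number are controlled by the same driver. All the major numbers above are 3 because all three files are controlled by the same driver.

The driver uses the minor number to distinguish between the pieces of hardware it controls. In the example above, although all three devices are handled by the same driver, their minor numbers let the driver tell them apart.

Devices are divided into two types: character devices and block devices. The difference is that block devices have a buffer for requests, so they can choose the best order in which to serve them. This matters for storage devices, where reading or writing sectors that are close to each other is faster than accessing sectors that are far apart. Another difference is that block devices accept input and return output only in blocks (whose size varies by device), whereas character devices may use as many or as few bytes as they like. Most devices are character devices, because they don't need this kind of buffering and don't operate with a fixed block size. You can tell the type of a device file from the first character in the output of ls -l: `b' means a block device and `c' means a character device. The devices above are block devices. Here are some character devices (the serial ports):

crw-rw----  1 root  dial  4, 64 Feb 18 23:34 /dev/ttyS0
crw-r-----  1 root  dial  4, 65 Nov 17 10:26 /dev/ttyS1
crw-rw----  1 root  dial  4, 66 Jul  5  2000 /dev/ttyS2
crw-rw----  1 root  dial  4, 67 Jul  5  2000 /dev/ttyS3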

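If you want to see which major numbers have been assigned, look at /usr/src/linux/Documentation/devices.txt.

When the system was installed, all of those device files were created with the mknod command. To create a new character device named `coffee' with major number 12 and minor number 2, simply run mknod /dev/coffee c 12 2.

You don't have to put your device files in /dev, but it is the convention. Linus put his device files in /dev, and so should you. However, when creating a device file for testing, it is fine to place it in the working directory where you compile your module. Just be sure to move it to the right place when you are done writing the driver.

A few points are implicit in the discussion above and worth making explicit. When a device file is accessed, the kernel uses the file's major number to determine which driver should handle the access. This means the kernel does not care about the minor number; only the driver needs it, to work out which piece of hardware to operate on.

By the way, when I say `hardware', I mean something a bit more abstract than a PCI card you can hold in your hand. Look at these two device files:

% ls -l /dev/fd0 /dev/fd0u1680
brwxrwxrwx  1 root  floppy  2,  0 Jul  5  2000 /dev/fd0
brw-rw----  1 root  floppy  2, 44 Jul  5  2000 /dev/fd0u1680

By now you can tell at a glance that both are block devices handled by the same driver. You may also realize that both represent your floppy drive, even if you have only one. Why two files? One represents the floppy drive with 1.44 MB of storage; the other represents the same drive with 1.68 MB of storage. Here two device files with different minor numbers represent the same piece of physical hardware. So be aware that the word `hardware' in this guide can mean something quite abstract.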

Chapter 4. Character Device Files

4.1. Character Device Drivers

4.1.1. The file_operations Structure

The file_operations structure is defined in linux/fs.h and holds pointers to functions, defined by the driver, that perform the various operations on the device.

For example, every character driver needs to define a function that reads from the device. The address of that function is stored in the file_operations structure so the kernel can call it. Here is the definition for kernel 2.6.5:

struct file_operations {
	struct module *owner;
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
	ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
	ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t,
			      loff_t);
	int (*readdir) (struct file *, void *, filldir_t);
	unsigned int (*poll) (struct file *, struct poll_table_struct *);
	int (*ioctl) (struct inode *, struct file *, unsigned int,
		      unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*aio_fsync) (struct kiocb *, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	ssize_t (*readv) (struct file *, const struct iovec *, unsigned long,
			  loff_t *);
	ssize_t (*writev) (struct file *, const struct iovec *, unsigned long,
			   loff_t *);
	ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t,
			     void __user *);
	ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
			     loff_t *, int);
	unsigned long (*get_unmapped_area) (struct file *, unsigned long,
					    unsigned long, unsigned long,
					    unsigned long);
};

A driver does not have to implement every operation. For example, a driver for a video card has no need to read a directory structure, so the corresponding entries in the file_operations structure can be set to NULL.

There is a gcc extension that makes assigning to this structure more convenient. You will see it in modern drivers, and it may catch you by surprise. It looks like this:

struct file_operations fops = {
	read: device_read,
	write: device_write,
	open: device_open,
	release: device_release
};

However, C99 also provides a way of assigning to the members of a structure, and it is definitely preferred over the GNU extension. The version of gcc the author used when writing this, 2.95, supports the C99 syntax. You should use this syntax because it is more compatible and helps anyone who wants to port your driver:

struct file_operations fops = {
	.read = device_read,
	.write = device_write,
	.open = device_open,
	.release = device_release
};

The meaning is clear, and any member of the structure that you do not explicitly assign is initialized to NULL by gcc. An instance of struct file_operations that holds pointers to the functions implementing the read, write, open, ... system calls is commonly named fops.

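4.1.2. The file structure

Each device is represented in the kernel by a file structure, which is defined in linux/fs.h. Be aware that a file is a kernel-level structure and never appears in a user space program. It is not the same thing as a FILE, which is defined by glibc and never appears in kernel code. The name is also a bit misleading: it represents an abstract open `file', not a file on a disk, which is represented by a structure named inode.

An instance of struct file is commonly named filp.

Go ahead and look at the definition of file. Most of the entries you see, like struct dentry, are not used by device drivers, and you can ignore them. Drivers do not fill in the file structure directly; the members they use are created elsewhere.

4.1.3. Registering A Device

As discussed earlier, character devices are accessed through device files, usually located in /dev. The major number tells you which driver handles which device file. The minor number is used only by the driver itself, to tell which device it is operating on when it handles more than one. Adding a driver to your system means registering it with the kernel, which is synonymous with assigning it a major number during the module's initialization. You do this with the register_chrdev function, declared in linux/fs.h.

int register_chrdev(unsigned int major, const char *name, struct file_operations *fops);

Here unsigned int major is the major number you want to request, const char *name is the name of the device as it will appear in /proc/devices, and struct file_operations *fops is a pointer to the driver's file_operations table. A negative return value means the registration failed. Note that register_chrdev takes no minor number: the kernel does not care about the minor number; only the driver uses it.

Now the question is, how do you get a major number legitimately, without hijacking one that is already in use? The easiest way is to look through Documentation/devices.txt and pick an unused one. The drawback is that you can never be sure the number you picked won't be assigned later. The better answer is to ask the kernel for a dynamic major number. If you pass a major number of 0 to register_chrdev(), the return value is the dynamically allocated major number. The downside is that you cannot create the device file in advance, since you do not know what the major number will be. There are several ways around this. First, the driver itself can print the newly assigned number, and you can create the device file by hand. Second, the newly registered device has an entry in /proc/devices, so you can either create the device file by hand or write a shell script that reads the file and creates it. Third, the driver can create the device file itself by calling mknod after a successful registration, and remove it (rm) in cleanup_module.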

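4.1.4. Unregistering A Device

We cannot allow a kernel module to be removed whenever root feels like it. If a process has the device file open and we remove the module, the next use of the file causes a jump to the memory location where the relevant function (read/write) used to be. If we are lucky, no other code was loaded there, and we just get an error message. If we are unlucky, other kernel code was loaded at the same location, which means a jump into the middle of another kernel function. The results are impossible to predict, but they cannot be good.

Normally, when you don't want to allow something, you return an error code (a negative number) from the function that is supposed to do it. With cleanup_module that is impossible, because it returns void. However, every module has a counter that keeps track of how many processes are using it. You can see its value in the third field of /proc/modules. If this number is not zero, rmmod will fail.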

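Note that you don't have to check the counter from within cleanup_module, because the system call sys_delete_module (defined in kernel/module.c) performs the check for you. You should not manipulate the counter directly; instead, use the functions defined in linux/module.h to increase and decrease it:

· try_module_get(THIS_MODULE): Increment the use count.

· module_put(THIS_MODULE): Decrement the use count.

It is important to keep the counter accurate: if you ever lose track of the correct use count, you will never be able to unload the module, and your only option is to reboot.

4.1.5. chardev.c

The next code sample is a simple character driver named chardev. You can cat its device file, and the driver will put into the file the number of times the device file has been read.

We don't support writing to the device file (like echo "hi" > /dev/hello), but we catch such attempts and tell the user that the operation is not supported. Don't worry if you don't see what we do with the data we read into the buffer; we don't do much with it. We simply read the data and print a message acknowledging that we received it.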

Example 4−1. chardev.c

*chardev.c: Creates a read−only char device that says how many times

*you've read from the dev file

#include<linux/kernel.h>

#include<linux/module.h>

#include<linux/fs.h>

#include<asm/uaccess.h> /* for put_user */

*Prototypes − this would normally go in a .h file

intinit_module(void);

voidcleanup_module(void);

staticint device_open(struct inode *, struct file *);

staticint device_release(struct inode *, struct file *);

staticssize_t device_read(struct file *, char *, size_t, loff_t *);

staticssize_t device_write(struct file *, const char *, size_t, loff_t *);

#defineSUCCESS 0

#defineDEVICE_NAME "chardev" /* Dev name as it appears in /proc/devices */

#defineBUF_LEN 80 /* Max length of the message from the device */

*Global variables are declared as static, so are global within the file.

staticint Major; /* Major number assigned to our device driver */

staticint Device_Open = 0; /* Is device open?

* Usedto prevent multiple access to device */

staticchar msg[BUF_LEN]; /* The msg the device will give when asked */

staticchar *msg_Ptr;

staticstruct file_operations fops = {

.read= device_read,

.write= device_write,

.open= device_open,

.release= device_release

};

* Thisfunction is called when the module is loaded

intinit_module(void)

{

Major= register_chrdev(0, DEVICE_NAME, &fops);

if(Major < 0) {

printk(KERN_ALERT"Registering char device failed with %d\n", Major);

returnMajor;

}

printk(KERN_INFO"I was assigned major number %d. To talk to\n", Major);

printk(KERN_INFO"the driver, create a dev file with\n");

printk(KERN_INFO"'mknod /dev/%s c %d 0'.\n", DEVICE_NAME, Major);

printk(KERN_INFO"Try various minor numbers. Try to cat and echo to\n");

printk(KERN_INFO"the device file.\n");

printk(KERN_INFO"Remove the device file and module when done.\n");

returnSUCCESS;

}

* Thisfunction is called when the module is unloaded

voidcleanup_module(void)

{

*Unregister the device

intret = unregister_chrdev(Major, DEVICE_NAME);

if(ret < 0)

printk(KERN_ALERT"Error in unregister_chrdev: %d\n", ret);

}

*Methods

*Called when a process tries to open the device file, like

*"cat /dev/mycharfile"

staticint device_open(struct inode *inode, struct file *file)

{

staticint counter = 0;

if(Device_Open)

return−EBUSY;

Device_Open++;

sprintf(msg,"I already told you %d times Hello world!\n", counter++);

msg_Ptr= msg;

try_module_get(THIS_MODULE);

returnSUCCESS;

}

*Called when a process closes the device file.

staticint device_release(struct inode *inode, struct file *file)

{

Device_Open−−;/* We're now ready for our next caller */

*Decrement the usage count, or else once you opened the file, you'll

*never get get rid of the module.

module_put(THIS_MODULE);

return0;

}

/*
 * Called when a process, which already opened the dev file, attempts to
 * read from it.
 */
static ssize_t device_read(struct file *filp,	/* see include/linux/fs.h */
			   char *buffer,	/* buffer to fill with data */
			   size_t length,	/* length of the buffer */
			   loff_t * offset)
{
	/*
	 * Number of bytes actually written to the buffer
	 */
	int bytes_read = 0;

	/*
	 * If we're at the end of the message,
	 * return 0 signifying end of file
	 */
	if (*msg_Ptr == 0)
		return 0;

	/*
	 * Actually put the data into the buffer
	 */
	while (length && *msg_Ptr) {
		/*
		 * The buffer is in the user data segment, not the kernel
		 * segment so "*" assignment won't work. We have to use
		 * put_user which copies data from the kernel data segment to
		 * the user data segment.
		 */
		put_user(*(msg_Ptr++), buffer++);

		length--;
		bytes_read++;
	}

	/*
	 * Most read functions return the number of bytes put into the buffer
	 */
	return bytes_read;
}

/*
 * Called when a process writes to dev file: echo "hi" > /dev/hello
 */
static ssize_t
device_write(struct file *filp, const char *buff, size_t len, loff_t * off)
{
	printk(KERN_ALERT "Sorry, this operation isn't supported.\n");
	return -EINVAL;
}

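4.1.6. Writing Modules for Multiple Kernel Versions

The system calls, which are the major interface the kernel shows to processes, generally stay the same across versions. A new system call may be added, but the old ones usually behave exactly as they used to. This is necessary for backward compatibility: a new kernel version is not supposed to break existing processes. In most cases the device files also remain the same. The internal interfaces within the kernel, on the other hand, can and do change between versions.

Kernel versions differ from one another, and if you want to support several of them you will find yourself writing a lot of conditional compilation. The way to do this is to compare the macro LINUX_VERSION_CODE against the macro KERNEL_VERSION. For version a.b.c of the kernel, the value of this macro is $2^{16}a+2^{8}b+c$.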
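The arithmetic behind the version comparison can be tried in user space. The following is a minimal sketch, not the kernel's header: DEMO_KERNEL_VERSION and DEMO_LINUX_VERSION_CODE are hypothetical stand-ins that reproduce the formula above, and the "current" version is assumed to be 2.6.4 purely for illustration. In a real module the values come from <linux/version.h>.

```c
#include <stdio.h>

/* Hypothetical user-space copy of the formula 2^16*a + 2^8*b + c. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption for this demo only: pretend we build against kernel 2.6.4. */
#define DEMO_LINUX_VERSION_CODE DEMO_KERNEL_VERSION(2, 6, 4)

/* Returns 1 if the 2.6-or-later branch was compiled in, 0 otherwise. */
int demo_is_26_or_later(void)
{
#if DEMO_LINUX_VERSION_CODE >= DEMO_KERNEL_VERSION(2, 6, 0)
	return 1;	/* code for 2.6 and newer would go here */
#else
	return 0;	/* code for older kernels would go here */
#endif
}

/* Prints the encoded version code and which branch was selected. */
void demo_print_version(void)
{
	printf("code=%d, 2.6 or later: %d\n",
	       DEMO_LINUX_VERSION_CODE, demo_is_26_or_later());
}
```

Calling demo_print_version() from a small main() shows the encoded value. Because the comparison happens in the preprocessor, only one of the two branches ends up in the compiled object.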

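Earlier versions of this guide showed in great detail how to write backward compatible code with such constructs, but we have decided to break with that tradition. From now on, readers should use the version of this guide that matches their kernel. We version the guide like the kernel, at least as far as the major and minor numbers are concerned, so make sure you pick the version of this guide that corresponds to the kernel you want to study.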

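Chapter 5. The /proc File System

5.1. The /proc File System

Linux has an additional mechanism that lets the kernel and kernel modules send information to processes: the /proc file system. It was originally designed to give easy access to information about processes (hence the name), and is now used by every part of the kernel that has something to report, such as /proc/modules, which holds the list of modules, and /proc/meminfo, which holds memory usage statistics.

Using the proc file system is very similar to the method used by device drivers: you create a structure holding the information needed for the /proc file, including pointers to the handler functions (in our example there is only one, which is called when somebody tries to read from the /proc file). Then init_module registers the structure with the kernel and cleanup_module unregisters it. The reason we use proc_register_dynamic[8] is that we don't want to determine the inode number used for our file in advance, but let the kernel choose it to prevent clashes. Normal file systems live on a disk rather than in memory (which is where /proc is), and in that case the inode number is a pointer to the disk location of the file's index-node (inode for short). The inode holds information about the file, such as its permissions and the location on disk where its data can be found. Because we are not called when the file is opened or closed, there is nowhere in this module to put try_module_get and module_put, and if the file is opened and the module is then removed, there is no way to avoid the consequences.

Here is a simple example showing how to use a /proc file. It is the HelloWorld of the /proc file system. There are three parts: create the file /proc/helloworld in init_module, return a value (and a buffer) from the callback function procfile_read when the file is read, and delete the file /proc/helloworld in cleanup_module.

The file /proc/helloworld is created when the module is loaded, by the function create_proc_entry. The return value is a 'struct proc_dir_entry *', which is then used to configure /proc/helloworld (for example, to set the owner of the file). A NULL return value means that the creation failed. Each time the file /proc/helloworld is read, the function procfile_read is called. Two parameters of this function are very important: buffer (the first parameter) and offset (the third one). The content of buffer is returned to the application that reads the file (for example the cat command). The offset is the current position in the file. If the return value of the function is not zero, the function is called again. So be careful with this function: if it never returns zero, the read function is called endlessly.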

% cat /proc/helloworld
HelloWorld!
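The rule "return zero once the data has been delivered" can be illustrated with a user-space model of the read loop. This is only a sketch: demo_read is a hypothetical stand-in for the module's read callback, and demo_read_all imitates a reader such as cat that keeps asking until it gets zero. The max_calls limit exists only so that the demo cannot loop forever.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the read callback: delivers everything on the
 * first call (offset 0) and reports end of file (0) on every later call. */
int demo_read(char *buf, long offset)
{
	if (offset > 0)
		return 0;
	return sprintf(buf, "HelloWorld!\n");
}

/* Imitates a reader that calls the callback until it returns 0.
 * Collects the text in out and returns the number of calls made. */
int demo_read_all(char *out, size_t out_size, int max_calls)
{
	char chunk[64];
	long offset = 0;
	int calls = 0;
	int n;

	out[0] = '\0';
	while (calls < max_calls) {
		n = demo_read(chunk, offset);
		calls++;
		if (n <= 0)
			break;	/* zero means end of file: stop asking */
		strncat(out, chunk, out_size - strlen(out) - 1);
		offset += n;
	}
	return calls;
}
```

With this callback the reader makes exactly two calls: one that returns the text and one that returns zero. A callback that ignored the offset and always returned data would keep the loop running until max_calls was reached.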

Example 5−1. procfs1.c

/*
 * procfs1.c - create a "file" in /proc
 */

#include <linux/module.h>	/* Specifically, a module */
#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/proc_fs.h>	/* Necessary because we use the proc fs */

#define procfs_name "helloworld"

/*
 * This structure holds information about the /proc file
 */
struct proc_dir_entry *Our_Proc_File;

/* Put data into the proc fs file.
 *
 * Arguments
 * =========
 * 1. The buffer where the data is to be inserted, if
 *    you decide to use it.
 * 2. A pointer to a pointer to characters. This is
 *    useful if you don't want to use the buffer
 *    allocated by the kernel.
 * 3. The current position in the file
 * 4. The size of the buffer in the first argument.
 * 5. Write a "1" here to indicate EOF.
 * 6. A pointer to data (useful in case one common
 *    read for multiple /proc/... entries)
 *
 * Usage and Return Value
 * ======================
 * A return value of zero means you have no further
 * information at this time (end of file). A negative
 * return value is an error condition.
 *
 * For More Information
 * ====================
 * The way I discovered what to do with this function
 * wasn't by reading documentation, but by reading the
 * code which used it. I just looked to see what uses
 * the get_info field of proc_dir_entry struct (I used a
 * combination of find and grep, if you're interested),
 * and I saw that it is used in <kernel source
 * directory>/fs/proc/array.c.
 *
 * If something is unknown about the kernel, this is
 * usually the way to go. In Linux we have the great
 * advantage of having the kernel source code for
 * free - use it.
 */
int
procfile_read(char *buffer,
	      char **buffer_location,
	      off_t offset, int buffer_length, int *eof, void *data)
{
	int ret;

	printk(KERN_INFO "procfile_read (/proc/%s) called\n", procfs_name);

	/*
	 * We give all of our information in one go, so if the
	 * user asks us if we have more information the
	 * answer should always be no.
	 *
	 * This is important because the standard read
	 * function from the library would continue to issue
	 * the read system call until the kernel replies
	 * that it has no more information, or until its
	 * buffer is filled.
	 */
	if (offset > 0) {
		/* we have finished reading, return 0 */
		ret = 0;
	} else {
		/* fill the buffer, return the buffer size */
		ret = sprintf(buffer, "HelloWorld!\n");
	}

	return ret;
}

int init_module()
{
	Our_Proc_File = create_proc_entry(procfs_name, 0644, NULL);

	if (Our_Proc_File == NULL) {
		remove_proc_entry(procfs_name, &proc_root);
		printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
		       procfs_name);
		return -ENOMEM;
	}

	Our_Proc_File->read_proc = procfile_read;
	Our_Proc_File->owner = THIS_MODULE;
	Our_Proc_File->mode = S_IFREG | S_IRUGO;
	Our_Proc_File->uid = 0;
	Our_Proc_File->gid = 0;
	Our_Proc_File->size = 37;

	printk(KERN_INFO "/proc/%s created\n", procfs_name);
	return 0;	/* everything is ok */
}

void cleanup_module()
{
	remove_proc_entry(procfs_name, &proc_root);
	printk(KERN_INFO "/proc/%s removed\n", procfs_name);
}

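5.2. Read and Write a /proc File

We have seen a very simple example that only reads the file /proc/helloworld. It is also possible to write to a /proc file. It works the same way as reading: a callback function is called when the /proc file is written. There is one difference, though. The data comes from an application, so it has to be imported from user space into kernel space (with copy_from_user or get_user). The reason for copy_from_user or get_user is that Linux memory (on the Intel architecture; it may be different on other processors) is segmented. This means that a pointer by itself does not refer to a unique location in memory, only to an offset within a memory segment, and you need to know which segment it is in order to use it. There is one memory segment for the kernel and one for each process.

The only memory segment accessible to a process is its own, so when you write regular programs that run as processes there is no need to worry about segments. When you write a kernel module, you normally want to access the kernel's own segment, which the system handles automatically. However, when the content of a memory buffer has to be passed between the current process and the kernel, the kernel function receives a pointer to a buffer that lives in the process's segment. The put_user and get_user macros let kernel code access that memory. They handle only one character at a time; copy_to_user and copy_from_user handle several. Because the module's buffer is in kernel space, the write function has to import the data, which comes from user space, whereas the read function does not, because its data is already in kernel space.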
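The clamp-then-copy pattern that a write handler follows can be tried outside the kernel. The sketch below is a user-space analogy only: demo_copy_from_user is a hypothetical stand-in built on memcpy (the real copy_from_user also returns the number of bytes it could not copy, but it validates the user pointer, which memcpy does not), and DEMO_MAX_SIZE is deliberately tiny so that the truncation is easy to see.

```c
#include <string.h>

#define DEMO_MAX_SIZE 8		/* hypothetical, deliberately small */

static char demo_buffer[DEMO_MAX_SIZE];
static unsigned long demo_buffer_size;

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied,
 * which is always 0 here because memcpy cannot fail. */
unsigned long demo_copy_from_user(void *to, const void *from, unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Same shape as a /proc write handler: clamp the length to the buffer
 * size, copy the data in, and report how many bytes were stored. */
long demo_write(const char *src, unsigned long count)
{
	demo_buffer_size = count;
	if (demo_buffer_size > DEMO_MAX_SIZE)
		demo_buffer_size = DEMO_MAX_SIZE;

	if (demo_copy_from_user(demo_buffer, src, demo_buffer_size))
		return -1;	/* a kernel module would return -EFAULT */

	return (long)demo_buffer_size;
}
```

Clamping before the copy is what keeps an oversized write from running past the end of the module's buffer. The caller learns from the return value that fewer bytes were accepted than it offered.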

Example 5−2. procfs2.c

/*
 * procfs2.c - create a "file" in /proc
 */

#include <linux/module.h>	/* Specifically, a module */
#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/proc_fs.h>	/* Necessary because we use the proc fs */
#include <asm/uaccess.h>	/* for copy_from_user */

#define PROCFS_MAX_SIZE		1024
#define PROCFS_NAME		"buffer1k"

/*
 * This structure holds information about the /proc file
 */
static struct proc_dir_entry *Our_Proc_File;

/*
 * The buffer used to store characters for this module
 */
static char procfs_buffer[PROCFS_MAX_SIZE];

/*
 * The size of the data held in the buffer
 */
static unsigned long procfs_buffer_size = 0;

/*
 * This function is called when the /proc file is read
 */
int
procfile_read(char *buffer,
	      char **buffer_location,
	      off_t offset, int buffer_length, int *eof, void *data)
{
	int ret;

	printk(KERN_INFO "procfile_read (/proc/%s) called\n", PROCFS_NAME);

	if (offset > 0) {
		/* we have finished reading, return 0 */
		ret = 0;
	} else {
		/* fill the buffer, return the buffer size */
		memcpy(buffer, procfs_buffer, procfs_buffer_size);
		ret = procfs_buffer_size;
	}

	return ret;
}

/*
 * This function is called when the /proc file is written
 */
int procfile_write(struct file *file, const char *buffer, unsigned long count,
		   void *data)
{
	/* get buffer size */
	procfs_buffer_size = count;
	if (procfs_buffer_size > PROCFS_MAX_SIZE) {
		procfs_buffer_size = PROCFS_MAX_SIZE;
	}

	/* write data to the buffer */
	if (copy_from_user(procfs_buffer, buffer, procfs_buffer_size)) {
		return -EFAULT;
	}

	return procfs_buffer_size;
}

/*
 * This function is called when the module is loaded
 */
int init_module()
{
	/* create the /proc file */
	Our_Proc_File = create_proc_entry(PROCFS_NAME, 0644, NULL);

	if (Our_Proc_File == NULL) {
		remove_proc_entry(PROCFS_NAME, &proc_root);
		printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
		       PROCFS_NAME);
		return -ENOMEM;
	}

	Our_Proc_File->read_proc = procfile_read;
	Our_Proc_File->write_proc = procfile_write;
	Our_Proc_File->owner = THIS_MODULE;
	Our_Proc_File->mode = S_IFREG | S_IRUGO;
	Our_Proc_File->uid = 0;
	Our_Proc_File->gid = 0;
	Our_Proc_File->size = 37;

	printk(KERN_INFO "/proc/%s created\n", PROCFS_NAME);
	return 0;	/* everything is ok */
}

/*
 * This function is called when the module is unloaded
 */
void cleanup_module()
{
	remove_proc_entry(PROCFS_NAME, &proc_root);
	printk(KERN_INFO "/proc/%s removed\n", PROCFS_NAME);
}

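5.3. Manage /proc file with standard filesystem

We have seen how to read and write a /proc file through the /proc interface. It is also possible to manage a /proc file with inodes. The main benefit is access to more advanced features, such as permissions.

Linux has a standard mechanism for registering file systems. Since every file system must have its own functions to handle inode and file operations, there is a special structure that holds pointers to all those functions, struct inode_operations, which includes a pointer to struct file_operations. In /proc, whenever we register a new file we can specify which struct inode_operations will be used to access it. This is the mechanism we use: a struct inode_operations that includes a pointer to a struct file_operations, which in turn includes pointers to our procfs_read and procfs_write functions.

Another interesting point is the module_permission function. It is called whenever a process tries to do something with the /proc file, and it decides whether to allow the access. Right now the decision is based only on the operation and the uid of the current user (available through current, a pointer to a structure holding information about the currently running process), but it could be based on anything we like, such as what other processes are doing with the same file, the time of day, or the last input we received. It is important to note that the usual roles of read and write are reversed in the kernel: read functions are used for output, and write functions are used for input. The reason is that read and write are named from the user's point of view. When a process reads something from the kernel, the kernel has to output it, and when a process writes something to the kernel, the kernel receives it as input.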
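The decision rule described above (anyone may read, only root may write) is plain logic that can be checked in user space. The following is a sketch under stated assumptions: demo_permission, DEMO_MAY_WRITE and DEMO_MAY_READ are hypothetical names, the operation values 2 and 4 are taken from the example that follows, and the effective uid is passed in as a parameter instead of being read from current.

```c
#include <errno.h>

#define DEMO_MAY_WRITE 2	/* operation value used for writing */
#define DEMO_MAY_READ  4	/* operation value used for reading */

/* Returns 0 if the operation is allowed, -EACCES otherwise.
 * Everybody may read; only root (euid 0) may write. */
int demo_permission(int op, unsigned int euid)
{
	if (op == DEMO_MAY_READ || (op == DEMO_MAY_WRITE && euid == 0))
		return 0;

	/* anything else, including execute, is denied */
	return -EACCES;
}
```

Keeping the rule in one small function makes it easy to extend with further conditions later, since every access to the file funnels through the same check.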

Example 5−3. procfs3.c

/*
 * procfs3.c - create a "file" in /proc, use the file_operation way
 *             to manage the file.
 */

#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */
#include <linux/proc_fs.h>	/* Necessary because we use proc fs */
#include <asm/uaccess.h>	/* for copy_*_user */

#define PROC_ENTRY_FILENAME	"buffer2k"
#define PROCFS_MAX_SIZE		2048

/*
 * The buffer (2k) for this module
 */
static char procfs_buffer[PROCFS_MAX_SIZE];

/*
 * The size of the data held in the buffer
 */
static unsigned long procfs_buffer_size = 0;

/*
 * The structure keeping information about the /proc file
 */
static struct proc_dir_entry *Our_Proc_File;

/*
 * This function is called when the /proc file is read
 */
static ssize_t procfs_read(struct file *filp,	/* see include/linux/fs.h */
			   char *buffer,	/* buffer to fill with data */
			   size_t length,	/* length of the buffer */
			   loff_t * offset)
{
	static int finished = 0;

	/*
	 * We return 0 to indicate end of file, that we have
	 * no more information. Otherwise, processes will
	 * continue to read from us in an endless loop.
	 */
	if (finished) {
		printk(KERN_INFO "procfs_read: END\n");
		finished = 0;
		return 0;
	}

	finished = 1;

	/*
	 * We use copy_to_user to copy the string from the kernel's
	 * memory segment to the memory segment of the process
	 * that called us. copy_from_user, BTW, is
	 * used for the reverse.
	 */
	if (copy_to_user(buffer, procfs_buffer, procfs_buffer_size)) {
		return -EFAULT;
	}

	printk(KERN_INFO "procfs_read: read %lu bytes\n", procfs_buffer_size);

	return procfs_buffer_size;	/* Return the number of bytes "read" */
}

/*
 * This function is called when /proc is written
 */
static ssize_t
procfs_write(struct file *file, const char *buffer, size_t len, loff_t * off)
{
	if (len > PROCFS_MAX_SIZE) {
		procfs_buffer_size = PROCFS_MAX_SIZE;
	} else {
		procfs_buffer_size = len;
	}

	if (copy_from_user(procfs_buffer, buffer, procfs_buffer_size)) {
		return -EFAULT;
	}

	printk(KERN_INFO "procfs_write: write %lu bytes\n", procfs_buffer_size);

	return procfs_buffer_size;
}

/*
 * This function decides whether to allow an operation
 * (return zero) or not allow it (return a non-zero
 * which indicates why it is not allowed).
 *
 * The operation can be one of the following values:
 * 1 - Execute (run the "file" - meaningless in our case)
 * 2 - Write (input to the kernel module)
 * 4 - Read (output from the kernel module)
 *
 * This is the real function that checks file
 * permissions. The permissions returned by ls -l are
 * for reference only, and can be overridden here.
 */
static int module_permission(struct inode *inode, int op, struct nameidata *foo)
{
	/*
	 * We allow everybody to read from our module, but
	 * only root (uid 0) may write to it
	 */
	if (op == 4 || (op == 2 && current->euid == 0))
		return 0;

	/*
	 * If it's anything else, access is denied
	 */
	return -EACCES;
}

/*
 * The file is opened - we don't really care about
 * that, but it does mean we need to increment the
 * module's reference count.
 */
int procfs_open(struct inode *inode, struct file *file)
{
	try_module_get(THIS_MODULE);
	return 0;
}

/*
 * The file is closed - again, interesting only because
 * of the reference count.
 */
int procfs_close(struct inode *inode, struct file *file)
{
	module_put(THIS_MODULE);
	return 0;		/* success */
}

static struct file_operations File_Ops_4_Our_Proc_File = {
	.read = procfs_read,
	.write = procfs_write,
	.open = procfs_open,
	.release = procfs_close,
};

/*
 * Inode operations for our proc file. We need it so
 * we'll have some place to specify the file operations
 * structure we want to use, and the function we use for
 * permissions. It's also possible to specify functions
 * to be called for anything else which could be done to
 * an inode (although we don't bother, we just put
 * NULL).
 */
static struct inode_operations Inode_Ops_4_Our_Proc_File = {
	.permission = module_permission,	/* check for permissions */
};

/*
 * Module initialization and cleanup
 */
int init_module()
{
	/* create the /proc file */
	Our_Proc_File = create_proc_entry(PROC_ENTRY_FILENAME, 0644, NULL);

	/* check if the /proc file was created successfully */
	if (Our_Proc_File == NULL) {
		printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
		       PROC_ENTRY_FILENAME);
		return -ENOMEM;
	}

	Our_Proc_File->owner = THIS_MODULE;
	Our_Proc_File->proc_iops = &Inode_Ops_4_Our_Proc_File;
	Our_Proc_File->proc_fops = &File_Ops_4_Our_Proc_File;
	Our_Proc_File->mode = S_IFREG | S_IRUGO | S_IWUSR;
	Our_Proc_File->uid = 0;
	Our_Proc_File->gid = 0;
	Our_Proc_File->size = 80;

	printk(KERN_INFO "/proc/%s created\n", PROC_ENTRY_FILENAME);

	return 0;		/* success */
}

void cleanup_module()
{
	remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
	printk(KERN_INFO "/proc/%s removed\n", PROC_ENTRY_FILENAME);
}

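Still hungry for procfs examples? First, keep in mind that procfs may eventually be replaced by sysfs. Second, if you really need more, the kernel documentation under linux/Documentation/DocBook/ is highly recommended. Run make help in the top-level kernel directory to see how to convert the documentation into your favourite format, for example: make htmldocs.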

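5.4. Manage /proc file with seq_file

As we have seen, writing a /proc file can be quite "complex". To help with this there is an API named seq_file that helps format a /proc file for output. It is based on a sequence made up of three functions: start(), next(), and stop(). The seq_file API starts a sequence whenever a user reads the /proc file.

A sequence begins with a call to start(). If the return value is not NULL, next() is called. This function is an iterator whose purpose is to walk through all the data. Each time next() is called, show() is called as well; it writes data values into the buffer read by the user. next() is called repeatedly until it returns NULL. At that point the sequence ends and stop() is called.

Be careful: when a sequence is finished, another one starts. That is, after stop() returns, start() is called again. The loop only ends when start() returns NULL. The figure "How seq_file works" shows this scheme.

Figure 5−1. How seq_file works

seq_file provides basic functions for file_operations, such as seq_read, seq_lseek, and some others, but nothing for writing to the /proc file. Of course, you can still use the same approach as in the previous example.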
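The calling order of start(), next(), show() and stop() can be modelled in user space. The sketch below is a simplified model and not the kernel's seq_read, which is more involved: all demo_ names are hypothetical, and the callbacks behave like the ones in the example that follows (start() hands out a counter when the position is 0, and next() always ends the sequence after one item).

```c
#include <stdio.h>

static unsigned long demo_counter;	/* the value handed out by start() */

/* Begin a sequence: non-NULL when pos is 0, otherwise reset and stop. */
void *demo_start(long *pos)
{
	if (*pos == 0)
		return &demo_counter;
	*pos = 0;
	return NULL;
}

/* Advance the iterator; returning NULL ends the current sequence. */
void *demo_next(void *v, long *pos)
{
	(*(unsigned long *)v)++;
	(*pos)++;
	return NULL;
}

/* Nothing to release, because start() returns a static object. */
void demo_stop(void *v)
{
	(void)v;
}

/* Format the current item into out (each call overwrites out). */
int demo_show(void *v, char *out, size_t size)
{
	return snprintf(out, size, "%lu\n", *(unsigned long *)v);
}

/* One sequence: start, then show/next until NULL, then stop.
 * Returns how many times show() ran. */
int demo_pass(long *pos, char *out, size_t size)
{
	int shown = 0;
	void *v = demo_start(pos);

	while (v != NULL) {
		demo_show(v, out, size);
		shown++;
		v = demo_next(v, pos);
	}
	demo_stop(v);
	return shown;
}
```

Calling demo_pass() twice in a row with the same pos reproduces the behaviour described above: the first sequence shows one item and leaves pos at 1, and the second sequence gets NULL from start(), which is what finally ends the read.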

Example 5−4. procfs4.c

/*
 * procfs4.c - create a "file" in /proc
 * This program uses the seq_file library to manage the /proc file.
 */

#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */
#include <linux/proc_fs.h>	/* Necessary because we use proc fs */
#include <linux/seq_file.h>	/* for seq_file */

#define PROC_NAME	"iter"

MODULE_AUTHOR("Philippe Reynes");
MODULE_LICENSE("GPL");

/*
 * This function is called at the beginning of a sequence.
 * ie, when:
 *	- the /proc file is read (first time)
 *	- after the function stop (end of sequence)
 */
static void *my_seq_start(struct seq_file *s, loff_t *pos)
{
	static unsigned long counter = 0;

	/* beginning a new sequence ? */
	if (*pos == 0) {
		/* yes => return a non null value to begin the sequence */
		return &counter;
	} else {
		/* no => it's the end of the sequence, return end to stop reading */
		*pos = 0;
		return NULL;
	}
}

/*
 * This function is called after the beginning of a sequence.
 * It's called until the return is NULL (this ends the sequence).
 */
static void *my_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
	unsigned long *tmp_v = (unsigned long *)v;
	(*tmp_v)++;
	(*pos)++;
	return NULL;
}

/*
 * This function is called at the end of a sequence
 */
static void my_seq_stop(struct seq_file *s, void *v)
{
	/* nothing to do, we use a static value in start() */
}

/*
 * This function is called for each "step" of a sequence
 */
static int my_seq_show(struct seq_file *s, void *v)
{
	loff_t *spos = (loff_t *) v;

	seq_printf(s, "%Ld\n", *spos);
	return 0;
}

/*
 * This structure gathers the functions that manage the sequence
 */
static struct seq_operations my_seq_ops = {
	.start = my_seq_start,
	.next = my_seq_next,
	.stop = my_seq_stop,
	.show = my_seq_show
};

/*
 * This function is called when the /proc file is opened.
 */
static int my_open(struct inode *inode, struct file *file)
{
	return seq_open(file, &my_seq_ops);
}

/*
 * This structure gathers the functions that manage the /proc file
 */
static struct file_operations my_file_ops = {
	.owner = THIS_MODULE,
	.open = my_open,
	.read = seq_read,
	.llseek = seq_lseek,
	.release = seq_release
};

/*
 * This function is called when the module is loaded
 */
int init_module(void)
{
	struct proc_dir_entry *entry;

	entry = create_proc_entry(PROC_NAME, 0, NULL);
	if (entry) {
		entry->proc_fops = &my_file_ops;
	}

	return 0;
}

/*
 * This function is called when the module is unloaded.
 */
void cleanup_module(void)
{
	remove_proc_entry(PROC_NAME, NULL);
}

If you want more information, you can read these web pages:

· http://lwn.net/Articles/22355/

· http://www.kernelnewbies.org/documents/seq_file_howto.txt

You can also read the code of fs/seq_file.c in the Linux kernel.

Chapter 6. Using /proc For Input

6.1. TODO: Write a chapter about sysfs

This is just a placeholder for now. Eventually I'd like to see a (yet to be written) chapter about sysfs here instead.

If you are familiar with sysfs and would like to take part in writing this chapter, feel free to contact us (the LKMPG maintainers) for further details.

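Chapter 7. Talking To Device Files

7.1. Talking to Device Files (writes and IOCTLs)

Device files are supposed to represent physical devices. Most physical devices are used for output as well as input, so there has to be some mechanism for device drivers in the kernel to get the output of the device and to send data from processes to the device. This is done by opening the device file and reading from or writing to it, just like a file on disk. In the following example, writing is implemented by device_write.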

This is not always enough. Imagine you had a serial port connected to a modem (even if you have an internal modem, it is still implemented from the CPU's perspective as a serial port connected to a modem, so you don't have to tax your imagination too hard). The natural thing to do would be to use the device file to write things to the modem (either modem commands or data to be sent through the phone line) and read things from the modem (either responses to commands or the data received through the phone line). However, this leaves open the question of what to do when you need to talk to the serial port itself, for example to set the rate at which data is sent and received.

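The answer in Unix is to use a special function called ioctl (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctls (to send information from a process to the kernel), write ioctls (to return information to a process), both, or neither. The ioctl function is called with three parameters: the file descriptor of the appropriate device file, the ioctl number, and a parameter through which you can pass anything. The ioctl number encodes the major device number, the type of the ioctl, the command, and the type of the parameter. It is usually created by a macro call (_IO, _IOR, _IOW or _IOWR, depending on the type) in a header file. This header file should be included both by the programs that will use the ioctl (so they can generate the appropriate ioctl numbers) and by the kernel module (so it can understand them). In the example below, the header file is chardev.h and the program that uses it is ioctl.c.

If you want to use ioctls in your own kernel modules, it is best to obtain an official ioctl assignment. That way, if you accidentally get somebody else's ioctls, or somebody else gets yours, you will have a chance to notice that something is wrong. For more information, consult Documentation/ioctl-number.txt.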
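How the ioctl number packs several fields into one integer can be seen by decoding it again. The sketch below runs in user space on Linux and uses the _IOC_TYPE, _IOC_NR, _IOC_SIZE and _IOC_DIR macros from <linux/ioctl.h>. The DEMO_ names and the value 100 are assumptions chosen to mirror the example that follows; they are not an official assignment.

```c
#include <stdio.h>
#include <linux/ioctl.h>

#define DEMO_MAGIC		100	/* hypothetical, mirrors MAJOR_NUM below */
#define DEMO_GET_MSG		_IOR(DEMO_MAGIC, 1, char *)
#define DEMO_GET_NTH_BYTE	_IOWR(DEMO_MAGIC, 2, int)

/* Each helper extracts one of the fields packed into an ioctl number. */
unsigned int demo_type(unsigned int cmd) { return _IOC_TYPE(cmd); }
unsigned int demo_nr(unsigned int cmd)   { return _IOC_NR(cmd); }
unsigned int demo_size(unsigned int cmd) { return _IOC_SIZE(cmd); }
unsigned int demo_dir(unsigned int cmd)  { return _IOC_DIR(cmd); }

/* Prints all four fields of one ioctl number. */
void demo_describe(const char *name, unsigned int cmd)
{
	printf("%s: type=%u nr=%u size=%u dir=%u\n",
	       name, demo_type(cmd), demo_nr(cmd),
	       demo_size(cmd), demo_dir(cmd));
}
```

Calling demo_describe("GET_MSG", DEMO_GET_MSG) from a small main() shows that the first macro argument, the command number, the size of the parameter type and the direction bits can all be recovered from the single number. This is why both the program and the kernel module must include the same header: they have to agree on every field.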

Example 7−1. chardev.c

/*
 * chardev.c - Create an input/output character device
 */

#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */
#include <linux/fs.h>
#include <asm/uaccess.h>	/* for get_user and put_user */

#include "chardev.h"
#define SUCCESS 0
#define DEVICE_NAME "char_dev"
#define BUF_LEN 80

/*
 * Is the device open right now? Used to prevent
 * concurrent access into the same device
 */
static int Device_Open = 0;

/*
 * The message the device will give when asked
 */
static char Message[BUF_LEN];

/*
 * How far did the process reading the message get?
 * Useful if the message is larger than the size of the
 * buffer we get to fill in device_read.
 */
static char *Message_Ptr;

/*
 * This is called whenever a process attempts to open the device file
 */
static int device_open(struct inode *inode, struct file *file)
{
#ifdef DEBUG
	printk(KERN_INFO "device_open(%p)\n", file);
#endif

	/*
	 * We don't want to talk to two processes at the same time
	 */
	if (Device_Open)
		return -EBUSY;

	Device_Open++;
	/*
	 * Initialize the message
	 */
	Message_Ptr = Message;
	try_module_get(THIS_MODULE);
	return SUCCESS;
}

static int device_release(struct inode *inode, struct file *file)
{
#ifdef DEBUG
	printk(KERN_INFO "device_release(%p,%p)\n", inode, file);
#endif

	/*
	 * We're now ready for our next caller
	 */
	Device_Open--;

	module_put(THIS_MODULE);
	return SUCCESS;
}

/*
 * This function is called whenever a process which has already opened the
 * device file attempts to read from it.
 */
static ssize_t device_read(struct file *file,	/* see include/linux/fs.h */
			   char __user * buffer,	/* buffer to be
							 * filled with data */
			   size_t length,	/* length of the buffer */
			   loff_t * offset)
{
	/*
	 * Number of bytes actually written to the buffer
	 */
	int bytes_read = 0;

#ifdef DEBUG
	printk(KERN_INFO "device_read(%p,%p,%zu)\n", file, buffer, length);
#endif

	/*
	 * If we're at the end of the message, return 0
	 * (which signifies end of file)
	 */
	if (*Message_Ptr == 0)
		return 0;

	/*
	 * Actually put the data into the buffer
	 */
	while (length && *Message_Ptr) {
		/*
		 * Because the buffer is in the user data segment,
		 * not the kernel data segment, assignment wouldn't
		 * work. Instead, we have to use put_user which
		 * copies data from the kernel data segment to the
		 * user data segment.
		 */
		put_user(*(Message_Ptr++), buffer++);
		length--;
		bytes_read++;
	}

#ifdef DEBUG
	printk(KERN_INFO "Read %d bytes, %zu left\n", bytes_read, length);
#endif

	/*
	 * Read functions are supposed to return the number
	 * of bytes actually inserted into the buffer
	 */
	return bytes_read;
}

/*
 * This function is called when somebody tries to
 * write into our device file.
 */
static ssize_t
device_write(struct file *file,
	     const char __user * buffer, size_t length, loff_t * offset)
{
	int i;

#ifdef DEBUG
	/* print the user pointer itself; it must not be dereferenced here */
	printk(KERN_INFO "device_write(%p,%p,%zu)\n", file, buffer, length);
#endif

	for (i = 0; i < length && i < BUF_LEN; i++)
		get_user(Message[i], buffer + i);

	Message_Ptr = Message;

	/*
	 * Again, return the number of input characters used
	 */
	return i;
}

/*
 * This function is called whenever a process tries to do an ioctl on our
 * device file. We get two extra parameters (additional to the inode and file
 * structures, which all device functions get): the number of the ioctl called
 * and the parameter given to the ioctl function.
 *
 * If the ioctl is write or read/write (meaning output is returned to the
 * calling process), the ioctl call returns the output of this function.
 */
int device_ioctl(struct inode *inode,	/* see include/linux/fs.h */
		 struct file *file,	/* ditto */
		 unsigned int ioctl_num,	/* number and param for ioctl */
		 unsigned long ioctl_param)
{
	int i;
	char *temp;
	char ch;

	/*
	 * Switch according to the ioctl called
	 */
	switch (ioctl_num) {
	case IOCTL_SET_MSG:
		/*
		 * Receive a pointer to a message (in user space) and set that
		 * to be the device's message. Get the parameter given to
		 * ioctl by the process.
		 */
		temp = (char *)ioctl_param;

		/*
		 * Find the length of the message
		 */
		get_user(ch, temp);
		for (i = 0; ch && i < BUF_LEN; i++, temp++)
			get_user(ch, temp);

		device_write(file, (char *)ioctl_param, i, 0);
		break;

	case IOCTL_GET_MSG:
		/*
		 * Give the current message to the calling process -
		 * the parameter we got is a pointer, fill it.
		 */
		i = device_read(file, (char *)ioctl_param, 99, 0);

		/*
		 * Put a zero at the end of the buffer, so it will be
		 * properly terminated
		 */
		put_user('\0', (char *)ioctl_param + i);
		break;

	case IOCTL_GET_NTH_BYTE:
		/*
		 * This ioctl is both input (ioctl_param) and
		 * output (the return value of this function)
		 */
		return Message[ioctl_param];
	}

	return SUCCESS;
}

/* Module Declarations */

/*
 * This structure will hold the functions to be called
 * when a process does something to the device we
 * created. Since a pointer to this structure is kept in
 * the devices table, it can't be local to
 * init_module. NULL is for unimplemented functions.
 */
struct file_operations Fops = {
	.read = device_read,
	.write = device_write,
	.ioctl = device_ioctl,
	.open = device_open,
	.release = device_release,	/* a.k.a. close */
};

/*
 * Initialize the module - Register the character device
 */
int init_module()
{
	int ret_val;

	/*
	 * Register the character device (at least try)
	 */
	ret_val = register_chrdev(MAJOR_NUM, DEVICE_NAME, &Fops);

	/*
	 * Negative values signify an error
	 */
	if (ret_val < 0) {
		printk(KERN_ALERT "%s failed with %d\n",
		       "Sorry, registering the character device ", ret_val);
		return ret_val;
	}

	printk(KERN_INFO "%s The major device number is %d.\n",
	       "Registration is a success", MAJOR_NUM);
	printk(KERN_INFO "If you want to talk to the device driver,\n");
	printk(KERN_INFO "you'll have to create a device file. \n");
	printk(KERN_INFO "We suggest you use:\n");
	printk(KERN_INFO "mknod %s c %d 0\n", DEVICE_FILE_NAME, MAJOR_NUM);
	printk(KERN_INFO "The device file name is important, because\n");
	printk(KERN_INFO "the ioctl program assumes that's the\n");
	printk(KERN_INFO "file you'll use.\n");

	return 0;
}

/*
 * Cleanup - unregister the character device
 */
void cleanup_module()
{
	int ret;

	/*
	 * Unregister the device
	 */
	ret = unregister_chrdev(MAJOR_NUM, DEVICE_NAME);

	/*
	 * If there's an error, report it
	 */
	if (ret < 0)
		printk(KERN_ALERT "Error: unregister_chrdev: %d\n", ret);
}

Example 7−2. chardev.h

/*
 * chardev.h - the header file with the ioctl definitions.
 *
 * The declarations here have to be in a header file, because
 * they need to be known both to the kernel module
 * (in chardev.c) and the process calling ioctl (ioctl.c)
 */

#ifndef CHARDEV_H
#define CHARDEV_H

#include <linux/ioctl.h>

/*
 * The major device number. We can't rely on dynamic
 * registration any more, because ioctls need to know
 * it.
 */
#define MAJOR_NUM 100

/*
 * Set the message of the device driver
 */
#define IOCTL_SET_MSG _IOR(MAJOR_NUM, 0, char *)
/*
 * _IOR means that we're creating an ioctl command
 * number for passing information from a user process
 * to the kernel module.
 *
 * The first argument, MAJOR_NUM, is the major device
 * number we're using.
 *
 * The second argument is the number of the command
 * (there could be several with different meanings).
 *
 * The third argument is the type we want to get from
 * the process to the kernel.
 */

/*
 * Get the message of the device driver
 */
#define IOCTL_GET_MSG _IOR(MAJOR_NUM, 1, char *)
/*
 * This IOCTL is used for output, to get the message
 * of the device driver. However, we still need the
 * buffer to place the message in to be input,
 * as it is allocated by the process.
 */

/*
 * Get the n'th byte of the message
 */
#define IOCTL_GET_NTH_BYTE _IOWR(MAJOR_NUM, 2, int)
/*
 * The IOCTL is used for both input and output. It
 * receives from the user a number, n, and returns
 * Message[n].
 */

/*
 * The name of the device file
 */
#define DEVICE_FILE_NAME "char_dev"

#endif

Example 7−3. ioctl.c

/*
 * ioctl.c - the process to use ioctl's to control the kernel module
 *
 * Until now we could have used cat for input and output. But now
 * we need to do ioctl's, which require writing our own process.
 */

/*
 * device specifics, such as ioctl numbers and the
 * device file name.
 */
#include "chardev.h"

#include <stdio.h>		/* printf, putchar */
#include <stdlib.h>		/* exit */
#include <fcntl.h>		/* open */
#include <unistd.h>		/* close */
#include <sys/ioctl.h>		/* ioctl */

/*
 * Functions for the ioctl calls
 */
void ioctl_set_msg(int file_desc, char *message)
{
	int ret_val;

	ret_val = ioctl(file_desc, IOCTL_SET_MSG, message);

	if (ret_val < 0) {
		printf("ioctl_set_msg failed:%d\n", ret_val);
		exit(-1);
	}
}

void ioctl_get_msg(int file_desc)
{
	int ret_val;
	char message[100];

	/*
	 * Warning - this is dangerous because we don't tell
	 * the kernel how far it's allowed to write, so it
	 * might overflow the buffer. In a real production
	 * program, we would have used two ioctls - one to tell
	 * the kernel the buffer length and another to give
	 * it the buffer to fill
	 */
	ret_val = ioctl(file_desc, IOCTL_GET_MSG, message);

	if (ret_val < 0) {
		printf("ioctl_get_msg failed:%d\n", ret_val);
		exit(-1);
	}

	printf("get_msg message:%s\n", message);
}

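void ioctl_get_nth_byte(int file_desc)
{
	int i;
	char c;

	printf("get_nth_byte message:");

	i = 0;
	do {
		c = ioctl(file_desc, IOCTL_GET_NTH_BYTE, i++);

		if (c < 0) {
			printf("ioctl_get_nth_byte failed at the %d'th byte:\n", i);
			exit(-1);
		}

		/* print each byte until the terminating zero is reached */
		putchar(c);
	} while (c != 0);
	putchar('\n');
}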

/*
 * Main - Call the ioctl functions
 */
int main(void)
{
	int file_desc;
	char *msg = "Message passed by ioctl\n";

	file_desc = open(DEVICE_FILE_NAME, 0);
	if (file_desc < 0) {
		printf("Can't open device file: %s\n", DEVICE_FILE_NAME);
		exit(-1);
	}

	ioctl_get_nth_byte(file_desc);
	ioctl_get_msg(file_desc);
	ioctl_set_msg(file_desc, msg);

	close(file_desc);
	return 0;
}

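Chapter 8. System Calls

8.1. System Calls

So far, the only thing we have done is use well defined kernel mechanisms to register /proc files and device handlers. This is fine if you want to do something the kernel programmers thought you would want, such as writing a device driver. But what if you want to do something unusual, such as changing the behaviour of the system in some way? Then you are mostly on your own.

This is where kernel programming gets dangerous. While writing the example below, I killed the open() system call. This meant I couldn't open any files, I couldn't run any programs, and I couldn't even shut down the computer with shutdown. I had to pull the power plug. Luckily, no files were damaged. To make sure you won't lose any files either, run sync right before you do the insmod and the rmmod.

Forget about /proc files, forget about device files. They are just minor details. The real process-to-kernel communication mechanism, the one used by all processes, is system calls. When a process requests a service from the kernel (such as opening a file, forking a new process, or requesting more memory), this is the mechanism used. If you want to change the behaviour of the kernel in interesting ways, this is the place to do it. By the way, if you want to see which system calls a program uses, run strace <command> <arguments>.

In general, a process is not supposed to be able to access the kernel. It can't access kernel memory and it can't call kernel functions. The CPU hardware enforces this (that is the reason it is called 'protected mode').

System calls are the exception to this rule. The process fills the registers with the appropriate values and then executes a special instruction which jumps to a previously defined location in the kernel (that location is readable by user processes, but not writable by them). On Intel CPUs, this is done by means of interrupt 0x80. The hardware knows that once you jump to this location you are no longer running in restricted user mode but as the operating system kernel, and are therefore allowed to do whatever you want.

The location in the kernel that a process jumps to is called system_call. The procedure at that location checks the system call number, which tells the kernel what service the process requested. Then it looks at the table of system calls (sys_call_table) to find the address of the kernel function to call. It calls the function, and after the function returns it does a few system checks and then returns to the process (or to a different process, if the process's time slice has run out). If you want to read this code, it is in the source file arch/<architecture>/kernel/entry.S, after the line ENTRY(system_call).

So, if we want to change the way a certain system call works, we need to write our own function to implement it (usually by adding a bit of our own code and then calling the original function) and then change the pointer in sys_call_table, at the index given by the system call number, to point to our function. Because our module might be removed later and we don't want to leave the system in an unstable state, it is important for cleanup_module to restore the table to its original state.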
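The save, replace and restore steps can be modelled with an ordinary table of function pointers in user space. This is only an analogy under stated assumptions: demo_table stands in for sys_call_table, demo_sys_open for the original system call, and all demo_ names are hypothetical. Nothing here touches the real kernel.

```c
#include <stdio.h>

typedef int (*demo_call_t)(const char *);

#define DEMO_NR_OPEN 0		/* hypothetical index of "open" in the table */

static int demo_log_count;	/* how many calls our replacement has seen */

/* The "original" system call: pretends to return a file descriptor. */
int demo_sys_open(const char *name)
{
	(void)name;
	return 3;
}

static demo_call_t demo_table[1] = { demo_sys_open };
static demo_call_t demo_original;	/* saved pointer, used for restoring */

/* Our replacement: add our own behaviour, then call the original. */
int demo_our_open(const char *name)
{
	demo_log_count++;
	printf("opened: %s\n", name);
	return demo_original(name);
}

/* What init_module does: remember the old pointer, install ours. */
void demo_init(void)
{
	demo_original = demo_table[DEMO_NR_OPEN];
	demo_table[DEMO_NR_OPEN] = demo_our_open;
}

/* What cleanup_module does: put the old pointer back. */
void demo_cleanup(void)
{
	demo_table[DEMO_NR_OPEN] = demo_original;
}

/* What the system_call entry point does: dispatch through the table. */
int demo_syscall(int nr, const char *arg)
{
	return demo_table[nr](arg);
}
```

Callers of demo_syscall() never notice the swap: they get the same return value before, during and after the hook is installed. Only the logging side effect shows that the replacement ran.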

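The source code below is an example of such a kernel module. We want to 'spy' on a certain user and printk() a message whenever that user opens a file. To do this, we replace the system call that opens a file with our own function, our_sys_open. This function checks the uid of the current process, and if it is the uid we are spying on, it calls printk() to display the name of the file to be opened. Then, either way, it calls the original open() function with the same parameters to actually open the file.

The init_module function replaces the appropriate location in sys_call_table and keeps the original pointer in a variable. The cleanup_module function uses that variable to restore everything back to normal. This approach is dangerous, because two kernel modules may change the same system call. Imagine we have two kernel modules, A and B, which implement their own open calls, A_open and B_open. When A is inserted into the kernel, the system call is replaced with A_open, which calls the original sys_open when it is done. Next, B is inserted into the kernel and replaces the system call with B_open, which, when it is done, calls what it thinks is the original system call but is in fact A_open. Now, if B is removed first, everything is fine: it simply restores the system call to A_open, which calls the original. However, if A is removed and then B is removed, the system will crash. A's removal restores the system call to sys_open, and B's removal then restores it to A_open, which is no longer in memory. At first glance it appears we could solve this by checking whether the system call is equal to our own function and not changing it otherwise, but that causes an even worse problem. When A is removed, it sees that the system call was changed to B_open and no longer points to A_open, so it does not restore it to sys_open.

Unfortunately, B_open will still try to call A_open, which is no longer there, so the system would crash even without removing B. Note that all the related problems make syscall stealing unfeasible for production use. To keep people from doing potentially harmful things, sys_call_table is no longer exported. This means that if you want to do more than a dry run of this example, you will have to patch your kernel to have sys_call_table exported. In the example directory you will find a README and the patch. As you can imagine, such modifications are not to be taken lightly. Do not try this on systems you depend on. You will need to get the tarball of this guide to obtain the patch and the README. Depending on your kernel version, you might even need to apply the patch by hand.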
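The A and B removal-order problem can be replayed with a single function pointer slot. This is a user-space model with hypothetical demo_ names: in user space the functions never actually disappear, so a flag records whether module A is still "loaded", and demo_slot_is_dangling() reports the state that would crash a real kernel.

```c
#include <stdbool.h>

typedef void (*demo_fn)(void);

void demo_sys_open(void) {}	/* the original system call */
void demo_a_open(void) {}	/* module A's replacement */
void demo_b_open(void) {}	/* module B's replacement */

static demo_fn demo_slot = demo_sys_open;	/* the table entry */
static demo_fn demo_a_saved, demo_b_saved;	/* each module's saved pointer */
static bool demo_a_loaded;

/* Loading saves whatever is in the slot, then installs the replacement. */
void demo_load_a(void)
{
	demo_a_saved = demo_slot;
	demo_slot = demo_a_open;
	demo_a_loaded = true;
}

void demo_load_b(void)
{
	demo_b_saved = demo_slot;
	demo_slot = demo_b_open;
}

/* Unloading blindly restores the pointer that module saved earlier. */
void demo_unload_a(void)
{
	demo_slot = demo_a_saved;
	demo_a_loaded = false;
}

void demo_unload_b(void)
{
	demo_slot = demo_b_saved;
}

/* True when the slot points at A's code although A has been removed. */
bool demo_slot_is_dangling(void)
{
	return demo_slot == demo_a_open && !demo_a_loaded;
}
```

Loading A then B and unloading in the order A, B leaves the slot pointing at demo_a_open after A is gone. Unloading in the order B, A would instead walk the slot back to demo_sys_open, which is the harmless case described above.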

Still here? Well, that is all there is to this chapter. If Wyle E. Coyote were a kernel hacker, this would be the first thing he'd try ;)

Example 8−1. syscall.c

/*
 * syscall.c
 *
 * System call "stealing" sample.
 */

/*
 * The necessary header files
 */

/*
 * Standard in kernel modules
 */
#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module, */
#include <linux/moduleparam.h>	/* which will have params */
#include <linux/unistd.h>	/* The list of system calls */

/*
 * For the current (process) structure, we need
 * this to know who the current user is.
 */
#include <linux/sched.h>
#include <asm/uaccess.h>

/*
 * The system call table (a table of functions). We
 * just define this as external, and the kernel will
 * fill it up for us when we are insmod'ed
 *
 * sys_call_table is no longer exported in 2.6.x kernels.
 * If you really want to try this DANGEROUS module you will
 * have to apply the supplied patch against your current kernel
 * and recompile it.
 */
extern void *sys_call_table[];

/*
 * UID we want to spy on - will be filled from the
 * command line
 */
static int uid;
module_param(uid, int, 0644);

/*
 * A pointer to the original system call. The reason
 * we keep this, rather than call the original function
 * (sys_open), is because somebody else might have
 * replaced the system call before us. Note that this
 * is not 100% safe, because if another module
 * replaced sys_open before us, then when we're inserted
 * we'll call the function in that module - and it
 * might be removed before we are.
 *
 * Another reason for this is that we can't get sys_open.
 * It's a static variable, so it is not exported.
 */
asmlinkage int (*original_call) (const char *, int, int);

/*
 * The function we'll replace sys_open (the function
 * called when you call the open system call) with. To
 * find the exact prototype, with the number and type
 * of arguments, we find the original function first
 * (it's at fs/open.c).
 *
 * In theory, this means that we're tied to the
 * current version of the kernel. In practice, the
 * system calls almost never change (it would wreak havoc
 * and require programs to be recompiled, since the system
 * calls are the interface between the kernel and the
 * processes).
 */
asmlinkage int our_sys_open(const char *filename, int flags, int mode)
{
	int i = 0;
	char ch;

	/*
	 * Check if this is the user we're spying on
	 */
	if (uid == current->uid) {
		/*
		 * Report the file, if relevant
		 */
		printk("Opened file by %d: ", uid);
		do {
			/* Copy the file name from user space, one byte at a time */
			get_user(ch, filename + i);
			i++;
			printk("%c", ch);
		} while (ch != 0);
		printk("\n");
	}

	/*
	 * Call the original sys_open - otherwise, we lose
	 * the ability to open files
	 */
	return original_call(filename, flags, mode);
}

/*
 * Initialize the module - replace the system call
 */
int init_module(void)
{
	/*
	 * Warning - too late for it now, but maybe for
	 * next time...
	 */
	printk(KERN_ALERT "I'm dangerous. I hope you did a ");
	printk(KERN_ALERT "sync before you insmod'ed me.\n");
	printk(KERN_ALERT "My counterpart, cleanup_module(), is even");
	printk(KERN_ALERT "more dangerous. If\n");
	printk(KERN_ALERT "you value your file system, it will ");
	printk(KERN_ALERT "be \"sync; rmmod\" \n");
	printk(KERN_ALERT "when you remove this module.\n");

	/*
	 * Keep a pointer to the original function in
	 * original_call, and then replace the system call
	 * in the system call table with our_sys_open
	 */
	original_call = sys_call_table[__NR_open];
	sys_call_table[__NR_open] = our_sys_open;

	/*
	 * To get the address of the function for system
	 * call foo, go to sys_call_table[__NR_foo].
	 */

	printk(KERN_INFO "Spying on UID:%d\n", uid);

	return 0;
}

/*
 * Cleanup - restore the original system call
 */
void cleanup_module(void)
{
	/*
	 * Return the system call back to normal
	 */
	if (sys_call_table[__NR_open] != our_sys_open) {
		printk(KERN_ALERT "Somebody else also played with the ");
		printk(KERN_ALERT "open system call\n");
		printk(KERN_ALERT "The system may be left in ");
		printk(KERN_ALERT "an unstable state.\n");
	}

	sys_call_table[__NR_open] = original_call;
}

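Chapter 9. Blocking Processes

9.1. Blocking Processes

What do you do when somebody asks you for something while you are busy? If you are a person being bothered by another person, all you can say is: "Not right now, I'm busy. Go away!" But if you are the kernel and you are bothered by a process, you have another option: you can put the process to sleep until you can service it. After all, processes are put to sleep and woken up by the kernel all the time (that is how several processes appear to run at the same time on a single CPU).

The kernel module below is an example of this. The file /proc/sleep can be opened by only one process at a time. If the file is already open, the module calls wait_event_interruptible[12]. This function changes the state of the task (a task is the kernel data structure that holds information about a process and the system call it is in, if any) to TASK_INTERRUPTIBLE, which means the task will not run again until it is woken up, and adds it to WaitQ, the queue of tasks waiting to access the file. The function then calls the scheduler to switch to another process.

When a process is done with the file, it closes it, and module_close is called. That function wakes up all the processes in the queue (there is no mechanism to wake up only one of them) and then returns, and the process that just closed the file continues to run. In time, the scheduler hands the CPU to another process. Eventually one of the processes that was in the queue gets the CPU; it resumes right after its call to wait_event_interruptible[13] and immediately sets a global variable to tell all the other processes that the file is open again. When the other processes get the CPU, they see that global variable and go back to sleep.

To try this out, use tail -f to keep the file open in the background, while trying to access it with another process (again in the background, so that we need not switch to a different vt). As soon as the first background process is killed with kill %1, the second is woken up, is able to access the file and finally terminates.

To make our life more interesting, module_close does not have a monopoly on waking up the processes that wait for the file. A signal, such as Ctrl+C (SIGINT), can also wake up a process. In that case we want to return immediately with the error code -EINTR. This is important so that users can, for example, kill the process before it receives the file.

There is one more point to remember. Some processes do not want to sleep: they want either to get what they asked for immediately or to be told that it cannot be done. Such processes use the O_NONBLOCK flag when opening the file. The kernel is supposed to respond by returning the error code -EAGAIN from operations which would otherwise block, such as opening the file in this example. The program cat_noblock, available in the source directory for this chapter, can be used to open a file with O_NONBLOCK.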

hostname:~/lkmpg-examples/09-BlockingProcesses# insmod sleep.ko
hostname:~/lkmpg-examples/09-BlockingProcesses# cat_noblock /proc/sleep
Last input:
hostname:~/lkmpg-examples/09-BlockingProcesses# tail -f /proc/sleep &
Last input:
tail: /proc/sleep: file truncated
[1] 6540
hostname:~/lkmpg-examples/09-BlockingProcesses# cat_noblock /proc/sleep
Open would block
hostname:~/lkmpg-examples/09-BlockingProcesses# kill %1
[1]+ Terminated tail -f /proc/sleep
hostname:~/lkmpg-examples/09-BlockingProcesses# cat_noblock /proc/sleep
Last input:
hostname:~/lkmpg-examples/09-BlockingProcesses#
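The O_NONBLOCK contract discussed above can also be observed without loading any module. The sketch below is a user-space illustration only: it uses an empty pipe rather than /proc/sleep, and it shows a non-blocking read() rather than a non-blocking open(). The helper names read_nowait and demo_empty_pipe are made up for this example. The pattern is the same as in cat_noblock: the call fails at once and errno is EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch fd to non-blocking mode and try to read from it.
 * Returns the number of bytes read, or a negative errno value
 * (-EAGAIN if the read would have blocked). */
int read_nowait(int fd, char *buf, size_t len)
{
    ssize_t n;
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -errno;

    n = read(fd, buf, len);
    if (n == -1)
        return -errno;
    return (int)n;
}

/* Create a pipe that nobody writes to and try to read from it.
 * A blocking read would sleep here; the non-blocking one reports
 * -EAGAIN immediately. */
int demo_empty_pipe(void)
{
    int fds[2];
    char c;
    int rc;

    if (pipe(fds) == -1)
        return -errno;
    rc = read_nowait(fds[0], &c, 1);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

A caller would typically treat -EAGAIN as "try again later" and every other negative value as a real error, which is the same distinction cat_noblock makes.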

Example 9-1. sleep.c

/*
 * sleep.c - create a /proc file, and if several processes try to open it at
 * the same time, put all but one to sleep
 */

#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */
#include <linux/proc_fs.h>	/* Necessary because we use proc fs */
#include <linux/sched.h>	/* For putting processes to sleep and
				   waking them up */
#include <asm/uaccess.h>	/* for get_user and put_user */

/*
 * The module's file functions
 */

/*
 * Here we keep the last message received, to prove that we can process our
 * input
 */
#define MESSAGE_LENGTH 80
static char Message[MESSAGE_LENGTH];

static struct proc_dir_entry *Our_Proc_File;
#define PROC_ENTRY_FILENAME "sleep"

/*
 * Since we use the file operations struct, we can't use the special proc
 * output provisions - we have to use a standard read function, which is this
 * function
 */
static ssize_t module_output(struct file *file,	/* see include/linux/fs.h */
			     char *buf,	/* The buffer to put data to
					   (in the user segment) */
			     size_t len,	/* The length of the buffer */
			     loff_t * offset)
{
	static int finished = 0;
	int i;
	char message[MESSAGE_LENGTH + 30];

	/*
	 * Return 0 to signify end of file - that we have nothing
	 * more to say at this point.
	 */
	if (finished) {
		finished = 0;
		return 0;
	}

	/*
	 * If you don't understand this by now, you're hopeless as a kernel
	 * programmer.
	 */
	sprintf(message, "Last input:%s\n", Message);
	for (i = 0; i < len && message[i]; i++)
		put_user(message[i], buf + i);

	finished = 1;
	return i;		/* Return the number of bytes "read" */
}

/*
 * This function receives input from the user when the user writes to the /proc
 * file.
 */
static ssize_t module_input(struct file *file,	/* The file itself */
			    const char *buf,	/* The buffer with input */
			    size_t length,	/* The buffer's length */
			    loff_t * offset)	/* offset to file - ignore */
{
	int i;

	/*
	 * Put the input into Message, where module_output will later be
	 * able to use it
	 */
	for (i = 0; i < MESSAGE_LENGTH - 1 && i < length; i++)
		get_user(Message[i], buf + i);

	/*
	 * we want a standard, zero terminated string
	 */
	Message[i] = '\0';

	/*
	 * We need to return the number of input characters used
	 */
	return i;
}

/*
 * 1 if the file is currently open by somebody
 */
int Already_Open = 0;

/*
 * Queue of processes who want our file
 */
DECLARE_WAIT_QUEUE_HEAD(WaitQ);

/*
 * Called when the /proc file is opened
 */
static int module_open(struct inode *inode, struct file *file)
{
	/*
	 * If the file's flags include O_NONBLOCK, it means the process doesn't
	 * want to wait for the file. In this case, if the file is already
	 * open, we should fail with -EAGAIN, meaning "you'll have to try
	 * again", instead of blocking a process which would rather stay awake.
	 */
	if ((file->f_flags & O_NONBLOCK) && Already_Open)
		return -EAGAIN;

	/*
	 * This is the correct place for try_module_get(THIS_MODULE) because
	 * if a process is in the loop, which is within the kernel module,
	 * the kernel module must not be removed.
	 */
	try_module_get(THIS_MODULE);

	/*
	 * If the file is already open, wait until it isn't
	 */
	while (Already_Open) {
		int i, is_sig = 0;

		/*
		 * This function puts the current process, including any system
		 * calls, such as us, to sleep. Execution will be resumed right
		 * after the function call, either because somebody called
		 * wake_up(&WaitQ) (only module_close does that, when the file
		 * is closed) or when a signal, such as Ctrl-C, is sent
		 * to the process
		 */
		wait_event_interruptible(WaitQ, !Already_Open);

		/*
		 * If we woke up because we got a signal we're not blocking,
		 * return -EINTR (fail the system call). This allows processes
		 * to be killed or stopped.
		 */

		/*
		 * Emmanuel Papirakis:
		 *
		 * This is a little update to work with 2.2.*. Signals now are
		 * contained in two words (64 bits) and are stored in a
		 * structure that contains an array of two unsigned longs. We
		 * now have to make 2 checks in our if.
		 *
		 * Ori Pomerantz:
		 *
		 * Nobody promised me they'll never use more than 64 bits, or
		 * that this book won't be used for a version of Linux with a
		 * word size of 16 bits. This code would work in any case.
		 */
		for (i = 0; i < _NSIG_WORDS && !is_sig; i++)
			is_sig = current->pending.signal.sig[i] &
				 ~current->blocked.sig[i];

		if (is_sig) {
			/*
			 * It's important to put module_put(THIS_MODULE) here,
			 * because for processes where the open is interrupted
			 * there will never be a corresponding close. If we
			 * don't decrement the usage count here, we will be
			 * left with a positive usage count which we'll have no
			 * way to bring down to zero, giving us an immortal
			 * module, which can only be killed by rebooting
			 * the machine.
			 */
			module_put(THIS_MODULE);
			return -EINTR;
		}
	}

	/*
	 * If we got here, Already_Open must be zero
	 */

	/*
	 * Open the file
	 */
	Already_Open = 1;
	return 0;		/* Allow the access */
}

/*
 * Called when the /proc file is closed
 */
int module_close(struct inode *inode, struct file *file)
{
	/*
	 * Set Already_Open to zero, so one of the processes in the WaitQ will
	 * be able to set Already_Open back to one and to open the file. All
	 * the other processes will be called when Already_Open is back to one,
	 * so they'll go back to sleep.
	 */
	Already_Open = 0;

	/*
	 * Wake up all the processes in WaitQ, so if anybody is waiting for the
	 * file, they can have it.
	 */
	wake_up(&WaitQ);

	module_put(THIS_MODULE);

	return 0;		/* success */
}

/*
 * This function decides whether to allow an operation (return zero) or not
 * allow it (return a non-zero which indicates why it is not allowed).
 *
 * The operation can be one of the following values:
 * 0 - Execute (run the "file" - meaningless in our case)
 * 2 - Write (input to the kernel module)
 * 4 - Read (output from the kernel module)
 *
 * This is the real function that checks file permissions. The permissions
 * returned by ls -l are for reference only, and can be overridden here.
 */
static int module_permission(struct inode *inode, int op, struct nameidata *nd)
{
	/*
	 * We allow everybody to read from our module, but only root (uid 0)
	 * may write to it
	 */
	if (op == 4 || (op == 2 && current->euid == 0))
		return 0;

	/*
	 * If it's anything else, access is denied
	 */
	return -EACCES;
}

/*
 * Structures to register as the /proc file, with pointers to all the relevant
 * functions.
 */

/*
 * File operations for our proc file. This is where we place pointers to all
 * the functions called when somebody tries to do something to our file. NULL
 * means we don't want to deal with something.
 */
static struct file_operations File_Ops_4_Our_Proc_File = {
	.read = module_output,	/* "read" from the file */
	.write = module_input,	/* "write" to the file */
	.open = module_open,	/* called when the /proc file is opened */
	.release = module_close,	/* called when it's closed */
};

/*
 * Inode operations for our proc file. We need it so we'll have somewhere to
 * specify the file operations structure we want to use, and the function we
 * use for permissions. It's also possible to specify functions to be called
 * for anything else which could be done to an inode (although we don't bother,
 * we just put NULL).
 */
static struct inode_operations Inode_Ops_4_Our_Proc_File = {
	.permission = module_permission,	/* check for permissions */
};

/*
 * Module initialization and cleanup
 */

/*
 * Initialize the module - register the proc file
 */
int init_module(void)
{
	Our_Proc_File = create_proc_entry(PROC_ENTRY_FILENAME, 0644, NULL);

	if (Our_Proc_File == NULL) {
		remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
		printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
		       PROC_ENTRY_FILENAME);
		return -ENOMEM;
	}

	Our_Proc_File->owner = THIS_MODULE;
	Our_Proc_File->proc_iops = &Inode_Ops_4_Our_Proc_File;
	Our_Proc_File->proc_fops = &File_Ops_4_Our_Proc_File;
	Our_Proc_File->mode = S_IFREG | S_IRUGO | S_IWUSR;
	Our_Proc_File->uid = 0;
	Our_Proc_File->gid = 0;
	Our_Proc_File->size = 80;

	printk(KERN_INFO "/proc/%s created\n", PROC_ENTRY_FILENAME);

	return 0;
}

/*
 * Cleanup - unregister our file from /proc. This could get dangerous if
 * there are still processes waiting in WaitQ, because they are inside our
 * open function, which will get unloaded. I'll explain how to avoid removal
 * of a kernel module in such a case in chapter 10.
 */
void cleanup_module(void)
{
	remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);

	printk(KERN_INFO "/proc/%s removed\n", PROC_ENTRY_FILENAME);
}

Example 9-2. cat_noblock.c

/* cat_noblock.c - open a file and display its contents, but exit rather than
 * wait for input */

#include <stdio.h>	/* standard I/O */
#include <fcntl.h>	/* for open */
#include <unistd.h>	/* for read */
#include <stdlib.h>	/* for exit */
#include <errno.h>	/* for errno */

#define MAX_BYTES (1024*4)

int main(int argc, char *argv[])
{
	int fd;			/* The file descriptor for the file to read */
	ssize_t bytes;		/* The number of bytes read, or -1 on error */
	char buffer[MAX_BYTES];	/* The buffer for the bytes */

	/* Usage */
	if (argc != 2) {
		printf("Usage: %s <filename>\n", argv[0]);
		puts("Reads the content of a file, but doesn't wait for input");
		exit(-1);
	}

	/* Open the file for reading in non blocking mode */
	fd = open(argv[1], O_RDONLY | O_NONBLOCK);

	/* If open failed */
	if (fd == -1) {
		if (errno == EAGAIN)	/* compare, don't assign */
			puts("Open would block");
		else
			puts("Open failed");
		exit(-1);
	}

	/* Read the file and output its contents */
	do {
		int i;

		/* Read characters from the file */
		bytes = read(fd, buffer, MAX_BYTES);

		/* If there's an error, report it and die */
		if (bytes == -1) {
			if (errno == EAGAIN)
				puts("Normally I'd block, but you told me not to");
			else
				puts("Another read error");
			exit(-1);
		}

		/* Print the characters */
		if (bytes > 0) {
			for (i = 0; i < bytes; i++)
				putchar(buffer[i]);
		}

		/* While there are no errors and the file isn't over */
	} while (bytes > 0);

	return 0;
}

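Chapter 10. Replacing Printks

10.1. Replacing printk

In Section 1.2.1.2, I said that X and kernel module programming don't mix. That is true while developing kernel modules, but in actual use you want to be able to send messages to whichever tty[15] the command to load the module came from.

This is done by using current, a pointer to the currently running task, to get the current task's tty structure. We then look inside that tty structure to find a pointer to a string write function, and use that function to write a string to the tty.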

Example 10-1. print_string.c

/*
 * print_string.c - Send output to the tty we're running on, regardless if it's
 * through X11, telnet, etc. We do this by printing the string to the tty
 * associated with the current task.
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sched.h>	/* For current */
#include <linux/tty.h>		/* For the tty declarations */
#include <linux/version.h>	/* For LINUX_VERSION_CODE */

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Peter Jay Salzman");

static void print_string(char *str)
{
	struct tty_struct *my_tty;

	/*
	 * tty struct went into signal struct in 2.6.6
	 */
#if ( LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,5) )
	/*
	 * The tty for the current task
	 */
	my_tty = current->tty;
#else
	/*
	 * The tty for the current task, for 2.6.6+ kernels
	 */
	my_tty = current->signal->tty;
#endif

	/*
	 * If my_tty is NULL, the current task has no tty you can print to
	 * (ie, if it's a daemon). If so, there's nothing we can do.
	 */
	if (my_tty != NULL) {

		/*
		 * my_tty->driver is a struct which holds the tty's functions,
		 * one of which (write) is used to write strings to the tty.
		 * It can be used to take a string either from the user's or
		 * kernel's memory segment.
		 *
		 * The function's 1st parameter is the tty to write to,
		 * because the same function would normally be used for all
		 * tty's of a certain type. The 2nd parameter controls
		 * whether the function receives a string from kernel
		 * memory (false, 0) or from user memory (true, non zero).
		 * BTW: this param has been removed in Kernels > 2.6.9
		 * The (2nd) 3rd parameter is a pointer to a string.
		 * The (3rd) 4th parameter is the length of the string.
		 *
		 * As you will see below, sometimes it's necessary to use
		 * preprocessor stuff to create code that works for different
		 * kernel versions. The (naive) approach we've taken here
		 * does not scale well. The right way to deal with this
		 * is described in section 2 of
		 * linux/Documentation/SubmittingPatches
		 */
		((my_tty->driver)->write) (my_tty,	/* The tty itself */
#if ( LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,9) )
					   0,	/* Don't take the string
						   from user space */
#endif
					   str,	/* String */
					   strlen(str));	/* Length */

		/*
		 * ttys were originally hardware devices, which (usually)
		 * strictly followed the ASCII standard. In ASCII, to move to
		 * a new line you need two characters, a carriage return and a
		 * line feed. On Unix, the ASCII line feed is used for both
		 * purposes - so we can't just use \n, because it wouldn't have
		 * a carriage return and the next line will start at the
		 * column right after the line feed.
		 *
		 * This is why text files are different between Unix and
		 * MS Windows. In CP/M and derivatives, like MS-DOS and
		 * MS Windows, the ASCII standard was strictly adhered to,
		 * and therefore a newline requires both a LF and a CR.
		 */
#if ( LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,9) )
		((my_tty->driver)->write) (my_tty, 0, "\015\012", 2);
#else
		((my_tty->driver)->write) (my_tty, "\015\012", 2);
#endif
	}
}

static int __init print_string_init(void)
{
	print_string("The module has been inserted. Hello world!");
	return 0;
}

static void __exit print_string_exit(void)
{
	print_string("The module has been removed. Farewell world!");
}

module_init(print_string_init);
module_exit(print_string_exit);

10.2. Flashing keyboard LEDs

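In certain situations you may want a simpler and more direct way to communicate with the outside world. Flashing the keyboard LEDs is one such way: it attracts attention immediately and can display a status condition. Every keyboard has LEDs, they need no setup, and they are simple to use.

The following is a small module which, once loaded, blinks the keyboard LEDs until it is unloaded.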

Example 10-2. kbleds.c

/*
 * kbleds.c - Blink keyboard leds until the module is unloaded.
 */

#include <linux/module.h>
#include <linux/config.h>
#include <linux/init.h>
#include <linux/tty.h>		/* For fg_console, MAX_NR_CONSOLES */
#include <linux/kd.h>		/* For KDSETLED */
#include <linux/vt.h>
#include <linux/console_struct.h>	/* For vc_cons */

MODULE_DESCRIPTION("Example module illustrating the use of Keyboard LEDs.");
MODULE_AUTHOR("Daniele Paolo Scarpazza");
MODULE_LICENSE("GPL");

struct timer_list my_timer;
struct tty_driver *my_driver;
char kbledstatus = 0;

#define BLINK_DELAY   HZ/5
#define ALL_LEDS_ON   0x07
#define RESTORE_LEDS  0xFF

/*
 * Function my_timer_func blinks the keyboard LEDs periodically by invoking
 * command KDSETLED of ioctl() on the keyboard driver. To learn more on virtual
 * terminal ioctl operations, please see file:
 *     /usr/src/linux/drivers/char/vt_ioctl.c, function vt_ioctl().
 *
 * The argument to KDSETLED is alternatively set to 7 (thus causing the led
 * mode to be set to LED_SHOW_IOCTL, and all the leds are lit) and to 0xFF
 * (any value above 7 switches back the led mode to LED_SHOW_FLAGS, thus
 * the LEDs reflect the actual keyboard status). To learn more on this,
 * please see file:
 *     /usr/src/linux/drivers/char/keyboard.c, function setledstate().
 */
static void my_timer_func(unsigned long ptr)
{
	/* ptr carries the address of kbledstatus, which is a char */
	char *pstatus = (char *)ptr;

	if (*pstatus == ALL_LEDS_ON)
		*pstatus = RESTORE_LEDS;
	else
		*pstatus = ALL_LEDS_ON;

	(my_driver->ioctl) (vc_cons[fg_console].d->vc_tty, NULL, KDSETLED,
			    *pstatus);

	my_timer.expires = jiffies + BLINK_DELAY;
	add_timer(&my_timer);
}

static int __init kbleds_init(void)
{
	int i;

	printk(KERN_INFO "kbleds: loading\n");
	printk(KERN_INFO "kbleds: fgconsole is %x\n", fg_console);
	for (i = 0; i < MAX_NR_CONSOLES; i++) {
		if (!vc_cons[i].d)
			break;
		printk(KERN_INFO "poet_atkm: console[%i/%i] #%i, tty %lx\n", i,
		       MAX_NR_CONSOLES, vc_cons[i].d->vc_num,
		       (unsigned long)vc_cons[i].d->vc_tty);
	}
	printk(KERN_INFO "kbleds: finished scanning consoles\n");

	my_driver = vc_cons[fg_console].d->vc_tty->driver;
	printk(KERN_INFO "kbleds: tty driver magic %x\n", my_driver->magic);

	/*
	 * Set up the LED blink timer the first time
	 */
	init_timer(&my_timer);
	my_timer.function = my_timer_func;
	my_timer.data = (unsigned long)&kbledstatus;
	my_timer.expires = jiffies + BLINK_DELAY;
	add_timer(&my_timer);

	return 0;
}

static void __exit kbleds_cleanup(void)
{
	printk(KERN_INFO "kbleds: unloading...\n");
	del_timer(&my_timer);
	(my_driver->ioctl) (vc_cons[fg_console].d->vc_tty, NULL, KDSETLED,
			    RESTORE_LEDS);
}

module_init(kbleds_init);
module_exit(kbleds_cleanup);

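If none of the examples in this chapter fit your debugging needs, there are still some other tricks worth trying.

Ever wondered what CONFIG_LL_DEBUG in make menuconfig is good for? If you enable it, you get low-level access to the serial port. This may not sound very powerful by itself, but you can patch kernel/printk.c or any other essential system call to use printascii, which makes it possible to trace virtually everything your code does over a serial line. When you find yourself porting the kernel to a new, previously unsupported architecture, this is usually among the first things to implement. Logging over a netconsole might also be worth a try.

You have now seen several things that can aid debugging, but there are some points to be aware of. Debugging is almost always intrusive. Adding debug code can change the situation enough to make the bug seem to disappear. You should therefore keep debug code to a minimum and make sure it does not show up in production code.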

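Chapter 11. Scheduling Tasks

11.1. Scheduling Tasks

Very often we have "housekeeping" tasks which have to be done at a certain time. If the task is to be done by a process, we do it by putting it in the crontab file. If the task is to be done by a kernel module, we have two possibilities.

The first is to put a process in the crontab file which wakes up the module by a system call when necessary, for example by opening a file. This is terribly inefficient, however: we start a new process from crontab and read a new executable into memory, all just to wake up a kernel module which is in memory anyway.

The second is to create a function that is called on timer interrupts. We do this by creating a task, described by a work_struct, which holds a pointer to the function. We then use queue_delayed_work to put that task on a work queue, my_workqueue (a workqueue_struct), whose tasks are executed when their delay expires on a later timer interrupt. Because we want the function to keep on being executed, it has to put itself back on my_workqueue every time it is called, ready for the next round. Note that the example below passes a delay of 100 jiffies to queue_delayed_work, so the function runs once every 100 timer ticks rather than on every single one.

There is one more point to remember. When a module is removed by rmmod, its reference count is checked first. If it is zero, cleanup_module is called. Then the module is removed from memory with all its functions. Things need to be shut down properly, or bad things will happen. See the code below for how this can be done safely.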
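The re-queue-until-told-to-stop pattern described above can be modelled in plain user-space C. This is only a sketch under stated assumptions: the names sim_work, sim_start, sim_run_ticks, sim_stop and sim_count are invented for the illustration, a "tick" is just a loop iteration, and no real work queue API is involved. It shows why the die flag has to be set before the queue is torn down: otherwise the function would queue itself again.

```c
/* A work item with a function pointer and a "queued" marker. */
struct sim_work {
    void (*fn)(void);
    int pending;          /* 1 while queued for the next tick */
};

static struct sim_work task;
static int timer_count;   /* plays the role of TimerIntrpt */
static int die;           /* set to 1 to stop re-queueing */

static void sim_queue(struct sim_work *w)
{
    w->pending = 1;
}

/* The periodic function: count the call, then put itself back
 * on the queue unless shutdown has been requested. */
static void intrpt_routine_sim(void)
{
    timer_count++;
    if (!die)
        sim_queue(&task);
}

void sim_start(void)
{
    timer_count = 0;
    die = 0;
    task.fn = intrpt_routine_sim;
    sim_queue(&task);
}

/* Run n simulated ticks; each tick runs the work item if it is queued. */
void sim_run_ticks(int n)
{
    while (n-- > 0) {
        if (task.pending) {
            task.pending = 0;
            task.fn();
        }
    }
}

/* Shutdown: first forbid re-queueing, then drop what is still queued. */
void sim_stop(void)
{
    die = 1;
    task.pending = 0;
}

int sim_count(void)
{
    return timer_count;
}
```

After sim_start() and five ticks the counter is 5. After sim_stop() further ticks leave it unchanged, which is the property cleanup_module in the listing below relies on.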

Example 11-1. sched.c

/*
 * sched.c - schedule a function to be called on every timer interrupt.
 */

/*
 * The necessary header files
 */

/*
 * Standard in kernel modules
 */
#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */
#include <linux/proc_fs.h>	/* Necessary because we use the proc fs */
#include <linux/workqueue.h>	/* We schedule tasks here */
#include <linux/sched.h>	/* We need to put ourselves to sleep
				   and wake up later */
#include <linux/init.h>		/* For __init and __exit */
#include <linux/interrupt.h>	/* For irqreturn_t */

struct proc_dir_entry *Our_Proc_File;
#define PROC_ENTRY_FILENAME "sched"
#define MY_WORK_QUEUE_NAME "WQsched.c"

/*
 * The number of times the timer interrupt has been called so far
 */
static int TimerIntrpt = 0;

static void intrpt_routine(void *);

static int die = 0;		/* set this to 1 for shutdown */

/*
 * The work queue structure for this task, from workqueue.h
 */
static struct workqueue_struct *my_workqueue;

static struct work_struct Task;
static DECLARE_WORK(Task, intrpt_routine, NULL);

/*
 * This function will be called on every timer interrupt. Notice the void*
 * pointer - task functions can be used for more than one purpose, each time
 * getting a different parameter.
 */
static void intrpt_routine(void *irrelevant)
{
	/*
	 * Increment the counter
	 */
	TimerIntrpt++;

	/*
	 * Re-queue ourselves unless cleanup wants us to die
	 */
	if (die == 0)
		queue_delayed_work(my_workqueue, &Task, 100);
}

/*
 * Put data into the proc fs file.
 */
ssize_t
procfile_read(char *buffer,
	      char **buffer_location,
	      off_t offset, int buffer_length, int *eof, void *data)
{
	int len;		/* The number of bytes actually used */

	/*
	 * It's static so it will still be in memory
	 * when we leave this function
	 */
	static char my_buffer[80];

	/*
	 * We give all of our information in one go, so if anybody asks us
	 * if we have more information the answer should always be no.
	 */
	if (offset > 0)
		return 0;

	/*
	 * Fill the buffer and get its length
	 */
	len = sprintf(my_buffer, "Timer called %d times so far\n", TimerIntrpt);

	/*
	 * Tell the function which called us where the buffer is
	 */
	*buffer_location = my_buffer;

	/*
	 * Return the length
	 */
	return len;
}

/*
 * Initialize the module - register the proc file
 */
int __init init_module()
{
	/*
	 * Create our /proc file
	 */
	Our_Proc_File = create_proc_entry(PROC_ENTRY_FILENAME, 0644, NULL);

	if (Our_Proc_File == NULL) {
		remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
		printk(KERN_ALERT "Error: Could not initialize /proc/%s\n",
		       PROC_ENTRY_FILENAME);
		return -ENOMEM;
	}

	Our_Proc_File->read_proc = procfile_read;
	Our_Proc_File->owner = THIS_MODULE;
	Our_Proc_File->mode = S_IFREG | S_IRUGO;
	Our_Proc_File->uid = 0;
	Our_Proc_File->gid = 0;
	Our_Proc_File->size = 80;

	/*
	 * Put the task in the work_timer task queue, so it will be executed at
	 * next timer interrupt
	 */
	my_workqueue = create_workqueue(MY_WORK_QUEUE_NAME);
	queue_delayed_work(my_workqueue, &Task, 100);

	printk(KERN_INFO "/proc/%s created\n", PROC_ENTRY_FILENAME);

	return 0;
}

/*
 * Cleanup
 */
void __exit cleanup_module()
{
	/*
	 * Unregister our /proc file
	 */
	remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
	printk(KERN_INFO "/proc/%s removed\n", PROC_ENTRY_FILENAME);

	die = 1;		/* keep intrpt_routine from queueing itself */
	cancel_delayed_work(&Task);	/* no "new ones" */
	flush_workqueue(my_workqueue);	/* wait till all "old ones" finished */
	destroy_workqueue(my_workqueue);

	/*
	 * Sleep until intrpt_routine is called one last time. This is
	 * necessary, because otherwise we'll deallocate the memory holding
	 * intrpt_routine and Task while work_timer still references them.
	 * Notice that here we don't allow signals to interrupt us.
	 *
	 * Since WaitQ is now not NULL, this automatically tells the interrupt
	 * routine it's time to die.
	 */
}

/*
 * some work_queue related functions
 * are just available to GPL licensed Modules
 */
MODULE_LICENSE("GPL");

Chapter 12. Interrupt Handlers

12.1. Interrupt Handlers

12.1.1. Interrupt Handlers

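Except for the last chapter, everything we have done in the kernel so far has been a response to a process asking for it: dealing with a special file, sending an ioctl(), or issuing a system call. But the kernel's job is not just to respond to process requests. Another job, every bit as important, is to "talk" to the hardware connected to the machine.

There are two types of interaction between the CPU and the rest of the computer's hardware. In the first, the CPU gives orders to the hardware; in the second, the hardware needs to tell the CPU something. The second type is called an interrupt.

Hardware devices typically have a very small amount of RAM, and if the information in it is not read in time, it may be lost for good. Interrupts therefore have to be dealt with when it is convenient for the hardware, not for the CPU.

Under Linux, hardware interrupts are called IRQs (Interrupt Requests)[16]. There are two types of IRQs, short and long. A short IRQ is expected to take a very short time, during which the rest of the machine is blocked and no other interrupts are handled. A long IRQ can take longer, and other interrupts may occur while it is being handled (but not interrupts from the same device). If at all possible, it is better to declare an interrupt handler to be long.

When the CPU receives an interrupt, it stops whatever it is doing (unless it is processing a more important interrupt, in which case the new one is handled only when the more important one is done), saves certain parameters on the stack and calls the interrupt handler. This means that certain things are not allowed in the interrupt handler itself, because the system is in an unknown state. The solution is for the interrupt handler to do what needs to be done immediately, usually reading something from the hardware or sending something to it, to schedule the handling of the new information for a later time (this is called the "bottom half"), and to return at once. The kernel is then guaranteed to call the bottom half as soon as possible, and when it does, everything allowed in kernel modules is allowed.

The way to implement this is to call request_irq() so that your interrupt handler is called when the relevant IRQ is received[17]. request_irq() takes the IRQ number, the name of your handler function, flags, a name for /proc/interrupts and a parameter to pass to the interrupt handler. The number of IRQs a system supports is usually fixed and depends on the hardware. The flags can include SA_SHIRQ, to indicate that you are willing to share the IRQ with other interrupt handlers (usually because several hardware devices sit on the same IRQ), and SA_INTERRUPT, to indicate that this is a fast interrupt. The function only succeeds if there isn't already a handler on this IRQ, or if you are both willing to share.

Then, from within the interrupt handler, we communicate with the hardware and use queue_work() to schedule the bottom half (older kernels used mark_bh(BH_IMMEDIATE) for this).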
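The division of labour between the handler and its bottom half can be pictured as a small queue: the handler only stores the byte it read, and the bottom half drains the queue later. The following user-space sketch illustrates that idea only. The names scancode_fifo, fifo_put, fifo_get and bottom_half_drain and the capacity FIFO_SIZE are assumptions made for this example; a real driver would additionally need proper synchronisation between interrupt context and the deferred work.

```c
#define FIFO_SIZE 16   /* hypothetical capacity; must be a power of two */

struct scancode_fifo {
    unsigned char data[FIFO_SIZE];
    unsigned int head;   /* next slot to write (top half) */
    unsigned int tail;   /* next slot to read (bottom half) */
};

/* Top half: store the byte and return at once.
 * Returns 1 on success, 0 if the FIFO is full. */
int fifo_put(struct scancode_fifo *f, unsigned char code)
{
    if (f->head - f->tail == FIFO_SIZE)
        return 0;
    f->data[f->head++ % FIFO_SIZE] = code;
    return 1;
}

/* Bottom half: fetch one byte.
 * Returns 1 on success, 0 if the FIFO is empty. */
int fifo_get(struct scancode_fifo *f, unsigned char *code)
{
    if (f->head == f->tail)
        return 0;
    *code = f->data[f->tail++ % FIFO_SIZE];
    return 1;
}

/* Drain up to max queued codes into out; returns how many were taken. */
int bottom_half_drain(struct scancode_fifo *f, unsigned char *out, int max)
{
    int n = 0;

    while (n < max && fifo_get(f, &out[n]))
        n++;
    return n;
}
```

One reason to prefer a queue over a single shared variable is that several interrupts may arrive before the bottom half gets to run; with a queue none of their data is overwritten.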

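12.1.2. Keyboards on the Intel Architecture

The rest of this chapter is completely Intel specific. If you are running on another platform, the example code will not work.

I had a problem writing the sample code for this chapter. On one hand, for an example to be useful it has to run on everybody's computer with meaningful results. On the other hand, the kernel already includes device drivers for all of the common devices, and those drivers cannot coexist with the example I am going to write.

The solution I found was to write something for the keyboard interrupt and to disable the regular keyboard interrupt handler first. Since that handler is defined as a static symbol in the kernel source (in drivers/char/keyboard.c), there is no way to restore it. If you value your file system, run sleep 120; reboot on another terminal before you insmod this module.

The example code binds itself to IRQ 1, which is the keyboard IRQ on Intel architectures. When it receives a keyboard interrupt, it reads the keyboard's status (that is the purpose of the inb(0x64)) and the scan code. Then, as soon as the kernel thinks it is feasible, it runs got_char, which reports the key code (the low seven bits of the scan code) and whether the key was pressed (eighth bit, mask 0x80, is 0) or released (eighth bit is 1).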
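The bit layout of a scan code described above can be checked in isolation with a small user-space helper. This is a sketch for illustration: the structure key_event and the function decode_scancode are hypothetical names, and the masks 0x7F and 0x80 are the ones used by got_char in the listing below.

```c
struct key_event {
    unsigned char code;   /* low seven bits of the scan code */
    int released;         /* 1 if the eighth bit (mask 0x80) is set */
};

/* Split a raw scan code into key code and press/release state. */
struct key_event decode_scancode(unsigned char scancode)
{
    struct key_event ev;

    ev.code = scancode & 0x7F;
    ev.released = (scancode & 0x80) ? 1 : 0;
    return ev;
}
```

For example, the values 0x1E and 0x9E decode to the same key code, 0x1E, once as a press and once as a release.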

Example 12-1. intrpt.c

/*
 * intrpt.c - An interrupt handler.
 */

/*
 * The necessary header files
 */

/*
 * Standard in kernel modules
 */
#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */
#include <linux/sched.h>
#include <linux/workqueue.h>
#include <linux/interrupt.h>	/* We want an interrupt */
#include <asm/io.h>

#define MY_WORK_QUEUE_NAME "WQsched.c"

static struct workqueue_struct *my_workqueue;

/*
 * This will get called by the kernel as soon as it's safe
 * to do everything normally allowed by kernel modules.
 */
static void got_char(void *scancode)
{
	printk(KERN_INFO "Scan Code %x %s.\n",
	       (int)*((char *)scancode) & 0x7F,
	       *((char *)scancode) & 0x80 ? "Released" : "Pressed");
}

/*
 * This function services keyboard interrupts. It reads the relevant
 * information from the keyboard and then puts the non time critical
 * part into the work queue. This will be run when the kernel considers it safe.
 */
irqreturn_t irq_handler(int irq, void *dev_id, struct pt_regs *regs)
{
	/*
	 * These variables are static because they need to be
	 * accessible (through pointers) to the bottom half routine.
	 */
	static int initialised = 0;
	static unsigned char scancode;
	static struct work_struct task;
	unsigned char status;

	/*
	 * Read keyboard status
	 */
	status = inb(0x64);
	scancode = inb(0x60);

	if (initialised == 0) {
		INIT_WORK(&task, got_char, &scancode);
		initialised = 1;
	} else {
		PREPARE_WORK(&task, got_char, &scancode);
	}

	queue_work(my_workqueue, &task);

	return IRQ_HANDLED;
}

/*
 * Initialize the module - register the IRQ handler
 */
int init_module()
{
	my_workqueue = create_workqueue(MY_WORK_QUEUE_NAME);

	/*
	 * Since the keyboard handler won't co-exist with another handler,
	 * such as us, we have to disable it (free its IRQ) before we do
	 * anything. Since we don't know where it is, there's no way to
	 * reinstate it later - so the computer will have to be rebooted
	 * when we're done.
	 */
	free_irq(1, NULL);

	/*
	 * Request IRQ 1, the keyboard IRQ, to go to our irq_handler.
	 * SA_SHIRQ means we're willing to have other handlers on this IRQ.
	 * SA_INTERRUPT can be used to make the handler into a fast interrupt.
	 */
	return request_irq(1,	/* The number of the keyboard IRQ on PCs */
			   irq_handler,	/* our handler */
			   SA_SHIRQ, "test_keyboard_irq_handler",
			   (void *)(irq_handler));
}

/*
 * Cleanup
 */
void cleanup_module()
{
	/*
	 * This is only here for completeness. It's totally irrelevant, since
	 * we don't have a way to restore the normal keyboard interrupt so the
	 * computer is completely useless and has to be rebooted.
	 */
	free_irq(1, NULL);
}

/*
 * some work_queue related functions are just available to GPL licensed Modules
 */
MODULE_LICENSE("GPL");

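Chapter 13. Symmetric Multi Processing

13.1. Symmetrical Multi-Processing

One of the easiest and cheapest ways to improve performance is to put more than one CPU on the board. The CPUs can either take on different jobs (asymmetrical multi-processing) or all do the same job in parallel (symmetrical multi-processing, SMP for short). Doing asymmetrical multi-processing effectively requires advance knowledge of the tasks the computer will run, which is not available in a general-purpose operating system such as Linux. SMP, on the other hand, is relatively easy to implement.

By relatively easy, I mean exactly that: not really easy. In an SMP environment the CPUs share the same memory, so code running on one CPU can affect memory that another CPU is using. You can no longer be certain that a variable you set to a certain value on the previous line still has that value; the other CPU might have touched it while you weren't looking. Obviously, it is impossible to program reliably under those conditions.

For ordinary processes this is not an issue, because a process normally runs on only one CPU at a time. The kernel, on the other hand, may be called by different processes running on different CPUs.

In version 2.0.x this isn't a problem, because the entire kernel is protected by one big spinlock. If one CPU is in the kernel and another CPU wants to get in, it has to wait until the first CPU is done. This is safe, but inefficient. In version 2.2.x several CPUs can be in the kernel at the same time, and module writers need to be aware of that.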
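The idea of a single lock that serialises entry can be sketched with C11 atomics in user space. To be clear about the assumptions: this is not the kernel's spinlock implementation or API, and the names big_lock, big_lock_try, big_lock_acquire and big_lock_release are made up for this illustration. The sketch only shows the mechanism: whoever sets the flag first gets in, and everybody else spins until the flag is cleared.

```c
#include <stdatomic.h>

struct big_lock {
    atomic_flag held;   /* set while somebody is "inside" */
};

#define BIG_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once. Returns 1 if the lock was taken, 0 if it is already held. */
int big_lock_try(struct big_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Spin (busy-wait) until the lock becomes free, then take it. */
void big_lock_acquire(struct big_lock *l)
{
    while (!big_lock_try(l))
        ;   /* another CPU is inside; keep trying */
}

/* Leave the protected region and let the next waiter in. */
void big_lock_release(struct big_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

With one such lock around the whole kernel, correctness is easy to reason about but only one CPU makes progress in kernel mode at a time. That is the trade-off behind the move from one big lock to finer-grained locking described above.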

Chapter 14. Common Pitfalls

14.1. Common Pitfalls

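Before you go out into the world and write real kernel modules, there are a few things I need to warn you about.

If a problem occurs because I failed to warn you about it, please report it to me, and I will refund in full whatever I received from your purchase of this book.

Using standard libraries

You can't do that. A kernel module can only use kernel functions, which are the functions listed in /proc/kallsyms.

Disabling interrupts

Doing this for a short time is fine, but if you don't enable them again afterwards, your only option is to power the machine off.

Sticking your head in a tiger's mouth

I probably don't have to warn you about this, but I figured I would anyway, just in case.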
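For the first pitfall above, it can be handy to check from user space whether a function is listed in /proc/kallsyms before relying on it in a module. The sketch below assumes the usual "address type name" layout of each line. The names ksym, parse_kallsyms_line and kernel_has_symbol are hypothetical helpers written for this example, not part of any library. Note that a symbol being listed does not by itself mean it is exported to modules.

```c
#include <stdio.h>
#include <string.h>

struct ksym {
    unsigned long addr;   /* symbol address (hexadecimal in the file) */
    char type;            /* symbol type letter, e.g. 'T' or 't' */
    char name[64];        /* symbol name, truncated to 63 characters */
};

/* Parse one "address type name" line.
 * Returns 1 on success, 0 if the line does not match. */
int parse_kallsyms_line(const char *line, struct ksym *out)
{
    return sscanf(line, "%lx %c %63s",
                  &out->addr, &out->type, out->name) == 3;
}

/* Scan the file at path for symbol.
 * Returns 1 if found, 0 if not, -1 if the file cannot be opened. */
int kernel_has_symbol(const char *path, const char *symbol)
{
    char line[256];
    struct ksym s;
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    while (!found && fgets(line, sizeof line, fp))
        if (parse_kallsyms_line(line, &s) && strcmp(s.name, symbol) == 0)
            found = 1;
    fclose(fp);
    return found;
}
```

A typical call would be kernel_has_symbol("/proc/kallsyms", "printk"); the path is passed as a parameter so that the parser can also be tried on a saved copy of the file.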

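Appendix A. Changes: 2.4 To 2.6

A.1. Changes between 2.4 and 2.6

A.1.1. Changes between 2.4 and 2.6

I am not sure that all of the changes have been documented. Some hints for porting can be found by comparing this version of the guide with its counterpart for kernel 2.4. Apart from that, anybody who needs to port drivers from 2.4 to 2.6 kernels might want to visit http://lwn.net/Articles/driver-porting/ .

If you still can't find what you need there, try to find a driver that is similar to yours and supports both 2.4 and 2.6 kernels. Comparing the two versions with a file comparison tool such as xxdiff or meld will help a great deal. Also check whether your driver is covered by the documentation in linux/Documentation/. Finally, before starting the port, find an appropriate mailing list and ask there; people can often give you pointers.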

Appendix B. Where To Go From Here

B.1. Where From Here?

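I could easily have squeezed a few more chapters into this guide: one about creating new file systems, or about adding new protocol stacks, for example. I could also have added explanations of kernel mechanisms, such as bootstrapping or the disk interface.

I chose not to. My purpose in writing this guide was to provide an initiation into the mysteries of kernel module programming and to teach the common techniques for that purpose. For readers seriously interested in kernel programming, I recommend the series of articles by Juan-Mariano de Goyeneche. Also, as Linus said, the best way to learn the kernel is to read the source code yourself.

I hope I have helped you in your quest to become a better programmer, or at least to have fun through technology. And if you do write useful kernel modules, I hope you publish them under the GPL, so I can use them too.

If you would like to contribute to this guide, please contact the maintainers for details. As you have already seen, there is still an unfinished chapter, waiting to be filled with examples for sysfs.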
