Introduction to CPUFreq

CPUFreq is the Linux subsystem for run-time CPU voltage and frequency scaling, a technique also known as DVFS (Dynamic Voltage and Frequency Scaling).

Why CPUFreq is needed

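As technology advances, CPU frequencies keep rising, performance keeps improving and chip manufacturing processes keep getting more advanced. But higher performance also brings more heat, and mobile and embedded devices do not need full performance all the time. A mechanism is therefore needed that adjusts frequency and voltage dynamically, to balance performance against power consumption.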

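CPUFreq software framework

Like most Linux subsystems, CPUFreq separates mechanism from policy. It consists of three modules:

  • cpufreq core: wraps and abstracts the cpufreq governors and cpufreq drivers and defines clean interfaces between them, which is how the design separates mechanism from policy.

  • cpufreq drivers: sit below the cpufreq core and set the frequency of the actual CPU hardware; the CPU frequency is changed through the cpufreq driver. A cpufreq driver uses struct cpufreq_driver from the standard Linux cpufreq subsystem to register and implement CPU frequency scaling.

  • cpufreq governor: sits above the cpufreq core and decides when to raise or lower the CPU frequency, choosing the target frequency from the system state and load. A cpufreq governor uses struct cpufreq_governor from the Linux cpufreq subsystem to register and implement a scaling policy.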

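How CPUFreq works

Linux cpufreq works by registering a cpufreq driver and a cpufreq governor with the system. The governor implements the scaling policy and the driver performs the actual operation; together they scale frequency and voltage dynamically. Frequency and voltage are coupled, because a higher frequency needs a higher supply voltage: when scaling up, the voltage is raised before the frequency, and when scaling down, the frequency is lowered before the voltage. The sunxi target callback analyzed later follows exactly this order.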

CPUFreq sysfs user-space interface

The cpufreq nodes live in the /sys/devices/system/cpu/cpu0/cpufreq directory:

$ cd /sys/devices/system/cpu/cpu0/cpufreq

It contains the following nodes:

shell@tiny4412:/sys/devices/system/cpu/cpu0/cpufreq # ls
affected_cpus
cpuinfo_cur_freq
cpuinfo_max_freq
cpuinfo_min_freq
cpuinfo_transition_latency
related_cpus
scaling_available_governors
scaling_cur_freq
scaling_driver
scaling_governor
scaling_max_freq
scaling_min_freq
scaling_setspeed
stats
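Each node above is a small text file, so user space can query cpufreq with ordinary file I/O. The C sketch below is illustrative only: the helper names read_cpufreq_uint() and read_cpufreq_str() are hypothetical, and it assumes the cpu0 path from the listing above (the *_freq nodes report values in kHz).

```c
#include <stdio.h>
#include <string.h>

/* Directory from the listing above; it may differ or be absent on other systems. */
#define CPUFREQ_DIR "/sys/devices/system/cpu/cpu0/cpufreq/"

/* Read an unsigned integer (e.g. a frequency in kHz) from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_cpufreq_uint(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Read a one-line string node (e.g. scaling_governor) and strip the newline.
 * Returns 0 on success, -1 on failure. */
int read_cpufreq_str(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For example, read_cpufreq_uint(CPUFREQ_DIR "scaling_cur_freq", &khz) would read the current frequency; on a machine without cpufreq support the file is missing and the helper returns -1.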

The meaning of each node is listed in the table below:

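CPUFreq implementation analysis

CPUFreq core layer

The CPUFreq subsystem gathers its common logic into the CPUFreq core module. These common pieces provide the APIs that CPUFreq governors and drivers, as well as other kernel modules, need to form a complete CPUFreq subsystem. This section analyzes the implementation and usage of some important core-layer APIs.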

Code location:

drivers/cpufreq/cpufreq.c

CPUFreq subsystem initialization

static int __init cpufreq_core_init(void)
{
    int cpu;

    if (cpufreq_disabled())
        return -ENODEV;

    for_each_possible_cpu(cpu) {
        per_cpu(cpufreq_policy_cpu, cpu) = -1;
        init_rwsem(&per_cpu(cpu_policy_rwsem, cpu));
    }

    cpufreq_global_kobject = kobject_create_and_add("cpufreq", &cpu_subsys.dev_root->kobj);
    BUG_ON(!cpufreq_global_kobject);

#if defined(CONFIG_ARCH_SUNXI) && defined(CONFIG_HOTPLUG_CPU)
    /* register reboot notifier for process cpus when reboot */
    register_reboot_notifier(&reboot_notifier);
#endif

    return 0;
}
core_initcall(cpufreq_core_init);

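So the core part of the CPUFreq subsystem is initialized during boot through the initcall mechanism (core_initcall). cpufreq_policy_cpu is a per_cpu variable: on an SMP system each CPU can have its own policy or share one with other CPUs. kobject_create_and_add() creates a cpufreq directory under cpu_subsys.dev_root, i.e. /sys/devices/system/cpu/cpufreq. This is the global cpufreq directory, separate from the per-CPU cpu0/cpufreq directory shown earlier, and it is later used to hold other parameters such as governor tunables.
The cpu_subsys argument is a kernel global variable initialized earlier in boot, in drivers/base/cpu.c: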

struct bus_type cpu_subsys = {
    .name     = "cpu",
    .dev_name = "cpu",
};
EXPORT_SYMBOL_GPL(cpu_subsys);

void __init cpu_dev_init(void)
{
    if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
        panic("Failed to register CPU subsystem");

    cpu_dev_register_generic();
}

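This creates a cpu bus to which every CPU in the system is attached. The root directory of the cpu bus devices is /sys/devices/system/cpu, and a cpu bus node also appears under /sys/bus. Below that root directory appear the cpu0, cpu1, ..., cpuX nodes, one device node per CPU. The CPUFreq subsystem uses cpu_subsys to find the CPU devices in the system and creates the corresponding cpufreq objects under them, as discussed later.
So the initialization of the cpufreq subsystem itself does little of importance: it only initializes a few per_cpu variables and creates a cpufreq sysfs node. The figure below shows the sequence diagram of the initialization process: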

Registering a cpufreq_governor

Several governors can exist in the system at the same time, and a policy is associated with one of them through the governor pointer in struct cpufreq_policy. Before a policy can use a governor, the governor must be registered with the cpufreq core through this core-layer API:

int cpufreq_register_governor(struct cpufreq_governor *governor)
{
    int err;

    if (!governor)
        return -EINVAL;

    if (cpufreq_disabled())
        return -ENODEV;

    mutex_lock(&cpufreq_governor_mutex);

    governor->initialized = 0;
    err = -EBUSY;
    if (__find_governor(governor->name) == NULL) {
        err = 0;
        list_add(&governor->governor_list, &cpufreq_governor_list);
    }

    mutex_unlock(&cpufreq_governor_mutex);
    return err;
}

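The core layer defines a global list, cpufreq_governor_list. The registration function first uses __find_governor() to check, by name, whether the governor has already been registered. If it has not, the structure representing the governor is added to cpufreq_governor_list; otherwise the function returns -EBUSY.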
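To make the registration logic concrete, here is a user-space model of it in C. It is a sketch, not kernel code: struct model_governor and the model_* functions are hypothetical, the mutex is omitted, and names are compared with strcmp() for simplicity.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a governor: only the fields needed here. */
struct model_governor {
    const char *name;
    int initialized;
    struct model_governor *next;      /* stands in for governor_list */
};

/* Head of the registry; plays the role of cpufreq_governor_list. */
static struct model_governor *model_governor_list;

/* Same idea as __find_governor(): look a governor up by name. */
struct model_governor *model_find_governor(const char *name)
{
    struct model_governor *g;

    for (g = model_governor_list; g; g = g->next)
        if (strcmp(g->name, name) == 0)
            return g;
    return NULL;
}

/* Mirrors the checks in cpufreq_register_governor(), without locking. */
int model_register_governor(struct model_governor *gov)
{
    if (!gov)
        return -EINVAL;
    gov->initialized = 0;
    if (model_find_governor(gov->name))
        return -EBUSY;                /* a governor with this name exists */
    gov->next = model_governor_list;  /* insert at the head, like list_add() */
    model_governor_list = gov;
    return 0;
}
```

Registering a second governor under an existing name fails with -EBUSY, which is why a policy can always find a governor unambiguously by its name.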

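Registering the cpufreq_driver

Unlike governors, only one cpufreq_driver exists in the system. The cpufreq_driver is platform-specific and carries out the actual frequency change, while the governor decides which frequency to use. A single cpufreq_driver is therefore enough: it only needs to know how to control the platform's clock system in order to set the frequency chosen by the governor. The core provides cpufreq_register_driver() for this registration.
Let's walk through what this function does: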

int cpufreq_register_driver(struct cpufreq_driver *driver_data)
{
    unsigned long flags;
    int ret;

    if (cpufreq_disabled())
        return -ENODEV;

    /*
     * verify and init are mandatory; at least one of
     * setpolicy and target must be implemented.
     */
    if (!driver_data || !driver_data->verify || !driver_data->init ||
        ((!driver_data->setpolicy) && (!driver_data->target)))
        return -EINVAL;

    pr_debug("trying to register driver %s\n", driver_data->name);

    if (driver_data->setpolicy)
        driver_data->flags |= CPUFREQ_CONST_LOOPS;

    write_lock_irqsave(&cpufreq_driver_lock, flags);
    /*
     * If the global cpufreq_driver is already set, fail; otherwise store
     * driver_data in it. This guarantees that only one cpufreq_driver is
     * ever registered in the system.
     */
    if (cpufreq_driver) {
        write_unlock_irqrestore(&cpufreq_driver_lock, flags);
        return -EBUSY;
    }
    cpufreq_driver = driver_data;
    write_unlock_irqrestore(&cpufreq_driver_lock, flags);

    /* subsys_interface_register() sets up a cpufreq_policy for each CPU */
    ret = subsys_interface_register(&cpufreq_interface);
    if (ret)
        goto err_null_driver;

    if (!(cpufreq_driver->flags & CPUFREQ_STICKY)) {
        int i;
        ret = -ENODEV;

        /* check for at least one working CPU */
        for (i = 0; i < nr_cpu_ids; i++)
            if (cpu_possible(i) && per_cpu(cpufreq_cpu_data, i)) {
                ret = 0;
                break;
            }

        /* if all ->init() calls failed, unregister */
        if (ret) {
            pr_debug("no CPU initialized for driver %s\n",
                     driver_data->name);
            goto err_if_unreg;
        }
    }

    /*
     * Register a CPU hotplug notifier so that the relationships between
     * CPU policies can be handled dynamically on hotplug (for example,
     * migrating the CPUs that a policy manages).
     */
    register_hotcpu_notifier(&cpufreq_cpu_notifier);
    pr_debug("driver %s up and running\n", driver_data->name);

    return 0;
err_if_unreg:
    subsys_interface_unregister(&cpufreq_interface);
err_null_driver:
    write_lock_irqsave(&cpufreq_driver_lock, flags);
    cpufreq_driver = NULL;
    write_unlock_irqrestore(&cpufreq_driver_lock, flags);
    return ret;
}
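The two checks at the top of cpufreq_register_driver() (mandatory callbacks, and a single global driver) can be modelled in a few lines of user-space C. struct model_driver, the model_* functions and the demo_* callbacks are hypothetical stand-ins; the per-CPU policy setup and the hotplug notifier are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct cpufreq_driver. */
struct model_driver {
    const char *name;
    int (*init)(void);
    int (*verify)(void);
    int (*setpolicy)(void);
    int (*target)(unsigned int freq_khz);
};

/* Plays the role of the global cpufreq_driver pointer. */
static struct model_driver *model_cpufreq_driver;

/* Reproduces the early checks of cpufreq_register_driver(). */
int model_register_driver(struct model_driver *drv)
{
    /* verify and init are mandatory; setpolicy or target must exist */
    if (!drv || !drv->verify || !drv->init ||
        (!drv->setpolicy && !drv->target))
        return -EINVAL;
    /* only one driver may be registered at a time */
    if (model_cpufreq_driver)
        return -EBUSY;
    model_cpufreq_driver = drv;
    return 0;
}

void model_unregister_driver(void)
{
    model_cpufreq_driver = NULL;
}

/* Example callbacks for a fake platform, used to exercise the checks. */
int demo_init(void) { return 0; }
int demo_verify(void) { return 0; }
int demo_target(unsigned int freq_khz) { (void)freq_khz; return 0; }
```

A driver that provides target but no verify is rejected with -EINVAL, and a second valid driver is rejected with -EBUSY until the first one is unregistered.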

The cpufreq_interface structure is:

static struct subsys_interface cpufreq_interface = {
    .name       = "cpufreq",
    .subsys     = &cpu_subsys,
    .add_dev    = cpufreq_add_dev,
    .remove_dev = cpufreq_remove_dev,
};

subsys_interface_register() walks every device in the subsystem and, with that device as the argument, calls the add_dev callback of cpufreq_interface; here add_dev points to cpufreq_add_dev().
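The callback pattern behind subsys_interface_register() can be sketched as a loop over the bus devices. This is a hypothetical user-space model (model_device, model_subsys_interface and demo_add_dev are invented names); it returns the number of successful add_dev() calls, which makes the effect easy to check.

```c
/* Hypothetical stand-ins: a "device" is just a CPU number here. */
struct model_device {
    unsigned int id;
};

struct model_subsys_interface {
    const char *name;
    int (*add_dev)(struct model_device *dev);
};

/* Call add_dev() once for every device on the bus, as the registration
 * walk does, and count how many calls succeeded. */
int model_interface_register(struct model_subsys_interface *sif,
                             struct model_device *devs, int ndevs)
{
    int i, ok = 0;

    for (i = 0; i < ndevs; i++)
        if (sif->add_dev && sif->add_dev(&devs[i]) == 0)
            ok++;
    return ok;
}

/* Example add_dev: "creates a policy" for CPUs 0..7, fails for others. */
int demo_policy_created[8];

int demo_add_dev(struct model_device *dev)
{
    if (dev->id >= 8)
        return -1;
    demo_policy_created[dev->id] = 1;
    return 0;
}
```

In the real flow, cpufreq_add_dev() plays the role of demo_add_dev(): it is what builds the cpufreq_policy for each CPU device found on the cpu bus.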

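The figure below shows the sequence diagram of cpufreq_driver registration:

Finally, __cpufreq_set_policy() puts the policy into effect. At this point every CPU's policy has been set up and is in operation.

The sequence diagram of __cpufreq_set_policy() is shown below:

Other APIs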

int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
int cpufreq_unregister_notifier(struct notifier_block *nb, unsigned int list);

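These two APIs register and unregister cpufreq notifications. The second argument selects the notification type, which can be one of:

  • CPUFREQ_TRANSITION_NOTIFIER: notified of frequency changes
  • CPUFREQ_POLICY_NOTIFIER: notified of policy updates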
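A transition notifier is essentially a list of callbacks invoked before and after each frequency change. The sketch below models that in user-space C under stated assumptions: the model_* types and the demo_listener() callback are hypothetical, priorities are ignored, and the real kernel API instead uses struct notifier_block with a notifier_call(nb, unsigned long val, void *data) callback.

```c
#include <stddef.h>

/* Hypothetical model of a transition notifier chain. */
enum model_event { MODEL_PRECHANGE, MODEL_POSTCHANGE };

struct model_freqs {
    unsigned int cpu, old_khz, new_khz;
};

struct model_notifier {
    void (*call)(struct model_notifier *nb, enum model_event ev,
                 const struct model_freqs *f);
    struct model_notifier *next;
};

static struct model_notifier *model_transition_chain;

void model_register_notifier(struct model_notifier *nb)
{
    nb->next = model_transition_chain;
    model_transition_chain = nb;
}

void model_unregister_notifier(struct model_notifier *nb)
{
    struct model_notifier **p;

    for (p = &model_transition_chain; *p; p = &(*p)->next)
        if (*p == nb) {
            *p = nb->next;
            return;
        }
}

/* Hand one event to every registered listener. */
void model_notify(enum model_event ev, const struct model_freqs *f)
{
    struct model_notifier *nb;

    for (nb = model_transition_chain; nb; nb = nb->next)
        nb->call(nb, ev, f);
}

/* Example listener: remembers the last completed frequency. */
unsigned int demo_last_khz;

void demo_listener(struct model_notifier *nb, enum model_event ev,
                   const struct model_freqs *f)
{
    (void)nb;
    if (ev == MODEL_POSTCHANGE)
        demo_last_khz = f->new_khz;
}
```

A driver's target callback would emit PRECHANGE before touching the hardware and POSTCHANGE afterwards, which is the pattern the sunxi driver below follows.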

cpufreq_driver_target() sets the target frequency and ultimately calls the driver's target callback. It takes the policy lock and delegates to __cpufreq_driver_target(), shown below:

int __cpufreq_driver_target(struct cpufreq_policy *policy,
                            unsigned int target_freq,
                            unsigned int relation)
{
    int retval = -EINVAL;
    unsigned int old_target_freq = target_freq;

    if (cpufreq_disabled())
        return -ENODEV;

    /* Make sure that target_freq is within supported range */
    if (target_freq > policy->max)
        target_freq = policy->max;
    if (target_freq < policy->min)
        target_freq = policy->min;

    pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
             policy->cpu, target_freq, relation, old_target_freq);

    if (target_freq == policy->cur)
        return 0;

    if (cpufreq_driver->target)
        retval = cpufreq_driver->target(policy, target_freq, relation);

    return retval;
}
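The clamping logic above is easy to reproduce in isolation. In this hypothetical user-space model, model_driver_target() returns the frequency that would be passed to the driver's target() callback, or 0 when the call is skipped because the clamped value equals policy->cur; the fake driver is assumed to always succeed.

```c
/* Hypothetical model of the policy fields used by __cpufreq_driver_target(). */
struct model_policy {
    unsigned int min, max, cur;       /* kHz */
};

/* Counts how often the fake driver's target() would be invoked. */
int demo_target_calls;

/* Clamp the request into [min, max] and skip the driver if nothing changes. */
unsigned int model_driver_target(struct model_policy *p, unsigned int target_khz)
{
    if (target_khz > p->max)
        target_khz = p->max;
    if (target_khz < p->min)
        target_khz = p->min;
    if (target_khz == p->cur)
        return 0;                     /* already there: no driver call */
    demo_target_calls++;
    /* Pretend the driver succeeded; the real core updates policy->cur
     * from the POSTCHANGE transition notification instead. */
    p->cur = target_khz;
    return target_khz;
}
```

So a governor may ask for any frequency; the core guarantees that the driver only ever sees values within the current policy limits.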

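CPUFreq driver layer

What a driver engineer usually has to implement is the cpufreq driver, since that is where CPU-specific differences live. The cpufreq driver handles platform-specific control of CPU frequency and voltage. It is a very simple module within the cpufreq framework: define a struct cpufreq_driver variable, fill in the required fields, implement the callbacks according to the platform's characteristics, and register it with the system.
struct cpufreq_driver looks like this: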

struct cpufreq_driver {
    struct module   *owner;                  // usually THIS_MODULE
    char            name[CPUFREQ_NAME_LEN];  // driver name, e.g. "cpufreq-sunxi"
    // flags such as CPUFREQ_STICKY: keep the driver registered
    // even if every init() call fails
    u8              flags;
    bool            have_governor_per_policy;

    /* needed by all drivers */
    // mandatory; run by the cpufreq core after a CPU device is added
    int (*init)     (struct cpufreq_policy *policy);
    // mandatory; when upper layers want to set a new policy, the core
    // calls verify to check that the policy is valid
    int (*verify)   (struct cpufreq_policy *policy);

    /* define one out of two */
    int (*setpolicy)    (struct cpufreq_policy *policy);  // usually not implemented
    int (*target)   (struct cpufreq_policy *policy,       // performs the actual frequency change
                     unsigned int target_freq,
                     unsigned int relation);

    /* should be defined, if possible */
    unsigned int    (*get)  (unsigned int cpu);           // returns the frequency of the given CPU

    /* optional */
    unsigned int (*getavg)  (struct cpufreq_policy *policy,
                             unsigned int cpu);
    int (*bios_limit)   (int cpu, unsigned int *limit);

    int (*exit)     (struct cpufreq_policy *policy);
    int (*suspend)  (struct cpufreq_policy *policy);
    int (*resume)   (struct cpufreq_policy *policy);
    struct freq_attr    **attr;
};

The following example fills in the required members of struct cpufreq_driver and implements them.

static struct cpufreq_driver sunxi_cpufreq_driver = {
    .name   = "cpufreq-sunxi",
    .flags  = CPUFREQ_STICKY,
    .init   = sunxi_cpufreq_init,
    .verify = sunxi_cpufreq_verify,
    .target = sunxi_cpufreq_target,
    .get    = sunxi_cpufreq_get,
    .attr   = sunxi_cpufreq_attr,
};

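First, the initialization function. sunxi_cpufreq_initcall() (the module's initcall, distinct from the .init callback sunxi_cpufreq_init) reads the frequency table and transition latency from the device tree, acquires the CPU clocks and regulator, and configures the maximum and minimum frequencies.
The device tree configuration is: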

cpu@0 {
    device_type = "cpu";
    compatible = "arm,cortex-a53", "arm,armv8";
    reg = <0x0 0x0>;
    enable-method = "psci";
    cpufreq_tbl = <480000 648000 720000 816000 912000
                   1008000 1104000 1152000 1200000>;
    clock-latency = <2000000>;
    clock-frequency = <1008000000>;
    cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0 &SYS_SLEEP_0>;
};

The initialization function:

static int __init sunxi_cpufreq_initcall(void)
{
    struct device_node *np;
    const struct property *prop;
    struct cpufreq_frequency_table *freq_tbl;
    const __be32 *val;
    int ret, cnt, i;

    np = of_find_node_by_path("/cpus/cpu@0");
    if (!np) {
        CPUFREQ_ERR("No cpu node found\n");
        return -ENODEV;
    }

    if (of_property_read_u32(np, "clock-latency",
                             &sunxi_cpufreq.transition_latency))
        sunxi_cpufreq.transition_latency = CPUFREQ_ETERNAL;

    prop = of_find_property(np, "cpufreq_tbl", NULL);
    if (!prop || !prop->value) {
        CPUFREQ_ERR("Invalid cpufreq_tbl\n");
        ret = -ENODEV;
        goto out_put_node;
    }

    cnt = prop->length / sizeof(u32);
    val = prop->value;

    freq_tbl = kmalloc(sizeof(*freq_tbl) * (cnt + 1), GFP_KERNEL);
    if (!freq_tbl) {
        ret = -ENOMEM;
        goto out_put_node;
    }

    for (i = 0; i < cnt; i++) {
        freq_tbl[i].index = i;
        freq_tbl[i].frequency = be32_to_cpup(val++);
    }

    freq_tbl[i].index = i;
    freq_tbl[i].frequency = CPUFREQ_TABLE_END;

    sunxi_cpufreq.freq_table = freq_tbl;

#ifdef CONFIG_DEBUG_FS
    sunxi_cpufreq.cpufreq_set_us = 0;
    sunxi_cpufreq.cpufreq_get_us = 0;
#endif

    sunxi_cpufreq.last_freq = ~0;

    sunxi_cpufreq.clk_pll = clk_get(NULL, PLL_CPU_CLK);
    if (IS_ERR_OR_NULL(sunxi_cpufreq.clk_pll)) {
        CPUFREQ_ERR("Unable to get PLL CPU clock\n");
        ret = PTR_ERR(sunxi_cpufreq.clk_pll);
        goto out_err_clk_pll;
    }

    sunxi_cpufreq.clk_cpu = clk_get(NULL, CPU_CLK);
    if (IS_ERR_OR_NULL(sunxi_cpufreq.clk_cpu)) {
        CPUFREQ_ERR("Unable to get CPU clock\n");
        ret = PTR_ERR(sunxi_cpufreq.clk_cpu);
        goto out_err_clk_cpu;
    }

    sunxi_cpufreq.vdd_cpu = regulator_get(NULL, CPU_VDD);
    if (IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu)) {
        CPUFREQ_ERR("Unable to get CPU regulator\n");
        ret = PTR_ERR(sunxi_cpufreq.vdd_cpu);
        /* do not return error even if error*/
    }

    /* init cpu frequency from dt */
    ret = __init_freq_dt();
    if (ret == -ENODEV
#ifdef CONFIG_CPU_VOLTAGE_SCALING
        || ret == -EINVAL
#endif
        )
        goto out_err_dt;

    pr_debug("[cpufreq] max: %uMHz, min: %uMHz, ext: %uMHz, boot: %uMHz\n",
             sunxi_cpufreq.max_freq / 1000, sunxi_cpufreq.min_freq / 1000,
             sunxi_cpufreq.ext_freq / 1000, sunxi_cpufreq.boot_freq / 1000);

#ifdef CONFIG_CPU_VOLTAGE_SCALING
    __vftable_show();
    sunxi_cpufreq.last_vdd = sunxi_cpufreq_getvolt();
#endif

    mutex_init(&sunxi_cpufreq.lock);

    ret = cpufreq_register_driver(&sunxi_cpufreq_driver);
    if (ret) {
        CPUFREQ_ERR("failed register driver\n");
        goto out_err_register;
    } else {
        goto out_put_node;
    }

out_err_register:
    mutex_destroy(&sunxi_cpufreq.lock);
out_err_dt:
    if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu)) {
        regulator_put(sunxi_cpufreq.vdd_cpu);
    }
    clk_put(sunxi_cpufreq.clk_cpu);
out_err_clk_cpu:
    clk_put(sunxi_cpufreq.clk_pll);
out_err_clk_pll:
    kfree(freq_tbl);
out_put_node:
    of_node_put(np);
    return ret;
}

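As shown above, the initialization function mainly collects resources from the device tree, configures the maximum and minimum frequencies, and then registers the cpufreq driver.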
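The table-building loop in sunxi_cpufreq_initcall() converts big-endian device tree cells into a terminated frequency table. A user-space sketch of the same steps, assuming a hypothetical struct model_freq_entry and a MODEL_TABLE_END sentinel standing in for CPUFREQ_TABLE_END:

```c
#include <stdint.h>
#include <stdlib.h>

#define MODEL_TABLE_END (~1u)         /* hypothetical end-of-table sentinel */

struct model_freq_entry {
    unsigned int index;
    unsigned int frequency;           /* kHz */
};

/* Decode one big-endian 32-bit cell, as be32_to_cpup() does. */
static uint32_t model_be32_to_cpu(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Build cnt entries plus an end marker from len bytes of property data.
 * Returns NULL on allocation failure; the caller frees the table. */
struct model_freq_entry *model_build_table(const unsigned char *raw, size_t len)
{
    size_t cnt = len / 4, i;
    struct model_freq_entry *tbl = malloc(sizeof(*tbl) * (cnt + 1));

    if (!tbl)
        return NULL;
    for (i = 0; i < cnt; i++) {
        tbl[i].index = (unsigned int)i;
        tbl[i].frequency = model_be32_to_cpu(raw + 4 * i);
    }
    /* one extra slot marks the end, like freq_tbl[cnt] in the driver */
    tbl[i].index = (unsigned int)i;
    tbl[i].frequency = MODEL_TABLE_END;
    return tbl;
}
```

The extra slot is why the driver allocates cnt + 1 entries: code that walks the table stops at the sentinel instead of needing a separate length.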

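Next, sunxi_cpufreq_verify() simply delegates to cpufreq_frequency_table_verify(), which makes sure that at least one valid frequency lies between policy->min and policy->max and that all other constraints are met.

static int sunxi_cpufreq_verify(struct cpufreq_policy *policy)
{
    return cpufreq_frequency_table_verify(policy, sunxi_cpufreq.freq_table);
}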
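The guarantee described above can be sketched as follows. This is a simplified, hypothetical model (model_table_verify() and MODEL_END are invented names) of one way to enforce it: when no table entry lies within [min, max], max is raised to the next larger table frequency. Clamping to the hardware limits in policy->cpuinfo is left out.

```c
#define MODEL_END (~1u)               /* hypothetical end-of-table sentinel */

/* Count the table entries inside [*min, *max]. If there are none, raise
 * *max to the smallest table frequency above it (when one exists).
 * Returns the number of entries that were inside the original range. */
unsigned int model_table_verify(const unsigned int *tbl,
                                unsigned int *min, unsigned int *max)
{
    unsigned int i, count = 0, next_larger = ~0u;

    for (i = 0; tbl[i] != MODEL_END; i++) {
        if (tbl[i] >= *min && tbl[i] <= *max)
            count++;
        else if (tbl[i] > *max && tbl[i] < next_larger)
            next_larger = tbl[i];
    }
    if (!count && next_larger != ~0u)
        *max = next_larger;           /* widen the range to reach one entry */
    return count;
}
```

With a table of 480000, 648000 and 1008000 kHz, a requested range of 500000 to 600000 kHz contains no entry, so max is widened to 648000 kHz.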

The get callback returns the current CPU frequency in kHz.

static unsigned int sunxi_cpufreq_get(unsigned int cpu)
{
    unsigned int current_freq = 0;
#ifdef CONFIG_DEBUG_FS
    ktime_t calltime = ktime_get();
#endif

    clk_get_rate(sunxi_cpufreq.clk_pll);
    current_freq = clk_get_rate(sunxi_cpufreq.clk_cpu) / 1000;

#ifdef CONFIG_DEBUG_FS
    sunxi_cpufreq.cpufreq_get_us = ktime_to_us(ktime_sub(ktime_get(), calltime));
#endif

    return current_freq;
}

The target callback is what actually changes the frequency and voltage.

static int sunxi_cpufreq_target(struct cpufreq_policy *policy,
                                __u32 freq, __u32 relation)
{
    int ret = 0;
    unsigned int            index;
    struct cpufreq_freqs    freqs;
#ifdef CONFIG_DEBUG_FS
    ktime_t calltime;
#endif
#ifdef CONFIG_SMP
    int i;
#endif
#ifdef CONFIG_CPU_VOLTAGE_SCALING
    unsigned int new_vdd;
#endif

    mutex_lock(&sunxi_cpufreq.lock);

    /* avoid repeated calls which cause a needless amount of duplicated
     * logging output (and CPU time as the calculation process is
     * done) */
    if (freq == sunxi_cpufreq.last_freq)
        goto out;

    CPUFREQ_DBG(DEBUG_FREQ, "request frequency is %uKHz\n", freq);

    if (unlikely(sunxi_boot_lock))
        freq = freq > sunxi_cpufreq.boot_freq ? sunxi_cpufreq.boot_freq : freq;

    /* try to look for a valid frequency value from cpu frequency table */
    if (cpufreq_frequency_table_target(policy, sunxi_cpufreq.freq_table,
                                       freq, relation, &index)) {
        CPUFREQ_ERR("try to look for %uKHz failed!\n", freq);
        ret = -EINVAL;
        goto out;
    }

    /* frequency is same as the value last set, need not adjust */
    if (sunxi_cpufreq.freq_table[index].frequency == sunxi_cpufreq.last_freq)
        goto out;

    freq = sunxi_cpufreq.freq_table[index].frequency;
    CPUFREQ_DBG(DEBUG_FREQ, "target is find: %uKHz, entry %u\n", freq, index);

    /* notify that cpu clock will be adjust if needed */
    if (policy) {
        freqs.cpu = policy->cpu;
        freqs.old = sunxi_cpufreq.last_freq;
        freqs.new = freq;

#ifdef CONFIG_SMP
        /* notifiers */
        for_each_cpu(i, policy->cpus) {
            freqs.cpu = i;
            cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
        }
#else
        cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
#endif
    }

#ifdef CONFIG_CPU_VOLTAGE_SCALING
    /* get vdd value for new frequency */
    new_vdd = __get_vdd_value(freq * 1000);
    CPUFREQ_DBG(DEBUG_FREQ, "set cpu vdd to %dmv\n", new_vdd);

    /* raising the frequency: raise the voltage first */
    if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu) && (new_vdd > sunxi_cpufreq.last_vdd)) {
        CPUFREQ_DBG(DEBUG_FREQ, "set cpu vdd to %dmv\n", new_vdd);
        if (regulator_set_voltage(sunxi_cpufreq.vdd_cpu, new_vdd*1000, new_vdd*1000)) {
            CPUFREQ_ERR("try to set cpu vdd failed!\n");

            /* notify everyone that clock transition finish */
            if (policy) {
                freqs.cpu = policy->cpu;
                freqs.old = freqs.new;
                freqs.new = sunxi_cpufreq.last_freq;
#ifdef CONFIG_SMP
                /* notifiers */
                for_each_cpu(i, policy->cpus) {
                    freqs.cpu = i;
                    cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
                }
#else
                cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
#endif
            }

            /* leave through out: so that sunxi_cpufreq.lock is released */
            ret = -EINVAL;
            goto out;
        }
    }
#endif

#ifdef CONFIG_DEBUG_FS
    calltime = ktime_get();
#endif

    /* try to set cpu frequency */
#ifndef CONFIG_SUNXI_ARISC
    if (__set_cpufreq_by_ccu(freq))
#else
    if (arisc_dvfs_set_cpufreq(freq, ARISC_DVFS_PLL1, ARISC_DVFS_SYN, NULL, NULL))
#endif
    {
        CPUFREQ_ERR("set cpu frequency to %uKHz failed!\n", freq);

#ifdef CONFIG_CPU_VOLTAGE_SCALING
        /* restore the previous voltage if it was raised above */
        if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu) && (new_vdd > sunxi_cpufreq.last_vdd)) {
            if (regulator_set_voltage(sunxi_cpufreq.vdd_cpu,
                                      sunxi_cpufreq.last_vdd*1000, sunxi_cpufreq.last_vdd*1000)) {
                CPUFREQ_ERR("try to set voltage failed!\n");
                sunxi_cpufreq.last_vdd = new_vdd;
            }
        }
#endif

        /* set cpu frequency failed */
        if (policy) {
            freqs.cpu = policy->cpu;
            freqs.old = freqs.new;
            freqs.new = sunxi_cpufreq.last_freq;
#ifdef CONFIG_SMP
            /* notifiers */
            for_each_cpu(i, policy->cpus) {
                freqs.cpu = i;
                cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
            }
#else
            cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
#endif
        }

        ret = -EINVAL;
        goto out;
    }

#ifdef CONFIG_DEBUG_FS
    sunxi_cpufreq.cpufreq_set_us = ktime_to_us(ktime_sub(ktime_get(), calltime));
#endif

#ifdef CONFIG_CPU_VOLTAGE_SCALING
    /* lowering the frequency: lower the voltage only after the clock */
    if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu) && (new_vdd < sunxi_cpufreq.last_vdd)) {
        CPUFREQ_DBG(DEBUG_FREQ, "set cpu vdd to %dmv\n", new_vdd);
        if (regulator_set_voltage(sunxi_cpufreq.vdd_cpu, new_vdd*1000, new_vdd*1000)) {
            CPUFREQ_ERR("try to set voltage failed!\n");
            new_vdd = sunxi_cpufreq.last_vdd;
        }
    }
    sunxi_cpufreq.last_vdd = new_vdd;
#endif

    /* notify that cpu clock will be adjust if needed */
    if (policy) {
#ifdef CONFIG_SMP
        for_each_cpu(i, policy->cpus) {
            freqs.cpu = i;
            cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
        }
#else
        cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
#endif
    }

    sunxi_cpufreq.last_freq = freq;

    CPUFREQ_DBG(DEBUG_FREQ, "DVFS done! Freq[%uMHz] Volt[%umv] ok\n", \
                sunxi_cpufreq_get(0) / 1000, sunxi_cpufreq_getvolt());

out:
    mutex_unlock(&sunxi_cpufreq.lock);
    return ret;
}

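The code is fairly easy to follow, so we won't analyze it line by line. In short, it looks up a valid table entry for the request, sends CPUFREQ_PRECHANGE notifications, raises the voltage if the new frequency needs more, switches the clock, lowers the voltage if the new frequency needs less, and finally sends CPUFREQ_POSTCHANGE notifications. The flowchart is shown below: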
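The two ideas at the heart of sunxi_cpufreq_target(), picking a table entry for a requested frequency and ordering the voltage and clock changes, can be modelled in user-space C. Everything here is hypothetical (struct model_opp, model_table_target(), model_dvfs_switch()); the lookup follows the usual meaning of CPUFREQ_RELATION_L (lowest frequency at or above the target) and CPUFREQ_RELATION_H (highest frequency at or below it), and the operations are recorded as strings instead of touching hardware.

```c
#include <string.h>

#define MODEL_END (~1u)               /* hypothetical end-of-table sentinel */

enum model_relation { MODEL_RELATION_L, MODEL_RELATION_H };

/* Hypothetical operating point: frequency (kHz) and required voltage (mV). */
struct model_opp {
    unsigned int khz, mv;
};

/* RELATION_L: lowest entry >= target; RELATION_H: highest entry <= target.
 * Falls back to the nearest entry on the other side; -1 if the table is empty. */
int model_table_target(const struct model_opp *t, unsigned int target,
                       enum model_relation rel)
{
    int i, best = -1, fallback = -1;

    for (i = 0; t[i].khz != MODEL_END; i++) {
        if (rel == MODEL_RELATION_L) {
            if (t[i].khz >= target && (best < 0 || t[i].khz < t[best].khz))
                best = i;
            if (t[i].khz < target && (fallback < 0 || t[i].khz > t[fallback].khz))
                fallback = i;
        } else {
            if (t[i].khz <= target && (best < 0 || t[i].khz > t[best].khz))
                best = i;
            if (t[i].khz > target && (fallback < 0 || t[i].khz < t[fallback].khz))
                fallback = i;
        }
    }
    return best >= 0 ? best : fallback;
}

/* Records the order of hardware operations for inspection. */
char model_log[128];

static void log_op(const char *op)
{
    strncat(model_log, op, sizeof(model_log) - strlen(model_log) - 1);
}

/* Raise the voltage before raising the clock; lower it only afterwards. */
void model_dvfs_switch(unsigned int *cur_khz, unsigned int *cur_mv,
                       const struct model_opp *opp)
{
    if (opp->mv > *cur_mv)
        log_op("V+");                 /* voltage up first */
    log_op("F");                      /* then change the clock */
    if (opp->mv < *cur_mv)
        log_op("V-");                 /* voltage down last */
    *cur_khz = opp->khz;
    *cur_mv = opp->mv;
}
```

Going up produces the sequence "V+F" and going down produces "FV-", so the CPU never runs at a frequency whose voltage requirement has not been met.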

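CPUFreq governor layer

As mentioned above, the governor monitors the system load, picks a suitable operating frequency based on the current load, and passes it to the cpufreq_driver, which carries out the dynamic frequency change. The kernel discussed here provides five governors:
- Performance: performance first; sets the CPU frequency to the highest value allowed by policy->{min,max}. It is often chosen as the default governor to shorten boot time, and the system switches to another governor afterwards.
- Powersave: power first; sets the CPU frequency to the lowest value allowed by policy->{min,max}.
- Userspace: a user-space program sets the frequency through the scaling_setspeed file. Mostly used for debugging.
- Ondemand: adjusts the CPU frequency dynamically according to the current CPU utilization.
- Interactive: interactive dynamic frequency scaling, similar to Ondemand; developed by Google and widely used on phones and tablets. It comes from the Android kernel tree rather than mainline Linux (mainline also offers a conservative governor). This article focuses on it.
Here is struct cpufreq_governor: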

struct cpufreq_governor {
    char    name[CPUFREQ_NAME_LEN];   // governor name; "interactive" here
    int     initialized;              // initialization flag
    // controls the governor's behavior; this important callback is the
    // entry point into a governor
    int     (*governor) (struct cpufreq_policy *policy,
                         unsigned int event);
    ssize_t (*show_setspeed)    (struct cpufreq_policy *policy,
                                 char *buf);
    int     (*store_setspeed)   (struct cpufreq_policy *policy,
                                 unsigned int freq);
    unsigned int max_transition_latency; /* HW must be able to switch to
                                            next freq faster than this value in nano secs or we
                                            will fallback to performance governor */
    struct list_head    governor_list;   // every registered governor is added to this list
    struct module       *owner;
};

A governor is defined like this:

struct cpufreq_governor cpufreq_gov_interactive = {
    .name                   = "interactive",
    .governor               = cpufreq_governor_interactive,
    .max_transition_latency = 10000000,
    .owner                  = THIS_MODULE,
};

governor is the core field of this structure. Once a cpufreq_governor has been registered, the cpufreq core drives its behavior through this field: initialization, start, stop, exit and so on.
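The single governor callback acts as an event dispatcher. The sketch below is a hypothetical user-space model of that shape (model_governor() and its event names are invented): tunables are shared and reference-counted across policies, as with common_tunables; START and STOP only flip a flag where the real code starts and deletes timers; and LIMITS pulls cur back into [min, max].

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical event codes; the real ones are the CPUFREQ_GOV_* values. */
enum model_gov_event {
    MODEL_GOV_POLICY_INIT, MODEL_GOV_START, MODEL_GOV_STOP,
    MODEL_GOV_LIMITS, MODEL_GOV_POLICY_EXIT
};

struct model_tunables {
    int usage_count;
};

struct model_gov_policy {
    unsigned int min, max, cur;       /* kHz */
    int running;                      /* stands in for governor_enabled */
};

/* Shared tunables, like common_tunables without per-policy governors. */
static struct model_tunables *model_common;

/* One callback handles every lifecycle event. */
int model_governor(struct model_gov_policy *p, enum model_gov_event ev)
{
    switch (ev) {
    case MODEL_GOV_POLICY_INIT:
        if (model_common) {           /* another policy already allocated them */
            model_common->usage_count++;
            return 0;
        }
        model_common = calloc(1, sizeof(*model_common));
        if (!model_common)
            return -ENOMEM;
        model_common->usage_count = 1;
        break;
    case MODEL_GOV_POLICY_EXIT:
        if (model_common && !--model_common->usage_count) {
            free(model_common);       /* last user frees the tunables */
            model_common = NULL;
        }
        break;
    case MODEL_GOV_START:
        p->running = 1;               /* real code starts the per-CPU timers */
        break;
    case MODEL_GOV_STOP:
        p->running = 0;               /* real code deletes the timers */
        break;
    case MODEL_GOV_LIMITS:
        if (p->max < p->cur)          /* keep cur inside the new limits */
            p->cur = p->max;
        else if (p->min > p->cur)
            p->cur = p->min;
        break;
    }
    return 0;
}
```

The reference count is what lets several policies share one set of sysfs tunables: only the last POLICY_EXIT releases them.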

  • How is a governor initialized?
    When a policy selects a governor, the core sets up that CPU's policy through __cpufreq_set_policy(). If the governor is new (different from the one used before), the policy calls __cpufreq_governor() with the CPUFREQ_GOV_POLICY_INIT event, and __cpufreq_governor() in turn invokes the governor callback in struct cpufreq_governor.
    This is the code that handles the CPUFREQ_GOV_POLICY_INIT event:
    case CPUFREQ_GOV_POLICY_INIT:
        if (have_governor_per_policy()) {
            WARN_ON(tunables);
        } else if (tunables) {
            tunables->usage_count++;
            policy->governor_data = tunables;
            return 0;
        }

        tunables = kzalloc(sizeof(*tunables), GFP_KERNEL);
        if (!tunables) {
            pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__);
            return -ENOMEM;
        }

        tunables->usage_count = 1;
        tunables->io_is_busy = true;
        tunables->above_hispeed_delay = default_above_hispeed_delay;
        tunables->nabove_hispeed_delay =
            ARRAY_SIZE(default_above_hispeed_delay);
        tunables->go_hispeed_load = DEFAULT_GO_HISPEED_LOAD;
        tunables->target_loads = default_target_loads;
        tunables->ntarget_loads = ARRAY_SIZE(default_target_loads);
        tunables->min_sample_time = DEFAULT_MIN_SAMPLE_TIME;
        tunables->timer_rate = DEFAULT_TIMER_RATE;
        tunables->boostpulse_duration_val = DEFAULT_MIN_SAMPLE_TIME;
        tunables->timer_slack_val = DEFAULT_TIMER_SLACK;

        spin_lock_init(&tunables->target_loads_lock);
        spin_lock_init(&tunables->above_hispeed_delay_lock);

        policy->governor_data = tunables;
        if (!have_governor_per_policy())
            common_tunables = tunables;

        rc = sysfs_create_group(get_governor_parent_kobj(policy),
                                get_sysfs_attr());
        if (rc) {
            kfree(tunables);
            policy->governor_data = NULL;
            if (!have_governor_per_policy())
                common_tunables = NULL;
            return rc;
        }

        if (!policy->governor->initialized) {
            idle_notifier_register(&cpufreq_interactive_idle_nb);
            cpufreq_register_notifier(&cpufreq_notifier_block,
                                      CPUFREQ_TRANSITION_NOTIFIER);
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        if (!input_handler_register_count) {
            cpumask_clear(&interactive_cpumask);
            rc = input_register_handler(&cpufreq_interactive_input_handler);
            if (rc)
                return rc;
        }
        tunables->input_event_freq = policy->max *
            DEFAULT_INPUT_EVENT_FRFQ_PERCENT / 100;
        tunables->input_dev_monitor = true;
        input_handler_register_count++;
#endif
        break;

The sequence diagram is:

sysfs_create_group() creates the corresponding sysfs nodes under /sys/devices/system/cpu/cpufreq/interactive. The main ones are:

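boost: boosts the CPUs for bursty work; while it is non-zero, the frequency is kept at or above hispeed_freq.
boostpulse: writing to it boosts the frequency for a bursty task; the boost lasts for the boostpulse duration.
go_hispeed_load: high-load threshold. When the CPU load reaches this value, the frequency is raised to at least hispeed_freq.
hispeed_freq: the frequency the CPU is raised to when the load reaches go_hispeed_load.
input_boost: boosts the frequency on input events, such as touchscreen activity.
min_sample_time: minimum time the current frequency must be held before it can be lowered.
timer_rate: sampling period of the timer that re-evaluates the CPU load.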

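While the CPU is not idle, the workload is sampled every timer_rate. While the CPU is idle, a deferrable timer is used, so the CPU is not woken up from idle just to service the sampling timer. The maximum deferral of that timer is given by timer_slack, 80000 us by default.
  • How is a governor started?
    Similar to initialization, the governor callback is invoked with the CPUFREQ_GOV_START event: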

    case CPUFREQ_GOV_START:
        mutex_lock(&gov_lock);

        freq_table = cpufreq_frequency_get_table(policy->cpu);
        // if hispeed_freq has not been set, default it to policy->max
        if (!tunables->hispeed_freq)
            tunables->hispeed_freq = policy->max;

        // iterate over the online CPUs managed by this policy (policy->cpus)
        for_each_cpu(j, policy->cpus) {
            pcpu = &per_cpu(cpuinfo, j);
            pcpu->policy = policy;
            pcpu->target_freq = policy->cur;
            pcpu->freq_table = freq_table;
            pcpu->floor_freq = pcpu->target_freq;
            pcpu->floor_validate_time =
                ktime_to_us(ktime_get());
            pcpu->hispeed_validate_time =
                pcpu->floor_validate_time;
            pcpu->max_freq = policy->max;
            down_write(&pcpu->enable_sem);
            del_timer_sync(&pcpu->cpu_timer);
            del_timer_sync(&pcpu->cpu_slack_timer);
            // start the per-CPU timers
            cpufreq_interactive_timer_start(tunables, j);
            // once the timers run the governor is active, so set governor_enabled to 1
            pcpu->governor_enabled = 1;
            up_write(&pcpu->enable_sem);
        }

        mutex_unlock(&gov_lock);
        break;

The governor field is set to cpufreq_governor_interactive. Here is its full implementation:

static int cpufreq_governor_interactive(struct cpufreq_policy *policy,
                                        unsigned int event)
{
    int rc;
    unsigned int j;
    struct cpufreq_interactive_cpuinfo *pcpu;
    struct cpufreq_frequency_table *freq_table;
    struct cpufreq_interactive_tunables *tunables;
    unsigned long flags;

    if (have_governor_per_policy())
        tunables = policy->governor_data;
    else
        tunables = common_tunables;

    WARN_ON(!tunables && (event != CPUFREQ_GOV_POLICY_INIT));

    switch (event) {
    case CPUFREQ_GOV_POLICY_INIT:
        if (have_governor_per_policy()) {
            WARN_ON(tunables);
        } else if (tunables) {
            tunables->usage_count++;
            policy->governor_data = tunables;
            return 0;
        }

        tunables = kzalloc(sizeof(*tunables), GFP_KERNEL);
        if (!tunables) {
            pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__);
            return -ENOMEM;
        }

        tunables->usage_count = 1;
        tunables->io_is_busy = true;
        tunables->above_hispeed_delay = default_above_hispeed_delay;
        tunables->nabove_hispeed_delay =
            ARRAY_SIZE(default_above_hispeed_delay);
        tunables->go_hispeed_load = DEFAULT_GO_HISPEED_LOAD;
        tunables->target_loads = default_target_loads;
        tunables->ntarget_loads = ARRAY_SIZE(default_target_loads);
        tunables->min_sample_time = DEFAULT_MIN_SAMPLE_TIME;
        tunables->timer_rate = DEFAULT_TIMER_RATE;
        tunables->boostpulse_duration_val = DEFAULT_MIN_SAMPLE_TIME;
        tunables->timer_slack_val = DEFAULT_TIMER_SLACK;

        spin_lock_init(&tunables->target_loads_lock);
        spin_lock_init(&tunables->above_hispeed_delay_lock);

        policy->governor_data = tunables;
        if (!have_governor_per_policy())
            common_tunables = tunables;

        rc = sysfs_create_group(get_governor_parent_kobj(policy),
                                get_sysfs_attr());
        if (rc) {
            kfree(tunables);
            policy->governor_data = NULL;
            if (!have_governor_per_policy())
                common_tunables = NULL;
            return rc;
        }

        if (!policy->governor->initialized) {
            idle_notifier_register(&cpufreq_interactive_idle_nb);
            cpufreq_register_notifier(&cpufreq_notifier_block,
                                      CPUFREQ_TRANSITION_NOTIFIER);
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        if (!input_handler_register_count) {
            cpumask_clear(&interactive_cpumask);
            rc = input_register_handler(&cpufreq_interactive_input_handler);
            if (rc)
                return rc;
        }
        tunables->input_event_freq = policy->max *
            DEFAULT_INPUT_EVENT_FRFQ_PERCENT / 100;
        tunables->input_dev_monitor = true;
        input_handler_register_count++;
#endif
        break;

    case CPUFREQ_GOV_POLICY_EXIT:
        if (!--tunables->usage_count) {
            if (policy->governor->initialized == 1) {
                cpufreq_unregister_notifier(&cpufreq_notifier_block,
                                            CPUFREQ_TRANSITION_NOTIFIER);
                idle_notifier_unregister(&cpufreq_interactive_idle_nb);
            }

            sysfs_remove_group(get_governor_parent_kobj(policy),
                               get_sysfs_attr());
            kfree(tunables);
            common_tunables = NULL;
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        if (input_handler_register_count > 0)
            input_handler_register_count--;
        if (!input_handler_register_count) {
            cpumask_clear(&interactive_cpumask);
            input_unregister_handler(&cpufreq_interactive_input_handler);
        }
#endif
        policy->governor_data = NULL;
        break;

    case CPUFREQ_GOV_START:
        mutex_lock(&gov_lock);

        freq_table = cpufreq_frequency_get_table(policy->cpu);
        if (!tunables->hispeed_freq)
            tunables->hispeed_freq = policy->max;

        for_each_cpu(j, policy->cpus) {
            pcpu = &per_cpu(cpuinfo, j);
            pcpu->policy = policy;
            pcpu->target_freq = policy->cur;
            pcpu->freq_table = freq_table;
            pcpu->floor_freq = pcpu->target_freq;
            pcpu->floor_validate_time =
                ktime_to_us(ktime_get());
            pcpu->hispeed_validate_time =
                pcpu->floor_validate_time;
            pcpu->max_freq = policy->max;
            down_write(&pcpu->enable_sem);
            del_timer_sync(&pcpu->cpu_timer);
            del_timer_sync(&pcpu->cpu_slack_timer);
            cpufreq_interactive_timer_start(tunables, j);
            pcpu->governor_enabled = 1;
            up_write(&pcpu->enable_sem);
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        cpumask_or(&interactive_cpumask, &interactive_cpumask, policy->cpus);
#endif
        mutex_unlock(&gov_lock);
        break;

    case CPUFREQ_GOV_STOP:
        mutex_lock(&gov_lock);
        for_each_cpu(j, policy->cpus) {
            pcpu = &per_cpu(cpuinfo, j);
            down_write(&pcpu->enable_sem);
            pcpu->governor_enabled = 0;
            del_timer_sync(&pcpu->cpu_timer);
            del_timer_sync(&pcpu->cpu_slack_timer);
            up_write(&pcpu->enable_sem);
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        cpumask_andnot(&interactive_cpumask, &interactive_cpumask, policy->cpus);
#endif
        mutex_unlock(&gov_lock);
        break;

    case CPUFREQ_GOV_LIMITS:
        if (policy->max < policy->cur)
            __cpufreq_driver_target(policy,
                                    policy->max, CPUFREQ_RELATION_H);
        else if (policy->min > policy->cur)
            __cpufreq_driver_target(policy,
                                    policy->min, CPUFREQ_RELATION_L);

        for_each_cpu(j, policy->cpus) {
            pcpu = &per_cpu(cpuinfo, j);

            down_read(&pcpu->enable_sem);
            if (pcpu->governor_enabled == 0) {
                up_read(&pcpu->enable_sem);
                continue;
            }

            spin_lock_irqsave(&pcpu->target_freq_lock, flags);
            if (policy->max < pcpu->target_freq)
                pcpu->target_freq = policy->max;
            else if (policy->min > pcpu->target_freq)
                pcpu->target_freq = policy->min;

            spin_unlock_irqrestore(&pcpu->target_freq_lock, flags);
            up_read(&pcpu->enable_sem);

            /* Reschedule timer only if policy->max is raised.
             * Delete the timers, else the timer callback may
             * return without re-arm the timer when failed
             * acquire the semaphore. This race may cause timer
             * stopped unexpectedly.
             */

            if (policy->max > pcpu->max_freq) {
                down_write(&pcpu->enable_sem);
                del_timer_sync(&pcpu->cpu_timer);
                del_timer_sync(&pcpu->cpu_slack_timer);
                cpufreq_interactive_timer_start(tunables, j);
                up_write(&pcpu->enable_sem);
            }

            pcpu->max_freq = policy->max;
        }
        break;
    }
    return 0;
}

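Through CPUFREQ_GOV_START, this function starts two per-CPU timers, pcpu->cpu_timer and pcpu->cpu_slack_timer, whose handlers are cpufreq_interactive_timer and cpufreq_interactive_nop_timer respectively.
The key to the governor is the implementation of the cpufreq_interactive_timer handler.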
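The article stops before cpufreq_interactive_timer(), so the sketch below only illustrates, in simplified form, how the tunables listed earlier could combine in one sampling step. It is not the real interactive algorithm: model_interactive_step() and its single target_load are hypothetical, and since_raise_us approximates the governor's own bookkeeping for min_sample_time.

```c
/* Hypothetical subset of the interactive tunables described earlier. */
struct model_interactive {
    unsigned int go_hispeed_load;     /* percent */
    unsigned int hispeed_freq;        /* kHz */
    unsigned int target_load;         /* percent, must be non-zero */
    unsigned long min_sample_time;    /* us */
};

/* One sampling step: choose the next frequency from the load (0-100)
 * measured over the last timer_rate period. since_raise_us is how long
 * the current frequency has been held. Simplified illustration only. */
unsigned int model_interactive_step(const struct model_interactive *t,
                                    unsigned int cur_khz, unsigned int load,
                                    unsigned long since_raise_us,
                                    unsigned int min_khz, unsigned int max_khz)
{
    /* frequency that would bring the load down to target_load */
    unsigned long long want = (unsigned long long)cur_khz * load / t->target_load;
    unsigned int next;

    if (load >= t->go_hispeed_load && want < t->hispeed_freq)
        want = t->hispeed_freq;       /* jump straight to hispeed_freq */
    if (want > max_khz)
        want = max_khz;
    if (want < min_khz)
        want = min_khz;
    next = (unsigned int)want;

    /* do not ramp down before the current speed was held min_sample_time */
    if (next < cur_khz && since_raise_us < t->min_sample_time)
        next = cur_khz;
    return next;
}
```

For example, with go_hispeed_load = 99, hispeed_freq = 1008000 kHz and target_load = 90, a fully loaded CPU at 480000 kHz jumps to 1008000 kHz, while a CPU at 1008000 kHz with 45% load drops to 504000 kHz, but only after min_sample_time has elapsed.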
