A brief note of some knowledge points and thoughts on simulation optimization, mainly based on: Handbook of Simulation Optimization, Michael Fu.

Table of Contents

Overview

Discrete Optimization

Three fundamental types of errors:

Optimality Conditions

Different scenarios depending on the solution space size:

Ranking and Selection

Ordinal Optimization (OO)

Globally Convergent Adaptive Random Search

Locally Convergent Adaptive Random Search

Commercial Solvers


Overview

This is the overview of the book, which can also be seen as an overview of the field.

  • SimuOpt: optimize when the objective function g(x) cannot be computed directly but can be simulated, with noise (the focus is on stochastic simulation environments).

One way to classify: Discrete vs Continuous

  • Discrete Optimization

    • Solution space is small -> Ranking & Selection (based on statistics or simulation budget allocation)
    • Solution space is large but finite -> Ordinal Optimization (no need to estimate every candidate accurately; only their order matters. Much faster (exponential) convergence)
    • Solution space is countably infinite -> Random Search (globally or locally convergent)
  • Continuous Opt
    • RSM (Response Surface Methodology), which also has constrained and robust variants
    • Stochastic Approximation (RM (Robbins-Monro), KW (Kiefer-Wolfowitz), and simultaneous perturbation stochastic approximation for high-dimensional problems)
    • SAA (Sample Average Approximation), with consideration of stochastic constraints
    • Random Search, with focus on estimation and on the search procedure. Model-based RS is a newer class that maintains a probability distribution over the solution space from which candidates are sampled.

Since stochasticity is the keyword, some background knowledge is important for discrete as well as continuous optimization.

  • Statistics

    • How to estimate a solution
    • How to know whether solution x is better than y
    • How to know to what extent we are covering the optimal solution in the search
    • How many replications do we need... (a minimal sketch follows this list)
    • Hypothesis testing
  • Stochastic constraints
  • Variance reduction
  • ...
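
As a toy illustration of the first few points (estimating a solution and comparing two candidates), here is a minimal sketch. Everything here is hypothetical: simulate(x) stands in for one noisy replication of g(x), and 30 replications with a normal-approximation interval are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x):
    # Hypothetical stochastic simulator: true objective (x - 2)^2, observed with Gaussian noise.
    return (x - 2.0) ** 2 + rng.normal(scale=1.0)

def estimate(x, n_reps=30):
    """Sample mean and ~95% confidence half-width from n_reps i.i.d. replications."""
    obs = np.array([simulate(x) for _ in range(n_reps)])
    return obs.mean(), 1.96 * obs.std(ddof=1) / np.sqrt(n_reps)

# Compare two candidate solutions: if the intervals are well separated we can be
# fairly confident which one is better; otherwise add more replications.
for x in (1.0, 3.5):
    mean, half_width = estimate(x)
    print(f"g({x}) ~ {mean:.2f} +/- {half_width:.2f}")
```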

Discrete Optimization

Three fundamental types of errors:

  • The optimal solution is never simulated (a search issue)
  • The optimum that was simulated is not selected (an estimation issue)
  • The one selected is not well estimated (an estimation issue)

Optimality Conditions

  • are needed to 1) ensure the correctness of the algorithm; 2) define the stopping criteria
  • for unconstrained non-linear optimization, we stop at a stationary point
  • for integer optimization, we check the gap between LB and UB
  • here for SBO, it's difficult because:
    • the cost of solution g(x) can only be estimated
    • no structural information can be used to prune the solution space
    • complete enumeration of the solution space is often computationally intractable

Different scenarios depending on the solution space size:

  • Small. Fewer than a few hundred candidates. The key is then how to estimate all solutions well and return the best. Practically we analyze the Probability of Correct Selection (PCS). The algorithm stops once PCS = P(x* is truly the best) ≥ 1 - α, where x* is the selected best solution.
  • Large.
    • Impossible to simulate all candidates. The idea is then to find a "good enough" solution, meaning that x* is among the t best solutions with a certain probability. This is used in ordinal optimization.
    • Or, choose methods with a global convergence guarantee (the returned solution converges to the set of global optima w.p. 1) or a local convergence guarantee (it converges to L w.p. 1, where L is the set of all local optima, which depends on the definition of the neighborhood structure). Local optimality can be tested statistically by controlling the type-I and type-II errors; this is feasible because a neighborhood is often not large (a sketch follows after this list).
    • Hypothesis testing: if the hypothesis were true, what would be the probability of our observation? This is a proof by contradiction, emphasizing rejection rather than acceptance.
    • (Meta)heuristics are often found in commercial solvers. These algorithms work well for difficult deterministic integer programs, and they are somewhat tolerant of sampling variability. However, they typically do not satisfy any optimality conditions for DOvS problems and may be misled by sampling variability.
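
To make the "test local optimality statistically" point concrete, here is a minimal sketch of my own (not a procedure from the book): compare x* against each neighbor with a one-sided Welch t-test; the test level controls the type-I error, and the number of replications governs the type-II error. simulate() and the neighborhood are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate(x):
    # Hypothetical noisy simulator on the integers; true objective (x - 5)^2.
    return (x - 5) ** 2 + rng.normal(scale=2.0)

def looks_locally_optimal(x_star, neighbors, n_reps=50, alpha=0.05):
    """Fail to reject local optimality of x_star at level alpha (per neighbor)."""
    obs_star = [simulate(x_star) for _ in range(n_reps)]
    for y in neighbors:
        obs_y = [simulate(y) for _ in range(n_reps)]
        t_stat, p_two = stats.ttest_ind(obs_star, obs_y, equal_var=False)
        p_one = p_two / 2 if t_stat > 0 else 1 - p_two / 2  # H1: g(x_star) > g(y)
        if p_one < alpha:
            return False  # some neighbor is significantly better than x_star
    return True

print(looks_locally_optimal(5, neighbors=[4, 6]))
```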

Ranking and Selection

Two formulations are considered:

  • indifference-zone formulation (IZF)
  • Bayesian formulation (BF)

IZF (Frequentist)

Assume the least-favorable (slippage) configuration g(x1) + δ = g(x2) = ... = g(xk), which is the most difficult case. The objective is to find x1, which is at least δ-better than all the others.

  1. Bechhofer's procedure: assuming a known common estimation variance σ², Bechhofer's procedure decides the number of replications needed to estimate each solution. Then it suffices to choose the best one based on the sample means (a sketch follows after this list).
  2. Paulson's procedure: filter progressively. At each iteration: take one observation of each surviving solution, update the sample means, and filter out some bad solutions. This is more efficient than Bechhofer's since a large number of solutions may be filtered out at early stages.
  3. Gupta's procedure (subset selection): similar to 1 and 2, but it returns a subset S of solutions and guarantees that P(x1 ∈ S) ≥ 1 - α.
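
A minimal sketch of a Bechhofer-style single-stage procedure under the known-variance assumption. To keep it self-contained I use a Bonferroni-conservative normal quantile in place of Bechhofer's exact constant h, so the sample size is valid but conservative; the system means are made up.

```python
import math
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_means = [0.0, 0.5, 1.0, 2.0]   # hypothetical systems; smaller is better
sigma, delta, alpha = 1.0, 0.5, 0.05
k = len(true_means)

# Replications per system: 2 * (h * sigma / delta)^2, with h replaced by the
# conservative Bonferroni quantile z_{1 - alpha/(k-1)}.
z = stats.norm.ppf(1 - alpha / (k - 1))
n = math.ceil(2 * (z * sigma / delta) ** 2)

def simulate(i):
    return true_means[i] + rng.normal(scale=sigma)

sample_means = [np.mean([simulate(i) for _ in range(n)]) for i in range(k)]
print(f"{n} replications per system -> select system {int(np.argmin(sample_means))}")
```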

Based on the principle of the above 3 procedures, further procedures include:

  1. NSGS: a two-stage procedure. Compute an initial sample mean for each solution; then, according to the variances of these estimates, decide the number of extra replications to make. Finally select the best.
  2. KN: in contrast to NSGS, this is not a two-stage procedure but a fully sequential one, adding replications progressively and eliminating inferior solutions along the way.

BF (Bayesian) (does not provide PCS guarantees)

Used when prior information is available.

Helps to choose the next solution to explore, based on prior information, previous sample results, and the remaining simulation budget. This yields an MDP, which can possibly be solved by ADP/RL.

  1. Generic Bayes procedure: basically an RL loop: simulate (state) -> choose the next solution (action) -> loop
  2. Since it is hard to find the optimal policy, some heuristics are proposed:
    1. OCBA (Optimal Computing Budget Allocation) (a sketch follows after this list)
    2. EVI (Expected Value of Information)
    3. KG (Knowledge Gradient)
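
As an example of how these heuristics allocate effort, here is a sketch of the OCBA allocation rule as I recall it from the OCBA literature (Chen et al.): most of an extra budget goes to the incumbent best and its close competitors. The means and standard deviations below are hypothetical running estimates.

```python
import numpy as np

means = np.array([1.0, 1.2, 1.5, 3.0])   # sample means (smaller is better)
stds  = np.array([1.0, 0.8, 1.2, 1.0])   # sample standard deviations
budget = 100                              # additional replications to allocate

b = int(np.argmin(means))                 # index of the current best
others = [i for i in range(len(means)) if i != b]

# Allocation ratios for non-best solutions: (std_i / delta_{b,i})^2
delta = means - means[b]
ratio = np.zeros(len(means))
for i in others:
    ratio[i] = (stds[i] / delta[i]) ** 2
# Best solution: N_b = std_b * sqrt(sum_{i != b} N_i^2 / std_i^2)
ratio[b] = stds[b] * np.sqrt(np.sum(ratio[others] ** 2 / stds[others] ** 2))

alloc = np.floor(budget * ratio / ratio.sum()).astype(int)
print(dict(enumerate(alloc)))
```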

Conclusion

Branke et al. found that no R&S procedure is dominant in all situations (thousands of problem structures tested). BF is often more efficient in terms of the number of samples, but it does not provide the correct-selection guarantee that frequentist procedures do.

Ordinal Optimization (OO)

When the solution space is large, OO proposes "soft optimization": select a subset S from the solution space and limit the analysis to S. We are interested in the probability that |S ∩ T| ≥ k, where T is the set of top-t solutions in the whole space; k is called the alignment level and the probability is the alignment probability (AP).

Two basic ideas behind OO:

  1. Estimating the order between solutions is much easier than estimating objective values
  2. Accepting good-enough solutions leads to an exponential reduction in computational burden

OO is more an analysis than a new algorithm; the procedure is:

  1. First determine the AP (alignment probability)
  2. Then that determines the cardinality of the subset S
  3. Then just run R&S on S and you get the guarantee that the returned solution is among the top t (with the prescribed probability).

In practice I don't find this so interesting, since it essentially just tells you that the larger the subset S is, the better. A quick illustration of the alignment probability follows.
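
For intuition, under blind (uniform) picking of S the alignment probability has a closed form via the hypergeometric distribution; the quick computation below (with made-up numbers) shows how fast it grows with |S|.

```python
from scipy.stats import hypergeom

N, t, k = 10_000, 50, 1          # total solutions, size of the top-t set, alignment level
for s in (10, 50, 100, 500):     # candidate subset sizes |S|
    ap = hypergeom(N, t, s).sf(k - 1)   # P(at least k of the top-t fall in S)
    print(f"|S| = {s:4d}: AP = {ap:.3f}")
```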

Globally Convergent Adaptive Random Search

Designed for large but finite solution spaces. Guarantee: the estimated best solution converges to the set of global optima with probability 1 as the simulation effort goes to infinity.

Generic GCARS:

  1. Initialization
  2. Sampling: sample new candidate solutions from the solution space according to the current sampling distribution
  3. Estimation: run simulation replications for the sampled candidates
  4. Iteration: update the estimates V(x) for all x visited so far and loop back to sampling (a minimal sketch of this loop follows)
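
A minimal sketch of this generic loop on a toy finite problem. All names are hypothetical; the sampling distribution here is plain uniform over the space, which is what keeps every solution reachable and hence gives global convergence in the limit.

```python
import numpy as np

rng = np.random.default_rng(3)
domain = np.arange(0, 100)          # finite solution space

def simulate(x):
    # Hypothetical noisy simulator; true objective minimized at x = 42.
    return (x - 42) ** 2 / 100.0 + rng.normal(scale=1.0)

sums, counts = {}, {}               # running statistics behind V(x)

def observe(x, reps=5):
    for _ in range(reps):
        sums[x] = sums.get(x, 0.0) + simulate(x)
        counts[x] = counts.get(x, 0) + 1

def current_best():
    return min(counts, key=lambda s: sums[s] / counts[s])

for k in range(200):
    # Sampling: uniform global sampling keeps every solution reachable.
    x = int(rng.choice(domain))
    # Estimation: simulate the sampled solution, and also revisit the incumbent
    # so that its estimate keeps improving.
    observe(x)
    observe(current_best(), reps=1)

best = current_best()
print("estimated best:", best, "estimated mean:", round(sums[best] / counts[best], 2))
```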

Several algorithms are described:

  1. Stochastic Ruler Algorithm: accept a candidate solution if its simulated observation beats a uniformly drawn "ruler" u~U(lb,ub)
  2. Stochastic Branch and Bound: each time choose the subset of the current partition of the solution space with the minimum lower bound, then partition it finer and finer
  3. Nested Partition: an enhancement of SBB with less information to memorize
  4. R-BEESE (Balanced Explorative and Exploitative Search with Estimation). On each iteration (a sketch follows this list):
    1. with probability q, refine the current x* with more replications
    2. otherwise, with probability p, sample from Global(theta)
    3. otherwise sample from Local(theta)
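
A minimal sketch of just this sampling decision; Global(theta) and Local(theta) are represented by a uniform draw over a toy integer domain and a +/-1 step, and the values of q and p are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
q, p = 0.3, 0.5                      # arbitrary search parameters

def next_action(x_star):
    u = rng.random()
    if u < q:
        return ("refine", x_star)                              # more replications at x*
    elif u < q + (1 - q) * p:
        return ("global", int(rng.integers(0, 100)))           # sample from Global(theta)
    else:
        return ("local", x_star + int(rng.choice([-1, 1])))    # sample from Local(theta)

print([next_action(42)[0] for _ in range(10)])
```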

Locally Convergent Adaptive Random Search

Similar to GCARS, but with a statistical procedure to test the local optimality of x*.

COMPASS (Convergent Optimization via Most-Promising-Area Stochastic Search)

  1. init, sample a neighborhood of solutions and retain the best
  2. move to the next most-promising area, defined as the set of solutions that are at least as close to x* as to any other visited solution. In other words, always focus on the closest neighbors of x*. An LP can be solved to simplify the description of this area, which is called constraint pruning (see the sketch below).
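
A minimal sketch of the most-promising-area membership test (a rejection-style check rather than the constraint-pruning LP; all names are illustrative):

```python
import numpy as np

def in_most_promising_area(x, x_star, visited):
    """True if x is at least as close to x_star as to every other visited solution."""
    x = np.asarray(x, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    d_star = np.linalg.norm(x - x_star)
    return all(d_star <= np.linalg.norm(x - np.asarray(v, dtype=float))
               for v in visited if not np.array_equal(np.asarray(v), x_star))

x_star = [3, 4]
visited = [[0, 0], [6, 6], x_star]
print(in_most_promising_area([3, 5], x_star, visited))   # True: closest to x_star
print(in_most_promising_area([5, 6], x_star, visited))   # False: closer to [6, 6]
```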

AHA (Adaptive Hyperbox Algorithm)

Like COMPASS, but defines the neighborhood as the hyperbox around x*: along each of the d coordinates (d being the dimension of x), take the tightest visited coordinates that bracket x* on either side.
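
A minimal sketch of constructing such a hyperbox from the visited solutions, assuming integer coordinates and default box bounds [lo, hi]; this is my own illustration of the idea rather than the paper's exact pseudocode.

```python
import numpy as np

def hyperbox(x_star, visited, lo=-10, hi=10):
    """Tightest box around x_star whose faces pass through visited coordinates."""
    x_star = np.asarray(x_star)
    lower = np.full(len(x_star), lo)
    upper = np.full(len(x_star), hi)
    for v in visited:
        v = np.asarray(v)
        if np.array_equal(v, x_star):
            continue
        for i in range(len(x_star)):
            if v[i] < x_star[i]:
                lower[i] = max(lower[i], v[i])   # closest visited coordinate from below
            elif v[i] > x_star[i]:
                upper[i] = min(upper[i], v[i])   # closest visited coordinate from above
    return lower, upper

print(hyperbox([3, 4], [[0, 0], [6, 6], [3, 7]]))   # (array([0, 0]), array([6, 6]))
```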

Commercial Solvers

Most simulation modeling software includes an SBO tool, but most of them are based on R&S or meta-heuristics like SA. Meta-heuristics have been observed to be effective on difficult deterministic optimization problems, but they usually provide no performance guarantees. Some advice:

  1. Do preliminary tests to control sampling variability
  2. Re-run the solver several times (multi-start with different random seeds)
  3. Estimate the final set of candidate solutions carefully to be sure to select the best (a sketch of 2 and 3 follows this list).
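
A minimal sketch of points 2 and 3 together: run the (black-box) solver from several seeds, then re-estimate the shortlisted finalists with many replications before declaring a winner. run_solver and simulate are hypothetical stand-ins for a commercial tool and its model.

```python
import numpy as np

def run_solver(seed):
    # Stand-in for a commercial heuristic solver: returns some candidate solution.
    return int(np.random.default_rng(seed).integers(0, 100))

rng = np.random.default_rng(0)

def simulate(x):
    # Hypothetical noisy model of the objective being optimized.
    return (x - 42) ** 2 / 100.0 + rng.normal(scale=1.0)

finalists = {run_solver(seed) for seed in range(10)}             # multi-start (advice 2)
estimates = {x: np.mean([simulate(x) for _ in range(200)])       # careful re-estimation (advice 3)
             for x in finalists}
print("selected:", min(estimates, key=estimates.get))
```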

Conclusion

Most of the above-mentioned algorithms are black-box algorithms that do not depend on problem structure. Problem structure can still be exploited, for instance when defining the neighborhood in locally convergent random search.
