Assignment #2 STA355H1S
due Friday February 15, 2019
Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF files
only) – the deadline is 11:59pm on February 15. You are strongly encouraged to do problems
3 through 6 but these are not to be submitted for grading.
1. On Quercus, there is a file containing data on the lengths (in minutes) of 272 eruptions
of the Old Faithful geyser in Yellowstone National Park. Using R and some of the methods
discussed in class, answer the following questions.
(a) Do the data appear to be normal? If not, do they appear to be unimodal?
(b) Use the density function in R to estimate the density. Choose a variety of bandwidths
(the parameter bw) and describe how the estimates change as the bandwidth changes.
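For instance, part (b) might be explored along these lines. This is a sketch only: it assumes the data are available as the eruptions column of R's built-in faithful data frame (which contains 272 eruption lengths); the course file on Quercus may instead need to be read with scan() or read.table().

```r
# Old Faithful eruption durations (built-in stand-in for the Quercus file)
x <- faithful$eruptions

# (a) Normality check: histogram plus a normal QQ-plot
par(mfrow = c(1, 2))
hist(x, breaks = 20, prob = TRUE, main = "Eruption lengths")
qqnorm(x); qqline(x)

# (b) Kernel density estimates for a range of bandwidths
par(mfrow = c(1, 1))
plot(density(x, bw = 0.03), main = "KDE for several bandwidths")
for (bw in c(0.1, 0.3, 1)) lines(density(x, bw = bw), lty = 2)
```

Very small bandwidths give noisy, wiggly estimates; very large bandwidths oversmooth and can merge the two modes into one.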
(c) One automated approach to selecting the bandwidth parameter h is leave-one-out
cross-validation. This is a fairly general procedure that is useful for selecting tuning
parameters in a variety of statistical problems.
If f and g are density functions, then we can define the Kullback-Leibler divergence
D_KL(f‖g) = ∫_{−∞}^{∞} f(x) ln( f(x)/g(x) ) dx.
For a given density f, D_KL(f‖g) is minimized over densities g when g = f (and D_KL(f‖f) = 0). In the context of bandwidth selection, define f̂_h(x) to be a density estimator with bandwidth h and f(x) to be the true (but unknown) density that produces the data. Ideally, we would like to minimize D_KL(f‖f̂_h) with respect to h, but since f is unknown, the best we can do is to minimize an estimate of D_KL(f‖f̂_h). Noting that
D_KL(f‖f̂_h) = −∫_{−∞}^{∞} ln(f̂_h(x)) f(x) dx + ∫_{−∞}^{∞} ln(f(x)) f(x) dx = −E_f[ln(f̂_h(X))] + constant,
this suggests that we should try to maximize an estimate of E_f[ln(f̂_h(X))], which can be estimated for a given h by the following (leave-one-out) substitution principle estimator:
CV(h) = (1/n) ∑_{i=1}^n ln(f̂_{h,−i}(X_i))
where f̂_{h,−i}(x) is the density estimate with bandwidth h using all the observations except X_i. (Note that this has the flavour of maximum likelihood estimation for the bandwidth h. Note also that if we replaced f̂_{h,−i}(X_i) by f̂_h(X_i) in the formula for CV(h) then it would always be maximized at h = 0 – the “leave-one-out” approach avoids this.)
On Quercus, there is a function kde.cv (in a file kde.cv.txt) that computes CV(h) for
various bandwidth parameters h. Use this function to estimate the density of the Old
Faithful data.
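The file kde.cv.txt is only available on Quercus, so as a point of reference here is one way such a function might be implemented. The Gaussian kernel and the interface below are assumptions for illustration, not necessarily what the provided kde.cv actually does:

```r
# Leave-one-out cross-validation criterion for a Gaussian KDE:
# CV(h) = (1/n) * sum_i log f.hat_{h,-i}(x_i), where f.hat_{h,-i}
# is the kernel density estimate leaving out observation i.
kde.cv <- function(x, h) {
  n <- length(x)
  cv <- numeric(length(h))
  for (k in seq_along(h)) {
    # n x n matrix of kernel evaluations K((x_i - x_j)/h)/h
    K <- dnorm(outer(x, x, "-") / h[k]) / h[k]
    diag(K) <- 0                     # leave observation i out
    f.loo <- rowSums(K) / (n - 1)    # f.hat_{h,-i}(x_i)
    cv[k] <- mean(log(f.loo))
  }
  cv
}

# Evaluate CV(h) on a grid, pick the maximizing bandwidth, and plot
h <- seq(0.01, 0.3, by = 0.01)
cv <- kde.cv(faithful$eruptions, h)
h.opt <- h[which.max(cv)]
plot(density(faithful$eruptions, bw = h.opt))
```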
(d) Now assume that the data come from a mixture of two normal distributions so that the
density has the form
f(x) = (p/σ₁) φ((x − μ₁)/σ₁) + ((1 − p)/σ₂) φ((x − μ₂)/σ₂)
where φ is the standard normal density and p, μ₁, μ₂, σ₁, σ₂ are unknown parameters. Use the density estimates from parts (b)
and (c) and any other appropriate methods to come up with educated guesses of the values
of these parameters. (Don’t worry too much about your final answers – the process is much
more important here.)
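For part (d), one low-tech approach is to overlay a candidate two-component mixture density on the kernel estimate and adjust the parameters by eye. In the sketch below, the numerical values are purely illustrative starting guesses read off the two modes, not answers:

```r
# Two-component normal mixture density:
# p*N(mu1, s1^2) + (1-p)*N(mu2, s2^2)
dmix <- function(x, p, mu1, s1, mu2, s2) {
  p * dnorm(x, mu1, s1) + (1 - p) * dnorm(x, mu2, s2)
}

x <- faithful$eruptions
plot(density(x), main = "KDE vs. guessed mixture")
# Illustrative guesses: mixing weight near the fraction of short
# eruptions, means near the two modes of the KDE
curve(dmix(x, p = 0.35, mu1 = 2.0, s1 = 0.25, mu2 = 4.3, s2 = 0.4),
      add = TRUE, lty = 2)
```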
2. Suppose that F is a distribution concentrated on the positive real line (i.e. F(x) = 0
for x < 0). If μ(F) = E_F(X) then the mean population share of the distribution F is defined as MPS(F) = F(μ(F)⁻) = P_F(X < μ(F)). (When F is a continuous distribution, F(μ(F)⁻) = F(μ(F)).) For most income distributions, MPS(F) > 1/2, with MPS(F) = 0 if
(and only if) all incomes are equal and MPS(F) → 1 as Gini(F) → 1.
(a) Suppose that F is a continuous distribution function with Lorenz curve
L_F(t) = (1/μ(F)) ∫₀ᵗ F^{−1}(s) ds where μ(F) = ∫₀¹ F^{−1}(s) ds.
Show that MPS(F) satisfies the condition
L_F′(MPS(F)) = 1
where L_F′(t) is the derivative (with respect to t) of the Lorenz curve.
(b) Given observations X1, · · · , Xn from a distribution F, a substitution principle estimate
of MPS(F) is
MPS(F̂) = (1/n) ∑_{i=1}^n I(X_i < X̄)
A sample of 200 incomes is given on Quercus in a file incomes.txt. Using these data
compute an estimate of MPS(F) and use the jackknife to give an estimate of its standard
error.
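A sketch of the part (b) computation. The simulated log-normal sample below is only a stand-in for the course data, which would instead be read with something like x <- scan("incomes.txt"):

```r
# Substitution principle estimate of MPS(F): fraction of observations
# strictly below the sample mean.
mps.hat <- function(x) mean(x < mean(x))

# Jackknife standard error: recompute the estimate with each
# observation deleted, then combine the leave-one-out values.
jackknife.se <- function(x, theta) {
  n <- length(x)
  loo <- sapply(seq_len(n), function(i) theta(x[-i]))
  sqrt((n - 1) / n * sum((loo - mean(loo))^2))
}

# For the assignment, replace this with: x <- scan("incomes.txt")
set.seed(1)
x <- exp(rnorm(200, mean = 10, sd = 0.8))  # stand-in incomes
mps.hat(x)
jackknife.se(x, mps.hat)
```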
(c) Suppose that you know (or are willing to assume) that the data come from a log-normal
distribution – that is, ln(X₁), · · · , ln(Xₙ) are independent N(μ, σ²) random variables. Show
that
E_F(X_i) = exp(μ) exp(σ²/2)
and so
MPS(F) = P_F(X_i < E_F(X_i)) = P_F(ln(X_i) < ln(E_F(X_i))) = Φ(σ/2).
(Hint: Evaluate E_F(X_i) as E[exp(μ + σY)] where Y ~ N(0, 1).)
(d) Using the data used in part (b), compute an estimate of MPS(F) using the log-normal
assumption and give an estimate of its standard error. How do these compare to the estimates
in part (b)? Does the log-normal assumption seem to be valid for these data? (Hint: An
estimate of σ² is simply the sample variance of ln(x₁), · · · , ln(xₙ). For the standard error,
you can use the Delta Method or the jackknife or both!)
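Under the log-normal model, MPS(F) = Φ(σ/2), so parts (c)–(d) reduce to plugging in σ̂. A sketch with a delta-method standard error (again using a simulated stand-in for incomes.txt):

```r
# Log-normal MPS estimate: pnorm(sigma.hat/2), with sigma.hat^2 the
# sample variance of the log-incomes.
set.seed(1)
x <- exp(rnorm(200, mean = 10, sd = 0.8))  # stand-in for scan("incomes.txt")
y <- log(x)
n <- length(y)
s2 <- var(y)                  # estimate of sigma^2
mps.ln <- pnorm(sqrt(s2) / 2)

# Delta method: MPS = g(sigma^2) with g(v) = pnorm(sqrt(v)/2), so
# g'(v) = dnorm(sqrt(v)/2) / (4*sqrt(v)), and for normal log-data
# Var(s2) is approximately 2*sigma^4/(n - 1).
g.prime <- dnorm(sqrt(s2) / 2) / (4 * sqrt(s2))
se.delta <- g.prime * sqrt(2 * s2^2 / (n - 1))
c(estimate = mps.ln, se = se.delta)
```

Comparing this standard error with the jackknife standard error of the nonparametric estimate in part (b) is one informal check on the log-normal assumption.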
Supplemental problems (not to be handed in):
3. Suppose that X₁, · · · , Xₙ are independent random variables with common density f(x − θ) where f is symmetric around 0 (i.e. f(−x) = f(x)) and θ is an unknown location parameter. If Var(X_i) is finite then the sample mean X̄ will be a reasonable estimator of θ; however, if f has heavy tails then X̄ will be less efficient than other estimators of θ, for example, the sample median.
A useful alternative to the sample mean is the α-trimmed mean, which trims the smallest
α and largest α fractions of the data and averages the middle order statistics. Specifically,
if we define r = ⌊nα⌋ (where ⌊x⌋ is the integer part of x) then the α-trimmed mean, θ̂(α), is defined by
θ̂(α) = (1/(n − 2r)) ∑_{i=r+1}^{n−r} X_{(i)}
where X_{(1)} ≤ · · · ≤ X_{(n)} are the order statistics.
(a) Suppose (for simplicity) that ⌊nα⌋ = ⌊(n − 1)α⌋ and define θ̂_i(α) to be the α-trimmed mean with X_{(i)} deleted from the sample. Find expressions for θ̂₁(α), · · · , θ̂ₙ(α); in particular, note that
θ̂₁(α) = · · · = θ̂_r(α) and θ̂_{n−r+1}(α) = · · · = θ̂ₙ(α)
(b) Using the setup in part (a), show that the pseudo-values {Φ_i} are given by
Φ_i = [(n − 1)X_{(r+1)} − 2r θ̂(α)] / (n − 1 − 2r) for i = 1, · · · , r + 1
Φ_i = [(n − 1)X_{(i)} − 2r θ̂(α)] / (n − 1 − 2r) for i = r + 2, · · · , n − r − 1
Φ_i = [(n − 1)X_{(n−r)} − 2r θ̂(α)] / (n − 1 − 2r) for i = n − r, · · · , n
and give a formula for the jackknife estimator of variance of bθ(α). (Think about how you
might use this variance estimator to choose an “optimal” value of r.)
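As a numerical companion to problem 3, the trimmed mean and its jackknife variance can be computed directly from the pseudo-values Φ_i = nθ̂ − (n − 1)θ̂_{−i}, without using the closed-form expressions part (b) asks for:

```r
# alpha-trimmed mean: average of the order statistics after removing
# the r = floor(n*alpha) smallest and largest observations
trim.mean <- function(x, alpha) {
  n <- length(x); r <- floor(n * alpha)
  mean(sort(x)[(r + 1):(n - r)])
}

# Jackknife variance estimate via pseudo-values
jack.var <- function(x, alpha) {
  n <- length(x)
  theta <- trim.mean(x, alpha)
  loo <- sapply(seq_len(n), function(i) trim.mean(x[-i], alpha))
  phi <- n * theta - (n - 1) * loo   # pseudo-values
  var(phi) / n
}

set.seed(2)
x <- rt(100, df = 3)    # heavy-tailed sample
trim.mean(x, 0.1)
jack.var(x, 0.1)
```

Comparing jack.var(x, alpha) over a grid of alpha values is one way to think about choosing an “optimal” amount of trimming.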
4. Suppose that θ̂₁ and θ̂₂ are unbiased estimators of a parameter θ and consider estimators of the form
θ̃ = a θ̂₁ + (1 − a) θ̂₂.
(a) Show that θ̃ is unbiased for any a.
(b) Find the value of a that minimizes Var(θ̃) in terms of Var(θ̂₁), Var(θ̂₂), and Cov(θ̂₁, θ̂₂). Under what conditions would a = 1? Can a be greater than 1 or less than 0?
5. A histogram is a very simple example of a density estimator. For a sample X1, · · · , Xn
from a continuous distribution with density f(x), we define breakpoints a0, · · · , ak satisfying
a₀ < min(X₁, · · · , Xₙ) < a₁ < a₂ < · · · < a_{k−1} < max(X₁, · · · , Xₙ) < a_k
and define, for x ∈ [a_{j−1}, a_j),
f̂(x) = (1/(n(a_j − a_{j−1}))) ∑_{i=1}^n I(a_{j−1} ≤ X_i < a_j)
with f̂(x) = 0 for x < a₀ and x ≥ a_k.
(a) Show that f̂ is a density function.
(b) For a given value of x, evaluate the mean and variance of f̂(x).
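Parts (a) and (b) of problem 5 can be checked numerically; the sketch below builds the histogram estimate for arbitrary breakpoints and verifies that it integrates to 1:

```r
# Histogram density estimate on breakpoints a_0 < ... < a_k:
# f.hat(x) = count in bin / (n * bin width) on each [a_{j-1}, a_j)
hist.density <- function(x, a) {
  n <- length(x)
  counts <- sapply(seq_len(length(a) - 1),
                   function(j) sum(a[j] <= x & x < a[j + 1]))
  counts / (n * diff(a))
}

set.seed(3)
x <- rnorm(500)
# Breakpoints straddling the sample, as the problem requires
a <- seq(min(x) - 0.1, max(x) + 0.1, length.out = 12)
f <- hist.density(x, a)
sum(f * diff(a))   # integrates to exactly 1
```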
(c) What conditions on a₀, · · · , a_k are needed for the bias and variance of f̂(x) to go to 0 as n → ∞?
6. Another measure of inequality based on the Lorenz curve is the Pietra index, defined by
P(F) = max_{0≤t≤1} {t − L_F(t)}
where L_F(t) is the Lorenz curve.
(a) Show that g(t) = t − L_F(t) is maximized at t satisfying F^{−1}(t) = μ(F).
(b) Using the result of part (a), show that
P(F) = E_F[|X − μ(F)|] / (2μ(F)).
(You may assume that F has a density f.)
(c) Give a substitution principle estimator for the Pietra index P(F) based on the empirical
distribution function of X1, · · · , Xn. Using the data in Problem 2, compute an estimate of
P(F) and use the jackknife to compute an estimate of its standard error.
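For part (c), substituting the empirical distribution function into the part (b) identity gives the estimator (1/n)∑|X_i − X̄| / (2X̄). A sketch, with a simulated stand-in for the incomes data:

```r
# Substitution principle estimate of the Pietra index:
# P.hat = mean(|X_i - X.bar|) / (2 * X.bar)
pietra.hat <- function(x) mean(abs(x - mean(x))) / (2 * mean(x))

# Jackknife standard error via leave-one-out re-estimates
jackknife.se <- function(x, theta) {
  n <- length(x)
  loo <- sapply(seq_len(n), function(i) theta(x[-i]))
  sqrt((n - 1) / n * sum((loo - mean(loo))^2))
}

# For the assignment, replace this with: x <- scan("incomes.txt")
set.seed(1)
x <- exp(rnorm(200, mean = 10, sd = 0.8))  # stand-in incomes
pietra.hat(x)
jackknife.se(x, pietra.hat)
```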
