Below I share a section from the paper by Shihao Gu et al.; I found it moving when I read it, so I am noting it down here.
A number of aspects of empirical asset pricing make it a particularly attractive field for analysis with
machine learning methods.

  1. Two main research agendas have monopolized modern empirical asset pricing research. The
    first seeks to describe and understand differences in expected returns across assets. The second
    focuses on dynamics of the aggregate market equity risk premium. Measurement of an asset’s risk
    premium is fundamentally a problem of prediction—the risk premium is the conditional expectation
    of a future realized excess return. Machine learning, whose methods are largely specialized for
    prediction tasks, is thus ideally suited to the problem of risk premium measurement.
  2. The collection of candidate conditioning variables for the risk premium is large. The profession
    has accumulated a staggering list of predictors that various researchers have argued possess
    forecasting power for returns. The number of stock-level predictive characteristics reported in the
    literature numbers in the hundreds and macroeconomic predictors of the aggregate market number in
    the dozens. Additionally, predictors are often close cousins and highly correlated. Traditional
    prediction methods break down when the predictor count approaches the observation count or
    predictors are highly correlated. With an emphasis on variable selection and dimension reduction
    techniques, machine learning is well suited for such challenging prediction problems by reducing
    degrees of freedom and condensing redundant variation among predictors.
  3. Further complicating the problem is ambiguity regarding functional forms through which the
    high-dimensional predictor set enters into risk premia. Should they enter linearly? If nonlinearities
    are needed, which form should they take? Must we consider interactions among predictors? Such
    questions rapidly proliferate the set of potential model specifications. The theoretical literature offers
    little guidance for winnowing the list of conditioning variables and functional forms. Three aspects
    of machine learning make it well suited for problems of ambiguous functional form. The first is its
    diversity. As a suite of dissimilar methods it casts a wide net in its specification search. Second, with
    methods ranging from generalized linear models to regression trees and neural networks, machine
    learning is explicitly designed to approximate complex nonlinear associations. Third, parameter
    penalization and conservative model selection criteria complement the breadth of functional forms
    spanned by these methods in order to avoid overfit biases and false discovery.
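
The second point above, that traditional methods break down when the predictor count approaches the observation count or predictors are highly correlated, can be illustrated with a small simulation. This is only a sketch under my own assumptions, not the paper's procedure: the data are simulated, the sizes (60 observations, 55 predictors), the pairwise correlation of 0.8, and the penalty value are all hypothetical, and ridge regression is used merely as a simple stand-in for the parameter penalization the passage mentions.

```python
import numpy as np


def simulate(n_obs, n_pred, rng, corr=0.8, noise_sd=1.0):
    """Draw correlated predictors and a return-like target (all values hypothetical)."""
    common = rng.normal(size=(n_obs, 1))        # shared factor -> "close cousins"
    idio = rng.normal(size=(n_obs, n_pred))     # predictor-specific variation
    X = np.sqrt(corr) * common + np.sqrt(1.0 - corr) * idio
    beta = np.zeros(n_pred)
    beta[:3] = 0.5                              # only three predictors truly matter
    y = X @ beta + noise_sd * rng.normal(size=n_obs)
    return X, y


def fit_ols(X, y):
    """Ordinary least squares without an intercept (the data are mean zero)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def fit_ridge(X, y, penalty):
    """Closed-form ridge estimate: (X'X + penalty * I)^{-1} X'y."""
    n_pred = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_pred), X.T @ y)


def mse(X, y, beta):
    """Mean squared prediction error of a linear forecast."""
    return float(np.mean((y - X @ beta) ** 2))


rng = np.random.default_rng(0)
X_train, y_train = simulate(60, 55, rng)    # predictor count close to sample size
X_test, y_test = simulate(5000, 55, rng)    # large hold-out sample for evaluation

beta_ols = fit_ols(X_train, y_train)
beta_ridge = fit_ridge(X_train, y_train, penalty=10.0)

mse_ols = mse(X_test, y_test, beta_ols)
mse_ridge = mse(X_test, y_test, beta_ridge)

print(f"out-of-sample MSE, OLS:   {mse_ols:.2f}")
print(f"out-of-sample MSE, ridge: {mse_ridge:.2f}")
```

In this setup the unpenalized fit uses nearly all of its degrees of freedom on noise, so its out-of-sample error should come out well above that of the penalized fit. In practice the penalty would be tuned on a validation sample rather than fixed by hand as it is here.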



