The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing substantial biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded, achieves the minimum variance for an unbiased estimator, and we can compute calibrated estimates of the variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy to implement method for log-likelihood evaluation when exact techniques are not available.

## Author summary

Researchers often validate scientific hypotheses by comparing data with the predictions of a mathematical or computational model. This comparison can be quantified by the ‘log-likelihood’, a number that captures how well the model explains the data. However, for complex models common in neuroscience and computational biology, obtaining exact formulas for the log-likelihood can be difficult. Instead, the log-likelihood is usually approximated by simulating synthetic observations from the model (‘sampling’), and seeing how often the simulated data match the actual observations. To reach correct scientific conclusions, it is crucial that the log-likelihood estimates produced by such sampling methods are accurate (unbiased). Here, we introduce inverse binomial sampling (IBS), a method which differs from traditional approaches in that the number of samples drawn from the model is not fixed, but adaptively adjusted in a simple way. For each data point, IBS samples from the model until it matches the observation. We show that IBS is unbiased and has other desirable statistical properties, both theoretically and via empirical validation on three case studies from computational and cognitive neuroscience. Across all examples, IBS outperforms fixed sampling methods, demonstrating the utility of IBS as a practical, robust, and easy to implement method for log-likelihood evaluation.

**Citation: **van Opheusden B, Acerbi L, Ma WJ (2020) Unbiased and efficient log-likelihood estimation with inverse binomial sampling. PLoS Comput Biol 16(12): e1008483. https://doi.org/10.1371/journal.pcbi.1008483

**Editor: **Daniele Marinazzo, Ghent University, BELGIUM

**Received: **March 9, 2020; **Accepted: **October 30, 2020; **Published: ** December 23, 2020

**Data Availability: **Code and data to generate the figures in the paper are available at https://github.com/basvanopheusden/ibs-development. The IBS toolbox for efficient and unbiased log-likelihood estimation is available at https://github.com/lacerbi/ibs.

**Funding: **This work was supported by NSF grant IIS-1344256 and NIH grant R01MH118925 to W.J.M. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

This is a *PLOS Computational Biology* Methods paper.

## Introduction

The *likelihood function* is one of the most important mathematical objects for modern statistical inference. Briefly, the likelihood function measures how well a model with a given set of parameters can explain an observed data set. For a data set of discrete observations, the likelihood has the intuitive interpretation of the probability that a random sample generated from the model matches the data, for a given setting of the model parameters.

In many scientific disciplines, such as computational neuroscience and cognitive science, computational models are used to give a precise quantitative form to scientific hypotheses and theories. Statistical inference then plays at least two fundamental roles for scientific discovery. First, our goal may be *parameter estimation* for a model of interest. Parameter values may have a significance in themselves, for example we may be looking for differences in parameters between distinct experimental conditions in a clinical or behavioral study. Second, we may be considering a number of competing scientific hypotheses, instantiated by different models, and we want to evaluate which model ‘best’ captures the data according to some criteria, such as *explanation* (what evidence do the data provide in favor of each model?) and *prediction* (which model best predicts new observations?).

Crucially, the likelihood function is a key element for both parameter estimation and model evaluation. A principled method to find best-fitting model parameters for a given data set is maximum-likelihood estimation (MLE), which entails optimizing the likelihood function over the parameter space [1]. Other common parameter estimation methods, such as maximum-a-posteriori (MAP) estimation or full or approximate Bayesian inference of posterior distributions, still involve the likelihood function [2]. Moreover, almost all model comparison metrics commonly used for scientific model evaluation are based on likelihood computations, from predictive metrics such as Akaike’s information criterion (AIC; [3]), the deviance information criterion (DIC; [4]), the widely applicable information criterion (WAIC; [5]), and leave-one-out cross-validation [6]; to evidence-based metrics such as the marginal likelihood [7] and (loose) approximations thereof, such as the Bayesian information criterion (BIC; [8]) or the Laplace approximation [7].
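As a concrete illustration of how two of these metrics are computed (a generic sketch, not code from the paper; the numbers are made up for illustration), both AIC and BIC penalize a model’s maximized log-likelihood by its number of parameters:

```python
import math

def aic(max_loglik, n_params):
    # Akaike's information criterion: lower is better
    return 2 * n_params - 2 * max_loglik

def bic(max_loglik, n_params, n_obs):
    # Bayesian information criterion: lower is better
    return n_params * math.log(n_obs) - 2 * max_loglik

# Hypothetical example: model A fits better but has more parameters
aic_a = aic(-120.0, 5)   # 250.0
aic_b = aic(-123.0, 3)   # 252.0
bic_a = bic(-120.0, 5, 100)
bic_b = bic(-123.0, 3, 100)
```

Note that with these illustrative numbers AIC prefers the better-fitting model A, while BIC’s harsher complexity penalty prefers the simpler model B; both metrics require the maximized log-likelihood as input, which is exactly the quantity that is hard to obtain for simulator-based models.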

However, many complex computational models, such as those developed in computational biology [9–11], neuroscience [12, 13] and cognitive science [14], take the form of a generative model or *simulator*, that is, an algorithm which, given some context information and parameter settings, returns one or more simulated observations (a synthetic data set). In those cases, the likelihood is often impossible to calculate analytically, and even when the likelihood might be available in theory, the numerical calculations needed to obtain it might be overwhelmingly expensive and intractable in practice. In such situations, the only thing one can do is run the model to simulate observations (‘samples’). In the absence of a likelihood function, common approaches to ‘likelihood-free inference’ generally try to match summary statistics of the data with summary statistics of simulated observations [15, 16].

In this paper, we ask instead the question of whether we can use samples from a simulator model to *directly* estimate the likelihood of the full data set, without resorting to summary statistics, in a ‘correct’ and ‘efficient’ manner, for some specific definition of these terms. The answer is *yes*, as long as we use the right *sampling method*.

In brief, a sampling method consists of a ‘sampling policy’ (a rule that determines how long to keep drawing samples for) and an ‘estimator’ which converts the samples to a real-valued number. To estimate the likelihood of a single observation (e.g., the response of a participant on a single trial of a behavioral experiment), the most obvious sampling policy is to draw a fixed number of samples from the simulator model, and the simplest estimator is the fraction of samples that match the observation (or are ‘close enough’ to it, for continuous observations). However, most basic applications, such as computing the likelihood of multiple observations, require one to estimate the logarithm of the likelihood, or log-likelihood (see Methods for the underlying technical reasons). The ‘fixed sampling’ method described above cannot provide unbiased estimates of the log-likelihood (see Methods). Such bias vanishes in the asymptotic limit of infinite samples, but drawing samples from the model can be computationally expensive, especially if the simulator model is complex. In practice, the bias introduced by any fixed sampling method can translate to considerable biases in estimates of model parameters, or even reverse the outcome of model comparison analyses. In other words, using poor sampling methods can cause researchers to draw conclusions about scientific hypotheses which are not supported by their data.
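The bias of fixed sampling is easy to see numerically. In this hypothetical sketch (values chosen for illustration, not taken from the paper), we estimate log *p* for a single observation by drawing *M* samples, taking the fraction of matches, and applying a common floor of half a count to avoid taking the logarithm of zero; averaging many such estimates reveals a systematic downward bias:

```python
import math
import random

random.seed(1)
p = 0.2          # true probability that a sample matches the observation
M = 10           # fixed number of samples per log-likelihood estimate
n_rep = 200_000  # Monte Carlo repetitions to approximate the expectation

true_ll = math.log(p)
total = 0.0
for _ in range(n_rep):
    # draw M samples; count how many 'match' the observation
    matches = sum(random.random() < p for _ in range(M))
    # floor zero-match counts at 0.5 to avoid log(0)
    total += math.log(max(matches, 0.5) / M)

est_ll = total / n_rep  # sits well below true_ll: the estimator is biased
```

With these settings the average estimate comes out noticeably below the true value log(0.2) ≈ −1.61; shrinking the floor or changing it does not remove the bias, only reshapes it, which is the core problem the paper addresses.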

In this work, we introduce *inverse binomial sampling* (IBS) as a powerful and simple technique for correctly and efficiently estimating log-likelihoods of simulator-based models. Crucially, IBS is a sampling method that provides uniformly unbiased estimates of the log-likelihood [17, 18] and calibrated estimates of their variance, which is also uniformly bounded.
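To make the idea concrete, here is a minimal illustrative sketch of an IBS estimate in Python (our own toy code, not the official toolbox; the `max_samples` cutoff is an added practical guard, not part of the pure estimator). For each observation, we draw samples until one matches; if the match occurs at draw *K*, the log-likelihood estimate is minus the partial harmonic sum, −(1 + 1/2 + … + 1/(K−1)):

```python
import math
import random

def ibs_loglik(simulate, observation, max_samples=10_000):
    """Single IBS estimate of log p(observation).

    Draws from the simulator until a sample matches the observation;
    if the match occurs at draw K, returns -(1 + 1/2 + ... + 1/(K-1)),
    an unbiased estimate of the log-likelihood of that observation.
    """
    estimate = 0.0
    k = 1
    while simulate() != observation:
        estimate -= 1.0 / k
        k += 1
        if k > max_samples:  # practical guard for near-zero likelihoods
            break
    return estimate

# Toy check of unbiasedness: a 'simulator' that matches with probability p
random.seed(0)
p = 0.3
n_rep = 100_000
mean_est = sum(ibs_loglik(lambda: random.random() < p, True)
               for _ in range(n_rep)) / n_rep
# the mean estimate approaches log(p) as n_rep grows
```

For a full data set, one such estimate is computed per observation and the results are summed; the number of samples drawn per observation is random (on average 1/*p*), which is the adaptive aspect that distinguishes IBS from fixed sampling.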

We note that the problem of estimating functions *f*(*p*) from observations of a Bernoulli distribution with parameter *p* has been studied for mostly theoretical reasons in the mid-20th century, with major contributions represented by [17–21]. These works have largely focused on deriving the set of functions *f*(*p*) for which an unbiased estimate exists, and demonstrating that for those functions, the inverse sampling policy (see Methods) is in a precise sense ‘efficient’. Our main contribution here is to demonstrate that inverse binomial sampling provides a practically and theoretically efficient solution for a common problem in computational modeling, namely likelihood-free inference of complex models. To back up our claims, we provide theoretical arguments for the efficiency of IBS and a practical demonstration of its value for log-likelihood estimation and fitting of simulation-based models, in particular those used in computational cognitive science. We note that [22] had previously proposed inverse binomial sampling as a method for likelihood-free inference for certain econometric models, but did not present an empirical assessment of the quality of the estimation, and to our knowledge it has not led to further adoption of IBS.

The paper is structured as follows. In the Methods, after setting the stage with useful “Definitions and notation”, we describe in more detail the issues with the fixed sampling method and why they cannot be fixed (“Why fixed sampling fails”). We then present a series of arguments for why IBS solves these issues, and in particular why being unbiased is of particular relevance here (“Is inverse binomial sampling really better?”). In the Results, we present an empirical comparison of IBS and fixed sampling in the setting of maximum-likelihood estimation. As case studies, we take three model-fitting problems of increasing complexity from computational cognitive science: an ‘orientation discrimination’ task, a ‘change localization’ task, and a complex sequential decision-making task. In all problems, IBS generally produces lower error in the estimated parameters than fixed sampling with the same average number of samples. IBS also returns solutions that are very close in value to the true maximum log-likelihood. We conclude by discussing further applications and extensions of IBS in the Discussion. Our theoretical analyses and empirical results demonstrate the potential of IBS as a practical, robust, and easy-to-implement method for log-likelihood evaluation when exact or numerical solutions are unavailable.

Implementations of IBS with tutorials and examples are available at the following link: https://github.com/lacerbi/ibs.