Optimal kernel choice for large-scale two-sample tests


2020-01-13


Abstract

Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring that the test has high power and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, that is, to minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
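The abstract gives no formulas, so the following is only a minimal sketch of how a linear-time MMD test with kernel selection might look, under stated assumptions. It assumes a Gaussian kernel, the standard paired-sample linear-time estimate of the squared MMD, a normal approximation for the test threshold, and a finite grid of candidate bandwidths scored by the ratio of the statistic to its standard error as a proxy for test power. The function names (`gaussian_kernel`, `linear_time_mmd`, `mmd_test`, `select_bandwidth`) and the `eps` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel evaluated on paired rows a[i], b[i]."""
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_time_mmd(x, y, bandwidth):
    """Return a linear-time estimate of MMD^2 and its standard error.

    x, y: arrays of shape (n, d). Each sample is used in exactly one pair,
    so the cost is linear in n and no kernel matrix is stored.
    """
    n = (min(len(x), len(y)) // 2) * 2  # largest even number of usable rows
    x1, x2 = x[0:n:2], x[1:n:2]
    y1, y2 = y[0:n:2], y[1:n:2]
    h = (gaussian_kernel(x1, x2, bandwidth)
         + gaussian_kernel(y1, y2, bandwidth)
         - gaussian_kernel(x1, y2, bandwidth)
         - gaussian_kernel(x2, y1, bandwidth))
    mmd2 = h.mean()
    std_err = h.std(ddof=1) / np.sqrt(len(h))
    return mmd2, std_err


def mmd_test(x, y, bandwidth, alpha=0.05):
    """Reject p = q when the statistic exceeds a normal-approximation threshold."""
    mmd2, std_err = linear_time_mmd(x, y, bandwidth)
    threshold = NormalDist().inv_cdf(1.0 - alpha) * std_err
    return bool(mmd2 > threshold)


def select_bandwidth(x, y, candidates, eps=1e-8):
    """Pick the candidate bandwidth with the largest mmd2 / std_err ratio.

    Assumption: this ratio is used as a stand-in for test power; eps is a
    hypothetical guard against division by a near-zero standard error.
    """
    ratios = []
    for bw in candidates:
        mmd2, std_err = linear_time_mmd(x, y, bw)
        ratios.append(mmd2 / (std_err + eps))
    return candidates[int(np.argmax(ratios))]
```

In a sketch like this, the bandwidth would normally be selected on one portion of the data and the test run on a separate, held-out portion, so that the selection step does not inflate the Type I error of the final test.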



