Category: What type of paper is this? A measurement paper? An analysis of an existing system? A description of a research prototype?
Context: Which other papers is it related to? Which theoretical bases were used to analyse the problem?
Correctness: Do the assumptions appear to be valid?
Contributions: What are the paper’s main contributions?
Clarity: Is the paper well written?

2 (~ 1 hour)

Read the paper with greater care, but ignore details such as proofs. It helps to jot down the key points, or to make comments in the margins, as you read.

Look carefully at the figures, diagrams and other illustrations in the paper. Pay special attention to graphs. Are the axes properly labelled? Are results shown with error bars, so that conclusions are statistically significant? Common mistakes like these will separate rushed, shoddy work from the truly excellent.
Remember to mark relevant unread references for further reading (this is a good way to learn more about the background of the paper).

After this pass, you should be able to summarize the main thrust of the paper, with supporting evidence, to someone else. This level of detail is appropriate for a paper that interests you but does not lie in your research specialty.

3

The key to the third pass is to attempt to virtually re-implement the paper: that is, making the same assumptions as the authors, re-create the work.

2021-05-23

Machine Learning

Naive Bayes

Bayesian posterior probability

$P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$

Naive Bayes

Under the conditional independence assumption, this simplifies to

$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$

The denominator is constant for a given input, so

$\begin{aligned}P(y \mid x_1, \dots, x_n) &\propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\\ &\Downarrow\\ \hat{y} &= \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)\end{aligned}$

The different Naive Bayes algorithms differ mainly in how they compute $P(x_i \mid y)$.
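The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming categorical features whose conditional probabilities are already known. The function name `naive_bayes_predict`, the table layout, and the toy numbers are all hypothetical. Scores are summed in log space to avoid underflow when many probabilities are multiplied.

```python
import math

def naive_bayes_predict(priors, likelihoods, x):
    """Return argmax_y P(y) * prod_i P(x_i | y), computed in log space.

    priors:      dict mapping class -> P(y)
    likelihoods: dict mapping class -> list of dicts (one per feature),
                 each mapping a feature value -> P(x_i = value | y)
    x:           list of observed feature values
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            # Conditional independence: each feature contributes one factor.
            score += math.log(likelihoods[y][i][value])
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# Hypothetical toy tables: two classes, two binary features.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"yes": 0.3, "no": 0.7}],
    "ham":  [{"yes": 0.1, "no": 0.9}, {"yes": 0.6, "no": 0.4}],
}
prediction = naive_bayes_predict(priors, likelihoods, ["yes", "no"])
```

A probability of exactly zero would make `math.log` fail, which is one reason smoothed estimates are used in practice.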

Common Naive Bayes algorithms

Gaussian Naive Bayes

$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)$
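As a rough sketch of the Gaussian likelihood above: the mean and variance are estimated from the training values of one feature within one class, then plugged into the density. The helper names `fit_gaussian` and `gaussian_likelihood` and the sample values are assumptions made for illustration. In practice the parameters are estimated separately for every feature and class pair.

```python
import math

def fit_gaussian(values):
    """Maximum-likelihood mean and variance of one feature within one class."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var

def gaussian_likelihood(x_i, mu_y, var_y):
    """Density of x_i under N(mu_y, var_y), used as P(x_i | y)."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var_y)
    return coeff * math.exp(-((x_i - mu_y) ** 2) / (2.0 * var_y))

# Hypothetical values of one feature for the samples of one class.
feature_values = [1.0, 2.0, 3.0]
mu, var = fit_gaussian(feature_values)
density = gaussian_likelihood(2.5, mu, var)
```

A feature that is constant within a class gives a variance of zero, so a real implementation would add a small positive term to the variance before dividing.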

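Multinomial Naive Bayes [widely used in text processing]

For each class $y$, the feature probabilities are described by the vector $\theta_y = (\theta_{y1},\ldots,\theta_{yn})$, where $n$ is the number of features and $\theta_{yi} = P(x_i \mid y)$. The smoothed estimate is

$\hat{\theta}_{yi} = \frac{ N_{yi} + \alpha}{N_y + \alpha n}$

where

$N_{yi} = \sum_{x \in T} x_i$: the number of times feature $i$ appears in the samples of class $y$ in the training set $T$

$N_{y} = \sum_{i=1}^{n} N_{yi}$: the total count of all features for class $y$

$\alpha$ is the smoothing parameter; it keeps features that never appear in a class from getting zero probability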
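The smoothed estimate can be checked with a small worked example. This is a minimal sketch, assuming each training sample of class $y$ is a vector of feature counts. The function name `multinomial_theta` and the sample counts are hypothetical.

```python
def multinomial_theta(class_samples, alpha=1.0):
    """Smoothed estimate theta_yi = (N_yi + alpha) / (N_y + alpha * n).

    class_samples: list of count vectors, one per training sample of class y
    alpha:         smoothing parameter
    """
    n = len(class_samples[0])  # number of features
    # N_yi: total count of feature i over all samples of the class.
    n_yi = [sum(sample[i] for sample in class_samples) for i in range(n)]
    # N_y: total count of all features in the class.
    n_y = sum(n_yi)
    return [(count + alpha) / (n_y + alpha * n) for count in n_yi]

# Two hypothetical documents of one class over a three-word vocabulary.
samples = [[2, 0, 1], [1, 0, 0]]
theta = multinomial_theta(samples, alpha=1.0)
```

Here $N_{yi} = (3, 0, 1)$ and $N_y = 4$, so with $\alpha = 1$ the estimate is $(4/7, 1/7, 2/7)$: the unseen second feature still gets a nonzero probability, and the vector sums to one.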

Two basic assumptions

Features are conditionally independent given the class
Features are equally important


2021-05-23

Machine Learning

Gradient Boosting

XGBoost

Model parameters and the objective function (training loss plus regularization term)

$\text{obj} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{k=1}^t\Omega(f_k)$

Additive training:

$\begin{split}\hat{y}_i^{(0)} &= 0\\ \hat{y}_i^{(1)} &= f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i)\\ \hat{y}_i^{(2)} &= f_1(x_i) + f_2(x_i)= \hat{y}_i^{(1)} + f_2(x_i)\\ &\dots\\ \hat{y}_i^{(t)} &= \sum_{k=1}^t f_k(x_i)= \hat{y}_i^{(t-1)} + f_t(x_i)\end{split}$
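The recursion above can be sketched as a running sum: each new tree adds its output to the prediction from the previous step. In this minimal illustration the "trees" are hypothetical stand-in functions, not fitted models, and `additive_predictions` is an assumed helper name.

```python
def additive_predictions(trees, x):
    """Return [y_hat^(0), y_hat^(1), ..., y_hat^(t)] for a single input x."""
    y_hat = 0.0            # y_hat^(0) = 0
    history = [y_hat]
    for f in trees:
        y_hat = y_hat + f(x)   # y_hat^(t) = y_hat^(t-1) + f_t(x)
        history.append(y_hat)
    return history

# Hypothetical stand-ins for three fitted trees.
trees = [lambda x: 0.5 * x, lambda x: -0.1, lambda x: 0.2]
history = additive_predictions(trees, 2.0)
```

The last entry of `history` equals the sum of all tree outputs, which matches the closed form $\hat{y}_i^{(t)} = \sum_{k=1}^t f_k(x_i)$.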

Taylor expansion (second order)

$\text{obj}^{(t)} = \sum_{i=1}^n [l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t) + \mathrm{constant}$

where:

$\begin{split}g_i &= \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})\\ h_i &= \partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)})\end{split}$
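To make $g_i$ and $h_i$ concrete, here is a sketch for two common losses. The choice of losses and the function names are assumptions for illustration. For the logistic loss, $\hat{y}$ is taken to be a raw score (logit) and $y \in \{0, 1\}$.

```python
import math

def squared_error_grad_hess(y, y_hat):
    """g and h for l(y, y_hat) = (y - y_hat)^2."""
    g = 2.0 * (y_hat - y)   # first derivative with respect to y_hat
    h = 2.0                 # second derivative with respect to y_hat
    return g, h

def logistic_grad_hess(y, y_hat):
    """g and h for the logistic loss, with y in {0, 1} and y_hat a raw score."""
    p = 1.0 / (1.0 + math.exp(-y_hat))  # predicted probability
    g = p - y
    h = p * (1.0 - p)
    return g, h

g_sq, h_sq = squared_error_grad_hess(3.0, 1.0)
g_log, h_log = logistic_grad_hess(1, 0.0)
```

Any loss that is twice differentiable in $\hat{y}$ can be plugged in the same way, since the expansion only consumes these two numbers per sample.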

After removing the constant terms, the objective at step $t$ becomes

$\sum_{i=1}^n [g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t)$

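This is the objective function for the $t$-th tree. It depends on the loss only through the first- and second-order gradients, which is why custom loss functions can be supported.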

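The regularization term still has to be made explicit.

Writing the tree as $f_t(x) = w_{q(x)}$, where $q$ maps a sample to a leaf and $w$ is the vector of leaf weights, and taking $\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^T w_j^2$, the objective for the $t$-th tree becomes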

$\begin{split}\text{obj}^{(t)} &\approx \sum_{i=1}^n [g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^T w_j^2\\ &= \sum^T_{j=1} [(\sum_{i\in I_j} g_i) w_j + \frac{1}{2} (\sum_{i\in I_j} h_i + \lambda) w_j^2 ] + \gamma T\end{split}$

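where $I_j = \{i \mid q(x_i)=j\}$ is the set of indices of the samples assigned to the $j$-th leaf, and $T$ is the number of leaves in the $t$-th tree. Defining $G_j = \sum_{i\in I_j} g_i$ and $H_j = \sum_{i\in I_j} h_i$, this can be rewritten as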

$\text{obj}^{(t)} = \sum^T_{j=1} [G_jw_j + \frac{1}{2} (H_j+\lambda) w_j^2] +\gamma T$

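Each term is a quadratic in $w_j$, so minimizing over $w_j$ gives the optimal leaf weights and the corresponding objective value: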

$\begin{split}w_j^\ast &= -\frac{G_j}{H_j+\lambda}\\ \text{obj}^\ast &= -\frac{1}{2} \sum_{j=1}^T \frac{G_j^2}{H_j+\lambda} + \gamma T\end{split}$
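The closed-form solution can be verified numerically. The sketch below assumes the tree structure is already fixed, so each sample has a known leaf index. The function name `optimal_weights_and_objective` and the sample gradients are hypothetical.

```python
def optimal_weights_and_objective(g, h, leaf_index, num_leaves, lam=1.0, gamma=0.0):
    """Compute w_j* = -G_j / (H_j + lambda) and obj* for a fixed tree structure.

    g, h:       per-sample first- and second-order gradients
    leaf_index: leaf j that each sample is assigned to, i.e. q(x_i)
    """
    G = [0.0] * num_leaves
    H = [0.0] * num_leaves
    for g_i, h_i, j in zip(g, h, leaf_index):
        G[j] += g_i   # G_j: sum of g_i over the samples in leaf j
        H[j] += h_i   # H_j: sum of h_i over the samples in leaf j
    weights = [-G[j] / (H[j] + lam) for j in range(num_leaves)]
    objective = (
        -0.5 * sum(G[j] ** 2 / (H[j] + lam) for j in range(num_leaves))
        + gamma * num_leaves
    )
    return weights, objective

# Hypothetical gradients for three samples split across two leaves.
g = [-4.0, -2.0, 2.0]
h = [2.0, 2.0, 2.0]
weights, objective = optimal_weights_and_objective(
    g, h, leaf_index=[0, 0, 1], num_leaves=2, lam=1.0, gamma=0.0
)
```

With these numbers $G = (-6, 2)$ and $H = (4, 2)$, so the weights are $6/5$ and $-2/3$. A lower $\text{obj}^\ast$ indicates a better tree structure, and a larger $\lambda$ shrinks the leaf weights toward zero.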

Reference

xgboost website-intro
