Contents
Fundamentals
Chapter 1 Introduction .......... 3
1.1 Machine Learning .......... 3
1.1.1 What Is Machine Learning .......... 3
1.1.2 Basic Tasks of Machine Learning .......... 5
1.1.3 K-Nearest Neighbors: A "Lazy" Learning Method .......... 9
1.2 Probabilistic Machine Learning .......... 11
1.2.1 Why Probabilistic Machine Learning Is Needed .......... 11
1.2.2 What Probabilistic Machine Learning Covers .......... 13
1.3 Further Reading .......... 16
1.4 Exercises .......... 17
Chapter 2 Foundations of Probability and Statistics .......... 19
2.1 Probability .......... 19
2.1.1 Event Spaces and Probability .......... 19
2.1.2 Continuous and Discrete Random Variables .......... 21
2.1.3 Change of Variables .......... 22
2.1.4 Joint, Marginal, and Conditional Distributions .......... 22
2.1.5 Independence and Conditional Independence .......... 24
2.1.6 Bayes' Formula .......... 24
2.2 Common Probability Distributions and Their Numerical Characteristics .......... 25
2.2.1 Common Numerical Characteristics of Random Variables .......... 25
2.2.2 Probability Distributions of Discrete Variables .......... 26
2.2.3 Probability Distributions of Continuous Variables .......... 27
2.3 Statistical Inference .......... 28
2.3.1 Maximum Likelihood Estimation .......... 29
2.3.2 Error .......... 29
2.4 Bayesian Inference .......... 30
2.4.1 Basic Procedure .......... 30
2.4.2 Common Applications and Methods .......... 32
2.4.3 Online Bayesian Inference .......... 33
2.4.4 Conjugate Priors .......... 33
2.5 Fundamentals of Information Theory .......... 35
2.5.1 Entropy .......... 35
2.5.2 Mutual Information .......... 36
2.5.3 Relative Entropy .......... 36
2.6 Exercises .......... 37
Chapter 3 Linear Regression Models .......... 39
3.1 Basic Models .......... 39
3.1.1 Basic Model of Statistical Decision-Making .......... 39
3.1.2 Linear Regression and Least Squares .......... 40
3.1.3 Probabilistic Model and Maximum Likelihood Estimation .......... 42
3.1.4 Linear Regression with Basis Functions .......... 43
3.2 Regularized Linear Regression .......... 44
3.2.1 Ridge Regression .......... 45
3.2.2 Lasso .......... 47
3.2.3 Linear Regression with Lp-Norm Regularization .......... 49
3.3 Bayesian Linear Regression .......... 50
3.3.1 Maximum A Posteriori Estimation .......... 51
3.3.2 Bayesian Predictive Distribution .......... 51
3.3.3 Bayesian Model Selection .......... 54
3.3.4 Empirical Bayes and Relevance Vector Machines .......... 56
3.4 Model Evaluation .......... 57
3.4.1 Evaluation Metrics .......... 57
3.4.2 Cross-Validation .......... 58
3.5 Further Reading .......... 59
3.6 Exercises .......... 60
Chapter 4 Naive Bayes Classifiers .......... 61
4.1 Basic Classification Models .......... 61
4.1.1 Bayes Classifier .......... 62
4.1.2 Kernel Density Estimation .......... 63
4.1.3 Curse of Dimensionality .......... 65
4.2 The Naive Bayes Model .......... 66
4.2.1 Generative Models .......... 66
4.2.2 The Naive Bayes Assumption .......... 67
4.2.3 Maximum Likelihood Estimation .......... 68
4.2.4 Maximum A Posteriori Estimation .......... 69
4.3 Extensions of Naive Bayes .......... 70
4.3.1 Multi-Valued Features .......... 70
4.3.2 Multiclass Classification .......... 71
4.3.3 Continuous Features .......... 72
4.3.4 Semi-Supervised Naive Bayes Classifier .......... 73
4.3.5 Tree-Augmented Naive Bayes Classifier .......... 73
4.4 Analysis of Naive Bayes .......... 74
4.4.1 Decision Boundary .......... 74
4.4.2 Predicted Probabilities .......... 75
4.5 Further Reading .......... 76
4.6 Exercises .......... 77
Chapter 5 Logistic Regression and Generalized Linear Models .......... 78
5.1 Logistic Regression .......... 78
5.1.1 Model Definition .......... 78
5.1.2 Latent-Variable Representation of Logistic Regression .......... 79
5.1.3 Maximum Conditional Likelihood Estimation .......... 80
5.1.4 Regularization Methods .......... 82
5.1.5 Discriminative versus Generative Models .......... 83
5.2 Stochastic Gradient Descent .......... 84
5.2.1 Basic Method .......... 85
5.2.2 Momentum Method .......... 86
5.2.3 AdaGrad Method .......... 86
5.2.4 RMSProp Method .......... 86
5.2.5 Adam Method .......... 87
5.3 Bayesian Logistic Regression .......... 87
5.3.1 Laplace Approximation .......... 87
5.3.2 Predictive Distribution .......... 89
5.4 Generalized Linear Models .......... 89
5.4.1 Exponential Family Distributions .......... 89
5.4.2 Properties of Exponential Family Distributions .......... 91
5.4.3 Generalized Linear Models .......... 92
5.5 Further Reading .......... 93
5.6 Exercises .......... 94
Chapter 6 Deep Neural Networks .......... 95
6.1 Basic Principles of Neural Networks .......... 95
6.1.1 Basic Framework for Nonlinear Learning .......... 95
6.1.2 Perceptron .......... 95
6.1.3 Multilayer Perceptron .......... 97
6.1.4 Backpropagation .......... 98
6.2 Convolutional Neural Networks .......... 101
6.2.1 Basic Components .......... 101
6.2.2 Batch Normalization .......... 104
6.2.3 Residual Networks .......... 105
6.3 Recurrent Neural Networks .......... 106
6.3.1 Basic Principles .......... 107
6.3.2 Long Short-Term Memory Networks .......... 110
6.4 Further Reading .......... 112
6.5 Exercises .......... 113
Chapter 7 Support Vector Machines and Kernel Methods .......... 114
7.1 Hard-Margin Support Vector Machines .......... 114
7.1.1 Decision Boundary .......... 114
7.1.2 Support Vector Machines for Linearly Separable Data .......... 114
7.1.3 Dual Problem of the Hard-Margin Support Vector Machine .......... 116
7.2 Soft-Margin Support Vector Machines .......... 118
7.2.1 Soft Constraints and Loss Functions .......... 118
7.2.2 Dual Problem of the Soft-Margin SVM .......... 120
7.2.3 Support Vector Regression .......... 122
7.3 Kernel Methods .......... 123
7.3.1 Basic Properties of Kernel Functions .......... 123
7.3.2 Representer Theorem .......... 125
7.3.3 Common Kernel Functions .......... 126
7.3.4 Kernel Functions Induced by Probabilistic Generative Models .......... 127
7.3.5 Neural Tangent Kernel .......... 128
7.4 Multiclass Support Vector Machines .......... 129
7.4.1 One-vs-Rest .......... 129
7.4.2 One-vs-One .......... 130
7.4.3 Joint Optimization .......... 130
7.5 Probabilistic Interpretation of Support Vector Machines .......... 131
7.5.1 Platt Calibration .......... 131
7.5.2 Maximum Entropy Discrimination Learning .......... 131
7.6 Further Reading .......... 132
7.7 Exercises .......... 133
Chapter 8 Clustering .......... 134
8.1 The Clustering Problem .......... 134
8.1.1 Task Description .......... 134
8.1.2 Distance Metrics .......... 135
8.2 The K-Means Algorithm .......... 137
8.2.1 Optimization Objective .......... 137
8.2.2 Introduction to the K-Means Algorithm .......... 138
8.2.3 Initialization and Stopping Criteria .......... 139
8.2.4 Model Selection in the K-Means Algorithm .......... 140
8.3 Gaussian Mixture Models .......... 141
8.3.1 Latent Variable Models .......... 142
8.3.2 Mixture Distribution Models .......... 142
8.3.3 Mixture Distribution Models and Clustering .......... 144
8.4 The EM Algorithm .......... 145
8.4.1 The EM Algorithm for Gaussian Mixture Models .......... 147
8.4.2 Convergence of the EM Algorithm .......... 148
8.4.3 Connection Between the EM Algorithm and K-Means .......... 149
8.5 Evaluation Metrics .......... 149
8.5.1 External Evaluation Metrics .......... 149
8.5.2 Internal Evaluation Metrics .......... 150
8.6 Further Reading .......... 151
8.7 Exercises .......... 151
Chapter 9 Dimensionality Reduction .......... 153
9.1 The Dimensionality Reduction Problem .......... 153
9.2 Principal Component Analysis .......... 154
9.2.1 Basic Principles .......... 154
9.2.2 High-Dimensional PCA .......... 156
9.3 Principles of Principal Component Analysis .......... 156
9.3.1 Maximizing Variance .......... 157
9.3.2 Minimizing Reconstruction Error .......... 158
9.3.3 Probabilistic Principal Component Analysis .......... 159
9.4 Autoencoders .......... 160
9.4.1 Basic Autoencoder Model .......... 160
9.4.2 Sparse Autoencoders .......... 161
9.4.3 Denoising Autoencoders .......... 162
9.5 Locally Linear Embedding .......... 162
9.5.1 Basic Procedure of Locally Linear Embedding .......... 162
9.5.2 Optimal Local Linear Reconstruction .......... 164
9.5.3 Embeddings That Preserve the Optimal Local Reconstruction .......... 165
9.5.4 Parameter Selection .......... 166
9.6 Word Embeddings .......... 167
9.6.1 Latent Semantic Analysis .......... 167
9.6.2 Neural Language Models .......... 168
9.7 Further Reading .......... 170
9.8 Exercises .......... 171
Chapter 10 Ensemble Learning .......... 173
10.1 Decision Trees .......... 173
10.1.1 The ID3 Algorithm .......... 174
10.1.2 The C4.5 Algorithm .......... 175
10.1.3 The CART Algorithm .......... 175
10.2 Bagging .......... 176
10.2.1 Basic Method .......... 176
10.2.2 Random Forests .......... 177
10.3 Boosting .......... 178
10.3.1 The AdaBoost Algorithm .......... 178
10.3.2 AdaBoost from an Optimization Perspective .......... 179
10.3.3 Gradient Boosting .......... 182
10.3.4 Gradient Boosted Decision Trees .......... 183
10.3.5 The XGBoost Algorithm .......... 184
10.4 Probabilistic Ensemble Learning .......... 185
10.4.1 Mixtures of Linear Models .......... 185
10.4.2 Hierarchical Mixtures of Experts .......... 186
10.5 Ensembles of Deep Models .......... 188
10.5.1 Dropout: A Model Ensembling Strategy .......... 188
10.5.2 Deep Ensembles .......... 189
10.6 Further Reading .......... 190
10.7 Exercises .......... 190
Chapter 11 Learning Theory .......... 192
11.1 Basic Concepts .......... 192
11.1.1 Bias-Complexity Decomposition .......... 193
11.1.2 Structural Risk Minimization .......... 195
11.1.3 PAC Theory .......... 196
11.1.4 Basic Inequalities .......... 197
11.2 Finite Hypothesis Spaces .......... 198
11.2.1 Hoeffding's Inequality .......... 198
11.2.2 Union Bound .......... 199
11.3 Infinite Hypothesis Spaces .......... 201
11.3.1 VC Dimension .......... 201
11.3.2 Rademacher Complexity .......... 203
11.3.3 Margin Theory .......... 204
11.3.4 PAC-Bayes .......... 205
11.4 Deep Learning Theory .......... 206
11.4.1 Double Descent .......... 207
11.4.2 Benign Overfitting .......... 208
11.4.3 Implicit Regularization .......... 209
11.5 Further Reading .......... 209
11.6 Exercises .......... 210
Advanced Topics
Chapter 12 Probabilistic Graphical Models .......... 215
12.1 Overview .......... 215
12.2 Representation of Probabilistic Graphical Models .......... 217
12.2.1 Bayesian Networks .......... 217
12.2.2 Markov Random Fields .......... 221
12.2.3 Relationship Between Directed and Undirected Graphs .......... 224
12.3 Inference in Probabilistic Graphical Models .......... 226
12.3.1 Variable Elimination .......... 226
12.3.2 Message Passing .......... 229
12.3.3 Factor Graphs .......... 230
12.3.4 Most Probable Assignment .......... 231
12.3.5 Junction Trees .......... 231
12.4 Parameter Learning .......... 232
12.4.1 Parameter Learning for Bayesian Networks .......... 232
12.4.2 Parameter Learning for Markov Random Fields .......... 233
12.4.3 Conditional Random Fields .......... 235
12.5 Structure Learning .......... 236
12.5.1 Tree-Structured Bayesian Networks .......... 236
12.5.2 Gaussian Markov Random Fields .......... 238
12.6 Further Reading .......... 238
12.7 Exercises .......... 239
Chapter 13 Variational Inference .......... 241
13.1 Basic Principles .......... 241
13.1.1 Basic Principles of Variational Methods .......... 241
13.1.2 Inference Tasks .......... 242
13.2 Variational Inference .......... 244
13.2.1 Variational Lower Bound on the Log-Likelihood .......... 244
13.2.2 Mean-Field Methods .......... 245
13.2.3 Belief Propagation .......... 248
13.3 Variational EM .......... 250
13.3.1 From EM to Variational EM .......... 250
13.3.2 Variational EM Algorithm for the Exponential Family .......... 251
13.3.3 Probabilistic Latent Semantic Analysis .......... 252
13.3.4 Stochastic EM Algorithm .......... 253
13.4 Variational Bayes .......... 254
13.4.1 Variational Formulation of Bayes' Theorem .......... 255
13.4.2 Bayesian Gaussian Mixture Models .......... 255
13.5 Expectation Propagation .......... 258
13.5.1 Basic EP Algorithm .......... 258
13.5.2 EP Algorithm for Graphical Models .......... 260
13.6 Further Reading .......... 261
13.7 Exercises .......... 261
Chapter 14 Monte Carlo Methods .......... 263
14.1 Overview .......... 263
14.2 Basic Sampling Algorithms .......... 264
14.2.1 Sampling via Reparameterization .......... 264
14.2.2 Rejection Sampling .......... 266
14.2.3 Importance Sampling .......... 267
14.2.4 Importance Resampling .......... 268
14.2.5 Ancestral Sampling .......... 269
14.3 Markov Chain Monte Carlo .......... 269
14.3.1 Markov Chains .......... 269
14.3.2 Metropolis-Hastings Sampling .......... 271
14.3.3 Gibbs Sampling .......... 273
14.3.4 Variants of Gibbs Sampling .......... 274
14.4 Auxiliary Variable Sampling .......... 274
14.4.1 Slice Sampling .......... 275
14.4.2 Auxiliary Variable Sampling .......... 276
14.5 MCMC Sampling Based on Dynamical Systems .......... 277
14.5.1 Dynamical Systems .......... 277
14.5.2 Discretization of Hamilton's Equations .......... 278
14.5.3 Hamiltonian Monte Carlo .......... 279
14.5.4 Stochastic Gradient MCMC Sampling .......... 280
14.6 Further Reading .......... 281
14.7 Exercises .......... 282
Chapter 15 Gaussian Processes .......... 284
15.1 Bayesian Neural Networks .......... 284
15.1.1 Bayesian Linear Regression .......... 284
15.1.2 Bayesian Neural Networks .......... 285
15.1.3 Infinitely Wide Bayesian Neural Networks .......... 286
15.2 Gaussian Process Regression .......... 287
15.2.1 Definition .......... 287
15.2.2 Prediction in the Noise-Free Case .......... 288
15.2.3 Prediction with Noise .......... 289
15.2.4 Residual Modeling .......... 290
15.2.5 Covariance Functions .......... 291
15.3 Gaussian Process Classification .......... 293
15.3.1 Basic Model .......... 293
15.3.2 Approximate Inference with the Laplace Approximation .......... 293
15.3.3 Approximate Inference with Expectation Propagation .......... 295
15.3.4 Relationship to Support Vector Machines .......... 296
15.4 Sparse Gaussian Processes .......... 297
15.4.1 Sparse Approximation Based on Inducing Points .......... 297
15.4.2 Sparse Variational Gaussian Processes .......... 299
15.5 Further Reading .......... 300
15.6 Exercises .......... 301
Chapter 16 Deep Generative Models .......... 302
16.1 Basic Framework .......... 302
16.1.1 Basic Concepts of Generative Models .......... 302
16.1.2 Modeling Based on Hierarchical Bayes .......... 303
16.1.3 Modeling Based on Deep Neural Networks .......... 304
16.2 Flow Models .......... 305
16.2.1 Affine Coupling Flow Models .......... 306
16.2.2 Residual Flow Models .......... 308
16.2.3 Dequantization .......... 309
16.3 Autoregressive Generative Models .......... 310
16.3.1 Neural Autoregressive Density Estimators .......... 310
16.3.2 Continuous-Valued Neural Autoregressive Density Estimators .......... 312
16.4 Variational Autoencoders .......... 313
16.4.1 Model Definition .......... 313
16.4.2 Parameter Estimation via Reparameterization .......... 314
16.5 Generative Adversarial Networks .......... 315
16.5.1 Basic Model .......... 315
16.5.2 Wasserstein Generative Adversarial Networks .......... 318
16.6 Diffusion Probabilistic Models .......... 319
16.6.1 Model Definition .......... 319
16.6.2 Model Training .......... 320
16.6.3 Parameter Sharing .......... 321
16.7 Further Reading .......... 322
16.8 Exercises .......... 323
Chapter 17 Reinforcement Learning .......... 324
17.1 Decision-Making Tasks .......... 324
17.2 Multi-Armed Bandits .......... 325
17.2.1 Bernoulli Multi-Armed Bandits .......... 325
17.2.2 Upper Confidence Bound Algorithm .......... 327
17.2.3 Thompson Sampling Algorithm .......... 328
17.2.4 Contextual Multi-Armed Bandits .......... 328
17.3 Markov Decision Processes .......... 330
17.3.1 Basic Definitions .......... 330
17.3.2 Bellman Equations .......... 332
17.3.3 Optimal Value Functions and Optimal Policies .......... 333
17.3.4 Policy Evaluation .......... 334
17.3.5 Policy Iteration Algorithm .......... 335
17.3.6 Value Iteration Algorithm .......... 335
17.4 Reinforcement Learning .......... 336
17.4.1 Monte Carlo Sampling Methods .......... 336
17.4.2 Temporal-Difference Learning .......... 337
17.4.3 The Sarsa Algorithm .......... 338
17.4.4 Q-Learning .......... 339
17.4.5 Value Function Approximation .......... 340
17.4.6 Policy Search .......... 342
17.5 Further Reading .......... 343
17.6 Exercises .......... 344
References .......... 346