Contents
Part 1 Getting Started with Linux
Lab 1 Creating, Accessing, Modifying, and Deleting Files ... 2
1.1 Objectives ... 2
1.2 Requirements ... 2
1.3 Principles ... 2
1.4 Procedure ... 3
1.5 Results ... 6
Lab 2 Creating Files, Viewing Them, and Modifying Their Contents ... 8
2.1 Objectives ... 8
2.2 Requirements ... 8
2.3 Principles ... 8
2.4 Procedure ... 9
2.5 Results ... 9
Lab 3 Common Text-Editing Techniques: Copying, Pasting, and Deleting ... 12
3.1 Objectives ... 12
3.2 Requirements ... 12
3.3 Principles ... 12
3.4 Procedure ... 15
3.5 Results ... 17
Part 2 Data Cleaning
Lab 4 Extracting Data from a Text File into a Database ... 22
4.1 Objectives ... 22
4.2 Requirements ... 22
4.3 Principles ... 22
4.3.1 Introduction to Kettle ... 22
4.3.2 Methods for Extracting Data from a Text File into a Database ... 23
4.4 Procedure ... 23
4.4.1 Installation ... 23
4.4.2 Steps for Extracting Data from a Text File into a Database ... 26
4.5 Results ... 29
Lab 5 Extracting Data from a CSV File into a Database ... 31
5.1 Objectives ... 31
5.2 Requirements ... 31
5.3 Principles ... 31
5.4 Procedure ... 32
5.5 Results ... 33
Lab 6 Importing Excel File Data into a Database ... 35
6.1 Objectives ... 35
6.2 Requirements ... 35
6.3 Principles ... 35
6.4 Procedure ... 35
6.5 Results ... 39
Lab 7 Migrating MySQL Data to MongoDB ... 40
7.1 Objectives ... 40
7.2 Requirements ... 40
7.3 Principles ... 40
7.4 Procedure ... 41
7.5 Results ... 44
Lab 8 Incremental Data Extraction from a Database ... 45
8.1 Objectives ... 45
8.2 Requirements ... 45
8.3 Principles ... 45
8.4 Procedure ... 46
8.5 Results ... 53
Lab 9 Incremental Updates for Data Inserts, Deletes, and Modifications ... 54
9.1 Objectives ... 54
9.2 Requirements ... 54
9.3 Principles ... 54
9.4 Procedure ... 55
9.5 Results ... 60
Lab 10 Data Masking ... 62
10.1 Objectives ... 62
10.2 Requirements ... 62
10.3 Principles ... 62
10.4 Procedure ... 63
10.5 Results ... 67
Lab 11 Data Validation ... 69
11.1 Objectives ... 69
11.2 Requirements ... 69
11.3 Principles ... 69
11.4 Procedure ... 69
11.4.1 Setting Validation Rules ... 69
11.4.2 Non-Null Validation ... 71
11.4.3 Date-Type Validation ... 71
Lab 12 Cleaning Missing Values ... 75
12.1 Objectives ... 75
12.2 Requirements ... 75
12.3 Principles ... 75
12.4 Procedure ... 75
12.4.1 Cleaning by Running an SQL Script ... 76
12.4.2 Cleaning with Components ... 77
Lab 13 Cleaning Format and Content Errors ... 80
13.1 Objectives ... 80
13.2 Requirements ... 80
13.3 Principles ... 80
13.4 Procedure ... 80
13.4.1 Cleaning "Format Error Type 1" ... 80
13.4.2 Cleaning "Format Error Type 2" ... 84
Lab 14 Cleaning Logical Errors ... 88
14.1 Objectives ... 88
14.2 Requirements ... 88
14.3 Principles ... 88
14.4 Procedure ... 89
14.4.1 Cleaning "Logical Error Type 1" ... 89
14.4.2 Cleaning "Logical Error Type 2" ... 92
Part 3 Data Visualization
Lab 15 Drawing Pie Charts, Bar Charts, Line Charts, and Parallel Coordinates Plots ... 98
15.1 Objectives ... 98
15.2 Requirements ... 98
15.3 Principles ... 98
15.4 Procedure ... 100
15.4.1 Importing Data and Modules ... 100
15.4.2 Extracting the Data ... 101
15.4.3 Plotting the Charts ... 101
Lab 16 Visual Analysis of Bike-Sharing Data ... 109
16.1 Objectives ... 109
16.2 Requirements ... 109
16.3 Procedure ... 110
16.3.1 Data Preparation ... 110
16.3.2 Data Cleaning ... 111
16.3.3 Data Processing ... 111
16.3.4 Data Mining ... 112
16.3.5 Visual Analysis ... 114
Lab 17 Drawing a Word Cloud from a Novel ... 120
17.1 Objectives ... 120
17.2 Requirements ... 120
17.3 Principles ... 120
17.3.1 jieba Word Segmentation ... 120
17.3.2 wordcloud Word Clouds ... 120
17.4 Procedure ... 121
17.4.1 Importing Modules ... 121
17.4.2 Reading the File and Setting Paths ... 121
17.4.3 Segmenting the Text ... 122
17.4.4 Drawing the Word Cloud ... 123
Lab 18 Visualizing Basketball Shooting Percentages ... 125
18.1 Objectives ... 125
18.2 Requirements ... 125
18.3 Principles ... 125
18.4 Procedure ... 126
18.4.1 Importing Modules and Data Files ... 126
18.4.2 Processing the Data ... 127
18.4.3 Visual Analysis ... 128
Part 4 Environmental Big Data in Practice
Lab 19 Predicting Carbon Dioxide Concentrations ... 136
19.1 Objectives ... 136
19.2 Requirements ... 136
19.3 Principles ... 137
19.4 Procedure ... 137
19.4.1 Importing Packages and Loading the Data ... 137
19.4.2 Initial Data Visualization ... 138
19.4.3 The ARIMA Time-Series Model ... 139
19.4.4 Selecting Parameters for the ARIMA Time-Series Model ... 139
19.4.5 Configuring the ARIMA Time-Series Model ... 140
19.4.6 Validating the Forecasts ... 142
19.4.7 Generating and Visualizing Forecasts ... 145
Lab 20 Analyzing the Causes of Air Pollution in Singapore ... 146
20.1 Objectives ... 146
20.2 Requirements ... 146
20.3 Principles ... 146
20.4 Procedure ... 147
20.4.1 Data Preparation ... 147
20.4.2 Testing Hypothesis 1: Growth in Manufacturing Increases Air Pollution in Singapore ... 148
XII Introduction to Big Data: Technical Labs
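20.4.3 Testing Hypothesis 2: Growth in the Number of Buildings Increases Air Pollution in Singapore ... 151
20.4.4 Testing Hypothesis 3: Growth in the Number of Vehicles Increases Air Pollution in Singapore ... 157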
Lab 21 Shanghai Historical Weather Statistics ... 160
21.1 Objectives ... 160
21.2 Requirements ... 160
21.3 Principles ... 160
21.4 Procedure ... 161
21.4.1 Writing the Mapper Program ... 161
21.4.2 Writing the Reducer Program ... 162
21.4.3 Computing Shanghai's Monthly Historical Weather for 2016 ... 162
Lab 22 Shanghai Monthly Air Quality Statistics ... 164
22.1 Objectives ... 164
22.2 Requirements ... 164
22.3 Principles ... 164
22.4 Procedure ... 165
22.4.1 Writing the Mapper Program ... 165
22.4.2 Writing the Reducer Program ... 165
22.4.3 Computing Shanghai's Monthly Air Quality for 2016 ... 166
Lab 23 Comparing Monthly Mean Temperatures in Beijing and Shanghai ... 168
23.1 Objectives ... 168
23.2 Requirements ... 168
23.3 Principles ... 168
23.4 Procedure ... 168
23.4.1 Writing the Mapper Program ... 168
23.4.2 Writing the Reducer Program ... 169
23.4.3 Comparing Beijing's and Shanghai's Monthly Mean Temperatures for 2016 ... 170
Part 5 Financial Big Data in Practice
Lab 24 Optimal Portfolios (Part 1) ... 172
24.1 Objectives ... 172
24.2 Requirements ... 172
24.3 Principles ... 172
24.4 Procedure ... 173
24.4.1 Importing the Required Modules ... 173
24.4.2 Reading the Data ... 173
24.4.3 Inspecting Missing Values ... 173
24.4.4 Data Visualization ... 174
24.4.5 Preliminary Statistical Analysis ... 175
24.4.6 Portfolio Optimization ... 175
24.4.7 Computing the Portfolio's Mean Return ... 176
24.5 Results ... 177
Lab 25 Optimal Portfolios (Part 2) ... 179
25.1 Objectives ... 179
25.2 Requirements ... 179
25.3 Principles ... 179
25.4 Procedure ... 180
25.4.1 The Maximum Sharpe Ratio Portfolio ... 180
25.4.2 The Minimum Variance Portfolio ... 181
25.4.3 Drawing the Scatter Plot ... 182
25.5 Results ... 182
Lab 26 Stock Trend Prediction ... 184
26.1 Objectives ... 184
26.2 Requirements ... 184
26.3 Principles ... 184
26.4 Procedure ... 185
26.4.1 Importing Modules ... 185
26.4.2 Building the ARIMA Model ... 185
26.4.3 Differencing the Data ... 186
26.4.4 Autocorrelation and Partial Autocorrelation Plots ... 187
26.4.5 Training the Model ... 188
26.5 Results ... 188
Part 6 Business Big Data in Practice
Lab 27 Sentiment Analysis of E-Commerce Product Reviews ... 192
27.1 Objectives ... 192
27.2 Requirements ... 192
27.3 Principles ... 192
27.4 Procedure ... 193
27.4.1 Extracting the Review Data ... 193
27.4.2 Deduplicating the Review Text ... 193
27.4.3 Preparing the Model ... 194
27.4.4 Removing Rating Prefixes ... 194
27.4.5 Segmenting the Text ... 195
27.4.6 Building the Model ... 196
27.5 Results ... 197
Lab 28 Analysis of eBay Car Sales Data ... 198
28.1 Objectives ... 198
28.2 Requirements ... 198
28.3 Principles ... 199
28.3.1 Data Standardization ... 199
28.3.2 Data Visualization ... 199
28.4 Procedure ... 199
28.4.1 Loading and Describing the Data ... 199
28.4.2 Profiling the Data ... 200
28.4.3 Preprocessing ... 202
28.4.4 Visual Analysis ... 204
28.5 Results ... 219
Lab 29 Airline Customer Value Analysis ... 220
29.1 Objectives ... 220
29.2 Requirements ... 220
29.3 Principles ... 220
29.4 Procedure ... 220
29.4.1 Data Preparation ... 220
29.4.2 Data Processing ... 221
29.4.3 Data Preprocessing ... 222
29.4.4 Building the Model ... 225
29.5 Results ... 226
Lab 30 Market Basket Analysis ... 227
30.1 Objectives ... 227
30.2 Requirements ... 227
30.3 Principles ... 227
30.3.1 MLxtend ... 227
30.3.2 Association Rules ... 227
30.3.3 Mining Frequent Itemsets with the Apriori Algorithm ... 228
30.4 Procedure ... 228
30.4.1 Importing and Reading the Data with Pandas and MLxtend ... 228
30.4.2 Data Processing ... 228
30.4.3 One-Hot Encoding ... 229
30.4.4 Computing Association Rules with the Library ... 230
30.4.5 Reviewing the Results ... 231
30.4.6 Popular Combinations in Germany ... 231
Appendix A Big Data and Artificial Intelligence Lab Environments ... 233
A.1 Big Data Lab Environment ... 233
A.2 Artificial Intelligence Lab Environment ... 236