Contents

Part I: Getting Started with Linux

Lab 1: Creating, Accessing, Modifying, and Deleting Files

1.1 Objectives

1.2 Requirements

1.3 Principles

1.4 Procedure

1.5 Results

Lab 2: Creating and Viewing Files and Editing Their Contents

2.1 Objectives

2.2 Requirements

2.3 Principles

2.4 Procedure

2.5 Results

Lab 3: Common Text-Editing Techniques: Copy, Paste, and Delete

3.1 Objectives

3.2 Requirements

3.3 Principles

3.4 Procedure

3.5 Results

Part II: Data Cleaning

Lab 4: Extracting Data from a Text File into a Database

4.1 Objectives

4.2 Requirements

4.3 Principles

4.3.1 Introduction to Kettle

4.3.2 Method for Extracting Data from a Text File into a Database

4.4 Procedure

4.4.1 Installation

4.4.2 Steps for Extracting Data from a Text File into a Database

4.5 Results

Lab 5: Extracting Data from a CSV File into a Database

5.1 Objectives

5.2 Requirements

5.3 Principles

5.4 Procedure

5.5 Results

Lab 6: Importing Excel File Data into a Database

6.1 Objectives

6.2 Requirements

6.3 Principles

6.4 Procedure

6.5 Results

Lab 7: Migrating MySQL Data to MongoDB

7.1 Objectives

7.2 Requirements

7.3 Principles

7.4 Procedure

7.5 Results

Lab 8: Incremental Data Extraction from a Database

8.1 Objectives

8.2 Requirements

8.3 Principles

8.4 Procedure

8.5 Results

Lab 9: Incremental Updates for Inserted, Deleted, and Modified Data

9.1 Objectives

9.2 Requirements

9.3 Principles

9.4 Procedure

9.5 Results

Lab 10: Data Masking

10.1 Objectives

10.2 Requirements

10.3 Principles

10.4 Procedure

10.5 Results

Lab 11: Data Validation

11.1 Objectives

11.2 Requirements

11.3 Principles

11.4 Procedure

11.4.1 Setting Validation Rules

11.4.2 Non-Null Validation

11.4.3 Date-Type Validation

Lab 12: Cleaning Missing Values

12.1 Objectives

12.2 Requirements

12.3 Principles

12.4 Procedure

12.4.1 Cleaning by Running SQL Scripts

12.4.2 Cleaning with Components

Lab 13: Format and Content Cleaning

13.1 Objectives

13.2 Requirements

13.3 Principles

13.4 Procedure

13.4.1 Cleaning "Format Error Type 1"

13.4.2 Cleaning "Format Error Type 2"

Lab 14: Logic Error Cleaning

14.1 Objectives

14.2 Requirements

14.3 Principles

14.4 Procedure

14.4.1 Cleaning "Logic Error Type 1"

14.4.2 Cleaning "Logic Error Type 2"

Part III: Data Visualization

Lab 15: Plotting Pie Charts, Bar Charts, Line Charts, and Parallel Coordinates Plots

15.1 Objectives

15.2 Requirements

15.3 Principles

15.4 Procedure

15.4.1 Importing Data and Modules

15.4.2 Data Extraction

15.4.3 Plotting

Lab 16: Visual Analysis of Bike-Sharing Data

16.1 Objectives

16.2 Requirements

16.3 Procedure

16.3.1 Data Preparation

16.3.2 Data Cleaning

16.3.3 Data Processing

16.3.4 Data Mining

16.3.5 Visual Analysis

Lab 17: Drawing a Word Cloud for a Novel

17.1 Objectives

17.2 Requirements

17.3 Principles

17.3.1 jieba Word Segmentation

17.3.2 wordcloud Word Clouds

17.4 Procedure

17.4.1 Importing Modules

17.4.2 Reading the File and Setting the Path

17.4.3 Text Segmentation

17.4.4 Drawing the Word Cloud

Lab 18: Visualizing Basketball Shooting Percentage

18.1 Objectives

18.2 Requirements

18.3 Principles

18.4 Procedure

18.4.1 Importing Modules and Data Files

18.4.2 Processing the Data

18.4.3 Visual Analysis

Part IV: Environmental Big Data in Practice

Lab 19: Forecasting Carbon Dioxide Levels

19.1 Objectives

19.2 Requirements

19.3 Principles

19.4 Procedure

19.4.1 Importing Packages and Loading Data

19.4.2 Visualizing the Initial Data

19.4.3 The ARIMA Time Series Model

19.4.4 Parameter Selection for the ARIMA Time Series Model

19.4.5 Configuring the ARIMA Time Series Model

19.4.6 Validating Forecasts

19.4.7 Producing and Visualizing Forecasts

Lab 20: Analyzing the Causes of Air Pollution in Singapore

20.1 Objectives

20.2 Requirements

20.3 Principles

20.4 Procedure

20.4.1 Data Preparation

20.4.2 Testing Hypothesis 1: Growth in Manufacturing Increases Air Pollution in Singapore

大数据导论技术实训 (Introduction to Big Data: Technical Training)

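20.4.3 Testing Hypothesis 2: Growth in the Number of Buildings Constructed Increases Air Pollution in Singapore

20.4.4 Testing Hypothesis 3: Growth in the Number of Vehicles Increases Air Pollution in Singapore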

Lab 21: Historical Weather Statistics for Shanghai

21.1 Objectives

21.2 Requirements

21.3 Principles

21.4 Procedure

21.4.1 Writing the Mapper Program

21.4.2 Writing the Reducer Program

21.4.3 Computing Monthly Historical Weather Statistics for Shanghai in 2016

Lab 22: Monthly Air Quality Statistics for Shanghai

22.1 Objectives

22.2 Requirements

22.3 Principles

22.4 Procedure

22.4.1 Writing the Mapper Program

22.4.2 Writing the Reducer Program

22.4.3 Computing Monthly Air Quality Statistics for Shanghai in 2016

Lab 23: Comparing Monthly Average Temperatures in Beijing and Shanghai

23.1 Objectives

23.2 Requirements

23.3 Principles

23.4 Procedure

23.4.1 Writing the Mapper Program

23.4.2 Writing the Reducer Program

23.4.3 Comparing Monthly Average Temperatures in Beijing and Shanghai for 2016

Part V: Financial Big Data in Practice

Lab 24: Optimal Portfolio (Part 1)

24.1 Objectives

24.2 Requirements

24.3 Principles

24.4 Procedure

24.4.1 Importing the Required Modules

24.4.2 Reading the Data

24.4.3 Inspecting Missing Values

24.4.4 Data Visualization

24.4.5 Preliminary Statistical Analysis

24.4.6 Portfolio Optimization

24.4.7 Computing the Portfolio Mean Return

24.5 Results

Lab 25: Optimal Portfolio (Part 2)

25.1 Objectives

25.2 Requirements

25.3 Principles

25.4 Procedure

25.4.1 Maximum Sharpe Ratio Portfolio

25.4.2 Minimum Variance Portfolio

25.4.3 Drawing the Scatter Plot

25.5 Results

Lab 26: Stock Trend Forecasting

26.1 Objectives

26.2 Requirements

26.3 Principles

26.4 Procedure

26.4.1 Importing Modules

26.4.2 Building the ARIMA Model

26.4.3 Differencing the Data

26.4.4 Autocorrelation and Partial Autocorrelation Plots

26.4.5 Model Training

26.5 Results

Part VI: Business Big Data in Practice

Lab 27: Sentiment Analysis of E-Commerce Product Reviews

27.1 Objectives

27.2 Requirements

27.3 Principles

27.4 Procedure

27.4.1 Extracting Review Data

27.4.2 Deduplicating Review Text

27.4.3 Model Preparation

27.4.4 Removing Prefix Ratings

27.4.5 Text Segmentation

27.4.6 Model Building

27.5 Results

Lab 28: Analysis of eBay Car Sales Data

28.1 Objectives

28.2 Requirements

28.3 Principles

28.3.1 Data Standardization

28.3.2 Data Visualization

28.4 Procedure

28.4.1 Loading and Describing the Data

28.4.2 Data Profiling

28.4.3 Preprocessing

28.4.4 Visual Analysis

28.5 Results

Lab 29: Airline Customer Value Analysis

29.1 Objectives

29.2 Requirements

29.3 Principles

29.4 Procedure

29.4.1 Data Preparation

29.4.2 Data Processing

29.4.3 Data Preprocessing

29.4.4 Building the Model

29.5 Results

Lab 30: Market Basket Analysis

30.1 Objectives

30.2 Requirements

30.3 Principles

30.3.1 MLxtend

30.3.2 Association Rules

30.3.3 Mining Frequent Itemsets with the Apriori Algorithm

30.4 Procedure

30.4.1 Importing and Reading the Data with Pandas and MLxtend

30.4.2 Data Processing

30.4.3 One-Hot Encoding

30.4.4 Running Association Rule Mining with the Algorithm Package

30.4.5 Reviewing the Results

30.4.6 Popular Combinations in Germany

Appendix A: Big Data and Artificial Intelligence Lab Environments

A.1 Big Data Lab Environment

A.2 Artificial Intelligence Lab Environment