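SAS编程演义 Romance of SAS Programming
Compiled and written by 谷鸿秋 (Gu Hongqiu)

About This Book
This book is built on Base SAS and focuses on SAS programming techniques. It covers SAS fundamentals, importing and exporting data, operations on variables and observations, data set operations and management, functions and routines, SAS formats, and macro programming. It also gives a systematic, detailed treatment of statistical tables and statistical graphics, which are used constantly yet rarely covered by the SAS books currently on the market, addressing design principles, how to choose among options, drawing methods, and step-by-step examples.
Breaking with the dictionary-style arrangement by syntax keyword, the book distills ten topics that are independent yet interconnected. It takes its material from close at hand, using the small data sets shipped with SAS, and builds the discussion step by step. SAS beginners and seasoned veterans alike can benefit: beginners gain a fairly complete and systematic view of SAS programming techniques and where they apply, and veterans can find inspiration in the advanced skills.

The cover of this book carries a Tsinghua University Press anti-counterfeiting label; copies without the label may not be sold.
All rights reserved; infringement will be prosecuted. Infringement hotline: 010-62782989, 13701121933

Cataloging in Publication (CIP) data
SAS编程演义 / 谷鸿秋编著. Beijing: Tsinghua University Press, 2017
ISBN 978-7-302-47057-1
Ⅰ. ①S… Ⅱ. ①谷… Ⅲ. ①Statistical analysis; statistical programs; research Ⅳ. ①C819
Archives Library of Chinese Publications CIP record number: (2017) No. 108485

Editor in charge: 刘洋
Cover design: 李召霞
Layout design: 方加青
Proofreader: 宋玉莲
Print supervisor: 杨艳
Published and distributed by: Tsinghua University Press
Websites: http://www.tup.com.cn, http://www.wqbook.com
Address: Tower A, Xueyan Building, Tsinghua University, Beijing
Postal code: 100084
Switchboard: 010-62770175
Mail order: 010-62786544
Submissions and reader services: 010-62776969, c-service@tup.tsinghua.edu.cn
Quality feedback: 010-62772015, zhiliang@tup.tsinghua.edu.cn
Printed and bound by: 三河市春园印刷有限公司
Distributed by: Xinhua Bookstores nationwide
Trim size: 187mm×235mm
Printed sheets: 21
Word count: 428,000 characters
Edition: 1st edition, June 2017
Printing: 1st printing, June 2017
Print run: 1 to 4,000
Price: 79.00 yuan
Product code: 072726-01

Dedication
This book is dedicated to my grandmother, who nags more than my mother does every time I leave home, who boils more eggs than I can finish on the whole round trip, and who always reminds me to "find a good partner, exercise properly, and work hard."

Preface
After a year of itching to start, three months of furious typing, and a hundred nights in the small hours, this manuscript, fortunately, did not die in the womb. It is finally finished. Before I began, I was wildly excited, convinced that my days of having a head full of learning and nowhere to put it were over. Once the draft was done, I grew quiet. Having filled one self-dug pit after another, I looked up and saw that even bigger pits still lay ahead, and my anxiety only deepened. Yet I am grateful for that anxiety. It cannot guarantee that every word and every line of code I wrote is a gem, but because of it I can stand up straight and say, hand on heart: 10 topical chapters and more than 100,000 characters, nearly 180 figures, more than 30 tables and more than 200 code listings, more than 20 syntax cards, and 30 original, practical macro programs are all honest work forged with real passion. At the very least, they live up to the intent I set out with.

Origins
I still remember, when I first learned SAS programming, being vexed because I could not understand SAS Help, troubled because I could not grasp the difference between @ and @@, and miserable because I could not tell apart the many macro quoting functions such as %STR, %NRSTR, %QUOTE, %BQUOTE, %NRQUOTE, and %NRBQUOTE. Yet time flies. Only a few years later, the youth who once agonized over these "simple questions" now meets equally anguished junior students with a look of surprise: "Huh? That should be easy to understand, right?" See how cunning time is: it so easily smooths away the pain of learning, and the further we travel, the more of that pain we forget. Fortunately, I am no guru and have not traveled far. The pain has not yet been wiped from memory, so let me record it and share it now, while I still can.

The Problem
Someone once asked on 知乎 (Zhihu): which introductory SAS books are worth recommending? In my answer I sorted SAS learners into three types (the dabbler who simply plugs in PROCs; the deep user who programs and does statistics; and the obsessive who develops tools) and recommended books for each. While surveying the SAS books on the market, I identified three shortcomings: (1) books devoted to data wrangling and to presenting tables and graphics are too few and too fragmentary, and the few that exist are rarely of high quality; (2) nearly all are arranged dictionary-style by syntax keyword and lack good topics distilled from real problems; (3) programming technique is divorced from usage scenarios: those who teach technique teach only technique, with little sense of the scenario in which it applies.

What Sets This Book Apart
This book attempts to break new ground and to improve on the treatment of data wrangling and of table and graph presentation in three respects: content, organization, and manner of exposition.
In content, it is devoted to data wrangling and to producing statistical tables and graphics, neither aiming for exhaustive coverage nor skimming the surface. The ten carefully distilled topics total about 100,000 characters and cover SAS gossip and anecdotes, SAS fundamentals, ways to import and export data, operations on variables and observations, operations on and management of data sets, functions and routines, informats and formats, producing statistical tables, principles for drawing statistical graphics, how to choose among them, and worked examples for each family of graphs. SAS macro variables, macro programs, and the principles, steps, and techniques of macro development are also discussed in some detail.
In organization, it breaks with the dictionary-style arrangement by syntax keyword. The ten carefully chosen topics form ten chapters that are independent yet interconnected. Between sections and between examples, I try to lead in with a question and advance step by step, reducing abrupt jumps and strengthening the sense of the usage scenario.
In addition, many SAS users understand, accept, and have already benefited from SAS's unquestioned strengths in data wrangling and statistical analysis. Yet when it comes to presenting results, in statistical tables and above all in statistical graphics, most of them have gaps in understanding or outright misconceptions. This book therefore devotes a great deal of space to statistical tables and especially to statistical graphics, walking through the material with one question building on another rather than merely presenting plotting syntax and SAS techniques. This is rare in other books.
Finally, to help readers understand how SAS works, the book uses small data and small examples wherever possible, so that each point is made clearly and concisely without getting mired in examples tied to a particular industry. And so that readers can practice, nearly all data are taken from the data sets shipped in the SASHelp library.

Lessons Learned
The SAS Help documentation is a rare, first-rate resource to keep at hand when learning SAS. If you have not yet come to appreciate this, go and read the Help documentation of some R packages. Many SAS books draw on SAS Help without acknowledging it, which is a major failing. This book therefore makes a point of guiding and encouraging readers to read SAS Help often and to consult it often.
Ideally, learning would advance linearly and step by step. In reality, knowledge is an intricate network, so we often have to learn by detours and repeated passes. In presenting each point, this book tries to move in a straight line, one step at a time. Given the limits of my energy and ability, and the objective fact that knowledge is a network, I hope readers will come prepared for detours and repeated passes.
Of course, I too work by detours, repeated passes, and iterative refinement. Many topics, such as SAS's Interactive Matrix Language (IML), the Output Delivery System (ODS), and regular expressions, did not make it into this edition, and what did make it in will have gaps owing to the limits of my experience, knowledge, and energy. I therefore sincerely welcome readers' constructive suggestions and criticism, sent to guhongqiu(at)yeah(dot)net. With your feedback, the next edition (if there is one) will surely be better.

Acknowledgments
If you have read this far, please do not find me long-winded. Along the way there have been a great many people to thank, and thanks are a serious matter, so what follows are acknowledgments in earnest.
Thanks to Professors 曾光 and 刘仁权 of Beijing University of Chinese Medicine for leading me through the door into epidemiology and health statistics; to Professor 吴尊友 of the Chinese Center for Disease Control and Prevention for teaching me the larger purpose of public health; to Professor 李卫 of Peking Union Medical College for bringing me into clinical research; and to Professors 王拥军 and 王伊龙 of the China National Clinical Research Center for Neurological Diseases for the chance to sharpen my clinical research thinking and skills in practice.
Thanks to Ms. Lora D. Delwiche, author of The Little SAS Book; to Mr. Sanjay Matange, author of the well-known SAS graphics blog Graphically Speaking and of many books on SAS graphics; and to Mr. Peter Eberhardt, author of The DS2 Procedure: SAS Programming Methods at Work, for their support and help while this book was being written.
Thanks to Mr. 刘政, General Manager of the SAS China R&D Center. Thanks to Ms. 高燕, Director of Analytics Product Development at the SAS China R&D Center; Mr. 巫银良, Technical Director of Business Intelligence and Visual Analytics Products at the SAS China R&D Center; Mr. 赵丹, SAS China Training Manager; and Mr. 蒋顺利, SAS Greater China Marketing Director, for their support while I prepared the manuscript. Thanks to Mr. 施亦明, founder of the SAS 中文论坛 forum and Deputy General Manager of 前海征信; to sxlion, founder of SAS 中文资讯网; and to a whole crowd of IDs on the 人大经济论坛 forum (jingju11, pobel, hopewell, davil2000, kuhasu, ahuige, soporaeternus, YueweiLiu, oloolo, bobguy, Imasasor, playmore, crackman, dxystata) for teaching and answering questions across the SAS community.
Thanks to this book's editor, Mr. 刘洋 of Tsinghua University Press. Without his trust, this book might have been scattered to the winds; without his trust, the writing might have been interrupted by countless deadline reminders. Happily, he stayed patient with me and with the manuscript throughout. Thanks also to the editorial department of Tsinghua University Press for carefully choosing the landscape painting that opens each chapter; each matches its title and sets a lasting mood.
For my more than ten years in Beijing, thanks to Researcher 谷湘潜 of the Chinese Academy of Meteorological Sciences and Professor 江宇泳 of Beijing Ditan Hospital, Capital Medical University, for looking after me in so many ways; to Professor 谷潜平 of Central South University for his advice; and to Director 王彩云 of the China National Clinical Research Center for Neurological Diseases for the roasted sweet potatoes in the morning, supremely delicious and impossibly sweet.
Finally, thanks to you, whose path has crossed mine because of SAS and because of this book.
谷鸿秋
May 24, 2017

Contents
List of Figures
List of Tables
List of Programs
List of Syntax Cards
Chapter 1 If Life Were Only as at First Sight: Meeting SAS
1.1 The past is not like smoke
1.1.1 A pronunciation that teases you
1.1.2 A mildly amusing history
1.1.3 A community that never fades
1.2 A choice made on wishful thinking
1.3 Software architecture
1.4 Installation and licensing
1.5 Run modes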
1.6 Programming interfaces
1.6.1 The DMS interface
1.6.2 The EG interface
1.6.3 The SAS Studio interface
1.7 Editions
1.7.1 Purchased edition and University Edition
1.7.2 Free cloud edition
1.7.3 Editions by operating system platform
1.7.4 Language editions
1.8 Chapter summary
Chapter 2 Clear Songs or Bitter Tunes, Never Tiresome: Laying the Foundations
2.1 Foundation SAS
2.1.1 Components of Foundation SAS
2.1.2 Base SAS
2.2 The SAS data analysis workflow
2.3 Libraries and data sets
2.3.1 Libraries
2.3.2 Data sets
2.3.3 Variables
2.4 The SAS programming language
2.4.1 SAS program structure
2.4.2 SAS syntax rules
2.4.3 SAS language elements
2.4.4 The three logical structures
2.4.5 Arrays
2.4.6 Functions and CALL routines
2.4.7 Structured Query Language (SQL)
2.4.8 SAS macros (MACRO)
2.5 Understanding how SAS runs
2.5.1 The PDV and the DATA step's implicit loop
2.5.2 The confusion over @ and @@
2.6 The secret to using SAS Help well
2.6.1 How much do you know about SAS Help?
2.6.2 The basic routine for reading SAS Help
2.6.3 Tips for searching SAS Help
2.6.4 Getting to know the data sets in SASHelp
2.7 Chapter summary
Chapter 3 Moss Dots Greedily Swallow the Threads of Green: Reading Data
3.1 What to read and how to read it
3.2 Strategies for reading data
3.3 Reading DBMS data files
3.3.1 SAS/ACCESS and DBMS
3.3.2 LIBNAME statement syntax for accessing DBMS data files
3.4 Reading PC data files
3.4.1 A first try
3.4.2 PROC IMPORT syntax
3.4.3 LIBNAME syntax for accessing PC files
3.4.4 Extended examples
3.5 Reading flat data files
3.5.1 Reading CSV files
3.5.2 Reading TXT files with special delimiters
3.6 Reading in-stream data
3.6.1 A first look at in-stream data
3.6.2 General syntax of the INPUT statement
3.6.3 List input
3.6.4 Column input
3.6.5 Formatted input
3.6.6 Named input
3.6.7 A combined DATALINES example
3.6.8 On lists, pointers, formats, and more
3.7 A word on exporting data
3.8 Macros for importing and exporting data
3.9 Chapter summary
Chapter 4 Boats Come and Go, Crossing Lengthwise and Athwart: Variables and Observations
4.1 Overview of operations on variables and observations
4.2 Creating data sets
4.2.1 The DATA + SET statements
4.2.2 The SQL CREATE statement
4.2.3 Procedure statements and options
4.3 Selecting observations and variables
4.3.1 IF versus WHERE
4.3.2 KEEP, DROP, and RENAME
4.3.3 Selecting variables and observations with PROC SQL
4.4 Other horizontal operations
4.4.1 Creating new variables and assigning values
4.4.2 Converting variable types
4.4.3 Regrouping and cutting variables
4.4.4 Finding missing variables
4.5 Other vertical operations
4.5.1 Accumulating in the DATA step
4.5.2 Accumulating in PROC steps
4.5.3 LAG and DIF: reaching across rows
4.6 Chapter summary
Chapter 5 Grand Plans Are Also Laid Inside the Command Tent: Libraries and Data Sets
5.1 Modifying a data set: the MODIFY statement
5.1.1 MODIFY statement examples
5.1.2 MODIFY statement caveats
5.1.3 MODIFY statement syntax overview
5.2 Updating a data set: the UPDATE statement
5.2.1 UPDATE statement examples
5.2.2 UPDATE statement caveats
5.2.3 UPDATE statement syntax overview
5.3 Swapping rows and columns (transposing): PROC TRANSPOSE
5.3.1 Rows to columns (wide table to long table)
5.3.2 Columns to rows (long table to wide table)
5.3.3 PROC TRANSPOSE caveats
5.3.4 PROC TRANSPOSE syntax overview
5.4 Horizontal combination (merging): multiple SET / MERGE / PROC SQL
5.4.1 One-to-one reading
5.4.2 One-to-one merging
5.4.3 Match-merging
5.5 Vertical combination (concatenation): SET / PROC APPEND / PROC SQL
5.5.1 The SET statement
5.5.2 PROC APPEND
5.5.3 PROC SQL
5.6 PROC SQL table joins
5.6.1 Left join (LEFT JOIN)
5.6.2 Right join (RIGHT JOIN)
5.6.3 Inner join (INNER JOIN)
5.6.4 Full join (FULL JOIN)
5.6.5 Summary of join syntax
5.7 PROC SQL set operations
5.7.1 Difference (EXCEPT)
5.7.2 Union (UNION)
5.7.3 Intersection (INTERSECT)
5.7.4 Outer union (OUTER UNION)
5.8 Data set management
5.8.1 Entering the gate: building a variable dictionary
5.8.2 The hall of refinement: data set information
5.8.3 The hall of refinement: data set operations
5.8.4 The hall of refinement: variable information
5.8.5 Reaching the summit: mastering SAS dictionaries
5.9 Chapter summary
Chapter 6 Here and There, Mountains and Rivers of Rare Beauty: Functions and Routines
6.1 What are functions and routines?
6.2 Why use functions and routines?
6.2.1 An example of how convenient functions are
6.2.2 An example of how convenient routines are
6.3 How are functions and routines used?
6.3.1 Function syntax
6.3.2 Routine syntax
6.4 Where are functions and routines used?
6.5 Which functions and routines are available?
6.6 Which functions and routines are used most often?
6.7 What if they are not enough?
6.8 Chapter summary
Chapter 7 Emerald Sleeves Dance and Brush the Cloud Robes: Making Clever Use of Formats
7.1 What is a format?
7.2 Why use formats?
7.3 What format names look like
7.4 Common system formats
7.4.1 Common output formats (formats)
7.4.2 Common input formats (informats)
7.5 User-defined formats with PROC FORMAT
7.5.1 Defining a custom format
7.5.2 Defining a format from a data set
7.5.3 PROC FORMAT syntax
7.6 Clever uses of formats
7.6.1 Where formats can be used
7.6.2 Regrouping variables
7.6.3 Statistical analysis procedures
7.6.4 Counting missing observations
7.6.5 Conditional display
7.7 Chapter summary
Chapter 8 Water Chestnut Blossoms and Floating-Heart Vines Trail the Twin Oars: Graphics in Endless Variety
8.1 How much do you know about graphics software?
8.2 Ugly, rigid, and too hard to handle
8.3 A glance at the graphics systems
8.3.1 SAS/GRAPH
8.3.2 Automatic graphs from statistical procedures
8.3.3 ODS Graphics System
8.4 Basic principles you cannot skip
8.4.1 Less is more
8.4.2 Designing graph elements
8.5 Strategies for choosing a statistical graph
8.5.1 General rules for choosing a statistical graph
8.5.2 Case examples
8.6 The bar chart family (Bar Chart)
8.6.1 Simple bar chart
8.6.2 Frequency chart
8.6.3 Simple bar chart with error bars
8.6.4 Simple percentage bar chart
8.6.5 Clustered bar chart
8.6.6 Clustered bar chart with error bars
8.6.7 Stacked bar chart
8.6.8 Mosaic plot
8.6.9 Mirrored bar chart
8.6.10 Paneled bar chart
8.7 The histogram family (Histogram)
8.7.1 Simple histogram
8.7.2 Overlaid histogram
8.7.3 Mirrored histogram
8.7.4 Paneled histogram
8.8 The box plot family (Box Plot)
8.8.1 Simple box plot
8.8.2 Grouped box plot
8.8.3 Paneled box plot
8.9 The scatter plot family (Scatter Plot)
8.9.1 X-Y scatter plot
8.9.2 X-Y scatter plot with regression
8.9.3 Grouped scatter plot
8.9.4 Paneled scatter plot
8.9.5 Bubble plot
8.9.6 Scatter plot matrix
8.10 The line plot family (Line Plot)
8.10.1 Simple line plot
8.10.2 Line plot with error bars
8.10.3 Grouped line plot with error bars
8.10.4 Paneled line plot with error bars
8.11 The area plot family (Area Plot)
8.11.1 Area plot
8.11.2 Band plot
8.12 The fit plot family (Fit Plot)
8.12.1 Density curve
8.12.2 Regression line
8.12.3 Ellipse
8.12.4 ROC curve
8.12.5 Kaplan–Meier curve
8.12.6 LOESS curve
8.12.7 Spline curve
8.13 The forest plot family (Forest Plot)
8.13.1 Simple forest plot
8.13.2 Subgroup analysis forest plot
8.14 The map family (Map)
8.14.1 Plain map
8.14.2 Statistical map
8.14.3 Heat map
8.15 PROC SGPLOT summary
8.16 Dressing up for full glamour
8.16.1 Switching style files
8.16.2 SG procedure statement options
8.16.3 GTL statement options
8.16.4 Creating style files
8.17 Controlling image properties
8.18 Chapter summary
Chapter 9 Skill Hidden in Plainness, Naturally Formed: Statistical Tables
9.1 What is a statistical table?
9.2 How should statistical tables be designed?
9.3 What are statistical tables used for?
9.3.1 Baseline information tables
9.3.2 Risk factor tables
9.3.3 Outcome effect tables
9.3.4 Subgroup analysis tables
9.4 What are the ways to make tables in SAS?
9.4.1 Statistical summary procedures
9.4.2 The dedicated tabulation procedure: PROC TABULATE
9.4.3 The dedicated reporting procedure: PROC REPORT
9.5 How can statistical tables be produced perfectly?
9.6 A step-by-step walkthrough
9.6.1 A complete example
9.6.2 Key points explained
9.6.3 Taking the skill further
9.7 Chapter summary
Chapter 10 A Wisp of Sandalwood Smoke, Ten Thousand Buddha Names: The Mysteries of Macros
10.1 A few reasons to learn macros
10.2 Getting to know macros
10.3 Macro variables
10.3.1 Properties of macro variables
10.3.2 Ten ways to create macro variables
10.3.3 Macro symbol tables and scope
10.3.4 Masking macro variables
10.3.5 Displaying macro variable values
10.3.6 Referencing macro variables
10.4 Macro programs
10.4.1 Defining and calling macro programs
10.4.2 Syntax for defining and calling macro programs
10.4.3 Storing and encrypting macro programs
10.4.4 Conditional and loop macro statements
10.4.5 Macro functions and their classification
10.5 Developing macro programs
10.5.1 Principles
10.5.2 Steps
10.5.3 Techniques
10.6 Chapter summary
Appendices
Appendix 1 Statistical graphs rendered in different styles
Appendix 2 Selected statistical graphs in color (ggStyle style)
Appendix 3 Selected statistical graphs in color (ggplot2 style)
Appendix 4 SGPLOT plotting reference
Appendix 5 List of the author's original macro tools
Appendix 6 List of quality SAS learning resources
Appendix 7 SAS programming habits and coding conventions
References

List of Figures
Figure 1-1 SAS company information
Figure 1-2 The founders of SAS
Figure 1-3 SAS's distribution across industries and its revenue
Figure 1-4 SAS installation disc files
Figure 1-5 SAS order information file
Figure 1-6 Installed SAS modules
Figure 1-7 Licensed SAS modules
Figure 1-8 SAS SID file
Figure 1-9 Launch links for DMS, EG, and SAS Studio in the windowing environment
Figure 1-10 SAS DMS programming interface
Figure 1-11 SAS EG programming interface
Figure 1-12 SAS Studio programming interface
Figure 1-13 SAS version number
Figure 1-14 Release dates of the major versions of SAS and its analytics products
Figure 1-15 SODA interface after login
Figure 1-16 SODA SAS Studio interface
Figure 1-17 Conveniences of SODA SAS Studio
Figure 1-18 Language editions of SAS
Figure 2-1 Components of Foundation SAS
Figure 2-2 SAS data analysis workflow
Figure 2-3 Relationship among data, data sets, and libraries
Figure 2-4 Creating a permanent SAS library, and engines
Figure 2-5 Permanent and temporary libraries
Figure 2-6 Icons for SAS data files and views
Figure 2-7 Contents of SAS data files and views
Figure 2-8 Descriptor information of a SAS data file
Figure 2-9 Right-clicking a data set to view variable information
Figure 2-10 DATA step and PROC step
Figure 2-11 Data set and variable names in Chinese in SAS
Figure 2-12 The three logical structures of a program
Figure 2-13 Result of iterating over array elements
Figure 2-14 Flowchart of DATA step actions
Figure 2-15 Input buffer and PDV
Figure 2-16 SAS Help official website
Figure 2-17 SAS 9.4 online help documentation
Figure 2-18 PDF version of the SAS ODS Graph help documentation
Figure 2-19 Local SAS help documentation
Figure 2-20 Annotated syntax conventions in the PROC SGPLOT Help
Figure 2-21 Searching SAS Help
Figure 2-22 SAS Help search results: syntax, overview, examples
Figure 3-1 Relationship between SAS/ACCESS and the DBMS
Figure 3-2 A PROC IMPORT text import is in essence a DATA step with INFILE + INPUT statements
Figure 4-1 Overview of variable and observation operations in SAS data sets
Figure 4-2 Data sets generated by PROC step statements and options
Figure 4-3 K-M survival curve generated automatically by PROC LIFETEST
Figure 4-4 RESULTS directory tree
Figure 4-5 Capturing K-M curve data with the ODS OUTPUT statement
Figure 4-6 Tracing results with the ODS TRACE ON statement
Figure 4-7 New variables created by conditional assignment statements
Figure 4-8 New variables assigned in a loop
Figure 4-9 Mismatch between variable type and operation type
Figure 4-10 Result of variable type conversion
Figure 4-11 Computing a mean with the DATA step sum statement
Figure 4-12 Result of grouped statistics
Figure 4-13 Reaching across rows with LAG and DIF
Figure 5-1 Rows-to-columns result with PROC TRANSPOSE and ARRAY
Figure 5-2 Columns-to-rows result with PROC TRANSPOSE and ARRAY
Figure 5-3 Diagram of merging data sets
Figure 5-4 Illustrated example of one-to-one reading
Figure 5-5 Results of each join type implemented with the MERGE statement
Figure 5-6 Diagram of concatenating data sets
Figure 5-7 Diagram of table join types
Figure 5-8 PROC SQL left join result
Figure 5-9 PROC SQL right join result
Figure 5-10 PROC SQL inner join result
Figure 5-11 PROC SQL full join result
Figure 5-12 Set operations illustrated
Figure 5-13 PROC SQL EXCEPT result
Figure 5-14 PROC SQL UNION result
Figure 5-15 PROC SQL INTERSECT result
Figure 5-16 PROC SQL OUTER UNION result
Figure 5-17 Variable dictionary table
Figure 5-18 Result after changing data set permissions with PROC DATASETS
Figure 5-19 Result of modifying variable information with PROC DATASETS
Figure 5-20 The Vcolumn view in the SASHELP library
Figure 5-21 Result of retrieving information on all data sets in the SASHELP library
Figure 5-22 Result of retrieving all variable information for the CARS data set
Figure 6-1 Schematic of a function
Figure 6-2 SAS function lookup dictionary by category
Figure 7-1 Effect of informats and formats
Figure 7-2 Output with and without formats compared
Figure 7-3 Result of user-defined formats with PROC FORMAT
Figure 7-4 Exporting format definition data with the PROC FORMAT CNTLOUT option
Figure 7-5 Result of recategorizing variables with user-defined formats
Figure 7-6 Result of applying user-defined formats in statistical procedures
Figure 7-7 Result of counting missing and nonmissing observations with user-defined formats
Figure 7-8 Result of using user-defined formats for conditional display
Figure 8-1 Robert Allison's graph gallery
Figure 8-2 Sanjay Matange's graph gallery
Figure 8-3 Plot produced with the SAS/GRAPH module
Figure 8-4 K-M curve produced automatically by survival analysis
Figure 8-5 How statistical procedures automatically produce ODS Graphics (schematic)
Figure 8-6 Effect of ODS GRAPHICS statement settings
Figure 8-7 SAS ODS GRAPHICS Designer
Figure 8-8 SAS ODS GRAPHICS Editor result
Figure 8-9 PROC SGPLOT plot result
Figure 8-10 Steps for using GTL
Figure 8-11 Scatter points and regression line drawn with GTL
Figure 8-12 Screenshots of "Less is more"
Figure 8-13 The four chart categories summarized by Andrew Abela
Figure 8-14 Bar chart drawn with the PROC SGPLOT HBAR statement
Figure 8-15 Frequency plot drawn by PROC FREQ through the ODS Graphics system
Figure 8-16 Simple bar chart with error bars drawn with the PROC SGPLOT VBAR statement
Figure 8-17 Percentage bar chart drawn with the PROC SGPLOT HBAR statement
Figure 8-18 Grouped bar chart drawn with the PROC SGPLOT VBAR statement
Figure 8-19 Clustered grouped bar chart with error bars drawn with the PROC SGPLOT VBAR statement
Figure 8-20 Clustered grouped percentage bar chart drawn with the PROC SGPLOT VBAR statement
Figure 8-21 Stacked grouped percentage bar chart drawn with the PROC SGPLOT VBAR statement
Figure 8-22 Mosaic plot drawn with the PROC FREQ PLOTS option
Figure 8-23 Horizontal mirrored grouped bar chart drawn with PROC SGPLOT
Figure 8-24 Vertical mirrored grouped bar chart drawn with PROC SGPLOT
Figure 8-25 Clustered panel bar chart drawn with PROC SGPANEL
Figure 8-26 Stacked panel bar chart drawn with PROC SGPANEL
Figure 8-27 Simple histogram drawn with SGPLOT
Figure 8-28 Overlaid histograms drawn with PROC SGPLOT
Figure 8-29 Overlaid histograms drawn with multiple HISTOGRAM statements
Figure 8-30 Mirrored histogram drawn with GTL
Figure 8-31 Panel histogram drawn with PROC SGPANEL
Figure 8-32 Vertical box plot drawn with PROC SGPLOT
Figure 8-33 Horizontal box plot drawn with PROC SGPLOT
Figure 8-34 Grouped box plot drawn with PROC SGPLOT
Figure 8-35 Panel box plot drawn with PROC SGPANEL
Figure 8-36 Scatter plot drawn with PROC SGPLOT
Figure 8-37 Scatter plot with regression line drawn with PROC SGPLOT
Figure 8-38 Grouped scatter plot drawn with PROC SGPLOT
Figure 8-39 Panel scatter plot drawn with PROC SGPANEL
Figure 8-40 Bubble plot drawn with PROC SGPLOT
Figure 8-41 Scatter plot matrix drawn with PROC SGSCATTER
Figure 8-42 Simple line plot drawn with the PROC SGPLOT SERIES statement
Figure 8-43 Simple line plot drawn with the PROC SGPLOT VLINE statement
Figure 8-44 Line plot with error bars drawn with PROC SGPLOT
Figure 8-45 Grouped line plot drawn with PROC SGPLOT
Figure 8-46 Panel line plot drawn with PROC SGPLOT
Figure 8-47 Area plot drawn with PROC SGPLOT
Figure 8-48 Band plot drawn with PROC SGPLOT
Figure 8-49 Density curves fitted with PROC UNIVARIATE and PROC SGPLOT
Figure 8-50 Regression lines fitted with PROC REG and PROC SGPLOT
Figure 8-51 Ellipse drawn with PROC SGPLOT
Figure 8-52 ROC curve drawn with PROC LOGISTIC
Figure 8-53 Survival curve drawn with PROC LIFETEST
Figure 8-54 Cumulative event curve drawn with PROC LIFETEST
Figure 8-55 LOESS curve drawn with PROC SGPLOT
Figure 8-56 LOESS curve drawn with PROC LOESS
Figure 8-57 Spline curve drawn with PROC SGPLOT
Figure 8-58 Simple forest plot drawn with PROC SGPLOT
Figure 8-59 Subgroup analysis forest plot produced by a GTL macro
Figure 8-60 Subgroup analysis forest plot produced with SGPLOT in SAS 9.40M3
Figure 8-61 Map of the United States drawn with PROC GMAP
Figure 8-62 Statistical map (BLOCK) drawn with PROC GMAP
Figure 8-63 Statistical map (CHORO) drawn with PROC GMAP
Figure 8-64 PROC SGPLOT syntax summary diagram
Figure 8-65 Graph appearance modified with SG procedure statement options
Figure 8-66 Graph appearance modified with GTL statement options
Figure 8-67 Steps for viewing a style
Figure 8-68 Plot drawn with a custom STYLE imitating GGPLOT2
Figure 9-1 Elements of a statistical table illustrated
Figure 9-2 Journal statistical table example 1: simple table
Figure 9-3 Journal statistical table example 2: complex table
Figure 9-4 Journal statistical table example 3: combined table
Figure 9-5 Statistical table styles of the four major medical journals: NEJM, LANCET, JAMA, and BMJ
Figure 9-6 Example of a baseline statistics table
Figure 9-7 Example of a risk factor statistics table
Figure 9-8 Example of an outcome effect table
Figure 9-9 Example of a subgroup analysis statistics table
Figure 9-10 Desired style of the statistical table
Figure 9-11 Statistical table produced with summary statistics procedures
Figure 9-12 Statistical table produced with PROC TABULATE
Figure 9-13 Statistical table produced with PROC REPORT
Figure 9-14 %ggBaseline2 table output example 1
Figure 9-15 %ggBaseline2 table output example 2
Figure 10-1 Clinical trial randomization coding system developed with the %WINDOWS statement
Figure 10-2 Order in which symbol tables are searched when creating or assigning a macro variable
Figure 10-3 Order in which symbol tables are searched when resolving a macro variable
Figure 10-4 Approach to choosing a quoting function for masking macro variables

List of Tables

Table 1-1 Common SAS modules and brief descriptions of their functions
Table 1-2 Brief comparison of the three main SAS programming environments
Table 2-1 Logical structure of a data set
Table 2-2 Arithmetic operators
Table 2-3 Comparison operators
Table 2-4 Logical operators
Table 2-5 Order of evaluation in compound expressions
Table 2-6 Specific actions in the compilation and execution phases
Table 3-1 Summary of data reading strategies
Table 5-1 Overview of data set operations in a library
Table 5-2 Forms of merging data sets and how to implement them
Table 5-3 List of all tables in the SAS Dictionary library
Table 5-4 Tables in the SAS Dictionary library and their corresponding views in the SASHELP library
Table 6-1 Function categories and their descriptions
Table 6-2 Commonly used functions
Table 7-1 Commonly used formats
Table 7-2 Commonly used informats
Table 7-3 Summary of the uses of formats
Table 8-1 The five components of the ODS Graphics System
Table 8-2 SAS ODS Graphics procedures
Table 8-3 Strategies for choosing statistical graphics
Table 8-4 Variable list of the SASHELP.CARS data set
Table 8-5 Output destinations and their supported image formats
Table 10-1 Macro functions by category
Table 10-2 Macro program development steps
Appendix Table 1 Compatibility and syntax overview of the plot statements in the SGPLOT and SGPANEL procedures
Appendix Table 2 Plot statements of the SGPLOT and SGPANEL procedures
Appendix Table 3 Other optional statements of the SGPLOT procedure
Appendix Table 4 Other optional statements of the SGPANEL procedure
Appendix Table 5 Attribute options of plot statements
Appendix Table 6 List of the author's original shared macros

List of Programs

Program 1-1 Viewing installed and licensed SAS modules
Program 1-2 Getting the SAS version number
Program 2-1 Creating a permanent library with the LIBNAME statement
Program 2-2 Viewing a data set's descriptor information and data values
Program 2-3 Creating a data file with the SET statement
Program 2-4 Creating a SAS view
Program 2-5 The nature of SAS dates, times, and datetimes
Program 2-6 Chinese data set and variable names in SAS
Program 2-7 Programming style: tidy versus messy
Program 2-8 Constants in SAS
Program 2-9 Demonstration of SAS language elements
Program 2-10 IF-ELSE/THEN example
Program 2-11 IF-ELSE combined with DO-END
Program 2-12 DO loop statement
Program 2-13 DO WHILE and DO UNTIL loop statements
Program 2-14 Defining an array
Program 2-15 Accessing array elements
Program 2-16 Examples of using functions and CALL routines
Program 2-17 The simplest SQL procedure
Program 2-18 PROC SQL SELECT statement with all clauses
Program 2-19 Macro variables
Program 2-20 Defining and calling a MACRO
Program 2-21 PDV demonstration program
Program 2-22 Verifying the PDV
Program 2-23 Example programs for @ and @@
Program 2-24 Distinguishing @ from @@
Program 3-1 Accessing DB2 data files with the LIBNAME statement
Program 3-2 Importing an SPSS file
Program 3-3 Reading a well-structured EXCEL file
Program 3-4 Reading data from a specific range of an EXCEL file
Program 3-5 Reading a CSV file with PROC IMPORT
Program 3-6 Importing tab-delimited and space-delimited text with PROC IMPORT
Program 3-7 Entering fourfold (2×2) table data with the INPUT statement
Program 3-8 List input with the INPUT statement
Program 3-9 Column input with the INPUT statement
Program 3-10 Formatted input with the INPUT statement
Program 3-11 Named input with the INPUT statement
Program 3-12 Mixed input styles with the INPUT statement
Program 3-13 Export example with PROC EXPORT
Program 4-1 Examples of the various uses of the DATA statement
Program 4-2 Creating a data set with PROC SQL
Program 4-3 Generating new data sets with PROC step statements and options
Program 4-4 Capturing K-M curve data with the ODS OUTPUT statement
Program 4-5 Tracing PROC step output with the ODS TRACE ON statement
Program 4-6 Filtering observations with the IF and WHERE statements
Program 4-7 Filtering observations with the WHERE option
Program 4-8 RENAME with KEEP and DROP
Program 4-9 An erroneous program full of pitfalls
Program 4-10 Correcting the erroneous program full of pitfalls
Program 4-11 Selecting variables and observations with PROC SQL
Program 4-12 Creating new variables with conditional assignment statements
Program 4-13 Batch assignment with a loop
Program 4-14 Creating new variables with the LENGTH statement
Program 4-15 Getting the number of observations in a data set programmatically
Program 4-16 Efficiency comparison of programmatic methods for getting the number of observations
Program 4-17 Random sampling using compile-time variables and temporary variables
Program 4-18 Creating new variables with CALL routines
Program 4-19 Converting variable types
Program 4-20 Creating a grouping variable with IF-ELSE
Program 4-21 Creating a grouping variable with SELECT-WHEN
Program 4-22 Creating a grouping variable with PROC SELECT-CASE
Program 4-23 Creating a grouping variable with a user-defined format
Program 4-24 Finding missing variables
Program 4-25 Computing a mean with the DATA step sum statement
Program 4-26 Computing a mean with RETAIN and an assignment statement
Program 4-27 Cumulative sums by group
Program 4-28 FIRST.VAR and LAST.VAR created by the BY statement
Program 4-29 Column-wise operations with PROC SQL
Program 4-30 Column-wise operations with PROC MEANS
Program 4-31 Reaching across rows with the LAG and DIF functions
Program 5-1 Modifying all observations with the MODIFY statement
Program 5-2 Modifying matching observations with the MODIFY statement
Program 5-3 Modifying matching observations with the MODIFY statement, with the implicitly omitted statements written out
Program 5-4 Using OUTPUT, REMOVE, and REPLACE with the MODIFY statement
Program 5-5 Updating a data set with the UPDATE statement
Program 5-6 Rows to columns with PROC TRANSPOSE and ARRAY
Program 5-7 Columns to rows with PROC TRANSPOSE and ARRAY
Program 5-8 One-to-one reading
Program 5-9 One-to-one merging
Program 5-10 Match-merging data sets with the MERGE + BY statements
Program 5-11 Implementing the various join types with the MERGE statement
Program 5-12 Implementing the various join types with PROC SQL
Program 5-13 Concatenating data sets with the SET statement
Program 5-14 Concatenating data sets with PROC APPEND
Program 5-15 Concatenating data sets with PROC SQL
Program 5-16 PROC SQL left join
Program 5-17 PROC SQL right join
Program 5-18 PROC SQL inner join
Program 5-19 PROC SQL full join
Program 5-20 PROC SQL EXCEPT operation
Program 5-21 PROC SQL UNION operation
Program 5-22 PROC SQL INTERSECT operation
Program 5-23 PROC SQL OUTER UNION operation
Program 5-24 Building a variable dictionary table
Program 5-25 Getting descriptor information with PROC DATASETS
Program 5-26 Modifying data set labels and permissions with PROC DATASETS
Program 5-27 Selecting, copying, renaming, and deleting data sets with PROC DATASETS
Program 5-28 Keeping and deleting data sets with PROC DATASETS
Program 5-29 Getting variable information with PROC DATASETS
Program 5-30 Modifying variable information with PROC DATASETS
Program 5-31 Viewing the structure of data dictionary tables with PROC SQL
Program 5-32 Retrieving information on all data sets in the SASHELP library
Program 5-33 Retrieving all variable information for the CARS data set
Program 6-1 SAS function examples
Program 6-2 SAS CALL routine examples
Program 6-3 Brute-force mean vs. function-based mean
Program 6-4 Brute-force sorting vs. sorting with a CALL routine
Program 6-5 Function syntax examples
Program 6-6 CALL routine syntax examples
Program 6-7 User-defined functions with PROC FCMP
Program 7-1 Informats and formats
Program 7-2 List of built-in SAS formats
Program 7-3 User-defined formats with PROC FORMAT
Program 7-4 Exporting format definition data with the PROC FORMAT CNTLOUT option
Program 7-5 Defining formats from a data set
Program 7-6 Scope of format usage
Program 7-7 Recategorizing variables with user-defined formats
Program 7-8 Applying user-defined formats in statistical procedures
Program 7-9 Counting missing and nonmissing observations with user-defined formats
Program 7-10 Using user-defined formats for conditional display
Program 8-1 Plotting example with the SAS/GRAPH module
Program 8-2 K-M curve produced automatically by survival analysis
Program 8-3 ODS GRAPHICS statement settings
Program 8-4 SAS ODS GRAPHICS Editor
Program 8-5 Plotting with PROC SGPLOT
Program 8-6 Drawing scatter points and a regression line with GTL
Program 8-7 Drawing bar charts with the PROC SGPLOT HBAR and VBAR statements
Program 8-8 Drawing a frequency plot with PROC FREQ through the ODS GRAPHICS system
Program 8-9 Drawing a simple bar chart with error bars with the PROC SGPLOT VBAR statement
Program 8-10 Drawing a percentage bar chart with the PROC SGPLOT HBAR statement
Program 8-11 Drawing a grouped bar chart with the PROC SGPLOT VBAR statement
Program 8-12 Drawing a clustered grouped bar chart with error bars with the PROC SGPLOT VBAR statement
Program 8-13 Drawing a grouped percentage bar chart with the PROC SGPLOT VBAR statement
Program 8-14 Drawing a mosaic plot with the PROC FREQ PLOTS option
Program 8-15 Drawing a mirrored grouped bar chart with PROC SGPLOT
Program 8-16 Drawing a panel bar chart with PROC SGPANEL
Program 8-17 Simple histograms with the SGPLOT and UNIVARIATE procedures
Program 8-18 Drawing overlaid histograms with PROC SGPLOT
Program 8-19 Drawing overlaid histograms with multiple HISTOGRAM statements
Program 8-20 Drawing a mirrored histogram with GTL
Program 8-21 Drawing a panel histogram with PROC SGPANEL
Program 8-22 Drawing a box plot with PROC SGPLOT
Program 8-23 Drawing a grouped box plot with PROC SGPLOT
Program 8-24 Drawing a panel box plot with PROC SGPANEL
Program 8-25 Drawing a scatter plot with PROC SGPLOT
Program 8-26 Drawing a scatter plot with regression line with PROC SGPLOT
Program 8-27 Drawing a grouped scatter plot with PROC SGPLOT
Program 8-28 Drawing a panel scatter plot with PROC SGPANEL
Program 8-29 Drawing a bubble plot with PROC SGPLOT
Program 8-30 Drawing a scatter plot matrix with PROC SGSCATTER
Program 8-31 Drawing a simple line plot with the PROC SGPLOT SERIES statement
Program 8-32 Drawing a simple line plot with the PROC SGPLOT VLINE statement
Program 8-33 Drawing a line plot with error bars with PROC SGPLOT
Program 8-34 Drawing a grouped line plot with PROC SGPLOT
Program 8-35 Drawing a panel line plot with PROC SGPANEL
Program 8-36 Drawing an area plot with PROC SGPLOT
Program 8-37 Drawing a band plot with PROC SGPLOT
Program 8-38 Fitting density curves
Program 8-39 Fitting regression lines
Program 8-40 Drawing an ellipse with PROC SGPLOT
Program 8-41 Drawing an ROC curve with PROC LOGISTIC
Program 8-42 Drawing survival and cumulative event curves with PROC LIFETEST
Program 8-43 Drawing LOESS curves with PROC SGPLOT and PROC LOESS
Program 8-44 Drawing a spline curve with PROC SGPLOT
Program 8-45 Drawing a simple forest plot with PROC SGPLOT
Program 8-46 Core program for a subgroup analysis forest plot with SGPLOT in SAS 9.40M3
Program 8-47 Drawing a map of the United States with PROC GMAP
Program 8-48 Drawing a statistical map with PROC GMAP
Program 8-49 Drawing a heat map with PROC GMAP
Program 8-50 Default ODS HTML output style
Program 8-51 Changing the output style
Program 8-52 Modifying graph appearance with SG procedure statement options
Program 8-53 Modifying graph appearance with GTL statement options
Program 8-54 Viewing a style's source programmatically
Program 8-55 Defining a custom style with PROC TEMPLATE
Program 8-56 Custom style file imitating the grid of the R package ggplot2
Program 8-57 Setting image attributes with ODS statements
Program 8-58 Image format settings for academic journals
Program 8-59 Image format settings for academic journals in a single statement
Program 9-1 Producing a statistical table with summary statistics procedures
Program 9-2 Producing a statistical table with PROC TABULATE
Program 9-3 Producing a statistical table with PROC REPORT
Program 9-4 A complete example of statistical tabulation in SAS
Program 9-5 %ggBaseline2 table example 1
Program 9-6 %ggBaseline2 table example 2
Program 10-1 Quickly retrieving system information
Program 10-2 Repeated execution of SAS code with a macro
Program 10-3 Conditional execution of SAS code with a macro
Program 10-4 Defining macro variables
Program 10-5 Creating macro variables with PROC SQL
Program 10-6 Creating macro variables with CALL SYMPUT and SYMPUTX
Program 10-7 Viewing macro symbol tables and determining macro scope
Program 10-8 Demonstration of masking macro variables
Program 10-9 Displaying macro variable values with the SYMBOLGEN system option
Program 10-10 Displaying macro variable values with the %PUT statement
Program 10-11 Displaying macro variable values by category with the %PUT statement
Program 10-12 Direct macro variable references
Program 10-13 Indirect macro variable references
Program 10-14 Separating macro variables from text
····································································271 程序10-15 宏程序定义与调用示例 ·································································272 程序10-16 存储与加密宏程序 ·······································································274 程序10-17 宏函数应用示例 ··········································································276 程序10-18 S1:硬代码初步实现宏任务 ···························································278 程序10-19 S2:硬代码初步宏变量化 ······························································279 程序10-20 S2:硬代码初步宏参数化 ······························································280 程序10-21 S3:宏代码测试优化 ····································································280 SAS编程演义 语法卡片目录 语法2-1 定义数组语句ARRAY语法参考卡片 ····················································44 语法2-2 PROC SQL SELECT语句语法参考卡片 ················································47 语法3-1 LIBNAME语句访问DBMS数据文件语法参考卡片 ·································67 语法3-2 PROC IMPORT语法参考卡片 ·····························································69 语法3-3 LIBNAME语句访问PC文件语法参考卡片 ············································70 语法3-4 INPUT语句一般语法参考卡片 ····························································75 语法3-5 INPUT语句列表读入式语法参考卡片 ···················································75 语法3-6 INPUT语句列读入式语法参考卡片 ······················································77 语法3-7 INPUT语句格式化读入式语法参考卡片 ················································77 语法3-8 INPUT语句命名读入式语法参考卡片 ···················································78 语法3-9 PROC EXPORT语法参考卡片 ·····························································80 语法4-1 DATA语句语法参考卡片 ···································································84 语法4-2 INPUT与PUT 函数语法参考卡片 ························································99 语法4-3 累加语句语法参考卡片 ·····································································104 语法4-4 RETAIN语句语法参考卡片 ·······························································104 语法5-1 MODIFY语句语法参考卡片 
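Syntax 5-2 UPDATE statement syntax reference card
Syntax 5-3 PROC TRANSPOSE syntax reference card
Syntax 5-4 PROC APPEND syntax reference card
Syntax 5-5 PROC SQL table join core syntax reference card
Syntax 6-1 Function syntax reference card
Syntax 6-2 CALL routine syntax reference card
Syntax 7-1 Format name syntax reference card
Syntax 7-2 PROC FORMAT syntax reference card
Syntax 8-1 PROC SGPLOT syntax reference card
Syntax 10-1 Macro definition and invocation syntax reference card

Chapter 1 If Only Life Were as at First Sight: Getting to Know SAS

The Qing dynasty poet Nalan Xingde once wrote: "If only life were always as at first sight, why would the autumn wind grieve the painted fan?" From my first encounter with SAS to a working life that is now tangled up with SAS every day, SAS and I should by rights be long past that "first sight" stage. Yet whenever a question sends me back to the SAS Help documentation to puzzle something out, I am often surprised in the way the old verse describes: "At dusk the north wind blows the rain away, and a few lean peaks emerge from the clouds." I do not know whether I will one day shelve SAS like a fan in autumn, but for now I still have some passion left, so let me record and share the journey with SAS while it is fresh.

1.1 The past is not like smoke

There are many entertaining stories around SAS. From the simple matter of how to pronounce the name, to the rather colorful company history, to the rough-and-tumble world of the online community, each is worth a long chat over a cup of warm wine.

1.1.1 A pronunciation that plays tricks on you

I first heard of SAS (Statistical Analysis System) in an undergraduate course on statistical software packages. At the time I thought the teacher was talking about Severe Acute Respiratory Syndrome (SARS), because the teacher pronounced it roughly as "萨死" (sa-si), which brought back the painful nationwide SARS epidemic of 2003 that we had only just lived through.

Later I came to notice how poor, or rather how casual, our English pronunciation was. The proper pronunciation of SAS is roughly "赛死" (sai-si), which is why the company registered its Chinese name as "赛仕" (Saishi), whereas SARS is properly pronounced "萨而死" (sa-er-si), with a retroflex sound in the middle. Even more comical is the reading of SPSS: many statistics teachers blurt out "死怕死" (si-pa-si) in public without the slightest embarrassment, when in fact, since SPSS contains no vowel, it should be spelled out letter by letter: "爱死辟,爱死爱死" (S-P-S-S).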
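1.1.2 A history with some fun in it

The term SAS (Statistical Analysis System) can be read at several levels. It can mean one of the most renowned statistical analysis systems in the industry, a venerable statistical programming language, or a rather legendary vendor of business analytics software and services.

Figure 1-1 SAS company information (https://en.wikipedia.org/wiki/SAS_Institute)

SAS as a statistical analysis system and a statistical programming language is much older than SAS as a commercial company (see Figure 1-1). In 1966 the United States Department of Agriculture (USDA) entrusted the computerization and statistical analysis of its massive agricultural data to the University Statisticians Southern Experiment Stations, hoping for a general-purpose statistical package that could analyze all the agricultural data it collected. This consortium brought together eight government-funded universities led by North Carolina State University (NCSU). They received research funding from the USDA and a grant from the National Institutes of Health (NIH), and the final result was the Statistical Analysis System (SAS). NCSU staff members Jim Goodnight and Jim Barr (also known as Anthony James Barr or Tony Barr) led the project: Barr created the overall framework, while Goodnight implemented the features on top of it and extended the system's capabilities. After the NIH ended its funding in 1972, the consortium members agreed to fund the work jointly at 5,000 US dollars per member per year, which allowed NCSU to keep developing and maintaining the system in support of their statistical analysis needs. Later, Jane Helwig, a staff member of the NCSU statistics department, and John Sall, a graduate student and programmer, also joined the project. In 1976 they left NCSU and founded the private company SAS Institute Inc. in an office building at 2806 Hillsborough Street, across from the university. In the early days of the company, Jim Barr, Jim Goodnight and John Sall wrote the code, while Jane Helwig planned and wrote the SAS documentation (see Figure 1-2). Today Jim Goodnight is still the company's CEO, and John Sall is its second in command and the creator of JMP, the sibling product of SAS. Jim Barr later went his own way and founded Barr Systems. As for Jane Helwig, little information about her can be found online.

Figure 1-2 Several founders of SAS (http://saslist.com/wiki/index.php?pic-view-31-60.html)

In the year it was founded, the company did two important things: it released the first commercial version of the SAS software, and it held the first SAS Users Group International (SUGI) conference. Both had far-reaching significance for the company and for SAS users. One year after its release, SAS made the DataPro software honor roll run by Datamation magazine, and it stayed on the list for the following three years. The SAS system has gone through many changes since its first release. Early versions ran on mainframes. In 1985 the company released SAS 5 for PC DOS. In 1988 it released SAS 6, rewritten entirely in C, and began to support Windows. SAS 8 added Linux support in 2000. At the time of writing (2017) the latest version is 9.4, which includes many features suited to the big-data era, such as high-performance statistical modeling, distributed in-memory computing and visual statistical analysis. For more on the history of the software, the official 2-minute-24-second promotional video "SAS Timeline: A History of the Analytics Leader" is recommended.

Since the first successful SUGI, attendance rose quickly every year, and the conference became a feast of sharing and exchange for SAS users worldwide. After SUGI was renamed SAS Global Forum (SGF) in 2007, it attracted SAS users from even more industries around the world. Today SAS is still the world's largest vendor of business analytics software and services. It is said that Jim Goodnight has consistently refused to take the company public in order to keep its strategy independent. In a tide in which traditional statistical software companies either disappeared or were acquired, SAS has managed to stand alone. The company currently has more than 10,000 employees worldwide and customers in 149 countries, in fields such as banking, government, services, insurance and life sciences (see Figure 1-3). Its record has earned it many honors, such as best company for working mothers, best employer worldwide and a place among the 100 most popular companies. This has a lot to do with founder Jim Goodnight's view of people: "If you treat employees as if they make a difference to the company, they will make a difference."

In 2000 SAS adopted a new logo and the slogan "THE POWER TO KNOW." Understanding the world through data is the ultimate goal of data analysis, and providing the power to do so is the direction in which SAS works. On this slogan, the company's official promotional video "Know all the possibilities with SAS High-Performance Analytics" is likewise recommended.

Figure 1-3 Industry distribution and revenue of SAS (http://www.sas.com/content/dam/SAS/en_us/doc/other1/company-overview-annual-report.pdf)

1.1.3 A jianghu that never fades

Telling the history of SAS is not really my job. Chatting about the exploits of SASors (SAS enthusiasts) in the online jianghu, the freewheeling world of the martial-arts novels, is much more fun.

In the years before Weibo and WeChat took off, the bulletin board system (BBS) ruled the Chinese internet. Registering a handle on a forum was like a martial artist acquiring a nickname, such as Wu Song the Pilgrim, Yan Qing the Prodigal, Lu Zhishen the Flowery Monk or Hu Sanniang the Ten Feet of Steel, and with it you could roam the online jianghu. The SAS martial world probably first took shape in the SASOR forum (www.sasor.com) founded by imoen, with prominent figures such as SAS_Deam and data _null_. SAS_Deam left few traces online: only two articles can still be found, 《关于SAS 的零碎印象》 (Scattered Impressions of SAS) and 《SAS 语言管窥》 (A Glimpse of the SAS Language). data _null_ is still active on the SAS-L mailing list hosted by UGA. The SAS Chinese forum set up by Shiyiming (http://www.mysas.net/) and the SAS resource and news list run by sxlion (http://saslist.net/) also hold many SASor memories. The SAS board of the Renda Economics Forum (http://bbs.pinggu.org/forum-68-1.html) may be one of the few forums still fighting Weibo and WeChat for traffic.

The forums were full of familiar IDs. To solve one small problem, experts would post code one after another just to see who did it best, as lively as a martial-arts tournament. Once Weibo and WeChat took over, reposts and likes became the norm, and competing with code and debating in threads became a thing of the past. As sxlion wrote: "Sadly, good times do not last. Springs go and autumns come. The real-life SASors behind the forum IDs marry and have children, move to other cities, or change jobs and careers. Life shifts over the years. Newcomers arrive on the forum all the time, but old friends are seldom there. The good times have become a scarce memory." Fortunately the threads that settled in the forums keep a record of those years. The forums may be fading, but WeChat public accounts and other online communities appear in their place, and SASors keep being renewed. As the saying goes, where there are people there is a jianghu, and the people are the jianghu. As long as SASors remain, how could the SAS jianghu fade away?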
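1.2 A one-sided choice

If we only want to fire one shot and leave, certain that we will hardly ever pick up a gun again, SPSS may suit us better. If we want to be professional marksmen, we should choose a more professional weapon.

As the old saying goes, "a craftsman who wishes to do his work well must first sharpen his tools." A suitable statistical tool makes the work twice as effective, while a clumsy one can waste a great deal of valuable time and resources. Current statistical software falls into two broad classes: software based on a graphical user interface (GUI), such as SPSS, and software based on a command line interface (CLI), such as SAS, R and Stata. Each has strengths and weaknesses. A GUI carries out data processing and analysis through menu clicks, which is simple for non-statisticians, but it is poorly reproducible and awkward to document or audit, and a menu interface can hold only a limited number of procedures and options, so it cannot keep up quickly with developments in the field. A CLI carries out the same work through commands or a programming language. The workflow is flexible, it has clear advantages for automated and repetitive jobs, it suits professional statisticians and, more importantly, it fits the increasingly popular idea of reproducible research.

For a veteran statistical package to have dominated the field and still be independently run, as SAS has, is rare. In the big-data era SAS has kept pace by developing features and products for large-scale data processing, such as SAS Grid Computing, SAS In-Database and SAS In-Memory Analytics. The installation files are large and installation is somewhat laborious, but this one-off effort pays for itself later. As for the cost we worry about, leave that to a wealthy employer. Without a wealthy employer, use SAS University Edition. If even installing University Edition is too much trouble, try the free cloud platform SODA (SAS OnDemand for Academics). Either way, programmers and statisticians need not worry about who pays for the software. In addition, anyone hoping to work in biomedicine, and clinical trials in particular, should start working with SAS as early as possible.

Of course, if you are already used to another CLI statistical package, I am not going to plead with you to switch to SAS. That is not the purpose of this book. But if you will use SAS or are already using it, some of what this book discusses may be exactly what you do not want to miss.

1.3 Software architecture

What most people call "the SAS software" is really SAS Foundation plus the Display Management System (DMS) on Windows. SAS Foundation comprises Base SAS, data management and access, data analysis, reporting and graphics, visualization and discovery, business solutions, user interfaces, application extensions and web applications, with Base SAS at the core. The design idea is to build on Base SAS and add specific modules for specific tasks. For statistical analysis, add SAS/STAT. For statistical graphics, add SAS/GRAPH. To import external data of various kinds, add SAS/ACCESS. For time series analysis, add SAS/ETS. For genetic analysis, bring out SAS/GENETICS. The most basic SAS installation needs only Base SAS, SAS/STAT and SAS/GRAPH. As analysis needs grow, SAS keeps releasing new modules, procedures and options. By an incomplete count, SAS 9.4 has 75 functional modules. Common modules and their functions are listed in Table 1-1.

Table 1-1 Common SAS modules and their functions

Base SAS
    Function: the core module of the SAS system, required to run SAS. It consists of the DATA step, the PROC step, the macro facility (MACRO), ODS and the SAS windowing environment.
    Main procedures: basic statistical procedures FREQ, MEANS, CORR and UNIVARIATE; ODS graphics procedures SGPLOT, SGPANEL, SGRENDER.
SAS/ACCESS
    Function: interface to third-party data sources (various relational databases). Each data source needs a separate license; for example, exchanging data with Excel requires a license for the ACCESS to PC Files component.
    Main procedures: the IMPORT and EXPORT procedures, the ACCESS and DBLOAD procedures, and the LIBNAME statement.
SAS/STAT
    Function: the statistical analysis module, with 75 statistical procedures so far and more being added. The "authority" of SAS rests on these procedures.
    Main procedures: classic procedures such as t tests (TTEST), analysis of variance (ANOVA), regression (REG), general linear models (GLM), generalized linear models (GENMOD), mixed effects models (MIXED), clustering (CLUSTER, VARCLUS, FASTCLUS), discriminant analysis (DISCRIM), factor analysis (FACTOR), principal components (PRINCOMP), logistic regression (LOGISTIC) and survival analysis (LIFETEST, LIFEREG, PHREG).
SAS/GRAPH
    Function: graphics module for common statistical graphs.
    Main procedures: GCHART, GPLOT, GBARLINE, G3D, and others.
SAS/ETS
    Function: time series analysis module.
    Main procedures: ARIMA, AUTOREG, COUNTREG, QLIM, ESM, UCM, MODEL, and others.
SAS/IML
    Function: matrix language module.
    Main procedures: IML.
SAS/GENETICS
    Function: genetic analysis module.
    Main procedures: ALLELE, CASECONTROL, FAMILY, PSMOOTH, and others.
SAS/OR
    Function: operations research and optimization module.
    Main procedures: LP, NLP, OPTLP, OPTMILP, OPTNET, OPTMODEL, and others.

In the big-data wave SAS keeps extending its components and products. SAS Visual Analytics, SAS Visual Statistics, SAS Cloud Analytics, SAS Viya and a host of other products aimed at big-data analysis are gradually coming into view for programmers, data miners, statistical analysts and data scientists. For the everyday programmer and statistical analyst, however, Base SAS + SAS/ACCESS + SAS/STAT + SAS/GRAPH is largely enough.

1.4 Installation and licensing

SAS can be installed locally or deployed on a server. Either way, using SAS and its modules requires two things. (1) The module must be installed: to import Excel files, for example, SAS/ACCESS to PC Files must be installed. (2) The module must be licensed: installing a module is not enough, you also need permission to use it. That permission comes from the SAS Installation Data (SID) file, which plays the role that a registration code plays for other software.

The company's sales strategy is to bundle the basic modules (Base SAS, SAS/ACCESS, SAS/STAT, SAS/GRAPH) and provide the matching SID file. Users obtain the installation package and license file on a rental basis, which is the origin of the saying that "SAS is leased, not sold". Additional modules have to be added to the order before their installation media and SID can be obtained.

Installation media are usually mailed as discs. The number of discs depends to some extent on the modules ordered, typically 6 to 7 discs of about 4 GB each. Before installing, it is advisable to use UltraISO to copy each disc into an ISO file (see Figure 1-4). This guards against disc damage and also makes installation easier, because installing from a virtual drive avoids swapping discs in and out.

Figure 1-4 SAS installation disc files

Open the first disc with the virtual-drive software Daemon Tools Lite. Under the install_doc folder of the SAS installation files there is a folder named with a 6-character combination of digits and letters. This is the order-number folder. The SOI.HTML file inside it is the SAS Order Information file, which summarizes the products and modules purchased, and therefore which modules the installation files contain. Figure 1-5 shows the SOI of one research center. It shows that the center ordered not only SAS/ACCESS Interface to PC Files, which can import Excel files, but also interfaces to DB2, Hadoop, MySQL and many other databases, and that beyond the usual SAS/STAT it also purchased the matrix module SAS/IML, the time series module SAS/ETS and the operations research module SAS/OR.

Figure 1-5 SAS Order Information file

Once SAS is installed, you can check programmatically (with the PRODUCT_STATUS and SETINIT procedures, see Program 1-1) which modules are installed (see Figure 1-6) and which are licensed (see Figure 1-7). For a more complete installation report, search the web for the macro %sasinstallreporter. Running it writes the installed modules, the licensed modules and their expiration dates to the log. (If running code still feels unfamiliar, come back and try this after reading Section 1.6.)

Program 1-1 Viewing installed and licensed SAS modules

    *===Lines that start with an asterisk are comments===;
    *===List the installed SAS modules;
    proc product_status;
    run;
    *===List the licensed SAS modules;
    proc setinit;
    run;
    *===Produce the full installation report;
    *===Change the path of the SAS program file to wherever you saved it;
    %include "D:\03 Writting\01 SAS编程演义\03 Code\fusion_20390_1_sasinstallreporter4u.sas";
    %sasinstallreporter;

Figure 1-6 Installed SAS modules

Figure 1-7 Licensed SAS modules

The output of %sasinstallreporter covers licensed and installed product modules, solutions, hot fixes, other applications and clients, customer version information for SAS products, the Java environment, deployment information, the Windows services defined and deployed, and more. Because the output is long it is not shown here. Readers can run it and look for themselves.

The SID file is essentially a plain text (TXT) file that can be opened in Notepad, as shown in Figure 1-8. It contains the SAS version, the corresponding operating system platform and similar information. Most importantly, it contains a PROC SETINIT step, which updates the module numbers, names and expiration dates of the purchased products.

Figure 1-8 SAS SID file (annotated: product expiration dates; product module numbers and names)

1.5 Run modes

SAS has several run modes: windowing environment mode, noninteractive mode, batch mode and interactive line mode. In brief:

● Windowing environment mode: the user writes, submits and runs SAS programs and views the log and results inside the SAS Display Management System (DMS). This is the dominant mode on Windows and the one most users know best.
● Noninteractive mode: runs a SAS program stored in an external file directly, without starting DMS, and saves the results and log to a specified location.
● Batch mode: schedules SAS jobs, for example running a program automatically at regular intervals. It is common in business intelligence solutions.
● Interactive line mode: a mode used on UNIX in which program statements are entered sequentially. It is rarely used.

1.6 Programming interfaces

This book uses the most common mode, the windowing environment that ships with SAS. Even within that mode there is a choice of programming environment and interface: (1) the Display Management System (DMS); (2) SAS Enterprise Guide (EG); (3) SAS Studio. With a complete installation, the Windows Start menu shows launch links for all three, as in Figure 1-9.

1.6.1 The DMS interface

DMS (Display Management System) is the interface SAS beginners meet most often, shown in Figure 1-10. Its five parts are the Editor, Log, Output, Results and Explorer panes, which can be switched using the tabs at the bottom.

Figure 1-9 Launch links for DMS, EG and SAS Studio in the windowing environment

Figure 1-10 The SAS DMS programming interface

1.6.2 The EG interface

SAS EG (Enterprise Guide) is the client in a client/server (C/S) architecture. It manages resources from the point of view of a project. Its interface consists mainly of the project tree, the workspace and the resources pane, as shown in Figure 1-11. The Code, Log and Results views, which correspond to those in DMS, all live in the workspace.

Figure 1-11 The SAS EG programming interface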
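1.6.3 The SAS Studio interface

SAS Studio is based on a browser/server (B/S) architecture and uses a browser to interact with the local SAS server. It consists of a navigation pane on the left and a work area on the right, as shown in Figure 1-12. As in SAS EG, the work area has Code, Log and Results views similar to those in DMS.

How do the three environments, SAS DMS, SAS EG and SAS Studio, differ, and how should you choose? The programming language is the same in all three, but the latter two improve considerably on the interface and its features. Table 1-2 summarizes further differences.

Figure 1-12 The SAS Studio programming interface

Table 1-2 Brief comparison of the three SAS programming environments

Interface layout
    SAS DMS: five panes: Editor, Log, Output, Results, Explorer.
    SAS EG: three panes: workspace, project tree, resources pane.
    SAS Studio: two areas: work area and navigation pane.
Architecture
    SAS DMS: a graphical user interface (GUI) client that integrates data access, code execution and result delivery.
    SAS EG: client/server (C/S); needs a SAS server (which may be local) to access data and execute code.
    SAS Studio: browser/server (B/S).
Platform support
    SAS DMS: Windows, UNIX, Linux, z/OS.
    SAS EG: Windows.
    SAS Studio: Windows, UNIX, Linux, macOS.
Editor assistance
    SAS DMS: apart from syntax coloring, it leaves much to be desired as a code editor.
    SAS EG: syntax coloring, autocompletion, syntax hints, code formatting, warning and error location.
    SAS Studio: syntax coloring, autocompletion, syntax hints, warning and error location.
Strengths
    SAS DMS: supports %WINDOW, DDE, the X command and DM commands; responds faster than the C/S and B/S architectures; custom keyboard macros.
    SAS EG: stored processes, process flows, code generators; richer and more convenient menu-driven operation than DMS.
    SAS Studio: work area similar to EG; a feature-rich navigation pane; distinctive task templates and code snippets.
Weaknesses
    SAS DMS: the editor is too limited, offering basically only syntax coloring.
    SAS EG: slow to respond; some features supported by DMS, such as %WINDOW, DDE, the X command and DM commands, are not supported.
    SAS Studio: fairly slow to respond; some features supported by DMS, such as %WINDOW, DDE, the X command and DM commands, are not supported.

Overall, SAS DMS is the most traditional and the fastest. SAS EG and SAS Studio offer good syntax hints, autocompletion and similar features, which help beginners learn SAS code and program more efficiently. Beginners may want to do more of their programming in SAS Studio. If testing speed matters most, develop in DMS. The final choice, of course, depends on personal preference and the job at hand.

1.7 Versions

At startup SAS writes the software version and the versions of the relevant modules to the Log pane. After startup, the version can also be obtained from the macro variables &SYSVER and &SYSVLONG, as shown in Figure 1-13.

Program 1-2 Getting the SAS version

    %put SAS version: &SYSVER;
    %put SAS version (long): &SYSVLONG;

Figure 1-13 SAS version

SAS has always been cautious, even slow, about new versions. Hu Jiangtang and Rick Wicklin tabulated on their blogs the release dates from SAS 8.0 to SAS 9.4m4 and charted them, as shown in Figure 1-14. In recent years there has been no major new version, but minor releases have been arriving faster and faster.

Figure 1-14 Release dates of the major versions of SAS and its analytical products (horizontal axis: year, 1998 to 2018; vertical axis: version, SAS 8.0 to 9.4m4 and analytical products 9.2 to 14.2; legend: ancient, old, recent) (http://blogs.sas.com/content/iml/2013/08/02/how-old-is-your-version-of-sas-release-dates-for-sas-software.html)

1.7.1 Purchased edition and University Edition

Besides the version differences above, SAS also comes in a purchased edition and a University Edition (I do not know the official terms, so I will use these), and in editions that load different language configurations at startup.

The purchased edition charges an annual fee per module, while SAS University Edition is free to download and use. University Edition is essentially SAS inside a Red Hat system packaged as a virtual machine and accessed through the B/S-based SAS Studio. It includes the BASE, STAT, ACCESS, IML and HPS modules but, regrettably, not the GRAPH module. If you are familiar with ODS Graphics, however, you can do most plotting without GRAPH. See Chapter 8 of this book.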
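Whether a given module is licensed in the edition at hand can also be checked from code rather than by reading the PROC SETINIT log. The following is a minimal sketch, assuming the SYSPROD function (which returns 1 when a product is licensed, 0 when it is a SAS product that is not licensed, and -1 when the name is not recognized as a product). The product names in the list and the data set name work.license_check are only illustrative.

```sas
/* Sketch: look up the license status of a few modules by name. */
/* The product list below is illustrative, not exhaustive.      */
data work.license_check;
  length product $8 status $24;
  do product = 'STAT', 'GRAPH', 'IML', 'ETS';
    code = sysprod(product);            /* 1, 0 or -1 */
    select (code);
      when (1)  status = 'licensed';
      when (0)  status = 'not licensed';
      otherwise status = 'unknown product name';
    end;
    output;                             /* one row per product */
  end;
run;

proc print data=work.license_check noobs;
run;
```

On an edition without SAS/GRAPH one would expect the GRAPH row to show a status other than "licensed", which is a quick cue to fall back on the ODS Graphics procedures.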
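1.7.2 The free cloud edition

If we have not bought SAS, do not want to download University Edition, and find even installation too much trouble, is there still a way to use SAS? There is: SODA, a free cloud edition of SAS. With a network connection we can write SAS code anytime and anywhere.

SODA (SAS OnDemand for Academics) is an online application environment that SAS provides free of charge to the academic community on its private cloud. With SODA we can write and run SAS code in SAS Studio anytime and anywhere. All data and code can be stored in the cloud, all results can be downloaded, and each account has 5120 MB of storage. SODA is arguably the most convenient, quickest and cheapest way for the lazy to learn SAS. If you do not yet have SAS at hand, SODA lets you learn along all the same.

To use SODA, first register at https://odamid.oda.sas.com (an existing SAS community account can log in directly). Registration is simple and asks only for family name, given name, email and country. Shortly after registering you will receive an email titled "You are ready to start using SAS OnDemand for Academics" containing your SODA user name (usually the prefix of your email address). After logging in, click the SAS Studio application (see Figure 1-15) to enter the SAS Studio programming environment and begin your free journey in the cloud.

Figure 1-15 SODA after login

The layout of this SAS Studio (see Figure 1-16) is almost identical to that of a local SAS Studio: a menu bar on top, a navigation pane on the left and a work area on the right. Its engine may differ, though. Click the question mark in the upper right corner to view the information about SAS Studio: the back end is SAS 9.04.01M3 on Linux, which is already the latest version of SAS at the time of writing.

Figure 1-16 The SODA SAS Studio interface (annotated: content window; navigation pane)

Running "proc setinit; run;" on the program tab shows which modules are licensed. Besides the usual Base SAS, SAS/STAT, SAS/GRAPH and SAS Enterprise Guide, the list includes SAS/IML, SAS/ETS, SAS/OR, SAS/QC, SAS/CONNECT and others, and even the data mining and text mining products SAS Enterprise Miner and SAS Text Miner and the visual analytics products SAS Visual Analytics Hub, Visual Analytics Explorer and SAS Visual Analytics Services. It has to be said that SAS is being very generous here.

As Figure 1-17 shows, right-clicking "Files (Home)" on the left lets us upload our own data files to the cloud, and then write code, run it and obtain results in the work area on the right. Data and code can be saved in the cloud and reused at the next login, while results and intermediate data can be downloaded. For details, see the article 《从程序员到数据科学家: SAS 编程基础 (04)》 (From Programmer to Data Scientist: SAS Programming Basics (04)) by Mr. Wu Yinliang, head of the big data and visual analytics product line at SAS. This section does not repeat it.

Figure 1-17 Conveniences of SODA SAS Studio (annotated, in order: download results as HTML, PDF, RTF; download data; print results; upload the data to be analyzed)

1.7.3 Operating system platform editions

SAS currently supports z/OS, UNIX, Linux and Windows. Which operating system versions are compatible with which SAS versions can be looked up on the SAS website (http://support.sas.com/supportos/list) under Supported Operating Systems on the System Requirements page.

There is currently no SAS version for macOS on Apple computers. To use SAS on a Mac there are three options: (1) virtual machine software + Windows + SAS; (2) virtual machine software + SAS University Edition; (3) the free online cloud edition SODA. Alternatively, simply use JMP, the sibling product of SAS.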
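Which platform and version a given session is actually running on, for example when switching between a local installation and SODA, can be read from automatic macro variables. The lines below are a small sketch using &SYSSCP, &SYSSCPL, &SYSVLONG and &SYSENCODING; the label text is arbitrary.

```sas
/* Sketch: report the platform, version and encoding of the current session */
%put Operating system (short): &SYSSCP;
%put Operating system (long) : &SYSSCPL;
%put SAS version             : &SYSVLONG;
%put Session encoding        : &SYSENCODING;
```

The values are written to the log, so the same four lines can be pasted at the top of a program to document the environment in which it was run.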
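1.7.4 Language editions

If the Chinese language pack was selected and Unicode Support configured during installation, the Start menu offers several language editions of SAS: (1) English; (2) Chinese; (3) English with DBCS; (4) Unicode Support, as shown in Figure 1-18. Note that if SAS needs to handle Chinese characters, one of the last three should be chosen. To support Chinese characters and still work in an English environment, choose English with DBCS. Its advantage is that messages appear in English, which makes it easier to look things up afterwards in SAS Help and in search engines. This is the edition I personally recommend.

Figure 1-18 Language editions of SAS

1.8 Chapter summary

This chapter started with gossip and history, discussed the reasons for choosing SAS, and then gave an overview of the software. It should leave the reader with a first, intuitive picture of the SAS architecture, installation tips, license files, run modes, programming interfaces and versions.

Chapter 2 Never Tiring of Clear Songs or Bitter Tunes: Laying a Solid Foundation

The SAS programming system is large and intricate. Whether for tidying raw data, implementing statistical methods or presenting statistical results, knowing the terms and concepts behind it and understanding how it runs will help greatly in reading SAS programs, writing your own, and quickly meeting data management and analysis needs. This chapter therefore digs into the important basic concepts, especially those neglected by the SAS books on the market.

2.1 Foundation SAS

As mentioned in the previous chapter, what most of us meet first and use daily is DMS, the SAS Display Management System on Windows. We interact with Foundation SAS through DMS to get our work done.

2.1.1 What Foundation SAS consists of

On Windows, the SASHOME installation directory contains a SASFoundation folder, which holds components such as ACCESS, BASE and GRAPH. As mentioned in Chapter 1, Foundation SAS as a whole consists of Base SAS, data management and access, user interfaces, reporting and graphics, analytics, visualization and discovery, business solutions, application development and web applications (see Figure 2-1).

Figure 2-1 Components of Foundation SAS

In summary, Foundation SAS provides:

● graphical user interfaces for managing SAS tasks, such as DMS, EG and SAS Studio
● a highly flexible and extensible programming language, the SAS language
● a rich set of built-in SAS procedures
● multi-platform operation on Windows, UNIX and z/OS (OS/390)
● access to almost any data source, such as DB2, Oracle, SYBASE, Teradata, SAP and Microsoft Excel
● almost all mainstream character encodings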
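2.1.2 Base SAS

Base SAS is the core of Foundation SAS and is required to run SAS. It consists of the DATA step, the PROC step, the macro facility, the DATA step debugger, ODS and the SAS windowing environment.

● DATA step: built from the programming language used to manipulate and manage data. DATA step programming is one of the places where the strengths of SAS programming are most concentrated.
● PROC step (SAS procedures): the tools for data processing, statistical analysis and presentation of results. Base SAS has a limited set of procedures. Specific processing and tasks need specific modules, such as SAS/STAT, SAS/ETS and SAS/IML.
● Macro (macro facility): macros are essentially text substitution. They extend and customize SAS programs and carry out repetitive, complex tasks.
● DATA step debugger: traces DATA step execution to help locate run-time errors.
● Output Delivery System (ODS): delivers SAS output in accessible formats, such as listing output (LISTING), HTML, rich text (RTF), PDF and output as data sets.
● SAS windowing environment: the environment for developing and testing SAS programs, most widely the SAS Display Management System (DMS).

The DATA step, the PROC step and macros are the three cores of a SAS program. When we say we are writing a SAS program, we usually mean writing DATA steps, PROC steps and macros in the Enhanced Editor of DMS. Before writing SAS programs, however, let us first look at how SAS processes data.

2.2 The SAS data analysis workflow

As Figure 2-2 shows, unlike R, which keeps all data in memory for computation, SAS reads data from various sources (external raw data, or SAS data files that SAS can open directly) and stores it in SAS data sets on disk. After further tidying and cleaning, the data becomes a data set to which statistical models can be applied directly. The models are then run, the results produced and further arranged into tables, figures or even reports that mix text and graphics. Results can also be captured and saved as data sets for later processing. The tidying is usually done with DATA steps and the analysis with PROC steps. This is the familiar "two-step programming" of SAS: the DATA step prepares the data, the PROC step analyzes it.

Figure 2-2 The SAS data analysis workflow (raw data → DATA step → SAS data set → PROC step → SAS data set or report)

In practice, importing and tidying data is not limited to the DATA step. Skillful use of PROC steps often achieves twice the result with half the effort. Likewise, after the analysis in a PROC step, a DATA step is often needed to tidy the results further so that the final report is easier to read.

2.3 Libraries and data sets

Going from importing data to finishing a statistical report involves many important basic concepts, such as data sets, DATA steps and PROC steps. We now follow this workflow through the most basic and most important concepts, starting with libraries and data sets.

2.3.1 Libraries

From the workflow in the previous section we know that, to analyze data, we first have to convert an external Windows data file, such as a CSV file, into a SAS data set that SAS can recognize and process directly, and a SAS data set must be placed in a SAS library. Data is stored in data sets and data sets are stored in libraries. The relationship between data, data sets and libraries is like that between sheets of data, folders and drawers (see Figure 2-3).

On Windows, a SAS library is really a name mapped to one folder (or, indeed, several folders). SAS reads and writes the data sets in a library according to an agreed format, called an engine. The default engine of SAS 9.4, for example, is V9, and SAS 9.4 uses this V9 format to read and write its data sets.

Most of the time we want finished SAS data sets to be saved in some folder for later use, while intermediate or temporary data should be deleted automatically when SAS is closed. SAS therefore provides permanent libraries and a temporary library (see Figure 2-5). Besides the permanent libraries that ship with SAS (Maps, Mapsgfk, Mapssas, Sashelp and Sasuser), we can create permanent libraries for our own data sets. There is only one temporary library, named Work. While working in Work we may create a pile of intermediate data sets. At the end we save the finished data set to a permanent library, and when SAS is closed the intermediate data sets left in Work are deleted automatically without any effort on our part.

Figure 2-5 Permanent and temporary libraries (annotated: user-created permanent library; temporary library; physical location of a library as a single folder; physical location of a library as several folders; permanent libraries shipped with SAS)

Suppose we want to create a permanent library named "Demo" at D:\03 Writting\01 SAS 编程演义\02 Data\Clean. One way is to use the toolbar button as shown in Figure 2-4. The other is the LIBNAME statement, whose basic form is: LIBNAME library-name "physical location of the library";

To view the descriptor information of a data set, especially the third part shown in Figure 2-8, the list of variables and their attributes, use PROC CONTENTS. The output can be processed further into a variable dictionary for the database, which greatly helps later work. Variable information can also be viewed by right-clicking the data set and choosing "View Columns" (see Figure 2-9).

Figure 2-8 Descriptor information of a SAS data file

Figure 2-9 Viewing variable information by right-clicking a data set

To view the data values, use PROC PRINT, or simply double-click the data set. In practice, though, we do not always look at every value. More often we look at summary information through statistical procedures, such as means, quantiles and distribution plots.

Because every data set lives in a library, a program refers to a data set with the two-level name "library.dataset", which states explicitly which data set in which library is meant, with a period between the two parts. When the library is the temporary library WORK, the first level and the period ("library.") can be omitted.

Program 2-2 Viewing descriptor information and data values

    *===View information about data sets in the Demo library;
    *===Descriptor information of the data file;
    proc contents data=demo.class_datafile;
    run;
    *===Data values of the data file;
    proc print data=demo.class_datafile;
    run;
    *===Descriptor information of the view;
    proc contents data=demo.class_view;
    run;
    *===Data values of the view;
    proc print data=demo.class_view;
    run;

We now know how to inspect a data set, but how is a data file created? Recalling Figure 2-2, there are two ways: import external data or read an existing SAS data set. External data can be imported with either a DATA step or a PROC step, as Chapter 3 describes in detail. An existing data set can be read with the SET statement of the DATA step. As a simple example, we read the Class data set of the SASHELP library into our own permanent library Demo and into the temporary library WORK, naming it class_datafile in both.

Program 2-3 Creating data files with the SET statement

    *===Create a permanent library;
    libname demo "D:\03 Writting\01 SAS编程演义\02 Data\Clean";
    *===Permanent data set: "demo." cannot be omitted;
    data demo.class_datafile;
        set sashelp.class;
    run;
    *===Temporary data set: "work." is omitted;
    data class_datafile;
        set sashelp.class;
    run;

So much for data files. How is a view created? There are two methods: the VIEW= option of the DATA step and the CREATE VIEW statement of PROC SQL.

Program 2-4 Creating SAS views

    *===Create a view (either method will do);
    *===From a DATA step;
    data demo.class_view / view=demo.class_view;
        set sashelp.class;
    run;
    *===From PROC SQL;
    proc sql;
        create view demo.class_view as
            select * from sashelp.class;
    quit;

2.3.3 Variables

The most important concept in a data set is arguably the variable. In SAS a variable can be thought of simply as a container for numbers or characters, and one variable is one column. A variable has attributes: name, type, length, informat, format, label, position in the observation and index type. Figure 2-9 shows the attributes of variables.

Variable names follow naming rules, which are described in detail later. Unlike other programming languages and statistical packages, SAS has only two variable types: numeric and character. Numeric variables store floating-point numbers, including dates and times (in SAS a date is stored as the number of days since January 1, 1960, and a time as the number of seconds since midnight; see Program 2-5). Character variables store Latin letters, the digits 0 to 9 and other special characters, with a default length of 8 bytes. Informats and formats are the rules by which SAS reads or displays a variable. For numeric variables the default informat is "w.d" and the default format is "BEST12.". For character variables the default informat and format are both "$w.". Formats are covered in detail in Chapter 7.

Program 2-5 What SAS dates, times and datetimes really are

    data tmp;
        *===All three constants below are stored as the number 0;
        date="01Jan1960"d;
        time="00:00:00"t;
        datetime="01Jan1960 00:00:00"dt;
    run;

2.4 The SAS programming language

The preceding sections mostly treated SAS as a piece of software and showed some SAS code on the side. Beginners who did not follow that code need not worry, as long as the software-level concepts are clear. From this section on we go through the basic concepts and knowledge of SAS as a programming language.

2.4.1 The structure of a SAS program

A SAS program is a series of SAS statements. A SAS statement usually begins with a SAS keyword and always ends with a semicolon (;). The most common keywords are "DATA" and "PROC", so the most common statements are the DATA statement and the PROC statement. SAS has a great many keywords and there is no need to memorize them all. The editors of DMS, EG and SAS Studio automatically color keywords dark blue or blue, and EG and SAS Studio also offer hints, which beginners may want to try.

Seen as blocks, a SAS program has two kinds of block: DATA steps and PROC steps. A "step" is a block of code that

● begins with a DATA statement or a PROC statement, and
● ends with a RUN statement (in most cases), a QUIT statement (in some cases), or a new DATA or PROC statement.

In the SAS editor, a horizontal line is drawn automatically between DATA steps and PROC steps (see Figure 2-10). Note that some statements can appear only in a DATA step (such as the INPUT statement), some only in a PROC step (such as the CLASS statement), some in either (such as the FORMAT statement), and some in neither, because they stand alone (such as the LIBNAME statement used earlier). These are, respectively, DATA step statements, PROC step statements and global statements.

Figure 2-10 DATA steps and PROC steps

Besides standalone DATA steps and PROC steps, a SAS program can also contain programs that package them together, namely macros. A macro is essentially text substitution, replacing more text with less. We leave this topic for the detailed discussion in Chapter 10.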
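The step boundaries and the three kinds of statement described above can be seen together in one short program. This is only an illustrative sketch built on sashelp.class; the data set name work.teens and the title text are made up for the example.

```sas
/* Global statements: valid outside any step */
options nodate;
title "Step boundaries demo";

/* DATA step: begins at DATA, ends at RUN */
data work.teens;
  set sashelp.class;
  where age >= 13;            /* keep students aged 13 or older */
  format height weight 5.1;   /* FORMAT is allowed in a DATA step ... */
run;

/* PROC step: FORMAT is allowed here as well */
proc print data=work.teens(obs=5) noobs;
  format height 6.2;          /* ... and overrides the stored format for this step */
run;

/* PROC step: CLASS is a PROC-only statement */
proc means data=work.teens n mean;
  class sex;
  var height weight;
run;

/* PROC SQL runs each statement immediately and is ended by QUIT */
proc sql;
  select count(*) as n_teens from work.teens;
quit;

title;   /* global statement again: clears the title */
```

Moving the CLASS statement into the DATA step, or the SET statement into a PROC step, produces a syntax error in the log, which is a quick way to confirm which family a statement belongs to.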
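2.4.2 SAS syntax rules

A tidily written SAS program looks like a sequence of blocks separated into DATA steps and PROC steps, but the layout of a SAS program is in fact quite free. Strictly speaking, the syntax rules come in two parts: (1) rules for SAS statements and (2) rules for SAS names.

Rules for SAS statements:

● Words can be separated by one blank or special character (such as an operator like the plus or equals sign), or by several.
● A statement can begin in any column and end in any column.
● One statement can span several lines, and several statements can share one line.

A SAS name is the label SAS gives to a language element (such as a library, data set, variable or format). There are two kinds of SAS name.

(1) Names defined by the SAS system, such as the built-in library names WORK and SASHELP; the special data set names _NULL_ (create no data set), _DATA_ (automatic data set name) and _LAST_ (the most recently created data set); the automatic DATA step variables _N_ (observation counter) and _ERROR_ (error flag); the special variable list names _CHARACTER_ (all character variables), _NUMERIC_ (all numeric variables) and _ALL_ (all variables); and macro variable names beginning with SYS, such as SYSDATE (date) and SYSVER (SAS version).

(2) User-defined names. These must not clash with system-defined names and must follow the SAS naming rules, which come down to three points.

● They may contain only digits, letters and underscores.
● The first character must not be a digit.
● The maximum length varies: up to 32 characters for some (such as variable names and macro variable names), and only 8 characters for others (such as librefs, filerefs and engine names).

Must these naming rules be followed? Yes, they should be. Can they be broken? They can, but it is not recommended. Sometimes, though, there is a genuine need, for example to let SAS use Chinese names for data sets and variables. This can be done by changing the values of the system options VALIDMEMNAME and VALIDVARNAME, as shown in Figure 2-11.

Program 2-6 Chinese data set and variable names in SAS

    *===Chinese data set names;
    *===Chinese variable names;
    options validmemname=extend validvarname=any;
    data 中文名演示;
        SAS中文变量名="YES";
        '2SAS中文变量名'n="YES";
        'SAS空格变量名'n="YES";
        'SAS空# @ %格特殊字符变量名'n="YES";
    run;

Figure 2-11 Chinese data set and variable names in SAS

Syntax rules set only the minimum bar of legality. Beyond legality we should aim for a uniform, disciplined style, which makes code easier for ourselves to read and debug later and easier for others to review. Below is the same simple SAS program written twice. Comparing the two, any normal human being would rather read the first, right? Programmers have a term for this, Good Programming Practice (GPP). Many programming languages have recommended coding conventions, and following them makes communication with peers far easier. I have put together some SAS coding conventions of my own, which can be found in the appendix.

Program 2-7 Programming style: tidy versus messy

    *===Tidy version;
    *===Create a permanent library;
    libname demo "D:\03 Writting\01 SAS编程演义\02 Data\Clean";
    *===Permanent data set: "demo." cannot be omitted;
    data demo.class_datafile;
        set sashelp.class;
    run;
    *===Temporary data set: "work." can be omitted;
    data class_datafile;
        set sashelp.class;
    run;

    *===Messy version;
    libname demo "D:\03 Writting\01 SAS编程演义\02 Data\Clean"; data demo.class_datafile; set
    sashelp.class; run; data class_datafile; set sashelp.class;run;

2.4.3 SAS language elements

As a programming language, SAS has, besides the statements mentioned above, expressions, options, formats, functions and CALL routines.

1.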
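Expressions

Expressions are a very important concept in the SAS language. SAS relies on expressions to create a new variable, assign a value to a variable, compute a new value, transform a variable, and process data differently under different conditions. What is an expression? The official definition is rather awkward: an expression is a sequence of operands and operators that forms a set of instructions that are executed to produce a result value. Put simply, an expression tells SAS which operation to perform on which object in order to obtain a result. The objects operated on are called operands, and the symbols that perform the operation are operators. The result may be a numeric value, a character value or a Boolean value (yes/no, true/false, 1/0).

(1) Operands. An operand can be a constant, a variable or an expression. A constant, as the name suggests, is a quantity whose value is fixed. A variable is a quantity whose value can change and that describes some characteristic through a set of values.

There are four kinds of constant.

● Character constants: a character constant consists of 1 to 32,767 characters and must be enclosed in straight quotation marks, either single or double. When the constant itself contains a single (double) quotation mark, enclose it in double (single) quotation marks, or write the embedded mark twice, for example "Hongqiu Gu's Book".
● Numeric constants: these need little explanation. Note only that besides standard notation (such as 1, -5, +49, 1.23, 01), scientific notation (such as 2E23, 0.5e-10) and hexadecimal notation (such as 0C1X, 9X) are also accepted.
● Date and time constants: these comprise date, time and datetime constants. They are written in single or double quotation marks followed by the suffix D (date), T (time) or DT (datetime), for example '08Sep2016'D, '11:11'T, '08Sep2016:11:11'DT. See Program 2-5 on what SAS dates, times and datetimes really are. The same quoted-string-plus-letter-suffix notation is used by name literals ('…'N), which are needed for nonstandard data set and variable names.
● Bit-testing constants: a string of 0s, 1s and periods (.) in quotation marks followed by the suffix B, such as '..1.0000'b, used to test whether the corresponding bits are 0 or 1. These constants are rarely used and are not described further here.

There are two types of variable: character and numeric. Dates, times and datetimes are in fact numeric variables stored as numbers. As stated earlier, a date value is the number of days since January 1, 1960, a time value is the number of seconds since midnight, and a datetime value is the number of seconds since midnight on January 1, 1960.

Program 2-8 Constants in SAS

    *===Constants;
    data _null_;
        *===Character constants;
        c1="Hongqiu Gu's Book";
        c2='Hongqiu Gu''s Book';
        c3='Hongqiu Gu"s Book';
        c4="Hongqiu Gu""s Book";
        *===Numeric constants;
        n1=123;
        n2=-123;
        n3=+123;
        n4=1.23;
        n5=0123;
        *===Date and time constants;
        d='08Sep2016'D;
        t='11:11'T;
        dt='08Sep2016:11:11'DT;
        *===Write to the log;
        put c1-c4;
        put n1-n5;
        put d yymmdd10.;
        put t time.;
        put dt datetime.;
    run;

(2) Operators. By position, an operator placed before its operand is a prefix operator (such as + and -), and one placed between operands is an infix operator (most operators). By function, there are arithmetic operators for arithmetic (such as +, -, *, /), comparison operators for comparing values (such as >, <, =, ^=) and logical operators for logical operations (such as ^, &, |). Arithmetic operators usually return a number. Comparison and logical operators return true (1) or false (0). There is not much more to say about these operators. See Tables 2-2, 2-3 and 2-4.

Table 2-2 Arithmetic operators

    Symbol    Definition        Example           Result
    **        exponentiation    a**3              a raised to the third power
    *         multiplication    2*y               2 multiplied by y
    /         division          Var/5             Var divided by 5
    +         addition          Num+3             Num plus 3
    -         subtraction       Sales-Discount    Sales minus Discount

Note: in multiplication the asterisk is required. Both 2y and 2(y) are invalid.

Table 2-3 Comparison operators

    Symbol        Mnemonic    Definition                       Example
    =             EQ          equal to                         A=3
    ^=, ¬=, ~=    NE          not equal to *                   A^=3, A¬=3, A~=3
    >             GT          greater than                     Num>8
    <             LT          less than                        Num<8
    >=            GE          greater than or equal to **      Sales>=100
    <=            LE          less than or equal to **         Sales<=100
                  IN          equal to one of a list           Num in (3,4,5)

Notes: * The symbol for NE may differ between keyboards. ** The alternative spellings => and =< are also accepted for compatibility with earlier SAS versions, but they are not supported in WHERE expressions or in PROC SQL.

Table 2-4 Logical operators

    Symbol     Mnemonic    Example           Description
    &          AND         (a>b & c>d)       true when both sides are true
    |, !, ¦    OR *        (a>b or c>d)      true when either side is true
    ¬, ^, ~    NOT *       not(a>b)          negates the result

Note: * The symbols may differ between operating environments.

In addition there are the minimum operator (><), the maximum operator (<>) and the concatenation operator (||). >< and <> return the smaller and the larger of two operands, and || joins two character values.

With a single operator there is no question of order, but with several operators the order of evaluation has to be sorted out. A compound expression contains several operators, and its order of evaluation follows these principles:

(1) Expressions in parentheses are evaluated before those outside.
(2) Different groups have different priorities.
(3) Within a group there is a defined order of evaluation.

Table 2-5 gives examples.

Table 2-5 Order of evaluation in compound expressions

    Priority    Order            Symbols                      Examples
    Group 1     right to left    **                           y=a**2;
                                 + (prefix)                   y=+(a*b);
                                 - (prefix)                   z=-(a+b);
                                 ^, ¬, ~ (NOT)                if not z then put x;
                                 ><                           x=(a><b);
                                 <>                           x=(a<>b);
    Group 2     left to right    *                            c=a*b;
                                 /                            f=g/h;
    Group 3     left to right    +                            c=a+b;
                                 -                            f=g-h;
    Group 4     left to right    ||, ¦¦, !!                   name='J'||'SMITH';
    Group 5     left to right    <, <=, =, ¬=, >=, >, IN      if y>=a then output;
                                                              if z>a then output;
                                                              if state in ('NY','NJ','PA') then region='NE';
                                                              y = x in (1:10);
    Group 6     left to right    &                            if a=b & c=d then x=1;
    Group 7     left to right    |, ¦, !                      if y=2 or x=3 then a=d;

2. Options

SAS options include system options and data set options. System options are mainly instructions that affect the execution of the whole SAS program or the interaction of the SAS session. Data set options apply only to a data set, for example to rename or select variables, select observations, or control access to the data set.

3. Formats

By usage, formats are divided into informats and (output) formats. By how they are defined, they are divided into system formats and user-defined formats. A format tells SAS the pattern in which to read or display data. See Chapter 7.

4. Functions and CALL routines

A SAS function accepts arguments, performs some computation or operation and returns a value. A CALL routine is similar, but it cannot be used in an assignment statement or expression. Functions and CALL routines are discussed in detail in Chapter 6.

A combined example gives a feel for the concepts mentioned above.

Program 2-9 SAS language elements in action

    *===Concept demonstration;
    data test2;
        length ID $ 4;
        input Name $ start yymmdd10. end date8. grade;            *informats;
        FirstName=substr(Name,1,1);                               *SUBSTR function;
        GivenName=substr(Name,length(Name)-1,2);                  *SUBSTR function;
        call cats(ID,FirstName,GivenName);                        *CALL CATS routine;
        if grade>=2 and start<'01Jun2016'd
            then pay=(end-start)*150;                             *comparison, logical and arithmetic operators;
        else pay=(end-start)*100;
        datalines;
    ZhangXL 2016/08/09 06SEP16 1
    WangSJ 2016/07/03 09SEP16 2
    WenTC 2016/05/05 02SEP16 3
    LiWC 2016/04/09 10SEP16 2
    ;
    run;
    options nodate;                                               *system option;
    proc print data=test2(obs=2);                                 *data set option;
        var ID start end pay;
        format start yymmdd10. end yymmdd10.;                     *formats;
    run;

2.4.4 Three logical structures

Just like the three situations we face in life, doing things by following set steps, handling things selectively according to circumstances, and doing the same thing repeatedly under certain circumstances, almost every programming language provides three logical structures: sequence, selection and iteration.

1. Sequence

A program with a sequence structure executes in the order in which the code appears: the first statement, the second, the third, and so on. Almost all the SAS code so far has been of this kind.

2.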
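Selection

The classic selection statement is IF-THEN/ELSE, which tells SAS to perform one set of operations when a condition holds and another when it does not. For example, we can split the people in the CLASS data set of the SASHELP library by sex into the data sets Male and Female.

Program 2-10 IF-THEN/ELSE example

    data male female;
        set sashelp.class;
        if sex="M" then output male;
        else if sex="F" then output female;
        else put "Invalid sex :" sex;
    run;

Two points deserve attention:

● The cases should be covered completely. It is therefore best to add a final ELSE statement that catches every other possibility.
● If more than one action should be carried out in a given case, put a DO-END block after the keyword THEN and gather the actions inside it. For example, if we find SEX inelegant, we can replace it with GENDER and label males and females as Male and Female.

Program 2-11 IF-ELSE with DO-END

    data male female;
        set sashelp.class;
        length gender $6;    *long enough for "Female", otherwise the first assignment fixes the length;
        if sex="M" then do;
            gender="Male";
            output male;
        end;
        else if sex="F" then do;
            gender="Female";
            output female;
        end;
        else put "Invalid sex :" sex;
    run;

3. Iteration

A program with an iteration structure repeats certain operations as long as a specific condition holds. SAS has three common loop statements: the iterative DO statement, the DO-WHILE statement and the DO-UNTIL statement.

(1) The iterative DO statement. This is an extension of the DO-END block in which a loop specification is added after DO. The specification can be a list of numbers, character values or dates, it can give a start value, a stop value and an increment, and it can mix the two.

Program 2-12 The iterative DO statement

    data schedule;
        do date='01Sep2016'd to '30Sep2016'd;    *loop over dates;
            day=weekday(date);                   *1=Sunday, ..., 7=Saturday;
            if day in (1,7) then Activity="Running";
            else if day in (2,4,6) then Activity="Writing";
            else Activity="Reading";
            output;
        end;
    run;
    data random;
        do i=1 to 10;                            *loop 10 times;
            r=rannor(23);                        *generate a random number;
            output;
        end;
    run;

(2) The DO-WHILE statement. Unlike the iterative DO statement, which steps through the values of an index variable, DO-WHILE first checks whether the condition holds. If it does, the loop body is executed, otherwise the loop is left.

(3) The DO-UNTIL statement. Unlike DO-WHILE, which checks the condition first, DO-UNTIL executes the current iteration regardless and only then checks the condition. The logic of the check also differs: with DO-UNTIL, if the condition is not met the loop goes on to the next iteration, and if it is met the loop ends. Note the difference between the conditions in Program 2-13.

Program 2-13 The loop statements DO WHILE and DO UNTIL

    data dowhile;
        i=0;
        do while(i<5);
            i+1;
            output;
        end;
    run;
    data dountil;
        i=0;
        do until(i>=5);
            i+1;
            output;
        end;
    run;

If the three logical structures are still unclear after the text and programs above, Figure 2-12 may help.

Figure 2-12 The three logical structures of a program (sequence: step A, step B, step C; selection: condition, branch 1, branch 2; iteration: condition, loop body)

2.4.5 Arrays

Unlike other languages, the SAS programming language has no rich set of structures (struct) for aggregating data types, just as its data types are limited to character and numeric. The idea of the array from other programming languages, however, is put to full use in SAS.

In SAS, an array is a temporary grouping of variables arranged in a particular order. It is temporary because the array exists only while the DATA step executes. The variables in an array must be of the same type: if all are character it is a character array, and if all are numeric it is a numeric array. Further, if the values are arranged along one dimension, say a single row, the array is one-dimensional. If they are arranged along more than one dimension, say rows and columns like an Excel sheet, the array is two-dimensional.

When are arrays used, and how should one- and two-dimensional arrays be understood? Take an example. A study measures each patient's systolic blood pressure (SBP) and diastolic blood pressure (DBP) every day for a week, giving 7 SBP and 7 DBP measurements. We can of course store them in the 14 variables SBP1 to SBP7 and DBP1 to DBP7. But that may not be enough. Suppose we later find that this batch of blood pressure monitors has a systematic bias: SBP reads 5 mmHg too low and DBP 3 mmHg too low. To correct the values we would have to write 7 assignment statements each for SBP and DBP, 14 in all. Is that too tedious? It is. This is where arrays come in.

We can define two arrays, SBP and DBP, to hold SBP1 to SBP7 and DBP1 to DBP7. Picture a row of boxes, each with a number. SAS stores and retrieves data by box number. This is a one-dimensional array, with the data arranged along one dimension, the row.

    Array SBP index    1       2       3       4       5       6       7
    Element            SBP1    SBP2    SBP3    SBP4    SBP5    SBP6    SBP7

    Array DBP index    1       2       3       4       5       6       7
    Element            DBP1    DBP2    DBP3    DBP4    DBP5    DBP6    DBP7

We can even define a single array that packs the 7 SBP and 7 DBP values together. This is a two-dimensional array, with the data arranged along two dimensions, rows and columns.

    Row \ Column    1       2       3       4       5       6       7
    1 (SBP)         SBP1    SBP2    SBP3    SBP4    SBP5    SBP6    SBP7
    2 (DBP)         DBP1    DBP2    DBP3    DBP4    DBP5    DBP6    DBP7

The above is only a conceptual picture of arrays. In practice there are two core questions: how to define an array and how to access it.

1. Defining an array

In the SAS DATA step, arrays are defined with the ARRAY statement. Its syntax is given in Syntax 2-1.

    ARRAY array-name {number-of-elements} <$> <length> <array-elements> <(initial-value-list)>;

    array-name: the name of the array, which must follow the SAS naming rules
    number-of-elements: the number of elements in the array
    $: indicates a character array
    length: the length of the character elements
    array-elements: the elements of the array
    (initial-value-list): initial values for the array elements

Syntax 2-1 ARRAY statement (defining arrays) syntax reference card

Some notes on the array syntax:

● The number of elements can be given as {*}, which lets SAS count them, as a specific number such as {7}, or as a range such as {1:7}.
● The elements can be variable names or one of the special SAS variable lists, such as _ALL_ (all defined variables, which must then be of the same type), _NUMERIC_ (all numeric variables) and _CHARACTER_ (all character variables), or _TEMPORARY_ (temporary elements).
● <> marks content that is optional. For example, $ is needed only when the elements are character, as is length. The elements and their initial values are also optional. If initial values are given, they go inside parentheses.

Program 2-14 Defining arrays

    *===Defining arrays: each ARRAY statement below is an alternative for use inside a DATA step;
    *===sbp1-sbp7 is shorthand for sbp1 through sbp7;
    array sbp{7} sbp1-sbp7;
    array dbp{1:7} dbp1-dbp7;
    *===With initial values;
    array sbp{1:7} sbp1-sbp7 (163 164 167 171 155 158 154);
    array dbp{7} dbp1-dbp7 (98 99 92 94 95 93 93);
    *===Two-dimensional arrays;
    array bp{2,1:7} sbp1-sbp7 dbp1-dbp7;
    array bp{2,7} sbp1-sbp7 dbp1-dbp7
        (163 164 167 171 155 158 154
          98  99  92  94  95  93  93);

2.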
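Accessing an array

To access an element of an array we have to tell SAS its address, which is written as the array name followed by a subscript, arrayname{i}. Together with the iterative DO statement introduced earlier, this lets us step through every element of an array (see Figure 2-13) and carry out all kinds of data operations. To apply the additive corrections mentioned above, simply replace the PUT statements with assignment statements.

Program 2-15 Accessing array elements

    data tmp;
        *===Define the arrays;
        array sbp{7} sbp1-sbp7 (163 164 167 171 155 158 154);
        array dbp{7} dbp1-dbp7 (98 99 92 94 95 93 93);
        array bp{2,7} sbp1-sbp7 dbp1-dbp7
            (163 164 167 171 155 158 154
              98  99  92  94  95  93  93);
        *===Step through the one-dimensional arrays;
        do i=1 to 7;
            put "SBP at measurement " i "is: " sbp{i};
            put "DBP at measurement " i "is: " dbp{i};
        end;
        *===Step through the two-dimensional array;
        do m=1 to 2;
            do n=1 to 7;
                put "Blood pressure type: " m ", measurement: " n ", value: " bp{m,n};
            end;
        end;
    run;

Figure 2-13 Result of stepping through the array elements

2.4.6 Functions and CALL routines

In SAS, and in the DATA step especially, handling data conveniently and quickly requires a knowledge of functions and CALL routines. A SAS function accepts arguments, performs some computation or operation and returns a value. A CALL routine is similar, but it cannot be used in an assignment statement or expression. A simple example shows both in use.

Program 2-16 Functions and routines in use

    data _null_;
        length FullName_ByFunction FullName_ByRoutine $10;
        FamilyName="Gu";
        GivenName="Hongqiu";
        *===Build the full name with a function;
        FullName_ByFunction=catx(" ",GivenName,FamilyName);
        *===Build the full name with a routine;
        call catx(" ",FullName_ByRoutine,GivenName,FamilyName);
        *===View the results in the log;
        put "Fullname generated by function: " FullName_ByFunction;
        put "Fullname generated by routine: " FullName_ByRoutine;
    run;

By my rough count SAS has about 520 functions in some 30 categories. This is a large system and a very powerful weapon, and Chapter 6 is devoted to it.

2.4.7 Structured Query Language (SQL)

SQL stands for Structured Query Language. Since IBM developed it in the 1970s it has been widely used as the standard language for querying relational databases. SAS introduced SQL in version 6.06 and has been strengthening it and its integration with the rest of SAS ever since, so that SQL in SAS 9 is now very powerful. With SQL we can run simple queries and subqueries, join tables without sorting, perform set operations, create views and tables, create macro variables, and more. This section gives only an overview of SQL. Specific uses are discussed with the examples later in the book.

The simplest use of SQL is a query with the SELECT statement. The SELECT statement contains a series of clauses in a fixed order, as shown in Syntax 2-2.

    PROC SQL;
        SELECT <DISTINCT> object-item-1 <, object-item-2, ...>
            <INTO macro-variable-specification-1 <, macro-variable-specification-2, ...>>
            FROM from-list
            <WHERE sql-expression>
            <GROUP BY group-by-item-1 <, group-by-item-2, ...>>
            <HAVING sql-expression>
            <ORDER BY order-by-item-1 <, order-by-item-2, ...>>;
    QUIT;

    SELECT: the variables, user-defined expressions and so on selected by the query
    DISTINCT: removes duplicate observations
    INTO: macro variables
    FROM: the tables (data sets)
    WHERE: selection condition
    GROUP BY: grouping variables
    HAVING: a further condition applied to the grouped results
    ORDER BY: sorts the results

Syntax 2-2 PROC SQL SELECT statement syntax reference card

In SAS Help, <> marks content that is optional. The only required parts are therefore SELECT and FROM. The example below uses SQL to list the name, sex and age in sashelp.class.

Program 2-17 The simplest SQL procedure

    proc sql;
        select name, sex, age
            from sashelp.class;
    quit;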
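Two of the optional parts of the SELECT syntax, DISTINCT and INTO, are easy to try on the same table. The following is a small sketch rather than one of the book's numbered programs; the macro variable name n_students is made up for the illustration, and TRIMMED (available in recent SAS 9 releases) is used to strip leading blanks from the stored value.

```sas
proc sql noprint;
  /* INTO stores the query result in a macro variable */
  select count(*) into :n_students trimmed
    from sashelp.class;
quit;

/* The macro variable can now be used anywhere in later code */
%put NOTE: sashelp.class has &n_students rows.;

proc sql;
  /* DISTINCT removes duplicate rows from the result */
  select distinct sex
    from sashelp.class;
quit;
```

NOPRINT suppresses the printed result of the first query, since only the macro variable is of interest there.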
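The other clauses are very useful too. WHERE filters rows by a condition, GROUP BY computes statistics by group, HAVING filters the grouped results by a condition, and ORDER BY sorts the results. At first the order of the clauses can be hard to remember; the author uses SFW and GHO as a mnemonic. sfw is the file extension of a bitmap image format, and gho is the file extension of Ghost disk images.

Below is a complete example that uses every SELECT clause. Among students aged 12 or older, it counts the students and computes the mean height by sex, keeps the groups whose mean height is greater than 62, and finally sorts by the count.

Program 2-18 PROC SQL SELECT statement with all clauses

    proc sql;
        select sex, count(name) as cnt_name, mean(height) as m_height
            from sashelp.class
            where age>=12
            group by sex
            having m_height>62
            order by cnt_name;
    quit;

2.4.8 SAS Macros (MACRO)

The term MACRO is probably familiar: a macro is a tool for automating operations. We have met macros in EXCEL, although most people rarely use them. In SAS, the macro facility is a text-processing tool for automating and customizing SAS code.

Much of the power of SAS comes from the macro facility. At bottom a macro performs text substitution, but through text substitution SAS code can be generated automatically, generated dynamically, and given conditional structure. In other words, SAS code can write SAS code, and can write different code under different conditions, which fits the idea of "metaprogramming" well. This is why so many SAS developers enthusiastically write their own macros: to avoid repetitive coding and to make their processing more automatic and more intelligent.

The SAS macro language has two main parts: macro variables and macro programs. Macro variables are not confined to the DATA step; they exist independently of any data set. They are divided into system macro variables and user-defined macro variables. In the most common case we define a macro variable with the %LET statement and display it with the %PUT statement. As noted above, a macro is essentially text substitution, and a macro variable likewise lets a short piece of text stand in for longer, more complex text. For example, the short text "PUMC" can stand for the longer "Peking Union Medical College".

Program 2-19 Macro variables

    *===User-defined;
    %let PUMC=Peking Union Medical College;
    *===Display a system macro variable;
    %put &sysdate;
    *===Display the user-defined macro variable;
    %put &PUMC;

A macro program is similar to a macro variable but has additional features: (1) it can contain programming statements, including DATA step and PROC statements; (2) it can accept parameters. For example, we can define a macro that prints specified variables of a specified data set. A macro definition begins with %MACRO and ends with %MEND; to call the macro, write % followed by the macro name.

Program 2-20 Defining and calling a MACRO

    *===Define the macro;
    *===The parameters data and var specify the data set and the variables;
    %macro prtdsvar(data=, var=);
        proc print data=&data;
            var &var;
        run;
    %mend;
    *===Call the macro;
    %prtdsvar(data=sashelp.class, var=name sex)

This section introduces macros only at the conceptual level; the details are covered in Chapter 10.

2.5 Understanding How SAS Runs

SAS has a fairly steep learning curve, and one reason is that many learners never come to understand how SAS actually runs. The most important mechanisms are the PDV (Program Data Vector) and the implicit loop of the DATA step.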
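A minimal sketch of what the implicit loop means in practice, using the sashelp.class table that ships with SAS. The step below contains no DO loop, yet its statements run once for every observation read by SET, and the automatic variable _N_ counts the iterations.

```sas
data _null_;
    /* Each iteration of the DATA step reads one observation */
    set sashelp.class;
    /* Write the iteration counter and the current name to the Log */
    put _n_= name=;
run;
```

The Log should show one line per observation, with _N_ increasing by 1 each time.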
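2.5.1 The PDV and the DATA Step Implicit Loop

Even people who have written many SAS programs and used SAS for a long time are often baffled by what a DATA step produces, and even more so by the errors that occur. If we take the time to understand the PDV and sort out how SAS runs, many of these puzzles resolve themselves.

SAS processes a DATA step in two phases: compilation and execution. A classic DATA step basically follows the flow in Figure 2-14.

Specifically, SAS performs the actions listed in Table 2-6 during the compilation and execution phases.

Table 2-6 Actions in the compilation and execution phases

Phase: 1. Compilation
Actions: After the code is submitted, SAS compiles it. At this point SAS determines the type and length of every variable and decides whether any type conversion is needed. Specifically, SAS:
●Checks the syntax of the code
●Translates the code into machine language
●Creates an input buffer (Input Buffer) if raw data are read with an INPUT statement; if a SAS data set is read, goes straight to creating the program data vector
●Creates the program data vector (Program Data Vector, PDV), which contains
  ■The data set variables and the variables computed by SAS statements
  ■The automatic variables _N_ and _ERROR_
●Creates the descriptor information for the SAS data set and its variables
  ■Data set name, type (data file or SAS view), creation date and time
  ■Variable names, types (character or numeric), positions, and so on

Phase: 2. Execution
Actions: By default, each observation goes through one iteration of the DATA step.
●Starting from the DATA statement, _N_ is set to 1 (with every iteration of the DATA step, _N_ is incremented by 1) and _ERROR_ is set to 0 (_ERROR_ becomes 1 when an error occurs; for an ordinary data error SAS writes a message to the Log and continues)
●The variables in the PDV are set to missing
●An INPUT statement reads one record of raw data into the input buffer, or a SET, MERGE, MODIFY, or UPDATE statement reads one observation of a SAS data set into the PDV
●The remaining statements of the DATA step are executed for the current observation
●After the last statement, SAS automatically performs output, return, and reset
  ■Output: the observation is written from the PDV to the data set (the automatic variables _N_ and _ERROR_ are not written)
  ■Return: control returns automatically to the top of the DATA step
  ■Reset: the variables in the PDV created by the INPUT statement and by assignment statements are set to missing, but variables read by a SET, MERGE, MODIFY, or UPDATE statement are not
●SAS starts counting the next iteration, reads the next record or observation, and executes the remaining statements for it
●When the end of the SAS data set or raw data file being read is reached, the DATA step ends

Flowchart boxes of Figure 2-14, first part. Compilation phase: compile the SAS statements (including syntax checking); create (1) an input buffer (Input Buffer), (2) a program data vector (PDV), and (3) the data set descriptor information. Execution phase: begin at the DATA statement (counting the iterations). Data-reading statement: is there a record to read? If no, close the data set and go on to the next DATA or PROC step. If yes, continue with the following boxes.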
Flowchart boxes of Figure 2-14, second part. Read: read one input record. Execute: execute the other executable statements. Write: write one observation to the data set. Return: return to the top of the DATA step.

Figure 2-14 Flowchart of DATA step actions

Two concepts in this process are not easy to grasp: the input buffer (Input Buffer) and the program data vector (Program Data Vector). Both are logical areas in memory, sketched in Figure 2-15.

Figure 2-15 Input Buffer and PDV (labels in the figure: memory; raw data; input buffer (Input Buffer); PDV with ID, Name, Start, End, …, Pay, _N_, _ERROR_; Buffers; SAS data set)

Buffers are buffer areas in system memory, and we need not go into them for now. Figure 2-15 shows the flow for reading raw data and for reading a SAS data set.
(1) Reading raw data: the raw data are first read into the Input Buffer, then converted from the Input Buffer into the PDV, and finally written from the PDV to the SAS data set.
(2) Reading a SAS data set: observations are read directly into the PDV and then written from the PDV to the data set.

Let us again use a small program to look at the Input Buffer and the PDV and see how the DATA step runs.

Program 2-21 PDV demonstration

    data demoPDV;
    input ID $ Chinese Math English;
    Sum=Chinese+Math+English;
    datalines;
    S001 80 99 93
    S002 90 85 95
    S003 83 88 81
    ;
    run;

In the compilation phase SAS already knows that the data set to be created is called DemoPDV and that it has five variables, ID, Chinese, Math, English, and Sum, of which ID is character. SAS sets up the Input Buffer and the PDV for them.

Input Buffer: space set aside in memory for staging the data.

    1234567890123

PDV: SAS gets the variable information from the INPUT statement or from a SET, MERGE, or UPDATE statement and sets up the variables.

    ID    Chinese  Math  English  Sum  _ERROR_  _N_

Execution phase:

(1) The variables in INPUT are set to missing (blank for character variables, a period for numeric variables), and the automatic variables are set to _N_=1 and _ERROR_=0.

    Input Buffer:
    1234567890123

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
          .        .     .        .    0        1

(2) The INPUT statement reads the first record. The states of the Input Buffer and the PDV are shown below. In the original figures Program 2-21 is repeated at each step with a box around the part of the program that is running.

The INPUT statement starts:

    Input Buffer:
    1234567890123
    S001 80 99 93

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
          .        .     .        .    0        1

The first variable, ID, is read.

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
    S001  .        .     .        .    0        1

The second variable is read.

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
    S001  80       .     .        .    0        1

And so on, up to the last variable, Sum.

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
    S001  80       99    93       272  0        1

(3) Once all the remaining statements of the DATA step have been executed, SAS automatically writes to the output data set.

    data demoPDV;
    input ID $ Chinese Math
    English;
    Sum=Chinese+Math+English;
    datalines;
    S001 80 99 93
    S002 90 85 95
    S003 83 88 81
    ;
    run;

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
    S001  80       99    93       272  0        1

All variables in the PDV above, except the automatic variables _ERROR_ and _N_, are written automatically to the data set DemoPDV.

(4) Control returns to the first statement of the DATA step and the PDV is initialized.

    Input Buffer:
    1234567890123
    S002 90 85 95

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
          .        .     .        .    0        2

(5) SAS starts reading the first variable, ID, of the second record.

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
    S002  .        .     .        .    0        2

(6) The cycle repeats until the last variable of the last record has been read and the observation has been written to the data set.

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
    S003  83       88    81       252  0        3

(7) Control returns once more to the DATA statement, SAS finds that there are no more data to read, and only then does the DATA step finally end.

    ID    Chinese  Math  English  Sum  _ERROR_  _N_
          .        .     .        .    0        4

How can we roughly verify these steps? We can run Program 2-22 and look at the messages in the LOG window.

Program 2-22 Verifying the PDV

    data demoPDV;
    put "Iteration " _n_ "(before): " _all_;
    input ID $ Chinese Math English;
    Sum=Chinese+Math+English;
    put "Iteration " _n_ "(after): " _all_;
    datalines;
    S001 80 99 93
    S002 90 85 95
    S003 83 88 81
    ;
    run;

The LOG shows:

    Iteration 1 (before): ID= Chinese=. Math=. English=. Sum=. _ERROR_=0 _N_=1
    Iteration 1 (after): ID=S001 Chinese=80 Math=99 English=93 Sum=272 _ERROR_=0 _N_=1
    Iteration 2 (before): ID= Chinese=. Math=. English=. Sum=. _ERROR_=0 _N_=2
    Iteration 2 (after): ID=S002 Chinese=90 Math=85 English=95 Sum=270 _ERROR_=0 _N_=2
    Iteration 3 (before): ID= Chinese=. Math=. English=. Sum=. _ERROR_=0 _N_=3
    Iteration 3 (after): ID=S003 Chinese=83 Math=88 English=81 Sum=252 _ERROR_=0 _N_=3
    Iteration 4 (before): ID= Chinese=. Math=. English=. Sum=. _ERROR_=0 _N_=4

One last remark: everything shown above is the default, most basic, and simplest way SAS runs. When the DATA step contains loops or conditional statements, or statements such as OUTPUT and RETAIN, the processing flow is different.
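As a hedged illustration of how RETAIN changes the default flow, the sketch below reuses the data of Program 2-21. The data set name demoRetain and the variable RunningTotal are made up for this example. Because of the RETAIN statement, RunningTotal is not reset to missing when SAS returns to the top of the DATA step, so it accumulates across iterations, while Sum is recomputed every time.

```sas
data demoRetain;
    /* RETAIN: keep RunningTotal across iterations, starting from 0 */
    retain RunningTotal 0;
    input ID $ Chinese Math English;
    Sum=Chinese+Math+English;
    /* Add the current Sum to the value carried over from the last iteration */
    RunningTotal=RunningTotal+Sum;
    /* Show the PDV at the end of each iteration in the Log */
    put _all_;
datalines;
S001 80 99 93
S002 90 85 95
S003 83 88 81
;
run;
```

With the three records above, RunningTotal takes the values 272, 542, and 794. Without the RETAIN statement it would be missing in every observation, because a missing value plus Sum is still missing.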
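2.5.2 The Puzzle of @ and @@

Most SAS beginners struggle at least a little with @ and @@. The official description of @ is that a trailing @ at the end of an INPUT statement is a line-hold specifier: its main job is to hold the current data line rather than move on to the next line.

@ is called the single trailing @ and @@ the double trailing @. In many cases we use no @ at all, which I will call "no trailing @". So when should we use none, when one, and when two? Here are some rules the author has worked out:
●When the number of data columns to be read on a DATALINES line equals the number of variables to be read, that is, one line is one observation, use no trailing @.
●When the number of data columns to be read on a DATALINES line is greater than the number of variables and is an integer multiple of it, that is, one line holds K observations (K an integer greater than 1), use @@.
●When a DATA step has several INPUT statements that read the same line, use the single trailing @.

Program 2-23 Examples of @ and @@

    *=== Number of data columns = number of variables;
    data test1;
    input id x y z;
    datalines;
    1 98 99 97
    2 93 91 92
    ;
    run;

    *=== Number of data columns = number of variables, several INPUT statements;
    data test2;
    input id@;
    input x@;
    input y@;
    input z@;
    datalines;
    1 98 99 97
    2 93 91 92
    ;
    run;

    *=== Number of data columns = k * number of variables;
    data test3;
    input id x y z @@;
    datalines;
    1 98 99 97 2 93 91 92
    ;
    run;

On @, @@, and line jumps, the author once summed things up in these rules:
●No trailing @: nothing holds the line, so SAS jumps at once.
●One trailing @: the line is held after the current INPUT statement, but if this is the last INPUT statement of the DATA step, SAS jumps.
●Two trailing @ (@@): SAS never jumps.
●Finally, however many @ there are, SAS always jumps automatically at the end of a data line.

For example, in the first program of Program 2-24, INPUT X is followed by @ and is not the last INPUT statement, so after reading X=1 SAS does not jump and goes on to read Y=2. INPUT Y has no trailing @, so SAS jumps at once and Z is read as Z=4. INPUT Z is followed by @@, so although it is the last INPUT statement of the DATA step, SAS does not jump. The program returns to the top and starts reading the second observation: X=5, no jump; Y=6, jump; Z=7. The final result is therefore two observations, with X, Y, Z equal to 1, 2, 4 and 5, 6, 7. For the second program the answer is 1, 4, 5. Can you work it out with the rules above?

Program 2-24 Telling @ and @@ apart

    data test;
    input x @;   /* Single @: holds the line, no jump after reading */
    input y;     /* No @: cannot hold the line, jumps after reading */
    input z @@;  /* Double @: holds the line, but jumps automatically at the end of the data line */
    datalines;
    1 2 3
    4 5 6
    7
    ;
    run;

    data test;
    input x;     /* No @: cannot hold the line, jumps right after reading */
    input y @@;  /* Double @: holds the line, no jump after reading */
    input z @;   /* Single @, but this is the last INPUT statement: jumps */
    datalines;
    1 2 3
    4 5 6
    7
    ;
    run;

2.6 The Secret to Making Good Use of SAS Help

Many SAS beginners complain that the SAS help documentation is too complicated to understand. In truth, the SAS Help documentation is the best SAS textbook in the world; compared with the Help files of R packages, it can move us to tears.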
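2.6.1 How Much Do You Know About SAS Help?

In my humble opinion, the SAS help documentation (SAS Help) is the most sincere piece of work among all the statistical software on the market. SAS has invested a great deal of effort in making SAS Help the most complete in structure, the most precise in wording, the easiest to obtain, and the most authoritative SAS textbook there is. Open the official SAS Help site (http://support.sas.com/documentation/), shown in Figure 2-16, and that sincerity is plain to see.

Figure 2-16 The official SAS Help website (callouts in the figure: documentation by product version; documentation in alphabetical order; documentation by topic; documentation search with free combination of criteria)

Taking the latest version, 9.4, as an example, all the Help documents on the official site are easy to obtain, in HTML or PDF as you prefer (see Figure 2-17). The PDF documents are on a par with well-produced books (see Figure 2-18), and best of all, they can all be downloaded free of charge.

Figure 2-17 Official online documentation for SAS 9.4

Figure 2-18 PDF version of the SAS ODS Graph documentation

If opening the HTML pages is too much trouble and downloading the PDFs is too much effort, that is fine too. As long as SAS is installed locally, we can consult the documentation for the modules we have purchased at any time (see Figure 2-19). Note, however, that Help contains the documentation for a module only if that module has been purchased and installed.

Figure 2-19 Local SAS help documentation

weight, smoking, alcohol use, blood pressure, blood lipids, and other such information.

2.7 Chapter Summary

This chapter has introduced in some detail the basic concepts used most often in SAS programming, such as libraries, data sets, the SAS programming language, and how SAS runs. These introductions are still brief: their purpose is to give a framework for understanding SAS programming, not to replace the Help documentation of SAS itself. For that reason we closed with a detailed account of the syntax conventions of SAS Help, which should help us actually look things up in SAS Help and put it to use.

Chapter 3 苔点狂吞纳线青: Reading Data

As the saying goes, even the cleverest cook cannot make a meal without rice: the first problem in data analysis is getting the data. SAS, a veteran statistical package, has by now accumulated a rich set of components for acquiring and managing data. This chapter focuses on how SAS obtains data and also covers how SAS exports data files.

3.1 What Is Read and How It Is Read

Reading data can be discussed in terms of what SAS reads, and equally in terms of how SAS reads it. In terms of what SAS reads, external data files fall into four classes.
(1) Database management system (Data Base Management System, DBMS) data files. There are a great many DBMS products on the market; common ones include DB2, Sybase, MySQL, MS SQL Server, Oracle, Teradata, and Hadoop.
(2) PC files (PC file). The term is used in contrast to DBMS data files. Common PC data files include MS Access, MS Excel, Lotus, and DBF files, as well as the data files of more familiar software such as JMP, SPSS, Stata, and Paradox.
(3) Flat files (Flat file). These are files whose records have no structural relationship to one another. A flat file can be a plain text file (Plain text file) or a binary file (Binary file); for us the most common ones are plain text TXT files and CSV files.
(4) Instream data (Instream data), that is, the data lines that follow the DATALINES statement in a DATA step of a SAS program.

In terms of how SAS reads data, the author distinguishes seven ways (more precisely, these are ways in which SAS interacts with external data, since they cover not only reading but also exporting and other operations).
(1) The LIBNAME statement. The LIBNAME statement actually calls on the database engines mentioned earlier to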