首页 理论教育基于R的实验:城市居民消费性支出聚类分析

基于R的实验:城市居民消费性支出聚类分析

【摘要】:表7-1给出了我国31个省、市、自治区1999年城镇居民家庭平均每人全年消费支出的8个指标:x1:人均食品支出(元、人);x2:人均衣着商品支出(元、人);x3:人均家庭设备用品及服务支出(元、人);x4:人均医疗保健支出(元、人);x5:人均交通和通讯支出(元、人);x6:人均娱乐教育文化服务支出(元、人);x7:人均居住支出(元、人);x8:人均杂项商品和服务支出(元、人).表7-1全国城镇

表7-1给出了我国31个省、市、自治区1999年城镇居民家庭平均每人全年消费支出的8个指标:

x1:人均食品支出(元、人);

x2:人均衣着商品支出(元、人);

x3:人均家庭设备用品及服务支出(元、人);

x4:人均医疗保健支出(元、人);

x5:人均交通和通讯支出(元、人);

x6:人均娱乐教育文化服务支出(元、人);

x7:人均居住支出(元、人);

x8:人均杂项商品和服务支出(元、人).

表7-1 全国城镇居民平均每人全年消费性支出的数据

(续表)

说明:在表7-1中,序号1—31,分别代表:北京,天津,河北,山西,内蒙古,辽宁,吉林,黑龙江,上海,江苏,浙江,安徽,福建,江西,山东,河南,湖北,湖南,广东,广西,海南,重庆,四川,贵州,云南,西藏,陕西,甘肃,青海,宁夏,新疆.

以下根据表7-1导入数据并画聚类图,其代码如下:

>x1=c(2959.19,2459.77,1495.63,1046.33,1303.97,1730.84,

+1561.86,1410.11,3712.31,2207.58,2629.16,1844.78,

+2709.46,1563.78,1675.75,1427.65,1783.43,1942.23,

+3055.17,2033.87,2057.86,2303.29,1974.28,1673.82,

+2194.25,2646.61,1472.95,1525.57,1654.69,1375.46,

+1608.82)

>x2=c(730.79,495.47,515.90,477.77,524.29,553.90,492.42,

+510.71,550.74,449.37,557.32,430.29,428.11,303.65,

+613.32,431.79,511.88,512.27,353.23,300.82,186.44,

+589.99,507.76,437.75,537.01,839.70,390.89,472.98,

+437.77,480.99,536.05)

>x3=c(749.41,697.33,362.37,290.15,254.83,246.91,200.49,

+211.88,893.37,572.40,689.73,271.28,334.12,233.81,

+550.71,288.55,282.84,401.39,564.56,338.65,202.72,

+516.21,344.79,461.61,369.07,204.44,447.95,328.90,

+258.78,273.84,432.46)

>x4=c(513.34,302.87,285.32,208.57,192.17,279.81,218.36,

+277.11,346.93,211.92,435.69,126.33,160.77,107.90,

+219.79,208.14,201.01,206.06,356.27,157.78,171.79,

+236.55,203.21,153.32,249.54,209.11,259.51,219.86,

+303.00,317.32,235.82)

>x5=c(467.87,284.19,272.95,201.50,249.81,239.18,220.69,

+224.65,527.00,302.09,514.66,250.56,405.14,209.70,

+272.59,217.00,237.60,321.29,811.88,329.06,329.65,

+403.92,240.24,254.66,290.84,379.30,230.61,206.65,

+244.93,251.08,250.28)(www.chuimin.cn)

>x6=c(1141.82,735.97,540.58,414.72,463.09,445.20,459.62,

+376.82,1034.98,585.23,795.87,513.18,461.67,393.99,

+599.43,337.76,617.74,697.22,873.06,621.74,477.17,

+730.05,575.10,445.59,561.91,371.04,490.90,449.69,

+479.53,424.75,541.30)

>x7=c(478.42,570.84,364.91,281.84,287.87,330.24,360.48,

+317.61,720.33,429.77,575.76,314.00,535.13,509.39,

+371.62,421.31,523.52,492.60,1082.82,587.02,312.93,

+438.41,430.36,346.11,407.70,269.59,469.10,249.66,

+288.56,228.73,344.85)

>x8=c(457.64,305.08,188.63,212.10,192.96,163.86,147.76,

+152.85,462.03,252.54,323.36,151.39,232.29,160.12,

+211.84,165.32,182.52,226.45,420.81,218.27,279.19,

+225.80,223.46,191.48,330.95,389.33,191.34,228.19,

+236.51,195.93,214.40)

>X=data.frame(x1,x2,x3,x4,x5,x6,x7,x8)

>row.names=c("1","2","3","4","5","6","7","8","9","10",

+"11","12","13","14","15","16","17","18","19","20",

+"21","22","23","24","25","26","27","28","29","30","31"),

>hc1<-hclust(d);hc2<-hclust(d,"average")

>hc3<-hclust(d,"complete")

>opar<-par(mfrow=c(2,1),mar=c(5.2,4,0,0))

>plot(hc1,hang=-1);re1<-rect.hclust(hc1,k=4,border="red")

>plot(hc2,hang=-1);re2<-rect.hclust(hc2,k=4,border="red")

>par(opar)

结果如图7-2所示.

图7-2 聚类图

根据图7-2,按照最长距离法(complete),分为四类:

第一类:西藏(序号:26)

第二类:广东(序号:19)

第三类:天津(序号:2),浙江(序号:11),北京(序号:1),上海(序号:9)

第四类:除上述第一,二,三类的其他省、市、自治区

根据图7-3,按照类平均法(average),分为四类:

第一类:西藏(序号:26)

第二类:广东(序号:19)

第三类:上海(序号:9),北京(序号:1),浙江(序号:11)

第四类:除上述第一,二,三类的其他省、市、自治区

以上两种聚类法的结果基本相同,只是天津有所不同(在最长距离法中,天津在第三类;而在类平均法中天津在第四类).