Linux System and Performance Monitoring (Summary)
Date: 2009.07.21
Author: Darren Hoch
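Closing note: This is the final installment of the translation. In it, the author sets up a case study and applies the theory and tools covered in the earlier installments to carry out a complete, systematic performance check. It is a hands-on exercise to help readers better understand system performance monitoring.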
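Appendix A: Case Study - Performance Monitoring Step by Step
One day a customer called for technical help, complaining that a web page that normally opened in 15 seconds now took 20 minutes to open.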
The system configuration:
Red Hat Enterprise Linux 3 Update 7
Dell 1850, dual-core Xeon processors, 2 GB RAM, 75 GB 15K RPM drives
Custom LAMP software stack
Performance Analysis Procedure
1. Start with vmstat to get a rough picture of overall system performance:
# vmstat 1 10
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 249844 19144 18532 1221212 0 0 7 3 22 17 25 8 17 18
0 1 249844 17828 18528 1222696 0 0 40448 8 1384 1138 13 7 65 14
0 1 249844 18004 18528 1222756 0 0 13568 4 623 534 3 4 56 37
2 0 249844 17840 18528 1223200 0 0 35200 0 1285 1017 17 7 56 20
1 0 249844 22488 18528 1218608 0 0 38656 0 1294 1034 17 7 58 18
0 1 249844 21228 18544 1219908 0 0 13696 484 609 559 5 3 54 38
0 1 249844 17752 18544 1223376 0 0 36224 4 1469 1035 10 6 67 17
1 1 249844 17856 18544 1208520 0 0 28724 0 950 941 33 12 49 7
1 0 249844 17748 18544 1222468 0 0 40968 8 1266 1164 17 9 59 16
1 0 249844 17912 18544 1222572 0 0 41344 12 1237 1080 13 8 65 13
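Analysis:
1. Memory is not the problem: si and so stay at 0, so nothing is being swapped in or out.
2. The CPU is not much of a problem either. Although there is some run queue (procs r), the processors are still more than 50% idle (cpu id).
3. There is a lot of context switching (cs), and a large number of disk blocks are being read into RAM (bi).
4. The CPU also averages about 20% I/O wait (cpu wa).
Conclusion: the system is I/O bound, specifically on disk reads (bi).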
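The four checks above can be scripted. The following Python sketch is an illustration, not part of the original case study: it assumes the 16-column procps vmstat layout shown in this article, skips the first sample row (vmstat computes that row as an average since boot), and averages the rest. The function names are hypothetical, and the sample text is a subset of the rows above.

```python
from statistics import mean

# procps vmstat column order, as printed in this article (assumed layout).
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

def parse_vmstat(text):
    """Return one dict per numeric sample row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows

def summarize(rows):
    """Apply the checks to every row except the since-boot first row."""
    live = rows[1:]
    return {
        "swapping": any(r["si"] or r["so"] for r in live),  # memory check
        "avg_idle": mean(r["id"] for r in live),            # CPU headroom
        "avg_iowait": mean(r["wa"] for r in live),          # time stalled on I/O
        "avg_bi": mean(r["bi"] for r in live),              # blocks read in
        "avg_bo": mean(r["bo"] for r in live),              # blocks written out
    }

# A few rows copied from the vmstat output above.
SAMPLE = """\
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 249844 19144 18532 1221212 0 0 7 3 22 17 25 8 17 18
0 1 249844 17828 18528 1222696 0 0 40448 8 1384 1138 13 7 65 14
0 1 249844 18004 18528 1222756 0 0 13568 4 623 534 3 4 56 37
2 0 249844 17840 18528 1223200 0 0 35200 0 1285 1017 17 7 56 20
"""

if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE)))
```

To use it on a live system, capture the text printed by `vmstat 1 10` and pass it to `parse_vmstat`.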
2. Next, use iostat to see which disk the I/O requests are hitting:
# iostat -x 1
Linux 2.4.21-40.ELsmp (mail.example.com) 03/26/2007
avg-cpu: %user %nice %sys %idle
30.00 0.00 9.33 60.67
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/sda 7929.01 30.34 1180.91 14.23 7929.01 357.84 3964.50 178.92 6.93 0.39 0.03 0.06 6.69
/dev/sda1 2.67 5.46 0.40 1.76 24.62 57.77 12.31 28.88 38.11 0.06 2.78 1.77 0.38
/dev/sda2 0.00 0.30 0.07 0.02 0.57 2.57 0.29 1.28 32.86 0.00 3.81 2.64 0.03
/dev/sda3 7929.01 24.58 1180.44 12.45 7929.01 297.50 3964.50 148.75 6.90 0.32 0.03 0.06 6.68
avg-cpu: %user %nice %sys %idle
9.50 0.00 10.68 79.82
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/sda 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86
/dev/sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
/dev/sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
/dev/sda3 0.00 0.00 1195.24 0.00 0.00 0.00 0.00 0.00 0.00 43.69 3.60 0.99 117.86
avg-cpu: %user %nice %sys %idle
9.23 0.00 10.55 79.22
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/sda 0.00 0.00 1200.37 0.00 0.00 0.00 0.00 0.00 0.00 41.65 2.12 0.99 112.51
/dev/sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
/dev/sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
/dev/sda3 0.00 0.00 1200.37 0.00 0.00 0.00 0.00 0.00 0.00 41.65 2.12 0.99 112.51
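Analysis:
1. The only partition under load is /dev/sda3; sda1 and sda2 are essentially idle.
2. The disk is handling almost 1,200 read IOPS (r/s), while the disk itself can sustain only about 200 IOPS (translator's note: see the IOPS formula in the earlier installment).
3. For more than 2 seconds, no data is actually read from the disk at all (rkB/s stays at 0.00). This ties in with the heavy I/O wait seen in vmstat.
4. The large number of read IOPS (r/s) matches the high context-switch rate seen in vmstat. This suggests that many of the read operations are failing.
Conclusion: /dev/sda3 is receiving far more read requests than the disk can serve, and they return no data. The next step is to find the process issuing them.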
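The roughly 200 IOPS ceiling follows from the usual rule of thumb IOPS ≈ 1 / (average seek time + average rotational latency), where the average rotational latency is half a revolution. This Python sketch applies it to a 15K RPM drive. The 3.5 ms seek time is an assumed typical figure, not something measured on the customer's server, so treat the result as an order-of-magnitude estimate.

```python
# Rule-of-thumb IOPS ceiling for one spinning disk:
#   IOPS ≈ 1 / (average seek time + average rotational latency)
# ASSUMPTION: 3.5 ms average seek is a typical 15K RPM figure, not a measured value.

def rotational_latency_ms(rpm):
    """On average the platter must turn half a revolution to reach the sector."""
    return 0.5 * 60_000 / rpm

def max_iops(rpm, avg_seek_ms):
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))

capacity = max_iops(15_000, avg_seek_ms=3.5)  # 1000 / (3.5 + 2.0) ≈ 182

# r/s reported for /dev/sda in the two live iostat samples above.
observed = [1195.24, 1200.37]

for r_s in observed:
    print(f"observed {r_s:.0f} r/s vs. ~{capacity:.0f} IOPS: {r_s / capacity:.1f}x over")
```

Either observed r/s figure is more than six times the estimated mechanical ceiling of a single drive.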
3. Use top to find the most active application on the system:
# top -d 1
11:46:11 up 3 days, 19:13, 1 user, load average: 1.72, 1.87, 1.80
176 processes: 174 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 12.8% 0.0% 4.6% 0.2% 0.2% 18.7% 63.2%
cpu00 23.3% 0.0% 7.7% 0.0% 0.0% 36.8% 32.0%
cpu01 28.4% 0.0% 10.7% 0.0% 0.0% 38.2% 22.5%
cpu02 0.0% 0.0% 0.0% 0.9% 0.9% 0.0% 98.0%
cpu03 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 100.0%
Mem: 2055244k av, 2032692k used, 22552k free, 0k shrd, 18256k buff
1216212k actv, 513216k in_d, 25520k in_c
Swap: 4192956k av, 249844k used, 3943112k free 1218304k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14939 mysql 25 0 379M 224M 1117 R 38.2 25.7% 15:17.78 mysqld
4023 root 15 0 2120 972 784 R 2.0 0.3 0:00.06 top
1 root 15 0 2008 688 592 S 0.0 0.2 0:01.30 init
2 root 34 19 0 0 0 S 0.0 0.0 0:22.59 ksoftirqd/0
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
4 root 10 -5 0 0 0 S 0.0 0.0 0:00.05 events/0
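Analysis:
1. The most active process is mysqld (PID 14939), using 38.2% CPU; nothing else comes close.
2. The iowait figure in top (18.7% in total) corresponds to the wa figure seen in vmstat.
Conclusion: mysqld is the application issuing the read requests.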
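Both top and vmstat derive their CPU percentages from the cumulative tick counters on the first (aggregate) "cpu" line of /proc/stat, which is why the two tools report comparable iowait figures. A Python sketch of that calculation follows. It assumes a kernel whose cpu line carries at least the user, nice, system, idle, iowait, irq and softirq fields (later fields such as steal are ignored), and `sample_cpu_percentages` is a hypothetical helper name.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (cumulative clock ticks).
# Assumes at least these seven fields; later ones (steal, guest, ...) are ignored.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]

def parse_cpu_line(line):
    parts = line.split()
    if parts[0] != "cpu" or len(parts) < 1 + len(FIELDS):
        raise ValueError(f"unexpected /proc/stat line: {line!r}")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))

def cpu_percentages(before, after):
    """Share of the elapsed ticks spent in each state between two snapshots."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1  # avoid dividing by zero
    return {k: 100.0 * v / total for k, v in delta.items()}

def sample_cpu_percentages(interval=1.0, path="/proc/stat"):
    """Hypothetical helper: two snapshots `interval` seconds apart (Linux only)."""
    def snapshot():
        with open(path) as f:
            return parse_cpu_line(f.readline())  # the aggregate line comes first
    before = snapshot()
    time.sleep(interval)
    return cpu_percentages(before, snapshot())
```

On a Linux host, `sample_cpu_percentages()["iowait"]` should land near the wa value that `vmstat 1` prints for the same second.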
4. Now that mysqld has been identified as the process issuing the reads, use strace to see what it is reading:
# strace -p 14939
Process 14939 attached - interrupt to quit
read(29, "\3\1\237\1\366\337\1\222%\4\2\0\0\0\0\0012P/d", 20) = 20
read(29, "ata1/strongmail/log/strongmail-d"..., 399) = 399
_llseek(29, 2877621036, [2877621036], SEEK_SET) = 0
read(29, "\1\1\241\366\337\1\223%\4\2\0\0\0\0\0012P/da", 20) = 20
read(29, "ta1/strongmail/log/strongmail-de"..., 400) = 400
_llseek(29, 2877621456, [2877621456], SEEK_SET) = 0
read(29, "\1\1\235\366\337\1\224%\4\2\0\0\0\0\0012P/da", 20) = 20
read(29, "ta1/strongmail/log/strongmail-de"..., 396) = 396
_llseek(29, 2877621872, [2877621872], SEEK_SET) = 0
read(29, "\1\1\245\366\337\1\225%\4\2\0\0\0\0\0012P/da", 20) = 20
read(29, "ta1/strongmail/log/strongmail-de"..., 404) = 404
_llseek(29, 2877622296, [2877622296], SEEK_SET) = 0
read(29, "\3\1\236\2\366\337\1\226%\4\2\0\0\0\0\0012P/d", 20) = 20
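Analysis:
1. mysqld is reading file descriptor 29 in a loop: a 20-byte read, then a read of roughly 400 bytes, then an _llseek to the next offset.
2. It looks as though some SQL query is driving these reads.
Conclusion: the disk reads come from a query running inside MySQL; the next step is to find out which one.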
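The read/seek pattern is easier to see once the strace output is tallied per file descriptor. The Python sketch below is illustrative only: it handles just the `read`, `lseek` and `_llseek` line shapes that appear in this trace, and the function names are hypothetical. In the excerpt, each _llseek target is the previous one plus the 20-byte header and the record just read (2877621036 + 20 + 400 = 2877621456), so mysqld is stepping through the file one record at a time.

```python
import re
from collections import defaultdict

# Matches lines such as
#   read(29, "..."..., 399) = 399
#   _llseek(29, 2877621036, [2877621036], SEEK_SET) = 0
CALL_RE = re.compile(r"^(?P<call>\w+)\((?P<fd>\d+),(?P<args>.*)\)\s*=\s*(?P<ret>-?\d+)")

def summarize_strace(text):
    """Per file descriptor: read() count, bytes returned, and seek targets."""
    stats = defaultdict(lambda: {"reads": 0, "bytes": 0, "seek_offsets": []})
    for line in text.splitlines():
        m = CALL_RE.match(line.strip())
        if not m:
            continue  # e.g. the "Process ... attached" banner
        entry = stats[int(m["fd"])]
        if m["call"] == "read":
            entry["reads"] += 1
            entry["bytes"] += max(int(m["ret"]), 0)  # a negative return is an error
        elif m["call"] in ("lseek", "_llseek"):
            # The first argument after the fd is the requested offset.
            entry["seek_offsets"].append(int(m["args"].split(",")[0]))
    return dict(stats)

def seek_strides(offsets):
    """Distance between consecutive seek targets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

# An excerpt of the trace above (raw string keeps strace's escapes literal).
SAMPLE = r'''Process 14939 attached - interrupt to quit
read(29, "\3\1\237\1\366\337\1\222%\4\2\0\0\0\0\0012P/d", 20) = 20
read(29, "ata1/strongmail/log/strongmail-d"..., 399) = 399
_llseek(29, 2877621036, [2877621036], SEEK_SET) = 0
read(29, "\1\1\241\366\337\1\223%\4\2\0\0\0\0\0012P/da", 20) = 20
read(29, "ta1/strongmail/log/strongmail-de"..., 400) = 400
_llseek(29, 2877621456, [2877621456], SEEK_SET) = 0
'''

if __name__ == "__main__":
    for fd, s in summarize_strace(SAMPLE).items():
        print(fd, s["reads"], s["bytes"], seek_strides(s["seek_offsets"]))
```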
5. Use mysqladmin to find out which slow query is responsible:
# ./mysqladmin -pstrongmail processlist
+----+------+-----------+------------+---------+------+----------+------------------------------------------
| Id | User | Host      | db         | Command | Time | State    | Info
+----+------+-----------+------------+---------+------+----------+------------------------------------------
| 1  | root | localhost | strongmail | Sleep   | 10   |          |
| 2  | root | localhost | strongmail | Sleep   | 8    |          |
| 3  | root | localhost | root       | Query   | 94   | Updating | update `failures` set `update_datasource`='Y' where database_id='32' and update_datasource='N' and |
| 14 | root | localhost |            | Query   | 0    |          | show processlist
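Analysis:
1. MySQL has been running an UPDATE on the `failures` table for 94 seconds (Time 94, State Updating).
2. Judging from this UPDATE statement, the database is indexing across the entire table.
Conclusion: this UPDATE query is the slow query behind the heavy disk reads.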
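When the process list is long, filtering it for long-running statements helps. This Python sketch parses the ASCII table printed by `mysqladmin processlist` and keeps rows whose Command is Query and whose Time is at least a threshold. The 30-second threshold and the function names are assumptions for illustration, and the sample is a tidied excerpt of the table above with the UPDATE text still truncated.

```python
def parse_processlist(text):
    """Parse the ASCII table printed by `mysqladmin processlist` into dicts."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def long_running_queries(rows, min_seconds=30):
    """Rows executing a statement (Command == Query) for at least min_seconds."""
    return [
        r for r in rows
        if r.get("Command") == "Query" and int(r.get("Time") or 0) >= min_seconds
    ]

# A tidied excerpt of the process list above; the UPDATE text stays truncated.
SAMPLE = """\
+----+------+-----------+------------+---------+------+----------+------------------+
| Id | User | Host      | db         | Command | Time | State    | Info             |
+----+------+-----------+------------+---------+------+----------+------------------+
| 1  | root | localhost | strongmail | Sleep   | 10   |          |                  |
| 3  | root | localhost | root       | Query   | 94   | Updating | update `failures` set `update_datasource`='Y' where ... |
| 14 | root | localhost |            | Query   | 0    |          | show processlist |
+----+------+-----------+------------+---------+------+----------+------------------+
"""

if __name__ == "__main__":
    for r in long_running_queries(parse_processlist(SAMPLE)):
        print(r["Id"], r["Time"], r["State"], r["Info"])
```

The process list truncates long statements; MySQL's `SHOW FULL PROCESSLIST` statement returns the complete Info text when the whole query is needed.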
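Follow-up
All of the performance data above was handed over to the relevant developers so they could analyze their PHP code. One developer made a temporary optimization to the code: if a query fails, it now touches at most 100K records, whereas the database itself is expected to hold up to 4 million records. As a result, the query no longer places a heavy load on the database.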
Reposted from: http://www.sanotes.net/html/y2009/370.html