The difference between the file system cache parameters dirty_ratio and dirty_background_ratio

While tuning database performance over the past couple of days, I needed to reduce the impact of the operating system's file cache on the database, so I looked into ways to shrink the file system cache. One of them is to adjust the two parameters /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio. I read quite a few blog posts about them but could never work out how the two parameters differ; the English blog post below finally made the distinction clear to me.

vm.dirty_background_ratio: this parameter specifies the percentage of system memory (for example 5%) that dirty pages in the file system cache may occupy before the background writeback processes (pdflush/flush/kdmflush) are triggered to flush some of the cached dirty pages to disk asynchronously.

vm.dirty_ratio: this parameter specifies the percentage of system memory (for example 10%) that dirty pages may occupy before the system is forced to deal with them (because by then the number of dirty pages is large, and some of them must be flushed to disk to avoid data loss). While this happens, many application processes may block because the system has switched to handling file I/O.

I used to mistakenly believe that the vm.dirty_ratio threshold could never be reached, since the vm.dirty_background_ratio condition would always be hit first. Later I realized I had misunderstood: the vm.dirty_background_ratio threshold is indeed reached first and triggers the flush processes to write back asynchronously, but applications can keep writing during that time. If the applications write faster than the flush processes can drain the cache, the amount of dirty pages eventually climbs to the vm.dirty_ratio threshold, at which point the operating system switches to handling dirty pages synchronously and blocks the writing processes.
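
Before reading the original post, it can help to see where these two thresholds currently sit on your own machine. A minimal sketch; the output values shown are illustrative, not from the database server described above:

$ sysctl vm.dirty_background_ratio vm.dirty_ratio
vm.dirty_background_ratio = 10
vm.dirty_ratio = 20
$ grep -E "^(Dirty|Writeback):" /proc/meminfo    # dirty memory currently waiting to be flushed
Dirty:              1024 kB
Writeback:             0 kB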

The original post follows:

Better Linux Disk Caching & Performance with vm.dirty_ratio & vm.dirty_background_ratio

by Bob Plankers on December 22, 2013

in Best Practices, Cloud, System Administration, Virtualization

This is post #16 in my December 2013 series about Linux Virtual Machine Performance Tuning. For more, please see the tag “Linux VM Performance Tuning.”

In previous posts on vm.swappiness and using RAM disks we talked about how the memory on a Linux guest is used for the OS itself (the kernel, buffers, etc.), applications, and also for file cache. File caching is an important performance improvement, and read caching is a clear win in most cases, balanced against applications using the RAM directly. Write caching is trickier. The Linux kernel stages disk writes into cache, and over time asynchronously flushes them to disk. This has a nice effect of speeding disk I/O but it is risky. When data isn’t written to disk there is an increased chance of losing it.

There is also the chance that a lot of I/O will overwhelm the cache, too. Ever written a lot of data to disk all at once, and seen large pauses on the system while it tries to deal with all that data? Those pauses are a result of the cache deciding that there’s too much data to be written asynchronously (as a non-blocking background operation, letting the application process continue), and switches to writing synchronously (blocking and making the process wait until the I/O is committed to disk). Of course, a filesystem also has to preserve write order, so when it starts writing synchronously it first has to destage the cache. Hence the long pause.

The nice thing is that these are controllable options, and based on your workloads & data you can decide how you want to set them up. Let’s take a look:

$ sysctl -a | grep dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

vm.dirty_background_ratio is the percentage of system memory that can be filled with “dirty” pages (memory pages that still need to be written to disk) before the pdflush/flush/kdmflush background processes kick in to write it to disk. My example is 10%, so if my virtual server has 32 GB of memory that’s 3.2 GB of data that can be sitting in RAM before something is done.

vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk. This is often the source of long I/O pauses, but is a safeguard against too much data being cached unsafely in memory.
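
To relate the percentages to absolute numbers on your own host, you can compute both thresholds from MemTotal. A small sketch, assuming the default ratios of 10 and 20 shown above; the formatting is purely illustrative:

$ awk '/MemTotal/ {
        printf "background threshold (10%%): %.1f GB\n", $2 * 0.10 / 1024 / 1024
        printf "hard threshold (20%%):       %.1f GB\n", $2 * 0.20 / 1024 / 1024
      }' /proc/meminfo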

vm.dirty_background_bytes and vm.dirty_bytes are another way to specify these parameters. If you set the _bytes version the _ratio version will become 0, and vice-versa.
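
On hosts with a lot of RAM the _bytes form gives finer control than whole percentage points. A hedged sketch of setting an absolute 256 MB background threshold and observing that the ratio counterpart is zeroed; the value is chosen only for illustration:

$ sudo sysctl -w vm.dirty_background_bytes=268435456   # 256 MB, an example value
vm.dirty_background_bytes = 268435456
$ sysctl vm.dirty_background_ratio                     # the ratio form is now disabled
vm.dirty_background_ratio = 0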

vm.dirty_expire_centisecs is how long something can be in cache before it needs to be written. In this case it’s 30 seconds. When the pdflush/flush/kdmflush processes kick in they will check to see how old a dirty page is, and if it’s older than this value it’ll be written asynchronously to disk. Since holding a dirty page in memory is unsafe this is also a safeguard against data loss.

vm.dirty_writeback_centisecs is how often the pdflush/flush/kdmflush processes wake up and check to see if work needs to be done.
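
Both values are in hundredths of a second, so the defaults above work out to a 30-second expiry checked every 5 seconds. A quick sketch of making dirty data expire sooner; 10 seconds is an arbitrary example value:

$ sudo sysctl -w vm.dirty_expire_centisecs=1000   # 1000 centisecs = 10 s before a dirty page counts as old
vm.dirty_expire_centisecs = 1000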

You can also see statistics on the page cache in /proc/vmstat:

$ cat /proc/vmstat | egrep "dirty|writeback"
nr_dirty 878
nr_writeback 0
nr_writeback_temp 0

In my case I have 878 dirty pages waiting to be written to disk.
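
If you want to watch these counters move while you generate write load, something like the following works. A sketch only; the target path and sizes are placeholders:

$ watch -d 'grep -E "nr_dirty|nr_writeback" /proc/vmstat'    # refresh the counters every 2 s, highlighting changes
$ dd if=/dev/zero of=/tmp/dirty-test bs=1M count=1024        # in another terminal: write 1 GB through the page cache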

Approach 1: Decreasing the Cache

As with most things in the computer world, how you adjust these depends on what you’re trying to do. In many cases we have fast disk subsystems with their own big, battery-backed NVRAM caches, so keeping things in the OS page cache is risky. Let’s try to send I/O to the array in a more timely fashion and reduce the chance our local OS will, to borrow a phrase from the service industry, be “in the weeds.” To do this we lower vm.dirty_background_ratio and vm.dirty_ratio by adding new numbers to /etc/sysctl.conf and reloading with “sysctl -p”:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

This is a typical approach on virtual machines, as well as Linux-based hypervisors. I wouldn’t suggest setting these parameters to zero, as some background I/O is nice to decouple application performance from short periods of higher latency on your disk array & SAN (“spikes”).
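
To apply the change immediately without a reboot, the same values can also be written at runtime with sysctl -w, as an alternative to editing /etc/sysctl.conf and reloading. A sketch of the steps:

$ sudo sysctl -w vm.dirty_background_ratio=5
$ sudo sysctl -w vm.dirty_ratio=10
$ sysctl vm.dirty_background_ratio vm.dirty_ratio    # confirm the new thresholds took effect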

Approach 2: Increasing the Cache

There are scenarios where raising the cache dramatically has positive effects on performance. These situations are where the data contained on a Linux guest isn’t critical and can be lost, and usually where an application is writing to the same files repeatedly or in repeatable bursts. In theory, by allowing more dirty pages to exist in memory you’ll rewrite the same blocks over and over in cache, and just need to do one write every so often to the actual disk. To do this we raise the parameters:

vm.dirty_background_ratio = 50
vm.dirty_ratio = 80

Sometimes folks also increase the vm.dirty_expire_centisecs parameter to allow more time in cache. Beyond the increased risk of data loss, you also run the risk of long I/O pauses if that cache gets full and needs to destage, because on large VMs there will be a lot of data in cache.
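
A rough way to see whether the larger cache actually helps a rewrite-heavy workload is to compare buffered writes against writes forced to disk. A sketch assuming a scratch file under /tmp; the sizes are placeholders:

$ dd if=/dev/zero of=/tmp/cache-test bs=1M count=512                    # buffered: lands in the page cache and returns quickly
$ dd if=/dev/zero of=/tmp/cache-test bs=1M count=512 conv=fdatasync     # forced: does not return until the data reaches disk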

Approach 3: Both Ways

There are also scenarios where a system has to deal with infrequent, bursty traffic to slow disk (batch jobs at the top of the hour, midnight, writing to an SD card on a Raspberry Pi, etc.). In that case an approach might be to allow all that write I/O to be deposited in the cache so that the background flush operations can deal with it asynchronously over time:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 80

Here the background processes will start writing right away when it hits that 5% ceiling but the system won’t force synchronous I/O until it gets to 80% full. From there you just size your system RAM and vm.dirty_ratio to be able to consume all the written data. Again, there are tradeoffs with data consistency on disk, which translates into risk to data. Buy a UPS and make sure you can destage cache before the UPS runs out of power. :)
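
To check that a burst is really being absorbed by the cache rather than stalling the writer, you can time the burst and then watch the dirty counter drain afterwards. A sketch; the file path and burst size are placeholders:

$ time dd if=/dev/zero of=/var/tmp/burst bs=1M count=2048    # the burst should return quickly if it fits in cache
$ watch 'grep -E "^Dirty|^Writeback" /proc/meminfo'          # then watch the background flushers drain it over time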

No matter the route you choose you should always be gathering hard data to support your changes and help you determine if you are improving things or making them worse. In this case you can get data from many different places, including the application itself, /proc/vmstat, /proc/meminfo, iostat, vmstat, and many of the things in /proc/sys/vm. Good luck!
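
A handful of commands covering the data sources listed above, as a starting point for before/after comparisons; nothing here is specific to this post, just common tooling:

$ grep -E "^(Dirty|Writeback):" /proc/meminfo    # dirty/writeback memory right now
$ vmstat 1 10                                    # memory, swap and I/O activity, one sample per second
$ iostat -x 1 10                                 # per-device utilization and latency
$ sysctl -a 2>/dev/null | grep ^vm.dirty         # the current writeback tuning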
