MdadmLVMext4
Preliminaries
For faster rebuild
# echo "dev.raid.speed_limit_min=100000 # set higher raid rebuild limit" >> /etc/sysctl.conf
For Faster Writes Add this to a script on boot ( rc.local )
# echo 16384 > /sys/block/md127/md/stripe_cache_size
Beware of RAM usage: Value in pages per device, i.e. on 4 devices at 8192 comes out to 128MB
After enabling SATA AHCI in BIOS!
[root@drewserv ~]# dd if=/dev/md127 of=/dev/null bs=4M count=2500 2500+0 records in 2500+0 records out 10485760000 bytes (10 GB) copied, 57.1676 s, 183 MB/s
Build the array
# mdadm -v --create /dev/md127 -l 5 -c 512 --bitmap=internal --raid-devices=3 /dev/sdb /dev/sdd /dev/sde ------------------------------------------------------------------------------------------- [root@drewserv ~]# mdadm -D /dev/md127 /dev/md127: Version : 1.2 Creation Time : Fri Jul 22 22:24:38 2011 Raid Level : raid5 Array Size : 976770048 (931.52 GiB 1000.21 GB) Used Dev Size : 488385024 (465.76 GiB 500.11 GB) Raid Devices : 3 Total Devices : 3 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Sat Jul 23 00:25:31 2011 State : active Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 512K Name : 127 UUID : e9355597:7a4ff30b:21b71007:42443f63 Events : 21 Number Major Minor RaidDevice State 0 8 16 0 active sync /dev/sdb 1 8 48 1 active sync /dev/sdd 3 8 64 2 active sync /dev/sde
MD RAID5 Perf Tests
[root@drewserv ~]# dd if=/dev/md127 of=/dev/null count=10000 bs=1M 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 63.5414 s, 165 MB/s Write Test [root@drewserv ~]# cat /sys/block/md127/md/stripe_cache_size 256 [root@drewserv ~]# echo 16384 > /sys/block/md127/md/stripe_cache_size [root@drewserv ~]# cat /sys/block/md127/md/stripe_cache_size 16384 [root@drewserv ~]# dd if=/dev/zero of=/dev/md127 count=10000 bs=1M 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 74.9109 s, 140 MB/s
Create LVM VG/PV/LV
[root@drewserv ~]# vgcreate vg-raid5 /dev/md127 No physical volume label read from /dev/md127 Physical volume "/dev/md127" successfully created Volume group "vg-raid5" successfully created [root@drewserv ~]# vgdisplay -v vg-raid5 Using volume group(s) on command line Finding volume group "vg-raid5" --- Volume group --- VG Name vg-raid5 System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 1 VG Access read/write VG Status resizable MAX LV 0 Cur LV 0 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 931.52 GiB PE Size 4.00 MiB Total PE 238469 Alloc PE / Size 0 / 0 Free PE / Size 238469 / 931.52 GiB VG UUID MbPYwg-Q12h-YoEq-oh1p-dlKH-HwaP-fiKRzq --- Physical volumes --- PV Name /dev/md127 PV UUID eeErmO-wm0Z-p870-6uJp-TcMJ-tgSm-NAq1BX PV Status allocatable Total PE / Free PE 238469 / 238469 root@drewserv ~]# lvcreate -l 178850 -n lv-raid5 vg-raid5 Logical volume "lv-raid5" created root@drewserv ~]# lvdisplay /dev/vg-raid5/lv-raid5 --- Logical volume --- LV Name /dev/vg-raid5/lv-raid5 VG Name vg-raid5 LV UUID QYuPPT-TqzR-bDC5-6Cjd-DYDj-W102-0oK4Or LV Write Access read/write LV Status available # open 0 LV Size 698.63 GiB Current LE 178850 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 4096 Block device 253:4
LVM Performance Tests
[root@drewserv ~]# dd if=/dev/vg-raid5/lv-raid5 of=/dev/null count=10000 bs=1M 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 67.0551 s, 156 MB/s [root@drewserv ~]# dd if=/dev/zero of=/dev/md127 count=10000 bs=1M 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 71.3196 s, 147 MB/s
Create the filesystem (EXT4)
- Dry run for fs*
[root@drewserv ~]# mkfs.ext4 -v -E stride=128 -E stripe-width=256 -m 0 -n /dev/vg-raid5/lv-raid5 mke2fs 1.41.14 (22-Dec-2010) fs_types for mke2fs.conf resolution: 'ext4' Calling BLKDISCARD from 0 to 750151270400 failed. Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=128 blocks, Stripe width=256 blocks 45793280 inodes, 183142400 blocks 0 blocks (0.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 5590 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000
Real run
[root@drewserv ~]# mkfs.ext4 -v -E stride=128 -E stripe-width=256 -m 0 /dev/vg-raid5/lv-raid5 mke2fs 1.41.14 (22-Dec-2010) fs_types for mke2fs.conf resolution: 'ext4' Calling BLKDISCARD from 0 to 750151270400 failed. Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=128 blocks, Stripe width=256 blocks 45793280 inodes, 183142400 blocks 0 blocks (0.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 5590 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000 Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done This filesystem will be automatically checked every 34 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override.
Filesystem Performance Tests
Write test
[root@drewserv raid5]# dd if=/dev/zero of=/mnt/raid5/10Gtest count=10000 bs=1M 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 73.8108 s, 142 MB/s
Read Test
[root@drewserv raid5]# dd if=/mnt/raid5/10Gtest of=/dev/null count=10000 bs=1M 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 85.8405 s, 122 MB/s test2 after rebuild [drew@drewserv raid5]$ sudo dd if=/dev/zero of=/mnt/raid5/10Gtest bs=1M count=10000 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 126.455 s, 82.9 MB/s [drew@drewserv raid5]$ sudo dd if=/mnt/raid5/10Gtest of=/dev/null bs=1M count=10000 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 76.3866 s, 137 MB/s
Add to /etc/fstab
# blkid | grep raid5 /dev/mapper/vg--raid5-lv--raid5: UUID="16bf772d-bd34-4a14-b314-cea1ed737040" TYPE="ext4" # echo "UUID="16bf772d-bd34-4a14-b314-cea1ed737040" /mnt/raid5 ext4 defaults 1 3" >> /etc/fstab
Restore data from backup
Copy data back:
[root@drewserv ~]# rsync -aqv --log-file=/root/bkup-raid5 /mnt/backup/backup/ /mnt/raid5/ 2011/07/23 17:28:17 [8373] building file list 2011/07/23 17:28:17 [8373] .d..t...... ./ . . . 2011/07/23 21:49:09 [8373] .d..t...... lost+found/ 2011/07/23 21:49:09 [8373] sent 428464277809 bytes received 1589811 bytes 27373637.92 bytes/sec 2011/07/23 21:49:09 [8373] total size is 428406181021 speedup is 1.00
NOTES NOTES NOTES
http://en.wikipedia.org/wiki/Mdadm http://www.mythtv.org/wiki/LVM_on_RAID#Performance_Enhancements_.2F_Tuning
Increasing RAID5 Performance To help RAID5 read/write performance, setting the read-ahead & stripe cache size[2] [3] for the array provides noticeable speed improvements. Note: This tip assumes sufficient RAM availability to the system. Insufficient RAM can lead to data loss/corruption
sets strip cache----
- echo 16384 > /sys/block/md127/md/stripe_cache_size
^----- Value in pages per device, i.e. on 4 devices at 8192 comes out to 128MB
Sets read-ahead---- / try 32768 , 65536 , 131072 , 262144
- blockdev --setra 16384 /dev/md127
- blockdev --setra 16384 /dev/video_vg/video_lv
- blockdev --setra 16384 /dev/sdb
- blockdev --setra 16384 /dev/sdd
- blockdev --setra 16384 /dev/sde
///////////////////////////////////////???????????????????
- Options used:
- blockdev --setra 1536 /dev/md3 (back to default)
- cat /sys/block/sd{e,g,i,k}/queue/max_sectors_kb
- value: 512
- value: 512
- value: 512
- value: 512
- Test with, chunksize of raid array (128)
- echo 128 > /sys/block/sde/queue/max_sectors_kb
- echo 128 > /sys/block/sdg/queue/max_sectors_kb
- echo 128 > /sys/block/sdi/queue/max_sectors_kb
- echo 128 > /sys/block/sdk/queue/max_sectors_kb
http://www.pcguide.com/ref/hdd/perf/raid/concepts/perfStripe-c.html
http://wiki.centos.org/HowTos/Disk_Optimization
For example if you have 4 drives in RAID5 and it is using 64K chunks and given a 4K file system block size. The stride size is calculated for the one disk by (chunk size / block size), (64K/4K) which gives 16. While the stripe width for RAID5 is 1 disk less, so we have 3 data-bearing disks out of the 4 in this RAID5 group, which gives us (number of data-bearing drives * stride size), (3*16) gives you a stripe width of 48.
When you create an ext3 partition in this manner, you would format it like this
mkfs.ext3 -b 4096 -E stride=16 -E stripe-width=48 -O dir_index /dev/XXXX
http://www.nickyeates.com/technology/unix/raid_filesystem_lvm Note: You would want to change the stride-width if you added disks to array.
tune2fs -E stride=n,stripe-width=m /dev/mdx
rabble