MdadmLVMext4
Latest revision as of 01:14, 25 January 2018
==Preliminaries==
For a faster rebuild, raise the minimum resync speed:
<syntaxhighlight lang=bash>
# echo "dev.raid.speed_limit_min = 100000" >> /etc/sysctl.conf   # higher raid rebuild floor (KB/s)
</syntaxhighlight>
For faster writes, add this to a script run at boot (e.g. rc.local):
<syntaxhighlight lang=bash>
# echo 16384 > /sys/block/md127/md/stripe_cache_size
</syntaxhighlight>
'''Beware of RAM usage: the value is in pages per device, i.e. on 4 devices a value of 8192 comes out to 128 MB.'''
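The RAM cost of the stripe cache can be estimated up front. A rough sketch (assumes 4 KiB pages; check with <code>getconf PAGESIZE</code>):

```bash
# Stripe cache memory = cache entries * page size * number of member devices.
stripe_cache_size=8192   # entries (pages per device)
page_size=4096           # bytes; 4 KiB pages assumed here
devices=4
echo "$(( stripe_cache_size * page_size * devices / 1024 / 1024 )) MB"   # prints "128 MB"
```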
After enabling SATA AHCI in the BIOS:
<syntaxhighlight lang=bash>
[root@drewserv ~]# dd if=/dev/md127 of=/dev/null bs=4M count=2500
2500+0 records in
2500+0 records out
10485760000 bytes (10 GB) copied, 57.1676 s, 183 MB/s
</syntaxhighlight>
==Build the array==
<syntaxhighlight lang=bash>
# mdadm -v --create /dev/md127 -l 5 -c 512 --bitmap=internal --raid-devices=3 /dev/sdb /dev/sdd /dev/sde
-------------------------------------------------------------------------------------------
[root@drewserv ~]# mdadm -D /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Fri Jul 22 22:24:38 2011
     Raid Level : raid5
     Array Size : 976770048 (931.52 GiB 1000.21 GB)
  Used Dev Size : 488385024 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sat Jul 23 00:25:31 2011
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : 127
           UUID : e9355597:7a4ff30b:21b71007:42443f63
         Events : 21

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       48        1      active sync   /dev/sdd
       3       8       64        2      active sync   /dev/sde
</syntaxhighlight>
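To have the array assemble automatically at boot, it's common practice to record it in mdadm.conf. A sketch, not from the original notes — the config path varies by distro, and whether an initramfs rebuild is needed depends on the setup:

```bash
# Append the array definition to the mdadm config.
# CentOS/Fedora use /etc/mdadm.conf; Debian/Ubuntu use /etc/mdadm/mdadm.conf.
mdadm --detail --scan >> /etc/mdadm.conf
```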
===MD RAID5 Perf Tests===
<syntaxhighlight lang=bash>
[root@drewserv ~]# dd if=/dev/md127 of=/dev/null count=10000 bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 63.5414 s, 165 MB/s
</syntaxhighlight>
Write test:
<syntaxhighlight lang=bash>
[root@drewserv ~]# cat /sys/block/md127/md/stripe_cache_size
256
[root@drewserv ~]# echo 16384 > /sys/block/md127/md/stripe_cache_size
[root@drewserv ~]# cat /sys/block/md127/md/stripe_cache_size
16384
[root@drewserv ~]# dd if=/dev/zero of=/dev/md127 count=10000 bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 74.9109 s, 140 MB/s
</syntaxhighlight>
==Create LVM VG/PV/LV==
<syntaxhighlight lang=bash>
[root@drewserv ~]# vgcreate vg-raid5 /dev/md127
  No physical volume label read from /dev/md127
  Physical volume "/dev/md127" successfully created
  Volume group "vg-raid5" successfully created
[root@drewserv ~]# vgdisplay -v vg-raid5
    Using volume group(s) on command line
    Finding volume group "vg-raid5"
  --- Volume group ---
  VG Name               vg-raid5
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               931.52 GiB
  PE Size               4.00 MiB
  Total PE              238469
  Alloc PE / Size       0 / 0
  Free  PE / Size       238469 / 931.52 GiB
  VG UUID               MbPYwg-Q12h-YoEq-oh1p-dlKH-HwaP-fiKRzq

  --- Physical volumes ---
  PV Name               /dev/md127
  PV UUID               eeErmO-wm0Z-p870-6uJp-TcMJ-tgSm-NAq1BX
  PV Status             allocatable
  Total PE / Free PE    238469 / 238469

[root@drewserv ~]# lvcreate -l 178850 -n lv-raid5 vg-raid5
  Logical volume "lv-raid5" created
[root@drewserv ~]# lvdisplay /dev/vg-raid5/lv-raid5
  --- Logical volume ---
  LV Name                /dev/vg-raid5/lv-raid5
  VG Name                vg-raid5
  LV UUID                QYuPPT-TqzR-bDC5-6Cjd-DYDj-W102-0oK4Or
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                698.63 GiB
  Current LE             178850
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           253:4
</syntaxhighlight>
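The <code>-l 178850</code> passed to lvcreate is a count of physical extents, which is where the reported 698.63 GiB comes from given the 4 MiB PE Size shown by vgdisplay. A quick check:

```bash
extents=178850
pe_mib=4                                     # PE Size from vgdisplay
echo "$(( extents * pe_mib )) MiB"           # prints "715400 MiB"
echo "$(( extents * pe_mib / 1024 )) GiB"    # prints "698 GiB" (698.63 with fractions)
```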
===LVM Performance Tests===
<syntaxhighlight lang=bash>
[root@drewserv ~]# dd if=/dev/vg-raid5/lv-raid5 of=/dev/null count=10000 bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 67.0551 s, 156 MB/s
[root@drewserv ~]# dd if=/dev/zero of=/dev/md127 count=10000 bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 71.3196 s, 147 MB/s
</syntaxhighlight>
==Create the filesystem (EXT4)==
Dry run for the fs (the -n flag makes mkfs display what it would do without writing anything):
<syntaxhighlight lang=bash>
[root@drewserv ~]# mkfs.ext4 -v -E stride=128 -E stripe-width=256 -m 0 -n /dev/vg-raid5/lv-raid5
mke2fs 1.41.14 (22-Dec-2010)
fs_types for mke2fs.conf resolution: 'ext4'
Calling BLKDISCARD from 0 to 750151270400 failed.
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
45793280 inodes, 183142400 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
5590 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000
</syntaxhighlight>
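The stride=128 / stripe-width=256 values follow from the array geometry above (512K chunk, 4K filesystem blocks, 3 disks of which 1 holds parity per stripe). A quick sanity check:

```bash
chunk_kb=512; block_kb=4
disks=3; parity=1
stride=$(( chunk_kb / block_kb ))              # fs blocks per chunk
stripe_width=$(( (disks - parity) * stride ))  # fs blocks across the data disks
echo "stride=${stride} stripe-width=${stripe_width}"   # prints "stride=128 stripe-width=256"
```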
Real run
<syntaxhighlight lang=bash>
[root@drewserv ~]# mkfs.ext4 -v -E stride=128 -E stripe-width=256 -m 0 /dev/vg-raid5/lv-raid5
mke2fs 1.41.14 (22-Dec-2010)
fs_types for mke2fs.conf resolution: 'ext4'
Calling BLKDISCARD from 0 to 750151270400 failed.
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
45793280 inodes, 183142400 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
5590 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 34 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
</syntaxhighlight>
===Filesystem Performance Tests===
Write test:
<syntaxhighlight lang=bash>
[root@drewserv raid5]# dd if=/dev/zero of=/mnt/raid5/10Gtest count=10000 bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 73.8108 s, 142 MB/s
</syntaxhighlight>
Read test:
<syntaxhighlight lang=bash>
[root@drewserv raid5]# dd if=/mnt/raid5/10Gtest of=/dev/null count=10000 bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 85.8405 s, 122 MB/s
</syntaxhighlight>
Test 2, after a rebuild:
<syntaxhighlight lang=bash>
[drew@drewserv raid5]$ sudo dd if=/dev/zero of=/mnt/raid5/10Gtest bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 126.455 s, 82.9 MB/s
[drew@drewserv raid5]$ sudo dd if=/mnt/raid5/10Gtest of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 76.3866 s, 137 MB/s
</syntaxhighlight>
===Add to /etc/fstab===
<syntaxhighlight lang=bash>
# blkid | grep raid5
/dev/mapper/vg--raid5-lv--raid5: UUID="16bf772d-bd34-4a14-b314-cea1ed737040" TYPE="ext4"
# echo 'UUID=16bf772d-bd34-4a14-b314-cea1ed737040 /mnt/raid5 ext4 defaults 1 3' >> /etc/fstab
</syntaxhighlight>
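Before rebooting, it's worth confirming the new entry actually mounts. A sketch (assumes the mount point hasn't been created yet; needs root):

```bash
mkdir -p /mnt/raid5
# mount -a mounts any not-yet-mounted fstab entries; an error here points
# to a typo in /etc/fstab rather than a problem with the array itself.
mount -a
df -h /mnt/raid5
```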
==Restore data from backup==
Copy data back:
<syntaxhighlight lang=bash>
[root@drewserv ~]# rsync -aqv --log-file=/root/bkup-raid5 /mnt/backup/backup/ /mnt/raid5/
2011/07/23 17:28:17 [8373] building file list
2011/07/23 17:28:17 [8373] .d..t...... ./
.
.
.
2011/07/23 21:49:09 [8373] .d..t...... lost+found/
2011/07/23 21:49:09 [8373] sent 428464277809 bytes  received 1589811 bytes  27373637.92 bytes/sec
2011/07/23 21:49:09 [8373] total size is 428406181021  speedup is 1.00
</syntaxhighlight>
==Notes==
* http://en.wikipedia.org/wiki/Mdadm
* http://www.mythtv.org/wiki/LVM_on_RAID#Performance_Enhancements_.2F_Tuning
Increasing RAID5 performance: setting the read-ahead and stripe cache size for the array provides noticeable read/write speed improvements. Note: this tip assumes sufficient RAM is available to the system; insufficient RAM can lead to data loss or corruption.
Set the stripe cache (value is in pages per device, i.e. on 4 devices a value of 8192 comes out to 128 MB):
<syntaxhighlight lang=bash>
# echo 16384 > /sys/block/md127/md/stripe_cache_size
</syntaxhighlight>
Set read-ahead (try 32768, 65536, 131072, 262144):
<syntaxhighlight lang=bash>
# blockdev --setra 16384 /dev/md127
# blockdev --setra 16384 /dev/video_vg/video_lv
# blockdev --setra 16384 /dev/sdb
# blockdev --setra 16384 /dev/sdd
# blockdev --setra 16384 /dev/sde
</syntaxhighlight>
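The current read-ahead can be read back with <code>--getra</code> (reported in 512-byte sectors), which is a quick way to confirm the setra calls took effect. A sketch (needs root and the device present):

```bash
# Expect 16384 here after the setra calls above.
blockdev --getra /dev/md127
```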
Options used:
<syntaxhighlight lang=bash>
# blockdev --setra 1536 /dev/md3    # back to default
# cat /sys/block/sd{e,g,i,k}/queue/max_sectors_kb
512
512
512
512
</syntaxhighlight>
Test with the chunk size of the raid array (128):
<syntaxhighlight lang=bash>
# echo 128 > /sys/block/sde/queue/max_sectors_kb
# echo 128 > /sys/block/sdg/queue/max_sectors_kb
# echo 128 > /sys/block/sdi/queue/max_sectors_kb
# echo 128 > /sys/block/sdk/queue/max_sectors_kb
</syntaxhighlight>
* http://www.pcguide.com/ref/hdd/perf/raid/concepts/perfStripe-c.html
* http://wiki.centos.org/HowTos/Disk_Optimization
For example, take 4 drives in RAID5 using 64K chunks with a 4K filesystem block size. The stride for one disk is (chunk size / block size) = 64K / 4K = 16. The stripe width spans one disk less for RAID5, so with 3 data-bearing disks out of the 4 in this group, the stripe width is (number of data-bearing drives * stride) = 3 * 16 = 48.
An ext3 partition created this way would be formatted like so:
<syntaxhighlight lang=bash>
mkfs.ext3 -b 4096 -E stride=16 -E stripe-width=48 -O dir_index /dev/XXXX
</syntaxhighlight>
* http://www.nickyeates.com/technology/unix/raid_filesystem_lvm
Note: you would want to change the stride/stripe-width if you added disks to the array:
<syntaxhighlight lang=bash>
tune2fs -E stride=n,stripe-width=m /dev/mdx
</syntaxhighlight>