Introduction
Linux has robust services for handling and managing devices, including storage devices.
But how does Linux do it? How does it represent these devices and make them usable to us?
What is Block Storage
Block storage is another name for what Linux calls a block device.
Block devices are hardware designed to store and retrieve data at relatively high speeds (I say “relatively” because it never seems to be fast enough).
The most common type of storage device in years past was the spinning hard disk drive, otherwise known as spinning rust.
Today those older iron-oxide-based devices are being replaced with flash-based devices: solid-state drives (SSDs), flash memory sticks, SD cards, etc.
But why does Linux call them block devices? It's because the kernel interfaces with the hardware by reading and writing in fixed-size chunks called blocks.
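To make the idea of fixed-size blocks concrete, here is a minimal Python sketch. It does not touch a real device; it uses an in-memory buffer as a stand-in for a disk, and the 4096-byte block size and the names read_blocks and fake_disk are assumptions made up for this illustration.

```python
import io

BLOCK_SIZE = 4096  # bytes; an assumed block size for this illustration


def read_blocks(stream, block_size=BLOCK_SIZE):
    """Yield successive fixed-size blocks from a binary stream.

    The final block may be shorter if the stream length is not a
    multiple of block_size.
    """
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block


# An in-memory stand-in for a tiny "disk" holding exactly three blocks of zeros.
fake_disk = io.BytesIO(bytes(3 * BLOCK_SIZE))
blocks = list(read_blocks(fake_disk))
print(len(blocks))  # prints 3
```

A real block device behaves in a similar spirit: the kernel transfers whole blocks, even when a program only asks for a few bytes.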
Block devices hold filesystems that we can mount anywhere we wish in our file tree. Once they are mounted we can read and write data seamlessly.
What are Disk Partitions
Disk partitions are a way of dividing up the space on a disk drive so we can install different filesystems on them or mount them in different places in the Linux file tree.
You can think of a partition as a way to segment our data storage so the pieces can be used for different purposes (say, to boot the computer, or to hold my home directory, etc.).
Some systems do not need partitions, but others do. It is generally recommended to partition your drives so you have more flexibility down the road if you need to change things.
MBR and GPT
Partitioning a disk boils down to one of two choices: the Master Boot Record (MBR) format or the GUID Partition Table (GPT). GUID stands for Globally Unique Identifier.
MBR is the older partition type and it can only support disk drives with 2 TB or less of storage (with standard 512-byte sectors). MBR also has a maximum of 4 primary partitions, although there is a trick you can use to make additional “logical partitions” inside an extended partition.
GPT fixes both problems of MBR. It can handle disk sizes up to 9,400,000,000 TB. That should be just enough to hold my music collection.
GPT can have up to 128 partitions by default.
So your first choice should be easy, since GPT is generally preferred unless you are running some weird operating system or older hardware that forces you to use MBR.
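Where do those size limits come from? A rough sketch of the arithmetic, assuming 512-byte sectors, 32-bit sector addresses for MBR and 64-bit sector addresses for GPT (the function name max_addressable_bytes is made up for this example):

```python
SECTOR_SIZE = 512  # bytes; assumes traditional 512-byte sectors


def max_addressable_bytes(address_bits, sector_size=SECTOR_SIZE):
    """Largest disk size reachable with sector addresses of the given width."""
    return (2 ** address_bits) * sector_size


mbr_limit = max_addressable_bytes(32)  # MBR stores 32-bit sector addresses
gpt_limit = max_addressable_bytes(64)  # GPT stores 64-bit sector addresses

print(f"MBR: {mbr_limit / 2**40:.0f} TiB")
print(f"GPT: {gpt_limit / 10**12:,.0f} TB")
```

Drives with larger sectors (for example 4096 bytes) raise both limits accordingly, which is why the 2 TB figure is best read as the common case rather than a hard law.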
Formatting and Filesystems
Once we partition the drive, the next step is to format it. But before we can do that, we need to choose a filesystem for it.
File System Choices
Ext4 – The most popular filesystem on Linux; it is the fourth version of the extended file system. Ext4 is journaled and is highly tuned for operating system workloads.
XFS – comes from the mighty Silicon Graphics computers. It has been enhanced and widely adopted, drives many of the high-performance server workloads on Linux, and is the default file system for Red Hat Enterprise Linux. XFS also uses journaling, but it journals metadata only. While this can lead to faster performance, it also means recently written file data can be lost or damaged in the event of an abrupt power failure, even though the filesystem structure itself stays consistent.
Btrfs – is a modern, feature-rich copy-on-write filesystem. Its architecture allows for some volume management functionality including snapshots, cloning, subvolumes, etc. Btrfs still has a few known problems, particularly with full disks, and there is some debate about its suitability for production workloads, but nonetheless it has become the default filesystem for Fedora Workstation.
ZFS – is a copy-on-write file system and volume manager with a robust and mature feature set, including snapshotting, cloning, and organizing volumes into RAID-like arrays (they are better than RAID, IMHO). ZFS has a controversial history because of the license chosen to release it into the open source community. Ubuntu now ships a kernel and distro which allow its installation as a root filesystem, and Debian includes the source code for ZFS in its repositories.
And yes, there are others, but we would be here all day listing them. Go to Wikipedia if you want a complete list.
How Linux Manages Storage Devices
Unix pioneered the concept that “everything is a file”. Linux carries on this tradition, and that includes hardware like storage devices, which are represented on the filesystem as files. The first disk drive on a Linux system might look like this: /dev/sda
The first partition on /dev/sda would be /dev/sda1
/dev/sda is a special device file (a device node) named after the kernel-defined name for the hardware
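The naming pattern is simple enough to sketch in a few lines of Python. The helper name partition_device is made up for this example. It also covers one wrinkle not mentioned above: when the disk name itself ends in a digit (as with NVMe drives like /dev/nvme0n1), the kernel puts a “p” before the partition number.

```python
def partition_device(disk, number):
    """Return the device path of partition `number` on `disk`.

    If the disk name ends in a digit (e.g. /dev/nvme0n1 or /dev/mmcblk0),
    a "p" separates the disk name from the partition number.
    """
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{number}"


print(partition_device("/dev/sda", 1))      # prints /dev/sda1
print(partition_device("/dev/nvme0n1", 2))  # prints /dev/nvme0n1p2
```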
There are also symbolic links stored under /dev/disk which point back to the kernel device names. You may see one or more of the following:
/dev/disk/by-partlabel – uses user-defined label names for a partition (GPT)
/dev/disk/by-partuuid – uses UUIDs for partitions (GPT)
/dev/disk/by-label – uses user-defined filesystem labels for a disk or partition
/dev/disk/by-uuid – uses the filesystem UUID, generated at the time the disk is formatted and unique among all devices on the system
/dev/disk/by-id – uses links generated from hardware serial numbers
/dev/disk/by-path – like by-id, but the links are constructed from the system’s interpretation of the hardware path used to access the device
Usually by-label and by-id are the best choices to use, but you will find most distros use by-uuid for the ones they create during install
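If you want to see where those links point, a short Python sketch like the following can help. The function name resolve_links is made up for this example, and it simply returns an empty result on systems where the directory does not exist.

```python
import os


def resolve_links(directory="/dev/disk/by-uuid"):
    """Map each symlink name in `directory` to the path it resolves to."""
    if not os.path.isdir(directory):
        return {}
    resolved = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            # realpath follows the link back to the actual device node
            resolved[name] = os.path.realpath(path)
    return resolved


if __name__ == "__main__":
    for name, target in resolve_links().items():
        print(f"{name} -> {target}")
```

Passing a different directory, such as /dev/disk/by-id, lists that set of links instead.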
Mounting Block Devices
The /dev/ file is used to communicate with the Linux kernel, but there is more to the story than that
To mount a file system you have to pick or create a place in the system file tree to mount your device. UNIX used to follow a convention which has either been forgotten or lost by Linux, but here it is
What conventions you come up with are all up to you. Just don’t choose one that is already in use or mounted. Also avoid using /mnt and /media: /mnt is generally reserved for temporary mounts, and /media for removable media like USB sticks and CD/DVD drives
In any case, pick a scheme and be consistent
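One way to check that a candidate mount point is not already in use is sketched below. The function name is made up, and the rule it applies (the directory must exist, be empty, and have nothing mounted on it) is my own conservative assumption; Linux will happily mount over a non-empty directory and simply hide its contents.

```python
import os


def is_usable_mount_point(path):
    """Return True if `path` is an existing, empty directory with nothing mounted on it."""
    if not os.path.isdir(path):
        return False
    if os.path.ismount(path):
        # Something is already mounted here
        return False
    # An empty directory means no existing files would be hidden by the mount
    return not os.listdir(path)
```

For example, is_usable_mount_point("/") returns False, because the root directory is itself a mount point.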
Making Mounts Permanent with /etc/fstab
The fstab file is used to hold definitions about block devices, filesystems, mount points and mounting options
Usually a filesystem not defined in fstab will not be automatically mounted, although there are other ways to do this, such as defining systemd mount units (see systemd.mount), but that is not common
See the man page for fstab for additional information
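Each fstab entry is one line of whitespace-separated fields: the device, the mount point, the filesystem type, the mount options, and two optional numbers (dump frequency and fsck pass order). Here is a simplified Python sketch of a parser. The FstabEntry type, the function name, and the sample line (including its UUID and mount point) are all made up for illustration, and the sketch assumes the first four fields are always present.

```python
from typing import List, NamedTuple, Optional


class FstabEntry(NamedTuple):
    spec: str           # device to mount, e.g. UUID=... or /dev/sda1
    mount_point: str    # where it appears in the file tree
    fstype: str         # filesystem type, e.g. ext4
    options: List[str]  # mount options, split on commas
    dump: int           # dump frequency, 0 if omitted
    passno: int         # fsck pass order, 0 if omitted


def parse_fstab_line(line: str) -> Optional[FstabEntry]:
    """Parse one fstab line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    spec, mount_point, fstype, options = fields[:4]
    dump = int(fields[4]) if len(fields) > 4 else 0
    passno = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(spec, mount_point, fstype, options.split(","), dump, passno)


# Hypothetical entry; the UUID and mount point are invented for this example.
sample = "UUID=0a1b2c3d-1111-2222-3333-444455556666 /srv/data ext4 defaults,noatime 0 2"
entry = parse_fstab_line(sample)
print(entry.mount_point, entry.fstype, entry.options)
```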
More Complex Storage Management
RAID
You can also group devices together to form larger logical disk structures with varying degrees of redundancy and performance
RAID 0: is a striping method which spreads data more or less evenly between the disks in the set. Both reads and writes are striped across the drives in the set, so performance is spread between them.
A single drive failure will result in a total loss of data
RAID 1: is drive mirroring. Anything written to a RAID 1 array is written to multiple disk drives. You can lose a drive on either side of the mirror, and the remaining drive can be used to reconstruct the RAID; however, a two-way RAID 1 mirror cuts the total disk capacity in half
RAID 5: uses distributed parity to provide data redundancy for single drive failures. The RAID can be rebuilt from the remaining drives. Data capacity is reduced by one disk drive
RAID 6: offers double distributed parity to provide data redundancy for up to 2 disk failures. Again the RAID can be rebuilt from the remaining drives; however, this type of RAID takes longer to recover and the total capacity is reduced by 2 disk drives
RAID 10: is a combination of RAID 0 and RAID 1, so it offers some redundancy while providing good performance. However, it will take 1/2 of the total capacity of the combined disk drives
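The capacity trade-offs above can be summed up in a small Python sketch. The function name usable_capacity is made up for this example, and it assumes all disks are the same size, that RAID 1 mirrors every disk in the set, and that RAID 10 uses two-way mirrors.

```python
def usable_capacity(level, disks, disk_size_tb):
    """Usable capacity in TB for a RAID set of identical disks.

    Assumes RAID 1 mirrors every disk in the set and RAID 10 uses
    two-way mirrors.
    """
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 0:
        return disks * disk_size_tb        # no redundancy, all space usable
    if level == 1:
        return disk_size_tb                # every disk holds the same data
    if level == 5:
        return (disks - 1) * disk_size_tb  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size_tb  # two disks' worth of parity
    if disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return (disks // 2) * disk_size_tb     # half the disks are mirrors


for level in (0, 1, 5, 6, 10):
    print(f"RAID {level}: {usable_capacity(level, 4, 2)} TB usable from 4 x 2 TB")
```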
LVM
LVM is Logical Volume Management. It is a system which abstracts logical devices from the physical characteristics of the underlying storage devices. LVM allows you to create groups of physical devices and manage them as if they were one single block of space.
A typical workflow looks like this:
- Create a partition using fdisk
- Create Physical Volumes from a disk partition
- Create a Volume Group from Physical Volume(s)
- Create a Logical Volume from a Volume Group
- Format the Logical Volume using a filesystem type (btrfs, XFS, EXT4, etc.)
- Mount the filesystem
- Manage the Logical Volume (extending, snapshotting, etc)
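The middle steps of that workflow map onto a handful of standard commands (pvcreate, vgcreate, lvcreate, mkfs, mount). The Python sketch below only builds the command strings and never runs them, so it is safe to experiment with. The function name lvm_plan and every device, volume group, logical volume and mount point name are hypothetical placeholders.

```python
import shlex


def lvm_plan(partition, vg_name, lv_name, size, fstype, mount_point):
    """Return the shell commands for a basic LVM setup, without running them."""
    lv_path = f"/dev/{vg_name}/{lv_name}"
    commands = [
        ["pvcreate", partition],                          # physical volume
        ["vgcreate", vg_name, partition],                 # volume group
        ["lvcreate", "-n", lv_name, "-L", size, vg_name], # logical volume
        [f"mkfs.{fstype}", lv_path],                      # format it
        ["mount", lv_path, mount_point],                  # mount it
    ]
    return [shlex.join(command) for command in commands]


# Hypothetical names, for illustration only.
for line in lvm_plan("/dev/sdb1", "vg_data", "lv_home", "10G", "ext4", "/srv/data"):
    print(line)
```

Running the real commands requires root and will destroy any data on the partition, so review a plan like this carefully before typing it into a shell.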
LVM can do things regular partitions and filesystems simply cannot do. For instance, you can expand volumes, create volumes that span multiple physical disk drives, take live snapshots of volumes, and move volumes to different physical disk drives.
You can also combine LVM with RAID to provide additional flexibility to RAID arrays