The building blocks of containers, such as the kernel primitives, have been around for a long time, so they are not new from a security perspective. However, the container itself is not a kernel construct; it is an abstraction built from features of the host operating system kernel. For Docker container security, these kernel primitives are the namespaces and control groups that make the container abstraction possible.
Docker uses control groups to control how much of the host's resources a workload can consume. As a result, Docker allows you to easily apply resource controls to container workloads. Fortunately, much of the control group complexity is hidden behind the Docker API, making containers much easier to use. Then we have namespaces, which control what a container can see. A namespace allows us to take an OS with all its resources, such as filesystems, and carve it into what look like virtual operating systems, called containers. Namespaces act as a visibility boundary, and there are several different types of namespaces.
Diagram: Docker Container Security Tools: Control Groups and Namespaces.
- Containerized processes
Containers are often referred to as “containerized processes.” Essentially, a container is a Linux process running on a host machine. However, the process has a limited view of the host and can access only a subtree of the filesystem. Therefore, it is best to think of a container as a process with a restricted view. Namespaces provide the limited view, and control groups provide the resource restrictions. From the inside, the container looks similar to a VM, with isolated processes, networking, and filesystem access. From the outside, however, it looks like a normal process running on the host machine.
- Recap: Docker Container Security and Protection
The first thing to consider when starting with Docker container security is that containers run as root by default and share the kernel of the host OS. They rely on the boundaries created by namespaces for isolation and on control groups to prevent one container from consuming an unfair share of resources. So here, we can prevent things like a noisy neighbor, where one application uses up all the resources on the system and stops other applications on the same system from performing adequately. In the early days of containers, this is how container protection started, with namespaces and control groups, and the protection was not perfect. For example, it cannot prevent interference in resources that the operating system kernel does not manage.
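Because containers run as root by default, a common first step is to start them as an unprivileged user with capabilities dropped. The following is a minimal sketch rather than a definitive setup: the helper name `hardened_run_args`, the UID:GID `1000:1000`, and the image name `example/app:1.0` are illustrative assumptions, while `--user`, `--cap-drop`, `--security-opt no-new-privileges`, and `--read-only` are standard `docker run` options.

```python
import shlex


def hardened_run_args(image, user="1000:1000"):
    """Build a `docker run` argument list that avoids running as root."""
    return [
        "docker", "run", "--rm",
        "--user", user,                         # run as a non-root UID:GID
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block escalation via setuid binaries
        "--read-only",                          # mount the root filesystem read-only
        image,
    ]


if __name__ == "__main__":
    # "example/app:1.0" is a hypothetical image name used for illustration.
    print(shlex.join(hardened_run_args("example/app:1.0")))
```

Whether an application tolerates a read-only root filesystem or a fully dropped capability set depends on the workload, so treat these defaults as a starting point to relax only where needed.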
So we need to move to a higher abstraction layer with container images. A container image encapsulates your application code and its dependencies, such as third-party packages and libraries. Images are our built assets, containing all the files needed to run our application on top of the Linux kernel. In addition, because images are used to create containers, we can apply additional Docker container security at the image level.
- Security Concerns: Image and Supply Chain
To run a container, we need to pull images. Images are pulled from a local or remote registry, and this is where vulnerabilities can be introduced. The hosts connected to the registry may be secure, but that does not mean that the image you are pulling is secure. Traditional security appliances are blind to malware and other vulnerabilities in images because they are looking for different signatures. There are several security concerns here. Users can pull large or bloated images, images from untrusted registries, or images containing malware. As a result, we need to consider container threats both at runtime and in the supply chain for effective container security.
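One small supply chain control is to check an image reference before it is pulled. The sketch below only inspects the reference string: it checks that the registry is on an allowlist and that the image is pinned by digest instead of a mutable tag. It does not scan image contents for malware or vulnerabilities. The registry allowlist and the function names are assumptions made for illustration.

```python
import re

# Hypothetical allowlist of registries this team trusts; adjust to your environment.
TRUSTED_REGISTRIES = {"docker.io", "ghcr.io"}

DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def registry_of(image_ref):
    """Return the registry host of an image reference (docker.io if omitted)."""
    first, sep, _rest = image_ref.partition("/")
    # Docker treats the first path component as a registry only if it looks
    # like a host name: it contains "." or ":" or is exactly "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_pinned(image_ref):
    """True if the reference pins content by sha256 digest, not a mutable tag."""
    return DIGEST_RE.search(image_ref) is not None


def check_image(image_ref):
    """Return a list of problems; an empty list means no findings."""
    problems = []
    registry = registry_of(image_ref)
    if registry not in TRUSTED_REGISTRIES:
        problems.append("untrusted registry: " + registry)
    if not is_pinned(image_ref):
        problems.append("not pinned by digest")
    return problems
```

A check like this fits naturally in a CI step or admission hook, alongside (not instead of) an image scanner.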
Diagram: Docker container security supply chain threats.
- Security Concerns: Container Breakouts
The container process is visible from the host. Therefore, if a bad actor gets access to the host with the correct privileges, they can compromise all the containers on the host. If another application can read the memory that belongs to your application, it can access your data. So you need to ensure that your applications are safely isolated from each other. If your applications run on separate physical machines, accessing another application’s memory is impossible. From the security perspective, physical isolation is the strongest, but it is not always possible.
If a host gets compromised, all containers running on the host are potentially compromised too, especially if the attacker gains root or elevates their privileges, for example by becoming a member of the docker group. So your host must be locked down and secured so that container breakouts are hard to do. Also, remember that it is hard to orchestrate a container breakout. Still, it is not hard to misconfigure a container with additional or excessive privileges that make a container breakout easy.
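Since misconfiguration is the easy path to a breakout, it can help to audit `docker run` arguments for options that weaken isolation. The following is a rough sketch under stated assumptions: the list of risky options is illustrative and far from exhaustive, and the function name `risky_flags` is made up for this example. The options themselves (`--privileged`, `--cap-add`, `--pid=host`, `--network=host`, and a bind mount of `/var/run/docker.sock`) are real Docker options.

```python
# Option values that weaken isolation; illustrative, not exhaustive.
RISKY_VALUES = {
    "--pid": {"host"},
    "--network": {"host"},
    "--net": {"host"},
    "--ipc": {"host"},
    "--cap-add": {"ALL", "SYS_ADMIN", "SYS_PTRACE", "SYS_MODULE"},
}
VOLUME_FLAGS = ("-v", "--volume")
DOCKER_SOCKET = "/var/run/docker.sock"


def risky_flags(args):
    """Return the risky options found in a `docker run` argument list."""
    findings = []
    i = 0
    while i < len(args):
        flag, sep, value = args[i].partition("=")
        if flag == "--privileged":
            if value.lower() != "false":
                findings.append("--privileged")
        elif flag in RISKY_VALUES or flag in VOLUME_FLAGS:
            if not sep and i + 1 < len(args):
                # Value passed as a separate argument: "--cap-add SYS_ADMIN".
                value = args[i + 1]
                i += 1
            if flag in VOLUME_FLAGS:
                # Mounting the Docker socket gives control of the daemon.
                if value.startswith(DOCKER_SOCKET):
                    findings.append(f"{flag} {value}")
            elif value in RISKY_VALUES[flag]:
                findings.append(f"{flag}={value}")
        i += 1
    return findings
```

An empty result does not prove a container is safe; it only means none of the listed options were used.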
- The role of the Kernel: Potential Attack Vector
The kernel manages its userspace processes and assigns memory to each process. So it is up to the kernel to ensure that one application cannot access the memory assigned to another. The kernel is hardened and battle-tested, but it is complex, and the number one enemy of good security is complexity. You cannot rule out a bug in how the kernel manages memory, and an attacker could exploit such a bug to access the memory of other applications.
- Hypervisor: Better Isolation? Kernel Attack Surface
So does the hypervisor give you better isolation than a kernel gives to its processes? The key point is that a kernel is complex and always evolving; although the hypervisor also does important work, managing memory and device access, it has a much simpler role. As a result, hypervisors are smaller and simpler than full Linux kernels. What happens if you compare the lines of code in the Linux kernel to that of an open-source hypervisor? Less code means less complexity, resulting in a smaller attack surface, and a smaller attack surface reduces the likelihood of a bad actor finding an exploitable flaw. With a kernel, userspace processes have some visibility of each other. For example, you can run certain CLI commands, such as ps, and see the running processes on the same machine. Furthermore, you can access information about those processes if you have the right permissions.
This is a fundamental difference between the container and the VM, and it is why many consider the container to have weaker isolation. With a VM, you can’t see one machine’s processes from another. The fact that containers share a kernel means they have weaker isolation than VMs. For this reason, and from the security perspective, you can place containers inside VMs.
Diagram: Docker Container Security: Link to YouTube Video.
Docker Container Security: The Starting Point
So we have some foundational Docker container security that has been here for some time. The Linux side of security gives us things such as the namespaces and control groups we have just mentioned, secure computing (seccomp), AppArmor, and SELinux, which provide isolation and resource protection. Consider these security technologies to be the first layer of security, the one closest to the workloads. Then we can expand from there and create additional layers of security, building a defense-in-depth strategy.
How to Create a Sandbox Environment
As a first layer, you need to consider the available security module templates when considering Docker container security. Several security modules can be implemented that help you enable fine-grained access control or system resource restrictions, hardening your containerized environment. More than likely, your distribution comes with a security module template for Docker containers, and you can use these out of the box for some use cases. However, you may need to tailor the default templates for other use cases. There will be templates for Secure Computing, AppArmor, and SELinux. Along with the Dockerfile and workload best practices, these templates will give you an extra safety net.
Diagram: Docker Security Best Practices.
- Goal1: Strengthen Isolation: Namespaces
One of the main building blocks of containers is a Linux construct called the namespace, which provides a security layer for applications running inside containers. For example, by putting a process in a namespace, you can limit what that process can see. A namespace fools a process into believing that it has exclusive access to the resources it sees. In reality, other processes in their own namespaces have access to similar resources in their isolated environments. The resources themselves belong to the host system.
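On Linux, the kernel exposes the namespaces a process belongs to as symbolic links under `/proc/<pid>/ns`, with targets such as `pid:[4026531836]`. Two processes that show the same number for an entry share that namespace. The sketch below reads those links and compares two processes; the function names are illustrative, and on systems without `/proc` the lookup simply returns an empty mapping.

```python
import os
import re

NS_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Parse a /proc/<pid>/ns link target such as 'pid:[4026531836]'."""
    match = NS_LINK_RE.match(target)
    if match is None:
        raise ValueError("unexpected namespace link: " + target)
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Return {entry name: namespace inode} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, /proc not mounted, or no such process
    for name in names:
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable entry (for example, insufficient permissions)
        result[name] = inode
    return result


def shared_namespaces(a, b):
    """Entry names whose namespace inode is identical in both mappings."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    for name, inode in sorted(namespaces_of().items()):
        print(f"{name}: {inode}")
```

Comparing `namespaces_of()` for a host shell with `namespaces_of(pid)` for a containerized process (run with sufficient privileges on the host) would typically show different values for entries such as `pid`, `mnt`, and `net`.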
- Goal2: Strengthen Isolation: Access Control
Access control is about managing who can access what on a system. With Linux, we inherited Unix’s Discretionary Access Control (DAC) features. Unfortunately, they are very limited, and there are only a few ways to control access to objects. If you want a more fine-grained approach, we have Mandatory Access Control (MAC), which is policy-driven and granular for many object types. We have a few solutions for MAC. For example, SELinux was merged into the kernel in 2003 and AppArmor in 2010. These are the most popular in the Linux domain, and both are implemented as modules via the Linux Security Modules (LSM) framework. SELinux was created by the National Security Agency (NSA) to protect systems and was integrated into the Linux kernel. It is a Linux kernel security module that provides access controls, integrity controls, and role-based access control (RBAC).
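To see how coarse DAC is, it helps to look at what a classic Unix file mode can actually express: read, write, and execute bits for just three classes (owner, group, and everyone else). This small sketch uses Python's standard `stat` module; the helper name `describe_mode` is an assumption for illustration.

```python
import stat


def describe_mode(mode):
    """Split a file mode into the three DAC permission classes."""
    text = stat.filemode(mode)  # for example '-rw-r-----'
    return {"owner": text[1:4], "group": text[4:7], "other": text[7:10]}
```

You could pass it `os.stat(path).st_mode` for a real file. Anything more specific than these three classes, such as "this program may read only this directory", is where a MAC policy comes in.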
- Goal3: Strengthen Isolation: AppArmor
AppArmor applies access control on an application-by-application basis. To use it, you associate an AppArmor security profile with each program. Docker loads a default profile for containers. Keep in mind that this profile is applied to the containers and not to the Docker daemon. The default profile is called docker-default. Docker describes it as moderately protective while providing wide application compatibility. So when you instantiate a container, it uses the docker-default policy unless you override it with the --security-opt flag. It is crafted toward the general use case. The default profile is applied to all container workloads if the host has AppArmor enabled.
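Two small pieces make this concrete: selecting a profile when starting a container, and checking which profile a process is confined by. The sketch below assumes an AppArmor-enabled host, where `/proc/<pid>/attr/current` holds a label such as `docker-default (enforce)` or `unconfined`. The function names and the custom profile name `my-app-profile` are hypothetical; a custom profile has to be loaded on the host before a container can use it.

```python
def apparmor_args(profile="docker-default"):
    """`docker run` options that select an AppArmor profile by name."""
    return ["--security-opt", f"apparmor={profile}"]


def parse_apparmor_label(raw):
    """Parse the content of /proc/<pid>/attr/current into (profile, mode)."""
    label = raw.strip()
    if label.endswith(")") and " (" in label:
        profile, _, mode = label[:-1].rpartition(" (")
        return profile, mode
    return label, None  # for example "unconfined", which carries no mode


if __name__ == "__main__":
    # "my-app-profile" is a hypothetical custom profile name.
    print(" ".join(["docker", "run"] + apparmor_args("my-app-profile") + ["example/app:1.0"]))
```

Reading the label from inside a running container is a quick way to confirm that the profile you intended is the one actually applied.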
- Goal4: Strengthen Isolation: Control Groups
Containers should not starve other containers by, for example, using all the memory or other host resources. So we can use control groups to limit the resources available to different Linux processes. Control groups control the host’s resources and are an essential tool for fending off denial-of-service (DoS) attacks. If a process is allowed to consume, for example, unlimited memory, it can starve other processes on the same host of that resource. This could happen inadvertently through a memory leak in the application or maliciously through a resource exhaustion attack that takes advantage of a memory leak. Without a limit, a container can fork as many processes (PIDs) as the maximum configured for the host kernel. Unchecked, this is a significant avenue for a DoS. So a container should be limited to the number of processes it requires, which can be set through the CLI. The pids control group subsystem limits the total number of processes allowed within a control group, which prevents a fork bomb attack.
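On the Docker CLI, these control group limits map to a handful of `docker run` options: `--memory`, `--cpus`, and `--pids-limit`. The sketch below groups them in a small data class. The class name and the default values are assumptions chosen for illustration, not recommendations; sensible limits depend on the workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLimits:
    memory: str = "256m"  # hard memory limit (memory controller)
    cpus: float = 1.0     # CPU quota in cores (cpu controller)
    pids: int = 100       # maximum number of processes (pids controller)

    def to_args(self):
        """Translate the limits into `docker run` options."""
        if self.cpus <= 0 or self.pids <= 0:
            raise ValueError("cpus and pids must be positive")
        return [
            "--memory", self.memory,
            "--cpus", str(self.cpus),
            "--pids-limit", str(self.pids),
        ]


if __name__ == "__main__":
    # "example/app:1.0" is a hypothetical image name.
    limits = ResourceLimits(memory="512m", cpus=0.5, pids=50)
    print(" ".join(["docker", "run"] + limits.to_args() + ["example/app:1.0"]))
```

With a process limit in place, a fork bomb inside the container hits the cap and further forks fail, instead of exhausting the host's process table.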
- Goal5: Strengthen Isolation: Highlighting System Calls
System calls run in kernel space, at the highest privilege level, alongside the kernel and device drivers. A user application, by contrast, runs in user space, which has fewer privileges. When an application running in user space needs to carry out a task such as cloning a process, it does so via a system call, and the kernel carries out the operation on behalf of the userspace process. This interface represents an attack surface for a bad actor to play with.
- Goal6: Security Standpoint: Limit the System Calls
So you want to limit the system calls available to an application. If a process is compromised, it may be used to invoke system calls that it would not ordinarily use. This could potentially lead to further compromise. You should aim to remove system calls that are not required and so reduce the available attack surface. As a result, you reduce the risk of compromise to the containerized workloads.
- Goal7: Secure Computing Mode
Secure Computing Mode (seccomp) is a Linux kernel feature that restricts the actions available within containers. For example, there are over 300 syscalls in the Linux system call interface, and your container is unlikely to need access to all of them. For example, you probably don’t want containers to change kernel modules, so they do not need to call create_module, delete_module, or init_module. A seccomp profile is applied to a process and determines whether or not a given system call is permitted. Here we can allowlist or blocklist a set of system calls. The profile’s default action sets what the kernel does when a container process attempts to execute a system call that no rule matches. An allow action specifies an allowlist of the system calls that are permitted unconditionally.
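As a rough illustration of the blocklist form, the sketch below generates a profile in the JSON layout Docker accepts: a `defaultAction` for unmatched calls and a `syscalls` list of rules. It allows everything except the kernel module calls mentioned above, which fail with an error. The function name is an assumption, and a blocklist this small is a teaching example, not a hardened profile.

```python
import json

# The kernel module syscalls named in the text above.
MODULE_SYSCALLS = ["create_module", "delete_module", "init_module"]


def blocklist_profile(blocked):
    """Allow every syscall except the named ones, which return an error."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {"names": sorted(blocked), "action": "SCMP_ACT_ERRNO"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(blocklist_profile(MODULE_SYSCALLS), indent=2))
```

Swapping the two actions turns the same structure into an allowlist, which is the stricter and generally preferred direction.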
For Docker container security, the Docker default seccomp profile blocks over 40 syscalls without ill effects on most containerized applications. You may want to tailor this to suit your security needs and limit your container to an even smaller group of syscalls. It is recommended to have a seccomp profile for each application that permits exactly the set of syscalls it needs to function. This follows the security principle of least privilege.
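One way to approximate the set of syscalls an application needs is to trace it and build an allowlist from what was observed. The sketch below assumes the trace comes from `strace` (with or without `-f`), which is my assumption and not a method this article prescribes; the function names and the profile path are also illustrative. A trace only covers the code paths that were exercised, and the container runtime needs some syscalls of its own during startup, so a profile built this way needs testing before use.

```python
import re

# Syscall name at the start of an strace line, with an optional leading PID
# as printed by `strace -f` ("[pid 1234] read(" or "1234 read(").
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_in_trace(lines):
    """Collect the distinct syscall names seen in strace output lines."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def allowlist_profile(allowed):
    """Deny every syscall by default and allow only the named ones."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"}],
    }


def seccomp_args(profile_path):
    """`docker run` options that apply a seccomp profile from a JSON file."""
    return ["--security-opt", f"seccomp={profile_path}"]
```

The resulting dictionary can be written out with `json.dump` and passed to the container with the options from `seccomp_args`.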