Since their introduction, Docker containers have changed the way applications are written, distributed, and deployed. Containers are meant to make applications flexible and portable, and, of course, any application involves data.

When considering how data maps into containers, there are two schools of thought. The first says that all data should live inside the container; the second says that data should sit outside the container so that it outlives any individual container. Both approaches raise security issues, and both create real problems for data and container management.

Managing Data Access

There are several techniques for assigning storage to Docker containers. Temporary storage, local to the host running the container, lasts only as long as the container itself. Docker volumes are stored in a dedicated subdirectory on the host and mapped into the application. Volumes are assigned to containers when they are instantiated, or they can be created for them in advance with the “docker volume” command.
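As a rough sketch of these two options (the volume and image names here are illustrative, not from the original article):

    # Create a named volume in advance; Docker stores it under /var/lib/docker/volumes on the host
    docker volume create app-data

    # Attach it when the container is instantiated, mounted at /data inside the container
    docker run -d --name app1 -v app-data:/data alpine sleep infinity

    # Or let Docker create an anonymous volume on the fly at instantiation time
    docker run -d -v /data alpine sleep infinity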

Alternatively, local storage can be mapped into the container as a bind mount: with the “docker run” command, a local host directory is specified as the mount point. The third option is to use a storage plugin that associates external storage with the container directly.
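The bind-mount and plugin approaches look roughly like this; the driver name and its option below are placeholders for whatever plugin is actually installed:

    # Bind-mount a host directory into the container at a chosen mount point
    docker run -d -v /srv/app-data:/data alpine sleep infinity

    # Use a volume plugin so external storage backs the volume directly
    docker volume create --driver some-driver --opt size=10GB ext-vol
    docker run -d -v ext-vol:/data alpine sleep infinity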

Open Access

In each of the models described above, Docker offers no security for the data itself. For instance, any host directory can be mounted into any container, including sensitive folders. Once the directory is mounted, the container can modify the files, subject only to the standard Unix permissions on them. An alternative is to use non-root containers, which means running the containers under a separate Linux user ID (UID). That seems like a relatively easy option; however, you then need a methodology for assigning and managing a UID or group ID (GID) for each container.
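A quick illustration of both points (the paths, UIDs, and images are examples only):

    # Nothing stops a container from mounting a sensitive host directory read-write
    docker run -it -v /etc:/host-etc alpine sh

    # Running as a non-root UID:GID limits the container to what Unix permissions allow
    docker run -it --user 1001:1001 -v /srv/app-data:/data alpine sh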

Non-root containers then run into another problem: local volumes don’t work unless the assigned UID has permission to access the /var/lib/docker/volumes directory. Without that access, data cannot be created or stored, while opening this directory up would introduce its own threats and risks. And there is no way to set individual permissions on a per-volume basis without considerable effort.
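One common workaround, sketched here under the assumption of a volume named “app-data” and a target UID of 1001, is to have a short-lived root container fix the ownership before the non-root container uses the volume:

    # The backing directory of a fresh named volume is owned by root,
    # so a non-root container cannot write to it out of the box
    docker volume create app-data
    docker run --rm --user 1001:1001 -v app-data:/data alpine touch /data/file   # Permission denied

    # Chown the volume once from a root container, then the non-root container can use it
    docker run --rm -v app-data:/data alpine chown -R 1001:1001 /data
    docker run --rm --user 1001:1001 -v app-data:/data alpine touch /data/file   # succeeds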

If we look at external storage mounted into a container, many solutions present a block device (a LUN) to the container host and format a file system onto it, which is then mounted into the container. At that point, security on the files and directories can be set from the host, reducing the security threats. However, if that LUN/volume is used somewhere else, there are no security controls over how it is mounted or used by other containers. Since no security model is built directly into the container, everything depends on the commands run on the host.
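In practice that flow looks something like the following; /dev/sdb, the mount path, and the UID are all illustrative:

    # On the host: format the LUN, mount it, and set file/directory permissions there
    mkfs.ext4 /dev/sdb
    mkdir -p /mnt/lun0
    mount /dev/sdb /mnt/lun0
    chown -R 1001:1001 /mnt/lun0
    chmod 750 /mnt/lun0

    # Then hand the file system to a container; nothing prevents the same LUN
    # from being mounted and used elsewhere
    docker run -d --user 1001:1001 -v /mnt/lun0:/data alpine sleep infinity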

This points to another issue: the lack of multi-tenancy. Each container may run a separate application. Just as on a traditional storage system, the space assigned to each container should have a degree of separation, so that data cannot be accessed with malicious intent. Other than trusting the orchestration tools, there is no easy way to run a container and map its data at the host level.

Finding a Solution

Some of the issues mentioned above are rooted in Linux/Unix itself. Mount namespaces abstract the data mount points presented to a container, but there is no equivalent abstraction for permissions. You cannot map user 1000 to user 1001 without updating the access control list (ACL) data on the files and directories, and large-scale ACL changes can hurt system performance. For local volumes, Docker could set the permissions on the host directory that represents a new volume to match the UID of the container being started.
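To make that cost concrete, here is a rough sketch of what remapping ownership involves on the host; the volume path and UIDs are hypothetical:

    # Remapping ownership means touching every file; on a large volume this is expensive
    chown -R 1001:1000 /var/lib/docker/volumes/app-data/_data

    # Alternatively, grant an extra UID access via POSIX ACLs rather than changing ownership
    setfacl -R -m u:1001:rwX /var/lib/docker/volumes/app-data/_data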

With external volumes, you can move away from the permission structure of the container host. On the other hand, you need a process that maps the data on a volume to a trusted application running in specific containers. Note that containers have no inherent “identity” and can be started or stopped at will, which makes it hard to establish which container owns a data volume.

These days, the easiest solution is to rely on an orchestration platform to manage and run the containers. We trust the system to map volumes to containers accurately. In many respects, this is like traditional SAN storage or virtual disks mapped to virtual machines. What differentiates containers is their level of portability and the need for a security mechanism that extends to public cloud infrastructure.
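Using Docker’s own Swarm orchestrator as one example (the service and volume names are illustrative), the mapping is declared once and the platform is trusted to apply it wherever the container lands:

    # Run the application as a service, with the volume mapping handled by the orchestrator
    docker service create --name app \
      --mount type=volume,source=app-data,target=/data \
      alpine sleep infinity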

Docker has some work to do here; its acquisition of the storage startup Infinit could bring new ideas about persistent data security. Hopefully we can expect the development of an interface that suits all vendors.