DevOps Best Practices

This article first appeared on ShiftLeft’s Blog.

The DevOps movement has gained a great deal of attention in recent years, with many seeing it as the best thing to happen to the traditional software development lifecycle. Whether its popularity has peaked remains to be seen.

Hidden in this noise are some valid questions:

  • “How do we get started?”
  • “What is the best approach?”
  • “Where should we apply these processes/tools?”
  • “Why is this better than our current processes?”

I will attempt to address some of these common questions and allay any lingering fears. Covering DevOps best practices for every organization, at every stage, size, and locale, would take multi-volume tomes. This article is therefore limited to born-in-the-cloud organizations and covers three areas: best practices for implementing a CI/CD pipeline, design principles for a resilient and highly available production stack, and the foundations of security.

Born-in-the-cloud organizations tend to be more receptive to new technologies and methodologies, since their resources are limited. Given these business constraints, a pragmatic approach works best. Even so, full end-to-end automation should remain the key driver, keeping the organization on track and eliminating the potential for human error.

Pragmatic Approach

Years of leading DevOps implementations have shown me which traits matter most in this industry, and the most important one is practicality. Do not get too hung up on finding the perfect set of tools or on designing the ideal build and deployment system. Put together a minimal solution that achieves full CI/CD automation; you can always make time to iterate toward perfection later.

Be mindful that this approach does create technical debt, a good chunk of it security related. What matters is that you are fully aware of that debt and make it a mission to pay it down on a weekly, per-sprint, or monthly basis. Create reminders or issues/tickets to track it; doing so also keeps you honest and fully accountable.

This approach has guided me through numerous challenges. The best practices I share below have served as my guiding principles in past endeavours and when I joined ShiftLeft as its first DevOps Engineer. I hit the ground running at 100 mph, designing and implementing the CI/CD pipeline and the cloud infrastructure to support the various development and production stacks. Being the first has its upsides, but it also comes with great challenges, such as an incredibly heavy workload coupled with an aggressive schedule. When there is no red dot on the map showing where you are, how do you know where to start?

Fully Automated Continuous Integration (CI)

The foundation of DevOps begins with CI. It is crucial to establish and adhere to a consistent CI build system from day 0 (or as soon as humanly possible). Popular tools such as Jenkins, CircleCI, Travis CI, and Bamboo are common choices among DevOps practitioners.

There are many differing opinions on the best way to establish a good CI system. Arguments range from keeping a homogeneous build environment, with a monorepo and a single build tool, to a wild-west, free-for-all approach in which developers configure their own builds. I lean toward a pragmatic approach with a high degree of flexibility.

These are the characteristics of a functional CI build system:

  • Proper access control
    • Lock the build system down so that it only allows internal access from the corporate network, or through a VPN when working remotely
    • (Optional) Use simple firewall rules to control both inbound and outbound internet connections
  • A consistent naming convention for the various build, deploy, and test jobs
  • Build and system configuration must be tracked in SCM for quick recovery in the event of a disaster
  • Most, if not all, builds should follow a general flow (template) to enforce consistency:
    • Clean workspace
    • Source checkout(s)
    • Pre-build steps (if any)
    • Compile
    • Unit Tests (if any)
    • Integration Tests (if any)
    • Archive artifacts (if needed)
    • Deploy artifacts to a local or hosted artifact repository, such as Artifactory (if needed)
    • Post-build steps (if any)
    • Notify (on failure and fix only, or on failure, success, and fix)
  • Support multiple versions of JDKs and build tools
  • Update build system plugins (if any) on a weekly, if not daily, basis
  • Be flexible enough to accommodate various build tools, languages, etc.
  • If available, use the CI build system’s internal secret store; if not, use an external secret store. NEVER store secrets/credentials in plain-text build scripts or configuration files.
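To make the template concrete, here is a minimal Python sketch of a build runner that executes stages in the order listed above, skips optional stages a job does not configure, stops at the first failure, and applies either notification policy. The stage names, the `REQUIRED` set, the policy strings `"fail_fixed"` and `"all"`, and the helper `get_secret` are all hypothetical; `get_secret` assumes the CI system's secret store injects secrets as environment variables, which is a common pattern but not universal.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stage names mirroring the build template, in execution order.
STAGE_ORDER = [
    "clean_workspace",
    "checkout",
    "pre_build",          # optional
    "compile",
    "unit_tests",         # optional
    "integration_tests",  # optional
    "archive_artifacts",  # optional
    "publish_artifacts",  # optional
    "post_build",         # optional
]
REQUIRED = {"clean_workspace", "checkout", "compile"}


@dataclass
class BuildResult:
    success: bool
    ran: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None


def run_build(stages: Dict[str, Callable[[], None]]) -> BuildResult:
    """Run the configured stages in template order, stopping at the first failure."""
    missing = REQUIRED - set(stages)
    if missing:
        raise ValueError(f"missing required stages: {sorted(missing)}")
    unknown = set(stages) - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"stages not in the template: {sorted(unknown)}")

    ran: List[str] = []
    for name in STAGE_ORDER:
        step = stages.get(name)
        if step is None:  # optional stage not configured for this job
            continue
        ran.append(name)
        try:
            step()
        except Exception:
            return BuildResult(success=False, ran=ran, failed_stage=name)
    return BuildResult(success=True, ran=ran)


def should_notify(policy: str, previous_ok: Optional[bool], current_ok: bool) -> bool:
    """'fail_fixed': notify on failures and on the first success after a failure.
    'all': notify on failure, success, and fix alike."""
    if policy == "all":
        return True
    if policy == "fail_fixed":
        return (not current_ok) or previous_ok is False
    raise ValueError(f"unknown notification policy: {policy}")


def get_secret(name: str) -> str:
    """Read a secret injected by the CI secret store as an environment variable,
    rather than hard-coding it in a build script or configuration file."""
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} was not injected into the build environment")
    return value
```

Rejecting stage names that are not in the template is one way to keep individual jobs from drifting away from the shared flow while still letting each job opt out of optional steps.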

Automated and Continuous Deployment (CD) Pipeline

This portion of the pipeline takes release-ready artifacts from CI and deploys them to the QA, UAT, Dev, Staging, and/or Production stacks. Automating it also requires a pragmatic approach. A myriad of tools can satisfy the automation requirement; the trick is to quickly and reasonably pick one that helps you achieve end-to-end automation without impeding overall velocity.

The following list has served me well thus far:

  • Taking into account in-house expertise and what you can reasonably foresee in the near future, pick the tool or combination of tools that will get you there in the shortest amount of time.
  • For every stack other than production, link the CI and CD jobs to create an uninterrupted path from source check-in to deployment.
  • Initially, production deployments should be triggered manually. This is simply a checkpoint in the process.
    • For the first several releases, keep deployments controlled to weed out issues and give yourself time to address them.
    • After several successful milestone releases, let it loose and open the floodgates, allowing an uninterrupted path from source check-in to production.
  • Keep releases moving forward whenever possible. If a release fails, then quickly fix it and re-deploy.
  • Daily and even hourly releases should be the norm.
  • Avoid release rollbacks but prepare for the inevitable. Add in rollback mechanisms for the edge cases where rollbacks are unavoidable to allow for a rapid recovery in the event of a bad release.
  • Relentlessly and continuously improve the process by removing obstacles and streamlining the path from source check-in to deployment.
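The promotion rules above can be sketched as a small release gate: non-production stacks deploy on every successful build, production waits for manual approval until a number of releases have succeeded, failures are reported so they can be fixed forward, and rollback exists only as an explicit escape hatch. This is a minimal Python sketch under stated assumptions: the class `ReleaseGate`, the stack names, the `deploy_fn` callback, and the threshold `AUTO_PROMOTE_AFTER = 5` are all hypothetical stand-ins for "several successful milestone releases".

```python
from dataclasses import dataclass, field
from typing import Callable, List

NON_PRODUCTION = {"dev", "qa", "uat", "staging"}
# Hypothetical threshold standing in for "several successful milestone releases".
AUTO_PROMOTE_AFTER = 5

DeployFn = Callable[[str], bool]  # deploys a version, returns True on success


@dataclass
class ReleaseGate:
    successful_prod_releases: int = 0
    prod_history: List[str] = field(default_factory=list)  # deployed versions, newest last

    def needs_manual_approval(self) -> bool:
        # Production stays behind a manual checkpoint until enough releases succeed.
        return self.successful_prod_releases < AUTO_PROMOTE_AFTER

    def deploy(self, version: str, stack: str, deploy_fn: DeployFn, approved: bool = False) -> str:
        if stack in NON_PRODUCTION:
            # CI and CD are linked: non-production stacks deploy on every check-in.
            return "deployed" if deploy_fn(version) else "failed"
        if stack != "production":
            raise ValueError(f"unknown stack: {stack}")
        if self.needs_manual_approval() and not approved:
            return "waiting_for_approval"
        if not deploy_fn(version):
            # Keep moving forward: report the failure so it can be fixed and redeployed.
            return "failed"
        self.prod_history.append(version)
        self.successful_prod_releases += 1
        return "deployed"

    def rollback(self, deploy_fn: DeployFn) -> str:
        """Edge-case escape hatch: redeploy the version before the current one."""
        if len(self.prod_history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        bad = self.prod_history.pop()
        previous = self.prod_history[-1]
        if not deploy_fn(previous):
            self.prod_history.append(bad)  # assume the bad release is still live
            raise RuntimeError(f"rollback to {previous} failed")
        return previous
```

A failed production deploy deliberately does not trigger an automatic rollback here, matching the advice to fix forward; whether a failure should also re-arm the manual checkpoint is a design choice this sketch leaves out.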

Resilient and Highly Available Production Infrastructure

With an established and running CI/CD pipeline, the next phase of the challenge begins with the design, implementation and upkeep of your software-as-a-service (SaaS) in a production environment.

These are the important principles that apply to this portion:

  • Security is a must. Security must be a forethought. Security must be built-in (more on this later).
  • Infrastructure-as-code is a must. Codify the entire infrastructure.
  • Orchestrate the provisioning, configuration, updates, and scaling of all of your cloud resources.
  • Use a cluster orchestration or resource scheduling technology to maximize resiliency and take advantage of self-healing features.
  • Log management is important. Aggregate all logs into a centralized place to allow for ease of debugging/troubleshooting activities.
  • Service and system monitoring is a must. Aggregate all metrics and resource monitoring data so you can scale the infrastructure and remediate problems when they occur.
  • Application performance monitoring is important.
  • Design the production stack to be highly resilient and available across multiple regions, countries, or continents.
    • Drill down into individual services or clusters and work your way up the stack, adding appropriate load balancers and/or hot/warm failover mechanisms.
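As an illustration of the hot/warm failover idea, the following Python sketch picks where traffic should go: across every healthy hot endpoint when any exist, otherwise to the highest-priority healthy warm standby, which then needs to be scaled up. The `Endpoint` type, the region names in the usage example, and the assumption that endpoints arrive in failover priority order are all hypothetical; a real setup would rely on the load balancer's and orchestrator's own health checks.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Endpoint:
    region: str    # hypothetical region name
    tier: str      # "hot": serving traffic; "warm": provisioned standby
    healthy: bool  # result of the latest health check


def pick_targets(endpoints: Sequence[Endpoint]) -> Tuple[List[Endpoint], bool]:
    """Return (endpoints to route traffic to, whether a warm standby is being promoted).

    Endpoints are assumed to be listed in failover priority order.
    """
    hot = [e for e in endpoints if e.tier == "hot" and e.healthy]
    if hot:
        # Load-balance across every healthy hot endpoint.
        return hot, False
    warm = [e for e in endpoints if e.tier == "warm" and e.healthy]
    if warm:
        # No hot capacity left: fail over to the highest-priority warm standby,
        # which the caller must scale up before it can take full load.
        return [warm[0]], True
    raise RuntimeError("no healthy endpoint in any region")


# Example: the primary hot region is down, so traffic shifts to the other hot region.
targets, promoting = pick_targets([
    Endpoint("us-east", "hot", False),
    Endpoint("eu-west", "hot", True),
    Endpoint("ap-south", "warm", True),
])
```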

Paradigm Shift

How do we push an application through a CI/CD pipeline and then effectively protect it in production? Traditionally, there has been an implied trust that whatever the engineering team builds can be pushed through the CI/CD pipeline and deployed to production. Security has generally operated at the edge of production, with well-defined network policies and commercial solutions such as a Web Application Firewall (WAF) and/or Runtime Application Self-Protection (RASP). Further left in the development process, code analysis tools scrutinize the code for known vulnerabilities and bugs.

Recently, there has also been a trend toward container-specific security solutions. Even with all the security best practices, processes, solutions, and tools available today, protecting every angle of an attack surface with so many unknown variables still takes a large amount of effort.

The art of security can be compared to Sun Tzu's famous book The Art of War. One of its most frequently quoted proverbs is “Know thyself, know thy enemy. A thousand battles, a thousand victories.” On the surface, its meaning is straightforward: if we understand ourselves as well as our enemies, victory is assured. I believe there is a much deeper meaning to it.

A large portion of the book is actually dedicated to the tactics of knowing your environment. In addition to knowing thyself and thy enemy, you must know the environment and how to leverage it for defensive and offensive advantage. To relate this to security, think of the environment as everything the application interacts with: third-party open-source software (OSS) libraries, databases, data services, or simply the input/output of data to a block storage medium.

Suppose our application's login functionality simply takes input from the user, looks up the user's information in the database, and sends an acknowledgement back to the server to proceed with logging the user in. We then have a well-defined, well-understood path. In the event of an attack, we know what our application is doing (know thyself), we know what the bad actor is not supposed to be doing (know thy enemy), and, most importantly, we know our inputs and outputs (know the environment). Because we understand our application well, we can better protect its environment and, in turn, know exactly what a bad actor can and cannot do.
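A toy Python sketch can make the "well-defined path" idea concrete: describe the login flow's expected interactions as a set of (resource, action) pairs and flag any observed interaction outside that set. This is not how ShiftLeft builds its security DNA profile; the profile contents, the event format, and the function `unexpected_events` are hypothetical and only illustrate why knowing the expected inputs and outputs makes deviations stand out.

```python
from typing import FrozenSet, Iterable, List, Tuple

Event = Tuple[str, str]  # hypothetical (resource, action) pair, e.g. ("users_db", "read")

# Hypothetical profile of the login path described above: read the user's input,
# look the user up in the database, acknowledge back to proceed with the login.
LOGIN_PROFILE: FrozenSet[Event] = frozenset({
    ("http_request", "read"),
    ("users_db", "read"),
    ("session_service", "write"),
})


def unexpected_events(observed: Iterable[Event], profile: FrozenSet[Event] = LOGIN_PROFILE) -> List[Event]:
    """Return observed interactions that fall outside the expected profile, in order."""
    return [event for event in observed if event not in profile]


# Example: a normal login produces no alerts, while writing to the filesystem or
# opening an outbound connection (leaving "through a window") is flagged.
normal = [("http_request", "read"), ("users_db", "read"), ("session_service", "write")]
suspicious = normal + [("filesystem", "write"), ("outbound_http", "write")]
```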

Embedding ShiftLeft in your CI/CD pipeline allows us to generate a security DNA profile of your application, enabling built-in security. Armed with this DNA profile, we can then protect your application at runtime in production. This is a true paradigm shift for DevOps: it plugs seamlessly into your existing CI/CD workflow and, as a result, helps harden your security posture in production.

Protect Your Castle

An analogy I often use to explain what ShiftLeft does is the modular house. Imagine a typical home with some bonus additions, such as a pool, a jacuzzi, a game room, and a home theatre. Now let's imagine the main house as your application, and the pool and other bonus rooms as third-party OSS library add-ons.

The floors, windows, and doors of these rooms symbolize your application's inputs and outputs, such as databases and storage. You know that a typical visitor enters through the front door, moves through various rooms, and then leaves through the front door. Now imagine that the entire property sits inside an opaque box whose walls extend from the front yard to the back yard and from side fence to side fence.

To be fair, since you built the main portion of the house, you are given security cameras covering the entrance area and the living and dining rooms. Your task is to secure this property with no visibility into the rest of the rooms. I believe this is exactly what most organizations' DevOps teams have been tasked with. How would you know if a bad actor had entered the house, stolen something important, and snuck out through the drain of the pool or a tunnel in the floor of one of the other rooms?

To make matters worse, imagine that the house is under constant renovation, with the rooms changing from day to day. This is akin to the rapid build-and-deploy model of CI/CD, which turns security into a moving target.

With ShiftLeft, it is as though the DevOps team has been given superhero-like powers such as x-ray vision and invisible sentinels. You would see a bad actor who entered through the front door but tried to leave through a window in another room. You would also be notified if that actor tried to dig a tunnel through the floor, because sentinels posted in all the important areas would alert you to any such attempt to subvert security.

This would allow us to establish a foundation of security for our application and continuously track it throughout its lifecycle. This is true continuous security for DevOps.