Gathering thoughts as I organize this topic, and explore it. Suggestions on where to learn more are welcome.
I’ll be simplifying things initially. I plan to start by focusing on the pipeline and tools used in the DevOps culture. I expect that the deeper I dive into this topic, the more precise the next posts will be.
For simplicity, let’s assume we are developing a cloud-based web application with a DevOps toolchain that includes tools like Chef, Puppet, Jenkins, Docker, Packer, AWS, New Relic, Splunk… How do you test a deployment pipeline built on top of these?
I have to start somewhere. I know this: you can approach testing software by dividing the problem into separate areas, researching them, and taking any necessary actions, including finding and resolving issues. The result should hopefully be a high-quality, or at least acceptable, product.
Let me try applying these areas to a DevOps toolchain and list the questions/topics that emerge.
- Is it working as expected? What does working as expected mean to you? To your stakeholders?
- Do you have unit tests? Integration? End-to-end? How many is enough?
- Do you need to do any manual testing after a pipeline step is executed?
Automation / Automatability / Testability
- Are you going to automate the testing? Why yes? Why not? How much?
- If yes – which tools will you use? Are they free? What alternatives do you have?
- Is the toolchain automation-friendly? Was it created with automation in mind?
- Is it testing-friendly in general? Do you have hooks / breakpoints to make it easy to test?
- Is there a certain User Experience your DevOps tools should deliver?
- Is the pipeline error-prone? Can somebody deploy a test build to production by mistake? Can they destroy your current production stack by clicking on a badly described button?
- Do you need to support keyboard shortcuts? Arrow keys / tabs to navigate?
- Does the UI support long/short inputs for build names, or components? High build numbers?
- Who is your customer? What acceptance do you need from them?
- Would you do A/B testing for your pipeline?
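The question above about deploying a test build to production by mistake can be turned into something checkable. Below is a minimal sketch, assuming a hypothetical pipeline step that decides whether a deployment may proceed. The `Build` type, the `"release"` / `"test"` labels, and the `deploy_decision` function are all made up for illustration and are not part of any of the tools mentioned.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Build:
    name: str
    kind: str  # hypothetical labels: "release" or "test"


def deploy_decision(build: Build, environment: str,
                    confirmed: bool = False) -> Tuple[bool, str]:
    """Return (allowed, reason) for a requested deployment.

    Assumed rules, purely illustrative:
    - only release builds may go to production
    - production deployments need an explicit confirmation
    - any other environment is unrestricted
    """
    if environment != "production":
        return True, "non-production environment"
    if build.kind != "release":
        return False, "only release builds may go to production"
    if not confirmed:
        return False, "production deployment needs explicit confirmation"
    return True, "confirmed release build"


if __name__ == "__main__":
    print(deploy_decision(Build("app-42", "test"), "production", confirmed=True))
    print(deploy_decision(Build("app-42", "release"), "production", confirmed=True))
```

Keeping the rule in a small pure function like this is one way to make it unit-testable without touching real infrastructure; whether your own toolchain allows such a hook is exactly the testability question raised earlier.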
Installation / Integration / System
- What do you need to integrate with? For example – would you file JIRA tickets automatically if something goes wrong?
- Do you need a database? Which version?
- What operating system will your toolchain run on? What OS will you support for developing it?
- When depending on a 3rd party, are you willing to rely on their uptime? What if a critical cloud-based tool goes down when you urgently need to deploy a hotfix?
- Will the 3rd party let you know of planned downtime? Is the downtime in a timezone suitable for you?
- Do you have a backup?
- What platforms should you be compatible with? AWS? OpenStack? Azure? Are you going to test all of them?
- If your pipeline is web-based – which browsers will you support? Can bad rendering in Safari cause an error? What about strict Firefox security? What if the users are running Chrome with a JS-blocking extension?
- Any potential compatibility issues between your tools? Should you test every new version with others?
- Do you have any dates or numbers showing up in the pipeline? 1.000 and 1,000 are not the same… same goes for 6/12/2016…
- Monday is not the first day of the week for everybody. Do you care?
- If you have user input – does it support non-ASCII characters? Does it have to?
- Any of your users need a localized UI?
- If some of your resources are outside your country – would you support them? What if part of the deployment needs a phone number, but it’s in an unfamiliar format from another country?
- Are you required to meet certain requirements like SOX or HIPAA? Can your DevOps toolchain and code assure at least part of the compliance?
- Any export regulations you might be violating with your DevOps code? What if a certain country requires that data be stored locally, but your tools deploy a server on a different continent?
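To make the date and number question from the list above concrete, here is a small sketch using only the Python standard library. It shows that 6/12/2016 has two valid readings and that 1.000 is either one or one thousand depending on which separator convention you assume. The function names `interpretations` and `parse_number` are made up for this example.

```python
from datetime import date, datetime
from typing import Set


def interpretations(text: str) -> Set[date]:
    """Return every calendar date the text could mean under
    month/day/year and day/month/year conventions."""
    results = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            results.add(datetime.strptime(text, fmt).date())
        except ValueError:
            # The text is not a valid date under this convention.
            pass
    return results


def parse_number(text: str, decimal_sep: str, group_sep: str) -> float:
    """Parse a number given explicit decimal and grouping separators."""
    normalized = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(normalized)


if __name__ == "__main__":
    # Two different dates come back for the same string.
    print(sorted(d.isoformat() for d in interpretations("6/12/2016")))
    # Same text, two conventions, two very different values.
    print(parse_number("1.000", decimal_sep=".", group_sep=","))
    print(parse_number("1.000", decimal_sep=",", group_sep="."))
```

If more than one interpretation comes back, the pipeline is guessing. One possible way out is to show and store dates in an unambiguous form such as ISO 8601 (`2016-06-12`), but whether that suits your users is part of the question.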
Stress / Load / Performance
- Can you deploy 10 servers simultaneously? What about 10000?
- How long does it take to deploy the infrastructure? Is 1 hour acceptable? What if 10 minutes is too long?
- Did anyone even define these requirements?
- Do you track any of the performance metrics?
- Do you take any user input? Can a malicious user infect other users? Steal their passwords? Admin password?
- Do you store sensitive data in your Jenkins jobs? Where do you store it securely?
- How will you prevent users from committing their AWS credentials to public repositories?
- Do you remove all access when terminating employees?
- Do you use access control? Do you audit user actions? Should you?
- Who is really implementing security? Can a single engineer misconfigure the firewall on all your production servers?
- Do you have enough logging to know why something went wrong? Do all 3rd party tools have enough logging?
- Where are your logs?
- Do you have alerts / notifications in place?
- What are the configuration options for your jobs?
- What documentation do you need? Do you have enough if somebody decides to leave abruptly or falls under a bus?
- Any public-facing documentation you want to / have to share?
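One of the questions above asks how to keep AWS credentials out of public repositories. As a rough sketch of one possible check, the snippet below scans text for strings shaped like long-term AWS access key IDs (the `AKIA` prefix followed by 16 uppercase letters or digits). The pattern is a heuristic, not a complete detector, and `find_suspect_lines` is a made-up helper; the sample key is the placeholder value AWS uses in its own documentation.

```python
import re
from typing import List, Tuple

# Heuristic: long-term access key IDs are 20 characters starting with "AKIA".
# This does not catch secret access keys or other credential types.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_suspect_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that appear to contain an access key ID."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if AWS_ACCESS_KEY_ID.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "region = us-east-1\n"
        "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    )
    for number, line in find_suspect_lines(sample):
        print(f"line {number}: {line}")
```

A check like this could, for example, run before a commit is accepted or as an early pipeline step. Where it belongs, and what happens when it fires, are again things you would need to decide and test.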
Adoption / Metrics and Instrumentation
- Any metrics you want to track?
- Do you need to add instrumentation to the jobs to know where the bottlenecks are?
Upgrade / Rollback
- How will you test new versions of the tools? Are you ready to roll them back? Will they work after rollback?
- What is your must-have vs nice-to-have? What tools depend on each other?
- Can you define phases of your DevOps toolchain deployment?
- Have you identified all the resources you need for testing?
- Environmental resources like hardware, and software that you need?
- What about licenses? Any legal review of these needed?
- Are you well staffed? Any training your engineers need?
- Documentation, artifacts… what else do you need to deliver?
Vendor / 3rd party
- When working with a vendor on your DevOps implementation: how much would you want them to test vs you? What is their testing strategy? How much testing overlap should happen? What do they need to deliver?
Definition of done
- When can you tell you are happy with the testing of the DevOps toolchain?
- Do you need to sign off? Who else signs off?
These are just some initial thoughts. What do you think of these? What’s missing?