An Agnostic Agile case in Infrastructure and Service Management

By Adrian Lander

The context was outsourcing, this time. While I was a “software product guy” by origin (in SW development since the age of 10, mid 70s – not the hobby computer stuff but real HP computers), my turnaround record led to me being put forward for challenging, miserable assignments from time to time. I was not always in a position to duck… In the end, it turned out to be an interesting application of lean agile in unusual territory.

CASE – A GLOBAL RESOURCES COMPANY

80+ critical business application servers. Moved from the client’s data center to the outsourcer’s data center as part of a very large global outsourcing deal. None of the servers had been accepted by the data center, as they did not meet data center standards. Yet they were in production, in full use by the client, who ran their business-critical processes on them. The outsourcing program was an IT transition and transformation, and the fact that none of the servers had been accepted was blocking new projects within a large program. So it was blocking business improvement and money coming in for both parties. Of course, there was more going on than getting 80+ servers, already in productive use for years, accepted by a data center.

Sounds simple to fix. Reality was totally different due to constraints. SLAs the client was paying for – but not end to end executed, as should be obvious. Due to the global 24×7 use of the systems, maintenance windows were short and rare and hard to obtain. Then for the specific data center, that also housed defense systems, security clearance was needed and any visitor needed to be announced in advance. All servers needed 5 types of upgrade, requiring 5 different specialist engineers.

The previous project manager had not made progress: whenever one of the engineers turned out not to be available, more than one window was needed. (He was getting replaced, or relieved from the misery.)

If any mistake was made, more than one window was needed. Also, if things took longer, windows would be exceeded, so the work had to stop or be undone. And things were missed as the engineers worked as individual specialists, only focused on their own task, not as a team. And what they found when they arrived at the server was often different than expected. The client was getting dissatisfied and impatient, also as the overall program was blocked, which was about building future capability.

Plans changed all the time, and the planning took more time than the actual work being done.

Apart from the server upgrades, required for the data center to accept the servers into officially supported production, also the support documentation had to be updated. The program had not been very successful with that. The data center had operational readiness reviews and if anything was missing or wrong – even just a contact number – the server had to wait for the next cycle and review board meeting, several weeks later. From the original 90 or so, only 10 had been upgraded, some still not accepted because of paperwork, or the other way around, and this had taken several months. The remaining 80+ would take years at that pace. The program was burning money, also as support needed to be done by the program, with a team inside the data center, as the data center had not accepted the servers. This is more expensive than having it done by the data center people, who were shared across clients.

The program teams were getting tired and frustrated with the data center. The data center people were getting tired and frustrated with the program. The data center was perceived as inflexible. The program was perceived as flimsy. It was the typical “conflict” between development people and support people. One party wants to innovate and change, the other wants to stabilize. Both are needed.

I was asked to take over. I had just completed the successful transformation of a global business intelligence development and support organization to agile, leaving a satisfied CIO behind (“Adrian brought back on-time delivery capability”), hoping to enjoy a few relaxed days… Instead, I had to take a plane for a new mission.

Can an agnostic agile coach add to an Infrastructure / Service Management context in the Data Center world?
(Hint: Oh, yes!)

THE APPROACH

I did a quick assessment of the core issues, just by listening to the people and their gripes. The first thing to do was to assess where we actually were, as different people had different data. So I had someone independent do a quick audit of the servers. Creating transparency, the same view for all. This created a backlog of “stories”. To support that, I created a sort of “Definition of Done” – when was a server Done. This involved hardware upgrades, software upgrades / installs, tests of course and documentation creation/update.
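As a minimal sketch, such a Definition of Done can be written down as a simple checklist per server. The class name, field names and server names below are hypothetical and only illustrate the four criteria mentioned above (hardware, software, tests, documentation); the project itself tracked this in a simple spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class ServerStory:
    """One backlog item: a server that must be accepted by the data center."""
    name: str
    hardware_upgraded: bool = False
    software_upgraded: bool = False
    tests_passed: bool = False
    documentation_updated: bool = False

    def is_done(self) -> bool:
        # A server is Done only when every criterion is met.
        return all((
            self.hardware_upgraded,
            self.software_upgraded,
            self.tests_passed,
            self.documentation_updated,
        ))


def remaining_backlog(stories):
    """Return the stories that do not yet meet the Definition of Done."""
    return [story for story in stories if not story.is_done()]


# Hypothetical example data: one finished server, one held up by paperwork.
backlog = [
    ServerStory("srv-01", True, True, True, True),
    ServerStory("srv-02", True, True, True, False),
]
open_names = [story.name for story in remaining_backlog(backlog)]
print(open_names)
```

The point of the sketch is that a server with one missing item, even just documentation, stays on the backlog, which mirrors how the readiness review treated it.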

3. To tailor agility to context – Agnostic Agile

I realized that needing up to 5 engineers per server really created dependencies and was one of the reasons maintenance windows had failed. It created 5 extra points of failure. Also, using different engineers every time would not lead to learning. The organization did not yet allow an engineer of group X to do the work of group Y. So, I negotiated a compromise with the managers of the various groups. We would have a small set of engineers become T-shaped, picking up the additional skills needed for the backlog. We would send a duo in, but each could in principle do all 5 tasks, with supervision. They would be remotely supervised. I would be on the call as well for decision making. Like a speed skating coach. Plus, tests would confirm that the work had been done correctly. So, the work was sort of certified remotely. We were able to form several duos. Also, if one turned out to be sick, the other could still do the work. High Availability. People needed to be put on a list in advance and checked for clearance, so we could not just replace someone last minute – which had been the problem. But we had to use the windows effectively, as getting a new one for the same server could take a long time.

4. To understand hindering constraints and work to remove them

Our maintenance windows became sprints, in which we got several stories – servers – done. By using the same people, they got better and better. Maintenance window planning became sprint planning. By learning and improving, the velocity actually went up, and we went faster and faster through the backlog. After each maintenance window we did a sort of short retrospective. We also inspected our Definition of Done. I did a regular review with the client, also selecting the strategy through the landscape, based on complexity, importance, resistance, confidence etc.

Of course, improving on the past, we also synchronized the readiness of the documentation, so that as soon as possible after the upgrade of a server, the documentation was also ready and the server and documentation were accepted in the data center board meeting. So we had release cycles, in which several servers and their documentation were accepted and went into official production. A release consisted of several sprints – maintenance windows. And a maintenance window consisted of several stories – servers. Stories had several tasks. We even did a product backlog burn down and could extrapolate a landing zone. Across everything, we were inspecting and adapting, not only on the work but also the process.
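To illustrate the burn down extrapolation, here is a minimal sketch in Python. It assumes the landing zone is estimated from the fastest and slowest of the most recent maintenance windows; the function name, the three-window lookback and the numbers are hypothetical and are not the project's actual figures or method.

```python
import math


def landing_zone(remaining, done_per_window, lookback=3):
    """Estimate (best case, worst case) number of windows left.

    remaining:       servers still on the backlog
    done_per_window: servers completed in each past maintenance window
    lookback:        how many recent windows to base the estimate on
    """
    recent = done_per_window[-lookback:]
    if not recent or min(recent) <= 0:
        raise ValueError("need recent windows with at least one server done")
    fastest, slowest = max(recent), min(recent)
    # Round up: a partly used window is still a window.
    return math.ceil(remaining / fastest), math.ceil(remaining / slowest)


# Hypothetical history: velocity rising as the duos learn.
best, worst = landing_zone(remaining=80, done_per_window=[2, 3, 4, 5])
print(f"between {best} and {worst} maintenance windows to go")
```

Because velocity was rising, recalculating after every window narrows and shifts the range, which is the inspect-and-adapt idea applied to the forecast itself.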

We were applying lean agile to data center infrastructure transition and transformation!

We used elements of lean, Scrum and Kanban. Lean as in lean software development, but applied to hardware. People before us had failed using a traditional, upfront-planned project management approach with ever-changing Microsoft Project plans that were “automatically optimized”. We used empiricism and team intelligence instead, and a simple spreadsheet.

SOME CLOSING THOUGHTS

I actually never used the words agile or lean throughout this project. Instead of selling the approach to the client, we won customer confidence and satisfaction back, by doing a bit, demonstrating success and demonstrating continuous improvement. The approach did not come from a textbook. It grew organically, principle by principle, from principles and practices I had applied elsewhere, to an adequate set to solve the problem. An agnostic agile approach. Embracing whatever works and inspecting and adapting on the process, not just the work, not constrained by a framework or philosophy / ideology. Not constrained by a fixed ruleset. For the engineers, I was more acting as a coach and a shield. My title, authority and how other people on the outside saw my role and responsibilities did not matter. We were peers with different experience and skillset, learning from each other.

I got along better and better with the data center manager, a mutual appreciation. I understood he needed stability in the data center. I needed product backlog progress, speed. I helped avoid instability, he helped me actually gain and maintain speed. It is about respecting and understanding and learning from each other’s work and world, and collaboration. In that collaboration, we actually improved processes.

Also, when it concerns infrastructure and support, the thought is often that agile is not a suitable choice. Even if the context is built up from simple elements, dependencies and available skills, together with the perception and lack of confidence that have grown, can create the complexity that benefits from a lean agile approach. I have experienced that many times. If your project plan changes many times, you may want to consider agile.

Adrian Lander
Lean Agile Transformation / Executive Coach
Co-founder Agnostic Agile
https://www.linkedin.com/in/adrianlander/
