Category Archives: DevOps

to whom it may concern

I’ve needed to write this for a long time.

I’ve struggled with how to frame everything and what tone would be best.

I would like to believe this could help OpenStack, but I have my doubts that anyone who can do anything will, or they already would have. I would like to think reading this might help someone else, but it might end up just being for me, and I’m ok with that.

For context, I was around when OpenStack was announced at OSCON in 2010 as the VP of Engineering at Cloudscaling. I have experience deploying and operating OpenStack and CloudStack implementations of some significance. I also evaluated Open Nebula, Eucalyptus and Nimbula along the way, as I really wanted a good solution to the cloud problem. I spent time consulting on OpenStack projects and last focused on OpenStack when I was hired by Jesse Andrews at Rackspace, before my whole team left to go to Nebula.

I’ve since moved on to focus on other things, but still watch OpenStack as I have a literal vested interest in the success of the project as well as many friends I care about in the community.

OpenStack was born into a vacuum created by the acceleration of AWS adoption and features, plus missteps by Eucalyptus and CloudStack with respect to the value of Open Source communities and how to cultivate them as a vendor. When OpenStack was first announced, I felt there was so much potential. I certainly made an effort to evangelize the project. Now, while many will declare victory, I’m afraid most of that potential will not be realized or worse, that OpenStack will leave a wake of bad projects that people unwittingly mistake for operable software solutions.

OpenStack has some success stories, but dead projects tell no tales. I have seen no less than 100 Million USD spent on bad OpenStack implementations that will return little or have net negative value.

Some of that has to be put on the ignorance and arrogance of some of the organizations spending that money, but OpenStack’s core competency, above all else, has been marketing and if not culpable, OpenStack has at least been complicit.

My compulsion to record this for posterity was triggered by details in the last release and related chatter around other announcements.

I would like to highlight the ‘graduation’ of Ceilometer. Ceilometer is a tragedy masquerading as a farce. In my opinion, this project should not exist and as it exists should not be relied upon for anything, much less billing customers.

First, the idea that monitoring/metering is something that should be bolted on the side of a cloud is almost as nonsensical as adding on reliability and scalability. Experience operating a web service that has been developed with instrumentation will quickly disavow anyone but a masochist of the bolted on approach. Second, Ceilometer’s implementation is such a mishmash of naive ideas and pipe dreams without regard for corner cases and failure scenarios, that Ceilometer’s association with OpenStack should be seen as a negative and graduation of the project calls into question the literal foundation of OpenStack decision making. Ceilometer’s quality is bad, even by OpenStack standards.

When I was still inclined to take actions about OpenStack related things, when Ceilometer was just a name someone proposed, I made all these same arguments but with more detail, specifics and depth. What I came to understand is that solving metering was not the primary motive. No one really cared about that. Certainly not in the way one would approach a project they intended to deploy and operate. The primary motivation was to have a project so that someone could be a Project Technical Lead (PTL).

That precedent started at the beginning with the creation of Glance, a project that never should have existed, and the subsequent creation of PTLs. The dynamics of the perceived prestige of a PTL superseded other considerations.

This dynamic allowed for a splintering of vision and mandate. If there has been any one thing that I have seen waste time implementing and operating OpenStack, that would have to be trying to coax the disparate OpenStack services, often from the same ‘release’, to work together. Press releases trumpet a new release of OpenStack as if this was working software and the naive ‘Enterprise buyer’ would rush headlong into that assumption hopped up on adrenaline and hubris. I accept Conway’s Law as a truism. Organizations will build software that reflects the communication of the organization. If the people implementing Keystone, don’t respect or understand the people implementing Nova, well, at least there is a PTL…

Don’t get me started on networking or storage…

I used to be hopeful, evangelistic even, about the possibility of a cloud service provider ecosystem built on open source. Now I am quite skeptical and feel that opportunity may be lost. Not that OpenStack doesn’t work, or at least that it can’t be made to, given certain competence and constraints, but that OpenStack doesn’t have the coherence or the will to do more than compromise itself for politics and vanity metrics.

People contribute to projects for a variety of reasons. OpenStack releases like to highlight the number of committers. That’s not a bad thing, but it’s not necessarily a good thing either. How many engineers do you think are working on AWS? GCE? How many of those committers will be the ones responsible for the performance and failure characteristics of their code? How many of those committers are dedicated to producing a world class service bent on dominating an industry? There has been some interesting, and even impressive work dedicated to improve code reviews and continuous integration, but that should not be confused with a unified vision and purpose. The per project difference in emphasis and the focus on nuances of stylistic python over other considerations of quality, let alone architecture and failure conditions determine OpenStack’s present and future. OpenStack wants to differentiate with features going up the stack, but has still not solved the foundational infrastructure and is busy furiously discussing what should be considered ‘core’. Projects that prove to be unreliable and a poor experience for operators and users ultimately damage the OpenStack ‘brand’.

OpenStack loves to declare its own victory. OpenStack’s biggest success has been getting vendors excited about abdicating their cloud strategy in lieu of going it alone. As long as OpenStack succeeds at that, there will be no shortage of funds for a foundation, summits and parties. Ultimately, if that is OpenStack’s primary accomplishment, Amazon (and perhaps others) will run away with the user driven adoption as service providers until there is nothing left of OpenStack but bespoke clouds and mailing list dramas.

I’m not sure anyone wants my advice at this point, but here it is.

  • Focus on users, not vendors. If there can’t be a benevolent dictator, there has to be an overriding conscience. The project setup solves vendor political problems more than user software problems.
  • PTLs end up being trophies. Electing PTLs every cycle is distracting and impacts continuity. Everything about this hurts OpenStack. Don’t confuse the way things are being done with the best way to do them.
  • Define a core and stick to it, or acknowledge that your governance is broken and make OpenStack an explicit free for all.
  • Stop declaring victory with vanity metrics. Divide OpenStack google trends number by the number of marketing dollars spent. The total number of committers is less meaningful if OpenStack is an excuse to sponsor parties around a collection of disparate projects.
  • Make ‘OpenStack’ brand mean something to users. Stewardship is greater than governance. If OpenStack is just a trough for the old guard IT vendors to feed slop to their existing enterprise buyers without regard to the technology or user experience then OpenStack has completely lost the plot. Make ‘OpenStack’ mean something and defend that.

I’ve certainly exceeded my 0.02 limit. Do what you will.

To all my friends at the OpenStack Summit, enjoy what Hong Kong has to offer… 幹杯.

Meatcloud Manifesto – The gauntlet is thrown…

No good deed goes unpunished...

This is the first year I won’t attend both Velocity Conf and Structure.

I would have gone to Structure if it didn’t overlap with Velocity, but given the choice, I’ll go with the people that build things.

Both conferences have an overlapping theme of the infrastructure renaissance.

So in the spirit of friendly competition, @ShanleyKane and I are going to see who can get the most people to sign the ‘Meatcloud Manifesto’, take a picture with it, and post it to flickr (or the photo sharing site of their choosing), and tweet a link with the tags of the conference you are at (#structure10 or #velocityconf) and #meatcloud just for good measure.

It’s should look something like this:

We hold these truths to be self-evident

Or this:

So if you are a builder of things, and you love some APIs, show your support for Velocity Conf and sign the manifesto.

You are either with us or against us.

Meatcloud manifest destiny, for real this time…

New Beginning: Cloudscaling

Cloud Rising

Every new beginning comes from some other beginning's end. --Seneca

Some of you already know this, but last week I joined the Cloudscaling team fulltime.

Cloudscaling helps organizations transition towards the application centric operations models that analysts/the blogosphere/random people can’t seem to define well (or quit arguing about) but refer to in generalizations as ‘cloud computing’.

We have a few interesting developments coming and a couple big projects we can’t quite speak freely about yet, but we provide strategic consulting and implementation assistance, especially for large organizations looking to invest in internal IaaS resources or to differentiate themselves as public IaaS providers.

So far, I’ve been getting up to speed on our projects and the tools, in addition to learning some things that I’ve typically been somewhat removed from, like layer 2 networks and other details most developers (and even many sysadmins) take for granted in their day to day.

The bottom line is Cloudscaling is working on pushing the boundaries of ‘Infrastructure is Code’. We can agnostically evaluate and implement solutions using the best tools and track the evolution of the space. We have a team with both breadth and depth up and down the technology, from the datacenter to virtualization, from hardware to APIs.

I’m really excited to be part of the team (although there’s some great people not on that page, like Lew Tucker ex-Sun Cloud CTO who just joined our board of advisors) and I’m expecting big things and a great year from us.

Look for some systems management and cloud related thoughts from me on the cloudscaling blog

Perhaps DevOps Misnamed?

There are only two mistakes one can make along the road to truth; not going all the way, and not starting.


Andi Mann posted ‘Myopic DevOps Misses the Mark‘ earlier today and after reading it, I wanted to put my thoughts out there, particularly since I had hoped some of what I consider his misconceptions would have been cleared up before this post.

To be fair, Andi does ask some good questions and has clearly spent his share of time thinking about ops in general, so hopefully I can make some attempt to address them as well.

To start with, Andi asserts that DevOps is mostly about developers. I’m not entirely certain what makes him think that, but it is patently false and the majority of people involved are heavily from an operations background. That said, I do believe semantics matter, and it might just be the name itself that leads people to that conclusion.

Maybe NeoOps, or KickassOps would have been better… but it is probably too late for that now.

I may be mistaken, but I believe the credit for the term DevOps belongs to Patrick Debois when he organized the first DevOpsDays last year.

Patrick is a bit of a Renaissance man, playing many roles in the process of software delivery along the way. I’m not particularly a fan of labeling people, but Patrick has self identified himself as a sysadmin on more than one occasion. I’m also not particularly a fan of certification, but Patrick’s CV lists certifications like ITIL and SANS, that I’d wager are almost exclusively taken by people in Ops/admin roles. The glaring exception is SCRUM, and I know for a fact Patrick has fought tooth and nail to get the Agile community to recognize the role of systems administrators in the process of delivering value.

Of anyone involved in what has apparently transitioned from ‘a bunch of good ideas’ to ‘a movement’, I probably have the most dev centric background.

  • Patrick Debois – IT Renaissance Man
  • John Allspaw – WebOps Master
  • James Turnbull – Author of Pro Linux System Administration, Hardening Linux, Pulling Strings with Puppet, and he apparently has a day job doing security.
  • Luke Kanies – Recovering sysadmin
  • Adam Jacob – Still calls himself a sysadmin
  • Kris Buytaert – Another Belgian Renaissance Man and a system administrator
  • I’m sure I’m missing lots of people, sorry, maybe we need a poll

Andi keeps saying DevOps is developer centric, and I think the problem (besides maybe the name) is the fact that there is code involved in automation that isn’t a shellscript. Of course, I’m only speculating because he doesn’t actually articulate what makes him think this, but let’s move on to his questions.

Andi makes assertions about lack of control, process, compliance and security. This is ludicrous, bordering on negligent. I’ve seen Puppet deployments on 1000s of machines in what can only be classified as ‘the enterprise’ and I will guarantee those machines are more tightly controlled, compliant and secured than 99% of the machines in most organizations claiming to embrace ITIL. A solid Puppet installation is closer to a functional CMDB than anything I’ve seen in the wild with the advantage that it is both auditing and enforcing the configuration on an ongoing basis. DevOps automation and ITIL are not mutually exclusive and can coexist. (I’m not going to really get into what I think about most of ITIL… but this should help.)

More Specific Questions (most of which are predicated on the misconception that ops somehow goes away, but there are some other bits worth addressing):

Who handles ongoing support, especially software update for the unrestrained sprawl of non-standard systems and components?

Ops. Unrestrained sprawl of non-standard systems is a bad assumption. First of all, the slow moving ITIL loving enterprise tends to have as much or more problems with heterogeneous systems as anyone, second of all, when you start to model and automate systems it makes the problem of the heterogeneity both more apparent and more manageable. No one I know advocates anything but pushing towards simple homogeneous systems whenever possible. No one is pretending support and software updates go away.

Who ensures each new application doesn’t interfere with existing and especially legacy systems (and networks, storage, etc.)?

Ops of course, but with the added benefit of an automated  infrastructure with semantics relevant to the questions being answered.

Who handles integration with common production systems that cannot be encapsulated in a VM, like storage arrays (NAS, SAN), networking fabrics, facilities, etc.

Yep, Ops. VMs are nice because they are typically only an API call away, but there are tools for doing API driven provisioning on bare metal and they will only get better, but… VMs are just the bottom of abstraction mountain. The API driven abstractions of storage and networking fabric are coming. That isn’t the reality today, but it will happen, and relatively soon.

Who handles impact analysis, change control and rollback planning to ensure deployment risk is understood and mitigated?

This is a good one, because frankly I don’t think Ops can do this in isolation anyway. This is a cross cutting concern involving Ops, Dev, Product Management and the other business stakeholders, but change control and rollback are orders of magnitude easier to reason about and accomplish with DevOps approach.

Who is responsible for cost containment and asset rationalization, when devops keeps rolling out new systems and applications?

Similar to the last question, but with the added misconception that DevOps means rolling out random stuff just cause. I know I’ve personally made this point explicitly, but the whole point is to enable a business, and cost containment and asset rationalization are obviously cross cutting concerns of that business.

Who ensures reporting, compliance, data updates, log maintenance, Db administration, etc. are built into the applications, and integrated with standard management tools?

Ops doesn’t really do this now. What is the definition of ‘ensure’? Ask nicely? Write up documents? Beg? Get mad? At worst, attempts to do this are often at the root of ‘the wall of confusion’ between Ops and Dev. Again, I’m not sure where Andi got the idea DevOps = ‘cowboys without any concern for anything but deploying stuff as fast as they can’. What are the ‘standard management’ tools? As much as anything, maybe that is what DevOps is replacing, because most of them are embarrassingly poor. The best way to accomplish everything on this list is to expose sensible internal APIs. When we can get to the point that we have reasonable conventions, integration with the next generation of ‘standard management tools’ will be trivial. That might strike you as a dev centric perspective, but really it just means that the present is isn’t evenly distributed.

Who will assure functional isolation, role-based access controls, change auditing, event management, and configuration control to secure applications, data, and compliance?

DevOps for the win, with the help of tools that can actually model, audit and enforce all those things programmatically.

I’m sure Andi means well, but I’m not clear why he got the impressions he did of what DevOps means or is trying to accomplish. I did the best I could. (Twitter ‘lives in the now’ so that link will probably only be useful for a few days.) I guess if you use the word ‘API’ people won’t process anything further because you are obviously a cowboy developer. C’est la vie…

Finally, Andi finishes with a list of things he would like to see. The irony here is everything on his list is DevOps:

Including ops during the design process, so applications are built to work with standard ops tools.


Taking ops input on deployment, so applications will go in cleanly without disrupting other users


Working with ops on capacity and scalability requirements, so they can keep supporting it when it grows


Implementing ops’ critical needs for logging, isolation, identity management, configuration needs, and secure interfaces so the app can be secure and compliant


Giving ops some advance insight into applications, especially during test and QA, so they can start to prepare for them before they come over the wall

Tear down the wall! DevOps!

Allowing ops to contribute to better application design, deployment, and management; that ops can do more for the release cycle and ongoing management than just ‘manipulating APIs

Allow ops to contribute to better application design, deployment, and management, in addition to manipulating APIs! DevOps!

See, there is hope for Andi yet! (I just hope he has a good sense of humor about the title… and would be willing to discuss this over a nice meal if he comes through Salt Lake or we end up in the same city soon.)

%d bloggers like this: