The Exchanges: Open Source Collaboration Among States and Federal Government?

A slide from one of the Early Innovator Learning Collaborative webinars, about sharing for reuse

The last post looked at what may have accounted for the architectural instability of the healthcare.gov infrastructure and tried to learn from it. Recently HHS provided a brief overview of the problems and their efforts to resolve them, but most questions will remain unanswered unless a full postmortem is eventually made available. One issue that has received more notice lately is the role of open source in building out the exchanges. This originally got attention with Alex Howard’s piece in the Atlantic last June, which emphasized the open source approach Development Seed had taken to develop the frontend. With that context, many thought of the whole project as open source and were dismayed to discover otherwise. More recently there have been renewed questions about the open source nature of the project because of the removal of the Github repository that housed the code originally created by Development Seed.

It turns out there was a lot of open source thinking from the earliest days of building the exchanges. In fact, I was part of a team asked to help ensure the infrastructure was developed following open source best practices.

Idealistic Youth

In 2010 I helped co-found a project called Civic Commons as a partnership between OpenPlans and Code for America. Civic Commons was meant to provide human capacity and technical resources to help governments collaborate on technology, particularly through the release and reuse of open source software. Civic Commons itself was a collaboration among several non-profits, but initially it was also a close partnership with government. The project was supported by the District of Columbia, with then-CTO Bryan Sivak, as one of the city projects in Code for America’s inaugural year. Unfortunately DC’s involvement didn’t survive the District’s transition to a new administration, but Civic Commons continued on with the Code for America fellows (Jeremy Canfield, Michelle Koeth, Michael Bernstein) and both organizations working together.

In early 2011 the Code for America fellows met with then-US CTO Aneesh Chopra, who asked them for assistance on an exciting new project.

“I need your help,” he began before sharing some recent news. He said it had just been announced earlier in the week that, “Seven states and one multi-state consortia will receive in aggregate $250M to build out insurance exchanges – they are required to align to principles, feed into verification hub, and engage in health philanthropy community.” Then he explained how we fit in, “Here is the specific ask, because I am big on open collaboration, I made a requirement that each awardee would have to join an open collaborative, a civic commons, have to join a commons for the reusability and sharing of all IP assets.” Aneesh went on to explain that it didn’t seem appropriate for the federal government to play this kind of role among the states and that he was really looking for an independent entity to help out. He said he was, “Looking for a commons that can act as the convening and tech support arm to the seven awardees – before the states go off and hire [big government contractor] who will set things up to minimize sharing, we want someone to set the rules of the road.”

At the time Civic Commons was just getting started, and even though the prospect of a large and important project like this was very attractive, it also seemed like it would consume all of our resources. After discussing concerns about our capacity and the uncertainty of available funding, we decided against being involved. Much of the early work with Civic Commons was focused on more manageable projects among cities, but we did also include open source work at the federal level like the IT Dashboard. I do have a slight sense of regret that we couldn’t be more involved with open sourcing the exchanges, but it seems better that we learned what we learned from smaller experiments.

Lessons Learned

Karl Fogel was the primary shepherd of the open source process with governments at Civic Commons, and one of his most notable blog posts detailed how difficult, even futile, it is to try to do open source as an afterthought. If you’re not doing open source from the beginning then you’re probably not doing open source. Without the kind of organizational steward Aneesh was looking for, I fear those states might never have truly engaged in open source development as originally intended.

The other difficult lesson we learned through experiments getting cities and other governments excited about open source is that there tends to be much more motivation to release than to reuse. Some of this seemed to be driven by a PR-minded sense that giving away your hard work looks good, but reusing others’ work looks lazy. Perhaps we need to do more to praise governments when they are smart enough not to reinvent the wheel or pay for it over and over again.

The most encouraging lesson I learned during our time with Civic Commons is that there are effective models for open source collaboration that involve very little direct coordination. The main model I’m referring to is one based around common standards and modular components. At OpenPlans we saw this with the success of open source projects based on the GTFS transit data standard, like OpenTripPlanner, and ones based on standardized protocols for real-time bus tracking, like OneBusAway. I’ve also watched this closely with the open source ecosystem that has developed around the Open311 standard, with open source CRMs and separate client apps, such as Android and iOS mobile apps, that can be swapped interchangeably. The full stack of open source tools from the City of Bloomington and the work Code for America has done with cities like Chicago are great models that demonstrate the opportunities for software reuse when governments have asynchronously agreed on shared requirements by implementing a common standard. The apps developed by Bloomington are now even being used by cities in other countries.
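To make that standards-based model concrete, here’s a rough sketch (following the public GeoReport v2 conventions that Open311 defines) of how a client can talk to any conforming city endpoint interchangeably. The endpoint URL is a placeholder, not a real city’s, and this is illustrative rather than a complete client.

```python
import requests

# Hypothetical Open311 GeoReport v2 endpoint; any conforming city works the same way.
OPEN311_ENDPOINT = "https://example-city.gov/open311/v2"

def list_services(endpoint=OPEN311_ENDPOINT):
    """Fetch the service types (pothole, graffiti, etc.) this city exposes."""
    resp = requests.get(endpoint + "/services.json")
    resp.raise_for_status()
    return resp.json()

def recent_requests(endpoint=OPEN311_ENDPOINT):
    """Fetch recently filed service requests from the same standard resource."""
    resp = requests.get(endpoint + "/requests.json")
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for service in list_services():
        print(service["service_code"], service["service_name"])
```

Because every conforming endpoint exposes the same resources, the same client code, and the same mobile apps, can be pointed at Bloomington, Chicago, or a city in another country without modification.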

The IT infrastructure for the exchanges was clearly based around common data standards, so you would hope the same opportunity would exist there.

An Open Exchange

The effort Aneesh had referred to did still continue without us and only now have I started to learn about it in detail. The $250 million he had described was more precisely $241 million in federal funding through the Early Innovator grants. These grants were awarded to Kansas, Maryland, Oklahoma, Oregon, New York, Wisconsin, and the University of Massachusetts Medical School representing the New England States Collaborative for Insurance Exchange Systems – a consortium among Connecticut, Maine, Massachusetts, Rhode Island, and Vermont. The grant was in fact as lofty as Aneesh had described to the fellows. The “Funding Opportunity Description” section of the grant states:

The Exchange IT system components (e.g., software, data models, etc.) developed by the awardees under this Cooperative Agreement will be made available to any State (including the District of Columbia) or eligible territory for incorporation into its Exchange. States that are not awarded a Cooperative Agreement through this FOA can also reap early benefits from this process by reusing valuable intellectual property (IP) and other assets capable of lowering Exchange implementation costs with those States awarded a Cooperative Agreement. Specifically, States can share approaches, system components, and other elements to achieve the goal of leveraging the models produced by Early Innovators.

The expected benefits of the Cooperative Agreements would include:

  1. Lower acquisition costs through volume purchasing agreements.
  2. Lower costs through partially shared or leveraged implementations. Organizations will be able to reuse the appropriate residuals and knowledge base from previous implementations.
  3. Improved implementation schedules, increased quality and reduced risks through reuse, peer collaboration and leveraging “lessons learned” across organizational boundaries.
  4. Lower support costs through shared services and reusable non-proprietary add-ons such as standards-based interfaces, management dashboards, and the like.
  5. Improved capacity for program evaluation using data generated by Exchange IT systems.

The grant wasn’t totally firm about open source, but it was still pretty clear. The section titled “Office of Consumer Information and Insurance Oversight: Intellectual Property” included the following:

The system design and software would be developed in a manner very similar to an open source model.

State grantees under this cooperative agreement shall not enter in to any contracts supporting the Exchange systems where Federal grant funds are used for the acquisition or purchase of software licenses and ownership of the licenses are not held or retained by either the State or Federal government.

It’s not totally clear what came of this. The last evidence I’ve seen of the work that came out of these grants is a PowerPoint deck from August 2012. The following month the National Academy of Social Insurance published a report that provided some analysis of the effort. The part about code reuse (referred to as Tier 2) is not encouraging.

Tier 2: Sharing IT code, libraries, COTS software configurations, and packages of technical components that require the recipient to integrate and update them for their state specific needs.

Tier 2 reusability has been less common, although a number of states are discussing and exploring the reuse of code and other technical deliverables. One of the Tier 2 areas likely to be reused most involves states using similar COTS products for their efforts. COTS solutions, by their very nature, have the potential to be reused by multiple states. Software vendors will generally update and improve their products as they get implemented and as new or updated federal guidance becomes available. For instance, our interviews indicate that three of the states using the same COTS product for their portal have been meeting to discuss their development efforts with this product. Another option, given that both CMS and vendors are still developing MAGI business rules, is that states could potentially reuse these rules to reduce costs and time. CMS has estimated that costs and development could be reduced by up to 85 percent when states reuse business rules when compared to custom development.

When you’re simply talking about using the same piece of commercial software among multiple parties, you’re far from realizing the opportunity of open source. That said, the work developed by these states was really meant to be the foundation for reuse by others, so perhaps that was just the beginning. We do in fact have good precedent for recent open source efforts in the healthcare space. Just take a look at CONNECT, OSEHRA, or Blue Button.

Maybe there actually has been real re-use among the state exchanges, but so far I haven’t been able to find much evidence of that or any signs of open source code in public as was originally intended. Then again, there’s still time for some states to open up their own exchanges, so maybe we’ll see that over time. Right now the attention isn’t on the states so much anyway, but it turns out the Federally Facilitated Marketplace was supposed to be open source as well.

I’m not just talking about the open source work by Development Seed that recently disappeared from the Centers for Medicare & Medicaid Services’ Github page; I’m also talking about the so-called “Data Hub” that has been a critical component of the infrastructure, both for the federal exchange and for the states. The contractor that developed the Data Hub explains it like this:

Simply put, the Data Services Hub will transfer data. It will facilitate the process of verifying applicant information data, which is used by health insurance marketplaces to determine eligibility for qualified health plans and insurance programs, as well as for Medicaid and CHIP. The Hub’s function will be to route queries and responses between a given marketplace and various data sources. The Data Services Hub itself will not determine consumer eligibility, nor will it determine which health plans are available in the marketplaces.
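Purely to illustrate the architecture being described, here is a minimal sketch of what “routing queries and responses between a given marketplace and various data sources” might look like. The data source names, function names, and response fields below are all hypothetical and are not drawn from the actual Data Services Hub.

```python
# Hypothetical sketch of a hub that routes verification queries to data sources.
# None of these names come from the real Data Services Hub; they are illustrative only.

def verify_with_ssa(applicant):
    """Pretend call to a Social Security data source."""
    return {"ssn_matches": True}

def verify_with_irs(applicant):
    """Pretend call to an income data source."""
    return {"income_verified": True}

DATA_SOURCES = {
    "identity": verify_with_ssa,
    "income": verify_with_irs,
}

def route_query(query_type, applicant):
    """Route a marketplace query to the appropriate data source and return the
    response unchanged; the hub itself makes no eligibility determination."""
    handler = DATA_SOURCES.get(query_type)
    if handler is None:
        raise ValueError("Unknown query type: %s" % query_type)
    return handler(applicant)

# A marketplace (federal or state) would call something like:
# route_query("identity", {"name": "Jane Doe", "ssn": "..."})
```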

So the Data Hub isn’t everything, but it’s clearly one of the most critical pieces of the system. As the piece of integration that ties everything together, you might even call it the linchpin. What appears to be the original RFP for this critical piece of infrastructure makes it pretty clear that it was meant to be open source:

3.5.1 Other Assumptions
The Affordable Care Act requires the Federal government to provide technical support to States with Exchange grants. To the extent that tasks included in this scope of work could support State grantees in the development of Exchanges under these grants, the Contractor shall assume that data provided by the Federal government or developed in response to this scope of work and their deliverables and other assets associated with this scope of work will be shared in the open collaborative that is under way between States, CMS and other Federal agencies. This open collaborative is described in IT guidance 1.0. See http://www.cms.gov/Medicaid-Information-Technology-MIT/Downloads/exchangemedicaiditguidance.pdf.

This collaboration occurs between State agencies, CMS and other Federal agencies to ensure effective and efficient data and information sharing between state health coverage programs and sources of authoritative data for such elements as income, citizenship, and immigration status, and to support the effective and efficient operation of Exchanges. Under this collaboration, CMS communicates and provides access to certain IT and business service capabilities or components developed and maintained at the Federal level as they become available, recognizing that they may be modified as new information and policy are developed. CMS expects that in this collaborative atmosphere, the solutions will emerge from the efforts of Contractors, business partners and government projects funded at both the State and federal levels. Because of demanding timelines for development, testing, deployment, and operation of IT systems and business services for the Exchanges and Medicaid agencies, CMS uses this collaboration to support and identify promising solutions early in their life cycle. Through this approach CMS is also trying to ensure that State development approaches are sufficiently flexible to integrate new IT and business services components as they become available.

The Contractor’s IT code, data and other information developed under this scope of work shall be open source, and made publicly available as directed and approved by the COTR.

The development of products and the provision of services provided under this scope of work as directed by the COTR are funded by the Federal government. State Exchanges must be self-funded following 2014. Products and services provided to a State by the Contractor under contract with a State will not be funded by the Federal government.

So far I haven’t been able to find any public code that looks like it would be the open source release of the Data Hub, but I remain optimistic.

Open source software is not a silver bullet, but it does a lot to encourage higher quality code, and when coordinated properly it can do a great deal to maximize the use of tax dollars. Beyond the transparency of the code itself, it also helps provide transparency in the procurement process, because it makes it easier to audit claims about what a company’s software is capable of and what software a company has already produced. It also tends to select for software engineers with a culture and acumen genuinely driven to work with others to push the capability and impact of technology forward.

I hope we can stay committed to our obligations to maximize the tax dollars and ingenuity of the American people and stay true to the original vision of the exchanges as open source infrastructure for the 21st century.

Learning from the Healthcare.gov Infrastructure

Screenshot of Healthcare.gov account creation error

Over the past few days I’ve been paying attention to the problems that have troubled the launch of healthcare.gov. What I’ve compiled here is an outsider’s perspective, and my technical analysis should be treated as educated speculation rather than insider knowledge or anything authoritative. All other commentary here should be treated as my own personal perspective as well. I state these disclaimers because I’ve had no direct involvement with this project and the only information I have to work with is what’s available to the public, so some of my claims might be inaccurate. It’s also worth noting that I do know some of the people involved with the project and I’ve worked with some of the companies that were contractors, but I’ve gone about trying to understand this in an independent and unbiased way. Real accountability may be warranted for some of the problems we’ve seen so far, but I’m less interested in placing blame and more interested in simply learning what happened to help ensure it doesn’t happen again.

Context

To be clear, the problems I’m referring to are specifically the errors and unresponsiveness encountered when creating a new account for the Federally Facilitated Marketplace hosted on healthcare.gov. The healthcare.gov website is also used to provide related information and to redirect people to State Based Marketplaces hosted by their own states where they exist, but those aren’t the things I’m talking about here.

Unfortunately, many of the more political perspectives around the problems with this website have been illogical and much of the reporting in the news has either appeared to be inaccurate or so vague as to be meaningless. For example, some have claimed that problems with the website indicate that the Affordable Care Act is a bad idea and won’t work, but that’s a radical distortion of logic. This claim is like saying that a problem with an automatic sliding door or a broken cash register at a grocery store indicates that the grocery store (and better access to food) is a bad idea and won’t work. Others have claimed the high demand that caused glitches was unexpected, yet at the same time claim that glitches with the launch of a new product should be expected. Many of the news reports about the problems attempt to provide technical analysis, but mostly fail in identifying anything relevant or specific enough to be accurate or informative.

I do agree that the problems revealed themselves as a result of high demand. Exceeding capacity is a good problem to have, but it’s still a problem, and it’s a problem that, left unchecked, could erode the kind of popularity that overwhelmed the system to begin with. I also think this is a problem that can be prevented and should never happen again. It’s true that Americans still have many more months of open enrollment, but first impressions really do matter, especially with something as sensitive as a new health care program.

It would be wonderful if an official postmortem were published to help us understand what happened with the launch and to prevent similar situations in the future. Since mine is an outsider’s perspective, my analysis shouldn’t be considered anything like that, but it is worth noting that the worst problems with the website are likely already behind us. As Alex Howard reports, there are indications that improvements in the past few days have made an impact, but things still look like they could be going more smoothly. A test I conducted today showed basically no wait times whatsoever, but I was still unable to create a new account, receiving the error displayed above instead.

Until the problems are fully resolved and until anything resembling a postmortem exists, there will be demand for more answers and better reporting on what has happened. My motivation for writing this is partly that “unexpected demand” and “inevitable glitches” haven’t been satisfying answers, but I’ve also been unsatisfied with the reporting. The best analysis I’ve seen so far has been by Paul Smith (also syndicated on Talking Points Memo) and Tom Lee (also syndicated on the Washington Post Wonkblog). Part of the reason Paul and Tom’s writing is good is that it actually attempts to distinguish the different components of the healthcare.gov infrastructure and explain the architectural significance of decoupling components in an asynchronous way. Both of these pieces also point out that the frontend of the website, a Jekyll based system, was not the problem, despite the many attempts at technical analysis in major publications that have tried to place fault there without looking further. Yet while Paul and Tom definitely seemed to get the broad strokes right, I wanted more detail.

After reading Paul’s piece I started a thread among the current and past Presidential Innovation Fellows to see if anyone knew more about what was going on. Basically none of us had direct knowledge of the technical underpinnings of the system, but being furloughed and eager to fix problems turned this into one of the most active discussions I’ve seen among the fellows. I also saw similar discussions arise among the Code for America fellows. Over the course of a day or so we shared our insights and speculation and some reported on their findings. Kin Lane described his concerns about the openness and transparency of the project, especially the conflation of the open source frontend and the blackbox backend. Clay Johnson wrote about how problems with procurement contributed to the situation. I added most of my technical analysis as a comment on Tom Lee’s blog post and I’ve included that here with some edits:

Technical Analysis

For the basic process of creating an account on healthcare.gov there are several potential areas for bottlenecks: 1) delivering content to the user, 2) receiving account creation data from the user, 3) actually generating a new user account, and 4) validating identity and eligibility based on submitted account data.

As Tom and Paul point out, there is almost certainly no issue with point #1. Even though the frontend content is managed through the Ruby based Jekyll app, it’s basically all generated and delivered as static files, which are then served by Akamai’s CDN. Even if there are many opportunities to create efficiencies there, it’s unlikely to be an issue when you’re just dealing with static files on a robust CDN. Placing blame on this smooth running frontend is frustrating not only because it is inaccurate, but also because it appears to be just about the only part of the system that was done well, and done in a very open and innovative way. There’s smart underlying technology, a clean responsive design, a developer friendly API, and an open source project here. This piece was contracted out to a great DC tech firm called Development Seed and it’s been written about a lot before. (Also see Alex Howard’s piece in the Atlantic.) Let’s say it again: this is not the problem.
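If you want to check this from the outside, a quick look at the response headers for the homepage shows the static, CDN-cached nature of the frontend. Here’s a minimal sketch of that check; the exact headers returned will vary, and it proves nothing beyond what the servers choose to advertise.

```python
import requests

# Fetch the healthcare.gov homepage and print a few response headers.
# Static, CDN-served pages typically advertise cache-control headers and a
# CDN/web-server signature rather than an application-server fingerprint.
resp = requests.get("https://www.healthcare.gov/", timeout=10)

for name in ("Server", "Cache-Control", "Expires", "X-Powered-By"):
    print(name + ":", resp.headers.get(name, "(not present)"))
```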

It’s possible there could be a bottleneck in receiving data, since writing to a system is almost always more resource intensive than reading data. The system receiving the data seems totally separate from the Ruby Jekyll code even if it appears on the same domain. It appears to be a Java based system, as the X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1 header in the response to the account creation form POST to https://www.healthcare.gov/ee-rest/ffe/en_US/MyAccountEIDMUnsecuredIntegration/createLiteEIDMAccount indicates:

HTTP/1.1 200 OK
Server: Apache
X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1
sysmessages: {"messages":["Business_ee_sap_MyAccountEIDMIntegration_CreateLiteEIDMAccount.OK_200.OK"]}
Content-Length: 181
Content-Type: application/json
X-Frame-Options: SAMEORIGIN;
X-Server: WS01
Expires: Sat, 05 Oct 2013 01:22:59 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Sat, 05 Oct 2013 01:22:59 GMT
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: JSESSIONID=A965DE6747123123275DCA1A40CBB16.green-app-ap50; Path=/ee-rest; Secure; HttpOnly
Access-Control-Allow-Origin: *

It’s unclear to me whether this Java based system is able to do any deferred processing or act as a parallel autonomous system, or whether it relies on direct integration as a slave to another system. The aforementioned POST URL does refer to another system, the EIDM. The EIDM mentioned here is almost certainly CMS’s (Centers for Medicare & Medicaid Services) Enterprise Identity Management system. The login form on healthcare.gov also seems to point directly to the EIDM, which appears to be an Oracle Access Manager server: https://eidm.cms.gov/oam/server/authentication

Screenshot of Oracle Access Manager for Healthcare.gov

You can read more about the EIDM system and its contract on recovery.gov and the IT Dashboard but the best description I’ve seen so far comes from LinkedIn where it appears next to 11 team members associated with the project. Here’s that description:

EIDM is the consolidated Identity and Access Management System in CMS which is one of the largest Oracle 11gR2 Identity and Access Management deployment in the world with integrated all Oracle components to support 100 million users including providing Identity and Access Management Services for Federal Health Insurance Exchange as well as health insurance exchanges in all 50 states that use FFE level of IDM integration, and 100 of CMS federal applications.

Services available from EIDM have been grouped into four main services areas (registration service, authorization service, ID lifecycle management service, and access management service). CMS will make remote electronic services available in a reliable and secure manner to providers of services and suppliers to support their efforts to better manage and coordinate care furnished to beneficiaries and Exchange applicants of CMS programs.

Identify and Access Management services will provide identity and credential services for millions of partners, providers, insurance exchanges enrollees, beneficiaries and other CMS nonorganizational users; and thousands of CMS employees, contractors and other CMS organizational users. EIDM accepts other federal agency credentials provided to CMS from the Federated Cloud Credential Exchange and provides secure access to the CMS Enterprise Portal

It’s unclear to me how that Java backend and the EIDM connect with one another, but together they could account for potential bottlenecks as described by points #2 or #3. My guess is that any identity/eligibility verification (#4) happens totally separately and wouldn’t cause the issues on account creation. Nevertheless, even if that Java backend and the EIDM are totally separate systems there still could be a need to design them in a more decoupled way to allow for better deferred or parallel processing.

As an aside, the reason I pointed out “Java” and “Ruby” is partly superfluous, but it’s a response to people who have referenced Twitter’s past performance issues, which some have attributed to Ruby versus Java. Twitter was originally written in Ruby and is now mostly driven by Java or other JVM-based languages. In the case of the Ruby used for healthcare.gov, it should be pretty irrelevant, because the Ruby is primarily used to generate static files (using Jekyll) which are then served by a CDN.

Transparency

There are a few other things that could also use more clarification. One, worth emphasizing again, is the conflation of the frontend, which is developer friendly and open source (see https://www.healthcare.gov/developers), with the backend, which is very opaque. As people have attempted to understand the problem, this conflation has been misleading and a cause for confusion. If everything had run smoothly there would be less need to clarify this, but so far the wrong piece of the product and the wrong people have been criticized because of it. This is much of what Kin Lane wrote about. Another issue worth clarifying is the assumption that the federal government has the same access to agile, rapidly scalable hosting infrastructure that the private sector has. Unfortunately, the “cloud” hosting services typically used in government pale in comparison to what is readily available in the private sector.

From a hosting infrastructure perspective, these kinds of scalability problems are increasingly less common in the private sector because so much has been done to engineer for high demand and to commoditize those capabilities. When an e-commerce website crashes on Cyber Monday, a whole lot of money is lost. This is why companies like Amazon have invested so much in building robust infrastructure to withstand demand and have even packaged and sold those capabilities for others to use through their Amazon Web Services (AWS) business. One major flaw with comparisons to the private sector is that there it is much easier to do phased roll-outs and limited beta tests for new websites, and much easier to acquire the latest and greatest infrastructure platforms like AWS, OpenStack, OpenShift, OpenCompute, and Azure. The main reasons for these discrepancies are about ensuring fairness and equitable access, just like many other distinctions between the private and public sector. A phased rollout of healthcare.gov would probably seem unjust because of whoever got early access. Furthermore, many of the procurement policies that make access to services like AWS difficult have been put in place to prevent corrupt or unjust spending of taxpayers’ money. Unfortunately many of these policies have become so complicated that the issues get obfuscated, they repel innovative and cost effective solutions, and they ultimately fail to achieve their original intent. Fortunately there are programs like FedRAMP that look like they’re starting to make common services like AWS and Azure more easily available for government projects, but this is far from commonplace at the moment. There’s also a lot more work needed to improve procurement to attract better talent for architecting good solutions. While the need for a more scalable hosting environment was likely part of the problem here, it was probably more about the design of the software, as Paul described. In this project, Development Seed seemed like an exception to the typical kind of work that comes out of federal IT contracts. The need to improve procurement is what Clay wrote about.

Improvements

Aside from the deeper systemic issues that need attention, like procurement and open technology, there are some more immediate opportunities to prevent situations like this: 1) better testing, 2) better user experience (UX) to handle possible delays, and 3) better software architecture with more modularity and asynchronous components.

#1 Better testing: The issues with excessive demand on the website should’ve been detected well before launch with adequate load testing on the servers. The reports I’ve seen suggest that there was load testing, or preparations for a certain load, but that the number of simultaneous users it was designed to account for was much smaller than what came to be. It might be helpful to ensure that load testing and QA are always conducted independently of the contractors who built the system; I’m not sure if that happened in this case. Perhaps the way those original estimates were determined should also be re-evaluated, but ultimately the design of the system should have accounted for a wide range of possible loads. There are ways of designing servers and software so that they will perform well at varying loads as long as you can add more hardware resources to meet the demand, and with an adequate hosting environment that is easy to do in an immediate and seamless way.
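For illustration, here’s a minimal sketch of the kind of load test I mean: fire a configurable number of concurrent simulated users at a staging URL and report latencies and failures. The URL and numbers are placeholders, and a real test would use purpose-built tooling and realistic multi-step user flows rather than a single GET.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Placeholder staging endpoint; never point a load test at production.
TARGET_URL = "https://staging.example.gov/"
CONCURRENT_USERS = 200
REQUESTS_PER_USER = 5

def simulated_user(_):
    """Issue a handful of sequential requests and record each latency."""
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.time()
        try:
            requests.get(TARGET_URL, timeout=30)
            latencies.append(time.time() - start)
        except requests.RequestException:
            latencies.append(None)  # a real test would tally failure types
    return latencies

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
        results = [lat for user in pool.map(simulated_user, range(CONCURRENT_USERS))
                   for lat in user]
    ok = [lat for lat in results if lat is not None]
    print("requests: %d, failures: %d" % (len(results), len(results) - len(ok)))
    if ok:
        print("avg latency: %.2fs, max: %.2fs" % (sum(ok) / len(ok), max(ok)))
```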

It’s worth noting that the contractor’s description of the EIDM system states that it “is one of the largest Oracle 11gR2 Identity and Access Management deployment in the world with integrated all Oracle components to support 100 million users.” I wouldn’t be surprised if this is the largest deployment in the world. That in itself should have been a red flag, signaling that the system hadn’t been proven at this scale and deserved extra stress testing.

Another helpful strategy is to allow for real-world limited beta testing or a phased roll out. Unfortunately there are policies in government that make it very difficult to conduct a limited beta test, and there is also the fairness issue I mentioned earlier. In this case, the fairness issue would actually be more of a false perception, or fodder for political spin, than anything substantial. Doing a phased roll out wouldn’t really be unfair because this isn’t a zero-sum resource: those who get insurance before others don’t make it harder or more expensive for those who come later to access the same insurance. In fact, you could almost argue that the opposite is true. It’s also worth noting that no matter how early you sign up, nobody gets new health insurance under this program until January 1st. In some ways this actually was a phased roll out, because open enrollment lasts all the way through the end of March 2014, but because there wasn’t clearer messaging to prevent a rush on day one, we got a massive rush on day one.

#2 Better UX to handle possible delays: A common way for a tech start-up to throttle new users coming to a recently launched platform is to provide a simple email sign-up form and then send notifications when they can actually get a real account. Something similar could’ve been provided as a contingency plan for an overloaded sign-up system on healthcare.gov. Instead, users got an “online waiting room” which they had to actively monitor in order to get access. Anything that better informed users of a situation like this and allowed them to come back later, rather than actively wait, would have been a significant improvement to the user experience of the website.
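As a rough sketch of that kind of throttle (entirely hypothetical, not anything healthcare.gov actually did): capture email addresses cheaply up front, then invite people in batches as capacity allows.

```python
from collections import deque

# Hypothetical waitlist throttle: capture interest instantly, invite in batches.
waitlist = deque()

def join_waitlist(email):
    """Called by the sign-up form; cheap and always succeeds."""
    waitlist.append(email)
    return "You're on the list. We'll email you when you can create an account."

def send_invite(email):
    """Stand-in for emailing the user a link to the real sign-up flow."""
    print("Invite sent to", email)

def release_next_batch(batch_size):
    """Run whenever the backend has spare capacity."""
    for _ in range(min(batch_size, len(waitlist))):
        send_invite(waitlist.popleft())

join_waitlist("jane@example.com")
join_waitlist("john@example.com")
release_next_batch(batch_size=1)  # only one invite goes out this round
```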

#3 Better software architecture with more modularity and asynchronous components: I think Paul Smith’s piece covered this pretty well, but it’s worth emphasizing. In some ways, there was more decoupling of components in this system than you might find in a typical government IT project with a monolithic stack developed by one contractor. The frontend and the backend were very separate systems, but unfortunately the backend couldn’t keep up with the frontend. To prevent that from becoming a bottleneck, the system could’ve been designed with a simple and robust queuing system to allow for deferred processing, paired with a user experience that clearly stated the user would receive an email when their new account was ready.
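Here’s a minimal sketch of that pattern, assuming nothing about the real system: the web tier accepts the sign-up, drops it on a queue, and returns immediately, while a separate worker drains the queue at whatever pace the backend can sustain and notifies the user when the account is ready. A production version would use a durable message broker rather than an in-memory queue.

```python
import queue
import threading
import time

# In-memory queue for illustration only; a real system would use a durable broker.
signup_queue = queue.Queue()

def handle_signup_request(email, profile):
    """Web tier: accept the request instantly and defer the heavy work."""
    signup_queue.put({"email": email, "profile": profile})
    return "Thanks! We'll email you at %s when your account is ready." % email

def create_account(job):
    """Stand-in for the slow backend work (identity checks, account creation)."""
    time.sleep(1)  # simulate expensive processing

def notify_user(email):
    """Stand-in for sending the 'your account is ready' email."""
    print("Notified", email)

def worker():
    """Backend worker: drain the queue at a pace the backend can sustain."""
    while True:
        job = signup_queue.get()
        create_account(job)
        notify_user(job["email"])
        signup_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

print(handle_signup_request("jane@example.com", {"state": "MD"}))
signup_queue.join()  # wait for the demo job to finish before exiting
```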

Fortunately there are already people working in government on improvements like this. Among the Presidential Innovation Fellows and many other people in government there are discussions about providing better systems for testing and preparing for the kind of extremely high traffic we’ve seen with healthcare.gov. There are also people working to improve many of the policies that make it difficult to get agile and robust IT infrastructure in government. The American people deserve to know their government is not only working to improve healthcare.gov right now but that there are also many people who are working to learn from it and provide more responsive and graceful government services for the future.