Author Archives: Philip Ashlock

The Exchanges: Open Source Collaboration Among States and Federal Government?

Early Innovator Grant slide about sharing for reuse

A slide from one of the Early Innovator Learning Collaborative webinars

The last post looked at what may have accounted for the architectural instability of the infrastructure and tried to learn from it. Recently HHS provided a brief overview of the problems and their efforts to resolve them, but clearly most questions will remain unanswered unless a full postmortem is made available. One issue that has received more notice lately is the role of open source in building out the exchanges. Originally this got attention with Alex Howard’s piece in the Atlantic from last June, which emphasized the open source approach Development Seed had taken to develop the frontend. With this context many thought of the whole project as open source and were dismayed to discover otherwise. More recently there have been renewed questions about the open source nature of the project because of the removal of the Github repository that housed the code originally created by Development Seed.

It turns out there was a lot of open source thinking from the earliest days of building the exchanges. In fact, I was part of a team asked to help ensure the infrastructure was developed following open source best practices.

Idealistic Youth

In 2010 I helped found a project called Civic Commons as a partnership between OpenPlans and Code for America. Civic Commons was meant to provide human capacity and technical resources to help governments collaborate on technology, particularly through the release and reuse of open source software. Civic Commons itself was a collaboration among several non-profits, but initially it was also a close partnership with government. The project was supported by the District of Columbia, with then-CTO Bryan Sivak, as one of the city projects in Code for America’s inaugural year. Unfortunately DC’s involvement didn’t survive through the District’s transition to a new administration, but Civic Commons continued on with the Code for America fellows (Jeremy Canfield, Michelle Koeth, Michael Bernstein) and both organizations working together.

In early 2011 the Code for America fellows met with then US CTO Aneesh Chopra and he asked them for assistance on an exciting new project.

“I need your help,” he began before sharing some recent news. He said it had just been announced earlier in the week that, “Seven states and one multi-state consortia will receive in aggregate $250M to build out insurance exchanges – they are required to align to principles, feed into verification hub, and engage in health philanthropy community.” Then he explained how we fit in, “Here is the specific ask, because I am big on open collaboration, I made a requirement that each awardee would have to join an open collaborative, a civic commons, have to join a commons for the reusability and sharing of all IP assets.” Aneesh went on to explain that it didn’t seem appropriate for the federal government to play this kind of role among the states and that he was really looking for an independent entity to help out. He said he was, “Looking for a commons that can act as the convening and tech support arm to the seven awardees – before the states go off and hire [big government contractor] who will set things up to minimize sharing, we want someone to set the rules of the road.”

At the time Civic Commons was just getting started and even though the prospect of a large and important project like this was very attractive it also seemed like it would consume all of our resources. After discussing concerns about our capacity and uncertainty of available funding to help, we decided against being involved. Much of the early work with Civic Commons was focused on more manageable projects among cities, but we did also include open source work at the federal level like the IT Dashboard. I do have a slight sense of regret that we couldn’t be more involved with open sourcing the exchanges, but it seems better that we learned what we learned from smaller experiments.

Lessons Learned

Karl Fogel was the primary shepherd of the open source process with governments at Civic Commons and one of his most notable blog posts detailed how difficult or even futile it was to try to do open source as an afterthought. If you’re not doing open source from the beginning then you’re probably not doing open source. Without the kind of organizational steward Aneesh was looking for, I fear those states might not have ever truly engaged in open source development as originally intended.

The other difficult lesson we learned through experiments getting cities and other governments excited about open source is that there tends to be much more motivation to release than to re-use. Some of this seemed like it may have been motivated by a PR driven sense that giving away your hard work looks good, but reusing others’ work looks lazy. Perhaps we need to do more to praise government when they are smart enough to not reinvent the wheel or pay for it over and over again.

The most encouraging lesson I learned during our time with Civic Commons is that there are some effective models for open source collaboration that involve very little direct coordination. The main model I’m referring to is one based around common standards and modular components. At OpenPlans we saw this with the success of open source projects based on the GTFS transit data standard like OpenTripPlanner or ones based on standardized protocols for real time bus tracking like OneBusAway. I’ve also watched this closely with the open source ecosystem that has developed around the Open311 standard with both open source CRMs and separate client apps like Android and iOS mobile apps that can be shared interchangeably. The full stack of open source tools from the City of Bloomington and the work Code for America has done with cities like Chicago have been great models that demonstrate the opportunities for software reuse when governments have asynchronously agreed on shared requirements by implementing a common standard. The apps developed by Bloomington are now even being used by cities in other countries.

The IT infrastructure for the exchanges was clearly based around common data standards, so you would hope the same opportunity would exist there.
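To make the standards-based model described above a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the `IssueTracker` interface, the two city backends, and the `report_pothole` client are hypothetical names invented for this example and are not the actual Open311 API. The point is only that a client written against a shared interface works unchanged with independently built backends, without the teams ever coordinating directly.

```python
from typing import Protocol


class IssueTracker(Protocol):
    """The shared 'standard' (hypothetical): any backend with this method is interchangeable."""

    def submit(self, service_code: str, description: str) -> str: ...


class CityABackend:
    """One city's implementation, storing requests in a list."""

    def __init__(self):
        self.requests = []

    def submit(self, service_code, description):
        self.requests.append((service_code, description))
        return f"A-{len(self.requests)}"


class CityBBackend:
    """Another city's implementation, built separately with different internals."""

    def __init__(self):
        self.tickets = {}

    def submit(self, service_code, description):
        ticket_id = f"B-{len(self.tickets) + 1:04d}"
        self.tickets[ticket_id] = {"code": service_code, "text": description}
        return ticket_id


def report_pothole(tracker: IssueTracker, location: str) -> str:
    """A client app written once against the standard, reusable in any city."""
    return tracker.submit("pothole", f"Pothole reported at {location}")


id_a = report_pothole(CityABackend(), "5th and Main")
id_b = report_pothole(CityBBackend(), "Oak Street")
print(id_a, id_b)
```

Neither backend knows about the other, and the client knows about neither; the only thing they share is the agreed interface.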

An Open Exchange

The effort Aneesh had referred to did still continue without us and only now have I started to learn about it in detail. The $250 million he had described was more precisely $241 million in federal funding through the Early Innovator grants. These grants were awarded to Kansas, Maryland, Oklahoma, Oregon, New York, Wisconsin, and the University of Massachusetts Medical School representing the New England States Collaborative for Insurance Exchange Systems – a consortium among Connecticut, Maine, Massachusetts, Rhode Island, and Vermont. The grant was in fact as lofty as Aneesh had described to the fellows. The “Funding Opportunity Description” section of the grant states:

The Exchange IT system components (e.g., software, data models, etc.) developed by the awardees under this Cooperative Agreement will be made available to any State (including the District of Columbia) or eligible territory for incorporation into its Exchange. States that are not awarded a Cooperative Agreement through this FOA can also reap early benefits from this process by reusing valuable intellectual property (IP) and other assets capable of lowering Exchange implementation costs with those States awarded a Cooperative Agreement. Specifically, States can share approaches, system components, and other elements to achieve the goal of leveraging the models produced by Early Innovators.

The expected benefits of the Cooperative Agreements would include:

  1. Lower acquisition costs through volume purchasing agreements.
  2. Lower costs through partially shared or leveraged implementations. Organizations will be able to reuse the appropriate residuals and knowledge base from previous implementations.
  3. Improved implementation schedules, increased quality and reduced risks through reuse, peer collaboration and leveraging “lessons learned” across organizational boundaries.
  4. Lower support costs through shared services and reusable non-proprietary add-ons such as standards-based interfaces, management dashboards, and the like.
  5. Improved capacity for program evaluation using data generated by Exchange IT systems.

The grant wasn’t totally firm about open source, but it was still pretty clear. The section titled “Office of Consumer Information and Insurance Oversight: Intellectual Property” included the following:

The system design and software would be developed in a manner very similar to an open source model.

State grantees under this cooperative agreement shall not enter in to any contracts supporting the Exchange systems where Federal grant funds are used for the acquisition or purchase of software licenses and ownership of the licenses are not held or retained by either the State or Federal government.

It’s not totally clear what came of this. The last evidence I’ve seen of the work that came out of these grants is from a PowerPoint deck from August 2012. The following month a report was published by the National Academy of Social Insurance that provided some analysis of the effort. The part about code reuse (referred to as Tier 2) is not encouraging.

Tier 2: Sharing IT code, libraries, COTS software configurations, and packages of technical components that require the recipient to integrate and update them for their state specific needs.

Tier 2 reusability has been less common, although a number of states are discussing and exploring the reuse of code and other technical deliverables. One of the Tier 2 areas likely to be reused most involves states using similar COTS products for their efforts. COTS solutions, by their very nature, have the potential to be reused by multiple states. Software vendors will generally update and improve their products as they get implemented and as new or updated federal guidance becomes available. For instance, our interviews indicate that three of the states using the same COTS product for their portal have been meeting to discuss their development efforts with this product. Another option, given that both CMS and vendors are still developing MAGI business rules, is that states could potentially reuse these rules to reduce costs and time. CMS has estimated that costs and development could be reduced by up to 85 percent when states reuse business rules when compared to custom development.

When you’re simply talking about using the same piece of commercial software among multiple parties, you’re far from realizing the opportunity of open source. That said, the work developed by these states was really meant to be the foundation for reuse by others, so perhaps that was just the beginning. We do in fact have good precedent for recent open source efforts in the healthcare space. Just take a look at CONNECT, OSEHRA, or Blue Button.

Maybe there actually has been real re-use among the state exchanges, but so far I haven’t been able to find much evidence of that or any signs of open source code in public as was originally intended. Then again, there’s still time for some states to open up their own exchanges, so maybe we’ll see that over time. Right now the attention isn’t on the states so much anyway, but it turns out the Federally Facilitated Marketplace was supposed to be open source as well.

I’m not just talking about the open source work by Development Seed that recently disappeared from the Centers for Medicare & Medicaid Services’ Github page; I’m also talking about the so-called “Data Hub” that has been a critical component of the infrastructure – both for the federal exchange and for the states. The contractor that developed the Data Hub explains it like this:

Simply put, the Data Services Hub will transfer data. It will facilitate the process of verifying applicant information data, which is used by health insurance marketplaces to determine eligibility for qualified health plans and insurance programs, as well as for Medicaid and CHIP. The Hub’s function will be to route queries and responses between a given marketplace and various data sources. The Data Services Hub itself will not determine consumer eligibility, nor will it determine which health plans are available in the marketplaces.

So the Data Hub isn’t everything, but it’s clearly one of the most critical pieces of the system. As the piece of integration that ties everything together, you might even call it the linchpin. What appears to be the original RFP for this critical piece of infrastructure makes it pretty clear that this was meant to be open source:

3.5.1 Other Assumptions
The Affordable Care Act requires the Federal government to provide technical support to States with Exchange grants. To the extent that tasks included in this scope of work could support State grantees in the development of Exchanges under these grants, the Contractor shall assume that data provided by the Federal government or developed in response to this scope of work and their deliverables and other assets associated with this scope of work will be shared in the open collaborative that is under way between States, CMS and other Federal agencies. This open collaborative is described in IT guidance 1.0. See

This collaboration occurs between State agencies, CMS and other Federal agencies to ensure effective and efficient data and information sharing between state health coverage programs and sources of authoritative data for such elements as income, citizenship, and immigration status, and to support the effective and efficient operation of Exchanges. Under this collaboration, CMS communicates and provides access to certain IT and business service capabilities or components developed and maintained at the Federal level as they become available, recognizing that they may be modified as new information and policy are developed. CMS expects that in this collaborative atmosphere, the solutions will emerge from the efforts of Contractors, business partners and government projects funded at both the State and federal levels. Because of demanding timelines for development, testing, deployment, and operation of IT systems and business services for the Exchanges and Medicaid agencies, CMS uses this collaboration to support and identify promising solutions early in their life cycle. Through this approach CMS is also trying to ensure that State development approaches are sufficiently flexible to integrate new IT and business services components as they become available.

The Contractor’s IT code, data and other information developed under this scope of work shall be open source, and made publicly available as directed and approved by the COTR.

The development of products and the provision of services provided under this scope of work as directed by the COTR are funded by the Federal government. State Exchanges must be self-funded following 2014. Products and services provided to a State by the Contractor under contract with a State will not be funded by the Federal government.

So far I haven’t been able to find any public code that looks like it would be the open source release of the Data Hub, but I remain optimistic.

Open source software is not a silver bullet, but it does a lot to encourage higher quality code and when coordinated properly it can do a great deal to maximize the use of tax dollars. Beyond the transparency of the code itself, it also helps provide transparency in the procurement process because it makes it easier to audit claims about what a company’s software is capable of and what software a company has already produced. It also tends to select for a culture and an acumen of software engineers that is genuinely driven to work with others to push the capability and impact of technology forward.

I hope we can stay committed to our obligations to maximize the tax dollars and ingenuity of the American people and stay true to the original vision of the exchanges as open source infrastructure for the 21st century.

Learning from the Infrastructure

Screenshot of account creation error

Over the past few days I’ve been paying attention to the problems that have troubled the launch of the federal health insurance marketplace website. What I’ve compiled here is an outsider’s perspective and my technical analysis should be treated as educated speculation rather than insider knowledge or anything authoritative. All other commentary here should be treated as my own personal perspective as well. I state these disclaimers because I’ve had no direct involvement with this project and the only information I have to work with is what’s available to the public, so some of my claims might be inaccurate. It’s also worth noting that I do know some of the people involved with the project and I’ve worked with some of the companies that were contractors, but I’ve gone about trying to understand this in an independent and unbiased way. Real accountability may be warranted for some of the problems we’ve seen so far, but I’m less interested in placing blame and more interested in simply learning what happened to help ensure it doesn’t happen again.


To be clear, the problems I’m referring to are specifically the errors and unresponsiveness people encountered when creating a new account for the Federally Facilitated Marketplace. The website is also used to provide related information and to redirect people to State Based Marketplaces hosted by their own states where they exist, but those aren’t the things I’m talking about here.

Unfortunately, many of the more political perspectives around the problems with this website have been illogical and much of the reporting in the news has either appeared to be inaccurate or so vague as to be meaningless. For example, some have claimed that problems with the website indicate that the Affordable Care Act is a bad idea and won’t work, but that’s a radical distortion of logic. This claim is like saying that a problem with an automatic sliding door or a broken cash register at a grocery store indicates that the grocery store (and better access to food) is a bad idea and won’t work. Others have claimed the high demand that caused glitches was unexpected, yet at the same time claim that glitches with the launch of a new product should be expected. Many of the news reports about the problems attempt to provide technical analysis, but mostly fail in identifying anything relevant or specific enough to be accurate or informative.

I do agree that the problems revealed themselves as a result of high demand. Exceeding capacity is a good problem to have, but it’s still a problem and it’s even a problem that gone unchecked could erode the kind of popularity that overwhelmed the system to begin with. I also think this is a problem that can be prevented and should never happen again. It’s true that Americans still have many more months of open enrollment, but first impressions really do matter, especially with something as sensitive as a new health care program.

It would be wonderful if an official postmortem were published to help us understand what happened with the launch and prepare us enough to prevent similar situations in the future. As an outsider’s perspective, my analysis shouldn’t be considered anything like that, but it is worth noting that the worst problems with the website are likely already behind us. As Alex Howard reports, there are indications that improvements in the past few days have made an impact, but things still look like they could be going more smoothly. A test I ran today showed that there were basically no wait times whatsoever, but I was still unable to create a new account, receiving the error displayed above instead.

Until problems are fully resolved and until anything resembling a postmortem exists, there will be demand for more answers and better reporting on what has happened. My motivation for writing this is partly that “unexpected demand” or “inevitable glitches” haven’t been satisfying answers, but I’ve also been unsatisfied with the reporting. The best analysis I’ve seen so far has been by Paul Smith (also syndicated on Talking Points Memo) and Tom Lee (also syndicated on the Washington Post Wonkblog). Part of the reason Paul and Tom’s writing is good is that it actually attempts to distinguish the different components of the infrastructure and explain the architectural significance of decoupling components in an asynchronous way. Both of these pieces also point out that the frontend of the website, a Jekyll based system, was not the problem despite the many attempts at technical analysis in major publications that have tried to place fault there without looking further. Yet while Paul and Tom definitely seemed to get the broad strokes right, I wanted more detail.

After reading Paul’s piece I started a thread among the current and past Presidential Innovation Fellows to see if anyone knew more about what was going on. Basically none of us had direct knowledge of the technical underpinnings of the system, but being furloughed and eager to fix problems turned this into one of the most active discussions I’ve seen among the fellows. I also saw similar discussions arise among the Code for America fellows. Over the course of a day or so we shared our insights and speculation and some reported on their findings. Kin Lane described his concerns about the openness and transparency of the project, especially the conflation of the open source frontend and the blackbox backend. Clay Johnson wrote about how problems with procurement contributed to the situation. I added most of my technical analysis as a comment on Tom Lee’s blog post and I’ve included that here with some edits:

Technical Analysis

For the basic process of creating an account there are several potential areas for bottlenecks: 1) delivering content to the user 2) receiving account creation data from the user 3) actually generating a new user account 4) validating identity and eligibility based on submitted account data.

As Tom and Paul point out, there is almost certainly no issue with point #1. Even though the frontend content is managed through the Ruby based Jekyll app, it’s basically all generated and delivered as static files which are then served by Akamai’s CDN. Even if there are many opportunities to create efficiencies there, it’s unlikely an issue when you’re just dealing with static files on a robust CDN. Placing blame on this smooth running frontend is frustrating not only because it is inaccurate but it also appears to be just about the only part of the system that was done well and done in a very open and innovative way. There’s smart underlying technology, a clean responsive design, a developer friendly API, and an open source project here. This piece was contracted out to a great DC tech firm called Development Seed and it’s been written about a lot before. (Also see Alex Howard’s piece in the Atlantic). Let’s say it again: this is not the problem.

It’s possible there could be a bottleneck in receiving data as writing to a system is almost always more resource intensive than reading data. The system receiving the data seems totally separate from the Ruby Jekyll code even if it appears on the same domain. It appears to be a Java based system as the response headers identify:
X-Powered-By:Servlet 2.5; JBoss-5.0/JBossWeb-2.1 on the account creation form POST to

HTTP/1.1 200 OK
Server: Apache
X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1
sysmessages: {"messages":["Business_ee_sap_MyAccountEIDMIntegration_CreateLiteEIDMAccount.OK_200.OK"]}
Content-Length: 181
Content-Type: application/json
X-Frame-Options: SAMEORIGIN;
X-Server: WS01
Expires: Sat, 05 Oct 2013 01:22:59 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Sat, 05 Oct 2013 01:22:59 GMT
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie:; Path=/ee-rest; Secure; HttpOnly
Access-Control-Allow-Origin: *
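For anyone who wants to repeat this kind of outside-in inspection, here is a minimal sketch in Python of pulling the stack-identifying headers out of a raw response like the one above. The header text is abridged from that response; the function names `parse_headers` and `describe_stack` are my own inventions for illustration, and headers like these are only hints, since a server can be configured to report anything.

```python
# Abridged copy of the response headers quoted above.
RAW_RESPONSE = """\
HTTP/1.1 200 OK
Server: Apache
X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1
Content-Type: application/json
X-Server: WS01
"""


def parse_headers(raw):
    """Return (status_line, {lowercased header name: value})."""
    lines = raw.strip().splitlines()
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Split on the first colon only, so values containing colons survive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


def describe_stack(headers):
    """Collect the headers that hint at what software produced the response."""
    hints = ("server", "x-powered-by", "x-server")
    return {name: headers[name] for name in hints if name in headers}


status, headers = parse_headers(RAW_RESPONSE)
stack = describe_stack(headers)
print(status)
for name, value in stack.items():
    print(f"{name}: {value}")
```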

It’s unclear to me if this Java based system is able to do any deferred processing or act as a parallel autonomous system or if it relies on direct integration as a slave to another system. The aforementioned POST URL does refer to another system, the EIDM. The EIDM mentioned here is almost certainly the Enterprise Identity Management system run by CMS (the Centers for Medicare & Medicaid Services). The login form also seems to point directly to the EIDM, which appears to be an Oracle Access Manager server:

Screenshot of Oracle Access Manager for

You can read more about the EIDM system and its contract on and the IT Dashboard but the best description I’ve seen so far comes from LinkedIn where it appears next to 11 team members associated with the project. Here’s that description:

EIDM is the consolidated Identity and Access Management System in CMS which is one of the largest Oracle 11gR2 Identity and Access Management deployment in the world with integrated all Oracle components to support 100 million users including providing Identity and Access Management Services for Federal Health Insurance Exchange as well as health insurance exchanges in all 50 states that use FFE level of IDM integration, and 100 of CMS federal applications.

Services available from EIDM have been grouped into four main services areas (registration service, authorization service, ID lifecycle management service, and access management service). CMS will make remote electronic services available in a reliable and secure manner to providers of services and suppliers to support their efforts to better manage and coordinate care furnished to beneficiaries and Exchange applicants of CMS programs.

Identify and Access Management services will provide identity and credential services for millions of partners, providers, insurance exchanges enrollees, beneficiaries and other CMS nonorganizational users; and thousands of CMS employees, contractors and other CMS organizational users. EIDM accepts other federal agency credentials provided to CMS from the Federated Cloud Credential Exchange and provides secure access to the CMS Enterprise Portal

It’s unclear to me how that Java backend and the EIDM connect with one another, but together they could account for potential bottlenecks as described by points #2 or #3. My guess is that any identity/eligibility verification (#4) happens totally separately and wouldn’t cause the issues on account creation. Nevertheless, even if that Java backend and the EIDM are totally separate systems there still could be a need to design them in a more decoupled way to allow for better deferred or parallel processing.

As an aside, my pointing out “Java” and “Ruby” is partly superfluous, but it is a response to people who’ve made references to Twitter’s past performance issues, which some have claimed were related to Ruby and Java. Twitter was originally written in Ruby and is now mostly driven by Java or other JVM-based languages. In the case of the Ruby used here, it should be pretty irrelevant because the Ruby is primarily used to generate static files (using Jekyll) which are then served by a CDN.


There are a few other things that could also use more clarification. One, worth emphasizing again, is the conflation of the frontend that’s all developer friendly and open source and this backend that’s very opaque. As people have attempted to understand the problem, this conflation has been misleading and a cause for confusion. If everything had run smoothly there would be less need to clarify this, but so far the wrong piece of the product and the wrong people have been criticized because of this conflation. This is much of what Kin Lane wrote about. Another issue worth clarifying is the assumption that the federal government has the same access to agile rapidly scalable hosting infrastructure that the private sector has. Unfortunately, the “cloud” hosting services typically used in government pale in comparison to what is readily available in the private sector.

From a hosting infrastructure perspective, these kinds of scalability problems are increasingly less common in the private sector because so much has been done to engineer for high demand and to commoditize those capabilities. When an e-commerce website crashes on Cyber Monday, a whole lot of money is lost. This is why companies like Amazon have invested so much in building robust infrastructure to withstand demand and have even packaged and sold those capabilities for others to use through their Amazon Web Services (AWS) business. One major flaw with comparisons to the private sector is that in the private sector it is much easier to do phased roll-outs and limited beta-tests for new websites and it is much easier to acquire the latest and greatest infrastructure platforms like AWS, OpenStack, OpenShift, OpenCompute, and Azure. The main reasons for these discrepancies are about ensuring fairness and equitable access, just like many other distinctions between the private and public sector. A phased rollout would probably seem unjust because of whoever got early access. Furthermore, many of the procurement policies that make access to services like AWS difficult have been put in place to prevent corrupt or unjust spending of taxpayers’ money. Unfortunately many of these policies have become so complicated that the issues get obfuscated, they repel innovative and cost effective solutions, and ultimately fail to achieve their original intent. Fortunately there are programs like FedRAMP that look like they’re starting to make common services like AWS and Azure more easily available for government projects, but this is far from commonplace at the moment. There’s also a lot more work needed to improve procurement to attract better talent for architecting good solutions. While the need for a more scalable hosting environment was likely part of the problem here, it was probably more about the design of the software as Paul described. In this project, Development Seed seemed like an exception to the typical kind of work that comes out of federal IT contracts. The need to improve procurement is what Clay wrote about.


Aside from the deeper systemic issues that need attention, like procurement and open technology, there are some more immediate opportunities to prevent situations like this: 1) Better testing 2) Better user experience (UX) to handle possible delays 3) Better software architecture with more modularity and asynchronous components.

#1 Better testing: The issues with excessive demand on the website should’ve been detected well before launch with adequate load testing on the servers. The reports I’ve seen suggest that there was load testing or preparations for a certain load, but that the number of simultaneous users it was designed to account for was much smaller than what came to be. It might be helpful to ensure that load testing and QA are always conducted independently of the contractors who built the system. I’m not sure if that happened in this case. Perhaps the way those original estimates were determined should also be re-evaluated, but ultimately the design of the system should have accounted for a wide range of possible loads. There are ways of designing servers and software so that they will perform well at varying loads as long as you can add more hardware resources to meet the demand. With an adequate hosting environment that is easy to do in an immediate and seamless way.
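As a rough illustration of why the planned-for load matters so much, here is a back-of-the-envelope sketch in Python. Every number in it is made up for the example (I have no figures for the real system), and the model is deliberately crude: requests arrive at a steady rate, each server handles a fixed number per second, and anything unserved piles up in a backlog.

```python
import math


def backlog_over_time(arrivals_per_sec, servers, per_server_rate, seconds):
    """Crude fluid model: backlog grows by (arrivals - capacity) each second, never below zero."""
    capacity = servers * per_server_rate
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog = max(0, backlog + arrivals_per_sec - capacity)
        history.append(backlog)
    return history


def servers_needed(arrivals_per_sec, per_server_rate, headroom=0.25):
    """Smallest server count whose capacity covers the load plus some headroom."""
    return math.ceil(arrivals_per_sec * (1 + headroom) / per_server_rate)


# Hypothetical figures: the system was sized for PLANNED requests/sec but saw ACTUAL.
PLANNED, ACTUAL, PER_SERVER = 500, 2500, 50

planned_backlog = backlog_over_time(PLANNED, 10, PER_SERVER, 60)[-1]
actual_backlog = backlog_over_time(ACTUAL, 10, PER_SERVER, 60)[-1]
needed = servers_needed(ACTUAL, PER_SERVER)
scaled_backlog = backlog_over_time(ACTUAL, needed, PER_SERVER, 60)[-1]

print("backlog after 60s at planned load:", planned_backlog)
print("backlog after 60s at actual load:", actual_backlog)
print("servers needed for actual load:", needed)
print("backlog after scaling out:", scaled_backlog)
```

The model only holds if the software can actually make use of extra servers, which is the architectural question rather than the hosting one.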

It’s worth noting that the contractor’s description of the EIDM system states that it “is one of the largest Oracle 11gR2 Identity and Access Management deployment in the world with integrated all Oracle components to support 100 million users.” I wouldn’t be surprised if this is the largest deployment in the world. That in itself should have been a red flag signaling it might not have proven itself at this scale and deserved extra stress testing.

Another helpful strategy is to allow for real-world limited beta testing or a phased roll out. Unfortunately there are policies in government that make it very difficult to conduct a limited beta test, and there is also the fairness issue I mentioned earlier. In this case, the fairness issue would actually be more of a false perception or fodder for political spin than anything substantial. Doing a phased roll out wouldn’t really be unfair because this isn’t a zero-sum resource: those who get insurance before others don’t make it harder or more expensive for those who come later to access the same insurance. In fact, you could almost argue that the opposite is true. It’s also worth noting that no matter how early you sign up, nobody is getting new health insurance under this program until January 1st. In some ways this actually was a phased roll out because open enrollment lasts all the way through the end of March 2014, but because there wasn’t clearer messaging to prevent a rush on day one, we got a massive rush on day one.

#2 Better UX to handle possible delays: A common way for a tech start-up to throttle new users coming to their recently launched platform is to provide a simple email sign-up form and then to send notifications for when they can actually get a real account. Something similar could’ve been provided as a contingency plan for an overloaded sign-up system. Instead, users got an “online waiting room” which they had to actively monitor in order to get access. Anything to better inform users of a possible situation like this and allow them to come back later rather than actively wait would have been a significant improvement to the user experience of the website in this situation.

#3 Better software architecture with more modularity and asynchronous components: I think Paul Smith’s piece covered this pretty well, but it’s worth emphasizing. In some ways, there was more decoupling of components in this system than you might find in a typical government IT project with a monolithic stack developed by one contractor. The frontend and the backend were very separate systems, but unfortunately the backend couldn’t keep up with the frontend. To prevent that issue from causing a bottleneck, the system could’ve been designed with a simple and robust queuing system to allow for deferred processing, with a good user experience that clearly stated that the user would receive an email when their new account was ready.
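To make the queuing idea a little more concrete, here is a minimal Python sketch of deferred processing. It is only an illustration under stated assumptions, not a description of how the actual system was or should have been built: the function names (`accept_signup`, `create_account`, `notify`, `process_all`) are hypothetical, the "email" step just records an address in a list, and an in-memory queue stands in for what would need to be a durable queue in any real deployment.

```python
import queue
import threading

# In-memory stand-in for a durable queue (an assumption for this sketch).
signup_queue = queue.Queue()
# Records who would have been emailed; a real system would send mail here.
sent_notifications = []


def accept_signup(email):
    """Frontend step: store the request and answer the user immediately."""
    signup_queue.put(email)
    return "Thanks! We will email you when your account is ready."


def create_account(email):
    """Hypothetical placeholder for the slow backend work."""
    return {"email": email, "status": "active"}


def notify(account):
    """Hypothetical placeholder for sending the 'account ready' email."""
    sent_notifications.append(account["email"])


def worker():
    """Backend step: drain the queue at whatever pace the backend can manage."""
    while True:
        email = signup_queue.get()
        if email is None:  # sentinel value tells the worker to stop
            signup_queue.task_done()
            break
        notify(create_account(email))
        signup_queue.task_done()


def process_all(num_workers=2):
    """Run workers until every queued sign-up has been handled."""
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    signup_queue.join()  # block until all queued sign-ups are processed
    for _ in threads:
        signup_queue.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
```

The point of the design is that `accept_signup` never waits on the backend, so a slow backend lengthens the time until the email arrives rather than blocking the user at the front door.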

Fortunately there are already people working in government on improvements like this. Among the Presidential Innovation Fellows and many other people in government there are discussions about providing better systems for testing and preparing for the kind of extremely high traffic we’ve seen here. There are also people working to improve many of the policies that make it difficult to get agile and robust IT infrastructure in government. The American people deserve to know not only that their government is working to improve the site right now, but also that many people are working to learn from it and provide more responsive and graceful government services for the future.

Knight Funds the Civic Data Standards Study

I’m pleased to announce that the Knight Foundation is supporting a new project called the Civic Data Standards Study through their Prototype Fund.

The Civic Data Standards Study will investigate the successes and shortcomings of recent data standards efforts for government and civic projects to gain a better understanding of the dynamics involved and the factors responsible for success or failure. The study will profile several recent standards and the development processes used to create them.

The potential areas of focus include specific standards like the General Transit Feed Specification (GTFS) for public transit, the Open311 GeoReport API for providing feedback to government, the LIVES health inspection standard, and the Voting Information Project specification, as well as standards frameworks and common schemas like the National Information Exchange Model (NIEM) and Data Catalog Vocabulary (DCAT).

Beyond these case studies, the study will also help catalog a broader range of existing standards, emerging efforts, and areas with a clear need or lack of standards. Ultimately, the study aims to develop recommendations for facilitating future standards efforts.

You can find more details about the Civic Data Standards Study on the Knight Foundation website.

Stay tuned here for more updates and be sure to follow @civicagency on Twitter.

The News Challenge as an Open Process

Over the past few weeks there’s been some really valuable discussion about the proposal for DemocracyMap in the Knight News Challenge. Mostly this emerged from David Eaves’ piece in TechPresident which questioned the sustainability of the project, but I also wrote a guest post for the Open Knowledge Foundation blog describing the larger context that DemocracyMap exists within.

My last post responded to David’s concerns and attempted to dig deeper into some of the challenging questions surrounding the sustainability of civic technology projects. There were a number of tweets that distilled what I wrote, such as TechPresident’s Nick Judd saying, “Some things – like who your lawmakers are – you just shouldn’t have to pay to find out.” Philadelphia’s Chief Data Officer Mark Headd referred to it as “On the importance of access to public data and the dangers of over zealous contrarianism.” And I think David Moore from the Participatory Politics Foundation might be writing something too. The Sunlight Foundation’s Tom Lee went even further and did a great job of refining and expanding on my thoughts with a post of his own titled Making Open Government Data Sustainable, and TechPresident continued to discuss the project during a podcast with Micah Sifry, David Eaves, and Nick Judd.

I was a little frustrated to listen to the podcast and hear David continue to claim that the DemocracyMap proposal makes no mention of other efforts in this space. In reality, the proposal explicitly lists some it sees as having failed, links to a page with an even more complete list, and also mentions some of the other organizations working in this space, with which I hope to continue collaborating.

It’s worth noting that one of the organizations doing the best work in this space is OpenNorth, which is working on OpenNorth Represent, a Canadian counterpart to much of the work surrounding DemocracyMap. David Eaves is on the board of OpenNorth, so his argument that something like Azavea Cicero should be left to itself rather than be threatened or complemented by others, even non-profit grant-driven endeavors, is even more confusing when you consider that he makes no mention of a similar project he’s associated with. In any case, I’ve been in discussion with the folks at OpenNorth for a while and, as mentioned in my proposal, I hope to work with them and others more closely as this project moves forward. A huge focus of community collaboration in this space is around standards, and OpenNorth has been a leader there. Standards are the only way we’ll be able to source data at scale and collaborate with one another effectively. See Popolo and Open Civic Data.

It’s true that I could have elaborated more on the successes and failures of other efforts and better articulated how DemocracyMap would be different, but at this stage I thought that the mentions were sufficient. Hopefully my earlier post on the OKFN blog and my response to David help provide some additional context around those topics, but I also hope to have the chance to elaborate further as the Knight News Challenge continues. This ongoing process brings me to something David discussed in the TechPresident podcast that I think deserves much more recognition and consideration than it typically gets: one of the best aspects of the Knight News Challenge is the open process of developing and refining the proposals with multiple stages of public feedback and iterative improvements. In fact, I would go even further and say that the process of driving the creation of proposals is one of the most valuable things that the News Challenge does.

Hopefully it’s not too radical to claim that if you considered all the proposals submitted to the News Challenge that don’t make it as final grant recipients as a collective whole, their value would at least come close to equaling that which comes out of funding just one of the final projects. Obviously this is a hard thing to compare considering the creativity, critical thinking, and prototyping dispersed across those 800+ other proposals, but the News Challenge does compel a lot of thinking and making around all these projects. Even just dollar for dollar, I wonder if the Knight Foundation generates more useful stuff from all the proposals as a whole than it does from the final grantees. In fact, Knight may already be starting to think along these lines, as evidenced by new efforts like the Knight Prototype Fund.

There are a number of parallels here with app contests and prize-driven challenges in the context of government innovation. App contests have been a big part of the open government space for the past five years. I first started paying attention to them with Washington DC’s Apps for Democracy. In fact, the Open311 effort emerged out of DC’s second round of the Apps for Democracy contest. There was a lot of hype around app contests for a few years and while some are still going strong, there seems to be consensus that app contests don’t deliver as much value as people initially expect. Often this is because the contests are framed as delivering all sorts of useful new apps that will be sustained on their own, but that’s rarely the case. The only way for an app contest to be successful in the long term is to carefully factor in sustainability, whether that means helping to facilitate introductions to investors, helping establish partnerships with relevant NGOs, or simply having the government directly invest in ongoing development of the app. Even if sustainability isn’t totally factored in, app contests can generate a lot of great ideas and useful prototypes, but expectations should be set appropriately about the long term value that will come out of those.

In the case of the News Challenge, the intent is a little different because there’s such a singular focus on using the contest to narrow the proposals down to the final grant recipients. Yet I think there’s something to be said about considering the value of all the other proposals, especially given the point David emphasized: the process of public feedback really helps refine and iterate on the ideas. In fact, I think this is often more worthwhile than the app contest model since it can help guide an idea before a whole lot of work has been invested in building the app. In other words it can force you to articulate your hypothesis, demonstrate demand, and do something more lightweight like paper prototyping before real development. There’s a lot that could be done to better support all the proposals beyond the finalists, but I think the bare minimum that Knight could do is avoid outright discarding them.

While it’s true that the proposals from past News Challenges have been archived and are available online, none of this was done from the beginning with any consideration for permanent links. This means that the thousands of tweets and blog posts discussing the proposals are all broken each time the News Challenge starts anew. This was particularly noticeable last year because there were successive rounds of the News Challenge just a few months after one another, so it was really easy to see how quickly all the links to the proposals were getting broken. I’ve managed to assemble links to all the previous News Challenge submissions for DemocracyMap and it’s true that a Google search should still be effective in finding archived proposals, but all those links from social media and other sites (and with them all their Google page rank) are broken. This isn’t a hard problem to fix. For example, each year the News Challenge process could be hosted at a new subdomain dedicated to that year, which would still make it possible to use another website or platform to host the process as they did with Tumblr last year.
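As a rough illustration of the per-year subdomain idea, here is a small Python sketch. Everything specific in it is an assumption made up for the example: the base domain `challenge.example.org`, the backend hostnames, and the `resolve` function are hypothetical and do not describe how the News Challenge sites are actually hosted. The idea is only that a link keeps its year-specific hostname forever, while the backend serving that year can change.

```python
from urllib.parse import urlsplit

# Hypothetical base domain that every yearly subdomain hangs off of.
BASE_DOMAIN = "challenge.example.org"

# Hypothetical record of which backend currently serves each year's process.
# Moving a year to an archive only means editing this table; links stay put.
BACKEND_BY_YEAR = {
    "2012": "archive-2012.example.net",
    "2013": "live-platform.example.net",
}


def resolve(url):
    """Map a permanent yearly URL to the backend URL that serves it today.

    Returns None when the host is not a known yearly subdomain.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    year = host[: -len(suffix)]
    backend = BACKEND_BY_YEAR.get(year)
    if backend is None:
        return None
    return f"https://{backend}{parts.path}"
```

In practice the same effect could come from DNS records or web server redirects rather than application code; the table above is just the simplest way to show the mapping.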

Another simple strategy to get better long term value out of the proposals is to encourage each one to include enough information for people to follow the project and stay engaged even if it doesn’t make it through as a finalist. I’d completely forgotten to include these basic details until advised by Ben Sheldon. Ben had experienced this problem with the Pepsi Challenge where people invested a lot in community engagement around the short-term process of the contest and ended up without a way to continue the engagement when the contest was over. This is particularly important if you submit a project to multiple contests over time and want to have a central place where you can continue to engage and grow your community even if the project isn’t selected within each individual contest.

How else has the kind of public process leveraged by the News Challenge helped drive the creation or evolution of new projects and what else could be done to better support that on into the future?

Poll Taxes and Paying for Public Data

Poll Tax Receipt from 1964

Image courtesy of Brian D. Newby

Would you like to contact your city council member? That will cost $5.00. Would you like to vote in the general election? That will cost you $15.00. Would you like to attend a public meeting? Admission is $25.00.

There are certain elements of our democratic system of government that are so essential to its freedoms and principles that we have to make them as accessible as possible and provide them free of charge. Voting is probably the most crucial example of this, which is why the poll tax of the Jim Crow era was made unconstitutional in 1964 by the 24th Amendment. Unfortunately 1964 is not very long ago and there continue to be efforts to make essential acts like voting more and more difficult, including a recently proposed tax penalty in North Carolina for parents whose children would like to vote while they’re in college. Ironically, the states which have some of the best voting practices today, Washington and Oregon, do impose a small fee of $0.46 or whatever the cost of postage is, but you’d think with all the money saved by administering a whole election by mail they could include the postage. We have it easy though; folks in many countries can be fined for not voting.

I bring up the relatively recent injustice of the poll tax because I think there are some parallels with fees imposed to access public data. This topic has been given a fair amount of attention this year following the loss of Aaron Swartz and demands to make all publicly funded scientific research available to the public. Yet there are still many tough questions and the Knight News Challenge on Open Government has definitely helped stir up the debate even more. This has been discussed by Matt McDonald and more recently with David Eaves’ post questioning the business model behind the DemocracyMap Knight News Challenge proposal.

I know straw man arguments against liberating government data and leveraging civic technology are in vogue these days, but I expected a little more from David Eaves’ TechPresident story. As David points out, we’ve known each other for a while and respect each other’s work, but that’s partly why I was so taken aback by what he wrote.

David writes about DemocracyMap because it’s the most viewed proposal of the Knight News Challenge semi-finalists and he’s fearful of it lacking a business model or being so corrupted and destructive with grant money that it will kill other solutions he sees as more viable like Azavea’s Cicero. He later goes on to talk about other projects he’s more optimistic about. I too care deeply about the sustainability of DemocracyMap which is exactly why it’s one of the only proposals that includes a section specifically devoted to sustainability and why it makes several mentions of business models and strategies for the success of the project including intentional obsolescence. The News Challenge didn’t ask anyone to include details on sustainability or business models, but I thought it was important enough to include anyway. What’s so confusing about David discussing DemocracyMap with fear is that none of the projects he goes on to discuss say much of anything about how they will be sustainable or what their business model will be. Furthermore, they all ask for more funding than the DemocracyMap proposal calls for, so I’m not sure why it’s the example he uses of being corruptible by grant money.

The other part that’s confusing about the post is how he pits DemocracyMap against Azavea Cicero. If David had talked to me before writing this he would’ve learned that I’ve been in touch with Robert at Azavea for a long time and he’s been supportive of DemocracyMap. In fact, I was meeting with someone who Robert had introduced me to when David posted this piece. While I have great respect for Robert’s work and Azavea and plan to continue coordinating with him, what’s troubling about the way David characterizes Cicero is that he assumes it’s already solving the problem and he assumes that it has a sustainable business model which should be an indicator of success. If I believed all those things to be true, I probably would not be doing this project, but the truth is there’s still a long way to go to solve this problem.

Perhaps it’s because of the way Cicero is shown as a product for sale that led David to think it was already solving the problem better than DemocracyMap. Yet there’s a wealth of easily available public data that’s not even included in Cicero’s results such as basic city and county contact information published by the Census Government Integrated Directory. DemocracyMap doesn’t just aim to cover thousands more cities than Cicero, it already does. The same could be said in comparing DemocracyMap to VoteSmart or many of the other services that are called out in the proposal.

The other conclusion that David jumped to is that Cicero is already sustainable, but as I knew from talking to Robert privately and as he later made public in his comment, that’s not true either. Just because something has a for-profit business model does not mean that it’s a sustainably viable solution. This is very much what I was alluding to in my proposal by emphasizing that I didn’t want to repeat the history of the efforts that had failed to make a business by charging for this data. Even more confusingly, David later left a comment claiming that my proposal never stated this, but it was there all along.

While David explicitly thinks it’s dangerous when “success can be seen as external from sustainability” I actually think it’s very important to think of them separately. They are certainly interrelated, but it’s helpful to think of them distinctly since it’s often counterproductive if they are too deeply intertwined. In fact, I would argue that this more nuanced way of thinking is also in play at Azavea, which is why Cicero continued to operate even at a loss and why the company is a B Corporation. In fact, this distinction is often consciously recognized in the civic sector even for efforts which are good at bringing in revenue. I suspect this is also why Matt McDonald expressed his interest in establishing a B Corporation and why even business-savvy outfits like TurboVote are set up as non-profits. This isn’t to say that sustainability or profitability are bad, quite the contrary, but it is important to recognize that neither equates to successfully solving your problem. In fact, too much of a blind drive toward profit can actually make it harder to be successful. We even see this with big for-profit companies where too much focus on short term gains can hamper long term profitability. Increasingly, even musicians and writers are finding that if they make their work more easily available, even freely available, they’re more likely to be successful and even more profitable in the long run. Traditionally, news publications have tended to lose money on the best investigative reporting they do, and we definitely need to keep working on creative ways to support that, but simply basing success on the profit of each and every story is not a recipe for good journalism or a good company.

In the context of democracy you might also consider that the folks in the US who see the market as the tool to fix every problem and the only true indicator of success are often the same kinds of folks who are making it harder for people to engage in democratic processes, even resorting to, you guessed it, economic pressure like new taxes to make it harder for people to vote.

This isn’t to say that David doesn’t recognize the folly of overwhelming financial influence. In fact he clearly states, “The key problem with money – particularly grant money – is that it can distort a problem and create the wrong incentives,” which makes it all the more confusing why he also argues for such a simplistic profit-driven approach for public access to public data. To be fair, in this case he’s concerned by the threat of unseating an incumbent and the risk of destroying the whole market by not being a sustainable replacement. To his credit, and as Robert later elaborates, this is a totally lucid point. In the private sector, profit-driven ventures tend to condone more risk because they often care more about the possibility of turning a profit than the chances of hurting their whole industry. Social entrepreneurs and grant makers on the other hand have to be much more discerning and have a broader understanding of their field if they genuinely care more about solving the problem than turning a profit. However, in this scenario this isn’t a particularly valid concern since neither the incumbent he cites nor any of the others I cite in my proposal are particularly sustainable or fully solving the problem. If David had done a little research, this would’ve been obvious. Furthermore, if this kind of concern isn’t thought out more carefully it has the potential of being even more counterproductive by simply maintaining the status quo rather than striving for progress.

I had to re-read David’s use of the word “disruption” a few times because I’m so accustomed to seeing it used in a positive light, particularly in the context of new technology. The Code for America Accelerator runs under the banner of “Disruption as a Public Service” and Emer Coleman, the former Deputy Director of Digital Engagement for the UK’s Government Digital Service, has a new company called Disruption Ltd. While it’s true that there are some rare instances where a new company or project can be so destructive that it ruins the whole field including itself, the public sector is littered with stagnant, inefficient, unproductive systems that are in much need of disruption. In this context, the traditional “sustainability” of current offerings is often counterproductive, which is also why efforts to challenge those offerings are so important. As new software becomes cheaper and easier to develop, it becomes easier to see how many companies that profit from government inefficiencies are actually stymying progress. As was mentioned earlier, “money can distort a problem and create the wrong incentives.” The lobbying efforts of Intuit and H&R Block against “return-free filing” are a potent reminder of this, and if you need a refresher on the crippling consequences of money on the broader workings of a democracy, I encourage you to see Lawrence Lessig’s latest talk to remind yourself how much work still needs to be done.

The trick is to position the incentives associated with sustainability in a way that provides the most leverage toward progress and the common good. The smartest and most successful companies tend to put their profit driven incentives at a place that forces them to make the most progress and deliver the best products and services in a way that advances their whole industry. Often this means disrupting the status quo and sometimes you are the status quo and you have to cannibalize your own company to move forward. Even Apple’s advancements with the iPad killed their own laptop sales, but it helped advance technology for everyone and ultimately delivered higher sales for Apple.

In the case of charging for access to public data, it’s not only ethically questionable, but it’s counterproductive and usually unsustainable. The most common ethical quandary in charging for public data is that you are making people pay for data their tax dollars already paid for. In the case of DemocracyMap we’re also talking about obstructing access to some of the most essential information needed for us to interact with our own democracy and essential government services, hence the reference to the poll tax earlier. When these kinds of sensitive ethical issues are less applicable to a particular dataset, I can understand the approach of companies that initially bootstrap themselves by selling access to data. I think Brightscope is a common example of this. However, I think it’s risky and unsustainable to build a business on access to the data alone. For one thing, scraping public data rarely involves much ingenuity or creativity; it’s usually more of a brute force thing. This means your competitors rarely have much of a barrier to entry. For another thing, the real value of data to the people who actually need it is typically not realized until it’s meaningfully analyzed or given enough context to be relevant to them. The final point is that governments increasingly understand the value of opening their data and have the potential to undercut you with free access.

Data is not a zero-sum resource like a parking spot; its value tends to increase when more people have access to it. This is true even in the sense of delivering more revenue to those who provide the data. For example, making public transit data freely available can increase ridership and improve support for public funding, both of which can increase revenue to the transit agency. Governments are learning this and starting to make their data open by default. You don’t want your whole business to be threatened by a simple policy change that’s becoming increasingly common. Furthermore, under US law, the facts that comprise most raw public data are not subject to copyright, so selling or licensing this data is dubious anyway. Again, the best and smartest companies are the ones who are always aware of these threats and advance themselves preemptively.

I would argue that the most stable and progressive way to position sustainability in the context of public data is at the extremes: the point where the data is produced and the point where it’s analyzed and contextualized, not with a pay wall at the point where it’s published. In the case of DemocracyMap, I think it’s important to focus on the root of the problem and work to ensure that this data is managed and published at the source in the most accessible and useful way possible. While DemocracyMap already provides basic tools to contextualize the data and will likely develop even more advanced ones, the main intent is to help ensure the conditions for an ecosystem where everyone can help play that role, particularly journalists and civic hackers. In some ways this may take the form of providing support and software as a service to cities, states, and other entities who manage this data internally, but in other ways it may even be about convincing those who can set policy at a high level. Over twenty years ago, back in 1992, the US Census did actually collect and manage a significant portion of this data, but they haven’t since. As the Census becomes more and more digital, I would love to see them better incorporate the goals of DemocracyMap. One of the most scalable ways to make this data more accessible is by establishing open standards much like I’ve done with Open311 and that is definitely emphasized in the proposal. So while I think DemocracyMap can help deliver revenue generating tools that are used to produce and maintain this data, one of the central goals is to make the current practice of scraping and manually aggregating data obsolete.

I do think that sustainably minded efforts tend to deliver the best results, but it’s also important to consider that some efforts are best served when they are made obsolete. It’s also worth noting how much leverage an investment in a few engineers can have even when there’s no revenue model whatsoever. The Voting Information Project has made a huge impact with just a handful of engineers and an investment that I’m sure pales in comparison to the money flowing toward the dozens of voter suppression laws that have been introduced this year. Carl Malamud deserves recognition here as well. If he had simply turned access to EDGAR into a business of his own back in 1994 rather than making it freely available and ultimately getting the SEC to do so themselves, then we might not even have the momentum behind liberating government data that we have today. Carl continues to have a big impact with this strategy, now primarily focusing on liberating access to legal documents such as the legal code for the District of Columbia. The D.C. Code, the law which governs D.C., is, just like many other municipal codes, one for which you traditionally had to pay to get a copy. Normally this was sold for over $800, but after Carl made it freely available online the District government was compelled to do so as well. Charging for this is particularly egregious because it’s not like most data, which is the byproduct of some government operation or policy; it is the law itself, and multiple court cases have already made it clear that the law must be freely available. In the grand scheme of things I don’t think it costs a whole lot to support the kind of work Carl and the VIP are doing, and I think these catalysts are well worth the investment to ensure that people don’t have to pay extra for civic education and civic engagement.

The Biggest Failure of Open Data in Government

Many open data initiatives forget to include the basic facts about the government itself

In the past few years we’ve seen a huge shift in the way governments publish information. More and more governments are proactively releasing information as raw open data rather than simply putting out reports or responding to requests for information. This has enabled all sorts of great tools like the ones that help us find transportation or the ones that let us track the spending and performance of our government. Unfortunately, somewhere in this new wave of open data we forgot some of the most fundamental information about our government, the basic “who”, “what”, “when”, and “where”.

Do you know all the different government bodies and districts that you’re a part of? Do you know who all your elected officials are? Do you know where and when to vote or when the next public meeting is? Now perhaps you’re thinking that this information is easy enough to find, so what does this have to do with open data? It’s true, it might not be too hard to learn about the highest office or who runs your city, but it usually doesn’t take long before you get lost down the rabbit hole. Government is complex, particularly in America where there can be a vast multitude of government districts and offices at the local level.

It’s difficult enough to come by comprehensive information about local government, so there definitely aren’t many surveys that help convey this problem, but you can start to get the idea from a pretty high level. Studies have shown that only about two thirds of Americans can name their governor (Pew 2007) while less than half can name even one of their senators (Social Capital Community Survey 2006). This excerpt from Andrew Romano in Newsweek captures the problem well:

Most experts agree that the relative complexity of the U.S. political system makes it hard for Americans to keep up. In many European countries, parliaments have proportional representation, and the majority party rules without having to “share power with a lot of subnational governments,” notes Yale political scientist Jacob Hacker, coauthor of Winner-Take-All Politics. In contrast, we’re saddled with a nonproportional Senate; a tangle of state, local, and federal bureaucracies; and near-constant elections for every imaginable office (judge, sheriff, school-board member, and so on). “Nobody is competent to understand it all, which you realize every time you vote,” says Michael Schudson, author of The Good Citizen. “You know you’re going to come up short, and that discourages you from learning more.”

How can we have a functioning democracy when we don’t even know the local government we belong to or who our democratically elected representatives are? It’s not that Americans are simply too ignorant or apathetic to know this information; the system of government really is complex. With what often seems like chaos on the national stage it can be easy to think of local government as simple, yet that’s rarely the case. There are about 35,000 municipal governments in the US, but when you count all the other local districts there are nearly 90,000 government bodies (US Census 2012) with a total of more than 500,000 elected officials (US Census 1992). The average American might struggle to name their representatives in Washington D.C., but that’s just the tip of the iceberg. A single person can easily belong to 15 government districts with more than 50 elected officials representing them.

We overlook the fact that it’s genuinely difficult to find information about all our levels of government. We unconsciously assume that this information is published on some government website well enough that we don’t need to include it as part of any kind of open data program. Even the cities that have been very progressive with open data like Washington DC and New York neglect to publish basic information like the names and contact details of their city councilmembers as raw open data. The NYC Green Book was finally posted online last year, but it’s still not available as raw data. Even in the broader open data and open government community, this information doesn’t get much attention. The basic contact details for government offices and elected officials were not part of the Open Data Census and neither were jurisdiction boundaries for government districts.

Fortunately, a number of projects have started working to address this discrepancy. In the UK, there’s already been great progress with websites like OpenlyLocal, TheyWorkForYou and MapIt, but similar efforts in North America are much more nascent. OpenNorth Represent has quickly become the most comprehensive database of Canadian elected officials with data that covers about half the population and boundary data that covers nearly two thirds. In the US, the OpenStates project has made huge progress in providing comprehensive coverage of the roughly 7,500 state legislators across the country while the Voting Information Project has started to provide comprehensive open data on where to vote and what’s on the ballot – some of the most essential yet most elusive data in our democracy. Most recently, DemocracyMap has been digging in at the local level, building off the data from the OpenStates API and the Sunlight Congress API and deploying an arsenal of web scrapers to provide the most comprehensive open dataset of elected officials and government boundaries in the US. The DemocracyMap API currently includes over 100,000 local officials, but it still needs a lot more data for complete coverage. In order to scale, many of these projects have taken an open source community-driven approach where volunteers are able to contribute scrapers to unlock more data, but many of us have also come to realize that we need data standards so we can work together better and so our governments can publish data the right way from the start.

James McKinney from OpenNorth has already put a lot of work into the Popolo Project, an initial draft of data standards to cover some of the most basic information about government like people and their offices. More recently James also started a W3C Open Government Community Group to help develop these standards with others working in this field. In the coming months I hope to see a greater convergence of these efforts so we can agree on basic standards and begin to establish a common infrastructure for defining and discovering who and what our government is. Imagine an atlas for navigating the political geography of the world from the international offices to those in the smallest neighborhood councils.
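To make the case for shared standards a little more concrete, here is a minimal sketch of what happens when two sources describe the same official in different shapes. Everything in it is an assumption for illustration: the `OfficialRecord` fields, the two source layouts, and the function names are hypothetical, and this is not the Popolo schema or the format used by any of the projects mentioned above.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class OfficialRecord:
    """A hypothetical common shape for one elected official (illustrative only)."""
    name: str
    office: str          # e.g. "City Council Member"
    jurisdiction: str    # the government body or district served
    email: str = ""


def from_city_scraper(row: dict) -> OfficialRecord:
    # Hypothetical source A: a scraped city page with its own column names.
    return OfficialRecord(
        name=row["full_name"].strip(),
        office=row["title"].strip(),
        jurisdiction=row["city"].strip(),
        email=row.get("email", "").strip().lower(),
    )


def from_state_feed(row: dict) -> OfficialRecord:
    # Hypothetical source B: a state feed that splits the name into two fields.
    return OfficialRecord(
        name=f'{row["first"].strip()} {row["last"].strip()}',
        office=row["role"].strip(),
        jurisdiction=row["district"].strip(),
        email=row.get("contact_email", "").strip().lower(),
    )


# Made-up sample rows describing the same person in two different layouts.
city_row = {
    "full_name": " Jane Doe ",
    "title": "City Council Member",
    "city": "Springfield",
    "email": "JDoe@example.org",
}
state_row = {
    "first": "Jane",
    "last": "Doe",
    "role": "City Council Member",
    "district": "Springfield",
    "contact_email": "jdoe@example.org",
}

# Once both rows are normalized to the same shape, duplicates collapse in a set.
records = {from_city_scraper(city_row), from_state_feed(state_row)}
print(json.dumps([asdict(r) for r in records], indent=2))
```

Each extra source needs its own adapter like the two above. If governments published this information in an agreed format from the start, that per-source translation work would mostly disappear.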

This problem is so basic that most people are shocked when they realize it hasn’t been solved yet. It’s one of the most myopic aspects of the open government movement. Fortunately we are now making significant progress, but we need all the support we can get: scraping more data, establishing standards, and convincing folks like the Secretaries of State in many US states that we need to publish all boundaries and basic government contact information as open data. If you’re starting a new open data program, please don’t forget about the basics!

DemocracyMap is a submission for the Knight News Challenge. You can read the full proposal and provide feedback on the Knight News Challenge page.