The News Challenge as an Open Process

Over the past few weeks there’s been some really valuable discussion about the proposal for DemocracyMap in the Knight News Challenge. Mostly this emerged from David Eaves’ piece in TechPresident which questioned the sustainability of the project, but I also wrote a guest post for the Open Knowledge Foundation blog describing the larger context that DemocracyMap exists within.

My last post responded to David’s concerns and attempted to dig deeper into some of the challenging questions surrounding the sustainability of civic technology projects. A number of tweets distilled what I wrote, such as TechPresident’s Nick Judd saying, “Some things – like who your lawmakers are – you just shouldn’t have to pay to find out.” Philadelphia’s Chief Data Officer Mark Headd referred to it as “On the importance of access to public data and the dangers of over zealous contrarianism.” And I think David Moore from the Participatory Politics Foundation might be writing something too. The Sunlight Foundation’s Tom Lee went even further and did a great job of refining and expanding on my thoughts with a post of his own titled Making Open Government Data Sustainable, and TechPresident continued to discuss the project during a podcast with Micah Sifry, David Eaves, and Nick Judd.

I was a little frustrated to listen to the podcast and hear David continue to claim that the DemocracyMap proposal makes no mention of other efforts in this space.  In reality, the proposal explicitly lists some it sees as having failed, links to a page with an even more complete list, and also mentions some of the other organizations working in this space which I hope to continue collaborating with.

It’s worth noting that one of the organizations doing the best work in this space is OpenNorth, which is working on OpenNorth Represent, a Canadian counterpart to much of the work surrounding DemocracyMap. David Eaves is on the board of OpenNorth, so his argument that something like Azavea Cicero should be left to itself rather than be threatened or complemented by others, even non-profit, grant-driven endeavors, is even more confusing when you consider that he makes no mention of a similar project he’s associated with. In any case, I’ve been in discussion with the folks at OpenNorth for a while and, as mentioned in my proposal, I hope to work with them and others more closely as this project moves forward. A huge focus of community collaboration in this space is around standards, and OpenNorth has been a leader there. Standards are the only way we’ll be able to source data at scale and collaborate with one another effectively. See Popolo and Open Civic Data.

It’s true that I could have elaborated more on the successes and failures of other efforts and better articulated how DemocracyMap would be different, but at this stage I thought that the mentions were sufficient. Hopefully my earlier post on the OKFN blog and my response to David help provide some additional context around those topics, but I also hope to have the chance to elaborate further as the Knight News Challenge continues. This ongoing process brings me to something David discussed in the Tech President podcast that I think deserves much more recognition and consideration than it typically gets: one of the best aspects of the Knight News Challenge is the open process of developing and refining the proposals with multiple stages of public feedback and iterative improvements. In fact, I would go even further and say that the process of driving the creation of proposals is one of the most valuable things that the News Challenge does.

Hopefully it’s not too radical to claim that if you considered all the proposals submitted to the News Challenge that don’t make it as final grant recipients as a collective whole, their value would at least come close to equaling that which comes out of funding just one of the final projects. Obviously this is a hard thing to compare considering the creativity, critical thinking, and prototyping dispersed across those 800+ other proposals, but the News Challenge does compel a lot of thinking and making around all these projects. Even just dollar for dollar, I wonder if the Knight Foundation generates more useful stuff from all the proposals as a whole than it does from the final grantees. In fact, Knight may already be starting to think along these lines, as evidenced by new efforts like the Knight Prototype Fund.

There are a number of parallels here with app contests and prize-driven challenges in the context of government innovation. App contests have been a big part of the open government space for the past five years. I first started paying attention to them with Washington DC’s Apps for Democracy. In fact, the Open311 effort emerged out of DC’s second round of the Apps for Democracy contest. There was a lot of hype around app contests for a few years and while some are still going strong, there seems to be consensus that app contests don’t deliver as much value as people initially expect. Often this is because the contests are framed as delivering all sorts of useful new apps that will be sustained on their own, but that’s rarely the case. The only way for an app contest to be successful in the long term is to carefully factor in sustainability, whether that means helping to facilitate introductions to investors, helping establish partnerships with relevant NGOs, or simply having the government directly invest in ongoing development of the app. Even if sustainability isn’t totally factored in, app contests can generate a lot of great ideas and useful prototypes, but expectations should be set appropriately about the long term value that will come out of those.

In the case of the News Challenge, the intent is a little different because there’s such a singular focus on using the contest to narrow the proposals down to the final grant recipients. Yet I think there’s something to be said about considering the value of all the other proposals, especially given the point David emphasized: the process of public feedback really helps refine and iterate on the ideas. In fact, I think this is often more worthwhile than the app contest model since it can help guide an idea before a whole lot of work has been invested in building the app. In other words it can force you to articulate your hypothesis, demonstrate demand, and do something more lightweight like paper prototyping before real development. There’s a lot that could be done to better support all the proposals beyond the finalists, but I think the bare minimum that Knight could do is avoid outright discarding them.

While it’s true that the proposals from past News Challenges have been archived and are available online, none of this was done from the beginning with any consideration for permanent links. This means that the thousands of tweets and blog posts discussing the proposals are all broken each time the News Challenge starts anew. This was particularly noticeable last year because there were successive rounds of the News Challenge just a few months after one another, so it was really easy to see how quickly all the links to the proposals were getting broken. I’ve managed to assemble links to all the previous News Challenge submissions for DemocracyMap and it’s true that a Google search should still be effective in finding archived proposals, but all those links from social media and other sites (and with them all their Google page rank) are broken. This isn’t a hard problem to fix. For example, each year the News Challenge process could be hosted at a new subdomain like 2013.newschallenge.org which would still make it possible to use another website or platform to host the process as they did with Tumblr last year.
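To make the subdomain idea a bit more concrete, here is a minimal Python sketch of how links on the main domain could be rewritten to a year-scoped address once a round closes. The function name `archive_url`, the default base domain, and the example path are assumptions made up for illustration; this is not a description of how the News Challenge site is actually built.

```python
from urllib.parse import urlsplit, urlunsplit


def archive_url(url: str, year: int, base_domain: str = "newschallenge.org") -> str:
    """Rewrite a proposal URL on the bare or www domain to a year-scoped subdomain.

    Hypothetical helper: URLs that are already year-scoped, or that live on
    another site entirely, are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host not in (base_domain, f"www.{base_domain}"):
        return url
    # Keep the scheme, path, query string, and fragment; only the host changes.
    return urlunsplit(
        (parts.scheme, f"{year}.{base_domain}", parts.path, parts.query, parts.fragment)
    )


# Example with an invented proposal path:
print(archive_url("https://www.newschallenge.org/open/entries/democracymap/", 2013))
```

A web server could issue a permanent redirect from the old address to whatever this returns, so tweets and blog posts pointing at a proposal would keep working after the next round starts.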

Another simple strategy to get better long term value out of the proposals is to encourage each one to include enough information for people to follow the project and stay engaged even if it doesn’t make it through as a finalist. I’d completely forgotten to include these basic details until advised by Ben Sheldon. Ben had experienced this problem with the Pepsi Challenge where people invested a lot in community engagement around the short-term process of the contest and ended up without a way to continue the engagement when the contest was over. This is particularly important if you submit a project to multiple contests over time and want to have a central place where you can continue to engage and grow your community even if the project isn’t selected within each individual contest.

How else has the kind of public process leveraged by the News Challenge helped drive the creation or evolution of new projects and what else could be done to better support that on into the future?

Poll Taxes and Paying for Public Data

Poll Tax Receipt from 1964

Image courtesy of Brian D. Newby

Would you like to contact your city council member? That will cost $5.00. Would you like to vote in the general election? That will cost you $15.00. Would you like to attend a public meeting? Admission is $25.00.

There are certain elements of our democratic system of government that are so essential to its freedoms and principles that we have to make them as accessible as possible and provide them free of charge. Voting is probably the most crucial example of this, which is why the poll tax of the Jim Crow era was made unconstitutional in 1964 by the 24th Amendment. Unfortunately 1964 is not very long ago and there continue to be efforts to make essential acts like voting more and more difficult, including a recently proposed tax penalty in North Carolina for parents whose children would like to vote while they’re in college. Ironically, the states which have some of the best voting practices today, Washington and Oregon, do impose a small fee of $0.46 or whatever the cost of postage is, but you’d think with all the money saved by administering a whole election by mail they could include the postage. We have it easy though: folks in many countries can be fined for not voting.

I bring up the relatively recent injustice of the poll tax because I think there are some parallels with fees imposed to access public data. This topic has been given a fair amount of attention this year following the loss of Aaron Swartz and demands to make all publicly funded scientific research available to the public. Yet there are still many tough questions and the Knight News Challenge on Open Government has definitely helped stir up the debate even more. This has been discussed by Matt McDonald and more recently with David Eaves’ post questioning the business model behind the DemocracyMap Knight News Challenge proposal.

I know straw man arguments against liberating government data and leveraging civic technology are in vogue these days, but I expected a little more from David Eaves’ TechPresident story. As David points out, we’ve known each other for a while and respect each other’s work, but that’s partly why I was so taken aback by what he wrote.

David writes about DemocracyMap because it’s the most viewed proposal of the Knight News Challenge semi-finalists and he’s fearful of it lacking a business model or being so corrupted and destructive with grant money that it will kill other solutions he sees as more viable like Azavea’s Cicero. He later goes on to talk about other projects he’s more optimistic about. I too care deeply about the sustainability of DemocracyMap which is exactly why it’s one of the only proposals that includes a section specifically devoted to sustainability and why it makes several mentions of business models and strategies for the success of the project including intentional obsolescence. The News Challenge didn’t ask anyone to include details on sustainability or business models, but I thought it was important enough to include anyway. What’s so confusing about David discussing DemocracyMap with fear is that none of the projects he goes on to discuss say much of anything about how they will be sustainable or what their business model will be. Furthermore, they all ask for more funding than the DemocracyMap proposal calls for, so I’m not sure why it’s the example he uses of being corruptible by grant money.

The other part that’s confusing about the post is how he pits DemocracyMap against Azavea Cicero. If David had talked to me before writing this he would’ve learned that I’ve been in touch with Robert at Azavea for a long time and he’s been supportive of DemocracyMap. In fact, I was meeting with someone who Robert had introduced me to when David posted this piece. While I have great respect for Robert’s work and Azavea and plan to continue coordinating with him, what’s troubling about the way David characterizes Cicero is that he assumes it’s already solving the problem and he assumes that it has a sustainable business model which should be an indicator of success. If I believed all those things to be true, I probably would not be doing this project, but the truth is there’s still a long way to go to solve this problem.

Perhaps it’s because of the way Cicero is shown as a product for sale that led David to think it was already solving the problem better than DemocracyMap. Yet there’s a wealth of easily available public data that’s not even included in Cicero’s results such as basic city and county contact information published by the Census Government Integrated Directory. DemocracyMap doesn’t just aim to cover thousands more cities than Cicero, it already does. The same could be said in comparing DemocracyMap to VoteSmart or many of the other services that are called out in the proposal.

The other conclusion that David jumped to is that Cicero is already sustainable, but as I knew from talking to Robert privately and as he later made public in his comment, that’s not true either. Just because something has a for-profit business model does not mean that it’s a sustainably viable solution. This is very much what I was alluding to in my proposal by emphasizing that I didn’t want to repeat the history of the efforts that had failed to make a business by charging for this data. Even more confusingly, David later left a comment claiming that my proposal never stated this, but it was there all along.

While David explicitly thinks it’s dangerous when “success can be seen as external from sustainability” I actually think it’s very important to think of them separately. They are certainly interrelated, but it’s helpful to think of them distinctly since it’s often counterproductive if they are too deeply intertwined. In fact, I would argue that this more nuanced way of thinking is also in play at Azavea, which is why Cicero continued to operate even at a loss and why the company is a B Corporation. This distinction is often consciously recognized in the civic sector even for efforts which are good at bringing in revenue. I suspect this is also why Matt McDonald expressed his interest in establishing a B Corporation or why even business savvy outfits like TurboVote are set up as non-profits. This isn’t to say that sustainability or profitability are bad, quite the contrary, but it is important to recognize that they don’t equate to successfully solving your problem. In fact, too much of a blind drive toward profit can actually make it harder to be successful. We even see this with big for-profit companies where too much focus on short term gains can hamper long term profitability. Increasingly, even musicians and writers are finding that if they make their work more easily available, even freely available, they’re more likely to be successful and even more profitable in the long run. Traditionally, news publications have tended to lose money on the best investigative reporting they do, and we definitely need to keep working on creative ways to support that, but simply basing success on the profit of each and every story is not a recipe for good journalism or a good company.

In the context of democracy you might also consider that the folks in the US who see the market as the tool to fix every problem and the only true indicator of success are often the same kinds of folks who are making it harder for people to engage in democratic processes, even resorting to, you guessed it, economic pressure like new taxes to make it harder for people to vote.

This isn’t to say that David doesn’t recognize the folly of overwhelming financial influence. In fact he clearly states, “The key problem with money – particularly grant money – is that it can distort a problem and create the wrong incentives.” That makes it all the more confusing that he also argues for such a simplistic profit-driven approach for public access to public data. To be fair, in this case he’s concerned by the threat of unseating an incumbent and the risk of destroying the whole market by not being a sustainable replacement. To his credit, and as Robert later elaborates, this is a totally lucid point. In the private sector, profit-driven ventures tend to condone more risk because they often care more about the possibility of turning a profit than the chances of hurting their whole industry. Social entrepreneurs and grant makers on the other hand have to be much more discerning and have a broader understanding of their field if they genuinely care more about solving the problem than turning a profit. However, in this scenario this isn’t a particularly valid concern since neither the incumbent he cites nor any of the others I cite in my proposal are particularly sustainable or fully solving the problem. If David had done a little research, this would’ve been obvious. Furthermore, if this kind of concern isn’t thought out more carefully it has the potential of being even more counterproductive by simply maintaining the status quo rather than striving for progress.

I had to re-read David’s use of the word “disruption” a few times because I’m so accustomed to seeing it used in a positive light, particularly in the context of new technology. The Code for America Accelerator runs under the banner of “Disruption as a Public Service” and Emer Coleman, the former Deputy Director of Digital Engagement for the UK’s Government Digital Service, has a new company called Disruption Ltd. While it’s true that there are some rare instances where a new company or project can be so destructive that it ruins the whole field including itself, the public sector is littered with stagnant, inefficient, unproductive systems that are much in need of disruption. In this context, the traditional “sustainability” of current offerings is often counterproductive – which is also why efforts like Procure.io are so important. As new software becomes cheaper and easier to develop, it becomes easier to see how many companies that profit from government inefficiencies are actually stymying progress. As was mentioned earlier, “money can distort a problem and create the wrong incentives.” The lobbying efforts of Intuit and H&R Block against “return-free filing” are a potent reminder of this, and if you need a refresher on the crippling consequences of money on the broader workings of a democracy, I encourage you to see Lawrence Lessig’s latest talk to remind yourself how much work still needs to be done.

The trick is to position the incentives associated with sustainability in a way that provides the most leverage toward progress and the common good. The smartest and most successful companies tend to put their profit driven incentives at a place that forces them to make the most progress and deliver the best products and services in a way that advances their whole industry. Often this means disrupting the status quo and sometimes you are the status quo and you have to cannibalize your own company to move forward. Even Apple’s advancements with the iPad killed their own laptop sales, but it helped advance technology for everyone and ultimately delivered higher sales for Apple.

In the case of charging for access to public data, it’s not only ethically questionable, but it’s counterproductive and usually unsustainable. The most common ethical quandary in charging for public data is that you are making people pay for data their tax dollars already paid for. In the case of DemocracyMap we’re also talking about obstructing access to some of the most essential information needed for us to interact with our own democracy and essential government services – hence the reference to the poll tax earlier. When these kinds of sensitive ethical issues are less applicable to a particular dataset, I can understand the approach of companies that initially bootstrap themselves by selling access to data. I think Brightscope is a common example of this. However, I think it’s risky and unsustainable to build a business on access to the data alone. For one thing, scraping public data rarely involves much ingenuity or creativity; it’s usually more of a brute force thing. This means your competitors rarely have much of a barrier to entry. For another thing, the real value of data to the people who actually need it is typically not realized until it’s meaningfully analyzed or given enough context to be relevant to them. The final point is that governments increasingly understand the value of opening their data and have the potential to undercut you with free access.

Data is not a zero-sum resource like a parking spot; its value tends to increase when more people have access to it. This is true even in the sense of delivering more revenue to those who provide the data. For example, making public transit data freely available can increase ridership and improve support for public funding, both of which can increase revenue to the transit agency. Governments are learning this and starting to make their data open by default. You don’t want your whole business to be threatened by a simple policy change that’s becoming increasingly common. Furthermore, under US law, the facts that comprise most raw public data are not subject to copyright, so selling or licensing this data is dubious anyway. Again, the best and smartest companies are the ones who are always aware of these threats and advance themselves preemptively.

I would argue that the most stable and progressive way to position sustainability in the context of public data is at the extremes: the point where the data is produced and the point where it’s analyzed and contextualized, not with a pay wall at the point where it’s published. In the case of DemocracyMap, I think it’s important to focus on the root of the problem and work to ensure that this data is managed and published at the source in the most accessible and useful way possible. While DemocracyMap already provides basic tools to contextualize the data and will likely develop even more advanced ones, the main intent is to help ensure the conditions for an ecosystem where everyone can help play that role, particularly journalists and civic hackers. In some ways this may take the form of providing support and software as a service to cities, states, and other entities who manage this data internally, but in other ways it may even be about convincing those who can set policy at a high level. Over twenty years ago, back in 1992, the US Census did actually collect and manage a significant portion of this data, but they haven’t since. As the Census becomes more and more digital, I would love to see them better incorporate the goals of DemocracyMap. One of the most scalable ways to make this data more accessible is by establishing open standards much like I’ve done with Open311 and that is definitely emphasized in the proposal. So while I think DemocracyMap can help deliver revenue generating tools that are used to produce and maintain this data, one of the central goals is to make the current practice of scraping and manually aggregating data obsolete.

I do think that sustainably minded efforts tend to deliver the best results, but it’s also important to consider that some efforts are best served when they are made obsolete. It’s also worth noting how much leverage an investment in a few engineers can have even when there’s no revenue model whatsoever. The Voting Information Project has made a huge impact with just a handful of engineers and an investment that I’m sure pales in comparison to the money flowing toward the dozens of voter suppression laws that have been introduced this year. Carl Malamud deserves recognition here as well. If he had simply turned access to EDGAR into a business of his own back in 1994 rather than making it freely available and ultimately getting the SEC to do so themselves, then we might not even have the momentum behind liberating government data that we have today. Carl continues to have a big impact with this strategy, now primarily focusing on liberating access to legal documents such as the legal code for the District of Columbia. The D.C. Code, the law which governs D.C., is like many other municipal codes in that you traditionally had to pay to get a copy. Normally this was sold for over $800, but after Carl made it freely available online the District government was compelled to do so as well. Charging for this is particularly egregious because it’s not like most data, which is the byproduct of some government operation or policy; it is the law itself, and multiple court cases have already made it clear that the law must be freely available. In the grand scheme of things I don’t think it costs a whole lot to support the kind of work Carl and the VIP are doing and I think these catalysts are well worth the investment to ensure that people don’t have to pay extra for civic education and civic engagement.

The Biggest Failure of Open Data in Government

Many open data initiatives forget to include the basic facts about the government itself

In the past few years we’ve seen a huge shift in the way governments publish information. More and more governments are proactively releasing information as raw open data rather than simply putting out reports or responding to requests for information. This has enabled all sorts of great tools like the ones that help us find transportation or the ones that let us track the spending and performance of our government. Unfortunately, somewhere in this new wave of open data we forgot some of the most fundamental information about our government, the basic “who”, “what”, “when”, and “where”.

Do you know all the different government bodies and districts that you’re a part of? Do you know who all your elected officials are? Do you know where and when to vote or when the next public meeting is? Now perhaps you’re thinking that this information is easy enough to find, so what does this have to do with open data? It’s true, it might not be too hard to learn about the highest office or who runs your city, but it usually doesn’t take long before you get lost down the rabbit hole. Government is complex, particularly in America where there can be a vast multitude of government districts and offices at the local level.

It’s difficult enough to come by comprehensive information about local government, so there definitely aren’t many surveys that help convey this problem, but you can start to get the idea from a pretty high level. Studies have shown that only about two thirds of Americans can name their governor (Pew 2007) while less than half can name even one of their senators (Social Capital Community Survey 2006). This excerpt from Andrew Romano in Newsweek captures the problem well:

Most experts agree that the relative complexity of the U.S. political system makes it hard for Americans to keep up. In many European countries, parliaments have proportional representation, and the majority party rules without having to “share power with a lot of subnational governments,” notes Yale political scientist Jacob Hacker, coauthor of Winner-Take-All Politics. In contrast, we’re saddled with a nonproportional Senate; a tangle of state, local, and federal bureaucracies; and near-constant elections for every imaginable office (judge, sheriff, school-board member, and so on). “Nobody is competent to understand it all, which you realize every time you vote,” says Michael Schudson, author of The Good Citizen. “You know you’re going to come up short, and that discourages you from learning more.”

How can we have a functioning democracy when we don’t even know the local government we belong to or who our democratically elected representatives are? It’s not that Americans are simply too ignorant or apathetic to know this information, it’s that the system of government really is complex. With what often seems like chaos on the national stage it can be easy to think of local government as simple, yet that’s rarely the case. There are about 35,000 municipal governments in the US, but when you count all the other local districts there are nearly 90,000 government bodies (US Census 2012) with a total of more than 500,000 elected officials (US Census 1992). The average American might struggle to name their representatives in Washington D.C., but that’s just the tip of the iceberg. They can easily belong to 15 government districts with more than 50 elected officials representing them.

We overlook the fact that it’s genuinely difficult to find information about all our levels of government. We unconsciously assume that this information is published on some government website well enough that we don’t need to include it as part of any kind of open data program. Even the cities that have been very progressive with open data like Washington DC and New York neglect to publish basic information like the names and contact details of their city councilmembers as raw open data. The NYC Green Book was finally posted online last year, but it’s still not available as raw data. Even in the broader open data and open government community, this information doesn’t get much attention. The basic contact details for government offices and elected officials were not part of the Open Data Census and neither were jurisdiction boundaries for government districts.

Fortunately, a number of projects have started working to address this discrepancy. In the UK, there’s already been great progress with websites like OpenlyLocal, TheyWorkForYou and MapIt, but similar efforts in North America are much more nascent. OpenNorth Represent has quickly become the most comprehensive database of Canadian elected officials with data that covers about half the population and boundary data that covers nearly two thirds. In the US, the OpenStates project has made huge progress in providing comprehensive coverage of the roughly 7,500 state legislators across the country while the Voting Information Project has started to provide comprehensive open data on where to vote and what’s on the ballot – some of the most essential yet most elusive data in our democracy. Most recently, DemocracyMap has been digging in at the local level, building off the data from the OpenStates API and the Sunlight Congress API and deploying an arsenal of web scrapers to provide the most comprehensive open dataset of elected officials and government boundaries in the US. The DemocracyMap API currently includes over 100,000 local officials, but it still needs a lot more data for complete coverage. In order to scale, many of these projects have taken an open source community-driven approach where volunteers are able to contribute scrapers to unlock more data, but many of us have also come to realize that we need data standards so we can work together better and so our governments can publish data the right way from the start.

James McKinney from OpenNorth has already put a lot of work into the Popolo Project, an initial draft of data standards to cover some of the most basic information about government like people and their offices. More recently James also started a W3C Open Government Community Group to help develop these standards with others working in this field. In the coming months I hope to see a greater convergence of these efforts so we can agree on basic standards and begin to establish a common infrastructure for defining and discovering who and what our government is. Imagine an atlas for navigating the political geography of the world from the international offices to those in the smallest neighborhood councils.
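To illustrate why a shared standard matters for scraper-driven projects, here is a minimal Python sketch that maps the output of two differently shaped sources onto one common record and then merges them. Everything in it is an assumption made up for illustration: the `Official` record, its field names, the two input formats, and the helper functions. It is only loosely inspired by the idea behind efforts like Popolo and does not reproduce the Popolo schema or the DemocracyMap code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Official:
    """Hypothetical common record; not the Popolo schema."""
    name: str
    office: str
    jurisdiction: str
    email: Optional[str] = None


def from_city_scraper(row: dict) -> Official:
    # Invented source format: one "full_name" field, office stored as "title".
    return Official(
        name=row["full_name"].strip(),
        office=row["title"],
        jurisdiction=row["city"],
        email=row.get("email") or None,
    )


def from_state_list(row: dict) -> Official:
    # Invented source format: split name fields, office stored as "position".
    return Official(
        name=f'{row["first"]} {row["last"]}'.strip(),
        office=row["position"],
        jurisdiction=row["municipality"],
        email=row.get("contact_email") or None,
    )


def merge_officials(*batches: Iterable[Official]) -> List[Official]:
    """Combine batches, keeping the first record seen for each person/office/place."""
    seen = {}
    for batch in batches:
        for official in batch:
            key = (
                official.name.lower(),
                official.office.lower(),
                official.jurisdiction.lower(),
            )
            seen.setdefault(key, official)
    return list(seen.values())
```

Once every scraper emits the same shape, combining sources becomes a few lines rather than a custom mapping for every pair of projects, and a government that publishes in that shape from the start removes the need for the scraper altogether.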

This is a problem that is so basic that most people are shocked when they realize it hasn’t been solved yet. It’s one of the most myopic aspects of the open government movement. Fortunately we are now making significant progress, but we need all the support we can get: scraping more data, establishing standards, and convincing folks like the Secretaries of State in many US States that we need to publish all boundaries and basic government contact information as open data. If you’re starting a new open data program, please don’t forget about the basics!

DemocracyMap is a submission for the Knight News Challenge. You can read the full proposal and provide feedback on the Knight News Challenge page.