Nathan Cortez


The federal government currently publishes 196,284 searchable databases online, a number of which include information about private parties that is negative or unflattering in some way. Federal agencies increasingly publish adverse data not just to inform the public or promote transparency, but to pursue regulatory ends: to change the underlying behavior being reported. Such "regulation by database" has become a preferred method of regulation in recent years, despite scant attention from policymakers, courts, or scholars to its appropriate uses and safeguards.

This Article evaluates the aspirations and burdens of regulation by database. Based on case studies of six important data sets (published by the CFPB, CPSC, EPA, FEC, FDA, and Medicare), the Article proposes what I call "Good Government Data Practices" to ensure that databases are reliable, useful, and fair. More optimal data disclosures require careful design choices that consider both data inputs and outputs, including how to gather and process data, how to characterize them, and how to present them. The Article envisions a decidedly modern role for government agencies as data "stewards" rather than as mere publishers or repositories.

Agency databases have proliferated on the belief that markets, regulation, and even democracy all require transparency, that sunlight is the best disinfectant. But as transparency has moved online, becoming more pervasive, more powerful, and more burdened with regulatory dimensions, we also must recognize that sunlight can blind or even burn. It is in this spirit that I call for policymakers to embrace the government's role as a data "steward," a sentinel that helps maximize the quality of data inputs and outputs via tailored procedures. The more reliable government data are, the more they can enlighten us and perhaps even deter unwanted behavior.