Introduction
The Open Data Policy is a U.S. policy governing the management, storage, and publication of data by government agencies. Although the policy was initially presented in an Executive Order by Barack Obama’s administration, similar policies are being adopted internationally. It requires that data collected or created by such agencies use “machine-readable and open formats, data standards, and common core and extensible metadata” (Executive Office of the President, 2013, p. 1). It also includes provisions for information stewardship and for improving information accessibility, interoperability, and safeguards (Executive Office of the President, 2013). Thus, the policy dictates which data can and cannot be stored, and which data should and should not be made available to the public. It specifically requires agencies to make public any information that can be made public under the relevant policies. Furthermore, Open Data imposes requirements on how such data should be stored and published.
The policy uses specific definitions for the objects and concepts it concerns. Government information is defined as “information created, collected, processed, disseminated, or disposed of, by or for the Federal Government” (Executive Office of the President, 2013, p. 4). The information life cycle refers to the stages through which information passes, from creation or collection to disposition (Executive Office of the President, 2013). The term “open data” itself refers to data that is public, accessible, described fully, reusable, complete, timely, and managed post-release; in short, “fully discoverable and usable by end users” (Executive Office of the President, 2013, p. 5). Personally identifiable information (PII) is any information “that can be used to identify or trace a person’s identity” (Executive Office of the President, 2013, p. 5). Finally, the mosaic effect is a situation in which multiple pieces of information that do not constitute PII in isolation do so when viewed together (Executive Office of the President, 2013). These definitions cover the key subjects the policy addresses.
The original documents describing the policy provide a general outline of the measures required for compliance. Memorandum M-13-13, the original policy document, describes four steps. First, the agency must ensure that its chief information officer (CIO) and other executives and officials have the responsibility and authority to implement the policies (Executive Office of the President, 2013). Second, the agency’s CIO should work with the U.S. Chief Technology Officer to “improve the interoperability and openness of government information” (Executive Office of the President, 2013, p. 12). Third, the funding necessary for the required changes, which include new tools and resources, should be considered in the context of the future cost savings these changes will bring (Executive Office of the President, 2013). Finally, the documents explain that the agency’s updates to its information resource management (IRM) plans, the completeness of its data inventory, and its public data listing will be used as accountability mechanisms (Executive Office of the President, 2013). To aid in this process, Project Open Data was created, containing “definitions, code, checklists, case studies, and more” (Executive Office of the President, 2013, p. 5). However, the specifics of implementing Open Data practices are ultimately left to each agency’s discretion.
Benefits of Open Data
Policies that disclose government data, similar to Open Data, have been implemented before. The Executive Order announcing the policy specifically refers to making weather data and Global Positioning System (GPS) data publicly available (The White House, 2013). This has allowed individuals and businesses to use the data to create applications that have ultimately benefited the public (The White House, 2013). By extending similar policies to more categories of data, such as healthcare and public safety, the Obama administration intended to produce a similar effect (The White House, 2013). In the years since the policy’s initial implementation, this has led to the creation of companies that use open data to provide new services and generate profits, while also improving government transparency.
Entrepreneurs can use open government data to analyze the market and make better business decisions. Census data is particularly valuable because it allows an analyst to observe demographic shifts that affect supply and demand dynamics in a given area or for a given product or service (Pratt, 2020). While some data can enhance a business’s ability to provide services, access to other data sources can create entirely new business opportunities (Manyika et al., 2013). Thus, Open Data is beneficial to both the entrepreneurs rendering services and the public that consumes them.
Access to extensive government data is similarly beneficial to researchers and policy-makers. For instance, the Data-Driven Criminal Justice project uses open data to further research on crime and drive innovation and improvements in the U.S. criminal justice system (TheGovLab, n. d. a). Similarly, the Smarter Crowdsourcing Coronavirus project has used open data to formulate a set of recommendations for world governments to combat the ongoing COVID-19 pandemic (Smarter Crowdsourcing Coronavirus, n. d.). These examples show that Open Data can be used for research that can ultimately drive technological and policy advancement.
Improved interoperability of government information is one of the stated goals of Open Data policies. The Open Data policy requires that data be not only made public but also kept in a fully usable format. This facilitates communication, interaction, and collaboration between different agencies as well as between agencies and the public. Therefore, the policy improves agencies’ overall performance in situations where several of them have to operate simultaneously, because data can be shared more quickly and with less processing or interpreting overhead. This improvement has obvious uses in areas such as healthcare and emergency services; the U.K.’s National Health Service (NHS) is currently developing a framework for using open data to improve clinical outcomes (TheGovLab, n. d. b).
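To make the idea of a "fully usable format" more concrete, the following Python sketch converts a small tabular dataset into a JSON document that carries descriptive metadata alongside the records. It is only an illustration: the field names (title, description, modified, accessLevel, records) and the sample weather readings are assumptions chosen for this example, not the schema that the policy or Project Open Data prescribes.

```python
import csv
import io
import json

# Hypothetical sample data; a real agency dataset would come from a file or database.
RAW_CSV = """station,date,temp_c
A1,2013-05-09,18.5
B2,2013-05-09,21.0
"""


def to_open_json(raw_csv: str, title: str, description: str, modified: str) -> str:
    """Convert CSV text into a JSON document with a small metadata header."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    document = {
        # Illustrative metadata fields (assumed for this sketch, not an official schema).
        "title": title,
        "description": description,
        "modified": modified,
        "accessLevel": "public",
        "records": rows,
    }
    return json.dumps(document, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_open_json(RAW_CSV, "Sample weather readings",
                       "Daily temperatures by station", "2013-05-09"))
```

Because the metadata travels with the records in a single standard format, another agency or a member of the public could parse the document with any JSON library without first reverse-engineering its layout.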
Security Issues
Although Open Data policies are, ideally, nearly universally beneficial, implementing them and making such large amounts of information publicly available carries certain drawbacks. As information is increasingly stored exclusively in digital formats, it becomes vulnerable to corruption or alteration, whether accidental or malicious, and potentially undetectable (Coggins & Holterhoff, 2011). The authenticity of a digital document is, therefore, more difficult to confirm than that of its physical equivalent (Coggins & Holterhoff, 2011). Cryptographic methods are widely employed to authenticate digital documents, and federal laws mandate a minimum level of authentication measures (Coggins & Holterhoff, 2011). Furthermore, maintaining a single entry point to open government data, data.gov, is another way of reducing risks to authenticity.
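One simple building block behind such cryptographic methods is the hash digest. The sketch below, written in Python with only the standard library, shows how a consumer might check that a downloaded file matches a digest published by the agency. It is a simplified stand-in for full digital signatures, and it assumes the published digest itself reaches the consumer through a trusted channel; the function names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, published_digest: str) -> bool:
    """Check the data against a digest published by the data owner."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_hex(data), published_digest)


if __name__ == "__main__":
    original = b"station,date,temp_c\nA1,2013-05-09,18.5\n"
    digest = sha256_hex(original)            # published alongside the dataset
    tampered = original.replace(b"18.5", b"28.5")
    print(is_unaltered(original, digest))    # the untouched copy verifies
    print(is_unaltered(tampered, digest))    # the altered copy does not
```

A digest alone shows only that the file is unchanged relative to the published value; establishing who published it requires a signature scheme or another trusted source for the digest.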
Open Data concerns data that do not contain PII. However, because a large proportion of government information can be personally identifiable, additional steps are required to anonymize it before release. Furthermore, due to the mosaic effect, data items that are individually non-identifying can become PII when viewed in combination. This creates a tension with the central value of privacy and requires strong data protection legislation. The Federal Information Security Management Act (FISMA) is the relevant legislative act in the U.S., requiring federal agencies to implement information security programs (Gillis, n. d.). This tension, however, cannot be completely resolved as long as agencies must store data that can become PII under some circumstances.
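The mosaic effect can be illustrated with a small, entirely fabricated dataset. The Python sketch below counts how many records share each combination of attribute values; the size of the smallest group is the k used in k-anonymity, a common privacy measure that the policy itself does not name. The records, field names, and helper function are assumptions made for illustration.

```python
from collections import Counter

# Fabricated records; none of these fields identifies a person on its own.
RECORDS = [
    {"zip": "20500", "birth_year": 1961, "sex": "M"},
    {"zip": "20500", "birth_year": 1961, "sex": "F"},
    {"zip": "20500", "birth_year": 1975, "sex": "M"},
    {"zip": "20001", "birth_year": 1961, "sex": "M"},
    {"zip": "20001", "birth_year": 1975, "sex": "F"},
    {"zip": "20001", "birth_year": 1975, "sex": "M"},
]


def smallest_group(records, fields):
    """Return the size of the smallest group of records sharing the given fields."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return min(counts.values())


if __name__ == "__main__":
    print(smallest_group(RECORDS, ["zip"]))                       # 3
    print(smallest_group(RECORDS, ["birth_year"]))                # 3
    print(smallest_group(RECORDS, ["zip", "birth_year", "sex"]))  # 1
```

Each attribute on its own leaves every person hidden in a group of at least two or three, yet the three attributes together single out every record. A release that looks safe column by column can therefore still be identifying in combination.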
Issues can also arise with the integrity and availability of open data. Centralized storage on data.gov means that the repository is vulnerable to denial-of-service attacks and must be regularly audited for unauthorized changes (Truong et al., 2019). Truong et al. (2019) propose leveraging recent developments in blockchain and the InterPlanetary File System (IPFS), a decentralized storage system, to solve this issue.
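The tamper-evidence idea underlying blockchain-based approaches can be sketched with a plain hash chain, in which each entry stores the hash of the entry before it. The Python example below is a deliberately minimal illustration and not the design proposed by Truong et al. (2019); the entry layout, the genesis value, and the function names are all assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash a payload together with the hash of the previous entry."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def build_chain(payloads):
    """Link each payload to its predecessor through the predecessor's hash."""
    chain, prev = [], GENESIS
    for payload in payloads:
        current = entry_hash(prev, payload)
        chain.append({"payload": payload, "prev": prev, "hash": current})
        prev = current
    return chain


def verify_chain(chain) -> bool:
    """Recompute every hash; any edited payload or broken link fails the check."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain([{"dataset": "weather", "version": 1},
                         {"dataset": "weather", "version": 2}])
    print(verify_chain(chain))            # intact chain verifies
    chain[0]["payload"]["version"] = 99   # simulate an unauthorized change
    print(verify_chain(chain))            # the change is detected
```

A single party holding the only copy could still recompute the whole chain after an edit, which is why decentralized proposals replicate the chain across many independent participants.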
Best Practices
Complying with Open Data policies can be a complicated and challenging task for an agency. The National Institute of Standards and Technology (NIST) worked with various stakeholders to develop a voluntary framework called the NIST Cybersecurity Framework (National Institute of Standards and Technology [NIST], n.d.). This framework offers guidance aimed at managing and reducing cybersecurity risks (NIST, n.d.). The measures in the framework include identifying critical processes and assets, as well as threats and vulnerabilities, in order to create organizational cybersecurity policies and develop appropriate safeguards (NIST, n.d.). It also provides guidance on detecting and responding to cybersecurity events and recovering any assets lost as a result of these events (NIST, n.d.). This framework can serve as a base for the cybersecurity measures employed by any agency working with open data.
The framework facilitates the creation of organizational cybersecurity policies by providing a shared language and a structure for the processes involved. Furthermore, it uses a 4-tier system to help estimate the policies’ effectiveness and sophistication (NIST, 2018). These tiers range from partial, representing nonexistent to limited understanding and awareness of cybersecurity risk management, to adaptive, representing continuous improvement of an organization’s cybersecurity measures that incorporates advanced technologies and practices (NIST, 2018). To guide an organization’s development of its cybersecurity policies, the framework proposes a 7-step process for ongoing improvement (NIST, 2018). It also stresses the importance of communication with stakeholders and provides a common language for this communication (NIST, 2018). These best practices and tools help ensure that issues related to the confidentiality, integrity, availability, authenticity, and non-repudiation of open data are mitigated as well as possible.
NIST also developed a catalog of security and privacy controls aimed at facilitating risk management and compliance with federal laws and standards. This catalog, Publication 800-53, Security and privacy controls for information systems and organizations, describes specific steps, best practices, and organizational policies that organizations can implement to comply with Open Data policies (NIST, 2020). It contains an extensive list of controls relevant to every step of working with information at every phase of its life cycle, from hardware acquisition and personnel training to incident response and contingency planning (NIST, 2020). These controls serve as guidance for implementing appropriate cybersecurity measures and, because the catalog is regularly updated, help keep those measures up to date.
Conclusion
Open Data policies detail which information created and acquired by government agencies should be made public and how it should be made public. These policies are part of a global initiative aimed at increasing the transparency of data collection and storage. They offer significant benefits for both the agencies involved and the general public by guiding policy-making, supporting research, and creating new business opportunities. However, storing and publishing large amounts of data naturally raises issues related to the data’s integrity, authenticity, privacy, confidentiality, and non-repudiation. In the U.S., both federal policy and research efforts aim to mitigate these issues. Specifically, NIST has developed the Cybersecurity Framework and a set of controls to facilitate the creation of organizational cybersecurity policies that comply with Open Data policies and standards.
References
Coggins, T. L., & Holterhoff, S. G. (2011). Authenticating digital government information. In P. Gavin (Ed.), Government information management.
Executive Office of the President. (2013). M-13-13: Memorandum for the heads of executive departments and agencies. Web.
Gillis, A. S. (n. d.). Federal information security management act (FISMA). TechTarget SearchSecurity. Web.
Manyika, J., Chui, M., Farrell, D., Van Kuiken, S., Groves, P., & Doshi, E. A. (2013). Open data: Unlocking innovation and performance with liquid information. McKinsey Global Institute. Web.
National Institute of Standards and Technology [NIST]. (2018). Framework for improving critical infrastructure cybersecurity. Web.
National Institute of Standards and Technology [NIST]. (2020). NIST special publication 800-53 revision 5: Security and privacy controls for information systems and organizations. Web.
National Institute of Standards and Technology [NIST]. (n. d.). Getting started with the NIST Cybersecurity Framework: A quick start guide. Web.
Pratt, M. K. (2020). Top 5 U.S. open data use cases from federal data sets. TechTarget SearchData Management. Web.
Smarter Crowdsourcing Coronavirus. (n. d.). Web.
The White House. (2013). Executive Order – Making open and machine readable the new default for government information. Web.
TheGovLab. (n. d. a). Data-driven criminal justice. Web.
TheGovLab. (n. d. b). NHS – Open Data. Web.
Truong, D.-D., Nguyen-Van, T., Nguyen, Q.-B., Huy, N. H., Tran, T.-A., Le, N.-Q., & Nguyen-An, K. (2019). Blockchain-based open data: An approach for resolving data integrity and transparency. In Future data and security engineering (pp. 526–541). Springer International Publishing. Web.