To say that a model predicts inaccurately is to say that it is giving the wrong answer according to the data, either in a particular case or across many cases. Since accuracy is focused narrowly on how the tool performs on data reserved from the original data set, it does not address issues that might undermine the reasonableness of the dataset itself discussed in the section on validity.
Indeed, because accuracy is calculated with respect to an accepted baseline of correctness, accuracy fails to account for whether the data used to test or validate the model are uncertain or contested. Such issues are generally taken into account under an analysis of validity. Although accuracy is often the focus of toolmakers when evaluating the performance of their models, validity and bias are often the more relevant concerns in the context of using such tools in the criminal justice system.
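To make this distinction concrete, the following sketch (hypothetical data and function names, not drawn from any deployed tool) shows how accuracy is computed against recorded labels, and why a high score inherits any flaws in those labels:

```python
# Illustrative sketch: accuracy is measured against held-out labels,
# so it inherits whatever flaws those labels contain.
# All names and numbers here are hypothetical.

def accuracy(predictions, labels):
    """Fraction of predictions that match the recorded labels."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Recorded outcomes (e.g., "rearrested within two years") for a held-out set.
recorded = [1, 0, 0, 1, 0, 0, 1, 0]
preds    = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(preds, recorded))  # 0.875

# If the recorded labels are themselves biased proxies (arrest != crime),
# a high accuracy score says nothing about validity: the model is simply
# good at reproducing the proxy, flaws included.
```

A model can therefore score well on accuracy while answering the wrong question, which is exactly the gap that a validity analysis is meant to close.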
A narrow focus on accuracy can blind decision-makers to important real-world considerations related to the use of prediction tools. Validity, by contrast, concerns whether a tool in fact measures what it purports to measure. That is, if risk assessments purport to measure how likely an individual is to fail to appear or to be the subject of a future arrest, then it should be the case that the scores produced in fact reflect the relevant likelihoods.
Unlike accuracy, validity takes into consideration the broader context around how the data was collected and what kind of inference is being drawn. A tool might not be valid because the data that was used to develop it does not properly reflect what is happening in the real world due to measurement error, sampling error, improper proxy variables, failure to calibrate probabilities, or other issues. Separate from data and statistical challenges, a tool might also not be valid because the tool does not actually answer the correct question.
Because validation is always with respect to a particular context of use and a particular task to which a system is being put, validating a tool in one context says little about whether that tool is valid in another context. For example, a risk assessment might predict future arrests quite well when applied to individuals in a pretrial context, but quite poorly when applied to individuals post-conviction, or it might predict future arrest well in one jurisdiction, but not another.
Thus, different kinds of predictions, and different contexts of use, require their own separate validation. Without such validation, even well-established methods can produce flawed predictions. In other words, just because a tool uses data collected from the real world does not automatically make its findings valid. In technical communities, making predictions about individuals from group-level data is known as the ecological fallacy. In the context of sentencing, defendants have a constitutional right to have their sentence determined based on what they did themselves, rather than on what others with similarities to them have done.
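The ecological fallacy can be made concrete with a toy example (all numbers and names hypothetical): a group-level rate, however well estimated, assigns the same score to every member of the group regardless of their individual circumstances.

```python
# Hypothetical numbers illustrating the ecological fallacy: a valid
# group-level rate says little about any particular individual.

# Suppose 30% of people sharing a defendant's recorded attributes
# were rearrested.
group_rearrest_rate = 0.30

# A tool that assigns this group rate to every member produces identical
# scores for individuals whose actual circumstances differ completely.
individuals = ["A", "B", "C"]
scores = {person: group_rearrest_rate for person in individuals}

print(scores)
assert len(set(scores.values())) == 1  # everyone gets the same score
```

The score is statistically defensible as a statement about the group, but using it to detain person A is a claim about person A, which is precisely the inference the fallacy warns against.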
This concern arose in Wisconsin v. Loomis. The ecological fallacy is especially problematic in the criminal justice system given the societal biases that are reflected in criminal justice data, as described in the sections on Requirements 1 and 2. It is thus likely that decisions made by risk assessment tools are driven in part by what protected class an individual may belong to, raising significant Equal Protection Clause concerns. While there is a statistical literature on how to deal with technical issues resulting from the ecological fallacy, the fundamental philosophical question of whether it is permissible to detain individuals based on data about others in their group remains.
As more courts grapple with whether to use risk assessment tools, this question should be at the forefront of debate and discussed as a first-order principle. The term "bias" is used to mean several different things in statistics. The simplest meaning is that a prediction made by a model errs in a systematic direction: for instance, it predicts a value that is too low on average, or too high on average, for the general population.
In the machine learning fairness literature, however, the term bias is used to refer to situations where the predicted probabilities are systematically either too high or too low for specific subpopulations. Bias in risk assessment tools can come from many sources. Requirement 2 discusses model bias that stems from omitted variable bias and proxy variables.
Requirement 3 discusses model bias that results from the use of composite scores that conflate multiple distinct predictions.
In combination with concerns about accuracy and validity, these challenges present significant concern for the use of risk assessment tools in criminal justice domains. Datasets pose profound and unresolved challenges to the validity of statistical risk assessments. In almost all cases, errors and bias in measurement and sampling prevent readily available criminal justice datasets from reflecting what they were intended to measure. Building valid risk assessment tools would require (a) a methodology to reweight and debias training data using second sources of truth, and (b) a way to tell whether that process was valid and successful.
To our knowledge, no risk assessment tools are presently built with such methods. Statistical validation of recidivism prediction in particular suffers from a fundamental problem: the ground truth of whether an individual committed a crime is generally unavailable, and can only be estimated via imperfect proxies such as crime reports or arrests.
Since the target for prediction (having actually committed a crime) is unavailable, it is tempting to change the goal of the tool to predicting arrest rather than crime. One problem with using such imperfect proxies is that different demographic groups are stopped, searched, arrested, charged, and wrongfully convicted at very different rates in the current US criminal justice system. Estimating such biases can be difficult, although in some cases it may be possible by using secondary sources of data collected separately from law enforcement or government agencies.
Performing such reweighting would be a subtle statistical task that could easily be performed incorrectly, and so a second essential ingredient would be developing a method accepted by the machine learning and statistical research communities for determining whether data reweighting had produced valid results that accurately reflect the world. Beyond the difficulty in measuring certain outcomes, data is also needed to properly distinguish between different causes of the same outcome. For instance, just looking at an outcome of failure to appear in court obscures the fact that there are many different possible reasons for such an outcome.
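As a rough sketch of what such reweighting might look like, with hypothetical group names and rates and a hypothetical secondary survey standing in for the second source of truth:

```python
# A minimal sketch of data reweighting, assuming a secondary source
# (e.g., a victimization survey) gives less biased estimates of offense
# rates by group than arrest records do. All numbers are hypothetical.

# Offense rate per group as recorded via arrests (biased by enforcement).
arrest_rate = {"group_x": 0.40, "group_y": 0.10}
# Offense rate per group as estimated from the secondary source.
survey_rate = {"group_x": 0.20, "group_y": 0.15}

# Weight each group's recorded positives toward the secondary estimate.
weights = {g: survey_rate[g] / arrest_rate[g] for g in arrest_rate}
print(weights)  # {'group_x': 0.5, 'group_y': 1.5}

# Whether such a correction is itself valid is exactly the open question
# raised above: it requires an accepted method for checking that the
# reweighted data reflect the world, not merely a different bias.
```

Even this two-line correction embeds strong assumptions (that the survey is unbiased, that the groups are comparable), which is why community-accepted validation of the reweighting step is an essential second ingredient.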
Given that there are legitimate reasons for failing to appear for court that would not suggest that the individuals pose a danger to society (e.g., a lack of transportation or an inability to take time off from work), such cases should not be conflated with intentional flight. Thus, if the goal of a risk assessment tool is to make predictions about whether or not a defendant will flee justice, data would need to be collected that distinguish between individuals who intentionally versus unintentionally fail to appear for court dates. There are two widely held misconceptions about bias in statistical prediction systems.
The first is that models will only reflect bias if the data they were trained with was itself inaccurate or incomplete. A second is that predictions can be made unbiased by avoiding the use of variables indicating race, gender, or other protected classes. It is perhaps counterintuitive, but in complex settings like criminal justice, virtually all statistical predictions will be biased even if the data was accurate, and even if variables such as race are excluded, unless specific steps are taken to measure and mitigate bias.
The reason is a problem known as omitted variable bias, which occurs whenever a model is trained from data that does not include all of the relevant causal factors. Missing causes of the outcome variable that also cause an included input variable are known as confounding variables. Moreover, the included variables can be proxies for protected variables like race. Consider, as an illustration, a car insurance model that predicts accident risk from the times of day a person drives. Frequently driving to parties is a confounding variable because it causes both night-time driving and accident risk. A model trained on data about the times of day that drivers drive would exhibit bias against people who work night shifts, because it would conflate the risk of driving to parties with the risk of driving at night.
This example also illustrates proxy variables at work: frequency of driving at night is a proxy, via driving to parties, for driving while inebriated. It is also a direct proxy for working night shifts. As a result, even though it is not appropriate to charge someone higher insurance premiums simply because they work night shifts, that is the result in this case, due to the inclusion of the proxy variable of frequency of driving at night.
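The night-driving example can be simulated directly. The sketch below (all parameters hypothetical) generates a population in which accidents are caused only by driving to parties, and shows that night-shift workers who never attend parties nonetheless share the elevated accident rate of the "drives at night" group:

```python
import random

# Simulation of the night-driving example: accident risk is caused by
# driving to parties, not by night driving itself, yet a model that sees
# only "drives at night" penalizes night-shift workers.
# All parameters are hypothetical.
random.seed(0)

population = []
for _ in range(10000):
    party_driver = random.random() < 0.3
    night_shift = random.random() < 0.2
    drives_at_night = party_driver or night_shift
    # Accidents are caused by party driving only.
    accident = random.random() < (0.30 if party_driver else 0.05)
    population.append((drives_at_night, night_shift, party_driver, accident))

def rate(rows):
    return sum(r[3] for r in rows) / len(rows)

night = [r for r in population if r[0]]
# Night-shift workers who never drive to parties:
shift_only = [r for r in population if r[1] and not r[2]]

print(round(rate(night), 3))       # elevated: party drivers dominate the group
print(round(rate(shift_only), 3))  # close to the 0.05 baseline

# A model keyed on night driving charges the shift_only drivers the
# elevated night-time rate, even though their true risk is near baseline.
```

Note that the variable "race" (or "night shift") never needs to appear in the model for the bias to arise; the proxy carries it in.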
As such, it is difficult to separate the use of risk assessment instruments from the use of constitutionally-protected factors such as race to make predictions, and mitigations for this model-level bias are needed. There are numerous possible statistical methods that attempt to correct for bias in risk assessment tools, such as requiring calibration within groups or requiring equal false negative rates ("equal opportunity") across groups. There is no one-size-fits-all solution to addressing bias, and because some of these approaches are in tension with each other, it is not possible to simultaneously optimize for all of them.
Nonetheless, these approaches can highlight relevant fairness issues to consider in evaluating tools. For example, even though it is generally not possible to simultaneously satisfy calibration within groups and equal opportunity with criminal justice data, it would be reasonable to avoid using tools that exhibit extremely disparate predictive parity or extremely disparate error rates across demographic groups. Given that each of these approaches involves inherent trade-offs, it is also reasonable to use a few different methods and compare the results between them.
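A small numeric sketch (hypothetical confusion counts, not from any real tool) illustrates how such criteria can be computed and compared across groups, and why they come apart when base rates differ:

```python
# Hedged numeric sketch of competing fairness measures.
# Counts per group are hypothetical:
#           (true positive, false positive, false negative, true negative)
groups = {
    "group_a": (40, 20, 10, 130),
    "group_b": (15, 20, 5, 160),
}

metrics = {}
for name, (tp, fp, fn, tn) in groups.items():
    ppv = tp / (tp + fp)  # predictive parity measure: P(reoffend | flagged)
    fpr = fp / (fp + tn)  # error-rate measure: P(flagged | no reoffense)
    metrics[name] = (ppv, fpr)
    print(name, round(ppv, 2), round(fpr, 2))

# With different base rates across groups, equalizing PPV generally
# forces the FPRs apart (and vice versa). Computing both, as suggested
# above, exposes the trade-off rather than hiding it.
```

Reporting several such measures side by side yields the "range of predictions" the text recommends, rather than a single number that quietly privileges one fairness criterion.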
This would yield a range of predictions that could better inform decision-making. Risk assessment tools must not produce composite scores that combine predictions of different outcomes for which different interventions are appropriate. In other words, the tool should predict the specific risk it is hoping to measure, and produce separate scores for each type of risk as opposed to a single risk score reflecting the risk of multiple outcomes.
Many existing pretrial risk assessment tools, however, do exactly this: they produce a single risk score that represents the combined risk of either failure to appear or rearrest occurring. Regardless of the legal situation, such a hybrid prediction is inappropriate on statistical grounds: different causal mechanisms drive each of the phenomena that are combined in hybrid risk scores. Moreover, different types of intervention, both as a policy and a legal matter, are appropriate for each of these different phenomena. For example, the potential risk of failing to appear at a court date at the pretrial stage should have no bearing in a sentencing hearing.
Risk assessment tools must be clear about which of these many distinct predictions they are making, and steps should be taken to safeguard against conflating different predictions and using risk scores in inappropriate contexts. While risk assessment tools provide input and recommendations to decision-making processes, the ultimate decision-making authority still resides in the hands of humans. Judges, court clerks, pretrial services officers, probation officers, and prosecutors all may use risk assessment scores to guide their judgments. Thus, critical human-computer interface issues must also be addressed when considering the use of risk assessment tools.
One of the key challenges of statistical decision-making tools is the phenomenon of automation bias, where information presented by a machine is viewed as inherently trustworthy and above skepticism. The court in Loomis indirectly addressed the issue of automation bias by requiring that any Presentence Investigation Report containing a COMPAS risk assessment be accompanied by a written disclaimer that the scores may be inaccurate and have been shown to classify offenders disparately.
Over time, there is the risk that judges become inured to lengthy disclosure language repeated at the beginning of each report. Moreover, the disclosures do not make clear how, if at all, judges should interpret or understand the practical limits of risk assessments. While advocates have focused on the issues mentioned above of bias in risk prediction scores, one often overlooked aspect of fairness is the way risk scores are translated for human users.
Developers and jurisdictions deploying risk assessment tools must ensure that tools convey their predictions in a way that is straightforward to human users and that illustrates how those predictions are made. This means ensuring that interfaces presented to judges, clerks, lawyers, and defendants are clear, easily understandable, and not misleading. Interpretability involves providing users with an understanding of the relationship between input features and output predictions.
Providing interpretations for predictions can help users understand how each variable contributes to the prediction, and how sensitive the model is to certain variables. This is crucial for ensuring that decision-makers are consistent in their understandings of how models work and the predictions they produce, and that the misinterpretation of scores by individual judges does not result in the disparate application of justice.
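For a simple linear scoring model, such per-variable contributions can be reported directly. The sketch below uses hypothetical weights and feature names, not those of any real tool:

```python
# Interpretability sketch for a hypothetical linear scoring model:
# each feature's contribution to the score is reported alongside the
# score itself. Weights and feature names are illustrative only.

weights = {"prior_arrests": 0.8, "age": -0.05, "fta_history": 1.2}
intercept = -1.0

def score_with_explanation(features):
    """Return the total score and each feature's additive contribution."""
    contributions = {k: weights[k] * features[k] for k in weights}
    total = intercept + sum(contributions.values())
    return total, contributions

total, parts = score_with_explanation(
    {"prior_arrests": 2, "age": 30, "fta_history": 1}
)
print(round(total, 2))
for name, value in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(name, round(value, 2))

# For a linear model, sensitivity to a variable is simply its weight;
# more complex models need analogous per-feature attributions so that
# different judges interpret the same score the same way.
```

The point is not the particular model but the reporting convention: a score accompanied by its decomposition supports consistent interpretation in a way a bare category label does not.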
Because interpretability is a property of the tools as used by people, it requires consideration of the use of risk assessments in context and depends on how effectively they can be employed as tools by their human users. At the same time, developers of models should ensure that the intuitive interpretation is not at odds with intended risk prediction. For instance, judges or other users might intuitively guess that ordered categories are of similar size, represent absolute levels of risk rather than relative assessments, and cover the full spectrum of approximate risk levels.
However, this is not the case for many risk assessment tools: in one tool examined, the actual probabilities of failure to appear and of rearrest at every risk level fell within the range users would intuitively associate with the lowest risk level alone. An important component of any statistical prediction is the uncertainty underlying it. In order for users of risk assessment tools to appropriately and correctly interpret their results, it is vital that reports of their predictions include error bars, confidence intervals, or other similar indications of reliability.
For this reason, risk assessment tools should not be used unless they are able to provide good measures of the certainty of their own predictions, both in general and for specific individuals on which they are used.
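One standard way to attach such uncertainty to a predicted rate is bootstrap resampling. The sketch below (hypothetical outcome data) reports an interval around a point estimate rather than a bare score:

```python
import random

# Bootstrap sketch: attach an interval to a predicted rate by resampling
# hypothetical outcome data, instead of reporting a bare point estimate.
random.seed(1)

# Hypothetical outcomes (1 = failure to appear) for people with a given score.
outcomes = [1] * 12 + [0] * 88

def bootstrap_interval(data, draws=2000, alpha=0.05):
    """Percentile bootstrap interval for the mean of binary data."""
    estimates = []
    for _ in range(draws):
        sample = [random.choice(data) for _ in data]
        estimates.append(sum(sample) / len(sample))
    estimates.sort()
    lo = estimates[int(draws * alpha / 2)]
    hi = estimates[int(draws * (1 - alpha / 2)) - 1]
    return lo, hi

point = sum(outcomes) / len(outcomes)
lo, hi = bootstrap_interval(outcomes)
print(point, (round(lo, 2), round(hi, 2)))

# Reporting "12%, plausibly somewhere in this interval" conveys far more
# than a bare risk category, and flags when an estimate is too noisy to
# be acted on.
```

This captures only sampling uncertainty; model misspecification and data bias add further uncertainty that a full disclosure would also need to reflect.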
There are many sources of uncertainty in recidivism predictions, and ideally disclosure of uncertainty in predictions should capture as many of these sources as possible. User interfaces that satisfactorily display and convey uncertainty to users are in some respects also an open problem, so the training courses we suggest in Requirement 6 should specifically test and assist users in making judgments under simulations of this uncertainty.
Regardless of how risk assessment outputs are explained or presented, clerks and pretrial assessment services staff must be trained on how to properly code data about individuals into the system.
Human error and a lack of standardized best practices for data input could have serious implications for data quality and validity of risk prediction down the line. At the same time, judges, attorneys, and other relevant stakeholders must also receive rigorous training on how to interpret the risk assessments they receive.
These trainings should address the considerable limitations of the assessments, error rates, the interpretation of scores, and how to challenge or appeal a risk classification; they should likely also include basic training on how to understand confidence intervals. As risk assessment tools supplement judicial processes and represent the implementation of local policy decisions, jurisdictions must take responsibility for their governance.
Importantly, they must remain transparent to citizens and accountable to the policymaking process. The use of risk assessment tools has the potential to obscure—and remove from the public eye—fundamental policy decisions concerning criminal justice. These include choices about the point at which societal risk outweighs the considerable harm of detention to a defendant and their family, and about how certain a risk must be before the criminal justice system is required to act on it.
Use of these tools also includes choices about the nature and definition of protected categories and how they are used. In addition, important decisions must be made about how such tools interact with non-incarcerative measures aimed at rehabilitation, such as diversion measures or provision of social services. These are challenging policy questions that cannot and should not be answered by toolmakers alone, and will instead require active engagement from policymakers, judicial system leaders, and the general public. One key example of how seemingly technical decisions are actually policy decisions is the choice of thresholds for detention.
Risk thresholds like those mandated by recent state legislation embody exactly such policy choices. Policymakers at both the state and federal level must decide which trade-offs to make to ensure just outcomes and lower the social costs of detention. For example, if a major goal is to reduce mass incarceration in the criminal justice system, thresholds should be set such that the number of individuals classified in higher risk categories is reduced.
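A toy example (hypothetical scores) makes the point that the threshold, not the scores, determines how many people are detained:

```python
# Illustration that threshold choice is a policy decision: the same
# hypothetical scores yield very different detention counts depending
# on where the line is drawn.

scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.85, 0.90]

def detained(threshold):
    """Number of people at or above the detention threshold."""
    return sum(s >= threshold for s in scores)

print(detained(0.3))  # 6 of 10 detained
print(detained(0.6))  # 3 of 10 detained
```

Nothing in the statistics dictates where to draw the line; the choice encodes a judgment about the relative costs of detention and of release, which is why it belongs to the policymaking process.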
In addition to gathering input from relevant stakeholders, threshold-setting bodies whether legislatures, panels, or other agencies should practice evidence-based policymaking informed by relevant and timely crime rates data, and plan to revisit and revise their decisions on an ongoing basis. Risk assessment tools embody important public policy decisions made by governments, and must be as open and transparent as any law, regulation, or rule of court. Thus, governments must not deploy any proprietary risk assessments that rely on claims of trade secrets to prevent transparency.
In particular, the training datasets, architectures, algorithms, and models of all tools under consideration for deployment must be made broadly available to all interested research communities—such as those from statistics, computer science, social science, public policy, law, and criminology, so that they are able to evaluate them before and after deployment. However, it is important to note that when such datasets are shared, appropriate de-identification techniques should be used to ensure that non-public personal information cannot be derived from the datasets.
Given the adversarial nature of the U.S. legal system, defendants and their counsel must be able to examine and contest the assessments applied to them. Individual-level information used in the assessments should be recorded in an audit trail that is made available to defendants, counsel, and judges. Through these processes, defendants can demonstrate how applicable and robust risk assessments are, or are not, with respect to their particular circumstances.
Jurisdictions must periodically publish an independent review, algorithmic impact assessment, or audit of all risk assessment tools they use to verify that the requirements listed in this report have been met. Such review processes must also be localized because the conditions of crime, law enforcement response, and culture among judges and clerks are all local phenomena. To ensure transparency and accountability, an independent outside body such as a review board must be responsible for overseeing the audit. This body should be comprised of legal, technical, and statistical experts, currently and formerly incarcerated individuals, public defenders, public prosecutors, judges, and civil rights organizations, among others.
These audits and their methodology must be open to public review and comment. To mitigate privacy risks, published versions of these audits should be redacted and sufficiently blinded to prevent de-anonymization. A current challenge to implementing these audits is a lack of data needed to assess the consequences of those tools already deployed.
Similarly, evaluating or correcting tools and training data for error and bias requires better data on discrimination at various points in the criminal justice system. In order to understand the impact of current risk assessment tools, whether in pretrial, sentencing, or probation, court systems should collect data on pretrial decisions and outcomes. To meet these responsibilities, whenever legislatures or judicial bodies decide to mandate or purchase risk assessment tools, those authorities should simultaneously ensure the collection of post-deployment data, provide the resources to do so adequately, and support open analysis and review of the tools in deployment.
Efforts to move the U.S. criminal justice system away from mass incarceration are well founded. As a matter of historical and international comparison, the U.S. incarcerates its population at extraordinarily high rates. Thus, significant reforms to address that problem are justified and urgent based on the available data.
This context has driven the adoption of risk assessment tools, and it is crucial to note that nothing in this report should be read as calling for a slowing of criminal justice reform and efforts to mitigate mass incarceration. Rather, our aim is to help policymakers make informed decisions about the risk assessment tools currently in deployment and required under legislative mandates, and the potential policy responses they could pursue.
One approach is for jurisdictions to cease using the tools in decisions to detain individuals until they can be shown to have overcome the numerous validity, bias, transparency, procedural, and governance problems that currently beset them. This path need not slow the overall process of criminal justice reform. In fact, several advocacy groups have proposed alternative reforms that do not introduce the same concerns as risk assessment tools. Another option is to embark on the project of trying to improve risk assessment tools. That would necessitate procurement of sufficiently extensive and representative data, development and evaluation of reweighting methods, and ensuring that risk assessment tools are subject to open, independent research and scrutiny.
The ten requirements outlined in this report represent a minimum standard for developers and policymakers attempting to align their risk assessment tools—and how they are used in practice—with well-founded policy objectives. While the widespread use of risk assessments continues, administrative agencies and legislatures driving deployment have a responsibility to set standards for the tools they are propagating. In addition to the ten requirements we have outlined in this report, jurisdictions will also need to gather and incorporate significant expertise from the fields of machine learning, statistics, human-computer interaction, criminology, and law in order to perform this task.
At this stage, we should emphasize that we do not believe that any existing tools would meet properly set standards on all of these points, and in the case of Requirement 1, meeting an appropriately set standard would require major new data collection efforts. PAI believes standard setting in this space is essential work for policymakers because of the enormous momentum that state and federal legislation have placed behind risk assessment procurement and deployment.
But many of our members remain concerned that standards could be set with the aim of being easy to meet, rather than actually confronting the profound statistical and procedural problems inherent in using such tools to inform detention decisions. It would be tempting to set standards that gloss over complex accuracy, validity, and bias problems, and to continue deployment of tools without considering alternative reforms. For AI researchers, the task of foreseeing and mitigating unintended consequences and malicious uses has become one of the central problems of our field.
Doing so requires a very cautious approach to the design and engineering of systems, as well as careful consideration of the ways that they will potentially fail and the harms that may occur as a result. Criminal justice is a domain where it is imperative to exercise maximal caution and humility in the deployment of statistical tools. We are concerned that proponents of these tools have failed to adequately address the minimum requirements for responsible use prior to widespread deployment.
Going forward, we hope that this report sparks a deeper discussion about these concerns with the use of risk assessment tools and spurs collaboration between policymakers, researchers, and civil society groups to accomplish much needed standard-setting and reforms in this space.
The Partnership on AI would, where it is constructive, be available to provide advice and connections to the AI research community to facilitate such efforts. If you have any comments or questions, please feel free to contact us.
Executive Summary
This report documents the serious shortcomings of risk assessment tools in the U.S. criminal justice system. Challenges in using these tools effectively fall broadly into three categories, each of which corresponds to a section of our report: concerns about the validity, accuracy, and bias in the tools themselves; issues with the interface between the tools and the humans who interact with them; and questions of governance, transparency, and accountability.

Introduction: Context
Risk assessment instruments are statistical models used to predict the probability of a particular future outcome.
Figure 1: Incarceration in the U.S.

Methods to Mitigate Bias
There are numerous possible statistical methods that attempt to correct for bias in risk assessment tools. PAI considers more expansive definitions, which include any automation of analysis and decision-making by humans, to be most helpful.
In addition, a new federal law, the First Step Act of 2018, requires the development of a risk and needs assessment system for federal prisoners. The bill allows the Attorney General to use currently existing risk and needs assessment tools, as appropriate, in the development of this system. In addition, many of our civil society partners have taken a clear public stance to this effect, and some go further, suggesting that only individualized decision-making will be adequate for this application, regardless of the robustness and validity of risk assessment instruments.
That question must be asked specifically about the individual whose liberty is at stake — and it must be answered in the affirmative in order for detention to be constitutionally justifiable.