Longreads
May 11, 2021

Driving Forward Looking Backward: About Zhiguli, Neighbours, and Predictive Models

Photo by Roksolana Zasiadko on Unsplash

In February 2019, an 18-year-old guy from Stavropol, Russia swapped the steering wheel and seats in his Zhiguli. Someone uploaded a video of the car to YouTube. Thanks to social media, the video went viral and became part of the “How do you like this, @elonmusk?” challenge. Elon Musk tweeted “haha, awesome” (in Russian!).

I recalled that story while listening to a lecture on algorithmic governance given by Alexander Saveliev of IBM Russia/CIS at the 2020 Distant & Digital conference. Alexander rightly pointed out certain risks arising from the implementation of predictive models — models employing predictive analytics to forecast the future.

What does the remodelled Zhiguli have in common with predictive models? It illustrates them.

This story does not claim to be fundamental research and seeks nothing more than to provide food for thought. To be reader-friendly, I will employ the rubber duck method — no math, no graphics, plain vanilla text (plus a few small code sketches to make things concrete).

Get Things Straight

To dive into the problem, I begin with some general observations.

Agree On Terms

Predictive models suggest “What will happen next?”. They leverage historical and current data to predict future or otherwise unknown events. Like begets like, so to speak.

To function, the models —

  • need data — any information that can be processed to help perform a task
  • perform data analysis — a process of gathering, cleaning, analysing, and mining data, interpreting results, and reporting findings, and
  • employ algorithms — an instruction or a finite set of instructions one follows to perform a specific task.

Predictive models are used everywhere: customer targeting, financial modelling, sales forecasting, market analysis, fraud detection, operation optimisation, litigation management, etc.

Also, the models vary significantly: linear models, cluster models, decision trees, support vector machines, neural networks, and many others.

Set Boundaries

Any product of human thought has fundamental limitations. Predictive models are no exception —

  1. Predictive models do not say what will happen next. Instead, they forecast what might happen next. All predictions are probabilistic in nature, so relying on them unconditionally might be a dubious idea.
  2. Predictive models can look in only one direction — the past. The models assume past patterns will repeat. However, the models have little (if any) capacity to anticipate things like scientific breakthroughs, new forms of knowledge, or shifts in cultural norms.
  3. The models — abstract, simplified representations of some process or event — have blind spots. To build a model, its developer chooses what to include. So, something might get left out, and someone might be underrepresented.
  4. Lack of objectivity is the thing predictive models and human beings have in common. The models are impartial in only one sense: they learn whatever humans teach them. Each model is a subjective opinion embedded in math and statistics. Each model reflects the mindset, judgement, and priorities of its developer.
  5. No one is smart enough, including predictive models. Donald Rumsfeld, US Secretary of Defense, made an insightful observation about knowledge: pulling all the available information together leaves us only with the known knowns and the known unknowns. But there are also unknown unknowns — the things we do not know we do not know. The models — like humans — do not know what they do not know.
  6. Predictive models can be gamed. Nothing is new under the sun: people can find a way to hack any system. Those who understand the model can try to manipulate or fool its outcomes. However, as I will discuss below, understanding the model can be challenging.

Know Your Neighbour

Predictive models can get you interested in your neighbour’s rating. Not just for fun, but for a valid reason.

At Full Speed

There is a stance that predictive models develop an opinion of your future performance based not on your past performance, but on the past performance of those who are considered similar to you — for example, your colleagues, schoolmates, social media subscribers, or neighbours.

Do predictive models — so to speak — drive your car forward while looking in the rearview mirror of your neighbour’s car?

Technically, yes. The Zhiguli guy almost killed it.

For predictive models, to know someone means not to know that person, but to know what others say and do.

In practice, the model reveals features in the given dataset, assigns particular weights to them, determines the relevant features, establishes correlations between features, detects patterns within the dataset, induces a rule from those patterns, and applies that rule to a specific individual.
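
Here is a minimal sketch of that “judged by your neighbours” logic: a k-nearest-neighbours toy with invented numbers (the features, dataset, and outcome values are my assumptions, not taken from any real system):

    # A toy "know your neighbour" predictor: an individual's outcome is
    # inferred from the recorded outcomes of the k most similar people.
    import numpy as np

    # Hypothetical historical records: [income, age, years_at_address]
    people = np.array([
        [30_000, 25, 1],
        [32_000, 27, 2],
        [80_000, 45, 10],
        [75_000, 50, 12],
        [31_000, 26, 1],
    ])
    defaulted = np.array([1, 1, 0, 0, 1])  # past outcomes of those people

    def predict(individual, k=3):
        # Normalise features so no single one dominates the distance.
        scale = people.max(axis=0)
        dists = np.linalg.norm(people / scale - individual / scale, axis=1)
        neighbours = np.argsort(dists)[:k]   # the k most similar people
        return defaulted[neighbours].mean()  # share of neighbours who defaulted

    # The applicant is judged by their neighbours' records, not their own.
    print(predict(np.array([29_000, 24, 1])))  # 1.0: all similar people defaulted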

Don’t Fool Yourself

Predictive models heavily rely on statistical tools, but approach correlations differently.

In statistics, correlation does not imply causation: an observed correlation between two variables does not allow one to legitimately deduce a cause-and-effect relationship between them.

Contrary to statistics, predictive models appear to imply causation. This may lead to outcomes that are not obvious to a reasonable person and do not follow from common sense or a theoretical understanding of the subject matter.
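
A tiny numerical illustration (synthetic data, invented for this post): two variables that never influence each other can still be strongly correlated when a third, hidden factor drives both.

    # Two variables with no causal link, both driven by a hidden confounder.
    import numpy as np

    rng = np.random.default_rng(0)
    heat = rng.normal(size=10_000)  # the hidden factor: hot weather

    ice_cream_sales = heat + rng.normal(scale=0.5, size=10_000)
    drownings = heat + rng.normal(scale=0.5, size=10_000)

    # Strong correlation (about 0.8) despite zero causal effect between the two.
    print(np.corrcoef(ice_cream_sales, drownings)[0, 1])

A model trained on such data would happily “predict” drownings from ice cream sales.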

No Escape

With predictive models, precedents flourish. Anything that does not fit precedents is an outlier and, thus, is subject to evaluation and, plausibly, elimination. Future behaviour is deemed invariably dependent on past patterns, and individuals are expected and encouraged to repeat history. The result is that individuals are unable to escape their history.

This is reminiscent of super stare decisis — the theory that courts must follow earlier court rulings without considering whether those decisions were correct or fit the case at hand. However, strict adherence to previous decisions can result in grave injustices, and the law has tools to eliminate such injustices.

Predictive models — unless revised — turn into a self-perpetuating system.

Datasets — unless reviewed — contribute to a toxic cycle of biases.

Human choice — unless given a place to exist — turns into illusion.

Brave New World

Predictive models are no longer just technologies that help automate data analysis, but a form of decision-making. This poses a number of challenges — control, agency, accountability. It also raises the question of what area is left for purely human decision-making.

What is it about predictive models that makes people surrender, waiving their power to make ultimate decisions?

Using the model as an intermediary between people might create a sense of neutrality and impartiality.

However, that sense might be false: the datasets the models use are abstract, but strongly linked to human culture. Plus, the model’s rationality might not fit our moral expectations.

Time will tell whether replacing the self-critical judgments of reason with predictive models is appropriate.

False God

John F. Kennedy, the 35th US President, once said that life is unfair. Many cannot disagree.

I Have Nothing To Do About It

Nobody, or only a few, cares how something works as long as it works “good”. Predictive models come into focus when someone adversely affected by them claims the models are “bad” — biased.

One might say that predictive models are unfair because they judge your behaviour by factors outside your control, like the credit scores of your neighbours and the ratings of political activists whose posts you retweeted.

Do predictive models have anything to do with fairness?

It depends on what you understand by fairness.

Things get more complicated when trying to answer —

  • what does fairness mean in the models’ context?
  • what is the difference between the technical and non-technical understandings of fairness?
  • can fairness be embedded into the models, and if so, how?

Play Fair

Fairness is a fragmented concept. Many disciplines have extensively researched fairness, but, as far as I can see, we are still far from reaching common ground.

For all I know, fairness is about treatment, and it is multidimensional. Among the dimensions of fairness are —

  • process fairness — the process by which the outcomes are achieved; and
  • outcome fairness — the outcomes achieved.

In Due Process We Believe

Do predictive models match the general public’s notion of due process?

It’s true that the models formalise decision-making by use of algorithms. But this is not enough to constitute due process.

The models have a number of characteristics that call into question their ability to provide due process. For example, the models can —

  • be based on inaccurate or otherwise invalid information
  • be affected by preconceptions
  • lack opportunities to modify or appeal decisions
  • fail to match or accommodate the values of the affected individuals
  • be complex to the extent that any attempt to trace their rationale and reasoning is challenged, and
  • be opaque due to restricted or denied access to the relevant information.

These characteristics contribute to the viewpoint that the ratings produced by the models are, by default, unregulated, arbitrary, and — eventually — unfair.

Unfreedom From Bias

Humans suffer from biases; so do predictive models.

How can biases emerge in the models? For instance, through —

  • datasets used to train and test the models
  • objectives of the models
  • human-assigned priorities for data analysis
  • false proxy attributes
  • algorithmic reinforcement of stereotypes and preferences
  • limitations of the programming languages
  • improper design and logic of the models, or
  • feedback loops (a minimal simulation of this last one follows).
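
Here is what a feedback loop can look like, as a toy simulation with invented numbers (two districts with identical true behaviour; the allocation rule and rates are my assumptions):

    # A toy feedback loop: inspections go where past incidents were recorded,
    # but incidents are only recorded where inspectors actually look,
    # so the records keep confirming the initial disparity.
    import numpy as np

    rng = np.random.default_rng(1)
    true_rate = 0.10                 # both districts offend at the same rate
    recorded = np.array([5.0, 6.0])  # district B starts with one extra record

    for _ in range(10):
        # The "model": allocate 100 inspections proportionally to past records.
        inspections = (recorded / recorded.sum()) * 100
        # New records depend on where we looked, not on behaviour.
        recorded += rng.binomial(inspections.astype(int), true_rate)

    print(recorded)  # district B now "looks" riskier, though behaviour was equal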

It’s up to you to decide which side to take: the side holding that data reflects the objective truth of the world through empirical observations, or the side holding that data can inherit past discriminatory practices. Ultimately, this is about whether the technology eliminates human bias or camouflages it.

Do not throw the baby out with the bathwater: the models have tremendous potential to discriminate against humans, but turning the models blind to sensitive features (like age, race, or gender) might degrade their accuracy and, thus, their usefulness.

Quality Is Never An Accident

Only data that meet the quality criteria make it into the analysis. But what is data quality about?

Data quality is about eliminating missing, inconsistent, or incorrect data, fixing syntax errors, converting data types, and examining outliers for accuracy.
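
In practice, that cleaning looks roughly like this (a pandas-based sketch; the column names, values, and rules are invented for illustration):

    # Typical data-quality fixes: repair syntax, convert types, fill gaps,
    # and drop impossible outliers.
    import pandas as pd

    df = pd.DataFrame({
        "income": ["30000", "32,000", None, "80000"],  # strings, one missing
        "age": [25, 27, 260, 45],                      # 260 is clearly an error
    })

    df["income"] = (
        df["income"]
        .str.replace(",", "", regex=False)  # fix syntax errors
        .astype(float)                      # convert data types
    )
    df["income"] = df["income"].fillna(df["income"].median())  # fill missing data
    df = df[df["age"].between(0, 120)]  # examine and drop impossible outliers

    print(df)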

Data quality is not about identifying and extracting data that represent, or can potentially represent, biases or serve as a ground for discriminatory patterns.

To add to the point, data quality criteria are determined solely by the model’s developer, generally rest outside the public’s oversight, and appear to say nothing about fairness.

Other data criteria include security and privacy, but — again — appear to ignore biases.

One With The Law As A Majority

Predictive models create the rules, but appear to negate the common and civil law doctrine of fair process and fair outcomes.

To illustrate that statement, consider the following observations:

  1. The law leans strongly toward fairness. The models are run to favour profit, not fairness.
  2. The law is designed to process data that are (potentially) uncountable, (potentially) unmeasurable, interpretable, subjective, and situational — like fairness. The models feed on data that can be measured and counted.
  3. The law is engineered to value fairness and has an appropriate infrastructure of legal principles, notions, actors, and procedures. The models lack any comparable infrastructure that might be required to determine and maintain fairness in the sense in which the law defines it.
  4. The law presumes innocence. The models presume the inevitable repetition of past patterns.
  5. The law traces fairness through the lenses of disparate impact (indirect discrimination) and disparate treatment (direct discrimination). Machine learning researchers define fairness in terms of statistical parity (sketched below) — it is fair to say that statistical parity has little in common with what legal practitioners understand by justice.
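
For the curious, this is roughly what statistical parity measures (a sketch with invented numbers, not a statement of any legal standard):

    # Statistical parity: do two groups receive positive outcomes at similar
    # rates? Toy data: 1 = loan approved, 0 = denied.
    group_a = [1, 1, 0, 1, 0, 1, 1, 0]  # approval rate 5/8
    group_b = [1, 0, 0, 1, 0, 0, 1, 0]  # approval rate 3/8

    rate_a = sum(group_a) / len(group_a)
    rate_b = sum(group_b) / len(group_b)

    # The statistical parity difference: zero means parity on this metric.
    print(rate_a - rate_b)  # 0.25, a gap a researcher might flag as unfair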

In view of these observations, using the models’ outcomes to make legally compelling decisions can pose certain challenges from the legal perspective. Disputes may arise regarding alleged violations of human rights, especially those linked to fairness — like the right to freedom from discrimination.

Give Us Tools

Along with variations of statistical parity, machine learning researchers suggest a number of tools to prevent undue discrimination.

Among them is a human oversight in the form of:

  • human-in-the-loop — a human actively oversees the model; and
  • human-on-the-loop — a human passively oversees the model.

It is claimed that human oversight enables speedy intervention when the model operates improperly.
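
A common implementation pattern, sketched below with invented thresholds, routes uncertain or high-stakes predictions to a human before they take effect (an illustration of the idea, not any particular system):

    # Human-in-the-loop gate: the model decides routine cases; uncertain or
    # high-stakes ones are escalated to a human reviewer.
    from dataclasses import dataclass

    @dataclass
    class Prediction:
        label: str         # the model's suggested decision
        confidence: float  # the model's own confidence, 0..1
        high_stakes: bool  # e.g. the decision is legally compelling

    def decide(pred: Prediction) -> str:
        if pred.high_stakes or pred.confidence < 0.9:  # assumed thresholds
            return "escalate_to_human"                 # a person makes the call
        return pred.label                              # the model acts alone

    print(decide(Prediction("deny_loan", confidence=0.97, high_stakes=True)))
    # -> escalate_to_human: high confidence alone does not bypass the human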

However, that claim might be doubtful: in case of extremely complex models, humans may be able to remediate, but not to prevent the model’s shortcomings. Alas, no one is smart enough.

Moreover, human intervention does not guarantee better, more reliable results. Humans are discriminatory and biased, just as predictive models are.

The good point of the loop tool — humans should exercise meaningful control over the models and should have the freedom and possibility of intervention at every moment. Using the models should not result in a zero-sum game or a responsibility gap.

Another fair point of the loop tool — humans should bear ultimate responsibility and liability for development, deployment and application of the models. More responsibility of the models should not mean less responsibility of humans. More responsibility of the models should be balanced by more responsibility of their developers and operators.

Eyeglasses

Another tool to prevent undue discrimination is increasing algorithmic transparency. This tool promotes the interpretability and explainability of the models, but its very content remains questionable.
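
As a concrete example of what explainability tooling can look like, permutation importance asks how much a model’s accuracy drops when each feature is shuffled. A sketch (the dataset and model are placeholders chosen for illustration):

    # Permutation importance: shuffle one feature at a time and see how much
    # the model's score degrades; big drops mark features the model leans on.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                    random_state=0)

    # Rank features by how much shuffling them hurts held-out accuracy.
    for i in result.importances_mean.argsort()[::-1][:3]:
        print(f"feature {i}: importance {result.importances_mean[i]:.3f}")

Techniques like this explain the model’s behaviour rather than its inner workings, which is exactly where the questions below begin.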

What is actually meant by algorithmic transparency?

  1. In terms of confidentiality, transparency might be a problem not of the model, but of the rights holders who deny third parties access to the model and of the regulation that provides poor or no lawful tools to access the model.
  2. In terms of injustice, transparency might be a problem not of the model, but of the developers who produce biased models, the users who reinforce discrimination through the application of biased models, and the policymakers who address and mitigate the respective risks inappropriately.
  3. In terms of complexity, transparency might be a problem not of the model, but of humans whose expertise is insufficient.
  4. In terms of unreasonableness, transparency might be a problem not of the model, but, again, of humans whose view of the real world may differ from the real world.

To explain the model, you might need to explain the human decision-making process. However, isn’t it true that humans frequently have no idea why they made certain decisions? Isn’t it true that the models can be as opaque as humans? Isn’t it true that the models are no more opaque than humans?

Another set of concerns arises from the scope of transparency:

  • what (datasets, algorithms, metrics, etc.) should be explained and made transparent?
  • is it possible to explain the model without revealing its proprietary components?
  • to whom (general public, public authorities, experts, etc.) should the model be interpretable and explainable?
  • given that humans have different levels of expertise, should the explanation be tailored to the respective audience and stakeholders?
  • how can we prevent turning transparency and explainability of the model into justification of the model?

Finally, under certain circumstances, the decision-making process itself can be irrelevant; what comes into focus is its outcome. Do we actually need to know how the model works in order to impose liability for the adverse effects the model causes?

Look Under The Hood

In Lewis Carroll’s Alice in Wonderland, little Alice decides to check first whether the bottle is marked “poison” before drinking from it, even though the bottle very clearly says “Drink me”. Sometimes we are not as wise as Alice, and prefer to use things before carefully reading their instructions.

A Kind Of Magic

Some say predictive models are beyond the reach of the human mind. The models can exceed the understanding of those who use them, of those who oversee them, and even of those who develop them.

Humans end up dealing with opaque and obscure things, or — even worse — a somehow interconnected set of opaque and obscure things.

Is it true that a predictive model is a black box?

That depends on what you understand by understanding a predictive model.

Consider what needs to be understood. The hypothesis? The inner logic? The design? The likely outcomes? Perhaps, as Agent Smith from The Matrix said, we are asking the wrong questions.

U Means Unanswered

The black box argument articulates at least three dimensions of the models’ opaqueness:

  1. Organisational: those who create and use the models have no or minimal transparency commitments in terms of how the models function, and those who are subject to the models’ application may long stay unaware of it.
  2. Technical: it is challenging — especially for people lacking the necessary hard skills — to understand why the models deliver the respective results, whether the models work against certain interests, and whether the models have the capacity to scale, i.e. to grow exponentially.
  3. Legal: copyright, commercial secrecy and other legal protection tools may prevent access to the sources of knowledge that are necessary to review the models — hardware, software, and documentation.

The black box argument is catchy and insightful, but it leaves its supporters weaponless in the face of predictive models, their developers, owners, and users. It is easier to say that something is unknowable than to dive into the problem and seek solutions.

Secret Of Secrets

What actually constitutes a secret? Many things can — e.g., code, algorithms, hypotheses, the knowledge and expertise of the developer team’s members, datasets.

Is it possible that the very same model produces different outcomes upon using a different dataset or upon being operated by different team members? Sure.

In addition, predictive models might challenge the very concept of commercial secrecy. Consider how the following issues should be addressed:

  • does a model that is unknown to its developer for technological reasons meet the secrecy criteria, according to which the secret holder should undertake reasonable efforts to maintain the secrecy?
  • is a model that is inherently unknowable protected by the secrecy regulations by default?
  • does the right to gain benefits from proprietary algorithms outweigh the right to be free from discrimination?

Beyond The Veil

It might appear counterintuitive, but studying lines of code sheds little, if any, light on what predictive models do. Instead of the model’s code, other things — like the pattern-discovery mechanism, datasets, features, and hypotheses — can determine the model’s outcomes.

Take note that the data analysis process includes, but is not limited to, analysing and mining data. (Analysing and mining data essentially means extracting, analysing, and manipulating data to understand trends, identify correlations, and find patterns.)

Before analysing and mining data, the developer should —

  • understand the problem (where we are) and desired result (where we want to be)
  • set a metric (what and how will be measured)
  • gather data (determine data, data sources, and processing tools), and
  • clean data (fix quality issues).

After analysing and mining data, the developer should —

  • interpret the model’s results
  • evaluate the defensibility of the analysis and the circumstances under which it may not hold true, and
  • communicate and present findings to stakeholders, primarily business people (the sketch below compresses this whole workflow).
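
A schematic version of that workflow, with a placeholder dataset, model, and metric chosen purely for illustration:

    # A condensed predictive-modelling workflow, step by step.
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    # 1. Problem and metric: predict disease progression, judge by error size.
    X, y = load_diabetes(return_X_y=True)

    # 2. Gather and clean: this toy dataset arrives already cleaned.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # 3. Analyse and mine: fit a model that induces a rule from past patterns.
    model = LinearRegression().fit(X_tr, y_tr)

    # 4. Interpret and evaluate: does the analysis hold on unseen cases?
    error = mean_absolute_error(y_te, model.predict(X_te))

    # 5. Communicate findings to stakeholders.
    print(f"average prediction error: {error:.1f} units")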

The above shows that the field for exploring the models is vast.

Search For Clues

Contrary to the black box argument, reality does not leave us weaponless.

Is it possible to scrutinise the model even without access to its proprietary components (like code and algorithms)?

Yes; if you have no lawful access to the model’s proprietary components, do not give up and search for side roads.

In the end, it is not code and algorithms that bother the general public, but —

  • the data used by the model
  • the outcomes generated by the model, and
  • the way those outcomes are used, especially in the course of making legally compelling decisions.

So, in the absence of access to the proprietary components, it might be fruitful to shift the focus from the model’s inner life to the model’s external life.
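
One way to study that external life is a black-box audit: query the deployed model with inputs that differ in only one attribute and compare the outcomes. A minimal sketch, where score() stands in for a hypothetical model we can call but not inspect:

    # Black-box audit: no access to code or weights, only to inputs and outputs.
    def score(income: float, age: int, neighbourhood: str) -> float:
        # A stand-in for an opaque model (in a real audit, a remote service).
        return 0.9 if neighbourhood == "A" else 0.6

    # Probe: identical applicants who differ only in neighbourhood.
    applicant = {"income": 30_000, "age": 25}
    outcomes = {n: score(**applicant, neighbourhood=n) for n in ("A", "B")}

    # {'A': 0.9, 'B': 0.6}: the gap itself is the finding, evidence that a
    # factor outside the applicant's control drives the score.
    print(outcomes)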

In World War Z, an American horror movie, the world was overrun by zombies, and no one knew the origin of the deadly virus or who patient zero was. Former UN field agent Gerry Lane ran around the world in search of a clue. Fortunately for mankind, Gerry found that clue. The clue was the virus’s weakness — something that was always in sight, but remained unnoticed. Everybody looked, but did not see, except Gerry.

Predictive models are a product not of a superior outer-space intelligence, but of human beings — brilliant, yet tremendously imperfect products of mother nature. To understand predictive models, search for and study their “weaknesses” — the humans who develop, exploit, and use the models.

What to look at? Look at the issues that can be explored without recourse to proprietary and other protected information.

Explore the story:

  • what are the developer’s objectives?
  • what (business or research) problem does the developer seek to solve by making the model?
  • what does the developer expect the model to predict?

Explore the data:

  • what kind of data is used?
  • what are the data requirements?
  • where does the data come from?
  • how is the data analysed?

Explore the application:

  • what questions does the model answer?
  • how, where, and when is the model deployed?
  • what are the key takeaways for the general public from the model?
  • what role does the model play in the decision-making process?
  • what decisions are made by, or by use of, or based on, the model?
  • is the model under human oversight?

(As regards the proprietary components, I will explore their issues in a separate article.)

Brotherhood

Do predictive models need a big brother — tailored legal rules and authorised watchdogs?

The debate is open. The issues that are currently discussed include:

  • given that predictive models are self-reinforcing systems, who regulates whom — humans the models or models the humans?
  • is it necessary to regulate the models at all?
  • in what contexts and to what extent is it necessary to regulate the models?
  • what are the appropriate regulatory tools — statutory regulations, or soft law, or both?

Despite the ongoing debate, the mainstream view favours establishing a regulatory framework and oversight.

Why so? The reason might be that people like rules. Knowing what the rules are promises security, stability, and predictability of outcomes.

Another reason might be that predictive models are too important not to be regulated. The models expose people to a risk, and that risk requires proper public control and mitigation.

What constitutes public control? At least regulation, inspection, and compliance. The regulation is expected to be uniform. The inspection is expected to be supplemented with sensible fees. The compliance is expected to be moderate.

However, the modern patchwork of regulatory regimes specifically targeting predictive models is inconsistent, incomplete, and contradictory. It is unclear whether some of those regimes do more to cure or to harm. It is unclear what their ultimate goal is — to promote technologies or to hinder innovation. In addition, forum shopping emerges thanks to jurisdictions proposing flexible, convenient, or no regulation.

Newborn

Once they emerge, predictive models — like any other innovation — are not excluded from the general regulatory framework.

Almost every legal system has —

  • the rules of general nature — like human rights
  • the rules from which general principles can be deduced — like anti-discrimination laws, and
  • the rules that apply regardless of underlying technologies — like antitrust regulations.

Those rules potentially apply to any innovation, and predictive models are no exception.

The question is: are those general regulations capable of applying to predictive models?

Good Vs Bad

General regulations can be either ordinary or good in terms of their adaptability to innovations.

Ordinary regulation is primarily based on the current knowledge of a technology and on past experiences with that technology. Good regulation meets both of those criteria, but also adapts — automatically or with help from legislators or judges — to a new technology, encapsulating it into the existing set of rules.

When a new technology emerges, ordinary regulation either lags behind the innovation or — according to the precautionary principle or other concerns — has to forbid it. By contrast, when a new technology emerges, good regulation auto-fills the legal landscape for the innovation.

What constitutes good regulation? In my opinion, the regulation should —

  • convey common sense and good argument
  • be technologically neutral
  • be precise and clear
  • be reasonably flexible and open to intelligent interpretation, and
  • be supported with consistent law making, application and enforcement.

Technological neutrality means that the rules —

  • cover and uniformly apply to the use of automated decision-making systems, whether based on common statistical tools or advanced machine learning algorithms, and
  • enable practitioners to tailor high-level rules to new technologies, even those whose appearance was not anticipated at the time the rules were drafted.

My preference is for adaptable and technologically neutral general regulations. They meet the challenge of whatever the future might bring, and at the same time offer familiarity and security amid a fast-moving pace of life. I believe being up-to-date is not about when the rules were adopted, but about how the rules operate. I advocate for future-proof rules.

However, one size does not fit all. Some choose another path — implementing new regulations specifically targeting predictive models.

Vision Is Everything

When it comes to setting new rules dedicated to predictive models, one first needs to decide what the rules should stimulate the models’ developers, owners, and other stakeholders to do and not to do.

For example, the rules can stimulate —

  • to develop regularly updated models using fairly represented data
  • to make the models’ objectives, assumptions, and conclusions legally accessible without undue burden
  • to oversee the models in real time, and/or
  • to ensure the intervention by humans or human-made programs whenever necessary to prevent the models from doing harm.

However, it appears that lawmakers and practitioners sometimes fail to complete that task: they frequently miss, incorrectly mix, or misstate the ultimate goals and instruments of the new regulations.

Be A Teen

As James Bond wisely noted, youth is no guarantee of innovation. I could not agree more when it comes to the new regulations about predictive models.

New does not mean perfect. New does not mean better. New does not mean you are saved from falling behind the realities. In fact, new regulations regulate only after the fact.

The new regulations frequently suffer from imperfection and incompleteness — due to political preferences, technological ignorance, lack of consensus, whatever.

As of today, predictive models are not generally forbidden, and certain of their components — like data processing — are specifically regulated.

Regulating data processing addresses privacy concerns, but appears to be a partial and, thus, imperfect solution. The new regulations usually stay blind to other model-related issues, like the liability of operators, the professional qualifications of developers, model quality certification, and the user’s right to redress.

Take note of what bothers the general public: is it the fact that data about them is extracted and analysed somehow, or the fact that the data analysis produces certain outcomes ultimately influencing their lives?

Those who write and apply the new regulations should consider that —

  • the regulation should not create an excessive burden
  • the regulatory purposes should be transparent
  • the regulatory scope should be clearly outlined
  • the standards should be achievable with a medium level of resources
  • the target audience should be explicitly determined, and
  • the distribution of rights, obligations, and responsibilities among stakeholders should be well-grounded.

A Wolf In Sheep’s Clothing

Whatever the regulation is — existing or new, general or specific, — the regulation should not forbid or hinder innovation.

A healthy compromise should be sought and established among all the stakeholders of predictive models.

A more sensible and proportionate approach should be taken so that the rules can be adapted to any future innovative technologies.

Along with the rules, practice is key. The rules do no good if their application is dubious, incompetent, or inconsistent. Watchdogs are useless if they can make no sense of the documentation submitted to them by the models’ developers. Legal practitioners are useless if they are unable to differentiate predictive models from descriptive models.

To create a predictable regulatory environment, it is necessary —

  • to avoid and remediate the arbitrariness in the rules’ application, and
  • to increase the awareness, knowledge, and proficiency in fundamental sciences and technologies (math, statistics, logic, algorithmic thinking, programming, machine learning, etc.) among those who apply the rules.

Reaction

Each regulation is based on a number of facts. However, technology risk is increasingly beyond human understanding, which makes establishing the facts hardly possible.

Facts about new technologies —

  • may be difficult to empirically establish
  • may be highly contested, even among experts in the field
  • may be hindered by the lack of an adequate sample or other reliable data on the effects of such technologies
  • may go unnoticed due to a lack of the information, experience, or imagination needed to predict negative possibilities, and
  • may be distorted by the concerns of entrenched interests about new (and commercially threatening) technologies.

Consequently, any state regulation of disruptive technologies — like predictive models — is increasingly going to be reactive and based on an uncertain and politicised factual basis.

This recalls the state of affairs some authors call the pacing problem, regulatory disconnection, or the challenge of regulatory connection: technology develops faster than the corresponding regulation, with the latter hopelessly falling behind.

In the absence of adequate and timely governmental regulatory models, private actors — those who develop and deploy new technologies — take the lead and create the initial regulatory framework by coding, writing algorithms, and adding features. This means that rules come predominantly from the private sector rather than the public sector. The private sector acts, the public sector reacts.

The End Is The Beginning

I favour technological progress and advancement. (It might not clearly follow from the above, so I explicitly state it here.)

Predictive models are a dual-use technology, capable of delivering benefits and causing harm. However, that fact does not constitute a valid reason for abolishing the development and use of predictive models. That fact does call for discussion.


Originally published on https://dearall.medium.com/driving-forward-looking-backward-reflections-on-predictive-models-6922a8fb05d9.


Disclaimer: This is my personal blog. This is neither a legal opinion nor a piece of legal advice. The opinions I express in this blog are mine, and do not reflect opinions of any third party, including employers. My blog is not an investment advice. I do not intend to malign or discriminate anyone. I reserve the right to rethink and amend the blog at any time, for any or no reason, without notice.