Could ‘expiration dates’ for AI systems help prevent bias?

Today’s AI technology, much like humans, learns from examples. AI systems are developed on datasets containing text, images, audio, and other information that serve as ground truth. By figuring out the relationships between these examples, AI systems gradually “learn” to make predictions, like which word is likely to come next in a sentence or whether the objects in a picture are animate or inanimate.

The technique holds up remarkably well in the language domain, for example, where systems like OpenAI’s GPT-3 can write everything from essays to advertisements in human-like ways. But, as with people, an AI system that isn’t supplied with fresh data eventually grows stale in its predictions, a phenomenon known as model drift.

The problem of drift

Model drift is at best inconvenient. Asking GPT-3, which was trained on data from two years ago, a question about the 2022 Oscars won’t yield anything useful. But oftentimes, drift is a business liability. At Zillow, a misfiring algorithm led the company to overestimate the value of the houses it purchased in late 2021 by upwards of $500 million.

The humanitarian impact of “stale” models is graver still, with studies finding that models engineered to predict hospital mortality can quickly drift off course due to changing patient populations. Economists have shown, meanwhile, that the algorithms used to determine credit scores tend to be less precise for underrepresented minorities and low-income groups, owing to the sparser — and less frequent — data in those groups’ credit histories.

A hypothetical solution to the problem of drift is continual learning, an approach to AI development where systems frequently retrain on new data. But continual learning is an unsolved science. Developers alternatively can — and often do — periodically refresh models. However, this, too, is easier said than done, because not all systems are susceptible to drift in the same fashion. For instance, an AI that recommends content to users (e.g., videos) or detects fraud likely needs to adapt more quickly than a system tasked with distinguishing cats from dogs.
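
As a rough illustration of what “keeping up” can mean in practice, a team might compare the statistics of recent production inputs against the data a model was trained on and trigger a refresh when they diverge. The sketch below is a minimal, hypothetical Python example; the Kolmogorov-Smirnov test, the threshold, and the pipeline hook are stand-ins for whatever checks and processes a given system actually uses.

```python
# A minimal sketch of drift monitoring: compare recent production inputs
# against the training data, feature by feature, and flag retraining when
# the distributions diverge. Names and thresholds are illustrative.
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # assumed policy: flag drift when p < 0.01 on any feature

def needs_retraining(train_features: np.ndarray, recent_features: np.ndarray) -> bool:
    """Return True if any feature's recent distribution differs from training."""
    for i in range(train_features.shape[1]):
        _, p_value = ks_2samp(train_features[:, i], recent_features[:, i])
        if p_value < DRIFT_P_VALUE:
            return True
    return False

# Hypothetical usage: a scheduled job pulls the last week of inputs and,
# if drift is detected, kicks off the team's own retraining pipeline.
# if needs_retraining(X_train, X_last_week):
#     trigger_retraining_pipeline()  # placeholder, not a real library call
```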

“The question is: How quickly can we adapt and retrain machine learning models? Not only do the models need to be rebuilt or redesigned based on new data, but they also need the right processes to be put into production at a pace that keeps up,” Dan Simion, VP of AI and analytics for Capgemini North America, told VentureBeat in a previous interview.

A new proposal — inspired by food labels — is giving models an “expiration date,” or a point at which they “expire” and stop serving predictions before drift can skew them. AI models with an expiration date could automatically notify a developer that they need to be retrained, or trigger the retraining themselves, kicking off a process to determine the causes of the model’s staleness.

Expiration dates for models

“Essentially, [an] expiration date might act as an additional failsafe to ensure the model and data are not kept around for use after they should be deleted,” Abishek Kumar, an AI research scientist at Google, told VentureBeat via email. “Data expiration is a big issue, especially in reinforcement learning-based AI models, where new data gets ‘sucked in’ regularly. In models trained only once, data and model expiration are one and the same.”

But while expiration mechanisms make sense in theory, figuring out how to implement them is another matter. The creator of a model would first have to decide on a deadline for the model to expire. Then, if they made the model public, they’d have to determine how to prevent users from disabling the mechanism.
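
To make the mechanics a bit more concrete, an expiration date could, in the simplest case, live in the serving layer rather than in the model itself: a wrapper checks the deadline on every prediction and alerts the owner or refuses to serve once it passes. The snippet below is a hypothetical sketch with made-up names, not an existing framework, and it illustrates exactly the weakness Fu describes next: anyone with the raw model weights can simply bypass the wrapper.

```python
# A hypothetical "expiring model" wrapper: after the expiry date, the model
# stops serving predictions and signals that retraining is due.
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable

class ModelExpiredError(RuntimeError):
    pass

@dataclass
class ExpiringModel:
    predict_fn: Callable[[Any], Any]   # the underlying model's predict function
    expires_on: date                   # deadline chosen by the model's creator
    on_expire: Callable[[], None]      # e.g., notify owners or start retraining

    def predict(self, inputs: Any) -> Any:
        if date.today() >= self.expires_on:
            self.on_expire()
            raise ModelExpiredError(
                f"Model expired on {self.expires_on}; retrain before serving."
            )
        return self.predict_fn(inputs)

# Hypothetical usage:
# model = ExpiringModel(trained_model.predict, date(2023, 3, 1), alert_ml_team)
```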

“Model expiration dates raise a host of other questions, like: Why does the creator of the model decide? What factors do they consider? How do they standardize the process of deciding when to set the deadline?,” Andre Fu, an undergraduate deep learning researcher at the University of Toronto, told VentureBeat in a recent interview. “At the platform level, the implementation is pretty complex — when a model gets loaded, it’s effectively just zeros and ones, so how do we prevent the ‘expiration mechanism’ from getting stripped? When sending models between friends or for internal use, we traditionally don’t package it and wrap it in an API for execution, so if those types of models got leaked, then it wouldn’t have those expiration methods we want.”

These dilemmas have already reared their ugly heads in the dataset realm, where some actors continue to use outdated, biased, or otherwise problematic data to train AI even after the data has been disavowed. A Princeton analysis found that retractions and disclaimers did little to prevent companies from using flawed datasets to train their systems. The analysis also spotlighted how “derivatives” and offshoots of datasets have been used to circumvent usage restrictions.

Fu thinks that blockchain technology might be a viable way to implement an AI model expiration mechanism, albeit not without a fair amount of development legwork. He envisions creating a nonfungible token (NFT) of a model — essentially, a model with a unique code — and distributing access to the model via an API to prevent reverse engineering.

“The transparent nature of blockchain would inhibit some uses, so using a transaction-private network … would enable the privacy preserving features better. Furthermore … since this area is nascent, the tools to actually create this idea have yet to be developed, but it is theoretically feasible,” Fu said. “I do think that having an expiration for a model is an interesting idea. [B]ut it’d require a lot of front-loaded cost and introduce friction for developers that would translate to poor adoption and eventually relegating the mechanism to the backroom of engineering.”

Kumar also believes that blockchain technology could be used to enforce expiration dates at the platform level, and that at the model level, expiration dates could be implemented by training a model to output specific values after a number of uses. Likely, a combination of techniques will be required, he says — particularly in cases where a system uses multiple models to arrive at its predictions.
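
The trained-in variant Kumar describes, where the model itself learns to change its outputs after a certain number of uses, is hard to show without a full training loop, but the serving-layer analogue is simple to sketch. The counter, budget, and sentinel below are illustrative assumptions, not a description of any deployed system.

```python
# A sketch of usage-based expiration at the serving layer: once the model has
# answered a fixed number of queries, it returns a sentinel instead of a
# prediction, signaling that it should be retrained or retired.
from typing import Any, Callable, Optional

class UsageLimitedModel:
    def __init__(self, predict_fn: Callable[[Any], Any], max_uses: int = 100_000):
        self.predict_fn = predict_fn
        self.max_uses = max_uses  # illustrative budget
        self.uses = 0

    def predict(self, inputs: Any) -> Optional[Any]:
        self.uses += 1
        if self.uses > self.max_uses:
            return None  # sentinel: caller should treat the model as expired
        return self.predict_fn(inputs)
```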

“[S]olutions such as ‘AI forgetting’ (an emerging research area) allow an AI model to unlearn knowledge from certain training data while retaining knowledge from the remaining training data, thus potentially allowing the expiration of certain data without full retraining (which is costly),” Kumar said. “Already, [laws like the European Union’s] GDPR gives us the right to retract our data from services. The AI model expiration mechanism would take this one step further in providing more control [over] how our data and data-derived models are used.”

Sainbayar Sukhbaatar, a research scientist at Meta (formerly Facebook), explored a data-focused “expiration” approach with a system that “deletes” stale data in an AI architecture called the Transformer. Dating back to 2017, the Transformer has become the architecture of choice for language tasks, where it’s demonstrated an aptitude for summarizing documents, translating between languages, and analyzing DNA sequences. Sukhbaatar’s work showed that Transformers could be architected to “forget” incoming pieces of information by predicting how long they should stay in “memory,” similar to how people tend to retain only the most salient memories.
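
In rough terms, the mechanism gives each piece of information in the model’s memory a predicted lifespan and masks it out of attention once that lifespan lapses. The sketch below is a heavily simplified, NumPy-only illustration of that idea with made-up shapes; in Sukhbaatar’s work, the span predictor is learned end to end along with the rest of the Transformer.

```python
# A simplified sketch of "expiring memory": each past hidden state gets a
# predicted span (how many steps it should stay in memory), and memories whose
# span has lapsed are masked out of attention. Shapes and the linear
# projection here are illustrative, not the published architecture.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def expiration_mask(hidden: np.ndarray, w: np.ndarray, b: float,
                    max_span: int, current_step: int) -> np.ndarray:
    """hidden: (seq_len, dim) past hidden states; returns a boolean keep-mask."""
    seq_len = hidden.shape[0]
    spans = max_span * sigmoid(hidden @ w + b)   # predicted lifetime per memory
    ages = current_step - np.arange(seq_len)     # how long each memory has existed
    return ages <= spans                         # keep only unexpired memories

# Expired positions would then be dropped (or masked out) before the attention
# softmax, so the model attends only to the memories it chose to keep.
```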

“[T]here are potential roles for expiration in both [the model and dataset] contexts, and they have different motivations,” Sukhbaatar told VentureBeat via email. “Models can have expiration so they can scale to process more information and retain only critical knowledge, while datasets should have expiration to stay up to date and relevant. Of course, when data is replaced, it’s common practice to train a new model from scratch to reflect that change, and … model retraining can play an important role in making sure models are trained on the most updated and complete datasets.”

Bhuwan Dhingra, an assistant professor of computer science at Duke University, has also investigated “expiring” models from a dataset perspective. He and coauthors found that language models can be “calibrated” to be “refreshed” as new data arrives without the need for retraining from scratch.

“Current models do not have any notion of time, since they are trained on static datasets where all data is treated as homogeneous. For such models, the ‘expiration’ mechanism will likely need to be at the API level,” Dhingra said. “However, a more interesting notion of expiration would be one where the model calibration (i.e., its confidence when making predictions) itself indicates when expiration is due. If we can build such calibrated models, then the expiration date could be informed by the model itself.”
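
One way to operationalize that idea, sketched below with illustrative names and an assumed threshold, is to compare a calibrated model’s average confidence on recent traffic against its confidence on held-out data from training time, and to treat a sustained drop as a sign that expiration is due.

```python
# A sketch of confidence-based expiration for a calibrated classifier:
# compare average top-class confidence on recent traffic to the confidence
# observed on held-out data at training time. The margin is an assumption.
import numpy as np

CONFIDENCE_MARGIN = 0.10  # illustrative: flag when confidence drops by 10 points

def mean_confidence(probabilities: np.ndarray) -> float:
    """Mean top-class probability over a batch of softmax outputs (n, classes)."""
    return float(probabilities.max(axis=1).mean())

def is_expired(holdout_probs: np.ndarray, recent_probs: np.ndarray) -> bool:
    """True if the model is markedly less sure on recent inputs than at training time."""
    return mean_confidence(recent_probs) < mean_confidence(holdout_probs) - CONFIDENCE_MARGIN
```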

Drawbacks and limitations

Many fundamental technical issues stand in the way of “self-expiring” AI models. For example, self-expiring datasets might cause an AI model to “forget” certain key pieces of information, a phenomenon known as catastrophic forgetting. For this reason, even if expiration mechanisms come to fruition, Kumar emphasizes that biases in data should be acknowledged and that AI shouldn’t replace humans in “tasks that require more thorough analysis.”

“We are seeing the platform economy become more and more AI-driven with increased automation. Thus, it is very important that we are able to understand, explain, and validate those processes that relate to our data,” Kumar said. “To prevent potential harmful outcomes in future, instead of treating ‘fairness’ as a metric to be evaluated at a certain phase of the AI model lifecycle, it should be treated as an iterative process which acknowledges that ‘fairness’ is never fully solved. [The] cost and benefits of the AI model for different subgroups and tradeoffs should be explicitly predetermined, and then, an explicit choice should be made to situate the mode[l] in the space of tradeoffs considering cost and benefits. If the model drifts from the planned trade-off space, it could be considered ‘expired.’”

Dhingra points out that, while training models on data from the past can be problematic in terms of bias, more data typically leads to better models. Instead of expiring datasets, he suggests focusing on developing “fairness measures” that can be applied to both old and new data.
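
Dhingra doesn’t prescribe specific measures, but one simple, hypothetical example is tracking per-group accuracy gaps separately on older and newer slices of data; a gap that widens on the new slice would be a signal to intervene even if the dataset itself never formally “expires.”

```python
# A hypothetical fairness check: compute accuracy per subgroup on an old data
# slice and a new one, and compare how wide the worst-case gap has become.
# Group labels and the notion of "gap" are illustrative choices.
import numpy as np

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy for each subgroup label present in `groups`."""
    return {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}

def accuracy_gap(y_true, y_pred, groups) -> float:
    """Difference between the best- and worst-served subgroups."""
    accs = per_group_accuracy(np.asarray(y_true), np.asarray(y_pred), np.asarray(groups))
    return max(accs.values()) - min(accs.values())

# A gap that is small on an older slice of data but much larger on a recent one
# would be evidence that the model (or its data) needs attention, whether or
# not a formal expiration date exists.
```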

“The main consideration, as I mentioned above, is to ensure that models ‘know when they don’t know.’ Our work suggests there is some promise [in] using historical data to improve model calibration in the future,” Dhingra said. “If successful, this would impact any domain where distribution shift is a concern, i.e., where the input data changes with time.”

Fu, too, asserts that energy would be better spent on promoting greater awareness of data, models, and their strengths and shortcomings. “[Engineers are] highly tuned into state-of-the-art models and what the next step in the technological evolution will be,” he said. “[A]dding friction for the sake of an idealistic goal won’t correlate to real-world results. I’d propose [a] communal effort around chastising the actors — fellow engineers — that use stale models, stale data, and more[,] in an effort to curb these potentially malicious actions.”
