Preparing for Machine Learning

Aniket Kumar

While machine learning and artificial intelligence (AI) are often equated, they are distinct. Both offer incredible promise for uncovering relations in data, but both raise many ethical challenges as they become widely deployed.

Nick Bostrom, in Superintelligence: Paths, Dangers, Strategies, classifies different types of AI on a spectrum from low-level intelligence to superintelligence. In this sense, AI refers, by analogy with human intelligence, to a computer's general capability to exercise common sense, to plan its way through complex information-processing challenges across domains, and to learn. One notable advance in the field is AlphaGo, developed by Google's AI lab DeepMind to master the ancient Chinese game of Go. In 2016, AlphaGo defeated Lee Sedol, an 18-time world champion.

Machine learning, on the other hand, is narrower: it is mainly used to uncover relations in data. It takes two general forms, supervised learning and unsupervised learning. In the examples that follow, the goal is to classify data. Under supervised learning, the machine is told during training how each piece of data is classified; that is, the categories are known before processing. It then constructs a model whose rules best account for those classifications using other characteristics of the data.

Imagine I give the machine a set of superheroes to classify as either “human” or “alien.” I want the machine to come up with rules, based on the heroes' other characteristics, that make sense of this classification. The machine may hazard a few guesses. Guess #1: the deciding criterion is super strength. Guess #2: the deciding criterion is laser vision. In each case, it is trying to build a path from the superheroes' existing characteristics to the label “human” or “alien.”
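The superhero example can be sketched as a tiny supervised learner. This is a minimal sketch, not any real system's method: the heroes, their features, their labels, and the helper names `rule_accuracy`, `learn_rule`, and `predict` are all hypothetical. The learner tests each candidate criterion (each "guess") against the known labels and keeps the one that fits best, a technique often called a one-rule or decision-stump classifier.

```python
# A minimal supervised-learning sketch: a one-rule ("decision stump") classifier.
# All heroes, features, and labels below are hypothetical, invented for illustration.

# Each example: (name, features, known label). The labels are given up front,
# which is what makes this *supervised* learning.
TRAINING_SET = [
    ("Hero A", {"super_strength": True, "laser_vision": True}, "alien"),
    ("Hero B", {"super_strength": True, "laser_vision": False}, "alien"),
    ("Hero C", {"super_strength": False, "laser_vision": False}, "human"),
    ("Hero D", {"super_strength": True, "laser_vision": False}, "human"),
    ("Hero E", {"super_strength": False, "laser_vision": True}, "alien"),
]


def rule_accuracy(feature, data):
    """Share of examples where the rule 'feature present -> alien, else human' matches the label."""
    correct = 0
    for _name, features, label in data:
        guess = "alien" if features[feature] else "human"
        correct += guess == label
    return correct / len(data)


def learn_rule(data):
    """Score every candidate criterion (every 'guess') and keep the best-fitting one."""
    candidates = data[0][1].keys()
    scores = {feature: rule_accuracy(feature, data) for feature in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def predict(rule_feature, features):
    """Apply the learned rule to a new, unlabeled hero."""
    return "alien" if features[rule_feature] else "human"


best, scores = learn_rule(TRAINING_SET)
print(scores)  # {'super_strength': 0.6, 'laser_vision': 0.8}
print("Learned rule: alien if", best)  # Learned rule: alien if laser_vision
print(predict(best, {"super_strength": False, "laser_vision": True}))  # alien
```

On this invented data, laser vision explains the labels better than super strength (4 of 5 heroes versus 3 of 5), so the learner adopts Guess #2. Real systems search far richer rule spaces, but the principle of fitting rules to known labels is the same.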

Under unsupervised learning, the machine is given a training set without classifications. It must create its own classifications for the data. For superheroes, the machine may place them in categories based on their costume colour, their superpowers, their intelligence, or their values.
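Unsupervised learning can be sketched with k-means clustering, one common technique among many; it is a swapped-in illustration, not a method the article specifies. The hero names and the two numeric ratings per hero are hypothetical. No labels are supplied: the algorithm invents its own groups by repeatedly assigning each hero to the nearest group centre and then moving each centre to the average of its members.

```python
import math

# Hypothetical, unlabeled heroes: (strength rating, intelligence rating) on an invented 0-10 scale.
HEROES = {
    "Hero A": (9.0, 3.0),
    "Hero B": (8.5, 2.5),
    "Hero C": (2.0, 9.0),
    "Hero D": (3.0, 8.0),
    "Hero E": (9.5, 4.0),
}


def kmeans(points, k, iterations=10):
    """Plain k-means: returns a cluster index for each point; no labels are ever given."""
    # Simple deterministic start: use the first k points as the initial centres.
    centres = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose centre is nearest.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points
        ]
        # Update step: move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


names = list(HEROES)
clusters = kmeans(list(HEROES.values()), k=2)
for name, cluster in zip(names, clusters):
    print(name, "-> group", cluster)
```

On this data the heroes split into a high-strength group (A, B, E) and a high-intelligence group (C, D). The group numbers carry no meaning of their own: the machine finds the groups, but it is up to a person to interpret them.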

While the applications to superheroes are endless, human resources companies are already deploying machine learning to evaluate real people. Restless Bandit, a talent recruitment company, is in this business. According to Fortune, Restless Bandit used 102 million job descriptions and 30 million resumes to build its machine learning model. Its clients are generally companies with more than 1,000 employees and at least 25,000 to 50,000 resumes in their applicant tracking systems. Restless Bandit's software sorts through the resumes already on file and makes hiring recommendations.

Such algorithm-based screening can help rid the hiring process of human biases. However, the algorithms themselves can also be biased, depending on the data used to train them.

For example, suppose a tech company uses machine learning to build an algorithm that captures its past hiring pattern. If the inputs include each applicant's gender, the machine could find that most successful candidates were men and then use gender as a criterion when evaluating future candidates. The existing underrepresentation of women in the tech sector would thus become the machine's basis for discriminating against women.
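The mechanism can be sketched with a deliberately naive model that scores each applicant by the historical hire rate of past applicants who shared their inputs. This is a minimal sketch under stated assumptions: the records are invented, and `features`, `train`, and `score` are hypothetical helpers, not how any real screening product works.

```python
from collections import defaultdict

# Hypothetical past hiring records: (gender, has_cs_degree, was_hired).
# The numbers are invented to mimic a company that has mostly hired men.
HISTORY = [
    ("man", True, True), ("man", True, True), ("man", True, True),
    ("man", True, False), ("man", False, True),
    ("woman", True, True), ("woman", True, False), ("woman", True, False),
    ("woman", False, False),
]


def features(gender, has_degree, use_gender):
    """Build the input key the model sees; gender is included only if use_gender is True."""
    return (gender, has_degree) if use_gender else (has_degree,)


def train(history, use_gender):
    """Learn the past hire rate for every combination of inputs."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number hired, number of applicants]
    for gender, has_degree, hired in history:
        key = features(gender, has_degree, use_gender)
        counts[key][0] += hired
        counts[key][1] += 1
    return {key: hired / total for key, (hired, total) in counts.items()}


def score(model, gender, has_degree, use_gender):
    """Score a new applicant by the learned hire rate for their inputs."""
    return model[features(gender, has_degree, use_gender)]


with_gender = train(HISTORY, use_gender=True)
without_gender = train(HISTORY, use_gender=False)

# Two applicants identical except for gender, both with a CS degree.
for model, flag in [(with_gender, True), (without_gender, False)]:
    man = score(model, "man", True, flag)
    woman = score(model, "woman", True, flag)
    print(f"use_gender={flag}: man={man:.2f}, woman={woman:.2f}")
```

With gender as an input, the equally qualified woman scores 0.33 against the man's 0.75, purely because of who was hired in the past. Dropping the gender input equalizes the two scores in this toy data; in real data, other inputs can be correlated with gender, so removing one field is not by itself a guarantee of fairness.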

Discrimination in algorithms could have other troubling policy implications, as The Economist notes regarding the expanding use of big data in financial services. Much like Restless Bandit, financial firms can use the swathes of data they hold on their clients to spot patterns, and those patterns may reveal more than clients would want the firms to know. The machine may find, for instance, that people from a certain ethnic group are more likely to default on their loans, and thus become biased against them. Insurance companies already predict losses and set premiums without machine learning; for example, they predict that people living in certain neighbourhoods are more likely to be in car accidents.

Yet with machine learning, financial firms may start offering ever more finely differentiated services. People whose backgrounds suggest a higher probability of default may face higher interest rates on their debt, while people with “better” backgrounds may see lower ones.

Canada will need to decide how such discrimination by machine learning systems should be handled. There are economic arguments to be made here: such “differentiation of services” is already practised by insurance companies. If you have diabetes in the U.S., you may pay more for some kinds of insurance, despite a healthy diet. But because machine learning can spot far finer-grained patterns than ever before, this will be less “differentiation of services” than discrimination. Is it fair that someone may pay more for the same loan because they come from a neighbourhood of defaulters? Should a person be so easily profiled through this new technology and forced to pay for things outside their control? The Algorithmic Justice League, featured in Bloomberg Businessweek, is concerned with such issues inherent in machine learning, and so should we be.

Aniket is a 2018 Master in Public Policy candidate at the University of Toronto’s School of Public Policy and Governance. He studied philosophy and English literature at UBC. His policy interests include education, urban, economic, and technology policy.