Microsoft Facial Recognition Data Base

From Privacy Wiki
Short Title: Microsoft Facial Recognition Database Contains 10 Million Photos
Location: Global
Date: 2016

Solove Harm: Aggregation, Identification, Secondary Use, Increased Accessibility
Information: Identifying, Physical Characteristics
Threat Actors: IBM Corp., Panasonic Corp., Alibaba Group Holding Ltd., Nvidia Corp., Hitachi Ltd., SenseTime, Megvii, US government, China's People's Liberation Army, Microsoft

Affected: Potentially anybody who has pictures of themselves online
High Risk Groups:
Tangible Harms:

In 2016, Microsoft published a large, publicly available facial recognition database that included 10 million photos of nearly 100,000 people. The database was used by many commercial organizations as well as state agencies.


The Microsoft facial recognition database, known as MS Celeb, was published in 2016 and described by the company as the largest publicly available facial recognition data set in the world. It contains more than 10 million images of nearly 100,000 individuals, collected from various sources across the internet.

Facial recognition technology is an example of Identification, since the algorithm is trained to identify people in photos.

Aggregation is also a concern: to identify someone, the technology matches a picture of the individual against the photos in the database, along with links to where those photos appeared. This not only identifies the individual but can also reveal other personal information about them, such as contact details and location.

Microsoft’s MS Celeb data set has been used by several commercial organizations, including IBM, Panasonic, Alibaba, Nvidia, Hitachi, SenseTime, and Megvii. It was also used by state actors, including the US government and the Chinese military.

The people whose photos were used were not asked for their consent. Their images were scraped from search engines and videos under the terms of a Creative Commons license that allows academic reuse of photos. The subsequent use of the photos by commercial organizations and state agencies is an example of Secondary Use.

This dataset is reported to be the largest publicly available facial recognition database. Such public availability of the data can be seen as Increased Accessibility.

Although Microsoft has deleted the database, it remains available to researchers and companies that had previously downloaded it. According to media reports, it is still being shared on open-source websites.

Laws and Regulations