How Facebook put AI at the heart of its social network

Detection of inappropriate content, newsfeed ranking, facial recognition... The platform makes massive use of machine learning and deep learning.

Artificial intelligence (AI) is present at every level of Facebook’s social network. At the heart of the newsfeed, it prioritizes content based on users’ interests, their browsing history and their social graph. Similarly, it lets the platform push the advertisements users are most likely to respond to, from the click through to the purchase. A harder problem: the platform also uses AI to detect unauthorized logins and fake accounts. Finally, it orchestrates other tasks that are less visible but remain key to the day-to-day running of the social network: personalizing the ranking of search results, identifying friends in photos (by facial recognition) to suggest tagging them, and handling speech recognition and text translation to automate the captioning of videos in Facebook Live.

Target function | Algorithm family | Type of model
Facial recognition, content labeling | Deep learning | Convolutional neural network
Detection of inappropriate content, unauthorized access, classification | Machine learning | Gradient-boosted decision trees
Personalization of newsfeed, search results, advertisements | Deep learning | Multilayer perceptron
Natural language understanding, translation, speech recognition | Deep learning | Recurrent neural network
Matching users | Machine learning | Support vector machine
Source: Facebook research publication

The social network makes extensive use of standard machine learning techniques: statistical algorithms (classification, regression, etc.) suited to building predictive models from numerical data, for example to forecast changes in activity. Facebook uses them to find inappropriate messages, comments and photos. “Artificial intelligence is an essential component for protecting users and filtering information on Facebook,” insists Yann LeCun, vice president and chief AI scientist of the group. “A series of techniques are used to detect hateful, violent, pornographic or propaganda content in images or texts, and, on the contrary, to label the elements likely to

Multiple neural networks

Alongside traditional machine learning, Facebook naturally also relies on deep learning. Based on the concept of the artificial neural network, this technique applies to numbers as well as to audio or graphic content. The network is divided into layers, each responsible for interpreting the results of the previous layer. The AI thus refines its analysis from one layer to the next. In text analysis, for example, the first layer will deal with the detection of letters, the second with words, the third with noun or verb phrases, and so on.
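The layer-by-layer principle described above can be illustrated with a minimal sketch of a multilayer perceptron forward pass. This is not Facebook’s code: the layer sizes, the random weights and the helper names (dense_layer, forward, random_layer) are assumptions chosen purely for illustration, and the network is untrained.

```python
import random

def dense_layer(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs, then a ReLU."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        activation = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, activation))  # ReLU keeps only positive signals
    return outputs

def forward(inputs, layers):
    """Pass the input through each layer in turn.

    Each layer only sees the output of the layer before it.
    """
    signal = inputs
    for weights, biases in layers:
        signal = dense_layer(signal, weights, biases)
    return signal

def random_layer(n_inputs, n_outputs, rng):
    """Hypothetical helper: build an untrained layer with random weights."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases

rng = random.Random(0)  # fixed seed so the sketch is reproducible
layers = [
    random_layer(4, 5, rng),  # layer 1: 4 raw inputs -> 5 features
    random_layer(5, 3, rng),  # layer 2: 5 features -> 3 features
    random_layer(3, 2, rng),  # layer 3: 3 features -> 2 outputs
]
output = forward([0.2, -0.1, 0.7, 0.5], layers)
print(output)
```

In a real system the weights would be learned from labeled examples rather than drawn at random; the sketch only shows how each layer consumes the previous layer’s output.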

Unsurprisingly, Facebook applies deep learning to facial recognition, notably via convolutional neural networks, a type of model particularly well suited to image processing. Via the multilayer perceptron, well suited to ranking tasks, the social network uses it to personalize the newsfeed and the results of its search engine. Finally, Facebook also applies deep learning to machine translation.
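To give an idea of why convolutional networks suit images, here is a minimal sketch of the basic operation they repeat many times: sliding a small filter over an image to produce a feature map. The toy image, the hand-written edge filter and the function name convolve2d are illustrative assumptions; in a trained network the filter values are learned, not written by hand.

```python
def convolve2d(image, kernel):
    """2D sliding-window filter without padding, as computed in CNN layers."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Multiply the filter with the image patch under it and sum up.
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        result.append(row)
    return result

# Toy 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the image switches from dark to bright, which is the kind of local pattern the first layers of a convolutional network pick up.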

To recognize text embedded in images, Facebook uses three neural networks in sequence. © Capture / JDN

For image processing in particular, Facebook has built a deep learning platform called Rosetta. It is based on three neural networks (see screenshot above). The first, a convolutional network, maps the features of the photo. The second detects the region or regions containing characters. The third tries to recognize the words, expressions or sentences present within the identified regions. The approach builds on a technique known as Faster R-CNN, which Facebook has slightly adapted to its needs. The objective: better characterize the images posted in order to optimize their indexing, whether in the newsfeed or the search engine.

A deployment pipeline

To orchestrate its various AI systems, the American group relies on a homegrown pipeline, code-named FBLearner (for Facebook Learner). It starts with an app store: for internal teams of data scientists, it brings together a catalog of reusable functionality for both the training and the deployment of algorithms. A workflow component is then added to manage the training of the models and evaluate their results. Training jobs can run on CPU clusters as well as on clusters of graphics accelerators (GPUs), again designed in-house. The final building block is an environment that runs the predictive models, once trained, in production at the heart of Facebook’s applications.

Workflow built by Facebook to develop and deploy its machine learning and deep learning models. © Capture / JDN

As for deep learning libraries, Facebook has historically chosen to develop two. Both are now available as open source. Designed for its fundamental research needs, the first, PyTorch, is characterized by its great flexibility, its advanced debugging capabilities and especially its dynamic neural network architecture. “Its structure is not determined and fixed, it evolves as learning and training examples are presented,” says Yann LeCun. The downside: the Python execution engine under the hood makes PyTorch inefficient for production applications. Conversely, the second deep learning framework designed by Facebook, Caffe2, was designed precisely for production deployments. In this context,

More recently, Facebook has set up a tool called ONNX to semi-automate the conversion to Caffe2 of models originally created in PyTorch. “The next step will be to merge PyTorch and Caffe2 into a single framework that will be called PyTorch 1.0,” says Yann LeCun. “The goal is to benefit from the best of both worlds, and end up with a flexible infrastructure for research and efficient compilation techniques to produce usable AI applications.”

Towards a processor tailored for AI

Following the model of Google’s Tensor Processing Units (TPUs), tailored to its TensorFlow deep learning framework, Facebook is also planning to develop its own chips optimized for deep learning. “Our users upload 2 billion photos a day to Facebook, and within 2 seconds each is handled by four AI systems: the first filters inappropriate content, the second labels images for their integration into the newsfeed and the search engine, the third handles facial recognition, and the last generates a description of the images for the blind. All of this consumes gigantic computing and electrical resources. With the real-time translation of video that we are now looking to develop, the problem will intensify.
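As a rough order of magnitude, the figures quoted above (2 billion photos a day, four AI systems per photo) can be turned into an average load per second. The short calculation below is a back-of-the-envelope sketch: it assumes uploads are spread evenly over the day and that each system runs exactly once per photo, neither of which is stated in the quote.

```python
# Figures taken from the quote above.
PHOTOS_PER_DAY = 2_000_000_000
AI_SYSTEMS_PER_PHOTO = 4

# Assumption: uploads are spread evenly over 24 hours (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60

photos_per_second = PHOTOS_PER_DAY / SECONDS_PER_DAY
# Assumption: each of the four systems processes every photo exactly once.
model_runs_per_second = photos_per_second * AI_SYSTEMS_PER_PHOTO

print(f"{photos_per_second:,.0f} photos per second on average")
print(f"{model_runs_per_second:,.0f} model runs per second on average")
```

Under these assumptions, the average comes to roughly 23,000 photos and 93,000 model runs every second, before accounting for traffic peaks.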