Journal
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Volume 43, Issue 1, Pages 187-203
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2019.2927476
Keywords
Cross-modal; deep learning; cooking recipes; food images
Funding
- CSAIL-QCRI collaboration projects
- Framework of projects of the Spanish Ministerio de Economia y Competitividad [TEC2013-43935R, TEC2016-75976-R]
- European Regional Development Fund
This paper introduces Recipe 1M+, a large-scale corpus of cooking recipes and food images, and demonstrates how training neural networks on this data can improve image-recipe retrieval tasks. Regularization through the addition of a high-level classification objective not only enhances retrieval performance but also enables semantic vector arithmetic.
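Once images and recipes share one embedding space, image-recipe retrieval reduces to a nearest-neighbour search, and semantic vector arithmetic means adding and subtracting embeddings before that search. The sketch below illustrates both ideas with pure Python and cosine similarity. It is an illustration under stated assumptions, not the authors' code: the 4-dimensional vectors, their axis meanings (pizza, lasagna, chicken, vegetables) and the names `retrieve` and `analogy` are hypothetical, whereas real embeddings would be learned and far higher-dimensional.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query, candidates, k=1):
    """Return the names of the k candidates most similar to `query`."""
    ranked = sorted(candidates, key=lambda name: cosine(query, candidates[name]),
                    reverse=True)
    return ranked[:k]

def analogy(a, b, c):
    """Semantic vector arithmetic: element-wise a - b + c."""
    return [x - y + z for x, y, z in zip(a, b, c)]

# Hypothetical image embeddings; axes: [pizza, lasagna, chicken, vegetables].
images = {
    "chicken_pizza":   [1.0, 0.0, 1.0, 0.0],
    "veggie_pizza":    [1.0, 0.0, 0.0, 1.0],
    "chicken_lasagna": [0.0, 1.0, 1.0, 0.0],
    "veggie_lasagna":  [0.0, 1.0, 0.0, 1.0],
}

# Recipe-to-image retrieval: a hypothetical embedding of a chicken pizza recipe.
recipe_query = [0.9, 0.1, 0.8, 0.1]
top_image = retrieve(recipe_query, images)

# Arithmetic: chicken pizza - veggie pizza + veggie lasagna.
shifted = analogy(images["chicken_pizza"], images["veggie_pizza"], images["veggie_lasagna"])
analogy_result = retrieve(shifted, images)
```

In this toy space, `top_image` is `["chicken_pizza"]` and `analogy_result` is `["chicken_lasagna"]`. Cosine similarity ignores vector length, so only the direction of each embedding affects the ranking.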
In this paper, we introduce Recipe 1M+, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe 1M+ makes it possible to train high-capacity models on aligned, multimodal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Moreover, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe 1M+ dataset, and of food and cooking in general. Code, data and models are publicly available.
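One common way to combine a retrieval objective with a high-level classification regularizer is to add a pairwise cosine embedding loss to a cross-entropy term from a single classifier shared by both modalities. The pure-Python sketch below shows that combination for one image-recipe pair. It is an assumption-laden illustration, not the paper's published formulation: the function names, the margin of 0.1, the weight `lam=0.02` and the toy 2-class classifier are hypothetical and chosen for readability.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cosine_embedding_loss(img, rec, is_match, margin=0.1):
    """Pull matching pairs together; push non-matching pairs below `margin`."""
    sim = cosine(img, rec)
    if is_match:
        return 1.0 - sim
    return max(0.0, sim - margin)

def linear(weights, x):
    """Hypothetical shared classifier: one weight row per semantic category."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax_cross_entropy(logits, label):
    """Numerically stable -log softmax(logits)[label]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def joint_loss(img, rec, is_match, img_label, rec_label, weights,
               lam=0.02, margin=0.1):
    """Retrieval loss plus lam times the classification regularizer."""
    retrieval = cosine_embedding_loss(img, rec, is_match, margin)
    # The same weights classify both modalities, so both embeddings are
    # encouraged to place the same food category in the same region.
    reg = (softmax_cross_entropy(linear(weights, img), img_label)
           + softmax_cross_entropy(linear(weights, rec), rec_label))
    return retrieval + lam * reg

# Toy usage: an identical, correctly labelled matching pair.
weights = [[1.0, 0.0], [0.0, 1.0]]
loss = joint_loss([1.0, 0.0], [1.0, 0.0], True, 0, 0, weights)
```

For this pair the cosine term is zero, so `loss` equals `0.02 * 2 * (log(1 + e) - 1)`, roughly 0.0125. Sharing one classifier across modalities is the design choice that lets the classification objective act as a regularizer on the joint space rather than as a separate task.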