Journal
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Volume 43, Issue 1, Pages 187-203
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2019.2927476
Keywords
Cross-modal; deep learning; cooking recipes; food images
Funding
- CSAIL-QCRI collaboration projects
- Spanish Ministerio de Economia y Competitividad [TEC2013-43935R, TEC2016-75976-R]
- European Regional Development Fund
This paper introduces Recipe1M+, a large-scale corpus of cooking recipes and food images, and demonstrates how training neural networks on this data can improve image-recipe retrieval. Regularization through the addition of a high-level classification objective not only enhances retrieval performance but also enables semantic vector arithmetic.
In this paper, we introduce Recipe1M+, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe1M+ affords the ability to train high-capacity models on aligned, multimodal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Moreover, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M+ dataset and food and cooking in general. Code, data and models are publicly available.
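To make the two ideas in the abstract concrete, retrieval in a joint embedding space and semantic vector arithmetic, the following is a minimal sketch rather than the authors' model. It assumes recipes and images have already been mapped into a shared space; the toy vectors and the names `l2_normalize`, `retrieve`, `recipe_emb`, and `image_emb` are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve(query_emb, candidate_emb, k=1):
    """For each query row, return indices of the k most similar candidates."""
    sims = l2_normalize(query_emb) @ l2_normalize(candidate_emb).T
    # Negate so that argsort (ascending) ranks the highest similarity first.
    return np.argsort(-sims, axis=1)[:, :k]


# Hypothetical toy data: 3 recipes and 3 images in a shared 4-d space.
# A trained model would produce these vectors; here they are hand-made.
recipe_emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
image_emb = recipe_emb + 0.1  # each image lies close to its paired recipe

# Image-to-recipe retrieval: each image should rank its own recipe first.
top1 = retrieve(image_emb, recipe_emb, k=1)[:, 0]

# Vector arithmetic: take image 0, subtract recipe 0, add recipe 1,
# then look up the nearest recipe to the resulting point.
arith_query = (image_emb[0] - recipe_emb[0] + recipe_emb[1])[None, :]
arith_top1 = retrieve(arith_query, recipe_emb, k=1)[0, 0]
```

With real embeddings, the same `retrieve` call could be used in either direction (image to recipe or recipe to image) by swapping the query and candidate arrays.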