Python Implementation of a ClipCap-Based Image Caption Model (.zip)

Uploader: sheziqiong | Uploaded: 2024-05-27 21:05:53 | File size: 5.62 MB | File type: ZIP
Contents: design report (Word) plus source code and data. Image Caption is the task of "describing a picture": given an image, generate a natural-language description of it. The task spans two modalities, vision and natural language. Both semantic spaces are very large and separated by a wide semantic gap, so aligning them is the central challenge. This project introduces the paper ClipCap: CLIP Prefix for Image Captioning, reproduces its experiments on the Flickr30k Chinese dataset, and presents the results. For a detailed write-up see: https://biyezuopin.blog.csdn.net/article/details/125617468
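To make the alignment idea concrete, below is a minimal, hypothetical PyTorch sketch of the ClipCap mapping network described in the paper: a CLIP image embedding is projected by a small MLP into a fixed-length sequence of prefix embeddings that GPT-2 can consume as a soft prompt. The class name ClipCapMLP and the dimensions are illustrative assumptions, not taken from the repository's model.py.

# Minimal sketch of the ClipCap mapping idea (illustrative, not the repo's API).
import torch
import torch.nn as nn


class ClipCapMLP(nn.Module):
    """Maps a CLIP image feature to `prefix_length` GPT-2 input embeddings."""

    def __init__(self, clip_dim: int = 512, gpt_dim: int = 768, prefix_length: int = 10):
        super().__init__()
        self.prefix_length = prefix_length
        self.gpt_dim = gpt_dim
        hidden = (gpt_dim * prefix_length) // 2
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, gpt_dim * prefix_length),
        )

    def forward(self, clip_features: torch.Tensor) -> torch.Tensor:
        # clip_features: (batch, clip_dim) -> prefix: (batch, prefix_length, gpt_dim)
        batch = clip_features.size(0)
        return self.mlp(clip_features).view(batch, self.prefix_length, self.gpt_dim)


if __name__ == "__main__":
    # Toy forward pass with random features standing in for real CLIP output.
    dummy = torch.randn(2, 512)
    prefix = ClipCapMLP()(dummy)
    print(prefix.shape)  # torch.Size([2, 10, 768])

The paper also proposes a Transformer-based mapping network as an alternative to the MLP (the repository's images/mlp.jpg and images/transformer.jpg correspond to the two variants).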


Resource Details

[{"title":"( 54 个子文件 5.62MB ) Python实现基于ClipCap的看图说话Image Caption模型.zip","children":[{"title":"设计报告.docx <span style='color:#111;'> 1.26MB </span>","children":null,"spread":false},{"title":"models","children":[{"title":"model.py <span style='color:#111;'> 5.06KB </span>","children":null,"spread":false}],"spread":true},{"title":"images","children":[{"title":"mlp.jpg <span style='color:#111;'> 71.81KB </span>","children":null,"spread":false},{"title":"transformer.jpg <span style='color:#111;'> 98.32KB </span>","children":null,"spread":false},{"title":"loss.jpg <span style='color:#111;'> 222.00KB </span>","children":null,"spread":false},{"title":"caption_distribution.jpg <span style='color:#111;'> 28.25KB </span>","children":null,"spread":false},{"title":"overview.jpg <span style='color:#111;'> 147.50KB </span>","children":null,"spread":false}],"spread":true},{"title":"statistics.py <span style='color:#111;'> 1.12KB </span>","children":null,"spread":false},{"title":"pretrain_models","children":[{"title":"gpt2","children":[{"title":"vocab.txt <span style='color:#111;'> 106.97KB </span>","children":null,"spread":false},{"title":"config.json <span style='color:#111;'> 577B </span>","children":null,"spread":false}],"spread":true},{"title":"bert","children":[{"title":"vocab.txt <span style='color:#111;'> 106.97KB </span>","children":null,"spread":false},{"title":"tokenizer.json <span style='color:#111;'> 262.64KB </span>","children":null,"spread":false},{"title":"config.json <span style='color:#111;'> 622B </span>","children":null,"spread":false}],"spread":true}],"spread":true},{"title":"train.py <span style='color:#111;'> 6.54KB </span>","children":null,"spread":false},{"title":"predict.py <span style='color:#111;'> 7.69KB </span>","children":null,"spread":false},{"title":"dataset.py <span style='color:#111;'> 3.80KB </span>","children":null,"spread":false},{"title":"output","children":[{"title":"test","children":[{"title":"caption_generate_finetune.txt <span style='color:#111;'> 9.08KB </span>","children":null,"spread":false},{"title":"caption_generate_no_finetune.txt <span style='color:#111;'> 9.11KB </span>","children":null,"spread":false}],"spread":true},{"title":"mlp_finetune_gpt2","children":[{"title":"events.out.tfevents.1647762890.982e5fd2-217a-4002-9edc-eb66e13cb88d.79186.0 <span style='color:#111;'> 13.76KB </span>","children":null,"spread":false}],"spread":true},{"title":"dev","children":[{"title":"caption_generate_finetune.txt <span style='color:#111;'> 2.99KB </span>","children":null,"spread":false},{"title":"caption_generate_no_finetune.txt <span style='color:#111;'> 3.02KB </span>","children":null,"spread":false}],"spread":true},{"title":"bert_no_finetune_gpt2","children":[{"title":"events.out.tfevents.1647763019.982e5fd2-217a-4002-9edc-eb66e13cb88d.79422.0 <span style='color:#111;'> 14.27KB </span>","children":null,"spread":false}],"spread":true}],"spread":true},{"title":"LICENSE <span style='color:#111;'> 1.05KB </span>","children":null,"spread":false},{"title":"process_caption.py <span style='color:#111;'> 1019B </span>","children":null,"spread":false},{"title":"process_flickr.py <span style='color:#111;'> 1.80KB </span>","children":null,"spread":false},{"title":"requirements.txt <span style='color:#111;'> 117B </span>","children":null,"spread":false},{"title":"README.md <span style='color:#111;'> 15.95KB </span>","children":null,"spread":false},{"title":"scripts","children":[{"title":"predict_no_finerune_gpt2.sh <span style='color:#111;'> 495B 
</span>","children":null,"spread":false},{"title":"train_finetune_gpt2.sh <span style='color:#111;'> 492B </span>","children":null,"spread":false},{"title":"train_no_finetune_gpt2.sh <span style='color:#111;'> 474B </span>","children":null,"spread":false},{"title":"predict_finerune_gpt2.sh <span style='color:#111;'> 511B </span>","children":null,"spread":false}],"spread":false},{"title":"datasets","children":[{"title":"test","children":[{"title":"50292297228_5c260d7dd9_b.jpg <span style='color:#111;'> 34.71KB </span>","children":null,"spread":false},{"title":"global-card-lego.png <span style='color:#111;'> 68.19KB </span>","children":null,"spread":false},{"title":"51249416246_26e7bcee71_b.jpg <span style='color:#111;'> 32.71KB </span>","children":null,"spread":false},{"title":"51324931128_0a4e482944_b.jpg <span style='color:#111;'> 16.77KB </span>","children":null,"spread":false},{"title":"4930864108_cd9fcb7a57_b.jpg <span style='color:#111;'> 26.48KB </span>","children":null,"spread":false},{"title":"51645058861_254767cde0_b.jpg <span style='color:#111;'> 26.31KB </span>","children":null,"spread":false},{"title":"25392547463_615de3cb70_b.jpg <span style='color:#111;'> 26.47KB </span>","children":null,"spread":false},{"title":"25167669554_839ac583a6_b.jpg <span style='color:#111;'> 23.82KB </span>","children":null,"spread":false},{"title":"48690120836_4824a12e6d_b.jpg <span style='color:#111;'> 27.92KB </span>","children":null,"spread":false},{"title":"51259608793_5bdda24605_b.jpg <span style='color:#111;'> 27.89KB </span>","children":null,"spread":false},{"title":"51220776286_cba3991787_b.jpg <span style='color:#111;'> 17.68KB </span>","children":null,"spread":false},{"title":"50825859087_29f3edbd7e_b.jpg <span style='color:#111;'> 30.30KB </span>","children":null,"spread":false},{"title":"50779458317_d4e1fc51a8_b.jpg <span style='color:#111;'> 41.18KB </span>","children":null,"spread":false},{"title":"46442548385_dc00b31170_b.jpg <span style='color:#111;'> 36.36KB </span>","children":null,"spread":false},{"title":"50334773578_d5c84ed71d_b.jpg <span style='color:#111;'> 52.38KB </span>","children":null,"spread":false},{"title":"631407374_db533106dd_b.jpg <span style='color:#111;'> 42.53KB </span>","children":null,"spread":false}],"spread":false},{"title":"dev","children":[{"title":"371902.jpg <span style='color:#111;'> 51.21KB </span>","children":null,"spread":false},{"title":"256063.jpg <span style='color:#111;'> 57.22KB </span>","children":null,"spread":false},{"title":"371897.jpg <span style='color:#111;'> 68.11KB </span>","children":null,"spread":false},{"title":"301246.jpg <span style='color:#111;'> 56.00KB </span>","children":null,"spread":false},{"title":"353913.jpg <span style='color:#111;'> 58.82KB </span>","children":null,"spread":false},{"title":"27860802.jpg <span style='color:#111;'> 136.88KB </span>","children":null,"spread":false}],"spread":false},{"title":"flickr_caption.txt <span style='color:#111;'> 10.09MB </span>","children":null,"spread":false}],"spread":false}],"spread":true}]

