Xirong Li (李锡荣)

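Professor and doctoral supervisor in the Department of Computer Science and Technology, Renmin University of China. His main research interests include multimedia intelligence, video retrieval, pattern recognition, and AI-assisted diagnosis. He has published over a hundred papers in major international journals and conferences in these fields, including TPAMI, TMM, TKDE, TCSVT, CSUR, Pattern Recognition, ACM TOMM, JBHI, ACMMM, CVPR, ICCV, ECCV, MICCAI, AAAI, IJCAI, WWW, and ACL, with more than 5,700 citations on Google Scholar. His awards include the Best Paper Award of the ACM International Conference on Image and Video Retrieval (CIVR 2010), the IEEE TMM 2012 Prize Paper Award, the ACM SIGMM 2013 Outstanding PhD Thesis Award, the ACMMM 2016 Grand Challenge Award, the Outstanding Paper Award of the 2017 China Multimedia Conference, and the Second Prize in Natural Science of the 2022 CCF Science and Technology Award. He has served as Program Co-Chair of MMM 2021, a major conference in the multimedia field, and on the editorial boards of the international journals ACM TOMM, Multimedia Systems, and IET Computer Vision.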




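*** Education ***

2007.08 - 2012.03, PhD, Intelligent Systems Lab Amsterdam, University of Amsterdam, the Netherlands
2005.09 - 2007.06, Master's, Department of Computer Science, Tsinghua University
2001.09 - 2005.07, Bachelor's, Department of Computer Science, Tsinghua University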

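*** Work Experience ***

2022.08 - present, Professor, Renmin University of China

2016.08 - 2022.08, Associate Professor, Renmin University of China

2012.05 - 2016.08, Lecturer, Renmin University of China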

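*** Research Interests ***

* Image/video search
* Multimodal deep learning
* Medical image analysis
* Multimedia content security
* Computer vision and pattern recognition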


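*** Talks ***

2023.10 Recent advances in Chinese short-video content understanding, Renmin University of China Digital Intelligence Workshop, 4th session, Beijing

2023.03 SuperRetina: Universal fundus image matching based on semi-supervised deep learning, 2023 Annual Meeting of the Ophthalmologist Branch of the Beijing Medical Doctor Association, Beijing

2022.10 Fusion, generation & knowledge: Key elements in multimedia intelligence, ACMMM 2022 panel on groundbreaking multimedia research directions, Lisbon

2022.06 Multi-modal fundus image synthesis, AI Ophthalmology Summit Forum of the Ophthalmologist Branch of the Beijing Medical Doctor Association, Beijing

2021.09 Multi-modal multi-instance learning for fundus disease recognition, Summit Forum of the AI Subcommittee, Ophthalmologist Branch of the Beijing Medical Doctor Association, Beijing

2021.01 Deep multiple instance learning with spatial attention for ROP case classification, instance selection and abnormality localization, ICPR 2020, online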
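2020.08 Multi-modal deep learning for fundus disease recognition, Summit Forum of the AI Subcommittee, Ophthalmologist Branch of the Beijing Medical Doctor Association, Beijing
2019.11 Learn to represent queries and videos for ad-hoc video search, TRECVID 2019 workshop, Gaithersburg
2019.10 Deep learning for video retrieval by natural language, keynote talk at the 1st International Workshop on Fairness, Accountability, and Transparency in Multimedia
2019.09 Artificial intelligence methods for ophthalmic image understanding, Zhejiang Gongshang University, Hangzhou
2019.08 Deep learning techniques for fundus images, 2019 Beijing Ophthalmology Congress (Peking Union Medical College Hospital Ophthalmology Forum), Beijing
2019.04 Dual encoding for zero-example video retrieval, the 14th Conference on Image and Graphics Technologies and Applications, Beijing
2019.01 Deep models for zero-example video retrieval, ITI-CERTH, Greece
2018.11 Recent advances in zero-example video retrieval, YOCSEF Chengdu technical seminar on frontier developments in intelligent vision, Chengdu
2018.11 Word2VisualVec for ad-hoc video search, TRECVID 2018 workshop, Gaithersburg
2018.06 Artificial intelligence and image recognition, School of Culture and Communication, Central University of Finance and Economics, Beijing
2018.05 Applications of artificial intelligence in ophthalmology, Ophthalmology E20 Summit Forum on New Equipment and New Technologies, Hangzhou
2018.04 Deep learning based image content recognition, CFDA Center for Medical Device Evaluation, Beijing
2017.11 Multi-scale Word2VisualVec for video caption retrieval, TRECVID 2017 workshop, Gaithersburg
2017.10 Opportunities and challenges of artificial intelligence: Image content understanding as an example, 2017 Summit Forum on Ophthalmic Imaging and Informatics, Shanghai
2016.11 Word2VisualVec for video-to-text matching and ranking, TRECVID 2016 workshop, Gaithersburg
2016.10 Tag embeddings for multimedia retrieval and description, SIGMM Rising Stars Symposium 2016, Amsterdam
2016.04 Recent advances in image sentence generation, Peking University Forum on Language, Logic, Cognition and Computation (LLCC), Beijing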

*** Courses ***

1. Pattern Recognition and Computer Vision
2. Data Structures
3. Practical Python Programming
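
*** Research Projects ***

[9] National Natural Science Foundation of China (General Program): Research on key technologies for zero-example short-video retrieval, 2022.01-2025.12 (No. 62172420)
[8] Beijing Natural Science Foundation (General Program): Research on multi-modal interpretable deep learning for recognizing common fundus diseases, 2020.01-2022.12 (No. 4202033)
[7] Renmin University of China Decision Consulting and Pre-research Commissioned Project: Automatic Chinese-language description of multimedia content, 2018.01-2020.12 (No. 18XNLG19)
[6] National Natural Science Foundation of China (General Program): Research on several key problems in Chinese image sentence generation, 2017.01-2020.12 (No. 61672523)
[5] Open Fund of the Shanghai Key Laboratory of Intelligent Information Processing: Research on image tag relevance computation based on relevant samples, 2014.01-2015.12 (No. IIPL-2013-002)
[4] National Natural Science Foundation of China (Young Scientists Fund): Research on personalized image annotation based on weakly labeled web data, 2014.01-2016.12 (No. 61303184)
[3] Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (New Teacher category): Research on classification-based methods for estimating the relevance between social tags and images, 2014.01-2016.12 (No. 20130004120006)
[2] Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education: Research on several key problems in image retrieval on the social web, 2014.01-2015.12
[1] Renmin University of China Start-up Fund for New Teachers: Research on new methods for social-media-based image retrieval, 2013.01-2015.12 (No. 13XNLF05)
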
*** Publications ***

[75] Kaibin Tian, Ruixiang Zhao, Zijie Xin, Bangxiang Lan, Xirong Li. Holistic features are almost sufficient for text-to-video retrieval. CVPR 2024

[74] Fan Hu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Xirong Li. Tackling long code search with splitting, encoding, and aggregating. LREC-COLING 2024

[73] Aozhu Chen, Fangming Zhou, Ziyuan Wang, Xirong Li. CLIPRerank: An extremely simple method for improving ad-hoc video search. ICASSP 2024

[72] Bing Li, Huan Chen, Weihong Yu, Ming Zhang, Fang Lu, Jingxue Ma, Yuhua Hao, Xiaorong Li, Bojie Hu, Lijun Shen, Jianbo Mao, Xixi He, Hao Wang, Dayong Ding, Xirong Li, Youxin Chen. The performance of a deep learning system in assisting junior ophthalmologists in diagnosing 13 major fundus diseases: A prospective multi-center clinical trial. npj Digital Medicine, 2024

[71] Qijie Wei, Jingyuan Yang, Bo Wang, Jinrui Wang, Jianchun Zhao, Xinyu Zhao, Sheng Yang, Niranchana Manivannan, Youxin Chen, Dayong Ding, Jing Zhou, Xirong Li. Supervised domain adaptation for recognizing retinal diseases from wide-field fundus images. BIBM 2023

[70] Aozhu Chen, Ziyuan Wang, Chengbo Dong, Kaibin Tian, Ruixiang Zhao, Xun Liang, Zhanhui Kang, Xirong Li. ChinaOpen: A dataset for open-world multimodal learning. ACMMM 2023

[69] Jiazhen Liu, Xirong Li. Geometrized Transformer for self-supervised homography estimation. ICCV 2023

[68] Zhihao Sun, Haoran Jiang, Danding Wang, Xirong Li, Juan Cao. SAFL-Net: Semantic-agnostic feature learning network with auxiliary plugins for image manipulation detection. ICCV 2023

[67] Fan Hu, Aozhu Chen, Xirong Li. Towards making a Trojan-horse attack on text-to-image retrieval. ICASSP 2023

[66] Jiazhen Liu, Xirong Li, Qijie Wei, Jie Xu and Dayong Ding. Semi-supervised keypoint detector and descriptor for retinal image matching. ECCV 2022

[65] Fan Hu, Aozhu Chen, Ziyue Wang, Fangming Zhou, Jianfeng Dong and Xirong Li. Lightweight attentional feature fusion: A new baseline for text-to-video retrieval. ECCV 2022

[64] Ziyue Wang, Aozhu Chen, Fan Hu and Xirong Li. Learn to understand negation in video retrieval. ACMMM 2022

[63] Jianfeng Dong, Xianke Chen, Minsong Zhang, Xun Yang, Shujie Chen, Xirong Li, Xun Wang. Partially relevant video retrieval. ACMMM 2022

[62] Chengbo Dong, Xinru Chen, Ruohan Hu, Juan Cao, Xirong Li. MVSS-Net: Multi-view multi-scale supervised networks for image manipulation detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

[61] Yue Wu, Yang Zhou, Jianchun Zhao, Jingyuan Yang, Weihong Yu, Youxin Chen, Xirong Li. Lesion localization in OCT by semi-supervised object detection. ICMR 2022

[60] Weisen Wang, Xirong Li, Zhiyan Xu, Weihong Yu, Jianchun Zhao, Dayong Ding, Youxin Chen. Learning two-stream CNN for multi-modal age-related macular degeneration categorization. IEEE Journal of Biomedical and Health Informatics (J-BHI), 2022

[59] Jianfeng Dong, Yabing Wang, Xianke Chen, Xiaoye Qu, Xirong Li, Yuan He, Xun Wang. Reading-strategy inspired visual representation learning for text-to-video retrieval. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2022

[58] Tianyun Yang, Ziyao Huang, Juan Cao, Lei Li, Xirong Li. Deepfake network architecture attribution. AAAI 2022

[57] Guang Yang, Juan Cao, Qiang Sheng, Peng Qi, Xirong Li, Jintao Li. DRAG: Dynamic region-aware GCN for privacy-leaking image detection. AAAI 2022

[56] Rui Qian, Xin Lai, Xirong Li. 3D object detection for autonomous driving: A survey. Pattern Recognition, 2022

[55] Rui Qian, Xin Lai, Xirong Li (2022): BADet: Boundary-aware 3D object detection from point clouds. Pattern Recognition (PR), 2022

[54] Xirong Li (2021): Multi-modal deep learning and its prospective applications in ophthalmic artificial intelligence. Medical Journal of Peking Union Medical College Hospital, volume 12, number 5, September 2021
[53] Xinru Chen, Chengbo Dong, Jiaqi Ji, Juan Cao, Xirong Li (2021): Image manipulation detection by multi-view multi-scale supervision. International Conference on Computer Vision (ICCV), 2021
[52] Xirong Li, Yang Zhou, Jie Wang, Hailan Lin, Jianchun Zhao, Dayong Ding, Weihong Yu, Youxin Chen (2021): Multi-modal multi-instance learning for retinal disease recognition. ACM Multimedia (ACMMM), 2021
[51] Peng Qi, Juan Cao, Xirong Li, Huan Liu, Qiang Sheng, Xiaoyue Mi, Qin He, Yongbiao Lv, Chenyang Guo, Yingchao Yu (2021): Improving fake news detection by using an entity-enhanced framework to fuse diverse multimodal clues. ACM Multimedia (ACMMM), 2021 (Industrial Track)
[50] Aozhu Chen, Fan Hu, Zihan Wang, Fangming Zhou, Xirong Li (2021): What matters for ad-hoc video search? A large-scale evaluation on TRECVID. The 2nd International Workshop on Video Retrieval Methods and Their Limits (ViRaL, in conjunction with ICCV), 2021
[49] Chengbo Dong, Xinru Chen, Aozhu Chen, Fan Hu, Zihan Wang, Xirong Li (2021): Multi-level visual representation with semantic-reinforced learning for video captioning. ACM Multimedia (ACMMM), 2021 (Grand Challenge paper)
[48] Qiang Sheng, Juan Cao, Xueyao Zhang, Xirong Li, Lei Zhong (2021): Article reranking by memory-enhanced key sentence matching for detecting previously fact-checked claims. The 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021
[47] Jianfeng Dong, Xirong Li, Chaoxi Xu, Xun Yang, Gang Yang, Xun Wang, Meng Wang (2021): Dual encoding for video retrieval by text. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
[46] Jie Wang, Kaibin Tian, Dayong Ding, Gang Yang, Xirong Li (2021): Unsupervised domain expansion for visual categorization. ACM Transactions on Multimedia Computing, Communications and Applications (TOMM), 2021
[45] Xueyao Zhang, Juan Cao, Xirong Li, Qiang Sheng, Lei Zhong, Kai Shu (2021): Mining dual emotion for fake news detection. The Web Conference 2021 (WWW), 2021
[44] Xirong Li, Fangming Zhou, Chaoxi Xu, Jiaqi Ji, Gang Yang (2021): SEA: Sentence encoder assembly for video retrieval by textual queries. IEEE Transactions on Multimedia (TMM), 2021
[43] Bing Li, Huan Chen, Bilei Zhang, Mingzhen Yuan, Xuemin Jin, Bo Lei, Jie Xu, Wei Gu, David Wong, Xixi He, Hao Wang, Dayong Ding, Xirong Li, Weihong Yu, Youxin Chen (2021): Development and evaluation of a deep learning model for the detection of multiple fundus diseases based on color fundus photography. British Journal of Ophthalmology (BJO), 2021
[42] Aozhu Chen, Xinyi Huang, Hailan Lin, Xirong Li (2020): Towards annotation-free evaluation of cross-lingual image captioning. ACM Multimedia Asia (MMAsia), 2020
[41] Xirong Li, Wencui Wan, Yang Zhou, Jianchun Zhao, Qijie Wei, Junbo Rong, Pengyi Zhou, Limin Xu, Lijuan Lang, Yuying Liu, Chengzhi Niu, Dayong Ding, Xuemin Jin (2020): Deep multiple instance learning with spatial attention for ROP case classification, instance selection and abnormality localization. The 25th International Conference on Pattern Recognition (ICPR), 2020
[40] Qijie Wei, Xirong Li, Weihong Yu, Xiao Zhang, Yongpeng Zhang, Bojie Hu, Bin Mo, Di Gong, Ning Chen, Dayong Ding, Youxin Chen (2020): Learn to segment retinal lesions and beyond. The 25th International Conference on Pattern Recognition (ICPR), 2020
[39] Jakub Lokoč, Tomáš Souček, Patrik Veselý, František Mejzlík, Jiaqi Ji, Chaoxi Xu, Xirong Li (2020): A W2VV++ case study with automated and interactive text-to-video retrieval. ACM Multimedia (ACMMM), 2020
[38] Zhengxiong Jia, Xirong Li (2020): iCap: Interactive image captioning with predictive text. ACM International Conference on Multimedia Retrieval (ICMR), 2020
[37] Yutong Liu, Jingyuan Yang, Yang Zhou, Weisen Wang, Jianchun Zhao, Weihong Yu, Dingding Zhang, Dayong Ding, Xirong Li, Youxin Chen (2020): Prediction of OCT images of short-term response to anti-VEGF treatment for neovascular age-related macular degeneration using generative adversarial network. British Journal of Ophthalmology, 2020
[36] Jianfeng Dong, Xun Wang, Leimin Zhang, Chaoxi Xu, Gang Yang, Xirong Li (2019): Feature re-learning with data augmentation for video relevance prediction. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019
[35] Xirong Li, Chaoxi Xu, Gang Yang, Zhineng Chen, Jianfeng Dong (2019): W2VV++: Fully deep learning for ad-hoc video search. ACM Multimedia (ACMMM), 2019
[34] Zhuoya Yang, Xirong Li, Xixi He, Dayong Ding, Yanting Wang, Fangfang Dai, Xuemin Jin (2019): Joint localization of optic disc and fovea in ultra-widefield fundus images. The 10th International Workshop on Machine Learning in Medical Imaging (MLMI), 2019
[33] Chaoxi Xu, Xiangjia Zhu, Wenwen He, Yi Lu, Xixi He, Zongjiang Shang, Jun Wu, Keke Zhang, Yinglei Zhang, Xianfang Rong, Zhennan Zhao, Lei Cai, Dayong Ding, Xirong Li (2019): Fully deep learning for slit-lamp photo based nuclear cataract grading. International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019 (early accept)
[32] Weisen Wang, Zhiyan Xu, Weihong Yu, Jianchun Zhao, Jingyuan Yang, Feng He, Zhikun Yang, Di Chen, Dayong Ding, Youxin Chen, Xirong Li (2019): Two-stream CNN with loose pair training for multi-modal AMD categorization. International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019 (early accept)
[31] Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang (2019): Dual encoding for zero-example video retrieval. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
[30] Xirong Li, Chaoxi Xu, Xiaoxu Wang, Weiyu Lan, Zhengxiong Jia, Gang Yang, Jieping Xu (2019): COCO-CN for cross-lingual image tagging, captioning and retrieval. IEEE Transactions on Multimedia (TMM), 2019
[29] Weiyu Lan, Xiaoxu Wang, Gang Yang, Xirong Li (2019): Tag-enhanced Chinese image captioning. Chinese Journal of Computers, 2019
[28] Xin Lai, Xirong Li, Rui Qian, Dayong Ding, Jun Wu, Jieping Xu (2019): Four models for automatic recognition of left and right eye in fundus images. The 25th International Conference on Multimedia Modeling (MMM), 2019
[27] Qijie Wei, Xirong Li, Hao Wang, Dayong Ding, Weihong Yu, Youxin Chen (2018): Laser scar detection in fundus images using convolutional neural networks. Asian Conference on Computer Vision (ACCV), 2018
[26] Xirong Li, Jianfeng Dong, Chaoxi Xu, Jing Cao, Xun Wang, Gang Yang (2018): Renmin University of China and Zhejiang Gongshang University at TRECVID 2018: Deep cross-modal embeddings for video-text retrieval. TRECVID 2018 Workshop, 2018
[25] Jianfeng Dong, Xirong Li, Chaoxi Xu, Gang Yang, Xun Wang (2018): Feature re-learning with data augmentation for content-based video recommendation. ACM Multimedia (ACMMM), 2018 (Grand Challenge paper)
[24] Gang Yang, Jinlu Liu, Jieping Xu, Xirong Li (2018): Dissimilarity representation learning for generalized zero-shot recognition. ACM Multimedia (ACMMM), 2018
[23] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi (2018): Deep text classification can be fooled. IJCAI, 2018
[22] Gang Yang, Jinlu Liu, Xirong Li (2018): Imagination based sample construction for zero-shot learning. SIGIR, 2018
[21] Jianfeng Dong, Xirong Li, Cees G. M. Snoek (2018): Predicting visual features from text for image and video caption retrieval. IEEE Transactions on Multimedia (TMM), 2018
[20] Jianfeng Dong, Xirong Li, Duanqing Xu (2018): Cross-media similarity evaluation for web image retrieval in the wild. IEEE Transactions on Multimedia (TMM), 2018
[19] Cees G. M. Snoek, Xirong Li, Chaoxi Xu, Dennis C. Koelma (2017): University of Amsterdam and Renmin University at TRECVID 2017: Searching video, detecting events and describing video. TRECVID Workshop, 2017
[18] Weiyu Lan, Xirong Li, Jianfeng Dong (2017): Fluency-guided cross-lingual image captioning. ACM Multimedia (ACMMM), 2017
[17] Qijie Wei, Xiaoxu Wang, Xirong Li (2017): Harvesting deep models for cross-lingual image annotation. CBMI, 2017
[16] Xirong Li (2017): Tag relevance fusion for social image retrieval. In: Multimedia Systems, 23 (1), pp. 29–40, 2017
[15] Cees G. M. Snoek, Jianfeng Dong, Xirong Li, Xiaoxu Wang, Qijie Wei, Weiyu Lan, Efstratios Gavves, Noureldien Hussein, Dennis C. Koelma, Arnold W. M. Smeulders (2016): University of Amsterdam and Renmin University at TRECVID 2016: Searching video, detecting events and describing video. TRECVID Workshop, 2016
[14] Jianfeng Dong, Xirong Li, Weiyu Lan, Yujia Huo, Cees G. M. Snoek (2016): Early embedding and late reranking for video captioning. ACM Multimedia (ACMMM), 2016
[13] Xirong Li, Yujia Huo, Qin Jin, Jieping Xu (2016): Detecting violence in video using subclasses. ACM Multimedia (ACMMM), 2016
[12] Xirong Li, Qin Jin (2016): Improving image captioning by concept-based sentence reranking. PCM, 2016
[11] Masoud Mazloom, Xirong Li, Cees G. M. Snoek (2016): TagBook: A semantic video representation without supervision for event detection. In: IEEE Transactions on Multimedia (TMM), 18 (7), pp. 1378-1388, 2016
[10] Xirong Li, Weiyu Lan, Jianfeng Dong, Hailong Liu (2016): Adding Chinese captions to images. ICMR, 2016
[9] Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, Alberto Del Bimbo (2016): Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. ACM Computing Surveys (CSUR), 49 (1), pp. 14:1-14:39, 2016
[8] Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, Alberto Del Bimbo (2015): Image tag assignment, refinement and retrieval. ACM Multimedia, 2015
[7] Jianfeng Dong, Xirong Li, Shuai Liao, Jieping Xu, Duanqing Xu, Xiaoyong Du (2015): Image retrieval by cross-media relevance fusion. ACM Multimedia (ACMMM), 2015
[6] Qin Jin, Xirong Li, Haibing Cao, Yujia Huo, Shuai Liao, Gang Yang, Jieping Xu (2015): RUCMM at MediaEval 2015 Affective Impact of Movies task: Fusion of audio and visual cues. In: Working Notes Proceedings of the MediaEval 2015 Workshop, 2015
[5] Xirong Li, Qin Jin, Shuai Liao, Junwei Liang, Xixi He, Yujia Huo, Weiyu Lan, Bin Xiao, Yanxiong Lu, Jieping Xu (2015): RUC-Tencent at ImageCLEF 2015: Concept detection, localization and sentence generation. CLEF (Working Notes), 2015
[4] Xirong Li, Shuai Liao, Weiyu Lan, Xiaoyong Du, Gang Yang (2015): Zero-shot image tagging by hierarchical semantic embedding. SIGIR, 2015
[3] Shuai Liao, Xirong Li, Heng Tao Shen, Yang Yang, Xiaoyong Du (2015): Tag features for geo-aware image classification. In: IEEE Transactions on Multimedia (TMM), 17 (7), pp. 1058-1067, 2015
[2] Junwei Liang, Qin Jin, Xixi He, Gang Yang, Jieping Xu, Xirong Li (2015): Detecting semantic concepts in consumer videos using audio. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2279–2283, 2015
[1] Svetlana Kordumova, Xirong Li, Cees G. M. Snoek (2015): Best practices for learning video concept detectors from social media examples. In: Multimedia Tools and Applications (MTAP), 74 (4), pp. 1291–1315, 2015

*** International Evaluations ***

[9] Runner-up of the TRECVID 2020 Ad-hoc Video Search (AVS) task
[8] Runner-up of the TRECVID 2019 Ad-hoc Video Search (AVS) task
[7] Winner of the ACM Multimedia 2018 Hulu Content-based Video Relevance Prediction Challenge
[6] Top performer of the TRECVID 2018 Ad-hoc Video Search (AVS) task
[5] Top performer of the TRECVID 2018 Video-to-Text (VTT) Matching and Ranking task
[4] Top performer of the TRECVID 2016 Video-to-Text (VTT) task
[3] Top performer of the ImageCLEF 2015 Image Sentence Generation task
[2] Top performer of the MSR Bing Image Retrieval Challenge at ACM Multimedia 2015
[1] Top performer of the TRECVID 2013 Video Semantic Indexing with No Annotation task

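*** Invention Patents ***

[1] Method and apparatus for generating natural-language image descriptions with cross-lingual learning capability, patent no.: ZL 201710657104.3, inventors: Xirong Li; Weiyu Lan; Jianfeng Dong

[2] A deep-learning-based fundus image matching method, system and readable medium, application no.: 202210667546.7, inventors: Xirong Li; Jiazhen Liu

[3] A cross-modal video retrieval method and system based on multi-granularity knowledge distillation, application no.: 202310847299.3, inventors: Xirong Li; Ruixiang Zhao; Kaibin Tian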


*** Professional Service ***

Associate Editor (July 2020 - June 2022), ACM TOMM
Associate Editor (Feb 2020 - ), Multimedia Systems
Multimedia Grand Challenge Co-Chair, ACM Multimedia 2021
PC Co-Chair, MMM 2021 (https://mmm2021.cz/)
Area Chair, ICPR 2020
Workshop Co-Chair, ACM Multimedia Asia 2019
Area Chair, ACM Multimedia 2019
Senior PC, ACM ICMR 2019
Area Chair, ACM Multimedia 2018
Area Chair, ICPR 2016
Demo / Short Paper Co-Chair, PCM 2015
Publication Co-Chair, ICMR 2015
Publicity Co-Chair, ICMR 2013

*** Awards ***

[9] Winner of the Content-based Video Relevance Prediction (CBVRP) Challenge, ACM Multimedia 2018 (Feature re-learning with data augmentation for content-based video recommendation)
[8] 2017 China Multimedia Conference Outstanding Paper Award (Tag-enhanced Chinese image captioning)
[7] ACM Multimedia 2016 Grand Challenge Award (Early embedding and late reranking for video captioning)
[6] PCM 2016 Best Paper Runner-up (Improving image captioning by concept-based sentence reranking)
[5] PCM 2014 Outstanding Reviewer Award
[4] SIGMM 2013 Best PhD Thesis Award (Content-based visual search learned from social media)
[3] IEEE Transactions on Multimedia 2012 Prize Paper Award (Learning social tag relevance by neighbor voting)
[2] 2011 Chinese Government Award for Outstanding Self-financed Students Abroad
[1] ACM International Conference on Image and Video Retrieval 2010 Best Paper Award (Unsupervised multi-feature tag relevance learning for social image retrieval)