MIT/Tuebingen Saliency Benchmark
Leaderboard MIT300
Click on a metric name to sort by that metric. Models evaluated as probabilistic models are shown with a green background. Performance under the metric a model was trained on is shown in italics. The code for evaluating models can be found here.
| Name | Published | IG | AUC | sAUC | NSS | CC | KLDiv | SIM | First tested | Last tested | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259 | | 0.5434 | 0.5357 | 0.4081 | 0.1307 | 1.4964 | 0.3378 | 2019-09-14 | 2019-09-14 | maps from SaliencyToolBox via pysaliency |
| | Lingyun Zhang, Matthew H. Tong, Tim K. Marks, Honghao Shan, Garrison W. Cottrell. SUN: A Bayesian framework for saliency using natural statistics [JoV 2008] | | 0.6939 | 0.6260 | 0.7620 | 0.2770 | 1.2815 | 0.3931 | 2019-10-23 | 2019-10-23 | maps from code via SMILER. Params: rescale=0.5 |
| Saliency Detection by Self-Resemblance (SSR) | Hae Jong Seo, Peyman Milanfar. Nonparametric Bottom-Up Saliency Detection by Self-Resemblance [CVPR 2009] | | 0.7064 | 0.6482 | 0.8110 | 0.2999 | 1.5255 | 0.4124 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| Quaternion-Based Spectral Saliency (QSS) | B. Schauerte, R. Stiefelhagen. Quaternion-based Spectral Saliency Detection for Eye Fixation Prediction [ECCV 2012] | | 0.7233 | 0.6679 | 0.9116 | 0.3300 | 1.1431 | 0.4208 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| Image Signature | Xiaodi Hou, Jonathan Harel, Christof Koch. Image Signature: Highlighting Sparse Salient Regions [PAMI 2011] | | 0.7461 | 0.6610 | 0.9907 | 0.3709 | 1.0897 | 0.4278 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| Dynamic Visual Attention (DVA) | Hou, Xiaodi, and Liqing Zhang. Dynamic visual attention: Searching for coding length increments [NIPS 2008] | | 0.7548 | 0.6584 | 1.0142 | 0.3762 | 1.1136 | 0.4518 | 2019-09-04 | 2019-09-04 | maps from SMILER |
| | Stas Goferman, Lihi Zelnik-Manor, Ayellet Tal. Context-Aware Saliency Detection [CVPR 2010] [PAMI 2012] | | 0.7581 | 0.6402 | 1.0186 | 0.3848 | 1.0723 | 0.4319 | 2019-09-04 | 2019-09-04 | maps from SMILER |
| AIM | Neil Bruce, John Tsotsos. Attention based on information maximization [JoV 2007] | | 0.7619 | 0.6647 | 0.8824 | 0.3419 | 1.2476 | 0.4096 | 2019-09-10 | 2019-09-10 | maps from code via SMILER. Params: resize=0.5, thebasis='31infomax975' |
| | Nicolas Riche, Matei Mancas, Matthieu Duvinage, Makiese Mibulumukini, Bernard Gosselin, Thierry Dutoit. RARE2012: A multi-scale rarity-based saliency detection with its comparative statistical analysis [Signal Processing: Image Communication, 2013] | | 0.7700 | 0.6729 | 1.1513 | 0.4220 | 1.0090 | 0.4572 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| | Jianming Zhang, Stan Sclaroff. Saliency detection: a boolean map approach [ICCV 2013] | | 0.7718 | 0.6918 | 1.1512 | 0.4130 | 1.0235 | 0.4456 | 2019-09-04 | 2019-09-04 | maps from SMILER |
| IttiKoch2 | Implementation by Jonathan Harel (part of GBVS toolbox) | | 0.7811 | 0.6323 | 1.1130 | 0.4299 | 0.9605 | 0.4648 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| Centerbias | leave-one-image out kernel density estimate with uniform mixture component | 0.0000 | 0.7830 | 0.1303 | 1.0960 | 0.4455 | 0.9506 | 0.4815 | 2019-09-10 | 2021-11-24 | |
| CASPER V1 Salience | Rachel Heaton, Simona Buetti, Alejandro Lleras, and John Hummel | | 0.7941 | 0.6014 | 1.2093 | 0.4676 | 1.0295 | 0.4946 | 2021-09-07 | 2021-09-07 | maps from authors |
| | Hamed Rezazadegan Tavakoli, Esa Rahtu, Janne Heikkila. Fast and efficient saliency detection using sparse sampling and kernel density estimation [SCIA 2011] | | 0.8018 | 0.5941 | 1.2763 | 0.4827 | 2.3018 | 0.4919 | 2019-09-04 | 2019-09-04 | maps from SMILER |
| | Jonathan Harel, Christof Koch, Pietro Perona. Graph-Based Visual Saliency [NIPS 2006] | | 0.8062 | 0.6299 | 1.2457 | 0.4791 | 0.8878 | 0.4835 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| | Tilke Judd, Krista Ehinger, Fredo Durand, Antonio Torralba. Learning to predict where humans look [ICCV 2009] | | 0.8095 | 0.6003 | 1.1882 | 0.4664 | 1.1084 | 0.4182 | 2019-09-12 | 2019-09-12 | maps from SaliencyToolBox via pysaliency |
| LDS | Shu Fang, Jia Li, Yonghong Tian, Tiejun Huang, Xiaowu Chen. Learning Discriminative Subspaces on Random Contrasts for Image Saliency Analysis [TNNLS 2016] | | 0.8108 | 0.6020 | 1.3649 | 0.5177 | 1.0631 | 0.5222 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| | Erkut Erdem, Aykut Erdem. Visual saliency estimation by nonlinearly integrating features using region covariances [JoV 2013] | | 0.8116 | 0.5894 | 1.3362 | 0.5000 | 1.7220 | 0.5058 | 2019-09-04 | 2019-09-04 | maps from SMILER |
| OpenSALICON | Christopher Lee Thomas. OpenSalicon: An Open Source Implementation of the Salicon Saliency Model [arXiv 2016] | | 0.8140 | 0.7395 | 1.7029 | 0.5620 | 0.7829 | 0.5166 | 2019-09-06 | 2019-09-06 | maps from SMILER |
| | Eleonora Vig, Michael Dorr, David Cox. Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [CVPR 2014] | | 0.8171 | 0.6180 | 1.1399 | 0.4518 | 1.1369 | 0.4112 | 2019-09-06 | 2019-09-06 | maps from SMILER |
| | Matthias Kümmerer, Thomas S. A. Wallis, Leon A. Gatys, Matthias Bethge. Understanding Low- and High-Level Contributions to Fixation Prediction [ICCV 2017] | 0.4140 | 0.8330 | 0.6957 | 1.6134 | 0.5876 | 0.7084 | 0.5576 | 2019-03-26 | 2021-11-25 | |
| ML-Net | Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara. A Deep Multi-Level Network for Saliency Prediction [ICPR 2016] | | 0.8386 | 0.7399 | 1.9748 | 0.6633 | 0.8006 | 0.5819 | 2019-09-05 | 2019-09-05 | maps from SMILER |
| DeepGaze I | Matthias Kümmerer, Lucas Theis, Matthias Bethge. Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet [arxiv 2014, ICLR 2015 workshop] | 0.4836 | 0.8427 | 0.7232 | 1.7234 | 0.6144 | 0.6678 | 0.5717 | 2019-09-05 | 2021-11-19 | predictions from authors |
| Deep Visual Attention (DVA) | W. Wang, and J. Shen. Deep Visual Attention Prediction [IEEE TIP 2018] | | 0.8430 | 0.7257 | 1.9305 | 0.6631 | 0.6293 | 0.5848 | 2019-09-04 | 2019-09-04 | maps from SMILER |
| Saliency Attentive Model (SAM-VGG) | Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara. Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model [IEEE TIP 2018] | | 0.8473 | 0.7305 | 1.9552 | 0.6630 | 1.2746 | 0.5986 | 2019-09-09 | 2019-09-09 | maps from SMILER |
| | Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. SalGAN: Visual Saliency Prediction with Generative Adversarial Networks [arXiv 2017] | | 0.8498 | 0.7354 | 1.8620 | 0.6740 | 0.7574 | 0.5932 | 2019-09-06 | 2019-09-06 | maps from SMILER |
| Saliency Attentive Model (SAM-ResNet) | Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara. Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model [IEEE TIP 2018] | | 0.8526 | 0.7396 | 2.0628 | 0.6897 | 1.1710 | 0.6122 | 2019-09-09 | 2019-09-09 | maps from SMILER |
| | S. Fan, Z. Shen, M. Jiang, B. Koenig, J. Xu, M. Kankanhali, Q. Zhao. Emotional Attention: A Study of Image Sentiment and Visual Attention [CVPR 2018] | | 0.8552 | 0.7398 | 1.9859 | 0.7054 | 0.5857 | 0.5806 | 2019-11-11 | 2019-11-11 | maps from authors |
| GazeGAN | | | 0.8607 | 0.7316 | 2.2118 | 0.7579 | 1.3390 | 0.6491 | 2020-04-21 | 2020-04-21 | maps from authors |
| | | | 0.8640 | 0.7446 | 2.1825 | 0.7578 | 0.8873 | 0.6551 | 2019-06-29 | 2021-11-22 | maps from authors |
| TranSalNet | Jianxun Lou, Hanhe Lin, David Marshall, Dietmar Saupe and Hantao Liu: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 2022 | | 0.8730 | 0.7471 | 2.3758 | 0.7991 | 0.9019 | 0.6852 | 2021-05-04 | 2021-05-04 | maps from authors |
| | Matthias Kümmerer, Thomas S. A. Wallis, Leon A. Gatys, Matthias Bethge. Understanding Low- and High-Level Contributions to Fixation Prediction [ICCV 2017] | 0.9247 | 0.8733 | 0.7759 | 2.3371 | 0.7703 | 0.4239 | 0.6636 | 2019-09-11 | 2021-11-16 | |
| TranSalNet_Dense | Jianxun Lou, Hanhe Lin, David Marshall, Dietmar Saupe and Hantao Liu: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 2022 | | 0.8734 | 0.7467 | 2.4134 | 0.8070 | 1.0141 | 0.6895 | 2021-12-08 | 2021-12-08 | maps from authors |
| MSI-Net | A. Kroner, M. Senden, K. Driessens, R. Goebel: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 2020 | 0.9185 | 0.8738 | 0.7787 | 2.3053 | 0.7790 | 0.4232 | 0.6704 | 2020-05-14 | 2021-11-15 | maps from authors |
| HATES | paper in preparation | | 0.8744 | 0.7549 | 2.3762 | 0.7897 | 0.7146 | 0.5313 | 2021-11-24 | 2021-11-24 | maps from authors |
| Ani | (work in progress) | | 0.8748 | 0.7490 | 2.3518 | 0.7997 | 0.6741 | 0.6879 | 2022-05-21 | 2022-05-21 | maps from authors |
| EML-NET | Sen Jia & Neil D.B. Bruce. EML-NET: An Expandable Multi-Layer NETwork for Saliency Prediction [arXiv 2018] | | 0.8762 | 0.7469 | 2.4876 | 0.7893 | 0.8439 | 0.6756 | 2019-07-06 | 2019-07-06 | maps from authors |
| | G. Ding, N. Imamoglu, A. Caglayan, M. Murakawa, R. Nakamura: SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolutional Networks. Image and Vision Computing 2022 | 0.8194 | 0.8769 | 0.7858 | 2.4702 | 0.8141 | 0.4151 | 0.6933 | 2021-11-08 | 2021-11-08 | maps from authors |
| | R. Droste, J. Jiao, J.A. Noble: Unified Image and Video Saliency Modeling. ECCV 2020 (arXiv) | 0.9505 | 0.8772 | 0.7840 | 2.3689 | 0.7851 | 0.4149 | 0.6746 | 2019-11-07 | 2021-11-16 | maps from authors, for probabilistic predictions see appendix of arxiv paper. |
| | A. Linardos, M. Kümmerer, O. Press, M. Bethge: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling [ICCV 2021] | 1.0715 | 0.8829 | 0.7942 | 2.5265 | 0.8242 | 0.3474 | 0.6993 | 2020-09-24 | 2020-09-24 | maps from authors |
| Gold Standard (leave-one-subject-out) | Gaussian kernel density estimate using all fixations of an image with uniform mixture component. Crossvalidated over subjects. Leave-one-subject-out performance | 1.3172 | 0.8982 | 0.8234 | 2.8481 | | | | 2019-10-24 | 2019-10-24 | |
| Gold Standard | Gaussian kernel density estimate using all fixations of an image with uniform mixture component. Crossvalidated over subjects. Joint performance. | 1.7366 | 0.9341 | 0.8825 | 3.1408 | 0.9828 | 0.0602 | 0.8992 | 2019-09-15 | 2021-11-26 | |
Baseline models
- Gold Standard: Our gold standard model is a Gaussian kernel density estimate (KDE), reported in two versions. The cross-validated version is the leave-one-subject-out performance: for each subject and image, the fixations of all other subjects on the same image are used to construct a KDE, which is then evaluated on the held-out subject. The kernel size and the mixture weight of a uniform regularization component are fitted by maximizing the cross-validated log-likelihood of the model. In addition, we report the joint performance of a KDE that uses all fixations on each image, with the same parameters as the cross-validated model. One can interpret the cross-validated performance as a lower bound on the explainable performance and the joint performance as an upper bound.
- Centerbias: The center bias model is also a Gaussian kernel density estimate. Unlike the gold standard, it predicts the fixations on a given image from the fixations on all other images. The kernel size and the mixture weight of a uniform regularization component are again fitted by maximizing the model log-likelihood.
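The leave-one-subject-out procedure described above can be illustrated with a minimal NumPy sketch. This is not the benchmark's evaluation code: the function names (`kde_density`, `loso_log_likelihood`), the toy fixation data, the image size, and the candidate parameter grid are all assumptions made for illustration. Fixations are assumed to be integer pixel coordinates `(x, y)` inside the image.

```python
import numpy as np

def kde_density(train_xy, height, width, bandwidth, uniform_weight):
    """Gaussian KDE on the pixel grid, mixed with a uniform component.

    Returns a (height, width) array that sums to 1.
    Brute-force evaluation; fine for a toy example, not memory-efficient.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # Squared distance from every pixel to every training fixation.
    d2 = ((grid[:, None, :] - train_xy[None, :, :]) ** 2).sum(axis=2)
    kde = np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)
    kde /= kde.sum()
    uniform = np.full(height * width, 1.0 / (height * width))
    density = (1.0 - uniform_weight) * kde + uniform_weight * uniform
    return density.reshape(height, width)

def loso_log_likelihood(fixations_by_subject, height, width,
                        bandwidth, uniform_weight):
    """Mean log-likelihood per fixation, leaving one subject out at a time."""
    total, count = 0.0, 0
    for held_out, test_xy in fixations_by_subject.items():
        # Build the KDE from all other subjects' fixations on this image.
        train_xy = np.concatenate(
            [xy for s, xy in fixations_by_subject.items() if s != held_out]
        )
        density = kde_density(train_xy, height, width,
                              bandwidth, uniform_weight)
        cols = test_xy[:, 0].astype(int)
        rows = test_xy[:, 1].astype(int)
        total += np.log(density[rows, cols]).sum()
        count += len(test_xy)
    return total / count

# Toy data (hypothetical): three subjects viewing one 20x30 image.
HEIGHT, WIDTH = 20, 30
fixations = {
    "s1": np.array([[10, 8], [12, 9]]),
    "s2": np.array([[11, 8], [25, 15]]),
    "s3": np.array([[9, 7], [13, 10]]),
}

# Fit kernel size and uniform mixture weight by a small grid search
# over the cross-validated log-likelihood (candidate values are arbitrary).
candidates = [(bw, w) for bw in (1.0, 2.0, 4.0, 8.0) for w in (0.01, 0.1, 0.5)]
best_ll, best_bw, best_w = max(
    (loso_log_likelihood(fixations, HEIGHT, WIDTH, bw, w), bw, w)
    for bw, w in candidates
)
print(f"best bandwidth={best_bw}, uniform weight={best_w}, "
      f"mean log-likelihood={best_ll:.3f}")
```

Under the same assumptions, the joint gold standard would call `kde_density` with all fixations of the image and the fitted parameters, and a center bias model would pass the pooled fixations from all other images as `train_xy`.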