Scale space
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures.[1][2][3][4][5][6][7][8] The parameter $t$ in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about $\sqrt{t}$ have largely been smoothed away in the scale-space level at scale $t$.
The main type of scale space is the linear (Gaussian) scale space, which has wide applicability as well as the attractive property that it can be derived from a small set of scale-space axioms. The corresponding scale-space framework encompasses a theory for Gaussian derivative operators, which can be used as a basis for expressing a large class of visual operations for computerized systems that process visual information. This framework also allows visual operations to be made scale invariant, which is necessary for dealing with the size variations that may occur in image data, because real-world objects may be of different sizes and the distance between the object and the camera may be unknown and may vary depending on the circumstances.[9][10]
Definition
The notion of scale space applies to signals of arbitrary numbers of variables. The most common case in the literature applies to two-dimensional images, which is what is presented here. For a given image $f(x, y)$, its linear (Gaussian) scale-space representation is a family of derived signals $L(x, y; t)$ defined by the convolution of $f(x, y)$ with the two-dimensional Gaussian kernel

$$g(x, y; t) = \frac{1}{2\pi t} e^{-\frac{x^2 + y^2}{2t}}$$
such that

$$L(\cdot, \cdot; t) = g(\cdot, \cdot; t) * f(\cdot, \cdot),$$
where the semicolon in the argument of $L$ implies that the convolution is performed only over the variables $x, y$, while the scale parameter $t$ after the semicolon just indicates which scale level is being defined. This definition of $L$ works for a continuum of scales $t \geq 0$, but typically only a finite discrete set of levels in the scale-space representation would actually be considered.
The scale parameter $t = \sigma^2$ is the variance of the Gaussian filter and, as a limit for $t = 0$, the filter $g$ becomes an impulse function such that $L(x, y; 0) = f(x, y)$, that is, the scale-space representation at scale level $t = 0$ is the image $f$ itself. As $t$ increases, $L$ is the result of smoothing $f$ with a larger and larger filter, thereby removing more and more of the details that the image contains. Since the standard deviation of the filter is $\sigma = \sqrt{t}$, details that are significantly smaller than this value are to a large extent removed from the image at scale parameter $t$, see the following figure and[11] for graphical illustrations.
[Figure: scale-space representations $L(x, y; t)$ of an example image at a sequence of increasing scale levels $t$, with the level at $t = 0$ corresponding to the original image $f$.]
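As a concrete illustration of this definition, the following minimal sketch computes the scale-space representation at a few scale levels, assuming a grayscale image stored as a NumPy array, SciPy's gaussian_filter, and the relation $\sigma = \sqrt{t}$ between the standard deviation of the Gaussian and the variance-based scale parameter used here (the image and scale levels are illustrative only):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(f, scales):
    """Return a dict {t: L(., .; t)} for an image f and an iterable of scale levels t >= 0."""
    return {t: (f.astype(float) if t == 0
                else gaussian_filter(f.astype(float), sigma=np.sqrt(t)))
            for t in scales}

f = np.random.rand(128, 128)            # stand-in for an input image f(x, y)
L = scale_space(f, [0, 1, 4, 16, 64])   # coarser levels retain progressively fewer details
```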
Why a Gaussian filter?
When faced with the task of generating a multi-scale representation one may ask: could any filter g of low-pass type, with a parameter t which determines its width, be used to generate a scale space? The answer is no, as it is of crucial importance that the smoothing filter does not introduce new spurious structures at coarse scales that do not correspond to simplifications of corresponding structures at finer scales. In the scale-space literature, this criterion has been formulated in precise mathematical terms in a number of different ways.
The conclusion from several different axiomatic derivations that have been presented is that the Gaussian scale space constitutes the canonical way to generate a linear scale space, based on the essential requirement that new structures must not be created when going from a fine scale to any coarser scale.[1][3][4][6][9][12][13][14][15][16][17][18][19] Conditions, referred to as scale-space axioms, that have been used for deriving the uniqueness of the Gaussian kernel include linearity, shift invariance, semi-group structure, non-enhancement of local extrema, scale invariance and rotational invariance. In the works,[15][20][21] the uniqueness claimed in the arguments based on scale invariance has been criticized, and alternative self-similar scale-space kernels have been proposed. The Gaussian kernel is, however, a unique choice according to the scale-space axiomatics based on causality[3] or non-enhancement of local extrema.[16][18]
Alternative definition
Equivalently, the scale-space family can be defined as the solution of the diffusion equation (for example in terms of the heat equation),

$$\partial_t L = \frac{1}{2} \nabla^2 L,$$
with initial condition $L(x, y; 0) = f(x, y)$. This formulation of the scale-space representation L means that it is possible to interpret the intensity values of the image f as a "temperature distribution" in the image plane and that the process that generates the scale-space representation as a function of t corresponds to heat diffusion in the image plane over time t (assuming the thermal conductivity of the material to be equal to the arbitrarily chosen constant 1/2). Although this connection may appear superficial for a reader not familiar with differential equations, it is indeed the case that the main scale-space formulation in terms of non-enhancement of local extrema is expressed in terms of a sign condition on partial derivatives in the 2+1-D volume generated by the scale space, thus within the framework of partial differential equations. Furthermore, a detailed analysis of the discrete case shows that the diffusion equation provides a unifying link between continuous and discrete scale spaces, which also generalizes to nonlinear scale spaces, for example, using anisotropic diffusion. Hence, one may say that the primary way to generate a scale space is by the diffusion equation, and that the Gaussian kernel arises as the Green's function of this specific partial differential equation.
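A minimal numerical sketch of this equivalence (an illustration under stated assumptions, not part of the original article): stepping the heat equation $\partial_t L = \tfrac{1}{2}\nabla^2 L$ forward with an explicit finite-difference scheme on a unit grid with periodic boundaries reproduces, up to discretization error, the result of smoothing with a Gaussian of variance $t$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def diffuse(f, t, dt=0.2):
    """Solve d_t L = (1/2) * Laplacian(L), L(., .; 0) = f, by explicit Euler steps
    on a unit grid with periodic boundaries (stability requires dt <= 0.5 here)."""
    L = f.astype(float).copy()
    for _ in range(int(round(t / dt))):
        lap = (np.roll(L, 1, axis=0) + np.roll(L, -1, axis=0) +
               np.roll(L, 1, axis=1) + np.roll(L, -1, axis=1) - 4.0 * L)
        L += dt * 0.5 * lap
    return L

f = np.random.rand(64, 64)
t = 4.0
# Agreement with direct Gaussian smoothing (sigma = sqrt(t)), up to discretization error:
err = np.max(np.abs(diffuse(f, t) - gaussian_filter(f, sigma=np.sqrt(t), mode='wrap')))
```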
Motivations
The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation. For example, the concept of a "tree" is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. For a computer vision system analysing an unknown scene, there is no way to know a priori what scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales in order to be able to capture the unknown scale variations that may occur. Taken to the limit, a scale-space representation considers representations at all scales.[9]
Another motivation for the scale-space concept originates from the process of performing a physical measurement on real-world data. In order to extract any information from a measurement process, one has to apply operators of non-infinitesimal size to the data. In many branches of computer science and applied mathematics, the size of the measurement operator is disregarded in the theoretical modelling of a problem. The scale-space theory, on the other hand, explicitly incorporates the need for a non-infinitesimal size of the image operators as an integral part of any measurement as well as any other operation that depends on a real-world measurement.[5]
There is a close link between scale-space theory and biological vision. Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex. In these respects, the scale-space framework can be seen as a theoretically well-founded paradigm for early vision, which in addition has been thoroughly tested by algorithms and experiments.[4][9]
Gaussian derivatives
At any scale $t$ in scale space, we can apply local derivative operators to the scale-space representation:

$$L_{x^m y^n}(x, y; t) = \partial_{x^m y^n} L(x, y; t).$$
Due to the commutative property between the derivative operator and the Gaussian smoothing operator, such scale-space derivatives can equivalently be computed by convolving the original image with Gaussian derivative operators. For this reason they are often also referred to as Gaussian derivatives:

$$L_{x^m y^n}(\cdot, \cdot; t) = \partial_{x^m y^n} g(\cdot, \cdot; t) * f(\cdot, \cdot).$$
The uniqueness of the Gaussian derivative operators as local operations derived from a scale-space representation can be obtained by similar axiomatic derivations as are used for deriving the uniqueness of the Gaussian kernel for scale-space smoothing.[4][22]
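As a sketch of how the Gaussian derivatives described above can be computed in practice (assuming SciPy, whose per-axis order argument of gaussian_filter realizes derivative-of-Gaussian filtering):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_derivative(f, t, dx=0, dy=0):
    """Scale-space derivative L_{x^dx y^dy}(., .; t): convolution of f with a
    Gaussian derivative kernel of standard deviation sqrt(t)."""
    # SciPy's axis order is (row, column) = (y, x), hence order=(dy, dx).
    return gaussian_filter(f.astype(float), sigma=np.sqrt(t), order=(dy, dx))

f = np.random.rand(128, 128)
Lx  = gaussian_derivative(f, t=4.0, dx=1)   # first-order derivative in x
Lyy = gaussian_derivative(f, t=4.0, dy=2)   # second-order derivative in y
```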
Visual front end
These Gaussian derivative operators can in turn be combined by linear or non-linear operators into a larger variety of different types of feature detectors, which in many cases can be well modelled by differential geometry. Specifically, invariance (or more appropriately covariance) to local geometric transformations, such as rotations or local affine transformations, can be obtained by considering differential invariants under the appropriate class of transformations, or alternatively by normalizing the Gaussian derivative operators to a locally determined coordinate frame, obtained e.g. from a preferred orientation in the image domain, or by applying a preferred local affine transformation to a local image patch (see the article on affine shape adaptation for further details).
When Gaussian derivative operators and differential invariants are used in this way as basic feature detectors at multiple scales, the uncommitted first stages of visual processing are often referred to as a visual front-end. This overall framework has been applied to a large variety of problems in computer vision, including feature detection, feature classification, image segmentation, image matching, motion estimation, computation of shape cues and object recognition. The set of Gaussian derivative operators up to a certain order is often referred to as the N-jet and constitutes a basic type of feature within the scale-space framework.
Detector examples
Following the idea of expressing visual operations in terms of differential invariants computed at multiple scales using Gaussian derivative operators, we can express an edge detector from the set of points that satisfy the requirement that the gradient magnitude

$$|\nabla L| = \sqrt{L_x^2 + L_y^2}$$
should assume a local maximum in the gradient direction

$$\nabla L = (L_x, L_y)^T.$$
By working out the differential geometry, it can be shown[4] that this differential edge detector can equivalently be expressed from the zero-crossings of the second-order differential invariant

$$\tilde{L}_{vv} = L_x^2\, L_{xx} + 2\, L_x\, L_y\, L_{xy} + L_y^2\, L_{yy} = 0$$
that satisfy the following sign condition on a third-order differential invariant:

$$\tilde{L}_{vvv} = L_x^3\, L_{xxx} + 3\, L_x^2\, L_y\, L_{xxy} + 3\, L_x\, L_y^2\, L_{xyy} + L_y^3\, L_{yyy} < 0.$$
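A minimal sketch of these differential invariants, under the same assumptions as the earlier snippets (SciPy Gaussian derivatives); the returned arrays would then be scanned for zero-crossings of $\tilde{L}_{vv}$ at points where $\tilde{L}_{vvv} < 0$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_invariants(f, t):
    """Second- and third-order differential invariants of the differential edge definition."""
    s = np.sqrt(t)
    D = lambda dx, dy: gaussian_filter(f.astype(float), sigma=s, order=(dy, dx))
    Lx, Ly = D(1, 0), D(0, 1)
    Lxx, Lxy, Lyy = D(2, 0), D(1, 1), D(0, 2)
    Lxxx, Lxxy, Lxyy, Lyyy = D(3, 0), D(2, 1), D(1, 2), D(0, 3)
    Lvv  = Lx**2 * Lxx + 2 * Lx * Ly * Lxy + Ly**2 * Lyy
    Lvvv = (Lx**3 * Lxxx + 3 * Lx**2 * Ly * Lxxy +
            3 * Lx * Ly**2 * Lxyy + Ly**3 * Lyyy)
    return Lvv, Lvvv    # edge points: zero-crossings of Lvv where Lvvv < 0
```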
Similarly, multi-scale blob detectors at any given fixed scale[23][9] can be obtained from local maxima and local minima of either the Laplacian operator (also referred to as the Laplacian of Gaussian)

$$\nabla^2 L = L_{xx} + L_{yy}$$
or the determinant of the Hessian matrix

$$\det \mathcal{H} L = L_{xx}\, L_{yy} - L_{xy}^2.$$
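Correspondingly, the two blob-detector responses at a fixed scale can be sketched as follows (same assumptions as the earlier snippets; blobs would be taken as local extrema of these response maps):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blob_responses(f, t):
    """Laplacian and determinant-of-Hessian blob responses at a fixed scale t."""
    s = np.sqrt(t)
    Lxx = gaussian_filter(f.astype(float), sigma=s, order=(0, 2))
    Lyy = gaussian_filter(f.astype(float), sigma=s, order=(2, 0))
    Lxy = gaussian_filter(f.astype(float), sigma=s, order=(1, 1))
    return Lxx + Lyy, Lxx * Lyy - Lxy**2
```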
In an analogous fashion, corner detectors and ridge and valley detectors can be expressed as local maxima, minima or zero-crossings of multi-scale differential invariants defined from Gaussian derivatives. The algebraic expressions for the corner and ridge detection operators are, however, somewhat more complex and the reader is referred to the articles on corner detection and ridge detection for further details.
Scale-space operations have also been frequently used for expressing coarse-to-fine methods, in particular for tasks such as image matching and for multi-scale image segmentation.
Scale selection
The theory presented so far describes a well-founded framework for representing image structures at multiple scales. In many cases it is, however, also necessary to select locally appropriate scales for further analysis. This need for scale selection has two major origins: (i) real-world objects may have different sizes, and these sizes may be unknown to the vision system, and (ii) the distance between the object and the camera can vary, and this distance information may also be unknown a priori. A highly useful property of scale-space representation is that image representations can be made invariant to scale, by performing automatic local scale selection[9][10][23][24][25][26][27][28] based on local maxima (or minima) over scales of the scale-normalized derivatives

$$L_{\xi^m \eta^n}(x, y; t) = t^{(m+n)\gamma/2} L_{x^m y^n}(x, y; t),$$
where $\gamma \in [0, 1]$ is a parameter that is related to the dimensionality of the image feature. This algebraic expression for scale-normalized Gaussian derivative operators originates from the introduction of $\gamma$-normalized derivatives according to

$$\partial_\xi = t^{\gamma/2}\, \partial_x \quad \text{and} \quad \partial_\eta = t^{\gamma/2}\, \partial_y.$$
It can be theoretically shown that a scale selection module working according to this principle will satisfy the following scale covariance property: if for a certain type of image feature a local maximum is assumed in a certain image at a certain scale $t_0$, then under a rescaling of the image by a scale factor $s$ the local maximum over scales in the rescaled image will be transformed to the scale level $s^2 t_0$.[23]
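A minimal sketch of this scale-selection principle, assuming $\gamma = 1$ and using the magnitude of the scale-normalized Laplacian $t\,\nabla^2 L$ as the feature-strength measure (the scale levels are chosen for illustration only):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def selected_scales(f, scales):
    """Pointwise scale selection: the t maximizing |t * Laplacian(L)| over the given scales."""
    responses = []
    for t in scales:
        s = np.sqrt(t)
        lap = (gaussian_filter(f.astype(float), sigma=s, order=(0, 2)) +
               gaussian_filter(f.astype(float), sigma=s, order=(2, 0)))
        responses.append(t * lap)                    # gamma-normalization with gamma = 1
    stack = np.abs(np.stack(responses))              # shape (num_scales, H, W)
    return np.asarray(scales)[np.argmax(stack, axis=0)]

f = np.random.rand(128, 128)
t_hat = selected_scales(f, scales=[1, 2, 4, 8, 16, 32])
```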
Scale invariant feature detection
Following this approach of gamma-normalized derivatives, it can be shown that different types of scale adaptive and scale invariant feature detectors[9][10][23][24][25][29][30][27] can be expressed for tasks such as blob detection, corner detection, ridge detection, edge detection and spatio-temporal interest point detection (see the specific articles on these topics for in-depth descriptions of how these scale-invariant feature detectors are formulated). Furthermore, the scale levels obtained from automatic scale selection can be used for determining regions of interest for subsequent affine shape adaptation[31] to obtain affine invariant interest points[32][33] or for determining scale levels for computing associated image descriptors, such as locally scale adapted N-jets.
Recent work has shown that more complex operations as well, such as scale-invariant object recognition, can be performed in this way, by computing local image descriptors (N-jets or local histograms of gradient directions) at scale-adapted interest points obtained from scale-space extrema of the normalized Laplacian operator (see also scale-invariant feature transform[34]) or the determinant of the Hessian (see also SURF);[35] see also the Scholarpedia article on the scale-invariant feature transform[36] for a more general outlook on object recognition approaches based on receptive field responses[19][37][38][39] in terms of Gaussian derivative operators or approximations thereof.
==Related multi-scale representations==
An image pyramid is a discrete representation in which a scale space is sampled in both space and scale. For scale invariance, the scale factors should be sampled exponentially, for example as integer powers of 2 or √2. When properly constructed, the ratio of the sample rates in space and scale is held constant so that the impulse response is identical in all levels of the pyramid.[40][41][42][43] Fast, O(N), algorithms exist for computing a scale-invariant image pyramid, in which the image or signal is repeatedly smoothed and then subsampled. Values of the scale space between pyramid samples can easily be estimated using interpolation within and between scales, allowing for scale and position estimates with sub-resolution accuracy.[43]
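A minimal sketch of such a pyramid construction, assuming NumPy and SciPy and using a fixed amount of Gaussian smoothing before each subsampling step (the function name and parameter values are illustrative, not canonical), could look as follows:
<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, num_levels=5, sigma=1.0):
    # Repeatedly smooth with a Gaussian and subsample by a factor of 2 per dimension.
    # Total work is O(N), since each level has a quarter of the pixels of the previous one.
    levels = [np.asarray(image, dtype=float)]
    for _ in range(num_levels - 1):
        smoothed = gaussian_filter(levels[-1], sigma=sigma)  # suppress fine-scale structure
        levels.append(smoothed[::2, ::2])                    # subsample
    return levels
</syntaxhighlight>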
In a scale-space representation, the existence of a continuous scale parameter makes it possible to track zero crossings over scales leading to so-called deep structure. For features defined as zero-crossings of differential invariants, the implicit function theorem directly defines trajectories across scales,[4][44] and at those scales where bifurcations occur, the local behaviour can be modelled by singularity theory.[4][44][45][46][47]
Extensions of linear scale-space theory concern the formulation of non-linear scale-space concepts more committed to specific purposes.[48][49] These non-linear scale-spaces often start from the equivalent diffusion formulation of the scale-space concept, which is subsequently extended in a non-linear fashion. A large number of evolution equations have been formulated in this way, motivated by different specific requirements (see the abovementioned book references for further information). Not all of these non-linear scale-spaces, however, satisfy theoretical requirements similar to those of the linear Gaussian scale-space concept. Hence, unexpected artifacts may sometimes occur, and one should be careful not to use the term "scale-space" for just any type of one-parameter family of images.
A first-order extension of the isotropic Gaussian scale space is provided by the affine (Gaussian) scale space.[4] One motivation for this extension originates from the common need for computing image descriptors for real-world objects that are viewed under a perspective camera model. To handle such non-linear deformations locally, partial invariance (or more correctly covariance) to local affine deformations can be achieved by considering affine Gaussian kernels with their shapes determined by the local image structure,[31] see the article on affine shape adaptation for theory and algorithms. Indeed, this affine scale space can also be expressed from a non-isotropic extension of the linear (isotropic) diffusion equation, while still being within the class of linear partial differential equations.
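To make the notion of an affine Gaussian kernel concrete, the following sketch (assuming NumPy; the function name and the grid size are illustrative) samples an anisotropic Gaussian kernel parameterized by a 2×2 covariance matrix, which is the shape parameter that affine shape adaptation would tune to the local image structure:
<syntaxhighlight lang="python">
import numpy as np

def affine_gaussian_kernel(covariance, size=15):
    # Sample g(x; Sigma) = exp(-x^T Sigma^{-1} x / 2) / (2 pi sqrt(det Sigma)) on a grid;
    # the covariance matrix Sigma determines the orientation and elongation of the kernel.
    sigma = np.asarray(covariance, dtype=float)
    inv = np.linalg.inv(sigma)
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    quad = inv[0, 0] * xs ** 2 + 2 * inv[0, 1] * xs * ys + inv[1, 1] * ys ** 2
    kernel = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))
    return kernel / kernel.sum()   # renormalize to compensate for truncation to the finite grid

# Example: an elongated kernel oriented along the x-axis.
kernel = affine_gaussian_kernel([[9.0, 0.0], [0.0, 2.0]])
</syntaxhighlight>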
There exists a more general extension of the Gaussian scale-space model to affine and spatio-temporal scale-spaces.[4][31][18][19][50] In addition to variabilities over scale, which the original scale-space theory was designed to handle, this generalized scale-space theory[19] also comprises other types of variabilities caused by geometric transformations in the image formation process, including variations in viewing direction approximated by local affine transformations, and relative motions between objects in the world and the observer, approximated by local Galilean transformations. This generalized scale-space theory leads to predictions about receptive field profiles in good qualitative agreement with receptive field profiles measured by cell recordings in biological vision.[51][52][50][53]
There are strong relations between scale-space theory and wavelet theory, although these two notions of multi-scale representation have been developed from somewhat different premises. There has also been work on other multi-scale approaches, such as pyramids and a variety of other kernels, that do not satisfy the same requirements as true scale-space descriptions do.
==Relations to biological vision and hearing==
There are interesting relations between scale-space representation and biological vision and hearing. Neurophysiological studies of biological vision have shown that there are receptive field profiles in the mammalian retina and visual cortex that can be well modelled by linear Gaussian derivative operators, in some cases also complemented by a non-isotropic affine scale-space model, a spatio-temporal scale-space model and/or non-linear combinations of such linear operators.[18][51][52][50][53][54][55][56][57]
Regarding biological hearing, there are receptive field profiles in the inferior colliculus and the primary auditory cortex that can be well modelled by spectro-temporal receptive fields expressed as Gaussian derivatives over logarithmic frequencies and windowed Fourier transforms over time, with the window functions being temporal scale-space kernels.[58][59]
==Deep learning and scale space==
In the area of classical computer vision, scale-space theory has established itself as a theoretical framework for early vision, with Gaussian derivatives constituting a canonical model for the first layer of receptive fields. With the introduction of deep learning, there has also been work on using Gaussian derivatives or Gaussian kernels as a general basis for receptive fields in deep networks.[60][61][62][63][64] Using the transformation properties of the Gaussian derivatives and Gaussian kernels under scaling transformations, it is in this way possible to obtain scale covariance/equivariance and scale invariance of the deep network, to handle image structures at different scales in a theoretically well-founded manner.[62][63] There have also been approaches developed to obtain scale covariance/equivariance and scale invariance with learned filters combined with multiple scale channels.[65][66][67][68][69][70] Specifically, using the notions of scale covariance/equivariance and scale invariance, it is possible to make deep networks operate robustly at scales not spanned by the training data, thus enabling scale generalization.[62][63][67][69]
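As a rough illustration of the scale-channel idea (a sketch under stated assumptions, not a description of any specific published architecture), the following code applies an assumed prediction function predict_fn, with shared weights, to rescaled copies of the input and max-pools the responses over the scale channels:
<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import zoom

def scale_channel_predict(image, predict_fn, scale_factors=(0.5, 0.71, 1.0, 1.41, 2.0)):
    # Apply the same prediction function (shared weights) to rescaled copies of the
    # input image, then max-pool the class scores over the scale channels.
    scores = []
    for s in scale_factors:
        rescaled = zoom(np.asarray(image, dtype=float), zoom=s, order=1)
        scores.append(predict_fn(rescaled))  # predict_fn is assumed to return a fixed-length score vector
    return np.max(np.stack(scores), axis=0)
</syntaxhighlight>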
==Time-causal temporal scale space==
For processing pre-recorded temporal signals or video, the Gaussian kernel can also be used for smoothing and suppressing fine-scale structures over the temporal domain, since the data are pre-recorded and available in all directions. When processing temporal signals or video in real-time situations, the Gaussian kernel cannot, however, be used for temporal smoothing, since it would need access to data from the future, which obviously cannot be available. For temporal smoothing in real-time situations, one can instead use the temporal kernel referred to as the time-causal limit kernel,[71] which obeys properties in a time-causal situation (non-creation of new structures towards increasing scale and temporal scale covariance) similar to those that the Gaussian kernel obeys in the non-causal case. The time-causal limit kernel corresponds to convolution with an infinite number of truncated exponential kernels coupled in cascade, with specifically chosen time constants to obtain temporal scale covariance. For discrete data, this kernel can often be numerically well approximated by a small set of first-order recursive filters coupled in cascade; see [71] for further details.
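A hedged sketch of such a discrete approximation as a cascade of first-order recursive filters is given below (assuming NumPy; the time constants are left as free inputs here, whereas the specifically chosen, geometrically distributed time constants of the time-causal limit kernel are described in [71]):
<syntaxhighlight lang="python">
import numpy as np

def first_order_recursive_cascade(signal, time_constants):
    # Time-causal temporal smoothing by a cascade of first-order recursive filters,
    # each a discrete approximation of a truncated exponential kernel with time
    # constant mu:  y[i] = y[i-1] + (1 / (1 + mu)) * (x[i] - y[i-1]).
    out = np.asarray(signal, dtype=float).copy()
    for mu in time_constants:
        a = 1.0 / (1.0 + mu)
        prev = out[0]
        for i in range(len(out)):
            prev = prev + a * (out[i] - prev)
            out[i] = prev
    return out
</syntaxhighlight>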
An earlier approach to handling temporal scales in a time-causal way performs Gaussian smoothing over a logarithmically transformed temporal axis;[72] in contrast to the time-causal limit kernel, however, it has no known memory-efficient time-recursive implementation.
==Implementation issues==
When implementing scale-space smoothing in practice, there are a number of different approaches that can be taken, such as continuous or discrete Gaussian smoothing, implementation in the Fourier domain, pyramids based on binomial filters that approximate the Gaussian, or recursive filters. More details about this are given in a separate article on scale space implementation.
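As one example of such an approximation, the following sketch (assuming NumPy and SciPy; the function name and the default number of iterations are illustrative) implements repeated smoothing with the separable binomial kernel 1/4 [1, 2, 1], which approaches a Gaussian with scale parameter t = iterations/2 per dimension:
<syntaxhighlight lang="python">
import numpy as np
from scipy.signal import convolve2d

def binomial_smoothing(image, iterations=4):
    # Repeated convolution with the separable binomial kernel [1, 2, 1]/4 adds
    # variance 1/2 per dimension per pass, so 'iterations' passes approximate
    # Gaussian smoothing with t = iterations / 2.
    kernel_1d = np.array([1.0, 2.0, 1.0]) / 4.0
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    out = np.asarray(image, dtype=float)
    for _ in range(iterations):
        out = convolve2d(out, kernel_2d, mode='same', boundary='symm')
    return out
</syntaxhighlight>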
==See also==
==References==
- ^ a b Ijima, T. "Basic theory on normalization of pattern (in case of typical one-dimensional pattern)". Bull. Electrotech. Lab. 26, 368–388, 1962. (in Japanese)
- ^ "Witkin, A. P. "Scale-space filtering", Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe, Germany,1019–1022, 1983" (PDF).
- ^ a b c Koenderink, Jan "The structure of images", Biological Cybernetics, 50:363–370, 1984
- ^ a b c d e f g h i Lindeberg, T. (1993). Scale-Space Theory in Computer Vision. Springer. doi:10.1007/978-1-4757-6465-9. ISBN 978-1-4419-5139-7.
- ^ a b T. Lindeberg (1994). "Scale-space theory: A basic tool for analysing structures at different scales". Journal of Applied Statistics (Supplement on Advances in Applied Statistics: Statistics and Images: 2). 21 (2): 224–270. Bibcode:1994JApSt..21..225L. doi:10.1080/757582976.
- ^ a b Florack, Luc, Image Structure, Kluwer Academic Publishers, 1997.
- ^ "Sporring, Jon et al. (Eds), Gaussian Scale-Space Theory, Kluwer Academic Publishers, 1997".
- ^ ter Haar Romeny, Bart M. (2008). Front-End Vision and Multi-Scale Image Analysis: Multi-scale Computer Vision Theory and Applications, written in Mathematica. Springer Science & Business Media. ISBN 978-1-4020-8840-7.
- ^ a b c d e f g Lindeberg, Tony (2008). "Scale-space". In Benjamin Wah (ed.). Encyclopedia of Computer Science and Engineering. Vol. IV. John Wiley and Sons. pp. 2495–2504. doi:10.1002/9780470050118.ecse609. ISBN 978-0470050118.
- ^ a b c T. Lindeberg (2014) "Scale selection", Computer Vision: A Reference Guide, (K. Ikeuchi, Editor), Springer, pages 701–713.
- ^ "Scale-space representation: Definition and basic ideas". www.csc.kth.se.
- ^ J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans. Pattern Anal. Machine Intell. 8(1), 26–33, 1986.
- ^ Yuille, A L; Poggio, T A (1 January 1986). "Scaling Theorems for Zero Crossings". IEEE Transactions on Pattern Analysis and Machine Intelligence. 8 (1): 15–25. doi:10.1109/TPAMI.1986.4767748. hdl:1721.1/5655. ISSN 0162-8828. PMID 21869319. S2CID 14815630.
- ^ Lindeberg, Tony (1990). "Scale-space for discrete signals". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (3): 234–254. doi:10.1109/34.49051.
- ^ a b Pauwels, Eric J.; Van Gool, Luc J.; Fiddelaers, Peter; Moons, Theo (1 July 1995). "An Extended Class of Scale-Invariant and Recursive Scale Space Filters". IEEE Transactions on Pattern Analysis and Machine Intelligence. 17 (7): 691–701. doi:10.1109/34.391411.
- ^ a b Lindeberg, Tony (7 January 1996). "On the axiomatic foundations of linear scale-space: Combining semi-group structure with causality vs. scale invariance". Gaussian Scale-Space Theory: Proc PhD School on Scale-Space Theory. Kluwer Academic Publishers: 75–97 – via kth.diva-portal.org.
- ^ Weickert, Joachim; Ishikawa, Seiji; Imiya, Atsushi (1 May 1999). "Linear Scale-Space has First been Proposed in Japan". Journal of Mathematical Imaging and Vision. 10 (3): 237–252. doi:10.1023/A:1008344623873. ISSN 0924-9907. S2CID 17835046.
- ^ a b c d Lindeberg, Tony (2011). "Generalized Gaussian Scale-Space Axiomatics Comprising Linear Scale-Space, Affine Scale-Space and Spatio-Temporal Scale-Space". Journal of Mathematical Imaging and Vision. 40 (1): 36–81. doi:10.1007/s10851-010-0242-2. S2CID 950099.
- ^ a b c d Lindeberg, Tony (1 January 2013). Hawkes, Peter W. (ed.). Generalized Axiomatic Scale-Space Theory. Advances in Imaging and Electron Physics. Vol. 178. Elsevier. pp. 1–96. doi:10.1016/b978-0-12-407701-0.00001-7. ISBN 9780124077010. Retrieved 7 January 2023.
- ^ M. Felsberg and G.Sommer "The Monogenic Scale-Space: A Unifying Approach to Phase-Based Image Processing in Scale Space", Journal of Mathematical Imaging and Vision, 21(1): 5–28, 2004.
- ^ R. Duits, L. Florack, J. de Graaf and B. ter Haar Romeny "On the Axioms of Scale Space Theory", Journal of Mathematical Imaging and Vision, 20(3): 267–298, 2004.
- ^ Koenderink, J.J.; van Doorn, A.J. (7 June 1992). "Generic neighborhood operators". IEEE Transactions on Pattern Analysis and Machine Intelligence. 14 (6): 597–605. doi:10.1109/34.141551 – via IEEE Xplore.
- ^ a b c d Lindeberg, Tony (7 January 1998). "Feature detection with automatic scale selection". International Journal of Computer Vision. 30 (2): 79–116. doi:10.1023/A:1008045108935. S2CID 723210 – via kth.diva-portal.org.
- ^ a b Lindeberg, Tony (7 January 1998). "Edge detection and ridge detection with automatic scale selection". International Journal of Computer Vision. 30 (2): 117–154. doi:10.1023/A:1008097225773. S2CID 35328443 – via kth.diva-portal.org.
- ^ a b Lindeberg, Tony (7 January 1999). "Principles for Automatic Scale Selection". Handbook on Computer Vision and Applications. Academic Press: 239–274 – via kth.diva-portal.org.
- ^ Lindeberg, Tony (1 May 2017). "Temporal Scale Selection in Time-Causal Scale Space". Journal of Mathematical Imaging and Vision. 58 (1): 57–101. arXiv:1701.05088. doi:10.1007/s10851-016-0691-3. ISSN 1573-7683. S2CID 254645013.
- ^ a b Lindeberg, Tony (1 May 2018). "Spatio-Temporal Scale Selection in Video Data". Journal of Mathematical Imaging and Vision. 60 (4): 525–562. doi:10.1007/s10851-017-0766-9. ISSN 1573-7683. S2CID 254649837.
- ^ Lindeberg, Tony (2018). "Dense scale selection over space, time and space-time". SIAM Journal on Imaging Sciences. 11 (1): 407–441. arXiv:1709.08603. doi:10.1137/17M114892X. S2CID 22220902.
- ^ Lindeberg, Tony (1 June 2013). "Scale Selection Properties of Generalized Scale-Space Interest Point Detectors". Journal of Mathematical Imaging and Vision. 46 (2): 177–210. doi:10.1007/s10851-012-0378-3. ISSN 1573-7683. S2CID 254653631.
- ^ Lindeberg, Tony (1 May 2015). "Image Matching Using Generalized Scale-Space Interest Points". Journal of Mathematical Imaging and Vision. 52 (1): 3–36. doi:10.1007/s10851-014-0541-0. ISSN 1573-7683. S2CID 254657377.
- ^ a b c Lindeberg, Tony; Gårding, Jonas (7 January 1997). "Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D brightness structure". Image and Vision Computing. 15 (6): 415–434. doi:10.1016/S0262-8856(97)01144-X – via kth.diva-portal.org.
- ^ Baumberg, A. (7 January 2000). "Reliable feature matching across widely separated views". Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662). Vol. 1. IEEE Comput. Soc. pp. 774–781. doi:10.1109/CVPR.2000.855899. ISBN 0-7695-0662-3. S2CID 15626261.
- ^ Mikolajczyk, K. and Schmid, C.: Scale and affine invariant interest point detectors, Int. Journal of Computer Vision, 60:1, 63 – 86, 2004.
- ^ "Lowe, D. G., "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60, 2, pp. 91–110, 2004".
- ^ Bay, Herbert; Ess, Andreas; Tuytelaars, Tinne; Van Gool, Luc (1 June 2008). "Speeded-Up Robust Features (SURF)". Computer Vision and Image Understanding. 110 (3): 346–359. doi:10.1016/j.cviu.2007.09.014. S2CID 14777911 – via ScienceDirect.
- ^ Lindeberg, Tony (22 May 2012). "Scale Invariant Feature Transform". Scholarpedia. 7 (5): 10491. Bibcode:2012SchpJ...710491L. doi:10.4249/scholarpedia.10491.
- ^ Schiele, Bernt; Crowley, James L. (1 January 2000). "Recognition without Correspondence using Multidimensional Receptive Field Histograms". International Journal of Computer Vision. 36 (1): 31–50. doi:10.1023/A:1008120406972. S2CID 2551159 – via Springer Link.
- ^ Linde, Oskar; Lindeberg, Tony (7 January 2004). "Object recognition using composed receptive field histograms of higher dimensionality". International Conference on Pattern Recognition (ICPR 2004). IEEE Conference Proceedings: 1–6 – via kth.diva-portal.org.
- ^ Linde, Oskar; Lindeberg, Tony (7 January 2012). "Composed Complex-Cue Histograms: An Investigation of the Information Content in Receptive Field Based Image Descriptors for Object Recognition". Computer Vision and Image Understanding. 116 (4): 538–560. doi:10.1016/j.cviu.2011.12.003 – via kth.diva-portal.org.
- ^ Burt, Peter and Adelson, Ted, "The Laplacian Pyramid as a Compact Image Code", IEEE Trans. Communications, 31:4, 532–540, 1983 (archived 23 January 2022 at the Wayback Machine).
- ^ Crowley, James L.; Stern, Richard M. (March 1984). "Fast Computation of the Difference of Low-Pass Transform". IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-6 (2): 212–222. doi:10.1109/TPAMI.1984.4767504. ISSN 1939-3539. PMID 21869184. S2CID 17032188.
- ^ Crowley, J. L. and Sanderson, A. C. "Multiple resolution representation and probabilistic matching of 2-D gray-scale shape", IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1), pp 113–121, 1987.
- ^ a b T. Lindeberg and L. Bretzner (2003) "Real-time scale selection in hybrid multi-scale representations", Proc. Scale-Space'03, Isle of Skye, Scotland, Springer Lecture Notes in Computer Science, volume 2695, pages 148–163.
- ^ a b T. Lindeberg (1992) "Scale-space behaviour of local extrema and blobs", J. of Mathematical Imaging and Vision, 1(1), pages 65–99.
- ^ Jan Koenderink and Andrea van Doorn, A. J. (1986), 'Dynamic shape', Biological Cybernetics 53, 383–396.
- ^ Damon, J. (1995), 'Local Morse theory for solutions to the heat equation and Gaussian blurring', Journal of Differential Equations 115(2), 386–401.
- ^ Florack, Luc; Kuijper, Arjan (1 February 2000). "The Topological Structure of Scale-Space Images". Journal of Mathematical Imaging and Vision. 12 (1): 65–79. doi:10.1023/A:1008304909717. ISSN 1573-7683. S2CID 7515494.
- ^ ter Haar Romeny, Bart M. (Editor), Geometry-Driven Diffusion in Computer Vision, Kluwer Academic Publishers, 1994.
- ^ Weickert, Joachim (1998). Anisotropic Diffusion in Image Processing. Teubner-Verlag.
- ^ a b c Lindeberg, Tony (1 May 2016). "Time-Causal and Time-Recursive Spatio-Temporal Receptive Fields". Journal of Mathematical Imaging and Vision. 55 (1): 50–88. arXiv:1504.02648. doi:10.1007/s10851-015-0613-9. ISSN 1573-7683. S2CID 120619833.
- ^ a b Lindeberg, Tony (1 December 2013). "A computational theory of visual receptive fields". Biological Cybernetics. 107 (6): 589–635. doi:10.1007/s00422-013-0569-z. ISSN 1432-0770. PMC 3840297. PMID 24197240.
- ^ a b Lindeberg, Tony (19 July 2013). "Invariance of visual operations at the level of receptive fields". PLOS ONE. 8 (7): e66990. arXiv:1210.0754. Bibcode:2013PLoSO...866990L. doi:10.1371/journal.pone.0066990. ISSN 1932-6203. PMC 3716821. PMID 23894283.
- ^ a b Lindeberg, Tony (1 January 2021). "Normative theory of visual receptive fields". Heliyon. 7 (1): e05897. Bibcode:2021Heliy...705897L. doi:10.1016/j.heliyon.2021.e05897. ISSN 2405-8440. PMC 7820928. PMID 33521348.
- ^ DeAngelis, G. C., Ohzawa, I., and Freeman, R. D., "Receptive-field dynamics in the central visual pathways", Trends Neurosci. 18: 451–458, 1995.
- ^ Young, R. A. "The Gaussian derivative model for spatial vision: Retinal mechanisms", Spatial Vision, 2:273–293, 1987.
- ^ Young, Richard; Lesperance, Ronald; Meyer, W. Weston (1 January 2001). "The Gaussian Derivative model for spatial-temporal vision: I. Cortical model". Spatial Vision. 14 (3–4): 261–319. doi:10.1163/156856801753253582. ISSN 0169-1015. PMID 11817740.
- ^ Lesperance, Ronald; Young, Richard (1 January 2001). "The Gaussian Derivative model for spatial-temporal vision: II. Cortical data". Spatial Vision. 14 (3–4): 321–389. doi:10.1163/156856801753253591. ISSN 0169-1015. PMID 11817741.
- ^ Lindeberg, Tony; Friberg, Anders (30 March 2015). "Idealized Computational Models for Auditory Receptive Fields". PLOS ONE. 10 (3): e0119032. arXiv:1404.2037. Bibcode:2015PLoSO..1019032L. doi:10.1371/journal.pone.0119032. ISSN 1932-6203. PMC 4379182. PMID 25822973.
- ^ Lindeberg, Tony; Friberg, Anders (2015). "Scale-Space Theory for Auditory Signals". Scale Space and Variational Methods in Computer Vision. Springer Lecture Notes in Computer Science. Vol. 9087. pp. 3–15. doi:10.1007/978-3-319-18461-6_1. ISBN 978-3-319-18460-9.
- ^ Jacobsen, J. J.; van Gemert, J.; Lou, Z.; Smeulders, A. W. M. (2016). "Structured receptive fields in CNNs". Proceedings of Computer Vision and Pattern Recognition. pp. 2610–2619.
- ^ Worrall, Daniel E.; Welling, Max (5 November 2019). "Deep Scale-spaces: Equivariance Over Scale". arXiv:1905.11697.
- ^ a b c Lindeberg, Tony (1 January 2020). "Provably Scale-Covariant Continuous Hierarchical Networks Based on Scale-Normalized Differential Expressions Coupled in Cascade". Journal of Mathematical Imaging and Vision. 62 (1): 120–148. arXiv:1905.13555. doi:10.1007/s10851-019-00915-x. ISSN 1573-7683. S2CID 254646822.
- ^ a b c Lindeberg, Tony (1 March 2022). "Scale-Covariant and Scale-Invariant Gaussian Derivative Networks". Journal of Mathematical Imaging and Vision. 64 (3): 223–242. arXiv:2011.14759. doi:10.1007/s10851-021-01057-9. ISSN 1573-7683. S2CID 227227887.
- ^ Pintea, Silvia L.; Tomen, Nergis; Goes, Stanley F.; Loog, Marco; van Gemert, Jan C. (30 June 2021). "Resolution learning in deep convolutional networks using scale-space theory". IEEE Transactions on Image Processing. 30: 8342–8353. arXiv:2106.03412. Bibcode:2021ITIP...30.8342P. doi:10.1109/TIP.2021.3115001. PMID 34587011. S2CID 235358752.
- ^ Sosnovik, Ivan; Szmaja, Michał; Smeulders, Arnold (8 June 2020). "Scale-Equivariant Steerable Networks". arXiv:1910.11093.
- ^ Bekkers, E. J. (2020). "B-spline CNNs on Lie groups". International Conference on Learning Representations.
- ^ a b Jansson, Ylva; Lindeberg, Tony (2021). "Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges". 2020 25th International Conference on Pattern Recognition (ICPR). Institute of Electrical and Electronics Engineers (IEEE). pp. 1181–1188. arXiv:2004.01536. doi:10.1109/ICPR48806.2021.9413276. ISBN 978-1-7281-8808-9. S2CID 214795413.
- ^ "Sosnovik, I., Moskalev, A., Smeulders, A. (2021) DISCO: Accurate discrete scale convolutions. In: British Machine Vision Conference" (PDF).
- ^ a b Jansson, Ylva; Lindeberg, Tony (1 June 2022). "Scale-Invariant Scale-Channel Networks: Deep Networks That Generalise to Previously Unseen Scales". Journal of Mathematical Imaging and Vision. 64 (5): 506–536. arXiv:2106.06418. doi:10.1007/s10851-022-01082-2. ISSN 1573-7683. S2CID 235417440.
- ^ "Zhu, W., Qiu, Q., Calderbank, R., Sapiro, G., & Cheng, X. (2022) Scaling-translation-equivariant networks with decomposed convolutional filters. Journal of Machine Learning Research, 23(68): 1-45" (PDF).
- ^ a b Lindeberg, T. (23 January 2023). "A time-causal and time-recursive scale-covariant scale-space representation of temporal signals and past time". Biological Cybernetics. 117 (1–2): 21–59. doi:10.1007/s00422-022-00953-6. PMC 10160219. PMID 36689001.
- ^ Koenderink, J. (1988). "Scale-time". Biological Cybernetics. 58 (3): 159–162. doi:10.1007/BF00364135. S2CID 209034116.
==Further reading==
- Lindeberg, Tony (2008). "Scale-space". In Benjamin Wah (ed.). Encyclopedia of Computer Science and Engineering. Vol. IV. John Wiley and Sons. pp. 2495–2504. doi:10.1002/9780470050118.ecse609. ISBN 978-0470050118.
- Lindeberg, Tony: Scale-space theory: A basic tool for analysing structures at different scales, in J. of Applied Statistics, 21(2), pp. 224–270, 1994. (longer pdf tutorial on scale-space)
- Lindeberg, Tony: Scale-space: A framework for handling image structures at multiple scales, Proc. CERN School of Computing, 96(8): 27-38, 1996.
- Romeny, Bart ter Haar: Introduction to Scale-Space Theory: Multiscale Geometric Image Analysis, Tutorial VBC '96, Hamburg, Germany, Fourth International Conference on Visualization in Biomedical Computing.
- Florack, Luc, Romeny, Bart ter Haar, Viergever, Max, & Koenderink, Jan: Linear scale space, Journal of Mathematical Imaging and Vision volume 4: 325–351, 1994.
- Lindeberg, Tony, "Principles for automatic scale selection", In: B. Jähne (et al., eds.), Handbook on Computer Vision and Applications, volume 2, pp 239—274, Academic Press, Boston, USA, 1999. (tutorial on approaches to automatic scale selection)
- Lindeberg, Tony: "Scale-space theory" In: Encyclopedia of Mathematics, (Michiel Hazewinkel, ed) Kluwer, 1997.
- Web archive backup: Lecture on scale-space at the University of Massachusetts (pdf)
==External links==
- Powers of ten interactive Java tutorial at Molecular Expressions website
- Ohzawa, Izumi. "Space-Time Receptive Fields of Visual Neurons". Osaka University. Archived from the original on 18 February 2006.
- pyscsp : Scale-Space Toolbox for Python at GitHub and PyPi
- pytempscsp : Temporal Scale-Space Toolbox for Python at GitHub and PyPi
- Peak detection in 1D data using a scale-space approach BSD-licensed MATLAB code