<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>영스퀘어</title>
    <link>https://young-square.tistory.com/</link>
    <description></description>
    <language>ko</language>
    <pubDate>Tue, 14 Apr 2026 12:31:54 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>영스퀘어</managingEditor>
    <image>
      <title>영스퀘어</title>
      <url>https://tistory1.daumcdn.net/tistory/4671977/attach/0c8504f1a86646d0a16ddf5cb828349c</url>
      <link>https://young-square.tistory.com</link>
    </image>
    <item>
      <title>[Summary] Overview of Video Super-Resolution (VSR)</title>
      <link>https://young-square.tistory.com/64</link>
      <description>&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;The Goal of Super-Resolution&lt;/b&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Generate (restore) a high-resolution (HR) image from a low-resolution (LR) image.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;Types of Super-Resolution&lt;/b&gt;&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Single Image Super-Resolution (SISR)&lt;/b&gt; : the SR problem for a single image&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Multi-Image Super-Resolution (MISR)&lt;/b&gt; : the SR problem for multiple images; the problem is usually defined with several images as the model input and the single middle frame as the output&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Video Super-Resolution (VSR)&lt;/b&gt; : many papers treat MISR and VSR as the same problem, but the difference can be described as whether there is a process for aligning temporal information&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;b&gt;Research Trends in Deep-Learning-Based Video Super-Resolution (VSR)&lt;/b&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Deep-learning-based VSR research started in 2016 with a model named 'VSRNet' [1]. This post traces the research trend by taking the key models of each year as representatives.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;VSR.png&quot; data-origin-width=&quot;1718&quot; data-origin-height=&quot;813&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/eo8W5h/btr2D3Nko55/cdMzd9ZZfDKUSe3DyBBrg0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eo8W5h/btr2D3Nko55/cdMzd9ZZfDKUSe3DyBBrg0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eo8W5h/btr2D3Nko55/cdMzd9ZZfDKUSe3DyBBrg0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Feo8W5h%2Fbtr2D3Nko55%2FcdMzd9ZZfDKUSe3DyBBrg0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1718&quot; height=&quot;813&quot; data-filename=&quot;VSR.png&quot; data-origin-width=&quot;1718&quot; data-origin-height=&quot;813&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;[2016~2017]&lt;/b&gt; Network architectures were kept simple while performing optical-flow-based explicit alignment (note that the PixelShuffle-based up-sampling structure introduced by ESPCN [13], the model VESPCN [2] was built on, is still the most widely used up-sampling method today; see the sketch after this list)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;[2018~2019]&lt;/b&gt; DUF [3] and EDVR [4] achieved SOTA performance in turn, and implicit-alignment research was dominant&lt;/li&gt;
&lt;li&gt;&lt;b&gt;[2020~2021]&lt;/b&gt; From 2020 model architectures became far more complex; the SOTA models of the time, iSeeBetter [5] and RRN-L [6], both build on a recurrent propagation structure, which shows that this is when the research trend shifted from local propagation to recurrent propagation&lt;/li&gt;
&lt;li&gt;&lt;b&gt;[2021~2022]&lt;/b&gt; BasicVSR [7], the baseline of many recent studies, appeared in 2021, and the new recurrent propagation framework and the alignment module combining explicit and implicit alignment formalized in BasicVSR++ [8] (2022) influenced many recent works&lt;/li&gt;
&lt;li&gt;&lt;b&gt;[2022~]&lt;/b&gt; As of March 2023, the trend among SOTA VSR models, as in other vision tasks, is the 'Transformer'. The main drawback of the early Transformer-based VSR models, VSR-Transformer (2021) [14] and VRT [9], is their high model complexity (VSR-Transformer: 32.6M, VRT: 35.6M); the models proposed afterwards (RVRT: 10.8M, PSRT: 13.4M, FTVSR: 10.8M) reduce the complexity while achieving higher PSNR&lt;/li&gt;
&lt;/ul&gt;
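&lt;p data-ke-size=&quot;size16&quot;&gt;To make the PixelShuffle-based up-sampling mentioned in the list above concrete, here is a minimal PyTorch sketch (my own example, not taken from any of the papers): a convolution expands the channel dimension by a factor of r^2, and nn.PixelShuffle rearranges those channels into an r-times larger spatial resolution.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch
import torch.nn as nn

scale = 4  # up-sampling factor r

# expand channels by r^2, then rearrange (B, C*r^2, H, W) -&gt; (B, C, H*r, W*r)
upsampler = nn.Sequential(
    nn.Conv2d(64, 3 * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),
)

feat = torch.randn(1, 64, 32, 32)   # low-resolution feature map
sr = upsampler(feat)
print(sr.shape)                     # torch.Size([1, 3, 128, 128])&lt;/code&gt;&lt;/pre&gt;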
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;b&gt;[References]&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[1] A. Kappeler, S. H. Yoo, Q. Dai, and A. K. Katsaggelos, &amp;ldquo;Video super-resolution with convolutional neural networks,&amp;rdquo; IEEE Transactions on Computational Imaging, vol. 2, no. 2, pp. 109-122, 2016.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[2] J. Caballero, C. Ledig, A. Aitken, A. Acosta, J. Totz, Z. Wang, and W. Shi, &amp;ldquo;Real-time video super-resolution with spatio-temporal networks and motion compensation,&amp;rdquo; in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4778-4787, 2017.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[3] Y. Jo, S. W. Oh, J. Kang, and S. J. Kim, &amp;ldquo;Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation,&amp;rdquo; in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3224-3232, 2018.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[4] X. Wang, K. C. Chan, K. Yu, C. Dong, and C. C. Loy, &amp;ldquo;EDVR: Video restoration with enhanced deformable convolutional networks,&amp;rdquo; in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[5] A. Chadha, J. Britto, and M. M. Roja, &amp;ldquo;iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks,&amp;rdquo; Computational Visual Media, vol. 6, pp. 307-317, 2020.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[6] T. Isobe, F. Zhu, X. Jia, and S. Wang, &amp;ldquo;Revisiting temporal modeling for video super-resolution,&amp;rdquo; arXiv preprint arXiv:2008.05765, 2020.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[7] K. C. Chan, X. Wang, K. Yu, C. Dong, and C. C. Loy, &amp;ldquo;BasicVSR: The search for essential components in video super-resolution and beyond,&amp;rdquo; in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4947-4956, 2021.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[8] K. C. Chan, S. Zhou, X. Xu, and C. C. Loy, &amp;ldquo;BasicVSR++: Improving video super-resolution with enhanced propagation and alignment,&amp;rdquo; in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5972-5981, 2022.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[9] J. Liang, J. Cao, Y. Fan, K. Zhang, R. Ranjan, Y. Li, R. Timofte, and L. Van Gool, &amp;ldquo;VRT: A video restoration transformer,&amp;rdquo; arXiv preprint arXiv:2201.12288, 2022.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[10] J. Liang, Y. Fan, X. Xiang, R. Ranjan, E. Ilg, S. Green, J. Cao, K. Zhang, R. Timofte, and L. Van Gool, &amp;ldquo;Recurrent video restoration transformer with guided deformable attention,&amp;rdquo; in Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2022.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[11] S. Shi, J. Gu, L. Xie, X. Wang, Y. Yang, and C. Dong, &amp;ldquo;Rethinking alignment in video super-resolution transformers,&amp;rdquo; in Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2022.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[12] Z. Qiu, H. Yang, J. Fu, and D. Fu, &amp;ldquo;Learning spatiotemporal frequency-transformer for compressed video super-resolution,&amp;rdquo; in Proceedings of the European Conference on Computer Vision (ECCV), 2022.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[13] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, &amp;ldquo;Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,&amp;rdquo; in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1874-1883, 2016.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[14] J. Cao, Y. Li, K. Zhang, and L. Van Gool, &amp;ldquo;Video super-resolution transformer,&amp;rdquo; arXiv preprint arXiv:2106.06847, 2021.&lt;/p&gt;</description>
      <category>Research/Super-Resolution</category>
      <category>Super-Resolution</category>
      <category>Video Super-Resolution</category>
      <category>VSR</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/64</guid>
      <comments>https://young-square.tistory.com/64#entry64comment</comments>
      <pubDate>Tue, 7 Mar 2023 17:10:50 +0900</pubDate>
    </item>
    <item>
      <title>[PyTorch][Troubleshooting] Solving GPU memory problems: &amp;lsquo;torch.utils.checkpoint&amp;rsquo;</title>
      <link>https://young-square.tistory.com/63</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://pytorch.org/docs/stable/checkpoint.html&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://pytorch.org/docs/stable/checkpoint.html&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For large deep-learning models with many parameters, you can run into 'out of memory' even with the batch size set to 1.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In my case, when I tried to experiment with a VSR model that uses a Transformer backbone, even a Quadro RTX 8000 GPU with 48GB of memory was not enough.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;When running experiments on a GPU, this memory limitation can be addressed with 'torch.utils.checkpoint'.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The documentation describes it as follows:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;'Rather than storing all intermediate activations of the entire computation graph for computing backward, the checkpointed part does not save intermediate activations, and instead recomputes them in backward pass.'&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It does not go into much detail, but whereas training normally has to spend memory storing the intermediate activations, recomputing them during the backward pass instead of storing them is what saves memory.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For more details, see the documentation link attached above.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Usage is simple.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;[Where the training loop is implemented]&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. Additional setting for applying DDP&lt;/p&gt;
&lt;pre id=&quot;code_1680082703459&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;model._set_static_graph()&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. cudnn-related setting&lt;/p&gt;
&lt;pre id=&quot;code_1680082368845&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;torch.backends.cudnn.benchmark = True&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3. Training with AMP&lt;/p&gt;
&lt;pre id=&quot;code_1680082398993&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(enabled=True)

optimizer_g.zero_grad()

# forward pass and loss are computed in mixed precision
with autocast(enabled=True):
    output_HR = model(input_LR)
    l_pix = loss_function(output_HR, gt)

# backward, optimizer step, and scaler update are done outside the autocast block
scaler.scale(l_pix).backward()
scaler.step(optimizer_g)
scaler.update()&lt;/code&gt;&lt;/pre&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;[Where the model is implemented]&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. Import the module&lt;/p&gt;
&lt;pre id=&quot;code_1678166668580&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch.utils.checkpoint as checkpoint&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. Apply the checkpoint function&lt;/p&gt;
&lt;pre id=&quot;code_1678167028804&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch.nn as nn


class BasicLayer(nn.Module):
    def __init__(self, dim, depth):

        super().__init__()

        # build sample blocks
        self.blocks = nn.ModuleList([
            SampleBlock(
                dim=dim) for i in range(depth)
        ])

    def forward(self, x):
        for blk in self.blocks:
            # x = blk(x)  # original call: stores all intermediate activations
            x = checkpoint.checkpoint(blk, x)  # recompute activations in the backward pass

        return x&lt;/code&gt;&lt;/pre&gt;
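&lt;p data-ke-size=&quot;size16&quot;&gt;As a side note, a minimal sketch of my own: when the blocks are already wrapped in an nn.Sequential, torch.utils.checkpoint.checkpoint_sequential can checkpoint them in chunks, so only the activations at the chunk boundaries are kept.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch
import torch.nn as nn
import torch.utils.checkpoint as checkpoint

# a toy stack of blocks; with 2 segments, only the segment-boundary activations are stored
blocks = nn.Sequential(*[nn.Linear(256, 256) for _ in range(8)])

x = torch.randn(4, 256, requires_grad=True)
out = checkpoint.checkpoint_sequential(blocks, 2, x)
out.sum().backward()&lt;/code&gt;&lt;/pre&gt;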
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Programming/Python</category>
      <category>pytorch</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/63</guid>
      <comments>https://young-square.tistory.com/63#entry63comment</comments>
      <pubDate>Tue, 7 Mar 2023 15:09:10 +0900</pubDate>
    </item>
    <item>
      <title>[LaTeX][Troubleshooting] Package hyperref Warning:~~~, Underfull \hbox ~~~, Overfull \hbox ~~~</title>
      <link>https://young-square.tistory.com/62</link>
      <description>&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The environment used in this post is the Overleaf website:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://www.overleaf.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://www.overleaf.com/&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I referred to the following posts to resolve the issues.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://tex.stackexchange.com/questions/138/what-are-underfull-hboxes-and-vboxes-and-how-can-i-get-rid-of-them&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://tex.stackexchange.com/questions/138/what-are-underfull-hboxes-and-vboxes-and-how-can-i-get-rid-of-them&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://tex.stackexchange.com/questions/504814/package-hyperref-warning-token-not-allowed-in-a-pdf-string-pdfdocencoding&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://tex.stackexchange.com/questions/504814/package-hyperref-warning-token-not-allowed-in-a-pdf-string-pdfdocencoding&lt;/a&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;When writing a paper in LaTeX, you frequently run into warning messages.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In general these warnings do not prevent compiling to PDF, so they are often ignored.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;However, some journal submission systems treat even warning messages as problems, so occasionally they need to be resolved.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As the figure below shows, in my case the expanded log panel contained this many warning messages.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;976&quot; data-origin-height=&quot;910&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/eztchl/btrJm8BEM8Q/uu4qFUvbm5acTIGXEHwrg1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/eztchl/btrJm8BEM8Q/uu4qFUvbm5acTIGXEHwrg1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/eztchl/btrJm8BEM8Q/uu4qFUvbm5acTIGXEHwrg1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Feztchl%2FbtrJm8BEM8Q%2Fuu4qFUvbm5acTIGXEHwrg1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;976&quot; height=&quot;910&quot; data-origin-width=&quot;976&quot; data-origin-height=&quot;910&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This post summarizes how I resolved these warnings.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;1.&amp;nbsp;Package hyperref Warning:~~~&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In my case, the following three warning messages were raised on the same line.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Package hyperref Warning: Token not allowed in a PDF string (Unicode): removing `\&amp;lt;def&amp;gt;-command' on input line&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Package hyperref Warning: Token not allowed in a PDF string (Unicode): removing `\cnotenum' on input line&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Package hyperref Warning: Token not allowed in a PDF string (Unicode): removing `\@corref' on input line&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The reported line was a line left blank for a paragraph break, so at first I assumed the blank line itself was the problem and tried fixes related to the blank line.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;However, nothing I tried resolved it, and I realized the line itself was not the problem.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To fix it, add the lines below to the preamble at the top of the .tex file, where the required packages are declared with \usepackage before \begin{document}.&lt;/p&gt;
&lt;pre id=&quot;code_1660133655387&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;\usepackage[bookmarks,bookmarksnumbered]{hyperref}
\hypersetup{colorlinks = true, linkcolor = blue, anchorcolor = red, citecolor = blue, filecolor = red, urlcolor = red,
            pdfauthor=author}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The key point is adding pdfauthor=author when configuring the hyperlink settings.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;2. Underfull \hbox ~~~, Overfull \hbox ~~~, Underfull \vbox ~~~, Overfull \vbox ~~~&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In my case, many messages of this kind were appearing, including the one below.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Underfull \hbox (badness 10000) in paragraph at lines&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Even though the underfull/overfull boxes are only warnings, they still point to places where something went wrong, so they need to be addressed.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As with problem 1, to fix these, add the lines below to the preamble at the top of the .tex file, where packages are declared with \usepackage before \begin{document}.&lt;/p&gt;
&lt;pre id=&quot;code_1660133939935&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;\hbadness=99999  % or any number &amp;gt;=10000
\vbadness=99999  % or any number &amp;gt;=10000
\hfuzz=20pt&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To fix underfull \hbox warnings, set \hbadness;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;to fix underfull \vbox warnings, set \vbadness;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;to fix overfull \hbox warnings, set \hfuzz;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;to fix overfull \vbox warnings, set \vfuzz.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;When setting the value, use one larger than the value reported in the message.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For reference, I initially set \hfuzz to 99999 as well, but an error message appeared, so I set it to 20pt, a value slightly larger than the one reported in the message.&lt;/p&gt;
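&lt;p data-ke-size=&quot;size16&quot;&gt;Putting the two fixes together, a minimal preamble sketch could look like the following (my own sketch; the package options and values are simply the ones used above, and the thresholds should be adjusted to whatever your own log reports):&lt;/p&gt;
&lt;pre class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;\documentclass{article}
\usepackage[bookmarks,bookmarksnumbered]{hyperref}
\hypersetup{colorlinks = true, linkcolor = blue, citecolor = blue, urlcolor = red,
            pdfauthor=author}  % pdfauthor suppresses the PDF-string warnings
\hbadness=99999  % underfull \hbox
\vbadness=99999  % underfull \vbox
\hfuzz=20pt      % overfull \hbox
\vfuzz=20pt      % overfull \vbox
\begin{document}
...
\end{document}&lt;/code&gt;&lt;/pre&gt;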
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;After applying the fixes above, most of the warning messages disappeared, as shown below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;135&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/2lgii/btrJqmS09HK/6lAXmOn0dlH99d3zA0lFGK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/2lgii/btrJqmS09HK/6lAXmOn0dlH99d3zA0lFGK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/2lgii/btrJqmS09HK/6lAXmOn0dlH99d3zA0lFGK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F2lgii%2FbtrJqmS09HK%2F6lAXmOn0dlH99d3zA0lFGK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;989&quot; height=&quot;135&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;135&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The remaining warning message occurs at the \end{frontmatter} part, and I have not been able to resolve it yet.&lt;/p&gt;</description>
      <category>Programming/ETC</category>
      <category>LaTeX</category>
      <category>WARNING</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/62</guid>
      <comments>https://young-square.tistory.com/62#entry62comment</comments>
      <pubDate>Wed, 10 Aug 2022 21:31:17 +0900</pubDate>
    </item>
    <item>
      <title>[Paper Review] Video Compression based on Jointly Learned Down-Sampling and Super-Resolution Networks</title>
      <link>https://young-square.tistory.com/57</link>
      <description>&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper we look at today was presented at VCIP 2021 and is titled 'Video Compression based on Jointly Learned Down-Sampling and Super-Resolution Networks' [1].&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It is a video coding paper based on down-sampling and super-resolution, and since it reports experiments against 'RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling-Based Video Coding' [2], a paper reviewed earlier on this blog, I decided to review it as well.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For the review of the RR-DnCNN v2.0 paper, see the post below.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://young-square.tistory.com/56&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2022.06.16 - [Research/Deep Video Coding] - [Paper Review] RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling-Based Video Coding&lt;/a&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Introduction&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Many recently proposed down-sampling-based video coding studies that use deep-learning-based super-resolution (SR) models rely on a predefined down-sampling such as bicubic, so the low-resolution (LR) frames cannot be obtained adaptively.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Also, considering the basic structure of down-sampling-based video coding: if the down-sampling and up-sampling networks are built and trained separately, the codec's encoding and decoding modules sit between them, and the codec module is non-differentiable. In other words, back-propagation cannot pass through the codec module.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In that case, it is difficult to jointly optimize the down-sampling and up-sampling networks.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This paper addresses both problems by designing and training a deep-learning-based down-sampling network together with a virtual codec network.&lt;/p&gt;
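&lt;p data-ke-size=&quot;size16&quot;&gt;As a toy illustration of the non-differentiability issue (my own sketch, not from the paper): a quantization/rounding step, which is effectively what a codec pipeline contains, has zero gradient almost everywhere, so nothing placed before it receives a useful training signal.&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch

x = torch.randn(4, requires_grad=True)

# stand-in for the codec: rounding is piecewise constant,
# so its gradient is zero almost everywhere
y = torch.round(x * 10) / 10
y.sum().backward()

print(x.grad)  # all zeros: no useful gradient reaches the down-sampling side&lt;/code&gt;&lt;/pre&gt;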
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Proposed Method&lt;/h3&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1714&quot; data-origin-height=&quot;719&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/brGoQW/btrFiS22Y3Y/BMiMl44JpiQ98JMKk7NL9k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/brGoQW/btrFiS22Y3Y/BMiMl44JpiQ98JMKk7NL9k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/brGoQW/btrFiS22Y3Y/BMiMl44JpiQ98JMKk7NL9k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbrGoQW%2FbtrFiS22Y3Y%2FBMiMl44JpiQ98JMKk7NL9k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1714&quot; height=&quot;719&quot; data-origin-width=&quot;1714&quot; data-origin-height=&quot;719&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The figure above shows the overall architecture of the model proposed in the paper.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A high-resolution (HR) frame is turned into a low-resolution (LR) frame by the down-sampling network (DSN),&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and this LR frame passes through the encoder and decoder, becoming a reconstructed LR that carries compression loss.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;At inference time, the decoded LR (DLR) obtained this way is fed into the super-resolution network (SRN) to produce the SR frame,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;while at training time, to train the DSN and SRN end-to-end without the gradient being cut at the codec, a virtual codec network (VCN) is built to replace the actual codec.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To stabilize training, the VCN and DSN+SRN are updated alternately: while training DSN+SRN the VCN's parameters are frozen, and while training the VCN the DSN+SRN's parameters are frozen (a rough sketch of this alternating scheme is given below).&lt;/p&gt;
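&lt;p data-ke-size=&quot;size16&quot;&gt;A rough PyTorch sketch of such alternating updates, written under the description above (dsn, srn, vcn, real_codec, the optimizers, and loss_dsn_srn are all placeholder names of mine, not the paper's code):&lt;/p&gt;
&lt;pre class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import torch
import torch.nn.functional as F


def set_requires_grad(module, flag):
    # freeze / unfreeze every parameter of a sub-network
    for p in module.parameters():
        p.requires_grad = flag


for hr in loader:                      # hr: high-resolution training frames
    # (1) update DSN + SRN while the VCN is frozen
    set_requires_grad(vcn, False)
    set_requires_grad(dsn, True)
    set_requires_grad(srn, True)
    sr = srn(vcn(dsn(hr)))             # the VCN stands in for the non-differentiable codec
    loss_g = loss_dsn_srn(sr, hr)      # placeholder for the paper's distortion/rate/regularization losses
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    # (2) update the VCN while DSN + SRN are frozen
    set_requires_grad(vcn, True)
    set_requires_grad(dsn, False)
    set_requires_grad(srn, False)
    with torch.no_grad():
        lr = dsn(hr)
        dlr = real_codec(lr)           # actual encode/decode; no gradients needed here
    loss_v = F.mse_loss(vcn(lr), dlr)  # make the VCN mimic the real codec
    opt_v.zero_grad()
    loss_v.backward()
    opt_v.step()&lt;/code&gt;&lt;/pre&gt;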
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Conventional video codecs fundamentally decide the optimal mode by computing and optimizing a rate-distortion (RD) cost.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here, rate means the bitrate, and distortion is the difference between the original and the reconstructed video, i.e., a pixel-based error. In other words, the mode is chosen by weighing the trade-off between the amount of data and the quality.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper likewise expresses the problem it is solving with the RD function below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1532&quot; data-origin-height=&quot;105&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vV1ge/btrFi94yKNo/xNjjY0SKnOVIhF7hIrFnjk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vV1ge/btrFi94yKNo/xNjjY0SKnOVIhF7hIrFnjk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vV1ge/btrFi94yKNo/xNjjY0SKnOVIhF7hIrFnjk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FvV1ge%2FbtrFi94yKNo%2FxNjjY0SKnOVIhF7hIrFnjk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;48&quot; data-origin-width=&quot;1532&quot; data-origin-height=&quot;105&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here, f() denotes the DSN and g() denotes the SRN.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As mentioned earlier, since replacing the codec with a virtual one makes the whole model jointly trainable, the paper replaces the expression above with the one below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1507&quot; data-origin-height=&quot;137&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cHMYAF/btrFjlX0SNb/koN2oXwIDo94mN7cIkPJck/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cHMYAF/btrFjlX0SNb/koN2oXwIDo94mN7cIkPJck/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cHMYAF/btrFjlX0SNb/koN2oXwIDo94mN7cIkPJck/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcHMYAF%2FbtrFjlX0SNb%2FkoN2oXwIDo94mN7cIkPJck%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;706&quot; height=&quot;64&quot; data-origin-width=&quot;1507&quot; data-origin-height=&quot;137&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The architecture of each network is shown in the figure below.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here, Invert PixelShuffle refers to the Space-to-Depth operation,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and the SRN uses the EDSR [3] baseline architecture.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;895&quot; data-origin-height=&quot;879&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/budOcn/btrFhBU6Qtx/dDKkPft4j5560sKH2ylm91/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/budOcn/btrFhBU6Qtx/dDKkPft4j5560sKH2ylm91/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/budOcn/btrFhBU6Qtx/dDKkPft4j5560sKH2ylm91/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbudOcn%2FbtrFhBU6Qtx%2FdDKkPft4j5560sKH2ylm91%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;559&quot; height=&quot;549&quot; data-origin-width=&quot;895&quot; data-origin-height=&quot;879&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The structure of the VCN's InvBlock is shown below.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the figure above, the first two InvBlocks correspond to Forward, Quantization to Round, and the last two InvBlocks to Backward; it can be read as a simplified representation of the encoding/decoding structure.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For reference, although the figure labels it PixelShuffle, since the features are later split into f1 and f2 at the channel level, I personally think interpreting it as Invert PixelShuffle makes more sense.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The internal structure in the figure below is expressed identically in Equations (3) and (4); overall it forms an affine-transformation structure.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1095&quot; data-origin-height=&quot;529&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bnSuJl/btrFiUl3JhL/ZcmIirb4MwCDCDInufIKGk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bnSuJl/btrFiUl3JhL/ZcmIirb4MwCDCDInufIKGk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bnSuJl/btrFiUl3JhL/ZcmIirb4MwCDCDInufIKGk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbnSuJl%2FbtrFiUl3JhL%2FZcmIirb4MwCDCDInufIKGk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;607&quot; height=&quot;293&quot; data-origin-width=&quot;1095&quot; data-origin-height=&quot;529&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1531&quot; data-origin-height=&quot;235&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bHAEyQ/btrFfZWsQ5e/fltTDNIZd4xMJDykN4fj2k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bHAEyQ/btrFfZWsQ5e/fltTDNIZd4xMJDykN4fj2k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bHAEyQ/btrFfZWsQ5e/fltTDNIZd4xMJDykN4fj2k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbHAEyQ%2FbtrFfZWsQ5e%2FfltTDNIZd4xMJDykN4fj2k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;619&quot; height=&quot;95&quot; data-origin-width=&quot;1531&quot; data-origin-height=&quot;235&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1521&quot; data-origin-height=&quot;206&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/crpk8i/btrFhBU7JOT/YypyAK89FbkPymrAkpDhM0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/crpk8i/btrFhBU7JOT/YypyAK89FbkPymrAkpDhM0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/crpk8i/btrFhBU7JOT/YypyAK89FbkPymrAkpDhM0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcrpk8i%2FbtrFhBU7JOT%2FYypyAK89FbkPymrAkpDhM0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;653&quot; height=&quot;88&quot; data-origin-width=&quot;1521&quot; data-origin-height=&quot;206&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
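&lt;p data-ke-size=&quot;size16&quot;&gt;The code below is my own minimal sketch of a generic affine coupling (invertible) block matching the split-and-transform structure described above. The 3x3 convolutions, the exp-scale form, and the update order are assumptions for illustration, not the paper's exact sub-networks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # Minimal sketch of one invertible (affine coupling) block.
    # The sub-networks are plain 3x3 convolutions here; the paper's
    # actual InvBlock sub-networks are assumed, not reproduced.
    def __init__(self, c1, c2):
        super().__init__()
        self.scale_net = nn.Conv2d(c1, c2, 3, padding=1)
        self.shift_net = nn.Conv2d(c1, c2, 3, padding=1)
        self.update_net = nn.Conv2d(c2, c1, 3, padding=1)

    def forward(self, f1, f2):
        # encoding direction: f2 is affine-transformed conditioned on f1,
        # then f1 is updated from the transformed f2
        f2 = f2 * torch.exp(self.scale_net(f1)) + self.shift_net(f1)
        f1 = f1 + self.update_net(f2)
        return f1, f2

    def inverse(self, f1, f2):
        # decoding direction: the same operations undone in reverse order
        f1 = f1 - self.update_net(f2)
        f2 = (f2 - self.shift_net(f1)) * torch.exp(-self.scale_net(f1))
        return f1, f2
&lt;/code&gt;&lt;/pre&gt;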
&lt;p data-ke-size=&quot;size16&quot;&gt;The loss function formulated in the paper is as follows. First, the term responsible for distortion is equation (5): an MSE-based loss is computed for the output of the real codec branch and for the output of the virtual codec branch, and the two are summed.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1538&quot; data-origin-height=&quot;214&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bH76q5/btrFjyJO2c5/hDjTMv1FN8FAicESd5TBI0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bH76q5/btrFjyJO2c5/hDjTMv1FN8FAicESd5TBI0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bH76q5/btrFjyJO2c5/hDjTMv1FN8FAicESd5TBI0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbH76q5%2FbtrFjyJO2c5%2FhDjTMv1FN8FAicESd5TBI0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;717&quot; height=&quot;100&quot; data-origin-width=&quot;1538&quot; data-origin-height=&quot;214&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The term responsible for rate is equation (6). For this bitrate loss, the paper uses an estimated entropy of the LR image.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1536&quot; data-origin-height=&quot;130&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/MFZKT/btrFi9Kj3R9/zOID8upo5ZFrNgAIcC6uNk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/MFZKT/btrFi9Kj3R9/zOID8upo5ZFrNgAIcC6uNk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/MFZKT/btrFi9Kj3R9/zOID8upo5ZFrNgAIcC6uNk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMFZKT%2FbtrFi9Kj3R9%2FzOID8upo5ZFrNgAIcC6uNk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;683&quot; height=&quot;58&quot; data-origin-width=&quot;1536&quot; data-origin-height=&quot;130&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In addition, equation (7) below is used as a regularization term: a loss between the LR generated by ordinary bicubic down-sampling and the LR generated by the DSN.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1534&quot; data-origin-height=&quot;116&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ITYEb/btrFkxKrmi9/ZHLuZc4NleDwKrEGwblPDK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ITYEb/btrFkxKrmi9/ZHLuZc4NleDwKrEGwblPDK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ITYEb/btrFkxKrmi9/ZHLuZc4NleDwKrEGwblPDK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FITYEb%2FbtrFkxKrmi9%2FZHLuZc4NleDwKrEGwblPDK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;648&quot; height=&quot;49&quot; data-origin-width=&quot;1534&quot; data-origin-height=&quot;116&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;All of the losses above are summed with their respective weights to form the final loss function; a small sketch of this combination follows the equation below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1539&quot; data-origin-height=&quot;105&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/z1v73/btrFkxjphXB/9zkxHgXSblM4EfR9q8HfqK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/z1v73/btrFkxjphXB/9zkxHgXSblM4EfR9q8HfqK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/z1v73/btrFkxjphXB/9zkxHgXSblM4EfR9q8HfqK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fz1v73%2FbtrFkxjphXB%2F9zkxHgXSblM4EfR9q8HfqK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;714&quot; height=&quot;49&quot; data-origin-width=&quot;1539&quot; data-origin-height=&quot;105&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
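&lt;p data-ke-size=&quot;size16&quot;&gt;Below is a minimal Python (PyTorch-style) sketch of how the three terms described above could be combined. The weight values, the argument names, and the form of the entropy estimate are placeholders of mine, not the paper's actual settings.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch.nn.functional as F

def total_loss(hr, sr_codec, sr_virtual, lr_dsn, lr_bicubic, est_bits,
               w_d=1.0, w_r=0.01, w_reg=0.1):
    # distortion (Eq. 5): MSE for the real-codec and virtual-codec outputs
    l_dist = F.mse_loss(sr_codec, hr) + F.mse_loss(sr_virtual, hr)
    # rate (Eq. 6): estimated entropy (bits) of the down-sampled LR
    l_rate = est_bits.mean()
    # regularization (Eq. 7): keep the learned LR close to the bicubic LR
    l_reg = F.mse_loss(lr_dsn, lr_bicubic)
    # weighted sum (Eq. 8); the weights here are placeholders
    return w_d * l_dist + w_r * l_rate + w_reg * l_reg
&lt;/code&gt;&lt;/pre&gt;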
&lt;p data-ke-size=&quot;size16&quot;&gt;Additionally, as mentioned earlier, the VCN is trained separately, so a separate loss function is defined for it as below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1530&quot; data-origin-height=&quot;91&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/XrhXK/btrFhAIGPru/iMRuQpKUr33mywcwz41ZW1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/XrhXK/btrFhAIGPru/iMRuQpKUr33mywcwz41ZW1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/XrhXK/btrFhAIGPru/iMRuQpKUr33mywcwz41ZW1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FXrhXK%2FbtrFhAIGPru%2FiMRuQpKUr33mywcwz41ZW1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;691&quot; height=&quot;41&quot; data-origin-width=&quot;1530&quot; data-origin-height=&quot;91&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Experiments&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The HEVC version used is 16.20.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The training data were generated with the All Intra (AI) configuration at QP 37 and used to train the model.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The dataset used for training is Vimeo-90k.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The results reported in the paper are for the CTC sequence classes A, B, C, and E, with a QP range of {32, 37, 42, 47}.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Only results for the Y component are shown.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Looking at the table below, compared with RR-DnCNN v2.0 the proposed method achieves better BD-rate results in every class except class A under the LDP and RA configurations.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1727&quot; data-origin-height=&quot;774&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cdNhxg/btrFiRwnVsI/W1R0jW5RQfy5eD3Iq3DEM1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cdNhxg/btrFiRwnVsI/W1R0jW5RQfy5eD3Iq3DEM1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cdNhxg/btrFiRwnVsI/W1R0jW5RQfy5eD3Iq3DEM1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcdNhxg%2FbtrFiRwnVsI%2FW1R0jW5RQfy5eD3Iq3DEM1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1727&quot; height=&quot;774&quot; data-origin-width=&quot;1727&quot; data-origin-height=&quot;774&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;References&lt;/h3&gt;
&lt;blockquote data-ke-style=&quot;style3&quot;&gt;[1] Wei, Yuzhuo, Li Chen, and Li Song. &quot;Video Compression based on Jointly Learned Down-Sampling and Super-Resolution Networks.&quot; 2021 International Conference on Visual Communications and Image Processing (VCIP). IEEE, 2021.&lt;br /&gt;[2] Ho, Man M., Jinjia Zhou, and Gang He. &quot;RR-DnCNN v2. 0: enhanced restoration-reconstruction deep neural network for down-sampling-based video coding.&quot; IEEE Transactions on Image Processing 30 (2021): 1702-1715.&lt;br /&gt;[3] Lim, Bee, et al. &quot;Enhanced deep residual networks for single image super-resolution.&quot; Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017.&lt;/blockquote&gt;</description>
      <category>Research/Neural Network Video Coding</category>
      <category>down-sampling</category>
      <category>HEVC</category>
      <category>Super-Resolution</category>
      <category>video-coding</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/57</guid>
      <comments>https://young-square.tistory.com/57#entry57comment</comments>
      <pubDate>Mon, 20 Jun 2022 19:35:18 +0900</pubDate>
    </item>
    <item>
      <title>[논문 리뷰] RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling-Based Video Coding</title>
      <link>https://young-square.tistory.com/56</link>
      <description>&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The paper examined today is 'RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling-Based Video Coding', published in&lt;i&gt; IEEE Transactions on Image Processing&lt;/i&gt; in 2021 [1].&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;It can be seen as a version-2 study that improves on the same authors' 2020 paper [2], 'Down-sampling based video coding with degradation-aware restoration-reconstruction deep neural network', published in&lt;i&gt; Springer MultiMedia Modeling&lt;/i&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Introduction&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The key to video coding is to reduce the bitrate as much as possible while preserving visual quality.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Down-sampling-based video coding can therefore contribute greatly to overall coding efficiency, since it reduces the bitrate substantially.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;In other words, if a 1920x1080 video is down-sampled to 960x540, the pixel count per frame drops from about 2.07 million to about 0.52 million, so in the simplest view the bits can be cut to roughly a quarter of the original, which is a very large difference.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Building on this advantage, many down-sampling-based video coding studies are actively under way, and this paper is one of them.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;However, down-sampling naturally discards a large amount of pixel data.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;When too much information is lost, recovering the original becomes much harder, so the video quality inevitably degrades considerably,&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;and no matter how many bits are saved, if the quality degradation is just as large, the overall coding efficiency can actually fall once the bitrate-distortion tradeoff is taken into account.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Accordingly, many recent studies apply deep-learning-based super-resolution (SR) at the up-sampling stage, increasing the resolution while restoring the degraded quality.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The figure below shows the basic structure of these approaches.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The original video is down-sampled and then encoded and decoded with an existing standard codec.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The decoded low-resolution (DLR) frames obtained after decoding could be up-sampled with basic bicubic or bilinear interpolation,&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;but to improve the quality further, recent work applies a deep-learning-based super-resolution model at this stage.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Finally, a reconstructed video at the same resolution as the original is obtained.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1596&quot; data-origin-height=&quot;863&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cL11S8/btrEYGVbeRw/B6kZHgsIjdihhen7qJ95R1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cL11S8/btrEYGVbeRw/B6kZHgsIjdihhen7qJ95R1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cL11S8/btrEYGVbeRw/B6kZHgsIjdihhen7qJ95R1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcL11S8%2FbtrEYGVbeRw%2FB6kZHgsIjdihhen7qJ95R1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1596&quot; height=&quot;863&quot; data-origin-width=&quot;1596&quot; data-origin-height=&quot;863&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Proposed Video Coding System&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The architecture of the proposed RR-DnCNN v2.0 is shown below.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1651&quot; data-origin-height=&quot;447&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/KdjHb/btrETACpS5j/jVQLafKyesd0eEYmCytpxk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/KdjHb/btrETACpS5j/jVQLafKyesd0eEYmCytpxk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/KdjHb/btrETACpS5j/jVQLafKyesd0eEYmCytpxk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FKdjHb%2FbtrETACpS5j%2FjVQLafKyesd0eEYmCytpxk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1651&quot; height=&quot;447&quot; data-origin-width=&quot;1651&quot; data-origin-height=&quot;447&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The model is divided into two main parts: a Restoration Network and a Reconstruction Network.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Unlike ordinary SR, SR for video coding has a more complicated problem to solve:&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;not only the degradation caused by down-sampling but also the degradation caused by compression must be handled.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;How to resolve this compression degradation is therefore the core question in this line of research,&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;and the proposed model handles the compression degradation in the restoration part and performs SR in the reconstruction part.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;First, the high-resolution (HR) frame is down-sampled with bicubic interpolation to produce a low-resolution (LR) frame.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The LR frame passes through the standard HEVC encoder and decoder to become the decoded LR (DLR),&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;and the DLR enters the Restoration network, which is trained to restore the information lost to compression.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The paper passes the hidden features of the restoration part to the Reconstruction network through up-sampling skip-connections, where they are added; this design is said to be inspired by U-Net [3].&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The figure below shows how each up-sampling skip-connection works; deconvolution is used to up-sample the features (a small sketch follows the figure).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;777&quot; data-origin-height=&quot;701&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/MlWQ3/btrEWpUP2Br/M3oeGfT2hr67JzY5YVNCu0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/MlWQ3/btrEWpUP2Br/M3oeGfT2hr67JzY5YVNCu0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/MlWQ3/btrEWpUP2Br/M3oeGfT2hr67JzY5YVNCu0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMlWQ3%2FbtrEWpUP2Br%2FM3oeGfT2hr67JzY5YVNCu0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;525&quot; height=&quot;474&quot; data-origin-width=&quot;777&quot; data-origin-height=&quot;701&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
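&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough illustration of the up-sampling skip-connection described above, here is a minimal PyTorch-style sketch; the kernel size and stride of the deconvolution are my assumptions for 2x up-sampling, not values taken from the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch.nn as nn

class UpSkip(nn.Module):
    # A restoration feature at LR resolution is up-sampled by a stride-2
    # deconvolution and added to the corresponding reconstruction feature
    # at the higher resolution.
    def __init__(self, channels):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(channels, channels,
                                         kernel_size=4, stride=2, padding=1)

    def forward(self, restoration_feat, reconstruction_feat):
        return reconstruction_feat + self.deconv(restoration_feat)
&lt;/code&gt;&lt;/pre&gt;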
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;The overall structure can be expressed with the equations below.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Here h denotes the proposed RR-DnCNN v2.0, and Rres and Rrec denote the output residuals of the restoration and reconstruction networks, respectively.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;893&quot; data-origin-height=&quot;73&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rOUUp/btrEXVrQZJL/6EJxun1mN2Wi8A5fLoxjX0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rOUUp/btrEXVrQZJL/6EJxun1mN2Wi8A5fLoxjX0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rOUUp/btrEXVrQZJL/6EJxun1mN2Wi8A5fLoxjX0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FrOUUp%2FbtrEXVrQZJL%2F6EJxun1mN2Wi8A5fLoxjX0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;688&quot; height=&quot;56&quot; data-origin-width=&quot;893&quot; data-origin-height=&quot;73&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;895&quot; data-origin-height=&quot;64&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/wN6Km/btrEXO7CphM/ROnKKhbx3nkqfkSunmVfSK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/wN6Km/btrEXO7CphM/ROnKKhbx3nkqfkSunmVfSK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/wN6Km/btrEXO7CphM/ROnKKhbx3nkqfkSunmVfSK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FwN6Km%2FbtrEXO7CphM%2FROnKKhbx3nkqfkSunmVfSK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;691&quot; height=&quot;49&quot; data-origin-width=&quot;895&quot; data-origin-height=&quot;64&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;900&quot; data-origin-height=&quot;65&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c6Qfci/btrEU2y2vhc/Vmzon5Cg0pSQr6d2TChAD0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c6Qfci/btrEU2y2vhc/Vmzon5Cg0pSQr6d2TChAD0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c6Qfci/btrEU2y2vhc/Vmzon5Cg0pSQr6d2TChAD0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc6Qfci%2FbtrEU2y2vhc%2FVmzon5Cg0pSQr6d2TChAD0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;693&quot; height=&quot;50&quot; data-origin-width=&quot;900&quot; data-origin-height=&quot;65&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;RR-DnCNN v2.0 is trained with a loss function that sums the MSE of the two modules' outputs: the MSE between the LR improved by the restoration net and the LR obtained by bicubic down-sampling of the original, and the MSE between the final reconstructed HR and the original HR, combined as in the equation below.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;771&quot; data-origin-height=&quot;51&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/I8fzW/btrEWHt72Ok/54z7tkte9K1ay4n6Fve741/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/I8fzW/btrEWHt72Ok/54z7tkte9K1ay4n6Fve741/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/I8fzW/btrEWHt72Ok/54z7tkte9K1ay4n6Fve741/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FI8fzW%2FbtrEWHt72Ok%2F54z7tkte9K1ay4n6Fve741%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;711&quot; height=&quot;47&quot; data-origin-width=&quot;771&quot; data-origin-height=&quot;51&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Empirically, the authors set alpha = 0.5 and beta = 0.05; a minimal sketch of this loss follows.&lt;/span&gt;&lt;/p&gt;
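&lt;p data-ke-size=&quot;size16&quot;&gt;The sketch below assumes (my reading of the post, not something stated explicitly) that alpha weights the restoration term and beta weights the reconstruction term.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch.nn.functional as F

def rr_dncnn_loss(restored_lr, bicubic_lr, reconstructed_hr, hr,
                  alpha=0.5, beta=0.05):
    # restoration term: restored LR vs. the bicubic-down-sampled original
    l_res = F.mse_loss(restored_lr, bicubic_lr)
    # reconstruction term: final SR output vs. the original HR frame
    l_rec = F.mse_loss(reconstructed_hr, hr)
    # which weight belongs to which term is an assumption here
    return alpha * l_res + beta * l_rec
&lt;/code&gt;&lt;/pre&gt;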
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Experiments&lt;/span&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;First, the table below shows the results of building the training data under each of the three configurations and testing on each of them, and the results are interesting.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Since the three configurations have different characteristics (in particular, AI consists only of I-slices), one would normally expect poor results when the training and test configurations differ, but the results below show that the gap is actually not that large.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;This is the first point worth noting,&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;because it shows that a single configuration can cover the others well enough without building a separate dataset for each.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: AppleSDGothicNeo-Regular, 'Malgun Gothic', '맑은 고딕', dotum, 돋움, sans-serif;&quot;&gt;Looking more closely, training on RA gives the best performance on both RA and AI, which suggests that even for the AI case it is slightly more effective to train on RA, which contains a wider variety of cases, than on AI alone.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span&gt;For LDP, training on LDP gives the worst results on the other two configurations, which indicates that it is optimized for LDP itself.&lt;/span&gt;&lt;span&gt; Based on these experiments, the paper builds its training dataset with the RA configuration only.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1174&quot; data-origin-height=&quot;649&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dF3hqT/btrEX1MrdNz/NCiQaZOXRlg6NZYWckJE80/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dF3hqT/btrEX1MrdNz/NCiQaZOXRlg6NZYWckJE80/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dF3hqT/btrEX1MrdNz/NCiQaZOXRlg6NZYWckJE80/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdF3hqT%2FbtrEX1MrdNz%2FNCiQaZOXRlg6NZYWckJE80%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;607&quot; height=&quot;336&quot; data-origin-width=&quot;1174&quot; data-origin-height=&quot;649&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span&gt;RR-DnCNN v2.0 was trained with the two-stage procedure below: in CTC terms, the model is first trained on videos of roughly class-D size, then initialized from the best model and fine-tuned on videos of roughly class-B size.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span&gt;1) First stage : &lt;/span&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Xiph Video Test Media: 34 CIF (352x288) videos (18,478 frames), plus CTC class-D videos resized from 416x240 to 352x288 (1,912 frames)&lt;/li&gt;
&lt;li&gt;HR : 352x288&lt;/li&gt;
&lt;li&gt;LR : 176x144&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2) Second stage&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Fine-tune the best model from the&amp;nbsp;first&amp;nbsp;stage&lt;/li&gt;
&lt;li&gt;SJTU dataset: 11 videos (3,300 frames) down-sampled from UHD (3840x2160) to 1920x1152&lt;/li&gt;
&lt;li&gt;HR : 1920x1152&lt;/li&gt;
&lt;li&gt;LR : 960x576&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The table below confirms that two-stage training yields better results.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In addition, this work uses RAdam instead of Adam as the optimizer and obtains higher performance; a short training-schedule sketch follows the table.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1182&quot; data-origin-height=&quot;761&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Eb93F/btrEXkMBHNN/A56K3GvznkI7xA4QB85NnK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Eb93F/btrEXkMBHNN/A56K3GvznkI7xA4QB85NnK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Eb93F/btrEXkMBHNN/A56K3GvznkI7xA4QB85NnK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FEb93F%2FbtrEXkMBHNN%2FA56K3GvznkI7xA4QB85NnK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;593&quot; height=&quot;382&quot; data-origin-width=&quot;1182&quot; data-origin-height=&quot;761&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
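&lt;p data-ke-size=&quot;size16&quot;&gt;A rough sketch of the two-stage schedule with RAdam is given below, assuming a recent PyTorch that provides torch.optim.RAdam; build_model, train, the checkpoint path, and the data loaders are hypothetical placeholders, not code from the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

model = build_model()                    # hypothetical model constructor
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-4)

# stage 1: train on the small-resolution set (hypothetical helpers)
train(model, optimizer, stage1_loader)

# stage 2: load the best stage-1 checkpoint and fine-tune on larger frames
model.load_state_dict(torch.load('best_stage1.pth'))
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-5)
train(model, optimizer, stage2_loader)
&lt;/code&gt;&lt;/pre&gt;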
&lt;p data-ke-size=&quot;size16&quot;&gt;The two tables below compare BD-rate for the AI, LDP, and RA configurations.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here, study '[3]' is an MISR model that does not consider compression degradation,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;while studies '[2]' and '[4]' are down-sampling-based video coding methods designed for intra prediction.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #666666;&quot;&gt;'[2]' Y. Li et al., &amp;ldquo;Convolutional neural network-based block up-sampling for intra frame coding,&amp;rdquo; IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 9, pp. 2316&amp;ndash;2330, Sep. 2018, doi: 10.1109/TCSVT.2017.2727682.&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;'[3]' J. Lin, D. Liu, H. Yang, H. Li, and F. Wu, &amp;ldquo;Convolutional neural network-based block up-sampling for HEVC,&amp;rdquo; IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 12, pp. 3701&amp;ndash;3715, Dec. 2019.&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #666666;&quot;&gt;'[4]'&amp;nbsp;Y.&amp;nbsp;Li,&amp;nbsp;D.&amp;nbsp;Liu,&amp;nbsp;H.&amp;nbsp;Li,&amp;nbsp;L.&amp;nbsp;Li,&amp;nbsp;Z.&amp;nbsp;Li,&amp;nbsp;and&amp;nbsp;F.&amp;nbsp;Wu,&amp;nbsp;&amp;ldquo;Learning&amp;nbsp;a&amp;nbsp;convolutional&amp;nbsp;neural&amp;nbsp;network&amp;nbsp;for&amp;nbsp;image&amp;nbsp;compact-resolution,&amp;rdquo;&amp;nbsp;IEEE&amp;nbsp;Trans.&amp;nbsp;Image&amp;nbsp;Process.,&amp;nbsp;vol.&amp;nbsp;28,&amp;nbsp;no.&amp;nbsp;3,&amp;nbsp;pp.&amp;nbsp;1092&amp;ndash;1107,&amp;nbsp;Mar.&amp;nbsp;2019,&amp;nbsp;doi:&amp;nbsp;10.1109/TIP.2018.2872876.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Compared with previously proposed methods, RR-DnCNN v2.0 performs particularly well on class A, and on average it achieves BD-rate gains over the HEVC anchor on all sequences except class C under RA.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1596&quot; data-origin-height=&quot;821&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/6BawD/btrEZbVbmV3/vkdc51nGgzcwxR4FKCs9Kk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/6BawD/btrEZbVbmV3/vkdc51nGgzcwxR4FKCs9Kk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/6BawD/btrEZbVbmV3/vkdc51nGgzcwxR4FKCs9Kk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F6BawD%2FbtrEZbVbmV3%2Fvkdc51nGgzcwxR4FKCs9Kk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1596&quot; height=&quot;821&quot; data-origin-width=&quot;1596&quot; data-origin-height=&quot;821&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1514&quot; data-origin-height=&quot;897&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/muq6t/btrEZcmgmEt/WN3UkhQaGQkICSikApITs0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/muq6t/btrEZcmgmEt/WN3UkhQaGQkICSikApITs0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/muq6t/btrEZcmgmEt/WN3UkhQaGQkICSikApITs0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fmuq6t%2FbtrEZcmgmEt%2FWN3UkhQaGQkICSikApITs0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1514&quot; height=&quot;897&quot; data-origin-width=&quot;1514&quot; data-origin-height=&quot;897&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;References&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[1] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Ho, Man M., Jinjia Zhou, and Gang He. &quot;RR-DnCNN v2. 0: enhanced restoration-reconstruction deep neural network for down-sampling-based video coding.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;IEEE Transactions on Image Processing&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;30 (2021): 1702-1715.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[2] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Ho, Minh-Man, et al. &quot;Down-sampling based video coding with degradation-aware restoration-reconstruction deep neural network.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;International Conference on Multimedia Modeling&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. Springer, Cham, 2020.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[3] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. &quot;U-net: Convolutional networks for biomedical image segmentation.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;International Conference on Medical image computing and computer-assisted intervention&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. Springer, Cham, 2015.&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Research/Neural Network Video Coding</category>
      <category>Super-Resolution</category>
      <category>Video Coding</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/56</guid>
      <comments>https://young-square.tistory.com/56#entry56comment</comments>
      <pubDate>Thu, 16 Jun 2022 14:41:28 +0900</pubDate>
    </item>
    <item>
      <title>[PyCharm][Troubleshooting] When debugging shows only the &amp;quot;Collecting data...&amp;quot; message and values cannot be inspected</title>
      <link>https://young-square.tistory.com/51</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;본 글은 아래 링크를 참고하였다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://stackoverflow.com/questions/39371676/debugger-times-out-at-collecting-data&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://stackoverflow.com/questions/39371676/debugger-times-out-at-collecting-data&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1652097894116&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;Debugger times out at &amp;quot;Collecting data...&amp;quot;&quot; data-og-description=&quot;I am debugging a Python (3.5) program with PyCharm (PyCharm Community Edition 2016.2.2 ; Build #PC-162.1812.1, built on August 16, 2016 ; JRE: 1.8.0_76-release-b216 x86 ; JVM: OpenJDK Server VM by&quot; data-og-host=&quot;stackoverflow.com&quot; data-og-source-url=&quot;https://stackoverflow.com/questions/39371676/debugger-times-out-at-collecting-data&quot; data-og-url=&quot;https://stackoverflow.com/questions/39371676/debugger-times-out-at-collecting-data&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/buzdEY/hyOk5YNt3M/ywiFdEXTyas55prexMoAsK/img.png?width=316&amp;amp;height=316&amp;amp;face=0_0_316_316,https://scrap.kakaocdn.net/dn/dko8MW/hyOkSd43As/eDIvIXzSAZvrZQf8BUeJnK/img.png?width=1028&amp;amp;height=714&amp;amp;face=0_0_1028_714,https://scrap.kakaocdn.net/dn/Rgq4O/hyOkX0KwGs/NZjswL1QdmgwTgT5KI2O1K/img.png?width=397&amp;amp;height=437&amp;amp;face=0_0_397_437&quot;&gt;&lt;a href=&quot;https://stackoverflow.com/questions/39371676/debugger-times-out-at-collecting-data&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://stackoverflow.com/questions/39371676/debugger-times-out-at-collecting-data&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/buzdEY/hyOk5YNt3M/ywiFdEXTyas55prexMoAsK/img.png?width=316&amp;amp;height=316&amp;amp;face=0_0_316_316,https://scrap.kakaocdn.net/dn/dko8MW/hyOkSd43As/eDIvIXzSAZvrZQf8BUeJnK/img.png?width=1028&amp;amp;height=714&amp;amp;face=0_0_1028_714,https://scrap.kakaocdn.net/dn/Rgq4O/hyOkX0KwGs/NZjswL1QdmgwTgT5KI2O1K/img.png?width=397&amp;amp;height=437&amp;amp;face=0_0_397_437');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Debugger times out at &quot;Collecting data...&quot;&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;I am debugging a Python (3.5) program with PyCharm (PyCharm Community Edition 2016.2.2 ; Build #PC-162.1812.1, built on August 16, 2016 ; JRE: 1.8.0_76-release-b216 x86 ; JVM: OpenJDK Server VM by&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;stackoverflow.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;When debugging with PyCharm, sometimes every variable shows only the &quot;Collecting data...&quot; message and its value cannot be inspected.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In that case, as shown in the figure below, go to File - Settings - Python Debugger and enable &lt;b&gt;Gevent compatible&lt;/b&gt; to resolve the problem.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;984&quot; data-origin-height=&quot;702&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nwo72/btrBDm8NP51/mBKcPJlJxP9a6hKHLxzpzk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nwo72/btrBDm8NP51/mBKcPJlJxP9a6hKHLxzpzk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nwo72/btrBDm8NP51/mBKcPJlJxP9a6hKHLxzpzk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fnwo72%2FbtrBDm8NP51%2FmBKcPJlJxP9a6hKHLxzpzk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;984&quot; height=&quot;702&quot; data-origin-width=&quot;984&quot; data-origin-height=&quot;702&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>Programming/Python</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/51</guid>
      <comments>https://young-square.tistory.com/51#entry51comment</comments>
      <pubDate>Mon, 9 May 2022 21:05:40 +0900</pubDate>
    </item>
    <item>
      <title>[논문 리뷰] Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution</title>
      <link>https://young-square.tistory.com/50</link>
      <description>&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #555555;&quot;&gt;The paper examined today is &quot;Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution&quot; [1], presented at ICCV 2021.&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Introduction&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper identifies two main problems with existing work.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Most existing SR methods assume an ideal, fixed blur kernel: the entire dataset is down-sampled with a fixed bicubic kernel and used as the model input, so the model is trained only on this idealized data. Put differently, such models are hard to apply effectively in real environments that can deviate from the ideal case.&lt;/li&gt;
&lt;li&gt;To address this, blind SR, in which the kernel to be estimated is not fixed, is being actively studied. However, existing blind SR methods assume the kernel is spatially invariant, i.e. a single kernel is predicted for the entire image. As the figure below shows, edge regions and flat regions clearly differ, and trying to handle them with the same kernel leads to inaccurate results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Therefore, predicting spatially variant kernels is more effective for blind SR.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;862&quot; data-origin-height=&quot;547&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cnwkeA/btrBFHYzlUg/lcmlIrqkvCWhFMXV0UEbX0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cnwkeA/btrBFHYzlUg/lcmlIrqkvCWhFMXV0UEbX0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cnwkeA/btrBFHYzlUg/lcmlIrqkvCWhFMXV0UEbX0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcnwkeA%2FbtrBFHYzlUg%2FlcmlIrqkvCWhFMXV0UEbX0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;594&quot; height=&quot;377&quot; data-origin-width=&quot;862&quot; data-origin-height=&quot;547&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Methodology&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The figure below shows the structure of the mutual affine network (MANet) proposed in the paper.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1925&quot; data-origin-height=&quot;543&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/byeJGb/btrBDnNiFUw/ACYaDG9vividbOZ3C6e6gk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/byeJGb/btrBDnNiFUw/ACYaDG9vividbOZ3C6e6gk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/byeJGb/btrBDnNiFUw/ACYaDG9vividbOZ3C6e6gk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbyeJGb%2FbtrBDnNiFUw%2FACYaDG9vividbOZ3C6e6gk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1925&quot; height=&quot;543&quot; data-origin-width=&quot;1925&quot; data-origin-height=&quot;543&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Looking at the model, the convolution layers of the conventional residual block are replaced with the mutual affine convolution (MAConv) proposed in this paper.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The overall design with down-sampling and up-sampling stages is taken from U-Net [2].&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The final kernel-reconstruction stage uses a softmax; since the network outputs a kernel rather than an image, an activation function is added at the output.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The figure below shows the structure of the proposed mutual affine convolution (MAConv).&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1859&quot; data-origin-height=&quot;713&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ba4djo/btrBFdC19Ci/ZZTMXsmn9ADXX9YDYB4Bj0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ba4djo/btrBFdC19Ci/ZZTMXsmn9ADXX9YDYB4Bj0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ba4djo/btrBFdC19Ci/ZZTMXsmn9ADXX9YDYB4Bj0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fba4djo%2FbtrBFdC19Ci%2FZZTMXsmn9ADXX9YDYB4Bj0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1859&quot; height=&quot;713&quot; data-origin-width=&quot;1859&quot; data-origin-height=&quot;713&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;First, the input feature is split along the channel dimension into a fixed number of groups.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For example, if feature x has 64 channels and the number of splits is 4, it is divided into 4 features of 16 channels each.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;MAConv computes the affine transformation parameters for each split from the remaining splits, excluding the current one.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Continuing the example, for x1 with 16 feature maps, the remaining splits together have 48 feature maps.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The parameters beta and gamma computed from these remaining feature maps provide the scale and translation of the affine transform.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;After the affine transform has been applied to every sub-feature, the results are concatenated to form the output; a minimal sketch of this idea is given below.&lt;/p&gt;
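&lt;p data-ke-size=&quot;size16&quot;&gt;The code below is a minimal sketch of the mutual affine idea with two channel splits; the 1x1 affine-prediction layers, the per-split 3x3 convolutions, and the exact modulation form are my assumptions based on the description above, not the paper's exact configuration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import torch.nn as nn

class MAConv(nn.Module):
    # Mutual affine convolution sketch with two channel splits.
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        # predict scale (gamma) and shift (beta) for one split
        # from the complementary split
        self.affine1 = nn.Conv2d(half, half * 2, kernel_size=1)
        self.affine2 = nn.Conv2d(half, half * 2, kernel_size=1)
        self.conv1 = nn.Conv2d(half, half, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(half, half, kernel_size=3, padding=1)

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)
        gamma1, beta1 = torch.chunk(self.affine1(x2), 2, dim=1)
        gamma2, beta2 = torch.chunk(self.affine2(x1), 2, dim=1)
        # affine-transform each split using parameters from the other split
        y1 = self.conv1(x1 * gamma1 + beta1)
        y2 = self.conv2(x2 * gamma2 + beta2)
        return torch.cat([y1, y2], dim=1)
&lt;/code&gt;&lt;/pre&gt;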
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Given this architecture, another contribution of the paper is that the loss function is also designed in the kernel domain: rather than computing an image-to-image loss after applying the estimated kernel, the loss is computed on the kernel itself, as shown below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;987&quot; data-origin-height=&quot;385&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b6pajG/btrBwtNJHkM/s0PJdyolEBk9KBwV43ffvK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b6pajG/btrBwtNJHkM/s0PJdyolEBk9KBwV43ffvK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b6pajG/btrBwtNJHkM/s0PJdyolEBk9KBwV43ffvK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb6pajG%2FbtrBwtNJHkM%2Fs0PJdyolEBk9KBwV43ffvK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;619&quot; height=&quot;241&quot; data-origin-width=&quot;987&quot; data-origin-height=&quot;385&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Experiments&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Strictly speaking, 'blind' means a fully real setting in which the kernel is unknown, but since the model has to be trained in a data-driven way, the data still needs to be generated synthetically.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To generate the LR data, the paper uses 21x21 anisotropic Gaussian kernels.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The kernel widths and rotation angle are applied randomly within specified ranges for each patch to build the dataset; a small kernel-generation sketch is given below.&lt;/p&gt;
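&lt;p data-ke-size=&quot;size16&quot;&gt;As a small illustration of how such an anisotropic Gaussian kernel can be generated, here is a NumPy sketch; the rotation-and-covariance parameterization is a common construction and not necessarily the paper's exact code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np

def anisotropic_gaussian_kernel(size=21, sigma_x=3.0, sigma_y=1.0, theta=0.5):
    # covariance of the Gaussian: rotate a diagonal matrix of the two widths
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    cov = rot @ np.diag([sigma_x ** 2, sigma_y ** 2]) @ rot.T
    inv_cov = np.linalg.inv(cov)
    # grid of coordinates centered on the kernel
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    coords = np.stack([xx, yy], axis=-1)
    # quadratic form of the Gaussian evaluated at every grid point
    q = np.sum((coords @ inv_cov) * coords, axis=-1)
    kernel = np.exp(-0.5 * q)
    return kernel / kernel.sum()

# example: one random kernel per patch, widths and angle within given ranges
rng = np.random.default_rng(0)
k = anisotropic_gaussian_kernel(21, rng.uniform(1, 5), rng.uniform(1, 5),
                                rng.uniform(0, np.pi))
&lt;/code&gt;&lt;/pre&gt;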
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Also, for the final comparison, the paper combines MANet with an existing non-blind SR model, RRDB-SFT [3,4], by applying the kernels estimated by MANet and fine-tuning the combined model.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1795&quot; data-origin-height=&quot;564&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bdnvwU/btrBGftA6ai/JPjabY2GKLXXlrqlymwEbK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bdnvwU/btrBGftA6ai/JPjabY2GKLXXlrqlymwEbK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bdnvwU/btrBGftA6ai/JPjabY2GKLXXlrqlymwEbK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbdnvwU%2FbtrBGftA6ai%2FJPjabY2GKLXXlrqlymwEbK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1795&quot; height=&quot;564&quot; data-origin-width=&quot;1795&quot; data-origin-height=&quot;564&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Before comparing with prior work, the table above shows how effective the proposed MAConv is relative to conventional convolution layers.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;#channel denotes the number of output channels of each conv layer;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;comparing configurations with the same #channel (for example, [128,256,128]), MAConv achieves higher PSNR and SSIM than plain conv and group conv, and plain conv even requires far more parameters than MAConv.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The figure below shows that the kernel loss is more effective than the image loss for kernel estimation, and that a MAConv split count of 2 works better than 4.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1893&quot; data-origin-height=&quot;603&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/BOWjU/btrBHT4zW0J/Ay2sCDhEztyQzdNbmvlnzk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/BOWjU/btrBHT4zW0J/Ay2sCDhEztyQzdNbmvlnzk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/BOWjU/btrBHT4zW0J/Ay2sCDhEztyQzdNbmvlnzk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FBOWjU%2FbtrBHT4zW0J%2FAy2sCDhEztyQzdNbmvlnzk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1893&quot; height=&quot;603&quot; data-origin-width=&quot;1893&quot; data-origin-height=&quot;603&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The tables and visual comparisons below show better performance than existing state-of-the-art methods.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1405&quot; data-origin-height=&quot;928&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/tZgvj/btrBDn0VfVz/Y1GwlTFFXWC2DPXTfSLLkk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/tZgvj/btrBDn0VfVz/Y1GwlTFFXWC2DPXTfSLLkk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/tZgvj/btrBDn0VfVz/Y1GwlTFFXWC2DPXTfSLLkk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FtZgvj%2FbtrBDn0VfVz%2FY1GwlTFFXWC2DPXTfSLLkk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1405&quot; height=&quot;928&quot; data-origin-width=&quot;1405&quot; data-origin-height=&quot;928&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1572&quot; data-origin-height=&quot;907&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bbtbaU/btrBHVnOV2I/NKY1owMO9eKDoKmyX49mV1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bbtbaU/btrBHVnOV2I/NKY1owMO9eKDoKmyX49mV1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bbtbaU/btrBHVnOV2I/NKY1owMO9eKDoKmyX49mV1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbbtbaU%2FbtrBHVnOV2I%2FNKY1owMO9eKDoKmyX49mV1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1572&quot; height=&quot;907&quot; data-origin-width=&quot;1572&quot; data-origin-height=&quot;907&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1386&quot; data-origin-height=&quot;932&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c6p9J9/btrBED3hN51/xnprheKtyiOTx2dK2E8zdK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c6p9J9/btrBED3hN51/xnprheKtyiOTx2dK2E8zdK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c6p9J9/btrBED3hN51/xnprheKtyiOTx2dK2E8zdK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc6p9J9%2FbtrBED3hN51%2FxnprheKtyiOTx2dK2E8zdK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1386&quot; height=&quot;932&quot; data-origin-width=&quot;1386&quot; data-origin-height=&quot;932&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;References&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[1] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Liang, Jingyun, et al. &quot;Mutual affine network for spatially variant kernel estimation in blind image super-resolution.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE/CVF International Conference on Computer Vision&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2021.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[2] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. &quot;U-net: Convolutional networks for biomedical image segmentation.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;International Conference on Medical image computing and computer-assisted intervention&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. Springer, Cham, 2015.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[3] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Wang, Xintao, et al. &quot;Esrgan: Enhanced super-resolution generative adversarial networks.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the European conference on computer vision (ECCV) workshops&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2018.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[4] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Wang, Xintao, et al. &quot;Recovering realistic texture in image super-resolution by deep spatial feature transform.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE conference on computer vision and pattern recognition&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2018.&lt;/span&gt;&lt;/p&gt;</description>
      <category>Research/Super-Resolution</category>
      <category>blind super-resolution</category>
      <category>Manet</category>
      <category>Super-Resolution</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/50</guid>
      <comments>https://young-square.tistory.com/50#entry50comment</comments>
      <pubDate>Mon, 9 May 2022 16:22:53 +0900</pubDate>
    </item>
    <item>
      <title>[논문 리뷰] SwinIR: Image Restoration Using Swin Transformer</title>
      <link>https://young-square.tistory.com/48</link>
      <description>&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper we look at today is &quot;SwinIR: Image Restoration Using Swin Transformer&quot; [1], published at ICCV 2021.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As of March 2022, SwinIR ranks first on the Set5 and Set14 datasets on the PapersWithCode website.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://paperswithcode.com/task/image-super-resolution&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://paperswithcode.com/task/image-super-resolution&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1647424486189&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;Papers with Code - Image Super-Resolution&quot; data-og-description=&quot;In this task, we try to upsample the image and create the high resolution image with help of a low resolution image.&quot; data-og-host=&quot;paperswithcode.com&quot; data-og-source-url=&quot;https://paperswithcode.com/task/image-super-resolution&quot; data-og-url=&quot;https://paperswithcode.com/task/image-super-resolution&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/eiU9Og/hyNI3tsc9f/GMkx8hHbqUh8PWWlfFmrQ1/img.jpg?width=519&amp;amp;height=519&amp;amp;face=0_0_519_519&quot;&gt;&lt;a href=&quot;https://paperswithcode.com/task/image-super-resolution&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://paperswithcode.com/task/image-super-resolution&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/eiU9Og/hyNI3tsc9f/GMkx8hHbqUh8PWWlfFmrQ1/img.jpg?width=519&amp;amp;height=519&amp;amp;face=0_0_519_519');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Papers with Code - Image Super-Resolution&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;In this task, we try to upsample the image and create the high resolution image with help of a low resolution image.&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;paperswithcode.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;SwinIR applies the Swin Transformer architecture from &quot;Swin Transformer: Hierarchical Vision Transformer using Shifted Windows&quot; [2], also published at ICCV 2021 and released earlier as an arXiv preprint, to the SISR problem.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Before getting into the main body, note that the paper reports experiments on a variety of tasks; the specific values quoted in this post correspond to the 'Classical image SR' task.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Network &lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Architecture&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1392&quot; data-origin-height=&quot;577&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/qZIUA/btrv3Qgeno9/RAwO5lXXiPzKUjNyecKavk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/qZIUA/btrv3Qgeno9/RAwO5lXXiPzKUjNyecKavk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/qZIUA/btrv3Qgeno9/RAwO5lXXiPzKUjNyecKavk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FqZIUA%2Fbtrv3Qgeno9%2FRAwO5lXXiPzKUjNyecKavk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1392&quot; height=&quot;577&quot; data-origin-width=&quot;1392&quot; data-origin-height=&quot;577&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The overall architecture of SwinIR can be divided into three parts.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Shallow Feature Extraction&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;One Conv layer (number of features: 3 -&amp;gt; 180)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Deep Feature Extraction&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;Based on Residual Swin Transformer Block&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;HQ Image Reconstruction&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;Conv layer -&amp;gt; Depth-to-Space -&amp;gt; Conv layer&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Shallow Feature Extraction increases the number of features with a single Conv layer.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In other words, it embeds the image into feature space; the paper sets the number of features to 180.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Deep Feature Extraction is a deep network built on the Residual Swin Transformer Block (RSTB) and is the core of the paper.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The input passes through 6 RSTBs and one Conv layer, and the original feature (the output of the Shallow Feature Extraction module) is added to the result.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;HQ Image Reconstruction comes after all feature extraction and follows the standard reconstruction structure for SR: a Conv layer adjusts the number of features to match the desired output size, Depth-to-Space is applied, and a final Conv layer produces the output image.&lt;/p&gt;
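&lt;p data-ke-size=&quot;size16&quot;&gt;For concreteness, below is a minimal PyTorch-style sketch of this reconstruction head as just described (Conv, Depth-to-Space via PixelShuffle, Conv). The class name, layer names, and the x4 upscale setting are illustrative assumptions, not the official SwinIR code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import torch.nn as nn

# Hedged sketch of the HQ Image Reconstruction head described above.
# Assumes 180 input features and an x4 upscale; names are illustrative.
class ReconstructionHead(nn.Module):
    def __init__(self, num_feat=180, upscale=4, out_ch=3):
        super().__init__()
        # The first Conv adjusts the channel count so that PixelShuffle can
        # rearrange depth into space (out_ch * upscale * upscale channels).
        self.conv_up = nn.Conv2d(num_feat, out_ch * upscale * upscale, 3, 1, 1)
        self.depth_to_space = nn.PixelShuffle(upscale)  # Depth-to-Space
        self.conv_last = nn.Conv2d(out_ch, out_ch, 3, 1, 1)

    def forward(self, x):
        # x: [B, 180, H, W]  becomes  [B, 3, 4*H, 4*W]
        return self.conv_last(self.depth_to_space(self.conv_up(x)))
&lt;/code&gt;&lt;/pre&gt;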
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Deep Feature Extraction&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Residual Swin Transformer Block (RSTB)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;659&quot; data-origin-height=&quot;234&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/k9hut/btrv6XGaZFk/yzSmgcFtF9Gzcah7rpRt6K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/k9hut/btrv6XGaZFk/yzSmgcFtF9Gzcah7rpRt6K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/k9hut/btrv6XGaZFk/yzSmgcFtF9Gzcah7rpRt6K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fk9hut%2Fbtrv6XGaZFk%2FyzSmgcFtF9Gzcah7rpRt6K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;456&quot; height=&quot;162&quot; data-origin-width=&quot;659&quot; data-origin-height=&quot;234&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;992&quot; data-origin-height=&quot;178&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/btQMEI/btrv6lsXvpg/aykhr9aeKZl6XKIyKIA9qk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/btQMEI/btrv6lsXvpg/aykhr9aeKZl6XKIyKIA9qk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/btQMEI/btrv6lsXvpg/aykhr9aeKZl6XKIyKIA9qk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbtQMEI%2Fbtrv6lsXvpg%2Faykhr9aeKZl6XKIyKIA9qk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;503&quot; height=&quot;90&quot; data-origin-width=&quot;992&quot; data-origin-height=&quot;178&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Each RSTB passes its input through 6 Swin Transformer Layers (STL) and one Conv layer, then adds the block's original input feature back to the output (a residual connection), as in the sketch below.&lt;/p&gt;
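&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough, assumption-level illustration of that residual structure (nn.Identity placeholders stand in for the STLs, which are explained next; this is not the official implementation):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch.nn as nn

# Hedged sketch of one Residual Swin Transformer Block (RSTB):
# 6 STLs, one Conv layer, then a residual add of the block's input.
class RSTB(nn.Module):
    def __init__(self, dim=180, num_layers=6):
        super().__init__()
        # nn.Identity placeholders stand in for the Swin Transformer
        # Layers (STL) described below.
        self.layers = nn.ModuleList([nn.Identity() for _ in range(num_layers)])
        self.conv = nn.Conv2d(dim, dim, 3, 1, 1)

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
        return x + self.conv(out)  # residual connection back to the input
&lt;/code&gt;&lt;/pre&gt;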
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Swin Transformer layer (STL)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;571&quot; data-origin-height=&quot;228&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cf9eGH/btrwb6VF26M/ppENfkDicKTx01AXv0UtlK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cf9eGH/btrwb6VF26M/ppENfkDicKTx01AXv0UtlK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cf9eGH/btrwb6VF26M/ppENfkDicKTx01AXv0UtlK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcf9eGH%2Fbtrwb6VF26M%2FppENfkDicKTx01AXv0UtlK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;401&quot; height=&quot;160&quot; data-origin-width=&quot;571&quot; data-origin-height=&quot;228&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Each STL has the structure shown above.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This post omits auxiliary parts such as normalization and dropout, and explains the core modules in the order in which they appear in the source code.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Each STL first performs a window shift.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The figure for this part is taken from the original Swin Transformer paper [2].&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;506&quot; data-origin-height=&quot;193&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bCNjhg/btrwaFj12JG/kyY52PnFdytlEFKamCYK4k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bCNjhg/btrwaFj12JG/kyY52PnFdytlEFKamCYK4k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bCNjhg/btrwaFj12JG/kyY52PnFdytlEFKamCYK4k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbCNjhg%2FbtrwaFj12JG%2FkyY52PnFdytlEFKamCYK4k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;506&quot; height=&quot;193&quot; data-origin-width=&quot;506&quot; data-origin-height=&quot;193&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As shown above, the feature map is shifted along both the x and y axes.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;SwinIR sets the window size to 8x8, and the shift is [-(window_size//2), -(window_size//2)], i.e. [-4, -4].&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Regular (no shift) and shifted windows alternate: among the 6 STLs, layers 0, 2 and 4 use regular windows, while layers 1, 3 and 5 use shifted windows.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;After the attention is finished, a reverse shift (a shift by [4, 4]) moves everything back to its original position.&lt;/p&gt;
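&lt;p data-ke-size=&quot;size16&quot;&gt;In code, this cyclic shift and its reverse are typically just torch.roll calls on the spatial axes. A minimal sketch, assuming a [B, H, W, C] feature tensor and window_size = 8 (the tensor size here is a toy example):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

window_size = 8
shift = window_size // 2          # 4

x = torch.randn(1, 64, 64, 180)   # [B, H, W, C] toy feature

# shifted-window layers: roll by [-4, -4] before window attention
x_shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))

# ... window partition + window attention happens here ...

# reverse shift: roll back by [4, 4] to restore the original layout
x_restored = torch.roll(x_shifted, shifts=(shift, shift), dims=(1, 2))
assert torch.equal(x, x_restored)
&lt;/code&gt;&lt;/pre&gt;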
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Next, window partition is performed.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The feature shape is therefore transformed in the following order.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;[B, H, W, C] (= [Batch, Height, Width, numberOfFeature(180)])&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;-&amp;gt; [numberOfWindow*B, window_size, window_size, C]&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;-&amp;gt; [numberOfWindow*B, window_size*window_size, C] (= [numberOfWindow*B, 8*8, 180])&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Note that the original Swin Transformer paper also includes a patch (window) merging structure, but SwinIR does not appear to use it.&lt;/span&gt;&lt;/p&gt;
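&lt;p data-ke-size=&quot;size16&quot;&gt;The window partition itself is essentially a reshape and permute. A minimal sketch of the shape changes listed above, assuming H and W are divisible by the window size (the helper name and toy sizes are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

def window_partition(x, window_size=8):
    # x: [B, H, W, C]  becomes  [numberOfWindow*B, window_size*window_size, C]
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return windows.view(-1, window_size * window_size, C)

x = torch.randn(2, 64, 64, 180)   # [B, H, W, C]
w = window_partition(x)           # (64/8)*(64/8)*2 = 128 windows
print(w.shape)                    # torch.Size([128, 64, 180])
&lt;/code&gt;&lt;/pre&gt;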
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;After that, the actual window attention is performed.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;This follows the Multi-Head Attention structure proposed in the original Transformer paper [3] almost as-is.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;417&quot; data-origin-height=&quot;514&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/pNiLq/btrv8D8g0P4/5mnRdW6sfySi9C0b2vEwEk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/pNiLq/btrv8D8g0P4/5mnRdW6sfySi9C0b2vEwEk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/pNiLq/btrv8D8g0P4/5mnRdW6sfySi9C0b2vEwEk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FpNiLq%2Fbtrv8D8g0P4%2F5mnRdW6sfySi9C0b2vEwEk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;253&quot; height=&quot;312&quot; data-origin-width=&quot;417&quot; data-origin-height=&quot;514&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As mentioned above, the shape of the attention input is as follows.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;[numberOfWindow*B, window_size*window_size, C]&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;To simplify the notation, we define&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;B_ : numberOfWindow*B&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;N : window_size*window_size (8*8 = 64)&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;C : 180&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;in what follows.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;The Multi-Head Attention here is self-attention, so Q (Query), K (Key) and V (Value) are all computed from the same feature.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;As a result, Q, K and V each correspond to an [N, C] matrix here.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;First, the shape is changed as follows.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;[B_, N, 3, 6(num_heads), C // 6(num_heads)] -&amp;gt; [3, B_, 6, N, C // 6]&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;The paper sets the number of heads to 6; splitting the tensor above into three (Q, K, V) gives the following shape.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;[B_, 6, N, C // 6]&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;The attention equation is as follows.&lt;/span&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;883&quot; data-origin-height=&quot;81&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/A5aLi/btrv9J8icHt/tgjjNI3fI2xs77nB0MSVWk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/A5aLi/btrv9J8icHt/tgjjNI3fI2xs77nB0MSVWk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/A5aLi/btrv9J8icHt/tgjjNI3fI2xs77nB0MSVWk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FA5aLi%2Fbtrv9J8icHt%2FtgjjNI3fI2xs77nB0MSVWk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;523&quot; height=&quot;48&quot; data-origin-width=&quot;883&quot; data-origin-height=&quot;81&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper sets d to C // 6, and B is the &lt;span style=&quot;color: #000000;&quot;&gt;learnable relative positional encoding.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;At its core, this is the self-attention structure SoftMax(QK^T)V. The matrix shapes work out as follows.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;QK^T = [64, 30] x [30, 64] = [64, 64] = [N, N]&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;SoftMax(QK^T)V = [64, 64] x [64, 30] = [64, 30] = [N, C // 6]&lt;/span&gt;&lt;/p&gt;
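&lt;p data-ke-size=&quot;size16&quot;&gt;To double-check the shape bookkeeping, here is a minimal sketch of the windowed multi-head self-attention with the numbers used above (B_ windows, N = 64 tokens, C = 180, 6 heads, d = 30). The relative positional bias B is omitted for brevity and the projection layers are freshly initialized placeholders, so this is an illustration of the shapes only, not the official code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import torch.nn.functional as F

B_, N, C, num_heads = 128, 64, 180, 6
d = C // num_heads                         # 30

x = torch.randn(B_, N, C)                  # window tokens

# qkv projection, then split into Q, K, V of shape [B_, 6, N, 30]
qkv = torch.nn.Linear(C, 3 * C)(x)         # [B_, N, 3*C]
qkv = qkv.view(B_, N, 3, num_heads, d).permute(2, 0, 3, 1, 4)
q, k, v = qkv[0], qkv[1], qkv[2]

attn = (q @ k.transpose(-2, -1)) * d ** -0.5   # [B_, 6, 64, 64] = [.., N, N]
attn = F.softmax(attn, dim=-1)
out = attn @ v                                  # [B_, 6, 64, 30] = [.., N, d]

# merge the heads back: [B_, 6, N, 30] to [B_, N, 180], then the final FC layer
out = out.transpose(1, 2).reshape(B_, N, C)
out = torch.nn.Linear(C, C)(out)                # shape unchanged
print(out.shape)                                # torch.Size([128, 64, 180])
&lt;/code&gt;&lt;/pre&gt;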
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;After the attention, the multi-head split is merged back into a single dimension as follows.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;[B_, 6, N, C // 6] -&amp;gt; [B_, N, C]&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Finally, an FC layer produces the output, which is fed into the next STL; the shape does not change at this step.&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Experiments&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1282&quot; data-origin-height=&quot;868&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/QNA0R/btrwaFK5adv/CIBVx6ku4DKzXUlViKkkGK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/QNA0R/btrwaFK5adv/CIBVx6ku4DKzXUlViKkkGK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/QNA0R/btrwaFK5adv/CIBVx6ku4DKzXUlViKkkGK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FQNA0R%2FbtrwaFK5adv%2FCIBVx6ku4DKzXUlViKkkGK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1282&quot; height=&quot;868&quot; data-origin-width=&quot;1282&quot; data-origin-height=&quot;868&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1619&quot; data-origin-height=&quot;334&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/BN9m7/btrv07P7eWB/iPMoBNX6Sv6lgpv6vIyeZ1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/BN9m7/btrv07P7eWB/iPMoBNX6Sv6lgpv6vIyeZ1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/BN9m7/btrv07P7eWB/iPMoBNX6Sv6lgpv6vIyeZ1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FBN9m7%2Fbtrv07P7eWB%2FiPMoBNX6Sv6lgpv6vIyeZ1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1619&quot; height=&quot;334&quot; data-origin-width=&quot;1619&quot; data-origin-height=&quot;334&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[References]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[1] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Liang, Jingyun, et al. &quot;Swinir: Image restoration using swin transformer.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE/CVF International Conference on Computer Vision&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2021.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[2] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Liu, Ze, et al. &quot;Swin transformer: Hierarchical vision transformer using shifted windows.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE/CVF International Conference on Computer Vision&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2021.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[3] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Vaswani, Ashish, et al. &quot;Attention is all you need.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Advances in neural information processing systems&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;30 (2017).&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Research/Super-Resolution</category>
      <category>Super-Resolution</category>
      <category>Transformer</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/48</guid>
      <comments>https://young-square.tistory.com/48#entry48comment</comments>
      <pubDate>Tue, 8 Mar 2022 18:48:36 +0900</pubDate>
    </item>
    <item>
      <title>[논문 리뷰] Deep Unfolding Network for Image Super-Resolution</title>
      <link>https://young-square.tistory.com/47</link>
      <description>&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper we look at today is &quot;Deep Unfolding Network for Image Super-Resolution&quot; [1], published at CVPR 2020.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This paper can be seen as the basis for the latent frame restoration part of &quot;&lt;span style=&quot;color: #555555;&quot;&gt;Deep Blind Video Super-Resolution&lt;/span&gt;&quot; [2], which derives an FFT-based expression for inferring the HR image within a MAP framework.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As the title suggests, the key contribution is an unfolding algorithm that converts the problem into two alternating sub-problems, a data term and a prior term; the deep learning model is actually used for the prior term.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In other words, the paper effectively combines a classical model-based SR formulation with an end-to-end learning-based SR model that handles various scale factors, blur kernels, and noise levels.&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;1) Unfolding optimization&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Based on MAP (maximum a posteriori), the objective derived from the image degradation model can be written as follows.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;(y : LR image, x : HR image, k : blur kernel, s : scale factor, &amp;lambda; : trade-off parameter)&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;957&quot; data-origin-height=&quot;150&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c59P7K/btrvr97ZVoQ/ISPLY9MX3FabF7MhyZsim0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c59P7K/btrvr97ZVoQ/ISPLY9MX3FabF7MhyZsim0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c59P7K/btrvr97ZVoQ/ISPLY9MX3FabF7MhyZsim0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc59P7K%2Fbtrvr97ZVoQ%2FISPLY9MX3FabF7MhyZsim0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;554&quot; height=&quot;87&quot; data-origin-width=&quot;957&quot; data-origin-height=&quot;150&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The equation above consists of a data term and a prior term.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To separate the two into sub-problems, the paper applies the half-quadratic splitting (HQS) algorithm, which yields the equations below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;157&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bdmg5o/btrvhox7wg6/cKRbwdqVgKxbapeiNbK2ek/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bdmg5o/btrvhox7wg6/cKRbwdqVgKxbapeiNbK2ek/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bdmg5o/btrvhox7wg6/cKRbwdqVgKxbapeiNbK2ek/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbdmg5o%2Fbtrvhox7wg6%2FcKRbwdqVgKxbapeiNbK2ek%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;559&quot; height=&quot;89&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;157&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1001&quot; data-origin-height=&quot;171&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dfy17R/btrvvaky1ro/hr8INZvZSjPQfkWLxkheak/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dfy17R/btrvvaky1ro/hr8INZvZSjPQfkWLxkheak/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dfy17R/btrvvaky1ro/hr8INZvZSjPQfkWLxkheak/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fdfy17R%2Fbtrvvaky1ro%2Fhr8INZvZSjPQfkWLxkheak%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;579&quot; height=&quot;99&quot; data-origin-width=&quot;1001&quot; data-origin-height=&quot;171&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;An auxiliary variable z is introduced here, and the problem is converted into one of minimization by alternately computing Eqs. (5) and (6).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In short, the equation for z carries the data term, and the computed z is plugged into the equation for x to obtain the final HR image x. Note that the subscript k attached to z and x denotes the iteration index.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;First, Eq. (5) can be solved by setting its derivative with respect to z equal to zero.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Writing the blur kernel k as a matrix H, the result of the differentiation is as follows.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;929&quot; data-origin-height=&quot;398&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xRySs/btrvwJUgMoJ/q54sJvC1KabTGAUPfhOh91/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xRySs/btrvwJUgMoJ/q54sJvC1KabTGAUPfhOh91/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xRySs/btrvwJUgMoJ/q54sJvC1KabTGAUPfhOh91/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxRySs%2FbtrvwJUgMoJ%2Fq54sJvC1KabTGAUPfhOh91%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;587&quot; height=&quot;251&quot; data-origin-width=&quot;929&quot; data-origin-height=&quot;398&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Then, using the fact that AXB = FFT^-1(FFT(AXB)) for any matrices A and B, together with the assumption that the blur is applied with circular boundary conditions (so that H is diagonalized by the FFT), an FFT-based expression can be obtained as below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;819&quot; data-origin-height=&quot;285&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/toZyL/btrvt3y94aD/TyGfspu9zsNNebVcwGlcWk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/toZyL/btrvt3y94aD/TyGfspu9zsNNebVcwGlcWk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/toZyL/btrvt3y94aD/TyGfspu9zsNNebVcwGlcWk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FtoZyL%2Fbtrvt3y94aD%2FTyGfspu9zsNNebVcwGlcWk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;524&quot; height=&quot;182&quot; data-origin-width=&quot;819&quot; data-origin-height=&quot;285&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here alpha equals &lt;span style=&quot;color: #000000;&quot;&gt;&amp;mu;_k*&amp;sigma;^2&lt;/span&gt;, and this value controls the closed-form expression used to obtain the HR estimate.&lt;/p&gt;
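&lt;p data-ke-size=&quot;size16&quot;&gt;For intuition, below is a hedged sketch of this data sub-problem solution, simplified to scale factor s = 1 (pure deblurring) under the circular-convolution assumption; the full USRNet expression additionally involves block-wise processing over the s x s down-sampling phases, which is omitted here. The function name and interface are illustrative, not the official code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

def data_step_deblur(y, x_prev, kernel_fft, alpha):
    # Closed-form z update for s = 1 (deblurring only), assuming circular
    # convolution so the blur matrix is diagonalized by the FFT.
    # y: blurred observation [H, W]; x_prev: current HR estimate [H, W]
    # kernel_fft: FFT of the blur kernel zero-padded to [H, W]
    # alpha: mu_k * sigma**2, as in the text above
    Fy = torch.fft.fft2(y)
    Fx = torch.fft.fft2(x_prev)
    numerator = torch.conj(kernel_fft) * Fy + alpha * Fx
    denominator = kernel_fft.abs() ** 2 + alpha
    z = torch.fft.ifft2(numerator / denominator).real
    return z
&lt;/code&gt;&lt;/pre&gt;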
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Next, Eq. (6) can be rewritten as follows.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;670&quot; data-origin-height=&quot;117&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/5BQmo/btrvtcQXuzw/bxDLgGYcK15MlXnXE7cVqK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/5BQmo/btrvtcQXuzw/bxDLgGYcK15MlXnXE7cVqK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/5BQmo/btrvtcQXuzw/bxDLgGYcK15MlXnXE7cVqK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F5BQmo%2FbtrvtcQXuzw%2FbxDLgGYcK15MlXnXE7cVqK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;447&quot; height=&quot;78&quot; data-origin-width=&quot;670&quot; data-origin-height=&quot;117&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Interpreting this equation, it is equivalent to passing the input image z through a Gaussian denoiser whose noise level is &amp;radic;(&amp;lambda;/&amp;mu;).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper denotes this noise level &amp;radic;(&amp;lambda;/&amp;mu;) as beta.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2) &lt;span style=&quot;color: #000000;&quot;&gt;Deep &lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;unfolding network&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1760&quot; data-origin-height=&quot;630&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/3ou7r/btrvtd3ummF/on0MUEAiBqqtHDnJyV83W0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/3ou7r/btrvtd3ummF/on0MUEAiBqqtHDnJyV83W0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/3ou7r/btrvtd3ummF/on0MUEAiBqqtHDnJyV83W0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F3ou7r%2Fbtrvtd3ummF%2Fon0MUEAiBqqtHDnJyV83W0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1760&quot; height=&quot;630&quot; data-origin-width=&quot;1760&quot; data-origin-height=&quot;630&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The proposed network consists of three main modules.&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;D : data module&lt;/li&gt;
&lt;li&gt;P : prior module&lt;/li&gt;
&lt;li&gt;H : hyper-parameter module&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;First, the H module takes the noise level and the scale factor as input (a size-2 vector) and outputs alpha and beta for every iteration (8 iterations in the paper, i.e. a size 2*8 vector).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;H consists of 3 FC layers.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The D module outputs z via the FFT-based closed-form expression described above.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The P module is designed as a &lt;span style=&quot;color: #000000;&quot;&gt;ResUNet, a U-Net [3] combined with residual blocks; in other words, it serves as the denoising module.&lt;/span&gt;&lt;/p&gt;
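&lt;p data-ke-size=&quot;size16&quot;&gt;Putting the three modules together, the unfolded network is conceptually a short loop. The sketch below is an assumption-level illustration: the D, P, and H callables, the bicubic initialization, and the interface are hypothetical stand-ins, not the official implementation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

def unfolded_sr_forward(y, kernel, sigma, scale, D, P, H, num_iters=8):
    # Hedged sketch of the unfolded loop: D (data module), P (prior/denoiser
    # module) and H (hyper-parameter module) are assumed callables.
    # H maps [sigma, scale] (size-2 vector) to per-iteration alpha and beta.
    hypers = H(torch.tensor([sigma, float(scale)]))     # shape [2 * num_iters]
    alphas, betas = hypers[:num_iters], hypers[num_iters:]

    # initialize x with a simple bicubic upscale of the LR input
    x = torch.nn.functional.interpolate(y, scale_factor=scale, mode='bicubic')
    for k in range(num_iters):
        z = D(x, y, kernel, alphas[k])   # FFT-based closed-form data step
        x = P(z, betas[k])               # Gaussian denoiser at noise level beta
    return x
&lt;/code&gt;&lt;/pre&gt;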
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;3) Experiments&lt;/span&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Scale factors, blur kernels, and noise levels used in the paper:&lt;/span&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;s : {2, 3, 4}&lt;/li&gt;
&lt;li&gt;K : they only consider 12 representative and diverse blur kernels, including 4 isotropic Gaussian kernels with different widths (i.e., 0.7, 1.2, 1.6 and 2.0), 4 anisotropic Gaussian kernels, and 4 motion blur kernels.&lt;/li&gt;
&lt;li&gt;&amp;sigma; : {0, 2.55, 7.65}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1961&quot; data-origin-height=&quot;804&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cJ02y3/btrvr9fVDXu/xWnWHRhWHdY2quwxKlYOKK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cJ02y3/btrvr9fVDXu/xWnWHRhWHdY2quwxKlYOKK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cJ02y3/btrvr9fVDXu/xWnWHRhWHdY2quwxKlYOKK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcJ02y3%2Fbtrvr9fVDXu%2FxWnWHRhWHdY2quwxKlYOKK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1961&quot; height=&quot;804&quot; data-origin-width=&quot;1961&quot; data-origin-height=&quot;804&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1657&quot; data-origin-height=&quot;877&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/drhq7N/btrvtAKRUqr/lpYnw5DLqVCUpQdBeQLNAk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/drhq7N/btrvtAKRUqr/lpYnw5DLqVCUpQdBeQLNAk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/drhq7N/btrvtAKRUqr/lpYnw5DLqVCUpQdBeQLNAk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fdrhq7N%2FbtrvtAKRUqr%2FlpYnw5DLqVCUpQdBeQLNAk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1657&quot; height=&quot;877&quot; data-origin-width=&quot;1657&quot; data-origin-height=&quot;877&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the experiments, the method achieves the highest PSNR of all compared approaches across the whole evaluation,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and visually it also recovers finer details than the other methods.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In addition, the USRGAN variant that the paper also evaluates gives the best visual results, even though its PSNR is lower than that of USRNet.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[References]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[1] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Zhang, Kai, Luc Van Gool, and Radu Timofte. &quot;Deep unfolding network for image super-resolution.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE/CVF conference on computer vision and pattern recognition&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2020.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[2] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Pan, Jinshan, et al. &quot;Deep blind video super-resolution.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE/CVF International Conference on Computer Vision&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2021.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[3] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. &quot;U-net: Convolutional networks for biomedical image segmentation.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;International Conference on Medical image computing and computer-assisted intervention&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. Springer, Cham, 2015.&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Research/Super-Resolution</category>
      <category>map</category>
      <category>Super-Resolution</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/47</guid>
      <comments>https://young-square.tistory.com/47#entry47comment</comments>
      <pubDate>Wed, 2 Mar 2022 22:03:30 +0900</pubDate>
    </item>
    <item>
      <title>[논문 리뷰] Deep Blind Video Super-Resolution</title>
      <link>https://young-square.tistory.com/46</link>
      <description>&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper we look at today is &quot;Deep Blind Video Super-Resolution&quot; [1], published at ICCV 2021.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper raises two issues: &lt;b&gt;1) many existing MAP (maximum a posteriori) based methods require hand-crafted priors, which makes the problem complicated to solve&lt;/b&gt;, and &lt;b&gt;2) most deep-learning-based VSR models are designed under the assumption that the blur kernel is known, so the resulting images can be over-smoothed&lt;/b&gt;.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Therefore, the paper proposes a deep VSR model that combines both approaches.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style8&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;1) Revisiting &lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Variational&lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt; &lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Methods&lt;/span&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;We first revisit the MAP-based formulation on which the proposed deep model is built.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The degradation model for VSR can be defined as follows.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;978&quot; data-origin-height=&quot;76&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/G8GZQ/btruVL0dQ0P/cnKftKDENiws6HBNIgE5xk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/G8GZQ/btruVL0dQ0P/cnKftKDENiws6HBNIgE5xk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/G8GZQ/btruVL0dQ0P/cnKftKDENiws6HBNIgE5xk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FG8GZQ%2FbtruVL0dQ0P%2FcnKftKDENiws6HBNIgE5xk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;603&quot; height=&quot;76&quot; data-origin-width=&quot;978&quot; data-origin-height=&quot;76&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the equation above, the notation is as follows.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;u&gt;L : LR images&lt;/u&gt;, &lt;u&gt;I : HR frame&lt;/u&gt;, &lt;u&gt;S : down-sampling matrix&lt;/u&gt;, &lt;u&gt;K : blur kernel&lt;/u&gt;, &lt;u&gt;F : warping matrix (u : optical flow)&lt;/u&gt;, &lt;u&gt;n : noise&lt;/u&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;That is, the i-th HR (high-resolution) frame is multiplied by the warping matrix derived from the optical flow, the blur kernel is applied, down-sampling is performed, and noise is added, which yields the 2N+1 LR (low-resolution) frames (from i-N to i+N).&lt;/span&gt;&lt;/p&gt;
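&lt;p data-ke-size=&quot;size16&quot;&gt;To make this degradation model concrete, here is a toy forward simulation for a single frame. The Gaussian blur, bicubic down-sampling, and Gaussian noise below are stand-ins for K, S, and n, and the warping matrix F from the optical flow is omitted; everything in this sketch is an illustrative assumption, not the paper's data pipeline.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import torch.nn.functional as F

def degrade_one_frame(hr, blur_kernel, scale=4, noise_std=0.01):
    # hr: [1, 3, H, W]; blur_kernel: [1, 1, kh, kw], normalized to sum to 1.
    # Toy forward model for one frame: blur, down-sample, add noise.
    # (The warping matrix from the optical flow is skipped here; in the paper
    # it maps the i-th HR frame to each neighboring time step first.)
    k = blur_kernel.repeat(3, 1, 1, 1)          # per-channel (grouped) blur
    pad = blur_kernel.shape[-1] // 2
    blurred = F.conv2d(F.pad(hr, (pad, pad, pad, pad), mode='replicate'),
                       k, groups=3)
    lr = F.interpolate(blurred, scale_factor=1.0 / scale, mode='bicubic')
    return lr + noise_std * torch.randn_like(lr)
&lt;/code&gt;&lt;/pre&gt;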
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;In the VSR problem, what we want to recover from the given LR frames are the HR frame, the optical flow, and the blur kernel, and this can be expressed via MAP as follows.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1378&quot; data-origin-height=&quot;355&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cdHcQG/btruAjYDPc7/OjQez0I7rSZRjNXaEn8rz0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cdHcQG/btruAjYDPc7/OjQez0I7rSZRjNXaEn8rz0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cdHcQG/btruAjYDPc7/OjQez0I7rSZRjNXaEn8rz0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcdHcQG%2FbtruAjYDPc7%2FOjQez0I7rSZRjNXaEn8rz0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;794&quot; height=&quot;205&quot; data-origin-width=&quot;1378&quot; data-origin-height=&quot;355&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;That is, by Bayes' rule, maximizing the posterior is equivalent to maximizing the product of the likelihood and the priors.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The denominator, the probability of the given LR frames, does not appear because it is constant with respect to the variables being maximized and can therefore be dropped from the optimization.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;From an optimization standpoint, the maximization problem above can be converted into the minimization problem below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1118&quot; data-origin-height=&quot;530&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/XIbQC/btruUoqIk88/gTKKLtyAnTM38VpXTA9S8k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/XIbQC/btruUoqIk88/gTKKLtyAnTM38VpXTA9S8k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/XIbQC/btruUoqIk88/gTKKLtyAnTM38VpXTA9S8k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FXIbQC%2FbtruUoqIk88%2FgTKKLtyAnTM38VpXTA9S8k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;580&quot; height=&quot;275&quot; data-origin-width=&quot;1118&quot; data-origin-height=&quot;530&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This reformulation is possible because the quantities we want to estimate are unknown and can be treated as random variables, so we may assume they follow a distribution such as a Gaussian or a Laplace distribution.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For either distribution, dropping the constant factors in the likelihood terms and taking the log leaves a negative (prediction minus ground truth)^2 term or a |prediction minus ground truth| term. In other words, assuming a Gaussian distribution gives an L2-type term, while assuming a Laplace distribution gives an L1-type term as above.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Finally, removing the leading negative sign turns the maximization problem into a minimization problem. Likewise, because the log was taken, each prior remains as an additive term.&lt;/p&gt;
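&lt;p data-ke-size=&quot;size16&quot;&gt;To make this step concrete with a rough example: assuming a Gaussian likelihood, p(LR | HR, flow, kernel) is proportional to exp( -||D K W HR - LR||_2^2 / (2 sigma^2) ), writing W, K, D for the warping, blur, and down-sampling operators of the degradation model above (shorthand only, not the paper's exact notation). Taking the negative log and dropping the constants leaves exactly the squared (L2) data term; assuming a Laplace likelihood instead leaves the absolute-value (L1) term that appears in the equation.&lt;/p&gt;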
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As a result, assuming that all priors are defined by hand (hand-crafted), the equations above allow the HR frame, the optical flow, and the blur kernel to each be estimated.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;However, as mentioned earlier, this approach has limitations: hand-crafted priors must be designed manually, which makes the problem harder to solve, and the resulting objective function can be non-convex and difficult to optimize.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;2) An overview of the proposed method&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1572&quot; data-origin-height=&quot;861&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cT4egX/btruUoK3ql2/WtIWkkHkC8sLlK2jTmHxDK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cT4egX/btruUoK3ql2/WtIWkkHkC8sLlK2jTmHxDK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cT4egX/btruUoK3ql2/WtIWkkHkC8sLlK2jTmHxDK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcT4egX%2FbtruUoK3ql2%2FWtIWkkHkC8sLlK2jTmHxDK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1572&quot; height=&quot;861&quot; data-origin-width=&quot;1572&quot; data-origin-height=&quot;861&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The figure above shows the overall architecture of the model proposed in the paper.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This post describes the individual modules in the following order.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Optical flow estimation (+Feature extraction, Feature warping and fusion)&lt;/li&gt;
&lt;li&gt;Blur kernel estimation&lt;/li&gt;
&lt;li&gt;Latent frame restoration&lt;/li&gt;
&lt;li&gt;Sharp feature extraction and HR frame restoration&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;3) Optical flow estimation (+Feature extraction, Feature warping and fusion)&lt;/span&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;For optical flow estimation, the proposed model uses PWC-Net, introduced in the CVPR 2018 paper &quot;PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume&quot; [2].&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the paper, the optical flow estimation network is denoted N_0.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The estimated optical flow maps are not used to warp the input LR images directly; instead, warping is performed in feature space.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Accordingly, features are first extracted by the feature extraction network N_e, and warping is then applied to these features.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The warped features are concatenated and passed through the fusion network N_f, producing the final warped feature H^f.&lt;/p&gt;
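&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal PyTorch sketch of this stage might look like the following. The layer counts, channel sizes, and helper names are assumptions for illustration only; the actual configurations of N_e and N_f are shown in the architecture figure above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(feat, flow):
    # warp a feature map toward the reference frame using a normalized sampling grid
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).float().to(feat.device).unsqueeze(0)
    grid = grid + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(feat, grid, align_corners=True)

class FeatureWarpFusion(nn.Module):
    # Illustrative sketch of the N_e / N_f stages: extract features from each LR frame,
    # warp neighbor features to the center frame with the estimated flow, then fuse them.
    def __init__(self, in_ch=3, feat_ch=64, num_frames=5):
        super().__init__()
        self.n_e = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.LeakyReLU(0.1))
        self.n_f = nn.Conv2d(feat_ch * num_frames, feat_ch, 3, padding=1)

    def forward(self, frames, flows):
        # frames: list of (N, 3, H, W); flows: flows toward the center frame (center entry unused)
        center = len(frames) // 2
        feats = []
        for i, frame in enumerate(frames):
            feat = self.n_e(frame)
            if i != center:
                feat = flow_warp(feat, flows[i])  # flow comes from N_0 (PWC-Net)
            feats.append(feat)
        return self.n_f(torch.cat(feats, dim=1))  # fused warped feature H^f
&lt;/code&gt;&lt;/pre&gt;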
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Refer to the overall architecture figure above for the layers that make up N_e and N_f.&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;4) Blur kernel estimation&lt;/span&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The blur kernel estimation part, which can be considered the core of the proposed model, consists of several convolution layers, RCABs (Residual Channel Attention Blocks), adaptive average pooling, an FC layer, and a softmax layer.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the paper, the blur kernel estimation network is denoted N_k.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;What these layers estimate is the blur kernel.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Unlike other deep-learning VSR models that fix the blur kernel in advance (for example, to a bicubic kernel), this paper actually estimates the kernel, which allows deblurring to be performed more accurately.&lt;/p&gt;
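&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough illustration of such a network, the sketch below stacks convolutions, RCABs, adaptive average pooling, an FC layer, and a softmax to output a normalized blur kernel. The kernel size (13x13) and channel counts are assumptions, not the paper's exact configuration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import torch
import torch.nn as nn

class RCAB(nn.Module):
    # Residual Channel Attention Block (as in RCAN), simplified for illustration.
    def __init__(self, ch=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x):
        res = self.body(x)
        return x + res * self.att(res)

class KernelEstimator(nn.Module):
    # Illustrative sketch of N_k: conv layers, RCABs, adaptive average pooling,
    # an FC layer, and a softmax producing a normalized blur kernel.
    def __init__(self, in_ch=3, ch=64, kernel_size=13):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.LeakyReLU(0.1),
            RCAB(ch), RCAB(ch),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(ch, kernel_size * kernel_size)
        self.kernel_size = kernel_size

    def forward(self, lr):
        v = self.features(lr).flatten(1)
        # softmax keeps the kernel entries positive and summing to one
        k = torch.softmax(self.fc(v), dim=1)
        return k.view(-1, 1, self.kernel_size, self.kernel_size)
&lt;/code&gt;&lt;/pre&gt;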
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To improve the accuracy of the kernel estimation, the paper additionally designs the following loss function.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1085&quot; data-origin-height=&quot;71&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dzLylI/btruUoK5sdL/cm83w5joJBbgALhCuSKapK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dzLylI/btruUoK5sdL/cm83w5joJBbgALhCuSKapK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dzLylI/btruUoK5sdL/cm83w5joJBbgALhCuSKapK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdzLylI%2FbtruUoK5sdL%2Fcm83w5joJBbgALhCuSKapK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;614&quot; height=&quot;40&quot; data-origin-width=&quot;1085&quot; data-origin-height=&quot;71&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This loss function L_k is added to the final loss function as shown below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1405&quot; data-origin-height=&quot;189&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/O16Qz/btruWjoWY8B/58J9scqXInsEBjEtKbtqC0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/O16Qz/btruWjoWY8B/58J9scqXInsEBjEtKbtqC0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/O16Qz/btruWjoWY8B/58J9scqXInsEBjEtKbtqC0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FO16Qz%2FbtruWjoWY8B%2F58J9scqXInsEBjEtKbtqC0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;667&quot; height=&quot;90&quot; data-origin-width=&quot;1405&quot; data-origin-height=&quot;189&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;5) Latent frame restoration&lt;/span&gt;&lt;/h4&gt;
&lt;p&gt;After estimating the blur kernel K, the HR frame can be predicted from K and the LR frames. Therefore, before producing the final HR frame, the paper first predicts a draft HR frame based on the equation below, which is called the intermediate latent HR frame.&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;971&quot; data-origin-height=&quot;86&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ch1TLl/btruRMMpBHA/n0pEXW7bijcsXnfElE5xi0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ch1TLl/btruRMMpBHA/n0pEXW7bijcsXnfElE5xi0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ch1TLl/btruRMMpBHA/n0pEXW7bijcsXnfElE5xi0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fch1TLl%2FbtruRMMpBHA%2Fn0pEXW7bijcsXnfElE5xi0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;611&quot; height=&quot;54&quot; data-origin-width=&quot;971&quot; data-origin-height=&quot;86&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the equation above, the image prior on the HR frame is defined based on a gradient operation.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Moreover, the equation admits a closed-form solution via the FFT (fast Fourier transform), as shown below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;954&quot; data-origin-height=&quot;146&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/0ukm1/btruPbMHpRv/MsT6MVQRnyq2CHI2b0sJuK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/0ukm1/btruPbMHpRv/MsT6MVQRnyq2CHI2b0sJuK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/0ukm1/btruPbMHpRv/MsT6MVQRnyq2CHI2b0sJuK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F0ukm1%2FbtruPbMHpRv%2FMsT6MVQRnyq2CHI2b0sJuK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;606&quot; height=&quot;93&quot; data-origin-width=&quot;954&quot; data-origin-height=&quot;146&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The value obtained from this equation is called the intermediate latent HR frame.&lt;/p&gt;
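&lt;p data-ke-size=&quot;size16&quot;&gt;For intuition only, a generic Wiener-style FFT deconvolution with a gradient prior can be sketched as below. This is an approximation of the idea of an FFT-based closed form, not the paper's exact expression (which also involves the down-sampling operator); all names and the regularization weight are assumptions.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import torch

def latent_restore(blurred, kernel, weight=0.01):
    # blurred: (H, W) or (C, H, W) blurred image; kernel: (k, k) blur kernel
    h, w = blurred.shape[-2:]
    k = kernel.shape[-1]
    # pad the kernel to image size and center it at the origin, then take its FFT
    pad = torch.zeros(h, w)
    pad[:k, :k] = kernel
    pad = torch.roll(pad, shifts=(-(k // 2), -(k // 2)), dims=(0, 1))
    K = torch.fft.fft2(pad)
    B = torch.fft.fft2(blurred)
    # gradient prior realized with horizontal/vertical difference filters in frequency space
    dx = torch.zeros(h, w)
    dx[0, 0], dx[0, 1] = -1.0, 1.0
    dy = torch.zeros(h, w)
    dy[0, 0], dy[1, 0] = -1.0, 1.0
    D = torch.abs(torch.fft.fft2(dx)) ** 2 + torch.abs(torch.fft.fft2(dy)) ** 2
    denom = torch.abs(K) ** 2 + weight * D
    latent = torch.fft.ifft2(torch.conj(K) * B / denom)
    return latent.real
&lt;/code&gt;&lt;/pre&gt;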
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;6) Sharp feature extraction and HR frame restoration&lt;/span&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The final HR frame is inferred from the warped feature H^f and the latent HR frame obtained above.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;First, the latent HR frame is passed through the sharp feature extraction network N_d, as shown below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;blob&quot; data-origin-width=&quot;288&quot; data-origin-height=&quot;67&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zASpW/btruNWI2nxq/XduxvewSPIiRCBEzGuBAV0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zASpW/btruNWI2nxq/XduxvewSPIiRCBEzGuBAV0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zASpW/btruNWI2nxq/XduxvewSPIiRCBEzGuBAV0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzASpW%2FbtruNWI2nxq%2FXduxvewSPIiRCBEzGuBAV0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;211&quot; height=&quot;49&quot; data-filename=&quot;blob&quot; data-origin-width=&quot;288&quot; data-origin-height=&quot;67&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Here, to match the spatial (HxW) shape of the warped feature H^f, a space-to-depth operation (S) converts the HR size back to the LR size, and the result then passes through the N_d network, which consists of convolution layers and LeakyReLU activations.&lt;/p&gt;
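&lt;p data-ke-size=&quot;size16&quot;&gt;As a quick illustration (assuming an x4 scale), the space-to-depth operation can be realized with pixel_unshuffle, which trades spatial resolution for channels:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import torch
import torch.nn.functional as F

# Illustrative only: bring an HR-sized latent frame back to LR spatial size so that
# it can be processed together with the LR-sized warped feature H^f (x4 assumed).
latent_hr = torch.randn(1, 3, 256, 256)     # intermediate latent HR frame
lr_sized = F.pixel_unshuffle(latent_hr, 4)  # shape (1, 48, 64, 64): 3 channels x 4 x 4
&lt;/code&gt;&lt;/pre&gt;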
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The paper then designs the sharp feature transform based on an affine transformation, given by the equation below.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;843&quot; data-origin-height=&quot;71&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/csRjuV/btruK6xoGqT/OxeUvsUzvyhABvxaflq91K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/csRjuV/btruK6xoGqT/OxeUvsUzvyhABvxaflq91K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/csRjuV/btruK6xoGqT/OxeUvsUzvyhABvxaflq91K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcsRjuV%2FbtruK6xoGqT%2FOxeUvsUzvyhABvxaflq91K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;499&quot; height=&quot;42&quot; data-origin-width=&quot;843&quot; data-origin-height=&quot;71&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;That is, it takes the ax+b form of an affine transformation, where the multiplicative map and the additive map are each obtained from their own convolution layer.&lt;/p&gt;
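&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of such an affine feature transform, with both maps predicted by their own convolutions, might look like this (channel counts are assumptions):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import torch
import torch.nn as nn

class SharpFeatureTransform(nn.Module):
    # Sketch of an affine (ax + b style) transform: scale and shift maps are each
    # predicted by a separate convolution from the sharp feature, then applied to
    # the warped feature H^f.
    def __init__(self, ch=64):
        super().__init__()
        self.to_scale = nn.Conv2d(ch, ch, 3, padding=1)
        self.to_shift = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, warped_feat, sharp_feat):
        return warped_feat * self.to_scale(sharp_feat) + self.to_shift(sharp_feat)
&lt;/code&gt;&lt;/pre&gt;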
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Looking at the overall architecture, sharp feature extraction is also inserted repeatedly inside the final HR frame restoration network.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Finally, the up-sampling stage uses PixelShuffle, and a bilinearly up-sampled LR frame is added to the output at the end, so the network performs global residual learning.&lt;/p&gt;
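&lt;p data-ke-size=&quot;size16&quot;&gt;A sketch of this reconstruction tail, assuming an x4 scale, could look like the following; the channel-expanding convolution before PixelShuffle and the bilinear global skip connection are the two points described above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleHead(nn.Module):
    # Sketch of the reconstruction tail: PixelShuffle up-sampling of the restored
    # feature plus a bilinearly up-sampled LR frame as a global residual (x4 assumed).
    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.expand = nn.Conv2d(ch, 3 * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.scale = scale

    def forward(self, feat, lr):
        residual = self.shuffle(self.expand(feat))
        base = F.interpolate(lr, scale_factor=self.scale, mode='bilinear', align_corners=False)
        return base + residual  # global residual learning
&lt;/code&gt;&lt;/pre&gt;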
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;7) Implementation&lt;/span&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To mention one important implementation detail:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;the optical flow estimation network is initialized from a pre-trained model, and the blur kernel estimation network is first trained separately, after which that pre-trained model is used while training the whole network.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;All networks other than these two start training from a learning rate of 0.0001, while the two pre-trained networks start from 0.000001.&lt;/p&gt;
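&lt;p data-ke-size=&quot;size16&quot;&gt;The two learning rates can be realized with optimizer parameter groups; the sketch below uses placeholder modules to stand in for the flow, kernel, and restoration networks, so the names and layer shapes are assumptions, not the paper's code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
import itertools
import torch
import torch.nn as nn

# Illustrative parameter-group setup: the pre-trained flow and kernel networks get a
# much smaller learning rate (1e-6) than the rest of the model (1e-4).
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.flow_net = nn.Conv2d(6, 2, 3, padding=1)      # stands in for PWC-Net (N_0)
        self.kernel_net = nn.Conv2d(3, 169, 3, padding=1)  # stands in for N_k
        self.restorer = nn.Conv2d(3, 3, 3, padding=1)      # stands in for everything else

model = ToyModel()
pretrained = itertools.chain(model.flow_net.parameters(), model.kernel_net.parameters())
optimizer = torch.optim.Adam([
    {'params': pretrained, 'lr': 1e-6},
    {'params': model.restorer.parameters(), 'lr': 1e-4},
])
&lt;/code&gt;&lt;/pre&gt;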
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;8) Experimental Results&lt;/span&gt;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;As a result, the model achieves higher PSNR and SSIM than previous VSR SOTA models and produces visually sharper results.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2000&quot; data-origin-height=&quot;828&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b9A6Cc/btruUnFqdxE/kvRkCsuXyvB3lLf6UUsOZ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b9A6Cc/btruUnFqdxE/kvRkCsuXyvB3lLf6UUsOZ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b9A6Cc/btruUnFqdxE/kvRkCsuXyvB3lLf6UUsOZ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb9A6Cc%2FbtruUnFqdxE%2FkvRkCsuXyvB3lLf6UUsOZ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2000&quot; height=&quot;828&quot; data-origin-width=&quot;2000&quot; data-origin-height=&quot;828&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;[References]&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[1] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Pan, Jinshan, et al. &quot;Deep blind video super-resolution.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE/CVF International Conference on Computer Vision&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2021.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[2] &lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;Sun, Deqing, et al. &quot;Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume.&quot;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;i&gt;Proceedings of the IEEE conference on computer vision and pattern recognition&lt;/i&gt;&lt;span style=&quot;background-color: #ffffff; color: #222222;&quot;&gt;. 2018.&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Research/Super-Resolution</category>
      <category>map</category>
      <category>Video Super-Resolution</category>
      <category>VSR</category>
      <author>영스퀘어</author>
      <guid isPermaLink="true">https://young-square.tistory.com/46</guid>
      <comments>https://young-square.tistory.com/46#entry46comment</comments>
      <pubDate>Thu, 24 Feb 2022 20:46:58 +0900</pubDate>
    </item>
  </channel>
</rss>