diff --git a/LICENSE.txt b/LICENSE.txt
index 44a4cfcc8..7fb1c6515 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,6 +1,6 @@
     RawTherapee - A powerful, cross-platform raw image processing program.
     Copyright (C) 2004-2012 Gabor Horvath <hgabor@rawtherapee.com>
-    Copyright (C) 2010-2017 RawTherapee development team.
+    Copyright (C) 2010-2018 RawTherapee development team.
 
     RawTherapee is free software: you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
diff --git a/RELEASE_NOTES.txt b/RELEASE_NOTES.txt
index 83ad8acc4..a041b0daf 100644
--- a/RELEASE_NOTES.txt
+++ b/RELEASE_NOTES.txt
@@ -1,10 +1,9 @@
-RAWTHERAPEE 5.3-dev RELEASE NOTES
------------------------------
+RAWTHERAPEE 5.5-dev RELEASE NOTES
+
 This is a development version of RawTherapee. We update the code almost daily. Every few months, once enough changes have accumulated and the code is stabilized, we make a new official release. Every code change between these releases is known as a "development" version, and this is one of them.
 
-RawTherapee provides you with a selection of powerful tools with which you can practise the art of developing raw photos. Be sure to read RawPedia to understand how each tool works so that you may make the most of it.
+Start by reading the "Getting Started" article on RawPedia:
 http://rawpedia.rawtherapee.com/
-A great place to start is the "Getting Started" article. Click on "Main page" in the top-left corner when you have finished reading that article to see all other articles.
 
 While we only commit tested and relatively stable code and so the development versions should be fairly stable, you should be aware that:
 - Development versions only had limited testing, so there may be bugs unknown to us.
@@ -13,72 +12,91 @@ While we only commit tested and relatively stable code and so the development ve
 - The way new tools work in the development versions is likely to change as we tweak and tune them, so your processing profiles may produce different results when used in a future stable version.
 - Bugs present in the stable versions get fixed in the development versions, and make it into the next stable version when we make a new official release. That means that in some ways the development versions can be "more stable" than the latest stable release. At the same time, new features may introduce new bugs. This is a trade-off you should be aware of.
 
-News Relevant to Photographers
-------------------------------
-RawTherapee supports most raw formats, including Pentax Pixel Shift, Canon Dual-Pixel, and those from Foveon and X-Trans sensors.
+
+NEWS RELEVANT TO PHOTOGRAPHERS
+
+RawTherapee supports most raw formats, including Pentax and Sony Pixel Shift, Canon Dual-Pixel, and those from Foveon and X-Trans sensors.
 If you're wondering whether it supports your camera's raw format, first download RawTherapee and try for yourself. If a raw format is not supported it will either not open, or the preview in the Editor tab will appear black, white, or have a strong color cast - usually magenta. In that case, read the "Adding Support for New Raw Formats" RawPedia article.
 
 In order to use RawTherapee efficiently you should know that:
 - You can scroll all panels using the mouse scroll-wheel.
 - You can right-click on a tool's name to automatically expand it while collapsing all others.
 - To change slider values or drop-down list items with the mouse scroll-wheel, hold the Shift key. This is so that you can safely scroll the panels without accidentally changing a slider or other tool setting.
-- All curves support the Shift and Ctrl keys while dragging a point. Shift+drag makes the point snap to meaningful axes (top, bottom, diagonal, other), while Ctrl+drag makes your mouse movement super-fine for precise point positioning.
+- All curves support the Shift and Ctrl keys while dragging a point. Shift+drag makes the point snap to a meaningful axis (top, bottom, diagonal, other), while Ctrl+drag makes your mouse movement super-fine for precise point positioning.
 - There are many keyboard shortcuts which make working with RawTherapee much faster and give you greater control. Make sure you familiarize yourself with them on RawPedia's "Keyboard Shortcuts" page!
 
-New features since 5.3:
-- To be filled in when 5.4 is released.
+New features since 5.4:
+- TODO.
+
+RawTherapee and other open-source projects require access to sample raw files from various camera makes and models in order to support those raw formats correctly.
+You can help by submitting raw files to RPU:
+https://raw.pixls.us/
+
+
+
+NEWS RELEVANT TO PACKAGE MAINTAINERS
 
-News Relevant to Package Maintainers
-------------------------------------
 In general:
 - To get the source code, either clone from git or use the tarball from http://rawtherapee.com/shared/source/ . Do not use the auto-generated GitHub release tarballs.
-- Requires GTK+ version >=3.16, though 3.22 is recommended.
+- Requires GTK+ version >=3.16, though >=3.22.24 is recommended.
 - RawTherapee 5 requires GCC-4.9 or higher, or Clang.
 - Do not use -ffast-math, it will not make RawTherapee faster but will introduce artifacts.
 - Use -O3, it will make RawTherapee faster with no known side-effects.
 - For stable releases use -DCACHE_NAME_SUFFIX=""
 - For development builds and release-candidates use -DCACHE_NAME_SUFFIX="5-dev"
 
-Changes since 5.2:
-- To be filled in when 5.4 is released.
+Changes since 5.4:
+- TODO.
+
+
+
+NEWS RELEVANT TO DEVELOPERS
+
+See CONTRIBUTING.md
+
 
-News Relevant to Developers
----------------------------
-- Announce and discuss your plans in GitHub before starting work.
-- Keep branches small so that completed and working features can be merged into the "dev" branch often, and so that they can be abandoned if they head in the wrong direction.
-- Use C++11.
 
 DOCUMENTATION
--------------
-http://rawtherapee.com/blog/documentation
+
 http://rawpedia.rawtherapee.com/
+http://rawtherapee.com/blog/documentation
+
+
 
 REPORTING BUGS
---------------
+
 If you found a problem, don't keep it to yourself. Read the "How to write useful bug reports" article to get the problem fixed:
 http://rawpedia.rawtherapee.com/How_to_write_useful_bug_reports
 
+
+
 FORUM
------
+
 RawTherapee shares a forum with users and developers of other Free/Libre/Open Source Software:
 https://discuss.pixls.us/c/software/rawtherapee
 
+
+
 LIVE CHAT WITH USERS AND DEVELOPERS
---------------------------------------
-  Network: freenode
-  Server:  chat.freenode.net
-  Channel: #rawtherapee
+
+Network: freenode
+Server: chat.freenode.net
+Channel: #rawtherapee
 
 You can use freenode webchat to communicate without installing anything:
 http://webchat.freenode.net/?randomnick=1&channels=rawtherapee&prompt=1
 More information here: http://rawpedia.rawtherapee.com/IRC
 
+
+
 SOCIAL NETWORKS
----------------
+
 Google+
-http://plus.google.com/106783532637761598368
+https://plus.google.com/+RawTherapee
+
+
 
 REVISION HISTORY
-----------------
+
 The complete changelog is available at:
 https://github.com/Beep6581/RawTherapee/commits/
diff --git a/rawtherapee.astylerc b/rawtherapee.astylerc
deleted file mode 100644
index 3d49d821f..000000000
--- a/rawtherapee.astylerc
+++ /dev/null
@@ -1,8 +0,0 @@
-style=1tbs
-indent=spaces=4
-indent-switches
-break-blocks
-pad-oper
-convert-tabs
-pad-header
-unpad-paren
diff --git a/rtdata/dcpprofiles/MINOLTA DYNAX 7D.dcp b/rtdata/dcpprofiles/MINOLTA DYNAX 7D.dcp
new file mode 100644
index 000000000..807e4e05f
Binary files /dev/null and b/rtdata/dcpprofiles/MINOLTA DYNAX 7D.dcp differ
diff --git a/rtdata/dcpprofiles/SONY DSLR-A580.dcp b/rtdata/dcpprofiles/SONY DSLR-A580.dcp
new file mode 100644
index 000000000..73576c892
Binary files /dev/null and b/rtdata/dcpprofiles/SONY DSLR-A580.dcp differ
diff --git a/rtdata/images/rt_splash.svg b/rtdata/images/rt_splash.svg
index c80cdd5c7..94d9a42b1 100644
--- a/rtdata/images/rt_splash.svg
+++ b/rtdata/images/rt_splash.svg
@@ -15,7 +15,7 @@
    viewBox="0 0 146.05 91.545836"
    version="1.1"
    id="svg783"
-   inkscape:version="0.92.1 r"
+   inkscape:version="0.92.2 5c3e80d, 2017-08-06"
    sodipodi:docname="rt_splash.svg"
    inkscape:export-filename="/tmp/splash.png"
    inkscape:export-xdpi="96"
@@ -517,10 +517,10 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="1.6106713"
-     inkscape:cx="274.80436"
-     inkscape:cy="87.101745"
-     inkscape:document-units="mm"
+     inkscape:zoom="1.4142136"
+     inkscape:cx="366.55779"
+     inkscape:cy="87.326779"
+     inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
      units="px"
@@ -544,7 +544,7 @@
      inkscape:snap-nodes="false"
      inkscape:snap-global="true"
      inkscape:window-width="1920"
-     inkscape:window-height="1023"
+     inkscape:window-height="1019"
      inkscape:window-x="0"
      inkscape:window-y="0"
      inkscape:window-maximized="1">
@@ -599,7 +599,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title />
+        <dc:title></dc:title>
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -902,10 +902,34 @@
          y="2.2370076"
          x="283.85016"
          id="tspan3664"
-         sodipodi:role="line">. 3</tspan></text>
+         sodipodi:role="line">. 5</tspan></text>
+    <rect
+       style="fill:#2a2a2a;fill-opacity:1;fill-rule:nonzero;stroke:none;stroke-width:0.26458338;enable-background:new"
+       id="rect1167"
+       width="73.670166"
+       height="9.4604874"
+       x="153.45833"
+       y="279.20517" />
+    <g
+       style="fill:#ffffff;enable-background:new"
+       transform="matrix(0.24804687,0,0,0.2480469,-16.632726,273.1232)"
+       id="g1249">
+      <text
+         transform="skewX(3)"
+         id="text1247"
+         y="49.720543"
+         x="596.36224"
+         style="font-style:normal;font-weight:normal;font-size:12.80000019px;line-height:0%;font-family:sans-serif;text-align:end;letter-spacing:-1.06666672px;word-spacing:0px;text-anchor:end;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1.06666672px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+         xml:space="preserve"><tspan
+           style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:17.06666565px;line-height:110.00000238%;font-family:'ITC Eras Std';-inkscape-font-specification:'ITC Eras Std Bold';text-align:end;letter-spacing:-1.06666672px;text-anchor:end;fill:#ffffff;stroke-width:1.06666672px"
+           y="49.720543"
+           x="595.29559"
+           sodipodi:role="line"
+           id="tspan1245">Development</tspan></text>
+    </g>
     <g
        id="g3712"
-       transform="matrix(0.24804687,0,0,0.2480469,-10.282719,273.1232)"
+       transform="matrix(0.24804687,0,0,0.2480469,58.508951,273.1232)"
        style="fill:#ffffff;enable-background:new">
       <text
          xml:space="preserve"
@@ -918,7 +942,7 @@
            sodipodi:role="line"
            x="595.29559"
            y="49.720543"
-           style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:24px;line-height:110.00000238%;font-family:'ITC Eras Std';-inkscape-font-specification:'ITC Eras Std Bold';text-align:end;letter-spacing:-1.06666672px;text-anchor:end;fill:#ffffff;stroke-width:1.06666672px">Development</tspan></text>
+           style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:17.06666565px;line-height:110.00000238%;font-family:'ITC Eras Std';-inkscape-font-specification:'ITC Eras Std Bold';text-align:end;letter-spacing:-1.06666672px;text-anchor:end;fill:#ffffff;stroke-width:1.06666672px">Release Candidate 1</tspan></text>
     </g>
   </g>
 </svg>
diff --git a/rtdata/images/splash.png b/rtdata/images/splash.png
index 701ef02bf..f3c523b88 100644
Binary files a/rtdata/images/splash.png and b/rtdata/images/splash.png differ
diff --git a/rtdata/languages/Czech b/rtdata/languages/Czech
index 71c92fed1..66440ad06 100644
--- a/rtdata/languages/Czech
+++ b/rtdata/languages/Czech
@@ -2243,3 +2243,4 @@ ZOOMPANEL_ZOOMFITCROPSCREEN;Přizpůsobit ořez obrazovce\nZkratka: <b>f</b>
 ZOOMPANEL_ZOOMFITSCREEN;Přizpůsobit celý obrázek obrazovce\nZkratka: <b>Alt</b>-<b>f</b>
 ZOOMPANEL_ZOOMIN;Přiblížit\nZkratka: <b>+</b>
 ZOOMPANEL_ZOOMOUT;Oddálit\nZkratka: <b>-</b>
+
diff --git a/rtengine/CMakeLists.txt b/rtengine/CMakeLists.txt
index c12ade822..c30da357b 100644
--- a/rtengine/CMakeLists.txt
+++ b/rtengine/CMakeLists.txt
@@ -106,6 +106,7 @@ set(RTENGINESOURCEFILES
     profilestore.cc
     rawimage.cc
     rawimagesource.cc
+    rcd_demosaic.cc
     refreshmap.cc
     rt_algo.cc
     rtthumbnail.cc
diff --git a/rtengine/LUT.h b/rtengine/LUT.h
index a55c5f5bd..c0a63037a 100644
--- a/rtengine/LUT.h
+++ b/rtengine/LUT.h
@@ -397,94 +397,34 @@ public:
         vfloat diff = indexv - _mm_cvtepi32_ps(indexes);
         return vintpf(diff, upper, lower);
     }
+
+    // vectorized LUT access with integer indices. Clips at lower and upper bounds
 #ifdef __SSE4_1__
     template<typename U = T, typename = typename std::enable_if<std::is_same<U, float>::value>::type>
-    vfloat operator[](vint idxv ) const
+    vfloat operator[](vint idxv) const
     {
-        vfloat tempv, p1v;
         idxv = _mm_max_epi32( _mm_setzero_si128(), _mm_min_epi32(idxv, sizeiv));
-        // access the LUT 4 times and shuffle the values into p1v
-
-        int idx;
-
-        // get 4th value
-        idx = _mm_extract_epi32(idxv, 3);
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = PERMUTEPS(tempv, _MM_SHUFFLE(0, 0, 0, 0));
-        // now p1v is 3 3 3 3
-
-        // get 3rd value
-        idx = _mm_extract_epi32(idxv, 2);
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = _mm_move_ss( p1v, tempv);
-        // now p1v is 3 3 3 2
-
-        // get 2nd value
-        idx = _mm_extract_epi32(idxv, 1);
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = PERMUTEPS( p1v, _MM_SHUFFLE(1, 0, 1, 0));
-        // now p1v is 3 2 3 2
-        p1v = _mm_move_ss( p1v, tempv );
-        // now p1v is 3 2 3 1
-
-        // get 1st value
-        idx = _mm_cvtsi128_si32(idxv);
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = PERMUTEPS( p1v, _MM_SHUFFLE(3, 2, 0, 0));
-        // now p1v is 3 2 1 1
-        p1v = _mm_move_ss( p1v, tempv );
-        // now p1v is 3 2 1 0
-
-        return p1v;
+        // access the LUT 4 times. Trust the compiler. It generates good code here, better than hand written SSE code
+        return _mm_setr_ps(data[_mm_extract_epi32(idxv,0)], data[_mm_extract_epi32(idxv,1)], data[_mm_extract_epi32(idxv,2)], data[_mm_extract_epi32(idxv,3)]);
     }
 #else
     template<typename U = T, typename = typename std::enable_if<std::is_same<U, float>::value>::type>
-    vfloat operator[](vint idxv ) const
+    vfloat operator[](vint idxv) const
     {
-        vfloat tempv, p1v;
-        tempv = _mm_cvtepi32_ps(idxv);
-        tempv = _mm_min_ps( tempv, sizev );
-        idxv = _mm_cvttps_epi32(_mm_max_ps( tempv, _mm_setzero_ps( )  ));
-        // access the LUT 4 times and shuffle the values into p1v
-
-        int idx;
-
-        // get 4th value
-        idx = _mm_cvtsi128_si32 (_mm_shuffle_epi32(idxv, _MM_SHUFFLE(3, 3, 3, 3)));
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = PERMUTEPS(tempv, _MM_SHUFFLE(0, 0, 0, 0));
-        // now p1v is 3 3 3 3
-
-        // get 3rd value
-        idx = _mm_cvtsi128_si32 (_mm_shuffle_epi32(idxv, _MM_SHUFFLE(2, 2, 2, 2)));
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = _mm_move_ss( p1v, tempv);
-        // now p1v is 3 3 3 2
-
-        // get 2nd value
-        idx = _mm_cvtsi128_si32 (_mm_shuffle_epi32(idxv, _MM_SHUFFLE(1, 1, 1, 1)));
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = PERMUTEPS( p1v, _MM_SHUFFLE(1, 0, 1, 0));
-        // now p1v is 3 2 3 2
-        p1v = _mm_move_ss( p1v, tempv );
-        // now p1v is 3 2 3 1
-
-        // get 1st value
-        idx = _mm_cvtsi128_si32 (idxv);
-        tempv = _mm_load_ss(&data[idx]);
-        p1v = PERMUTEPS( p1v, _MM_SHUFFLE(3, 2, 0, 0));
-        // now p1v is 3 2 1 1
-        p1v = _mm_move_ss( p1v, tempv );
-        // now p1v is 3 2 1 0
-
-        return p1v;
+        vfloat tempv = vmaxf(ZEROV, vminf(sizev, _mm_cvtepi32_ps(idxv))); // convert to float because SSE2 has no min/max for 32bit integers
+        idxv = _mm_cvttps_epi32(tempv);
+        // access the LUT 4 times. Trust the compiler. It generates good code here, better than hand written SSE code
+        return _mm_setr_ps(data[_mm_cvtsi128_si32(idxv)],
+                           data[_mm_cvtsi128_si32(_mm_shuffle_epi32(idxv, _MM_SHUFFLE(1, 1, 1, 1)))],
+                           data[_mm_cvtsi128_si32(_mm_shuffle_epi32(idxv, _MM_SHUFFLE(2, 2, 2, 2)))],
+                           data[_mm_cvtsi128_si32(_mm_shuffle_epi32(idxv, _MM_SHUFFLE(3, 3, 3, 3)))]);
     }
 #endif
 #endif
 
     // use with float indices
-    template<typename U = T, typename = typename std::enable_if<std::is_same<U, float>::value>::type>
-    T operator[](float index) const
+    template<typename U = T, typename V, typename = typename std::enable_if<std::is_floating_point<V>::value && std::is_same<U, float>::value>::type>
+    T operator[](V index) const
     {
         int idx = (int)index;  // don't use floor! The difference in negative space is no problems here
 
diff --git a/rtengine/PF_correct_RT.cc b/rtengine/PF_correct_RT.cc
index 70c334928..dbb42db0d 100644
--- a/rtengine/PF_correct_RT.cc
+++ b/rtengine/PF_correct_RT.cc
@@ -7,6 +7,10 @@
 //
 // code dated: November 24, 2010
 // optimized: September 2013, Ingo Weyrich
+// further optimized: February 2018, Ingo Weyrich
+//
+//  Ingo Weyrich March 2018: The above comment 'Chromatic Aberration Auto-correction' sounds wrong
+//                           I guess it should have been 'Purple fringe correction' though it's not restricted to 'Purple'
 //
 //  PF_correct_RT.cc is free software: you can redistribute it and/or modify
 //  it under the terms of the GNU General Public License as published by
@@ -22,53 +26,36 @@
 //  along with this program.  If not, see <http://www.gnu.org/licenses/>.
 //
 ////////////////////////////////////////////////////////////////
-//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 #include "gauss.h"
 #include "improcfun.h"
 #include "sleef.c"
-#include "mytime.h"
 #include "../rtgui/myflatcurve.h"
 #include "rt_math.h"
 #include "opthelper.h"
 #include "median.h"
-
-#ifdef _OPENMP
-#include <omp.h>
-#endif
-
-using namespace std;
+#include "jaggedarray.h"
+#include "StopWatch.h"
 
 namespace rtengine
 {
-extern const Settings* settings;
 
-void ImProcFunctions::PF_correct_RT(LabImage * src, LabImage * dst, double radius, int thresh)
+// Defringe in Lab mode
+void ImProcFunctions::PF_correct_RT(LabImage * lab, double radius, int thresh)
 {
-    const int halfwin = ceil(2 * radius) + 1;
-
-    FlatCurve* chCurve = nullptr;
-
+    BENCHFUN
+    std::unique_ptr<FlatCurve> chCurve;
     if (params->defringe.huecurve.size() && FlatCurveType(params->defringe.huecurve.at(0)) > FCT_Linear) {
-        chCurve = new FlatCurve(params->defringe.huecurve);
+        chCurve.reset(new FlatCurve(params->defringe.huecurve));
     }
 
-    // local variables
-    const int width = src->W, height = src->H;
-    //temporary array to store chromaticity
-    float (*fringe);
-    fringe = (float (*)) malloc (height * width * sizeof(*fringe));
+    const int width = lab->W, height = lab->H;
 
-    LabImage * tmp1;
-    tmp1 = new LabImage(width, height);
+    // temporary array to store chromaticity
+    const std::unique_ptr<float[]> fringe(new float[width * height]);
 
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        gaussianBlur (src->a, tmp1->a, src->W, src->H, radius);
-        gaussianBlur (src->b, tmp1->b, src->W, src->H, radius);
-    }
+    JaggedArray<float> tmpa(width, height);
+    JaggedArray<float> tmpb(width, height);
 
     double chromave = 0.0; // use double precision for large summations
 
@@ -76,1791 +63,1110 @@ void ImProcFunctions::PF_correct_RT(LabImage * src, LabImage * dst, double radiu
     #pragma omp parallel
 #endif
     {
-        float chromaChfactor = 1.0f;
+        gaussianBlur(lab->a, tmpa, width, height, radius);
+        gaussianBlur(lab->b, tmpb, width, height, radius);
+
 #ifdef _OPENMP
-        #pragma omp for reduction(+:chromave)
+        #pragma omp for reduction(+:chromave) schedule(dynamic,16)
 #endif
 
-        for(int i = 0; i < height; i++ ) {
+        for (int i = 0; i < height; i++) {
 #ifdef __SSE2__
 
             // vectorized per row precalculation of the atan2 values
             if (chCurve) {
                 int k = 0;
 
-                for(; k < width - 3; k += 4) {
-                    STVFU(fringe[i * width + k], xatan2f(LVFU(src->b[i][k]), LVFU(src->a[i][k])));
+                for (; k < width - 3; k += 4) {
+                    STVFU(fringe[i * width + k], xatan2f(LVFU(lab->b[i][k]), LVFU(lab->a[i][k])));
                 }
 
-                for(; k < width; k++) {
-                    fringe[i * width + k] = xatan2f(src->b[i][k], src->a[i][k]);
+                for (; k < width; k++) {
+                    fringe[i * width + k] = xatan2f(lab->b[i][k], lab->a[i][k]);
                 }
             }
 
-#endif // __SSE2__
+#endif
 
-            for(int j = 0; j < width; j++) {
+            for (int j = 0; j < width; j++) {
+                float chromaChfactor = 1.f;
                 if (chCurve) {
 #ifdef __SSE2__
                     // use the precalculated atan values
-                    float HH = fringe[i * width + j];
+                    const float HH = fringe[i * width + j];
 #else
                     // no precalculated values without SSE => calculate
-                    float HH = xatan2f(src->b[i][j], src->a[i][j]);
+                    const float HH = xatan2f(lab->b[i][j], lab->a[i][j]);
 #endif
-                    float chparam = float((chCurve->getVal((Color::huelab_to_huehsv2(HH))) - 0.5f) * 2.0f); //get C=f(H)
+                    float chparam = chCurve->getVal((Color::huelab_to_huehsv2(HH))) - 0.5f; // get C=f(H)
 
-                    if(chparam > 0.f) {
-                        chparam /= 2.f;    // reduced action if chparam > 0
+                    if (chparam < 0.f) {
+                        chparam *= 2.f; // increased action if chparam < 0
                     }
 
-                    chromaChfactor = 1.0f + chparam;
+                    chromaChfactor = SQR(1.f + chparam);
                 }
 
-                float chroma = SQR(chromaChfactor * (src->a[i][j] - tmp1->a[i][j])) + SQR(chromaChfactor * (src->b[i][j] - tmp1->b[i][j])); //modulate chroma function hue
+                const float chroma = chromaChfactor * (SQR(lab->a[i][j] - tmpa[i][j]) + SQR(lab->b[i][j] - tmpb[i][j])); // modulate chroma function hue
                 chromave += chroma;
                 fringe[i * width + j] = chroma;
             }
         }
     }
 
-    chromave /= (height * width);
-    float threshfactor = SQR(thresh / 33.f) * chromave * 5.0f;
+    chromave /= height * width;
 
-
-// now chromave is calculated, so we postprocess fringe to reduce the number of divisions in future
-#ifdef __SSE2__
+    if (chromave > 0.0) {
+        // now as chromave is calculated, we postprocess fringe to reduce the number of divisions in future
 #ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        __m128 sumv = F2V( chromave );
-        __m128 onev = F2V( 1.0f );
-#ifdef _OPENMP
-        #pragma omp for nowait
+        #pragma omp parallel for simd
 #endif
 
-        for(int j = 0; j < width * height - 3; j += 4) {
-            STVFU(fringe[j], onev / (LVFU(fringe[j]) + sumv));
-        }
-
-        #pragma omp single
-
-        for(int j = width * height - (width * height) % 4; j < width * height; j++) {
+        for (int j = 0; j < width * height; j++) {
             fringe[j] = 1.f / (fringe[j] + chromave);
         }
-    }
 
-#else
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-
-    for(int j = 0; j < width * height; j++) {
-        fringe[j] = 1.f / (fringe[j] + chromave);
-    }
-
-#endif
-
-    // because we changed the values of fringe we also have to recalculate threshfactor
-    threshfactor = 1.0f / (threshfactor + chromave);
+        const float threshfactor = 1.f / (SQR(thresh / 33.f) * chromave * 5.0f + chromave);
+        const int halfwin = std::ceil(2 * radius) + 1;
 
 // Issue 1674:
-// often, CA isn't evenly distributed, e.g. a lot in contrasty regions and none in the sky.
+// often, colour fringe is not evenly distributed, e.g. a lot in contrasty regions and none in the sky.
 // so it's better to schedule dynamic and let every thread only process 16 rows, to avoid running big threads out of work
 // Measured it and in fact gives better performance than without schedule(dynamic,16). Of course, there could be a better
 // choice for the chunk_size than 16
 // Issue 1972: Split this loop in three parts to avoid most of the min and max-operations
 #ifdef _OPENMP
-    #pragma omp parallel for schedule(dynamic,16)
+        #pragma omp parallel for schedule(dynamic,16)
 #endif
 
-    for(int i = 0; i < height; i++ ) {
-        int j;
+        for (int i = 0; i < height; i++) {
+            int j = 0;
+            for (; j < halfwin - 1; j++) {
 
-        for(j = 0; j < halfwin - 1; j++) {
-            tmp1->a[i][j] = src->a[i][j];
-            tmp1->b[i][j] = src->b[i][j];
+                // test for pixel darker than some fraction of neighbourhood ave, near an edge, more saturated than average
+                if (fringe[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
 
-            //test for pixel darker than some fraction of neighborhood ave, near an edge, more saturated than average
-            if (fringe[i * width + j] < threshfactor) {
-                float atot = 0.f;
-                float btot = 0.f;
-                float norm = 0.f;
-                float wt;
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++)
+                        for (int j1 = 0; j1 < j + halfwin; j1++) {
+                            // neighbourhood average of pixels weighted by chrominance
+                            const float wt = fringe[i1 * width + j1];
+                            atot += wt * lab->a[i1][j1];
+                            btot += wt * lab->b[i1][j1];
+                            norm += wt;
+                        }
 
-                for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                    for (int j1 = 0; j1 < j + halfwin; j1++) {
-                        //neighborhood average of pixels weighted by chrominance
-                        wt = fringe[i1 * width + j1];
-                        atot += wt * src->a[i1][j1];
-                        btot += wt * src->b[i1][j1];
-                        norm += wt;
-                    }
-
-                tmp1->a[i][j] = atot / norm;
-                tmp1->b[i][j] = btot / norm;
+                    lab->a[i][j] = atot / norm;
+                    lab->b[i][j] = btot / norm;
+                }
             }
-        }
 
-        for(; j < width - halfwin + 1; j++) {
-            tmp1->a[i][j] = src->a[i][j];
-            tmp1->b[i][j] = src->b[i][j];
+            for (; j < width - halfwin + 1; j++) {
 
-            //test for pixel darker than some fraction of neighborhood ave, near an edge, more saturated than average
-            if (fringe[i * width + j] < threshfactor) {
-                float atot = 0.f;
-                float btot = 0.f;
-                float norm = 0.f;
-                float wt;
+                // test for pixel darker than some fraction of neighbourhood ave, near an edge, more saturated than average
+                if (fringe[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
 
-                for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                    for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
-                        //neighborhood average of pixels weighted by chrominance
-                        wt = fringe[i1 * width + j1];
-                        atot += wt * src->a[i1][j1];
-                        btot += wt * src->b[i1][j1];
-                        norm += wt;
-                    }
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++)
+                        for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
+                            // neighbourhood average of pixels weighted by chrominance
+                            const float wt = fringe[i1 * width + j1];
+                            atot += wt * lab->a[i1][j1];
+                            btot += wt * lab->b[i1][j1];
+                            norm += wt;
+                        }
 
-                tmp1->a[i][j] = atot / norm;
-                tmp1->b[i][j] = btot / norm;
+                    lab->a[i][j] = atot / norm;
+                    lab->b[i][j] = btot / norm;
+                }
             }
-        }
 
-        for(; j < width; j++) {
-            tmp1->a[i][j] = src->a[i][j];
-            tmp1->b[i][j] = src->b[i][j];
+            for (; j < width; j++) {
 
-            //test for pixel darker than some fraction of neighborhood ave, near an edge, more saturated than average
-            if (fringe[i * width + j] < threshfactor) {
-                float atot = 0.f;
-                float btot = 0.f;
-                float norm = 0.f;
-                float wt;
+                // test for pixel darker than some fraction of neighbourhood ave, near an edge, more saturated than average
+                if (fringe[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
 
-                for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                    for (int j1 = j - halfwin + 1; j1 < width; j1++) {
-                        //neighborhood average of pixels weighted by chrominance
-                        wt = fringe[i1 * width + j1];
-                        atot += wt * src->a[i1][j1];
-                        btot += wt * src->b[i1][j1];
-                        norm += wt;
-                    }
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++)
+                        for (int j1 = j - halfwin + 1; j1 < width; j1++) {
+                            // neighbourhood average of pixels weighted by chrominance
+                            const float wt = fringe[i1 * width + j1];
+                            atot += wt * lab->a[i1][j1];
+                            btot += wt * lab->b[i1][j1];
+                            norm += wt;
+                        }
 
-                tmp1->a[i][j] = atot / norm;
-                tmp1->b[i][j] = btot / norm;
+                    lab->a[i][j] = atot / norm;
+                    lab->b[i][j] = btot / norm;
+                }
             }
-        }
-    }//end of ab channel averaging
-
-    if(src != dst)
-#ifdef _OPENMP
-        #pragma omp parallel for
-#endif
-        for(int i = 0; i < height; i++ ) {
-            for(int j = 0; j < width; j++) {
-                dst->L[i][j] = src->L[i][j];
-            }
-        }
-
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-
-    for(int i = 0; i < height; i++ ) {
-        for(int j = 0; j < width; j++) {
-            dst->a[i][j] = tmp1->a[i][j];
-            dst->b[i][j] = tmp1->b[i][j];
-        }
+        } // end of ab channel averaging
     }
-
-
-    delete tmp1;
-
-    if(chCurve) {
-        delete chCurve;
-    }
-
-    free(fringe);
 }
 
-void ImProcFunctions::PF_correct_RTcam(CieImage * src, CieImage * dst, double radius, int thresh)
+// Defringe in CIECAM02 mode
+void ImProcFunctions::PF_correct_RTcam(CieImage * ncie, double radius, int thresh)
 {
-    const int halfwin = ceil(2 * radius) + 1;
+    BENCHFUN
 
-    FlatCurve* chCurve = nullptr;
+    std::unique_ptr<FlatCurve> chCurve;
 
     if (params->defringe.huecurve.size() && FlatCurveType(params->defringe.huecurve.at(0)) > FCT_Linear) {
-        chCurve = new FlatCurve(params->defringe.huecurve);
+        chCurve.reset(new FlatCurve(params->defringe.huecurve));
     }
 
-    // local variables
-    const int width = src->W, height = src->H;
-    const float piid = 3.14159265f / 180.f;
-    const float eps2 = 0.01f;
+    const int width = ncie->W, height = ncie->H;
 
-    //temporary array to store chromaticity
-    float (*fringe);
-    fringe = (float (*)) malloc (height * width * sizeof(*fringe));
-
-    float** sraa;
-    sraa = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        sraa[i] = new float[width];
-    }
-
-    float** srbb;
-    srbb = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        srbb[i] = new float[width];
-    }
-
-    float** tmaa;
-    tmaa = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        tmaa[i] = new float[width];
-    }
-
-    float** tmbb;
-    tmbb = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        tmbb[i] = new float[width];
-    }
+    // temporary array to store chromaticity
+    const std::unique_ptr<float[]> fringe(new float[width * height]);
 
+    float** const sraa = ncie->h_p; // we use the ncie->h_p buffer to avoid memory allocation/deallocation and reduce memory pressure
+    float** const srbb = ncie->C_p; // we use the ncie->C_p buffer to avoid memory allocation/deallocation and reduce memory pressure
+    JaggedArray<float> tmaa(width, height);
+    JaggedArray<float> tmbb(width, height);
 
 #ifdef _OPENMP
     #pragma omp parallel
 #endif
     {
-        float2 sincosval;
 #ifdef __SSE2__
-        int j;
-        vfloat2 sincosvalv;
-        __m128 piidv = F2V(piid);
-#endif // __SSE2__
+        const vfloat piDiv180v = F2V(RT_PI_F_180);
+#endif
 #ifdef _OPENMP
         #pragma omp for
 #endif
 
         for (int i = 0; i < height; i++) {
+            int j = 0;
 #ifdef __SSE2__
 
-            for (j = 0; j < width - 3; j += 4) {
-                sincosvalv = xsincosf(piidv * LVFU(src->h_p[i][j]));
-                STVFU(sraa[i][j], LVFU(src->C_p[i][j])*sincosvalv.y);
-                STVFU(srbb[i][j], LVFU(src->C_p[i][j])*sincosvalv.x);
+            for (; j < width - 3; j += 4) {
+                const vfloat2 sincosvalv = xsincosf(piDiv180v * LVFU(ncie->h_p[i][j]));
+                STVFU(sraa[i][j], LVFU(ncie->C_p[i][j]) * sincosvalv.y);
+                STVFU(srbb[i][j], LVFU(ncie->C_p[i][j]) * sincosvalv.x);
             }
-
+#endif
             for (; j < width; j++) {
-                sincosval = xsincosf(piid * src->h_p[i][j]);
-                sraa[i][j] = src->C_p[i][j] * sincosval.y;
-                srbb[i][j] = src->C_p[i][j] * sincosval.x;
+                const float2 sincosval = xsincosf(RT_PI_F_180 * ncie->h_p[i][j]);
+                sraa[i][j] = ncie->C_p[i][j] * sincosval.y;
+                srbb[i][j] = ncie->C_p[i][j] * sincosval.x;
             }
-
-#else
-
-            for (int j = 0; j < width; j++) {
-                sincosval = xsincosf(piid * src->h_p[i][j]);
-                sraa[i][j] = src->C_p[i][j] * sincosval.y;
-                srbb[i][j] = src->C_p[i][j] * sincosval.x;
-            }
-
-#endif
         }
     }
 
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        gaussianBlur (sraa, tmaa, src->W, src->H, radius);
-        gaussianBlur (srbb, tmbb, src->W, src->H, radius);
-    }
-
     double chromave = 0.0; // use double precision for large summations
 
-#ifdef __SSE2__
-
-    if( chCurve ) {
-// vectorized precalculation of the atan2 values
-#ifdef _OPENMP
-        #pragma omp parallel
-#endif
-        {
-            int j;
-#ifdef _OPENMP
-            #pragma omp for
-#endif
-
-            for(int i = 0; i < height; i++ )
-            {
-                for(j = 0; j < width - 3; j += 4) {
-                    STVFU(fringe[i * width + j], xatan2f(LVFU(srbb[i][j]), LVFU(sraa[i][j])));
-                }
-
-                for(; j < width; j++) {
-                    fringe[i * width + j] = xatan2f(srbb[i][j], sraa[i][j]);
-                }
-            }
-        }
-    }
-
-#endif
-
 #ifdef _OPENMP
     #pragma omp parallel
 #endif
     {
-        float chromaChfactor = 1.0f;
+        gaussianBlur(sraa, tmaa, width, height, radius);
+        gaussianBlur(srbb, tmbb, width, height, radius);
+
+        float chromaChfactor = 1.f;
 #ifdef _OPENMP
-        #pragma omp for reduction(+:chromave)
+        #pragma omp for reduction(+:chromave) schedule(dynamic,16)
 #endif
 
-        for(int i = 0; i < height; i++ ) {
-            for(int j = 0; j < width; j++) {
-                if (chCurve) {
+        for (int i = 0; i < height; i++) {
 #ifdef __SSE2__
-                    // use the precalculated atan values
-                    float HH = fringe[i * width + j];
-#else
-                    // no precalculated values without SSE => calculate
-                    float HH = xatan2f(srbb[i][j], sraa[i][j]);
-#endif
-                    float chparam = float((chCurve->getVal((Color::huelab_to_huehsv2(HH))) - 0.5f) * 2.0f); //get C=f(H)
-
-                    if(chparam > 0.f) {
-                        chparam /= 2.f;    // reduced action if chparam > 0
-                    }
-
-                    chromaChfactor = 1.0f + chparam;
+            // vectorized per row precalculation of the atan2 values
+            if (chCurve) {
+                int j = 0;
+                for (; j < width - 3; j += 4) {
+                    STVFU(fringe[i * width + j], xatan2f(LVFU(srbb[i][j]), LVFU(sraa[i][j])));
                 }
 
-                float chroma = SQR(chromaChfactor * (sraa[i][j] - tmaa[i][j])) + SQR(chromaChfactor * (srbb[i][j] - tmbb[i][j])); //modulate chroma function hue
+                for (; j < width; j++) {
+                    fringe[i * width + j] = xatan2f(srbb[i][j], sraa[i][j]);
+                }
+            }
+#endif
+
+            for (int j = 0; j < width; j++) {
+                if (chCurve) {
+#ifdef __SSE2__
+                    // use the precalculated atan2 values
+                    const float HH = fringe[i * width + j];
+#else
+                    // no precalculated values without SSE => calculate
+                    const float HH = xatan2f(srbb[i][j], sraa[i][j]);
+#endif
+                    float chparam = chCurve->getVal(Color::huelab_to_huehsv2(HH)) - 0.5f; //get C=f(H)
+
+                    if (chparam < 0.f) {
+                        chparam *= 2.f;    // increase action if chparam < 0
+                    }
+
+                    chromaChfactor = SQR(1.f + chparam);
+                }
+
+                const float chroma = chromaChfactor * (SQR(sraa[i][j] - tmaa[i][j]) + SQR(srbb[i][j] - tmbb[i][j])); //modulate chroma function hue
                 chromave += chroma;
                 fringe[i * width + j] = chroma;
             }
         }
     }
 
-    chromave /= (height * width);
-    float threshfactor = SQR(thresh / 33.f) * chromave * 5.0f; // Calculated once to eliminate mult inside the next loop
+    chromave /= height * width;
 
-// now chromave is calculated, so we postprocess fringe to reduce the number of divisions in future
-#ifdef __SSE2__
+    if (chromave > 0.0) {
+        // now as chromave is calculated, we postprocess fringe to reduce the number of divisions in future
 #ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        __m128 sumv = F2V( chromave + eps2 );
-        __m128 onev = F2V( 1.0f );
-#ifdef _OPENMP
-        #pragma omp for
+        #pragma omp parallel for simd
 #endif
 
-        for(int j = 0; j < width * height - 3; j += 4) {
-            STVFU(fringe[j], onev / (LVFU(fringe[j]) + sumv));
+        for (int j = 0; j < width * height; j++) {
+            fringe[j] = 1.f / (fringe[j] + chromave);
         }
-    }
 
-    for(int j = width * height - (width * height) % 4; j < width * height; j++) {
-        fringe[j] = 1.f / (fringe[j] + chromave + eps2);
-    }
-
-#else
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-
-    for(int j = 0; j < width * height; j++) {
-        fringe[j] = 1.f / (fringe[j] + chromave + eps2);
-    }
-
-#endif
-
-    // because we changed the values of fringe we also have to recalculate threshfactor
-    threshfactor = 1.0f / (threshfactor + chromave + eps2);
+        const float threshfactor = 1.f / (SQR(thresh / 33.f) * chromave * 5.0f + chromave);
+        const int halfwin = std::ceil(2 * radius) + 1;
 
 // Issue 1674:
-// often, CA isn't evenly distributed, e.g. a lot in contrasty regions and none in the sky.
+// often, colour fringe is not evenly distributed, e.g. a lot in contrasty regions and none in the sky.
 // so it's better to schedule dynamic and let every thread only process 16 rows, to avoid running big threads out of work
 // Measured it and in fact gives better performance than without schedule(dynamic,16). Of course, there could be a better
 // choice for the chunk_size than 16
 // Issue 1972: Split this loop in three parts to avoid most of the min and max-operations
-#ifdef _OPENMP
-    #pragma omp parallel for schedule(dynamic,16)
-#endif
-
-    for(int i = 0; i < height; i++ ) {
-        int j;
-
-        for(j = 0; j < halfwin - 1; j++) {
-            tmaa[i][j] = sraa[i][j];
-            tmbb[i][j] = srbb[i][j];
-
-            if (fringe[i * width + j] < threshfactor) {
-                float atot = 0.f;
-                float btot = 0.f;
-                float norm = 0.f;
-                float wt;
-
-                for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                    for (int j1 = 0; j1 < j + halfwin; j1++) {
-                        //neighborhood average of pixels weighted by chrominance
-                        wt = fringe[i1 * width + j1];
-                        atot += wt * sraa[i1][j1];
-                        btot += wt * srbb[i1][j1];
-                        norm += wt;
-                    }
-
-                if(norm > 0.f) {
-                    tmaa[i][j] = (atot / norm);
-                    tmbb[i][j] = (btot / norm);
-                }
-            }
-        }
-
-        for(; j < width - halfwin + 1; j++) {
-            tmaa[i][j] = sraa[i][j];
-            tmbb[i][j] = srbb[i][j];
-
-            if (fringe[i * width + j] < threshfactor) {
-                float atot = 0.f;
-                float btot = 0.f;
-                float norm = 0.f;
-                float wt;
-
-                for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                    for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
-                        //neighborhood average of pixels weighted by chrominance
-                        wt = fringe[i1 * width + j1];
-                        atot += wt * sraa[i1][j1];
-                        btot += wt * srbb[i1][j1];
-                        norm += wt;
-                    }
-
-                if(norm > 0.f) {
-                    tmaa[i][j] = (atot / norm);
-                    tmbb[i][j] = (btot / norm);
-                }
-            }
-        }
-
-        for(; j < width; j++) {
-            tmaa[i][j] = sraa[i][j];
-            tmbb[i][j] = srbb[i][j];
-
-            if (fringe[i * width + j] < threshfactor) {
-                float atot = 0.f;
-                float btot = 0.f;
-                float norm = 0.f;
-                float wt;
-
-                for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                    for (int j1 = j - halfwin + 1; j1 < width; j1++) {
-                        //neighborhood average of pixels weighted by chrominance
-                        wt = fringe[i1 * width + j1];
-                        atot += wt * sraa[i1][j1];
-                        btot += wt * srbb[i1][j1];
-                        norm += wt;
-                    }
-
-                if(norm > 0.f) {
-                    tmaa[i][j] = (atot / norm);
-                    tmbb[i][j] = (btot / norm);
-                }
-            }
-        }
-    } //end of ab channel averaging
-
 
 #ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-#ifdef __SSE2__
-        int j;
-        __m128 interav, interbv;
-        __m128 piidv = F2V(piid);
-#endif
-#ifdef _OPENMP
-        #pragma omp for
-#endif
-
-        for(int i = 0; i < height; i++ ) {
-#ifdef __SSE2__
-
-            for(j = 0; j < width - 3; j += 4) {
-                STVFU(dst->sh_p[i][j], LVFU(src->sh_p[i][j]));
-                interav = LVFU(tmaa[i][j]);
-                interbv = LVFU(tmbb[i][j]);
-                STVFU(dst->h_p[i][j], (xatan2f(interbv, interav)) / piidv);
-                STVFU(dst->C_p[i][j], vsqrtf(SQRV(interbv) + SQRV(interav)));
-            }
-
-            for(; j < width; j++) {
-                dst->sh_p[i][j] = src->sh_p[i][j];
-                float intera = tmaa[i][j];
-                float interb = tmbb[i][j];
-                dst->h_p[i][j] = (xatan2f(interb, intera)) / piid;
-                dst->C_p[i][j] = sqrt(SQR(interb) + SQR(intera));
-            }
-
-#else
-
-            for(int j = 0; j < width; j++) {
-                dst->sh_p[i][j] = src->sh_p[i][j];
-                float intera = tmaa[i][j];
-                float interb = tmbb[i][j];
-                dst->h_p[i][j] = (xatan2f(interb, intera)) / piid;
-                dst->C_p[i][j] = sqrt(SQR(interb) + SQR(intera));
-            }
-
-#endif
-        }
-    }
-
-    for (int i = 0; i < height; i++) {
-        delete [] sraa[i];
-    }
-
-    delete [] sraa;
-
-    for (int i = 0; i < height; i++) {
-        delete [] srbb[i];
-    }
-
-    delete [] srbb;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmaa[i];
-    }
-
-    delete [] tmaa;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmbb[i];
-    }
-
-    delete [] tmbb;
-
-    if(chCurve) {
-        delete chCurve;
-    }
-
-    free(fringe);
-}
-
-void ImProcFunctions::Badpixelscam(CieImage * src, CieImage * dst, double radius, int thresh, int mode, float skinprot, float chrom, int hotbad)
-{
-    const int halfwin = ceil(2 * radius) + 1;
-    MyTime t1, t2;
-    t1.set();
-
-    const int width = src->W, height = src->H;
-    const float piid = 3.14159265f / 180.f;
-
-    int i1, j1;
-    const float eps = 1.0f;
-    const float eps2 = 0.01f;
-
-    float** sraa;
-    sraa = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        sraa[i] = new float[width];
-    }
-
-    float** srbb;
-    srbb = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        srbb[i] = new float[width];
-    }
-
-    float** tmaa;
-    tmaa = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        tmaa[i] = new float[width];
-    }
-
-    float** tmbb;
-    tmbb = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        tmbb[i] = new float[width];
-    }
-
-    float* badpix = (float*)malloc(width * height * sizeof(float));
-
-    float** tmL;
-    tmL = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        tmL[i] = new float[width];
-    }
-
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        float2 sincosval;
-#ifdef __SSE2__
-        int j;
-        vfloat2 sincosvalv;
-        __m128 piidv = F2V(piid);
-#endif // __SSE2__
-#ifdef _OPENMP
-        #pragma omp for
+        #pragma omp parallel for schedule(dynamic,16)
 #endif
 
         for (int i = 0; i < height; i++) {
-#ifdef __SSE2__
-
-            for (j = 0; j < width - 3; j += 4) {
-                sincosvalv = xsincosf(piidv * LVFU(src->h_p[i][j]));
-                STVFU(sraa[i][j], LVFU(src->C_p[i][j])*sincosvalv.y);
-                STVFU(srbb[i][j], LVFU(src->C_p[i][j])*sincosvalv.x);
-            }
-
-            for (; j < width; j++) {
-                sincosval = xsincosf(piid * src->h_p[i][j]);
-                sraa[i][j] = src->C_p[i][j] * sincosval.y;
-                srbb[i][j] = src->C_p[i][j] * sincosval.x;
-            }
-
-#else
-
-            for (int j = 0; j < width; j++) {
-                sincosval = xsincosf(piid * src->h_p[i][j]);
-                sraa[i][j] = src->C_p[i][j] * sincosval.y;
-                srbb[i][j] = src->C_p[i][j] * sincosval.x;
-            }
-
-#endif
-        }
-    }
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        //chroma a and b
-        if(mode == 2) { //choice of gaussian blur
-            gaussianBlur (sraa, tmaa, src->W, src->H, radius);
-            gaussianBlur (srbb, tmbb, src->W, src->H, radius);
-        }
-
-        //luma sh_p
-        gaussianBlur (src->sh_p, tmL, src->W, src->H, 2.0);//low value to avoid artifacts
-    }
-
-    if(mode == 1) { //choice of median
-        #pragma omp parallel
-        {
-            int ip, in, jp, jn;
-            #pragma omp for nowait  //nowait because next loop inside this parallel region is independent on this one
-
-            for (int i = 0; i < height; i++) {
-                if (i < 2) {
-                    ip = i + 2;
-                } else {
-                    ip = i - 2;
-                }
-
-                if (i > height - 3) {
-                    in = i - 2;
-                } else {
-                    in = i + 2;
-                }
-
-                for (int j = 0; j < width; j++) {
-                    if (j < 2) {
-                        jp = j + 2;
-                    } else {
-                        jp = j - 2;
-                    }
-
-                    if (j > width - 3) {
-                        jn = j - 2;
-                    } else {
-                        jn = j + 2;
-                    }
-
-                    tmaa[i][j] = median(sraa[ip][jp], sraa[ip][j], sraa[ip][jn], sraa[i][jp], sraa[i][j], sraa[i][jn], sraa[in][jp], sraa[in][j], sraa[in][jn]);
-                }
-            }
-
-            #pragma omp for
-
-            for (int i = 0; i < height; i++) {
-                if (i < 2) {
-                    ip = i + 2;
-                } else {
-                    ip = i - 2;
-                }
-
-                if (i > height - 3) {
-                    in = i - 2;
-                } else {
-                    in = i + 2;
-                }
-
-                for (int j = 0; j < width; j++) {
-                    if (j < 2) {
-                        jp = j + 2;
-                    } else {
-                        jp = j - 2;
-                    }
-
-                    if (j > width - 3) {
-                        jn = j - 2;
-                    } else {
-                        jn = j + 2;
-                    }
-
-                    tmbb[i][j] = median(srbb[ip][jp], srbb[ip][j], srbb[ip][jn], srbb[i][jp], srbb[i][j], srbb[i][jn], srbb[in][jp], srbb[in][j], srbb[in][jn]);
-                }
-            }
-        }
-    }
-
-//luma badpixels
-    const float sh_thr = 4.5f;//low value for luma sh_p to avoid artifacts
-    const float shthr = sh_thr / 24.0f;
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        int j;
-#ifdef __SSE2__
-        __m128 shfabsv, shmedv;
-        __m128 shthrv = F2V(shthr);
-        __m128 onev = F2V(1.0f);
-#endif // __SSE2__
-#ifdef _OPENMP
-        #pragma omp for private(i1,j1)
-#endif
-
-        for (int i = 0; i < height; i++) {
-            for (j = 0; j < 2; j++) {
-                float shfabs = fabs(src->sh_p[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = 0; j1 <= j + 2; j1++ ) {
-                        shmed += fabs(src->sh_p[i1][j1] - tmL[i1][j1]);
-                    }
-
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
-            }
-
-#ifdef __SSE2__
-
-            for (; j < width - 5; j += 4) {
-                shfabsv = vabsf(LVFU(src->sh_p[i][j]) - LVFU(tmL[i][j]));
-                shmedv = ZEROV;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        shmedv += vabsf(LVFU(src->sh_p[i1][j1]) - LVFU(tmL[i1][j1]));
-                    }
-
-                STVFU(badpix[i * width + j], vself(vmaskf_gt(shfabsv, (shmedv - shfabsv)*shthrv), onev, ZEROV));
-            }
-
-            for (; j < width - 2; j++) {
-                float shfabs = fabs(src->sh_p[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        shmed += fabs(src->sh_p[i1][j1] - tmL[i1][j1]);
-                    }
-
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
-            }
-
-#else
-
-            for (; j < width - 2; j++) {
-                float shfabs = fabs(src->sh_p[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        shmed += fabs(src->sh_p[i1][j1] - tmL[i1][j1]);
-                    }
-
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
-            }
-
-#endif
-
-            for (; j < width; j++) {
-                float shfabs = fabs(src->sh_p[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 < width; j1++ ) {
-                        shmed += fabs(src->sh_p[i1][j1] - tmL[i1][j1]);
-                    }
-
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
-            }
-        }
-    }
-
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        int j;
-#ifdef _OPENMP
-        #pragma omp for private(i1,j1) schedule(dynamic,16)
-#endif
-
-        for (int i = 0; i < height; i++) {
-            for (j = 0; j < 2; j++) {
-                if (!badpix[i * width + j]) {
-                    continue;
-                }
-
-                float norm = 0.0f;
-                float shsum = 0.0f;
-                float sum = 0.0f;
-                int tot = 0;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = 0; j1 <= j + 2; j1++ ) {
-                        if (i1 == i && j1 == j) {
-                            continue;
-                        }
-
-                        if (badpix[i1 * width + j1]) {
-                            continue;
-                        }
-
-                        sum += src->sh_p[i1][j1];
-                        tot++;
-                        float dirsh = 1.f / (SQR(src->sh_p[i1][j1] - src->sh_p[i][j]) + eps);
-                        shsum += dirsh * src->sh_p[i1][j1];
-                        norm += dirsh;
-                    }
-
-                if (norm > 0.f) {
-                    src->sh_p[i][j] = shsum / norm;
-                } else {
-                    if(tot > 0) {
-                        src->sh_p[i][j] = sum / tot;
-                    }
-                }
-            }
-
-            for (; j < width - 2; j++) {
-                if (!badpix[i * width + j]) {
-                    continue;
-                }
-
-                float norm = 0.0f;
-                float shsum = 0.0f;
-                float sum = 0.0f;
-                int tot = 0;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        if (i1 == i && j1 == j) {
-                            continue;
-                        }
-
-                        if (badpix[i1 * width + j1]) {
-                            continue;
-                        }
-
-                        sum += src->sh_p[i1][j1];
-                        tot++;
-                        float dirsh = 1.f / (SQR(src->sh_p[i1][j1] - src->sh_p[i][j]) + eps);
-                        shsum += dirsh * src->sh_p[i1][j1];
-                        norm += dirsh;
-                    }
-
-                if (norm > 0.f) {
-                    src->sh_p[i][j] = shsum / norm;
-                } else {
-                    if(tot > 0) {
-                        src->sh_p[i][j] = sum / tot;
-                    }
-                }
-            }
-
-            for (; j < width; j++) {
-                if (!badpix[i * width + j]) {
-                    continue;
-                }
-
-                float norm = 0.0f;
-                float shsum = 0.0f;
-                float sum = 0.0f;
-                int tot = 0;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 < width; j1++ ) {
-                        if (i1 == i && j1 == j) {
-                            continue;
-                        }
-
-                        if (badpix[i1 * width + j1]) {
-                            continue;
-                        }
-
-                        sum += src->sh_p[i1][j1];
-                        tot++;
-                        float dirsh = 1.f / (SQR(src->sh_p[i1][j1] - src->sh_p[i][j]) + eps);
-                        shsum += dirsh * src->sh_p[i1][j1];
-                        norm += dirsh;
-                    }
-
-                if (norm > 0.f) {
-                    src->sh_p[i][j] = shsum / norm;
-                } else {
-                    if(tot > 0) {
-                        src->sh_p[i][j] = sum / tot;
-                    }
-                }
-            }
-        }
-    }
-// end luma badpixels
-
-
-// begin chroma badpixels
-    double chrommed = 0.0; // use double precision for large summations
-#ifdef _OPENMP
-    #pragma omp parallel for reduction(+:chrommed)
-#endif
-
-    for(int i = 0; i < height; i++ ) {
-        for(int j = 0; j < width; j++) {
-            float chroma = SQR(sraa[i][j] - tmaa[i][j]) + SQR(srbb[i][j] - tmbb[i][j]);
-            chrommed += chroma;
-            badpix[i * width + j] = chroma;
-        }
-    }
-
-    chrommed /= (height * width);
-    float threshfactor = (thresh * chrommed) / 33.f;
-
-// now chrommed is calculated, so we postprocess badpix to reduce the number of divisions in future
-#ifdef __SSE2__
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        int j;
-        __m128 sumv = F2V( chrommed + eps2 );
-        __m128 onev = F2V( 1.0f );
-#ifdef _OPENMP
-        #pragma omp for
-#endif
-
-        for(int i = 0; i < height; i++) {
-            for(j = 0; j < width - 3; j += 4) {
-                STVFU(badpix[i * width + j], onev / (LVFU(badpix[i * width + j]) + sumv));
-            }
-
-            for(; j < width; j++) {
-                badpix[i * width + j] = 1.f / (badpix[i * width + j] + chrommed + eps2);
-            }
-        }
-    }
-#else
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-
-    for(int i = 0; i < height; i++)
-        for(int j = 0; j < width; j++) {
-            badpix[i * width + j] = 1.f / (badpix[i * width + j] + chrommed + eps2);
-        }
-
-#endif
-
-    // because we changed the values of badpix we also have to recalculate threshfactor
-    threshfactor = 1.0f / (threshfactor + chrommed + eps2);
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        int j;
-#ifdef _OPENMP
-        #pragma omp for schedule(dynamic,16)
-#endif
-
-        for(int i = 0; i < height; i++ ) {
-            for(j = 0; j < halfwin; j++) {
-                tmaa[i][j] = sraa[i][j];
-                tmbb[i][j] = srbb[i][j];
-
-                if (badpix[i * width + j] < threshfactor) {
-                    float atot = 0.f;
-                    float btot = 0.f;
-                    float norm = 0.f;
-                    float wt;
-
-                    for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
+            int j = 0;
+            for (; j < halfwin - 1; j++) {
+                if (fringe[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
                         for (int j1 = 0; j1 < j + halfwin; j1++) {
-                            wt = badpix[i1 * width + j1];
+                            // neighbourhood average of pixels weighted by chrominance
+                            const float wt = fringe[i1 * width + j1];
                             atot += wt * sraa[i1][j1];
                             btot += wt * srbb[i1][j1];
                             norm += wt;
                         }
-
-                    if(norm > 0.f) {
-                        tmaa[i][j] = (atot / norm);
-                        tmbb[i][j] = (btot / norm);
-                    }
-                }
-            }
-
-            for(; j < width - halfwin; j++) {
-                tmaa[i][j] = sraa[i][j];
-                tmbb[i][j] = srbb[i][j];
-
-                if (badpix[i * width + j] < threshfactor) {
-                    float atot = 0.f;
-                    float btot = 0.f;
-                    float norm = 0.f;
-                    float wt;
-
-                    for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                        for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
-                            wt = badpix[i1 * width + j1];
-                            atot += wt * sraa[i1][j1];
-                            btot += wt * srbb[i1][j1];
-                            norm += wt;
-                        }
-
-                    if(norm > 0.f) {
-                        tmaa[i][j] = (atot / norm);
-                        tmbb[i][j] = (btot / norm);
-                    }
-                }
-            }
-
-            for(; j < width; j++) {
-                tmaa[i][j] = sraa[i][j];
-                tmbb[i][j] = srbb[i][j];
-
-                if (badpix[i * width + j] < threshfactor) {
-                    float atot = 0.f;
-                    float btot = 0.f;
-                    float norm = 0.f;
-                    float wt;
-
-                    for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                        for (int j1 = j - halfwin + 1; j1 < width; j1++) {
-                            wt = badpix[i1 * width + j1];
-                            atot += wt * sraa[i1][j1];
-                            btot += wt * srbb[i1][j1];
-                            norm += wt;
-                        }
-
-                    if(norm > 0.f) {
-                        tmaa[i][j] = (atot / norm);
-                        tmbb[i][j] = (btot / norm);
-                    }
-                }
-            }
-        }
-    }
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-#ifdef _OPENMP
-        #pragma omp for
-#endif
-
-        for(int i = 0; i < height; i++ ) {
-            for(int j = 0; j < width; j++) {
-                float intera = tmaa[i][j];
-                float interb = tmbb[i][j];
-                float CC = sqrt(SQR(interb) + SQR(intera));
-
-                if(hotbad == 0) {
-                    if(CC < chrom && skinprot != 0.f) {
-                        dst->h_p[i][j] = (xatan2f(interb, intera)) / piid;
-                        dst->C_p[i][j] = sqrt(SQR(interb) + SQR(intera));
                     }
+                    tmaa[i][j] = atot / norm;
+                    tmbb[i][j] = btot / norm;
                 } else {
-                    dst->h_p[i][j] = (xatan2f(interb, intera)) / piid;
-                    dst->C_p[i][j] = sqrt(SQR(interb) + SQR(intera));
+                    tmaa[i][j] = sraa[i][j];
+                    tmbb[i][j] = srbb[i][j];
                 }
             }
-        }
-    }
 
-    if(src != dst) {
+            for (; j < width - halfwin + 1; j++) {
+                if (fringe[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                        for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
+                            // neighbourhood average of pixels weighted by chrominance
+                            const float wt = fringe[i1 * width + j1];
+                            atot += wt * sraa[i1][j1];
+                            btot += wt * srbb[i1][j1];
+                            norm += wt;
+                        }
+                    }
+                    tmaa[i][j] = atot / norm;
+                    tmbb[i][j] = btot / norm;
+                } else {
+                    tmaa[i][j] = sraa[i][j];
+                    tmbb[i][j] = srbb[i][j];
+               }
+            }
+
+            for (; j < width; j++) {
+                if (fringe[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f,  norm = 0.f;
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                        for (int j1 = j - halfwin + 1; j1 < width; j1++) {
+                            // neighbourhood average of pixels weighted by chrominance
+                            const float wt = fringe[i1 * width + j1];
+                            atot += wt * sraa[i1][j1];
+                            btot += wt * srbb[i1][j1];
+                            norm += wt;
+                        }
+                    }
+                    tmaa[i][j] = atot / norm;
+                    tmbb[i][j] = btot / norm;
+                } else {
+                    tmaa[i][j] = sraa[i][j];
+                    tmbb[i][j] = srbb[i][j];
+                }
+            }
+        } // end of ab channel averaging
 #ifdef _OPENMP
         #pragma omp parallel for
 #endif
+        for(int i = 0; i < height; i++) {
+            int j = 0;
+#ifdef __SSE2__
 
-        for(int i = 0; i < height; i++ )
-            for(int j = 0; j < width; j++) {
-                dst->sh_p[i][j] = src->sh_p[i][j];
+            for (; j < width - 3; j += 4) {
+                const vfloat interav = LVFU(tmaa[i][j]);
+                const vfloat interbv = LVFU(tmbb[i][j]);
+                STVFU(ncie->h_p[i][j], xatan2f(interbv, interav) / F2V(RT_PI_F_180));
+                STVFU(ncie->C_p[i][j], vsqrtf(SQRV(interbv) + SQRV(interav)));
             }
+#endif
+            for (; j < width; j++) {
+                const float intera = tmaa[i][j];
+                const float interb = tmbb[i][j];
+                ncie->h_p[i][j] = xatan2f(interb, intera) / RT_PI_F_180;
+                ncie->C_p[i][j] = sqrt(SQR(interb) + SQR(intera));
+            }
+        }
     }
-
-
-    for (int i = 0; i < height; i++) {
-        delete [] sraa[i];
-    }
-
-    delete [] sraa;
-
-    for (int i = 0; i < height; i++) {
-        delete [] srbb[i];
-    }
-
-    delete [] srbb;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmaa[i];
-    }
-
-    delete [] tmaa;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmbb[i];
-    }
-
-    delete [] tmbb;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmL[i];
-    }
-
-    delete [] tmL;
-
-    free(badpix);
-
-    t2.set();
-
-    if( settings->verbose ) {
-        printf("Ciecam badpixels:- %d usec\n", t2.etime(t1));
-    }
-
-
 }
 
-void ImProcFunctions::BadpixelsLab(LabImage * src, LabImage * dst, double radius, int thresh, int mode, float skinprot, float chrom)
+// CIECAM02 hot/bad pixel filter
+void ImProcFunctions::Badpixelscam(CieImage * ncie, double radius, int thresh, int mode, float chrom, bool hotbad)
 {
-    const int halfwin = ceil(2 * radius) + 1;
-    MyTime t1, t2;
-    t1.set();
-
-    const int width = src->W, height = src->H;
-
-    int i1, j1;
-    const float eps = 1.0f;
-    const float eps2 = 0.01f;
-
-    float** sraa;
-    sraa = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        sraa[i] = new float[width];
+    BENCHFUN
+    if (mode == 2 && radius < 0.25) { // for gauss sigma less than 0.25 gaussianblur() just calls memcpy => nothing to do here
+        return;
     }
 
-    float** srbb;
-    srbb = new float*[height];
+    const int width = ncie->W, height = ncie->H;
 
-    for (int i = 0; i < height; i++) {
-        srbb[i] = new float[width];
-    }
+    constexpr float eps = 1.f;
 
-    float** tmaa;
-    tmaa = new float*[height];
+    JaggedArray<float> tmL(width, height);
 
-    for (int i = 0; i < height; i++) {
-        tmaa[i] = new float[width];
-    }
-
-    float** tmbb;
-    tmbb = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        tmbb[i] = new float[width];
-    }
-
-    float* badpix = (float*)malloc(width * height * sizeof(float));
-
-    float** tmL;
-    tmL = new float*[height];
-
-    for (int i = 0; i < height; i++) {
-        tmL[i] = new float[width];
-    }
+    const std::unique_ptr<float[]> badpix(new float[width * height]);
 
+    if (radius >= 0.5) { // for gauss sigma less than 0.25 gaussianblur() just calls memcpy => nothing to do here
+        // luma badpixels
+        // for bad pixels in sh channel we need 0 / != 0 information. Use 1 byte per pixel instead of 4 to reduce memory pressure
+        uint8_t *badpixb = reinterpret_cast<uint8_t*>(badpix.get());
+        constexpr float sh_thr = 4.5f; // low value for luma sh_p to avoid artifacts
+        constexpr float shthr = sh_thr / 24.0f; // divide by 24 because we are using a 5x5 grid and centre point is excluded from summation
 
 #ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-//  float2 sincosval;
-#ifdef __SSE2__
-        int j;
-//  vfloat2 sincosvalv;
-//  __m128 piidv = F2V(piid);
-#endif // __SSE2__
-#ifdef _OPENMP
-        #pragma omp for
-#endif
-
-        for (int i = 0; i < height; i++) {
-#ifdef __SSE2__
-
-            for (j = 0; j < width - 3; j += 4) {
-                STVFU(sraa[i][j], LVFU(src->a[i][j]));
-                STVFU(srbb[i][j], LVFU(src->b[i][j]));
-            }
-
-            for (; j < width; j++) {
-                sraa[i][j] = src->a[i][j];
-                srbb[i][j] = src->b[i][j];
-            }
-
-#else
-
-            for (int j = 0; j < width; j++) {
-                sraa[i][j] = src->a[i][j];
-                srbb[i][j] = src->b[i][j];
-            }
-
-#endif
-        }
-    }
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        //chroma a and b
-        if(mode >= 2) { //choice of gaussian blur
-            gaussianBlur (sraa, tmaa, src->W, src->H, radius);
-            gaussianBlur (srbb, tmbb, src->W, src->H, radius);
-        }
-
-        //luma sh_p
-        gaussianBlur (src->L, tmL, src->W, src->H, 2.0);//low value to avoid artifacts
-    }
-
-    if(mode == 1) { //choice of median
         #pragma omp parallel
+#endif
         {
-            int ip, in, jp, jn;
-            #pragma omp for nowait  //nowait because next loop inside this parallel region is independent on this one
-
-            for (int i = 0; i < height; i++) {
-                if (i < 2) {
-                    ip = i + 2;
-                } else {
-                    ip = i - 2;
-                }
-
-                if (i > height - 3) {
-                    in = i - 2;
-                } else {
-                    in = i + 2;
-                }
-
-                for (int j = 0; j < width; j++) {
-                    if (j < 2) {
-                        jp = j + 2;
-                    } else {
-                        jp = j - 2;
-                    }
-
-                    if (j > width - 3) {
-                        jn = j - 2;
-                    } else {
-                        jn = j + 2;
-                    }
-
-                    tmaa[i][j] = median(sraa[ip][jp], sraa[ip][j], sraa[ip][jn], sraa[i][jp], sraa[i][j], sraa[i][jn], sraa[in][jp], sraa[in][j], sraa[in][jn]);
-                }
-            }
+            //luma sh_p
+            gaussianBlur(ncie->sh_p, tmL, width, height, radius / 2.0); // low value to avoid artifacts
 
+#ifdef __SSE2__
+            const vfloat shthrv = F2V(shthr);
+#endif
+#ifdef _OPENMP
             #pragma omp for
+#endif
 
             for (int i = 0; i < height; i++) {
-                if (i < 2) {
-                    ip = i + 2;
-                } else {
-                    ip = i - 2;
-                }
+                int j = 0;
+                for (; j < 2; j++) {
+                    const float shfabs = std::fabs(ncie->sh_p[i][j] - tmL[i][j]);
+                    float shmed = 0.f;
 
-                if (i > height - 3) {
-                    in = i - 2;
-                } else {
-                    in = i + 2;
-                }
-
-                for (int j = 0; j < width; j++) {
-                    if (j < 2) {
-                        jp = j + 2;
-                    } else {
-                        jp = j - 2;
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = 0; j1 <= j + 2; j1++) {
+                            shmed += std::fabs(ncie->sh_p[i1][j1] - tmL[i1][j1]);
+                        }
                     }
 
-                    if (j > width - 3) {
-                        jn = j - 2;
-                    } else {
-                        jn = j + 2;
-                    }
-
-                    tmbb[i][j] = median(srbb[ip][jp], srbb[ip][j], srbb[ip][jn], srbb[i][jp], srbb[i][j], srbb[i][jn], srbb[in][jp], srbb[in][j], srbb[in][jn]);
+                    badpixb[i * width + j] = shfabs > ((shmed - shfabs) * shthr);
                 }
-            }
-        }
-    }
-
-//luma badpixels
-    const float sh_thr = 4.5f;//low value for luma sh_p to avoid artifacts
-    const float shthr = sh_thr / 24.0f;
-
-#ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        int j;
-#ifdef __SSE2__
-        __m128 shfabsv, shmedv;
-        __m128 shthrv = F2V(shthr);
-        __m128 onev = F2V(1.0f);
-#endif // __SSE2__
-#ifdef _OPENMP
-        #pragma omp for private(i1,j1)
-#endif
-
-        for (int i = 0; i < height; i++) {
-            for (j = 0; j < 2; j++) {
-                float shfabs = fabs(src->L[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = 0; j1 <= j + 2; j1++ ) {
-                        shmed += fabs(src->L[i1][j1] - tmL[i1][j1]);
-                    }
-
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
-            }
 
 #ifdef __SSE2__
 
-            for (; j < width - 5; j += 4) {
-                shfabsv = vabsf(LVFU(src->L[i][j]) - LVFU(tmL[i][j]));
-                shmedv = ZEROV;
+                for (; j < width - 5; j += 4) {
+                    const vfloat shfabsv = vabsf(LVFU(ncie->sh_p[i][j]) - LVFU(tmL[i][j]));
+                    vfloat shmedv = ZEROV;
 
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        shmedv += vabsf(LVFU(src->L[i1][j1]) - LVFU(tmL[i1][j1]));
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 <= j + 2; j1++) {
+                            shmedv += vabsf(LVFU(ncie->sh_p[i1][j1]) - LVFU(tmL[i1][j1]));
+                        }
                     }
 
-                STVFU(badpix[i * width + j], vself(vmaskf_gt(shfabsv, (shmedv - shfabsv)*shthrv), onev, ZEROV));
-            }
-
-            for (; j < width - 2; j++) {
-                float shfabs = fabs(src->L[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        shmed += fabs(src->L[i1][j1] - tmL[i1][j1]);
-                    }
-
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
-            }
-
-#else
-
-            for (; j < width - 2; j++) {
-                float shfabs = fabs(src->L[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        shmed += fabs(src->L[i1][j1] - tmL[i1][j1]);
-                    }
-
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
-            }
-
+                    uint8_t mask = _mm_movemask_ps((vfloat)vmaskf_gt(shfabsv, (shmedv - shfabsv) * shthrv));
+                    badpixb[i * width + j] = mask & 1;
+                    badpixb[i * width + j + 1] = mask & 2;
+                    badpixb[i * width + j + 2] = mask & 4;
+                    badpixb[i * width + j + 3] = mask & 8;
+                }
 #endif
+                for (; j < width - 2; j++) {
+                    const float shfabs = std::fabs(ncie->sh_p[i][j] - tmL[i][j]);
+                    float shmed = 0.f;
 
-            for (; j < width; j++) {
-                float shfabs = fabs(src->L[i][j] - tmL[i][j]);
-                float shmed = 0.0f;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 < width; j1++ ) {
-                        shmed += fabs(src->L[i1][j1] - tmL[i1][j1]);
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 <= j + 2; j1++) {
+                            shmed += std::fabs(ncie->sh_p[i1][j1] - tmL[i1][j1]);
+                        }
                     }
 
-                badpix[i * width + j] = (shfabs > ((shmed - shfabs) * shthr));
+                    badpixb[i * width + j] = shfabs > ((shmed - shfabs) * shthr);
+                }
+
+                for (; j < width; j++) {
+                    const float shfabs = std::fabs(ncie->sh_p[i][j] - tmL[i][j]);
+                    float shmed = 0.f;
+
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 < width; j1++) {
+                            shmed += std::fabs(ncie->sh_p[i1][j1] - tmL[i1][j1]);
+                        }
+                    }
+
+                    badpixb[i * width + j] = shfabs > ((shmed - shfabs) * shthr);
+                }
             }
         }
-    }
-
 
 #ifdef _OPENMP
-    #pragma omp parallel
-#endif
-    {
-        int j;
-#ifdef _OPENMP
-        #pragma omp for private(i1,j1) schedule(dynamic,16)
+        #pragma omp parallel for schedule(dynamic,16)
 #endif
 
         for (int i = 0; i < height; i++) {
-            for (j = 0; j < 2; j++) {
-                if (!badpix[i * width + j]) {
-                    continue;
-                }
+            int j = 0;
+            for (; j < 2; j++) {
+                if (badpixb[i * width + j]) {
+                    float norm = 0.f, shsum = 0.f, sum = 0.f, tot = 0.f;
 
-                float norm = 0.0f;
-                float shsum = 0.0f;
-                float sum = 0.0f;
-                int tot = 0;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = 0; j1 <= j + 2; j1++ ) {
-                        if (i1 == i && j1 == j) {
-                            continue;
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = 0; j1 <= j + 2; j1++) {
+                            if (!badpixb[i1 * width + j1]) {
+                                sum += ncie->sh_p[i1][j1];
+                                tot += 1.f;
+                                const float dirsh = 1.f / (SQR(ncie->sh_p[i1][j1] - ncie->sh_p[i][j]) + eps);
+                                shsum += dirsh * ncie->sh_p[i1][j1];
+                                norm += dirsh;
+                            }
                         }
-
-                        if (badpix[i1 * width + j1]) {
-                            continue;
-                        }
-
-                        sum += src->L[i1][j1];
-                        tot++;
-                        float dirsh = 1.f / (SQR(src->L[i1][j1] - src->L[i][j]) + eps);
-                        shsum += dirsh * src->L[i1][j1];
-                        norm += dirsh;
                     }
-
-                if (norm > 0.f) {
-                    src->L[i][j] = shsum / norm;
-                } else {
-                    if(tot > 0) {
-                        src->L[i][j] = sum / tot;
+                    if (norm > 0.f) {
+                        ncie->sh_p[i][j] = shsum / norm;
+                    } else if (tot > 0.f) {
+                        ncie->sh_p[i][j] = sum / tot;
                     }
                 }
             }
 
             for (; j < width - 2; j++) {
-                if (!badpix[i * width + j]) {
-                    continue;
-                }
+                if (badpixb[i * width + j]) {
+                    float norm = 0.f, shsum = 0.f, sum = 0.f, tot = 0.f;
 
-                float norm = 0.0f;
-                float shsum = 0.0f;
-                float sum = 0.0f;
-                int tot = 0;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 <= j + 2; j1++ ) {
-                        if (i1 == i && j1 == j) {
-                            continue;
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 <= j + 2; j1++) {
+                            if (!badpixb[i1 * width + j1]) {
+                                sum += ncie->sh_p[i1][j1];
+                                tot += 1.f;
+                                const float dirsh = 1.f / (SQR(ncie->sh_p[i1][j1] - ncie->sh_p[i][j]) + eps);
+                                shsum += dirsh * ncie->sh_p[i1][j1];
+                                norm += dirsh;
+                            }
                         }
-
-                        if (badpix[i1 * width + j1]) {
-                            continue;
-                        }
-
-                        sum += src->L[i1][j1];
-                        tot++;
-                        float dirsh = 1.f / (SQR(src->L[i1][j1] - src->L[i][j]) + eps);
-                        shsum += dirsh * src->L[i1][j1];
-                        norm += dirsh;
                     }
-
-                if (norm > 0.f) {
-                    src->L[i][j] = shsum / norm;
-                } else {
-                    if(tot > 0) {
-                        src->L[i][j] = sum / tot;
+                    if (norm > 0.f) {
+                        ncie->sh_p[i][j] = shsum / norm;
+                    } else if (tot > 0.f) {
+                        ncie->sh_p[i][j] = sum / tot;
                     }
                 }
             }
 
             for (; j < width; j++) {
-                if (!badpix[i * width + j]) {
-                    continue;
-                }
+                if (badpixb[i * width + j]) {
+                    float norm = 0.f, shsum = 0.f, sum = 0.f, tot = 0.f;
 
-                float norm = 0.0f;
-                float shsum = 0.0f;
-                float sum = 0.0f;
-                int tot = 0;
-
-                for (i1 = max(0, i - 2); i1 <= min(i + 2, height - 1); i1++ )
-                    for (j1 = j - 2; j1 < width; j1++ ) {
-                        if (i1 == i && j1 == j) {
-                            continue;
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 < width; j1++) {
+                            if (!badpixb[i1 * width + j1]) {
+                                sum += ncie->sh_p[i1][j1];
+                                tot += 1.f;
+                                const float dirsh = 1.f / (SQR(ncie->sh_p[i1][j1] - ncie->sh_p[i][j]) + eps);
+                                shsum += dirsh * ncie->sh_p[i1][j1];
+                                norm += dirsh;
+                            }
                         }
-
-                        if (badpix[i1 * width + j1]) {
-                            continue;
-                        }
-
-                        sum += src->L[i1][j1];
-                        tot++;
-                        float dirsh = 1.f / (SQR(src->L[i1][j1] - src->L[i][j]) + eps);
-                        shsum += dirsh * src->L[i1][j1];
-                        norm += dirsh;
                     }
-
-                if (norm > 0.f) {
-                    src->L[i][j] = shsum / norm;
-                } else {
-                    if(tot > 0) {
-                        src->L[i][j] = sum / tot;
+                    if (norm > 0.f) {
+                        ncie->sh_p[i][j] = shsum / norm;
+                    } else if (tot > 0.f) {
+                        ncie->sh_p[i][j] = sum / tot;
                     }
                 }
             }
         }
-    }
-// end luma badpixels
+    } // end luma badpixels
 
-    if(mode == 3) {
-// begin chroma badpixels
+    if (hotbad) {
+        JaggedArray<float> sraa(width, height);
+        JaggedArray<float> srbb(width, height);
+
+#ifdef _OPENMP
+        #pragma omp parallel
+#endif
+        {
+
+#ifdef __SSE2__
+            const vfloat piDiv180v = F2V(RT_PI_F_180);
+#endif
+#ifdef _OPENMP
+            #pragma omp for
+#endif
+
+            for (int i = 0; i < height; i++) {
+                int j = 0;
+#ifdef __SSE2__
+
+                for (; j < width - 3; j += 4) {
+                    const vfloat2 sincosvalv = xsincosf(piDiv180v * LVFU(ncie->h_p[i][j]));
+                    STVFU(sraa[i][j], LVFU(ncie->C_p[i][j])*sincosvalv.y);
+                    STVFU(srbb[i][j], LVFU(ncie->C_p[i][j])*sincosvalv.x);
+                }
+#endif
+                for (; j < width; j++) {
+                    const float2 sincosval = xsincosf(RT_PI_F_180 * ncie->h_p[i][j]);
+                    sraa[i][j] = ncie->C_p[i][j] * sincosval.y;
+                    srbb[i][j] = ncie->C_p[i][j] * sincosval.x;
+                }
+            }
+        }
+
+        float** const tmaa = tmL; // reuse tmL buffer
+        JaggedArray<float> tmbb(width, height);
+
+        if (mode == 2) { // choice of gaussian blur
+#ifdef _OPENMP
+            #pragma omp parallel
+#endif
+            {
+                //chroma a and b
+                gaussianBlur(sraa, tmaa, width, height, radius);
+                gaussianBlur(srbb, tmbb, width, height, radius);
+            }
+
+        } else if (mode == 1) { // choice of median
+#ifdef _OPENMP
+            #pragma omp parallel
+#endif
+            {
+#ifdef _OPENMP
+                #pragma omp for nowait // nowait because next loop inside this parallel region is independent on this one
+#endif
+
+                for (int i = 0; i < height; i++) {
+                    const int ip = i < 2 ? i + 2 : i - 2;
+                    const int in = i > height - 3 ? i - 2 : i + 2;
+
+                    for (int j = 0; j < width; j++) {
+                        const int jp = j < 2 ? j + 2 : j -2;
+                        const int jn = j > width - 3 ? j - 2 : j + 2;
+
+                        tmaa[i][j] = median(sraa[ip][jp], sraa[ip][j], sraa[ip][jn], sraa[i][jp], sraa[i][j], sraa[i][jn], sraa[in][jp], sraa[in][j], sraa[in][jn]);
+                    }
+                }
+
+#ifdef _OPENMP
+                #pragma omp for
+#endif
+                for (int i = 0; i < height; i++) {
+                    const int ip = i < 2 ? i + 2 : i - 2;
+                    const int in = i > height - 3 ? i - 2 : i + 2;
+
+                    for (int j = 0; j < width; j++) {
+                        const int jp = j < 2 ? j + 2 : j -2;
+                        const int jn = j > width - 3 ? j - 2 : j + 2;
+
+                        tmbb[i][j] = median(srbb[ip][jp], srbb[ip][j], srbb[ip][jn], srbb[i][jp], srbb[i][j], srbb[i][jn], srbb[in][jp], srbb[in][j], srbb[in][jn]);
+                    }
+                }
+            }
+        }
+
+        // begin chroma badpixels
         double chrommed = 0.0; // use double precision for large summations
 #ifdef _OPENMP
         #pragma omp parallel for reduction(+:chrommed)
 #endif
 
-        for(int i = 0; i < height; i++ ) {
-            for(int j = 0; j < width; j++) {
-                float chroma = SQR(sraa[i][j] - tmaa[i][j]) + SQR(srbb[i][j] - tmbb[i][j]);
+        for (int i = 0; i < height; i++) {
+            for (int j = 0; j < width; j++) {
+                const float chroma = SQR(sraa[i][j] - tmaa[i][j]) + SQR(srbb[i][j] - tmbb[i][j]);
                 chrommed += chroma;
                 badpix[i * width + j] = chroma;
             }
         }
 
-        chrommed /= (height * width);
-        float threshfactor = (thresh * chrommed) / 33.f;
+        chrommed /= height * width;
 
-// now chrommed is calculated, so we postprocess badpix to reduce the number of divisions in future
+        if (chrommed > 0.0) {
+            // now as chrommed is calculated, we postprocess badpix to reduce the number of divisions in future
+            const float threshfactor = 1.f / ((thresh * chrommed) / 33.f + chrommed);
+            const int halfwin = std::ceil(2 * radius) + 1;
+#ifdef _OPENMP
+            #pragma omp parallel
+#endif
+            {
 #ifdef __SSE2__
+                const vfloat chrommedv = F2V(chrommed);
+                const vfloat onev = F2V(1.f);
+#endif
+#ifdef _OPENMP
+                #pragma omp for
+#endif
+
+                for (int i = 0; i < height; i++) {
+                    int j = 0;
+#ifdef __SSE2__
+                    for (; j < width - 3; j += 4) {
+                        STVFU(badpix[i * width + j], onev / (LVFU(badpix[i * width + j]) + chrommedv));
+                    }
+#endif
+                    for (; j < width; j++) {
+                        badpix[i * width + j] = 1.f / (badpix[i * width + j] + chrommed);
+                    }
+                }
+
+#ifdef _OPENMP
+                #pragma omp for schedule(dynamic,16)
+#endif
+
+                for (int i = 0; i < height; i++) {
+                    int j = 0;
+                    for (; j < halfwin; j++) {
+
+                        if (badpix[i * width + j] < threshfactor) {
+                            float atot = 0.f, btot = 0.f, norm = 0.f;
+
+                            for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                                for (int j1 = 0; j1 < j + halfwin; j1++) {
+                                    const float wt = badpix[i1 * width + j1];
+                                    atot += wt * sraa[i1][j1];
+                                    btot += wt * srbb[i1][j1];
+                                    norm += wt;
+                                }
+                            }
+                            const float intera = atot / norm;
+                            const float interb = btot / norm;
+                            const float CC = sqrt(SQR(interb) + SQR(intera));
+
+                            if (CC < chrom) {
+                                ncie->h_p[i][j] = xatan2f(interb, intera) / RT_PI_F_180;
+                                ncie->C_p[i][j] = CC;
+                            }
+                        }
+                    }
+
+#ifdef __SSE2__
+                    const vfloat threshfactorv = F2V(threshfactor);
+                    const vfloat chromv = F2V(chrom);
+                    const vfloat piDiv180v = F2V(RT_PI_F_180);
+                    for (; j < width - halfwin - 3; j+=4) {
+
+                        vmask selMask = vmaskf_lt(LVFU(badpix[i * width + j]), threshfactorv);
+                        if (_mm_movemask_ps((vfloat)selMask)) {
+                            vfloat atotv = ZEROV, btotv = ZEROV, normv = ZEROV;
+
+                            for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                                for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
+                                    const vfloat wtv = LVFU(badpix[i1 * width + j1]);
+                                    atotv += wtv * LVFU(sraa[i1][j1]);
+                                    btotv += wtv * LVFU(srbb[i1][j1]);
+                                    normv += wtv;
+                                }
+                            }
+                            const vfloat interav = atotv / normv;
+                            const vfloat interbv = btotv / normv;
+                            const vfloat CCv = vsqrtf(SQRV(interbv) + SQRV(interav));
+
+                            selMask = vandm(selMask, vmaskf_lt(CCv, chromv));
+                            if (_mm_movemask_ps((vfloat)selMask)) {
+                                STVFU(ncie->h_p[i][j], vself(selMask, xatan2f(interbv, interav) / piDiv180v, LVFU(ncie->h_p[i][j])));
+                                STVFU(ncie->C_p[i][j], vself(selMask, CCv, LVFU(ncie->C_p[i][j])));
+                            }
+                        }
+                    }
+#endif
+                    for (; j < width - halfwin; j++) {
+
+                        if (badpix[i * width + j] < threshfactor) {
+                            float atot = 0.f, btot = 0.f, norm = 0.f;
+
+                            for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                                for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
+                                    const float wt = badpix[i1 * width + j1];
+                                    atot += wt * sraa[i1][j1];
+                                    btot += wt * srbb[i1][j1];
+                                    norm += wt;
+                                }
+                            }
+                            const float intera = atot / norm;
+                            const float interb = btot / norm;
+                            const float CC = sqrt(SQR(interb) + SQR(intera));
+
+                            if (CC < chrom) {
+                                ncie->h_p[i][j] = xatan2f(interb, intera) / RT_PI_F_180;
+                                ncie->C_p[i][j] = CC;
+                            }
+                        }
+                    }
+
+                    for (; j < width; j++) {
+
+                        if (badpix[i * width + j] < threshfactor) {
+                            float atot = 0.f, btot = 0.f, norm = 0.f;
+
+                            for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                                for (int j1 = j - halfwin + 1; j1 < width; j1++) {
+                                    const float wt = badpix[i1 * width + j1];
+                                    atot += wt * sraa[i1][j1];
+                                    btot += wt * srbb[i1][j1];
+                                    norm += wt;
+                                }
+                            }
+                            const float intera = atot / norm;
+                            const float interb = btot / norm;
+                            const float CC = sqrt(SQR(interb) + SQR(intera));
+
+                            if (CC < chrom) {
+                                ncie->h_p[i][j] = xatan2f(interb, intera) / RT_PI_F_180;
+                                ncie->C_p[i][j] = CC;
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+
+// CbDL reduce artifacts
+void ImProcFunctions::BadpixelsLab(LabImage * lab, double radius, int thresh, float chrom)
+{
+    BENCHFUN
+
+    if (radius < 0.25) { // for gauss sigma less than 0.25 gaussianblur() just calls memcpy => nothing to do here
+        return;
+    }
+
+    const int halfwin = std::ceil(2 * radius) + 1;
+
+    const int width = lab->W, height = lab->H;
+
+    constexpr float eps = 1.f;
+
+    JaggedArray<float> tmL(width, height);
+
+    const std::unique_ptr<float[]> badpix(new float[width * height]);
+
+    if (radius >= 0.5) { // for gauss sigma less than 0.25 gaussianblur() just calls memcpy => nothing to do here
+        //luma badpixels
+        // for bad pixels in L channel we need 0 / != 0 information. Use 1 byte per pixel instead of 4 to reduce memory pressure
+        uint8_t *badpixb = reinterpret_cast<uint8_t*>(badpix.get());
+        constexpr float sh_thr = 4.5f; // low value for luma L to avoid artifacts
+        constexpr float shthr = sh_thr / 24.0f; // divide by 24 because we are using a 5x5 grid and centre point is excluded from summation
+
 #ifdef _OPENMP
         #pragma omp parallel
 #endif
         {
-            int j;
-            __m128 sumv = F2V( chrommed + eps2 );
-            __m128 onev = F2V( 1.0f );
+            // blur L channel
+            gaussianBlur(lab->L, tmL, width, height, radius / 2.0); // low value to avoid artifacts
+
+#ifdef __SSE2__
+            const vfloat shthrv = F2V(shthr);
+#endif
 #ifdef _OPENMP
             #pragma omp for
 #endif
 
-            for(int i = 0; i < height; i++) {
-                for(j = 0; j < width - 3; j += 4) {
-                    STVFU(badpix[i * width + j], onev / (LVFU(badpix[i * width + j]) + sumv));
-                }
+            for (int i = 0; i < height; i++) {
+                int j = 0;
+                for (; j < 2; j++) {
+                    const float shfabs = std::fabs(lab->L[i][j] - tmL[i][j]);
+                    float shmed = 0.f;
 
-                for(; j < width; j++) {
-                    badpix[i * width + j] = 1.f / (badpix[i * width + j] + chrommed + eps2);
-                }
-            }
-        }
-#else
-#ifdef _OPENMP
-        #pragma omp parallel for
-#endif
-
-        for(int i = 0; i < height; i++)
-            for(int j = 0; j < width; j++) {
-                badpix[i * width + j] = 1.f / (badpix[i * width + j] + chrommed + eps2);
-            }
-
-#endif
-
-        // because we changed the values of badpix we also have to recalculate threshfactor
-        threshfactor = 1.0f / (threshfactor + chrommed + eps2);
-
-#ifdef _OPENMP
-        #pragma omp parallel
-#endif
-        {
-            int j;
-#ifdef _OPENMP
-            #pragma omp for schedule(dynamic,16)
-#endif
-
-            for(int i = 0; i < height; i++ ) {
-                for(j = 0; j < halfwin; j++) {
-                    tmaa[i][j] = sraa[i][j];
-                    tmbb[i][j] = srbb[i][j];
-
-                    if (badpix[i * width + j] < threshfactor) {
-                        float atot = 0.f;
-                        float btot = 0.f;
-                        float norm = 0.f;
-                        float wt;
-
-                        for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                            for (int j1 = 0; j1 < j + halfwin; j1++) {
-                                wt = badpix[i1 * width + j1];
-                                atot += wt * sraa[i1][j1];
-                                btot += wt * srbb[i1][j1];
-                                norm += wt;
-                            }
-
-                        if(norm > 0.f) {
-                            tmaa[i][j] = (atot / norm);
-                            tmbb[i][j] = (btot / norm);
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = 0; j1 <= j + 2; j1++) {
+                            shmed += std::fabs(lab->L[i1][j1] - tmL[i1][j1]);
                         }
                     }
+                    badpixb[i * width + j] = shfabs > ((shmed - shfabs) * shthr);
                 }
 
-                for(; j < width - halfwin; j++) {
-                    tmaa[i][j] = sraa[i][j];
-                    tmbb[i][j] = srbb[i][j];
+#ifdef __SSE2__
 
-                    if (badpix[i * width + j] < threshfactor) {
-                        float atot = 0.f;
-                        float btot = 0.f;
-                        float norm = 0.f;
-                        float wt;
+                for (; j < width - 5; j += 4) {
+                    const vfloat shfabsv = vabsf(LVFU(lab->L[i][j]) - LVFU(tmL[i][j]));
+                    vfloat shmedv = ZEROV;
 
-                        for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                            for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
-                                wt = badpix[i1 * width + j1];
-                                atot += wt * sraa[i1][j1];
-                                btot += wt * srbb[i1][j1];
-                                norm += wt;
-                            }
-
-                        if(norm > 0.f) {
-                            tmaa[i][j] = (atot / norm);
-                            tmbb[i][j] = (btot / norm);
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 <= j + 2; j1++) {
+                            shmedv += vabsf(LVFU(lab->L[i1][j1]) - LVFU(tmL[i1][j1]));
                         }
                     }
+                    uint8_t mask = _mm_movemask_ps((vfloat)vmaskf_gt(shfabsv, (shmedv - shfabsv) * shthrv));
+                    badpixb[i * width + j] = mask & 1;
+                    badpixb[i * width + j + 1] = mask & 2;
+                    badpixb[i * width + j + 2] = mask & 4;
+                    badpixb[i * width + j + 3] = mask & 8;
+                }
+#endif
+                for (; j < width - 2; j++) {
+                    const float shfabs = std::fabs(lab->L[i][j] - tmL[i][j]);
+                    float shmed = 0.f;
+
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 <= j + 2; j1++) {
+                            shmed += std::fabs(lab->L[i1][j1] - tmL[i1][j1]);
+                        }
+                    }
+                    badpixb[i * width + j] = shfabs > ((shmed - shfabs) * shthr);
                 }
 
-                for(; j < width; j++) {
-                    tmaa[i][j] = sraa[i][j];
-                    tmbb[i][j] = srbb[i][j];
+                for (; j < width; j++) {
+                    const float shfabs = std::fabs(lab->L[i][j] - tmL[i][j]);
+                    float shmed = 0.f;
 
-                    if (badpix[i * width + j] < threshfactor) {
-                        float atot = 0.f;
-                        float btot = 0.f;
-                        float norm = 0.f;
-                        float wt;
-
-                        for (int i1 = max(0, i - halfwin + 1); i1 < min(height, i + halfwin); i1++)
-                            for (int j1 = j - halfwin + 1; j1 < width; j1++) {
-                                wt = badpix[i1 * width + j1];
-                                atot += wt * sraa[i1][j1];
-                                btot += wt * srbb[i1][j1];
-                                norm += wt;
-                            }
-
-                        if(norm > 0.f) {
-                            tmaa[i][j] = (atot / norm);
-                            tmbb[i][j] = (btot / norm);
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 < width; j1++) {
+                            shmed += std::fabs(lab->L[i1][j1] - tmL[i1][j1]);
                         }
                     }
+                    badpixb[i * width + j] = shfabs > ((shmed - shfabs) * shthr);
                 }
             }
         }
 
+#ifdef _OPENMP
+        #pragma omp for schedule(dynamic,16)
+#endif
+
+        for (int i = 0; i < height; i++) {
+            int j = 0;
+            for (; j < 2; j++) {
+                if (badpixb[i * width + j]) {
+                    float norm = 0.f, shsum = 0.f, sum = 0.f, tot = 0.f;
+
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = 0; j1 <= j + 2; j1++) {
+                            if (!badpixb[i1 * width + j1]) {
+                                sum += lab->L[i1][j1];
+                                tot += 1.f;
+                                const float dirsh = 1.f / (SQR(lab->L[i1][j1] - lab->L[i][j]) + eps);
+                                shsum += dirsh * lab->L[i1][j1];
+                                norm += dirsh;
+                            }
+                        }
+                    }
+                    if (norm > 0.f) {
+                        lab->L[i][j] = shsum / norm;
+                    } else if (tot > 0.f) {
+                        lab->L[i][j] = sum / tot;
+                    }
+                }
+            }
+
+            for (; j < width - 2; j++) {
+                if (badpixb[i * width + j]) {
+                    float norm = 0.f, shsum = 0.f, sum = 0.f, tot = 0.f;
+
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 <= j + 2; j1++) {
+                            if (!badpixb[i1 * width + j1]) {
+                                sum += lab->L[i1][j1];
+                                tot += 1.f;
+                                const float dirsh = 1.f / (SQR(lab->L[i1][j1] - lab->L[i][j]) + eps);
+                                shsum += dirsh * lab->L[i1][j1];
+                                norm += dirsh;
+                            }
+                        }
+                    }
+                    if (norm > 0.f) {
+                        lab->L[i][j] = shsum / norm;
+                    } else if (tot > 0.f) {
+                        lab->L[i][j] = sum / tot;
+                    }
+                }
+            }
+
+            for (; j < width; j++) {
+                if (badpixb[i * width + j]) {
+                    float norm = 0.f, shsum = 0.f, sum = 0.f, tot = 0.f;
+
+                    for (int i1 = std::max(0, i - 2); i1 <= std::min(i + 2, height - 1); i1++) {
+                        for (int j1 = j - 2; j1 < width; j1++) {
+                            if (!badpixb[i1 * width + j1]) {
+                                sum += lab->L[i1][j1];
+                                tot += 1.f;
+                                const float dirsh = 1.f / (SQR(lab->L[i1][j1] - lab->L[i][j]) + eps);
+                                shsum += dirsh * lab->L[i1][j1];
+                                norm += dirsh;
+                            }
+                        }
+                    }
+                    if (norm > 0.f) {
+                        lab->L[i][j] = shsum / norm;
+                    } else if (tot > 0.f) {
+                        lab->L[i][j] = sum / tot;
+                    }
+                }
+            }
+        }
+    } // end luma badpixels
+
+    float** const tmaa = tmL; // reuse tmL buffer
+    JaggedArray<float> tmbb(width, height);
+
+#ifdef _OPENMP
+    #pragma omp parallel
+#endif
+    {
+        // blur chroma a and b
+        gaussianBlur(lab->a, tmaa, width, height, radius);
+        gaussianBlur(lab->b, tmbb, width, height, radius);
+    }
+
+    // begin chroma badpixels
+    double chrommed = 0.0; // use double precision for large summations
+
+#ifdef _OPENMP
+    #pragma omp parallel for reduction(+:chrommed)
+#endif
+
+    for (int i = 0; i < height; i++) {
+        for (int j = 0; j < width; j++) {
+            const float chroma = SQR(lab->a[i][j] - tmaa[i][j]) + SQR(lab->b[i][j] - tmbb[i][j]);
+            chrommed += chroma;
+            badpix[i * width + j] = chroma;
+        }
+    }
+
+    chrommed /= height * width;
+
+    if (chrommed > 0.0) {
+    // now as chrommed is calculated, we postprocess badpix to reduce the number of divisions in future
+
 #ifdef _OPENMP
         #pragma omp parallel
 #endif
         {
+#ifdef __SSE2__
+            const vfloat chrommedv = F2V(chrommed);
+            const vfloat onev = F2V(1.f);
+#endif
 #ifdef _OPENMP
             #pragma omp for
 #endif
 
-            for(int i = 0; i < height; i++ ) {
-                for(int j = 0; j < width; j++) {
-                    float intera = tmaa[i][j];
-                    float interb = tmbb[i][j];
-                    float CC = sqrt(SQR(interb / 327.68) + SQR(intera / 327.68f));
+            for (int i = 0; i < height; i++) {
+                int j = 0;
+#ifdef __SSE2__
+                for (; j < width - 3; j += 4) {
+                    STVFU(badpix[i * width + j], onev / (LVFU(badpix[i * width + j]) + chrommedv));
+                }
+#endif
+                for (; j < width; j++) {
+                    badpix[i * width + j] = 1.f / (badpix[i * width + j] + chrommed);
+                }
+            }
+        }
 
-                    if(CC < chrom && skinprot != 0.f) {
-                        dst->a[i][j] = intera;
-                        dst->b[i][j] = interb;
+        const float threshfactor = 1.f / ((thresh * chrommed) / 33.f + chrommed);
+
+        chrom *= 327.68f;
+        chrom *= chrom;
+
+#ifdef _OPENMP
+        #pragma omp parallel for schedule(dynamic,16)
+#endif
+
+        for (int i = 0; i < height; i++) {
+            int j = 0;
+            for (; j < halfwin; j++) {
+                if (badpix[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
+
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                        for (int j1 = 0; j1 < j + halfwin; j1++) {
+                            const float wt = badpix[i1 * width + j1];
+                            atot += wt * lab->a[i1][j1];
+                            btot += wt * lab->b[i1][j1];
+                            norm += wt;
+                        }
+                    }
+                    if (SQR(atot) + SQR(btot) < chrom * SQR(norm)) {
+                        lab->a[i][j] = atot / norm;
+                        lab->b[i][j] = btot / norm;
+                    }
+                }
+            }
+
+#ifdef __SSE2__
+            const vfloat chromv = F2V(chrom);
+            const vfloat threshfactorv = F2V(threshfactor);
+            for (; j < width - halfwin - 3; j += 4) {
+                vmask selMask = vmaskf_lt(LVFU(badpix[i * width + j]), threshfactorv);
+                if (_mm_movemask_ps(reinterpret_cast<vfloat>(selMask))) {
+                    vfloat atotv = ZEROV, btotv = ZEROV, normv = ZEROV;
+
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                        for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
+                            const vfloat wtv = LVFU(badpix[i1 * width + j1]);
+                            atotv += wtv * LVFU(lab->a[i1][j1]);
+                            btotv += wtv * LVFU(lab->b[i1][j1]);
+                            normv += wtv;
+                        }
+                    }
+                    selMask = vandm(selMask, vmaskf_lt(SQRV(atotv) + SQR(btotv), chromv * SQRV(normv)));
+                    if (_mm_movemask_ps(reinterpret_cast<vfloat>(selMask))) {
+                        const vfloat aOrig = LVFU(lab->a[i][j]);
+                        const vfloat bOrig = LVFU(lab->b[i][j]);
+                        STVFU(lab->a[i][j], vself(selMask, atotv / normv, aOrig));
+                        STVFU(lab->b[i][j], vself(selMask, btotv / normv, bOrig));
+                    }
+                }
+            }
+#endif
+            for (; j < width - halfwin; j++) {
+
+                if (badpix[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
+
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                        for (int j1 = j - halfwin + 1; j1 < j + halfwin; j1++) {
+                            const float wt = badpix[i1 * width + j1];
+                            atot += wt * lab->a[i1][j1];
+                            btot += wt * lab->b[i1][j1];
+                            norm += wt;
+                        }
+                    }
+                    if (SQR(atot) + SQR(btot) < chrom * SQR(norm)) {
+                        lab->a[i][j] = atot / norm;
+                        lab->b[i][j] = btot / norm;
+                    }
+                }
+            }
+
+            for (; j < width; j++) {
+
+                if (badpix[i * width + j] < threshfactor) {
+                    float atot = 0.f, btot = 0.f, norm = 0.f;
+
+                    for (int i1 = std::max(0, i - halfwin + 1); i1 < std::min(height, i + halfwin); i1++) {
+                        for (int j1 = j - halfwin + 1; j1 < width; j1++) {
+                            const float wt = badpix[i1 * width + j1];
+                            atot += wt * lab->a[i1][j1];
+                            btot += wt * lab->b[i1][j1];
+                            norm += wt;
+                        }
+                    }
+                    if (SQR(atot) + SQR(btot) < chrom * SQR(norm)) {
+                        lab->a[i][j] = atot / norm;
+                        lab->b[i][j] = btot / norm;
                     }
                 }
             }
         }
     }
-
-    if(src != dst) {
-#ifdef _OPENMP
-        #pragma omp parallel for
-#endif
-
-        for(int i = 0; i < height; i++ )
-            for(int j = 0; j < width; j++) {
-                dst->L[i][j] = src->L[i][j];
-            }
-    }
-
-
-    for (int i = 0; i < height; i++) {
-        delete [] sraa[i];
-    }
-
-    delete [] sraa;
-
-    for (int i = 0; i < height; i++) {
-        delete [] srbb[i];
-    }
-
-    delete [] srbb;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmaa[i];
-    }
-
-    delete [] tmaa;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmbb[i];
-    }
-
-    delete [] tmbb;
-
-    for (int i = 0; i < height; i++) {
-        delete [] tmL[i];
-    }
-
-    delete [] tmL;
-
-    free(badpix);
-
-    t2.set();
-
-    if( settings->verbose ) {
-        printf("Lab artifacts:- %d usec\n", t2.etime(t1));
-    }
-
-
 }
 
 }
diff --git a/rtengine/dcraw.cc b/rtengine/dcraw.cc
index 871fcf7c5..49792ada9 100644
--- a/rtengine/dcraw.cc
+++ b/rtengine/dcraw.cc
@@ -2353,30 +2353,35 @@ void CLASS unpacked_load_raw()
 // RT
 void CLASS sony_arq_load_raw()
 {
-  static unsigned frame2pos[] = { 0, 1, 3, 2 };
-  int row, col, bits=0;
-  ushort samples[4];
-  unsigned frame = frame2pos[shot_select];
+    constexpr unsigned frame2pos[] = { 0, 1, 3, 2 };
+    const unsigned frame = frame2pos[shot_select];
 
-  while (1 << ++bits < maximum);
-  for (row=0; row < ((frame < 2) ? 1 : raw_height); row++) {
-    for (col=0; col < ((row == 0) ? raw_width : 1); col++) {
-      RAW(row,col) = 0;
+    // allocate memory for a row of pixels
+    ushort *samples = new ushort[4 * raw_width];
+
+    int bits = 0;
+    while (1 << ++bits < maximum)
+        ;
+    ++bits;
+    bits = (1 << bits) - 1;
+
+    for (int row = 0; row < ((frame < 2) ? 1 : raw_height); row++) {
+        for (int col = 0; col < ((row == 0) ? raw_width : 1); col++) {
+            RAW(row, col) = 0;
+        }
     }
-  }
-  for (row=0; row < raw_height; row++) {
-    int r = row + (frame & 1);
-    for (col=0; col < raw_width; col++) {
-      int c = col + ((frame >> 1) & 1);
-      read_shorts(samples, 4);
-      if (r < raw_height && c < raw_width) {
-          RAW(r,c) = samples[(2 * (r & 1)) + (c & 1)];
-          if ((RAW(r,c) >>= load_flags) >> bits
-              && (unsigned) (row-top_margin) < height
-              && (unsigned) (col-left_margin) < width) derror();
-      }
+
+    for (int row = 0; row < raw_height; ++row) {
+        int r = row + (frame & 1);
+        read_shorts(samples, 4 * raw_width);
+        if (r < raw_height) {
+            int offset = 2 * (r & 1);
+            for (int c = (frame >> 1) & 1; c < raw_width; ++c, offset = (offset + 4) ^ 1) {
+                RAW(r, c) = samples[offset] & bits;
+            }
+        }
     }
-  }
+    delete [] samples;
 }
 
 
diff --git a/rtengine/demosaic_algos.cc b/rtengine/demosaic_algos.cc
index 368a8edde..89387a9b4 100644
--- a/rtengine/demosaic_algos.cc
+++ b/rtengine/demosaic_algos.cc
@@ -92,7 +92,7 @@ void RawImageSource::eahd_demosaic ()
 
     // end of cielab preparation
 
-    const JaggedArray<float>
+    JaggedArray<float>
     rh (W, 3), gh (W, 4), bh (W, 3),
     rv (W, 3), gv (W, 4), bv (W, 3),
     lLh (W, 3), lah (W, 3), lbh (W, 3),
@@ -497,7 +497,7 @@ void RawImageSource::hphd_demosaic ()
         plistener->setProgress (0.0);
     }
 
-    const JaggedArray<float> hpmap (W, H, true);
+    JaggedArray<float> hpmap (W, H, true);
 
 #ifdef _OPENMP
     #pragma omp parallel
@@ -3927,298 +3927,6 @@ void RawImageSource::cielab (const float (*rgb)[3], float* l, float* a, float *b
 }
 
 
-/**
-* RATIO CORRECTED DEMOSAICING
-* Luis Sanz Rodriguez (luis.sanz.rodriguez(at)gmail(dot)com)
-*
-* Release 2.3 @ 171125
-*
-* Original code from https://github.com/LuisSR/RCD-Demosaicing
-* Licensed under the GNU GPL version 3
-*/
-void RawImageSource::rcd_demosaic()
-{
-    // RT ---------------------------------------------------------------------
-    if (plistener) {
-        plistener->setProgressStr(Glib::ustring::compose(M("TP_RAW_DMETHOD_PROGRESSBAR"), "rcd"));
-        plistener->setProgress(0);
-    }
-    
-    int width = W, height = H;
-
-    std::vector<float> cfa(width * height);
-    std::vector<std::array<float, 3>> rgb(width * height);
-
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-    for (int row = 0; row < height; row++) {
-        for (int col = 0, indx = row * width + col; col < width; col++, indx++) {
-            int c = FC(row, col);
-            cfa[indx] = rgb[indx][c] = LIM01(rawData[row][col] / 65535.f);
-        }
-    }
-
-    if (plistener) {
-        plistener->setProgress(0.05);
-    }
-    // ------------------------------------------------------------------------
-/* RT    
-    int row, col, indx, c;
-*/    
-    int w1 = width, w2 = 2 * width, w3 = 3 * width, w4 = 4 * width;
-
-    //Tolerance to avoid dividing by zero
-    static const float eps = 1e-5, epssq = 1e-10;
-
-/* RT    
-    //Gradients
-    float N_Grad, E_Grad, W_Grad, S_Grad, NW_Grad, NE_Grad, SW_Grad, SE_Grad;
-
-    //Pixel estimation
-    float N_Est, E_Est, W_Est, S_Est, NW_Est, NE_Est, SW_Est, SE_Est, V_Est, H_Est, P_Est, Q_Est;
-
-    //Directional discrimination
-    //float V_Stat, H_Stat, P_Stat, Q_Stat;
-    float VH_Central_Value, VH_Neighbour_Value, PQ_Central_Value, PQ_Neighbour_Value;
-*/    
-    float *VH_Dir, *PQ_Dir;
-
-    //Low pass filter
-    float *lpf;
-
-
-    /**
-    * STEP 1: Find cardinal and diagonal interpolation directions
-    */
-
-    VH_Dir = ( float ( * ) ) calloc( width * height, sizeof *VH_Dir ); //merror ( VH_Dir, "rcd_demosaicing_171117()" );
-
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-    for (int row = 4; row < height - 4; row++ ) {
-        for (int col = 4, indx = row * width + col; col < width - 4; col++, indx++ ) {
-            //Calculate h/v local discrimination
-            float V_Stat = max(epssq, - 18.0f  *  cfa[indx] * cfa[indx - w1] - 18.0f * cfa[indx] * cfa[indx + w1] - 36.0f * cfa[indx] * cfa[indx - w2] - 36.0f * cfa[indx] * cfa[indx + w2] + 18.0f * cfa[indx] * cfa[indx - w3] + 18.0f * cfa[indx] * cfa[indx + w3] - 2.0f * cfa[indx] * cfa[indx - w4] - 2.0f * cfa[indx] * cfa[indx + w4] + 38.0f * cfa[indx] * cfa[indx] - 70.0f * cfa[indx - w1] * cfa[indx + w1] - 12.0f * cfa[indx - w1] * cfa[indx - w2] + 24.0f * cfa[indx - w1] * cfa[indx + w2] - 38.0f * cfa[indx - w1] * cfa[indx - w3] + 16.0f * cfa[indx - w1] * cfa[indx + w3] + 12.0f * cfa[indx - w1] * cfa[indx - w4] - 6.0f * cfa[indx - w1] * cfa[indx + w4] + 46.0f * cfa[indx - w1] * cfa[indx - w1] + 24.0f * cfa[indx + w1] * cfa[indx - w2] - 12.0f * cfa[indx + w1] * cfa[indx + w2] + 16.0f * cfa[indx + w1] * cfa[indx - w3] - 38.0f * cfa[indx + w1] * cfa[indx + w3] - 6.0f * cfa[indx + w1] * cfa[indx - w4] + 12.0f * cfa[indx + w1] * cfa[indx + w4] + 46.0f * cfa[indx + w1] * cfa[indx + w1] + 14.0f * cfa[indx - w2] * cfa[indx + w2] - 12.0f * cfa[indx - w2] * cfa[indx + w3] - 2.0f * cfa[indx - w2] * cfa[indx - w4] + 2.0f * cfa[indx - w2] * cfa[indx + w4] + 11.0f * cfa[indx - w2] * cfa[indx - w2] - 12.0f * cfa[indx + w2] * cfa[indx - w3] + 2.0f * cfa[indx + w2] * cfa[indx - w4] - 2.0f * cfa[indx + w2] * cfa[indx + w4] + 11.0f * cfa[indx + w2] * cfa[indx + w2] + 2.0f * cfa[indx - w3] * cfa[indx + w3] - 6.0f * cfa[indx - w3] * cfa[indx - w4] + 10.0f * cfa[indx - w3] * cfa[indx - w3] - 6.0f * cfa[indx + w3] * cfa[indx + w4] + 10.0f * cfa[indx + w3] * cfa[indx + w3] + 1.0f * cfa[indx - w4] * cfa[indx - w4] + 1.0f * cfa[indx + w4] * cfa[indx + w4]);
-            
-            float H_Stat = max(epssq, - 18.0f  *  cfa[indx] * cfa[indx -  1] - 18.0f * cfa[indx] * cfa[indx +  1] - 36.0f * cfa[indx] * cfa[indx -  2] - 36.0f * cfa[indx] * cfa[indx +  2] + 18.0f * cfa[indx] * cfa[indx -  3] + 18.0f * cfa[indx] * cfa[indx +  3] - 2.0f * cfa[indx] * cfa[indx -  4] - 2.0f * cfa[indx] * cfa[indx +  4] + 38.0f * cfa[indx] * cfa[indx] - 70.0f * cfa[indx -  1] * cfa[indx +  1] - 12.0f * cfa[indx -  1] * cfa[indx -  2] + 24.0f * cfa[indx -  1] * cfa[indx +  2] - 38.0f * cfa[indx -  1] * cfa[indx -  3] + 16.0f * cfa[indx -  1] * cfa[indx +  3] + 12.0f * cfa[indx -  1] * cfa[indx -  4] - 6.0f * cfa[indx -  1] * cfa[indx +  4] + 46.0f * cfa[indx -  1] * cfa[indx -  1] + 24.0f * cfa[indx +  1] * cfa[indx -  2] - 12.0f * cfa[indx +  1] * cfa[indx +  2] + 16.0f * cfa[indx +  1] * cfa[indx -  3] - 38.0f * cfa[indx +  1] * cfa[indx +  3] - 6.0f * cfa[indx +  1] * cfa[indx -  4] + 12.0f * cfa[indx +  1] * cfa[indx +  4] + 46.0f * cfa[indx +  1] * cfa[indx +  1] + 14.0f * cfa[indx -  2] * cfa[indx +  2] - 12.0f * cfa[indx -  2] * cfa[indx +  3] - 2.0f * cfa[indx -  2] * cfa[indx -  4] + 2.0f * cfa[indx -  2] * cfa[indx +  4] + 11.0f * cfa[indx -  2] * cfa[indx -  2] - 12.0f * cfa[indx +  2] * cfa[indx -  3] + 2.0f * cfa[indx +  2] * cfa[indx -  4] - 2.0f * cfa[indx +  2] * cfa[indx +  4] + 11.0f * cfa[indx +  2] * cfa[indx +  2] + 2.0f * cfa[indx -  3] * cfa[indx +  3] - 6.0f * cfa[indx -  3] * cfa[indx -  4] + 10.0f * cfa[indx -  3] * cfa[indx -  3] - 6.0f * cfa[indx +  3] * cfa[indx +  4] + 10.0f * cfa[indx +  3] * cfa[indx +  3] + 1.0f * cfa[indx -  4] * cfa[indx -  4] + 1.0f * cfa[indx +  4] * cfa[indx +  4]);
-
-            VH_Dir[indx] = V_Stat / (V_Stat + H_Stat);
-        }
-    }
-
-    // RT ---------------------------------------------------------------------
-    if (plistener) {
-        plistener->setProgress(0.2);
-    }
-    // -------------------------------------------------------------------------
-
-    /**
-    * STEP 2: Calculate the low pass filter
-    */
-
-    // Step 2.1: Low pass filter incorporating green, red and blue local samples from the raw data
-    lpf = ( float ( * ) ) calloc( width * height, sizeof *lpf ); //merror ( lpf, "rcd_demosaicing_171125()" );
-
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif    
-    for ( int row = 2; row < height - 2; row++ ) {
-        for ( int col = 2 + (FC( row, 0 ) & 1), indx = row * width + col; col < width - 2; col += 2, indx += 2 ) {
-
-            lpf[indx] = 0.25f * cfa[indx] + 0.125f * ( cfa[indx - w1] + cfa[indx + w1] + cfa[indx - 1] + cfa[indx + 1] ) + 0.0625f * ( cfa[indx - w1 - 1] + cfa[indx - w1 + 1] + cfa[indx + w1 - 1] + cfa[indx + w1 + 1] );
-
-        }
-    }
-
-    // RT ---------------------------------------------------------------------
-    if (plistener) {
-        plistener->setProgress(0.4);
-    }
-    // ------------------------------------------------------------------------
-
-    /**
-    * STEP 3: Populate the green channel
-    */
-
-    // Step 3.1: Populate the green channel at blue and red CFA positions
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-   for ( int row = 4; row < height - 4; row++ ) {
-       for ( int col = 4 + (FC( row, 0 ) & 1), indx = row * width + col; col < width - 4; col += 2, indx += 2 ) {
-
-            // Refined vertical and horizontal local discrimination
-            float VH_Central_Value   = VH_Dir[indx];
-            float VH_Neighbourhood_Value = 0.25f * ( VH_Dir[indx - w1 - 1] + VH_Dir[indx - w1 + 1] + VH_Dir[indx + w1 - 1] + VH_Dir[indx + w1 + 1] );
-
-            float VH_Disc = ( fabs( 0.5f - VH_Central_Value ) < fabs( 0.5f - VH_Neighbourhood_Value ) ) ? VH_Neighbourhood_Value : VH_Central_Value;
-
-            // Cardinal gradients
-            float N_Grad = eps + fabs( cfa[indx - w1] - cfa[indx + w1] ) + fabs( cfa[indx] - cfa[indx - w2] ) + fabs( cfa[indx - w1] - cfa[indx - w3] ) + fabs( cfa[indx - w2] - cfa[indx - w4] );
-            float S_Grad = eps + fabs( cfa[indx + w1] - cfa[indx - w1] ) + fabs( cfa[indx] - cfa[indx + w2] ) + fabs( cfa[indx + w1] - cfa[indx + w3] ) + fabs( cfa[indx + w2] - cfa[indx + w4] );
-            float W_Grad = eps + fabs( cfa[indx -  1] - cfa[indx +  1] ) + fabs( cfa[indx] - cfa[indx -  2] ) + fabs( cfa[indx -  1] - cfa[indx -  3] ) + fabs( cfa[indx -  2] - cfa[indx -  4] );
-            float E_Grad = eps + fabs( cfa[indx +  1] - cfa[indx -  1] ) + fabs( cfa[indx] - cfa[indx +  2] ) + fabs( cfa[indx +  1] - cfa[indx +  3] ) + fabs( cfa[indx +  2] - cfa[indx +  4] );
-
-            // Cardinal pixel estimations
-            float N_Est = cfa[indx - w1] * ( 1.f + ( lpf[indx] - lpf[indx - w2] ) / ( eps + lpf[indx] + lpf[indx - w2] ) );
-            float S_Est = cfa[indx + w1] * ( 1.f + ( lpf[indx] - lpf[indx + w2] ) / ( eps + lpf[indx] + lpf[indx + w2] ) );
-            float W_Est = cfa[indx -  1] * ( 1.f + ( lpf[indx] - lpf[indx -  2] ) / ( eps + lpf[indx] + lpf[indx -  2] ) );
-            float E_Est = cfa[indx +  1] * ( 1.f + ( lpf[indx] - lpf[indx +  2] ) / ( eps + lpf[indx] + lpf[indx +  2] ) );
-
-            // Vertical and horizontal estimations
-            float V_Est = ( S_Grad * N_Est + N_Grad * S_Est ) / max(eps, N_Grad + S_Grad );
-            float H_Est = ( W_Grad * E_Est + E_Grad * W_Est ) / max(eps, E_Grad + W_Grad );
-
-            // G@B and G@R interpolation
-            rgb[indx][1] = LIM( VH_Disc * H_Est + ( 1.f - VH_Disc ) * V_Est, 0.f, 1.f );
-
-        }
-    }
-
-    free( lpf );
-
-    // RT ---------------------------------------------------------------------
-    if (plistener) {
-        plistener->setProgress(0.5);
-    }
-    // ------------------------------------------------------------------------
-    
-    /**
-    * STEP 4: Populate the red and blue channels
-    */
-
-    // Step 4.1: Calculate P/Q diagonal local discrimination
-    PQ_Dir = ( float ( * ) ) calloc( width * height, sizeof *PQ_Dir ); //merror ( PQ_Dir, "rcd_demosaicing_171125()" );
-
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-    for ( int row = 4; row < height - 4; row++ ) {
-        for ( int col = 4 + (FC( row, 0 ) & 1), indx = row * width + col; col < width - 4; col += 2, indx += 2 ) {
-
-            float P_Stat = max( - 18.f * cfa[indx] * cfa[indx - w1 - 1] - 18.f * cfa[indx] * cfa[indx + w1 + 1] - 36.f * cfa[indx] * cfa[indx - w2 - 2] - 36.f * cfa[indx] * cfa[indx + w2 + 2] + 18.f * cfa[indx] * cfa[indx - w3 - 3] + 18.f * cfa[indx] * cfa[indx + w3 + 3] - 2.f * cfa[indx] * cfa[indx - w4 - 4] - 2.f * cfa[indx] * cfa[indx + w4 + 4] + 38.f * cfa[indx] * cfa[indx] - 70.f * cfa[indx - w1 - 1] * cfa[indx + w1 + 1] - 12.f * cfa[indx - w1 - 1] * cfa[indx - w2 - 2] + 24.f * cfa[indx - w1 - 1] * cfa[indx + w2 + 2] - 38.f * cfa[indx - w1 - 1] * cfa[indx - w3 - 3] + 16.f * cfa[indx - w1 - 1] * cfa[indx + w3 + 3] + 12.f * cfa[indx - w1 - 1] * cfa[indx - w4 - 4] - 6.f * cfa[indx - w1 - 1] * cfa[indx + w4 + 4] + 46.f * cfa[indx - w1 - 1] * cfa[indx - w1 - 1] + 24.f * cfa[indx + w1 + 1] * cfa[indx - w2 - 2] - 12.f * cfa[indx + w1 + 1] * cfa[indx + w2 + 2] + 16.f * cfa[indx + w1 + 1] * cfa[indx - w3 - 3] - 38.f * cfa[indx + w1 + 1] * cfa[indx + w3 + 3] - 6.f * cfa[indx + w1 + 1] * cfa[indx - w4 - 4] + 12.f * cfa[indx + w1 + 1] * cfa[indx + w4 + 4] + 46.f * cfa[indx + w1 + 1] * cfa[indx + w1 + 1] + 14.f * cfa[indx - w2 - 2] * cfa[indx + w2 + 2] - 12.f * cfa[indx - w2 - 2] * cfa[indx + w3 + 3] - 2.f * cfa[indx - w2 - 2] * cfa[indx - w4 - 4] + 2.f * cfa[indx - w2 - 2] * cfa[indx + w4 + 4] + 11.f * cfa[indx - w2 - 2] * cfa[indx - w2 - 2] - 12.f * cfa[indx + w2 + 2] * cfa[indx - w3 - 3] + 2 * cfa[indx + w2 + 2] * cfa[indx - w4 - 4] - 2.f * cfa[indx + w2 + 2] * cfa[indx + w4 + 4] + 11.f * cfa[indx + w2 + 2] * cfa[indx + w2 + 2] + 2.f * cfa[indx - w3 - 3] * cfa[indx + w3 + 3] - 6.f * cfa[indx - w3 - 3] * cfa[indx - w4 - 4] + 10.f * cfa[indx - w3 - 3] * cfa[indx - w3 - 3] - 6.f * cfa[indx + w3 + 3] * cfa[indx + w4 + 4] + 10.f * cfa[indx + w3 + 3] * cfa[indx + w3 + 3] + 1.f * cfa[indx - w4 - 4] * cfa[indx - w4 - 4] + 1.f * cfa[indx + w4 + 4] * cfa[indx + w4 + 4], epssq );
-            float Q_Stat = max( - 18.f * cfa[indx] * cfa[indx + w1 - 1] - 18.f * cfa[indx] * cfa[indx - w1 + 1] - 36.f * cfa[indx] * cfa[indx + w2 - 2] - 36.f * cfa[indx] * cfa[indx - w2 + 2] + 18.f * cfa[indx] * cfa[indx + w3 - 3] + 18.f * cfa[indx] * cfa[indx - w3 + 3] - 2.f * cfa[indx] * cfa[indx + w4 - 4] - 2.f * cfa[indx] * cfa[indx - w4 + 4] + 38.f * cfa[indx] * cfa[indx] - 70.f * cfa[indx + w1 - 1] * cfa[indx - w1 + 1] - 12.f * cfa[indx + w1 - 1] * cfa[indx + w2 - 2] + 24.f * cfa[indx + w1 - 1] * cfa[indx - w2 + 2] - 38.f * cfa[indx + w1 - 1] * cfa[indx + w3 - 3] + 16.f * cfa[indx + w1 - 1] * cfa[indx - w3 + 3] + 12.f * cfa[indx + w1 - 1] * cfa[indx + w4 - 4] - 6.f * cfa[indx + w1 - 1] * cfa[indx - w4 + 4] + 46.f * cfa[indx + w1 - 1] * cfa[indx + w1 - 1] + 24.f * cfa[indx - w1 + 1] * cfa[indx + w2 - 2] - 12.f * cfa[indx - w1 + 1] * cfa[indx - w2 + 2] + 16.f * cfa[indx - w1 + 1] * cfa[indx + w3 - 3] - 38.f * cfa[indx - w1 + 1] * cfa[indx - w3 + 3] - 6.f * cfa[indx - w1 + 1] * cfa[indx + w4 - 4] + 12.f * cfa[indx - w1 + 1] * cfa[indx - w4 + 4] + 46.f * cfa[indx - w1 + 1] * cfa[indx - w1 + 1] + 14.f * cfa[indx + w2 - 2] * cfa[indx - w2 + 2] - 12.f * cfa[indx + w2 - 2] * cfa[indx - w3 + 3] - 2.f * cfa[indx + w2 - 2] * cfa[indx + w4 - 4] + 2.f * cfa[indx + w2 - 2] * cfa[indx - w4 + 4] + 11.f * cfa[indx + w2 - 2] * cfa[indx + w2 - 2] - 12.f * cfa[indx - w2 + 2] * cfa[indx + w3 - 3] + 2 * cfa[indx - w2 + 2] * cfa[indx + w4 - 4] - 2.f * cfa[indx - w2 + 2] * cfa[indx - w4 + 4] + 11.f * cfa[indx - w2 + 2] * cfa[indx - w2 + 2] + 2.f * cfa[indx + w3 - 3] * cfa[indx - w3 + 3] - 6.f * cfa[indx + w3 - 3] * cfa[indx + w4 - 4] + 10.f * cfa[indx + w3 - 3] * cfa[indx + w3 - 3] - 6.f * cfa[indx - w3 + 3] * cfa[indx - w4 + 4] + 10.f * cfa[indx - w3 + 3] * cfa[indx - w3 + 3] + 1.f * cfa[indx + w4 - 4] * cfa[indx + w4 - 4] + 1.f * cfa[indx - w4 + 4] * cfa[indx - w4 + 4], epssq );
-
-            PQ_Dir[indx] = P_Stat / ( P_Stat + Q_Stat );
-
-        }
-    }
-
-    // RT ---------------------------------------------------------------------
-    if (plistener) {
-        plistener->setProgress(0.7);
-    }
-    // -------------------------------------------------------------------------
-
-    // Step 4.2: Populate the red and blue channels at blue and red CFA positions
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-    for ( int row = 4; row < height - 4; row++ ) {
-        for ( int col = 4 + (FC( row, 0 ) & 1), indx = row * width + col, c = 2 - FC( row, col ); col < width - 4; col += 2, indx += 2 ) {
-
-            // Refined P/Q diagonal local discrimination
-            float PQ_Central_Value   = PQ_Dir[indx];
-            float PQ_Neighbourhood_Value = 0.25f * ( PQ_Dir[indx - w1 - 1] + PQ_Dir[indx - w1 + 1] + PQ_Dir[indx + w1 - 1] + PQ_Dir[indx + w1 + 1] );
-
-            float PQ_Disc = ( fabs( 0.5f - PQ_Central_Value ) < fabs( 0.5f - PQ_Neighbourhood_Value ) ) ? PQ_Neighbourhood_Value : PQ_Central_Value;
-
-            // Diagonal gradients
-            float NW_Grad = eps + fabs( rgb[indx - w1 - 1][c] - rgb[indx + w1 + 1][c] ) + fabs( rgb[indx - w1 - 1][c] - rgb[indx - w3 - 3][c] ) + fabs( rgb[indx][1] - rgb[indx - w2 - 2][1] );
-            float NE_Grad = eps + fabs( rgb[indx - w1 + 1][c] - rgb[indx + w1 - 1][c] ) + fabs( rgb[indx - w1 + 1][c] - rgb[indx - w3 + 3][c] ) + fabs( rgb[indx][1] - rgb[indx - w2 + 2][1] );
-            float SW_Grad = eps + fabs( rgb[indx + w1 - 1][c] - rgb[indx - w1 + 1][c] ) + fabs( rgb[indx + w1 - 1][c] - rgb[indx + w3 - 3][c] ) + fabs( rgb[indx][1] - rgb[indx + w2 - 2][1] );
-            float SE_Grad = eps + fabs( rgb[indx + w1 + 1][c] - rgb[indx - w1 - 1][c] ) + fabs( rgb[indx + w1 + 1][c] - rgb[indx + w3 + 3][c] ) + fabs( rgb[indx][1] - rgb[indx + w2 + 2][1] );
-
-            // Diagonal colour differences
-            float NW_Est = rgb[indx - w1 - 1][c] - rgb[indx - w1 - 1][1];
-            float NE_Est = rgb[indx - w1 + 1][c] - rgb[indx - w1 + 1][1];
-            float SW_Est = rgb[indx + w1 - 1][c] - rgb[indx + w1 - 1][1];
-            float SE_Est = rgb[indx + w1 + 1][c] - rgb[indx + w1 + 1][1];
-
-            // P/Q estimations
-            float P_Est = ( NW_Grad * SE_Est + SE_Grad * NW_Est ) / max(eps, NW_Grad + SE_Grad );
-            float Q_Est = ( NE_Grad * SW_Est + SW_Grad * NE_Est ) / max(eps, NE_Grad + SW_Grad );
-
-            // R@B and B@R interpolation
-            rgb[indx][c] = LIM( rgb[indx][1] + ( 1.f - PQ_Disc ) * P_Est + PQ_Disc * Q_Est, 0.f, 1.f );
-
-        }
-    }
-
-    free( PQ_Dir );
-
-    // RT ---------------------------------------------------------------------
-    if (plistener) {
-        plistener->setProgress(0.825);
-    }
-    // -------------------------------------------------------------------------
-        
-    // Step 4.3: Populate the red and blue channels at green CFA positions
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-    for ( int row = 4; row < height - 4; row++ ) {
-        for ( int col = 4 + (FC( row, 1 ) & 1), indx = row * width + col; col < width - 4; col += 2, indx += 2 ) {
-
-            // Refined vertical and horizontal local discrimination
-            float VH_Central_Value   = VH_Dir[indx];
-            float VH_Neighbourhood_Value = 0.25f * ( VH_Dir[indx - w1 - 1] + VH_Dir[indx - w1 + 1] + VH_Dir[indx + w1 - 1] + VH_Dir[indx + w1 + 1] );
-            
-            float VH_Disc = ( fabs( 0.5f - VH_Central_Value ) < fabs( 0.5f - VH_Neighbourhood_Value ) ) ? VH_Neighbourhood_Value : VH_Central_Value;
-
-            for ( int c = 0; c <= 2; c += 2 ) {
-
-                // Cardinal gradients
-                float N_Grad = eps + fabs( rgb[indx][1] - rgb[indx - w2][1] ) + fabs( rgb[indx - w1][c] - rgb[indx + w1][c] ) + fabs( rgb[indx - w1][c] - rgb[indx - w3][c] );
-                float S_Grad = eps + fabs( rgb[indx][1] - rgb[indx + w2][1] ) + fabs( rgb[indx + w1][c] - rgb[indx - w1][c] ) + fabs( rgb[indx + w1][c] - rgb[indx + w3][c] );
-                float W_Grad = eps + fabs( rgb[indx][1] - rgb[indx -  2][1] ) + fabs( rgb[indx -  1][c] - rgb[indx +  1][c] ) + fabs( rgb[indx -  1][c] - rgb[indx -  3][c] );
-                float E_Grad = eps + fabs( rgb[indx][1] - rgb[indx +  2][1] ) + fabs( rgb[indx +  1][c] - rgb[indx -  1][c] ) + fabs( rgb[indx +  1][c] - rgb[indx +  3][c] );
-
-                // Cardinal colour differences
-                float N_Est = rgb[indx - w1][c] - rgb[indx - w1][1];
-                float S_Est = rgb[indx + w1][c] - rgb[indx + w1][1];
-                float W_Est = rgb[indx -  1][c] - rgb[indx -  1][1];
-                float E_Est = rgb[indx +  1][c] - rgb[indx +  1][1];
-
-                // Vertical and horizontal estimations
-                float V_Est = ( N_Grad * S_Est + S_Grad * N_Est ) / max(eps, N_Grad + S_Grad );
-                float H_Est = ( E_Grad * W_Est + W_Grad * E_Est ) / max(eps, E_Grad + W_Grad );
-
-                // R@G and B@G interpolation
-                rgb[indx][c] = LIM( rgb[indx][1] + ( 1.f - VH_Disc ) * V_Est + VH_Disc * H_Est, 0.f, 1.f );
-
-            }
-        }
-    }
-
-    free(VH_Dir);
-
-    // RT ---------------------------------------------------------------------
-    if (plistener) {
-        plistener->setProgress(0.95);
-    }
-
-#ifdef _OPENMP
-    #pragma omp parallel for
-#endif
-    for (int row = 0; row < height; ++row) {
-        for (int col = 0, idx = row * width + col ; col < width; ++col, ++idx) {
-            red[row][col] = CLIP(rgb[idx][0] * 65535.f);
-            green[row][col] = CLIP(rgb[idx][1] * 65535.f);
-            blue[row][col] = CLIP(rgb[idx][2] * 65535.f);
-        }
-    }
-
-    border_interpolate2(width, height, 8);
-
-    if (plistener) {
-        plistener->setProgress(1);
-    }
-    // -------------------------------------------------------------------------
-}
-
 #define fcol(row,col) xtrans[(row)%6][(col)%6]
 #define isgreen(row,col) (xtrans[(row)%3][(col)%3]&1)
 
diff --git a/rtengine/ffmanager.cc b/rtengine/ffmanager.cc
index 82aa709b7..6b0302d1e 100644
--- a/rtengine/ffmanager.cc
+++ b/rtengine/ffmanager.cc
@@ -138,7 +138,7 @@ void ffInfo::updateRawImage()
             int H = ri->get_height();
             int W = ri->get_width();
             ri->compress_image(0);
-            int rSize = W * ((ri->getSensorType() == ST_BAYER || ri->getSensorType() == ST_FUJI_XTRANS) ? 1 : 3);
+            int rSize = W * ((ri->getSensorType() == ST_BAYER || ri->getSensorType() == ST_FUJI_XTRANS || ri->get_colors() == 1) ? 1 : 3);
             acc_t **acc = new acc_t*[H];
 
             for( int row = 0; row < H; row++) {
@@ -160,7 +160,7 @@ void ffInfo::updateRawImage()
                     temp->compress_image(0);     //\ TODO would be better working on original, because is temporary
                     nFiles++;
 
-                    if( ri->getSensorType() == ST_BAYER || ri->getSensorType() == ST_FUJI_XTRANS ) {
+                    if( ri->getSensorType() == ST_BAYER || ri->getSensorType() == ST_FUJI_XTRANS || ri->get_colors() == 1 ) {
                         for( int row = 0; row < H; row++) {
                             for( int col = 0; col < W; col++) {
                                 acc[row][col] += temp->data[row][col];
diff --git a/rtengine/imagesource.h b/rtengine/imagesource.h
index af9eecc35..c0565a4a3 100644
--- a/rtengine/imagesource.h
+++ b/rtengine/imagesource.h
@@ -87,6 +87,7 @@ public:
     // use right after demosaicing image, add coarse transformation and put the result in the provided Imagefloat*
     virtual void        getImage    (const ColorTemp &ctemp, int tran, Imagefloat* image, const PreviewProps &pp, const ToneCurveParams &hlp, const RAWParams &raw) = 0;
     virtual eSensorType getSensorType () const = 0;
+    virtual bool        isMono () const = 0;
     // true is ready to provide the AutoWB, i.e. when the image has been demosaiced for RawImageSource
     virtual bool        isWBProviderReady () = 0;
 
diff --git a/rtengine/improccoordinator.cc b/rtengine/improccoordinator.cc
index 83853cef5..78f0065ba 100644
--- a/rtengine/improccoordinator.cc
+++ b/rtengine/improccoordinator.cc
@@ -358,7 +358,7 @@ void ImProcCoordinator::updatePreviewImage(int todo, Crop* cropCall)
     // If high detail (=100%) is newly selected, do a demosaic update, since the last was just with FAST
 
     if (imageTypeListener) {
-        imageTypeListener->imageTypeChanged(imgsrc->isRAW(), imgsrc->getSensorType() == ST_BAYER, imgsrc->getSensorType() == ST_FUJI_XTRANS);
+        imageTypeListener->imageTypeChanged (imgsrc->isRAW(), imgsrc->getSensorType() == ST_BAYER, imgsrc->getSensorType() == ST_FUJI_XTRANS, imgsrc->isMono());
     }
 
     if ((todo & M_RAW)
diff --git a/rtengine/improcfun.cc b/rtengine/improcfun.cc
index cbed87984..d47c407d6 100644
--- a/rtengine/improcfun.cc
+++ b/rtengine/improcfun.cc
@@ -1503,29 +1503,16 @@ void ImProcFunctions::ciecam_02(CieImage* ncie, double adap, int pW, int pwb, La
                 }
 
 //if(params->dirpyrequalizer.enabled) if(execsharp) {
-            if (params->dirpyrequalizer.enabled) {
-                if (params->dirpyrequalizer.gamutlab  /*&& execsharp*/) {
-                    float artifact = (float) settings->artifact_cbdl;
-
-                    if (artifact > 6.f) {
-                        artifact = 6.f;
-                    }
-
-                    if (artifact < 0.f) {
-                        artifact = 1.f;
-                    }
-
-                    float chrom = 50.f;
-                    {
-                        int hotbad = 0;
-                        ImProcFunctions::badpixcam(ncie, artifact, 5, 2, params->dirpyrequalizer.skinprotect, chrom, hotbad);       //enabled remove artifacts for cbDL
-                    }
-                }
+            if (params->dirpyrequalizer.enabled && params->dirpyrequalizer.gamutlab && rtt) { //remove artifacts by gaussian blur - skin control, but not for thumbs
+                constexpr float artifact = 4.f;
+                constexpr float chrom = 50.f;
+                const bool hotbad = params->dirpyrequalizer.skinprotect != 0.0;
+                ImProcFunctions::badpixcam (ncie, artifact / scale, 5, 2, chrom, hotbad);      //enabled remove artifacts for cbDL
             }
 
-            if (params->colorappearance.badpixsl > 0) if (execsharp) {
+            if (params->colorappearance.badpixsl > 0 && execsharp) {
                     int mode = params->colorappearance.badpixsl;
-                    ImProcFunctions::badpixcam(ncie, 3.4, 5, mode, 0, 0, 1); //for bad pixels CIECAM
+                    ImProcFunctions::badpixcam (ncie, 3.4, 5, mode, 0, true);//for bad pixels CIECAM
                 }
 
             if (params->sharpenMicro.enabled)if (execsharp) {
@@ -2933,33 +2920,23 @@ void ImProcFunctions::ciecam_02float(CieImage* ncie, float adap, int pW, int pwb
                     }
 
 //if(params->dirpyrequalizer.enabled) if(execsharp) {
-                if (params->dirpyrequalizer.enabled)  {
-                    if (params->dirpyrequalizer.gamutlab  /*&& execsharp*/) { //remove artifacts by gaussian blur - skin control
-                        float artifact = (float) settings->artifact_cbdl;
+                if (params->dirpyrequalizer.enabled && params->dirpyrequalizer.gamutlab && rtt) { //remove artifacts by gaussian blur - skin control, but not for thumbs
+                    constexpr float artifact = 4.f;
+                    constexpr float chrom = 50.f;
+                    const bool hotbad = params->dirpyrequalizer.skinprotect != 0.0;
 
-                        if (artifact > 6.f) {
-                            artifact = 6.f;
-                        }
-
-                        if (artifact < 0.f) {
-                            artifact = 1.f;
-                        }
-
-                        int hotbad = 0;
-                        float chrom = 50.f;
-                        lab->deleteLab();
-                        ImProcFunctions::badpixcam(ncie, artifact, 5, 2, params->dirpyrequalizer.skinprotect, chrom, hotbad);   //enabled remove artifacts for cbDL
-                        lab->reallocLab();
-                    }
+                    lab->deleteLab();
+                    ImProcFunctions::badpixcam (ncie, artifact / scale, 5, 2, chrom, hotbad);  //enabled remove artifacts for cbDL
+                    lab->reallocLab();
                 }
 
 //if(params->colorappearance.badpixsl > 0) { int mode=params->colorappearance.badpixsl;
-                if (params->colorappearance.badpixsl > 0) if (execsharp) {
-                        int mode = params->colorappearance.badpixsl;
-                        lab->deleteLab();
-                        ImProcFunctions::badpixcam(ncie, 3.0, 10, mode, 0, 0, 1); //for bad pixels CIECAM
-                        lab->reallocLab();
-                    }
+                if (params->colorappearance.badpixsl > 0 && execsharp) {
+                    int mode = params->colorappearance.badpixsl;
+                    lab->deleteLab();
+                    ImProcFunctions::badpixcam (ncie, 3.0, 10, mode, 0, true);//for bad pixels CIECAM
+                    lab->reallocLab();
+                }
 
                 if (params->impulseDenoise.enabled) if (execsharp) {
                         float **buffers[3];
@@ -6324,53 +6301,44 @@ void ImProcFunctions::defringe(LabImage* lab)
     if (params->defringe.enabled && lab->W >= 8 && lab->H >= 8)
 
     {
-        PF_correct_RT(lab, lab, params->defringe.radius, params->defringe.threshold);
+        PF_correct_RT (lab, params->defringe.radius, params->defringe.threshold);
     }
 }
 
 void ImProcFunctions::defringecam(CieImage* ncie)
 {
     if (params->defringe.enabled && ncie->W >= 8 && ncie->H >= 8) {
-        PF_correct_RTcam(ncie, ncie, params->defringe.radius, params->defringe.threshold);
+        PF_correct_RTcam (ncie, params->defringe.radius, params->defringe.threshold);
     }
 }
 
-void ImProcFunctions::badpixcam(CieImage* ncie, double rad, int thr, int mode, float skinprot, float chrom, int hotbad)
+void ImProcFunctions::badpixcam (CieImage* ncie, double rad, int thr, int mode, float chrom, bool hotbad)
 {
     if (ncie->W >= 8 && ncie->H >= 8) {
-        Badpixelscam(ncie, ncie, rad, thr, mode, skinprot, chrom, hotbad);
+        Badpixelscam (ncie, rad, thr, mode, chrom, hotbad);
     }
 }
 
-void ImProcFunctions::badpixlab(LabImage* lab, double rad, int thr, int mode, float skinprot, float chrom)
+void ImProcFunctions::badpixlab (LabImage* lab, double rad, int thr, float chrom)
 {
     if (lab->W >= 8 && lab->H >= 8) {
-        BadpixelsLab(lab, lab, rad, thr, mode, skinprot, chrom);
+        BadpixelsLab (lab, rad, thr, chrom);
     }
 }
 
 void ImProcFunctions::dirpyrequalizer(LabImage* lab, int scale)
 {
     if (params->dirpyrequalizer.enabled && lab->W >= 8 && lab->H >= 8) {
-        float b_l = static_cast<float>(params->dirpyrequalizer.hueskin.getBottomLeft()) / 100.0f;
-        float t_l = static_cast<float>(params->dirpyrequalizer.hueskin.getTopLeft()) / 100.0f;
-        float t_r = static_cast<float>(params->dirpyrequalizer.hueskin.getTopRight()) / 100.0f;
+        float b_l = static_cast<float> (params->dirpyrequalizer.hueskin.getBottomLeft()) / 100.f;
+        float t_l = static_cast<float> (params->dirpyrequalizer.hueskin.getTopLeft()) / 100.f;
+        float t_r = static_cast<float> (params->dirpyrequalizer.hueskin.getTopRight()) / 100.f;
         //      if     (params->dirpyrequalizer.algo=="FI") choice=0;
         //      else if(params->dirpyrequalizer.algo=="LA") choice=1;
-        float artifact = (float) settings->artifact_cbdl;
 
-        if (artifact > 6.f) {
-            artifact = 6.f;
-        }
-
-        if (artifact < 0.f) {
-            artifact = 1.f;
-        }
-
-        float chrom = 50.f;
-
-        if (params->dirpyrequalizer.gamutlab) {
-            ImProcFunctions::badpixlab(lab, artifact, 5, 3, params->dirpyrequalizer.skinprotect, chrom);     //for artifacts
+        if (params->dirpyrequalizer.gamutlab && params->dirpyrequalizer.skinprotect != 0) {
+            constexpr float artifact = 4.f;
+            constexpr float chrom = 50.f;
+            ImProcFunctions::badpixlab (lab, artifact / scale, 5, chrom);    //for artifacts
         }
 
         //dirpyrLab_equalizer(lab, lab, params->dirpyrequalizer.mult);
diff --git a/rtengine/improcfun.h b/rtengine/improcfun.h
index 48db4ed43..ba93e3498 100644
--- a/rtengine/improcfun.h
+++ b/rtengine/improcfun.h
@@ -396,13 +396,13 @@ public:
     void idirpyr_eq_channelcam(float ** data_coarse, float ** data_fine, float ** buffer, int width, int height, int level, float multi[6], const double dirpyrThreshold, float ** l_a_h, float ** l_b_c, const double skinprot, float b_l, float t_l, float t_r);
     void defringe(LabImage* lab);
     void defringecam(CieImage* ncie);
-    void badpixcam(CieImage* ncie, double rad, int thr, int mode, float skinprot, float chrom, int hotbad);
-    void badpixlab(LabImage* lab, double rad, int thr, int mode, float skinprot, float chrom);
+    void badpixcam      (CieImage* ncie, double rad, int thr, int mode, float chrom, bool hotbad);
+    void badpixlab      (LabImage* lab, double rad, int thr, float chrom);
 
-    void PF_correct_RT(LabImage * src, LabImage * dst, double radius, int thresh);
-    void PF_correct_RTcam(CieImage * src, CieImage * dst, double radius, int thresh);
-    void Badpixelscam(CieImage * src, CieImage * dst, double radius, int thresh, int mode, float skinprot, float chrom, int hotbad);
-    void BadpixelsLab(LabImage * src, LabImage * dst, double radius, int thresh, int mode, float skinprot, float chrom);
+    void PF_correct_RT    (LabImage * lab, double radius, int thresh);
+    void PF_correct_RTcam (CieImage * ncie, double radius, int thresh);
+    void Badpixelscam (CieImage * ncie, double radius, int thresh, int mode, float chrom, bool hotbad);
+    void BadpixelsLab (LabImage * lab, double radius, int thresh, float chrom);
 
     void ToneMapFattal02(Imagefloat *rgb);
     void localContrast(LabImage *lab);
diff --git a/rtengine/iplocallab.cc b/rtengine/iplocallab.cc
index 3cffbafa4..0207039e5 100644
--- a/rtengine/iplocallab.cc
+++ b/rtengine/iplocallab.cc
@@ -8168,8 +8168,8 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             int xEn = lp.xc + lp.lx;
             bufsob = new LabImage(bfw, bfh);
             bufreserv = new LabImage(bfw, bfh);
-            const JaggedArray<float> buflight(bfw, bfh);
-            const JaggedArray<float> bufchro(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> bufchro(bfw, bfh);
 
             float *orig[bfh] ALIGNED16;
             origBuffer = new float[bfh * bfw];
@@ -8281,10 +8281,10 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
 
                 }
 
-                const JaggedArray<float> Cdeltae(bfw, bfh);
-                const JaggedArray<float> Cdeltaesob(bfw, bfh);
-                const JaggedArray<float> Ldeltae(bfw, bfh);
-                const JaggedArray<float> Ldeltaesob(bfw, bfh);
+                JaggedArray<float> Cdeltae(bfw, bfh);
+                JaggedArray<float> Cdeltaesob(bfw, bfh);
+                JaggedArray<float> Ldeltae(bfw, bfh);
+                JaggedArray<float> Ldeltaesob(bfw, bfh);
 
 
 #ifdef _OPENMP
@@ -8326,19 +8326,19 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
                 //rr maximum radius to stock data
                 int rr = sqrt(SQR(XR) + SQR(Ye)) + ar;    //+ ar to prevent crash due to round float
                 //polar coord
-                const JaggedArray<float> val(rr, xEn - begx + ar);
-                const JaggedArray<float> CdE(rr, xEn - begx + ar);
-                const JaggedArray<float> LdE(rr, xEn - begx + ar);
-                const JaggedArray<float> CdEsob(rr, xEn - begx + ar);
-                const JaggedArray<float> LdEsob(rr, xEn - begx + ar);
-                const JaggedArray<float> Soderiv(rr, xEn - begx + ar);
-                const JaggedArray<float> Chderiv(rr, xEn - begx + ar);
-                const JaggedArray<float> Luderiv(rr, xEn - begx + ar);
-                const JaggedArray<float> goodmax(rr, xEn - begx + ar);
-                const JaggedArray<float> Soderiv2(rr, xEn - begx + ar);
-                const JaggedArray<float> Chderiv2(rr, xEn - begx + ar);
-                const JaggedArray<float> Luderiv2(rr, xEn - begx + ar);
-                const JaggedArray<float> Totalderiv2(rr, xEn - begx + ar);
+                JaggedArray<float> val(rr, xEn - begx + ar);
+                JaggedArray<float> CdE(rr, xEn - begx + ar);
+                JaggedArray<float> LdE(rr, xEn - begx + ar);
+                JaggedArray<float> CdEsob(rr, xEn - begx + ar);
+                JaggedArray<float> LdEsob(rr, xEn - begx + ar);
+                JaggedArray<float> Soderiv(rr, xEn - begx + ar);
+                JaggedArray<float> Chderiv(rr, xEn - begx + ar);
+                JaggedArray<float> Luderiv(rr, xEn - begx + ar);
+                JaggedArray<float> goodmax(rr, xEn - begx + ar);
+                JaggedArray<float> Soderiv2(rr, xEn - begx + ar);
+                JaggedArray<float> Chderiv2(rr, xEn - begx + ar);
+                JaggedArray<float> Luderiv2(rr, xEn - begx + ar);
+                JaggedArray<float> Totalderiv2(rr, xEn - begx + ar);
 
                 //cDe and LdE to stock delta E chroma and luma in area top and polar coord
                 //cDesob and LdEsob Sobel canny of delta E chroma and luma
@@ -8366,14 +8366,14 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
                 int YL = max(-Yo, Ye);
                 int rrL = sqrt(SQR(YL) + SQR(Xo)) + ar;
 
-                const JaggedArray<float> valL(rrL, yEn - begy + ar);
-                const JaggedArray<float> CdEL(rrL, yEn - begy + ar);
-                const JaggedArray<float> LdEL(rrL, yEn - begy + ar);
-                const JaggedArray<float> CdEsobL(rrL, yEn - begy + ar);
-                const JaggedArray<float> LdEsobL(rrL, yEn - begy + ar);
-                const JaggedArray<float> SoderivL(rrL, yEn - begy + ar);
-                const JaggedArray<float> ChderivL(rrL, yEn - begy + ar);
-                const JaggedArray<float> LuderivL(rrL, yEn - begy + ar);
+                JaggedArray<float> valL(rrL, yEn - begy + ar);
+                JaggedArray<float> CdEL(rrL, yEn - begy + ar);
+                JaggedArray<float> LdEL(rrL, yEn - begy + ar);
+                JaggedArray<float> CdEsobL(rrL, yEn - begy + ar);
+                JaggedArray<float> LdEsobL(rrL, yEn - begy + ar);
+                JaggedArray<float> SoderivL(rrL, yEn - begy + ar);
+                JaggedArray<float> ChderivL(rrL, yEn - begy + ar);
+                JaggedArray<float> LuderivL(rrL, yEn - begy + ar);
 
                 float *radL = nullptr;
                 radL = new float[yEn - begy + ar];
@@ -8385,14 +8385,14 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
                 XR = max(-Xo, Xe);
                 int rrB = sqrt(SQR(XR) + SQR(Yo)) + ar;
 
-                const JaggedArray<float> valB(rrB, xEn - begx + ar);
-                const JaggedArray<float> CdEB(rrB, xEn - begx + ar);
-                const JaggedArray<float> LdEB(rrB, xEn - begx + ar);
-                const JaggedArray<float> CdEsobB(rrB, xEn - begx + ar);
-                const JaggedArray<float> LdEsobB(rrB, xEn - begx + ar);
-                const JaggedArray<float> SoderivB(rrB, xEn - begx + ar);
-                const JaggedArray<float> ChderivB(rrB, xEn - begx + ar);
-                const JaggedArray<float> LuderivB(rrB, xEn - begx + ar);
+                JaggedArray<float> valB(rrB, xEn - begx + ar);
+                JaggedArray<float> CdEB(rrB, xEn - begx + ar);
+                JaggedArray<float> LdEB(rrB, xEn - begx + ar);
+                JaggedArray<float> CdEsobB(rrB, xEn - begx + ar);
+                JaggedArray<float> LdEsobB(rrB, xEn - begx + ar);
+                JaggedArray<float> SoderivB(rrB, xEn - begx + ar);
+                JaggedArray<float> ChderivB(rrB, xEn - begx + ar);
+                JaggedArray<float> LuderivB(rrB, xEn - begx + ar);
 
                 float *radB = nullptr;
                 radB = new float[xEn - begx + ar];
@@ -8403,14 +8403,14 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
                 //init fourth quarter right
                 YL = max(-Yo, Ye);
                 int rrR = sqrt(SQR(YL) + SQR(Xe)) + ar;
-                const JaggedArray<float> valR(rrR, yEn - begy + ar);
-                const JaggedArray<float> CdER(rrR, yEn - begy + ar);
-                const JaggedArray<float> LdER(rrR, yEn - begy + ar);
-                const JaggedArray<float> CdEsobR(rrR, yEn - begy + ar);
-                const JaggedArray<float> LdEsobR(rrR, yEn - begy + ar);
-                const JaggedArray<float> SoderivR(rrR, yEn - begy + ar);
-                const JaggedArray<float> ChderivR(rrR, yEn - begy + ar);
-                const JaggedArray<float> LuderivR(rrR, yEn - begy + ar);
+                JaggedArray<float> valR(rrR, yEn - begy + ar);
+                JaggedArray<float> CdER(rrR, yEn - begy + ar);
+                JaggedArray<float> LdER(rrR, yEn - begy + ar);
+                JaggedArray<float> CdEsobR(rrR, yEn - begy + ar);
+                JaggedArray<float> LdEsobR(rrR, yEn - begy + ar);
+                JaggedArray<float> SoderivR(rrR, yEn - begy + ar);
+                JaggedArray<float> ChderivR(rrR, yEn - begy + ar);
+                JaggedArray<float> LuderivR(rrR, yEn - begy + ar);
 
                 float *radR = nullptr;
                 radR = new float[yEn - begy + ar];
@@ -9014,8 +9014,8 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             int GH = transformed->H;
             int bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             int bfw = int (lp.lx + lp.lxL) + del;
-            const JaggedArray<float> buflight(bfw, bfh);
-            const JaggedArray<float> bufchro(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> bufchro(bfw, bfh);
 
             float *orig[bfh] ALIGNED16;
 
@@ -10349,11 +10349,11 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             int bfh = 0.f, bfw = 0.f;
             bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             bfw = int (lp.lx + lp.lxL) + del;
-            const JaggedArray<float> buflight(bfw, bfh);
-            const JaggedArray<float> bufchro(bfw, bfh);
-            const JaggedArray<float> buflightslid(bfw, bfh);
-            const JaggedArray<float> bufchroslid(bfw, bfh);
-            const JaggedArray<float> bufhh(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> bufchro(bfw, bfh);
+            JaggedArray<float> buflightslid(bfw, bfh);
+            JaggedArray<float> bufchroslid(bfw, bfh);
+            JaggedArray<float> bufhh(bfw, bfh);
 
             float adjustr = 1.0f;
 
@@ -10627,7 +10627,7 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             LabImage *bufcontorig = nullptr;
             int bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             int bfw = int (lp.lx + lp.lxL) + del;
-            const JaggedArray<float> buflightc(bfw, bfh);
+            JaggedArray<float> buflightc(bfw, bfh);
 
             float clighc = 0.f;
             const float localtype = lumaref;
@@ -10803,9 +10803,9 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             LabImage *bufcat02 = nullptr;
             LabImage *bufcat02fin = nullptr;
 
-            const JaggedArray<float> buflightcat(bfw, bfh, true);
-            const JaggedArray<float> buf_a_cat(bfw, bfh, true);
-            const JaggedArray<float> buf_b_cat(bfw, bfh, true);
+            JaggedArray<float> buflightcat(bfw, bfh, true);
+            JaggedArray<float> buf_a_cat(bfw, bfh, true);
+            JaggedArray<float> buf_b_cat(bfw, bfh, true);
 
             bufcat02 = new LabImage(bfw, bfh); //buffer for data in zone limit
             bufcat02fin = new LabImage(bfw, bfh); //buffer for data in zone limit
@@ -10904,9 +10904,9 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             int bfh = 0.f, bfw = 0.f;
             bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             bfw = int (lp.lx + lp.lxL) + del;
-            const JaggedArray<float> buflight(bfw, bfh);
-            const JaggedArray<float> bufl_ab(bfw, bfh);
-            const JaggedArray<float> buflightcurv(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> bufl_ab(bfw, bfh);
+            JaggedArray<float> buflightcurv(bfw, bfh);
 
 
             if (call <= 3) { //simpleprocess, dcrop, improccoordinator
@@ -11057,8 +11057,8 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             int bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             int bfw = int (lp.lx + lp.lxL) + del;
 
-            const JaggedArray<float> buflight(bfw, bfh);
-            const JaggedArray<float> bufl_ab(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> bufl_ab(bfw, bfh);
 
 
             if (call <= 3) { //simpleprocess, dcrop, improccoordinator
@@ -11164,7 +11164,7 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             LabImage *bufgb = nullptr;
             int bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             int bfw = int (lp.lx + lp.lxL) + del;
-            const JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
 
             if (call <= 3) { //simpleprocess dcrop improcc
 
@@ -11266,12 +11266,12 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
         if ((lp.mulloc[0] != 1.f || lp.mulloc[1] != 1.f || lp.mulloc[2] != 1.f || lp.mulloc[3] != 1.f || lp.mulloc[4] != 1.f) && lp.cbdlena) {
             int bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             int bfw = int (lp.lx + lp.lxL) + del;
-            const JaggedArray<float> buflight(bfw, bfh);
-            const JaggedArray<float> bufchrom(bfw, bfh);
-            const JaggedArray<float> bufchr(bfw, bfh);
-            const JaggedArray<float> bufsh(bfw, bfh);
-            const JaggedArray<float> loctemp(bfw, bfh);
-            const JaggedArray<float> loctempch(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> bufchrom(bfw, bfh);
+            JaggedArray<float> bufchr(bfw, bfh);
+            JaggedArray<float> bufsh(bfw, bfh);
+            JaggedArray<float> loctemp(bfw, bfh);
+            JaggedArray<float> loctempch(bfw, bfh);
 
             float b_l = -5.f;
             float t_l = 25.f;
@@ -11411,12 +11411,11 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
         if (!lp.invshar && lp.shrad > 0.42 && call < 3  && lp.sharpena) { //interior ellipse for sharpening, call = 1 and 2 only with Dcrop and simpleprocess
             int bfh = call == 2 ? int (lp.ly + lp.lyT) + del : original->H; //bfw bfh real size of square zone
             int bfw = call == 2 ? int (lp.lx + lp.lxL) + del : original->W;
-            const JaggedArray<float> loctemp(bfw, bfh);
+            JaggedArray<float> loctemp(bfw, bfh);
 
             if (call == 2) { //call from simpleprocess
-                const JaggedArray<float> bufsh(bfw, bfh, true);
-                const JaggedArray<float> hbuffer(bfw, bfh);
-
+                JaggedArray<float> bufsh(bfw, bfh, true);
+                JaggedArray<float> hbuffer(bfw, bfh);
                 int begy = lp.yc - lp.lyT;
                 int begx = lp.xc - lp.lxL;
                 int yEn = lp.yc + lp.ly;
@@ -11463,7 +11462,7 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
         } else if (lp.invshar && lp.shrad > 0.42 && call < 3 && lp.sharpena) {
             int GW = original->W;
             int GH = original->H;
-            const JaggedArray<float> loctemp(GW, GH);
+            JaggedArray<float> loctemp(GW, GH);
 
             ImProcFunctions::deconvsharpeningloc(original->L, shbuffer, GW, GH, loctemp, params->locallab.shardamping, (double)params->locallab.sharradius / 100., params->locallab.shariter, params->locallab.sharamount);
 
@@ -11490,8 +11489,8 @@ void ImProcFunctions::Lab_Local(int call, int maxspot, int sp, LUTf & huerefs, L
             LabImage *bufreti = nullptr;
             int bfh = int (lp.ly + lp.lyT) + del; //bfw bfh real size of square zone
             int bfw = int (lp.lx + lp.lxL) + del;
-            const JaggedArray<float> buflight(bfw, bfh);
-            const JaggedArray<float> bufchro(bfw, bfh);
+            JaggedArray<float> buflight(bfw, bfh);
+            JaggedArray<float> bufchro(bfw, bfh);
 
             float hueplus = hueref + dhueret;
             float huemoins = hueref - dhueret;
diff --git a/rtengine/jaggedarray.h b/rtengine/jaggedarray.h
index 153ed2902..01da776a6 100644
--- a/rtengine/jaggedarray.h
+++ b/rtengine/jaggedarray.h
@@ -19,6 +19,8 @@
  */
 #pragma once
 
+#include <cstring>
+
 #include "noncopyable.h"
 
 namespace rtengine
@@ -26,60 +28,46 @@ namespace rtengine
 
 // These emulate a jagged array, but use only 2 allocations instead of 1 + H.
 
-template<class T>
-inline T** const allocJaggedArray (const int W, const int H, const bool initZero = false)
-{
-    T** const a = new T*[H];
-    a[0] = new T[H * W];
-
-    for (int i = 1; i < H; ++i) {
-        a[i] = a[i - 1] + W;
-    }
-
-    if (initZero) {
-        std::memset(a[0], 0, sizeof(T) * W * H);
-    }
-
-    return a;
-}
-
-template<class T>
-inline void freeJaggedArray (T** const a)
-{
-    delete [] a[0];
-    delete [] a;
-}
-
-template<class T>
+template<typename T>
 class JaggedArray :
     public NonCopyable
 {
 public:
-    JaggedArray (const int W, const int H, const bool initZero = false)
+    JaggedArray(std::size_t width, std::size_t height, bool init_zero = false) :
+        array(
+            [width, height, init_zero]() -> T**
+            {
+                T** const res = new T*[height];
+                res[0] = new T[height * width];
+
+                for (std::size_t i = 1; i < height; ++i) {
+                    res[i] = res[i - 1] + width;
+                }
+
+                if (init_zero) {
+                    std::memset(res[0], 0, sizeof(T) * width * height);
+                }
+
+                return res;
+            }()
+        )
     {
-        a = allocJaggedArray<T> (W, H, initZero);
-    }
-    ~JaggedArray ()
-    {
-        if (a) {
-            freeJaggedArray<T> (a);
-            a = nullptr;
-        }
     }
 
-    operator T** const () const
+    ~JaggedArray ()
     {
-        return a;
+        delete[] array[0];
+        delete[] array;
+    }
+
+    operator T** ()
+    {
+        return array;
     }
 
 private:
-    T** a;
+    T** const array;
 
 };
 
-// Declared but not defined to prevent
-// explicitly freeing a JaggedArray<T> implicitly cast to T**.
-template<class T>
-void freeJaggedArray (JaggedArray<T>&);
-
 } // rtengine
diff --git a/rtengine/procparams.h b/rtengine/procparams.h
index 73e7c6e7a..9ee0a49e2 100644
--- a/rtengine/procparams.h
+++ b/rtengine/procparams.h
@@ -654,7 +654,7 @@ struct ColorAppearanceParams {
 struct DefringeParams {
     bool    enabled;
     double  radius;
-    float   threshold;
+    int     threshold;
     std::vector<double> huecurve;
 
     DefringeParams();
diff --git a/rtengine/rawimagesource.cc b/rtengine/rawimagesource.cc
index 0593c5650..1b976d996 100644
--- a/rtengine/rawimagesource.cc
+++ b/rtengine/rawimagesource.cc
@@ -2827,7 +2827,7 @@ void RawImageSource::processFlatField(const RAWParams &raw, RawImage *riFlatFile
         cfaboxblur(riFlatFile, cfablur, BS, BS);
     }
 
-    if (ri->getSensorType() == ST_BAYER) {
+    if(ri->getSensorType() == ST_BAYER || ri->get_colors() == 1) {
         float refcolor[2][2];
 
         //find centre average values by channel
@@ -2835,8 +2835,8 @@ void RawImageSource::processFlatField(const RAWParams &raw, RawImage *riFlatFile
             for (int n = 0; n < 2; n++) {
                 int row = 2 * (H >> 2) + m;
                 int col = 2 * (W >> 2) + n;
-                int c  = FC(row, col);
-                int c4 = (c == 1 && !(row & 1)) ? 3 : c;
+                int c  = ri->get_colors() != 1 ? FC(row, col) : 0;
+                int c4 = ri->get_colors() != 1 ? (( c == 1 && !(row & 1) ) ? 3 : c) : 0;
                 refcolor[m][n] = max(0.0f, cfablur[row * W + col] - black[c4]);
             }
 
@@ -2848,8 +2848,8 @@ void RawImageSource::processFlatField(const RAWParams &raw, RawImage *riFlatFile
             for (int m = 0; m < 2; m++)
                 for (int n = 0; n < 2; n++) {
                     float maxval = 0.f;
-                    int c  = FC(m, n);
-                    int c4 = (c == 1 && !(m & 1)) ? 3 : c;
+                    int c  = ri->get_colors() != 1 ? FC(m, n) : 0;
+                    int c4 = ri->get_colors() != 1 ? (( c == 1 && !(m & 1) ) ? 3 : c) : 0;
 #ifdef _OPENMP
                     #pragma omp parallel
 #endif
@@ -2898,13 +2898,20 @@ void RawImageSource::processFlatField(const RAWParams &raw, RawImage *riFlatFile
                 refcolor[m][n] *= limitFactor;
             }
 
+        unsigned int c[2][2] {};
+        unsigned int c4[2][2] {};
+        if(ri->get_colors() != 1) {
+            for (int i = 0; i < 2; ++i) {
+                for(int j = 0; j < 2; ++j) {
+                    c[i][j] = FC(i, j);
+                }
+            }
+            c4[0][0] = ( c[0][0] == 1) ? 3 : c[0][0];
+            c4[0][1] = ( c[0][1] == 1) ? 3 : c[0][1];
+            c4[1][0] = c[1][0];
+            c4[1][1] = c[1][1];
+        }
 
-        unsigned int c[2][2]  = {{FC(0, 0), FC(0, 1)}, {FC(1, 0), FC(1, 1)}};
-        unsigned int c4[2][2];
-        c4[0][0] = (c[0][0] == 1) ? 3 : c[0][0];
-        c4[0][1] = (c[0][1] == 1) ? 3 : c[0][1];
-        c4[1][0] = c[1][0];
-        c4[1][1] = c[1][1];
 
 #ifdef __SSE2__
         vfloat refcolorv[2] = {_mm_set_ps(refcolor[0][1], refcolor[0][0], refcolor[0][1], refcolor[0][0]),
@@ -3028,13 +3035,20 @@ void RawImageSource::processFlatField(const RAWParams &raw, RawImage *riFlatFile
         cfaboxblur(riFlatFile, cfablur1, 0, 2 * BS);  //now do horizontal blur
         cfaboxblur(riFlatFile, cfablur2, 2 * BS, 0);  //now do vertical blur
 
-        if (ri->getSensorType() == ST_BAYER) {
-            unsigned int c[2][2]  = {{FC(0, 0), FC(0, 1)}, {FC(1, 0), FC(1, 1)}};
-            unsigned int c4[2][2];
-            c4[0][0] = (c[0][0] == 1) ? 3 : c[0][0];
-            c4[0][1] = (c[0][1] == 1) ? 3 : c[0][1];
-            c4[1][0] = c[1][0];
-            c4[1][1] = c[1][1];
+        if(ri->getSensorType() == ST_BAYER || ri->get_colors() == 1) {
+            unsigned int c[2][2] {};
+            unsigned int c4[2][2] {};
+            if(ri->get_colors() != 1) {
+                for (int i = 0; i < 2; ++i) {
+                    for(int j = 0; j < 2; ++j) {
+                        c[i][j] = FC(i, j);
+                    }
+                }
+                c4[0][0] = ( c[0][0] == 1) ? 3 : c[0][0];
+                c4[0][1] = ( c[0][1] == 1) ? 3 : c[0][1];
+                c4[1][0] = c[1][0];
+                c4[1][1] = c[1][1];
+            }
 
 #ifdef __SSE2__
             vfloat blackv[2] = {_mm_set_ps(black[c4[0][1]], black[c4[0][0]], black[c4[0][1]], black[c4[0][0]]),
@@ -3152,6 +3166,9 @@ void RawImageSource::copyOriginalPixels(const RAWParams &raw, RawImage *src, Raw
                 }
             }
         }
+        if (riFlatFile && W == riFlatFile->get_width() && H == riFlatFile->get_height()) {
+            processFlatField(raw, riFlatFile, black);
+        }  // flatfield
     } else {
         // No bayer pattern
         // TODO: Is there a flat field correction possible?
diff --git a/rtengine/rawimagesource.h b/rtengine/rawimagesource.h
index f9bf7ce13..545248d2c 100644
--- a/rtengine/rawimagesource.h
+++ b/rtengine/rawimagesource.h
@@ -145,6 +145,10 @@ public:
     {
         return ri != nullptr ? ri->getSensorType() : ST_NONE;
     }
+    bool        isMono () const
+    {
+        return ri->get_colors() == 1;
+    }
     ColorTemp   getWB() const
     {
         return camera_wb;
diff --git a/rtengine/rcd_demosaic.cc b/rtengine/rcd_demosaic.cc
new file mode 100644
index 000000000..f131bc87e
--- /dev/null
+++ b/rtengine/rcd_demosaic.cc
@@ -0,0 +1,302 @@
+/*
+ *  This file is part of RawTherapee.
+ *
+ *  Copyright (c) 2017-2018 Luis Sanz Rodriguez (luis.sanz.rodriguez(at)gmail(dot)com) and Ingo Weyrich (heckflosse67@gmx.de)
+ *
+ *  RawTherapee is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 3 of the License, or
+ *  (at your option) any later version.
+ *
+ *  RawTherapee is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with RawTherapee.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <cmath>
+
+#include "rawimagesource.h"
+#include "rt_math.h"
+#include "../rtgui/multilangmgr.h"
+#include "opthelper.h"
+#include "StopWatch.h"
+
+using namespace std;
+
+namespace rtengine
+{
+
+/**
+* RATIO CORRECTED DEMOSAICING
+* Luis Sanz Rodriguez (luis.sanz.rodriguez(at)gmail(dot)com)
+*
+* Release 2.3 @ 171125
+*
+* Original code from https://github.com/LuisSR/RCD-Demosaicing
+* Licensed under the GNU GPL version 3
+*/
+// Tiled version by Ingo Weyrich (heckflosse67@gmx.de)
+void RawImageSource::rcd_demosaic()
+{
+    BENCHFUN
+
+    volatile double progress = 0.0;
+
+    if (plistener) {
+        plistener->setProgressStr(Glib::ustring::compose(M("TP_RAW_DMETHOD_PROGRESSBAR"), RAWParams::BayerSensor::getMethodString(RAWParams::BayerSensor::Method::RCD)));
+        plistener->setProgress(0);
+    }
+    
+    constexpr int rcdBorder = 9;
+    constexpr int tileSize = 214;
+    constexpr int tileSizeN = tileSize - 2 * rcdBorder;
+    const int numTh = H / (tileSizeN) + ((H % (tileSizeN)) ? 1 : 0);
+    const int numTw = W / (tileSizeN) + ((W % (tileSizeN)) ? 1 : 0);
+    constexpr int w1 = tileSize, w2 = 2 * tileSize, w3 = 3 * tileSize, w4 = 4 * tileSize;
+    //Tolerance to avoid dividing by zero
+    static constexpr float eps = 1e-5f;
+    static constexpr float epssq = 1e-10f;
+
+#ifdef _OPENMP
+#pragma omp parallel
+#endif
+{
+    int progresscounter = 0;
+    float *cfa = (float*) calloc(tileSize * tileSize, sizeof *cfa);
+    float (*rgb)[tileSize * tileSize] = (float (*)[tileSize * tileSize])malloc(3 * sizeof *rgb);
+    float *VH_Dir = (float*) calloc(tileSize * tileSize, sizeof *VH_Dir);
+    float *PQ_Dir = (float*) calloc(tileSize * tileSize, sizeof *PQ_Dir);
+    float *lpf = PQ_Dir; // reuse buffer, they don't overlap in usage
+
+#ifdef _OPENMP
+    #pragma omp for schedule(dynamic) collapse(2) nowait
+#endif
+    for(int tr = 0; tr < numTh; ++tr) {
+        for(int tc = 0; tc < numTw; ++tc) {
+            const int rowStart = tr * tileSizeN;
+            const int rowEnd = std::min(rowStart + tileSize, H);
+            if(rowStart + rcdBorder == rowEnd - rcdBorder) {
+                continue;
+            }
+            const int colStart = tc * tileSizeN;
+            const int colEnd = std::min(colStart + tileSize, W);
+            if(colStart + rcdBorder == colEnd - rcdBorder) {
+                continue;
+            }
+
+            const int tileRows = std::min(rowEnd - rowStart, tileSize);
+            const int tilecols = std::min(colEnd - colStart, tileSize);
+
+            for (int row = rowStart; row < rowEnd; row++) {
+                int indx = (row - rowStart) * tileSize;
+                int c0 = FC(row, colStart);
+                int c1 = FC(row, colStart + 1);
+                int col = colStart;
+
+                for (; col < colEnd - 1; col+=2, indx+=2) {
+                    cfa[indx] = rgb[c0][indx] = LIM01(rawData[row][col] / 65535.f);
+                    cfa[indx + 1] = rgb[c1][indx + 1] = LIM01(rawData[row][col + 1] / 65535.f);
+                }
+                if(col < colEnd) {
+                    cfa[indx] = rgb[c0][indx] = LIM01(rawData[row][col] / 65535.f);
+                }
+            }
+
+            /**
+            * STEP 1: Find cardinal and diagonal interpolation directions
+            */
+
+            for (int row = 4; row < tileRows - 4; row++) {
+                for (int col = 4, indx = row * tileSize + col; col < tilecols - 4; col++, indx++) {
+                    const float cfai = cfa[indx];
+                    //Calculate h/v local discrimination
+                    float V_Stat = max(epssq, -18.f * cfai * (cfa[indx - w1] + cfa[indx + w1] + 2.f * (cfa[indx - w2] + cfa[indx + w2]) - cfa[indx - w3] - cfa[indx + w3]) - 2.f * cfai * (cfa[indx - w4] + cfa[indx + w4] - 19.f * cfai) - cfa[indx - w1] * (70.f * cfa[indx + w1] + 12.f * cfa[indx - w2] - 24.f * cfa[indx + w2] + 38.f * cfa[indx - w3] - 16.f * cfa[indx + w3] - 12.f * cfa[indx - w4] + 6.f * cfa[indx + w4] - 46.f * cfa[indx - w1]) + cfa[indx + w1] * (24.f * cfa[indx - w2] - 12.f * cfa[indx + w2] + 16.f * cfa[indx - w3] - 38.f * cfa[indx + w3] - 6.f * cfa[indx - w4] + 12.f * cfa[indx + w4] + 46.f * cfa[indx + w1]) + cfa[indx - w2] * (14.f * cfa[indx + w2] - 12.f * cfa[indx + w3] - 2.f * cfa[indx - w4] + 2.f * cfa[indx + w4] + 11.f * cfa[indx - w2]) + cfa[indx + w2] * (-12.f * cfa[indx - w3] + 2.f * (cfa[indx - w4] - cfa[indx + w4]) + 11.f * cfa[indx + w2]) + cfa[indx - w3] * (2.f * cfa[indx + w3] - 6.f * cfa[indx - w4] + 10.f * cfa[indx - w3]) + cfa[indx + w3] * (-6.f * cfa[indx + w4] + 10.f * cfa[indx + w3]) + cfa[indx - w4] * cfa[indx - w4] + cfa[indx + w4] * cfa[indx + w4]);
+                    float H_Stat = max(epssq, -18.f * cfai * (cfa[indx -  1] + cfa[indx +  1] + 2.f * (cfa[indx -  2] + cfa[indx +  2]) - cfa[indx -  3] - cfa[indx +  3]) - 2.f * cfai * (cfa[indx -  4] + cfa[indx +  4] - 19.f * cfai) - cfa[indx -  1] * (70.f * cfa[indx +  1] + 12.f * cfa[indx -  2] - 24.f * cfa[indx +  2] + 38.f * cfa[indx -  3] - 16.f * cfa[indx +  3] - 12.f * cfa[indx -  4] + 6.f * cfa[indx +  4] - 46.f * cfa[indx -  1]) + cfa[indx +  1] * (24.f * cfa[indx -  2] - 12.f * cfa[indx +  2] + 16.f * cfa[indx -  3] - 38.f * cfa[indx +  3] - 6.f * cfa[indx -  4] + 12.f * cfa[indx +  4] + 46.f * cfa[indx +  1]) + cfa[indx -  2] * (14.f * cfa[indx +  2] - 12.f * cfa[indx +  3] - 2.f * cfa[indx -  4] + 2.f * cfa[indx +  4] + 11.f * cfa[indx -  2]) + cfa[indx +  2] * (-12.f * cfa[indx -  3] + 2.f * (cfa[indx -  4] - cfa[indx +  4]) + 11.f * cfa[indx +  2]) + cfa[indx -  3] * (2.f * cfa[indx +  3] - 6.f * cfa[indx -  4] + 10.f * cfa[indx -  3]) + cfa[indx +  3] * (-6.f * cfa[indx +  4] + 10.f * cfa[indx +  3]) + cfa[indx -  4] * cfa[indx -  4] + cfa[indx +  4] * cfa[indx +  4]);
+
+                    VH_Dir[indx] = V_Stat / (V_Stat + H_Stat);
+                }
+            }
+
+            /**
+            * STEP 2: Calculate the low pass filter
+            */
+            // Step 2.1: Low pass filter incorporating green, red and blue local samples from the raw data
+
+            for (int row = 2; row < tileRows - 2; row++) {
+                for (int col = 2 + (FC(row, 0) & 1), indx = row * tileSize + col; col < tilecols - 2; col += 2, indx += 2) {
+                    lpf[indx>>1] = 0.25f * cfa[indx] + 0.125f * (cfa[indx - w1] + cfa[indx + w1] + cfa[indx - 1] + cfa[indx + 1]) + 0.0625f * (cfa[indx - w1 - 1] + cfa[indx - w1 + 1] + cfa[indx + w1 - 1] + cfa[indx + w1 + 1]);
+                }
+            }
+
+            /**
+            * STEP 3: Populate the green channel
+            */
+            // Step 3.1: Populate the green channel at blue and red CFA positions
+           for (int row = 4; row < tileRows - 4; row++) {
+               for (int col = 4 + (FC(row, 0) & 1), indx = row * tileSize + col; col < tilecols - 4; col += 2, indx += 2) {
+
+                    // Refined vertical and horizontal local discrimination
+                    float VH_Central_Value = VH_Dir[indx];
+                    float VH_Neighbourhood_Value = 0.25f * ((VH_Dir[indx - w1 - 1] + VH_Dir[indx - w1 + 1]) + (VH_Dir[indx + w1 - 1] + VH_Dir[indx + w1 + 1]));
+
+                    float VH_Disc = std::fabs(0.5f - VH_Central_Value) < std::fabs(0.5f - VH_Neighbourhood_Value) ? VH_Neighbourhood_Value : VH_Central_Value;
+
+                    // Cardinal gradients
+                    float N_Grad = eps + std::fabs(cfa[indx - w1] - cfa[indx + w1]) + std::fabs(cfa[indx] - cfa[indx - w2]) + std::fabs(cfa[indx - w1] - cfa[indx - w3]) + std::fabs(cfa[indx - w2] - cfa[indx - w4]);
+                    float S_Grad = eps + std::fabs(cfa[indx - w1] - cfa[indx + w1]) + std::fabs(cfa[indx] - cfa[indx + w2]) + std::fabs(cfa[indx + w1] - cfa[indx + w3]) + std::fabs(cfa[indx + w2] - cfa[indx + w4]);
+                    float W_Grad = eps + std::fabs(cfa[indx -  1] - cfa[indx +  1]) + std::fabs(cfa[indx] - cfa[indx -  2]) + std::fabs(cfa[indx -  1] - cfa[indx -  3]) + std::fabs(cfa[indx -  2] - cfa[indx -  4]);
+                    float E_Grad = eps + std::fabs(cfa[indx -  1] - cfa[indx +  1]) + std::fabs(cfa[indx] - cfa[indx +  2]) + std::fabs(cfa[indx +  1] - cfa[indx +  3]) + std::fabs(cfa[indx +  2] - cfa[indx +  4]);
+
+                    // Cardinal pixel estimations
+                    float N_Est = cfa[indx - w1] * (1.f + (lpf[indx>>1] - lpf[(indx - w2)>>1]) / (eps + lpf[indx>>1] + lpf[(indx - w2)>>1]));
+                    float S_Est = cfa[indx + w1] * (1.f + (lpf[indx>>1] - lpf[(indx + w2)>>1]) / (eps + lpf[indx>>1] + lpf[(indx + w2)>>1]));
+                    float W_Est = cfa[indx -  1] * (1.f + (lpf[indx>>1] - lpf[(indx -  2)>>1]) / (eps + lpf[indx>>1] + lpf[(indx -  2)>>1]));
+                    float E_Est = cfa[indx +  1] * (1.f + (lpf[indx>>1] - lpf[(indx +  2)>>1]) / (eps + lpf[indx>>1] + lpf[(indx +  2)>>1]));
+
+                    // Vertical and horizontal estimations
+                    float V_Est = (S_Grad * N_Est + N_Grad * S_Est) / (N_Grad + S_Grad);
+                    float H_Est = (W_Grad * E_Est + E_Grad * W_Est) / (E_Grad + W_Grad);
+
+                    // G@B and G@R interpolation
+                    rgb[1][indx] = VH_Disc * H_Est + (1.f - VH_Disc) * V_Est;
+
+                }
+            }
+            /**
+            * STEP 4: Populate the red and blue channels
+            */
+
+            // Step 4.1: Calculate P/Q diagonal local discrimination
+            for (int row = rcdBorder - 4; row < tileRows - rcdBorder + 4; row++) {
+                for (int col = rcdBorder - 4 + (FC(row, rcdBorder) & 1), indx = row * tileSize + col; col < tilecols - rcdBorder + 4; col += 2, indx += 2) {
+                    const float cfai = cfa[indx];
+
+                    float P_Stat = max(epssq, - 18.f * cfai * (cfa[indx - w1 - 1] + cfa[indx + w1 + 1] + 2.f * (cfa[indx - w2 - 2] + cfa[indx + w2 + 2]) - cfa[indx - w3 - 3] - cfa[indx + w3 + 3]) - 2.f * cfai * (cfa[indx - w4 - 4] + cfa[indx + w4 + 4] - 19.f * cfai) - cfa[indx - w1 - 1] * (70.f * cfa[indx + w1 + 1] - 12.f * cfa[indx - w2 - 2] + 24.f * cfa[indx + w2 + 2] - 38.f * cfa[indx - w3 - 3] + 16.f * cfa[indx + w3 + 3] + 12.f * cfa[indx - w4 - 4] - 6.f * cfa[indx + w4 + 4] + 46.f * cfa[indx - w1 - 1]) + cfa[indx + w1 + 1] * (24.f * cfa[indx - w2 - 2] - 12.f * cfa[indx + w2 + 2] + 16.f * cfa[indx - w3 - 3] - 38.f * cfa[indx + w3 + 3] - 6.f * cfa[indx - w4 - 4] + 12.f * cfa[indx + w4 + 4] + 46.f * cfa[indx + w1 + 1]) + cfa[indx - w2 - 2] * (14.f * cfa[indx + w2 + 2] - 12.f * cfa[indx + w3 + 3] - 2.f * (cfa[indx - w4 - 4] - cfa[indx + w4 + 4]) + 11.f * cfa[indx - w2 - 2]) - cfa[indx + w2 + 2] * (12.f * cfa[indx - w3 - 3] + 2.f * (cfa[indx - w4 - 4] - cfa[indx + w4 + 4]) + 11.f * cfa[indx + w2 + 2]) + cfa[indx - w3 - 3] * (2.f * cfa[indx + w3 + 3] - 6.f * cfa[indx - w4 - 4] + 10.f * cfa[indx - w3 - 3]) - cfa[indx + w3 + 3] * (6.f * cfa[indx + w4 + 4] + 10.f * cfa[indx + w3 + 3]) + cfa[indx - w4 - 4] * cfa[indx - w4 - 4] + cfa[indx + w4 + 4] * cfa[indx + w4 + 4]);
+                    float Q_Stat = max(epssq, - 18.f * cfai * (cfa[indx + w1 - 1] + cfa[indx - w1 + 1] + 2.f * (cfa[indx + w2 - 2] + cfa[indx - w2 + 2]) - cfa[indx + w3 - 3] - cfa[indx - w3 + 3]) - 2.f * cfai * (cfa[indx + w4 - 4] + cfa[indx - w4 + 4] - 19.f * cfai) - cfa[indx + w1 - 1] * (70.f * cfa[indx - w1 + 1] - 12.f * cfa[indx + w2 - 2] + 24.f * cfa[indx - w2 + 2] - 38.f * cfa[indx + w3 - 3] + 16.f * cfa[indx - w3 + 3] + 12.f * cfa[indx + w4 - 4] - 6.f * cfa[indx - w4 + 4] + 46.f * cfa[indx + w1 - 1]) + cfa[indx - w1 + 1] * (24.f * cfa[indx + w2 - 2] - 12.f * cfa[indx - w2 + 2] + 16.f * cfa[indx + w3 - 3] - 38.f * cfa[indx - w3 + 3] - 6.f * cfa[indx + w4 - 4] + 12.f * cfa[indx - w4 + 4] + 46.f * cfa[indx - w1 + 1]) + cfa[indx + w2 - 2] * (14.f * cfa[indx - w2 + 2] - 12.f * cfa[indx - w3 + 3] - 2.f * (cfa[indx + w4 - 4] - cfa[indx - w4 + 4]) + 11.f * cfa[indx + w2 - 2]) - cfa[indx - w2 + 2] * (12.f * cfa[indx + w3 - 3] + 2.f * (cfa[indx + w4 - 4] - cfa[indx - w4 + 4]) + 11.f * cfa[indx - w2 + 2]) + cfa[indx + w3 - 3] * (2.f * cfa[indx - w3 + 3] - 6.f * cfa[indx + w4 - 4] + 10.f * cfa[indx + w3 - 3]) - cfa[indx - w3 + 3] * (6.f * cfa[indx - w4 + 4] + 10.f * cfa[indx - w3 + 3]) + cfa[indx + w4 - 4] * cfa[indx + w4 - 4] + cfa[indx - w4 + 4] * cfa[indx - w4 + 4]);
+
+                    PQ_Dir[indx] = P_Stat / (P_Stat + Q_Stat);
+
+                }
+            }
+
+            // Step 4.2: Populate the red and blue channels at blue and red CFA positions
+            for (int row = rcdBorder - 3; row < tileRows - rcdBorder + 3; row++) {
+                for (int col = rcdBorder - 3 + (FC(row, rcdBorder - 1) & 1), indx = row * tileSize + col, c = 2 - FC(row, col); col < tilecols - rcdBorder + 3; col += 2, indx += 2) {
+
+                    // Refined P/Q diagonal local discrimination
+                    float PQ_Central_Value   = PQ_Dir[indx];
+                    float PQ_Neighbourhood_Value = 0.25f * (PQ_Dir[indx - w1 - 1] + PQ_Dir[indx - w1 + 1] + PQ_Dir[indx + w1 - 1] + PQ_Dir[indx + w1 + 1]);
+
+                    float PQ_Disc = (std::fabs(0.5f - PQ_Central_Value) < std::fabs(0.5f - PQ_Neighbourhood_Value)) ? PQ_Neighbourhood_Value : PQ_Central_Value;
+
+                    // Diagonal gradients
+                    float NW_Grad = eps + std::fabs(rgb[c][indx - w1 - 1] - rgb[c][indx + w1 + 1]) + std::fabs(rgb[c][indx - w1 - 1] - rgb[c][indx - w3 - 3]) + std::fabs(rgb[1][indx] - rgb[1][indx - w2 - 2]);
+                    float NE_Grad = eps + std::fabs(rgb[c][indx - w1 + 1] - rgb[c][indx + w1 - 1]) + std::fabs(rgb[c][indx - w1 + 1] - rgb[c][indx - w3 + 3]) + std::fabs(rgb[1][indx] - rgb[1][indx - w2 + 2]);
+                    float SW_Grad = eps + std::fabs(rgb[c][indx - w1 + 1] - rgb[c][indx + w1 - 1]) + std::fabs(rgb[c][indx + w1 - 1] - rgb[c][indx + w3 - 3]) + std::fabs(rgb[1][indx] - rgb[1][indx + w2 - 2]);
+                    float SE_Grad = eps + std::fabs(rgb[c][indx - w1 - 1] - rgb[c][indx + w1 + 1]) + std::fabs(rgb[c][indx + w1 + 1] - rgb[c][indx + w3 + 3]) + std::fabs(rgb[1][indx] - rgb[1][indx + w2 + 2]);
+
+                    // Diagonal colour differences
+                    float NW_Est = rgb[c][indx - w1 - 1] - rgb[1][indx - w1 - 1];
+                    float NE_Est = rgb[c][indx - w1 + 1] - rgb[1][indx - w1 + 1];
+                    float SW_Est = rgb[c][indx + w1 - 1] - rgb[1][indx + w1 - 1];
+                    float SE_Est = rgb[c][indx + w1 + 1] - rgb[1][indx + w1 + 1];
+
+                    // P/Q estimations
+                    float P_Est = (NW_Grad * SE_Est + SE_Grad * NW_Est) / (NW_Grad + SE_Grad);
+                    float Q_Est = (NE_Grad * SW_Est + SW_Grad * NE_Est) / (NE_Grad + SW_Grad);
+
+                    // R@B and B@R interpolation
+                    rgb[c][indx] = rgb[1][indx] + (1.f - PQ_Disc) * P_Est + PQ_Disc * Q_Est;
+
+                }
+            }
+
+            // Step 4.3: Populate the red and blue channels at green CFA positions
+            for (int row = rcdBorder; row < tileRows - rcdBorder; row++) {
+                for (int col = rcdBorder + (FC(row, rcdBorder - 1) & 1), indx = row * tileSize + col; col < tilecols - rcdBorder; col += 2, indx += 2) {
+
+                    // Refined vertical and horizontal local discrimination
+                    float VH_Central_Value   = VH_Dir[indx];
+                    float VH_Neighbourhood_Value = 0.25f * ((VH_Dir[indx - w1 - 1] + VH_Dir[indx - w1 + 1]) + (VH_Dir[indx + w1 - 1] + VH_Dir[indx + w1 + 1]));
+
+                    float VH_Disc = (std::fabs(0.5f - VH_Central_Value) < std::fabs(0.5f - VH_Neighbourhood_Value)) ? VH_Neighbourhood_Value : VH_Central_Value;
+                    float rgb1 = rgb[1][indx];
+                    float N1 = eps + std::fabs(rgb1 - rgb[1][indx - w2]);
+                    float S1 = eps + std::fabs(rgb1 - rgb[1][indx + w2]);
+                    float W1 = eps + std::fabs(rgb1 - rgb[1][indx -  2]);
+                    float E1 = eps + std::fabs(rgb1 - rgb[1][indx +  2]);
+
+                    float rgb1mw1 = rgb[1][indx - w1];
+                    float rgb1pw1 = rgb[1][indx + w1];
+                    float rgb1m1 = rgb[1][indx - 1];
+                    float rgb1p1 = rgb[1][indx + 1];
+                    for (int c = 0; c <= 2; c += 2) {
+                        // Cardinal gradients
+                        float SNabs = std::fabs(rgb[c][indx - w1] - rgb[c][indx + w1]);
+                        float EWabs = std::fabs(rgb[c][indx -  1] - rgb[c][indx +  1]);
+                        float N_Grad = N1 + SNabs + std::fabs(rgb[c][indx - w1] - rgb[c][indx - w3]);
+                        float S_Grad = S1 + SNabs + std::fabs(rgb[c][indx + w1] - rgb[c][indx + w3]);
+                        float W_Grad = W1 + EWabs + std::fabs(rgb[c][indx -  1] - rgb[c][indx -  3]);
+                        float E_Grad = E1 + EWabs + std::fabs(rgb[c][indx +  1] - rgb[c][indx +  3]);
+
+                        // Cardinal colour differences
+                        float N_Est = rgb[c][indx - w1] - rgb1mw1;
+                        float S_Est = rgb[c][indx + w1] - rgb1pw1;
+                        float W_Est = rgb[c][indx -  1] - rgb1m1;
+                        float E_Est = rgb[c][indx +  1] - rgb1p1;
+
+                        // Vertical and horizontal estimations
+                        float V_Est = (N_Grad * S_Est + S_Grad * N_Est) / (N_Grad + S_Grad);
+                        float H_Est = (E_Grad * W_Est + W_Grad * E_Est) / (E_Grad + W_Grad);
+
+                        // R@G and B@G interpolation
+                        rgb[c][indx] = rgb1 + (1.f - VH_Disc) * V_Est + VH_Disc * H_Est;
+
+                    }
+                }
+            }
+
+            for (int row = rowStart + rcdBorder; row < rowEnd - rcdBorder; ++row) {
+                for (int col = colStart + rcdBorder; col < colEnd - rcdBorder; ++col) {
+                    int idx = (row - rowStart) * tileSize + col - colStart ;
+                    red[row][col] = CLIP(rgb[0][idx] * 65535.f);
+                    green[row][col] = CLIP(rgb[1][idx] * 65535.f);
+                    blue[row][col] = CLIP(rgb[2][idx] * 65535.f);
+                }
+            }
+
+            if(plistener) {
+                progresscounter++;
+                if(progresscounter % 32 == 0) {
+#ifdef _OPENMP
+                    #pragma omp critical (rcdprogress)
+#endif
+                    {
+                        progress += (double)32 * ((tileSizeN) * (tileSizeN)) / (H * W);
+                        progress = progress > 1.0 ? 1.0 : progress;
+                        plistener->setProgress(progress);
+                    }
+                }
+            }
+
+        }
+    }
+
+    free(cfa);
+    free(rgb);
+    free(VH_Dir);
+    free(PQ_Dir);
+}
+
+    border_interpolate2(W, H, rcdBorder);
+
+    if (plistener) {
+        plistener->setProgress(1);
+    }
+    // -------------------------------------------------------------------------
+}
+
+} /* namespace */
diff --git a/rtengine/rtengine.h b/rtengine/rtengine.h
index 15dc69413..c257735e3 100644
--- a/rtengine/rtengine.h
+++ b/rtengine/rtengine.h
@@ -370,7 +370,7 @@ class ImageTypeListener
 {
 public :
     virtual ~ImageTypeListener() = default;
-    virtual void imageTypeChanged (bool isRaw, bool isBayer, bool isXtrans) = 0;
+    virtual void imageTypeChanged (bool isRaw, bool isBayer, bool isXtrans, bool is_Mono = false) = 0;
 };
 
 class WaveletListener
diff --git a/rtengine/settings.h b/rtengine/settings.h
index e1f11a183..ae46ea5a7 100644
--- a/rtengine/settings.h
+++ b/rtengine/settings.h
@@ -80,7 +80,6 @@ public:
     //  double          colortoningab; //
     //  double          decaction;
     //  bool            bw_complementary;
-    double          artifact_cbdl;
     double          level0_cbdl;
     double          level123_cbdl;
     int             nspot;
diff --git a/rtengine/shmap.cc b/rtengine/shmap.cc
index b34aac7af..f819bf1bd 100644
--- a/rtengine/shmap.cc
+++ b/rtengine/shmap.cc
@@ -124,7 +124,7 @@ void SHMap::update (Imagefloat* img, double radius, double lumi[3], bool hq, int
         rangefn[lutSize - 1] = 1e-15f;
 
         // We need one temporary buffer
-        const JaggedArray<float> buffer (W, H);
+        JaggedArray<float> buffer (W, H);
 
         // the final result has to be in map
         // for an even number of levels that means: map => buffer, buffer => map
@@ -255,7 +255,7 @@ void SHMap::updateL (float** L, double radius, bool hq, int skip)
         //printf("lut=%d rf5=%f rfm=%f\n thre=%f",lutSize, rangefn[5],rangefn[lutSize-10],thresh );
 
         // We need one temporary buffer
-        const JaggedArray<float> buffer (W, H);
+        JaggedArray<float> buffer (W, H);
 
         // the final result has to be in map
         // for an even number of levels that means: map => buffer, buffer => map
diff --git a/rtengine/simpleprocess.cc b/rtengine/simpleprocess.cc
index aecafa4c9..790892b72 100644
--- a/rtengine/simpleprocess.cc
+++ b/rtengine/simpleprocess.cc
@@ -2718,6 +2718,11 @@ private:
         if (params.raw.bayersensor.method == procparams::RAWParams::BayerSensor::getMethodString(procparams::RAWParams::BayerSensor::Method::PIXELSHIFT)) {
             params.raw.bayersensor.method = procparams::RAWParams::BayerSensor::getMethodString(params.raw.bayersensor.pixelShiftLmmse ? procparams::RAWParams::BayerSensor::Method::LMMSE : procparams::RAWParams::BayerSensor::Method::AMAZE);
         }
+
+        // Use Rcd instead of Amaze for fast export
+        if (params.raw.bayersensor.method == procparams::RAWParams::BayerSensor::getMethodString(procparams::RAWParams::BayerSensor::Method::AMAZE)) {
+            params.raw.bayersensor.method = procparams::RAWParams::BayerSensor::getMethodString(procparams::RAWParams::BayerSensor::Method::RCD);
+        }
     }
 
 private:
diff --git a/rtengine/sleefsseavx.c b/rtengine/sleefsseavx.c
index 574499571..c68f11ae0 100644
--- a/rtengine/sleefsseavx.c
+++ b/rtengine/sleefsseavx.c
@@ -970,6 +970,8 @@ static INLINE vmask vmaskf_isinf(vfloat d) { return vmaskf_eq(vabsf(d), vcast_vf
 static INLINE vmask vmaskf_ispinf(vfloat d) { return vmaskf_eq(d, vcast_vf_f(INFINITYf)); }
 static INLINE vmask vmaskf_isminf(vfloat d) { return vmaskf_eq(d, vcast_vf_f(-INFINITYf)); }
 static INLINE vmask vmaskf_isnan(vfloat d) { return vmaskf_neq(d, d); }
+// the following is equivalent to vorm(vmaskf_isnan(a), vmaskf_isnan(b)), but faster
+static INLINE vmask vmaskf_isnan(vfloat a, vfloat b) { return (vmask)_mm_cmpunord_ps(a, b); }
 static INLINE vfloat visinf2f(vfloat d, vfloat m) { return (vfloat)vandm(vmaskf_isinf(d), vorm(vsignbitf(d), (vmask)m)); }
 static INLINE vfloat visinff(vfloat d) { return visinf2f(d, vcast_vf_f(1.0f)); }
 
@@ -1201,7 +1203,7 @@ static INLINE vfloat xatan2f(vfloat y, vfloat x) {
   r = vself(vmaskf_isinf(y), vsubf(vcast_vf_f((float)(rtengine::RT_PI/2)), visinf2f(x, vmulsignf(vcast_vf_f((float)(rtengine::RT_PI/4)), x))), r);
   r = vself(vmaskf_eq(y, vcast_vf_f(0.0f)), vselfzero(vmaskf_eq(vsignf(x), vcast_vf_f(-1.0f)), vcast_vf_f((float)rtengine::RT_PI)), r);
 
-  return vself(vorm(vmaskf_isnan(x), vmaskf_isnan(y)), vcast_vf_f(NANf), vmulsignf(r, y));
+  return vself(vmaskf_isnan(x, y), vcast_vf_f(NANf), vmulsignf(r, y));
 }
 
 static INLINE vfloat xasinf(vfloat d) {
diff --git a/rtengine/stdimagesource.h b/rtengine/stdimagesource.h
index 304bc3d8d..62cca7d4f 100644
--- a/rtengine/stdimagesource.h
+++ b/rtengine/stdimagesource.h
@@ -52,6 +52,7 @@ public:
     ColorTemp   getSpotWB   (std::vector<Coord2D> &red, std::vector<Coord2D> &green, std::vector<Coord2D> &blue, int tran, double equal);
 
     eSensorType getSensorType() const {return ST_NONE;}
+    bool isMono() const {return false;}
 
     bool        isWBProviderReady ()
     {
diff --git a/rtgui/dirpyrequalizer.cc b/rtgui/dirpyrequalizer.cc
index 69d3af198..8ceeed96a 100644
--- a/rtgui/dirpyrequalizer.cc
+++ b/rtgui/dirpyrequalizer.cc
@@ -196,6 +196,7 @@ void DirPyrEqualizer::read (const ProcParams* pp, const ParamsEdited* pedited)
     */
     gamutlabConn.block (true);
     gamutlab->set_active (pp->dirpyrequalizer.gamutlab);
+    gamutlab->set_sensitive (pp->dirpyrequalizer.skinprotect != 0);
     gamutlabConn.block (false);
     lastgamutlab = pp->dirpyrequalizer.gamutlab;
 
@@ -339,6 +340,7 @@ void DirPyrEqualizer::adjusterChanged (Adjuster* a, double newval)
                                             Glib::ustring::format(std::fixed, std::setprecision(2), threshold->getValue()))
                                    );
         } else if (a == skinprotect) {
+            gamutlab->set_sensitive (skinprotect->getValue() != 0);
             listener->panelChanged (EvDirPyrEqualizerSkin,
                                     Glib::ustring::compose("%1",
                                             Glib::ustring::format(std::fixed, std::setprecision(2), skinprotect->getValue()))
diff --git a/rtgui/options.cc b/rtgui/options.cc
index 44f6942a7..0b9ef150b 100644
--- a/rtgui/options.cc
+++ b/rtgui/options.cc
@@ -552,7 +552,6 @@ void Options::setDefaults ()
     rtSettings.gamutICC = true;
     rtSettings.gamutLch = true;
     rtSettings.amchroma = 40;//between 20 and 140   low values increase effect..and also artifacts, high values reduces
-    rtSettings.artifact_cbdl = 4.;
     rtSettings.level0_cbdl = 0;
     rtSettings.level123_cbdl = 30;
 //locallab
@@ -1448,9 +1447,6 @@ void Options::readFromFile (Glib::ustring fname)
                                     rtSettings.viewinggreySc = keyFile.get_integer ("Color Management", "greySc");
                                 }
                 */
-                if (keyFile.has_key ("Color Management", "CBDLArtif")) {
-                    rtSettings.artifact_cbdl = keyFile.get_double ("Color Management", "CBDLArtif");
-                }
 
                 if (keyFile.has_key ("Color Management", "CBDLlevel0")) {
                     rtSettings.level0_cbdl = keyFile.get_double ("Color Management", "CBDLlevel0");
@@ -2055,7 +2051,6 @@ void Options::saveToFile (Glib::ustring fname)
         keyFile.set_integer ("Color Management", "CRI", rtSettings.CRI_color);
         keyFile.set_integer ("Color Management", "DenoiseLabgamma", rtSettings.denoiselabgamma);
         //keyFile.set_boolean ("Color Management", "Ciebadpixgauss", rtSettings.ciebadpixgauss);
-        keyFile.set_double  ("Color Management", "CBDLArtif", rtSettings.artifact_cbdl);
         keyFile.set_double  ("Color Management", "CBDLlevel0", rtSettings.level0_cbdl);
         keyFile.set_double  ("Color Management", "CBDLlevel123", rtSettings.level123_cbdl);
         //keyFile.set_double  ("Color Management", "Colortoningab", rtSettings.colortoningab);
diff --git a/rtgui/toolpanelcoord.cc b/rtgui/toolpanelcoord.cc
index 66c4ff5f2..e267b7bcc 100644
--- a/rtgui/toolpanelcoord.cc
+++ b/rtgui/toolpanelcoord.cc
@@ -273,7 +273,7 @@ ToolPanelCoordinator::~ToolPanelCoordinator()
     delete toolBar;
 }
 
-void ToolPanelCoordinator::imageTypeChanged(bool isRaw, bool isBayer, bool isXtrans)
+void ToolPanelCoordinator::imageTypeChanged (bool isRaw, bool isBayer, bool isXtrans, bool isMono)
 {
     if (isRaw) {
         if (isBayer) {
@@ -304,7 +304,20 @@ void ToolPanelCoordinator::imageTypeChanged(bool isRaw, bool isBayer, bool isXtr
             };
             idle_register.add(func, this);
         }
-        else {
+        else if (isMono) {
+            const auto func = [](gpointer data) -> gboolean {
+                ToolPanelCoordinator* const self = static_cast<ToolPanelCoordinator*>(data);
+
+                self->rawPanelSW->set_sensitive (true);
+                self->sensorbayer->FoldableToolPanel::hide();
+                self->sensorxtrans->FoldableToolPanel::hide();
+                self->preprocess->FoldableToolPanel::hide();
+                self->flatfield->FoldableToolPanel::show();
+
+                return FALSE;
+            };
+            idle_register.add(func, this);
+        } else {
             const auto func = [](gpointer data) -> gboolean {
                 ToolPanelCoordinator* const self = static_cast<ToolPanelCoordinator*>(data);
 
diff --git a/rtgui/toolpanelcoord.h b/rtgui/toolpanelcoord.h
index 265fca687..d53973231 100644
--- a/rtgui/toolpanelcoord.h
+++ b/rtgui/toolpanelcoord.h
@@ -223,7 +223,7 @@ public:
     // toolpanellistener interface
     void panelChanged(rtengine::ProcEvent event, const Glib::ustring& descr);
 
-    void imageTypeChanged(bool isRaw, bool isBayer, bool isXtrans);
+    void imageTypeChanged (bool isRaw, bool isBayer, bool isXtrans, bool isMono = false);
     // profilechangelistener interface
     void profileChange(const rtengine::procparams::PartialProfile* nparams, rtengine::ProcEvent event, const Glib::ustring& descr, const ParamsEdited* paramsEdited = nullptr);
     void setDefaults(rtengine::procparams::ProcParams* defparams);