-
Auto Encoder Dims ( 32-2048 ?:help ) : Auto encoder dims setting, affects overall ability of the model to learn faces.
-
Inter Dims ( 32-2048 ?:help ) : Inter dims setting, affects overall ability of the model to learn faces, should be equal to or higher than Auto Encoder dims. AMP ONLY.
-
Encoder Dims ( 16-256 ?:help ) : Encoder dims setting, affects ability of the encoder to learn faces.
-
Decoder Dims ( 16-256 ?:help ) : Decoder dims setting, affects ability of the decoder to recreate faces.
-
Decoder Mask Dims ( 16-256 ?:help ) : Mask decoder dims setting, affects quality of the learned masks. May or may not affect some other aspects of training.
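The ranges listed for the dims settings, together with the AMP rule that inter dims should not be lower than auto encoder dims, can be expressed as a small sanity check. This is only a sketch: `DIM_RANGES`, `check_dims` and the key names are hypothetical and chosen for illustration, they are not taken from DFL's code.

```python
# Hypothetical helper, not part of DFL: checks dims against the ranges shown in the prompts.
DIM_RANGES = {
    "ae_dims": (32, 2048),
    "inter_dims": (32, 2048),   # AMP only
    "e_dims": (16, 256),
    "d_dims": (16, 256),
    "d_mask_dims": (16, 256),
}

def check_dims(settings):
    """Return a list of problems; an empty list means every value fits."""
    problems = []
    for name, value in settings.items():
        if name not in DIM_RANGES:
            continue
        low, high = DIM_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    # AMP: inter dims should be equal to or higher than auto encoder dims.
    if "inter_dims" in settings and "ae_dims" in settings:
        if settings["inter_dims"] < settings["ae_dims"]:
            problems.append("inter_dims should be >= ae_dims")
    return problems
```

Calling `check_dims` on a planned configuration before starting a long training run is a cheap way to catch a value typed outside the accepted range.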
-
Morph factor ( 0.1 .. 0.5 ?:help ) : Affects how much the model will morph your predicted faces to look and express more like your SRC; the typical and recommended value is 0.5. (I still need to test this personally; I haven't used AMP yet, so I don't know whether a higher or lower value is better.)
-
Masked training ( y/n ?:help ) : Prioritizes training of what's masked (default mask or applied XSeg mask). Available only for WF and HEAD face types. Disabling it trains the whole sample area (including background) at the same priority as the face itself. Default value is y (enabled).
-
Eyes and mouth priority ( y/n ?:help ) : Attempts to fix problems with eyes and mouth (including teeth) by training them at higher priority, can improve their sharpness/level of detail too.
-
Uniform_yaw ( y/n ?:help ) : Helps with training of profile faces. Forces the model to train evenly on all faces depending on their yaw and prioritizes profile faces, which may cause frontal faces to train slower. Enabled by default during pretraining. Can be used while RW is enabled to improve generalization of profile/side faces, or when RW is disabled to improve quality and sharpness/detail of those faces. Useful when your source dataset doesn't have many profile shots. Can help lower loss values. Default value is n (disabled).
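One way to picture "training evenly on all faces depending on their yaw" is to group samples into yaw bins and pick a bin first, so a handful of profile shots gets drawn as often as a large pile of frontal ones. The sketch below is only an illustration of that idea: the function name, the bin count and the (name, yaw) sample format are assumptions, not DFL's actual sampler.

```python
import random
from collections import defaultdict

def sample_uniform_yaw(samples, bin_count=8, rng=random):
    """Pick one sample so that every non-empty yaw bin is equally likely.

    `samples` is a list of (name, yaw_degrees) pairs with yaw in [-90, 90].
    """
    bins = defaultdict(list)
    bin_width = 180.0 / bin_count
    for name, yaw in samples:
        index = min(int((yaw + 90.0) / bin_width), bin_count - 1)
        bins[index].append(name)
    chosen_bin = rng.choice(sorted(bins))   # each populated bin has the same chance
    return rng.choice(bins[chosen_bin])
```

With 99 frontal faces and a single profile face, the profile face is drawn roughly half of the time, which also shows why frontal faces may train slower with this option enabled.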
-
Blur out mask ( y/n ?:help ) : Blurs the area outside of the masked area to make it smoother. With masked training enabled, the background is trained with lower priority than the face area, so it's more prone to artifacts and noise. You can combine blur out mask with background style power to get a background that is both closer to the background of DST faces and smoother due to the additional blurring this option provides. The same XSeg model must be used to apply masks to both the SRC and DST datasets.
-
Place models and optimizer on GPU ( y/n ?:help ) : Enabling GPU optimizer puts all the load on your GPU, which greatly improves performance (iteration time) but leads to higher VRAM usage. Disabling this feature offloads some of the optimizer's work to the CPU, which decreases GPU load and VRAM usage and lets you achieve a higher batch size or run more demanding models at the cost of longer iteration times. If you get an OOM (out of memory) error and don't want to lower your batch size or disable some other feature, disable this option instead. Default value is y (enabled).
-
Use AdaBelief optimizer? ( y/n ?:help ) : AdaBelief (AB) is a new model optimizer which increases model accuracy and quality of trained faces; when this option is enabled it replaces the default RMSProp optimizer. However, those improvements come at the cost of higher VRAM usage. When using AdaBelief, LRD is optional but still recommended, and it should be enabled before running GAN training. Default value is y (enabled).
-
Use learning rate dropout ( y/n/cpu ?:help ) : LRD is used to accelerate training of faces and reduce sub-pixel shake (it reduces face shaking and to some degree can reduce lighting flicker as well). It's primarily used in the following cases:
-
before disabling RW, when loss values aren't improving by a lot anymore, this can help model to generalize faces a bit more
-
after RW has been disabled and you've trained the model well enough, enabling it near the end of training will result in more detailed, stable faces that are less prone to flicker
This option affects VRAM usage, so if you run into OOM errors you can run it on CPU at the cost of 20% slower iteration times, or just lower your batch size. For a more detailed explanation of LRD and the order of enabling main features during training, please refer to FAQ Question 8.
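The general idea behind learning rate dropout is that on every iteration only a random subset of the weights is updated, while the rest keep their value for that step. The sketch below shows it on a plain gradient step to keep it short; the function name and the `keep_prob` value are assumptions for illustration, and DFL applies the idea inside its RMSProp/AdaBelief optimizers, whose code may differ.

```python
import random

def sgd_step_with_lr_dropout(params, grads, lr=0.1, keep_prob=0.3, rng=random):
    """Plain gradient step where each parameter's update is kept with probability keep_prob."""
    updated = []
    for value, grad in zip(params, grads):
        keep = 1.0 if rng.random() < keep_prob else 0.0   # per-parameter binary mask
        updated.append(value - lr * keep * grad)
    return updated
```

With `keep_prob=1.0` this is an ordinary update of every weight, and with `keep_prob=0.0` nothing changes; values in between update a different random subset each iteration.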
-
Enable random warp of samples ( y/n ?:help ) : Random warp is used to generalize a model so that it correctly learns face features and expressions in the initial training stage, but as long as it's enabled the model may have trouble learning fine detail. For that reason it's recommended to keep this feature enabled as long as your faces are still improving (loss values are decreasing and faces in the preview window keep getting better). Once everything looks correct and loss isn't decreasing anymore, disable it to start learning details. From then on you don't re-enable it unless you ruin the results by applying too high values for certain settings (style power, true face, etc.) or you want to reuse the model for a new target video with the same source. When reusing a model with a combination of both new SRC and DST, you always start training with RW enabled. Default value is y (enabled).
-
Enable HSV power ( 0.0 .. 0.3 ?:help ) : Applies random hue, saturation and brightness changes to only your SRC dataset during training to improve color stability (reduce flicker); it may also affect color matching of the final result. This option slightly averages out the colors of your SRC set, as the HSV shift of SRC samples is based only on color information from SRC samples. It can be combined with color transfer (CT), the power (quality) of which this option reduces, or used without it if you happen to get better results without CT but just need to make the colors of the resulting face slightly more stable and consistent. Requires your SRC dataset to have lots of variety in terms of lighting conditions (direction, strength and color tone). Recommended value is 0.05.
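To give a feel for what a random HSV change of a given power means, here is a minimal sketch that jitters a single RGB pixel with Python's standard `colorsys` module. It is a generic augmentation example under the assumption that the power is the maximum shift per component; the function name is made up and this is not DFL's implementation.

```python
import colorsys
import random

def random_hsv_shift(rgb, power=0.05, rng=random):
    """Shift hue, saturation and value of one RGB pixel (floats in 0..1) by up to +/- power."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + rng.uniform(-power, power)) % 1.0              # hue wraps around
    s = min(max(s + rng.uniform(-power, power), 0.0), 1.0)  # clamp to valid range
    v = min(max(v + rng.uniform(-power, power), 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)
```

At the recommended 0.05 the change to each sample is subtle, which fits the description of the option as slightly averaging out colors rather than recoloring the set.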
-
GAN power ( 0.0 .. 10.0 ?:help ) : GAN stands for Generative Adversarial Network and in the case of DFL 2.0 it is implemented as an additional way of training to get more detailed/sharp faces. This option is adjustable on a scale from 0.0 to 10.0 and it should only be enabled once the model is more or less fully trained (after you've disabled random warp of samples and enabled LRD). It's recommended to use low values like 0.01. Make sure to back up your model before you start GAN training (in case you don't like the results, get artifacts or your model collapses). Once enabled, two more settings will be presented to adjust internal parameters of GAN:
-
[1/8th of RES] GAN patch size ( 3-640 ?:help ) : Improves quality of GAN training at the cost of higher VRAM usage, default value is 1/8th of your resolution.
-
GAN dimensions ( 4-64 ?:help ) : The dimensions of the GAN network. The higher the dimensions, the more VRAM is required, but it can also improve quality. You can get sharp edges even at the lowest setting, and because of this the default value of 16 is recommended, but you can reduce it to 12-14 to save some performance if you need to.
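As a quick worked example of the patch size default, the sketch below computes one eighth of the resolution and keeps it inside the 3-640 range shown in the prompt. The function name is hypothetical and the clamping is an assumption based on that displayed range.

```python
def default_gan_patch_size(resolution, low=3, high=640):
    """One eighth of the resolution, clamped to the 3-640 range shown in the prompt."""
    return min(max(resolution // 8, low), high)
```

For a 256 resolution model this gives 32, and for a 320 resolution model it gives 40.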
-
'True face' power. ( 0.0000 .. 1.0 ?:help ) : True face training with a variable power setting lets you set the model discriminator to a higher or lower value. It tries to make the final face look more like SRC; as a side effect it can make faces appear sharper, but it can also alter lighting and color matching and in extreme cases even make faces appear to change angle, as the model will try to generate a face that looks closer to the training sample. As with GAN, this feature should only be enabled once random warp is disabled and the model is fairly well trained. Consider making a backup before enabling this feature. Never use high values; a typical value is 0.01 but you can use even lower ones like 0.001. It has a small performance impact. Default value is 0.0 (disabled).
-
Face style power ( 0.0..100.0 ?:help ) and Background style power ( 0.0..100.0 ?:help ) : These settings control style transfer of either the face (FSP) or background (BSP) part of the image. They are used to transfer the color information from your target/destination faces (data_dst) over to the final predicted faces, thus improving the lighting and color match, but high values can cause the predicted face to look less like your source face and more like your target face. Start with small values like 0.001-0.1 and increase or decrease them depending on your needs. This feature has an impact on memory usage and can cause an OOM error, forcing you to lower your batch size in order to use it. For Background Style Power (BSP) higher values can be used, as we don't care much about preserving SRC backgrounds; the value recommended by DFL for BSP is 2.0, but you can also experiment with different values for the background. Consider making a backup before enabling this feature, as it can also lead to artifacts and model collapse. Default value is 0.0 (disabled).
-
Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : This feature is used to match the colors of your data_src to the data_dst so that the final result has a similar skin color/tone to the data_dst and doesn't change colors when the face moves around. That problem is commonly referred to as flickering/flicker/color shift/color change, and it may happen if various face angles were taken from various sources that had different lighting conditions or were color graded differently. There are several options to choose from:
-
none: because sometimes less is better and in some cases you might get better results without any color transfer during training.
-
rct (reinhard color transfer): based on: https://www.cs.tau.ac.il/~turkel/imagepapers/ColorTransfer.pdf
-
lct (linear color transfer): Matches the color distribution of the target image to that of the source image using a linear transform.
-
mkl (Monge-Kantorovitch linear): based on: http://www.mee.tcd.ie/~sigmedia/pmwiki/uploads/Main.Publications/fpitie07b.pdf
-
idt (Iterative Distribution Transfer): based on: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.158.1052&rep=rep1&type=pdf
-
sot (sliced optimal transfer): based on: https://dcoeurjo.github.io/OTColorTransfer/
Most color transfers have little to no effect on performance or VRAM usage, with the exception of SOT, which has a performance cost during training and can severely slow down the merging process if used during merging. Other color transfers like IDT may also have a performance impact during merging.
Using color transfers is not always required, but it quite often helps and in some cases is absolutely mandatory. Remember also that enabling them acts as an augmentation of the set, effectively creating new conditions for all of the SRC samples. This increases the complexity of the training data, which can result in higher loss when enabled and naturally means the model will have to be trained longer to reach the same state compared to training without color transfer, where faces never change colors that much. This option can be combined with Random HSV Power, which provides additional augmentation of the SRC set based on the colors of the SRC set alone (unlike CT, which augments SRC based on DST). That effectively averages its colors slightly, provides additional color conditions CT methods may not achieve, and also reduces the effect of CT slightly (referred to as CT quality reduction by iperov in official notes).
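The simplest family of color transfers works by matching per-channel statistics: SRC colors are shifted and scaled so that their mean and spread match those of DST. The sketch below illustrates that idea on plain lists of pixels. It is a simplification and not DFL's rct code: Reinhard's method does this matching in a perceptual color space rather than directly on raw channels, and the function names here are made up.

```python
from statistics import fmean, pstdev

def match_channel_stats(src_channel, dst_channel):
    """Shift and scale src values so their mean and std match those of dst."""
    src_mean, src_std = fmean(src_channel), pstdev(src_channel)
    dst_mean, dst_std = fmean(dst_channel), pstdev(dst_channel)
    scale = dst_std / src_std if src_std > 0 else 1.0
    return [(value - src_mean) * scale + dst_mean for value in src_channel]

def transfer_color(src_pixels, dst_pixels):
    """Apply the per-channel match to lists of (c0, c1, c2) pixels, clamped to 0..1."""
    src_channels = list(zip(*src_pixels))
    dst_channels = list(zip(*dst_pixels))
    matched = [match_channel_stats(s, d) for s, d in zip(src_channels, dst_channels)]
    return [tuple(min(max(c, 0.0), 1.0) for c in pixel) for pixel in zip(*matched)]
```

Because every SRC sample gets recolored toward whichever DST sample it is paired with, the same SRC face shows up under many color conditions, which is the augmentation effect described above.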
-
Enable gradient clipping ( y/n ?:help ) : This feature is implemented to prevent so-called model collapse/corruption, which may occur when using various features of DFL 2.0. It has a small performance impact, so if you really don't want to use it you must enable auto backups, as a collapsed model cannot recover and must be scrapped, with training started all over again. Default value is n (disabled), but since the performance impact is so low and it can save you a lot of time by preventing model collapse, it's worth leaving it enabled. Model collapse is most likely to happen when using Style Powers, so if you're using them it's highly advised to enable gradient clipping or backups (you can also make backups manually).
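Gradient clipping in general limits how large a single update can get, so one bad batch cannot throw the weights far enough to wreck the model. A common form is clipping by the combined norm of all gradients, sketched below. The function name and the `max_norm` value are assumptions for illustration, and the exact clipping DFL applies may differ.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients down together when their combined L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]
```

Gradients that are already small pass through unchanged, which is why the option costs little during normal training and only steps in when something spikes.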
-
Enable pretraining mode ( y/n ?:help ) : Enables the pretraining process, which uses a dataset of random people to initially pretrain your model. After training it to around 400k-500k iterations, such a model can then be used when starting training with the actual data_src and data_dst you want to train. It saves time because the model will already know what faces should look like, so it takes less time for faces to appear clearly when training (make sure to disable pretraining when you train on your actual data_src and data_dst). Models using -D architecture variants must be pretrained, and it's also highly recommended to pretrain all models.