Implementing and Evaluating Drop-Activation in Keras

I implemented Drop-Activation in Keras and evaluated its performance. This post is a record of that work.

The code for the experiments below is available on GitHub.
github.com

1. Overview of this article

A brief explanation of Drop-Activation, a regularization technique for deep learning
An implementation of Drop-Activation in Keras
An evaluation of Drop-Activation on CIFAR-10

2. Overview of Drop-Activation

Drop-Activation is a regularization technique for deep learning. The original paper is here.
[1811.05850] Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization

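A well-known regularization technique for deep learning is Dropout. Dropout is effective against overfitting, but it is known to degrade model performance when used together with Batch Normalization.
Drop-Activation is conceptually similar to Dropout, yet it can coexist with Batch Normalization.

The idea behind Drop-Activation is shown below. It is fairly simple.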

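Assumptions
- Activation function f, input to the activation function x, output of the activation function f(x)
- Probability p of deactivating the activation function (0<=p<=1)
During training
- For each activation, decide at random whether it is active or inactive (inactive with probability p), then compute the output.
  - Active: the output is f(x).
  - Inactive: the output is x. (The activation function becomes a plain linear function.)
At test time
- The output is computed as px+(1-p)f(x).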
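The rule above can be sketched in plain Python for a single scalar input. This is only an illustration, not the layer used in this article: the function names `drop_activation_train` and `drop_activation_test` and the use of `random.Random` are assumptions made for the example. The test-time formula is the expected value of the training-time output. With f = ReLU it reduces to x for x >= 0 and p*x for x < 0, which is a leaky ReLU with slope p.

```python
import random


def relu(x):
    """ReLU activation for a scalar."""
    return max(0.0, x)


def drop_activation_train(x, p, f=relu, rng=random):
    """Training: with probability p return x (identity), otherwise f(x)."""
    if rng.random() < p:
        return x      # deactivated: the activation becomes linear
    return f(x)       # activated: the usual nonlinearity


def drop_activation_test(x, p, f=relu):
    """Test: deterministic mix, the expected value of the training output."""
    return p * x + (1.0 - p) * f(x)


if __name__ == "__main__":
    rng = random.Random(0)
    p, x = 0.05, -2.0
    samples = [drop_activation_train(x, p, rng=rng) for _ in range(100000)]
    print("mean of training outputs:", sum(samples) / len(samples))
    print("test output             :", drop_activation_test(x, p))
```

Averaging many training-time samples for the same input should land close to the test-time value, which is the sense in which the test formula is consistent with the training procedure.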

Figure: Illustration of Drop-Activation (quoted from the original paper)

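The accuracy results from the original paper are quoted below. The baseline activation function is ReLU. The paper compares against Randomized ReLU and also evaluates Drop-Activation combined with Cutout and AutoAugment. The results look quite good.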

Figure: fig2-3 (quoted from the original paper)

Figure: table1 (quoted from the original paper)

Figure: table4 (quoted from the original paper)

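3. Implementation

I built a Drop-Activation layer using Keras's support for custom layers.
Writing your own Keras layers - Keras Documentation
For the part that deactivates activations with a given probability during training, I reused the existing dropout.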
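As a rough sketch of how a dropout op can double as a mask generator: applying (inverted) dropout to a tensor of ones gives 0 for dropped entries and 1/(1 - rate) for kept ones, because the kept values are scaled up. The Keras backend's `K.dropout` is documented to scale the tensor in this way. Multiplying by (1 - rate) turns the result into a plain 0/1 mask. The pure-Python `inverted_dropout` and `binary_mask` below are hypothetical stand-ins written for this example, not Keras functions.

```python
import random


def inverted_dropout(values, rate, rng):
    """Hypothetical stand-in for a dropout op: zero each entry with
    probability `rate` and scale the kept entries by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [0.0 if rng.random() < rate else v / keep for v in values]


def binary_mask(n, rate, rng):
    """Build a 0/1 mask (0: drop, 1: retain) from dropout applied to ones."""
    scaled = inverted_dropout([1.0] * n, rate, rng)
    # Undo the 1 / (1 - rate) scaling so kept entries are 1 again
    return [m * (1.0 - rate) for m in scaled]


if __name__ == "__main__":
    # Same seed for both calls, so the same entries are dropped
    print(inverted_dropout([1.0] * 8, 0.25, random.Random(0)))
    print(binary_mask(8, 0.25, random.Random(0)))
```

Without the rescaling step, an expression such as (1 - mask) * x + mask * f(x) would no longer select between x and f(x), since the kept mask entries would be larger than 1.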

I also referred to this implementation.
GitHub - JGuillaumin/drop_activation_tf: DropActivation implementation for TensorFlow and Keras

"""
Reference:
    drop-activation original paper : https://arxiv.org/abs/1811.05850
    implementation in keras by JGuillaumin : https://github.com/JGuillaumin/drop_activation_tf
"""

from keras import backend as K
from keras.layers import Layer
import tensorflow as tf

class DropActivation(Layer):
    def __init__(self, rate=0.05, seed=None, activation_func=None, **kwargs):
        """
        Args:
            rate : drop rate (probability of deactivating the activation function),
            seed : random seed,
            activation_func : activation function to apply; K.relu is used if None.
        """
        super(DropActivation, self).__init__(**kwargs)
        self.supports_masking = True

        self.RATE = rate
        self.SEED = seed
        self.ACTIVATION_FUNC = activation_func if activation_func is not None else K.relu


    def call(self, inputs, training=None):
        if training is None:
            training = K.learning_phase()
        
        def oup_tensor_in_training():
            # input tensor
            inp_tensor = tf.convert_to_tensor(inputs)
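            # K.dropout scales kept entries by 1 / (1 - rate), so rescale
            # to obtain a binary mask (0: drop, 1: retain)
            mask = K.ones_like(inp_tensor, dtype=K.floatx())
            mask = K.dropout(mask, level=self.RATE, seed=self.SEED)
            mask = mask * (1.0 - self.RATE)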
            # output tensor
            masked = (1.0 - mask) * inp_tensor
            not_masked = mask * self.ACTIVATION_FUNC(inp_tensor)
            oup_tensor = masked + not_masked
            return oup_tensor

        def oup_tensor_in_test():
            # input tensor
            inp_tensor = tf.convert_to_tensor(inputs)
            # expected value of the training mask (= retain probability)
            mask = 1.0 - self.RATE
            # output tensor
            masked = (1.0 - mask) * inp_tensor
            not_masked = mask * self.ACTIVATION_FUNC(inp_tensor)
            oup_tensor = masked + not_masked
            return oup_tensor

        outputs = K.in_train_phase(oup_tensor_in_training, oup_tensor_in_test, training)

        return outputs

    def compute_output_shape(self, input_shape):
        return input_shape

    def get_config(self):
        config = {
            'rate': self.RATE,
            'seed': self.SEED,
            'activation_func': self.ACTIVATION_FUNC,
        }

        base_config = super(DropActivation, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

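4. Evaluation

I evaluated Drop-Activation. The setup is as follows.

Purpose
- Measure accuracy when Drop-Activation is applied.
Task
- Classify images into 10 classes on CIFAR-10.
Image classification model
- ResNet20v1 (the architecture used in the Keras documentation: CIFAR-10 ResNet - Keras Documentation)
- Basic data augmentation (horizontal flip, image shift)
How Drop-Activation is applied
- Base activation function: ReLU
- Drop-Activation is applied to every ReLU in ResNet20v1
Conditions compared
- Drop rate of Drop-Activation (0, 0.05, 0.1)
  - With drop rate = 0, Drop-Activation has no effect
- With or without Cutout, Random Erasing, and Mixup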
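The comparison conditions can also be enumerated programmatically. The sketch below is one possible way to list them. The function name `experiment_grid` and the dictionary keys are assumptions for illustration, and the restriction of drop rate 0.1 to the run without extra augmentation follows the rows of the results table in this article.

```python
def experiment_grid():
    """List the 13 compared conditions as dictionaries."""
    # Extra augmentations applied on top of the basic flip/shift augmentation
    augment_sets = [
        (),
        ("cutout",),
        ("random_erasing",),
        ("mixup",),
        ("cutout", "mixup"),
        ("random_erasing", "mixup"),
    ]
    grid = []
    for augment in augment_sets:
        # Drop rate 0.1 appears only in the run without extra augmentation
        rates = (0.0, 0.05, 0.1) if not augment else (0.0, 0.05)
        for rate in rates:
            grid.append({"drop_rate": rate, "augment": augment})
    return grid


if __name__ == "__main__":
    for cond in experiment_grid():
        print(cond)
```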

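The results are summarized below. Note that each condition was run only once, so run-to-run variance is not accounted for.

These results show the following.

Applying Drop-Activation does not necessarily improve accuracy (val acc): at drop rate 0.05 it rose by 0.002 to 0.004 in the runs without Mixup and fell by 0.001 to 0.006 in the runs with Mixup.
Val loss improved in every case.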

The original paper and the GitHub repository I used as a reference report good results, so the effect may be larger with bigger models.
GitHub - JGuillaumin/drop_activation_tf: DropActivation implementation for TensorFlow and Keras
Since it is easy to implement, I think it is worth including as one option when designing an architecture.

droprate	cutout	random erasing	mixup	train acc	train loss	val acc	val loss
0	-	-	-	0.991	0.151	0.9	0.518
0.05	-	-	-	0.977	0.179	0.902	0.476
0.1	-	-	-	0.971	0.196	0.899	0.497
0	use	-	-	0.963	0.229	0.904	0.453
0.05	use	-	-	0.951	0.257	0.906	0.442
0	-	use	-	0.954	0.253	0.902	0.45
0.05	-	use	-	0.937	0.292	0.906	0.428
0	-	-	use	0.827	1.093	0.909	0.418
0.05	-	-	use	0.817	1.117	0.903	0.39
0	use	-	use	0.812	1.135	0.904	0.422
0.05	use	-	use	0.799	1.159	0.903	0.402
0	-	use	use	0.797	1.155	0.905	0.425
0.05	-	use	use	0.786	1.187	0.901	0.403
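
To make the comparison easier to read, the rows above can be paired by augmentation setting and the change from drop rate 0 to 0.05 computed. The snippet below is a small helper written for this purpose. The values are copied from the table, while the names `RESULTS` and `deltas` are assumptions for illustration.

```python
# (cutout, random erasing, mixup) -> {drop rate: (val acc, val loss)}
RESULTS = {
    ("-", "-", "-"): {0.0: (0.900, 0.518), 0.05: (0.902, 0.476)},
    ("use", "-", "-"): {0.0: (0.904, 0.453), 0.05: (0.906, 0.442)},
    ("-", "use", "-"): {0.0: (0.902, 0.450), 0.05: (0.906, 0.428)},
    ("-", "-", "use"): {0.0: (0.909, 0.418), 0.05: (0.903, 0.390)},
    ("use", "-", "use"): {0.0: (0.904, 0.422), 0.05: (0.903, 0.402)},
    ("-", "use", "use"): {0.0: (0.905, 0.425), 0.05: (0.901, 0.403)},
}


def deltas(results):
    """Return {setting: (val acc change, val loss change)} for rate 0 -> 0.05."""
    out = {}
    for setting, by_rate in results.items():
        acc0, loss0 = by_rate[0.0]
        acc1, loss1 = by_rate[0.05]
        # Round to the three decimals reported in the table
        out[setting] = (round(acc1 - acc0, 3), round(loss1 - loss0, 3))
    return out


if __name__ == "__main__":
    for setting, (d_acc, d_loss) in deltas(RESULTS).items():
        print(setting, "val acc %+.3f" % d_acc, "val loss %+.3f" % d_loss)
```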