LeakyReLU layer
It allows a small gradient when the unit is not active:
>>> layer = tf.keras.layers.LeakyReLU()
>>> output = layer([-3.0, -1.0, 0.0, 2.0])
>>> list(output.numpy())
[-0.9, -0.3, 0.0, 2.0]
>>> layer = tf.keras.layers.LeakyReLU(alpha=0.1)
>>> output = layer([-3.0, -1.0, 0.0, 2.0])
>>> list(output.numpy())
[-0.3, -0.1, 0.0, 2.0]
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the batch axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
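As a minimal sketch of the input_shape argument described above (the layer sizes and data here are made up for illustration), LeakyReLU can be used as the first layer of a Sequential model:

import tensorflow as tf
import numpy as np

# Hypothetical model: LeakyReLU is the first layer, so input_shape is given
# (a tuple of integers, not including the batch axis).
model = tf.keras.Sequential([
    tf.keras.layers.LeakyReLU(alpha=0.1, input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

x = np.array([[-3.0, -1.0, 0.0, 2.0]])
print(model.layers[0](x).shape)  # (1, 4) -- LeakyReLU preserves the input shape
print(model(x).shape)            # (1, 1)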
The ReLU Activation Function in Python
The ReLU (Rectified Linear Unit) activation function is the most common choice of activation function in the world of deep learning. ReLU delivers state-of-the-art results while being very cheap to compute.
The basic idea of the ReLU activation function is the following:
Return 0 if the input is negative, otherwise return the input as it is.
We can express this mathematically as f(x) = max(0, x).
The pseudocode for ReLU is as follows:
if input > 0:
    return input
else:
    return 0
In this tutorial we will learn how to implement our own ReLU function, look at some of its drawbacks, and learn about a better version of it.
Implementing the ReLU Function
Let's write our own implementation of ReLU in Python. We will use the built-in max function, as shown below.
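The definition is a one-liner (it also appears in the full listing further down):

def relu(x):
    return max(0.0, x)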
To test the function, let's run it on a few inputs.
x = 1.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -10.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 0.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 15.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -20.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
Complete code
def relu(x):
    return max(0.0, x)

x = 1.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -10.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 0.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 15.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -20.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
Applying Relu on (1.0) gives 1.0
Applying Relu on (-10.0) gives 0.0
Applying Relu on (0.0) gives 0.0
Applying Relu on (15.0) gives 15.0
Applying Relu on (-20.0) gives 0.0
The Derivative of ReLU
Let's see what the gradient (derivative) of the ReLU function looks like. Differentiating it gives: f'(x) = 1 for x > 0 and f'(x) = 0 for x < 0.
We can see that for values of x less than zero, the gradient is 0. This means that the weights and biases of some neurons are never updated, which can be a problem during training.
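As a small illustrative sketch (this helper is not part of the tutorial's code), the zero gradient for negative inputs is easy to see:

def der_relu(x):
    # Gradient of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

print(der_relu(2.0))   # 1.0 -- this weight keeps learning
print(der_relu(-5.0))  # 0.0 -- no update flows through this unit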
To solve this problem, we have the Leaky ReLU function.
The Leaky ReLU Function
The Leaky ReLU function is a refinement of the ordinary ReLU function. To solve the problem of zero gradients for negative values, Leaky ReLU gives negative inputs an extremely small linear component of x.
Mathematically, we can express Leaky ReLU as: f(x) = x for x > 0, and f(x) = a*x for x <= 0.
Here a is a small constant such as 0.01, the value used in the implementation below.
Graphically, Leaky ReLU looks like ReLU, but with a small non-zero slope on the negative side.
The Gradient of Leaky ReLU
Let's compute the gradient of the Leaky ReLU function: f'(x) = 1 for x > 0, and f'(x) = a for x <= 0.
In this case, the gradient for negative inputs is non-zero, so the neuron keeps being updated.
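A corresponding sketch of the Leaky ReLU gradient with a = 0.01 (again, a hypothetical helper added for illustration):

def der_leaky_relu(x):
    # Gradient of Leaky ReLU: 1 for positive inputs, a small constant otherwise
    return 1.0 if x > 0 else 0.01

print(der_leaky_relu(-5.0))  # 0.01 -- a small but non-zero update still flows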
Implementing Leaky ReLU
The implementation of Leaky ReLU is given below:
def leaky_relu(x):
    if x > 0:
        return x
    else:
        return 0.01 * x
Let's try it out on a few inputs.
x = 1.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = -10.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = 0.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = 15.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = -20.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
Complete code
The complete code for Leaky ReLU is given below:
def leaky_relu(x):
    if x > 0:
        return x
    else:
        return 0.01 * x

x = 1.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = -10.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = 0.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = 15.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = -20.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
Applying Leaky Relu on (1.0) gives 1.0
Applying Leaky Relu on (-10.0) gives -0.1
Applying Leaky Relu on (0.0) gives 0.0
Applying Leaky Relu on (15.0) gives 15.0
Applying Leaky Relu on (-20.0) gives -0.2
Conclusion
This tutorial covered the ReLU function in Python. We also saw an improved version of it, the Leaky ReLU function, which solves the problem of zero gradients for negative values.
LeakyReLU
Parameters:
- negative_slope (float) – Controls the angle of the negative slope (which is used for negative input values). Default: 1e-2
- inplace (bool) – can optionally do the operation in-place. Default: False
Shape:
- Input: (*), where * means any number of dimensions
- Output: (*), same shape as the input
>>> m = nn.LeakyReLU(0.1)
>>> input = torch.randn(2)
>>> output = m(input)
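As a minimal, self-contained sketch of how the two parameters above are typically used (the network layout is made up for illustration):

import torch
import torch.nn as nn

# negative_slope sets the slope used for negative inputs; inplace=True reuses
# the input tensor's memory instead of allocating a new output tensor.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.LeakyReLU(negative_slope=0.01, inplace=True),
    nn.Linear(8, 1),
)

x = torch.randn(3, 4)
print(model(x).shape)  # torch.Size([3, 1])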
Activation Functions
When building your Deep Learning model, activation functions are an important choice to make. In this article, we’ll review the main activation functions, their implementations in Python, and advantages/disadvantages of each.
Linear Activation
Linear activation is the simplest form of activation. In that case, \(f(x)\) is just the identity. If you use a linear activation function in every layer, your whole neural network ends up being a linear regression:
\[\hat{y} = \sigma(h) = h = W^{(2)} h_1 = W^{(2)} W^{(1)} X = W' X\]
Linear activations are only needed in the last layer, when you are solving a regression problem. The whole idea behind the other activation functions is to create non-linearity, so the network can model highly non-linear data that cannot be fitted by a simple regression!
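A tiny numerical check of this collapse (the matrix shapes are chosen arbitrarily for the example):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1))   # one input sample with 5 features
W1 = rng.normal(size=(3, 5))  # first linear "layer"
W2 = rng.normal(size=(2, 3))  # second linear "layer"

# Two stacked linear layers...
two_layers = W2 @ (W1 @ X)
# ...are exactly one linear layer with W' = W2 @ W1.
one_layer = (W2 @ W1) @ X

print(np.allclose(two_layers, one_layer))  # True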
ReLU
ReLU stands for Rectified Linear Unit. It is a widely used activation function. The formula is simply the maximum between \(x\) and 0: \(f(x) = \max(x, 0)\).
To implement this in Python, you might simply use the snippet below:
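A minimal implementation consistent with the formula above (the plotting code further down calls it under the name relu):

def relu(x):
    # maximum between x and 0
    return max(0, x)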
The derivative of the ReLU is 1 for positive inputs and 0 for negative ones:
def der_relu(x):
    if x <= 0:
        return 0
    if x > 0:
        return 1
Let's simulate some data and plot them to illustrate this activation function:

import numpy as np
import matplotlib.pyplot as plt

# Data which will go through activations
x = np.linspace(-10, 10, 100)

plt.figure(figsize=(12, 8))
plt.plot(x, list(map(lambda x: relu(x), x)), label="relu")
plt.plot(x, list(map(lambda x: der_relu(x), x)), label="derivative")
plt.title("ReLU")
plt.legend()
plt.show()
Advantage | Easy to implement and quick to compute |
Disadvantage | Problematic when we have lots of negative values, since the outcome is always 0 and leads to the death of the neuron |
Leaky-ReLU
Leaky-ReLU is an improvement over ReLU that addresses its main drawback: it handles negative values well while still bringing non-linearity. It is defined as \(f(x) = \max(0.01x, x)\).
The derivative is also simple to compute: 1 for positive inputs and 0.01 for negative ones.
def leaky_relu(x):
    return max(0.01 * x, x)

def der_leaky_relu(x):
    if x < 0:
        return 0.01
    if x >= 0:
        return 1
And we can plot the result of this activation function:

plt.figure(figsize=(12, 8))
plt.plot(x, list(map(lambda x: leaky_relu(x), x)), label="leaky-relu")
plt.plot(x, list(map(lambda x: der_leaky_relu(x), x)), label="derivative")
plt.title("Leaky-ReLU")
plt.legend()
plt.show()
Advantage | Leaky-ReLU overcomes the problem of the death of the neuron linked to a zero-slope. |
Disadvantage | The factor 0.01 is arbitrary, and can be tuned (PReLU, for Parametric ReLU) |
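As a sketch of that idea, a parametric variant simply treats the negative slope as a parameter instead of hard-coding 0.01 (in PReLU this parameter is learned during training; here it is just a function argument):

def parametric_relu(x, alpha=0.25):
    # like leaky_relu, but the negative slope alpha is tunable
    return max(alpha * x, x)

print(parametric_relu(-4.0, alpha=0.1))  # -0.4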
ELU
Exponential Linear Units (ELU) try to make the mean activations closer to zero, which speeds up learning.
\(f(x) = x\) if \(x>0\), and \(a(e^x -1)\) otherwise, where \(a\) is a positive constant.
def elu(x):
    # a is taken to be 1 in this implementation
    if x > 0:
        return x
    else:
        return np.exp(x) - 1

def der_elu(x):
    if x > 0:
        return 1
    else:
        return np.exp(x)
plt.figure(figsize=(12, 8))
plt.plot(x, list(map(lambda x: elu(x), x)), label="elu")
plt.plot(x, list(map(lambda x: der_elu(x), x)), label="derivative")
plt.title("ELU")
plt.legend()
plt.show()
Advantage | Can achieve higher accuracy than ReLU |
Disadvantage | Same as ReLU, and \(a\) needs to be tuned |
Softplus
The Softplus function is a continuous approximation of ReLU. It is given by: \(f(x) = \ln(1 + e^x)\).
The derivative of the softplus function is the sigmoid: \(f'(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}}\).
You can implement them in Python:

def softplus(x):
    return np.log(1 + np.exp(x))

def der_softplus(x):
    return 1 / (1 + np.exp(x)) * np.exp(x)
plt.figure(figsize=(12, 8))
plt.plot(x, list(map(lambda x: softplus(x), x)), label="softplus")
plt.plot(x, list(map(lambda x: der_softplus(x), x)), label="derivative")
plt.title("Softplus")
plt.legend()
plt.show()
Advantage | Softplus is continuous and differentiable everywhere. It is interesting to use when the values are between 0 and 1. |
Disadvantage | As ReLU, problematic when we have lots of negative values, since the outcome gets really close to 0 and might lead to the death of the neuron |
Sigmoid
Sigmoid is one of the most common activation functions in the literature these days. The sigmoid function has the following form: \(\sigma(x) = \frac{1}{1 + e^{-x}}\).
The derivative of the sigmoid is: \(\sigma'(x) = \sigma(x)\,(1 - \sigma(x))\).
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def der_sigmoid(x):
    return 1 / (1 + np.exp(-x)) * (1 - 1 / (1 + np.exp(-x)))
plt.figure(figsize=(12, 8))
plt.plot(x, list(map(lambda x: sigmoid(x), x)), label="sigmoid")
plt.plot(x, list(map(lambda x: der_sigmoid(x), x)), label="derivative")
plt.title("Sigmoid")
plt.legend()
plt.show()
It might not be obvious when considering data between \(-10\) and \(10\) only, but the sigmoid is subject to the vanishing gradient problem. This means that the gradient tends to vanish as \(x\) takes large absolute values.
Since the gradient is the sigmoid times one minus the sigmoid, it can be computed efficiently: if we keep track of the sigmoid values from the forward pass, we can compute the gradient very quickly.
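A minimal sketch of that caching trick, reusing the sigmoid function defined above (the variable names are made up):

s = sigmoid(2.0)       # computed once during the forward pass
grad = s * (1 - s)     # reused for the gradient: no extra exponential needed
print(round(grad, 3))  # 0.105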
Advantage | Interpretability of the output mapped between 0 and 1, compute gradient quickly |
Disadvantage | Vanishing Gradient |
Hyperbolic Tangent
Hyperbolic tangent is quite similar to the sigmoid function, except it maps the input between -1 and 1: \(\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\).
The derivative is computed in the following way: \(\tanh'(x) = 1 - \tanh^2(x)\).
def hyperb(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def der_hyperb(x):
    return 1 - ((np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x)))**2
plt.figure(figsize=(12, 8))
plt.plot(x, list(map(lambda x: hyperb(x), x)), label="hyperbolic")
plt.plot(x, list(map(lambda x: der_hyperb(x), x)), label="derivative")
plt.title("Hyperbolic")
plt.legend()
plt.show()
Advantage | Outputs are centered around 0 (in the range -1 to 1), which makes it efficient in the middle layers |
Disadvantage | Vanishing gradient too |
Arctan
This activation function maps the input values into the range \((-\pi/2, \pi/2)\). It is the inverse of the tangent function.
def arctan(x):
    return np.arctan(x)

def der_arctan(x):
    return 1 / (1 + x**2)
plt.figure(figsize=(12, 8))
plt.plot(x, list(map(lambda x: arctan(x), x)), label="arctan")
plt.plot(x, list(map(lambda x: der_arctan(x), x)), label="derivative")
plt.title("Arctan")
plt.legend()
plt.show()
Advantage | Simple derivative |
Disadvantage | Vanishing gradient |