Batch Normalization

Batch Normalization is one of the most important components of modern neural networks.

The paper's title already states its purpose: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

  • accelerate training
  • reduce internal covariate shift (ICS)

Independent and identically distributed (IID)


If our data is independent and identically distributed, training the model becomes simpler and its predictive ability improves. One important data-preparation step is whitening, which is used to:

  • reduce correlation between features => independent
  • give every feature zero mean and unit variance => identically distributed
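The two whitening goals above can be sketched with ZCA whitening, a minimal NumPy illustration (the toy correlated data and helper name `zca_whiten` are assumptions for this example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 500 samples, 3 correlated features (made up for illustration).
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

def zca_whiten(X, eps=1e-5):
    """Zero-center the data, then decorrelate and rescale to unit variance."""
    Xc = X - X.mean(axis=0)                        # zero mean
    cov = Xc.T @ Xc / Xc.shape[0]                  # feature covariance
    U, S, _ = np.linalg.svd(cov)                   # eigendecomposition of cov
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # ZCA whitening matrix
    return Xc @ W

Xw = zca_whiten(X)
print(np.allclose(Xw.mean(axis=0), 0.0, atol=1e-8))                          # zero mean
print(np.allclose(np.cov(Xw, rowvar=False, bias=True), np.eye(3), atol=1e-3))  # identity covariance
```

After whitening, the covariance matrix is (approximately) the identity: features are uncorrelated with unit variance.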

What is the problem with internal covariate shift? In general, the inputs to each layer are not IID:

  • Each layer must keep re-adapting its parameters to the shifting input distribution produced by earlier layers, which slows down learning
  • Activations drift into the saturation region as the network grows deeper, so the network stops learning early
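Batch Normalization addresses this by normalizing each feature over the mini-batch, then applying a learnable scale and shift. A minimal sketch of the forward pass (the function name and toy input are assumptions; the computation follows Algorithm 1 of the paper):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Per-feature batch normalization over a mini-batch of shape (N, D)."""
    mu = x.mean(axis=0)                    # mini-batch mean, per feature
    var = x.var(axis=0)                    # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta            # learnable scale and shift

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # shifted, scaled activations
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6))  # ~0 for every feature
print(out.std(axis=0).round(3))   # ~1 for every feature
```

With gamma = 1 and beta = 0, every feature leaving the layer has zero mean and unit variance regardless of how the incoming distribution shifts.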

What is covariate shift? It is a change in the distribution of a layer's inputs during training. Batch Normalization counters it and, as a side effect, gives the network two useful invariances:

  • weight scale invariance
  • data scale invariance
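Weight scale invariance can be checked directly: rescaling a layer's weight matrix by a constant leaves the batch-normalized output unchanged, because the normalization divides the scale back out. A small sketch (layer shapes and the `bn` helper are assumptions for illustration):

```python
import numpy as np

def bn(z, eps=1e-5):
    """Plain batch normalization (gamma = 1, beta = 0) over axis 0."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(2)
x = rng.normal(size=(128, 8))   # mini-batch of inputs
W = rng.normal(size=(8, 4))     # layer weights

a = bn(x @ W)           # BN over the layer's pre-activations
b = bn(x @ (10.0 * W))  # same layer with weights scaled by 10

print(np.allclose(a, b, atol=1e-4))  # True: BN output is invariant to the weight scale
```

The same argument applies to rescaling the input data, which is the data scale invariance listed above.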