I want to create a Siamese neural network in Python with a scheme similar to this:
The idea is to pass two different images through two networks that share weights; their outputs would then be concatenated and passed through several more layers to produce the final output.
For now, I have this code, which I put together more or less from the Keras documentation:
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

def create_base_network(input_shape):
    input = Input(shape=input_shape)
    x = Flatten()(input)
    x = Conv2D(96, (11, 11), activation='relu', padding='same', name='conv1')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), name='pool1')(x)
    #LRN1
    x = Conv2D(384, (3, 3), activation='relu', padding='same', name='conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool2')(x)
    #LRN2
    x = Conv2D(384, (3, 3), activation='relu', padding='same', name='conv3')(x)
    x = Conv2D(384, (3, 3), activation='relu', padding='same', name='conv4')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='conv5')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool3')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    return Model(input, x)
(I don't know how to add the LRN layers in Keras; that would be a separate question.) I would also like to know how to interpret the scheme's nomenclature relative to the code. That is, what does it mean, for example, when a convolutional layer is labeled (3x3, 256.2)?
This would be the base network; it must be called twice, once per image, sharing the weights between the two calls.
Next, I would like to know how to concatenate the two branches of this network so I can keep adding layers. For now this is my only question; I'll ask more as they come up.
Let's see, I'm not an expert in combining networks, but I would say you should define the whole architecture before compiling the model.
As a first tip, don't start with Flatten. The idea of convolutional networks is to begin with learned filters that extract the elementary features of the image before embedding them in a vector (which is what the Flatten layer does), at which point we no longer have pixels but features obtained from the filters.
The approach would be to start with convolutions, pooling, padding, etc., and then add a Flatten and a few Dense layers that discriminate the final result.
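A minimal sketch of that ordering (convolutions and pooling first, Flatten and Dense at the end). The 35x35 single-channel input shape and the fc1 width of 128 are placeholders for illustration; adjust them to your data:

```python
# Sketch only: convolutions/pooling extract features, Flatten comes last.
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

def create_base_network(input_shape):
    inp = Input(shape=input_shape)
    # Learned filters operate on raw pixels first
    x = Conv2D(96, (11, 11), activation='relu', padding='same', name='conv1')(inp)
    x = MaxPooling2D((3, 3), strides=(2, 2), name='pool1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool2')(x)
    # Only now collapse the feature maps into a vector
    x = Flatten()(x)
    x = Dense(128, activation='relu', name='fc1')(x)
    return Model(inp, x)

base = create_base_network((35, 35, 1))  # assumed input shape
```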
Greetings, and I hope this helps.
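As for sharing weights and concatenating: in the Keras functional API, calling the same Model instance on two inputs reuses its weights automatically, and Concatenate joins the two embeddings. This is only a sketch with a reduced toy base network; the 32x32 shape, layer sizes, and the final sigmoid output are assumptions for illustration:

```python
# Hypothetical siamese wiring: one base Model, called on two inputs.
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Flatten, Dense, Concatenate)
from tensorflow.keras.models import Model

def create_base_network(input_shape):
    # Toy stand-in for the real base network, for illustration only
    inp = Input(shape=input_shape)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
    x = MaxPooling2D((2, 2))(x)
    x = Flatten()(x)
    x = Dense(64, activation='relu')(x)
    return Model(inp, x)

shape = (32, 32, 1)                 # assumed input shape
base = create_base_network(shape)   # built ONCE, so weights are shared

input_a = Input(shape=shape)
input_b = Input(shape=shape)
emb_a = base(input_a)               # branch 1
emb_b = base(input_b)               # branch 2, same weights as branch 1

merged = Concatenate()([emb_a, emb_b])
out = Dense(1, activation='sigmoid')(merged)  # e.g. a same/different score

siamese = Model([input_a, input_b], out)
```

Because `base` is a single Model object, both branches literally use the same weight tensors; any further Dense layers after the Concatenate belong to the joint head only.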