1. Spectrogram classification
a. Use a CNN-LSTM model
b. Use a pretrained CNN with a Bi-LSTM (see the sketch below)
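A minimal sketch of approach 1-b, assuming a MobileNetV2 backbone, 128x128x3 spectrogram images, and 7 classes (all illustrative choices, not fixed by these notes):
from keras.layers import Input, Reshape, Bidirectional, LSTM, Dense
from keras.models import Model
from keras.applications import MobileNetV2
spec_input = Input(shape=(128, 128, 3), name='spectrogram_input')
# Pretrained CNN turns the spectrogram into a 4x4x1280 feature map
backbone = MobileNetV2(include_top=False, weights='imagenet', input_shape=(128, 128, 3))
features = backbone(spec_input)
# Read the 4 feature-map rows as a length-4 sequence of 4*1280-d vectors
seq = Reshape((4, 4 * 1280))(features)
# Bi-LSTM scans the sequence in both directions
hidden = Bidirectional(LSTM(64))(seq)
output = Dense(7, activation='softmax')(hidden)
model = Model(spec_input, output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])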
2. Text classification
• Use a sequence-style model
• Use KoBERT (see the sketch below)
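A minimal sketch of the KoBERT classifier; the sequence length of 64 and the 7-class softmax head are assumptions for illustration:
from keras.layers import Input, Dense, Dropout
from keras.models import Model
from transformers import TFBertModel
max_len = 64
token_ids = Input(shape=(max_len,), dtype='int32', name='token_ids')
# may need from_pt=True if the checkpoint only ships PyTorch weights
bert = TFBertModel.from_pretrained('monologg/kobert')
pooled = bert(token_ids)[1]  # pooler output ([CLS] representation)
x = Dropout(0.5)(pooled)
output = Dense(7, activation='softmax')(x)
model = Model(token_ids, output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])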
3. Multimodal
a. Baseline model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, Dropout, concatenate
from keras.models import Model
# Define text input
text_input = Input(shape=(100,), name='text_input')
# Define image input
image_input = Input(shape=(128, 128, 3), name='image_input')
# Define text model
text_model = Dense(64, activation='relu')(text_input)
text_model = Dropout(0.5)(text_model)
# Define image model
image_model = Conv2D(32, (3,3), activation='relu')(image_input)
image_model = MaxPooling2D((2,2))(image_model)
image_model = Dropout(0.5)(image_model)
image_model = Flatten()(image_model)
# Concatenate the text and image models
merged_model = concatenate([text_model, image_model])
# Add a fully connected layer
merged_model = Dense(64, activation='relu')(merged_model)
merged_model = Dropout(0.5)(merged_model)
# Add the output layer
output = Dense(1, activation='sigmoid')(merged_model)
# Define the model with both inputs and output
model = Model(inputs=[text_input, image_input], outputs=output)
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
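Hypothetical usage of the baseline above with dummy arrays, just to show that the named inputs route each array to the right branch:
import numpy as np
X_text = np.random.rand(32, 100)           # 100-d text feature vectors
X_image = np.random.rand(32, 128, 128, 3)  # 128x128 RGB spectrograms
y = np.random.randint(0, 2, size=(32, 1))  # binary labels
model.fit({'text_input': X_text, 'image_input': X_image}, y, epochs=1)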
b. Text model: BERT (KoBERT); image model: the one chosen above
from keras.models import Model
from keras.layers import Input, Dense, LSTM, Dropout, Reshape, Conv2D, MaxPooling2D, Flatten, concatenate
from transformers import TFBertModel
# Define input shapes
text_input_shape = (None,)
image_input_shape = (128, 128, 3)
# Define the text model using KoBERT
text_input = Input(shape=text_input_shape, dtype='int32')
# may need from_pt=True if the checkpoint only ships PyTorch weights
bert_model = TFBertModel.from_pretrained('monologg/kobert')
text_output = bert_model(text_input)[1]  # pooler output ([CLS] representation)
# Define the image model using convolutional layers and LSTM
image_input = Input(shape=image_input_shape)
conv1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(image_input)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(64, kernel_size=(3, 3), activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
conv3 = Conv2D(128, kernel_size=(3, 3), activation='relu')(pool2)
reshape = Reshape((-1, 128))(conv3)
lstm = LSTM(64, return_sequences=True)(reshape)
dropout = Dropout(0.5)(lstm)
flatten = Flatten()(dropout)
# Combine the text and image models
combined = concatenate([text_output, flatten])
# Add a dense layer with 128 units and ReLU activation
dense1 = Dense(128, activation='relu')(combined)
# Add a dropout layer to prevent overfitting
dropout2 = Dropout(0.5)(dense1)
# Add a dense layer with 7 units and softmax activation for classification
output = Dense(7, activation='softmax')(dropout2)
# Define the model
model = Model(inputs=[text_input, image_input], outputs=output)
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
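Hypothetical usage with dummy data; in practice the token ids must come from a KoBERT-compatible tokenizer (e.g. monologg's KoBertTokenizer), since the stock BertTokenizer does not match KoBERT's SentencePiece vocabulary:
import numpy as np
ids = np.random.randint(1, 8002, size=(32, 64))   # fake token ids (KoBERT vocab is 8002)
imgs = np.random.rand(32, 128, 128, 3)
y = np.eye(7)[np.random.randint(0, 7, size=32)]   # one-hot labels for the 7 classes
model.fit([ids, imgs], y, epochs=1)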
4. Multimodal would be fine, but we judged that filtering through two separate models is better than a single multimodal model
• Reason: from the audio alone, we can only distinguish 'normal' from 'call for help'
• So first judge whether the situation is an emergency from the image (the spectrogram)
• Then determine the exact situation class from the text (see the sketch after this list)
• When we actually ran it, however, the results did not turn out that way.
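For reference, a minimal sketch of the intended two-stage filter; emergency_model (binary image classifier) and situation_model (7-way text classifier) are hypothetical names for the models described above:
import numpy as np
def classify(spectrogram, token_ids, threshold=0.5):
    # Stage 1: the spectrogram model decides emergency vs. normal
    p_emergency = float(emergency_model.predict(spectrogram[np.newaxis])[0, 0])
    if p_emergency < threshold:
        return 'normal'
    # Stage 2: the text model assigns the specific situation class
    probs = situation_model.predict(token_ids[np.newaxis])[0]
    return int(np.argmax(probs))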