
Emergency Situation Classification

Spectrogram classification
Use a CNN-LSTM model
Use a pretrained CNN with a Bi-LSTM
Text classification
Use a sequence model
Use KoBERT
Multimodal
Baseline model

    from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, Dropout, concatenate
    from keras.models import Model

    # Define text input
    text_input = Input(shape=(100,), name='text_input')

    # Define image input
    image_input = Input(shape=(128, 128, 3), name='image_input')

    # Define text model
    text_model = Dense(64, activation='relu')(text_input)
    text_model = Dropout(0.5)(text_model)

    # Define image model
    image_model = Conv2D(32, (3, 3), activation='relu')(image_input)
    image_model = MaxPooling2D((2, 2))(image_model)
    image_model = Dropout(0.5)(image_model)
    image_model = Flatten()(image_model)

    # Concatenate the text and image models
    merged_model = concatenate([text_model, image_model])

    # Add a fully connected layer
    merged_model = Dense(64, activation='relu')(merged_model)
    merged_model = Dropout(0.5)(merged_model)

    # Add the output layer
    output = Dense(1, activation='sigmoid')(merged_model)

    # Define the model with both inputs and output
    model = Model(inputs=[text_input, image_input], outputs=output)

    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Text model: BERT; image model: the model chosen above

    from keras.models import Model
    from keras.layers import Input, Dense, LSTM, Dropout, Reshape, Conv2D, MaxPooling2D, Flatten, concatenate
    from transformers import TFBertModel

    # Define input shapes
    text_input_shape = (None,)          # variable-length sequence of token ids
    image_input_shape = (128, 128, 3)

    # Define the text model using KoBERT
    text_input = Input(shape=text_input_shape, dtype='int32')
    # Note: if this checkpoint only provides PyTorch weights, pass from_pt=True.
    bert_model = TFBertModel.from_pretrained('monologg/kobert')
    # Index 1 of the BERT output is the pooled [CLS] representation.
    text_output = bert_model(text_input)[1]

    # Define the image model using convolutional layers and LSTM
    image_input = Input(shape=image_input_shape)
    conv1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(image_input)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, kernel_size=(3, 3), activation='relu')(pool1)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(128, kernel_size=(3, 3), activation='relu')(pool2)
    # Treat every spatial position as one time step with 128 channels.
    reshape = Reshape((-1, 128))(conv3)
    lstm = LSTM(64, return_sequences=True)(reshape)
    dropout = Dropout(0.5)(lstm)
    flatten = Flatten()(dropout)

    # Combine the text and image models
    combined = concatenate([text_output, flatten])

    # Add a dense layer with 128 units and ReLU activation
    dense1 = Dense(128, activation='relu')(combined)

    # Add a dropout layer to prevent overfitting
    dropout2 = Dropout(0.5)(dense1)

    # Add a dense layer with 7 units and softmax activation for classification
    output = Dense(7, activation='softmax')(dropout2)

    # Define the model
    model = Model(inputs=[text_input, image_input], outputs=output)

    # Compile the model (categorical_crossentropy expects one-hot labels)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

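To see how large the image branch above becomes before it is concatenated with the text features, the layer output sizes can be worked out by hand. The sketch below is only shape arithmetic. It assumes the Keras defaults used in the code (3x3 convolutions with 'valid' padding and stride 1, 2x2 max pooling with stride 2). The helper names are hypothetical and not part of any library.

```python
def conv_out(size, kernel=3):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output length of max pooling whose stride equals the pool size."""
    return size // pool


def image_branch_shapes(side=128, lstm_units=64):
    """Follow a square input through conv-pool-conv-pool-conv, Reshape, LSTM, Flatten."""
    s = conv_out(side)    # Conv2D(32)
    s = pool_out(s)       # MaxPooling2D
    s = conv_out(s)       # Conv2D(64)
    s = pool_out(s)       # MaxPooling2D
    s = conv_out(s)       # Conv2D(128)
    timesteps = s * s     # Reshape((-1, 128)): one time step per spatial position
    flat = timesteps * lstm_units  # LSTM(return_sequences=True) followed by Flatten
    return s, timesteps, flat


print(image_branch_shapes())  # (28, 784, 50176)
```

Under these assumptions the LSTM runs over 784 time steps and the flattened image vector has 50,176 values, so it is far wider than the single pooled vector coming from the text branch.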
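A multimodal model is an option, but we judged that filtering through two separate models would work better than a single multimodal model.
Reason: audio alone can only distinguish between normal and a request for help.
First, decide whether the situation is an emergency from the spectrogram image.
Then classify the exact situation from the text.
When we actually ran it, the results did not support this.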
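For reference, the two-stage filtering idea described above can be sketched as a small cascade. This is a minimal illustration, not the code that was actually run. The function name, the two model callables, and the 0.5 threshold are all assumptions made for the example.

```python
from typing import Callable, Optional


def two_stage_classify(
    spectrogram: object,
    transcript: str,
    is_emergency: Callable[[object], float],    # hypothetical stage-1 model: emergency probability
    classify_situation: Callable[[str], str],   # hypothetical stage-2 model: situation label
    threshold: float = 0.5,                     # assumed cut-off, not taken from the notes
) -> Optional[str]:
    """Return a situation label, or None when stage 1 judges the input normal."""
    score = is_emergency(spectrogram)
    if score < threshold:
        # Stage 1 filtered the sample out, so the text model is never called.
        return None
    # Only samples flagged as emergencies reach the text classifier.
    return classify_situation(transcript)


# Usage with stand-in models (plain functions instead of trained networks).
label = two_stage_classify(
    spectrogram=[0.2, 0.7],
    transcript="there is a fire",
    is_emergency=lambda spec: 0.9,
    classify_situation=lambda text: "fire",
)
print(label)  # fire
```

One consequence of this design is that any emergency missed by the first stage never reaches the text classifier, so the first stage's recall bounds the recall of the whole pipeline.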