
Emergency Situation Classification

1. Spectrogram classification
   a. Use a CNN-LSTM model
   b. Use a pretrained CNN with a Bi-LSTM
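A minimal sketch of the CNN + Bi-LSTM idea from item 1, not the model actually used in the project. The small convolutional stack is an untrained stand-in for the pretrained CNN, the 128x128x3 input shape is borrowed from the multimodal code in these notes, and `build_cnn_bilstm` and `num_classes=2` (normal vs. help request) are assumptions made for illustration.

```python
import numpy as np
from keras.layers import (Input, Conv2D, MaxPooling2D, Permute, Reshape,
                          Bidirectional, LSTM, Dropout, Dense)
from keras.models import Model


def build_cnn_bilstm(input_shape=(128, 128, 3), num_classes=2):
    """Hypothetical helper: CNN feature extractor followed by a Bi-LSTM."""
    height, width, _ = input_shape
    inputs = Input(shape=input_shape, name="spectrogram")

    # Two conv/pool stages; 'same' padding keeps the shapes easy to track.
    x = Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = MaxPooling2D((2, 2))(x)
    x = Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = MaxPooling2D((2, 2))(x)

    # Treat the width axis as time: (h, w, c) -> (w, h, c) -> (w, h * c).
    steps, feat_height = width // 4, height // 4
    x = Permute((2, 1, 3))(x)
    x = Reshape((steps, feat_height * 64))(x)

    # Read the sequence of time steps in both directions.
    x = Bidirectional(LSTM(64))(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)


model = build_cnn_bilstm()
# Dummy batch of two blank spectrogram images, just to check the shapes.
probs = model.predict(np.zeros((2, 128, 128, 3), dtype="float32"), verbose=0)
```

Swapping in a pretrained backbone would mean replacing the two conv/pool stages and adjusting the reshape to that backbone's feature-map size.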
2. Text classification
   - Use a sequence model
   - Use KoBERT
3. Multimodal
   a. Baseline model
    from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, Dropout, concatenate
    from keras.models import Model

    # Define text input
    text_input = Input(shape=(100,), name='text_input')

    # Define image input
    image_input = Input(shape=(128, 128, 3), name='image_input')

    # Define text model
    text_model = Dense(64, activation='relu')(text_input)
    text_model = Dropout(0.5)(text_model)

    # Define image model
    image_model = Conv2D(32, (3, 3), activation='relu')(image_input)
    image_model = MaxPooling2D((2, 2))(image_model)
    image_model = Dropout(0.5)(image_model)
    image_model = Flatten()(image_model)

    # Concatenate the text and image models
    merged_model = concatenate([text_model, image_model])

    # Add a fully connected layer
    merged_model = Dense(64, activation='relu')(merged_model)
    merged_model = Dropout(0.5)(merged_model)

    # Add the output layer
    output = Dense(1, activation='sigmoid')(merged_model)

    # Define the model with both inputs and output
    model = Model(inputs=[text_input, image_input], outputs=output)

    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
   b. Text model: BERT; image model: the model chosen above
    from keras.models import Model
    from keras.layers import (Input, Dense, LSTM, Dropout, Reshape, Conv2D,
                              MaxPooling2D, Flatten, concatenate)
    from transformers import TFBertModel

    # Define input shapes
    text_input_shape = (None,)
    image_input_shape = (128, 128, 3)

    # Define the text model using KoBERT
    # (if the checkpoint only ships PyTorch weights, from_pretrained needs from_pt=True)
    text_input = Input(shape=text_input_shape, dtype='int32')
    bert_model = TFBertModel.from_pretrained('monologg/kobert')
    text_output = bert_model(text_input)[1]  # index 1 is the pooled output

    # Define the image model using convolutional layers and LSTM
    image_input = Input(shape=image_input_shape)
    conv1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(image_input)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, kernel_size=(3, 3), activation='relu')(pool1)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(128, kernel_size=(3, 3), activation='relu')(pool2)
    # Flatten the spatial grid into a sequence of 128-dimensional feature vectors
    reshape = Reshape((-1, 128))(conv3)
    lstm = LSTM(64, return_sequences=True)(reshape)
    dropout = Dropout(0.5)(lstm)
    flatten = Flatten()(dropout)

    # Combine the text and image models
    combined = concatenate([text_output, flatten])

    # Add a dense layer with 128 units and ReLU activation
    dense1 = Dense(128, activation='relu')(combined)

    # Add a dropout layer to prevent overfitting
    dropout2 = Dropout(0.5)(dense1)

    # Add a dense layer with 7 units and softmax activation for classification
    output = Dense(7, activation='softmax')(dropout2)

    # Define the model
    model = Model(inputs=[text_input, image_input], outputs=output)

    # Compile the model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
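4. A multimodal model would also work, but we judged that filtering through two models in sequence would be better than a single multimodal model.
   - Reason: from the audio alone, only "normal" versus "help request" can be distinguished.
   - First, use the image to decide whether the situation is an emergency.
   - Then, use the text to classify the exact situation.
   When we actually ran it, however, the results did not bear this out.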
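The two-stage filtering considered in item 4 can be sketched roughly as below. This is an illustration of the control flow only, not the code that was run: `cascade_predict`, the `threshold` value, and the stub models are hypothetical, and the two models are passed in as plain callables so the sketch stays independent of any framework.

```python
from typing import Callable, Optional, Sequence


def cascade_predict(
    image: object,
    text: object,
    emergency_model: Callable[[object], float],
    situation_model: Callable[[object], Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the notes
) -> Optional[int]:
    """Return None for a normal sample, else the index of the predicted situation."""
    # Stage 1: the image model acts as a filter (emergency or not).
    if emergency_model(image) < threshold:
        return None
    # Stage 2: only flagged samples reach the text classifier.
    scores = situation_model(text)
    return max(range(len(scores)), key=lambda i: scores[i])


# Stub models standing in for the trained networks.
calls = []


def stub_situation(text):
    calls.append(text)
    return [0.1, 0.7, 0.2]


normal = cascade_predict("img", "txt", lambda img: 0.1, stub_situation)   # -> None
flagged = cascade_predict("img", "txt", lambda img: 0.9, stub_situation)  # -> 1
```

One property of this design is that the text model never runs on samples the first stage labels as normal, so any emergency missed in stage 1 cannot be recovered in stage 2.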