1. Spectrogram classification
a. Use a CNN-LSTM model
b. Use a pretrained CNN with a Bi-LSTM (see the sketch below)
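A minimal sketch of approach 1-b, assuming a MobileNetV2 backbone, 128x128x3 spectrogram images, and 7 classes (all illustrative choices, not fixed by these notes):
from keras.layers import Input, Reshape, Bidirectional, LSTM, Dense
from keras.models import Model
from keras.applications import MobileNetV2
spec_input = Input(shape=(128, 128, 3), name='spectrogram_input')
# Pretrained CNN turns the spectrogram into a 4x4x1280 feature map
backbone = MobileNetV2(include_top=False, weights='imagenet', input_shape=(128, 128, 3))
features = backbone(spec_input)
# Read the 4 feature-map rows as a length-4 sequence of 4*1280-d vectors
seq = Reshape((4, 4 * 1280))(features)
# Bi-LSTM scans the sequence in both directions
hidden = Bidirectional(LSTM(64))(seq)
output = Dense(7, activation='softmax')(hidden)
model = Model(spec_input, output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])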
2. Text classification
• Use a sequence-style model
• Use KoBERT (see the sketch below)
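A minimal sketch of the KoBERT classifier; the sequence length of 64 and the 7-class softmax head are assumptions for illustration:
from keras.layers import Input, Dense, Dropout
from keras.models import Model
from transformers import TFBertModel
max_len = 64
token_ids = Input(shape=(max_len,), dtype='int32', name='token_ids')
# may need from_pt=True if the checkpoint only ships PyTorch weights
bert = TFBertModel.from_pretrained('monologg/kobert')
pooled = bert(token_ids)[1]  # pooler output ([CLS] representation)
x = Dropout(0.5)(pooled)
output = Dense(7, activation='softmax')(x)
model = Model(token_ids, output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])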
3. Multimodal
a. Baseline model
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, Dropout, concatenate
from keras.models import Model
# Define text input
text_input = Input(shape=(100,), name='text_input')
# Define image input
image_input = Input(shape=(128, 128, 3), name='image_input')
# Define text model
text_model = Dense(64, activation='relu')(text_input)
text_model = Dropout(0.5)(text_model)
# Define image model
image_model = Conv2D(32, (3,3), activation='relu')(image_input)
image_model = MaxPooling2D((2,2))(image_model)
image_model = Dropout(0.5)(image_model)
image_model = Flatten()(image_model)
# Concatenate the text and image models
merged_model = concatenate([text_model, image_model])
# Add a fully connected layer
merged_model = Dense(64, activation='relu')(merged_model)
merged_model = Dropout(0.5)(merged_model)
# Add the output layer
output = Dense(1, activation='sigmoid')(merged_model)
# Define the model with both inputs and output
model = Model(inputs=[text_input, image_input], outputs=output)
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
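Hypothetical usage of the baseline above with dummy arrays, just to show that the named inputs route each array to the right branch:
import numpy as np
X_text = np.random.rand(32, 100)           # 100-d text feature vectors
X_image = np.random.rand(32, 128, 128, 3)  # 128x128 RGB spectrograms
y = np.random.randint(0, 2, size=(32, 1))  # binary labels
model.fit({'text_input': X_text, 'image_input': X_image}, y, epochs=1)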
b. Text model: BERT (KoBERT); image model: the one chosen above
from keras.models import Model
from keras.layers import Input, Dense, LSTM, Dropout, Reshape, Conv2D, MaxPooling2D, Flatten, concatenate
from transformers import TFBertModel
# Define input shapes
text_input_shape = (None,)
image_input_shape = (128, 128, 3)
# Define the text model using KoBERT
text_input = Input(shape=text_input_shape, dtype='int32')
# may need from_pt=True if the checkpoint only ships PyTorch weights
bert_model = TFBertModel.from_pretrained('monologg/kobert')
text_output = bert_model(text_input)[1]  # pooler output ([CLS] representation)
# Define the image model using convolutional layers and LSTM
image_input = Input(shape=image_input_shape)
conv1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(image_input)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(64, kernel_size=(3, 3), activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
conv3 = Conv2D(128, kernel_size=(3, 3), activation='relu')(pool2)
reshape = Reshape((-1, 128))(conv3)
lstm = LSTM(64, return_sequences=True)(reshape)
dropout = Dropout(0.5)(lstm)
flatten = Flatten()(dropout)
# Combine the text and image models
combined = concatenate([text_output, flatten])
# Add a dense layer with 128 units and ReLU activation
dense1 = Dense(128, activation='relu')(combined)
# Add a dropout layer to prevent overfitting
dropout2 = Dropout(0.5)(dense1)
# Add a dense layer with 7 units and softmax activation for classification
output = Dense(7, activation='softmax')(dropout2)
# Define the model
model = Model(inputs=[text_input, image_input], outputs=output)
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
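Hypothetical usage with dummy data; in practice the token ids must come from a KoBERT-compatible tokenizer (e.g. monologg's KoBertTokenizer), since the stock BertTokenizer does not match KoBERT's SentencePiece vocabulary:
import numpy as np
ids = np.random.randint(1, 8002, size=(32, 64))   # fake token ids (KoBERT vocab is 8002)
imgs = np.random.rand(32, 128, 128, 3)
y = np.eye(7)[np.random.randint(0, 7, size=32)]   # one-hot labels for the 7 classes
model.fit([ids, imgs], y, epochs=1)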
4. Multimodal would be fine, but we judged that filtering through two separate models is better than a single multimodal model
• Reason: from the audio alone, we can only distinguish 'normal' from 'call for help'
• So first judge whether the situation is an emergency from the image (the spectrogram)
• Then determine the exact situation class from the text (see the sketch after this list)
• When we actually ran it, however, the results did not turn out that way.
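For reference, a minimal sketch of the intended two-stage filter; emergency_model (binary image classifier) and situation_model (7-way text classifier) are hypothetical names for the models described above:
import numpy as np
def classify(spectrogram, token_ids, threshold=0.5):
    # Stage 1: the spectrogram model decides emergency vs. normal
    p_emergency = float(emergency_model.predict(spectrogram[np.newaxis])[0, 0])
    if p_emergency < threshold:
        return 'normal'
    # Stage 2: the text model assigns the specific situation class
    probs = situation_model.predict(token_ids[np.newaxis])[0]
    return int(np.argmax(probs))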