Computer vision-based hybrid efficient convolution for isolated dynamic sign language recognition

Chowdhury, Prothoma Khan; Oyshe, Kabiratun Ummi; Rahaman, Muhammad Aminur; Debnath, Tanoy; Rahman, Anichur; Kumar, Neeraj

doi:10.1007/s00521-024-10258-3

Bibliographic description

Computer vision-based hybrid efficient convolution for isolated dynamic sign language recognition. [Authors] Chowdhury Prothoma Khan, Oyshe Kabiratun Ummi, Rahaman Muhammad Aminur, Debnath Tanoy, Rahman Anichur, Kumar Neeraj. Neural Computing and Applications. DOI: 10.1007/s00521-024-10258-3

Publication details

Source: Neural Computing and Applications

Year: 2024

Language: English

Formal type: Journal article

MNiSW/MEiN type: other

Abstracts

Isolated dynamic sign language recognition (IDSLR) has the potential to improve accessibility and inclusion by enabling people with speech and/or hearing impairments to participate more fully in many spheres of life, including social interaction and work. IDSLR is challenging because a single gesture must be recognized from a sequence of image frames carrying multiple linguistic features, often against cluttered backgrounds and under varying illumination. We propose a Hybrid Efficient Convolution (HEC) model that combines EfficientNet-B3 with a few modified layers as an alternative to traditional machine learning techniques, with improved performance in cluttered backgrounds and under varying illumination. The HEC architecture integrates the pre-trained layers of EfficientNet-B3, loaded with customized weights, with a new custom dense layer of 256 units, followed by batch normalization, dropout, and the final output layer. To make the system more robust, we apply data augmentation during pre-processing. The system then performs channel-wise feature transformation through point-wise convolution, which reduces computational complexity and increases accuracy. The 256-unit dense layer processes the output of the standard EfficientNet-B3, making the model a hybrid that achieves better performance. To train and evaluate the proposed model, we created our own gesture dataset, “BdSL_OPA_23_GESTURES,” which consists of 6000 video clips of 100 isolated dynamic Bangla Sign Language words, with 60 videos per word, recorded from 20 different people against cluttered backgrounds and under varying illumination. We use 80% of the dataset for training and the remaining 20% for testing and validation. Within a small number of epochs, the proposed HEC model achieves an accuracy of 93.17% on the “BdSL_OPA_23_GESTURES” dataset. All information about the proposed model and the dataset has been made publicly available to the scientific community at: https://github.com/Prothoma2001/Bangla-Continuous-Sign-Language-Recognition.git.
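
The abstract credits point-wise convolution with reducing computational complexity. As a rough illustration only, and not the authors' implementation, the pure-Python sketch below applies a 1x1 convolution to a toy feature map and compares its multiplication count with that of a 3x3 convolution. The function names `pointwise_conv` and `conv_multiplies`, the toy values, and the layer sizes are all assumptions made for this example; none of them come from the paper.

```python
def pointwise_conv(feature_map, weights):
    """Apply a 1x1 (point-wise) convolution to an H x W x C_in feature map.

    `weights` is a C_in x C_out matrix. Each pixel's channel vector is
    multiplied by it independently, so channels are mixed but spatial
    neighbours are not.
    """
    c_out = len(weights[0])
    return [
        [
            [sum(pixel[i] * weights[i][o] for i in range(len(pixel)))
             for o in range(c_out)]
            for pixel in row
        ]
        for row in feature_map
    ]


def conv_multiplies(height, width, c_in, c_out, kernel):
    """Multiplications in a stride-1, same-padded kernel x kernel convolution."""
    return height * width * c_in * c_out * kernel * kernel


# Hypothetical toy input: a 2 x 2 feature map with 3 channels,
# projected down to 2 channels.
fmap = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[2.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
w = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
out = pointwise_conv(fmap, w)

# Illustrative layer sizes (assumed, not taken from the paper).
cost_1x1 = conv_multiplies(10, 10, 64, 32, kernel=1)
cost_3x3 = conv_multiplies(10, 10, 64, 32, kernel=3)
```

For the same input and output shapes, the 1x1 kernel needs one ninth of the multiplications of a 3x3 kernel, which is the kind of saving the abstract alludes to. Where exactly the point-wise convolution sits in the HEC pipeline is not stated in this record.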

External links

PBN

677f9cf8fdbe831a91d0db25

DOI

10.1007/s00521-024-10258-3

Website

https://link.springer.com/content/pdf/1…

Identifiers

ISSN: 0941-0643

e-ISSN: 1433-3058

BPP ID: (6, 7708) continuing publication #7708

Metrics

MNiSW/MEiN points: 100.00

Impact Factor: 0

Index Copernicus: 0

Internal score: 0
