| | |
|---|---|
| Issue | EPJ Web Conf., Volume 344, 2025 — AI-Integrated Physics, Technology, and Engineering Conference (AIPTEC 2025) |
| Article Number | 01028 |
| Number of page(s) | 8 |
| Section | AI-Integrated Physics, Technology, and Engineering |
| DOI | https://doi.org/10.1051/epjconf/202534401028 |
| Published online | 22 December 2025 |
A Madurese–Indonesian machine translation system using an encoder-decoder model with an attention-based LSTM architecture
Department of Informatics, University of Trunojoyo Madura, Bangkalan, East Java, Indonesia
* Corresponding author: mulaab@trunojoyo.ac.id
Indonesia’s linguistic diversity includes Madurese, widely used as a first language by the Madurese community. However, limited proficiency in Indonesian—the national language—hinders their communication with speakers from other regions. In this study, a Neural Machine Translation (NMT) system based on the encoder-decoder architecture is implemented, designed to preserve full sentence structure. As the foundational framework, Long Short-Term Memory (LSTM) networks are employed. To address the limitation of the conventional encoder-decoder model—namely, its tendency to lose information due to the decoder’s reliance solely on the encoder’s final hidden state—an Attention mechanism is integrated. This enables the decoder to dynamically attend to all encoder outputs during the decoding process. As a result, this approach improves alignment between source and target words and significantly enhances translation quality. Empirical results show that this Attention-enhanced model works best with longer sentences (3–37 words), achieving a BLEU score of 0.0767—even with a modest dataset of 3,734 sentence pairs. Conversely, when translating very short sentences (≤2 words), performance declines significantly: despite a much larger corpus of 28,247 pairs, the BLEU score drops to 0.0344. This suggests that sentence length critically impacts model effectiveness, and that Attention-based NMT is more suited to moderately complex input. These findings highlight the importance of corpus characteristics in low-resource language translation and offer insights for developing better Indonesian–Madurese language tools.
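The abstract describes the key idea: instead of relying only on the encoder's final hidden state, the decoder scores every encoder output against its current state and builds a weighted context vector at each step. The paper does not give implementation details, so the following is a minimal NumPy sketch of one common variant (Luong-style dot-product attention); the function and variable names are illustrative, not taken from the authors' system.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(decoder_state, encoder_outputs):
    """Score each encoder output against the current decoder state,
    normalize the scores into an attention distribution, and return
    the context vector as the weighted sum of encoder outputs.

    decoder_state:   (H,)   current decoder hidden state
    encoder_outputs: (T, H) one hidden state per source token
    """
    scores = encoder_outputs @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                  # (T,) sums to 1 over source positions
    context = weights @ encoder_outputs        # (H,) attended summary of the source
    return context, weights

# Toy example: a 4-token source sentence with hidden size 3
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 3))   # stand-in for LSTM encoder outputs
dec = rng.normal(size=(3,))     # stand-in for the decoder's current state

context, weights = dot_product_attention(dec, enc)
print(weights, weights.sum())
```

The context vector is then typically concatenated with the decoder state before predicting the next target word, which is how alignment between source and target words improves: each output step can draw on whichever source positions score highest, rather than on a single compressed sentence vector.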
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.