| | |
|---|---|
| Issue | EPJ Web Conf., Volume 344, 2025 — AI-Integrated Physics, Technology, and Engineering Conference (AIPTEC 2025) |
| Article Number | 01028 |
| Number of page(s) | 8 |
| Section | AI-Integrated Physics, Technology, and Engineering |
| DOI | https://doi.org/10.1051/epjconf/202534401028 |
| Published online | 22 December 2025 |
A Madurese–Indonesian machine translation system using an encoder-decoder model with an attention-based LSTM architecture
Department of Informatics, University of Trunojoyo Madura, Bangkalan, East Java, Indonesia
* Corresponding author: mulaab@trunojoyo.ac.id
Indonesia’s linguistic diversity includes Madurese, widely used as a first language by the Madurese community. However, limited proficiency in Indonesian—the national language—hinders their communication with speakers from other regions. In this study, a Neural Machine Translation (NMT) system based on the encoder-decoder architecture is implemented, designed to preserve full sentence structure. As the foundational framework, Long Short-Term Memory (LSTM) networks are employed. To address the limitation of the conventional encoder-decoder model—namely, its tendency to lose information due to the decoder’s reliance solely on the encoder’s final hidden state—an Attention mechanism is integrated. This enables the decoder to dynamically attend to all encoder outputs during the decoding process. As a result, this approach improves alignment between source and target words and significantly enhances translation quality. Empirical results show that this Attention-enhanced model works best with longer sentences (3–37 words), achieving a BLEU score of 0.0767—even with a modest dataset of 3,734 sentence pairs. Conversely, when translating very short sentences (≤2 words), performance declines significantly: despite a much larger corpus of 28,247 pairs, the BLEU score drops to 0.0344. This suggests that sentence length critically impacts model effectiveness, and that Attention-based NMT is more suited to moderately complex input. These findings highlight the importance of corpus characteristics in low-resource language translation and offer insights for developing better Indonesian–Madurese language tools.
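The abstract describes the key idea: instead of relying only on the encoder's final hidden state, the decoder scores every encoder output against its current state and builds a weighted context vector at each step. The paper does not give implementation details, so the following is a minimal NumPy sketch of one common variant (Luong-style dot-product attention); the function and variable names are illustrative, not taken from the authors' system.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(decoder_state, encoder_outputs):
    """Score each encoder output against the current decoder state,
    normalize the scores into an attention distribution, and return
    the context vector as the weighted sum of encoder outputs.

    decoder_state:   (H,)   current decoder hidden state
    encoder_outputs: (T, H) one hidden state per source token
    """
    scores = encoder_outputs @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                  # (T,) sums to 1 over source positions
    context = weights @ encoder_outputs        # (H,) attended summary of the source
    return context, weights

# Toy example: a 4-token source sentence with hidden size 3
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 3))   # stand-in for LSTM encoder outputs
dec = rng.normal(size=(3,))     # stand-in for the decoder's current state

context, weights = dot_product_attention(dec, enc)
print(weights, weights.sum())
```

The context vector is then typically concatenated with the decoder state before predicting the next target word, which is how alignment between source and target words improves: each output step can draw on whichever source positions score highest, rather than on a single compressed sentence vector.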
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.