35 changes: 31 additions & 4 deletions ext/iconv/tests/bug48147.phpt
@@ -2,26 +2,53 @@
Bug #48147 (iconv with //IGNORE cuts the string)
--EXTENSIONS--
iconv
+--SKIPIF--
+<?php
+/*
+ * POSIX 2024 standardizes "//IGNORE" and says that it should ignore
+ * both invalid and untranslatable sequences. However... being in the
+ * standard does not make it standard. Musl's iconv() does not support
+ * "//IGNORE" at all, and will fail if you try. The documentation for
+ * the BSD and IBM implementations makes no mention of it either.
+ *
+ * Since the expected output appears to agree with POSIX, we use a
+ * blacklist of implementations that do not (yet?) conform.
+ */
+if (ICONV_IMPL == "unknown") {
+    /* musl */
+    die("skip iconv implementation does not support //IGNORE");
+}
+?>
--FILE--
<?php
+/*
+ * POSIX says that when //IGNORE is specified, invalid bytes followed
+ * by valid bytes "shall not be treated as an error." GNU iconv does
+ * not follow this convention, but PHP does the right thing. In the
+ * examples below, invalid bytes in the middle of the string get
+ * dropped, and a string is returned. The two examples where the
+ * problem is at the end technically do not qualify for the "shall
+ * not" exception because there are no VALID bytes after the error. So
+ * PHP is morally correct in those cases to return an error (false).
+ */
$text = "aa\xC3\xC3\xC3\xB8aa";
var_dump(iconv("UTF-8", "UTF-8", $text));
var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", $text)));
// only invalid
-var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", "\xC3")));
+var_dump(iconv("UTF-8", "UTF-8//IGNORE", "\xC3"));
// start invalid
var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", "\xC3\xC3\xC3\xB8aa")));
// finish invalid
-var_dump(urlencode(iconv("UTF-8", "UTF-8//IGNORE", "aa\xC3\xC3\xC3")));
+var_dump(iconv("UTF-8", "UTF-8//IGNORE", "aa\xC3\xC3\xC3"));
?>
--EXPECTF--
Notice: iconv(): Detected an illegal character in input string in %s on line %d
bool(false)
string(10) "aa%C3%B8aa"

Notice: iconv(): Detected an incomplete multibyte character in input string in %s on line %d
-string(0) ""
+bool(false)
string(8) "%C3%B8aa"

Notice: iconv(): Detected an incomplete multibyte character in input string in %s on line %d
-string(0) ""
+bool(false)