MantisBT - v1.2 Release (Closed)
View Issue Details
ID: 0000269
Project: v1.2 Release (Closed)
Category: [All Projects] General
View Status: public
Date Submitted: 2009-09-19 07:14
Last Update: 2009-12-08 19:10
Reporter: eureka
Assigned To: pedroa
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version:
Target Version:
Fixed in Version: 1.2
Summary: 0000269: potential corruption of UTF-8 character strings
Description: When single-byte string functions are applied to strings encoded in UTF-8, they can sometimes corrupt those strings. See here:
http://forums.web2project.net/viewtopic.php?p=2799#2799
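
To make the report concrete, here is a minimal sketch of the kind of corruption meant above. It assumes plain PHP string functions with no mbstring overloading; the sample string and variable names are purely illustrative and are not taken from web2Project.

```php
<?php
// "café" in UTF-8: the final é is stored as two bytes (0xC3 0xA9).
$name = "caf\xC3\xA9";

// strlen() counts bytes, not characters, so it reports 5 for a 4-character word.
$byteLength = strlen($name);

// substr() cuts on byte boundaries: asking for 4 "characters" keeps "caf"
// plus only the first byte of é, which is no longer valid UTF-8.
$cut = substr($name, 0, 4);

// An empty pattern with the /u modifier only matches when the subject
// is well-formed UTF-8, so it works as a simple validity check.
$isValidUtf8 = preg_match('//u', $cut) === 1;
```

The truncated string is what later shows up as a broken character in the page or in the database.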
Tags: No tags attached.
Attached Files

Notes
(0000511)
pedroa   
2009-09-19 11:14   
Hi eureka, and thank you very much for all your efforts on this whole UTF-8 subject and for your help in adapting web2Project to it.

I have surrendered to it now :)

Unicode is now mandatory for the database structure, for all existing locale inc files, and for any new locale added to web2Project.

Considering this, and if I may, in a spirit of constructive criticism, I find the w2PUTF8strlen and w2PUTF8substr functions totally unnecessary.

Here is my idea:
1) In the root base.php we should add:
[code]mb_internal_encoding('UTF-8');[/code]
2) We should delete the w2PUTF8strlen and w2PUTF8substr functions from includes/main_functions.php, because we can now use mb_strlen and mb_substr (and maybe other mb_ functions) instead.
3) Replace w2PUTF8strlen, w2PUTF8substr, strlen and substr with mb_strlen and mb_substr throughout the application,
in the places you have identified as well as in any others that turn out to be problematic.

Thoughts?

Thanks again,

Pedro A.
(0000512)
eureka   
2009-09-19 13:00   
You can also use phputf8 (http://sourceforge.net/projects/phputf8/), because not all platforms support mbstring. If mbstring is enabled, phputf8 uses it in preference to its own implementations.
(0000513)
caseydk   
2009-09-19 20:15   
I think we can guard against mbstring not existing by adding a function_exists('mb_strlen') check to w2PUTF8strlen. If it's available, use it... otherwise, fall back to the existing methods.

Thoughts?
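
A rough sketch of that guard, for discussion. The real body of w2PUTF8strlen is not shown in this thread, so the fallback below (w2p_utf8_strlen_fallback, a hypothetical name) is only an assumed stand-in: it counts the bytes that are not UTF-8 continuation bytes, which gives the character count for well-formed input.

```php
<?php
// Hypothetical fallback, NOT the project's actual code.
// Each UTF-8 character contains exactly one byte outside 0x80-0xBF
// (continuation bytes look like 10xxxxxx), so counting those bytes
// counts characters in a well-formed string.
function w2p_utf8_strlen_fallback($str) {
    return preg_match_all('/[^\x80-\xBF]/', $str, $unused);
}

function w2PUTF8strlen($str) {
    // Prefer the native mbstring implementation when the extension is loaded.
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Otherwise fall back to the pure-PHP count.
    return w2p_utf8_strlen_fallback($str);
}
```

Callers keep using w2PUTF8strlen and never need to know which branch ran.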
(0000514)
pedroa   
2009-09-20 08:46   
eureka:
I understand your point. But phputf8 is way too much, I prefer your functions :)

caseydk:
Yes, I was thinking about that too.

So instead of:
function w2PUTF8strlen($str) {

We'd have:
if (!function_exists('mb_strlen')) {
     function mb_strlen($str) {
          // .....and the rest of "our" function w2PUTF8strlen.....
     }
}

Same for mb_substr.
This way mbstring is used whenever it is available (being native, it is also faster), and we fall back to our own functions when it is not.
Either way we need to replace strlen, substr, w2PUTF8strlen and w2PUTF8substr with mb_strlen and mb_substr wherever UTF-8 may affect the application, as in the places eureka has been reporting.
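
For mb_substr, the same pattern could look roughly like the sketch below. The helper name w2p_utf8_substr_fallback and its regex-based splitting are assumptions made for illustration, not the recycled w2PUTF8substr code; it only handles well-formed UTF-8 and ignores the encoding argument.

```php
<?php
// Hypothetical pure-PHP substring on character boundaries.
function w2p_utf8_substr_fallback($str, $start, $length = null) {
    // Split into characters: one lead byte followed by its continuation bytes.
    preg_match_all('/[^\x80-\xBF][\x80-\xBF]*/', $str, $matches);
    // array_slice handles negative offsets and a null length ("to the end").
    return implode('', array_slice($matches[0], $start, $length));
}

// Only define mb_substr when the mbstring extension has not already done so.
if (!function_exists('mb_substr')) {
    function mb_substr($str, $start, $length = null, $encoding = 'UTF-8') {
        // $encoding is accepted for signature compatibility but assumed to be UTF-8.
        return w2p_utf8_substr_fallback($str, $start, $length);
    }
}
```

The rest of the application then calls mb_substr everywhere and gets the native version wherever mbstring is installed.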

If we knew everybody had mbstring we could simply tell people to edit php.ini and set:
mbstring.func_overload = 6
That would fix it naturally, but since we can't expect that, we'll have to sort out an internal solution. The best one is to inject mb_strlen and mb_substr, because it will last longer and perform better when mbstring is actually available.

Deal?

Cheers,

Pedro A.
(0000515)
caseydk   
2009-09-20 09:00   
Good call. I like that implementation. So even if we don't *really* have the underlying function available we can act like it and make sure everyone behaves accordingly.
(0000518)
eureka   
2009-09-21 12:26   
(Last edited: 2009-09-21 12:27)
Very good, pedroa! But you need to replace all occurrences of string functions with their mbstring equivalents and overload them, and then everything will work perfectly :)

(0000522)
pedroa   
2009-09-22 08:48   
The fix for this set of potential issues, along with this restructuring, is now in SVN revision 667.

Added alternative mb_strlen, mb_substr, mb_strpos, mb_str_replace and mb_trim functions... used only if they are not already available through PHP mbstring
(recycled the old w2PUTF8strlen and w2PUTF8substr)
If you have a non-mbstring environment and run into issues with the alternative functions, please provide us with new bug reports.
If you find other situations, or mb_ functions that might need web2Project UTF-8 coverage, please report that too.
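
For anyone wanting to test a non-mbstring setup, here is one way an alternative mb_strpos could work. This is a sketch only, not the code from revision 667: both function names are hypothetical, the optional offset argument is left out, and well-formed UTF-8 input is assumed.

```php
<?php
// Hypothetical helper: characters = bytes that are not continuation bytes.
function w2p_utf8_char_count($str) {
    return preg_match_all('/[^\x80-\xBF]/', $str, $unused);
}

// Hypothetical strpos replacement returning a character offset.
function w2p_utf8_strpos_fallback($haystack, $needle) {
    // A byte-wise search is safe here: in valid UTF-8 a valid needle
    // can only match starting on a character boundary.
    $bytePos = strpos($haystack, $needle);
    if ($bytePos === false) {
        return false;
    }
    // Convert the byte offset to a character offset by counting
    // the characters in the prefix before the match.
    return w2p_utf8_char_count(substr($haystack, 0, $bytePos));
}
```

The search itself stays byte-based and cheap; only the returned position is translated into characters.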

Also took the time to fix an issue with duplicated task logs on Project view whenever there was more than one department associated with the project.

Thank you eureka for all the support.

Pedro A.

Issue History
2009-09-19 07:14  eureka   New Issue
2009-09-19 11:14  pedroa   Note Added: 0000511
2009-09-19 11:14  pedroa   Assigned To  => pedroa
2009-09-19 11:14  pedroa   Status  new => feedback
2009-09-19 13:00  eureka   Note Added: 0000512
2009-09-19 20:15  caseydk  Note Added: 0000513
2009-09-20 08:46  pedroa   Note Added: 0000514
2009-09-20 09:00  caseydk  Note Added: 0000515
2009-09-21 12:26  eureka   Note Added: 0000518
2009-09-21 12:27  eureka   Note Edited: 0000518
2009-09-22 08:48  pedroa   Status  feedback => resolved
2009-09-22 08:48  pedroa   Resolution  open => fixed
2009-09-22 08:48  pedroa   Note Added: 0000522
2009-12-08 19:10  caseydk  Status  resolved => closed
2009-12-08 19:10  caseydk  Fixed in Version  => 1.2