ID: 0000269
Project: v1.2 Release (Closed)
Category: [All Projects] General
View Status: public
Last Update: 2009-12-08 19:10
Reporter: eureka
Assigned To: pedroa
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Product Version:
Target Version:
Fixed in Version: 1.2
Summary: 0000269: potential corruption of UTF-8 character strings
Description: When single-byte string functions are applied to character strings encoded in UTF-8, they can sometimes corrupt those strings. See:
http://forums.web2project.net/viewtopic.php?p=2799#2799

Notes

~0000511

pedroa (administrator)

Hi eureka and thank you very much for all your efforts regarding this whole UTF-8 subject and help on adapting web2Project to it.

I have surrendered to it now :)

Unicode is now mandatory: for the database structure, for all existing locale inc files, and for any new locale added to web2Project.

Considering this, and if I may, in a spirit of constructive criticism, I find the w2PUTF8strlen and w2PUTF8substr functions totally unnecessary.

Here is my idea:
1) In the root base.php we should add:
[code]mb_internal_encoding('UTF-8');[/code]
2) We should delete the w2PUTF8strlen and w2PUTF8substr functions from includes/main_functions.php, because we can now use mb_strlen and mb_substr (and maybe other mb_ functions) instead.
3) Replace w2PUTF8strlen, w2PUTF8substr, strlen and substr with mb_strlen and mb_substr throughout the application, both in the places you have identified and in any others that turn out to be problematic.
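
To illustrate why the byte-oriented functions are a problem, here is a minimal sketch, assuming the mb_string extension is loaded. The sample string and variable names are made up for the example; the string is written with byte escapes so the result does not depend on how the file is saved.

```php
<?php
// Assumption: the mbstring extension is available.
mb_internal_encoding('UTF-8');

// "Café": 4 characters, but 5 bytes in UTF-8 ("é" is \xC3\xA9).
$name = "Caf\xC3\xA9";

$bytes = strlen($name);     // counts bytes
$chars = mb_strlen($name);  // counts characters

// Cutting at byte 4 splits "é" in half and leaves invalid UTF-8.
$cutBytes = substr($name, 0, 4);

// Cutting at character 4 keeps the whole "é".
$cutChars = mb_substr($name, 0, 4);

// Reports whether the byte-wise cut is still valid UTF-8.
$cutBytesIsValid = mb_check_encoding($cutBytes, 'UTF-8');
```

The byte-wise cut is the kind of corruption described in this issue: the string ends in a dangling lead byte.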

Thoughts?

Thanks again,

Pedro A.

~0000512

eureka (reporter)

You can also use phputf8 (http://sourceforge.net/projects/phputf8/), because not all platforms support mb_string. If mb_string is enabled, the library uses it in preference to its own implementation.

~0000513

caseydk (administrator)

I think we can guard against mb_string not existing by adding a function_exists('mb_strlen') check to w2PUTF8strlen. If it's available, use it... otherwise, fall back to the existing methods.

Thoughts?
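
A rough sketch of that guard follows. The thread does not show the original body of w2PUTF8strlen, so the fallback here is a stand-in written for this example, and w2p_utf8_strlen_fallback is a hypothetical name.

```php
<?php
// Hypothetical stand-in for the existing pure-PHP method (not the original code).
function w2p_utf8_strlen_fallback($str) {
    // Count every byte that is not a UTF-8 continuation byte (10xxxxxx).
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

function w2PUTF8strlen($str) {
    // Prefer the native function when the extension is loaded.
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Otherwise fall back to the pure-PHP count.
    return w2p_utf8_strlen_fallback($str);
}
```

The fallback assumes well-formed UTF-8 input and that strlen itself has not been overloaded.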

~0000514

pedroa (administrator)

eureka:
I understand your point. But phputf8 is way too much, I prefer your functions :)

caseydk:
Yes, I was thinking about that too.

So instead of:
function w2PUTF8strlen($str) {

We'd have:
if (!function_exists('mb_strlen')) {
     function mb_strlen($str) {
.....and the rest of "our" function w2PUTF8strlen.....
     }
}

Same for mb_substr.
This way mb_string is used in preference (also because, being native, it is faster), and we fall back to ours if mb_string is not available.
Either way we need to replace strlen, substr, w2PUTF8strlen and w2PUTF8substr with mb_strlen and mb_substr wherever UTF-8 may affect the application, like in the places eureka has been reporting.
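
For mb_substr, the same pattern might look like the sketch below. It is an illustration only, not the code that was committed: w2p_utf8_substr_fallback is a hypothetical name, and the splitting relies on PCRE being built with UTF-8 support (the /u modifier).

```php
<?php
// Hypothetical pure-PHP substring for UTF-8 strings (illustration only).
function w2p_utf8_substr_fallback($str, $start, $length = null) {
    // Split into characters; /u makes PCRE treat the input as UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // input was not valid UTF-8
    }
    // A null length means "to the end of the string".
    return implode('', array_slice($chars, $start, $length));
}

// Only define mb_substr when the native one is missing.
if (!function_exists('mb_substr')) {
    function mb_substr($str, $start, $length = null, $encoding = null) {
        // The $encoding argument is accepted for compatibility and ignored.
        return w2p_utf8_substr_fallback($str, $start, $length);
    }
}
```

Callers then use mb_substr everywhere and do not need to know which implementation is behind it.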

If we knew everybody had mb_string, we could simply tell people to edit php.ini and set:
mbstring.func_overload = 6
That would fix it naturally, but since we can't expect that, we'll have to sort out an internal solution. The best one is to inject mb_strlen and mb_substr, because it will be a longer-lasting solution and will perform better when mb_string is actually available.

Deal?

Cheers,

Pedro A.

~0000515

caseydk (administrator)

Good call. I like that implementation. So even if we don't *really* have the underlying function available we can act like it and make sure everyone behaves accordingly.

~0000518

eureka (reporter)

Last edited: 2009-09-21 12:27

Very good, pedroa! But you need to replace all occurrences of string functions with their mbstring equivalents, overload them, and everything will work perfectly :)

~0000522

pedroa (administrator)

The fix for this set of potential issues, along with this restructuring, is now in SVN revision 667.

Added alternative mb_strlen, mb_substr, mb_strpos, mb_str_replace and mb_trim functions, used only if they are not already available through PHP mb_string
(recycled the old w2PUTF8strlen and w2PUTF8substr).
If you have a non-mb_string environment and find yourself with issues with the alternative functions, please provide us with new bug reports.
If you find other situations, or mb_ functions that might need web2Project UTF-8 coverage, please report that too.

Also took the time to fix an issue with duplicated task logs on Project view whenever there was more than one department associated with the project.

Thank you eureka for all the support.

Pedro A.

Issue History
Date Modified Username Field Change
2009-09-19 07:14 eureka New Issue
2009-09-19 11:14 pedroa Note Added: 0000511
2009-09-19 11:14 pedroa Assigned To => pedroa
2009-09-19 11:14 pedroa Status new => feedback
2009-09-19 13:00 eureka Note Added: 0000512
2009-09-19 20:15 caseydk Note Added: 0000513
2009-09-20 08:46 pedroa Note Added: 0000514
2009-09-20 09:00 caseydk Note Added: 0000515
2009-09-21 12:26 eureka Note Added: 0000518
2009-09-21 12:27 eureka Note Edited: 0000518
2009-09-22 08:48 pedroa Status feedback => resolved
2009-09-22 08:48 pedroa Resolution open => fixed
2009-09-22 08:48 pedroa Note Added: 0000522
2009-12-08 19:10 caseydk Status resolved => closed
2009-12-08 19:10 caseydk Fixed in Version => 1.2