Subject: Re: UNC M set unification issues
From: "john knightley"
Date: Fri, 28 Nov 2008 17:03:25 +0800
To: satoshi.yamamoto.yd@hitachi.com
CC: chenzh-zhuang@163.com, csluqin@comp.polyu.edu.hk, kawabata.taichi@gmail.com, xiaomwang2006@163.com

Dear Yamamoto Satoshi,

My apologies for the late nature of these observations. This is a result
of my misunderstanding the schedule the IRG was following regarding the
UNC; it was only on the last day of the IRG meeting that I realised that
the UNC was to be submitted to the Dublin WG2. I had thought that the UNC
would be reviewed again at IRG #32. With the benefit of hindsight it would
have been better to raise these issues earlier rather than assume that
they would otherwise be touched upon during IRG meetings. Whilst #008 was
raised in IRGN1483 and IRGN1518, it was omitted by error during the
discussions at IRG #31.

Rather than "M set of UNC has not been discussed in detail" it would have
been clearer to say "M set of UNC has not been discussed in sufficient
detail".

My apologies again for not raising these matters sooner.

Yours sincerely
John Knightley

2008/11/28 :
> I can accept moving the questionable characters to the D set, but I
> cannot understand what you are saying at all.
>
> At the first submission of UNC, Japan showed similar characters in its
> document (IRG N1367) and noted its opinion (they are not unified). After
> merging the members' proposals into a set, Taichi showed a list of
> possible unifications using IDS checking. Some of the characters you
> mentioned appeared in those lists and IRG had discussions on them.
> Furthermore, IRG editors have worked on this small urgent set (both M
> and D) for more than a year. Why can you say "M set of UNC has not been
> discussed in detail"?
>
> I believe that you had a chance to review and report those concerns
> before the last IRG meeting. If you had, Japan would have been able to
> provide explanations on them and IRG could have discussed how to treat
> them.
>
> I understand in general that variants might cause security problems as
> you mentioned. But that is quite a different story from what we are
> talking about.
>
> --
> YAMAMOTO Satoshi mailto:satoshi.yamamoto.yd@hitachi.com
> Product Planning Dept., Hitachi, Ltd., Software Division
>
>> Dear Yamamoto Satoshi,
>>
>> Thank you for a quick and gracious reply.
>>
>> In this context, by variant I meant that, according to the evidence
>> provided, the character appears to be cognate with the encoded one, and
>> that an application of the unification rules might well lead to the
>> conclusion that these are unifiable. As far as I am aware the M set of
>> UNC has not been discussed in detail. The unification candidates of
>> each of these are frequently used characters, so separately encoding
>> the glyphs proposed in the UNC raises security questions. My doubts
>> regarding these characters are strong enough to say that more
>> discussion at IRG level is required, and therefore moving them to the
>> D set at this time, which as you say is possible, would be best.
>>
>> Moving the five characters for which the 新大字典 is given as source
>> would be a good and appropriate solution. That is, moving the following
>> to the D set:
>>
>> #006 is maybe unifiable with 久
>> #008 is a variant of 今
>> #130 is maybe unifiable with 甚
>> #145 is maybe unifiable with 空
>> #162 is maybe unifiable with 美
>>
>> To this I would add
>>
>> #119 is maybe unifiable with 為
>>
>> Regarding #99 and #100, the removal of one or both to the D set would
>> be appropriate.
>>
>> The above changes would improve the quality of the M set.
>>
>> Thank you again for being very understanding.
>>
>> Yours sincerely
>> John Knightley
>>
>> 2008/11/28 :
>>> Dear John,
>>>
>>> Sorry but I cannot understand what you mean.
>>>
>>> The IRG has repeatedly re-confirmed that variants are not always unified.
>>> Whether or not to unify two glyphs should be determined by the
>>> unification rule. Being "variants" cannot be a reason to unify glyphs.
>>>
>>> My understanding is the same as Mr. Chen's. Japan has always tried to
>>> explain any questionable J-source characters and, where it could not,
>>> has withdrawn them. However, if you strongly doubt those characters
>>> now, I think that the IRG Chief editor can move them to another
>>> working set (CJK D or a future extension) for further discussion.
>>>
>>> IRG N1499 was reviewed at the last IRG meeting, which concluded that
>>> it needs to be revised. Please don't use that document as evidence.
>>>
>>> --
>>> YAMAMOTO Satoshi mailto:satoshi.yamamoto.yd@hitachi.com
>>> Product Planning Dept., Hitachi, Ltd., Software Division
>>>
>>>> Dear Mr Chen Zhuang,
>>>>
>>>> Thank you for a very quick reply, and my apologies to Mr Yamamoto
>>>> san for not including him in the initial email.
>>>>
>>>> For 5 of the cases, shown below, the evidence supplied is from
>>>> 《新大字典》, where the evidence provided shows them as variants of the
>>>> encoded characters:
>>>>
>>>> #006 is a variant of 久
>>>>
>>>> #008 is a variant of 今
>>>>
>>>> #130 is a variant of 甚
>>>>
>>>> #145 is a variant of 空
>>>>
>>>> Is #162 a variant of 美 ?
>>>>
>>>> Furthermore, #008 is in the Adobe IVS sequences of the IVD. It was
>>>> also in IRGN1483 part B as a unification pattern for discussion, and
>>>> no comments were made against it.
>>>>
>>>> I agree there are more than enough UCS codepoints. My concern is that
>>>> for some of these, encoding separately may cause problems, as the
>>>> variants are already found in some Japanese fonts at the BMP
>>>> locations.
>>>>
>>>> Yours sincerely
>>>> John Knightley
>>>>
>>>> 2008/11/28 chen-zhuang :
>>>>> Sorry, the previous email did not include the email address of
>>>>> Yamamoto san.
>>>>>
>>>>> Chen Zhuang
>>>>>
>>>>> From: "chen-zhuang"
>>>>> To: "john knightley"
>>>>> Date: Fri, 28 Nov 2008 10:56:45 +0800 (CST)
>>>>> Subject: Re: UNC M set unification issues
>>>>>
>>>>> John,
>>>>>
>>>>> Thanks for giving us your concern.
>>>>>
>>>>> Generally, the M set was already confirmed by IRG editors and should
>>>>> not be changed unless strongly doubted by most IRG editors. I
>>>>> believe that the cases you mentioned were already discussed in
>>>>> previous meetings. For instance, some J-source characters were
>>>>> questioned by Chinese editors before but were at last accepted
>>>>> because Japan said they were used for place names and urgently
>>>>> needed. Therefore, I will not move them to the D set without
>>>>> confirmation from Japan or similar doubt from more editors.
>>>>>
>>>>> Personally, I think there is no need to unify (or over-unify) these
>>>>> kinds of characters, since we have enough coding positions outside
>>>>> the BMP of UCS. Besides place names and personal names, encoding
>>>>> more variants that appear in ancient books will make things much
>>>>> easier for the publishing industry.
>>>>>
>>>>> I checked one case, #145, which may be a variant of U+7A7A (空). The
>>>>> glyph is not exactly the same as that in IRGN1499, which was
>>>>> proposed by Japan as a Compatibility Character. I am also interested
>>>>> in whether Japan could move #145 of UNC to the Compatibility Zone.
>>>>>
>>>>> Yamamoto san, what do you think about John's questions?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Chen Zhuang
>>>>>
>>>>> From: "john knightley"
>>>>> To: "chenzh-zhuang"
>>>>> Date: Fri, 28 Nov 2008 09:56:55 +0800 (CST)
>>>>> Subject: UNC M set unification issues
>>>>>
>>>>>> Dear Mr Chen Zhuang,
>>>>>>
>>>>>> In reviewing IRG #31 documents I have discovered that in the M set
>>>>>> of the UNC there are several cases of unresolved unification
>>>>>> issues. Hopefully these issues can be addressed before the UNC is
>>>>>> submitted to WG2. What would be the best way to try and resolve
>>>>>> these issues?
>>>>>>
>>>>>> Five UNC M set characters in particular seem to be unifiable:
>>>>>>
>>>>>> #006 is a variant of 久
>>>>>>
>>>>>> #008 is a variant of 今
>>>>>>
>>>>>> #130 is a variant of 甚
>>>>>>
>>>>>> #145 is a variant of 空
>>>>>>
>>>>>> Most of these are patterns that are unified in IRGN1499, Proposal
>>>>>> to Add a Set of Compatibility Ideographs for Government Use
>>>>>> (Japan), and several were listed in Annex S discussion documents,
>>>>>> but not discussed at IRG #31.
>>>>>>
>>>>>> And there are some cases that should be considered further:
>>>>>>
>>>>>> Is #99 a variant of #100 ?
>>>>>>
>>>>>> Is #151 a variant of $BdL (B ?
>>>>>>
>>>>>> Is #162 a variant of 美 ?
>>>>>>
>>>>>> A closer look at the evidence may well reveal other cases.
>>>>>>
>>>>>> Yours sincerely
>>>>>> John Knightley