granicus.if.org Git - postgresql/commit
Use radix tree for character encoding conversions.
author Heikki Linnakangas <heikki.linnakangas@iki.fi>
Mon, 13 Mar 2017 18:46:39 +0000 (20:46 +0200)
committer Heikki Linnakangas <heikki.linnakangas@iki.fi>
Mon, 13 Mar 2017 18:46:39 +0000 (20:46 +0200)
commit aeed17d00037950a16cc5ebad5b5592e5fa1ad0f
tree 070aac060c2f923b5c636afbab51272bd2d04056
parent 84892692fdedb753cfdd9a63b318b47ec640915f

Replace the mapping tables used to convert between UTF-8 and other
character encodings with new radix tree-based maps. Looking up an entry in
a radix tree is much faster than a binary search in the old maps. As a
bonus, the radix tree representation is also more compact, making the
binaries slightly smaller.
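
To illustrate why a radix tree lookup can beat a binary search, here is a
minimal sketch for 2-byte input codes. It is not the code from this commit:
the type and function names (radix_map, radix_insert, radix_lookup,
bsearch_lookup), the fixed chunk limit, and the sample entries used with it
are all assumptions made for illustration. The radix lookup costs two array
indexings regardless of table size, while the binary search needs a number
of comparisons that grows with the logarithm of the entry count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-level radix map; names and sizes are illustrative only. */
#define CHUNK_SIZE 256
#define MAX_CHUNKS 8

typedef struct
{
    uint16_t level1[256];                    /* first byte -> chunk number, 0 = unmapped */
    uint32_t level2[MAX_CHUNKS][CHUNK_SIZE]; /* second byte -> output code, 0 = unmapped */
    uint16_t nchunks;                        /* chunks in use; chunk 0 stays all-zero */
} radix_map;

static void
radix_init(radix_map *map)
{
    memset(map, 0, sizeof(*map));
    map->nchunks = 1;           /* reserve chunk 0 as the shared "no mapping" chunk */
}

static void
radix_insert(radix_map *map, uint16_t code, uint32_t out)
{
    uint8_t b1 = (uint8_t) (code >> 8);
    uint8_t b2 = (uint8_t) (code & 0xFF);

    if (map->level1[b1] == 0)
    {
        /* first entry with this lead byte: allocate a new second-level chunk */
        assert(map->nchunks < MAX_CHUNKS);
        map->level1[b1] = map->nchunks++;
    }
    map->level2[map->level1[b1]][b2] = out;
}

/* Two array indexings; unmapped lead bytes fall through to all-zero chunk 0. */
static uint32_t
radix_lookup(const radix_map *map, uint16_t code)
{
    return map->level2[map->level1[code >> 8]][code & 0xFF];
}

/* Binary search over a table sorted by code, shown only for comparison. */
typedef struct
{
    uint16_t code;
    uint32_t out;
} pair_entry;

static uint32_t
bsearch_lookup(const pair_entry *tab, size_t n, uint16_t code)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].code == code)
            return tab[mid].out;
        if (tab[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;                   /* 0 = no mapping */
}
```

A caller would run radix_init once, add entries with radix_insert, and then
use radix_lookup on the conversion path. This sketch does not try to
reproduce the space savings mentioned above; a real table would size its
chunks to the byte ranges actually in use.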

The "combined" maps work the same as before, with binary search. They are
much smaller than the main tables, so lookup speed matters less for them.
However, the "combined" maps are now stored in the same .map files as the
main tables. This seems clearer, since they're always used together and are
generated from the same source files.
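
As a rough sketch of a binary-search lookup in a small "combined" table,
assume each entry pairs two code points with a single code in the other
encoding. The struct layout, the names combined_entry, combined_cmp and
combined_lookup, and the sample values used with it are hypothetical and
not taken from the .map files.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: a pair of code points that maps to one code. */
typedef struct
{
    uint32_t first;             /* first code point of the pair */
    uint32_t second;            /* second (combining) code point */
    uint32_t code;              /* single code in the other encoding */
} combined_entry;

/* Order entries by (first, second) so bsearch() can locate a pair. */
static int
combined_cmp(const void *a, const void *b)
{
    const combined_entry *x = a;
    const combined_entry *y = b;

    if (x->first != y->first)
        return x->first < y->first ? -1 : 1;
    if (x->second != y->second)
        return x->second < y->second ? -1 : 1;
    return 0;
}

/* Returns the mapped code, or 0 if the pair is not in the table. */
static uint32_t
combined_lookup(const combined_entry *tab, size_t n,
                uint32_t first, uint32_t second)
{
    combined_entry key = {first, second, 0};
    const combined_entry *hit;

    assert(tab != NULL || n == 0);
    if (n == 0)
        return 0;

    /* tab must already be sorted with combined_cmp */
    hit = bsearch(&key, tab, n, sizeof(combined_entry), combined_cmp);
    return hit ? hit->code : 0;
}
```

With only a handful of entries, the logarithmic cost of the search is
small, which fits the reasoning above for leaving these tables as they are.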

Patch by Kyotaro Horiguchi, with a lot of hacking by me at various stages.
Reviewed by Michael Paquier and Daniel Gustafsson.

Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
111 files changed:
src/backend/utils/mb/Unicode/Makefile
src/backend/utils/mb/Unicode/UCS_to_BIG5.pl
src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
src/backend/utils/mb/Unicode/UCS_to_UHC.pl
src/backend/utils/mb/Unicode/UCS_to_most.pl
src/backend/utils/mb/Unicode/big5_to_utf8.map
src/backend/utils/mb/Unicode/convutils.pm
src/backend/utils/mb/Unicode/euc_cn_to_utf8.map
src/backend/utils/mb/Unicode/euc_jis_2004_to_utf8.map
src/backend/utils/mb/Unicode/euc_jis_2004_to_utf8_combined.map [deleted file]
src/backend/utils/mb/Unicode/euc_jp_to_utf8.map
src/backend/utils/mb/Unicode/euc_kr_to_utf8.map
src/backend/utils/mb/Unicode/euc_tw_to_utf8.map
src/backend/utils/mb/Unicode/gb18030_to_utf8.map
src/backend/utils/mb/Unicode/gbk_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_10_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_13_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_14_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_15_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_16_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map
src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map
src/backend/utils/mb/Unicode/johab_to_utf8.map
src/backend/utils/mb/Unicode/koi8r_to_utf8.map
src/backend/utils/mb/Unicode/koi8u_to_utf8.map
src/backend/utils/mb/Unicode/shift_jis_2004_to_utf8.map
src/backend/utils/mb/Unicode/shift_jis_2004_to_utf8_combined.map [deleted file]
src/backend/utils/mb/Unicode/sjis_to_utf8.map
src/backend/utils/mb/Unicode/uhc_to_utf8.map
src/backend/utils/mb/Unicode/utf8_to_big5.map
src/backend/utils/mb/Unicode/utf8_to_euc_cn.map
src/backend/utils/mb/Unicode/utf8_to_euc_jis_2004.map
src/backend/utils/mb/Unicode/utf8_to_euc_jis_2004_combined.map [deleted file]
src/backend/utils/mb/Unicode/utf8_to_euc_jp.map
src/backend/utils/mb/Unicode/utf8_to_euc_kr.map
src/backend/utils/mb/Unicode/utf8_to_euc_tw.map
src/backend/utils/mb/Unicode/utf8_to_gb18030.map
src/backend/utils/mb/Unicode/utf8_to_gbk.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_10.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_13.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_14.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_15.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_16.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_2.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_3.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_4.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_5.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_6.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_7.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_8.map
src/backend/utils/mb/Unicode/utf8_to_iso8859_9.map
src/backend/utils/mb/Unicode/utf8_to_johab.map
src/backend/utils/mb/Unicode/utf8_to_koi8r.map
src/backend/utils/mb/Unicode/utf8_to_koi8u.map
src/backend/utils/mb/Unicode/utf8_to_shift_jis_2004.map
src/backend/utils/mb/Unicode/utf8_to_shift_jis_2004_combined.map [deleted file]
src/backend/utils/mb/Unicode/utf8_to_sjis.map
src/backend/utils/mb/Unicode/utf8_to_uhc.map
src/backend/utils/mb/Unicode/utf8_to_win1250.map
src/backend/utils/mb/Unicode/utf8_to_win1251.map
src/backend/utils/mb/Unicode/utf8_to_win1252.map
src/backend/utils/mb/Unicode/utf8_to_win1253.map
src/backend/utils/mb/Unicode/utf8_to_win1254.map
src/backend/utils/mb/Unicode/utf8_to_win1255.map
src/backend/utils/mb/Unicode/utf8_to_win1256.map
src/backend/utils/mb/Unicode/utf8_to_win1257.map
src/backend/utils/mb/Unicode/utf8_to_win1258.map
src/backend/utils/mb/Unicode/utf8_to_win866.map
src/backend/utils/mb/Unicode/utf8_to_win874.map
src/backend/utils/mb/Unicode/win1250_to_utf8.map
src/backend/utils/mb/Unicode/win1251_to_utf8.map
src/backend/utils/mb/Unicode/win1252_to_utf8.map
src/backend/utils/mb/Unicode/win1253_to_utf8.map
src/backend/utils/mb/Unicode/win1254_to_utf8.map
src/backend/utils/mb/Unicode/win1255_to_utf8.map
src/backend/utils/mb/Unicode/win1256_to_utf8.map
src/backend/utils/mb/Unicode/win1257_to_utf8.map
src/backend/utils/mb/Unicode/win1258_to_utf8.map
src/backend/utils/mb/Unicode/win866_to_utf8.map
src/backend/utils/mb/Unicode/win874_to_utf8.map
src/backend/utils/mb/conv.c
src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
src/include/mb/pg_wchar.h