iOS - Summary of NSString Language Encoding Methods

by digipine posted Nov 01, 2017

NSString Encodings

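NSString stores all text internally as Unicode. However, when you put text into an NSString or take it out, you can use an encoding other than Unicode.
The encodings NSString supports by default are declared in NSString.h as follows. (This listing comes from an older SDK header; current SDKs declare NSStringEncoding as NSUInteger.)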

typedef unsigned NSStringEncoding;

enum {
    NSASCIIStringEncoding = 1,		/* 0..127 only */
    NSNEXTSTEPStringEncoding = 2,
    NSJapaneseEUCStringEncoding = 3,
    NSUTF8StringEncoding = 4,
    NSISOLatin1StringEncoding = 5,
    NSSymbolStringEncoding = 6,
    NSNonLossyASCIIStringEncoding = 7,
    NSShiftJISStringEncoding = 8,
    NSISOLatin2StringEncoding = 9,
    NSUnicodeStringEncoding = 10,
    NSWindowsCP1251StringEncoding = 11,    /* Cyrillic; same as AdobeStandardCyrillic */
    NSWindowsCP1252StringEncoding = 12,    /* WinLatin1 */
    NSWindowsCP1253StringEncoding = 13,    /* Greek */
    NSWindowsCP1254StringEncoding = 14,    /* Turkish */
    NSWindowsCP1250StringEncoding = 15,    /* WinLatin2 */
    NSISO2022JPStringEncoding = 21,         /* ISO 2022 Japanese encoding for e-mail */
    NSMacOSRomanStringEncoding = 30,

    NSProprietaryStringEncoding = 65536    /* Installation-specific encoding */
};

In practice, however, NSString supports every encoding available on Mac OS, not only the ones declared above.
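The additional encodings have no named constant in the enum. In the list later in this article, their values all have the top bit (0x80000000) set, and the low bits match the CoreFoundation encoding ID (for example 0x0940 for Korean EUC). In real code the conversion is done by CFStringConvertEncodingToNSStringEncoding. The C sketch below only mimics the bit pattern visible in that list. The names EXTENDED_ENCODING_FLAG, extended_encoding_value and is_extended_encoding are hypothetical helpers, not part of any Apple API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 31 marks an encoding that has no named NSStringEncoding
   constant, as the values in the article's list suggest. */
#define EXTENDED_ENCODING_FLAG 0x80000000u

/* Hypothetical helper: build the list value from a CoreFoundation encoding ID. */
static uint32_t extended_encoding_value(uint32_t cf_encoding_id)
{
    /* The ID itself must fit in the lower 31 bits. */
    assert((cf_encoding_id & EXTENDED_ENCODING_FLAG) == 0);
    return EXTENDED_ENCODING_FLAG | cf_encoding_id;
}

/* Hypothetical helper: nonzero if the value lies outside the named constants. */
static int is_extended_encoding(uint32_t encoding)
{
    return (encoding & EXTENDED_ENCODING_FLAG) != 0;
}
```

With these helpers, extended_encoding_value(0x0940) gives 0x80000940, the value listed for Korean (EUC), while is_extended_encoding(4) is false for NSUTF8StringEncoding.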

The following are the encoding-related methods of the NSString class.

/* Returns the user's default C string encoding */
+ (NSStringEncoding)defaultCStringEncoding;

/* Returns all encodings available to NSString, as a zero-terminated list */
+ (const NSStringEncoding *)availableStringEncodings;

/* Returns the localized, human-readable name of an encoding */
+ (NSString *)localizedNameOfStringEncoding:(NSStringEncoding)encoding;

/* Returns the encoding that is fastest, or uses the least memory, for converting the current string */
- (NSStringEncoding)fastestEncoding;
- (NSStringEncoding)smallestEncoding;

/* Tells whether the string can be converted to the given encoding without loss of data */
- (BOOL)canBeConvertedToEncoding:(NSStringEncoding)encoding;

/* Creates an NSString from data in the given encoding */
- (id)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding;

/* Returns the current string converted to data in the given encoding */
- (NSData *)dataUsingEncoding:(NSStringEncoding)encoding allowLossyConversion:(BOOL)lossy;
- (NSData *)dataUsingEncoding:(NSStringEncoding)encoding; /* lossy=NO */
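To make the lossless check concrete, here is a small C sketch of the idea behind canBeConvertedToEncoding: for the simplest case, NSASCIIStringEncoding, whose comment in the enum says "0..127 only". It assumes the input is a NUL-terminated UTF-8 byte string, where any byte of 0x80 or above belongs to a non-ASCII character. The function name fits_in_ascii is hypothetical, and this is not how NSString itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the NUL-terminated
   UTF-8 string is in 0..127, i.e. it converts to ASCII without loss. */
static int fits_in_ascii(const char *utf8)
{
    assert(utf8 != NULL);
    for (const unsigned char *p = (const unsigned char *)utf8; *p != 0; p++) {
        if (*p > 127) {
            return 0; /* part of a multi-byte (non-ASCII) character */
        }
    }
    return 1;
}
```

When a check like this fails, dataUsingEncoding: with lossy=NO should not be expected to produce usable data, and allowLossyConversion:YES is the way to ask for a best-effort result instead.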

The following code prints the name and value of every encoding NSString supports.

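/* availableStringEncodings returns a const, zero-terminated list */
const NSStringEncoding *encoding = [NSString availableStringEncodings];
while (*encoding) {
    /* Cast explicitly: NSStringEncoding is wider than unsigned int on 64-bit systems */
    unsigned int value = (unsigned int)*encoding;
    NSLog(@"%@ 0x%x %u %d", [NSString localizedNameOfStringEncoding:*encoding], value, value, (int)value);
    encoding++;
}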
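The while (*encoding) loop works because the list ends with a 0 entry, in the same way a C string ends with NUL. The plain C sketch below shows the same traversal on a made-up list. The names sample_encodings and count_encodings are hypothetical, and the sample values are taken from the enum earlier in this article.

```c
#include <assert.h>
#include <stddef.h>

/* Sample list ending in 0, mirroring the shape of availableStringEncodings.
   Values are NSMacOSRoman (30), NSUTF8 (4) and NSASCII (1) from the enum. */
static const unsigned long sample_encodings[] = { 30, 4, 1, 0 };

/* Hypothetical helper: count entries up to, but not including, the 0 terminator. */
static size_t count_encodings(const unsigned long *list)
{
    assert(list != NULL);
    size_t count = 0;
    while (*list) {
        count++;
        list++;
    }
    return count;
}
```

Because 0 is the terminator, it can never be a valid encoding value in such a list, which is why the enum starts at 1.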

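The following are the encodings NSString supports. Each line gives the localized name, followed by the value in hexadecimal, as an unsigned decimal, and as a signed 32-bit decimal.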

Western (Mac OS Roman) 0x1e 30 30
Japanese (Mac OS) 0x80000001 2147483649 -2147483647
Traditional Chinese (Mac OS) 0x80000002 2147483650 -2147483646
Korean (Mac OS) 0x80000003 2147483651 -2147483645
Arabic (Mac OS) 0x80000004 2147483652 -2147483644
Hebrew (Mac OS) 0x80000005 2147483653 -2147483643
Greek (Mac OS) 0x80000006 2147483654 -2147483642
Cyrillic (Mac OS) 0x80000007 2147483655 -2147483641
Devanagari (Mac OS) 0x80000009 2147483657 -2147483639
Gurmukhi (Mac OS) 0x8000000a 2147483658 -2147483638
Gujarati (Mac OS) 0x8000000b 2147483659 -2147483637
Thai (Mac OS) 0x80000015 2147483669 -2147483627
Simplified Chinese (Mac OS) 0x80000019 2147483673 -2147483623
Tibetan (Mac OS) 0x8000001a 2147483674 -2147483622
Central European (Mac OS) 0x8000001d 2147483677 -2147483619
Symbol (Mac OS) 0x6 6 6
Dingbats (Mac OS) 0x80000022 2147483682 -2147483614
Turkish (Mac OS) 0x80000023 2147483683 -2147483613
Croatian (Mac OS) 0x80000024 2147483684 -2147483612
Icelandic (Mac OS) 0x80000025 2147483685 -2147483611
Romanian (Mac OS) 0x80000026 2147483686 -2147483610
Keyboard Symbols (Mac OS) 0x80000029 2147483689 -2147483607
Farsi (Mac OS) 0x8000008c 2147483788 -2147483508
Cyrillic (Mac OS Ukrainian) 0x80000098 2147483800 -2147483496
Western (Mac VT100) 0x800000fc 2147483900 -2147483396
Unicode™ (UTF-16) 0xa 10 10
Unicode™ (UTF-8) 0x4 4 4
Western (ISO Latin 1) 0x5 5 5
Central European (ISO Latin 2) 0x9 9 9
Western (ISO Latin 3) 0x80000203 2147484163 -2147483133
Central European (ISO Latin 4) 0x80000204 2147484164 -2147483132
Cyrillic (ISO 8859-5) 0x80000205 2147484165 -2147483131
Arabic (ISO 8859-6) 0x80000206 2147484166 -2147483130
Greek (ISO 8859-7) 0x80000207 2147484167 -2147483129
Hebrew (ISO 8859-8) 0x80000208 2147484168 -2147483128
Turkish (ISO Latin 5) 0x80000209 2147484169 -2147483127
Nordic (ISO Latin 6) 0x8000020a 2147484170 -2147483126
Thai (ISO 8859-11) 0x8000020b 2147484171 -2147483125
Baltic Rim (ISO Latin 7) 0x8000020d 2147484173 -2147483123
Celtic (ISO Latin 8) 0x8000020e 2147484174 -2147483122
Western (ISO Latin 9) 0x8000020f 2147484175 -2147483121
Latin-US (DOS) 0x80000400 2147484672 -2147482624
Greek (DOS) 0x80000405 2147484677 -2147482619
Baltic Rim (DOS) 0x80000406 2147484678 -2147482618
Western (DOS Latin 1) 0x80000410 2147484688 -2147482608
Central European (DOS Latin 2) 0x80000412 2147484690 -2147482606
Turkish (DOS) 0x80000414 2147484692 -2147482604
Icelandic (DOS) 0x80000416 2147484694 -2147482602
Arabic (DOS) 0x80000419 2147484697 -2147482599
Cyrillic (DOS) 0x8000041b 2147484699 -2147482597
Thai (Windows, DOS) 0x8000041d 2147484701 -2147482595
Japanese (Windows, DOS) 0x8 8 8
Simplified Chinese (Windows, DOS) 0x80000421 2147484705 -2147482591
Korean (Windows, DOS) 0x80000422 2147484706 -2147482590
Traditional Chinese (Windows, DOS) 0x80000423 2147484707 -2147482589
Western (Windows Latin 1) 0xc 12 12
Central European (Windows Latin 2) 0xf 15 15
Cyrillic (Windows) 0xb 11 11
Greek (Windows) 0xd 13 13
Turkish (Windows Latin 5) 0xe 14 14
Hebrew (Windows) 0x80000505 2147484933 -2147482363
Arabic (Windows) 0x80000506 2147484934 -2147482362
Baltic Rim (Windows) 0x80000507 2147484935 -2147482361
Vietnamese (Windows) 0x80000508 2147484936 -2147482360
Western (ASCII) 0x1 1 1
Japanese (Shift JIS X0213) 0x80000628 2147485224 -2147482072
Chinese (GBK) 0x80000631 2147485233 -2147482063
Chinese (GB 18030) 0x80000632 2147485234 -2147482062
Japanese (ISO 2022-JP) 0x15 21 21
Korean (ISO 2022-KR) 0x80000840 2147485760 -2147481536
Japanese (EUC) 0x3 3 3
Simplified Chinese (EUC) 0x80000930 2147486000 -2147481296
Traditional Chinese (EUC) 0x80000931 2147486001 -2147481295
Korean (EUC) 0x80000940 2147486016 -2147481280
Japanese (Shift JIS) 0x80000a01 2147486209 -2147481087
Cyrillic (KOI8-R) 0x80000a02 2147486210 -2147481086
Traditional Chinese (Big 5) 0x80000a03 2147486211 -2147481085
Western (Mac Mail) 0x80000a04 2147486212 -2147481084
Traditional Chinese (Big 5 HKSCS) 0x80000a06 2147486214 -2147481082
Western (NextStep) 0x2 2 2
Non-lossy ASCII 0x7 7 7
Western (EBCDIC US) 0x80000c02 2147486722 -2147480574
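The three numeric columns in the list are the same 32-bit pattern shown three ways. A short C sketch, assuming the usual two's complement int32_t, reproduces the columns for one row. The name format_encoding_row is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "0x<hex> <unsigned> <signed>" for a 32-bit value. */
static int format_encoding_row(uint32_t value, char *out, size_t out_size)
{
    int32_t as_signed;
    assert(out != NULL);
    /* Copy the bits instead of casting, so the reinterpretation is well defined. */
    memcpy(&as_signed, &value, sizeof as_signed);
    return snprintf(out, out_size, "0x%" PRIx32 " %" PRIu32 " %" PRId32,
                    value, value, as_signed);
}
```

For 0x80000003 this yields the columns shown for Korean (Mac OS): 0x80000003 2147483651 -2147483645. Values below 0x80000000, such as 0x1e, read the same in both decimal columns.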
 
 