Hans_Huginn: [PIG] Keyword Convert ( Replace Function )

Thursday, September 3, 2015

[PIG] Keyword Convert ( Replace Function )

REPLACE Function

Syntax

REPLACE(string, 'oldChar', 'newChar');

Terms

string	The string to be updated.
'oldChar'	The existing characters being replaced, in quotes.
'newChar'	The new characters replacing the existing characters, in quotes.

Example

REPLACE(string,'software','wiki');
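
REPLACE is an expression, so in a script it usually appears inside a FOREACH ... GENERATE statement. The following is a minimal sketch of that usage. The file name keywords.txt, the field name keyword, and the relation names raw and converted are assumptions made for illustration and are not part of the original post.

```pig
-- Hypothetical input: one keyword per line in keywords.txt
raw = LOAD 'keywords.txt' AS (keyword:chararray);

-- Replace every occurrence of 'software' with 'wiki' in each keyword
converted = FOREACH raw GENERATE REPLACE(keyword, 'software', 'wiki') AS keyword;

DUMP converted;
```

With this sketch, an input row such as "open software" would come out as "open wiki".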

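Replacing keywords with regular expressions

The oldChar argument of REPLACE is treated as a regular expression, so you can use a pattern to remove whatever characters you do not want.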

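POSIX class	ASCII range	Unicode range	Description
\p{Alnum}	A-Za-z0-9		Alphanumeric characters
\p{Alpha}	A-Za-z		Alphabetic characters
\p{ASCII}		\x00-\x7F	ASCII characters
\p{Digit}	0-9	\uFF10-\uFF19	Digits
\p{Lower}	a-z	\uFF41-\uFF5A	Lowercase letters
\p{Upper}	A-Z	\uFF21-\uFF3A	Uppercase letters
		\uAC00-\uD7A3	Hangul syllables (가-힣)
		\u1100-\u1112	Hangul Jamo initial consonants (ㄱ-ㅎ)
		\u3130-\u3163	Hangul Compatibility Jamo (ㄱ-ㅎ, ㅏ-ㅣ)
		\u4E00-\u9FFF	CJK Unified Ideographs (Hanja; Traditional/Simplified Chinese, Japanese, Korean)
		\u3040-\u30FC	Japanese (Hiragana and Katakana)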
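
The ranges in the table can also be used in a plain (non-negated) character class to delete one specific group of characters. The sketch below removes standalone Hangul Compatibility Jamo while leaving complete syllables alone. The file name keywords.txt and the relation names raw and no_jamo are hypothetical.

```pig
-- Hypothetical input: one keyword per line in keywords.txt
raw = LOAD 'keywords.txt' AS (keyword:chararray);

-- Delete characters in the Hangul Compatibility Jamo range from the table above.
-- Complete syllables (\uAC00-\uD7A3) are outside this range and are kept.
no_jamo = FOREACH raw GENERATE REPLACE(keyword, '[\\u3130-\\u3163]', '') AS keyword;

DUMP no_jamo;
```

For example, a keyword such as "검색ㅋㅋ" would be reduced to "검색", because ㅋ (U+314B) falls inside the range and the two syllables do not.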

Example

-- Remove non-ASCII characters
REPLACE(keyword, '[^\\p{ASCII}]','');

-- Remove characters that are neither ASCII nor Hangul
REPLACE(keyword,'[^\\p{ASCII}\\uAC00-\\uD7A3]','');

-- Remove characters that are neither ASCII nor CJK
REPLACE(keyword,'[^\\p{ASCII}\\uAC00-\\uD7A3\\u4E00-\\u9FFF\\u3040-\\u30FC]','');
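
To see how the three filters differ on the same data, they can be applied side by side in one script. This is only a sketch: the file name keywords.txt and all relation and field names are assumptions. The character classes are written with a single leading ^, since ^ negates a class only when it is the first character inside the brackets.

```pig
-- Hypothetical input: one keyword per line in keywords.txt
raw = LOAD 'keywords.txt' AS (keyword:chararray);

-- Apply each filter to the same keyword so the results can be compared
cleaned = FOREACH raw GENERATE
    keyword AS original,
    REPLACE(keyword, '[^\\p{ASCII}]', '') AS ascii_only,
    REPLACE(keyword, '[^\\p{ASCII}\\uAC00-\\uD7A3]', '') AS ascii_hangul,
    REPLACE(keyword, '[^\\p{ASCII}\\uAC00-\\uD7A3\\u4E00-\\u9FFF\\u3040-\\u30FC]', '') AS ascii_cjk;

-- Drop rows where nothing is left after the widest filter
non_empty = FILTER cleaned BY ascii_cjk IS NOT NULL AND SIZE(ascii_cjk) > 0;

DUMP non_empty;
```

The final FILTER step is optional. It is included because a keyword made up entirely of removed characters becomes an empty string, which is rarely useful downstream.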

Related links

Regular expressions (Wikipedia): https://ko.wikipedia.org/wiki/%EC%A0%95%EA%B7%9C_%ED%91%9C%ED%98%84%EC%8B%9D
Regular expressions (Java Pattern): http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
Unicode (Wikibooks): https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF
