awesome-korean-nlp 
A curated list of resources dedicated to Natural Language Processing for Korean
This list collects resources useful for natural language processing of Hangul and the Korean language.
Maintainers - Insik Kim
Thanks to Keon Kim and Martin Park for creating awesome-nlp, the original list on which this one is based.
Please feel free to submit pull requests, or email Insik Kim (insik92@gmail.com) to add links.
Table of Contents
Tutorials and Courses
- TensorFlow Tutorial on Seq2Seq Models
- Natural Language Understanding with Distributed Representation Lecture Note by Cho
Videos
Dataset
Corpus (말뭉치)
- 세종말뭉치 (Sejong Corpus)
- Wikipedia Korean
- Wiki Source Korean
Because these are published texts, the data is useful for extracting grammatical information.
- 나무위키:데이터베이스 덤프 (Namuwiki database dump), Namu Wiki DB Dump Download Mirror Site
The 7z archive available from the mirror is roughly 1.2 GB compressed.
For a script that converts the raw wiki data to plain text, see namu_wiki_db_preprocess.
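To illustrate the kind of cleanup such a preprocessor performs, here is a minimal sketch that strips a few common wiki markup patterns with regular expressions. This is not the namu_wiki_db_preprocess script itself; the function name and the handled patterns are illustrative assumptions, and a real preprocessor covers far more syntax (tables, macros, nested markup).

```python
import re

def strip_wiki_markup(text):
    """Remove a few common wiki markup patterns (illustrative sketch only;
    NOT the namu_wiki_db_preprocess script)."""
    # [[target|display]] -> display
    text = re.sub(r"\[\[([^\]|]*)\|([^\]]*)\]\]", r"\2", text)
    # [[target]] -> target
    text = re.sub(r"\[\[([^\]]*)\]\]", r"\1", text)
    # '''bold''' or ''italic'' -> plain text
    text = re.sub(r"'{2,3}([^']*)'{2,3}", r"\1", text)
    # == heading == -> heading
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.M)
    return text

print(strip_wiki_markup("'''한글'''은 [[세종|세종대왕]]이 창제했다."))
# -> 한글은 세종대왕이 창제했다.
```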
Deep Learning for NLP
Packages
Implementations
- Pre-trained word embeddings for WSJ corpus by Koc AI-Lab
- Word2vec by Mikolov
- HLBL language model by Turian
- Real-valued vector “embeddings” by Dhillon
- Improving Word Representations Via Global Context And Multiple Word Prototypes by Huang
- Dependency based word embeddings
- Global Vectors for Word Representations
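Many of the pre-trained embeddings above (GloVe, word2vec's text output) ship as plain text files with one word per line followed by its vector components. A minimal sketch of parsing that format, assuming space-separated values:

```python
def load_text_embeddings(lines):
    """Parse word embeddings in the common 'word v1 v2 ...' text format
    used by GloVe and word2vec's text output (sketch; no error handling)."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# In practice you would read lines from a downloaded embeddings file.
sample = ["the 0.1 0.2 0.3", "king 0.4 0.5 0.6"]
print(load_text_embeddings(sample)["king"])  # -> [0.4, 0.5, 0.6]
```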
Libraries
- Python - Python NLP Libraries
- KoNLPy - A Python package for Korean natural language processing.
- C++ - C++ Libraries
- Scala - Scala Libraries
- twitter-korean-text - https://github.com/twitter/twitter-korean-text
Services
Articles
Review Articles
Word Vectors
Resources about word vectors, aka word embeddings, and distributed representations for words.
Word vectors are numeric representations of words that are often used as input to deep learning systems. This process is sometimes called pretraining.
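A toy example of how word vectors are compared: semantically similar words end up near each other, which is usually measured with cosine similarity. The three-dimensional embeddings below are made up for illustration; real embeddings are typically 100-300 dimensions and trained from a corpus.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (made-up values, for illustration only).
embeddings = {
    "왕": [0.9, 0.1, 0.3],      # "king"
    "여왕": [0.85, 0.15, 0.35],  # "queen"
    "사과": [0.1, 0.8, 0.2],     # "apple"
}

print(cosine_similarity(embeddings["왕"], embeddings["여왕"]))  # close to 1.0
print(cosine_similarity(embeddings["왕"], embeddings["사과"]))  # much lower
```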
Thought Vectors
Thought vectors are numeric representations of sentences, paragraphs, and documents. The following papers are listed in order of publication date; each improved on the previous state of the art in sentiment analysis.
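The crudest baseline for a sentence vector is simply averaging the word vectors of its tokens. The papers listed here use learned models (e.g. paragraph vectors, skip-thought) rather than averaging; this sketch with made-up toy embeddings only shows the shape of the idea.

```python
def average_vectors(vectors):
    """Average word vectors into one sentence vector (baseline sketch;
    not the method of the papers listed above)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

word_vectors = {"좋은": [0.8, 0.2], "영화": [0.3, 0.7]}  # toy embeddings
sentence = ["좋은", "영화"]  # "good movie"
print(average_vectors([word_vectors[w] for w in sentence]))
```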
Single Exchange Dialogs
Memory and Attention Models
General Natural Language Processing
Named Entity Recognition
Neural Network
Supplementary Materials
Projects
- 시인 뉴럴 ("Poet Neural"). A multi-layer LSTM for character-level language models in Torch, implemented by Kim Tae Hoon.
- 한글 word2vec Demo, a Korean word2vec demo implemented by Daegeun Lee.
Blogs
Credits
Parts of this list are adapted from awesome-nlp.