BERT (language model)

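Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google.^[1]^[2] In 2019, Google announced that it had begun using BERT in its search engine, and by late 2020 it was using BERT for almost every English-language query. A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in NLP experiments", counting more than 150 research publications that analyze and improve the model.^[3]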

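The original English-language BERT comes in two models:^[1] (1) BERT_BASE, with 12 encoders and 12 bidirectional self-attention heads, and (2) BERT_LARGE, with 24 encoders and 16 bidirectional self-attention heads. Both models were pre-trained on unlabeled data extracted from the 800-million-word BooksCorpus^[4] and the 2,500-million-word English Wikipedia.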
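The two configurations can be compared with a rough parameter count. The sketch below is an illustration under stated assumptions, not the reference implementation: the hidden sizes (768 for BERT_BASE, 1024 for BERT_LARGE) come from the original paper rather than this article, and the vocabulary size of 30,522, the 512 learned position embeddings, the two segment embeddings, the 4× feed-forward width and the pooler layer are assumptions modeled on the released uncased English checkpoints. `BertConfig` and `bert_param_count` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    num_layers: int           # number of stacked encoder layers (L)
    hidden_size: int          # width of each token vector (H), from the paper
    num_heads: int            # self-attention heads per layer (A)
    vocab_size: int = 30522   # assumed: uncased English WordPiece vocabulary
    max_positions: int = 512  # assumed: learned position embeddings
    type_vocab_size: int = 2  # assumed: segment A / segment B embeddings

    @property
    def intermediate_size(self) -> int:
        # Assumed: the feed-forward inner width is 4 * H.
        return 4 * self.hidden_size

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the hidden vector.
        return self.hidden_size // self.num_heads


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def layer_norm(h: int) -> int:
    """LayerNorm learns one scale and one shift vector."""
    return 2 * h


def bert_param_count(cfg: BertConfig) -> int:
    h, f = cfg.hidden_size, cfg.intermediate_size
    # Token, position and segment embedding tables, followed by a LayerNorm.
    embeddings = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab_size) * h + layer_norm(h)
    # Query, key, value and output projections, followed by a LayerNorm.
    attention = 4 * linear(h, h) + layer_norm(h)
    # Two-layer feed-forward block, followed by a LayerNorm.
    feed_forward = linear(h, f) + linear(f, h) + layer_norm(h)
    pooler = linear(h, h)  # dense layer applied to the [CLS] vector
    return embeddings + cfg.num_layers * (attention + feed_forward) + pooler


BERT_BASE = BertConfig(num_layers=12, hidden_size=768, num_heads=12)
BERT_LARGE = BertConfig(num_layers=24, hidden_size=1024, num_heads=16)

for name, cfg in [("BERT_BASE", BERT_BASE), ("BERT_LARGE", BERT_LARGE)]:
    print(f"{name}: head_dim={cfg.head_dim}, params={bert_param_count(cfg):,}")
```

Under these assumptions the totals land close to the roughly 110 million and 340 million parameters usually quoted for the two models, and both configurations end up with 64-dimensional attention heads.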

Architecture

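At its core, BERT is a transformer language model with a variable number of encoder layers and self-attention heads. The architecture is "almost identical" to the original transformer implementation of Vaswani et al. (2017).^[5]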
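The central operation inside each encoder layer is self-attention. A minimal single-head sketch in plain Python, assuming toy 2-dimensional inputs and identity projection matrices, shows what "bidirectional" means here: without a causal mask, every position can attend to tokens on both its left and its right. BERT's real layers run several heads in parallel, concatenate their outputs, and add residual connections, layer normalization and a feed-forward sublayer, all omitted here; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                   causal: bool = False) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.

    causal=False is the bidirectional setting of an encoder like BERT's:
    each token may attend to every token in the sequence.
    causal=True hides later positions, as in a left-to-right decoder.
    Returns (outputs, attention weights).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))  # sqrt(d_k)
    weights: Matrix = []
    for i, row in enumerate(matmul(q, transpose(k))):
        scores = [s / scale for s in row]
        if causal:
            # Block attention to positions to the right of i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights.append(softmax(scores))
    return matmul(weights, v), weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy tokens, 2 dimensions each
eye = [[1.0, 0.0], [0.0, 1.0]]            # identity projections (assumed)
out, bi = self_attention(x, eye, eye, eye)
_, uni = self_attention(x, eye, eye, eye, causal=True)
print([round(w, 3) for w in bi[0]])   # token 0 spreads weight over all 3 tokens
print([round(w, 3) for w in uni[0]])  # token 0 may only attend to itself
```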

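BERT was pre-trained on two tasks: masked language modeling (15% of the tokens were masked, and BERT was trained to predict them from context) and next sentence prediction (BERT was trained to predict whether a chosen second sentence was a plausible continuation of the first). As a result of this training process, BERT learns contextual embeddings for words. After the computationally expensive pre-training, BERT can be fine-tuned with fewer resources on smaller datasets to optimize its performance on specific tasks.^[1]^[6]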
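The two pre-training objectives can be sketched as data-preparation steps. This is a simplified illustration, not Google's pre-processing code: the 80/10/10 split among [MASK], random and unchanged tokens is taken from the original paper rather than this article, tokens here are whole words instead of WordPiece units, and `mask_tokens` and `make_nsp_example` are hypothetical names.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: List[str], vocab: List[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Masked language modeling: choose ~15% of positions as prediction targets.

    Each chosen position becomes [MASK] with probability 0.8, a random
    vocabulary token with probability 0.1, or stays unchanged with probability 0.1.
    labels[i] holds the original token at chosen positions and None elsewhere,
    so a loss would only be computed where a label exists.
    """
    num_to_predict = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), num_to_predict)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i in positions:
        labels[i] = tokens[i]
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise the original token is kept as input
    return inputs, labels


def make_nsp_example(doc: List[str], corpus: List[str], idx: int,
                     rng: random.Random) -> Tuple[str, str, int]:
    """Next sentence prediction: pair doc[idx] with its true successor half of
    the time (label 1, "IsNext") and with a random corpus sentence otherwise
    (label 0, "NotNext"). Simplification: the random pick is not checked
    against the true successor."""
    first = doc[idx]
    if rng.random() < 0.5:
        return first, doc[idx + 1], 1
    return first, rng.choice(corpus), 0


rng = random.Random(0)
tokens = "my dog is hairy and he likes to play fetch in the park".split()
inputs, labels = mask_tokens(tokens, vocab=sorted(set(tokens)), rng=rng)
print(inputs)
print(labels)

doc = ["the man went to the store", "he bought a gallon of milk"]
corpus = doc + ["penguins are flightless birds"]
print(make_nsp_example(doc, corpus, 0, rng))
```

Fine-tuning then reuses the pre-trained weights and replaces these two objectives with a task-specific output layer.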

Performance

When BERT was published, it achieved state-of-the-art performance on a number of natural language understanding tasks:^[1]

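GLUE (General Language Understanding Evaluation) task set (consisting of 9 tasks)
SQuAD (Stanford Question Answering Dataset) v1.1 and v2.0
SWAG (Situations With Adversarial Generations)
Sentiment analysis: BERT-based sentiment classifiers achieved remarkable performance in several languages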

Analysis

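The reasons for BERT's state-of-the-art performance on these natural language understanding tasks are not yet well understood.^[8]^[9] Current research has focused on investigating the relationship between BERT's output and carefully chosen input sequences,^[10]^[11] analyzing internal vector representations through probing classifiers,^[12]^[13] and examining the relationships represented by attention weights.^[8]^[9]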
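A probing classifier is typically a simple model, often linear, trained on frozen representations to test whether they encode a given property. The sketch below assumes hypothetical hand-written 2-dimensional vectors and binary labels (imagine singular versus plural subjects) in place of real BERT hidden states, and fits a logistic-regression probe by batch gradient descent. High probe accuracy suggests the property is linearly recoverable from the vectors; it does not by itself show that the model relies on it.

```python
import math
from typing import List, Sequence


def train_probe(xs: Sequence[Sequence[float]], ys: Sequence[int],
                lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit a logistic-regression probe (weights + bias) by batch gradient descent.

    The representations xs stay frozen; only the probe's parameters change.
    """
    dim = len(xs[0])
    w = [0.0] * (dim + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            for j in range(dim):
                grad[j] += (p - y) * x[j]
            grad[-1] += p - y
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def probe_accuracy(w: List[float], xs: Sequence[Sequence[float]],
                   ys: Sequence[int]) -> float:
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Hypothetical frozen "hidden states" and property labels.
xs = [[2.0, 1.0], [1.5, 2.0], [1.0, 1.5], [-1.0, -2.0], [-2.0, -1.0], [-1.5, -0.5]]
ys = [1, 1, 1, 0, 0, 0]
w = train_probe(xs, ys)
print(probe_accuracy(w, xs, ys))
```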

History

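BERT has its origins in pre-trained contextual representations, including semi-supervised sequence learning,^[14] generative pre-training, ELMo^[15] and ULMFit.^[16] Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus. Context-free models such as word2vec or GloVe generate a single word embedding for each word in the vocabulary, whereas BERT takes into account the context of each occurrence of a given word. For instance, the vector for "running" has the same word2vec representation in both "He is running a company" and "He is running a marathon", whereas BERT provides a contextualized embedding that differs according to the sentence.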
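The difference between context-free and contextual embeddings can be illustrated with a deliberately crude toy. The word vectors below are invented, and the hypothetical `contextual_embedding` simply averages a word's vector with the mean over its whole sentence. That is a stand-in for BERT's learned multi-layer self-attention, not an implementation of it, but it is enough to show why "running" stops having a single fixed vector.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-dimensional static embeddings: one fixed vector per word,
# standing in for a word2vec/GloVe lookup table.
STATIC: Dict[str, Vector] = {
    "he": [0.1, 0.3], "is": [0.0, 0.1], "running": [0.5, 0.5], "a": [0.0, 0.0],
    "company": [0.9, -0.4], "marathon": [-0.6, 0.8],
}


def static_embedding(sentence: str, word: str) -> Vector:
    # Context-free: the sentence is ignored entirely.
    return STATIC[word]


def contextual_embedding(sentence: str, word: str) -> Vector:
    """Toy contextualization: mix the word's own vector with the mean vector of
    every token in the sentence (uniform attention over left and right context)."""
    tokens = sentence.lower().split()
    mean = [sum(STATIC[t][d] for t in tokens) / len(tokens) for d in range(2)]
    own = STATIC[word]
    return [0.5 * o + 0.5 * m for o, m in zip(own, mean)]


s1 = "He is running a company"
s2 = "He is running a marathon"
print(static_embedding(s1, "running") == static_embedding(s2, "running"))          # True
print(contextual_embedding(s1, "running") == contextual_embedding(s2, "running"))  # False
```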

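On October 25, 2019, Google Search announced that it had started applying BERT models to English-language search queries within the US.^[17] On December 9, 2019, it was reported that Google Search had adopted BERT for over 70 languages.^[18] By October 2020, almost every English-language query was processed by BERT.^[19]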

Recognition

The research paper describing BERT won the Best Long Paper Award at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).^[20]

Adaptations

BERT has been adapted to many domains by training language models on domain-specific corpora, such as:^[21]^[22]

BioBERT, trained on biomedical corpora
FinBERT, focused on the financial domain
PatentBERT, trained on patents
BEREL, trained on a corpus of Rabbinic Hebrew


References

[1] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
[2] "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 27 November 2019.
[3] Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327. doi:10.1162/tacl_a_00349. S2CID 211532403.
[4] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". pp. 19–27. arXiv:1506.06724 [cs.CV].
[5] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (12 June 2017). "Attention Is All You Need". arXiv:1706.03762 [cs.CL].
[6] Horev, Rani (2018). "BERT Explained: State of the art language model for NLP". Towards Data Science. Retrieved 27 September 2021.
[7] Chiorrini, Andrea; Diamantini, Claudia; Mircoli, Alex; Potena, Domenico. "Emotion and sentiment analysis of tweets using BERT" (PDF). Proceedings of Data Analytics solutions for Real-LIfe APplications (DARLI-AP) 2021.
[8] Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna (November 2019). "Revealing the Dark Secrets of BERT". Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 4364–4373. doi:10.18653/v1/D19-1445. S2CID 201645145.
[9] Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. (2019). "What Does BERT Look at? An Analysis of BERT's Attention". Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 276–286. doi:10.18653/v1/w19-4828.
[10] Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan (2018). "Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 284–294. arXiv:1805.04623. Bibcode:2018arXiv180504623K. doi:10.18653/v1/p18-1027. S2CID 21700944.
[11] Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco (2018). "Colorless Green Recurrent Networks Dream Hierarchically". Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 1195–1205. arXiv:1803.11138. Bibcode:2018arXiv180311138G. doi:10.18653/v1/n18-1108. S2CID 4460159.
[12] Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem (2018). "Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information". Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 240–248. arXiv:1808.08079. Bibcode:2018arXiv180808079G. doi:10.18653/v1/w18-5426. S2CID 52090220.
[13] Zhang, Kelly; Bowman, Samuel (2018). "Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis". Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 359–361. doi:10.18653/v1/w18-5448.
[14] Dai, Andrew; Le, Quoc (4 November 2015). "Semi-supervised Sequence Learning". arXiv:1511.01432 [cs.LG].
[15] Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke (15 February 2018). "Deep contextualized word representations". arXiv:1802.05365v2 [cs.CL].
[16] Howard, Jeremy; Ruder, Sebastian (18 January 2018). "Universal Language Model Fine-tuning for Text Classification". arXiv:1801.06146v5 [cs.CL].
[17] Nayak, Pandu (25 October 2019). "Understanding searches better than ever before". Google Blog. Retrieved 10 December 2019.
[18] Montti, Roger (10 December 2019). "Google's BERT Rolls Out Worldwide". Search Engine Journal. Retrieved 10 December 2019.
[19] "Google: BERT now used on almost every English query". Search Engine Land. 15 October 2020. Retrieved 24 November 2020.
[20] "Best Paper Awards". NAACL. 2019. Retrieved 28 March 2020.
[21] "Domain-Specific BERT Models · Chris McCormick". mccormickml.com. Retrieved 2 August 2022.
[22] Tai, Wen; Kung, H. T.; Dong, Xin; Comiter, Marcus; Kuo, Chang-Fu (November 2020). "exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources". Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics: 1433–1439. doi:10.18653/v1/2020.findings-emnlp.129.
[23] Lee, Jinhyuk; Yoon, Wonjin; Kim, Sungdong; Kim, Donghyeon; Kim, Sunkyu; So, Chan Ho; Kang, Jaewoo (1 February 2020). "BioBERT: a pre-trained biomedical language representation model for biomedical text mining". Bioinformatics. 36 (4): 1234–1240. doi:10.1093/BIOINFORMATICS/BTZ682. PMC 7703786. PMID 31501885.
[24] Araci, Dogu (27 August 2019). "FinBERT: Financial Sentiment Analysis with Pre-trained Language Models". arXiv:1908.10063 [cs].
[25] Lee, Jieh-Sheng; Hsiang, Jieh (30 June 2019). "PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model". arXiv:1906.02124 [cs, stat].
[26] Shmidman, Avi; Guedalia, Joshua; Shmidman, Shaltiel; Shmidman, Cheyn Shmuel; Handel, Eli; Koppel, Moshe (4 August 2022). "Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language".

Further reading

Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What we know about how BERT works". arXiv:2002.12327 [cs.CL].
