Data2Vec

Friday, February 18, 2022

Yuho Jeong

yuho8437@unist.ac.kr

2022년 1월에 Meta AI에서 발표한 Data2vec 블로그 포스팅을 읽고 내용을 공유합니다.

Summary

현재의 self-supervised learning 연구들은 image, speech, text 등 각기 다른 modality마다 학습 방법에 차이가 크다.
따라서 Meta AI는 multiple modality(e.b. image, speech, text)에서 작동하는 data2vec을 개발하였으며, data2vec은 기존 computer vision, speech 분야의 알고리즘 성능을 넘었고, NLP 분야에서는 견줄만큼의 성능을 기록하였다.
Data2vec은 기존에 self-supervised learning에서 자주 사용되던 contrastive learning이나, reconstructing the input example 방식을 사용하지는 않는다.
Data2vec은 input data에 대한 own representation을 맞추도록 학습시킨다.

Teacher network에 데이터를 넣어 출력된 representation을 target으로 설정한다.
Student network에는 데이터에 일부 masking을 가해서 넣어 representation을 얻어내고, 이 representation이 teacher의 target representation과 동일해지도록 모델을 학습시킨다.
Teacher와 student는 동일한 네트워크이지만 weight 값이 살짝 다르다. (BYOL 처럼 exponentialmoving average 사용)

Comments

Self-supervised learning 중에서도 BYOL의 아이디어를 차용하여 multi-modal learning 알고리즘을 고안한 것이 재미있었음
하나의 backbone으로 image, speech, text 관련 task에 모두 적용 가능하다는 것이, 현재까지의 multi-modal 연구 중에 가장 자연스러운 방법이라는 생각이 들며 future work들이 기대됨
Label이 필요없고, multi-modality에서 작동하고, single-purpose algorithms보다 잘하는 AI 모델. 이것이 앞으로의 AI의 주요 방향성이 되지 않을까 싶음

References

Data2vec블로그 포스팅: The first high-performance self-supervised algorithm that works for speech, vision, and text
Data2vec paper: Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Data2vec code: https://github.com/pytorch/fairseq/tree/main/examples/data2vec