Systems and methods for using contrastive pre-training to generate text and code embeddings
Topics: AI (Deep Learning), LLMO, OpenAI, Semantic Search
The patent by OpenAI describes systems and methods for generating text and code embeddings through contrastive pre-training. A machine learning model is trained on positive and negative example pairs to produce vector representations (embeddings) of text and code, which the system then uses to measure semantic similarity between inputs. Key aspects include structuring inputs with delimiters, encoding the samples of each pair independently, and training the model with a contrastive objective over the encoded vectors. The resulting embeddings support tasks such as semantic similarity search, code search, and generating responses to natural language queries.
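
For illustration only, below is a minimal sketch of this kind of contrastive pre-training. It assumes a toy character-level encoder, `[SOS]`/`[EOS]` delimiter strings, in-batch negatives, and a symmetric cross-entropy loss over cosine similarities; these specifics are assumptions for the example, not details drawn from the patent's claims.

```python
# Minimal sketch (not the patented implementation): contrastive training of a
# text/code embedding model with in-batch negatives. The toy encoder, delimiter
# tokens, data, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

SOS, EOS = "[SOS]", "[EOS]"  # delimiters used to structure each input

def add_delimiters(text: str) -> str:
    return f"{SOS}{text}{EOS}"

class ToyEncoder(nn.Module):
    """Maps characters to a small vocabulary and mean-pools their embeddings.
    A real system would use a Transformer; this stands in so the example runs."""
    def __init__(self, dim: int = 64, vocab: int = 256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, texts):
        vecs = []
        for t in texts:
            ids = torch.tensor([ord(c) % 256 for c in t])
            vecs.append(self.emb(ids).mean(dim=0))
        h = self.proj(torch.stack(vecs))
        return F.normalize(h, dim=-1)  # unit-length embeddings

def contrastive_loss(x_emb, y_emb, temperature=0.05):
    """Symmetric cross-entropy over cosine similarities: each (x_i, y_i) is a
    positive pair; every other in-batch pairing serves as a negative."""
    logits = x_emb @ y_emb.T / temperature
    labels = torch.arange(len(x_emb))
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

# Positive pairs: natural-language queries matched with code snippets (assumed data).
pairs = [
    ("reverse a list", "def rev(xs): return xs[::-1]"),
    ("read a file", "open(path).read()"),
    ("sum of squares", "sum(i * i for i in range(n))"),
]

encoder = ToyEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(100):
    x_emb = encoder([add_delimiters(q) for q, _ in pairs])  # encode queries
    y_emb = encoder([add_delimiters(c) for _, c in pairs])  # encode code independently
    loss = contrastive_loss(x_emb, y_emb)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, cosine similarity between embeddings ranks semantically related inputs,
# e.g. retrieving the code snippet closest to a natural-language query.
with torch.no_grad():
    query_emb = encoder([add_delimiters("reverse a list")])
    code_embs = encoder([add_delimiters(c) for _, c in pairs])
    print((query_emb @ code_embs.T).argmax().item())  # expected: index 0
```

One design choice worth noting: using the other pairs in the same batch as negatives avoids mining explicit negative examples and scales the number of negatives with the batch size, which is a common pattern in contrastive embedding training.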