ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining

Yu, X.*, Xu, R.*, Xue, C.*, Zhang, J., Ma, X., & Yu, Z. ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining. Findings of the Association for Computational Linguistics (ACL), 2025. (* Equal contribution)

A reliable resume-job matching system helps a company recommend suitable candidates from a pool of resumes and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply only to a few jobs, interaction labels in resume-job datasets are sparse. We introduce ConFit v2, an improvement over ConFit to tackle this sparsity problem. We propose two techniques to enhance the encoder's contrastive training process: augmenting job data with hypothetical reference resumes generated by a large language model, and creating high-quality hard negatives from unlabeled resume/job pairs using a novel hard-negative mining strategy. We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods (including BM25 and OpenAI text-embedding-003), achieving an average absolute improvement of 13.8% in recall and 17.5% in nDCG across job-ranking and resume-ranking tasks.
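
The abstract reports gains in recall and nDCG but does not spell out how those metrics are computed. As a rough illustration only, the sketch below shows the standard textbook definitions of recall@k and nDCG@k for a single ranked list. It assumes binary relevance (an item is either a labeled match or not); the function names, the cutoff `k`, and the example IDs are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Normalized discounted cumulative gain with binary relevance (assumption)."""
    # DCG: each relevant item at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items packed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: resumes ranked for one job post, two labeled matches.
ranked_resumes = ["r3", "r1", "r7", "r2"]
matching_resumes = {"r1", "r2"}

recall_2 = recall_at_k(ranked_resumes, matching_resumes, k=2)
ndcg_4 = ndcg_at_k(ranked_resumes, matching_resumes, k=4)
```

In this example only one of the two matches appears in the top 2, so recall@2 is 0.5, while nDCG@4 additionally penalizes the matches for sitting at ranks 2 and 4 rather than ranks 1 and 2. The same two functions apply to job ranking by swapping the roles of resumes and job posts.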

@article{yu2025confitv2improvingresumejob,
  title={ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining},
  author={Yu, Xiao * and Xu, Ruize * and Xue, Chengyuan * and Zhang, Jinzhong and Ma, Xu and Yu, Zhou},
  journal={ACL 2025. Findings of the Association for Computational Linguistics (ACL)},
  year={2025},
  bibbase_note={(* Equal contribution)},
  url_Paper = {https://arxiv.org/abs/2502.12361},
  abstract ={A reliable resume-job matching system helps a company recommend suitable candidates from a pool of resumes and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply only to a few jobs, interaction labels in resume-job datasets are sparse. We introduce ConFit v2, an improvement over ConFit to tackle this sparsity problem. We propose two techniques to enhance the encoder's contrastive training process: augmenting job data with hypothetical reference resumes generated by a large language model, and creating high-quality hard negatives from unlabeled resume/job pairs using a novel hard-negative mining strategy. We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods (including BM25 and OpenAI text-embedding-003), achieving an average absolute improvement of 13.8% in recall and 17.5% in nDCG across job-ranking and resume-ranking tasks.}
}
