Minimum qualifications:
- PhD degree in Computer Science or a related field, or equivalent practical experience.
- 2 years of experience programming with Python.
- Experience with large language models.
- Published papers (listed as an author) at conferences (e.g., NeurIPS, ICML, ACL, CVPR).
Preferred qualifications:
- First-author publications at target conferences (NeurIPS, ICML, ICLR, ACL, NAACL, EMNLP, USENIX, IEEE S&P, CCS) in the field of synthetic data generation.
- Experience in designing, implementing, and scaling AI systems, with a focus on Beam-like pipelines.
- Experience in designing and developing high-quality user- and developer-facing application programming interfaces (APIs).
- Significant research experience with LLMs and understanding of their capabilities and limitations.
- Proven track record of delivering AI models, tools, or frameworks.
- Excellent communication and teamwork skills for effective collaboration with partners.
About the job
As an organization, Google maintains a portfolio of research projects driven by fundamental research, new product innovation, product contribution, and infrastructure goals, while providing individuals and teams the freedom to emphasize specific types of work. As a Research Scientist, you'll set up large-scale tests and deploy promising ideas quickly and broadly, managing deadlines and deliverables while applying the latest theories to develop new and improved products, processes, or technologies. From creating experiments and prototyping implementations to designing new architectures, our research scientists work on real-world problems that span the breadth of computer science, such as machine (and deep) learning, data mining, natural language processing, hardware and software performance analysis, improving compilers for mobile platforms, as well as core search and much more.
As a Research Scientist, you'll also actively contribute to the wider research community by sharing and publishing your findings, with ideas inspired by internal projects as well as from collaborations with research programs at partner universities and technical institutes all over the world.
Google Research is building the next generation of intelligent systems for all Google products. To achieve this, we’re working on projects that utilize the latest computer science techniques developed by skilled software developers and research scientists. Google Research teams collaborate closely with other teams across Google, maintaining the flexibility and versatility required to adapt to new projects and foci that meet the world's fast-paced business needs.
Responsibilities
- Develop and evaluate novel algorithms for synthetic data generation across modalities (text, images, audio, etc.).
- Design and conduct experiments to assess the quality and effectiveness of synthetic data in training and evaluating AI models.
- Use Beam-like pipelines to handle massive datasets and enable efficient data generation at scale.
- Stay up to date with the latest research in Large Language Models (LLMs) and synthetic data.