The past few days' posts have all covered work from Google, and today's is no exception: Smart Paste, an internal Google tool that makes context-aware adjustments to pasted code.
Background
Most developers copy and paste code frequently in their daily work to avoid repetitive typing. This speeds up development, but there is more to it than simple copying and pasting. An analysis of Google's monorepo (Google's code repository, which stores the complete working history rather than just submitted changes) revealed patterns in user behavior that are potential targets for improving efficiency. For example, according to the editing history, about 25% of pastes are modified immediately afterward. The changes range from minor syntax fixes (e.g., adding a missing semicolon) to more complex adjustments to fit the surrounding code (e.g., renaming variables and changing their types). Making these changes often interrupts the flow of writing code and slows development.
Smart Paste is now widely used inside Google. It uses generative AI to predict the next state of the surrounding code and to adjust pasted code to fit its context. Advances in large sequence models (such as DIDACT) have made tools for code review and build repair possible, and Smart Paste applies the same approach to streamline copy-paste during code revision. In a usage study covering approximately 40,000 engineers, the team found that 6.9% of all paste operations in the IDE used Smart Paste, with an acceptance rate of 42.5%, significantly streamlining users' workflows.
[Figure: Smart Paste suggests modifications to the function name and operator.]
Data preparation, model training, and calibration
For model training, post-paste editing data was obtained from Google's monorepo and its comprehensive logs. Because data quality is critical, the team designed a set of simple heuristics that count an edit as a post-paste adjustment only when it falls close to the original paste location. Using these heuristics, a training dataset was extracted from the preceding months of data, and the generated examples were evaluated manually, iterating until the extraction heuristics worked well. During this process, the team noticed that 28% of pastes had no subsequent edits (varying from 23% to 41% across programming languages). These examples were retained in the training dataset so that the model could also learn to output "no edits needed."
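The extraction idea described above can be illustrated with a minimal sketch. It is not Google's actual pipeline: the `Paste` and `Edit` records, the `max_distance` and `max_seconds` thresholds, and the output dictionary are all hypothetical names and values chosen for illustration. It only shows the two points the text makes: edits count as post-paste adjustments when they are close to the paste, and pastes with no such edits are kept as "no edits needed" examples.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Paste:
    start_line: int  # first line of the pasted region (0-based)
    end_line: int    # last line of the pasted region, inclusive
    text: str        # the text exactly as it was pasted


@dataclass
class Edit:
    line: int             # line touched by a later edit
    seconds_after: float  # time elapsed since the paste


def nearby_edits(paste: Paste, edits: List[Edit],
                 max_distance: int = 2,
                 max_seconds: float = 120.0) -> List[Edit]:
    """Keep only edits close to the paste in both space and time.

    Both thresholds are assumed values, not figures from the source.
    """
    lo = paste.start_line - max_distance
    hi = paste.end_line + max_distance
    return [e for e in edits
            if lo <= e.line <= hi and e.seconds_after <= max_seconds]


def make_example(paste: Paste, edits: List[Edit],
                 region_after: str) -> Dict[str, object]:
    """Build one training pair.

    When no nearby edit followed the paste, the target equals the input,
    which is how a model can learn to output "no edits needed".
    """
    kept = nearby_edits(paste, edits)
    target = region_after if kept else paste.text
    return {"input": paste.text, "target": target, "edited": bool(kept)}
```

For instance, a paste on lines 10 to 12 followed by an edit on line 11 five seconds later yields an edited example, while an edit on line 200 is ignored and the paste becomes a "no edits needed" example.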
The team used the DIDACT model pre-trained on programming-related tasks and fine-tuned it using the labeled dataset.
Interaction
Auto-apply: the inserted or modified text (here, flagfile) is shown bold and underlined.
Inline diff: the deleted tryfromenv is struck through, and the inserted flagfile is italicized at lower opacity.
The user cuts and pastes the modified string, which triggers a paste suggestion on the return string.
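To make the inline diff presentation concrete, here is a small sketch of how such a view could be computed. It is an assumption for illustration, not Smart Paste's implementation: it uses Python's standard `difflib` for a token-level diff, and the `~~deleted~~` and `*inserted*` markers merely stand in for the strikethrough and italic styling mentioned above. The function names `tokenize` and `inline_diff` are hypothetical.

```python
import difflib
import re
from typing import List


def tokenize(code: str) -> List[str]:
    # Split into words and single non-word characters so that an
    # identifier is diffed as one unit rather than letter by letter.
    return re.findall(r"\w+|\W", code)


def inline_diff(pasted: str, suggested: str) -> str:
    """Render a token-level diff between the pasted and suggested text."""
    a, b = tokenize(pasted), tokenize(suggested)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            # Stand-in for strikethrough on removed tokens.
            out.append("~~" + "".join(a[i1:i2]) + "~~")
        if tag in ("insert", "replace"):
            # Stand-in for italic, lower-opacity inserted tokens.
            out.append("*" + "".join(b[j1:j2]) + "*")
    return "".join(out)


print(inline_diff("--tryfromenv=x", "--flagfile=x"))
# prints: --~~tryfromenv~~*flagfile*=x
```

Diffing whole tokens keeps the display readable: a renamed identifier appears as one deletion and one insertion instead of a scatter of single-character changes.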
Results
Examining the behavior of approximately 40,000 engineers, the team found that 6.9% of all paste operations in the IDE used Smart Paste, with an acceptance rate of 42.5%, significantly reducing developers' workload.