Understanding Grok: Does X.ai’s Twitter Chatbot Surpass ChatGPT in Performance?

Grok is an artificial intelligence conceived in the spirit of “The Hitchhiker’s Guide to the Galaxy,” built to not only respond to a vast array of inquiries but also to proactively suggest queries to users.

Incorporating a touch of humor and a dash of rebellion in its responses, Grok isn’t your run-of-the-mill AI. It’s not for those who dislike a bit of cheekiness in their interactions!

A standout feature of Grok is its ability to tap into the 𝕏 platform for real-time knowledge of the world. It will also answer contentious questions that many other AI systems decline to address.

Currently, Grok is in its early beta phase, crafted from two months of development. We anticipate that it will evolve swiftly, improving with user engagement and feedback.

With appreciation, the xAI Team

Our vision behind developing Grok encompasses two main objectives:

First, we’re focused on collecting feedback to shape AI tools that universally benefit humanity. We strive to craft AI that serves individuals across all walks of life and across the spectrum of political beliefs, always within legal boundaries. Grok is a step toward this endeavor, showcasing our approach publicly.

Second, we aim to catalyze research and innovation. Grok is envisioned as a robust research aide, facilitating users in swiftly finding pertinent information, analyzing data, and fostering novel insights.

Elon Musk, a co-founder of OpenAI, left its board in 2018, before the organization restructured from a non-profit into a capped-profit entity in 2019. This year, Musk co-launched X.ai with the ambitious goal of deepening our understanding of the universe. Musk has cited OpenAI's growth as a for-profit entity pursuing AGI, together with the success and advanced capabilities of GPT-4, as catalysts for establishing X.ai as a competitive alternative.

Grok is an AI Assistant

Last year, Musk made headlines by taking over a major social networking service, and just this past weekend, his venture X.ai rolled out an inaugural offering for a chosen few on X, the social platform formerly recognized as Twitter. This new offering is named Grok and presents itself as a chatbot bearing a resemblance to ChatGPT. What sets Grok apart is its ability to draw upon live data from X, a feature that is anticipated to provide users with information that is not just relevant and current but also exclusive.

Elon Musk on the new AI, Grok

In another tweet, Musk described Grok as an AI aide. Details about this new tool are still trickling in, yet some beta testers with early access have already started sharing their experiences with Grok. Instances of Grok providing real-time updates, resolving technical queries, and composing Python scripts have come to light.

Interested individuals can join a waitlist through a sign-up page, and Musk has said that Grok will be an exclusive feature for subscribers of X Premium+, a new membership tier. The subscription costs $16 per month for U.S. users who sign up on the web and $22 per month for those who subscribe through the iOS App Store or Google Play. Initially, Grok will be available only to users in the United States.

Grok is Based on a New LLM

Grok-0, the forerunner to Grok-1, is a large language model (LLM) with 33 billion parameters. xAI has not stated whether Grok-1 has the same number of parameters. Given the noticeable gains in performance, however, there is speculation that Grok-1 has grown beyond the 33 billion mark. Grok-1 has a context window of 8k tokens, roughly equivalent to 6,000 words, which is twice what was originally available in ChatGPT’s GPT-4 version, and possibly on par with the latest version of OpenAI’s chatbot. The model’s training data runs up to the third quarter of 2023, which would place its knowledge cutoff somewhere between July and September 2023.
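The "8k tokens, roughly 6,000 words" conversion can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the common rule of thumb of about 0.75 English words per token. That ratio is an assumption for illustration, not a figure published by xAI, and the helper names are hypothetical.

```python
# Assumption: English text averages roughly 0.75 words per token.
# This is a widely used rule of thumb, not an xAI-published number.
WORDS_PER_TOKEN = 0.75


def approx_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a context window of the given size."""
    return round(context_tokens * words_per_token)


def fits_in_context(text: str, context_tokens: int,
                    words_per_token: float = WORDS_PER_TOKEN) -> bool:
    """Rough check based on a whitespace word count, not a real tokenizer."""
    return len(text.split()) <= approx_words(context_tokens, words_per_token)


if __name__ == "__main__":
    print(approx_words(8_000))  # the article's "8k" figure taken as 8,000 tokens
    print(approx_words(8_192))  # the same estimate if "8k" means 8,192 tokens
```

A real tokenizer would give a different count for any particular text, so this is only useful for sizing prompts in rough terms.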

According to xAI, after the company was founded its team developed an initial prototype LLM named Grok-0, which, despite having roughly half the parameters of LLaMA 2 (70B), showed comparable performance on standard language model benchmarks. Over the two months that followed, the team made considerable progress, particularly in reasoning and coding, culminating in Grok-1. This model achieved a 63.2% success rate on the HumanEval coding task and 73% on the MMLU assessment.

Grok is Better Than GPT-3.5 and Llama

Grok underwent comparative testing against several top-tier LLMs such as LLaMA 2, Inflection-1, PaLM 2, Claude 2, and OpenAI’s GPT-3.5 and GPT-4. Considering there are more than 200 tests available for LLM benchmarking, presenting just a handful might suggest a selective presentation of results. Nonetheless, tests like GSM8k, MMLU, HumanEval, and MATH are among the most recognized in the industry. In these comparisons, xAI reports that Grok-0 matches the performance of LLaMA 2 while requiring only half as much computational power.

According to the company, Grok-1 outperforms GPT-3.5 and is closing in on Google’s PaLM 2, yet it still lags noticeably behind Anthropic’s Claude 2 and GPT-4. The company’s statement reveals:

Grok-1 has exhibited impressive capabilities in these benchmarks, outdoing all competitors in the same computational category, including ChatGPT-3.5 and Inflection-1. It is only outstripped by models that were developed with considerably more extensive training data and computational resources, such as GPT-4, which underlines xAI’s rapid advancements in efficient LLM training.

Since it’s possible that our models might have been inadvertently trained on these benchmarks found online, we also conducted a unique evaluation using the 2023 Hungarian national high school finals in mathematics, released in late May after our data collection concluded. Grok achieved a C grade (59%) on this exam, on par with Claude-2 at 55%, while GPT-4 earned a B grade (68%). These assessments were performed using a low-temperature setting of 0.1 and identical prompts for all models, without any specific optimization for this test. This served as a more authentic assessment on an unfamiliar dataset.
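To see why a temperature of 0.1 counts as a "low-temperature setting," it helps to look at how temperature reshapes a model's next-token probabilities. The sketch below uses the standard temperature-scaled softmax. The logits are made-up toy values, and nothing here reflects xAI's actual evaluation code, which the statement does not describe.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens (illustrative only).
toy_logits = [2.0, 1.0, 0.5]

probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_low = softmax_with_temperature(toy_logits, 0.1)

if __name__ == "__main__":
    print([round(p, 4) for p in probs_default])
    print([round(p, 4) for p in probs_low])
```

At a temperature of 1.0 the top candidate receives only a moderate share of the probability, while at 0.1 nearly all of it concentrates on that candidate. Sampling at 0.1 is therefore close to deterministic, which makes scores on a fixed exam more repeatable across runs and models.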

Currently, Grok-1 is only designated to support the Grok assistant on X, rendering these benchmarks somewhat irrelevant for immediate practical application. Businesses are not presently considering whether to integrate its API as they would with offerings from OpenAI or Anthropic. However, the intention behind these demonstrations is to affirm the model’s capabilities and foster trust in both the model and the assistant.

Grok Supports the Super App Strategy

Musk’s move to integrate an AI assistant within a social networking application mirrors an initiative previously announced by Mark Zuckerberg for Meta’s AI assistants. This tactic is advantageous for several key reasons:

Incorporating a chatbot capable of contextual searches and responses into an information-centric platform like X could be a significant asset, particularly for its most active users.

At present, X functions mainly as a medium for content discovery. The addition of an informational assistant transforms it into a resource for search as well.

When an app includes a helpful service or assistant, it tends to encourage users to interact with the app more often and for longer periods.

If the goal is to turn X into a super app, a Western counterpart to WeChat, an integrated assistant could become an essential feature for navigating and exploring the app’s services.

Furthermore, Grok places X.ai prominently on the generative AI landscape, casting a favorable and innovative light on Musk’s X. The launch of this assistant carries numerous potential advantages, especially if it cultivates a dedicated user base.

Grok Has a Personality

Injecting Character into Grok

X.ai is looking to set Grok apart by infusing it with a distinctive personality. While digital assistants like Siri, Alexa, and Google Assistant have pursued affinity and trust through friendly demeanors, Grok is carving out a niche with a more daring edge.

This decision to embrace a specific personality contrasts with the neutral, impersonal stance taken by other AI assistants such as ChatGPT and Bard, which aim for a more straightforward, professional interaction. Meta, on the other hand, while developing its neutral AI assistant, is simultaneously introducing AI Characters modeled after celebrities, signaling a belief in the significance of personality for consumer-facing applications, as opposed to the enterprise focus of ChatGPT and Bard.

Grok’s Present Limitation: Lack of Multimodality

One significant area where Grok currently lags behind competitors like ChatGPT and Google Bard is in multimodal capabilities. X.ai has stated its intentions to integrate these features down the line.

As of now, Grok operates without the capability to process visual and audio inputs. Plans are in place to endow Grok with these additional ‘senses’ to facilitate more diverse applications, including real-time interaction and assistance.

While Grok’s text-only format may not be a significant barrier initially, multimodal functionality will become a necessity for personal AI assistants to gain widespread popularity.

Will Grok Be the ChatGPT Alternative?

Grok’s Market Positioning and Future

Grok steps into a bustling field dominated by ChatGPT, with formidable counterparts like Claude 2, Google Bard, and others in the race, including platforms like Bing Chat and Perplexity.ai, which offer comparable services with a search-first approach. Grok stands out by offering only a paid service, in contrast with ChatGPT’s free and premium options and other competitors’ freemium models.

For Grok to carve out a significant user base at its $16 monthly fee, it must outshine its competitors considerably. Much of its success may hinge on the integrated services within X (formerly known as Twitter) rather than on standalone assistant capabilities.

Given these factors, Grok’s near-term impact on ChatGPT’s dominance may be minimal. The lack of a freemium version and X’s established user behavior pattern present considerable challenges to adoption.

Yet, Grok could play a key role in X’s strategy. The appeal of alternatives like Bard or Claude hinges on specific tasks and user preferences. ChatGPT continues to expand its features, setting a high bar for competitors. However, X’s devoted user base could provide a fertile ground for Grok to grow. By enhancing its services to resemble a super app, X could increase Grok’s appeal and potentially introduce a freemium model to broaden its reach, though it remains to be seen whether this is part of Musk’s vision.

The future may see Grok become a significant player in the personal AI assistant space, second to ChatGPT, or it could become an essential generative AI feature within X. As applications increasingly incorporate embedded assistants, Grok could exemplify this trend.

We’re only at the dawn of the assistant era, and the possibilities are vast.
