Random rants
Feel free to jump to the ‘introduction’ section for the ‘meat’ of this post!
Recently I have been using Anthropic’s Claude a lot more than OpenAI’s ChatGPT, mainly because our corporate IT team has put ChatGPT behind Zscaler browser isolation, which made it more cumbersome to use.
So I went looking for alternatives. With Gemini also blocked, I tried Claude. Thankfully Claude is not blocked (yet), and in recent weeks I have been using it more and more. For my use cases it is, anecdotally, as good as ChatGPT, and arguably better.
The number of AI platforms keeps growing as AI development picks up pace. Just look at the number of bookmarks in my browser favorites:
Okay, enough of my random rants. Let’s get to the main topic of the day!
Introduction
While being a consumer of AI applications is useful day to day, how about doing more for the world and creating our own AI application on top of the backend models currently available on the market?
So I got my hands dirty and used the Gemini API and the Google Gen AI SDK to build a simple web application:
Architecture
The architecture is not too complex, as the bulk of the work is done by the Gemini AI model. For this trial I used Google’s suite of products, including the Gemini API, but of course feel free to use OpenAI, Grok or any other API endpoint if you have the appropriate access and license.
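To make that concrete, here is a minimal sketch of the single call that does most of the work: sending a YouTube link plus a prompt to Gemini through the Google Gen AI SDK (the google-genai package). It is not the exact code from my app. The function names, the default model name, the fallback prompt and the 0.0 to 2.0 temperature check are placeholders I picked for illustration, and the client is assumed to find its API key in the environment.

```python
def build_request(youtube_url: str, prompt: str, temperature: float = 0.7) -> dict:
    """Collect what the model call needs (hypothetical helper, pure Python)."""
    # Assumption: the model accepts temperatures in the 0.0 to 2.0 range.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature should be between 0.0 and 2.0")
    return {
        "file_uri": youtube_url.strip(),
        # Fall back to a generic instruction when no extra prompt is given.
        "prompt": prompt.strip() or "Summarize this video.",
        "temperature": temperature,
    }


def summarize_video(youtube_url: str, prompt: str,
                    model: str = "gemini-2.0-flash",
                    temperature: float = 0.7) -> str:
    """Ask Gemini about a YouTube video and return the text of the reply."""
    # Imported lazily so build_request() works without the SDK installed.
    from google import genai
    from google.genai import types

    req = build_request(youtube_url, prompt, temperature)
    client = genai.Client()  # assumption: API key is set in the environment
    response = client.models.generate_content(
        model=model,
        contents=types.Content(parts=[
            # The video is passed by URL, followed by the text instruction.
            types.Part(file_data=types.FileData(file_uri=req["file_uri"])),
            types.Part(text=req["prompt"]),
        ]),
        config=types.GenerateContentConfig(temperature=req["temperature"]),
    )
    return response.text
```

Everything else in the architecture is plumbing around this one function: a form collects the inputs, and a web server passes them through and shows the reply.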
Development steps (high level)
This post is not meant to be a step-by-step tutorial, but if you feel like trying it out, have a look at this. A bird’s-eye overview of what I did is as follows:
Create a Python Flask App using Cloud Shell and the Cloud Run Application template
Build the front-end: Create an HTML form with input fields for YouTube links, model selection, and additional prompts
Build the back-end: Implement Flask routes and integrate Gemini API to process the YouTube videos
Deploy the application to Cloud Run to make it publicly accessible
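The front-end and back-end steps above can be sketched roughly as follows. This is an illustration under my own assumptions rather than the code I deployed: the form field names, the /summarize route, the list of models and the create_app factory are all made up for the example, and summarize stands for any function that takes (link, prompt, model) and returns the model’s text.

```python
from urllib.parse import urlparse

# Hypothetical model choices offered in the form's drop-down.
ALLOWED_MODELS = ("gemini-2.0-flash", "gemini-1.5-pro")

# Front-end: a plain HTML form with the three inputs.
FORM_HTML = (
    '<form method="post" action="/summarize">'
    '<input name="youtube_link" placeholder="YouTube link">'
    '<select name="model">'
    + "".join(f"<option>{m}</option>" for m in ALLOWED_MODELS)
    + "</select>"
    '<textarea name="prompt" placeholder="Additional prompt"></textarea>'
    "<button>Go</button></form>"
)


def is_youtube_link(url: str) -> bool:
    """Accept only links whose host is a known YouTube domain."""
    host = (urlparse(url.strip()).hostname or "").lower()
    return host in {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def create_app(summarize):
    """Back-end: wire the form to a summarize(link, prompt, model) callable."""
    from flask import Flask, request  # lazy import; requires Flask 2.0+

    app = Flask(__name__)

    @app.get("/")
    def index():
        return FORM_HTML

    @app.post("/summarize")
    def run_summary():
        link = request.form.get("youtube_link", "")
        model = request.form.get("model", ALLOWED_MODELS[0])
        prompt = request.form.get("prompt", "")
        # Reject bad input before spending any API credits.
        if not is_youtube_link(link) or model not in ALLOWED_MODELS:
            return {"error": "Please supply a YouTube link and a listed model."}, 400
        return {"summary": summarize(link, prompt, model)}

    return app
```

Passing summarize in as a parameter keeps the routes independent of any particular model API, so a stub can be used while testing locally. For the deployment step, Cloud Run tells the container which port to listen on through the PORT environment variable, so the server should bind to that rather than a hard-coded port.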
Challenges I faced
Firefox may not be the best browser to access GCP, as the in-browser IDE doesn’t seem to work well in it. Some buttons were not clickable even though they were supposed to be.
I switched to Google Chrome, and everything worked fine.
Without making any (false) accusations: supporting different browsers is a challenge, and I suppose Google went with a Chrome-first approach.
Outputs
My web application was a success, and it was accessible here. I don’t have a million dollars’ worth of credits to spare, so by the time you read this, it should have been decommissioned. However, I did take a screenshot of a sample output:
I also switched up the temperature, and the output was funkier. Feel free to have a go at it yourself!
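Why does a higher temperature make the output funkier? Gemini’s internals are not public, so the snippet below is only the textbook picture of temperature in language models, with made-up scores for three candidate words: the scores are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution and a high one flattens it.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.2)  # low temperature
warm = softmax_with_temperature(logits, 1.5)  # high temperature

print("low temperature: ", [round(p, 3) for p in cool])
print("high temperature:", [round(p, 3) for p in warm])
```

At the low temperature almost all the probability sits on the top-scoring word, while at the high temperature the less likely words get a real chance of being picked, which is where the funkier output comes from.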
Final thoughts
While what I have here is quite simple, and probably something the Gemini site itself does better than my web application, it demonstrates the possibilities and potential use cases that each of us may have. Feel free to brainstorm and solve your own work-related challenges; it may just become the next big hit at your workplace!
(GCP $5 credit vouchers, while stocks last!)