Google Gemma for Real-Time Lightweight Open LLM Inference
Apache NiFi, Google Gemma, LLM, Open, HuggingFace, Generative AI, Gemma 7B-IT
When I saw the new model on HuggingFace, I had to try it with Apache NiFi in some Slack pipelines and compare it to ChatGPT and WatsonX.AI.
This seems like a fast, interesting new open large language model, so I am going to give it a try. Let’s go. As I am short on disk space, I am going to call it via the HuggingFace REST Inference API. There are many ways to use the models, including HuggingFace Transformers, PyTorch, Keras-NLP/Keras/TensorFlow, and more. We will try both 2B-IT and 7B-IT.
Google Gemma on HuggingFace
data:image/s3,"s3://crabby-images/79862/79862d476d8890b4019924c9269766e3e948792d" alt=""
data:image/s3,"s3://crabby-images/abfbc/abfbc8187aeadef5178df7edae730349c14a2f96" alt=""
This is really easy to start using. We can test on the website before we get ready to roll out a NiFi flow.
data:image/s3,"s3://crabby-images/5fb8a/5fb8a970eaed4751177316056a7442d890b48349" alt=""
Real-Time DataFlow With Google Gemma 7B-IT
Source Code:
tspannhw/FLaNK-Gemma (github.com): FLaNK for using HuggingFace hosted Open Google Model Gemma
data:image/s3,"s3://crabby-images/dc711/dc71196f7481c8224779bc6f8c7ca29bdf4f74f0" alt=""
data:image/s3,"s3://crabby-images/dd3bd/dd3bd6a7e5f91f3ab6085140a890e0f954ec728f" alt=""
1. ListenSlack — We connect via new Slack Sockets and get chat messages
2. EvaluateJsonPath — We parse out the fields we like (we send the raw copy somewhere else in step 6)
3. RouteOnAttribute — We only want messages in the “general” channel
4. RouteOnAttribute — We only want real messages
5. ReplaceText — We build a new file to send
6. ProcessGroup — We will process the raw JSON message from Slack in a sub process group
data:image/s3,"s3://crabby-images/f7736/f7736cd4a4fb745f989981a0f6aadfcf39889865" alt=""
8. InvokeHTTP — We call HuggingFace against the Google Gemma Model
9. QueryRecord — We clean up the JSON and return 1 row
10. UpdateRecord — We add fields to the JSON file
11. UpdateAttribute — We set headers
12. PublishKafkaRecord_2.6 — We send the data via Kafka
13. RetryFlowFile — If it failed let’s retry three times then fail
14. ProcessGroup — In this sub process group we will clean up and enrich the Google Gemma results and send to Slack.
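The parse, filter, and build steps at the start of the flow can be sketched outside NiFi in plain Python. This is only an illustration, not the flow's actual configuration: the field names (`channel`, `text`, `user`, `bot_id`, `subtype`), the channel ID, and the definition of a "real" message (non-empty text, not posted by a bot, no subtype) are all assumptions.

```python
import json


def extract_fields(event: dict) -> dict:
    """Pull out only the fields the rest of the flow needs (like EvaluateJsonPath)."""
    return {
        "channel": event.get("channel", ""),
        "text": (event.get("text") or "").strip(),
        "user": event.get("user", ""),
        "bot_id": event.get("bot_id"),    # assumed marker for bot posts
        "subtype": event.get("subtype"),  # assumed marker for joins, edits, etc.
    }


def is_wanted(fields: dict, wanted_channel: str) -> bool:
    """Mimic the two RouteOnAttribute checks: right channel, real message."""
    if fields["channel"] != wanted_channel:
        return False
    if fields["bot_id"] or fields["subtype"]:
        return False
    return bool(fields["text"])


def build_payload(fields: dict) -> str:
    """Build the JSON body to send to the model (like ReplaceText)."""
    return json.dumps({"inputs": fields["text"]})
```

Keeping the filter separate from the payload builder mirrors the flow, where unwanted messages are routed away before any request body is built.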
We call HuggingFace for the Google Gemma 7B-IT model.
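For readers who want to try the same call without NiFi, here is a minimal Python sketch of the request plus a retry wrapper in the spirit of the RetryFlowFile step. The endpoint URL pattern, the bearer-token header, and the helper names are assumptions for illustration; check the current HuggingFace documentation before relying on them.

```python
import json
import urllib.request

# Assumed URL pattern for the hosted Inference API; verify before use.
HF_URL = "https://api-inference.huggingface.co/models/google/gemma-7b-it"


def build_request(question: str, token: str, url: str = HF_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request with the {"inputs": ...} body."""
    body = json.dumps({"inputs": question}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_over_http(request: urllib.request.Request):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def call_with_retry(request, send=send_over_http, max_retries: int = 3):
    """Try once, then retry up to max_retries times before giving up."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return send(request)
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            last_error = err
    raise last_error
```

Passing `send` as a parameter keeps the retry logic testable without touching the network; a real call would look like `call_with_retry(build_request("your question", "your token"))`.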
data:image/s3,"s3://crabby-images/adbdc/adbdc11fe0b9c0b51f766aeca11514e36e99dad4" alt=""
data:image/s3,"s3://crabby-images/c6263/c62632d88a9f6cf905a727d16bb8d83405a2be21" alt=""
Merlin, My Cat Manager, asks if I am done with this. It’s been over 3 hours to build this.
data:image/s3,"s3://crabby-images/ef9c5/ef9c5676eb94b78630d8a6935d3a3434e7f358a5" alt=""
We now parse the results from HuggingFace and send them to our Slack channel.
We add a footer to tell us what LLM we used.
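As a rough sketch of the footer step, the snippet below builds a Slack message body with the model name appended. The footer wording, the function name, and the channel ID are made up for illustration; only the `channel` and `text` keys follow Slack's usual message payload shape.

```python
import json


def build_slack_message(channel: str, answer: str, model_name: str) -> str:
    """Return a JSON message body with a footer naming the LLM that answered."""
    footer = f"_Answered by {model_name}_"  # hypothetical footer format
    text = f"{answer.rstrip()}\n\n{footer}"
    return json.dumps({"channel": channel, "text": text})
```

With several models wired into the same channel, a footer like this makes it easy to tell at a glance which one produced each reply.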
data:image/s3,"s3://crabby-images/dea39/dea397bbe33513976a68804d8e500307f9452fbb" alt=""
That’s it, three different LLM systems and models, plus output to Slack, PostgreSQL and Kafka. Easy.
data:image/s3,"s3://crabby-images/dc16e/dc16eaf13c930faaa7c6571619e4840fff3650d1" alt=""
We start off with a Slack message question in the general channel to parse.
{
"inputs" : "How did Tim Spann use Apache NiFi to work with HuggingFace hosted Google Gemma models?"
}
The result of the inference from the Google Gemma model is:
[ {
"generated_text" : "How did Tim Spann use Apache NiFi to work with HuggingFace hosted Google Gemma models?\n\nTim Spann used Apache NiFi to work with HuggingFace hosted Google Gemma models by setting up a NiFi flow that interacted with the HuggingFace API. Here are the main steps involved:\n\n**1. Set up NiFi flow:**\n- Create a new NiFi flow and name it appropriately.\n- Add a processor to the flow.\n\n**2. Configure processor:**\n- Use an HTTP processor to make requests to the HuggingFace API.\n- Set the URL"
} ]
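Notice that `generated_text` starts by echoing the prompt. A small Python sketch of one way to clean that up before posting the answer follows; it assumes the response is always a JSON array of objects with a `generated_text` key, as in the sample above, and the function name is hypothetical.

```python
import json


def extract_answer(response_body: str, prompt: str) -> str:
    """Return the first generated_text with the echoed prompt removed."""
    rows = json.loads(response_body)
    if not rows:
        return ""
    text = rows[0].get("generated_text", "")
    # The hosted model repeats the question first; drop it if present.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()
```

In the NiFi flow this kind of cleanup happens in the record processors; the sketch only shows the idea.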
Example of Provenance Events
data:image/s3,"s3://crabby-images/f122b/f122b77143b616cc6b300e65e7cf8187e8fc9fce" alt=""
Input to Slack
data:image/s3,"s3://crabby-images/dccb3/dccb3e65fdb1f92b6983edb0436027b56679ea08" alt=""
HuggingFace REST API Formatted Input to Gemma
data:image/s3,"s3://crabby-images/fdd9c/fdd9c15701b778ef6841a42a05f07e22a02f0e96" alt=""
Output to Slack
data:image/s3,"s3://crabby-images/94b32/94b327c1828474747fae0bbea2dfb26e9c23eddf" alt=""
data:image/s3,"s3://crabby-images/2d095/2d09510fb52ee79e7cd69a9cf9b7881bfe7020f3" alt=""
data:image/s3,"s3://crabby-images/3d1f5/3d1f56daf7b07662c77c69add83078aa9ae02cab" alt=""
data:image/s3,"s3://crabby-images/bef89/bef89aa2c83122049e27c29495fd460c16c3759a" alt=""
Output to Apache Kafka
data:image/s3,"s3://crabby-images/2e4f3/2e4f37fcf8cc6d1ce102a7319cfda185dc76b447" alt=""
data:image/s3,"s3://crabby-images/2e86a/2e86a4ce594a2026243b1862821e3b9ed0c1ef87" alt=""
Also Let’s Run Against OpenAI ChatGPT and WatsonX.AI LLAMA 2–70B Chat
data:image/s3,"s3://crabby-images/e493b/e493b5e4a5fcd0ae2b2dd7b608daf0c5003e040b" alt=""
data:image/s3,"s3://crabby-images/0fb8c/0fb8c7493f8a5ba9a92ba9d9b7ef5b72f4e695b8" alt=""
data:image/s3,"s3://crabby-images/22498/22498884a424b2e9410cae3269a00135f8a09920" alt=""
The New Slack Processing
data:image/s3,"s3://crabby-images/9e120/9e120b3e62e2ad7e6354f48616d0133c1b56d40a" alt=""
Send all Slack JSON Events to PostgreSQL
data:image/s3,"s3://crabby-images/7d134/7d134b02374f476baf7cbae59298871dd81ecba6" alt=""
data:image/s3,"s3://crabby-images/c30d1/c30d128bc71f3712ee9c4f6e442579b9d97ef124" alt=""
How to Connect NiFi to Slack
data:image/s3,"s3://crabby-images/df3d9/df3d9edf3a137ef739712bda2871267936a9a042" alt=""
Make sure to Enable Socket Mode!
data:image/s3,"s3://crabby-images/1c11e/1c11e0db68973685ed6d5ed604d95e5ac9bed22a" alt=""
You need the User and Bot User OAuth Tokens.
data:image/s3,"s3://crabby-images/71d81/71d814d983772d44c688f6c785e0f50c71139499" alt=""
data:image/s3,"s3://crabby-images/517ad/517ad3c03f6680b02f19a9c8f4bf6016d044fda8" alt=""
data:image/s3,"s3://crabby-images/dfecb/dfecbfb979b83a262cc6eb5d2d498d4c6bc4fbcc" alt=""
data:image/s3,"s3://crabby-images/cef40/cef40d36b61ae5a88803c07a81cbd35d59b6466f" alt=""
data:image/s3,"s3://crabby-images/a73c2/a73c24eda3579e9378a77f29436be44690fe61b9" alt=""
data:image/s3,"s3://crabby-images/a6387/a63877da2290bb99b9931ec96c38e864d32a3982" alt=""
This is the configuration:
display_information:
  name: timchat
  description: Apache NiFi Bot For LLM
  background_color: "#18254D"
  long_description: "chat testing"
features:
  app_home:
    home_tab_enabled: true
    messages_tab_enabled: false
    messages_tab_read_only_enabled: false
  bot_user:
    display_name: nifichat
    always_online: true
  slash_commands:
    - command: /timchat
      description: starts command
      usage_hint: ask question
      should_escape: false
    - command: /weather
      description: get the weather
      usage_hint: /weather 08520
      should_escape: false
    - command: /stocks
      description: stocks
      usage_hint: /stocks IBM
      should_escape: false
    - command: /nifi
      description: NiFi Questions
      usage_hint: Questions on NiFi
      should_escape: false
    - command: /flink
      description: Flink Commands
      usage_hint: Questions on Flink
      should_escape: false
    - command: /kafka
      description: Questions on Kafka
      usage_hint: Ask questions about Apache Kafka
      should_escape: false
    - command: /cml
      description: CML
      usage_hint: Cloudera Machine Learning
      should_escape: false
    - command: /cdf
      description: Cloudera Data Flow
      should_escape: false
    - command: /csp
      description: Cloudera Stream Processing
      should_escape: false
    - command: /cde
      description: Cloudera Data Engineering
      should_escape: false
    - command: /cdw
      description: Cloudera Data Warehouse
      should_escape: false
    - command: /cod
      description: Cloudera Operational Database
      should_escape: false
    - command: /sdx
      description: Cloudera Shared Data Experience
      should_escape: false
    - command: /cdp
      description: Cloudera Data Platform
      should_escape: false
    - command: /cdh
      description: Cloudera Data Hub
      should_escape: false
    - command: /rtdm
      description: Cloudera Real-Time Data Mart
      should_escape: false
    - command: /csa
      description: Cloudera Streaming Analytics
      should_escape: false
    - command: /smm
      description: Cloudera Streams Messaging Manager
      should_escape: false
    - command: /ssb
      description: Cloudera SQL Streams Builder
      should_escape: false
oauth_config:
  scopes:
    user:
      - channels:history
      - channels:read
      - chat:write
      - files:read
      - files:write
      - groups:history
      - im:history
      - im:read
      - links:read
      - mpim:history
      - mpim:read
      - users:read
      - im:write
      - mpim:write
    bot:
      - app_mentions:read
      - channels:history
      - channels:read
      - chat:write
      - commands
      - files:read
      - groups:history
      - im:history
      - im:read
      - incoming-webhook
      - links:read
      - metadata.message:read
      - mpim:history
      - mpim:read
      - users:read
      - im:write
      - mpim:write
settings:
  event_subscriptions:
    request_url:
    user_events:
      - channel_created
      - channel_deleted
      - file_created
      - file_public
      - file_shared
      - im_created
      - link_shared
      - message.channels
      - message.groups
      - message.im
      - message.mpim
    bot_events:
      - app_mention
      - channel_created
      - channel_deleted
      - channel_rename
      - group_history_changed
      - member_joined_channel
      - message.channels
      - message.groups
      - message.im
      - message.mpim
      - user_change
  interactivity:
    is_enabled: true
  org_deploy_enabled: false
  socket_mode_enabled: true
  token_rotation_enabled: false
Retrieves real-time messages or Slack commands from one or more Slack conversations. The messages are written out in… (nifi.apache.org)
RESOURCES
We're on a journey to advance and democratize artificial intelligence through open source and open science. (huggingface.co)
Introducing Gemma, a family of open-source, lightweight language models. Discover quickstart guides, benchmarks, train… (ai.google.dev)
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the… (www.kaggle.com)
Explore and run machine learning code with Kaggle Notebooks | Using data from Gemma (www.kaggle.com)
Google's new family of models, Gemma, will be available to developers for language-based tasks. Unlike Gemini, it will… (www.theverge.com)
google/gemma.cpp (github.com): lightweight, standalone C++ inference engine for Google's Gemma models.