AI+Data Weekly ( AI, Data, Iceberg, Polaris, Streamlit, Flink, Kafka, Python, Java, NiFi )

 

#166 - 02-December-2024

https://bsky.app/profile/paasdev.bsky.social

The Coolness this week

🌐 Andrew Ng's AI Suite OSS
🌐 Open Jupyter Notebooks from GitHub
🌐 Snowpipe Streaming with Kafka Auto Schema
🌐 SenseCAP Watcher AI Physical Device
🌐 Cool projects like SDKMAN! and Debezium
🌐 Using AgentKit for orchestration
🌐 Super charged voice
🌐 Agents with Memory
🌐 OpenInterpreter lets you run code locally
🌐 Extract Structured Data from Documents
🌐 Unify Streaming, Batch, AI
🌐 Automated AI Web Researcher in Ollama
🌐 Cross Platform Screen Sharing
🌐 Collaboration for AI Engineers
🌐 AutoRestTest: an automated API testing tool that combines graph theory, Large Language Models (LLMs), and multi-agent reinforcement learning (MARL) to parse the OpenAPI Specification and generate comprehensive test cases
🚀 Graphs not Silos
🐿️ FLUSS: Streaming Storage
🐿️ Fluss -> Flow for Flink Real Time Analytics
🌐 TableFlow - Iceberg / Kafka
❄️ Snowflake Cortex AI + Slack
🌐 Big Pile of Snowflake Queries Dataset
⌨️ 5 Days to GenAI with Kaggle
🙋🏻‍♂️ Data Engineering Trends
🙋🏻‍♂️ Flink SQL with AI
🙋🏻‍♂️ Segmentation Masks detect - SAM2
🌐 Ray Data Scalable Datasets for ML
🕶️ Ollama Functions as Tools
🕶️ Ollama - cool functions with Ollama python
🕶️ Postbot3000 - give it a try
🕶️ Full way to grab your website with LLM
🕶️ Very Interesting Web3 Stuff
🚘 Cool Limo Startup in Jersey with AI
🌐 Anthropic Open Source Model Context Protocol
🖥️ MCP: First Server
🙋🏻‍♂️ Open Interpreter
🕶️ Cool Google Tricks
🕶️ SQL Talk
🕶️ Google AI Studio
🕶️ Go and Java - Type Safety
🕶️ AG2 Agents
🕶️ Microsoft's updated AutoGen Agents
✏️ LlamaIndex resume cookbook
🔍 Airflow with Snowflake
🕶️ Clean Your Mac with a Shell Script
🐍 Big Friendly Bluesky Extract
📑 LLM Observability OSS
📝 NV Ingest from NVIDIA for PDF
🕶️ NodeJS Editor
📑 SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
📑 Cool new markdown like language
🕶️ Open Source Browser API for AI

New Models

🕶️ Sparse-Llama-3.1-8B-2of4
🕶️ NVIDIA Hymba
❄️ Snowflake Arctic Instruct
🤯 QwQ
🤯 Natural Language to SQL
🤯 OLMo 2 models

Interesting Datasets

🌐 tulu3 datasets: https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372

Upcoming

🐍 Dec 5: Global PyData: Virtual: https://global2024.pydata.org/cfp/talk/L9JXKS/
💻 Dec 19: Conf42 IoT 2024: Virtual: https://www.conf42.com/Internet_of_Things_IoT_2024_Tim_Spann_opensource_build

Recent Tim Stuff

💻 XTremePython 2024 - LLM
💻 PyData NYC
💻 Advanced RAG Techniques @ All Things Open Raleigh 2024
💻 Building Real Time LLM Models
💻 Big Data Conference EU Talk on Open Source Real-Time AI
💻 CloudX AI Real-Time
💻 BuildStuff - Adding Generative AI
🐈‍⬛ Conf42 Prompt Engineering
🥑 06 Nov 2024 AI Alliance Talk in Manhattan
💻 08 Nov 2024 PyData NYC slides

Apps, Demos, Examples, Models, Notebooks and Projects

🐍 RAG 101
🐦 Milvus Knowledgebase
👻 AIM Ghosts
🚕 Unstructured Data - Ghosts - Part 1
✍🏼 Multimodal RAG is not Scary Ghosts
✍🏼 Advanced RAG Techniques

Technologies

Python Java Snowflake Streamlit AWS Google Cloud Azure

CODE + COMMUNITY

© 2020-2024 Tim Spann https://www.youtube.com/@FLaNK-Stack (AI + Vectors + LLM + Streaming + IoT)