FLaNK Stack Weekly 29 Jan 2024
Tim Spann @PaaSDev
https://www.youtube.com/@FLaNK-Stack
https://www.threads.net/@tspannhw
https://medium.com/@tspann/subscribe
Get your new Apache NiFi for Dummies!
https://www.cloudera.com/campaign/apache-nifi-for-dummies.html
https://ossinsight.io/analyze/tspannhw
Trial: https://console.us-west-1.cdp.cloudera.com/trial/register.html#/
CODE + COMMUNITY
Please join my meetup groups: NJ/NYC/Philly/Virtual.
http://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
https://www.meetup.com/futureofdata-philadelphia/
**This is Issue #122**
https://github.com/tspannhw/FLiPStackWeekly
https://www.cloudera.com/solutions/dim-developer.html
Articles
Apache NiFi and Amazon Textract for Machine Learning https://medium.com/@tspann/apache-nifi-and-amazon-textract-for-machine-learning-e45f4af12e68
Apache NiFi and Amazon Transcribe for Machine Learning https://medium.com/@tspann/apache-nifi-and-amazon-transcribe-for-machine-learning-00db5ed0996a
Building a Library of Python Processors https://medium.com/@tspann/building-a-library-of-python-processors-6b5517404a58
Harnessing the Power of Apache NiFi and Amazon Polly for Machine Learning https://medium.com/@tspann/harnessing-the-power-of-apache-nifi-and-amazon-polly-for-machine-learning-4ea3139fbe77
Building LLM Pipelines with Pinecone, HuggingFace, Python and Apache NiFi https://medium.com/@tspann/llm-pipelines-with-pinecone-and-huggingface-with-python-and-apache-nifi-a96c20be93b7
Writing A Gen AI Processor with Python https://medium.com/@tspann/writing-a-generative-ai-python-processor-ed0655cf4e3f
Raspberry Pi 5 Setup https://medium.com/@tspann/i-setup-too-many-sbcs-d6417081a200
Codeless Generative AI Pipelines with Chroma Vector DB & Apache NiFi https://medium.com/@tspann/codeless-generative-ai-pipelines-with-chroma-vector-db-apache-nifi-43e77d75952f
Using NiFi to Augment and Enrich LLM Results with Real-Time Contextual Data https://medium.com/@tspann/augmenting-and-enriching-llm-with-real-time-context-b6da7ba4960a
ReadyFlow with WatsonX https://community.cloudera.com/t5/Community-Articles/Processing-real-time-unstructured-data-with-GenAI-using/ta-p/378191
AWS Open Source Newsletter #184 https://community.aws/content/2bJFKCPKPttVH0yPHPPt3XHoZJR/aws-open-source-newsletter-184?lang=en
Checkpoint Chronicle December 2023 https://decodable.co/blog/checkpoint-chronicle-december-2023
ADS-B with NiFi https://www.researchgate.net/publication/352469660_Near-Real-Time_IDS_for_the_US_FAA's_NextGen_ADS-B
NiFi Security https://exceptionfactory.com/posts/2021/07/21/single-user-access-and-https-in-apache-nifi/
4 Wars of AI https://www.latent.space/p/dec-2023
DocLLM for PDF https://medium.com/@basics.machinelearning/discover-docllm-the-new-llm-from-jpmorgan-for-working-with-complex-documents-5f54ea287d52
gRPC and Protobuf are growing https://www.infoq.com/news/2023/12/linkedin-grpc-protobuf-rest-json/
DoorDash Multi-Layered Cache https://www.infoq.com/news/2023/10/doordash-multilayered-cache/
Prompting Large Language Models https://www.infoq.com/articles/large-language-models-llms-prompting/
Top 10 Challenges to GenAI https://www.datanami.com/2024/01/22/top-10-challenges-to-genai-success/
Data Engineering in 2024 https://www.datanami.com/2024/01/23/data-engineering-in-2024-predictions-for-data-lakes-and-the-serving-layer/
EdgeAI
- https://docs.omniverse.nvidia.com/dev-guide/latest/index.html
- https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg#new_tab
- https://developer.nvidia.com/blog/generate-synthetic-data-for-deep-object-pose-estimation-training-with-nvidia-isaac-ros/
- https://developer.nvidia.com/blog/how-to-build-vision-ai-applications-at-the-edge-with-nvidia-metropolis-microservices-and-apis/
- https://developer.nvidia.com/blog/bringing-generative-ai-to-the-edge-with-nvidia-metropolis-microservices-for-jetson/
- https://developer.nvidia.com/blog/getting-started-on-jetson-top-resources-from-gtc-21/
- https://developer.nvidia.com/blog/bringing-generative-ai-to-life-with-jetson/
- https://www.jetson-ai-lab.com/
Fine Tuning LLM https://www.philschmid.de/fine-tune-llms-in-2024-with-trl
Use Markdown in Google Docs https://support.google.com/docs/answer/12014036
Videos
Seven Videos on Real-Time Streaming https://medium.com/@tspann/seven-videos-on-real-time-streaming-02711320afa8
Unlocking Financial Data with Real-Time Pipelines (OSACon 2023) https://www.youtube.com/watch?v=Q7gF7m4yFi4&ab_channel=OSACon
Looking at the New Features of Apache NiFi (Halifax Community over Code) https://www.youtube.com/watch?v=_orD9aAXk48&ab_channel=TheASF
Utilizing Real-Time Transit Data for Travel Optimization (Halifax Community over Code) Sunday Oct 8 2023, Canada https://www.youtube.com/watch?v=OWQmeF-UeEc&ab_channel=TheASF
Continuous SQL with Kafka and Flink | Timothy Spann (EN) https://www.youtube.com/watch?v=IGs0k240zhU&ab_channel=JAVAPRO
Events
Feb 8, 2024: NYC. https://www.meetup.com/new-york-open-source-data-infrastructure-meetup/events/297484047/
- 18:00 - 18:30 Welcome: Networking & snacks
- 18:30 - 18:35 Kickoff: Welcome Aiven
- 18:35 - 19:00 A Guide to Product Experimentation (Erin Mikail Staples, LaunchDarkly)
- 19:00 - 19:30 Building Real-time Pipelines: A Case Study with Transit Data (Tim Spann, Cloudera)
- 19:30 - 21:00 Food & networking
Feb 20, 2024: 12-1PM EST. Virtual. Azure Data Tech Groups: DBA Fundamentals Group https://www.meetup.com/dba-fundamentals-group/events/296855261/
Feb 28, 2024: NYC. Cloudera Meetup. Flink https://www.meetup.com/futureofdata-princeton/events/298661947/
March 5, 2024: Princeton. Meetup. GenAI. https://www.meetup.com/applied-generative-artificial-intelligence-applications/
March 15, 2024 (Friday): Princeton. IEEE Information Technology Professional Conference at the Trenton Computer Festival https://princetonacm.acm.org/tcfpro/
April 2024: XtremeJ 2024. Virtual. https://xtremej.dev/2023/schedule/
Cloudera Events https://www.cloudera.com/about/events.html
More Events: https://www.linkedin.com/pulse/schedule-2024-tim-spann--y4coe
Code
- https://github.com/tspannhw/FLaNK-python-watsonx-processor
- https://github.com/tspannhw/FLaNK-CDW
- https://github.com/tspannhw/FLaNK-VectorDB
- https://github.com/tspannhw/FLaNK-RPI5
- https://github.com/tspannhw/FLaNK-EdgeAI
- https://github.com/kevinbtalbert/NiFi-Flows-Demos
- https://github.com/DataSQRL/apirag
- https://github.com/tspannhw/FLaNK-python-ExtractCompanyName-processor
Models
- https://github.com/apple/ml-ferret
- https://github.com/modelscope/scepter
- https://motherduck.com/blog/duckdb-text2sql-llm/
Tools
- https://github.com/timfraedrich/OutRun
- https://github.com/build-on-aws/get-the-news-rss-atom-feed-summary/blob/main/README.md
- https://github.com/langroid/langroid
- https://github.com/aws-samples/apache-flink-near-online-data-enrichment-patterns
- https://github.com/IncomeStreamSurfer/chatgptassistantautoblogger
- https://konpyutaika.github.io/nifikop/
- https://github.com/stas00/ml-engineering
- https://github.com/LiheYoung/Depth-Anything
- https://github.com/marklogic/nifi
- https://github.com/viraniaman94/sendenv
- https://learn.microsoft.com/en-us/azure/cosmos-db/free-tier
- https://developers.google.com/edu/python
- https://github.com/InstantID/InstantID
- https://github.com/Corgea/retriever
- https://github.com/weaviate/weaviate
- https://github.com/qdrant/qdrant
- https://github.com/rajnandan1/kener
- https://towardsdatascience.com/running-local-llms-and-vlms-on-the-raspberry-pi-57bd0059c41a
- https://github.com/huggingface/trl
- https://harlequin.sh/
- https://jupysql.ploomber.io/en/latest/quick-start.html
- https://www.querybook.org/
- https://wix-incubator.github.io/quix/docs/about
- https://fugue-tutorials.readthedocs.io/
- https://github.com/async-profiler/async-profiler
- https://heynote.com/
- https://github.com/theOGognf/finagg
- https://github.com/huggingface/datatrove
- https://bernsteinbear.com/blog/scrapscript/
- https://github.com/reorproject/reor
- https://memgpt.readme.io/docs/index
- https://github.com/origin-energy/java-snapshot-testing
- https://github.com/BishopFox/cloudfoxable
- https://pypi.org/project/Wikipedia-API/
- https://github.com/nutlope/pdftochat
- https://github.com/NVlabs/Deep_Object_Pose/blob/master/scripts/nvisii_data_gen/generate_dataset.py
- https://github.com/danvega/todos-http-client
- https://thenewstack.io/what-you-can-do-with-vector-search/
- https://milvus.io/docs/example_code.md
- https://www.newark.com/sbc-powered-drones-for-aerial-inspection-trc-ar
- https://github.com/linkedin/rest.li/
- https://farfetch.github.io/kafkaflow/
- https://thenewstack.io/opentofu-1-6-general-availability-open-source-infrastructure-as-code/
- https://huggingface.co/blog/gcp-partnership
- https://github.com/kanton-bern/hellodata-be
- https://github.com/assafelovic/gpt-newspaper
- https://softwaredoug.com/blog/2024/01/24/are-we-at-peak-vector-db
- https://zed.dev/download
- https://github.com/vnglst/pong-wars
- https://github.com/rasbt/LLMs-from-scratch
- https://github.com/Mihaiii/llm_steer
- https://cloudevents.github.io/sdk-java/kafka.html
- https://github.com/lamini-ai/prompt-engineering-open-llms
- https://github.com/lamini-ai/llm-classifier
© 2020-2024 Tim Spann