Tuesday, April 16, 2024

Me and ChromaDB a Series! - How Do I Backup and Restore ChromaDB?

Hi Friends,

Thanks for continuing to go with me on my ChromaDB journey.  

If you haven't had a chance to read the first two parts, here they are:

Me and ChromaDB - A Series Maybe... Part One - A.I. is Your Friend!

Me and ChromaDB - A Series! Let's Create Our First Vector Database with Cosine Similarity

Today I've got a great topic that started me on this ChromaDB, AI, LLM, vector database journey.  How do I back up and restore my AI database?

Humans are funny: we leap before we look, and it tends to get us into trouble.  Case in point, new application technology.  I love it, you love it, we all love new toys!  Think of backup and recovery as the batteries for your new toy.  A lot of times we're so excited about the new toy we forget the boring essentials like batteries.

Unfortunately backup and recovery is often seen as a boring essential, until the data is needed and then you're the most important person in the world.  

Inevitably a test application becomes production overnight and now it's your responsibility to protect that data.

So let's get ahead of the curve with this new application data and figure out how to back up and recover your new application before it becomes production!

With ChromaDB you have a few options, you can:

1. Run memory resident.

2. Create a persistent data file.

3. Run in client/server mode.

I'm currently running my ChromaDB with persistence and writing to a data directory, so if something happens all my data will be saved.  If you don't specify what type of relational database you want to use for persistence, ChromaDB will use SQLite as the default database, and it needs SQLite 3.35 or newer.
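
Since that version floor matters, here's a quick sketch for checking which SQLite library your Python is linked against.  The 3.35 threshold comes from the paragraph above.  The helper name sqlite_is_new_enough is just something I made up for illustration, it is not part of ChromaDB.

```python
import sqlite3

# Minimum SQLite version mentioned above, written as a 3-part tuple (assumption).
MIN_SQLITE = (3, 35, 0)

def sqlite_is_new_enough(version_info=None, minimum=MIN_SQLITE):
    """Return True if the given SQLite version tuple meets the minimum."""
    if version_info is None:
        # Version of the SQLite library that Python's sqlite3 module is linked against.
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= tuple(minimum)

if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("New enough for persistence?", sqlite_is_new_enough())
```

If this prints False, the troubleshooting page on the Chroma website is probably the next stop.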

For my first test I wanted to try ChromaDB out of the box, so I used SQLite.  Make sure you create the filesystem first, and when you tell ChromaDB what client you're going to use, type:

persistentClient = chromadb.PersistentClient(path="/where_your_data_filesystem_is")

That's awesome, I've got my ChromaDB set up, it's persistent, I can query my data, but now what?  There are a bunch of different ways to back up your SQLite database, but if you have Veritas NetBackup it's SUPER simple to integrate this new technology into your enterprise backup and restore process.

The cool thing about this is the SQLite agent is already built right into the NetBackup client software and has been since around 10.2.

Let me guide you through the process.

If you haven't downloaded and installed the Veritas NetBackup Client software on your ChromaDB box, you'll need to do that now.  You can download the Veritas software from HERE.

1. Log into the NetBackup WebUI and navigate to Protection > Policies and click on the +Add button.



2. There are going to be a lot of choices in the next section, but here are the ones I want you to focus on:
    Attributes:
    a. The name of the Policy.
    b. The Policy Type = DataStore
    c. Policy Storage = An active storage unit that you have available.



    Schedules:
    This schedule will look a little different from your Standard policy since we're going to initiate the backup and restore from the CLI.



    Clients:
    Let's add our ChromaDB database box as the client to our backup.  Click on the +Add button.

Enter the name of the ChromaDB machine.  I like to click on "Detect client operating system".

   Backup Selections:
   Now we'll choose what we want backed up.  Click on the +Add button.

Select the persistent data path you chose earlier when you told ChromaDB what type of client you were using.


Alright, we now have our policy to back up our ChromaDB SQLite database!

Let's kick off a backup from the CLI on the ChromaDB box.  

1. Log into your ChromaDB box where your database resides and navigate to:

/usr/openv/netbackup/bin

2.  Run the following command:
./nbsqlite -o backup -S nameofyourprimaryserver.com -P nameofyourpolicy -s Default-Application-Backup -z 10M -d /data/chroma.sqlite3

Let's break this down:
   -o backup tells NetBackup we're ready to do a backup
   -S put in the FQDN of your NetBackup Primary server.
   -P the name of the policy you just created. For me it is "chromadb2".
   -s this is the name of the schedule you're using.  I'm using the default one.
   -z here's some Linux magic.  You don't need this setting for Windows, but for Linux you need to tell NetBackup how big you want your LVM snapshot to be.  You can set it in KB, MB or GB.
   -d point NetBackup to where your sqlite3 file is located.

Hit Enter and you should see this:
Backup initiated from XBSA ...
The SQLite database backup is in progress...
File backed up:  /SQLite/chroma.sqlite3
SQLite database backup is successful!
Completed the  backup  operation
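
If you'd rather script this than type it every time, here's a minimal Python sketch that builds the same command as an argument list.  It only uses the flags shown above.  The function name build_backup_command and its defaults are my own illustration, and it prints the command instead of running it so you can eyeball it first.

```python
import shlex

# Location of nbsqlite on a Linux client, taken from the steps above.
NBSQLITE = "/usr/openv/netbackup/bin/nbsqlite"

def build_backup_command(primary_server, policy, db_path,
                         schedule="Default-Application-Backup",
                         snapshot_size="10M"):
    """Build the nbsqlite backup command shown above as an argument list."""
    return [
        NBSQLITE,
        "-o", "backup",        # operation to perform
        "-S", primary_server,  # FQDN of the NetBackup primary server
        "-P", policy,          # policy name
        "-s", schedule,        # schedule name
        "-z", snapshot_size,   # LVM snapshot size (Linux only, per the post)
        "-d", db_path,         # path to the SQLite file
    ]

if __name__ == "__main__":
    cmd = build_backup_command("nameofyourprimaryserver.com", "chromadb2",
                               "/data/chroma.sqlite3")
    # Print rather than run; to run it for real, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping the command as a list (instead of one big string) means you never have to worry about quoting paths with spaces in them.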

Now back to the NetBackup WebUI!

Under the Activity Monitor check the status of the backup:


WOO HOO You've successfully backed up your ChromaDB SQLite database!



So let's say we want to do a restore, so we'll query to see what backup images are available to us:
1. Go back to the CLI of your ChromaDB box and navigate to:
/usr/openv/netbackup/bin

2. Type the following:
./nbsqlite -o query -S nameofyourprimaryserver.com -C thenameofyourchromadbbox -P chromadb2

chromadb2
nameofyourchromabox
==================================================================================

==================================================================================
1713299663      Linux           SQLite            Tue Apr 16 15:34:23 2024
Completed the  query  operation


Check it out, we've got a backup that we can restore!
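
That long number in the first column looks like a Unix timestamp to me.  That's my assumption, not something I found in the NetBackup docs, but it lines up with the date next to it if the box is five hours behind UTC.  Here's a tiny sketch that converts it so you can sanity check which backup you're about to restore.  The function name is hypothetical.

```python
from datetime import datetime, timezone

def backup_id_to_utc(backup_id):
    """Interpret a numeric backup ID as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(int(backup_id), tz=timezone.utc)

if __name__ == "__main__":
    # First column of the query output above.
    print(backup_id_to_utc("1713299663").isoformat())
    # 2024-04-16T20:34:23+00:00
```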

1. Go back to the CLI of your ChromaDB box and navigate to:
/usr/openv/netbackup/bin

2. Enter the following to restore:

 ./nbsqlite -o restore -S nameofyourprimaryserver.com -t /data/restore
  -o restore  tells NetBackup you'd like to do a restore.
  -S then give the name of your Primary NetBackup server.
  -t is the directory you want to restore the file to.

Restore initiated from XBSA
The SQLite database restore is in progress
File restored: /data/restore/chroma.sqlite3
SQLite database restore is successful!
Completed the  restore  operation

Let's go check out the NetBackup Activity Monitor to see how the job did there.

Looking good!  Let's go check out our /data/restore folder:



And there's our ChromaDB SQLite backup restored and ready to be used!
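
Before trusting a restored file, I like to make sure SQLite itself is happy with it.  Here's a minimal sketch using Python's built-in sqlite3 module.  It opens the file read-only, runs SQLite's integrity check and lists the tables.  The function name check_sqlite_file is my own, and the path in the usage note is just the restore location from the example above.

```python
import sqlite3
from pathlib import Path

def check_sqlite_file(path):
    """Run SQLite's integrity check and list the tables in the file."""
    # Open read-only through a file: URI so the check cannot modify the restored copy.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        status = conn.execute("PRAGMA integrity_check").fetchone()[0]
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    finally:
        conn.close()
    return status, tables
```

To use it, call check_sqlite_file("/data/restore/chroma.sqlite3") and look for a status of "ok".  If the file is missing or is not a SQLite database, sqlite3 raises an error instead, which is also useful to know before you point ChromaDB at it.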

Hey that was fun wasn't it?!

Stay tuned for the next episode where I'm not sure what I'm going to do yet....

Friday, April 12, 2024

Brain's Favorite Gadgets of 2024 - So Far!

Hi Friends,

Even though I complained there were no good gadgets, I had to remind myself of some of the cool stuff that IS new and awesome.

1. Nintendo Switch

Where have you been all my life?!  I love the Nintendo Switch, it is an awesome game system.  I've always enjoyed portable handheld gadgets, from the Walkman to the Gameboy.  Something about having all your cool stuff with you at any time has always been appealing and the Nintendo Switch is the marriage of all the benefits of a big console in your hands!!  Oh yeah, did I mention it has a touch screen too?  I know it's not new, in fact it's been out since 2017, but it's just so cool it had to be on the list.

2. Living AI EMO

This little guy is AWESOME!  If you haven't seen him yet you have to go check him out.  I've been a sucker for robots since I was a kid.  Something about a super smart buddy that wouldn't judge you, made you laugh, would protect you and would always be by your side sounded pretty cool as a kid.  EMO is a desktop robot pet/friend.  He enjoys talking with you, wandering your desk, dancing, singing, playing music and pats on the head.  He's a pretty cool little dude!  Living AI even has a new robot coming out soon called Aibi.  I can't wait!!!

3.  Panic Playdate

This is a neat little game system, it's very clever and I've never seen anything like it.  It has the D-Pad and A & B, but also something very different, a crank.  The crank is another controller on the device and it makes games very unique.  Panic gives you a bunch of games when you get your system and even the way you get them is very clever.  You can even develop games for it and purchase games created by a pretty large online community.





Another robot friend!  This one is different from EMO.  Moxie is more of a confidant, a friend to chat with and go on story adventures with.  The way he communicates is simply amazing, you chat with Moxie very similarly to the way you'd chat with a friend.  Moxie spends a lot of time getting to know you and will ask you a lot of questions.  The really neat thing is you can ask him questions too about how he is doing.  He remembers too and will continue to grow the more you spend time with him.



Scratch is not new, it was released in 2007, but it is VERY cool!  It's a high level, block based programming language for younger kids developed by the MIT Media Lab.  I've always had trouble with programming and Scratch really makes it easy to just start making cool stuff like games!  I know I'm not the target audience for Scratch, but for someone that has always wanted to make games, but just didn't have the programming skills to do it, Scratch makes the impossible, possible!  Plus there are TONS and TONS of examples people have shared on the Scratch site.



Say what you will about them, but they're a really cool first step to augmented reality.  Trust me, I want the Oasis too, but unfortunately there's no James Halliday so we'll just have to wait a little longer for it.  I haven't tried these yet, but they look pretty cool and with everything Apple makes, it's just going to get better!



I love the story about this company.  It began as a student project back in 2021 and they're now on Gen4 of their robot!!  I have a Gen3 and a Gen4 and they are both really cute.  The company has done some great things and they've released new features and even accessories for their bots!


Those who know me know I love watches too!  I've always been obsessed with mechanical watches and clocks and how some springs and cogs can tell time.  Swatch is a REALLY cool company because they create some very innovative, affordable Swiss mechanical watches.  The Sistem 51 is super cool in that it is:
  1.  A Swiss mechanical/automatic watch at a good price.
  2.  The movement is made of only 51 parts.
  3.  Built 100% by machines.
  4.  Hermetically sealed so it never needs servicing.
  5.  A single screw holds the whole movement together.
  6.  90 hour power reserve!



That's it for now.  More to come!

*Update*
Wow, how could I forget ChatGPT?!  I've just hit the tip of the iceberg on understanding and working with Large Language Model (LLM) databases in my two-part (so far) series on ChromaDB.
This is some really cool technology and as chips and networks get faster and storage becomes cheaper, Artificial Intelligence (AI) will have so much knowledge it will be mind blowing!  I spent some time talking with ChatGPT and it is truly amazing.  Are computers better at some tasks than humans?
Without a doubt!  Are there still things that humans do better than computers?  Absolutely!
My hope is that AI and humans will work better with each other, not compete against each other.


Thursday, April 11, 2024

Me and ChromaDB - A Series! Let's Create Our First Vector Database with Cosine Similarity

Hi Friends,

I got PIP to install!

I've been doing a bunch of research and wanted to give credit to some great blogs!!


Hasini Madushika's blog:

https://medium.com/@hasinivijayarathna/creating-a-vector-database-using-chroma-956b1d84aca3

Fantastic overview on how to set up your first ChromaDB and create a searchable index of books and authors.


Michael Wornow's blog:

https://michaelwornow.net/2023/12/31/chromadb-demo

Great ChromaDB overview with pros and cons as well as a great section on cosine similarity vs. distance.


Milana Shkhanukova's blog:

https://medium.com/@milana.shxanukova15/cosine-distance-and-cosine-similarity-a5da0e4d9ded

Fantastic job explaining in more detail what is cosine distance and how it's different from similarity.


Harrison Hoffman's blog:

https://realpython.com/chromadb-vector-database/

Another great blog on ChromaDB foundations as well as lots of information on vector similarity.


Who knew physics would actually be useful?!  Yeah yeah, Isaac Newton did in 1687.

Now let's get rolling!

1. Let's make sure Python3 is working:

#python3 -V

Python 3.10.12

WOOT!  And just to let you know, I'm using Ubuntu 22.04.2.

2. And to install the ever elusive PIP.

#sudo apt install python3-pip -y

#pip -V

pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

ALRIGHT!

Now let's create a vector database!

1.  The first thing I'm going to do is create a place to store my data.

mkdir /data

I did all this in a bash shell, but you can create a Python script and run it in there too if you'd like.

2. Run Python so you can run the code.

#python3

3. Import ChromaDB into Python.

import chromadb

Now here comes the fun part.  Do you want your database to only run memory resident or do you want it to save some place?  Kinda depends on your needs, the space you have, etc.  But I don't want all my work to go poof, so I'm going to save it to disk.

4. Use this if you just want memory resident.

chroma_client = chromadb.Client()


OR


4. Use this if you want to save your database somewhere.

persistentClient = chromadb.PersistentClient(path="/data")

I'm going to save my data to a filesystem called "data", but you can save your data anywhere you'd like.

What's really cool about this is ChromaDB saves this data into a SQLite3 database for you.  If you're having trouble with it, take a look at the troubleshooting section on the Chroma website.(https://docs.trychroma.com/troubleshooting).

5. This is the really cool part!  Next we're creating our collection.  Here we're going to give it a name and you'll notice a geometry and physics term you probably hoped would never haunt you again, COSINE!  What's going on here is we're telling the database to compare entries by the angle between their vectors.  If two vectors point in nearly the same direction, the cosine distance is small and the text probably means something similar.  If they point in very different directions, the distance is larger and the meanings are probably unrelated.  In Michael Wornow's blog he shows you how to find the similarity instead of the distance if you want to do that instead.

books_collection = persistentClient.create_collection( name="books", 

       metadata={"hnsw:space": "cosine"}

)
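
If the cosine part still feels fuzzy, here's a small plain Python sketch of the math, no ChromaDB required.  As I understand it, the "cosine" space reports distance as 1 minus the cosine similarity, so that's the assumption baked into cosine_distance below.  The toy vectors are made up, real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance taken as 1 - similarity: 0 = same direction, 2 = opposite."""
    return 1.0 - cosine_similarity(a, b)

if __name__ == "__main__":
    # Same direction but different length: distance is (almost exactly) zero.
    same = cosine_distance([1.0, 2.0], [2.0, 4.0])
    # Perpendicular vectors: distance is 1.
    right_angle = cosine_distance([1.0, 0.0], [0.0, 1.0])
    # Opposite vectors: distance is 2.
    opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
    print(round(same, 6), round(right_angle, 6), round(opposite, 6))
```

Notice the first pair has different lengths but the same direction, and the distance still comes out at zero.  Cosine only cares about which way the vectors point, not how long they are.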

6.  Next thing we do is add data to our books collection.  I'm going to use Hasini Madushika's book collection because her book titles do a great job showing how the cosine feature works.  There's a lot going on here, but I think I'll break this down further in another blog about embeddings and such.


books_collection.add(

    documents=[

  "The Enigma Code", "Decoding Secrets", "Whispers of Intrigue", "Conundrums and Clues",

  "The Puzzle Master", "Mysterious Ciphers", "The Labyrinth of Enigmas", "Cryptic Chronicles",

  "Puzzled Minds", "Secrets Unveiled", "Echoes of Eternity", "Time's Embrace",

  "Chronicles of Eternity", "Eternal Moments", "Timeless Whispers", "Infinity's Tapestry",

  "Temporal Odyssey", "Endless Hours", "The Time Weaver", "Eternal Sands",

  "The Silent Symphony", "Whispers of Silence", "Silent Serenade", "The Sound of Quiet",

  "Quiet Harmony", "Harmony in Silence", "Muted Melodies", "Serenity's Echo",

  "The Tranquil Note", "Echoes of Quietude"

  ],

    metadatas=[{"author":"Alan Cipher", "price":"$19.99"},{"author":"Olivia Mystery", "price":"$18.95"},

{"author":"James Riddle", "price":"$21.50"},{"author":"Emma Puzzler", "price":"$22.99"},

{"author":"Alex Brainteaser", "price":"$20.75"},{"author":"Victoria Enigma", "price":"$23.45"},

{"author":"Samuel Conundrum", "price":"$24.80"},{"author":"Grace Enigma", "price":"$19.25"},

{"author":"Daniel Riddle", "price":"$17.99"},{"author":"Amanda Mystery", "price":"$21.00"},

{"author":"Robert Timeless", "price":"$25.50"},{"author":"Sarah Infinity", "price":"$26.75"},

{"author":"Michael Eternal", "price":"$24.99"},{"author":"Emily Timekeeper", "price":"$23.20"},

{"author":"Christopher Infinity", "price":"$22.45"},{"author":"Jessica Forever", "price":"$27.30"},

{"author":"Nicholas Timeless", "price":"$28.50"},{"author":"Laura Infinity", "price":"$26.00"},

{"author":"Benjamin Chronos", "price":"$24.95"},{"author":"Rachel Timebound", "price":"$25.75"},

{"author":"William Hush", "price":"$18.50"},{"author":"Sophia Mute", "price":"$17.75"},

{"author":"Oliver Quietude", "price":"$19.20"},{"author":"Isabella Hushington", "price":"$20.15"},

{"author":"Matthew Serene", "price":"$18.99"},{"author":"Emily Tranquil", "price":"$21.50"},

{"author":"Christopher Hushwell", "price":"$22.75"},{"author":"Grace Silentheart", "price":"$19.95"},

{"author":"Daniel Peaceful", "price":"$23.00"},{"author":"Victoria Hushed", "price":"$20.80"}

  ],

    ids=["id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8", "id9", "id10", "id11", "id12", "id13", "id14", "id15", 

"id16", "id17", "id18", "id19", "id20", "id21", "id22", "id23", "id24", "id25", "id26", "id27", "id28", "id29", "id30"

  ]

)

7.  Now that we've got data in our database we can query it!

results = books_collection.query(

  query_texts=["Eternity", "Puzzle"],

  n_results=5

)

print(results)


{'ids': [['id11', 'id13', 'id14', 'id18', 'id20'], ['id5', 'id9', 'id1', 'id4', 'id19']], 'distances': [[0.33300162331174143, 0.378593976226286, 0.42385445272206546, 0.45082819934444696, 0.5292349798068949], [0.44286587397375554, 0.5532287339143328, 0.5986420831809907, 0.6688832557674483, 0.6887007582505835]], 'metadatas': [[{'author': 'Robert Timeless', 'price': '$25.50'}, {'author': 'Michael Eternal', 'price': '$24.99'}, {'author': 'Emily Timekeeper', 'price': '$23.20'}, {'author': 'Laura Infinity', 'price': '$26.00'}, {'author': 'Rachel Timebound', 'price': '$25.75'}], [{'author': 'Alex Brainteaser', 'price': '$20.75'}, {'author': 'Daniel Riddle', 'price': '$17.99'}, {'author': 'Alan Cipher', 'price': '$19.99'}, {'author': 'Emma Puzzler', 'price': '$22.99'}, {'author': 'Benjamin Chronos', 'price': '$24.95'}]], 'embeddings': None, 'documents': [['Echoes of Eternity', 'Chronicles of Eternity', 'Eternal Moments', 'Endless Hours', 'Eternal Sands'], ['The Puzzle Master', 'Puzzled Minds', 'The Enigma Code', 'Conundrums and Clues', 'The Time Weaver']], 'uris': None, 'data': None}


8.  Let's pull just the book titles out of those results, one list per query.

print("results for 'Eternity':", results["documents"][0])

print("results for 'Puzzle':", results["documents"][1])


results for 'Eternity': ['Echoes of Eternity', 'Chronicles of Eternity', 'Eternal Moments', 'Endless Hours', 'Eternal Sands']

results for 'Puzzle': ['The Puzzle Master', 'Puzzled Minds', 'The Enigma Code', 'Conundrums and Clues', 'The Time Weaver']

This is really cool, notice the key words are Eternity and Puzzle.  The query finds the exact word, but also words with a similar meaning.  They may not have the same magnitude, but isn't that cool?!?!
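
If you want to see how close each hit was, you can pair the titles with their distances.  Here's a minimal sketch that works on a trimmed copy of the output above (top three hits for Eternity only), so it runs without ChromaDB.  The rank_hits helper is my own, and treating similarity as 1 minus distance is an assumption that goes with the cosine setting.

```python
# A trimmed copy of the first query's results from above (top three hits only).
results = {
    "documents": [["Echoes of Eternity", "Chronicles of Eternity", "Eternal Moments"]],
    "distances": [[0.33300162331174143, 0.378593976226286, 0.42385445272206546]],
}

def rank_hits(results, query_index=0):
    """Pair each returned title with its distance and a rough similarity score."""
    docs = results["documents"][query_index]
    dists = results["distances"][query_index]
    # Assumption: with the cosine space, similarity is taken as 1 - distance.
    return [(doc, dist, round(1.0 - dist, 3)) for doc, dist in zip(docs, dists)]

if __name__ == "__main__":
    for title, distance, similarity in rank_hits(results):
        print(f"{title}: distance={distance:.3f}, similarity={similarity}")
```

The smaller the distance, the closer the match, which is why Echoes of Eternity comes out on top.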

Until Next Time!

Neil

Wednesday, April 10, 2024

What's After the Plastic Age?

Hi Friends,

I was going through some of my old blogs and I came across 

Dr. Brain's - Gadgets of The Year for 2015


and


I was noticing that the gadgets really haven't changed much since then and it's been eight years!  Yeah, yeah I know there was a pandemic in there but I wonder if there's more to it than just time.

Now don't run away yet, hear me out.  Humans are always looking and exploring for MORE.  We're never content with what we have or what we know, we always want more.  And that has led to some pretty amazing stuff: computers, airplanes, cars, telephones, nuclear fission power plants, space ships and more!

We tend to break out our "jumps" in technology with Ages.

Stone Age:
2.6 Million Years Ago - 3,300 BC
Basically you've got humans running around and someone thought it would be a good idea to use a rock as a hammer!

 
That's some pretty cool stuff if you're used to using your hands or teeth as tools.

Bronze Age:
3500 BC - 1200 BC
Someone found some shiny "metal" stuff, decided to heat it up over a fire and mix it with some other shiny "metal" stuff, and look, you can make cool pointy things out of it!!

Notice most of this stuff is weapons of war?  But hey, beats trying to keep your flint axe from shattering!


Iron Age:
1200 BC - 500 BC
Hey, we found some other shiny "metal" stuff that we can heat and put other stuff in and mold into harder pointy things!!


Gotta love those Hittites!  Yes I know it's not ALL weapons of war, but funny what is usually represented.

Industrial Revolution
1760 AD  - 1840 AD
Yeah so there's a big gap between 500 BC and 1760 AD, but you know some stuff happened, gunpowder, yada yada yada.  But look at these cool train things we made!!

Plastic Age
1907 - Today
Apparently in 1907 Leo Baekeland invented Bakelite!  It was the first fully synthetic plastic.

Now I know there's a big difference between a space ship and plastic, but it's the root of the age.  Plastic has helped combat diseases, created new products for us like computers, helped grow food and the list goes on and on and on!

So what was the point of this history lesson, other than enabling you to spend your precious time doing something other than what you should be doing?  What's next?!  For years I've been hearing about carbon nanotubes and how that's going to be the next "it" thing, but I'm still waiting....

I'm a technology maniac and I LOVE tech.  This drought of tech is driving me crazy and I hope I'm around for whatever the next IT Age is!  :-)

Until Next Time,
Neil

Tuesday, April 9, 2024

Me and ChromaDB - A Series Maybe... Part One - A.I. is Your Friend!

Hi Friends,

What's the first thing that comes to your mind when I say Artificial Intelligence(AI)?

HAL 9000




The WOPR


Skynet


Personally I prefer:

Wall-E


V.I.N.CENT


TARS


As with everything humans create, AI is a REALLY cool tool and it can be helpful or it can be harmful.  I love technology and I'm hoping that AI will work with humans as a partner and hopefully AI will have fewer bugs in its code than we have in ours.

But before I bow down to my new robot rulers, let's talk about where things are right now in this fast moving technology that has a lot of us scratching our heads as to what it even is.  

In the past I've done research on machine learning and finding patterns in data, but Large Language Model (LLM) databases are REALLY cool!

If you're a total newb like me, take a look at this AWESOME video:
https://youtu.be/2IK3DFHRFfw?si=Ar1ttKfs7kxOAV_7

The author of the video does an outstanding job explaining some very difficult concepts in terms that made a lot of sense.

Computers are basically really powerful calculators and they do a fantastic job crunching numbers.  The folks that make these databases feed the AI tons of text, images and writing examples that are translated into numbers.  The LLM starts to see patterns in the text and images and begins playing the What's The Next Word game.

For example we can start with something like, "A cat is a <blank>." and we go from there.  A cat is an animal.  A cat is an animal with four legs.  A cat is an animal with four legs and whiskers.  And on and on.  The more information you feed your LLM the better it will be at answering your questions, but you have to remember that your LLM will only be as smart as its data AND the questions it's being asked.
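
To make the What's The Next Word game a little more concrete, here's a toy Python sketch.  It just counts which word most often follows another in a few sentences.  A real LLM is a huge neural network and works very differently under the hood, so treat this as a cartoon of the idea and nothing more.  All the names and sentences here are made up.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count which word follows which across the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def next_word(following, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

if __name__ == "__main__":
    corpus = [
        "a cat is an animal",
        "a cat is an animal with four legs",
        "a dog is an animal with four legs",
    ]
    model = train_bigrams(corpus)
    print(next_word(model, "cat"))    # the only word ever seen after "cat" is "is"
    print(next_word(model, "four"))   # likewise, "legs"
    print(next_word(model, "zebra"))  # never seen, so None
```

Ask it about a word it has never seen and it has nothing to say, which is the "only as smart as its data" point in miniature.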

Soooooo I decided it was time for me to install one of these REALLY cool databases and find out how they work and how I'm going to back it up.  I decided to go with ChromaDB, it's open-source and look how easy the install instructions are:

pip install chromadb

That's it?!  Well sheesh, I can do that!  Wait, what's PIP?  Isn't it that thing you get put on when you're doing a bad job at work?  

Will Neil be able to figure out what the mysterious PIP tool is?  Can he get ChromaDB installed on his Ubuntu box??  All these questions and more will be answered in the next chapter of Me and ChromaDB!!