ai code snippets
satya - 1/12/2024, 4:38:09 PM
Similar word set
import gensim.downloader as api
# Load a pre-trained Word2Vec model (or any other KeyedVectors model)
word_vectors = api.load("word2vec-google-news-300")
# Given set of words
given_words = ["king", "queen", "man"]
# Calculate vectors for each word in the given set
word_vectors_set = [word_vectors[word] for word in given_words]
# Calculate the mean vector of the set
mean_vector = sum(word_vectors_set) / len(word_vectors_set)
# Find similar words to the mean vector
similar_words = word_vectors.similar_by_vector(mean_vector, topn=10)
# Print the similar words and their similarity scores
for word, score in similar_words:
    print(f"{word}: {score:.4f}")
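To make the "mean vector, then nearest neighbours" step concrete, here is a minimal pure-Python sketch of the idea behind similar_by_vector: rank every word by cosine similarity to a query vector. The 3-dimensional TOY_VECTORS are made-up numbers, and cosine, mean_vector and similar_by_vector are hypothetical stand-ins, not gensim code.

```python
import math

# Made-up 3-dimensional "embeddings" for illustration only (not from any real model)
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.9, 0.2],
    "man":   [0.8, 0.1, 0.1],
    "woman": [0.8, 0.2, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(words, vectors):
    """Component-wise average of the vectors for the given words."""
    rows = [vectors[w] for w in words]
    return [sum(column) / len(rows) for column in zip(*rows)]

def similar_by_vector(query, vectors, topn=3):
    """Score every word against the query vector and keep the topn best."""
    scored = [(word, cosine(query, vec)) for word, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

mean = mean_vector(["king", "queen", "man"], TOY_VECTORS)
for word, score in similar_by_vector(mean, TOY_VECTORS):
    print(f"{word}: {score:.4f}")
```

Because nothing filters out the query words, the given words themselves tend to show up in the results. In gensim, word_vectors.most_similar(positive=given_words) averages the (normalized) vectors and leaves the input words out of the list.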
satya - 1/12/2024, 7:23:44 PM
is there a word2vec model online that I can run queries against in a browser?
satya - 1/13/2024, 1:18:26 PM
satya - 1/13/2024, 1:18:50 PM
Getting word stems with NLTK
from nltk.stem.snowball import EnglishStemmer
from nltk.tokenize import word_tokenize
text = "Long sentences"
# list of strings
words = word_tokenize(text)
stemmer = EnglishStemmer()
#list of strings
stemmed_words = [stemmer.stem(word) for word in words]
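A quick way to see what the Snowball stemmer actually produces. This sketch uses str.split() instead of word_tokenize (an assumption made only so that no 'punkt' download is needed) and assumes the nltk package is installed. Stems are not always dictionary words: "flies" becomes "fli", whereas WordNetLemmatizer should return "fly".

```python
from nltk.stem.snowball import EnglishStemmer

stemmer = EnglishStemmer()

# Plain whitespace split, so this runs without any NLTK data downloads
words = "the cats were running and the flies flew".split()

# One stem per input word, in the same order
stems = [stemmer.stem(word) for word in words]
print(stems)
```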
satya - 1/13/2024, 1:20:56 PM
Lemmatizing with NLTK
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
lemmatizer = WordNetLemmatizer()
string_for_lemmatizing = "some sentence"
words = word_tokenize(string_for_lemmatizing)
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
satya - 1/13/2024, 1:22:11 PM
Key classes of NLTK
satya - 1/13/2024, 1:29:14 PM
NLTK uses
satya - 1/13/2024, 1:45:37 PM
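Useful Python pattern for library modules (localTest() runs only when the file is executed directly, not when it is imported)
def localTest():
    print("Starting local test")
    print("End local test")
if __name__ == '__main__':
    localTest()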
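Why the guard works: Python sets __name__ to "__main__" only for the file being run as a script; an imported module gets its own module name. This sketch writes a tiny hypothetical module (mylib.py, with a calls list standing in for the print statements) to a temporary folder, then loads it both ways to show the difference.

```python
import importlib.util
import pathlib
import runpy
import tempfile

# Source of a hypothetical library module with a local self-test behind the guard
MODULE_SOURCE = '''
calls = []

def local_test():
    calls.append("local test ran")

if __name__ == "__main__":
    local_test()
'''

def run_both_ways():
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "mylib.py"
        path.write_text(MODULE_SOURCE)

        # 1. Import it like a library: __name__ is "mylib", so the guard is False
        spec = importlib.util.spec_from_file_location("mylib", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        imported_calls = list(module.calls)

        # 2. Run it like a script: __name__ is "__main__", so the guard is True
        script_globals = runpy.run_path(str(path), run_name="__main__")
        script_calls = list(script_globals["calls"])

    return imported_calls, script_calls

imported, as_script = run_both_ways()
print("imported:", imported)
print("as script:", as_script)
```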
satya - 1/13/2024, 1:52:46 PM
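Some NLTK functions need data packages downloaded first
import nltk
nltk.download('punkt')    # tokenizer models for word_tokenize (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('wordnet')  # data for WordNetLemmatizer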
satya - 1/13/2024, 4:09:59 PM
Additional nltk initializations
from nltk import ne_chunk
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
satya - 1/13/2024, 4:20:50 PM
Here is how you navigate the chunked tree (an nltk.Tree) returned by ne_chunk
import nltk
text = "John and Mary are living in New York City since 2020."
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)
chunked = nltk.ne_chunk(tagged)
for subtree in chunked:
    # Named entities are nested Tree nodes; ordinary tokens are plain (word, tag) tuples
    if isinstance(subtree, nltk.Tree):
        label = subtree.label()
        entity = " ".join([word for word, pos in subtree.leaves()])
        print(f"Named Entity: {entity}, Label: {label}")
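The same walk can be tried without the tagger and chunker models by building a small tree by hand. This sketch assumes the nltk package is installed; the labels are typed in by hand to mimic ne_chunk output, not produced by it. It uses Tree.subtrees(filter=...), which also reaches nested subtrees rather than only the top level.

```python
from nltk import Tree

# Hand-built tree in the same shape ne_chunk returns (labels chosen by hand here)
chunked = Tree("S", [
    Tree("PERSON", [("John", "NNP")]),
    ("and", "CC"),
    Tree("PERSON", [("Mary", "NNP")]),
    ("are", "VBP"),
    ("living", "VBG"),
    ("in", "IN"),
    Tree("GPE", [("New", "NNP"), ("York", "NNP"), ("City", "NNP")]),
])

def named_entities(tree):
    """Return (label, entity text) for every subtree below the root, in document order."""
    return [
        (sub.label(), " ".join(word for word, tag in sub.leaves()))
        for sub in tree.subtrees(filter=lambda t: t is not tree)
    ]

for label, entity in named_entities(chunked):
    print(f"Named Entity: {entity}, Label: {label}")
```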
satya - 1/13/2024, 4:21:24 PM
Here is an example of a chunk tree
(GPE living/VBG)
(GPE New/NNP York/NNP)
(DATE 2020/CD)
satya - 1/13/2024, 4:23:21 PM
More on chunk trees
satya - 1/13/2024, 4:29:06 PM
Sample code NLTK name recognition and chunking
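# *********************
# Import and download the NLTK data packages
# nltk.download() saves them to disk (an nltk_data folder), so this only has to run once per environment, not once per session
# *********************
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')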
# Key functions
# *********************
from nltk import ne_chunk
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
def example():
    ner_text = "Some sentence with people and company names"
    # word list: strings
    tokens = word_tokenize(ner_text)
    # a list of (word, POS tag) tuples
    pos_tagged = pos_tag(tokens)
    # chunked tree object (an nltk.Tree)
    result = ne_chunk(pos_tagged)
    # this will open a new window with the tree rendering
    result.draw()
satya - 1/13/2024, 4:49:36 PM
Parts of speech
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
def posTaggingExercise():
    text = """
    We hold these truths to be self-evident, that all men are created equal,
    that they are endowed by their Creator with certain unalienable Rights,
    that among these are Life, Liberty and the pursuit of Happiness.
    """
    words = word_tokenize(text)
    taggedWords = pos_tag(words)
    return taggedWords
# Example output
[('We', 'PRP'), ('hold', 'VBP'), ('these', 'DT'), ('truths', 'NNS'), ...]
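Since pos_tag returns a plain list of (word, tag) tuples, ordinary Python is enough to pull out one part of speech. This sketch reuses only the four tagged words shown above; words_with_tag_prefix is a hypothetical helper, and matching on the tag prefix ("NN" for nouns, "VB" for verbs) is one simple convention for Penn Treebank tags.

```python
from collections import Counter

# The (word, tag) pairs from the example output above
tagged = [("We", "PRP"), ("hold", "VBP"), ("these", "DT"), ("truths", "NNS")]

def words_with_tag_prefix(tagged_words, prefix):
    """Keep the words whose Penn Treebank tag starts with prefix."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]

nouns = words_with_tag_prefix(tagged, "NN")  # NN, NNS, NNP, NNPS
verbs = words_with_tag_prefix(tagged, "VB")  # VB, VBD, VBG, VBN, VBP, VBZ

# How often each tag occurs
tag_counts = Counter(tag for _, tag in tagged)

print(nouns, verbs, tag_counts.most_common())
```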
satya - 1/13/2024, 4:50:54 PM
NLTK parts of speech attributes
satya - 1/13/2024, 7:48:23 PM
What we have done so far with word2vec
satya - 1/13/2024, 7:49:38 PM
What we have done so far with NLTK
satya - 1/13/2024, 8:04:55 PM
The idea of a list comprehension in Python
A language critique:
# Conceptual record processing,
# in any language, with Python as an example.
# In Python these are called comprehensions (list, set, and dict comprehensions).
# Take this procedural idea as an example:
for every-record in a-list
    do-something with that record
    store the result in a new list
# This is expressed in Python as
[do-something(record) for record in a-list]  # put each processed record in a list
# The target container can also be a set or a dictionary
# instead of a list
{do-something(record) for record in a-list}  # put each processed record in a set
{key-for(record): value-for(record) for record in a-list}  # put each key/value pair in a dictionary
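A runnable version of the pseudocode above, with a made-up list of string records and len() standing in for "do-something". It also shows the optional if clause, which filters records before they are stored.

```python
records = ["apple", "banana", "cherry", "apple"]

# Procedural version: loop, process each record, store the result
lengths_loop = []
for record in records:
    lengths_loop.append(len(record))

# The same thing as a list comprehension
lengths = [len(record) for record in records]

# Set comprehension: duplicate results collapse into one
first_letters = {record[0] for record in records}

# Dict comprehension: key expression, colon, value expression
length_by_word = {record: len(record) for record in records}

# Optional filter clause: keep only records that pass the test
long_words = [record for record in records if len(record) > 5]

print(lengths, first_letters, length_by_word, long_words)
```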
satya - 1/19/2024, 5:46:16 PM
Hugging Face home page:
satya - 1/19/2024, 6:23:18 PM
Where to get the API keys
satya - 1/19/2024, 6:23:31 PM
You have to verify your email first for this to work
satya - 1/21/2024, 7:28:32 PM
Where are the Hugging Face Inference API request and response formats documented?
satya - 1/21/2024, 7:34:26 PM
Here is a list of inputs and outputs for the API
satya - 1/21/2024, 7:42:04 PM
Each type of task has different inputs and outputs to the API
satya - 1/21/2024, 7:43:14 PM
Some task names
satya - 1/21/2024, 7:43:51 PM
Inputs and outputs to the text generation task are documented here
satya - 1/21/2024, 7:44:57 PM
Example input
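import requests

API_TOKEN = "YOUR_HF_TOKEN"  # placeholder: your Hugging Face access token
API_URL = ""  # the model's Inference API endpoint goes here
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

data = query({"inputs": "The answer to the universe is"})
# The payload can carry keys other than "inputs" (for example "parameters");
# see the API docs for each task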
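A sketch of a fuller text-generation request body, built without sending anything. The helper name build_text_generation_payload is hypothetical, and the parameter names (max_new_tokens, temperature, return_full_text) are the text-generation options as I recall them from the task docs; check the current docs before relying on them.

```python
import json

def build_text_generation_payload(prompt, max_new_tokens=50, temperature=0.7):
    """Hypothetical helper: the prompt goes in "inputs", generation options in "parameters"."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # upper bound on generated tokens
            "temperature": temperature,        # sampling randomness
            "return_full_text": False,         # return only the continuation, not the prompt
        },
    }

payload = build_text_generation_payload("The answer to the universe is")
print(json.dumps(payload, indent=2))
```

The resulting dict can be passed straight to query(payload), since requests serializes it with json=payload.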
satya - 1/21/2024, 7:45:49 PM
Return value is either a dict or a list of dicts if you sent a list of inputs
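Because the shape of the response varies, a small normalizing helper keeps the calling code simple. This is a sketch under assumptions: generated_texts is a hypothetical name, it also flattens a list of lists in case batched inputs come back nested, and it assumes errors arrive as a dict with an "error" key.

```python
def generated_texts(data):
    """Flatten a dict, a list of dicts, or nested lists of dicts into a list of strings."""
    if isinstance(data, dict):
        if "error" in data:
            # Assumed error shape, e.g. {"error": "..."} while a model is loading
            raise RuntimeError(data["error"])
        return [data["generated_text"]]
    texts = []
    for item in data:
        texts.extend(generated_texts(item))
    return texts

print(generated_texts({"generated_text": "hello"}))
print(generated_texts([{"generated_text": "hello"}]))
```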
satya - 1/21/2024, 7:46:25 PM
data == [
{"generated_text": 'hello'}