AI code snippets
satya - 1/12/2024, 4:38:09 PM
Similar word set
import gensim.downloader as api
# Load a pre-trained Word2Vec model (or any other KeyedVectors model)
word_vectors = api.load("word2vec-google-news-300")
# Given set of words
given_words = ["king", "queen", "man"]
# Calculate vectors for each word in the given set
word_vectors_set = [word_vectors[word] for word in given_words]
# Calculate the mean vector of the set
mean_vector = sum(word_vectors_set) / len(word_vectors_set)
# Find similar words to the mean vector
similar_words = word_vectors.similar_by_vector(mean_vector, topn=10)
# Print the similar words and their similarity scores
for word, score in similar_words:
    print(f"{word}: {score:.4f}")
satya - 1/12/2024, 7:23:44 PM
is there a word2vec model online that I can run queries against in a browser?
satya - 1/13/2024, 1:18:26 PM
NLTK
satya - 1/13/2024, 1:18:50 PM
Getting stem words using NLTK
from nltk.stem.snowball import EnglishStemmer
from nltk.tokenize import word_tokenize
text = "Long sentences"
# list of strings
words = word_tokenize(text)
print(words)
stemmer = EnglishStemmer()
#list of strings
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
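For text = "Long sentences" this should print something like ['Long', 'sentences'] followed by ['long', 'sentenc']: the Snowball stemmer lowercases and chops suffixes, so the stems are not always dictionary words.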
satya - 1/13/2024, 1:20:56 PM
Lemmatizing with NLTK
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
lemmatizer = WordNetLemmatizer()
string_for_lemmatizing = "some sentence"
words = word_tokenize(string_for_lemmatizing)
print(words)
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
print(lemmatized_words)
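By default WordNetLemmatizer treats every word as a noun. Passing a part of speech via the pos parameter changes the result; a quick sketch:
print(lemmatizer.lemmatize("running"))           # 'running' (treated as a noun)
print(lemmatizer.lemmatize("running", pos="v"))  # 'run' (treated as a verb)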
satya - 1/13/2024, 1:22:11 PM
Key classes of NLTK
satya - 1/13/2024, 1:29:14 PM
NLTK uses
satya - 1/13/2024, 1:45:37 PM
Useful Python snippet for libraries
def localTest():
    print("Starting local test")
    print("End local test")

if __name__ == '__main__':
    localTest()
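When the file is run directly, Python sets __name__ to '__main__' and localTest() executes; when the file is imported as a library, __name__ is the module's own name and the test is skipped.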
satya - 1/13/2024, 1:52:46 PM
For some of NLTK to work, download these datasets first
import nltk
nltk.download('punkt')
nltk.download('wordnet')
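Note: newer NLTK releases moved the punkt tokenizer data to a separate 'punkt_tab' package; if word_tokenize raises a LookupError mentioning punkt_tab, run nltk.download('punkt_tab') as well.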
satya - 1/13/2024, 4:09:59 PM
Additional NLTK initializations
from nltk import ne_chunk
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
satya - 1/13/2024, 4:20:50 PM
Here is how you navigate a chunked tree (an nltk.Tree)
import nltk
text = "John and Mary are living in New York City since 2020."
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)
chunked = nltk.ne_chunk(tagged)
for subtree in chunked:
    if isinstance(subtree, nltk.Tree):
        label = subtree.label()
        entity = " ".join([word for word, pos in subtree.leaves()])
        print(f"Named Entity: {entity}, Label: {label}")
satya - 1/13/2024, 4:21:24 PM
Here is an example of a chunk tree
(S
  (PERSON John/NNP)
  and/CC
  (PERSON Mary/NNP)
  are/VBP
  (GPE living/VBG)
  in/IN
  (GPE New/NNP York/NNP)
  City/NNP
  since/IN
  (DATE 2020/CD)
  ./.)
satya - 1/13/2024, 4:23:21 PM
More on chunk trees
satya - 1/13/2024, 4:29:06 PM
Sample code: NLTK named entity recognition and chunking
# *********************
# Import and download some stuff!!
# You have to do this only once per session I believe
# *********************
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

# *********************
# Key functions
# *********************
from nltk import ne_chunk
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize

def example():
    ner_text = "Some sentence with people and corp names"
    # word list: strings
    tokens = word_tokenize(ner_text)
    print(tokens)
    # A list of key/value tuples
    pos_tagged = pos_tag(tokens)
    print(pos_tagged)
    # Chunked tree object
    result = ne_chunk(pos_tagged)
    print(result)
    # result.draw()
    # this will open a new window with the tree rendering
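Following the localTest pattern from earlier, you can make this module runnable as a script:
if __name__ == '__main__':
    example()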
satya - 1/13/2024, 4:49:36 PM
Parts of speech
def posTaggingExercise():
    text = """
    We hold these truths to be self-evident, that all men are created equal,
    that they are endowed by their Creator with certain unalienable Rights,
    that among these are Life, Liberty and the pursuit of Happiness.
    """
    words = word_tokenize(text)
    taggedWords = pos_tag(words)
    print(taggedWords)
    return taggedWords

# Example output
[('We', 'PRP'), ('hold', 'VBP'), ('these', 'DT'), ('truths', 'NNS'), ...]
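To look up what a tag such as 'PRP' or 'VBP' means, NLTK can print the Penn Treebank tagset descriptions (this needs one more download):
import nltk
nltk.download('tagsets')
nltk.help.upenn_tagset('PRP')   # personal pronoun
nltk.help.upenn_tagset('VB.*')  # takes a regex: all the verb tags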
satya - 1/13/2024, 4:50:54 PM
NLTK parts of speech attributes
satya - 1/13/2024, 7:48:23 PM
What we have done so far with word2vec
satya - 1/13/2024, 7:49:38 PM
What we have done so far with NLTK
satya - 1/13/2024, 8:04:55 PM
Idea of a list comprehension in python
A language critique:
#
# Conceptual record processing
# in any language, with python as an example
# In python these are called list comprehensions.
#
# Take this procedural idea for an example
for every-record in a list:
    do-something with that record
    store that record in a list
#
#This is expressed in python as
#
[do-something for every-record] #put each processed record in a list
#
# Now your target container can be a set or a dictionary
# in addition to a list
#
{do-something for every-record} #put each record in a set
{do-something-for-a-key: do-something-for-a-value for every-record} # Put it in a dictionary
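A concrete, runnable version of the three forms (the records list here is just an illustration):
records = ["Alice", "Bob", "Carol"]                    # any iterable of records
upper_list = [name.upper() for name in records]        # list comprehension
first_letters = {name[0] for name in records}          # set comprehension
name_lengths = {name: len(name) for name in records}   # dict comprehension
print(upper_list)      # ['ALICE', 'BOB', 'CAROL']
print(first_letters)   # {'A', 'B', 'C'} (set order may vary)
print(name_lengths)    # {'Alice': 5, 'Bob': 3, 'Carol': 5}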
satya - 1/19/2024, 5:46:16 PM
Hugging Face home page: https://huggingface.co/
satya - 1/19/2024, 6:23:18 PM
Where to get the API keys
satya - 1/19/2024, 6:23:31 PM
You have to verify your email first for this to work
satya - 1/21/2024, 7:28:32 PM
Where is hugging face text inference api request and response documented?
satya - 1/21/2024, 7:34:26 PM
Here is a list of inputs and outputs to the API
satya - 1/21/2024, 7:42:04 PM
Each type of task has different inputs and outputs to the API
satya - 1/21/2024, 7:43:14 PM
Some task names
satya - 1/21/2024, 7:43:51 PM
Inputs and outputs to the text generation task are documented here
satya - 1/21/2024, 7:44:57 PM
Example input
import requests

API_TOKEN = "hf_..."  # placeholder: your Hugging Face API token goes here
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

data = query({"inputs": "The answer to the universe is"})
# There are other parameters besides inputs
# See the API docs
satya - 1/21/2024, 7:45:49 PM
Return value is either a dict or a list of dicts if you sent a list of inputs
satya - 1/21/2024, 7:46:25 PM
Example
data == [
    {"generated_text": "hello"}
]
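A small sketch of pulling the generated text out while handling both shapes; it reuses the query() helper from above, and the error-dict branch reflects how the inference API reports failures:
result = query({"inputs": "The answer to the universe is"})
if isinstance(result, list):
    # one dict per input
    for item in result:
        print(item["generated_text"])
else:
    # a plain dict, e.g. {"error": "Model ... is currently loading"}
    print(result)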