Resolving Imports and modules in Python
satya - 9/16/2019, 1:04:42 PM
There are two ways to import names from other modules
#Run the file called module1 and make its
#global names available as module1.<name>
import module1
#Import a specific variable
#So as not to pollute the receiving namespace
from module2 import variable1
#variables can be functions, objects, etc
#Everything in python is an object anyway
satya - 9/16/2019, 1:07:41 PM
Where will it look for module1?
module1 can be seen as another python file, named module1.py
However you don't write the .py at the end
You also don't say where that file is
A few places it will look are:
a) Current path
b) standard library (at python\Lib)
c) Externally installed libraries through pip (python\Lib\site-packages)
d) Places identified by the environment variable PYTHONPATH
satya - 9/16/2019, 1:08:17 PM
There are over 200 standard libraries in python
satya - 9/16/2019, 1:08:32 PM
Important python standard modules
satya - 9/16/2019, 1:11:07 PM
This is a nice introduction to useful libraries
satya - 9/16/2019, 1:14:00 PM
Some listed libraries are
datetime
math
random
re
csv
#command line scripting
sys
argparse
satya - 9/16/2019, 1:20:26 PM
.pyc and __pycache__: compiling/caching
Python compiles each imported .py file to bytecode and saves it as a .pyc file. It recompiles when the time stamp shows that the .py has changed
In Python 3 it creates a sub directory called __pycache__ and puts the .pyc files underneath
The .pyc file contains bytecode
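A small sketch to see where the cached bytecode would go. The file name "module1.py" is the hypothetical module from the earlier notes, and the file does not need to exist for this call.

```python
import importlib.util
import sys

# "module1.py" is the hypothetical source file from the notes above.
source = "module1.py"

# Ask the import machinery where the cached bytecode for that source would live.
pyc_path = importlib.util.cache_from_source(source)

# The cache tag identifies the interpreter and version that wrote the bytecode.
print("cache tag:", sys.implementation.cache_tag)
print("bytecode file:", pyc_path)
```

The version tag in the file name is what lets several Python versions share one __pycache__ directory.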
satya - 9/16/2019, 1:28:24 PM
Module search path, one more time
1. The directory of the script being run (or the current working directory in an interactive session)
2. PYTHONPATH directories
3. Standard library directories
4. Contents of any .pth files (if present)
5. The site-packages home of third party extensions
6. All these directories end up in the sys.path variable in the interpreter. You can print that to see the paths
satya - 9/16/2019, 1:29:19 PM
Nature of site module
This is the standard module that sets up the site-specific part of the module search path (site-packages and .pth files). So look at the documentation for this module.
satya - 9/16/2019, 1:30:00 PM
Here are the docs for the "site" module
satya - 9/16/2019, 1:31:18 PM
Lib/site-packages
Python automatically adds the site-packages sub directory of its library directory (Lib\site-packages) to the module search path. This is where pip installs python packages.
satya - 9/16/2019, 1:38:15 PM
What does sys.path have?
#Do this
import sys
print (sys.path)
#You will get this
['C:\\satya\\data\\code\\pyspark',
'c:\\satya\\i\\python374\\python37.zip',
'c:\\satya\\i\\python374\\DLLs',
'c:\\satya\\i\\python374\\lib',
'c:\\satya\\i\\python374',
'C:\\Users\\satya\\AppData\\Roaming\\Python\\Python37\\site-packages',
'c:\\satya\\i\\python374\\lib\\site-packages']
There are two site-packages directories. The one under AppData\Roaming is the per-user site-packages (used by pip install --user). The one under the python installation is the global one.
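To tell the two site-packages entries apart, the "site" module can be asked directly. This is a minimal sketch; the directories it prints depend on the local installation.

```python
import site
import sys

# Per-user site-packages (where "pip install --user" puts packages).
user_site = site.getusersitepackages()

# Global site-packages directories of this interpreter.
global_sites = site.getsitepackages()

print("user site   :", user_site)
print("global sites:", global_sites)

# Label each sys.path entry so both kinds are easy to spot.
for entry in sys.path:
    if entry == user_site:
        label = "user site-packages"
    elif entry in global_sites:
        label = "global site-packages"
    else:
        label = ""
    print(entry, label)
```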
satya - 9/16/2019, 1:39:46 PM
Interesting
#*******************
#First version
#*******************
import sys
print (sys.path)
#*******************
#Second version: wrong
#Because sys.path is not a module
#Only sys is the module; path is a variable inside it
#*******************
import sys.path
print (path)
#*******************
#Third version
# from file or module 'sys' import a variable "path"
#*******************
from sys import path
print (path)
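The failure of the second version can be seen safely by asking importlib to do the same import. A small sketch:

```python
import importlib

# Equivalent to "import sys.path", but lets us catch the failure.
try:
    importlib.import_module("sys.path")
    outcome = "imported"
except ModuleNotFoundError as exc:
    outcome = "failed"
    # The message explains that 'sys' is not a package.
    print(exc)

print(outcome)
```

Dotted names after "import" must name packages and modules, never variables inside a module.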
satya - 9/16/2019, 1:44:46 PM
For example when pip installs pyspark, it is located in
\Lib\site-packages\pyspark\*
satya - 9/16/2019, 1:45:03 PM
And it appears awfully close to the python under spark installation
satya - 9/16/2019, 1:45:31 PM
That means one may be able to write code without spark at all
.. and then just run it on azure. a thought.
satya - 9/16/2019, 1:47:50 PM
This also answers another question
VSCode is using the pyspark that is installed with pip install and not the one that came with the spark install
So in other words, you HAVE TO pip install pyspark to get this intellisense
There may be other benefits as well with pip install pyspark, but I haven't explored enough to know
satya - 9/16/2019, 2:06:38 PM
Something more about modules and their names
An imported module in python code is actually a variable
It is also the filename
So module filenames must follow the rules for variable names (they must be valid identifiers)
Modules written in C and other languages are called extension modules
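A quick way to check whether a file name can be used with the import statement. The helper name is_importable_name is made up for this sketch, and the candidate names are only examples.

```python
import keyword


def is_importable_name(name):
    """Return True if name can appear after the "import" keyword."""
    # Must be a valid identifier and must not be a reserved word.
    return name.isidentifier() and not keyword.iskeyword(name)


# Example candidates: a normal name, one with a hyphen,
# one starting with a digit, and a reserved word.
for candidate in ("module1", "break-file", "2fast", "class"):
    print(candidate, is_importable_name(candidate))
```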
satya - 9/16/2019, 2:09:37 PM
Oh a key difference
#sys is a module
#(most modules are files; sys happens to be built into the interpreter)
import sys
#notice the clarification of path
#sys is a module
#sys.path is a variable of module sys
print (sys.path)
#When you do from
from sys import path
#the module variable "sys" is not
#made available by this statement. Only "path" is
#because "from" binds a local name called "path"
#pointing to the same object as sys.path
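A short sketch of what "pointing to sys.path" means in practice. The directory name "hypothetical-dir" is invented for the demonstration and is removed again right away.

```python
import sys
from sys import path

# Both names refer to one and the same list object.
print(path is sys.path)

# Changing the list through one name is visible through the other.
path.append("hypothetical-dir")
seen = "hypothetical-dir" in sys.path
path.remove("hypothetical-dir")

# Rebinding the local name does not touch sys.path at all.
path = []
still_populated = len(sys.path) > 0
print(seen, still_populated)
```

So "from" copies the reference, not the list.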
satya - 9/16/2019, 2:09:57 PM
That outlines the difference between import and from
satya - 9/16/2019, 2:11:11 PM
You can also do this: from module import *
from module1 import *
Now you don't have to say:
module1.v1
module1.v2
but instead just use
v1
v2
satya - 9/16/2019, 2:12:16 PM
You can actually do both and use both ways to refer to the same variables
satya - 9/16/2019, 2:13:28 PM
The names defined by a module are available in its __dict__ dictionary
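A minimal sketch of looking inside a module's __dict__, using "sys" as the example module:

```python
import sys

# vars(module) returns the module's __dict__.
namespace = vars(sys)
print(type(namespace).__name__)

# The attribute sys.path and the dictionary entry are the same object.
print(namespace["path"] is sys.path)

# Names without a leading underscore are the conventional public ones.
public = sorted(name for name in namespace if not name.startswith("_"))
print(public[:5])
```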
satya - 9/16/2019, 2:18:42 PM
Now, if you want to know without intellisense what pyspark packages are available
Look at this path in your python installation for the pyspark module
#Here you will find the sub modules of sql
C:\satya\i\python374\Lib\site-packages\pyspark\sql
#Here you will find sub modules like RDD
C:\satya\i\python374\Lib\site-packages\pyspark
satya - 9/18/2019, 11:16:27 AM
Here is how pyright uses typestubs
satya - 9/18/2019, 11:19:46 AM
Pyright and type annotations are here
satya - 9/18/2019, 11:20:46 AM
How does pyright execute an import module
satya - 9/18/2019, 1:55:20 PM
The import system is documented here for python in its language ref
satya - 9/18/2019, 1:57:25 PM
A regular package: __init__.py
A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package's namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.
satya - 9/18/2019, 2:25:36 PM
what is pythonpath used for
satya - 9/18/2019, 2:26:37 PM
It is correct that it is used to locate local modules
satya - 9/20/2019, 9:35:19 AM
What is a package
A set of modules together
Usually tucked in a directory hierarchy
satya - 9/20/2019, 9:37:40 AM
Examples
#module1 is a .py file
#its names are then reachable as d1.d2.d3.module1.<name>
import d1.d2.d3.module1
#or
from d1.d2.d3.module1 import x
#Just get symbol x from that file
satya - 9/20/2019, 9:40:01 AM
What happens
1. For "import", add d1 as a variable; d2, d3, and module1 are reached through it as d1.d2.d3.module1
2. For "from", add only x as a variable
3. Run the __init__.py files in d1, d2, d3 and register any symbols that those files define or import
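The steps above can be tried out with a throwaway package tree. This sketch builds d1/d2/d3/module1.py in a temporary directory; the print statements inside the generated __init__.py files and the value x = 42 are assumptions made only so the behavior is visible.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway d1/d2/d3/module1.py tree in a temporary directory.
root = Path(tempfile.mkdtemp())
d3 = root / "d1" / "d2" / "d3"
d3.mkdir(parents=True)
for package_dir in (root / "d1", root / "d1" / "d2", d3):
    # Each __init__.py announces itself so the execution order is visible.
    (package_dir / "__init__.py").write_text("print('running', __name__)\n")
(d3 / "module1.py").write_text("x = 42\n")

# Make the temporary root importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import d1.d2.d3.module1          # binds only the name d1 here
from d1.d2.d3.module1 import x   # binds only the name x here

print(d1.d2.d3.module1.x, x)

# Every level of the hierarchy is registered as its own module.
print(sorted(name for name in sys.modules if name.startswith("d1")))
```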
satya - 9/20/2019, 9:46:32 AM
Consider a variation now
#*****************
from d1.d2.d3 import *
#*****************
1. Look for the __init__.py file in d3, and see if there is a variable called __all__
2. It holds a list of names (typically sub modules) that get imported because of import * above
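A sketch of __all__ at work, under stated assumptions: the package name demo_pkg and its modules mod_a and mod_b are hypothetical, created in a temporary directory. The star import is run through exec into a separate dictionary so that the names it binds can be inspected.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package with two sub modules; only mod_a is listed in __all__.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("__all__ = ['mod_a']\n")
(pkg / "mod_a.py").write_text("VALUE = 'a'\n")
(pkg / "mod_b.py").write_text("VALUE = 'b'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run the star import in its own namespace so we can see what it binds.
namespace = {}
exec("from demo_pkg import *", namespace)

# Ignore dunder entries such as __builtins__ that exec adds.
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only the names listed in __all__ arrive in the importing namespace; mod_b stays out unless it is imported explicitly.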
satya - 9/20/2019, 9:47:40 AM
Can you do this then?
from d1 import d1SubDirectory1
satya - 9/20/2019, 10:01:20 AM
You also can do this
#**********************
import d1.d2.module
Or
from d1.d2 import module
#**********************
One difference is in the namespace. In case 1 you have to refer to variables in module as
d1.d2.module.varx
In the second case you can do
module.varx
satya - 9/20/2019, 2:37:17 PM
what is from __future__ import in python?
satya - 9/20/2019, 2:55:55 PM
Future statement is explained a bit here
satya - 9/20/2019, 2:59:13 PM
Targeting python 2 is explained here
satya - 9/20/2019, 3:01:25 PM
Book: http://book.pythontips.com/en/latest/index.html
satya - 9/20/2019, 3:16:28 PM
How to know where modules are installed
satya - 9/20/2019, 3:25:36 PM
Another difference between builtin modules and python modules
satya - 9/20/2019, 5:11:50 PM
The __file__ property of a module
1. This property exists only for modules that are loaded from a file (such as a .py file)
2. This property does not exist for builtin modules like "sys" for example
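A small check of the two cases, using "json" as an example of a file-based standard library module (that choice is mine, not from the notes):

```python
import json
import sys

# getattr with a default avoids an AttributeError for built-in modules.
for module in (json, sys):
    location = getattr(module, "__file__", None)
    print(module.__name__, "->", location)
```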
satya - 9/20/2019, 5:25:26 PM
Python 3 installation directory structure
satya - 9/20/2019, 5:29:05 PM
How does python find packages: another article
satya - 9/20/2019, 5:30:13 PM
the built in modules
The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
1. Notice the word "statically" linked
2. Notice the word "dynamically" linked
satya - 9/20/2019, 6:11:06 PM
This is where in python docs this is explained under "Module" object
satya - 9/20/2019, 6:12:33 PM
Here is what it says
Predefined (writable) attributes:
1) __name__ is the module's name;
2) __doc__ is the module's documentation string, or None if unavailable;
3) __file__ is the pathname of the file from which the module was loaded, if it was loaded from a file.
4) The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
satya - 9/20/2019, 6:45:50 PM
Now I think I am ready to explore this
1) from __future__ import print_function
2) import sys
3) from operator import add
4) from pyspark.sql import SparkSession
satya - 9/20/2019, 6:51:18 PM
The first one: __future__
1. "from" always imports from a module or a directory (package)
2. In this case __future__ is a module because you will find a __future__.py file in the /lib path
3. From that module it is importing a variable called "print_function"
4. This name "print_function" is the name of one of about a dozen objects of type "_Feature"
5. The presence of this import seems to tell older (Python 2) compilers to treat "print" as a function instead of a statement
6. You have to read the backward compatibility notes of python to know which features you want to turn on for backward or future compatibility
7. How python will interpret a 3.x print function as a print statement in an earlier version is NOT clear to me, but as far as the module workings go, this line is merely importing that feature variable.
8. For understanding module imports you can ignore this behavior
satya - 9/20/2019, 6:56:47 PM
Second one, Understanding import sys
1. By the syntax of "import", "sys" must be a module.
2. Looking in the /lib directory or /lib/site-packages directory or any directory that is known to sys.path you will see that there is no sys.py file anywhere.
3. So "sys" is a built-in module that is statically linked into the python DLLs on windows. Which sub directory or which DLL file holds it is not important
4. How do you know then that it exists? 2 ways
5. First, you need to follow the standard library URL in the python docs to know what modules are available. It is an important link to have. You will not find built-in modules like this one as files in your local file system
6. The second way to know this is to use intellisense like "pyright" in vscode. However you will see hundreds of packages at the root level, so in a way you have to know this exists.
7. Once you know it exists, the URL will list the variables supported by this module, or you can now use the intellisense better with a "."
8. Being a module, the statement makes all the variables declared by the "sys" module available as sys.<name>
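There is also a third way to find out which modules are compiled into the interpreter: sys.builtin_module_names. A sketch that checks a few modules mentioned in these notes; which of them turn out to be built in can differ between platforms and builds.

```python
import importlib
import sys

print(len(sys.builtin_module_names), "modules are compiled into this interpreter")

for name in ("sys", "operator", "argparse"):
    module = importlib.import_module(name)
    if name in sys.builtin_module_names:
        kind = "built-in"
    else:
        # File-based modules report where they were loaded from.
        kind = "loaded from " + str(getattr(module, "__file__", "unknown"))
    print(name, "->", kind)
```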
satya - 9/20/2019, 7:03:30 PM
third one "from operator import add"
1. So operator must be a package or a module
2. Look at the standard library to see if this is part of the standard library
3. It is. Look at the URL below
4. https://docs.python.org/3.7/library/operator.html
5. The "add" is a function object defined and named in the operator module
6. So this import with from adds the "add" variable to the namespace pointing to the "add" function object
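A short sketch of what the imported name can do. The use of functools.reduce is my own illustration of passing the function around.

```python
import operator
from functools import reduce
from operator import add

# The local name and the module attribute are the same function object.
print(add is operator.add)

# add(a, b) is the function form of a + b.
print(add(2, 3))

# Being an object, it can be passed to other functions.
total = reduce(add, [1, 2, 3, 4])
print(total)
```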
satya - 9/20/2019, 7:27:27 PM
This is a full explanation of __all__ in python docs
satya - 9/20/2019, 8:04:15 PM
The fourth one now: from pyspark.sql import SparkSession
1. Simple enough: pyspark is a package
2. sql could be a package or a module
3. Looking at \Lib\site-packages one sees pyspark is a directory, and sql a sub directory with many modules inside it
4. Because "sql" is a package it has an __init__.py file. It already imports all the necessary .py files under \sql and declares their public names. These public names are further gathered under the "__all__" variable in the __init__.py file.
5. Unlike what is said in the docs for the "__init__.py" file, the __all__ seems to contain not the names of the modules but the names of the exposed public variables. So I suppose it may contain either module names or the names of exposed variables. However I have not seen this explicitly stated.
6. The implication is that a sub package like "sql" does not want you to import its modules but wants to control them through its init.py
7. Similarly the pyspark package wants to control only its direct sub modules and not its sub packages like "sql"
8. That means if you look at the directory structure of \pyspark, it and its sub directories (not the sub modules or files) are what are typically imported.
satya - 9/20/2019, 8:05:25 PM
Here is the __init__.py of the sql sub package
from __future__ import absolute_import
from pyspark.sql.types import Row
from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration
from pyspark.sql.session import SparkSession
from pyspark.sql.column import Column
from pyspark.sql.catalog import Catalog
from pyspark.sql.dataframe import DataFrame, DataFrameNaFunctions, DataFrameStatFunctions
from pyspark.sql.group import GroupedData
from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
from pyspark.sql.window import Window, WindowSpec
__all__ = [
'SparkSession', 'SQLContext', 'HiveContext', 'UDFRegistration',
'DataFrame', 'GroupedData', 'Column', 'Catalog', 'Row',
'DataFrameNaFunctions', 'DataFrameStatFunctions', 'Window', 'WindowSpec',
'DataFrameReader', 'DataFrameWriter'
]
Again see the backward compatibility declaration for "absolute_import".
satya - 10/4/2019, 3:53:24 PM
A detailed article: The Definitive Guide to Python import Statement
satya - 10/4/2019, 3:54:03 PM
From there
#explicit relative imports:
import other
from . import a2
from .subA import sa1
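Relative imports only work inside a package, so this sketch builds one in a temporary directory. The names packA and a1, and the NAME constants, are assumptions for the demonstration; a2, subA, and sa1 follow the lines above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: packA/{a1.py, a2.py} and packA/subA/sa1.py
root = Path(tempfile.mkdtemp())
pack = root / "packA"
(pack / "subA").mkdir(parents=True)
(pack / "__init__.py").write_text("")
(pack / "subA" / "__init__.py").write_text("")
(pack / "a2.py").write_text("NAME = 'a2'\n")
(pack / "subA" / "sa1.py").write_text("NAME = 'sa1'\n")

# a1.py uses the two explicit relative imports from the notes.
(pack / "a1.py").write_text(
    "from . import a2\n"
    "from .subA import sa1\n"
    "names = (a2.NAME, sa1.NAME)\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from packA import a1

print(a1.names)
```

The single dot means "the package this module lives in", so a1.py never has to spell out packA.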
satya - 10/4/2019, 4:53:58 PM
Still having trouble with pyright unresolved imports for local import files
satya - 10/4/2019, 4:54:03 PM
pyright unresolved import
satya - 10/5/2019, 2:27:01 PM
I have reported this problem at pyright github
satya - 10/5/2019, 2:27:39 PM
Suggestion was to look into this explanation on pyright github
satya - 10/5/2019, 2:28:37 PM
Here is what happens
On windows 10
\somedir
\somedir\myscript.py
\somedir\mod1.py
\somedir\mod2.py
in myscript.py I have
import mod1
This line is highlighted as "unresolved import". The "output" console for pyright shows it searching everywhere except the local directory of the script, "\somedir"
At run time python is ok because 'sys.path' will contain the local directory of myscript.py.
Does "pyright" not search the path of the script file based on its location? Is there a setting in pyright to locate that module?
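The runtime half of this can be reproduced with a sketch: it recreates myscript.py and mod1.py in a temporary directory and runs the script from a different working directory. The VALUE constant is an assumption added so that there is something to print. This shows only what the interpreter does and says nothing about how pyright resolves imports.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Recreate the layout from the notes in a temporary "somedir".
somedir = Path(tempfile.mkdtemp())
(somedir / "mod1.py").write_text("VALUE = 'hello from mod1'\n")
(somedir / "myscript.py").write_text("import mod1\nprint(mod1.VALUE)\n")

# Run the script with an unrelated working directory.
elsewhere = tempfile.mkdtemp()
result = subprocess.run(
    [sys.executable, str(somedir / "myscript.py")],
    cwd=elsewhere,
    capture_output=True,
    text=True,
    check=True,
)

# The import succeeds because the script's own directory is put on sys.path.
print(result.stdout.strip())
```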
satya - 10/5/2019, 2:40:42 PM
Execution environments in pyright
satya - 10/5/2019, 4:12:23 PM
pyright configuration in vscode settings
satya - 10/5/2019, 5:02:52 PM
There are only a limited number of settings that come from vscode
The rest are in this pyrightconfig.json
satya - 10/5/2019, 5:03:04 PM
See what comes from vscode settings
satya - 10/5/2019, 5:04:40 PM
Getting started with pyright is here
satya - 10/14/2019, 1:44:44 PM
pyright configuration pyrightconfig.json is documented here
satya - 10/14/2019, 1:47:41 PM
What does the root path look like in pyrightconfig.json
{
"executionEnvironments": [
{"root": "people/satya/python/break-file"}
]
}
Notice the forward-slash path separator, even on windows
satya - 10/14/2019, 1:52:09 PM
A few more notes on this file
1. Once you have this file, you have to restart vscode for pyright to use this file correctly
2. The path is relative to the root "folder" that is added to the workspace. In the example above, if the root folder is "c:\abc\xyz\some-root", then "people" is a sub directory immediately under "\some-root"
3. Notice the forward-slash path separator.
4. The modules I want to include are directly under ".....\break-file"
5. You can look at the "output" tab of the console section of vscode, with "pyright" chosen as the output source
6. There can be many root folders, independent of each other, added to the workspace. Each root folder can contain its own pyrightconfig.json file.
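If the relative path is at hand in windows form, a few lines of python can produce the forward-slash version and the JSON text. This is only a convenience sketch; the config content is the same single "root" entry shown above.

```python
import json
from pathlib import PureWindowsPath

# The relative path as it would be typed on windows.
windows_style = PureWindowsPath(r"people\satya\python\break-file")

# as_posix() rewrites the separators as forward slashes.
config = {"executionEnvironments": [{"root": windows_style.as_posix()}]}

# This text is what would go into pyrightconfig.json.
text = json.dumps(config, indent=2)
print(text)
```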
satya - 10/14/2019, 1:59:43 PM
Are comments allowed in json config files?
satya - 10/14/2019, 2:00:02 PM
Apparently they are not allowed :(
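A quick check with python's own json module, which is a strict parser. The comment text in the sample is made up. This shows what standard JSON allows; a tool with its own parser may behave differently.

```python
import json

plain = '{"executionEnvironments": [{"root": "people/satya/python/break-file"}]}'
commented = '{\n  // root of my scripts\n  "executionEnvironments": []\n}'

# The plain document parses without complaint.
json.loads(plain)

# The same kind of document with a // comment is rejected.
try:
    json.loads(commented)
    accepted = True
except json.JSONDecodeError as exc:
    accepted = False
    print("rejected:", exc)

print(accepted)
```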