Resolving Imports and modules in Python

satya - 9/16/2019, 1:04:42 PM

There are two ways to import names from other modules


#Run module1 (a file called module1.py) once
#and bind the name "module1" in this namespace.
#Its global variables are reached as module1.name

import module1


#Import a specific variable
#So as not to pollute the receiving namespace

from module2 import variable1


#variables can be functions, objects, etc
#Everything in python is an object anyway

satya - 9/16/2019, 1:07:41 PM

Where will it look for module1?

module1 can be seen as another Python file named module1.py

However, you don't write the .py at the end

You also don't say where that file is

A few places it will look are:

a) The directory of the script being run (or the current directory in an interactive session)

b) Standard library (at python\Lib)

c) Externally installed libraries through pip (python\Lib\site-packages)

d) Places identified by the environment variable PYTHONPATH
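To check where a particular name would be resolved from, the import system can be asked directly. A minimal sketch, using the standard library's json package purely as an example name (the "missing" name is made up):

```python
import importlib.util

# Ask the import system where it would load "json" from.
spec = importlib.util.find_spec("json")

print(spec.name)    # the module name that was looked up
print(spec.origin)  # path of the file that would be loaded

# A top-level name that cannot be found on the search path yields None.
missing = importlib.util.find_spec("no_such_module_hopefully")
print(missing)
```

The origin printed for a package is its __init__.py file, which is a quick way to tell which of the locations above actually won.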

satya - 9/16/2019, 1:08:17 PM

There are over 200 modules in the Python standard library

satya - 9/16/2019, 1:08:32 PM

Important python standard modules

satya - 9/16/2019, 1:11:07 PM

This is a nice introduction to useful libraries

satya - 9/16/2019, 1:14:00 PM

Some listed libraries are


datetime
math
random
re
csv

#command line scripting
sys
argparse

satya - 9/16/2019, 1:20:26 PM

.pyc and __pycache__ :compiling/caching

Python compiles an imported .py file to bytecode and caches it in a .pyc file; it recompiles when the time stamp of the .py file changes

In Python 3 the .pyc files are placed in a sub directory called __pycache__ next to the source

The .pyc file contains bytecode
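To see what the cached file would be called, the standard importlib.util.cache_from_source function can be used. A small sketch; "module1.py" is just the hypothetical file name from these notes and does not need to exist:

```python
import importlib.util
import sys

# Where CPython would cache the bytecode compiled from module1.py.
cached = importlib.util.cache_from_source("module1.py")
print(cached)

# The interpreter tag (for example "cpython-37") is embedded in the
# cached file name, so different Python versions do not clash.
print(sys.implementation.cache_tag)
```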

satya - 9/16/2019, 1:28:24 PM

Module search path, one more time

1. The directory of the script being run (the current working directory in an interactive session)

2. PYTHONPATH directories

3. Standard library directories

4. Contents of any .pth files (if present)

5. The site-packages home of third party extensions

6. All these directories end up in the sys.path list in the interpreter. You can print that to see the paths
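A short sketch to inspect the resulting search path in order, along with the PYTHONPATH value, if any, that contributed to it:

```python
import os
import sys

# Each entry is a directory (or zip file) searched in order.
for index, entry in enumerate(sys.path):
    print(index, repr(entry))

# PYTHONPATH, if set, contributes entries near the front of sys.path.
print(os.environ.get("PYTHONPATH", "<PYTHONPATH not set>"))
```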

satya - 9/16/2019, 1:29:19 PM

Nature of site module

This is the standard module that sets up the site-specific part of the module search path (site-packages and .pth files). So look at the library documentation for this module.

satya - 9/16/2019, 1:30:00 PM

Here are the docs for the "site" module

satya - 9/16/2019, 1:31:18 PM

Lib/site-packages

Python automatically adds the site-packages sub directory of its standard library (Lib) directory to the module search path. This is where pip installs Python packages.
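The site module can report these directories itself. A minimal sketch using its documented helper functions; the actual paths printed depend entirely on the local installation:

```python
import site

# Installation-wide site-packages directories.
print(site.getsitepackages())

# Per-user site-packages directory (used by "pip install --user").
print(site.getusersitepackages())

# Whether the per-user directory is enabled for this interpreter.
print(site.ENABLE_USER_SITE)
```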

satya - 9/16/2019, 1:38:15 PM

What does sys.path have?


#Do this
import sys
print (sys.path)

#You will get this

['C:\\satya\\data\\code\\pyspark',
'c:\\satya\\i\\python374\\python37.zip',
'c:\\satya\\i\\python374\\DLLs',
'c:\\satya\\i\\python374\\lib',
'c:\\satya\\i\\python374',
'C:\\Users\\satya\\AppData\\Roaming\\Python\\Python37\\site-packages',
'c:\\satya\\i\\python374\\lib\\site-packages']

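There are two site-packages entries: the one under AppData\Roaming is the per-user site-packages (used by pip install --user), and the one under python374\lib is the installation-wide site-packages.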

satya - 9/16/2019, 1:39:46 PM

Interesting


#*******************
#First version
#*******************
import sys
print (sys.path)

#*******************
#Second version: wrong
#Because sys.path is not a module
#"sys" is the module, "path" is a variable inside it
#This raises ModuleNotFoundError
#*******************
#import sys.path
#print (path)

#*******************
#Third version
# from module 'sys' import a variable "path"
#*******************
from sys import path
print (path)

satya - 9/16/2019, 1:44:46 PM

For example when pip installs pyspark, it is located in


\Lib\site-packages\pyspark\*

satya - 9/16/2019, 1:45:03 PM

And its contents look awfully close to the python directory under the Spark installation

satya - 9/16/2019, 1:45:31 PM

That means one may be able to write code without spark at all

.. and then just run it on azure. a thought.

satya - 9/16/2019, 1:47:50 PM

This also answers another question

VSCode is using the pyspark that was installed with pip install, not the one that came with the Spark install

So in other words, you HAVE TO pip install pyspark to get this intellisense

There may be other benefits as well with pip install pyspark, but I haven't explored enough to know

satya - 9/16/2019, 2:06:38 PM

Something more about modules and their names

An imported module in Python code is actually a variable

Its name is also the file name (without the .py)

So module file names must follow variable naming rules

Modules written in C and other languages are called extension modules

satya - 9/16/2019, 2:09:37 PM

Oh a key difference


#sys is a module
#(most modules are files; sys happens to be built in)
import sys

#notice the clarification of path

#sys is a module
#sys.path is a variable of module sys
print (sys.path)

#When you do from
from sys import path

#the module variable "sys" is not
#bound by this statement. Only "path" is
#because "from" creates a local name called "path"
#pointing to the same object as sys.path
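A short sketch of what "pointing to sys.path" means in practice. The directory string used as a marker is made up purely for illustration:

```python
import sys
from sys import path

# Both names refer to the very same list object.
print(path is sys.path)            # True

# Mutating the list through one name is visible through the other.
marker = "/hypothetical/extra/dir"  # made-up directory, illustration only
path.append(marker)
print(marker in sys.path)          # True
path.remove(marker)

# Rebinding the local name does not touch the attribute on the module.
path = []
print(sys.path is path)            # False
```

So "from" gives a second name for the same object, not a copy of it; only reassignment of the local name breaks the link.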

satya - 9/16/2019, 2:09:57 PM

That outlines the difference between import and from

satya - 9/16/2019, 2:11:11 PM

You can also do this: from module import *

from module1 import *

Now you don't have to say:

module1.v1

module1.v2

but instead just use

v1

v2

satya - 9/16/2019, 2:12:16 PM

You can actually do both and use both ways to refer to the same variables

satya - 9/16/2019, 2:13:28 PM

The exported names from a module are available in its __dict__ dictionary
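A quick way to look at that namespace, sketched here with the standard math module as an arbitrary example:

```python
import math

# A module's namespace is a plain dictionary.
namespace = vars(math)            # same object as math.__dict__
print(type(namespace))
print(namespace is math.__dict__)

# Public names are, by convention, those without a leading underscore.
public_names = sorted(n for n in namespace if not n.startswith("_"))
print(public_names[:5])
```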

satya - 9/16/2019, 2:18:42 PM

Now, if you want to know without intellisense what pyspark packages are available

Look at this path in your Python installation for the pyspark module

#Here you will find the sub modules of sql

C:\satya\i\python374\Lib\site-packages\pyspark\sql

#Here you will find sub modules like rdd

C:\satya\i\python374\Lib\site-packages\pyspark
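The same listing can be produced from Python with the standard pkgutil module instead of browsing directories. A sketch that uses the standard library's json package as a stand-in, on the assumption that pyspark may not be installed; the same calls should work with pyspark or pyspark.sql in its place:

```python
import json      # stand-in package; swap in pyspark if it is installed
import pkgutil

# A package exposes the directories it lives in through __path__.
print(json.__path__)

# List the modules and sub packages found directly inside the package.
for info in pkgutil.iter_modules(json.__path__):
    kind = "package" if info.ispkg else "module"
    print(kind, info.name)
```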

satya - 9/18/2019, 11:16:27 AM

Here is how pyright uses typestubs

satya - 9/18/2019, 11:16:56 AM

Pyright docs

satya - 9/18/2019, 11:19:46 AM

Pyright and type annotations are here

satya - 9/18/2019, 11:20:46 AM

How does pyright execute an import module

satya - 9/18/2019, 1:55:20 PM

The import system is documented here for python in its language ref

satya - 9/18/2019, 1:57:25 PM

A regular package: __init__.py

A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package's namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.
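The behavior described in that paragraph can be tried out with a throwaway package. A minimal sketch; "demo_pkg" and "greeting" are made-up names, and the package is written to a temporary directory:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg/__init__.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("print('running demo_pkg/__init__.py')\n")
    f.write("greeting = 'hello'\n")

sys.path.insert(0, root)          # make the temp directory importable
demo_pkg = importlib.import_module("demo_pkg")

print(demo_pkg.greeting)          # name defined by __init__.py
print(demo_pkg.__path__)          # extra attribute that marks it as a package
print(demo_pkg.__file__)          # points at the __init__.py file
```

The message printed from inside __init__.py appears once, at import time, which is the "implicitly executed" part.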

satya - 9/18/2019, 2:25:36 PM

what is pythonpath used for

satya - 9/18/2019, 2:26:27 PM

Here is an SOF summary

satya - 9/18/2019, 2:26:37 PM

It is correct that it is used to locate local modules

satya - 9/20/2019, 9:35:19 AM

What is a package

A set of modules together

Usually tucked in a directory hierarchy

satya - 9/20/2019, 9:37:40 AM

Examples


#module1 is a .py file
#import the module; its names are then
#reached as d1.d2.d3.module1.name

import d1.d2.d3.module1

#or
from d1.d2.d3.module1 import x

#Just get symbol x from that file

satya - 9/20/2019, 9:40:01 AM

What happens

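1. Run the __init__.py files in d1, d2, d3 (in that order), then module1.py, and register any symbols that each __init__.py defines or imports

2. For "import", add d1 as a variable; d2, d3, and module1 are reached through it as d1.d2.d3.module1

3. For "from", add only x as a variable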
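To check which names each form really binds, here is a sketch that builds the same hypothetical d1/d2/d3/module1.py layout in a temporary directory and runs each statement in its own empty namespace:

```python
import os
import sys
import tempfile

# Create d1/d2/d3/module1.py with an __init__.py at every level.
root = tempfile.mkdtemp()
path = root
for name in ("d1", "d2", "d3"):
    path = os.path.join(path, name)
    os.mkdir(path)
    with open(os.path.join(path, "__init__.py"), "w") as f:
        f.write("")
with open(os.path.join(path, "module1.py"), "w") as f:
    f.write("x = 42\n")

sys.path.insert(0, root)

# Run each statement in its own empty namespace to see what it binds.
ns_import = {}
exec("import d1.d2.d3.module1", ns_import)
ns_from = {}
exec("from d1.d2.d3.module1 import x", ns_from)

print(sorted(k for k in ns_import if k != "__builtins__"))  # ['d1']
print(ns_import["d1"].d2.d3.module1.x)                       # 42
print(sorted(k for k in ns_from if k != "__builtins__"))    # ['x']
```

Only the top-level name d1 is bound by the plain import; everything below it is reached by attribute access.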

satya - 9/20/2019, 9:46:32 AM

Consider a variation now

#*****************

from d1.d2.d3 import *

#*****************

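1. Look for the __init__.py file in d3, and see if there is a variable called __all__

2. It is a list of names (strings): sub modules, or variables defined in __init__.py. These are what get imported because of import * above

3. If there is no __all__, all names in __init__.py that do not start with an underscore are imported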
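A sketch of how __all__ steers import *. The package "shapes_demo" and everything inside it are made up for the illustration; its __all__ deliberately lists one function and one sub module, and leaves out a second function:

```python
import os
import sys
import tempfile

# Throwaway package whose __init__.py defines two functions but lists
# only one of them, plus a sub module, in __all__.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shapes_demo")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__all__ = ['area', 'circle']\n")
    f.write("def area(w, h):\n    return w * h\n")
    f.write("def helper():\n    return 'not exported'\n")
with open(os.path.join(pkg_dir, "circle.py"), "w") as f:
    f.write("PI = 3.14159\n")

sys.path.insert(0, root)

# import * is only legal at module level, so run it in a fresh namespace.
ns = {}
exec("from shapes_demo import *", ns)
imported = sorted(k for k in ns if k != "__builtins__")
print(imported)   # ['area', 'circle']
```

The sub module circle is imported on demand because it is named in __all__, while helper stays out of the importing namespace.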

satya - 9/20/2019, 9:47:40 AM

Can you do this then?


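from d1 import d1SubDirectory1

Yes, as long as d1SubDirectory1 is a sub package (sub directory) of d1. "from" can import a sub package, a module, or a variable.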

satya - 9/20/2019, 10:01:20 AM

You also can do this

#**********************

import d1.d2.module

Or

from d1.d2 import module

#**********************

One difference is in the namespace. In case 1 you have to refer to variables in module as

d1.d2.module.varx

In the second case you can do

module.varx

satya - 9/20/2019, 2:37:17 PM

what is from __future__ import in python?

satya - 9/20/2019, 2:37:27 PM

Future module is documented here

satya - 9/20/2019, 2:55:55 PM

Future statement is explained a bit here

satya - 9/20/2019, 2:59:13 PM

Targeting python 2 is explained here

satya - 9/20/2019, 3:01:25 PM

Book: http://book.pythontips.com/en/latest/index.html

satya - 9/20/2019, 3:16:28 PM

How to know where modules are installed

satya - 9/20/2019, 3:25:36 PM

Another difference between builtin modules and python modules

satya - 9/20/2019, 5:11:50 PM

The __file__ attribute of a module

1. This attribute exists for modules that are loaded from a file (.py files, or extension modules loaded from a shared library)

2. This attribute does not exist for builtin modules like "sys" for example
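A two-line check of both cases, using the standard json package as an example of a file-based module:

```python
import json
import sys

# A module loaded from a .py file records where it came from.
print(json.__file__)

# sys is compiled into the interpreter, so it has no __file__.
print(hasattr(sys, "__file__"))   # False
```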

satya - 9/20/2019, 5:25:26 PM

Python 3 installation directory structure

satya - 9/20/2019, 5:29:05 PM

How does python find packages: another article

satya - 9/20/2019, 5:30:13 PM

the built in modules

The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.

1. Notice the words "statically linked"

2. Notice the words "loaded dynamically"

satya - 9/20/2019, 6:11:06 PM

This is where in python docs this is explained under "Module" object

satya - 9/20/2019, 6:12:33 PM

Here is what it says

Predefined (writable) attributes:

1) __name__ is the module's name;

2) __doc__ is the module's documentation string, or None if unavailable;

3) __file__ is the pathname of the file from which the module was loaded, if it was loaded from a file.

4) The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.

satya - 9/20/2019, 6:45:50 PM

Now I think I am ready to explore this


1) from __future__ import print_function

2) import sys

3) from operator import add

4) from pyspark.sql import SparkSession

satya - 9/20/2019, 6:51:18 PM

The first one: __future__

1. "from" always imports from a module or a directory (package)

2. In this case __future__ is a module because you will find a __future__.py file in the /lib path

3. From that module it is importing a variable called "print_function"

4. This name "print_function" is the name of one of about ten objects of type "_Feature"

5. The compiler treats this "future statement" specially: in Python 2.6 and 2.7 it makes the compiler parse "print" in that file as a function, as in 3.x, instead of as a statement

6. You have to read the backward compatibility notes of Python to know which features you want to turn on for backward or future compatibility

7. In Python 3 print is already a function, so the statement changes nothing there. As far as the module workings go, this line is merely importing that feature variable.

8. For understanding module imports you can ignore this behavior

satya - 9/20/2019, 6:56:47 PM

Second one, Understanding import sys

1. By the syntax of "import", "sys" must be a module.

2. Looking in the /lib directory or /lib/site-packages directory or any directory that is known to sys.path you will see that there is no sys.py file anywhere.

3. So "sys" is a standard library module that is built (statically linked) into the Python interpreter, which on Windows is the Python DLL. Which DLL file it lives in is not important

4. How do you know then that it exists? 2 ways

5. First, you need to follow the standard library URL in the Python docs to know what system modules are available. It is an important link to have. You will not find built-in modules like this one as files in your local file system

6. The second way to know this is to use intellisense like "pyright" in vscode. However you will see hundreds of packages at the root level. So in a way you have to know this exists.

7. Once you know it exists, the URL will list the variables supported by this module, or you can now use the intellisense better with a "."

8. Being a module, the statement binds the name "sys", and all the variables declared by the "sys" module are reached through it, as in sys.path
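There is also a third way to find out, from inside Python itself. A sketch using sys.builtin_module_names and importlib.util.find_spec; argparse is picked only as an arbitrary example of a file-based standard module for contrast:

```python
import importlib.util
import sys

# Modules compiled directly into this interpreter.
print(len(sys.builtin_module_names))
print("sys" in sys.builtin_module_names)

# The import system reports the origin of such modules as "built-in"
# instead of a file path.
print(importlib.util.find_spec("sys").origin)

# A file-based standard library module, by contrast, has a path as origin.
print(importlib.util.find_spec("argparse").origin)
```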

satya - 9/20/2019, 7:03:30 PM

third one "from operator import add"

1. So operator must be a package or a module

2. Look at the standard library to see if this is part of the standard library

3. It is. Look at the URL below

4. https://docs.python.org/3.7/library/operator.html

5. The "add" is a function object defined and named in the operator module

6. So this import with from adds the "add" variable to the namespace pointing to the "add" function object
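A tiny sketch of why the function form is handy: it can be passed wherever a callable is expected. functools.reduce is used here only as a convenient example of such a place:

```python
from functools import reduce
from operator import add

# add(a, b) is the function form of the expression a + b.
print(add(2, 3))                  # 5

# Passing it to reduce sums a sequence without writing a lambda.
print(reduce(add, [1, 2, 3, 4]))  # 10
```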

satya - 9/20/2019, 7:23:04 PM

Here is the full story of __all__

satya - 9/20/2019, 7:27:27 PM

This is a full explanation of __all__ in python docs

satya - 9/20/2019, 8:04:15 PM

The fourth one now: from pyspark.sql import SparkSession

1. Simple enough: pyspark is a package

2. sql could be a package or a module

3. Looking at \Lib\site-packages one sees pyspark is a directory, and sql a sub directory with many modules inside

4. Because "sql" is a package it has an __init__.py file. It already imports the necessary names from the .py files under \sql, which makes them its public names. These public names are further gathered under the "__all__" variable in the __init__.py file.

5. Unlike what I read in the docs for the "__init__.py" file, __all__ here contains not the names of modules but the names of the exposed public variables. __all__ is simply a list of strings naming what "import *" exports, so it may contain names of sub modules or names of variables.

6. The implication is that a sub package like "sql" does not expect you to import its modules directly but wants to expose their contents through its __init__.py

7. Similarly the pyspark package exposes only the contents of its direct sub modules and not its sub packages like "sql"

8. That means if you look at the directory structure of \pyspark, it and its sub directories (not the sub modules or files) are what are typically imported from.

satya - 9/20/2019, 8:05:25 PM

Here is the __init__.py of the sql sub package


from __future__ import absolute_import


from pyspark.sql.types import Row
from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration
from pyspark.sql.session import SparkSession
from pyspark.sql.column import Column
from pyspark.sql.catalog import Catalog
from pyspark.sql.dataframe import DataFrame, DataFrameNaFunctions, DataFrameStatFunctions
from pyspark.sql.group import GroupedData
from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
from pyspark.sql.window import Window, WindowSpec


__all__ = [
    'SparkSession', 'SQLContext', 'HiveContext', 'UDFRegistration',
    'DataFrame', 'GroupedData', 'Column', 'Catalog', 'Row',
    'DataFrameNaFunctions', 'DataFrameStatFunctions', 'Window', 'WindowSpec',
    'DataFrameReader', 'DataFrameWriter'
]

Again see the backward compatibility declaration for "absolute_import".

satya - 10/4/2019, 3:53:24 PM

A detailed article: The Definitive Guide to Python import Statement

satya - 10/4/2019, 3:54:03 PM

From there


#absolute import:

import other

#explicit relative imports:

from . import a2
from .subA import sa1

satya - 10/4/2019, 4:53:58 PM

Still having trouble with pyright unresolved imports for local import files

satya - 10/4/2019, 4:54:03 PM

pyright unresolved import

satya - 10/5/2019, 2:27:01 PM

I have reported this problem at pyright github

satya - 10/5/2019, 2:27:39 PM

Suggestion was to look into this explanation on pyright github

satya - 10/5/2019, 2:28:37 PM

Here is what happens


On windows 10
\somedir

\somedir\myscript.py
\somedir\mod1.py
\somedir\mod2.py

in myscript.py I have

import mod1

This line is highlighted as "unresolved import". According to the "output" console, pyright is searching everywhere but not in the local directory of the script, "\somedir"

At run time Python is OK because sys.path will have the local directory of myscript.py.

Does pyright not search the path of the script file based on its location? Is there a setting in pyright to locate that module?

satya - 10/5/2019, 2:40:42 PM

Execution environments in pyright

satya - 10/5/2019, 2:41:09 PM

pyrightconfig.json

satya - 10/5/2019, 4:12:23 PM

pyright configuration in vscode settings

satya - 10/5/2019, 5:02:52 PM

There are only a limited number of settings that come from vscode

The rest are in this pyrightconfig.json

satya - 10/5/2019, 5:03:04 PM

See what comes from vscode settings

satya - 10/5/2019, 5:04:40 PM

Getting started with pyright is here

satya - 10/14/2019, 1:44:44 PM

pyright configuration file pyrightconfig.json is documented here

satya - 10/14/2019, 1:47:41 PM

What does the root path look like in pyrightconfig.json


{
"executionEnvironments": [
        {"root": "people/satya/python/break-file"}
    ]
}

Notice the forward-slash path separator, even on Windows

satya - 10/14/2019, 1:52:09 PM

Few more notes on this file

1. Once you have this file, you have to restart vscode for pyright to use this file correctly

2. The path is relative to the root "folder" that is added to the workspace. In the example above, if the workspace folder is "c:\abc\xyz\some-root", then "people" is a sub directory immediately under "\some-root"

3. Notice the forward-slash path separator.

4. The modules I want to include are directly under ".....\break-file"

5. You can look at the "output" tab of the console section of vscode, with "pyright" chosen as the output source, to see where it searches

6. There can be many root folders that are independent of each other added to the workspace. Each root folder can contain its own pyrightconfig.json file.

satya - 10/14/2019, 1:59:43 PM

Are comments allowed in json config files?

satya - 10/14/2019, 2:00:02 PM

Apparently they are not allowed :(
