Resolving Imports and modules in Python


#Run and import all global variables
#from a file called module1

import module1


#Import a specific variable
#So as not to pollute the receiving namespace

from module2 import variable1


#variables can be functions, objects, etc
#Everything in python is an object anyway

module1 is just another Python file, named module1.py

However, you leave off the .py extension in the import statement

You also don't say where that file is

A few of the places Python looks for it are:

a) Current path

b) standard library (at python\Lib)

c) Externally installed library through pip (python\Lib\site-packages)

d) Places identified by the environment variable PYTHONPATH
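
A small sketch of how to ask Python where it would load a module from, without importing it. It uses importlib.util.find_spec from the standard library; the helper name "locate" and the module name "no_such_module_xyz" are made up for illustration.

```python
import importlib.util

def locate(module_name):
    """Return where the import system would load module_name from."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return "not found"
    # origin is a file path for file-based modules,
    # or a marker such as "built-in" for modules inside the interpreter
    return spec.origin or "no single origin (for example a namespace package)"

for name in ("csv", "sys", "no_such_module_xyz"):
    print(name, "->", locate(name))
```

For a module in the current directory or in site-packages, the printed path shows which of the places above it was found in.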

There are over 200 modules in the Python standard library

Important python standard modules

This is a nice introduction to useful libraries


datetime
math
random
re
csv

#command line scripting
sys
argparse

Python compiles each .py file it imports into a .pyc file, and recompiles it when the time stamp of the .py file changes

It may create a sub directory called __pycache__ and put the .pyc files underneath

The .pyc file contains bytecode
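
A minimal sketch to see where the bytecode file ends up. It writes a throwaway module1.py into a temporary directory (the file name and content are made up) and compiles it with py_compile, which does explicitly what an import does implicitly.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "module1.py"   # hypothetical module
    source.write_text("v1 = 42\n")

    # Where the interpreter would cache the bytecode for this source file
    expected = importlib.util.cache_from_source(str(source))

    # Compile explicitly; returns the path of the .pyc file that was written
    written = py_compile.compile(str(source))

    pyc_dir_name = pathlib.Path(written).parent.name
    pyc_exists = pathlib.Path(written).exists()
    print(written)
```

The file name includes a tag for the interpreter version, so different Python versions can keep their .pyc files side by side.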

1. The directory of the script being run (or the current working directory in an interactive session)

2. PYTHONPATH directories

3. standard library directories

4. contents of any .pth files (if present)

5. The site-packages home of third party extensions

All these directories end up in the sys.path variable in the interpreter. You can print that to see the paths

The standard module "site" is what adds the site-packages directories to the module path. So look for the documentation of that module.

Here are the docs for the "site" module

Python automatically adds the site-packages sub directory of its standard library to the module search path. this is where pip installs python packages.


#Do this
import sys
print (sys.path)

#You will get this

['C:\\satya\\data\\code\\pyspark',
 'c:\\satya\\i\\python374\\python37.zip',
 'c:\\satya\\i\\python374\\DLLs',
 'c:\\satya\\i\\python374\\lib',
 'c:\\satya\\i\\python374',
 'C:\\Users\\satya\\AppData\\Roaming\\Python\\Python37\\site-packages',
 'c:\\satya\\i\\python374\\lib\\site-packages']

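There are two site-packages entries: the one under AppData\Roaming is the per-user site-packages directory (where pip install --user puts packages), and the one under python374 belongs to the python installation itself.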
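
A quick way to see why more than one site-packages entry can show up is to ask the "site" module directly. This is a sketch using site.getusersitepackages() and site.getsitepackages(); the second call may be missing in some older virtualenv setups, and on some Linux distributions the directories are named dist-packages instead.

```python
import site
import sys

# Per-user directory, used by "pip install --user"
user_site = site.getusersitepackages()

# Directories belonging to the installation (or the virtual environment)
global_sites = site.getsitepackages()

print("user site   :", user_site)
print("global sites:", global_sites)

# Which sys.path entries are site-packages directories
site_entries = [p for p in sys.path if p.endswith("site-packages")]
print(site_entries)
```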


#*******************
#First version
#*******************
import sys
print (sys.path)

#*******************
#Second version: wrong
#Because sys.path is not a filename
#Only sys is the file
#*******************
import sys.path
print (path)

#*******************
#Third version
# from file or module 'sys' import a variable "path"
#*******************
from sys import path
print (path)
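
To see the failure of the second version without stopping the script, the same import can be attempted through importlib.import_module, which behaves like the import statement but takes the name as a string. A small sketch:

```python
import importlib

try:
    # Equivalent to the statement "import sys.path"
    importlib.import_module("sys.path")
    outcome = "imported"
except ModuleNotFoundError as exc:
    # sys is a module, not a package, so it cannot have sub modules
    outcome = "failed"
    print(exc)
```

The dotted form of import only works when everything before the last dot is a package.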

\Lib\site-packages\pyspark\*

And it appears awfully close to the python under spark installation

.. and then just run it on azure. a thought.

VSCode is using the pyspark that was installed with pip install and not the one that came with the spark install

So in other words, you HAVE TO pip install pyspark to get this intellisense

There may be other benefits as well with pip install pyspark, but I haven't explored enough to know

An imported module in python code is actually a variable

It is also the filename

So filenames must follow variable naming conventions
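
A sketch of a check for whether a file name (without .py) can be used in an import statement. The helper name "is_importable_name" and the sample names are made up; str.isidentifier and keyword.iskeyword are standard.

```python
import keyword

def is_importable_name(stem):
    """True if 'import <stem>' is syntactically valid."""
    return stem.isidentifier() and not keyword.iskeyword(stem)

for stem in ("module1", "my_module", "my-module", "2fast", "class"):
    print(stem, is_importable_name(stem))
```

A file such as my-module.py can still be run as a script; it just cannot be named in an import statement.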

Modules written in C and other languages are called extension modules


#sys is a module
#sys is a file
import sys

#notice the clarification of path

#sys is module
#sys is a file
#sys.path is a variable of module sys
print (sys.path)

#When you do from
from sys import path

#the module variable "sys" is not
#available. Only "path" is,
#because "from" binds a local name called "path"
#to the same object as sys.path (no copy is made)
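
A small sketch of what the "from" form binds. Both imports are done here only so the two names can be compared; the directory string is a made-up placeholder.

```python
import sys
from sys import path

# Both names refer to the very same list object
same_object = path is sys.path

# A change made through one name is visible through the other
path.append("/hypothetical/extra/dir")
visible = "/hypothetical/extra/dir" in sys.path
sys.path.remove("/hypothetical/extra/dir")   # undo the change

# Rebinding the local name does not touch the module attribute
path = []
still_intact = len(sys.path) > 0

print(same_object, visible, still_intact)
```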

That outlines the difference between import and from

from module1 import *

Now you don't have to say:

module1.v1

module1.v2

but instead just use

v1

v2

You can actually do both and use both ways to refer to the same variables

The names defined by a module are available in its __dict__ dictionary
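
A short sketch of looking into a module's __dict__, using the standard "math" module as the example.

```python
import math

namespace = vars(math)            # the same object as math.__dict__
print(type(namespace).__name__)
print(namespace["pi"] == math.pi)

# Names that do not start with an underscore
public_names = sorted(n for n in namespace if not n.startswith("_"))
print(public_names[:5])
```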

Look at this path in your python installation for pyspark module

#Here you will find the sub modules of sql

C:\satya\i\python374\Lib\site-packages\pyspark\sql

#Here you will find sub modules like RDD

C:\satya\i\python374\Lib\site-packages\pyspark

Here is how pyright uses typestubs

Pyright docs

Pyright and type annotations are here

How does pyright execute an import module

The import system is documented here for python in its language ref

A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package's namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.
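
A minimal sketch of the quoted behavior. It builds a throwaway package in a temporary directory (the package name "demo_pkg_init" and the variable "greeting" are made up) and imports it to show that __init__.py was executed.

```python
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = pathlib.Path(tmp) / "demo_pkg_init"     # hypothetical package
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("greeting = 'set by __init__.py'\n")

    sys.path.insert(0, tmp)            # make the temporary directory importable
    try:
        import demo_pkg_init
        greeting = demo_pkg_init.greeting
        # __path__ is one of the additional attributes a package receives
        has_path = hasattr(demo_pkg_init, "__path__")
        init_file = pathlib.Path(demo_pkg_init.__file__).name
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg_init", None)

print(greeting, has_path, init_file)
```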

what is pythonpath used for

Here is an SOF summary

It is correct that it is used to locate local modules

A package is a set of modules together

Usually tucked in a directory hierarchy


#module1 is a .py file
#make everything defined there reachable
#through the dotted name d1.d2.d3.module1

import d1.d2.d3.module1

#or
from d1.d2.d3.module1 import x

#Just get symbol x from that file

1. For "import", add d1 as a variable; d2, d3, and module1 become reachable as attributes through it

2. For "from", add only x as a variable

3. Run any __init__.py files in d1, d2, d3 and register any symbols that each __init__.py defines or imports

#*****************

from d1.d2.d3 import *

#*****************

1. Look for the __init__.py file in d3, and see if there is a variable called __all__

2. It is a list of names (sub modules or other public names) that get imported because of import * above
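
A sketch of __all__ limiting what a star import brings in. The package "demo_pkg_all", its module "shapes", and the classes are all made up; exec is used only so the imported names land in a dictionary that can be inspected.

```python
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg = pathlib.Path(tmp) / "demo_pkg_all"          # hypothetical package
    pkg.mkdir()
    (pkg / "shapes.py").write_text("class Circle: pass\nclass Square: pass\n")
    (pkg / "__init__.py").write_text(
        "from demo_pkg_all.shapes import Circle, Square\n"
        "__all__ = ['Circle']\n"
    )

    sys.path.insert(0, tmp)
    try:
        namespace = {}
        exec("from demo_pkg_all import *", namespace)
        # Drop the entries exec adds on its own, such as __builtins__
        star_names = sorted(n for n in namespace if not n.startswith("__"))
    finally:
        sys.path.remove(tmp)
        for name in [m for m in sys.modules if m.startswith("demo_pkg_all")]:
            del sys.modules[name]

print(star_names)
```

Only Circle comes through, even though __init__.py also imported Square. Here __all__ names a class rather than a module.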


from d1 import d1SubDirectory1

#**********************

import d1.d2.module

Or

from d1.d2 import module

#**********************

One difference is in the namespace. In the first case you have to refer to variables in the module as

d1.d2.module.varx

In the second case you can do

module.varx
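
The same difference, sketched with a package from the standard library (xml.etree.ElementTree is used only as a convenient stand-in for d1.d2.module).

```python
import xml.etree.ElementTree           # binds only the top-level name "xml"
from xml.etree import ElementTree      # binds the name "ElementTree" directly

long_form = xml.etree.ElementTree.fromstring("<a><b/></a>")
short_form = ElementTree.fromstring("<a><b/></a>")

# Both spellings reach the same module object
same_module = xml.etree.ElementTree is ElementTree
print(long_form.tag, short_form.tag, same_module)
```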

what is from __future__ import in python?

Future module is documented here

Future statement is explained a bit here

Targeting python 2 is explained here

Book: http://book.pythontips.com/en/latest/index.html

How to know where modules are installed

Another difference between builtin modules and python modules

1. The __file__ attribute exists only for modules that are loaded from files, such as .py files

2. This attribute does not exist for builtin modules like "sys" for example

Python 3 installation directory structure

How does python find packages: another article

The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.

1. Notice the word "statically" linked

2. Notice the word "dynamically" linked

This is where in python docs this is explained under "Module" object

Predefined (writable) attributes:

1) __name__ is the module's name;

2) __doc__ is the module's documentation string, or None if unavailable;

3) __file__ is the pathname of the file from which the module was loaded, if it was loaded from a file.

4) The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
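
A small sketch comparing a file-based module with a built-in one. "csv" and "sys" are just convenient examples from the standard library.

```python
import csv
import sys

for module in (csv, sys):
    location = getattr(module, "__file__", None)
    print(module.__name__, "->", location or "no __file__ (built into the interpreter)")

# Names of the modules compiled into this interpreter
sys_is_builtin = "sys" in sys.builtin_module_names
sys_has_file = hasattr(sys, "__file__")
csv_has_file = hasattr(csv, "__file__")
```

Printing sys.builtin_module_names lists the modules that have no file of their own in this interpreter.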


1) from __future__ import print_function

2) import sys

3) from operator import add

4) from pyspark.sql import SparkSession

1. from is always from a module or a directory (package)

2. in this case __future__ is a module because you will find a __future__.py file in the /lib path

3. From that module it is importing a variable called "print_function"

4. This name "print_function" is the name of one of about a dozen objects of type "_Feature"

6. The presence of this feature import seems to tell older compilers to interpret "print" as a function rather than as a statement

7. You have to read the backward compatibility of python to know which features you want to turn on for backward or future compatibility

8. How an earlier version ends up treating print as a function is NOT clear to me from the module alone (the compiler handles a future statement specially), but as far as the module workings go, this line is merely importing that feature variable.

9. For understanding module imports you can ignore this behavior

1. By following the syntax of "import" the "sys" must be a module.

2. Looking in the /lib directory or /lib/site-packages directory or any directory that is known to sys.path you will see that there is no sys.py file anywhere.

3. So "sys" is a standard library module that is built into the interpreter itself (statically linked into the python DLLs on windows). Which sub directory or which DLL file it lives in is not important

4. How do you know then this exists? 2 ways

5. First, you need to follow the standard library URL in python docs to know what system modules are available. It is an important link to have. You will not find built-in modules like this one as files in your local file system

6. The second way to know this is to use intellisense like "pyright" in vscode. However you will see hundreds of packages at the root level. So in a way you have to know this exists.

7. Once you know it exists the URL will list the variables supported by this module, or you can now use the intellisense better with a "."

8. Being a module, the statement makes all the variables declared by the "sys" module available as sys.<name>

1. So "operator" must be a package or a module

2. Look at the standard library to see if this is part of the standard library

3. It is. Look at the URL below

4. https://docs.python.org/3.7/library/operator.html

5. The "add" is a function object defined and named in the operator module

6. So this import with from adds the "add" variable to the namespace pointing to the "add" function object
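
A tiny sketch of what that name gives you. The use with functools.reduce is only an illustration of passing the function object around.

```python
import operator
from functools import reduce
from operator import add

print(add(2, 3))
print(add is operator.add)     # the same function object under two names

# A function object can be handed to other functions like any value
total = reduce(add, [1, 2, 3, 4])
print(total)
```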

Here is the full story of __all__

This is a full explanation of __all__ in python docs

1. simple enough pyspark is a package

2. sql could be package or a module

3. looking at \lib\site-packages one sees pyspark is a directory. And sql is a sub directory with many modules inside it

4. Because "sql" is a package it has an __init__.py file. It already imports all the necessary .py files under \sql and declares their public names. These public names are further gathered under the "__all__" variable in the __init__.py file.

5. Unlike what is said in the docs for the "__init__.py" file, the __all__ seems to contain not the names of the modules but the names of the exposed public variables. So I suppose it may contain either modules or the names of exposed variables. However I have not seen this explicitly stated.

6. The implication is that a sub package like "sql" does not want you to import its modules directly but wants to control them through its __init__.py

7. Similarly the pyspark package wants to control only its direct sub modules and not its sub packages like "sql"

8. That means if you look at the directory structure of \pyspark, it and its sub directories (not the sub modules or files) are what are typically imported.


from __future__ import absolute_import


from pyspark.sql.types import Row
from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration
from pyspark.sql.session import SparkSession
from pyspark.sql.column import Column
from pyspark.sql.catalog import Catalog
from pyspark.sql.dataframe import DataFrame, DataFrameNaFunctions, DataFrameStatFunctions
from pyspark.sql.group import GroupedData
from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
from pyspark.sql.window import Window, WindowSpec


__all__ = [
    'SparkSession', 'SQLContext', 'HiveContext', 'UDFRegistration',
    'DataFrame', 'GroupedData', 'Column', 'Catalog', 'Row',
    'DataFrameNaFunctions', 'DataFrameStatFunctions', 'Window', 'WindowSpec',
    'DataFrameReader', 'DataFrameWriter'
]

Again see the backward compatibility declaration for "absolute_import".

A detailed article: The Definitive Guide to Python import Statement


#absolute import (resolved through sys.path):
import other

#explicit relative imports:
from . import a2
from .subA import sa1
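
A runnable sketch of those two relative imports. It builds a throwaway package in a temporary directory; the package name "demo_rel", the module "a1", and the NAME variables are made up, while a2, subA, and sa1 follow the names used above.

```python
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "demo_rel"             # hypothetical package
    (root / "subA").mkdir(parents=True)
    (root / "__init__.py").write_text("")
    (root / "subA" / "__init__.py").write_text("")
    (root / "a2.py").write_text("NAME = 'a2'\n")
    (root / "subA" / "sa1.py").write_text("NAME = 'sa1'\n")
    (root / "a1.py").write_text(
        "from . import a2\n"           # sibling module in the same package
        "from .subA import sa1\n"      # module inside a sub package
        "found = (a2.NAME, sa1.NAME)\n"
    )

    sys.path.insert(0, tmp)
    try:
        import demo_rel.a1
        found = demo_rel.a1.found
    finally:
        sys.path.remove(tmp)
        for name in [m for m in sys.modules if m.startswith("demo_rel")]:
            del sys.modules[name]

print(found)
```

The leading dot only works when the file is imported as part of a package. Running a1.py directly as a script fails because there is no parent package to be relative to.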

Still having trouble with pyright unresolved imports for local import files

pyright unresolved import

I have reported this problem at pyright github

Suggestion was to look into this explanation on pyright github


On windows 10
\somedir

\somedir\myscript.py
\somedir\mod1.py
\somedir\mod2.py

in myscript.py I have

import mod1

This line is highlighted as "unresolved import". The "output" console for pyright is searching everywhere but not in the local directory of the script "\somedir"

At run time python is ok because the 'sys.path' will have the myscript.py local directory.

Does "pyright" not search the path of the script file based on its location? is there a setting in pyright to locate that module?

Execution environments in pyright

pyrightconfig.json

pyright configuration in vscode settings

The rest are in this pyrightconfig.json

See what comes from vscode settings

Getting started with pyright is here

pyright configuration pyrightconfig.json is documented here


{
"executionEnvironments": [
        {"root": "people/satya/python/break-file"}
    ]
}

Notice the forward-slash path separator, even on windows

1. Once you have this file, you have to restart vscode for pyright to use this file correctly

2. The path starts with the root at the "folder". In the example above the sub directory "people" is immediately under the root folder such as "c:\abc\xyz\some-root" which is the folder that is added to the workspace. Then "people" is a sub directory under "\some-root"

3. Notice the path separator.

4. The modules I want to include are directly under ".....\break-file"

5. You can look at the "output" tab of the console section of vscode where the output is chosen from "pyright"

6. There can be many root folders that are independent of each other added to the workspace. Each root folder can contain its own pyrightconfig.json file.

Are comments allowed in json config files?

Apparently they are not allowed :(
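
A sketch of what strict JSON parsing does with a comment, using Python's own json module. The root value "src" is a made-up placeholder. This only shows the behavior of a strict parser; whether a particular tool's config reader tolerates comments depends on that tool.

```python
import json

plain = '{"executionEnvironments": [{"root": "src"}]}'
commented = """{
    // a comment that strict JSON does not allow
    "executionEnvironments": [{"root": "src"}]
}"""

config = json.loads(plain)
print(config["executionEnvironments"][0]["root"])

try:
    json.loads(commented)
    comment_accepted = True
except json.JSONDecodeError as exc:
    comment_accepted = False
    print("rejected:", exc.msg)
```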