Basic data science with PowerShell

satya - 2/21/2020, 11:27:06 AM

My PowerShell coding journal is here


satya - 2/21/2020, 11:28:21 AM

Consider a JSON document like this


{
    "success": true,
    "data": [
        {
            "turbine": "TO02",
            "tag_name": "ActivePower",
            "tag_value": "1709.2",
            "timestamp": "2020-01-30 00:00:04"
        },
        {
            "turbine": "TO02",
            "tag_name": "WindSpeed",
            "tag_value": "8.82",
            "timestamp": "2020-01-30 00:00:04"
        },
        ...
    ]
}

satya - 2/21/2020, 11:29:58 AM

The corresponding object structure


class datum
{
   String turbine;
   String tag_name;
   String tag_value;
   String timestamp;
}

class jsonData {
    bool success;
    //An array of datum objects
    datum[] data;
}
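
The same structure could be written as PowerShell classes (these need PowerShell 5.0 or later). This is only a sketch: the class names Datum and JsonData mirror the pseudocode above and are not taken from the original scripts. The property names match the JSON keys so that a hashtable can be cast straight to the class.

```powershell
# Sketch only: PowerShell 5.0+ classes mirroring the pseudocode above.
# Datum and JsonData are illustrative names, not part of the original scripts.
class Datum {
    [string]$turbine
    [string]$tag_name
    [string]$tag_value
    [string]$timestamp
}

class JsonData {
    [bool]$success
    # An array of Datum objects
    [Datum[]]$data
}

# Build one typed record by casting a hashtable (keys must match property names).
$d = [Datum]@{
    turbine   = 'TO02'
    tag_name  = 'ActivePower'
    tag_value = '1709.2'
    timestamp = '2020-01-30 00:00:04'
}

$j = [JsonData]@{ success = $true; data = @($d) }
Write-Host "first tag: $($j.data[0].tag_name)"
```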

satya - 2/21/2020, 11:34:15 AM

I can do this in PowerShell


#Read a JSON file and convert it to an object
#-Raw reads the whole file as a single string
$s = Get-Content -Raw $jsonfile | ConvertFrom-Json

#Access the success field of the object
$successCode = $s.success

#Get the data array of objects
$data = $s.data

#Report how many objects there are in the data array
#(p is a helper function that is not shown here)
p -message "There are $($data.Count) records"
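
To try this without the data file, here is a small self-contained sketch. It assumes an inline JSON string in the same shape as the sample above, with the two records copied from it. Note that tag_value arrives as a string, so it needs a cast before any arithmetic.

```powershell
# Inline sample in the same shape as the JSON shown earlier (two records only).
$json = @'
{
    "success": true,
    "data": [
        { "turbine": "TO02", "tag_name": "ActivePower", "tag_value": "1709.2", "timestamp": "2020-01-30 00:00:04" },
        { "turbine": "TO02", "tag_name": "WindSpeed",   "tag_value": "8.82",   "timestamp": "2020-01-30 00:00:04" }
    ]
}
'@

# Convert the JSON text to an object graph
$s = $json | ConvertFrom-Json

# tag_value is a string in the JSON, so cast it before doing arithmetic
$firstValue = [double]$s.data[0].tag_value

Write-Host "success = $($s.success), records = $($s.data.Count), first value = $firstValue"
```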

satya - 2/21/2020, 11:35:05 AM

Let's read the first 5 rows of that array


#Get first 5
$data | Select-Object -First 5

turbine tag_name    tag_value timestamp
------- --------    --------- ---------
TO02    ActivePower 1709.2    2020-01-30 00:00:04
TO02    WindSpeed   8.82      2020-01-30 00:00:04
TO02    ActivePower 1710.2    2020-01-30 00:00:09
TO02    WindSpeed   8.16      2020-01-30 00:00:09
TO02    WindSpeed   8.54      2020-01-30 00:00:18

satya - 2/21/2020, 11:36:14 AM

Get the unique time instances from the array of 20,000 records


function getUniqueTimeInstances($data)
{
    #Each row is an object: (turbine, tag_name, tag_value, timestamp)
    $timeArray = $data | Select-Object -Property timestamp -Unique
    return $timeArray
}
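
One thing to keep in mind: Select-Object -Property returns objects that carry only a timestamp property, not plain strings. The sketch below shows the difference, using three rows copied from the output above as stand-in data. -ExpandProperty is one way to get the bare values instead.

```powershell
# Stand-in data: three rows copied from the sample output above.
$data = @(
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1709.2'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.82';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1710.2'; timestamp = '2020-01-30 00:00:09' }
)

# Objects with a single timestamp property, duplicates removed
$timeObjects = $data | Select-Object -Property timestamp -Unique

# Plain strings instead of objects
$timeStrings = $data | Select-Object -ExpandProperty timestamp -Unique

Write-Host "unique time instances: $(@($timeStrings).Count)"
```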

satya - 2/21/2020, 11:38:39 AM

Where clause: get all rows where the turbine is TO02 and the tag name is ActivePower


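#Returns the matching rows, not just their number;
#wrap the result in @() and use .Count to get the number
#Note: $data is read from the enclosing scope
function getNumberOfActivePowerInstancesForTurbine($turbine)
{
    $turbineArray  = $data | Where-Object {$_.turbine -eq $turbine -and $_.tag_name -eq "ActivePower"}
    return $turbineArray
}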
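
Here is a small sketch of the same filter on stand-in data (three rows copied from the sample output above), followed by a count and an average. The -eq operator compares strings case-insensitively by default; -ceq is the case-sensitive form. The average step is my own addition for illustration.

```powershell
# Stand-in data: three rows copied from the sample output above.
$data = @(
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1709.2'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.82';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1710.2'; timestamp = '2020-01-30 00:00:09' }
)

# @() guarantees an array even when zero or one row matches
$rows = @($data | Where-Object { $_.turbine -eq 'TO02' -and $_.tag_name -eq 'ActivePower' })

# tag_value is a string, so cast each value before averaging
$avg = ($rows | ForEach-Object { [double]$_.tag_value } | Measure-Object -Average).Average

Write-Host "ActivePower rows: $($rows.Count), average: $avg"
```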

satya - 2/21/2020, 11:39:22 AM

Sort the data array by timestamp


$sortedData = $data | Sort-Object -Property "timestamp"
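
The timestamps are strings, but in the yyyy-MM-dd HH:mm:ss layout a string sort gives the same order as a chronological sort. If the format were ever different, a sketch like the one below parses each value to a date first. The stand-in rows and the secondary sort on tag_name are my own additions.

```powershell
# Stand-in data, deliberately out of order (values copied from the sample output).
$data = @(
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1710.2'; timestamp = '2020-01-30 00:00:09' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.82';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1709.2'; timestamp = '2020-01-30 00:00:04' }
)

# Sort by the parsed date first, then by tag name within the same time
$sortedData = $data | Sort-Object -Property {
    [datetime]::ParseExact($_.timestamp, 'yyyy-MM-dd HH:mm:ss',
        [System.Globalization.CultureInfo]::InvariantCulture)
}, tag_name

$sortedData | Format-Table
```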

satya - 2/21/2020, 11:41:02 AM

Here is how to walk through the dataset


function walkthrough($data)
{
    $sortedData = $data | Sort-Object -Property "timestamp"
    $c = 0
    foreach ($row in $sortedData)  {
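        #Stop after the first 466 rows
        if ($c -eq 466)
        {
            break
        }
        $c = $c + 1
        $time = $row.timestamp
        #Look up the existing record for this time, if any
        $tid = getRecordForTime $time
        #Keep $null on the left so the test is safe even if $tid is an array
        if ($null -eq $tid)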
        {
            #record doesn't exist
            #p -message 'no record'
            addTimeInstance -time $time -row $row
        }
        else {
            #record exists
            #p -message "record exists"
            updateTimeInstance -time $time -tid $tid -row $row
        }
    }
}

Note: Not all functions are shown here
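
For readers who want to run the walk-through, here is a hypothetical sketch of the missing helpers. The originals are not shown, so everything in it is an assumption: the bodies of getRecordForTime, addTimeInstance and updateTimeInstance, the countRow helper, and the idea that apCount and wsCount count ActivePower and WindSpeed rows. Only two counters are included because the other tag names do not appear in the sample data.

```powershell
# Hypothetical sketch: the original helper functions are not shown in this journal.
class TimeInstanceDetailData {
    [string]$time
    [int]$apCount   # assumed: number of ActivePower rows at this time
    [int]$wsCount   # assumed: number of WindSpeed rows at this time
}

# Hashtable keyed by timestamp
$script:timeInstanceDataTable = @{}

# Hypothetical helper: bump the counter that matches the row's tag name
function countRow($tid, $row)
{
    switch ($row.tag_name) {
        'ActivePower' { $tid.apCount++ }
        'WindSpeed'   { $tid.wsCount++ }
    }
}

function getRecordForTime($time)
{
    # Indexing a hashtable with a missing key yields $null
    return $script:timeInstanceDataTable[$time]
}

function addTimeInstance($time, $row)
{
    $tid = [TimeInstanceDetailData]::new()
    $tid.time = $time
    countRow -tid $tid -row $row
    $script:timeInstanceDataTable[$time] = $tid
}

function updateTimeInstance($time, $tid, $row)
{
    # $tid refers to the object already stored in the table, so it is updated in place
    countRow -tid $tid -row $row
}

# Usage on three stand-in rows copied from the sample output
$rows = @(
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1709.2'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.82';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1710.2'; timestamp = '2020-01-30 00:00:09' }
)
foreach ($row in $rows) {
    $tid = getRecordForTime $row.timestamp
    if ($null -eq $tid) { addTimeInstance -time $row.timestamp -row $row }
    else { updateTimeInstance -time $row.timestamp -tid $tid -row $row }
}
$script:timeInstanceDataTable.Values | Sort-Object -Property time | Format-Table
```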

satya - 2/21/2020, 11:43:54 AM

Working with a few arrays


function printTIDArray()
{
    #Declare an array of objects
    [TimeInstanceDetailData[]]$tidArray = @()

    #Read the values from a hashtable and put them in the array
    foreach ($item in $script:timeInstanceDataTable.Values)
    {
       # p -message "item: $($item.time)"
        $tidArray += $item
    }

    #Print a header line
    hp -message "The TID array"

    #Sort the array
    $sortedTidArray = $tidArray | Sort-Object -Property "time"

    #Print the array in a tabular format
    $sortedTidArray | Format-Table
}

satya - 2/21/2020, 11:44:44 AM

Here is an example of that array


time                apCount wsCount wdCount tCount
----                ------- ------- ------- ------
2020-01-30 00:00:04      19      19       2      0
2020-01-30 00:00:07       0       0       1      0
2020-01-30 00:00:08       0       1       0      0
2020-01-30 00:00:09       6       5       1      0
2020-01-30 00:00:12       0       0       1      0
2020-01-30 00:00:14      13      13       0      0
2020-01-30 00:00:18       0       1       0      0
2020-01-30 00:00:19      19      18       1      0
2020-01-30 00:00:24      19      19       2      0
2020-01-30 00:00:28       0       1       1      0
2020-01-30 00:00:29      19      18       1      0
2020-01-30 00:00:30       0       0       1      0
2020-01-30 00:00:33       0       0       1      0
2020-01-30 00:00:39      19      19       1      0
2020-01-30 00:00:42       0       0       1      0
2020-01-30 00:00:44      19      19       0      0
2020-01-30 00:00:49      19      19       2      0
2020-01-30 00:00:53       0       1       1      0
2020-01-30 00:00:54      18      17       1      1
2020-01-30 00:00:55       1       1       0      0
2020-01-30 00:00:56       0       0       1      1
2020-01-30 00:01:03       0       2       0      0
2020-01-30 00:01:04      19      17       1      0
2020-01-30 00:01:09      18      18       1      1
2020-01-30 00:01:10       1       1       0      1
2020-01-30 00:01:14      13      10       0      0
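
A table of this shape can probably also be built with the built-in Group-Object cmdlet instead of a hashtable. This is a sketch on stand-in rows, not the method used above. It assumes apCount and wsCount count ActivePower and WindSpeed rows, and it leaves out the other two columns because their tag names do not appear in the sample data.

```powershell
# Stand-in data: three rows copied from the sample output earlier.
$data = @(
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1709.2'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.82';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1710.2'; timestamp = '2020-01-30 00:00:09' }
)

# One group per timestamp; count the rows of each tag inside the group
$counts = $data | Group-Object -Property timestamp | ForEach-Object {
    [pscustomobject]@{
        time    = $_.Name
        apCount = @($_.Group | Where-Object { $_.tag_name -eq 'ActivePower' }).Count
        wsCount = @($_.Group | Where-Object { $_.tag_name -eq 'WindSpeed' }).Count
    }
}

$sortedCounts = $counts | Sort-Object -Property time
$sortedCounts | Format-Table
```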

satya - 2/21/2020, 11:45:06 AM

This is one way to analyze what's going on with a large set of input data
