How to wrap long lines, *without* splitting words. The input is a very long string, with no newlines:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed eu laoreet turpis. Curabitur lacinia, risus ut rhoncus mattis, turpis lorem iaculis justo, nec ultrices arcu erat vitae felis. Pellentesque vulputate efficitur scelerisque. Etiam bibendum dignissim mauris
Query
<a href="https://docs.microsoft.com/en-us/powerquery-m/list-accumulate">List.Accumulate</a> is an aggregate function. I’m using it to “sum” — to add strings together. If the current line plus the next word is longer than 80 characters, then insert a newline first.
To find the length of the current line, I only want the length after the very last newline. Occurrence.Last returns the last match, else -1 if nothing is found.
let
LoremIpsum = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed eu laoreet turpis. Curabitur lacinia, risus ut rhoncus mattis, turpis lorem iaculis justo, nec ultrices arcu erat vitae felis. Pellentesque vulputate efficitur scelerisque. Etiam bibendum dignissim mauris",
// calculate length of string *after* the rightmost newline
Text_LengthAfterNewline = (string as text) as number =>
let
posLastNewline = Text.PositionOf(string, "#(lf)", Occurrence.Last),
posOffset = if posLastNewline <> -1 then posLastNewline else 0,
deltaLen = Text.Length(string) - posOffset
in
deltaLen,
// word wraps text
Text_WordWrap = (string as text, max_width as number) as text =>
let
words = Text.Split(string, " "),
accum_result = List.Accumulate(
words, "",
(state as text, current as text) as text =>
let
len = Text_LengthAfterNewline(state) + Text.Length(current) + 1,
maybeNewline =
if len > max_width then "#(lf)" else "",
accum_string = Text.Combine({state & maybeNewline, current}, " ")
in
accum_string
)
in
accum_result,
wrapped_text = Text_WordWrap(LoremIpsum, 80)
in
wrapped_text
The final result is 80 characters wide or less
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed eu laoreet turpis.
Curabitur lacinia, risus ut rhoncus mattis, turpis lorem iaculis justo, nec
ultrices arcu erat vitae felis. Pellentesque vulputate efficitur scelerisque.
Etiam bibendum dignissim mauris
Validating lengths of each line
let
Source = #"wrap lines",
validate_lengths =
let
lines = Text.Split(Source, "#(lf)"),
lengths = List.Transform(
lines,
each [ String = _, Length = Text.Length(_) ])
in
Table.FromRecords(
lengths,
type table[String = text, Length = number],
MissingField.Error )
in
validate_lengths
Check out the new tutorial at https://blog.powp.co/my-power-query-web-contents-cheat-sheet-6a5bbfdce5eb
Web.Contents without Refresh Errors
The main cause of Web.Contents not refreshing can be fixed by adding the options[Query] and options[RelativeaPath] parameters. (The UI doesn’t create them for you)
/* web request, act based on the HTTP Status Code returned
see more:
override default error handling: https://docs.microsoft.com/en-us/power-query/handlingstatuscodes
example wait-for loop: https://docs.microsoft.com/en-us/power-query/waitretry#manualstatushandling
*/
let
WikiRequest = (pageName as text) as any =>
let
BaseUrl = "https://en.wikipedia.org/wiki",
Options = [
RelativePath = pageName,
ManualStatusHandling = {400, 404}
],
// wrap 'Response' in 'Binary.Buffer' if you are using it multiple times
response = Web.Contents(BaseUrl, Options),
buffered = Binary.Buffer(response),
response_metadata = Value.Metadata(response),
status_code = response_metadata[Response.Status],
final_result = [
buffered = buffered,
response_metadata = response_metadata
]
in
final_result,
Queries = {"Cat", "DoesNot Exist fake page"},
Items = List.Transform(
Queries,
each WikiRequest( _ )
),
ResponseTable = Table.FromRecords(
Items,
type table[buffered = binary, response_metadata = record], MissingField.Error
),
#"Expanded HTTP Status Codes" = Table.ExpandRecordColumn(ResponseTable, "response_metadata", {"Response.Status"}, {"Response.Status"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded HTTP Status Codes",{{"Response.Status", Int64.Type}})
in
#"Changed Type"
WebRequest: Wrapper with Better Defaults
You can get the full file with extra comments: WebRequest.pq
let
/*
Example using this url:
(https://www.metaweather.com/api/location/search?lattlong=36.96,-122.02)
WebRequest(
"https://www.metaweather.com",
"api/location/search",
[ lattlong = "36.96,-122.02" ]
)
Details on preventing "Refresh Errors", using 'Query' and 'RelativePath':
- Not using Query and Relative path cause refresh errors:
(https://blog.crossjoin.co.uk/2016/08/23/web-contents-m-functions-and-dataset-refresh-errors-in-power-bi/)
- You can opt-in to Skip-Test:
(https://blog.crossjoin.co.uk/2019/04/25/skip-test-connection-power-bi-refresh-failures/)
- Debugging and tracing the HTTP requests
(https://blog.crossjoin.co.uk/2019/11/17/troubleshooting-web-service-refresh-problems-in-power-bi-with-the-power-query-diagnostics-feature/)
*/
WebRequest = (
staticPath as text, // domain
relativePath as text, // basically use everything after ".com" to "?"
optional query as nullable record, // url query string
optional asRaw as nullable logical, // use true if content is not Json
optional headers as nullable record // optional HTTP headers as a record
) as any =>
let
query = query ?? [],
asRaw = asRaw ?? false, // toggles calling Json.Document() or not
headers = headers ?? [
Accept="application/json"
],
baseUrl = staticPath,
options = [
RelativePath = relativePath,
Headers = headers,
Query = query
// optionally toggle handling errors for specific HTTP Status codes
// ManualStatusHandling = {400, 404}
],
// wrap 'Response' in 'Binary.Buffer' if you are using it multiple times
response = Web.Contents(staticPath, options),
metadata = Value.Metadata(response),
buffered = Binary.Buffer(response),
result = Json.Document(buffered)
in
[
response = if asRaw then buffered else result,
status_code = metadata[Response.Status],
metadata = metadata
]
in
WebRequest
Chaining Web.Contents to Merge Many Queries
let
response_locations = WebRequest(
"https://www.metaweather.com",
"api/location/search",
[ lattlong = "36.96,-122.02" ]
),
location_schema = type table[
distance = number, title = text,
location_type = text, woeid = number, latt_long = text
],
cityListing = Table.FromRecords(response_locations[response], location_schema, MissingField.Error),
city_mergedRequest = Table.AddColumn(
cityListing,
"LocationWeather",
(row as record) as any =>
let
woeid = Text.From(row[woeid]),
response = WebRequest(
"https://www.metaweather.com",
"api/location/" & woeid,
[]
)
in
response,
type any
)
in
city_mergedRequest
Html.Table – Parsing with CSS Selectors
Select a Single Element: an Image, url, text, etc…
This fetches the current heading text on the blog
let
Url = "https://powerbi.microsoft.com/en-us/blog/",
Response = Web.Contents( Url ),
/*
note: normally do not pass dynamic urls like this, see cheatsheet on preventing refresh errors
Non-tabular scraping like Images or any single elements, does not use a "RowSelector"
This CSS Selector finds exactly one element, the Page's Header Text
.section-featured-post .text-heading1 a
*/
HeaderText = Html.Table(
Response,
{
{ "Page Header", ".section-featured-post .text-heading1 a" }
}
)
in
HeaderText
Parsing Element’s Attributes
let
Url = "https://powerbi.microsoft.com/en-us/blog/",
Response = Web.Contents( Url ),
/*
The 3rd argument in "columnNameSelectorPairs" is the transformation function.
by default it uses:
each _[TextContent]
*/
HeaderAsElement = Html.Table(
Response,
{ { "Link", ".section-featured-post .text-heading1 a", each _ } }
),
ElementRecord = Table.ExpandRecordColumn(
HeaderAsElement, "Link",
{"TagName", "TextContent", "Attributes"}, {"TagName", "TextContent", "Attributes"}
),
ExpandedAttributes = Table.ExpandRecordColumn(
ElementRecord, "Attributes",
{"href", "rel", "title"}, {"attr.href", "attr.rel", "attr.title"}
)
in
ExpandedAttributes
Select Tables using your own CSS Selectors
Results of the RowSelectorResults of columnNameSelectorPairs
// Docs on Enum
let
Source = "https://docs.microsoft.com/en-us/previous-versions/dynamics/ax-2012/reference/gg841505(v=ax.60)",
// note: normally do not pass dynamic urls like this, see cheatsheet on preventing refresh errors
Response = Web.BrowserContents( Source ),
/*
Think of "RowSelector" as selecting a table row
Then for every row, you select "columns" using the "columnNameSelectorPairs" selector
The combination gives you a table cell.
For more on CSS Selectors, see: <https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors>
*/
columnNameSelectorPairs = {
// column names don't matter here, since I'm using .PromoteHeaders
{ "Column1", "TABLE.table > * > TR > :nth-child(1)" },
{ "Column2", "TABLE.table > * > TR > :nth-child(2)" },
{ "Column3", "TABLE.table > * > TR > :nth-child(3)" }
},
t1 = Html.Table(
Response, columnNameSelectorPairs,
[RowSelector = "TABLE.table > * > TR"]
),
t2 = Table.PromoteHeaders( t1, [PromoteAllScalars = true] ),
FinalTable = Table.TransformColumnTypes(
t2,
{ { "Name", type text }, { "Value", Int64.Type }, { "Description", type text} }
)
in
FinalTable
SQL Native Query
Parameterized SQL queries with Value.NativeQuery()
let
Source = Sql.Database("localhost", "Adventure Works DW"),
Test = Value.NativeQuery(
Source,
"SELECT * FROM DimDate
WHERE EnglishMonthName=@MonthName",
[
MonthName = "March",
DayName = "Tuesday"
]
)
in
Test
Other
Cartesian Product
let
#"Add Column Pair2" = Table.AddColumn(
Table_Pairs1, "Pairs2",
each Table_Pairs2,
Table.Type
),
#"Expanded Pairs" = Table.ExpandTableColumn(
#"Add Column Pair2",
"Pairs2",
{"Color", "Property"},
{"Color", "Property"}
)
in
#"Expanded Pairs"
You can make start with ‘enter data’, then duplicate that line. You end up with two steps, each are their own enter data
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcitKzEvOLFbSUfLJrEosSgEyTJRidaKVHHNSKxJTijITgSLuqcnZ+UDaEiwTkgpS5ZtfkgGkjJRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Species = _t, Age = _t]),
Source2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8krNU9JRciotKUktSsupBLINlWJ1opWCE4sSM4A858QSIGlkDBYMSU0Bclzy04GkiYFSbCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Species = _t, Age = _t]),
Final = Table.Combine( { Source, Source2 } )
in
Final
Convert Table to JSON
TableToJson = (source as table, optional encoding as nullable number) as text =>
let
encoding = encoding ?? TextEncoding.Utf8,
bin = Json.FromValue(source, encoding),
jsonAsText = Text.FromBinary(bin, encoding)
in
jsonAsText
let
// Converts a list of any type to text. Works well on most types
// although to support all cases, it requires more logic
mixedList = {4, "cat", #date(1999,5,9), 0.4},
ListAsText = List.Transform(mixedList, each Text.From(_)),
CsvText = Text.Combine( ListAsText, ", ")
in
//output: "4, cat, 5/9/1999, 0.4"
CsvText
Functions
Using Optional Parameters with Default Values
The Null_coalescing_operator ?? simplifies default values. encoding will be set to what the user passed, unless it’s null. In that case, it is set to TextEncoding.Utf8
let
Read_Json = (json as any, encoding as nullable number) as any =>
// calls Json.Document, using UTF8 by default
let
encoding = encoding ?? TextEncoding.Utf8,
result = Json.Document(json, encoding)
in
result
in
Read_Json
Mapping Function Calls based on Type
Caller chooses which type of conversioin to use, based on type names { date, datetime, datetimezone }
let
// 1] get a `type` from caller
// 2] return a difference function based on the type
GetTransformByType = (_type as type) as function =>
let
// originally from: <https://docs.microsoft.com/en-us/power-query/helperfunctions#tablechangetype>
func_transform =
if (Type.Is(_type, type date)) then Date.From
else if (Type.Is(_type, type datetime)) then DateTime.From
else if (Type.Is(_type, type datetimezone)) then DateTimeZone.From
else (t) => t // else return self
in
func_transform,
nowDtz = DateTimeZone.LocalNow(),
// invoke in 2 steps
toDate = GetTransformByType(type date),
callDate = toDate( DateTimeZone.FixedLocalNow() ),
// create, and invoke functions
Results = [
asDate = (GetTransformByType(type date))( nowDtz ),
asDateTime = (GetTransformByType(type datetime))( nowDtz ),
asDateTimeZone = (GetTransformByType(type datetimezone))( nowDtz )
]
in
Results
The output of diff -q path1 path2 is pretty verbose. This function
Converts full paths to relative
Differences are red.
Missing ‘diff’ ?
If git is installed, you may need to update your %PATH% environment variable.
# in your profile
$Env:Path = "$Env:ProgramFiles\Git\usr\bin", $Env:Path -join ';'
Stand-Alone function Compare-Directory
This is an isolated version of <a href="https://github.com/ninmonkey/Ninmonkey.Console">Ninmonkey.Console: Compare-Directory</a> . I removed all dependencies except coloring is provided by the module PoshCode/Pansies.
function Invoke-NativeCommand {
<#
.synopsis
wrapper to both call 'Get-NativeCommand' and invoke an argument list
.example
PS> # Use the first 'python' in path:
Invoke-NativeCommand 'python' -Args '--version'
#>
param(
# command name: 'python' 'ping.exe', extension is optional
[Parameter(Mandatory, Position = 0)]
[string]$CommandName,
# Force error if multiple binaries are found
[Parameter()][switch]$OneOrNone,
# native command argument list
[Alias('Args')]
[Parameter(Position = 1)]
[string[]]$ArgumentList
)
$binCommand = Get-NativeCommand $CommandName -OneOrNone:$OneOrNone -ea Stop
& $binCommand @ArgumentList
}
function Compare-Directory {
<#
.SYNOPSIS
Compare Two directories using 'diff'
.EXAMPLE
Compare-Directory 'c:\foo' 'c:\bar\bat'
#>
[Alias('DiffDir')]
param(
# Path1
[Parameter(Mandatory, Position = 0)]
[string]$Path1,
# Path2
[Parameter(Mandatory, Position = 1)]
[string]$Path2,
# Output original raw text?
[Parameter()][switch]$OutputRaw
)
$Base1 = $Path1 | Get-Item -ea Stop
$Base2 = $Path2 | Get-Item -ea Stop
$Label1 = $Base1 | Split-Path -Leaf | New-Text -fg 'green'
$Label2 = $Base2 | Split-Path -Leaf | New-Text -fg 'yellow'
"Comparing:
Path: $Path1
Path: $Path2
" | Write-Information
$stdout = Invoke-NativeCommand 'diff' -args @(
'-q'
$Base1
$Base2
)
$outColor = $stdout
$outColor = $outColor -replace [regex]::Escape($path1), $Label1
$outColor = $outColor -replace [regex]::Escape($path2), $Label2
$outColor = $outColor -replace 'Only in', (New-Text 'Only In' -fg 'red')
$outColor = $outColor -replace 'Differ', (New-Text 'Differ' -fg 'red')
if ($OutputRaw) {
h1 'Raw' | Write-Information
$stdout
return
}
$outColor
}
function Get-NativeCommand {
<#
.synopsis
wrapper that returns Get-Item on a native command
.example
# if you want an error when multiple options are found
PS> Get-NativeCommand python -OneOrNone
.example
# note: this is important, $cmdArgs to be an array not scalar for '@' usage
$binPy = Get-NativeCommand python
$cmdArgs = @('--version')
& $binPy @cmdArgs
.example
#>
[cmdletbinding()]
param(
# Name of Native .exe Application
[Parameter(Mandatory, Position = 0, ValueFromPipeline)]
[object]$CommandName,
# One or None: Raise errors when there are more than one match
[Parameter()][switch]$OneOrNone
)
process {
try {
$query = Get-Command -Name $CommandName -All -CommandType Application -ea Stop
| Sort-Object Name
} catch [CommandNotFoundException] {
Write-Error "ZeroResults: '$CommandName'"
return
}
if ($OneOrNone -and $query.Count -gt 1) {
$query | Format-Table -Wrap -AutoSize -Property Name, Version, Source
Write-Error "OneOrNone: Multiple results for '$CommandName'"
return
}
if ($query.Count -gt 1) {
$query = $query | Select-Object -First 1
}
Write-Debug "Using Item: $($query.Source)"
$query
}
}
Sometimes you’ll need to run a command with the same input with different logic. This can be a hassle using a slow command like Get-ADUser or Get-ChildItem on a lot of files like ~ (Home) with -Depth / -Recurse
Power BI Discord asked the question: > How do you use duplicate keys, and dynamic urls with <a href="https://docs.microsoft.com/en-us/powerquery-m/web-contents">Web.Contents</a> ? (They were using a web API that required duplicate keys) After I wrote this, Chris Webb found an easier solution.
Requirements
You can’t use a regular record because keys must be distinct. Query = [ Key1 = 1, Key1 = 10] will throw an error.
You can’t put the **dynamic url** in the first argument Web.Contents or else refreshes can break
let
QueryStr_UsingDuplicateKeys = (key as text, values as list) as text =>
// values are the 'value' of 'key'-> 'value' pairs
let
escapedList = List.Transform(
values,
each
key & "=" & Uri.EscapeDataString( Text.From(_) )
),
joinedArgs = Text.Combine(escapedList, "&")
in
joinedArgs
in
QueryStr_UsingDuplicateKeys
Note: BaseUrl is for the static part of the url. Everything else should be in options[RelativePath] or options[Query] See docs: Web.Contents for details.
Which results in the url: <a href="http://jsonplaceholder.typicode.com/comments">http://jsonplaceholder.typicode.com/comments</a> This is a good place to use Power Query‘s null coalesce operator (Which isn’t in the official docs)
The default UI sets your column to type any.You can use the optional argument of Table.AddColumn to set it to numberOr you can declare your function’s return type
Why doesn’t the original [Num] * 2 work?
Powerquery does not know what type will be returned by your function. That’s because each is by definition a function that returns type any