Commit message | Author | Age | Files | Lines
* meson.build: prepare for gentoo-functions-1.7.2 [HEAD, gentoo-functions-1.7.2, master] | Sam James | 7 days | 1 | -1/+1
    Signed-off-by: Sam James <sam@gentoo.org>
* Render the non-bash srandom() implementation faster | Kerin Millar | 7 days | 2 | -4/+109
    Presently, there are three implementations of srandom(), one of which
    is the preferred implementation for shells other than bash. It is a
    little on the slow side as it has to fork and execute both od(1) and
    tr(1) every time, just to read 4 bytes. Accelerate it by having the
    shell maintain its own entropy pool of up to 512 hex digits in size.

    Consider the following benchmark.

        i=0; while [ $((i += 1)) -le 30000 ]; do srandom; done >/dev/null

    As conducted with dash on a system with a 2nd generation Intel Xeon, I
    obtained the following figures.

        BEFORE
        real 0m49.878s
        user 1m1.985s
        sys  0m17.035s

        AFTER
        real 0m12.866s
        user 0m12.559s
        sys  0m0.962s

    It should be noted that the optimised routine will only be utilised in
    cases where the kernel is Linux and the shell has not forked itself.

        $ uname
        Linux
        $ srandom # uses the fast path
        $ number=$(srandom) # subshell; probably uses the slow path
        $ srandom | { read -r number; } # ditto

    Still, there are conceivable use cases for which this optimisation may
    prove useful. Below is an example in which it is known in advance that
    up to 100 random numbers are required, and where writing them to
    temporary storage is not considered to be a risk.

        i=0
        tmpfile=${TMPDIR:-/tmp}/random-numbers.$$.$(srandom)
        while [ $((i += 1)) -le 100 ]; do
            srandom
        done > "$tmpfile"
        while read -r number; do
            do_something_with "$number"
        done < "$tmpfile"

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
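The pooling idea described in this message can be sketched roughly as follows. This is only an illustration and not the gentoo-functions code: the names pool_random and demo_pool are hypothetical, and the sketch assumes that /dev/urandom, od(1) and tr(1) are available.

```shell
# Hypothetical sketch of a shell-maintained entropy pool (not the real srandom()).
demo_pool=

pool_random() {
	# Refill with 256 random bytes (512 hex digits) once fewer than 8 digits remain.
	if [ "${#demo_pool}" -lt 8 ]; then
		demo_pool=$(od -vAn -N256 -tx1 /dev/urandom | tr -d '[:space:]')
		[ "${#demo_pool}" -ge 8 ] || return 1
	fi
	# Take the leading 8 hex digits using parameter expansion alone (no fork).
	demo_rest=${demo_pool#????????}
	demo_hex=${demo_pool%"$demo_rest"}
	demo_pool=$demo_rest
	# Mask to 31 bits so that the result stays within 0..2147483647.
	printf '%d\n' "$(( 0x$demo_hex & 0x7FFFFFFF ))"
}
```

Only one call in 64 has to fork in order to refill the pool. The sketch leaves out the subshell detection that the message alludes to, so it should only be called directly; inside a command substitution the parent's copy of the pool is never consumed and values would repeat.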
* Remedy false positives in categories SC2034 and SC2154 | Kerin Millar | 7 days | 3 | -5/+9
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Exempt _should_throttle() from shellcheck SC2317 | Kerin Millar | 7 days | 1 | -0/+1
    The _should_throttle() function gets the best of shellcheck, which
    incorrectly reports that there is unreachable code.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: consistently align the test parameter declarations | Kerin Millar | 7 days | 1 | -132/+132
    This is merely a whitespace cleanup.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: don't increment testnum by 2 for test_ebegin() | Kerin Millar | 7 days | 1 | -8/+10
    Also, restore the correct test_description string, which was being
    lost in a subshell.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: comment as to why test_quote_args() fails for yash | Kerin Millar | 7 days | 1 | -0/+6
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Use the -nt and -ot test primaries again rather than depend on GNU find | Kerin Millar | 7 days | 2 | -69/+113
    As regards the test(1) utility, the POSIX.1-2024 specification defines
    the -nt and -ot primaries as standard features. Given that the
    specification in question was only recently published, this would not
    normally be an adequate reason for using them in gentoo-functions, in
    and of itself. However, I was already aware that these primaries are
    commonly implemented and have been so for years. So, I decided to
    evaluate a number of shells and see how things stand now. Here is a
    list of the ones that I tested:

    - ash (busybox 1.36.1)
    - dash 0.5.12
    - bash 5.2.26
    - ksh 93u+
    - loksh 7.5
    - mksh 59c
    - oksh 7.5
    - sh (FreeBSD 14.1)
    - sh (NetBSD 10.0)
    - sh (OpenBSD 7.5)
    - yash 2.56.1

    Of these, bash, ksh93, loksh, mksh, oksh, OpenBSD sh and yash appear
    to conform with the POSIX.1-2024 specification. The remaining four
    fail to conform in one particular respect, which is as follows.

        $ touch existent
        $ set -- existent nonexistent
        $ [ "$1" -nt "$2" ]; echo "$?" # should be 0
        1
        $ [ "$2" -ot "$1" ]; echo "$?" # should be 0
        1

    To address this, I discerned a reasonably straightforward workaround
    that involves testing both whether the file under consideration exists
    and whether the variable keeping track of the newest/oldest file has
    yet been assigned to.

    As far as I am concerned, the coverage is more than adequate for both
    primaries to be used by gentoo-functions. As such, this commit adjusts
    the following three functions so as to do exactly that.

    - is_older_than()
    - newest()
    - oldest()

    It also removes the following functions, since they are no longer
    used.

    - _find0()
    - _select_by_mtime()

    With this, GNU findutils is no longer a required runtime dependency.
    Of course, should a newly introduced feature of gentoo-functions
    benefit from the presence of findutils in the future, there is no
    reason that it cannot be brought back in that capacity.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
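The workaround described in this message might look something like the sketch below. It is an assumption-laden illustration rather than the actual newest() function: newest_of and newest_path are hypothetical names, and the pathnames are taken from the positional parameters only.

```shell
# Hypothetical sketch: print the most recently modified of the given paths.
newest_of() {
	newest_path=
	for path do
		# Skip nonexistent files so that -nt is never asked to compare
		# against one, which is where some shells deviate.
		[ -e "$path" ] || continue
		# The first existing file is accepted unconditionally because the
		# tracking variable has not yet been assigned to.
		if [ -z "$newest_path" ] || [ "$path" -nt "$newest_path" ]; then
			newest_path=$path
		fi
	done
	# Fail if no existing file was seen.
	[ -n "$newest_path" ] && printf '%s\n' "$newest_path"
}
```

Because both operands of -nt are known to exist by the time the primary is evaluated, the result should not depend on how a given shell treats a missing operand.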
* test-functions: comment as to the implications of test_local() failing | Kerin Millar | 7 days | 1 | -0/+8
    In particular, comment as to why the test can be expected to fail for
    ksh93 and - in some cases - yash.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: account for the potential absence of test(1) as a builtin | Kerin Millar | 7 days | 1 | -5/+9
    Presently, the test_whenceforth() function protects itself from being
    adversely affected by printf(1) not being a builtin utility. Consider
    the following test.

        PATH=. whenceforth -x newer/file

    Owing to the modification of PATH, it becomes impossible to execute
    any of the standard utilities unless they happen to be builtins. The
    workaround is to temporarily define printf as a function which duly
    executes the external utility.

    Having run the test suite with the yash shell, it has served as a
    sharp reminder that one cannot assume that test(1) is always available
    as a builtin either. In fact, yash implements test(1) as a
    "substitutative built-in command". Below is the relevant material from
    its manual.

    - https://magicant.github.io/yash/doc/builtin.html#types
    - https://magicant.github.io/yash/doc/exec.html#search
    - https://magicant.github.io/yash/doc/index.html#builtins

    It is a curious thing, to say the least. Essentially, substitutative
    builtins can only be used for as long as an executable of the same
    name can be found in PATH.

    Since the purpose of test_whenceforth() is not to directly evaluate
    the behaviour of the test(1) utility, this commit implements the same
    safeguard for test(1) as is present for printf(1).

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Render _update_time() a no-op for the yash shell | Kerin Millar | 7 days | 2 | -1/+14
    When integer overflow occurs in a non-interactive yash shell, it
    prints "yash: arithmetic: overflow" as a diagnostic message before
    proceeding to exit. That makes it extremely difficult for the
    arithmetic in the _should_throttle() function to be implemented safely
    for it.

    For now, ensure that _update_time() does nothing for yash but return a
    non-zero status code. In turn, this disables the rate limiting feature
    for yash.

    Additionally, refrain from running test_update_time() and
    test_should_throttle() for yash in test-functions. The former would
    only amount to a waste of time and the latter would be guaranteed to
    fail.

    For the record, my testing was performed with yash 2.56.1.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Handle integer overflow as a special case in _should_throttle() | Kerin Millar | 7 days | 2 | -7/+62
    At the point that the genfun_time variable overflows, guarantee that
    the _should_throttle() function behaves as if no throttling should
    occur rather than proceed to perform arithmetic based on the result of
    deducting genfun_last_time from genfun_time.

    Further, guarantee that the _should_throttle() function behaves as if
    no throttling should occur upon the very first occasion that it is
    called, provided that the call to _update_time() succeeds.

    Finally, add a test case.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Rename quote_args_bash() to _quote_args_bash() | Kerin Millar | 7 days | 1 | -21/+24
    For it need not be in the public name space.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: choose a better variable name for storing the temp dir | Kerin Millar | 7 days | 1 | -5/+5
    The name, dir, is rather generic. Rename it to global_tmpdir to
    diminish the likelihood of an accidental name space conflict.

    Also, don't pass the -f option to rm(1) at the point that the
    directory is to be removed.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Implement a variant of quote_args() optimised for bash | Kerin Millar | 7 days | 1 | -0/+26
    Add the quote_args_bash() function, which will be called from
    quote_args() under the appropriate circumstances. It is faster than
    the sh implementation, not merely because it takes advantage of the
    ${parameter@Q} form of parameter expansion, but also because executing
    external utilities exacts a greater performance toll for bash than it
    does for, say, dash. The difference is appreciable if running the test
    suite.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: silence several shellcheck false-positives | Kerin Millar | 7 days | 1 | -1/+3
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: declare local variables where possible | Kerin Millar | 7 days | 1 | -2/+31
    Given that test-functions bails out immediately in the absence of a
    conventional local builtin, one might as well. Besides, it would be
    trivial to eliminate local in the future, if so desired.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: have three tests employ callback functions | Kerin Millar | 7 days | 1 | -51/+48
    Convert test_local(), test_ebegin() and test_quote_args() so as to
    declare and use callbacks, just like the other tests. An appreciable
    code cleanup is the result.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Avoid unspecified behaviour around simple commands in general | Kerin Millar | 7 days | 2 | -24/+35
    As mentioned by the previous commit, the Shell Command Language leaves
    it unspecified as to whether variable assignments affecting the
    execution environment of a simple command charged with executing a
    function (that is not the implementation of a standard utility) shall
    persist after the completion of the function.

    It transpires that modifying gentoo-functions so as to steer clear of
    this pitfall isn't particularly difficult so this commit does exactly
    that. Most of the changes are in test-functions but functions/rc.sh
    also required some minor changes regarding the use of the
    GENFUN_CALLER variable.

    With this, loksh very nearly passes the test suite. There is one
    individual test that continues to fail, although it looks as though
    that may be caused by a genuine bug on the part of the shell. That
    will require investigating in its own right.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: test for simple commands persisting environmental changes | Kerin Millar | 7 days | 1 | -1/+14
    Some implementations allow for alterations made to the execution
    environment to persist beyond the scope of a simple command. Consider
    loksh as a case in point.

        $ f() { :; }
        $ unset LEAKED
        $ LEAKED=1 /bin/true; echo "LEAKED = $LEAKED"
        LEAKED =
        $ LEAKED=1 f; echo "LEAKED = $LEAKED"
        LEAKED = 1

    Strictly speaking, such behaviour is permitted. The Shell Command
    Language specification states:

    """
    If the command name is a function that is not a standard utility
    implemented as a function, variable assignments shall affect the
    current execution environment during the execution of the function. It
    is unspecified:

    - Whether or not the variable assignments persist after the completion
      of the function
    - Whether or not the variables gain the export attribute during the
      execution of the function
    - Whether or not export attributes gained as a result of the variable
      assignments persist after the completion of the function (if
      variable assignments persist after the completion of the function)
    """

    Unfortunately, loksh elects not to be aligned with the practices of
    the overwhelming majority of implementations in this regard. For now,
    have test-functions detect and abort for shells that go against the
    grain. I shall consider reviewing and adapting gentoo-functions to
    account for such unspecified behaviour but it is not an immediate
    priority.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
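A check of the kind that this message describes could be sketched as below. It is not the code from test-functions: assignments_persist, leak_probe_fn and LEAK_PROBE are hypothetical names chosen for the illustration.

```shell
# Hypothetical sketch: succeed if an assignment prefixed to a function call
# is still in effect after the function has returned.
assignments_persist() {
	leak_probe_fn() { :; }
	unset -v LEAK_PROBE
	LEAK_PROBE=1 leak_probe_fn
	if [ "${LEAK_PROBE+set}" = set ]; then
		# Clean up after ourselves before reporting the leak.
		unset -v LEAK_PROBE
		return 0
	fi
	return 1
}
```

A test harness could call it once at startup, for example `if assignments_persist; then echo "assignments persist in this shell" >&2; fi`, and decide how to proceed from there.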
* test-functions: avoid unspecified behaviour in test_quote_args() | Kerin Millar | 7 days | 1 | -1/+1
    In test_quote_args(), there is the following code.

        fmt=$(printf '\%o' "$i")

    However, the behaviour of the <backslash> character followed by the
    <percent-sign> character is unspecified. Since it is intended to be
    taken as a literal backslash, fix it by writing it thus.

        fmt=$(printf '\\%o' "$i")

    Doing so addresses a spurious test failure when using the loksh shell.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
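To illustrate what the corrected format string produces, here is a small sketch. The function name byte_from_code is hypothetical and is not part of test-functions.

```shell
# Hypothetical helper: print the single byte whose code is given in decimal.
byte_from_code() {
	# '\\%o' yields a literal backslash followed by the octal digits of the
	# argument, such as \101 for 65.
	fmt=$(printf '\\%o' "$1")
	# The result is then used as a format string, in which a backslash
	# followed by up to three octal digits denotes a single byte.
	printf "$fmt"
}
```

For instance, `byte_from_code 65` prints the letter A, since 65 is 101 in octal.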
* Have srandom() employ an upper bound of 2^31-1 | Kerin Millar | 7 days | 2 | -9/+45
    In the case of some shells - mksh, at least - the maximum value of an
    integer is 2147483647. Such is a consequence of implementing integers
    as signed int rather than signed long, even though doing so
    contravenes the specification. Reduce the output range of srandom() so
    as to be between 0 and 2147483647, rather than 0 and 4294967295.

    A change of this scope would normally justify incrementing
    GENFUN_API_LEVEL but I shall not do so on this occasion. My rationale
    is that >=gentoo-functions-1.7 has not yet had enough exposure for
    srandom() to be in use by other projects.

    Additionally, have test-functions test srandom() 10 times instead of
    5.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: check numerical bounds with awk in test_srandom() | Kerin Millar | 7 days | 1 | -2/+1
    Use awk(1) to test whether the numbers produced by the srandom()
    function are within bounds. One cannot necessarily rely upon the shell
    to perform this task. Consider mksh(1) as a case in point. Contrary to
    the specification, it implements integers as signed int rather than
    signed long. Consequently, it can only handle numbers between
    -2147483648 and 2147483647, resulting in easily reproducible test
    failures caused by overflow.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
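Delegating the comparison to awk(1) might be sketched as follows. The helper name within_bound and its two-argument interface are assumptions made for the illustration; the actual check in test_srandom() may be written differently.

```shell
# Hypothetical helper: succeed if $1 is a non-negative decimal integer that
# does not exceed the bound given as $2.
within_bound() {
	# Reject anything other than a plain run of decimal digits.
	case $1 in ''|*[!0-9]*) return 1 ;; esac
	# awk compares the values as double-precision numbers, so the width of
	# the shell's own integer type no longer matters.
	awk -v n="$1" -v max="$2" 'BEGIN { exit !(n <= max) }'
}
```

For example, `within_bound 4294967295 4294967295` succeeds irrespective of whether the calling shell could represent that number in its own arithmetic.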
* Avoid a subshell for is_identifier() | Kerin Millar | 7 days | 2 | -5/+13
    Also, extend the coverage of the test suite a little further.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Re-wrap a comment in get_nprocs() | Kerin Millar | 7 days | 1 | -1/+2
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Document POSIXLY_CORRECT as an influential variable | Kerin Millar | 7 days | 1 | -0/+1
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: refactor the newest() test case | Kerin Millar | 7 days | 1 | -17/+29
    Rework the test case for the newest() function in accordance with the
    recently added test case for the oldest() function. The resulting code
    is more pleasant to read and maintain.

    In doing so, an obscure bug has been addressed. Hitherto, an empty
    NUL-terminated record had erroneously been conveyed to newest() for
    just one of the 28 individual sub-tests being conducted.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: test the oldest() function | Kerin Millar | 7 days | 1 | -0/+67
    Test the oldest() function in addition to the newest() function.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* Make _select_by_mtime() work correctly for paths read from STDIN | Kerin Millar | 7 days | 2 | -3/+19
    The _select_by_mtime() function is called by both newest() and
    oldest(). Pathnames may be specified as positional parameters or as
    NUL-separated records to be read from the standard input.
    Unfortunately, the latter interface does not work at all. Rectify this
    by checking whether the number of parameters is greater than 0, rather
    than greater than or equal to 0.

    Also, extend the existing test case in such a way that the interface
    in question is tested.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: check that genfun_time is greater than -1 | Kerin Millar | 7 days | 1 | -1/+1
    After all, it is never expected to be negative.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* test-functions: try to test a locale whose radix character isn't U+2E | Kerin Millar | 7 days | 1 | -17/+40
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* meson.build: avoid passing an absolute path to install_subdir() | Kerin Millar | 7 days | 1 | -1/+1
    Otherwise, some of the files end up outside of EPREFIX.

    Fixes: 2a58c0e462538b7fb2d12cd95157a9aaf2b7f7ff
    Bug: https://bugs.gentoo.org/937463
    Reported-by: Fabian Groffen <grobian@gentoo.org>
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* meson.build: prepare for gentoo-functions-1.7.1 [gentoo-functions-1.7.1] | Sam James | 13 days | 1 | -1/+1
    Signed-off-by: Sam James <sam@gentoo.org>
* Ensure a radix character of U+2E in _update_time() | Kerin Millar | 13 days | 2 | -2/+26
    I overlooked that bash respects the radix character defined by the
    locale in the course of synthesizing the value of the EPOCHREALTIME
    variable. Set LC_NUMERIC as C to guarantee that the radix character is
    considered as U+2E (FULL STOP) within the scope of the bash-specific
    function. Doing so also addresses a distinct issue whereby the
    invocation of printf was sensitive to the implied value of LC_NUMERIC.

    Another way to address this would have been to set LC_ALL as C. I
    decided not to because it would decrease the likelihood of the
    relevant diagnostic messages being rendered in the user's native
    language.

    Additionally, add a test case.

    Closes: https://bugs.gentoo.org/937376
    Reported-by: Christian Bricart <christian@bricart.de>
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
    Signed-off-by: Sam James <sam@gentoo.org>
* meson.build: prepare for gentoo-functions-1.7 [gentoo-functions-1.7] | Sam James | 14 days | 1 | -1/+1
    Signed-off-by: Sam James <sam@gentoo.org>
* Do not yet deprecate RC_NOCOLOR | Kerin Millar | 14 days | 1 | -3/+1
    It would be sensible to conduct a survey to determine whether - and
    where - it is being used beforehand.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Add the assign() and deref() functions | Kerin Millar | 14 days | 2 | -0/+97
    These two functions are primarily intended to mitigate the appalling
    use of eval in projects such as netifrc and openrc. Consider the
    following code.

        net/iproute2.sh:29: eval netns="\$netns_${IFVAR}"

    This could instead be written as:

        deref "netns_${IFVAR}" netns

    Alternatively, it could be written so as to use a command
    substitution:

        netns=$(deref "netns_${IFVAR}")

    Either method would protect against illegal identifier names and code
    injection. Consider, also, the following code.

        net/iproute2.sh:185: eval "$x=$1" ; shift ;;

    This could instead be written as:

        assign "$x" "$1"

    As with deref, it would protect against illegal identifier names and
    code injection.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
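The protection described in this message comes down to validating the name before it ever reaches eval. The sketch below illustrates that idea only; it is not the gentoo-functions implementation, and the names is_ident_sketch, deref_sketch and assign_sketch are hypothetical. The bracket expressions assume a locale in which A-Z, a-z and 0-9 match only ASCII characters.

```shell
# Hypothetical sketch: succeed only for names that are valid identifiers.
is_ident_sketch() {
	case $1 in
		''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1
	esac
}

# Print the value of the variable named by $1, refusing unsafe names.
deref_sketch() {
	is_ident_sketch "$1" || return
	eval "printf '%s\\n' \"\$$1\""
}

# Assign the value $2 to the variable named by $1, refusing unsafe names.
assign_sketch() {
	is_ident_sketch "$1" || return
	# The value is never parsed as code because $2 is expanded by eval.
	eval "$1=\$2"
}
```

With these definitions, `assign_sketch netns_eth0 blue` followed by `deref_sketch netns_eth0` prints blue, whereas a name such as 'x;reboot' is rejected before eval is reached.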
* test-functions: add several shellcheck exemptions | Kerin Millar | 2024-08-03 | 1 | -1/+3
    Notably, SC2317 and SC3043 in the global scope. The former produces
    false positives whereas the latter permits the use of local.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* test-functions: jettison a few shellcheck exemptions | Kerin Millar | 2024-08-03 | 1 | -2/+1
    They are no longer applicable.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Alter a variable name in quote_args() | Kerin Millar | 2024-08-03 | 1 | -2/+2
    Now that POSIX.1-2024 has been ratified, strictly_posix no longer
    makes sense as a variable name.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Have chdir() enforce POSIX interpretation 1047 | Kerin Millar | 2024-08-03 | 2 | -19/+11
    POSIX.1-2024 (Issue 8) requires the cd builtin to raise an error when
    given an empty directory operand. However, various implementations
    have yet to catch up. Given that it is a sensible change, let's have
    the chdir() function behave accordingly.

    Further, since doing so renders the test_chdir_noop test useless, get
    rid of it. The purpose that the test served is now subsumed by
    test_chdir.

    Closes: https://bugs.gentoo.org/937157
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
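The behaviour described in this message can be approximated by the sketch below. It is deliberately reduced and is not the real chdir() function, which may handle further edge cases; the name chdir_sketch and the wording of its diagnostic are assumptions.

```shell
# Hypothetical sketch: refuse an empty operand rather than rely on the
# shell's cd builtin to do so.
chdir_sketch() {
	if [ -z "$1" ]; then
		printf 'chdir_sketch: null directory\n' >&2
		return 1
	fi
	# Neutralise CDPATH for the duration of the call so that the operand is
	# always interpreted relative to the current working directory.
	CDPATH= cd -- "$1"
}
```

With this, `chdir_sketch ""` fails with a diagnostic in every shell, whether or not its cd builtin has caught up with the requirement.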
* Have hr() employ a divide-by-16 strategy | Kerin Millar | 2024-08-02 | 2 | -8/+8
    A factor of 16 was shown to be faster on average by timing how long it
    takes for bash to print a rule 5000 times for all lengths between 40
    and 132, inclusive.

        Factor  Time       StdDev
        8       87.004000  3.961607
        16      82.893000  3.971257

    Further, 16 remains a factor of 80, which is often the number of
    columns that a terminal emulator is initialised with.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
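The strategy of appending characters in blocks of 16 can be sketched as follows. This is an illustration under assumptions rather than the hr() function itself: rule_sketch is a hypothetical name, the rule character is fixed as a hyphen, and the length is taken as the sole argument.

```shell
# Hypothetical sketch: print a rule of $1 hyphens without forking.
rule_sketch() {
	rule_len=$1
	rule_out=
	# Build a block of 16 characters by repeated doubling.
	c1=-
	c2=$c1$c1
	c4=$c2$c2
	c8=$c4$c4
	c16=$c8$c8
	# Append 16 characters at a time for as long as that is possible.
	while [ "$rule_len" -ge 16 ]; do
		rule_out=$rule_out$c16
		rule_len=$(( rule_len - 16 ))
	done
	# Append the remainder one character at a time.
	while [ "$rule_len" -gt 0 ]; do
		rule_out=$rule_out$c1
		rule_len=$(( rule_len - 1 ))
	done
	printf '%s\n' "$rule_out"
}
```

For a length of 80, the first loop runs five times and the second not at all, which is the case that the message singles out.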
* Jettison the bash-specific hr() implementation | Kerin Millar | 2024-08-02 | 1 | -15/+9
    Testing the BASH variable for non-emptiness is an inadequate pretext
    for activating the bash-optimised code path. Instead, the test would
    have to be implemented like so ...

        if ! case ${BASH_COMPAT} in 3?|4[012]) false ;; esac && _has_bash 4 3
        then
            ...
        fi

    Given that hr() is not expected to be called often, and that the sh
    code was already improved by employing a divide-by-8 strategy, I don't
    consider it to be worth the trouble.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Adhere to the Allman style for _select_by_mtime() | Kerin Millar | 2024-08-02 | 1 | -1/+2
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Explain that get_nprocs() is called by parallel_run() | Kerin Millar | 2024-08-02 | 1 | -4/+4
    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Move is_subset() to experimental | Kerin Millar | 2024-08-02 | 3 | -41/+41
    I'm not yet ready to commit to it being among the core functions for
    the inaugural API level.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Render hr() faster still for shells other than bash | Kerin Millar | 2024-08-02 | 2 | -2/+6
    Reduce the number of loop iterations by initially trying to append
    characters 8 at a time.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Render hr() faster | Kerin Millar | 2024-08-02 | 2 | -15/+21
    Render hr() faster by eliminating the requirement to fork and execute
    any external utilities after having established the intended length of
    the rule. Also, use printf -v and string-replacing parameter expansion
    where the shell is found to be bash. Doing so helps considerably
    because bash is very slow at looping.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
* Render contains_all() and contains_any() faster | Kerin Millar | 2024-08-02 | 2 | -189/+74
    Re-implement the contains_all() and contains_any() functions in such a
    way that they are faster than their forebears by an order of
    magnitude. In order to achieve this level of performance, the value of
    IFS is no longer taken into account. Instead, words are always
    presumed to be separated by characters matching the [[:space:]]
    character class.

    Consider a scenario in which the FEATURES variable is comprised of 33
    words.

        $ FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"

    Let's say that the contains_any function is used to search for 10
    words, where only the 10th can be matched and where FEATURES must be
    scanned in its entirety exactly 10 times.

        $ contains_any "$FEATURES" the quick brown fox jumped over the lazy hen xattr

    The following benchmarks show how long it took to call the function
    50,000 times consecutively on a system with an Apple M1 CPU for both
    the original and new implementations. This is with the dash shell.

        contains_any (BEFORE)
        real 0m19.135s
        user 0m16.781s
        sys  0m2.258s

        contains_any (AFTER)
        real 0m1.571s
        user 0m1.497s
        sys  0m0.063s

    Now let's say that the contains_all function is used to search for 3
    words, where all can be matched while requiring for FEATURES to be
    scanned in its entirety at least once.

        $ contains_all "$FEATURES" assume-digests news xattr

    Again, the following benchmarks show how long it took to call the
    function 50,000 times consecutively.

        contains_all (BEFORE)
        real 1m8.052s
        user 0m19.363s
        sys  0m42.742s

        contains_all (AFTER)
        real 0m0.689s
        user 0m0.627s
        sys  0m0.057s

    The performance improvements are similarly impressive if using bash.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>
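One way of matching whole words delimited by [[:space:]] characters, without consulting IFS or forking, is to use case patterns. The sketch below shows that general technique; it is an assumption as to approach and is not claimed to be the code of contains_any(). The name contains_any_sketch is hypothetical.

```shell
# Hypothetical sketch: succeed if the whitespace-separated list in $1
# contains any of the remaining arguments as a whole word.
contains_any_sketch() {
	haystack=$1
	shift
	for needle do
		case $haystack in
			# The word may be the entire list, lead it, trail it, or be
			# situated somewhere in between.
			"$needle"|"$needle"[[:space:]]*|*[[:space:]]"$needle"|*[[:space:]]"$needle"[[:space:]]*)
				return 0
				;;
		esac
	done
	return 1
}
```

Quoting "$needle" within the patterns ensures that it is matched literally, so a word such as sandbox does not match usersandbox or ipc-sandbox.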
* Render quote_args() robust and implement a test case | Kerin Millar | 2024-08-02 | 2 | -25/+60
    Coerce the effective character set as being C (US-ASCII) in the course
    of executing awk(1). Some implementations are strict and will
    otherwise fail in situations where the bytes cannot be decoded.

        $ uname -o
        Darwin
        $ echo "$LC_ALL"
        en_GB.UTF-8
        $ printf '\200' | awk '/[\001-\037\177-\377]/'
        awk: towc: multibyte conversion failure on: ''

    In the above case, awk aborts because it has a need to decode the
    input, which turns out not to be valid UTF-8. Now, it is rather beyond
    the purview of quote_args() to guarantee that its parameters adhere to
    any particular character encoding. Fortunately, for it to contend with
    strings on a byte-by-byte basis is acceptable.

    Refactor the code somewhat. The behaviour has been adjusted so as to
    be virtually identical to that of the "${*@Q}" expansion in bash, with
    the exception that the ESC character is rendered as $'\e' instead of
    $'\E'. Such an exception is necessary for POSIX.1-2024 conformance,
    wherein dollar-single-quotes are now a standard feature (see section
    2.2.4 of the Shell Command Language).

    Revise the comment preceding the function so as to accurately document
    its behaviour.

    Finally, add a test case. It works by calling quote_args for every
    possible single-byte string before calculating a CRC checksum for the
    cumulative output and comparing it against a pre-determined value.

    Signed-off-by: Kerin Millar <kfm@plushkava.net>