Discussion:
WANTED: a C++ library for pattern matching filenames (wildcarding)
Andrew
2010-12-07 12:41:00 UTC
I am writing some code that traverses a directory, looking for
matching filenames. At the moment the match is done using regular
expressions, using boost::regex. But I am not completely happy with
this. I reckon most people will expect pattern matching to work with
wildcarding as supported by the operating system. But I do not know of
a library that does that.

Of course on Unix it is very rare that one has to worry about such
things because wildcards on the command line are expanded by the
shell. I know that DOS/Windoze doesn't do this. Sigh. But every now
and then an app needs to be able to do the matching itself internally
so one cannot use the facilities provided by the shell. It has to be
done by the code.

Does anyone know of a cross-platform library for doing this please?
BTW, I do not want to use the facilities in boost::filesystem. I have
been burnt by it before with insufficient support for WIN32. I have my
own directory iterator that uses the WIN32 API on Windoze and opendir
on POSIX. It's just the wildcard filename matching I need.

Regards,

Andrew Marlow
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Goran
2010-12-07 23:10:43 UTC
Post by Andrew
I am writing some code that traverses a directory, looking for
matching filenames. At the moment the match is done using regular
expressions, using boost::regex. But I am not completely happy with
this. I reckon most people will expect pattern matching to work with
wildcarding as supported by the operating system. But I do not know of
a library that does that.
Of course on Unix it is very rare that one has to worry about such
things because wildcards on the command line are expanded by the
shell. I know that DOS/Windoze doesn't do this. Sigh. But every now
and then an app needs to be able to do the matching itself internally
so one cannot use the facilities provided by the shell. It has to be
done by the code.
This is +/- platform-specific, so what's wrong with opendir and
FindFirst/NextFile? Define your interface to the functionality, then
use platform-specific code underneath.
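
A minimal sketch of that layering, for illustration only. The function name
list_matching is hypothetical, and using fnmatch(3) for the matching step on
POSIX is an assumption on top of the opendir suggestion. The Windows branch
uses the ANSI FindFirstFileA variant, so it makes no attempt at Unicode
filenames.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#else
#include <dirent.h>
#include <fnmatch.h>
#endif

// Hypothetical interface: names of the entries in `dir` that match the
// wildcard `pattern`, according to the platform's own wildcard rules.
std::vector<std::string> list_matching(const std::string& dir,
                                       const std::string& pattern)
{
    std::vector<std::string> names;
#ifdef _WIN32
    // FindFirstFile performs the wildcard match itself.
    WIN32_FIND_DATAA data;
    HANDLE h = FindFirstFileA((dir + "\\" + pattern).c_str(), &data);
    if (h == INVALID_HANDLE_VALUE)
        return names;               // no match, or the directory is missing
    do {
        names.push_back(data.cFileName);
    } while (FindNextFileA(h, &data));
    FindClose(h);
#else
    DIR* d = opendir(dir.c_str());
    if (!d)
        return names;               // directory missing or unreadable
    while (dirent* e = readdir(d)) {
        // FNM_PERIOD: a leading '.' must be matched explicitly, as in the shell.
        if (fnmatch(pattern.c_str(), e->d_name, FNM_PERIOD) == 0)
            names.push_back(e->d_name);
    }
    closedir(d);
#endif
    return names;
}
```

The two branches do not have identical semantics (case sensitivity and the
handling of "." and ".." differ, for instance), so the interface would need
to document which behaviour callers may rely on.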

That said, there must be code that does this already somewhere...

Goran.
ShaunJ
2010-12-07 23:13:18 UTC
{ please avoid top-posting. -mod }

Hi Andrew,

See glob(3) and wordexp(3)
http://www.opengroup.org/onlinepubs/9699919799/functions/glob.html
http://www.opengroup.org/onlinepubs/9699919799/functions/wordexp.html

which are both defined by the Single Unix Specification (SUS).

Cheers,
Shaun
Andrew
2010-12-10 13:02:13 UTC
Post by ShaunJ
{ please avoid top-posting. -mod }
Hi Andrew,
See glob(3) and wordexp(3)
http://www.opengroup.org/onlinepubs/9699919799/functions/glob.html
http://www.opengroup.org/onlinepubs/9699919799/functions/wordexp.html
which are both defined by the Single Unix Specification (SUS).
Thanks for the info but this is just the POSIX spec, not a portable
library. These functions are not available for Windoze and it turns
out there is a good reason.

Globbing in a portable way is really difficult. The POSIX routines
only work on ASCII. Windoze allows Unicode filenames, hence globbing
routines for Windoze need to cope with Unicode. I have seen some code
in Poco that does the job but it cannot be lifted easily because the
routine is built using its Path and Unicode handling facilities.

If I was able to use Poco on the project I was on then I would
probably use their globbing functions. But as it is, I cannot. It was
achievement enough that I was able to move to a recent version of
boost.

I will stick with regular expressions for now. I added a flag that
allows the caller to reverse the sense of the regular expression
match. This makes it easier to use directory name patterns to exclude
directories as well as include them. This is similar to what you get
when walking a directory using a combination of find and grep with the
-v flag. This is something that can't easily be done using globbing
anyway. Furthermore, POSIX regular expressions have insufficient
support for identifying words, so it seemed better to match words and
then reverse the sense so that such words are excluded. I know that
PCREs (Perl-compatible regular expressions) have better word support
but I want to stick with POSIX since it is more familiar to me and the
people I work with.
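
A rough sketch of what such a flag might look like, not the actual project
code. The name filter_names is hypothetical, std::regex with the POSIX
extended grammar stands in for boost::regex to keep the example
self-contained, and the use of search rather than whole-string match is an
assumption made to mirror grep.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keep the names that match `pattern`, or, when
// `invert` is true, the names that do not match (like grep -v).
std::vector<std::string> filter_names(const std::vector<std::string>& names,
                                      const std::string& pattern,
                                      bool invert)
{
    // POSIX extended syntax, as opposed to the default ECMAScript grammar.
    const std::regex re(pattern, std::regex::extended);
    std::vector<std::string> kept;
    for (const std::string& name : names) {
        const bool hit = std::regex_search(name, re);
        if (hit != invert)          // invert flips which side is kept
            kept.push_back(name);
    }
    return kept;
}
```

For example, passing the pattern "^(CVS|\.svn)$" with invert set to true
would drop directories named CVS or .svn and keep everything else.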

Regards,

Andrew Marlow
Edward Rosten
2010-12-13 17:05:53 UTC
Post by Andrew
Post by ShaunJ
{ please avoid top-posting. -mod }
Hi Andrew,
See glob(3) and wordexp(3)
http://www.opengroup.org/onlinepubs/9699919799/functions/glob.html htt...
which are both defined by the Single Unix Specification (SUS).
Thanks for the info but this is just the POSIX spec, not a portable
library. These functions are not available for Windoze and it turns
out there is a good reason.
There is similar functionality available in Windows, and the glue code
for the two operating systems seems short. Here's an example of a
function of the following signature:

std::vector<std::string> globlist(const std::string& gl);

Implemented for POSIX (by me):
http://cvs.savannah.gnu.org/viewvc/libcvd/libcvd/cvd_src/globlist.cxx?revision=1.7&view=markup

Implemented for Windows (by someone else):
http://cvs.savannah.gnu.org/viewvc/libcvd/libcvd/cvd_src/Win32/win32.cpp?revision=1.4&view=markup

I don't know how well the Windows version works, as I've never used
it.
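
For readers who cannot reach the links, here is a minimal sketch of how a
POSIX implementation of that signature could be written on top of glob(3).
It is an illustration under that assumption, not a copy of the libcvd code,
and it ignores error reporting entirely.

```cpp
#include <glob.h>
#include <string>
#include <vector>

// Sketch only: expand the wildcard pattern `gl` with glob(3) and return
// the matching paths. An empty vector means no match or an error.
std::vector<std::string> globlist(const std::string& gl)
{
    std::vector<std::string> out;
    glob_t g = glob_t();            // zero-initialised so globfree is safe
    // glob returns 0 on success and GLOB_NOMATCH when nothing matches.
    if (glob(gl.c_str(), 0, NULL, &g) == 0) {
        for (size_t i = 0; i < g.gl_pathc; ++i)
            out.push_back(g.gl_pathv[i]);
    }
    globfree(&g);
    return out;
}
```

With flags set to 0 the results come back sorted, and each entry carries
whatever directory prefix the pattern had.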

-Ed
--
(You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)

/d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1
r 230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12
d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage
Miles Bader
2010-12-08 12:04:34 UTC
Post by Andrew
I am writing some code that traverses a directory, looking for
matching filenames. At the moment the match is done using regular
expressions, using boost::regex. But I am not completely happy with
this. I reckon most people will expect pattern matching to work with
wildcarding as supported by the operating system. But I do not know of
a library that does that.
If you already have code using regexps, can't you just add a frontend
that does a simple mapping of platform-specific globbing chars into the
appropriate regexp fragments?

I.e., on unix, "?" => "[^/]", "*" => "[^/]*", etc.
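
A minimal sketch of such a frontend for the Unix case. The names
glob_to_regex and glob_match are hypothetical, and std::regex stands in for
boost::regex so that the example is self-contained. As a simplification,
only "*" and "?" are translated; bracket expressions and backslash escapes
are treated as literal characters.

```cpp
#include <regex>
#include <string>

// Translate a Unix-style glob into a regular expression string.
std::string glob_to_regex(const std::string& glob)
{
    std::string re;
    for (std::string::size_type i = 0; i < glob.size(); ++i) {
        const char c = glob[i];
        switch (c) {
        case '*': re += "[^/]*"; break;   // any run of non-separator chars
        case '?': re += "[^/]";  break;   // exactly one non-separator char
        // Special in a regex, but taken literally by this sketch.
        case '.': case '+': case '(': case ')': case '{': case '}':
        case '^': case '$': case '|': case '\\': case '[': case ']':
            re += '\\';
            re += c;
            break;
        default:
            re += c;
        }
    }
    return re;
}

// True if `name` matches `glob` in its entirety.
bool glob_match(const std::string& glob, const std::string& name)
{
    // regex_match requires the whole string to match, so no anchors needed.
    return std::regex_match(name, std::regex(glob_to_regex(glob)));
}
```

A Windows variant would presumably exclude both "\" and "/" in the character
classes and compile the regex case-insensitively.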

-Miles
--
Patience, n. A minor form of despair, disguised as a virtue.

