Belated Response to appenwar's 'tokenizers' Blog
Most of the points have already been made well by others with
more experience than I have; therefore, I'll stick to my own.
This has less to do with what you actually said and more to do
with the principle.
One thing that always irritates me is how loosely gcc ties
C/C++ file extensions to a language. g++ compiles a .c file
as C++, and a header has no language of its own. For example,
a .h will only be taken as C if included strictly by a chain
of C files and only if you don't use g++. One must therefore
include the awkward #ifdef __cplusplus / extern "C" { guard
because some people don't know how to use the correct file
extensions; otherwise you might have linking problems if
your header is actually backed by a C source. If you have to
use a C feature not carried over to C++ in a C file (e.g. the
.sym = member initializer), you can't #include your file in a
C++ file even with extern "C". You can also get away with not
qualifying structure variables with struct in C headers if a
C++ file includes it. All of this leads to less concise code,
all because the ambiguity is considered acceptable. I do
concede that early C++ used .h extensions for the standard
headers, so it's partly lack of foresight.
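
A minimal sketch of the guard and the two C-only habits mentioned above, assuming a hypothetical point.h / point.c pair (the file names, struct point, point_sum and origin_offset_sum are made up for illustration). Both "files" are shown in one block so it compiles as a single C translation unit.

```c
#include <assert.h>
#include <stddef.h>

/* ---- point.h (hypothetical header shared by C and C++ callers) ---- */
#ifndef POINT_H
#define POINT_H

#ifdef __cplusplus
extern "C" {            /* keep C linkage when a C++ file includes this */
#endif

struct point {
    int x;
    int y;
};

/* "struct point" is required in C. C++ would also accept plain "point",
 * which is how a header can quietly stop being valid C. */
int point_sum(const struct point *p);

#ifdef __cplusplus
}
#endif

#endif /* POINT_H */

/* ---- point.c (the C source behind the header) ---- */
int point_sum(const struct point *p)
{
    assert(p != NULL);
    return p->x + p->y;
}

/* Designated (".member =") initializer: standard since C99. */
static const struct point origin_offset = { .x = 3, .y = 4 };

int origin_offset_sum(void)
{
    return point_sum(&origin_offset);
}
```

Without the guard, a C++ caller would look for a C++-mangled point_sum and fail to link against the object built from the C source. Note that C++ did eventually gain a restricted form of designated initializers in C++20 (designators must follow declaration order), so the initializer above is only a problem for headers that must also compile under older C++ standards or that list members out of order.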
Today I finally got around to using libxml2, which struck me
as extensively (yet somehow poorly) documented and extremely
ambiguous. On the other hand, it will save me from having to
write my own compliant parser for the ~1.4M lines of XML I
need to convert and load into a database. None of this has
much to do with libxml2 refusing input that contains errors,
because the data I received was probably exported from SQL
using the same library. I'd actually copy the trees created
by libxml2 into a more usable structure if they weren't going
right into a database, but XML is meant as a format, not as a
run-time representation.
If someone is actually hand-writing proper XML, chances are
they're missing the point (or they're dealing with a
software interface that misses the point). Additionally, if
someone is using software other than libxml2 to generate XML,
they're either missing the point or they lack the appropriate
language bindings. That being said, I use my own library to
assemble and parse "XML-like" structures (closer to HTML, I
guess) for IPC. It wouldn't make sense for me to use formal
XML for that application, and especially not libxml2.
Though the formats are very similar, the run-time
organization used by libxml2 isn't anywhere near suitable
for what I use the data for. Then again, I don't need any
sort of standardization because the data doesn't go anywhere
outside of the application. It's a symmetrical system because
data import and export were designed together to complement
each other, which I can only assume is also the case with
libxml2.
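
To make "symmetrical" concrete, here is a small sketch of an export/import pair designed together. It is not the library described above. The struct record type, the tag layout and both function names are hypothetical, and the sketch assumes the name is non-empty and contains no '<' character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record passed between two ends of the same application. */
struct record {
    int  id;
    char name[32];
};

/* Export: writes <record id="N">name</record> into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int record_export(const struct record *r, char *buf, size_t size)
{
    int n;

    assert(r != NULL && buf != NULL);
    n = snprintf(buf, size, "<record id=\"%d\">%s</record>", r->id, r->name);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Import: reads back the form record_export produces.
 * Returns 0 on success, -1 otherwise (r may be partially written). */
int record_import(const char *text, struct record *r)
{
    int consumed = 0;

    assert(text != NULL && r != NULL);
    if (sscanf(text, "<record id=\"%d\">%31[^<]</record>%n",
               &r->id, r->name, &consumed) != 2)
        return -1;
    /* %n is stored only if the closing tag matched; also reject
     * anything that follows it. */
    return (consumed > 0 && text[consumed] == '\0') ? 0 : -1;
}
```

Because the importer only ever has to read what the exporter writes, neither side needs a general-purpose parser or a standardized format, which is the trade-off the paragraph above describes.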
Something many formal projects lack (software and
otherwise) is an explicit correlation between the core
purposes of the project and the aspects of implementation
(yes, I'm guilty, too). If I were to author something
comparable to XML, I'd explicitly state that it isn't meant
to be hand-written and that it's primarily intended to allow
data transfer between applications with different maintainers.
At the point of deciding whether or not to accept simple
errors, I'd defer back to those principles and conclude that
errors should not be accepted. If I were to author
something like HTML, on the other hand, I would account for
hand-written code and acknowledge that rendering with errors
is better than rejecting a file. All too often, projects
start out with founding principles yet fail to rationally
extrapolate those principles to the level of implementation
(guilty, again).
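
The two error policies above can be sketched as one routine with a switch. This is only an illustration: check_nesting, MAX_DEPTH and MAX_NAME are hypothetical names, and the sketch handles bare tags such as <a> and </a> only, with no attributes, comments or entities.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 32   /* deepest nesting this sketch tracks */
#define MAX_NAME  16   /* longest tag name, including the terminator */

/* Checks that <name> ... </name> tags in text are properly nested.
 * lenient == 0: stop at the first error and return -1 (XML-style policy).
 * lenient != 0: repair each error and keep going (HTML-style policy).
 * Returns 0 for well-formed input, otherwise the number of repairs made.
 * Nesting deeper than MAX_DEPTH returns -1 in either mode. */
int check_nesting(const char *text, int lenient)
{
    char stack[MAX_DEPTH][MAX_NAME];
    int depth = 0, repairs = 0;
    const char *p = text;

    assert(text != NULL);
    while ((p = strchr(p, '<')) != NULL) {
        int closing = (p[1] == '/');
        const char *name = p + 1 + closing;
        const char *end = strchr(name, '>');
        size_t len = end ? (size_t)(end - name) : 0;

        if (end == NULL || len == 0 || len >= MAX_NAME) {
            /* Malformed tag: reject it, or skip it and carry on. */
            if (!lenient) return -1;
            repairs++;
            if (end == NULL) break;
        } else if (!closing) {
            if (depth == MAX_DEPTH) return -1;
            memcpy(stack[depth], name, len);
            stack[depth][len] = '\0';
            depth++;
        } else {
            /* Look for the nearest open tag with the same name. */
            int i = depth - 1;
            while (i >= 0 && !(strncmp(stack[i], name, len) == 0
                               && stack[i][len] == '\0'))
                i--;
            if (i < 0) {
                /* Stray closing tag: reject it, or ignore it. */
                if (!lenient) return -1;
                repairs++;
            } else {
                /* Tags opened after the match were never closed. */
                int unclosed = depth - 1 - i;
                if (unclosed > 0 && !lenient) return -1;
                repairs += unclosed;
                depth = i;
            }
        }
        p = end + 1;
    }
    if (depth > 0) {
        /* Input ended with tags still open. */
        if (!lenient) return -1;
        repairs += depth;
    }
    return repairs;
}
```

For example, check_nesting("<a><b>x</a>", 0) rejects the input outright, while the same call with lenient set to 1 closes the inner tag implicitly and reports one repair. The point is that the choice between the two branches follows from the stated purpose of the format and is not something to settle ad hoc inside the parser.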
Rather than getting into everything already brought up, I'll
leave it at that.
Kevin Barry