Process error page correctly. TODO: Why don't *_process_data take a const pointer to the data?
RISC OS Development Builds
The development build available on this page is built automatically from the latest source code in SVN. It is likely to be unstable and may crash your machine.
- NetSurf for RISC OS (2.4 MB, 20 Nov 2008 15:01 UTC)
By regularly running a recent development build you can help us improve NetSurf by providing feedback to the authors.
Any bug reports or feature requests should be posted on the NetSurf issue tracker. If you aren't running RISC OS, you can fetch the latest source and build NetSurf for your system.
Recent SVN Activity
All times are in UTC.
Add thanks page link.
Require Iconv 0.09
Remove hard-coded date string - not needed now we've tagged the release.
Tag Iconv 0.09
Tagged Iconv version tree
Hard-code the date string for 0.09
Update ChangeLog with release date.
Drop -s option. There's no point implementing it.
Make handling of EILSEQ resynchronise the stream if we've been asked to ignore errors. Add some handling for failure to read from the input file.
Shuffle data through a fixed-size buffer
Dump the list of known encoding aliases when asked. Don't expect any kind of sort order -- that would require a level of thought I don't have right now.
Use a zip binary that has half a chance of preserving filetypes.
On RISC OS, use Unicode:, rather than attempting to getenv("Unicode$Path") and concatenating the leafname on the end.
Fix riscos-dist target -- failed to ensure the Aliases file ended up in the release tree.
Release announcement template.
Remove obsolete makefile
More detailed changelog
Update patch
Stage Aliases file directly into distribution template. Fix compilation of makealiases when cross-compiling. Update dependencies so that aliases file gets built when needed.
Resurrect the sources of the Aliases file generator.
Ugh. Jump through hoops for RiscPkg. I'm not sure why I'm bothering with this.
Create zip file
Move the declarations of iconv_initialise/iconv_finalise to a different header. This keeps the public iconv.h free of such nonsense. Move the source for the RISC OS stubs to the distribution template tree. We will no longer ship compiled stubs. People are quite capable of compiling this themselves. Also take the opportunity to tidy it up a bit. Bump the version number to 0.09. Introduce a "riscos-dist" target in Makefile-riscos. Update various bits of documentation.
Fix up output buffer length on memory exhaustion
Fix strncasecmp implementation.
Patch for UnicodeLib to make all enabled tests pass. Document the need for this patch, and how to apply it.
Tools tree. Currently comprises something approximating a parser for the Unihan database.
Restore EUC-KR testdata to original contents, now we've added the appropriate mappings to KS X 1001.
Add three new mappings from CP949: 0xA2E6 -> U+20AC, 0xA2E7 -> U+00AE, 0xA2E8 -> U+327E.
Modify Big5 testdata to match expectations. UnicodeLib's implementation of Big5 contains mapping table entries for 0xf9d6-0xf9dc, inherited from ETENS/CP950. It also implements a couple of Mac extensions.
Poke around with the ShiftJIS testdata to make the test pass. This probably isn't the best approach, but there we go. UnicodeLib's ShiftJIS implementation incorporates the CP932 extensions, along with a number of Mac extensions. Thus, compare SHIFT_JIS.IRREVERSIBLE.TXT with CP932.IRREVERSIBLE.TXT to give an indication of how complete/bug-free the CP932 support is.
Update mapping tables for JIS X 0208 and JIS X 0212. Changes: in JIS X 0208, the mapping for entry 1/35 has been changed from U+005C to U+FF3C; in JIS X 0212, the mapping for entry 2/23 has been changed from U+007E to U+FF5E. These mappings better represent the codepoints in question. They also remove irreversible mappings for U+005C and U+007E, thus making round-trip conversions more robust.
Revert unintended commit of new mapping tables
Ignore expected failure of EUC-JP test -- there's no way to load two charsets into G0 at once, which is what the test is expecting. For future reference, should we want to fix this, 0x5C should map to U+005C and U+00A5, 0x7E should map to U+007E and U+203E. U+005C/U+007E come from US-ASCII, which is the default mapping. The other pair come from JIS X 0201 (Roman).
Drop mappings to Unicode private use area. These aren't likely to be of any use when performing conversion. I can't see where this is specified, either.
Disable the GBK test -- at present, we support only GBK/{1,2}, as that's all that can be achieved through the use of an ISO-2022-based codec. To get full GBK support will most likely require a new codec, preferably supporting the whole of GB18030 (which is a superset of GBK).
Add missing characters from GB2312-80.
Make test for Acorn Latin1 work.
Lose 3 entries that aren't mapped except in CP949, which this isn't.
Change previous logic -- we want to ignore U+FFFF regardless of the translit state.
Treat U+FFFF as invalid. Otherwise, we end up writing the first mapped out character encountered in the target encoding.
Don't bail when U+FEFF results in EINVAL. UnicodeLib eats this at the start of input, assuming it's a BOM. There's nothing we can do to avoid this, so work around it here. In practice, this shouldn't be a problem -- no one's going to sanely want to convert a string containing a BOM and nothing else.
ECMA-35. This is identical to ISO-2022.
Bring ISO-2022-KR test data in line with what UnicodeLib produces. This is semantically equivalent to what was there before; it's just that UnicodeLib outputs the G1 designation at the start rather than immediately before the first character that needs KS X 1001.
Rather less hideous approach to error detection and input pointer maintenance. We now simply decode one character at a time and check for error afterwards. This has the benefit of being less code, clearer, less likely to crash if encoding state changes involve memory (de)allocation, and removes the reliance on UnicodeLib internals. It's probably slower, however, but correctness is more important here. Fix ISO-2022-JP-2 test data to not include characters from the JIS X 0201-1976 Kana set -- this set is not
HTML files with an icon but no MIMETYPE tooltype were being picked up by the simplehtml datatype. As I'm using dth_BaseName rather than dth_Name, they were being tagged with the MIME type text/simplehtml. Have made an exception for this case to translate it to text/html; may consider in future whether using dth_Name would be better. Minor adjustments to the local file requester to prevent .info files from being displayed.
Comment out the rest of the tests for encodings we don't support. It appears that there are a fair number of issues with the handling of CJK charsets, particularly in the case of ISO-2022-x, which segfault. Make test binaries depend on the module target, so the module gets built if make test is run on a clean tree.
Comment out more tests for charsets we don't support
Bring MacRoman test data into line with current mappings -- 0xDB should map to U+20AC (euro) and not U+00A4 (currency sign)
Fix MacRoman mapping table -- 0xF3 should map to U+00DB and not U+00D8
Fix error in CP1256 table -- 0xC0 should be mapped to U+06C1, not U+061C
Factor out acquisition of paths to files in the Unicode resource. This fixes *ReadAliases on <> RISC OS, and Iconv's eightbit codec. Fix iconv_eightbit_read to ensure that it treats the input as unsigned bytes.
Add set 06/06 (Latin 10) from ROOL tree.
Bring set 05/04 (Thai) into line with the 8859-11 spec: 0xdb, 0xdc, 0xdd and 0xde are undefined; 0xfc, 0xfd, 0xfe and 0xff are undefined.
Bring set 05/14 (Hebrew) into line with the 8859-8 spec: 0xaf now maps to U+00AF rather than U+203E; 0xfd/0xfe are now mapped to U+200E/U+200F rather than U+020E/U+020F, respectively.
Update set 04/06 (Greek) to bring it in line with the spec. It was previously based on the ISO-8859-7 FCD, after which 0xAE was removed, and the mapping of 0xB7 was changed.
We don't support ISO-8859-6
Ensure that we return the correct errors and, when we do, point to the correct place in the input sequence (namely the start of the erroneous sequence). Unfortunately, UnicodeLib reads past the erroneous sequence so we previously returned a pointer to the middle/end of the sequence rather than the start. The only way I could think of doing this was to perform the conversion twice -- counting the number of successfully processed characters first, then to convert that number of characters again. We then play
NetSurf will now check if it is already running (i.e. if the ARexx port NETSURF exists) and, if so, quit and send an OPEN command to the copy already in memory. Setting files as projects of NetSurf is now possible, as is multi-select launching from Workbench. Multi-selects are not passed through using ARexx yet; only the first file in the list will be opened if NetSurf is already running. Plain text files which have no MIMETYPE tooltype are now correctly identified as text/plain instead of text/ascii, al
Sparse "About" requester - version number, compile date and URL only.
Make *Iconv flush through any remaining shift sequences at the end of the data.
Allow opening of local files from anywhere, not just the parent of the current dir.
Various hackery in a vain attempt to make more tests pass. Disable the UTF-8 test, as it currently fails: 1) a bug in UnicodeLib's UTF-8 decoder results in 0x80 being treated as valid input; 2) there's no way of determining if U+FFFD was the result of invalid input or valid input which happened to decode to that codepoint; 3) UnicodeLib drops U+FEFF on the floor. Disable UCS-{2,4}{BE,LE}, as we don't support those encodings.
Ensure temporary data files are put in the right place. Bail on the first error.
Clean up and make clearer what packages need to be obtained for other distros.
Oh look. US-ASCII was broken. There's a good indication if ever I saw one.
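Several of the Iconv commits above describe the shape of a typical iconv(3) conversion loop:

- shuffling data through a fixed-size buffer,
- resynchronising on EILSEQ when asked to ignore errors,
- flushing any remaining shift sequences at the end of the data.

The following is a minimal sketch of such a loop against the POSIX iconv API. It is not the Iconv module's actual code. The names convert_chunked and append_bytes, the 16-byte CHUNK_SIZE, and the skip-one-byte resynchronisation policy are illustrative assumptions.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size of the intermediate output buffer. Deliberately
 * small so that longer inputs exercise the E2BIG path. */
#define CHUNK_SIZE 16

/* Append n bytes from src to dst; fails if dst would overflow. */
static int append_bytes(char *dst, size_t dstcap, size_t *dstlen,
                        const char *src, size_t n)
{
    if (n > dstcap - *dstlen)
        return -1;
    memcpy(dst + *dstlen, src, n);
    *dstlen += n;
    return 0;
}

/* Hypothetical helper: convert inlen bytes at `in` from encoding `from`
 * to encoding `to`, shuffling the output through a fixed-size buffer.
 * With ignore_errors set, an invalid byte (EILSEQ) is skipped so the
 * decoder can resynchronise on the following byte.
 * Returns the number of bytes written to dst, or (size_t)-1 on error. */
size_t convert_chunked(const char *from, const char *to,
                       const char *in, size_t inlen,
                       char *dst, size_t dstcap, int ignore_errors)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char buf[CHUNK_SIZE];
    char *inp = (char *)in;      /* POSIX iconv() takes char **, not const */
    size_t inleft = inlen, dstlen = 0;

    while (inleft > 0) {
        char *outp = buf;
        size_t outleft = sizeof buf;
        size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
        int err = errno;

        /* Whatever was produced before iconv() stopped goes to the caller. */
        if (append_bytes(dst, dstcap, &dstlen, buf, sizeof buf - outleft) != 0)
            goto fail;
        if (r != (size_t)-1 || err == E2BIG)
            continue;            /* all done, or buffer full: go round again */
        if (err == EILSEQ && ignore_errors) {
            inp++;               /* drop the offending byte and resync */
            inleft--;
            continue;
        }
        goto fail;               /* EINVAL (truncated input) or hard EILSEQ */
    }

    /* Flush any pending shift sequence back to the initial state
     * (relevant for stateful encodings such as ISO-2022-JP). */
    for (;;) {
        char *outp = buf;
        size_t outleft = sizeof buf;
        size_t r = iconv(cd, NULL, NULL, &outp, &outleft);
        int err = errno;

        if (append_bytes(dst, dstcap, &dstlen, buf, sizeof buf - outleft) != 0)
            goto fail;
        if (r != (size_t)-1)
            break;
        if (err != E2BIG)
            goto fail;
    }

    iconv_close(cd);
    return dstlen;

fail:
    iconv_close(cd);
    return (size_t)-1;
}
```

For example, converting "a\xff" "b" from UTF-8 to ISO-8859-1 with ignore_errors set should yield "ab", while the same call with ignore_errors clear fails. Skipping a single byte is the simplest resynchronisation strategy. A decoder could instead skip ahead to the next plausible sequence boundary.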
