Repo created

This commit is contained in:
Fr4nz D13trich 2025-11-22 13:58:55 +01:00
parent 4af19165ec
commit 68073add76
12458 changed files with 12350765 additions and 2 deletions


@ -0,0 +1,121 @@
BSD Protection License
February 2002
Preamble
--------
The Berkeley Software Distribution ("BSD") license has proven very effective
over the years at allowing for a wide spread of work throughout both
commercial and non-commercial products. For programmers whose primary
intention is to improve the general quality of available software, it is
arguable that there is no better license than the BSD license, as it
permits improvements to be used wherever they will help, without idealogical
or metallic constraint.
This is of particular value to those who produce reference implementations
of proposed standards: The case of TCP/IP clearly illustrates that freely
and universally available implementations leads the rapid acceptance of
standards -- often even being used instead of a de jure standard (eg, OSI
network models).
With the rapid proliferation of software licensed under the GNU General
Public License, however, the continued success of this role is called into
question. Given that the inclusion of a few lines of "GPL-tainted" work
into a larger body of work will result in restricted distribution -- and
given that further work will likely build upon the "tainted" portions,
making them difficult to remove at a future date -- there are inevitable
circumstances where authors would, in order to protect their goal of
providing for the widespread usage of their work, wish to guard against
such "GPL-taint".
In addition, one can imagine that companies which operate by producing and
selling (possibly closed-source) code would wish to protect themselves
against the rise of a GPL-licensed competitor. While under existing
licenses this would mean not releasing their code under any form of open
license, if a license existed under which they could incorporate any
improvements back into their own (commercial) products then they might be
far more willing to provide for non-closed distribution.
For the above reasons, we put forth this "BSD Protection License": A
license designed to retain the freedom granted by the BSD license to use
licensed works in a wide variety of settings, both non-commercial and
commercial, while protecting the work from having future contributors
restrict that freedom.
The precise terms and conditions for copying, distribution, and
modification follow.
BSD PROTECTION LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION, AND MODIFICATION
----------------------------------------------------------------
0. Definitions.
a) "Program", below, refers to any program or work distributed under
the terms of this license.
b) A "work based on the Program", below, refers to either the Program
or any derivative work under copyright law.
c) "Modification", below, refers to the act of creating derivative works.
d) "You", below, refers to each licensee.
1. Scope.
This license governs the copying, distribution, and modification of the
Program. Other activities are outside the scope of this license; The
act of running the Program is not restricted, and the output from the
Program is covered only if its contents constitute a work based on the
Program.
2. Verbatim copies.
You may copy and distribute verbatim copies of the Program as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice; keep
intact all the notices that refer to this License and to the absence of
any warranty; and give any other recipients of the Program a copy of this
License along with the Program.
3. Modification and redistribution under closed license.
You may modify your copy or copies of the Program, and distribute
the resulting derivative works, provided that you meet the
following conditions:
a) The copyright notice and disclaimer on the Program must be reproduced
and included in the source code, documentation, and/or other materials
provided in a manner in which such notices are normally distributed.
b) The derivative work must be clearly identified as such, in order that
it may not be confused with the original work.
c) The license under which the derivative work is distributed must
expressly prohibit the distribution of further derivative works.
4. Modification and redistribution under open license.
You may modify your copy or copies of the Program, and distribute
the resulting derivative works, provided that you meet the
following conditions:
a) The copyright notice and disclaimer on the Program must be reproduced
and included in the source code, documentation, and/or other materials
provided in a manner in which such notices are normally distributed.
b) You must clearly indicate the nature and date of any changes made
to the Program. The full details need not necessarily be included in
the individual modified files, provided that each modified file is
clearly marked as such and instructions are included on where the
full details of the modifications may be found.
c) You must cause any work that you distribute or publish, that in whole
or in part contains or is derived from the Program or any part
thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
5. Implied acceptance.
You may not copy or distribute the Program or any derivative works except
as expressly provided under this license. Consequently, any such action
will be taken as implied acceptance of the terms of this license.
6. NO WARRANTY.
THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
THE COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT, EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.


@ -0,0 +1,31 @@
Name: bsdiff
URL: http://www.daemonology.net/bsdiff/
License: BSD
License File: LICENCE
Description:
This directory contains an extensively modified version of Colin Percival's
bsdiff, available in its original form from:
http://www.daemonology.net/bsdiff/
The basic principles of operation are best understood by reading Colin's
unpublished paper:
Colin Percival, Naive differences of executable code, http://www.daemonology.net/bsdiff/, 2003.
The copy in this directory is so extensively modified that the binary format
is incompatible with the original, and it cannot be compiled outside the
Chromium source tree or the Courgette project.
List of changes made to original code:
- Wrapped functions in 'bsdiff' namespace.
- Renamed .c files to .cc files.
- Added bsdiff.h and bsdiff_search.h header files.
- Changed the code to use streams.h from Courgette.
- Changed the encoding of numbers to use the 'varint' encoding.
- Reformatted code to be closer to Google coding standards.
- Renamed variables.
- Added comments.
- Fixed search() comparison issue: http://crbug.com/620867.
- Replaced QSufSort with modified version of libdivsufsort.
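
The 'varint' encoding named in the list above can be illustrated with a minimal sketch. It assumes the common base-128 scheme (7 payload bits per byte, low group first) plus a zigzag mapping for signed values. The helper names AppendVarUint, ReadVarUintAt, ZigZagEncode and ZigZagDecode are hypothetical; the varint header actually used by this tree is not shown here and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not the project's varint API.

// Appends |value| as a base-128 varint: 7 payload bits per byte, least
// significant group first, high bit set on every byte except the last.
inline void AppendVarUint(std::vector<uint8_t> & out, uint32_t value)
{
  while (value >= 0x80)
  {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Decodes one varint starting at |pos| and advances |pos| past it.
// Assumes well-formed input: no bounds or overflow checks are done.
inline uint32_t ReadVarUintAt(std::vector<uint8_t> const & in, std::size_t & pos)
{
  uint32_t value = 0;
  for (int shift = 0;; shift += 7)
  {
    uint8_t const byte = in[pos++];
    value |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0)
      return value;
  }
}

// Zigzag maps small negative and positive numbers to small unsigned ones
// (0, -1, 1, -2, ... become 0, 1, 2, 3, ...), so they stay short as varints.
inline uint32_t ZigZagEncode(int32_t v)
{
  return (static_cast<uint32_t>(v) << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

inline int32_t ZigZagDecode(uint32_t u)
{
  return static_cast<int32_t>((u >> 1) ^ (0u - (u & 1u)));
}
```

With this scheme the value 300 occupies two bytes (0xAC, 0x02) instead of four, which is why small copy counts and seek adjustments compress well.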


@ -0,0 +1,511 @@
// Copyright 2003, 2004 Colin Percival
// All rights reserved
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted providing that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
// IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
// DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
// OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
// IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//
// For the terms under which this work may be distributed, please see
// the adjoining file "LICENSE".
//
// Changelog:
// 2005-04-26 - Define the header as a C structure, add a CRC32 checksum to
// the header, and make all the types 32-bit.
// --Benjamin Smedberg <benjamin@smedbergs.us>
// 2009-03-31 - Change to use Streams. Move CRC code to crc.{h,cc}
// Changed status to an enum, removed unused status codes.
// --Stephen Adams <sra@chromium.org>
// 2013-04-10 - Added wrapper to apply a patch directly to files.
// --Joshua Pawlicki <waffles@chromium.org>
// 2017-08-14 - Moved "apply" and "create" to the header file, rewrote
// all routines to use OMaps readers and writers instead
// of Courgette streams and files.
// --Maxim Pimenov <m@maps.me>
// 2019-01-24 - Got rid of the paged array. We have enough address space
// for our application of bsdiff.
// --Maxim Pimenov <m@maps.me>
// Changelog for bsdiff_apply:
// 2009-03-31 - Change to use Streams. Move CRC code to crc.{h,cc}
// --Stephen Adams <sra@chromium.org>
// 2013-04-10 - Add wrapper method to apply a patch to files directly.
// --Joshua Pawlicki <waffles@chromium.org>
// Changelog for bsdiff_create:
// 2005-05-05 - Use the modified header struct from bspatch.h; use 32-bit
// values throughout.
// --Benjamin Smedberg <benjamin@smedbergs.us>
// 2005-05-18 - Use the same CRC algorithm as bzip2, and leverage the CRC table
// provided by libbz2.
// --Darin Fisher <darin@meer.net>
// 2007-11-14 - Changed to use Crc from Lzma library instead of Bzip library
// --Rahul Kuchhal
// 2009-03-31 - Change to use Streams. Added lots of comments.
// --Stephen Adams <sra@chromium.org>
// 2010-05-26 - Use a paged array for V and I. The address space may be too
// fragmented for these big arrays to be contiguous.
// --Stephen Adams <sra@chromium.org>
// 2015-08-03 - Extract qsufsort portion to a separate file.
// --Samuel Huang <huangs@chromium.org>
// 2015-08-12 - Interface change to search().
// --Samuel Huang <huangs@chromium.org>
// 2016-07-29 - Replacing qsufsort with divsufsort.
// --Samuel Huang <huangs@chromium.org>
// Copyright 2016 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#ifndef COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_H_
#define COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_H_
#include "coding/varint.hpp"
#include "coding/write_to_sink.hpp"
#include "coding/writer.hpp"
#include "base/cancellable.hpp"
#include "base/checked_cast.hpp"
#include "base/logging.hpp"
#include "base/string_utils.hpp"
#include "base/timer.hpp"
#include <array>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <vector>
#include "3party/bsdiff-courgette/bsdiff/bsdiff_common.h"
#include "3party/bsdiff-courgette/bsdiff/bsdiff_search.h"
#include "3party/bsdiff-courgette/divsufsort/divsufsort.h"
#include "zlib.h"
namespace bsdiff {
// A MemWriter with its own buffer.
struct MemStream {
MemStream(): m_writer(m_buf) {}
MemWriter<std::vector<uint8_t>> & GetWriter() { return m_writer; }
size_t Size() const { return m_buf.size(); }
std::vector<uint8_t> const & GetBuf() const { return m_buf; }
private:
std::vector<uint8_t> m_buf;
MemWriter<std::vector<uint8_t>> m_writer;
};
inline uint32_t CalculateCrc(const uint8_t* buffer, size_t size) {
// Computes the CRC-32 with zlib's crc32() and returns its bitwise complement.
const auto size32 = base::checked_cast<uint32_t>(size);
const uint32_t crc = base::checked_cast<uint32_t>(crc32(0, buffer, size32));
return ~crc;
}
// Creates a binary patch.
template <typename OldReader, typename NewReader, typename PatchSink>
BSDiffStatus CreateBinaryPatch(OldReader & old_reader,
NewReader & new_reader,
PatchSink & patch_sink) {
ReaderSource<OldReader> old_source(old_reader);
ReaderSource<NewReader> new_source(new_reader);
auto initial_patch_sink_pos = patch_sink.Pos();
base::Timer bsdiff_timer;
CHECK_GREATER_OR_EQUAL(kNumStreams, 6, ());
std::array<MemStream, kNumStreams> mem_streams;
auto & control_stream_copy_counts = mem_streams[0];
auto & control_stream_extra_counts = mem_streams[1];
auto & control_stream_seeks = mem_streams[2];
auto & diff_skips = mem_streams[3];
auto & diff_bytes = mem_streams[4];
auto & extra_bytes = mem_streams[5];
const int old_size = static_cast<int>(old_source.Size());
std::vector<uint8_t> old_buf(old_size);
old_source.Read(old_buf.data(), old_buf.size());
const uint8_t * old = old_buf.data();
std::vector<divsuf::saidx_t> suffix_array(old_size + 1);
base::Timer suf_sort_timer;
divsuf::saint_t result = divsuf::divsufsort_include_empty(old, suffix_array.data(), old_size);
LOG(LINFO, ("Done divsufsort", suf_sort_timer.ElapsedSeconds()));
if (result != 0)
return UNEXPECTED_ERROR;
const int new_size = static_cast<int>(new_source.Size());
std::vector<uint8_t> new_buf(new_size);
new_source.Read(new_buf.data(), new_buf.size());
const uint8_t * newbuf = new_buf.data();
int control_length = 0;
int diff_bytes_length = 0;
int diff_bytes_nonzero = 0;
int extra_bytes_length = 0;
// The patch format is a sequence of triples <copy,extra,seek> where 'copy' is
// the number of bytes to copy from the old file (possibly with mistakes),
// 'extra' is the number of bytes to copy from a stream of fresh bytes, and
// 'seek' is an offset to move to the position to copy for the next triple.
//
// The invariant at the top of this loop is that we are committed to emitting
// a triple for the part of |newbuf| surrounding a 'seed' match near
// |lastscan|. We are searching for a second match that will be the 'seed' of
// the next triple. As we scan through |newbuf|, one of four things can
// happen at the current position |scan|:
//
// 1. We find a nice match that appears to be consistent with the current
// seed. Continue scanning. It is likely that this match will become
// part of the 'copy'.
//
// 2. We find a match which does much better than extending the current
// seed's old match. Emit a triple for the current seed and take this
// match as the new seed for a new triple. 'Much better' means the new
// match covers more than 8 bytes that mismatch under the current seed.
//
// 3. There is not a good match. Continue scanning. These bytes will likely
// become part of the 'extra'.
//
// 4. There is no match because we reached the end of the input, |newbuf|.
// This is how the loop advances through the bytes of |newbuf|:
//
// ...012345678901234567890123456789...
// ssssssssss Seed at |lastscan|
// xxyyyxxyyxy |scan| forward, cases (3)(x) & (1)(y)
// mmmmmmmm New match will start new seed case (2).
// fffffffffffffff |lenf| = scan forward from |lastscan|
// bbbb |lenb| = scan back from new seed |scan|.
// ddddddddddddddd Emit diff bytes for the 'copy'.
// xx Emit extra bytes.
// ssssssssssss |lastscan = scan - lenb| is new seed.
// x Cases (1) and (3) ....
int lastscan = 0, lastpos = 0, lastoffset = 0;
int scan = 0;
SearchResult match(0, 0);
uint32_t pending_diff_zeros = 0;
while (scan < new_size) {
int oldscore = 0; // Count of how many bytes of the current match at |scan|
// extend the match at |lastscan|.
match.pos = 0;
scan += match.size;
for (int scsc = scan; scan < new_size; ++scan) {
match = search<decltype(suffix_array)>(suffix_array, old, old_size, newbuf + scan,
new_size - scan);
for (; scsc < scan + match.size; scsc++)
if ((scsc + lastoffset < old_size) &&
(old[scsc + lastoffset] == newbuf[scsc]))
oldscore++;
if ((match.size == oldscore) && (match.size != 0))
break; // Good continuing match, case (1)
if (match.size > oldscore + 8)
break; // New seed match, case (2)
if ((scan + lastoffset < old_size) &&
(old[scan + lastoffset] == newbuf[scan]))
oldscore--;
// Case (3) continues in this loop until we fall out of the loop (4).
}
if ((match.size != oldscore) || (scan == new_size)) { // Cases (2) and (4)
// This next chunk of code finds the boundary between the bytes to be
// copied as part of the current triple, and the bytes to be copied as
// part of the next triple. The |lastscan| match is extended forwards as
// far as possible provided doing so does not add too many mistakes. The
// |scan| match is extended backwards in a similar way.
// Extend the current match (if any) backwards. |lenb| is the maximal
// extension for which less than half the byte positions in the extension
// are wrong.
int lenb = 0;
if (scan < new_size) { // i.e. not case (4); there is a match to extend.
int score = 0, Sb = 0;
for (int i = 1; (scan >= lastscan + i) && (match.pos >= i); i++) {
if (old[match.pos - i] == newbuf[scan - i])
score++;
if (score * 2 - i > Sb * 2 - lenb) {
Sb = score;
lenb = i;
}
}
}
// Extend the lastscan match forward; |lenf| is the maximal extension for
// which less than half of the byte positions in the entire lastscan match
// are wrong. There is a subtle point here: |lastscan| points to before the
// seed match by |lenb| bytes from the previous iteration. This is why
// the loop measures the total number of mistakes in the whole match, not
// just those from the seed match onward.
int lenf = 0;
{
int score = 0, Sf = 0;
for (int i = 0; (lastscan + i < scan) && (lastpos + i < old_size);) {
if (old[lastpos + i] == newbuf[lastscan + i])
score++;
i++;
if (score * 2 - i > Sf * 2 - lenf) {
Sf = score;
lenf = i;
}
}
}
// If the extended scans overlap, pick a position in the overlap region
// that maximizes the exact matching bytes.
if (lastscan + lenf > scan - lenb) {
int overlap = (lastscan + lenf) - (scan - lenb);
int score = 0;
int Ss = 0, lens = 0;
for (int i = 0; i < overlap; i++) {
if (newbuf[lastscan + lenf - overlap + i] ==
old[lastpos + lenf - overlap + i]) {
score++;
}
if (newbuf[scan - lenb + i] == old[match.pos - lenb + i]) {
score--;
}
if (score > Ss) {
Ss = score;
lens = i + 1;
}
}
lenf += lens - overlap;
lenb -= lens;
}
for (int i = 0; i < lenf; i++) {
uint8_t diff_byte = newbuf[lastscan + i] - old[lastpos + i];
if (diff_byte) {
++diff_bytes_nonzero;
WriteVarUint(diff_skips.GetWriter(), pending_diff_zeros);
pending_diff_zeros = 0;
diff_bytes.GetWriter().Write(&diff_byte, 1);
} else {
++pending_diff_zeros;
}
}
int gap = (scan - lenb) - (lastscan + lenf);
for (int i = 0; i < gap; i++) {
extra_bytes.GetWriter().Write(&newbuf[lastscan + lenf + i], 1);
}
diff_bytes_length += lenf;
extra_bytes_length += gap;
uint32_t copy_count = lenf;
uint32_t extra_count = gap;
int32_t seek_adjustment = ((match.pos - lenb) - (lastpos + lenf));
WriteVarUint(control_stream_copy_counts.GetWriter(), copy_count);
WriteVarUint(control_stream_extra_counts.GetWriter(), extra_count);
WriteVarInt(control_stream_seeks.GetWriter(), seek_adjustment);
++control_length;
#ifdef DEBUG_bsmedberg
LOG(LDEBUG, ("Writing a block: copy:", copy_count, "extra:", extra_count, "seek:", seek_adjustment));
#endif
lastscan = scan - lenb; // Include the backward extension in seed.
lastpos = match.pos - lenb; // ditto.
lastoffset = lastpos - lastscan;
}
}
WriteVarUint(diff_skips.GetWriter(), pending_diff_zeros);
suffix_array.clear();
MBSPatchHeader header;
// The string will have a null terminator that we don't use, hence '-1'.
static_assert(sizeof(MBS_PATCH_HEADER_TAG) - 1 == sizeof(header.tag),
"MBS_PATCH_HEADER_TAG must match header field size");
memcpy(header.tag, MBS_PATCH_HEADER_TAG, sizeof(header.tag));
header.slen = old_size;
header.scrc32 = CalculateCrc(old, old_size);
header.dlen = new_size;
WriteHeader(patch_sink, &header);
for (auto const & s : mem_streams)
{
uint32_t const sz = base::checked_cast<uint32_t>(s.Size());
WriteToSink(patch_sink, sz);
}
for (auto const & s : mem_streams)
patch_sink.Write(s.GetBuf().data(), s.GetBuf().size());
size_t diff_skips_length = diff_skips.Size();
std::ostringstream log_stream;
log_stream << "Control tuples: " << control_length
<< " copy bytes: " << diff_bytes_length
<< " mistakes: " << diff_bytes_nonzero
<< " (skips: " << diff_skips_length << ")"
<< " extra bytes: " << extra_bytes_length
<< "\nUncompressed bsdiff patch size "
<< patch_sink.Pos() - initial_patch_sink_pos
<< "\nEnd bsdiff "
<< bsdiff_timer.ElapsedSeconds();
LOG(LINFO, (log_stream.str()));
return OK;
}
// Applies the given patch file to a given source file. This method validates
// the CRC of the original file stored in the patch file, before applying the
// patch to it.
template <typename OldReader, typename NewSink, typename PatchReader>
BSDiffStatus ApplyBinaryPatch(OldReader & old_reader, NewSink & new_sink,
PatchReader & patch_reader, const base::Cancellable & cancellable)
{
ReaderSource<OldReader> old_source(old_reader);
ReaderSource<PatchReader> patch_source(patch_reader);
MBSPatchHeader header;
BSDiffStatus ret = MBS_ReadHeader(patch_source, &header);
if (ret != OK)
return ret;
const auto old_size = static_cast<size_t>(old_source.Size());
std::vector<uint8_t> old_buf(old_size);
old_source.Read(old_buf.data(), old_buf.size());
const uint8_t* old_start = old_buf.data();
const uint8_t* old_end = old_buf.data() + old_buf.size();
const uint8_t* old_position = old_start;
if (old_size != header.slen)
return UNEXPECTED_ERROR;
if (CalculateCrc(old_start, old_size) != header.scrc32)
return CRC_ERROR;
CHECK_GREATER_OR_EQUAL(kNumStreams, 6, ());
std::vector<uint32_t> stream_sizes(kNumStreams);
for (auto & s : stream_sizes)
s = ReadPrimitiveFromSource<uint32_t>(patch_source);
std::vector<ReaderSource<PatchReader>> patch_streams;
patch_streams.reserve(kNumStreams);
for (size_t i = 0; i < kNumStreams; ++i) {
uint64_t size = static_cast<uint64_t>(stream_sizes[i]);
patch_streams.emplace_back(ReaderSource<PatchReader>(patch_source.SubReader(size)));
}
auto & control_stream_copy_counts = patch_streams[0];
auto & control_stream_extra_counts = patch_streams[1];
auto & control_stream_seeks = patch_streams[2];
auto & diff_skips = patch_streams[3];
auto & diff_bytes = patch_streams[4];
auto & extra_bytes = patch_streams[5];
std::vector<uint8_t> extra_bytes_buf(static_cast<size_t>(extra_bytes.Size()));
extra_bytes.Read(extra_bytes_buf.data(), extra_bytes_buf.size());
const uint8_t* extra_start = extra_bytes_buf.data();
const uint8_t* extra_end = extra_bytes_buf.data() + extra_bytes_buf.size();
const uint8_t* extra_position = extra_start;
// if (header->dlen && !new_sink->Reserve(header->dlen))
// return MEM_ERROR;
auto pending_diff_zeros = ReadVarUint<uint32_t>(diff_skips);
// We will check whether the application process has been cancelled
// upon copying every |kCheckCancelledPeriod| bytes from the old file.
constexpr size_t kCheckCancelledPeriod = 100 * 1024;
while (control_stream_copy_counts.Size() > 0) {
if (cancellable.IsCancelled())
return CANCELLED;
auto copy_count = ReadVarUint<uint32_t>(control_stream_copy_counts);
auto extra_count = ReadVarUint<uint32_t>(control_stream_extra_counts);
auto seek_adjustment = ReadVarInt<int32_t>(control_stream_seeks);
#ifdef DEBUG_bsmedberg
LOG(LDEBUG, ("Applying block: copy:", copy_count, "extra:", extra_count, "seek:", seek_adjustment));
#endif
// Byte-wise arithmetically add bytes from old file to bytes from the diff
// block.
if (copy_count > static_cast<size_t>(old_end - old_position))
return UNEXPECTED_ERROR;
// Add together bytes from the 'old' file and the 'diff' stream.
for (size_t i = 0; i < copy_count; ++i) {
if (i > 0 && i % kCheckCancelledPeriod == 0 && cancellable.IsCancelled())
return CANCELLED;
uint8_t diff_byte = 0;
if (pending_diff_zeros) {
--pending_diff_zeros;
} else {
pending_diff_zeros = ReadVarUint<uint32_t>(diff_skips);
diff_byte = ReadPrimitiveFromSource<uint8_t>(diff_bytes);
}
uint8_t byte = old_position[i] + diff_byte;
WriteToSink(new_sink, byte);
}
old_position += copy_count;
// Copy bytes from the extra block.
if (extra_count > static_cast<size_t>(extra_end - extra_position))
return UNEXPECTED_ERROR;
new_sink.Write(extra_position, extra_count);
extra_position += extra_count;
// "seek" forwards (or backwards) in oldfile.
// Compare offsets rather than forming a possibly out-of-range pointer.
if (seek_adjustment < old_start - old_position ||
seek_adjustment > old_end - old_position)
return UNEXPECTED_ERROR;
old_position += seek_adjustment;
}
if (control_stream_copy_counts.Size() > 0 ||
control_stream_extra_counts.Size() > 0 ||
control_stream_seeks.Size() > 0 ||
diff_skips.Size() > 0 ||
diff_bytes.Size() > 0 ||
extra_bytes.Size() > 0)
{
return UNEXPECTED_ERROR;
}
if (cancellable.IsCancelled())
return CANCELLED;
return OK;
}
} // namespace bsdiff
#endif // COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_H_
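
The triple-based patch format described in the comments above can be sketched on plain in-memory buffers. This is a simplified illustration under stated assumptions, not the code path of ApplyBinaryPatch: the Triple struct and ApplyTriples function are hypothetical, the streams are already decoded into vectors, and the diff stream is dense (one byte per copied byte) rather than stored as skip counts plus non-zero bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified in-memory model of one control triple.
struct Triple
{
  uint32_t copy;   // Bytes to take from |old|, each added to a diff byte.
  uint32_t extra;  // Bytes to take verbatim from the extra stream.
  int32_t seek;    // Adjustment of the position in |old| afterwards.
};

// Rebuilds the new buffer into |out|. Returns false on malformed input.
// Unlike the real format, |diff| is dense: one byte per copied byte.
inline bool ApplyTriples(std::vector<uint8_t> const & old,
                         std::vector<Triple> const & triples,
                         std::vector<uint8_t> const & diff,
                         std::vector<uint8_t> const & extra,
                         std::vector<uint8_t> & out)
{
  std::size_t oldPos = 0, diffPos = 0, extraPos = 0;
  for (Triple const & t : triples)
  {
    // 'copy': add old bytes and diff bytes modulo 256.
    if (t.copy > old.size() - oldPos || t.copy > diff.size() - diffPos)
      return false;
    for (uint32_t i = 0; i < t.copy; ++i)
      out.push_back(static_cast<uint8_t>(old[oldPos + i] + diff[diffPos + i]));
    oldPos += t.copy;
    diffPos += t.copy;

    // 'extra': append fresh bytes that have no counterpart in |old|.
    if (t.extra > extra.size() - extraPos)
      return false;
    auto const first = extra.begin() + static_cast<std::ptrdiff_t>(extraPos);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(t.extra));
    extraPos += t.extra;

    // 'seek': validate with integer offsets before moving.
    int64_t const next = static_cast<int64_t>(oldPos) + t.seek;
    if (next < 0 || next > static_cast<int64_t>(old.size()))
      return false;
    oldPos = static_cast<std::size_t>(next);
  }
  // Every stream must be consumed exactly.
  return diffPos == diff.size() && extraPos == extra.size();
}
```

For example, turning "abcdef" into "abXdZZef" takes two triples: copy 4 bytes with one non-zero diff byte ('X' - 'c'), add the 2 extra bytes "ZZ", then copy the remaining 2 bytes unchanged.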


@ -0,0 +1,75 @@
#ifndef COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_HEADER_H_
#define COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_HEADER_H_
#include "coding/reader.hpp"
#include "coding/varint.hpp"
#include <cstdint>
#include <cstring>
#include <string>
namespace bsdiff {
// The following declarations are common to the patch-creation and
// patch-application code.
int constexpr kNumStreams = 6;
enum BSDiffStatus {
OK = 0,
MEM_ERROR = 1,
CRC_ERROR = 2,
READ_ERROR = 3,
UNEXPECTED_ERROR = 4,
WRITE_ERROR = 5,
CANCELLED = 6,
};
// The patch stream starts with a MBSPatchHeader.
typedef struct MBSPatchHeader_ {
char tag[8]; // Contains MBS_PATCH_HEADER_TAG.
uint32_t slen; // Length of the file to be patched.
uint32_t scrc32; // CRC32 of the file to be patched.
uint32_t dlen; // Length of the result file.
} MBSPatchHeader;
// This is the value for the tag field. Must match length exactly, not counting
// null at end of string.
#define MBS_PATCH_HEADER_TAG "GBSDIF42"
template <typename Sink>
void WriteHeader(Sink & sink, MBSPatchHeader* header) {
sink.Write(header->tag, sizeof(header->tag));
WriteVarUint(sink, header->slen);
WriteVarUint(sink, header->scrc32);
WriteVarUint(sink, header->dlen);
}
template <typename Source>
BSDiffStatus MBS_ReadHeader(Source & src, MBSPatchHeader* header) {
src.Read(header->tag, sizeof(header->tag));
header->slen = ReadVarUint<uint32_t>(src);
header->scrc32 = ReadVarUint<uint32_t>(src);
header->dlen = ReadVarUint<uint32_t>(src);
// The string will have a NUL terminator that we don't use, hence '-1'.
static_assert(sizeof(MBS_PATCH_HEADER_TAG) - 1 == sizeof(header->tag),
"MBS_PATCH_HEADER_TAG must match header field size");
if (memcmp(header->tag, MBS_PATCH_HEADER_TAG, 8) != 0)
return UNEXPECTED_ERROR;
return OK;
}
inline std::string DebugPrint(BSDiffStatus status) {
switch (status) {
case OK: return "OK";
case MEM_ERROR: return "MEM_ERROR";
case CRC_ERROR: return "CRC_ERROR";
case READ_ERROR: return "READ_ERROR";
case UNEXPECTED_ERROR: return "UNEXPECTED_ERROR";
case WRITE_ERROR: return "WRITE_ERROR";
case CANCELLED: return "CANCELLED";
}
return "Unknown status";
}
} // namespace bsdiff
#endif // COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_HEADER_H_
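
The '-1' in the static_assert above exists because sizeof applied to a string literal counts the terminating NUL. A small sketch of the same check follows; kExampleTag, kTagFieldSize and HasExampleTag are hypothetical names chosen for this example and are not part of the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tag and field size, chosen only for this example.
constexpr char kExampleTag[] = "EXAMPLE1";
constexpr std::size_t kTagFieldSize = 8;

// sizeof counts the terminating NUL, so the visible length is sizeof - 1.
static_assert(sizeof(kExampleTag) == 9, "8 characters plus the NUL terminator");
static_assert(sizeof(kExampleTag) - 1 == kTagFieldSize,
              "tag must fill the field exactly");

// Compares a fixed-size, non-terminated tag field against the expected tag.
inline bool HasExampleTag(char const * field)
{
  return std::memcmp(field, kExampleTag, kTagFieldSize) == 0;
}
```

Because the field is not NUL-terminated, memcmp with an explicit length is used instead of strcmp.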


@ -0,0 +1,97 @@
// Copyright 2003, 2004 Colin Percival
// All rights reserved
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted providing that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
// IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
// DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
// OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
// IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//
// For the terms under which this work may be distributed, please see
// the adjoining file "LICENSE".
//
// ChangeLog:
// 2005-05-05 - Use the modified header struct from bspatch.h; use 32-bit
// values throughout.
// --Benjamin Smedberg <benjamin@smedbergs.us>
// 2015-08-03 - Change search() to template to allow PagedArray usage.
// --Samuel Huang <huangs@chromium.org>
// 2015-08-19 - Optimized search() to be non-recursive.
// --Samuel Huang <huangs@chromium.org>
// 2016-06-28 - Moved matchlen() and search() to a new file; format; changed
// search() to use std::lexicographical_compare().
// 2016-06-30 - Changed matchlen() input; changed search() to return struct.
// --Samuel Huang <huangs@chromium.org>
// Copyright 2016 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
#ifndef COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_SEARCH_H_
#define COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_SEARCH_H_
#include <algorithm>
namespace bsdiff {
// Return values of search().
struct SearchResult {
SearchResult(int pos_in, int size_in) : pos(pos_in), size(size_in) {}
int pos;
int size;
};
// Similar to ::memcmp(), but assumes equal |size| and returns match length.
inline int matchlen(const unsigned char* buf1,
const unsigned char* buf2,
int size) {
for (int i = 0; i < size; ++i)
if (buf1[i] != buf2[i])
return i;
return size;
}
// Finds a suffix in |srcbuf| that has the longest common prefix with |keybuf|,
// aided by suffix array |sa| of |srcbuf|. Returns a SearchResult holding the
// match length in |size| and a position of a best match in |pos|. If multiple
// such positions exist, |pos| is an arbitrary one of them.
template <class T>
SearchResult search(const T & sa,
const unsigned char* srcbuf,
int srcsize,
const unsigned char* keybuf,
int keysize) {
int lo = 0;
int hi = srcsize;
while (hi - lo > 1) {
int mid = (lo + hi) / 2;
if (std::lexicographical_compare(
srcbuf + sa[mid], srcbuf + srcsize, keybuf, keybuf + keysize)) {
lo = mid;
} else {
hi = mid;
}
}
int x = matchlen(srcbuf + sa[lo], keybuf, std::min(srcsize - sa[lo], keysize));
int y = matchlen(srcbuf + sa[hi], keybuf, std::min(srcsize - sa[hi], keysize));
return (x > y) ? SearchResult(sa[lo], x) : SearchResult(sa[hi], y);
}
} // namespace bsdiff
#endif // COURGETTE_THIRD_PARTY_BSDIFF_BSDIFF_SEARCH_H_
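
The binary search over a suffix array used by search() can be tried in isolation with the sketch below. It is a simplified illustration under stated assumptions: the suffix array is built naively by sorting, with the empty suffix placed first, as a stand-in for the divsufsort call, and Match, BuildSuffixArrayNaive, MatchLen and FindLongestMatch are hypothetical names operating on std::string rather than raw byte buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

struct Match
{
  int pos;   // Start of the best match in the text.
  int size;  // Length of the common prefix with the key.
};

// Naive suffix array: sorts all suffix start offsets, including the empty
// suffix at offset s.size(), which sorts first. Fine for small inputs only.
inline std::vector<int> BuildSuffixArrayNaive(std::string const & s)
{
  std::vector<int> sa(s.size() + 1);
  std::iota(sa.begin(), sa.end(), 0);
  std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
    return s.compare(static_cast<std::size_t>(a), std::string::npos, s,
                     static_cast<std::size_t>(b), std::string::npos) < 0;
  });
  return sa;
}

// Length of the common prefix of the suffix at |pos| and |key|.
inline int MatchLen(std::string const & text, int pos, std::string const & key)
{
  int const limit = std::min(static_cast<int>(text.size()) - pos,
                             static_cast<int>(key.size()));
  int i = 0;
  while (i < limit && text[static_cast<std::size_t>(pos + i)] == key[static_cast<std::size_t>(i)])
    ++i;
  return i;
}

// Narrows [lo, hi] until the two neighbouring suffixes bracket |key| in
// sorted order, then keeps whichever shares the longer prefix with it.
inline Match FindLongestMatch(std::vector<int> const & sa, std::string const & text,
                              std::string const & key)
{
  int lo = 0;
  int hi = static_cast<int>(text.size());
  while (hi - lo > 1)
  {
    int const mid = (lo + hi) / 2;
    // True when the suffix at sa[mid] sorts strictly before |key|.
    if (text.compare(static_cast<std::size_t>(sa[static_cast<std::size_t>(mid)]),
                     std::string::npos, key) < 0)
      lo = mid;
    else
      hi = mid;
  }
  int const x = MatchLen(text, sa[static_cast<std::size_t>(lo)], key);
  int const y = MatchLen(text, sa[static_cast<std::size_t>(hi)], key);
  return x > y ? Match{sa[static_cast<std::size_t>(lo)], x}
               : Match{sa[static_cast<std::size_t>(hi)], y};
}
```

The longest common prefix with the key is always found at one of the two suffixes adjacent to the key's position in sorted order, which is why only sa[lo] and sa[hi] need to be compared at the end.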


@ -0,0 +1,7 @@
project(bsdiff_tests)
set(SRC bsdiff_search_tests.cpp)
omim_add_test(${PROJECT_NAME} ${SRC})
target_link_libraries(${PROJECT_NAME} bsdiff)


@ -0,0 +1,135 @@
#include "testing/testing.hpp"
#include "base/macros.hpp"
#include <cstring>
#include <string>
#include <vector>
#include "3party/bsdiff-courgette/bsdiff/bsdiff_search.h"
#include "3party/bsdiff-courgette/divsufsort/divsufsort.h"
using namespace std;
// Adapted from 3party/bsdiff-courgette.
UNIT_TEST(BSDiffSearchTest_Search)
{
// Initialize main string and the suffix array.
// Positions: 000000000011111111111222222222333333333344444
// 012345678901234567890123456789012345678901234
string const str = "the quick brown fox jumps over the lazy dog.";
int const size = static_cast<int>(str.size());
auto buf = reinterpret_cast<unsigned char const *>(str.data());
vector<divsuf::saidx_t> suffix_array(size + 1);
divsuf::divsufsort_include_empty(buf, suffix_array.data(), size);
// Specific queries.
struct
{
int m_expMatchPos; // -1 means "don't care".
int m_expMatchSize;
string m_query_str;
} const testCases[] = {
// Entire string: exact and unique.
{0, 44, "the quick brown fox jumps over the lazy dog."},
// Empty string: exact and non-unique.
{-1, 0, ""},
// Exact and unique suffix matches.
{43, 1, "."},
{31, 13, "the lazy dog."},
// Exact and unique non-suffix matches.
{4, 5, "quick"},
{0, 9, "the quick"}, // Unique prefix.
// Partial and unique matches.
{16, 10, "fox jumps with the hosps"}, // Unique prefix.
{18, 1, "xyz"},
// Exact and non-unique match: take lexicographically first.
{-1, 3, "the"}, // Non-unique prefix.
{-1, 1, " "},
// Partial and non-unique match: no guarantees on |match.pos|!
{-1, 4, "the apple"}, // query < "the l"... < "the q"...
{-1, 4, "the opera"}, // "the l"... < query < "the q"...
{-1, 4, "the zebra"}, // "the l"... < "the q"... < query
// Prefix match dominates suffix match (unique).
{26, 5, "over quick brown fox"},
// Empty matches.
{-1, 0, ","},
{-1, 0, "1234"},
{-1, 0, "THE QUICK BROWN FOX"},
{-1, 0, "(the"},
};
for (size_t idx = 0; idx < ARRAY_SIZE(testCases); ++idx)
{
auto const & testCase = testCases[idx];
int const querySize = static_cast<int>(testCase.m_query_str.size());
auto query_buf = reinterpret_cast<unsigned char const *>(testCase.m_query_str.data());
// Perform the search.
bsdiff::SearchResult const match =
bsdiff::search<decltype(suffix_array)>(suffix_array, buf, size, query_buf, querySize);
// Check basic properties and match with expected values.
TEST_GREATER_OR_EQUAL(match.size, 0, ());
TEST_LESS_OR_EQUAL(match.size, querySize, ());
if (match.size > 0)
{
TEST_GREATER_OR_EQUAL(match.pos, 0, ());
TEST_LESS_OR_EQUAL(match.pos, size - match.size, ());
TEST_EQUAL(0, memcmp(buf + match.pos, query_buf, match.size), ());
}
if (testCase.m_expMatchPos >= 0)
{
TEST_EQUAL(testCase.m_expMatchPos, match.pos, ());
}
TEST_EQUAL(testCase.m_expMatchSize, match.size, ());
}
}
// Adapted from 3party/bsdiff-courgette.
UNIT_TEST(BSDiffSearchTest_SearchExact)
{
string const testCases[] = {
"a",
"aa",
"az",
"za",
"aaaaa",
"CACAO",
"banana",
"tobeornottobe",
"the quick brown fox jumps over the lazy dog.",
"elephantelephantelephantelephantelephant",
"011010011001011010010110011010010",
};
for (size_t idx = 0; idx < ARRAY_SIZE(testCases); ++idx)
{
int const size = static_cast<int>(testCases[idx].size());
unsigned char const * const buf =
reinterpret_cast<unsigned char const *>(testCases[idx].data());
vector<divsuf::saidx_t> suffix_array(size + 1);
divsuf::divsufsort_include_empty(buf, suffix_array.data(), size);
// Test exact matches for every non-empty substring.
for (int lo = 0; lo < size; ++lo)
{
for (int hi = lo + 1; hi <= size; ++hi)
{
string query(buf + lo, buf + hi);
int querySize = static_cast<int>(query.size());
CHECK_EQUAL(querySize, hi - lo, ());
unsigned char const * const query_buf =
reinterpret_cast<unsigned char const *>(query.c_str());
bsdiff::SearchResult const match =
bsdiff::search<decltype(suffix_array)>(suffix_array, buf, size, query_buf, querySize);
TEST_EQUAL(querySize, match.size, ());
TEST_GREATER_OR_EQUAL(match.pos, 0, ());
TEST_LESS_OR_EQUAL(match.pos, size - match.size, ());
string const suffix(buf + match.pos, buf + size);
TEST_EQUAL(suffix.substr(0, querySize), query, ());
}
}
}
}
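
The expected match sizes in the first test can be cross-checked against a brute-force reference. The helper below is a hypothetical addition for illustration and is not part of the test suite above; it scans every start position, so it is only suitable for short strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reference helper: returns the length of the longest prefix
// of |query| that occurs anywhere in |text|, by trying every start offset.
inline std::size_t LongestPrefixMatch(std::string const & text, std::string const & query)
{
  std::size_t best = 0;
  for (std::size_t start = 0; start < text.size(); ++start)
  {
    std::size_t len = 0;
    while (start + len < text.size() && len < query.size() &&
           text[start + len] == query[len])
    {
      ++len;
    }
    best = std::max(best, len);
  }
  return best;
}
```

Such an oracle only validates the match size; the match position may legitimately differ when several positions share the same longest prefix.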