Product in Terms of Summation

Posted July 9, 2012 by dashmorethan
Categories: Mathematics

A useful identity for those whose calculators lack an easy way to do products, but can use summation.

d^{\left(\sum\limits_{a=b}^c \log_d(f(a))\right)} = \prod\limits_{a=b}^c f(a)

Note that this method works best with

Theorem: d^{\left(\sum\limits_{a=b}^c \log_d(f(a))\right)} = \prod\limits_{a=b}^c f(a)

Proof:

d^{\left(\sum\limits_{a=b}^c \log_d(f(a))\right)} = \prod\limits_{a=b}^c f(a)

d^{\log_d(f(a)) + \log_d(f(a+1)) + \log_d(f(a+2)) + \cdots + \log_d(f(c))} = \prod\limits_{a=b}^c f(a)

d^{\log_d(f(a))} d^{\log_d(f(a+1))} d^{\log_d(f(a+2))} \cdots d^{\log_d(f(c))} = \prod\limits_{a=b}^c f(a)

f(a) f(a+1) f(a+2) \cdots f(c) = \prod\limits_{a=b}^c f(a)

Equal by definition.

And as a direct result (take the \log_d of both sides):

\sum\limits_{a=b}^c \log_d(f(a)) = \log_d\prod\limits_{a=b}^c f(a)

Classifying Strings

Posted April 23, 2012 by dashmorethan
Categories: Programming

Tags: , , ,

Purpose

To determine ways to classify strings into two sets, as either contained by a data set or not contained by a data set.

Living on the fast track? Click for conclusion.

Obvious Solution (Ifs)

Well the most obvious solution is a chain of if's and else if's. It's also the simplest... well, at first.

if (strcmp(input, "Hello") == 0) {
    puts("I like that word.");
} else if (strcmp(input, "World") == 0) {
    puts("I like that word.");
} else {
    puts("I don't know what you said.");
}

However, an understanding of how this works shows why it is a bad idea for a large number of strings. Let's use the above program, for simplicity's sake.

If the input is "Hello", then the code is something like this:

Is the first letter 'H'? Yes.

Is the next letter 'e'? Yes.

Is the next letter 'l'? Yes.

Is the next letter 'l'? Yes.

Is the next letter 'o'? Yes.

Is the last character a '0'? Yes. Return 0.

That is six comparisons to compare the string. If the input is "World", then the code is:

Is the first letter 'H'? No. Return non-zero -> if fails.

Is the first letter 'W'? Yes.

Is the next letter 'o'? Yes.

Is the next letter 'r'? Yes.

Is the next letter 'l'? Yes.

Is the next letter 'd'? Yes.

Is the last character a '0'? Yes. Return 0.

So now, we have 7 comparisons. But what if the input is "foo" (neither "Hello" nor "World")...

Is the first letter 'H'? No. Return non-zero -> if fails.

Is the first letter 'W'? No. Return non-zero -> else if fails.

We're out of options. Execute the else.

The result is very slow for large amounts of strings with large lengths. Below is some personally crafted code that takes an input file and returns an if construct using strcmp. It is obviously very rudimentary, but it was created to illustrate a point.

/* Copyright (C) 2012 <http://dashmorethan.wordpress.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int read_string(FILE *fp, unsigned char **string);

int main(int argc, char **argv) {
    FILE *input;
    unsigned char *in;

    if (argc != 2) {
        printf("usage: %s [input file]\n", argv[0]);
        return 1;
    }

    input = fopen(argv[1], "r");
    if (input == NULL) {
        printf("File %s does not exist (input file).\n", argv[1]);
        return 2;
    }

    if (read_string(input, &in) != 0) {
        fclose(input);
        return 3;
    }
    printf("if (strcmp(input, \"%s\") == 0) {\n", in);

    while (read_string(input, &in) == 0) {
        printf("\telse if (strcmp(input, \"%s\") == 0) {\n\t\t\n}",
            in);
        free(in);
    }

    puts("\telse {\n\t\t\n}");

    fclose(input);

    return 0;
}

int read_string(FILE *fp, unsigned char **string) {
    int count = 0;
    int current = 0;
    *string = NULL;

    while ((unsigned char)current != '\n') {
        current = fgetc(fp);
        if (current == EOF)
            return -2;
        count++;
    }

    fseek(fp, -count, SEEK_CUR);
    *string = malloc(count);
    if (*string == NULL) {
        return -1;
    }

    for (current = 0; current < count; current++) {
        (*string)[current] = fgetc(fp);
    }
    (*string)[count - 1] = '\0';

    return 0;
}

Rolling our own Hash Function

It would be nice if we could do:

switch (input) {
    case "Hello":
        puts("I like that word.");
        break;
    case "World":
        puts("I like that word.");
        break;
    default:
        puts("I don't know what you said.");
        break;
}

But no such thing exists, because the switch construct only accepts integers for cases. The next best thing is a hash function, which would look something like this:

switch (hash(input)) {
    case hash("Hello"):
        puts("I like that word.");
        break;
    case hash("World"):
        puts("I like that word.");
        break;
    default:
        puts("I don't know what you said.");
        break;
}

Unfortunately, this is still unallowed. Even if hash is a function, C will refuse to call it and output the results. However, we could precompute the hashes at run-time, using a pre-determined hash function. Below is a program that uses the sdbm (actually, gawk's implementation of the sdbm hash) and generates code that will hash a string and then create a switch construct for an input input.

/* Copyright (C) 2012 <http://dashmorethan.wordpress.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

uint32_t hash(unsigned char *str);
int read_string(FILE *fp, unsigned char **string);

int main(int argc, char **argv) {
    FILE *input;
    unsigned char *in;

    if (argc != 2) {
        printf("usage: %s [input file]\n", argv[0]);
        return 1;
    }

    input = fopen(argv[1], "r");
    if (input == NULL) {
        printf("File %s does not exist (input file).\n", argv[1]);
        return 2;
    }

    puts("#include <stdint.h>\n\
/* sdbm / gawk hash */\n\
uint32_t hash(unsigned char *str) {\n\
    uint32_t hash = 0;\n\
    int c;\n\
\n\
    while ((c = *str++))\n\
        hash = c + (hash << 6) + (hash << 16) - hash;\n\
    return hash;\n\
}\n\
switch (hash(input)) {\n");

    while (read_string(input, &in) == 0) {
        printf("\tcase %lu: /*%s*/\n\t\t\n\t\tbreak;\n",
            (unsigned long)hash(in), in);
        free(in);
    }

    puts("\tdefault:\n\n\t\tbreak;\n}\n");

    fclose(input);

    return 0;
}

/* sdbm / gawk hash */
uint32_t hash(unsigned char *str) {
    uint32_t hash = 0;
    int c;

    while ((c = *str++))
        hash = c + (hash << 6) + (hash << 16) - hash;

    return hash;
}

int read_string(FILE *fp, unsigned char **string) {
    int count = 0;
    int current = 0;
    *string = NULL;

    while ((unsigned char)current != '\n') {
        current = fgetc(fp);
        if (current == EOF)
            return -2;
        count++;
    }

    fseek(fp, -count, SEEK_CUR);
    *string = malloc(count);
    if (*string == NULL) {
        return -1;
    }

    for (current = 0; current < count; current++) {
        (*string)[current] = fgetc(fp);
    }
    (*string)[count - 1] = '\0';

    return 0;
}

And the output (when our own code is added to it):

#include <stdint.h>
/* sdbm / gawk hash */
uint32_t hash(unsigned char *str) {
    uint32_t hash = 0;
    int c;

    while ((c = *str++))
        hash = c + (hash << 6) + (hash << 16) - hash;
    return hash;
}
switch (hash(input)) {
    /* fall through */
    case 2873473298: /*Hello*/
    case 2052574866: /*World*/
        puts("I like that word.");
        break;
    default:
        puts("I don't know what you said.");
        break;
}

However there is a major problem with this method, hash collisions. If a string could be generated such that it hashes to 2052574866 (for World), then it would appear to be right despite possibly being wrong! To decrease the chances of this happening, we could use a stronger hash function. Alternatively, we may check that the string is actually the one expected. But programs such as our hash creator have already been made, and they are far more professional than anything that you can roll at home.

gperf - the perfect hash generator

gperf is a program that generates a perfect hash function in C code. By perfect, we mean it has no collisions, and it only needs one string comparison. This article will only cover the bare minimum, but I highly recommend you read IBM's excellent guide on the subject. You can produce a gperf for C by just running gperf -L C -C #input file#, where "#input file#" is a list of strings.

Here is our Hello World example in gperf:

/* C code produced by gperf version 3.0.3 */
/* Command-line: gperf -L C -C hello_world  */
/* Computed positions: -k'1' */

#if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
      && ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
      && (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
      && ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
      && ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
      && ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
      && ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
      && ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
      && ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \
      && ('F' == 70) && ('G' == 71) && ('H' == 72) && ('I' == 73) \
      && ('J' == 74) && ('K' == 75) && ('L' == 76) && ('M' == 77) \
      && ('N' == 78) && ('O' == 79) && ('P' == 80) && ('Q' == 81) \
      && ('R' == 82) && ('S' == 83) && ('T' == 84) && ('U' == 85) \
      && ('V' == 86) && ('W' == 87) && ('X' == 88) && ('Y' == 89) \
      && ('Z' == 90) && ('[' == 91) && ('\\' == 92) && (']' == 93) \
      && ('^' == 94) && ('_' == 95) && ('a' == 97) && ('b' == 98) \
      && ('c' == 99) && ('d' == 100) && ('e' == 101) && ('f' == 102) \
      && ('g' == 103) && ('h' == 104) && ('i' == 105) && ('j' == 106) \
      && ('k' == 107) && ('l' == 108) && ('m' == 109) && ('n' == 110) \
      && ('o' == 111) && ('p' == 112) && ('q' == 113) && ('r' == 114) \
      && ('s' == 115) && ('t' == 116) && ('u' == 117) && ('v' == 118) \
      && ('w' == 119) && ('x' == 120) && ('y' == 121) && ('z' == 122) \
      && ('{' == 123) && ('|' == 124) && ('}' == 125) && ('~' == 126))
/* The character set is not based on ISO-646.  */
error "gperf generated tables don't work with this execution character set. Please report a bug to <bug-gnu-gperf@gnu.org>."
#endif

#define TOTAL_KEYWORDS 2
#define MIN_WORD_LENGTH 5
#define MAX_WORD_LENGTH 5
#define MIN_HASH_VALUE 5
#define MAX_HASH_VALUE 6
/* maximum key range = 2, duplicates = 0 */

#ifdef __GNUC__
__inline
#else
#ifdef __cplusplus
inline
#endif
#endif
static unsigned int
hash (str, len)
     register const char *str;
     register unsigned int len;
{
  static const unsigned char asso_values[] =
    {
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 1, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 0, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7
    };
  return len + asso_values[(unsigned char)str[0]];
}

#ifdef __GNUC__
__inline
#ifdef __GNUC_STDC_INLINE__
__attribute__ ((__gnu_inline__))
#endif
#endif
const char *
in_word_set (str, len)
     register const char *str;
     register unsigned int len;
{
  static const char * const wordlist[] =
    {
      "", "", "", "", "",
      "World",
      "Hello"
    };

  if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
    {
      register int key = hash (str, len);

      if (key <= MAX_HASH_VALUE && key >= 0)
        {
          register const char *s = wordlist[key];

          if (*str == *s && !strcmp (str + 1, s + 1))
            return s;
        }
    }
  return 0;
}

And what do we need to do?

if (in_word_set(input, strlen(input)) {
    puts("I like that word.");
} else {
    puts("I don't know what you said.");
}

Wonderful (we could also change in_word_set to another function with a better name).

Benchmarks

Objective

Given a list of states and territories in the United States of America determine if a certain input is a state or territory or is not a state or territory. (Total Count: 56).

States and Territories

Alabama
Alaska
American Samoa
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
District of Columbia
Florida
Georgia
Guam
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Northern Marianas Islands
Ohio
Oklahoma
Oregon
Pennsylvania
Puerto Rico
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Virgin Islands
Washington
West Virginia
Wisconsin
Wyoming

Results

A file states, that contains 98496 test cases (including some random gibberish for default cases), was piped into three programs, one using if, the other using sdbm, and the last using gperf. They were compiled with the highest optimization settings (gcc -O3). The average for 10 tests are below in seconds:

IF: 0.046
SDBM: 0.017
GPERF: 0.015

The sizes of the programs in bytes (stripped) follows:

10320 state_if
 6216 state_hash
10320 state_gperf
26856 total

Downloads

You can download this as a neat .zip or .tar.gz file below:

Contents:

  • unique_states - a list of all states and territories.
  • states - a large file, that is a bunch of randomly ordered states. Use it as input for benchmarks.
  • if_maker.c - a program that creates an if structure based on a file (string1\nstring2\nstring3 ...)
  • state_if.c - a program that uses if_maker's generated structure as a benchmark.
  • hash_maker.c - a program that creates a switch structure based on a file (string1\nstring2\nstring3 ...)
  • state_hash.c - a program that uses hash_maker's generated structure as a benchmark.
  • gperf_hash.h - a gperf's generated hash function that determines if a string is a state.
  • state_gperf.c - a program that uses "gperf_hash.h" as a benchmark.

Conclusion

Speed (seconds) Size (bytes)
If 0.046s 10320
SDBM 0.017s 6216
GPERF 0.015s 10320

There is no reason to use if, it is not only three times as slow as the other implementations, but it is also the largest resulting size. The home-made hash brings the smallest file, and is second place in speed, but users should beware of hash collisions. Ultimately, gperf is the best, removing collisions and also clocking in at the highest time, its only downfall being size.

Arbitrary Data Sizes with C

Posted March 26, 2012 by dashmorethan
Categories: Programming

Tags: ,

Arbitrary Data Types in C

C includes an abundant amount of data types. It also allows you to create your own data types by using structs.

But the largest amount of precision C provides as a data type is a byte. So how can we get smaller data types, such as the 4-bit long nibble?

This is no theory question: the nibble certainly has some application. In situations where a full byte is not needed or RAM is tight, 2 nibbles fit in the same amount of space as a byte. On desktop computers, this technique is simply a novelty. On microcontrollers, this could be a very important optimization. The nibble and similar small-sized data types are also used in networking.

Bitfield Solution

Most of the suggestions I've read include this:

    struct t {
        unsigned char s : 4; // 4-bit long s
        unsigned char b : 4; // 4-bit long b
    };

This method has one major problem, memory alignment, when the compiler will pad (add zeros at the end or between elements of a structure) the struct to fit into a word (or into a multiple of the largest element of a structure, because the definition of how large to pad the structure varies as well). Although access is faster when the data type is padded by the compiler into a word, this may present problems where keeping RAM usage to an absolute minimum is important. The problem is that memory alignment is hard to control. Some compilers offer a packed struct, where no memory alignment can be used (__attribute__((packed)) or compile with -fpack-struct for gcc). This solution is compiler dependent. In addition, the size of a word varies from computer to computer, so trying to make sure your structs fit in a word all the time is not entirely possible. Modern compilers can usually optimize this to the point where it's relatively fast, but a decent amount of the time the code will be slow. Bitfields cannot be used with arrays, so access time is not O(1).

This is not to say that this method doesn't have it's merits, such as ease of use and the fact that the compiler can optimize it easily.

Ribble Solution

The advantage of the bitfield solution should not be overlooked: it's mostly native. There are no function calls involved and it can be implanted in very little developer-visible code. These were some of the things I wanted to include in my alternative.

I made a new solution. I call it the ribble (phonetically: Romeo India Bravo Bravo Lima Echo). It is basically just bitshifting and masking joined in a macro. I am sure someone out there has done it before.

    #define ribble_get(storage, rept, size, offset) (((storage) >> ((rept) * (size) + (offset))) & ((1 << (size)) - 1))

This can seem quite awkward at first, so let us break it down step by step.

  • storage - the data type which will contain the smaller ribble. It can be any size, but int is recommended because int is usually the size of a word.
  • rept - indicates that the rept^th ribble should be the one read.
  • size - the size of the ribble.
  • offset - the offset that the ribble should be read from.

First, the storage is shifted the appropriate amount of bytes. This is basically the number of ribbles times the size of the ribbles plus the offset. If the ribbles are of varying lengths, the number of ribbles should be set to 0 and the offset should be used to represent distance. In order to remove the other unimportant values (the other ribbles which were not removed by shifting), the ribble is bitwise-and'd with ((1 << (size)) - 1). This is 1 shifted to the left size spaces, and then subtracted by one (so 1 << 4 = 0b10000, 0b10000 - 0b00001 = 0b01111). The result is just the ribble we wanted!

This allows us to get a ribble. In order to set a ribble, you would use:

    #define ribble_set(storage, rept, size, offset, set) ((storage) = ((storage) & ~(((1 << (size)) - 1) << ((rept) * (size) + (offset)))) | ((set) << ((rept) * (size) + (offset))))

That is more complicated than our get method. Note that we added a new variable, set, which is what we want to set the ribble to.

A diagram might aid in understanding this:

t = 0b0110
        ^^ overall objective: set that to 01
        (storage) = ((storage)
t & 0b1100 = 0b0110 & 0b1100 = 0b0100
        ^^ create a mask such that everything but those two are set to 0.
        & ~((1 << ((rept * (size) + (offset)))) - 1))
0b0100 | 0b0001 = 0b0101
             ^^ create a mask such that everything is 0 except the part we want to set
             | (((set) << ((rept) * (size) + (offset))))

Well, that is essentially the ribble!


Note that this method still requires you to create a byte if you want just one nibble. The advantage is that most of the time you know that you're just using a byte and you have complete control over that, unlike structs which can pack to an amount you might not know.

The Offset Only Ribble

Originally when creating the ribble, I planned to have a uniform size for every element. That doesn't seem like a good idea now, especially because it relies on the compiler to calculate the constants at compile time, otherwise a significant speed penalty will occur. I have made an offset only version of the two ribble commands so you can fine tune exactly what you want to access.

    #define ribble_get(storage, size, offset) (((storage) >> (offset)) & ((1 << (size)) - 1))

    #define ribble_set(storage, size, offset, set) ((storage) = ((storage) & ~(((1 << (size)) - 1) << (offset))) | ((set) << (offset)))

Benchmarks

I have created benchmarks for the three methods: Bitfield and Ribble. The source code and results follow. I do know there is a bias against bitfield.c (when we use a conditional) here, because if you really wanted to access the nth element you would either use a switch or an if.

Be sure to pass an argument. Each of these programs take an argument of which nibble to set [0 or 1]. You should be sure not to compile with optimization as the compiler will remove the dead code (in a real world application it would work fine).

ribble.c:

#define ribble_set(storage, rept, size, offset, set) (storage) = ((storage) & ~(((1 << (size)) - 1) << ((rept) * (size) + (offset)))) | ((set) << ((rept) * (size) + (offset)))
#define ribble_get(storage, rept, size, offset) ((storage) >> ((rept) * (size) + (offset))) & ((1 << (size)) - 1)

#include <stdlib.h>

int main(int argc, char **argv) {
    unsigned char t = 0;
    int which = iota(argv[1]);
    for (loop_count = 0; loop_count <= 1000000000; loop_count++) {
        ribble_set(t, which, 4, 0, 5);
        ribble_get(t, which, 4, 0);
    }
    return 0;
}

bitfield.c:

#include <stdlib.h>

struct t {
    unsigned char s : 4;
    unsigned char r : 4;
};

int main(int argc, char **argv) {
    struct t t = {.s = 4, .r = 15};
    int which = iota(argv[1]);
    unsigned long long loop_count;

    for (loop_count = 0; loop_count <= 1000000000; loop_count++) {
        if (which == 0) {
            t.s = 5;
            t.s;
        } else {
            t.r = 5;
            t.r;
        }
    }
    return 0;
}

offset_only_ribble.c:

#define ribble_get(storage, size, offset) ((storage) >> (offset)) & ((1 << (size)) - 1)

#define ribble_set(storage, size, offset, set) (storage) = ((storage) & ~(((1 << (size)) - 1) << (offset))) | ((set) << (offset))

#include <stdlib.h>

int main(int argc, char **argv) {
    unsigned char t = 0;
    int which = atoi(argv[1]) * 4;
    unsigned long long loop_count;

    for (loop_count = 0; loop_count <= 1000000000; loop_count++) {
        ribble_set(t, which, 4, 0, 5);
        ribble_get(t, 4, which);
    }
    return 0;
}

Benchmark Results

End results are accurate to the nearest millisecond. Tests were run 10 times each with no processes being created nor killed while the test was running.

ribble.c : 2932 milliseconds.
bitfield.c : 3243 milliseconds.
ribble_offset_only : 2924 milliseconds.

When using gcc, ribble_offset_only produces the same assembly output as ribble.c. This is because gcc pre-calculates the values. You would be hardpressed to find a compiler that doesn't, but ribble_offset_only is included just in case.

Conclusion

The ribble is a very powerful tool that can be used for the micromanagement of data types. However, it's a little confounded. Unless you really, desperately need the RAM or crave the micro-sized increase in speed, stay away from the ribble. It will do you more harm than good.

Known Bugs / Disadvantages / Errata

You can't create a ribble to the size of the variable that contains it without getting an overflow warning / an overflow. It's due to the (1 << size) - 1 code, if it shifts 9 slots (if the maximum is 8) you get 0b100000000 (9 slots) and then you go back to 0b1111111. This isn't a real problem because if you want to use the variable size anyway, you shouldn't be using a ribble.

I feel as if the ribble_set function could be made faster. I haven't looked into it much but it seems to take a lot longer than ribble_get. If anybody has any idea on this, I would appreciate the feedback.

A previous version of this post used 0b# for binary numbers. This worked in gcc but produced problems for other compilers.

The ribble is not thread safe.

I made the article (hopefully) easier to read.

I don't know any other bugs, or problems, with the ribble. Corrections are welcome and encouraged.

To Recap / tl;dr

    #define ribble_get(storage, size, offset) (((storage) >> (offset)) & ((1 << (size)) - 1))

    #define ribble_set(storage, size, offset, set) ((storage) = ((storage) & ~(((1 << (size)) - 1) << (offset))) | ((set) << (offset)))

ribble_get: read the bits in storage, with a size of size and an offset of offset

ribble_set: set the bits in storage to the bits that are set in set, with a size of size, an offset of offset.


Follow

Get every new post delivered to your Inbox.