stringsparser

package module
v0.0.2
Warning

This package is not in the latest version of its module.

Published: Jan 24, 2026 License: MIT Imports: 5 Imported by: 1

README

go-strings-parser

strings.Split on steroids

A flexible Go library for parsing strings with support for separators, quoting, escaping, and character set validation.

Features

  • Configurable separators - Support for multiple separator runes (spaces, tabs, commas, or custom)
  • Quote handling - Single and double quote support for preserving spaces and special characters
  • Escape sequences - Configurable escape processing (\n, \t, \\, \", \', etc.)
  • Empty element control - Choose whether to preserve or skip empty elements
  • Character set validation - Predefined charsets for POSIX paths, Windows paths, and alphanumeric text
  • Custom processing - Apply custom transformation functions to parsed elements
  • Detailed errors - Error reporting with character positions for debugging

Installation

go get github.com/4nd3r5on/go-strings-parser

Quick Start

import sp "github.com/4nd3r5on/go-strings-parser"

// Basic parsing with default options
result, err := sp.Parse("foo bar 'baz qux'")
// Returns: ["foo", "bar", "baz qux"]

// Parse with quotes
result, err = sp.Parse("hi 'hi hi' hello")
// Returns: ["hi", "hi hi", "hello"]

// Parse with escaped quotes
result, err = sp.Parse("hi \"hi \\\" hi\" hello")
// Returns: ["hi", "hi \" hi", "hello"]

Usage Examples

Custom Separators
// Use comma as separator
result, err := sp.Parse("apple,orange,banana", 
    sp.WithSeparators(','))
// Returns: ["apple", "orange", "banana"]
Allow Empty Elements
// Preserve empty elements between separators
result, err := sp.Parse("a   b", 
    sp.WithAllowEmpty(true))
// Returns: ["a", "", "", "b"]

// Skip empty elements (default behavior)
result, err = sp.Parse("a   b",
    sp.WithAllowEmpty(false))
// Returns: ["a", "b"]
Windows Path Validation
// Valid Windows path
result, err := sp.Parse("C:\\Users\\Documents\\file.txt", 
    sp.WithWindowsPath())

// Invalid path (contains illegal characters < >)
result, err = sp.Parse("C:\\user\\invalid<>\\file.txt",
    sp.WithWindowsPath())
// Returns error
Custom Processing Functions
// Trim whitespace from each element
result, err := sp.Parse("'fine' ' clean space' '  clean both spaces  '", 
    sp.WithProcessFunc(func(s string) (string, bool, error) {
        return strings.TrimSpace(s), false, nil
    }),
    sp.WithAllowEmpty(false))
// Returns: ["fine", "clean space", "clean both spaces"]

// Skip elements based on condition
result, err = sp.Parse("apple banana cherry",
    sp.WithProcessFunc(func(s string) (string, bool, error) {
        if s == "banana" {
            return "", true, nil // skip this element
        }
        return s, false, nil
    }))
// Returns: ["apple", "cherry"]

// Convert to uppercase
result, err = sp.Parse("hello world",
    sp.WithProcessFunc(func(s string) (string, bool, error) {
        return strings.ToUpper(s), false, nil
    }))
// Returns: ["HELLO", "WORLD"]
Custom Rune Processing
// Custom escape replacement
customEscapes := map[rune]rune{
    'n': '\n',
    't': '\t',
    's': ' ',  // \s becomes space
}

result, err := sp.Parse("hello\\sworld\\n", 
    sp.WithProcessRuneFuncs(
        sp.NewReplaceEscapedFunc(customEscapes),
    ))
// Returns: ["hello world\n"]

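// Combine multiple rune processors: keep the default escape processing
// and add charset validation. posixCharset is defined here for illustration.
posixCharset := sp.NewCharset(
    []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-/"), true)
result, err = sp.Parse("path/to/file",
    sp.AppendProcessRuneFuncs(
        sp.NewCharsetProcessRuneFunc(posixCharset),
    ))
// Default escape processing + charset validation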

How It Works

The parser processes input strings character by character through a state machine:

  1. Quote Detection - Recognizes single (') and double (") quotes to group elements
  2. Escape Handling - Processes backslash escapes (\n, \t, \", etc.)
  3. Separator Recognition - Splits elements at configured separator characters
  4. Character Processing - Applies rune-level transformations via ProcessRuneFunc
  5. Element Processing - Applies string-level transformations via ProcessFunc
  6. Validation - Checks character sets if configured
Parsing Rules
  • Quotes must appear at the start of an element (after a separator or at the beginning)
  • Escaped characters are processed according to configured rules
  • Separators inside quotes are treated as literal characters
  • Empty elements between separators can be preserved or skipped
  • Validation is strict: an unclosed quote, a trailing backslash with nothing to escape, or a quote in the middle of an unquoted token is reported as an error

Configuration Options

Default Configuration
DefaultSeparators = []rune{' ', '\t', ','}

DefaultReplaceEscaped = map[rune]rune{
    'n': '\n',
    'r': '\r',
    't': '\t',
    '0': '\000',
}

DefaultProcessRuneFuncs = []ProcessRuneFunc{
    NewReplaceEscapedFunc(DefaultReplaceEscaped),
}
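To make the default escape table concrete, here is a hypothetical stand-in for `NewReplaceEscapedFunc`, written against the documented `ProcessRuneFunc` signature. It assumes that an escaped rune missing from the map (such as `\"` or `\\`) is passed through unchanged, which is what the Quick Start examples suggest; `newReplaceEscaped` is an illustrative name, not the package's code.

```go
package main

import "fmt"

// processRuneFunc mirrors the documented ProcessRuneFunc signature.
type processRuneFunc func(idx int, current string, escaped bool, r rune) (processed rune, skip bool, err error)

// newReplaceEscaped is a hypothetical stand-in for NewReplaceEscapedFunc.
// Escaped runes found in the map are replaced; all other runes pass through.
func newReplaceEscaped(replace map[rune]rune) processRuneFunc {
	return func(_ int, _ string, escaped bool, r rune) (rune, bool, error) {
		if escaped {
			if repl, ok := replace[r]; ok {
				return repl, false, nil
			}
		}
		return r, false, nil
	}
}

func main() {
	defaults := map[rune]rune{'n': '\n', 'r': '\r', 't': '\t', '0': '\000'}
	fn := newReplaceEscaped(defaults)

	r, _, _ := fn(0, "", true, 'n')
	fmt.Printf("%q\n", r) // '\n' (escaped n becomes a newline)
	r, _, _ = fn(0, "", false, 'n')
	fmt.Printf("%q\n", r) // 'n' (unescaped n is untouched)
	r, _, _ = fn(0, "", true, '"')
	fmt.Printf("%q\n", r) // '"' (escaped rune not in the map is kept)
}
```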
Available Options
  • WithSeparators(...rune) - Set custom separator characters
  • WithAllowEmpty(bool) - Control empty element preservation
  • WithProcessFunc(ProcessFunc) - Apply custom transformation to each element
  • WithProcessRuneFuncs(...ProcessRuneFunc) - Set rune-level processing functions (replaces defaults)
  • AppendProcessRuneFuncs(...ProcessRuneFunc) - Add additional rune processors (keeps defaults)
  • WithCharset(*Charset) - Validate against custom character set
  • WithWindowsPath() - Use Windows path charset and validation
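The `With...` and `Append...` options follow Go's functional-options pattern. The sketch below is an assumption about how such options typically work, not the package's source: the lowercase types and functions are hypothetical, and `identity` stands in for the default escape processor. It shows why `WithProcessRuneFuncs` drops the defaults while `AppendProcessRuneFuncs` keeps them.

```go
package main

import "fmt"

type processRuneFunc func(idx int, current string, escaped bool, r rune) (rune, bool, error)

// options is a hypothetical mirror of the documented Options struct.
type options struct {
	separators       []rune
	allowEmpty       bool
	processRuneFuncs []processRuneFunc
}

type option func(*options)

func withSeparators(seps ...rune) option {
	return func(o *options) { o.separators = seps }
}

func withAllowEmpty(allow bool) option {
	return func(o *options) { o.allowEmpty = allow }
}

// withProcessRuneFuncs replaces whatever rune processors were configured.
func withProcessRuneFuncs(fns ...processRuneFunc) option {
	return func(o *options) { o.processRuneFuncs = fns }
}

// appendProcessRuneFuncs keeps the existing processors and adds new ones.
func appendProcessRuneFuncs(fns ...processRuneFunc) option {
	return func(o *options) { o.processRuneFuncs = append(o.processRuneFuncs, fns...) }
}

// identity stands in for the default escape processor.
func identity(_ int, _ string, _ bool, r rune) (rune, bool, error) { return r, false, nil }

// defaultOptions returns a fresh value, so appending never mutates shared defaults.
func defaultOptions() options {
	return options{
		separators:       []rune{' ', '\t', ','},
		processRuneFuncs: []processRuneFunc{identity},
	}
}

// resolve applies options in order on top of the defaults.
func resolve(opts ...option) options {
	o := defaultOptions()
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Println(len(resolve(withProcessRuneFuncs(identity)).processRuneFuncs))   // 1
	fmt.Println(len(resolve(appendProcessRuneFuncs(identity)).processRuneFuncs)) // 2
	fmt.Println(string(resolve(withSeparators(',')).separators))                 // ,
}
```

Building the defaults in a function, rather than appending to a shared package-level slice, keeps one caller's `append` from affecting another caller's configuration; the package documentation does not say how it handles this.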
Function Signatures
// ProcessFunc transforms completed elements
// - processed: the transformed string
// - skip: if true, element is discarded
// - err: processing error
type ProcessFunc func(s string) (processed string, skip bool, err error)

// ProcessRuneFunc processes individual characters
// - idx: character position in original string
// - current: string built so far
// - escaped: whether this rune was escaped with backslash
// - r: the rune to process
type ProcessRuneFunc func(idx int, current string, escaped bool, r rune) (
    processed rune, skip bool, err error)

type Option func(*Options)

Error Handling

The parser provides detailed error messages with character positions:

result, err := sp.Parse("'unclosed quote")
if err != nil {
    // Error: unclosed quote
    log.Fatal(err)
}

result, err = sp.Parse("hello 'there")
if err != nil {
    // Error: unclosed quote
    log.Fatal(err)
}

result, err = sp.Parse("hello\\")
if err != nil {
    // Error: dangling escape
    log.Fatal(err)
}

result, err = sp.Parse("hel'lo")
if err != nil {
    // Error: invalid character ''' at index 3: unexpected unescaped quote inside token
    var invalidCharErr *sp.InvalidCharError
    if errors.As(err, &invalidCharErr) {
        fmt.Printf("Invalid char '%c' at position %d\n", 
            invalidCharErr.Char, invalidCharErr.Index)
    }
}
Error Types
  • ErrUnclosedQuote - Quote opened but not closed
  • ErrDanglingEscape - Backslash at end of input with nothing to escape
  • ErrUnexpectedQuote - Quote appears in middle of unquoted token
  • ErrNotInTheCharset - Character not allowed by configured charset
  • InvalidCharError - Wraps errors with character position information

Common Use Cases

Parsing Command-Line Arguments
args, err := sp.Parse(`command --flag "value with spaces" --another-flag`)
// Returns: ["command", "--flag", "value with spaces", "--another-flag"]
Parsing CSV-Like Data
fields, err := sp.Parse("John,Doe,\"New York, NY\"", 
    sp.WithSeparators(','))
// Returns: ["John", "Doe", "New York, NY"]
Parsing File Paths
paths, err := sp.Parse("/home/user /opt/app '/path/with spaces'")
// Returns: ["/home/user", "/opt/app", "/path/with spaces"]
Building a Command Parser
func parseCommand(input string) ([]string, error) {
    return sp.Parse(input,
        sp.WithSeparators(' ', '\t'),
        sp.WithAllowEmpty(false),
        sp.WithProcessFunc(func(s string) (string, bool, error) {
            // Trim and skip empty
            trimmed := strings.TrimSpace(s)
            if trimmed == "" {
                return "", true, nil
            }
            return trimmed, false, nil
        }),
    )
}

parts, err := parseCommand(`git commit -m "Initial commit"`)
// Returns: ["git", "commit", "-m", "Initial commit"]
Advanced: Conditional Processing
// Parse and validate email addresses (placeholder addresses)
result, err := sp.Parse("Alice@Example.com bob@example.org",
    sp.WithProcessFunc(func(s string) (string, bool, error) {
        if !strings.Contains(s, "@") {
            return "", false, fmt.Errorf("invalid email: %s", s)
        }
        return strings.ToLower(s), false, nil
    }))
// Returns: ["alice@example.com", "bob@example.org"]

Documentation

Overview

Package stringsparser provides a flexible string parsing library with support for separators, quoting, escaping, and character set validation.

The parser splits input strings into elements using configurable separator runes, while respecting quoted sections and processing escape sequences. It supports optional validation against predefined or custom character sets, making it suitable for parsing file paths, command-line arguments, CSV-like data, and other structured text formats.

Key features:

  • Configurable and multiple separator support (spaces, tabs, commas, or custom runes)
  • Single and double quote handling
  • Configurable escape sequences (\n, \t, \\, \", \', etc.)
  • Configurable empty element handling
  • Character set validation with predefined charsets for POSIX paths, Windows paths, and alphanumeric text
  • Case-sensitive or case-insensitive charset matching
  • Detailed error reporting with character positions
  • Custom element processing functions

Basic usage:

result, err := stringsparser.Parse("foo bar 'baz qux'")
// Returns: ["foo", "bar", "baz qux"]

With charset validation:

result, err := stringsparser.Parse(
    "/home/user/file.txt",
    stringsparser.WithWindowsPath(), // validates Windows path format and charset
)


Constants

This section is empty.

Variables

var (
	ErrUnclosedQuote   = errors.New("unclosed quote")
	ErrDanglingEscape  = errors.New("dangling escape")
	ErrUnexpectedQuote = errors.New("unexpected unescaped quote inside token")
	ErrNotInTheCharset = errors.New("character is not in the charset")
)
var CharsetWindowsPath = NewCharset(windowsPathRunes(), true)

CharsetWindowsPath contains valid characters for Windows file paths. Excludes: <>:"/\|?* and control characters (0-31)
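`NewCharset` takes a list of allowed runes and a case-sensitivity flag. One way such a charset could work is sketched below. The map-based lookup, the lowercasing used for case-insensitive matching, and the restriction of the Windows rune list to printable ASCII are all assumptions made for brevity; `newCharset` and `windowsPathRunesASCII` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// charset is a hypothetical stand-in for the package's Charset type.
type charset struct {
	allowed       map[rune]struct{}
	caseSensitive bool
}

// newCharset stores the allowed runes; for case-insensitive sets it stores
// their lowercase forms (an assumption about how matching is done).
func newCharset(chars []rune, caseSensitive bool) *charset {
	cs := &charset{allowed: make(map[rune]struct{}, len(chars)), caseSensitive: caseSensitive}
	for _, r := range chars {
		if !caseSensitive {
			r = unicode.ToLower(r)
		}
		cs.allowed[r] = struct{}{}
	}
	return cs
}

// contains reports whether r is allowed, folding case when configured.
func (cs *charset) contains(r rune) bool {
	if !cs.caseSensitive {
		r = unicode.ToLower(r)
	}
	_, ok := cs.allowed[r]
	return ok
}

// windowsPathRunesASCII lists printable ASCII runes (32-126) minus <>:"/\|?*.
// Control characters 0-31 are excluded by starting the loop at 32.
func windowsPathRunesASCII() []rune {
	var out []rune
	for r := rune(32); r <= 126; r++ {
		if !strings.ContainsRune(`<>:"/\|?*`, r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	win := newCharset(windowsPathRunesASCII(), true)
	fmt.Println(win.contains('a'), win.contains('<'), win.contains('\t')) // true false false

	ci := newCharset([]rune("abc"), false)
	fmt.Println(ci.contains('B'), ci.contains('d')) // true false
}
```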

var DefaultOptions = Options{
	Separators:       DefaultSeparators,
	AllowEmpty:       false,
	ProcessFunc:      nil,
	ProcessRuneFuncs: DefaultProcessRuneFuncs,
}

DefaultOptions contains the default parsing configuration.

var DefaultReplaceEscaped = map[rune]rune{
	'n': '\n',
	'r': '\r',
	't': '\t',
	'0': '\000',
}
var DefaultSeparators = []rune{' ', '\t', ','}

Functions

func Parse

func Parse(str string, opts ...Option) ([]string, error)

Parse splits an input string into elements using a set of separator runes, with support for quoting, escaping, configurable handling of empty elements, and optional character set validation.

Separators define element boundaries unless they appear inside quotes. Multiple different separator runes may be used at once.

Quoted elements may be enclosed in single (') or double (") quotes. Quotes are not included in the output. Quotes must start at the beginning of an element; encountering a quote after other characters results in an error.

Backslash escapes are processed both inside and outside quotes.

If AllowEmpty is false, consecutive separators are treated as a single separator and empty elements are discarded. If AllowEmpty is true, each separator produces a boundary and empty elements are preserved.

If a Charset is configured, all characters (after escape processing) are validated against the allowed character set. Invalid characters result in an InvalidCharError that includes the character and its position.

Errors are returned for dangling escape characters, unclosed quotes, or invalid characters.

Types

type Charset

type Charset struct {
	// contains filtered or unexported fields
}

Charset defines which characters are allowed in parsed elements.

func NewCharset

func NewCharset(chars []rune, caseSensitive bool) *Charset

NewCharset creates a charset from a set of allowed runes.

func (*Charset) Contains

func (cs *Charset) Contains(r rune) bool

Contains checks if a rune is in the charset.

type InvalidCharError

type InvalidCharError struct {
	Err   error
	Char  rune
	Index int
}

InvalidCharError represents an error for an invalid character in the input.

func NewInvalidCharError

func NewInvalidCharError(idx int, char rune, srcErr error) *InvalidCharError

NewInvalidCharError creates a new InvalidCharError.

func (*InvalidCharError) Error

func (e *InvalidCharError) Error() string

func (*InvalidCharError) Unwrap

func (e *InvalidCharError) Unwrap() error

type Option

type Option func(*Options)

Option is a function that modifies parsing options.

func AppendProcessRuneFuncs

func AppendProcessRuneFuncs(fns ...ProcessRuneFunc) Option

func WithAllowEmpty

func WithAllowEmpty(allow bool) Option

WithAllowEmpty sets whether empty elements should be preserved.

func WithCharset

func WithCharset(charset *Charset) Option

WithCharset sets the allowed character set for validation.

func WithProcessFunc

func WithProcessFunc(fn ProcessFunc) Option

WithProcessFunc sets a function to process each element before adding to output.

func WithProcessRuneFuncs

func WithProcessRuneFuncs(fns ...ProcessRuneFunc) Option

func WithSeparators

func WithSeparators(separators ...rune) Option

WithSeparators sets the separator runes.

func WithWindowsPath

func WithWindowsPath() Option

WithWindowsPath makes the parser use the Windows path charset and validate the path with a regular expression.
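The regular expression used by `WithWindowsPath` is not documented. As an illustration only, the hypothetical `windowsPathRe` below accepts absolute drive-letter paths whose components avoid the characters excluded by `CharsetWindowsPath`; it does not handle UNC paths, relative paths, or reserved names such as `CON`.

```go
package main

import (
	"fmt"
	"regexp"
)

// windowsPathRe is an illustrative pattern, not the one used by the library:
// a drive letter, a colon and a backslash, then zero or more components made
// of characters outside <>:"/\|?* and control characters 0-31, separated by
// single backslashes.
var windowsPathRe = regexp.MustCompile(`^[A-Za-z]:\\(?:[^<>:"/\\|?*\x00-\x1f]+\\?)*$`)

func main() {
	for _, p := range []string{
		`C:\Users\Documents\file.txt`,
		`C:\user\invalid<>\file.txt`,
		`/home/user/file.txt`,
	} {
		fmt.Println(p, windowsPathRe.MatchString(p))
	}
	// C:\Users\Documents\file.txt true
	// C:\user\invalid<>\file.txt false
	// /home/user/file.txt false
}
```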

type Options

type Options struct {
	Separators       []rune
	AllowEmpty       bool
	ProcessFunc      ProcessFunc
	ProcessRuneFuncs []ProcessRuneFunc
}

Options configures the string parsing behavior.

type ProcessFunc

type ProcessFunc func(element string) (processed string, skip bool, err error)

ProcessFunc is called for each parsed element before it is added to the output. It receives the element string and returns:

  • processed: the transformed string to use
  • skip: if true, the element is not added to the output
  • err: if non-nil, parsing stops and the error is returned

type ProcessRuneFunc

type ProcessRuneFunc func(idx int, element string, escaped bool, c rune) (processed rune, skip bool, err error)

ProcessRuneFunc is called for each parsed character before it is added to the element. It receives:

  • idx: index of the parsed rune
  • element: the element before the parsed rune is added
  • escaped: whether the character is escaped (preceded by a backslash)
  • c: the parsed rune

It returns:

  • processed: the rune that will be added to the final element
  • skip: if true, the element is not added to the output
  • err: if non-nil, parsing stops and the error is returned (together with the character and its index)

func NewCharsetProcessRuneFunc

func NewCharsetProcessRuneFunc(charset *Charset) ProcessRuneFunc

func NewReplaceEscapedFunc

func NewReplaceEscapedFunc(replaceMap map[rune]rune) ProcessRuneFunc
