diff --git a/.github/workflows/docs-ci.yml b/.github/workflows/docs-ci.yml index 0c06ac6d..3a12f809 100644 --- a/.github/workflows/docs-ci.yml +++ b/.github/workflows/docs-ci.yml @@ -9,7 +9,7 @@ jobs: strategy: max-parallel: 4 matrix: - python-version: [3.9] + python-version: [3.13] steps: - name: Checkout code @@ -20,15 +20,8 @@ jobs: with: python-version: ${{ matrix.python-version }} - - name: Install Dependencies - run: pip install -e .[docs] - - name: Check Sphinx Documentation build minimally - working-directory: ./docs - run: sphinx-build -E -W source build + run: make docs - name: Check for documentation style errors - working-directory: ./docs - run: ./scripts/doc8_style_check.sh - - + run: make check diff --git a/.readthedocs.yml b/.readthedocs.yml index 8ab23688..3028be4c 100644 --- a/.readthedocs.yml +++ b/.readthedocs.yml @@ -7,9 +7,9 @@ version: 2 # Build in latest ubuntu/python build: - os: ubuntu-22.04 + os: ubuntu-24.04 tools: - python: "3.11" + python: "3.13" # Build PDF & ePub formats: diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..92957546 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,91 @@ +# Contributing to AboutCode + +Welcome! We're excited that you're interested in contributing to AboutCode. This document will help you get started with contributing to our projects. + +## Quick Start for New Contributors + +1. **Explore Our Projects**: Browse through our [project list](README.md#projects) to find something that interests you +2. **Find an Issue**: Look for issues labeled `good first issue` or `help wanted` in the repository you're interested in +3. **Join the Community**: Introduce yourself in our [Gitter chat](https://app.gitter.im/#/room/#aboutcode-org_discuss:gitter.im) or on [Slack](https://join.slack.com/t/aboutcode-org/shared_invite/zt-1paqwxccw-IuafuiAvYJFkTqGaZsC1og) +4. 
**Read the Documentation**: Check out our [full documentation](https://aboutcode.readthedocs.io/en/latest/) for detailed guides + +## Communication Channels + +- **Gitter Chat**: [aboutcode-org#discuss](https://app.gitter.im/#/room/#aboutcode-org_discuss:gitter.im) +- **Slack**: [Join our workspace](https://join.slack.com/t/aboutcode-org/shared_invite/zt-1paqwxccw-IuafuiAvYJFkTqGaZsC1og) +- **Weekly Meetings**: We hold weekly community calls. See [meeting minutes](https://github.com/aboutcode-org/aboutcode/wiki/MeetingMinutes) for details + +## How to Contribute + +### Reporting Issues + +If you find a bug or have a feature request: +- Check if the issue already exists +- If not, create a new issue with a clear title and description +- Include steps to reproduce (for bugs) or use cases (for features) + +### Contributing Code + +1. **Fork the repository** you want to contribute to +2. **Create a branch** for your changes: `git checkout -b fix-issue-123` +3. **Make your changes** following the project's coding standards +4. **Write tests** if applicable +5. **Commit your changes** with a clear commit message (see our [commit message guidelines](https://aboutcode.readthedocs.io/en/latest/contributing/writing_good_commit_messages.html)) +6. **Push to your fork** and submit a pull request + +### Contributing Documentation + +Documentation improvements are always welcome! You can: +- Fix typos or clarify existing documentation +- Add examples or tutorials +- Improve the structure and organization + +Documentation is built using Sphinx. 
To build locally: +```bash +make docs +``` + +## Development Guidelines + +### Commit Messages + +We follow specific commit message conventions: +- Use imperative mood ("Add feature" not "Added feature") +- Keep the subject line under 50 characters +- Reference issue numbers when applicable (#123) +- Sign off your commits with `Signed-off-by: Your Name ` + +For detailed guidelines, see our [commit message documentation](https://aboutcode.readthedocs.io/en/latest/contributing/writing_good_commit_messages.html). + +### Testing + +Before submitting a pull request: +- Run existing tests to ensure nothing breaks +- Add new tests for new functionality +- Ensure all tests pass + +See the [testing documentation](https://aboutcode.readthedocs.io/en/latest/contributing/testing.html) for more details. + +## Code of Conduct + +We are committed to providing a welcoming and inclusive environment. Please read our [Code of Conduct](CODE_OF_CONDUCT.rst) before participating. + +## Getting Help + +- **Documentation**: Start with our [ReadTheDocs](https://aboutcode.readthedocs.io/en/latest/) +- **Chat**: Ask questions on [Gitter](https://app.gitter.im/#/room/#aboutcode-org_discuss:gitter.im) or [Slack](https://join.slack.com/t/aboutcode-org/shared_invite/zt-1paqwxccw-IuafuiAvYJFkTqGaZsC1og) +- **Issues**: Open an issue in the relevant repository if you need help with something specific + +## Additional Resources + +- [Full Contributing Guide](https://aboutcode.readthedocs.io/en/latest/contributing.html) +- [Documentation Contribution Guide](https://aboutcode.readthedocs.io/en/latest/contributing/contrib_doc.html) +- [GSoC Information](https://github.com/aboutcode-org/aboutcode/wiki) + +## License + +By contributing to AboutCode, you agree that your contributions will be licensed under the Apache License 2.0. + +--- + +Thank you for contributing to AboutCode! Your efforts help make open source software safer and more transparent for everyone. 
diff --git a/apache-2.0.LICENSE b/LICENSE similarity index 100% rename from apache-2.0.LICENSE rename to LICENSE diff --git a/Makefile b/Makefile index 6970889b..23d40bf7 100644 --- a/Makefile +++ b/Makefile @@ -9,7 +9,10 @@ conf: docs: conf rm -rf docs/build/ - @${ACTIVATE} sphinx-build docs/source docs/build/ + @${ACTIVATE} sphinx-build -E -W docs/source docs/build/ + +check: + @${ACTIVATE} doc8 --max-line-length 100 docs/source/ --ignore D000 --quiet clean: @echo "-> Clean the Python env" diff --git a/README.md b/README.md index 0e73029c..b25bf14c 100644 --- a/README.md +++ b/README.md @@ -1,149 +1,119 @@ # AboutCode -### What is AboutCode? +## What is AboutCode? -AboutCode is a family of FOSS projects to uncover data ... about software: +AboutCode is a family of FOSS projects to uncover metadata about software: - where does the code come from? which software package? - what is its license? copyright? - is the code vulnerable, maintained, well coded? - what are its dependencies, are there vulnerabilities/licensing issues? -All these are questions that are important to answer: there are millions of free -and open source software components available on the web for reuse. +All these are questions that are important to answer: there are millions of free and open source software components available on the web for reuse. -Knowing where a software package comes from, what its license is and whether it -is vulnerable should be a problem of the past such that everyone can safely -consume more free and open source software. We support not only open source -software, but also open data, generated and curated by our applications. +Knowing where a software package comes from, what its license is and whether it is vulnerable should be a problem of the past such that everyone can safely consume more free and open source software. We support not only open source software, but also open data, generated and curated by our applications. 
> [!NOTE] -> This is a repository with information on aboutcode open source -> activities and not the actual code repository. See the -> [projects section](https://github.com/aboutcode-org/aboutcode#projects) below -> for links to all the code repositories of our projects with a brief overview -> and our [wiki](https://github.com/aboutcode-org/aboutcode/wiki) if you are -> looking to participate. +> This is a repository with information on aboutcode open source activities and not the actual code repository. See the [projects section](#projects) below for links to all the code repositories of our projects with a brief overview and our [wiki](https://github.com/aboutcode-org/aboutcode/wiki) if you are looking to participate. -### Documentation Build +## Important Links -![Doc Build](https://github.com/aboutcode-org/aboutcode/actions/workflows/docs-ci.yml/badge.svg) +- **Homepage**: http://aboutcode.org +- **Documentation**: https://aboutcode.readthedocs.io/en/latest/ +- **Chat**: [Gitter](https://app.gitter.im/#/room/#aboutcode-org_discuss:gitter.im) | [Slack](https://join.slack.com/t/aboutcode-org/shared_invite/zt-1paqwxccw-IuafuiAvYJFkTqGaZsC1og) +- **Weekly Meetings**: [Meeting Minutes](https://github.com/aboutcode-org/aboutcode/wiki/MeetingMinutes) +- **GSoC**: [Wiki](https://github.com/aboutcode-org/aboutcode/wiki) +- **Documentation Build**: ![Doc Build](https://github.com/aboutcode-org/aboutcode/actions/workflows/docs-ci.yml/badge.svg) -> [!NOTE] -> To manually build the documentation, run the `$ make docs` command from -> the root of this repo. 
- -### Important Links - -Our homepage is at http://aboutcode.org - -Our documentation (in progress) is at -https://aboutcode.readthedocs.io/en/latest/ - -Join the chat online at -[app.gitter.im : aboutcode-org#discuss](https://app.gitter.im/#/room/#aboutcode-org_discuss:gitter.im) -or if you're using the element app set the homeserver to `gitter.im` and then -join the -[aboutcode-org#discuss](https://matrix.to/#/#aboutcode-org_discuss:gitter.im) -chatroom. Introduce yourself and start the discussion! - -Look at our [wiki](https://github.com/aboutcode-org/aboutcode/wiki) for -information about our participation in the GSoC program. - -We have a weekly meeting, see more details -[here](https://github.com/aboutcode-org/aboutcode/wiki/MeetingMinutes). - -### Projects - -Each AboutCode project has its own repository: - -- **[ScanCode Toolkit](https://github.com/aboutcode-org/scancode-toolkit)**: a - set of code scanning tools to detect the origin and license of code and - dependencies. ScanCode now uses a plug-in architecture to run a series of - scan-related tools in one process flow. This is the most popular project and - is used by 100's of software teams . The lead maintainer is @pombredanne - -- **[Scancode.io](https://github.com/aboutcode-org/scancode.io)**: is a - web-based and API to run and review scans in rich scripted pipelines, on - different kinds of containers, docker images, package archives, manifests - etc, to get information on licenses, copyrights, source, vulneribilities. - The lead maintainer is @tdruez - -- **[VulnerableCode](https://github.com/aboutcode-org/vulnerablecode)**: is a - web-based API and database to collect and track all the known software - package vulnerabilities, with affected and fixed packages, references and a - standalone tool Vulntotal to compare this vulneribility information across - similar tools. 
This is maintained by @tg1999 and @pombredanne - -- **[univers](https://github.com/aboutcode-org/univers)** is a package to - parse and compare all the package versions and all the ranges. - -- **[purlDB](https://github.com/aboutcode-org/purldb)** consists of tools to - create and expose a database of purls (Package URLs) and also has package - data for all of these packages created from scans. This is maintained by - @jyang - -- **[FetchCode](https://github.com/aboutcode-org/fetchcode)** is a library to - reliably fetch any code via HTTP, FTP and version control systems such as - git. - -- **[Scancode Workbench](https://github.com/aboutcode-org/scancode-workbench)**: - a desktop application based on typescript and react to visualize and review - scan results from scancode scans. - -- **[AboutCode Toolkit](https://github.com/aboutcode-org/aboutcode-toolkit)**: - a set of command line tools to document the provenance of your code and - generate attribution notices. AboutCode Toolkit uses small yaml files to - document code provenance inside a codebase. The lead maintainer is - @chinyeungli - -- **[container-inspector](https://github.com/aboutcode-org/container-inspector)**: - a tool to analyze the structure and provenance of software components in - Docker images using static analysis. Maintained by @pombredanne - -- **[python-inspector](https://github.com/aboutcode-org/python-inspector)** - and **[nuget inspector](https://github.com/aboutcode-org/nuget-inspector/)** - inspects manifests and code to resolve dependencies (vulnerable and - non-vulnerable) for python and nuget packages respectively. - -- **[license-expression](https://github.com/aboutcode-org/license-expression/)**: - a library to parse, analyze, compare and normalize SPDX and SPDX-like - license expressions using a boolean logic expression engine. See - https://spdx.org/spdx-specification-21-web-version#h.jxpfx0ykyb60 to - understand what an expression is. 
See - https://github.com/aboutcode-org/license-expression for the code. The - underlying boolean engine is live at https://github.com/bastikr/boolean.py . - Both are co-maintained by @pombredanne - -- **ABCD aka AboutCode Data**: a simple set of conventions to define data - structures that all the AboutCode tools can understand and use to exchange - data. The details are at - [AboutCode Data](https://aboutcode.readthedocs.io/en/latest/aboutcode-data/abcd.html). - ABOUT files and ScanCode Toolkit data are examples of this approach. Other - projects such as https://libraries.io and and - [OSS Review Toolkit](https://github.com/heremaps/oss-review-toolkit) are - also using these conventions. - -- **[TraceCode Toolkit](https://github.com/aboutcode-org/tracecode-toolkit)**: - a set of tools to trace files from your deployment or distribution packages - back to their origin in a development codebase or repository. The primary - tool uses strace https://github.com/strace/strace/ to trace system calls on - Linux and construct a build graph from syscalls to show which files are used - to build a binary. We are contributors to strace. Maintained by @pombredanne - -We also co-started and worked closely with other FOSS orgs and projects: - -- [Package URL](https://github.com/package-url): a widely used standard to - reference software packages of all types with simple, readable and concise - URLs. - -- [SPDX](http://SPDX.org): aka. Software Package Data Exchange, a spec to - document the origin and licensing of packages. - -- [CycloneDX](https://cyclonedx.org) aka. OWASP CycloneDX is a full-stack Bill - of Materials (BOM) standard that provides advanced supply chain capabilities - for cyber risk reduction - -- [ClearlyDefined](https://ClearlyDefined.io): a project to review and help - FOSS projects improve their licensing and documentation clarity. 
This - project is incubating with https://opensource.org +> [!TIP] +> To manually build the documentation, run `make docs` from the root of this repo. + +## Contributing + +We welcome contributions! Whether you're fixing bugs, adding features, or improving documentation, we'd love your help. + +**Get started:** +- Read our [CONTRIBUTING.md](CONTRIBUTING.md) guide +- Look for [good first issues](https://github.com/search?q=org%3Aaboutcode-org+label%3A%22good+first+issue%22+state%3Aopen&type=Issues) +- Join our [community chat](https://app.gitter.im/#/room/#aboutcode-org_discuss:gitter.im) + +## Projects + +### Core Tools + +| Project | Description | Maintainer | +|---------|-------------|------------| +| [ScanCode Toolkit](https://github.com/aboutcode-org/scancode-toolkit) | Detect origin, license, and vulnerabilities in code, packages, and dependencies | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra) | +| [ScanCode.io](https://github.com/aboutcode-org/scancode.io) | Web UI and API for running complex scans in pipelines with CycloneDX and SPDX support | [@tdruez](https://github.com/tdruez) | +| [ScanCode LicenseDB](https://github.com/aboutcode-org/scancode-licensedb) | Free database of 2400+ software licenses with metadata and detection rules ([public instance](https://scancode-licensedb.aboutcode.org/)) | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra), [@DennisClark](https://github.com/DennisClark) | +| [ScanCode Workbench](https://github.com/aboutcode-org/scancode-workbench) | Desktop application to visualize and review ScanCode Toolkit scan results | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra), [@mjherzog](https://github.com/mjherzog) | +| [DejaCode](https://github.com/aboutcode-org/dejacode) | Enterprise application for open source license compliance and supply chain integrity | [@tdruez](https://github.com/tdruez), [@DennisClark](https://github.com/DennisClark) | +| 
[VulnerableCode](https://github.com/aboutcode-org/vulnerablecode) | Database of software package vulnerabilities with Web UI and API ([public instance](https://public.vulnerablecode.io/)) | [@TG1999](https://github.com/TG1999), [@keshav-space](https://github.com/keshav-space) | +| [PURLDB](https://github.com/aboutcode-org/purldb) | Database of package metadata keyed by PURL with API access | [@JonoYang](https://github.com/JonoYang) | + +### Inspectors + +Special-purpose analysis tools that run as ScanCode Toolkit plugins, ScanCode.io pipeline steps, or from the command line. + +| Project | Description | Maintainer | +|---------|-------------|------------| +| [binary-inspector](https://github.com/aboutcode-org/binary-inspector) | Extract symbols from ELF, Mach-O, WinPE and other binary formats | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra) | +| [container-inspector](https://github.com/aboutcode-org/container-inspector) | Analyze structure and provenance of Docker image layers | [@JonoYang](https://github.com/JonoYang), [@chinyeungli](https://github.com/chinyeungli) | +| [source-inspector](https://github.com/aboutcode-org/source-inspector) | Inspect source code to collect symbols, strings, and comments | [@JonoYang](https://github.com/JonoYang) | +| [nuget-inspector](https://github.com/aboutcode-org/nuget-inspector) | Resolve dependencies for .NET/NuGet projects without requiring dotnet SDK | [@JonoYang](https://github.com/JonoYang) | +| [python-inspector](https://github.com/aboutcode-org/python-inspector) | Analyze PyPI packages and resolve Python dependencies | [@TG1999](https://github.com/TG1999), [@chinyeungli](https://github.com/chinyeungli) | +| [debian-inspector](https://github.com/aboutcode-org/debian-inspector) | Parse and inspect Debian control files and codebases | [@JonoYang](https://github.com/JonoYang), [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra) | +| [elf-inspector](https://github.com/aboutcode-org/elf-inspector) | 
Inspect binary ELF files and collect metadata | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra) | +| [go-inspector](https://github.com/aboutcode-org/go-inspector) | Extract dependencies and symbols from Go binaries | [@JonoYang](https://github.com/JonoYang) | +| [rust-inspector](https://github.com/aboutcode-org/rust-inspector) | Extract dependencies and symbols from Rust binaries | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra) | + +### Libraries + +| Project | Description | Maintainer | +|---------|-------------|------------| +| [license-expression](https://github.com/aboutcode-org/license-expression) | Parse, analyze, and normalize SPDX license expressions | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra) | +| [commoncode](https://github.com/aboutcode-org/commoncode) | Common utilities for paths, dates, files, and hashes | [@AyanSinhaMahapatra](https://github.com/AyanSinhaMahapatra) | +| [extractcode](https://github.com/aboutcode-org/extractcode) | Universal archive extraction library and CLI tool | [@JonoYang](https://github.com/JonoYang) | +| [fetchcode](https://github.com/aboutcode-org/fetchcode) | Reliably fetch code via HTTP, FTP, and version control systems | [@JonoYang](https://github.com/JonoYang) | + +### Other Tools + +| Project | Description | Maintainer | +|---------|-------------|------------| +| [aboutcode-toolkit](https://github.com/aboutcode-org/aboutcode-toolkit) | Document code provenance and generate attribution notices using ABOUT files | [@chinyeungli](https://github.com/chinyeungli) | +| [univers](https://github.com/aboutcode-org/univers) | Parse and compare package versions across all ecosystems | [@TG1999](https://github.com/TG1999) | +| [federatedcode](https://github.com/aboutcode-org/federatedcode) | Decentralized, federated metadata system for open source software | [@keshav-space](https://github.com/keshav-space) | + +### AboutCode Data + +AboutCode Data is a set of conventions for data 
structures that all AboutCode tools can use to exchange data. ABOUT files and ScanCode Toolkit data are examples of this approach, supporting projects like [libraries.io](https://libraries.io/) and [OSS Review Toolkit](https://github.com/heremaps/oss-review-toolkit). + +## Standards and Related Projects + +AboutCode is based on key industry standards and works closely with other FOSS organizations: + +### PURL (Package URL) + +[PURL](https://github.com/package-url/purl-spec) is a URL string used to identify and locate software packages universally across programming languages, package managers, and tools. It originated from ScanCode and is in process to become an Ecma standard. + +**Maintainer**: [@johnmhoran](https://github.com/johnmhoran) + +### VERS (Version Range Specification) + +VERS is an emerging specification for resolving dependency and vulnerable version ranges. It originated as part of the PURL project and is in process to become an Ecma standard. + +**Specification**: [VERSION-RANGE-SPEC.rst](https://github.com/package-url/purl-spec/blob/c29b870ab33382309eefee2a0975ef7f71fdb742/VERSION-RANGE-SPEC.rst) + +### Related Organizations + +- [Package URL](https://github.com/package-url): A widely used standard to identify software packages with simple, readable URLs. See the [PURL discussions](https://github.com/package-url/purl-spec/discussions) for Ecma standardization details. + +- [SPDX](http://SPDX.org): System Package Data Exchange, a specification to document the origin and licensing of packages. + +- [CycloneDX](https://cyclonedx.org): OWASP CycloneDX is a full-stack Bill of Materials (BOM) standard for supply chain security. + +- [ClearlyDefined](https://ClearlyDefined.io): A project to help FOSS projects improve their licensing and documentation clarity (incubating with [opensource.org](https://opensource.org)). 
+ +--- + +**License**: Apache License 2.0 | **Code of Conduct**: [CODE_OF_CONDUCT.rst](CODE_OF_CONDUCT.rst) diff --git a/docs/source/_static/gsoc2025/scancodeio_varsha/project_flow.png b/docs/source/_static/gsoc2025/scancodeio_varsha/project_flow.png new file mode 100644 index 00000000..b9fda8b7 Binary files /dev/null and b/docs/source/_static/gsoc2025/scancodeio_varsha/project_flow.png differ diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/api.png b/docs/source/_static/gsoc2025/vulnerablecode_michael/api.png new file mode 100644 index 00000000..266d2e78 Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/api.png differ diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/extension_demo.gif b/docs/source/_static/gsoc2025/vulnerablecode_michael/extension_demo.gif new file mode 100644 index 00000000..874eb53d Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/extension_demo.gif differ diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png b/docs/source/_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png new file mode 100644 index 00000000..111eaba5 Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png differ diff --git a/docs/source/_static/gsoc2025/vulnerablecode_michael/registries.png b/docs/source/_static/gsoc2025/vulnerablecode_michael/registries.png new file mode 100644 index 00000000..fae1fa7d Binary files /dev/null and b/docs/source/_static/gsoc2025/vulnerablecode_michael/registries.png differ diff --git a/docs/source/_static/images/aboutcode_logo.svg b/docs/source/_static/images/aboutcode_logo.svg new file mode 100644 index 00000000..66233177 --- /dev/null +++ b/docs/source/_static/images/aboutcode_logo.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css index 5863ccf5..5f25342e 100644 --- 
a/docs/source/_static/theme_overrides.css +++ b/docs/source/_static/theme_overrides.css @@ -2,7 +2,7 @@ .wy-nav-content { max-width: 100%; padding: 0px 40px 0px 0px; - margin-top: 0px; + margin-top: 20px; } .wy-nav-content-wrap { @@ -12,15 +12,44 @@ div.rst-content { max-width: 1300px; border: 0; - padding: 10px 80px 10px 80px; + padding: 30px 80px 10px 80px; margin-left: 50px; + line-height: 1.6; } @media (max-width: 768px) { div.rst-content { max-width: 1300px; border: 0; - padding: 0px 10px 10px 10px; + padding: 20px 10px 10px 10px; margin-left: 0px; + line-height: 1.5; } } + +/* Minimal UI fixes */ + +/* Fix 1: Reduce excessive header sizes */ +.rst-content h1 { + font-size: 2.2rem; +} + +.rst-content h2 { + font-size: 1.8rem; +} + +/* Fix 2: Improve dark mode link visibility */ +[data-theme="dark"] .rst-content a { + color: #5ca8ff; +} + +/* Fix 3: Better sidebar navigation spacing */ +.wy-menu-vertical li a { + padding: 0.6em 1.2em; +} + +/* Fix 4: Reduce list indentation */ +.rst-content ul, +.rst-content ol { + margin-left: 1.5rem; +} \ No newline at end of file diff --git a/docs/source/_templates/layout.html b/docs/source/_templates/layout.html index c0ecc66e..7360336e 100644 --- a/docs/source/_templates/layout.html +++ b/docs/source/_templates/layout.html @@ -1,8 +1,29 @@ {% extends "!layout.html" %} - {% block menu %} - {{ super() }} -
- Index -
- {% endblock %} +{%- block sidebartitle %} +{%- set _logo_url = logo_url|default(pathto('_static/' + (logo or ""), 1)) %} + + {% if not theme_logo_only %}{{ project }}{% endif %} + {%- if logo or logo_url %} + + {%- endif %} + + +{%- if READTHEDOCS or DEBUG %} +{%- if theme_version_selector or theme_language_selector %} +
+
+
+
+{%- endif %} +{%- endif %} + +{%- include "searchbox.html" %} +{%- endblock %} + +{% block menu %} +{{ super() }} +
+ Index +
+{% endblock %} \ No newline at end of file diff --git a/docs/source/archive/gsoc-toc.rst b/docs/source/archive/gsoc-toc.rst index 421be092..54b1ffbd 100755 --- a/docs/source/archive/gsoc-toc.rst +++ b/docs/source/archive/gsoc-toc.rst @@ -8,6 +8,19 @@ designed to encourage university student participation in open source software development. It was started by Google in 2005. More about GSoC - ``_ +GSoC 2025 +--------- + +.. toctree:: + :maxdepth: 2 + + gsoc/reports/2025/scancodeio_varsha + gsoc/reports/2025/scancodeio_aayush + gsoc/reports/2025/scancodeio_manit + gsoc/reports/2025/scancode_toolkit_alok + gsoc/reports/2025/vulnerablecode_michael + + GSoC 2024 --------- diff --git a/docs/source/archive/gsoc/org_pages/gsoc_2017.rst b/docs/source/archive/gsoc/org_pages/gsoc_2017.rst index d6f595a6..78f43cfe 100644 --- a/docs/source/archive/gsoc/org_pages/gsoc_2017.rst +++ b/docs/source/archive/gsoc/org_pages/gsoc_2017.rst @@ -2,9 +2,10 @@ Google Summer of Code 2017 ========================== -.. image:: https://cdn.rawgit.com/wiki/nexB/aboutcode/aboutcode_logo.svg - :target: http://www.aboutcode.org/ +.. image:: /_static/images/aboutcode_logo.svg + :target: https://www.aboutcode.org/ :alt: AboutCode Logo + :width: 200px Welcome to AboutCode! 
This year AboutCode is a mentoring Organization for diff --git a/docs/source/archive/gsoc/reports/2025/scancode_toolkit_alok.rst b/docs/source/archive/gsoc/reports/2025/scancode_toolkit_alok.rst new file mode 100644 index 00000000..1694a3bd --- /dev/null +++ b/docs/source/archive/gsoc/reports/2025/scancode_toolkit_alok.rst @@ -0,0 +1,201 @@ +======================================================================== +Have variable license sections in license rules +======================================================================== + +**Organization:** `AboutCode `_ + +**Projects:** `Scancode Toolkit `_ + +**Mentee:** `Alok Kumar (alok1304) `_ + +**Mentors:** + +- `Philippe Ombredanne `_ +- `Ayan Sinha Mahapatra `_ + +Overview +-------- +This project aims to enhance the `detection_log` by clearly indicating when `extra-words` +are detected. These `extra-words` represent variable parts in the license rules, which +previously caused the match score to fall below 100. + +To address this issue, the implementation now verifies whether the `extra-words` +appear in the correct position within the license text. If they do, the score is +adjusted and improved accordingly, resulting in more accurate license rule matching. + +-------------------------------------------------------------------------------- + +Implementation +-------------- + +- **Enhanced the detection_log:** + + - Display `extra-words` when they are detected. + +- **Added extra-phrase marker like [[n]] for the extra-words:** + + - The `extra-phrase` is denoted by double opening square brackets ``[[`` + and double closing square brackets ``]]``. + - Here, `n` represents the maximum number of allowable `extra-words`. + - The `extra-phrase` ``[[n]]`` is inserted in license rules at positions + where `extra-words` may appear. + - The value of `n` specifies how many `extra-words` are permitted + at that location. 
+ +- **Improve Score:** + + - Check whether `extra-words` appear in the correct position as defined by + the `extra-phrase`, and ensure they do not exceed the maximum allowable limit. + - If the conditions are satisfied, increase the match score to ``100``. + +- **Shows in detection_log:** + + - If the score is increased, meaning the `extra-words` are in the correct + position, show ``extra-words-permitted-in-rule`` in the `detection_log`. + - If the `extra-words` are in the wrong place or exceed the maximum allowable limit, + show ``extra-words`` in the `detection_log`. + +- **Testing:** + + - Added tests for the `extra-phrase` functionality, such as + `test_extra_phrase_tokenizer` and `test_extra_phrase_spans`, to ensure that + phrases are correctly identified and processed. + - Implemented multiple tests to verify that `extra-words` appear in the correct + position according to the rules and that the match score is updated correctly + when they are within the allowable limit. + - Covered various edge cases where `extra-words` might be misplaced or exceed + the maximum allowable count, ensuring the scoring and logging behave as expected. + +-------------------------------------------------------------------------------- + +Linked Pull Requests +-------------------- + +.. list-table:: + :widths: 10 60 30 10 + :header-rows: 1 + + * - Sr. no + - Name + - Link + - Status + * - 1 + - Display `extra-words` in `detection_log` if present + - `aboutcode-org/scancode-toolkit#4402 + `_ + - Merged + * - 2 + - Improve score by supporting `extra_phrase` for `extra-words` in rules + - `aboutcode-org/scancode-toolkit#4432 + `_ + - Open + * - 3 + - Add extra-phrase in rules + - `aboutcode-org/scancode-toolkit#4518 + `_ + - Open + +Related Issues +-------------- + +.. list-table:: + :widths: 10 60 30 + :header-rows: 1 + + * - Sr. 
no + - Name + - Link + * - 1 + - `extra-words` does not show up in detection_log properly + - `#4400 + `_ + * - 2 + - Improve score when `extra-words` are found in the correct position + - `#4420 + `_ + +Pre GSoC Work +------------- + +Before GSoC, I had contributed the following PRs: + +.. list-table:: + :widths: 10 60 30 + :header-rows: 1 + + * - Sr. no + - Name + - Link + * - 1 + - Renaming the dependency attribute `is_resolved` to `is_pinned` + - `aboutcode-org/scancode-workbench#638 + `_ + * - 2 + - Add test for all PyPI METADATA versions + - `aboutcode-org/scancode-toolkit#4180 + `_ + * - 3 + - Add test for false positive GPL3 license + - `aboutcode-org/scancode-toolkit#4106 + `_ + * - 4 + - Add new rules for EUPL license + - `aboutcode-org/scancode-toolkit#4204 + `_ + * - 5 + - Add DUMB License and detection rule + - `aboutcode-org/scancode-toolkit#4400 + `_ + * - 6 + - Fixing the dead link by cross-reference in the documentation + - `aboutcode-org/purldb#550 + `_ + * - 7 + - Add test for equivalent word + - `aboutcode-org/scancode-toolkit#4305 + `_ + * - 8 + - Enhance code visibility in dark mode + - `aboutcode-org/scancode-workbench#637 + `_ + +Post GSoC +--------- + +I plan to continue contributing by adding `extra-phrase` support across many +license rules. This will strengthen license detection by making it more accurate +and flexible in handling variations within the rules. + +For identifying named entities in rules, I created a new repository, +`named-entity-utils `_, which I am +currently working on. This utility is used to add `extra-phrase` markers in rules +at positions where named entities are present. + +Links +----- + +* `Project Idea + `_ + +* `Official GSoC project page + `_ + +* `GSoC Proposal + `_ + +* `Project Board `_ + +Acknowledgements +---------------- + +I would like to thank my mentors: + +- `Philippe Ombredanne`_ +- `Ayan Sinha Mahapatra`_ + +A special thanks to my mentors who always supported me throughout this journey. 
Whenever +I faced a problem, we discussed it in depth during our weekly status calls. Without +their guidance and constant help, completing this project would not have been possible. + +I also plan to explore more projects in AboutCode and contribute whenever I get +time, because I would love to remain a part of this wonderful organization. diff --git a/docs/source/archive/gsoc/reports/2025/scancodeio_aayush.rst b/docs/source/archive/gsoc/reports/2025/scancodeio_aayush.rst new file mode 100644 index 00000000..d9e040bc --- /dev/null +++ b/docs/source/archive/gsoc/reports/2025/scancodeio_aayush.rst @@ -0,0 +1,161 @@ +======================================================================== +Create file-system tree view for scanned codebases +======================================================================== + +**Organization:** `AboutCode `_ + +**Projects:** `Scancode.io `_ + +**Mentee:** `Aayush Kumar (aayushkdev) `_ + +**Mentors:** + +- `Thomas Druez `_ +- `Tushar Goel `_ +- `Omkar Phansopkar `_ +- `Swastik Sharma `_ + +Overview +-------- +ScanCode.io previously allowed browsing project scans only one directory at a time, +which made exploring large codebases or container images slow and inefficient. + +This project introduced an interactive codebase tree view that lets users +navigate directories and files hierarchically, similar to a file explorer. + +-------------------------------------------------------------------------------- + +Implementation +-------------- +- **Changes in the CodebaseResource model:** + - Introduced a new parent_path field to the CodebaseResource model to + efficiently fetch the children of a directory. + - Ensured that top-level paths are stored during resource creation, + which is necessary for rendering root-level nodes in the file tree. + +- **Backend View:** + - Implemented a new `CodebaseResourceTreeView` View to fetch and display + immediate children of a directory. 
+ - Added a new `CodebaseResourceTableView` View to display the details of a file + in tabular format with support for filtering. + - Used HTMX to update data in place without needing to reload the file for each change. + +- **Frontend Codebase Tree:** + - Introduced a collapsible file tree panel in the left pane of the project resource view. + - Implemented chevron toggling to expand or collapse a directory’s immediate children: + - If children were already fetched, they are simply shown or hidden. + - Directories with no children display without a chevron. + - Enabled lazy loading to fetch directory contents only when expanded, + reducing initial load time. + + +- **Testing:** + + - Conducted large-scale testing to ensure API and UI can handle thousands of files efficiently. + - Added unit tests for both backend and frontend to verify that the APIs return correct data, + the tree view expands and collapses properly, and file/directory details are displayed as + expected. + + +Linked Pull Requests +-------------------- + +.. list-table:: + :widths: 10 60 30 10 + :header-rows: 1 + + * - Sr. no + - Name + - Link + - Status + * - 1 + - Add support for tracking parent of CodebaseResource + - `aboutcode.org/scancode.io#1691 + `_ + - Merged + * - 2 + - Add a resource tree explorer to explore scanned images + - `aboutcode.org/scancode.io#1704 + `_ + - Open + * - 3 + - Add filter and search support to the codebase tree + - `aboutcode.org/scancode.io#1828 + `_ + - Open + +Related Issues +-------------- + +.. list-table:: + :widths: 10 60 30 + :header-rows: 1 + + * - Sr. 
no + - Name + - Link + * - 1 + - Provide an explorer-style tree in resource view + - `#697 + `_ + * - 2 + - Add support for tracking parent of CodebaseResource entries and ensure top level paths are stored + - `#1687 + `_ + * - 3 + - Add a resource tree explorer to explore scanned images + - `#1682 + `_ + +Pre GSoC Work +------------- + +Here are some of the PR's I submitted before GSoC: + +- `Enforced --path as a required parameter for scancode-license-data module + `_ +- `Fixed missing migration for Project.purl field + `_ +- `Reorder XLSX output fields in RESOURCES sheet + `_ +- `Added the ability to export the current filtered QuerySet of a FilterView to JSON format + `_ +- `Added support for “caramel” license + `_ +- `Added the is_notice flag to the --classify option + `_ + +Post GSoC +--------- + +I plan to continue contributing by implementing further performance optimizations in my project and +enhancing the overall user experience by refining and polishing the UI. + +Links +----- + +* `Project Idea + `_ + +* `Official GSoC project page + `_ + +* `GSoC Proposal + `_ + +* `Project Board `_ + +Acknowledgements +---------------- + +I would like to thank my mentors: + +- `Thomas Druez `_ +- `Tushar Goel `_ +- `Omkar Phansopkar `_ +- `Swastik Sharma `_ + +The weekly status calls were extremely valuable, as they provided me with guidance +on how to approach problems, break tasks into manageable steps, and stay on track +with my progress. These discussions helped me clarify doubts quickly and gave me a +clear direction on how to get things done efficiently. 
diff --git a/docs/source/archive/gsoc/reports/2025/scancodeio_manit.rst b/docs/source/archive/gsoc/reports/2025/scancodeio_manit.rst new file mode 100644 index 00000000..6f44d550 --- /dev/null +++ b/docs/source/archive/gsoc/reports/2025/scancodeio_manit.rst @@ -0,0 +1,237 @@ +===================================================== +Enhance Compliance Mechanisms and CI Provider Support +===================================================== + + +**Organization:** `AboutCode `_ + + +**Projects:** `Scancode.io `_ and `Scancode-action `_ + + +**Mentee:** `Manit Singh (NucleonGodX) `_ + + +**Mentors:** + + +- `Thomas Druez `_ +- `Dennis Clark `_ +- `Pranay Das `_ +- `Avishrant Sharma `_ + + +Overview +-------- +ScanCode.io previously supported compliance mechanisms only based on license policies, +which limited the comprehensive assessment of software projects for organizations +with diverse compliance requirements. + + +This project enhanced ScanCode.io to support additional compliance mechanisms beyond +license policies, including license clarity scores, vulnerability levels, and scorecard scores. +Additionally, the project expanded scancode-action support to multiple CI providers beyond +GitHub Actions, including Azure Pipelines and Jenkins CI. + + +-------------------------------------------------------------------------------- + + +Implementation +-------------- +- **Independent Compliance Mechanisms:** + + - Developed an independent mechanism for compliance based on scorecard scores + and license clarity scores. + - Integrated these mechanisms into the database's project extra_data field, API endpoints, + check compliance command, and UI project view. + - Created a unified threshold mechanism for both license clarity and scorecard compliance, + reducing code duplication and improving maintainability. 
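A unified threshold mechanism of the kind described above could look roughly like the following sketch. The function name, the alert labels (``ok``, ``warning``, ``error``) and the threshold values are illustrative assumptions; this is not the code merged in ScanCode.io.

```python
def evaluate_threshold(score, thresholds):
    """
    Return the alert label for ``score`` given a mapping of
    {minimum_score: alert_label}. The highest minimum that the score
    reaches wins. None is returned when the score is missing or falls
    below every configured minimum.
    """
    if score is None:
        return None
    for minimum in sorted(thresholds, reverse=True):
        if score >= minimum:
            return thresholds[minimum]
    return None


# Hypothetical thresholds: one function serves both mechanisms.
clarity_thresholds = {80: "ok", 50: "warning", 0: "error"}
scorecard_thresholds = {7.0: "ok", 4.0: "warning", 0: "error"}

clarity_alert = evaluate_threshold(65, clarity_thresholds)
scorecard_alert = evaluate_threshold(8.2, scorecard_thresholds)
```

Keeping the comparison in one function means that a new score-based mechanism only needs its own threshold mapping, which is the duplication saving the report refers to.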
+ + +- **Unified Script Structure:** + + - Implemented a unified Python script structure that generates bash code for use across + different CI providers including GitHub Actions, Azure Pipelines, and Jenkins CI. + - This approach significantly reduces redundancy in CI provider implementations and + ensures consistent behavior across platforms. + + +- **CI Provider Expansion:** + + - Added comprehensive support for Azure Pipelines with proper configuration templates + and integration workflows. + - Added PR for using scancode-action with Jenkins. + - Ensured all CI providers utilize the same core scanning functionality through + the unified script approach. + + +- **Policy Validation Improvements:** + + - Fixed policy validation logic to properly handle different compliance mechanisms + without requiring license_policies for all policy files. + - Enhanced error handling and validation messages for better user experience. + + +Linked Pull Requests +-------------------- + + +.. list-table:: + :widths: 10 60 30 10 + :header-rows: 1 + + + * - Sr. no + - Name + - Link + - Status + * - 1 + - Introduce Independent License Clarity Thresholds Mechanism + - `scancode.io#1689 + `_ + - Merged + * - 2 + - Integration of Clarity compliance mechanism + - `scancode.io#1705 + `_ + - Merged + * - 3 + - Refactor a common threshold mechanism for both license clarity and scorecard score + - `scancode.io#1799 + `_ + - Merged + * - 4 + - Add compliance support based on OpenSSF Scorecard score + - `scancode.io#1800 + `_ + - Merged + * - 5 + - Fix policies validation + - `scancode.io#1814 + `_ + - Merged + * - 6 + - Add Azure pipelines support + - `scancode-action#19 + `_ + - Open + * - 7 + - Add support for jenkins-ci + - `scancode-action#21 + `_ + - Open + * - 8 + - Add support for python script for ci providers + - `scancode-action#23 + `_ + - Open + + +Related Issues +-------------- + + +.. list-table:: + :widths: 10 60 30 + :header-rows: 1 + + + * - Sr. 
no + - Name + - Link + * - 1 + - Add license clarity score-based Compliance support + - `#1678 + `_ + * - 2 + - Add Vulnerability Severity-Based Compliance Support + - `#1679 + `_ + * - 3 + - Add support for Azure pipelines + - `#18 + `_ + * - 4 + - Add support for Jenkins + - `#20 + `_ + * - 5 + - Add scorecard based compliance support + - `#1794 + `_ + * - 6 + - Add a mechanism to eliminate redundant Bash code across CI providers + - `#22 + `_ + * - 7 + - Refactor License Clarity and Scorecard Compliance Thresholds into Unified Module + - `#1797 + `_ + * - 8 + - Policies validation incorrectly requires license_policies for all policy files + - `#1813 + `_ + + +Pre GSoC Work +------------- + + +Here are some of the PRs I submitted before GSoC: + + +- `Enhanced package detection and improved license detection accuracy + `_ +- `Fixed vulnerability data processing issues + `_ +- `Improved license classification and detection mechanisms + `_ +- `Enhanced vulnerability database integration + `_ + + +Post GSoC +--------- + + +I plan to continue contributing by: + + +- Completing the Pull requests of integrating other CI providers in scancode-action + +Links +----- + + +* `Project Idea + `_ + + +* `Official GSoC project page + `_ + + +* `GSoC Proposal + `_ + + +* `Project Board `_ + + +Acknowledgements +---------------- + + +I would like to thank my mentors: + + +- `Thomas Druez `_ +- `Dennis Clark `_ +- `Pranay Das `_ +- `Avishrant Sharma `_ + + +Their guidance was instrumental throughout the project development. The regular feedback sessions +helped me navigate complex architectural decisions, especially when designing the unified +compliance mechanism. 
diff --git a/docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst b/docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst new file mode 100644 index 00000000..ca2717a5 --- /dev/null +++ b/docs/source/archive/gsoc/reports/2025/scancodeio_varsha.rst @@ -0,0 +1,149 @@ + +===================================================== +Adding Ability to Store and Query Downloaded Packages +===================================================== + +**Organization:** `AboutCode `__ + +**Project:** `ScanCode.io `__ + +| **Contributor:** Varsha U N +| **GitHub:** `VarshaUN `__ +| **LinkedIn:** `Varsha U N `__ + +**Mentors:** +- `Philippe Ombredanne `__ +- `Ayan Sinha Mahapatra `__ + +Overview +-------- + +ScanCode.io currently stores scanned packages on disk without a centralized index, +leading to duplicate storage, project-specific data, and potential data loss when +inputs are deleted. This project enhances ScanCode.io by introducing structured +package storage and querying, enabling indexing, reuse across projects, and +reliable preservation. + +Implementation +-------------- + +The project involved the following key components and steps: + + +.. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png + :alt: Project Flow Diagram + :align: center + :width: 70% + +This project addresses the limitations of ScanCode.io's unstructured package +storage by adding a system to index, reuse, and preserve packages reliably. + + +Storage System Development: + +- Created a `DownloadStore` abstract base class in `archiving.py` to + define the interface for managing package content and metadata + storage. + +- Built the `LocalFilesystemProvider` class to store downloads on the + local filesystem, using a SHA256-based nested directory structure. + +- Implemented methods for storing (`put`), retrieving (`get`), listing + (`list`), and searching (`find`) downloads, with metadata saved in + `origin-.json` files. 
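The storage design above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `archiving.py`: the two-level directory nesting, the `content` and `metadata.json` file names, and the method signatures are all hypothetical, and `list` and `find` are left out for brevity.

```python
import hashlib
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class DownloadStore(ABC):
    """Interface for storing package content together with its metadata."""

    @abstractmethod
    def put(self, content, download_url, download_date, filename):
        """Store ``content`` and return its SHA256 hex digest."""

    @abstractmethod
    def get(self, sha256):
        """Return the stored bytes for ``sha256``, or None if absent."""


class LocalFilesystemProvider(DownloadStore):
    def __init__(self, root):
        self.root = Path(root)

    def _directory(self, sha256):
        # Assumed nesting: <root>/ab/cd/<remaining digest characters>
        return self.root / sha256[:2] / sha256[2:4] / sha256[4:]

    def put(self, content, download_url, download_date, filename):
        sha256 = hashlib.sha256(content).hexdigest()
        directory = self._directory(sha256)
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "content").write_bytes(content)
        metadata = {
            "download_url": download_url,
            "download_date": download_date,
            "filename": filename,
        }
        # Assumed metadata file name, used here for illustration only.
        (directory / "metadata.json").write_text(json.dumps(metadata))
        return sha256

    def get(self, sha256):
        path = self._directory(sha256) / "content"
        return path.read_bytes() if path.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    store = LocalFilesystemProvider(tmp)
    digest = store.put(
        b"example", "https://example.com/pkg.tar.gz", "2025-08-01", "pkg.tar.gz"
    )
    roundtrip = store.get(digest)
    missing = store.get("0" * 64)
```

Because the location is derived from the content hash, storing the same bytes twice lands in the same directory, which is what allows reuse across projects.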
+ +Integration with ScanCode.io: + +- Updated `pipelines/init.py` to incorporate the archiving system into + ScanCode.io’s pipeline workflow, ensuring downloaded packages are + stored during execution. + +- Revised `input.py` to process package download inputs, passing + content, `download_url`, `download_date`, and `filename` to the + archiving system. + +User Interface Enhancements: + +- Modified the project resource view to display stored package + information, including download URLs and dates. + +Validation and Testing: + +- Wrote unit tests in `test_archiving.py` to verify + `LocalFilesystemProvider` functionality (`put`, `get`, `list`, + `find`), testing normal cases, edge cases (e.g., empty files), and + errors (e.g., duplicate origins). + + +Linked Pull Requests +-------------------- + +.. list-table:: + :widths: 10 40 20 + :header-rows: 1 + + * - Sr. No + - Name + - Link + * - 1 + - Add download archiving system + - `scancode.io#1815 `__ + * - 2 + - Support local package storage + - `scancode.io#1685 `__ + +Related Issues +-------------- + +.. list-table:: + :widths: 10 40 20 + :header-rows: 1 + + * - Sr. No + - Name + - Link + * - 1 + - Store and retrieve scanned packages + - `#1063 `__ + * - 2 + - Support local package storage + - `#1683 `__ + +Pre-GSoC Work +------------- + +Here are some PRs submitted before GSoC: + +- `Add bluefin-container image support `__ +- `Tag whitedout files `__ +- `Support python-private-classifier `__ +- `Parse labels in Dockerfile `__ +- `Add OCI labels to Dockerfile `__ +- `Extract LibreOffice documents `__ + +Links +----- + +- **Project Idea:** `GSoC 2025 Idea `__ +- **GSoC Project Page:** `GSoC 2025 `__ +- **Proposal:** `Project Proposal `__ + +Future Work +----------- + +Future enhancements include implementing the web UI for the `LocalFilesystemProvider` +to enable package uploads, searches, listings, and retrievals in ScanCode.io, with +Django views, templates, and URL routes, backed by comprehensive testing. 
Additionally, +integrating an external cloud storage option (e.g., AWS S3) alongside the local +filesystem will extend the `DownloadStore` interface, providing scalable and remote +storage capabilities. + +Closing Note +------------ + +During GSoC 2025, my mentors and I held weekly meetings to discuss progress, +challenges, and next steps. I am deeply grateful to my mentors for their guidance +and support, which greatly enriched my learning experience. + + + diff --git a/docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst b/docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst new file mode 100644 index 00000000..1e43e607 --- /dev/null +++ b/docs/source/archive/gsoc/reports/2025/vulnerablecode_michael.rst @@ -0,0 +1,230 @@ +VulnerableCode: On-demand live evaluation of packages +===================================================== + +Organization - `AboutCode `_ +----------------------------------------------------------- +| **Michael Ehab Mikhail** +| GitHub: `michaelehab `_ +| LinkedIn: `@michaelehab16 `_ +| Project: `VulnerableCode + `_ +| Official GSoC project page: `Project Link + `_ +| GSoC Proposal: `Proposal Link + `_ + +Overview +-------- + +VulnerableCode traditionally relied on **batch importers** to fetch +and store all advisories from a source at once. While effective for +building complete databases, batch importers are slow and +resource-heavy for developers who only need vulnerability +data for a **single package**. + +This project introduces **live importers**, a new class of +importers that operate in a *package-first* mode. Instead of +pulling all advisories, they run against a single +PackageURL (PURL), returning only the advisories affecting +that package. This makes vulnerability evaluation +**faster, more efficient, and more personalized**, since the +database is gradually filled with only the advisories +that matter to each user. 
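The difference between the two modes can be illustrated with a small sketch. The `Advisory` class, the sample data and the function names are hypothetical, and PURLs are compared as plain strings, which ignores versions and qualifiers; a real importer would parse them properly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    affected_purls: tuple


# Hypothetical upstream data; a real source would be fetched remotely.
SOURCE = [
    Advisory("ADV-1", ("pkg:pypi/django",)),
    Advisory("ADV-2", ("pkg:npm/lodash",)),
    Advisory("ADV-3", ("pkg:pypi/django", "pkg:pypi/flask")),
]


def collect_all(source):
    """Batch mode: yield every advisory the source publishes."""
    yield from source


def collect_for_purl(source, purl):
    """Package-first mode: yield only advisories affecting ``purl``."""
    for advisory in source:
        if purl in advisory.affected_purls:
            yield advisory


batch_ids = [a.advisory_id for a in collect_all(SOURCE)]
live_ids = [a.advisory_id for a in collect_for_purl(SOURCE, "pkg:pypi/django")]
```

In this toy data the batch run returns all three advisories, while the package-first run for `pkg:pypi/django` returns only the two that mention it.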
+ +To support this, I added: + +* A new **LIVE_IMPORTERS_REGISTRY** that tracks available live importers. +* A new **API endpoint** that accepts a **PURL**, enqueues compatible + live importer pipelines into a Redis queue, and executes them asynchronously + via workers. +* Integration with **VulnTotal** and its **browser extension**, enabling users + to evaluate packages in real-time through a seamless interface. + +This work bridges the gap between **batch-first databases** and +**package-first queries**, improving VulnerableCode's flexibility and enabling +better integration with developer workflows. + +.. note:: + A PURL (Package URL) is a universal way to identify and locate software + packages. `More on PURL `_ + + +Project Design and Architecture +------------------------------- + +The new live importers system builds on existing batch importers, while introducing +a parallel registry and asynchronous execution model for package-first runs. + +Importer Registries +^^^^^^^^^^^^^^^^^^^ + +* ``IMPORTERS_REGISTRY`` continues to hold batch importers (V1/V2). +* ``LIVE_IMPORTERS_REGISTRY`` holds live importers. + +Each live importer: + +* Inherits from its batch importer (when logic can be reused), or directly + from ``VulnerableCodeBaseImporterPipelineV2`` when a separate + implementation is needed. +* Declares a ``supported_types`` array, defining compatible package + ecosystems (``"pypi"``, ``"npm"``, ``"maven"``, ``"generic"``, etc). +* Implements a package-first ``collect_advisories()`` method, which + restricts results to advisories relevant to the given PURL. + +Live importer executions are asynchronous: once triggered, they are placed in +a Redis-backed job queue and processed by dedicated workers. This prevents +blocking the main API thread and allows multiple evaluations to run safely +in parallel. + +.. 
figure:: /_static/gsoc2025/vulnerablecode_michael/registries.png + :alt: Class architecture of importers registries + :align: center + :width: 70% + + Class architecture showing the relationship between ``IMPORTERS_REGISTRY`` and + ``LIVE_IMPORTERS_REGISTRY``. + +API Endpoint +^^^^^^^^^^^^ + +The new API endpoint is responsible for handling live evaluation requests. + +* Input: + + * ``purl`` (required) +* Execution: + + * Checks ``LIVE_IMPORTERS_REGISTRY`` for importers whose ``supported_types`` + match the PURL. + * Enqueues the pipeline runs of these live importers in a ``live`` Redis queue. + * Returns the **Live Run ID**, information about the pipelines to + run, and the status URL. + * The status URL shows the current state of a live evaluation run + and its individual pipeline runs. + +* Output: + + * Once workers complete execution, the resulting advisories are imported + into the database and exposed as JSON through the status endpoint. + +.. figure:: /_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png + :alt: Live Pipeline Run Class + :align: center + :width: 70% + + Live Pipeline Run Class and how it groups multiple PipelineRuns. + +.. figure:: /_static/gsoc2025/vulnerablecode_michael/api.png + :alt: Live Importers API request flow + :align: center + :width: 70% + + Flow of API endpoint: selecting compatible live importers and executing + them in parallel. + +Integration with VulnTotal +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The new API was integrated into VulnTotal as an optional datasource: + +* VulnTotal now checks the local environment for + ``VCIO_HOST``, ``VCIO_PORT``, and ``ENABLE_LIVE_EVAL`` flags in ``.env``. +* If enabled, VulnTotal queries VulnerableCode in package-first mode. +* This allows VulnTotal to use both its proprietary datasources **and** + the user's gradually built local database, improving coverage and + personalization. 
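Reading the three flags above might look like the sketch below. It assumes the ``.env`` values have already been loaded into the process environment. The settings class, the default host and port, the ``http`` scheme and the accepted truthy strings are assumptions made for illustration and are not taken from the VulnTotal code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalVulnerableCodeSettings:
    host: str
    port: int
    live_eval: bool

    @property
    def base_url(self):
        # Assumes plain HTTP against a local instance.
        return f"http://{self.host}:{self.port}"


def load_settings(environ=None):
    """Build settings from a mapping, defaulting to ``os.environ``."""
    environ = os.environ if environ is None else environ
    flag = environ.get("ENABLE_LIVE_EVAL", "").strip().lower()
    return LocalVulnerableCodeSettings(
        host=environ.get("VCIO_HOST", "localhost"),  # assumed default
        port=int(environ.get("VCIO_PORT", "8000")),  # assumed default
        live_eval=flag in {"1", "true", "yes"},  # assumed truthy values
    )


settings = load_settings(
    {"VCIO_HOST": "127.0.0.1", "VCIO_PORT": "8001", "ENABLE_LIVE_EVAL": "true"}
)
```

Passing the mapping explicitly keeps the function easy to test; calling ``load_settings()`` with no argument reads the real environment instead.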
+ +Integration with VulnTotal Browser Extension +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The VulnTotal browser extension was updated to support live importers: + +* Users can enable the "Local VulnerableCode" datasource and live evaluation option. +* When enabled, package lookups are forwarded to the new API, retrieving + advisories in real-time. +* This reduces setup effort—developers can get live vulnerability checks + directly in their browser, provided they have a local VC instance. + +.. figure:: /_static/gsoc2025/vulnerablecode_michael/extension_demo.gif + :alt: Live evaluation demo in VulnTotal browser extension + :align: center + :width: 70% + + VulnTotal and its browser extension consuming the new live evaluation API. + +Linked Pull Requests +-------------------- + +.. list-table:: + :widths: 10 40 20 + :header-rows: 1 + + * - Sr. no + - Name + - Link + * - 1 + - Add Live Evaluation API endpoint and PyPa live pipeline importer + - `aboutcode-org/vulnerablecode#1969 + `_ + * - 2 + - Add Gitlab Live V2 Importer + - `aboutcode-org/vulnerablecode#1910 + `_ + * - 3 + - Add Curl Live Importer V2 + - `aboutcode-org/vulnerablecode#1923 + `_ + * - 4 + - Add Elixir Security Live V2 Importer + - `aboutcode-org/vulnerablecode#1935 + `_ + * - 5 + - Add NPM Live Importer V2 + - `aboutcode-org/vulnerablecode#1941 + `_ + * - 6 + - Add GitHub OSV Live V2 Importer Pipeline + - `aboutcode-org/vulnerablecode#1977 + `_ + * - 7 + - Add Postgres Live V2 Importer Pipeline + - `aboutcode-org/vulnerablecode#1982 + `_ + * - 8 + - Add PySec Live V2 Importer Pipeline + - `aboutcode-org/vulnerablecode#1983 + `_ + * - 9 + - Add Local VulnerableCode Datasource in VulnTotal and allow live evaluation + - `aboutcode-org/vulnerablecode#1985 + `_ + * - 10 + - Integrate Local VulnerableCode datasource and live evaluation + - `aboutcode-org/vulntotal-extension#17 + `_ + + +Closing Thoughts +------------------- + +This project was an exciting step forward from my 2024 GSoC work. 
By moving +from batch importers to package-first live importers, we enabled a faster, +more personalized, and more flexible way of building vulnerability databases. + +I especially enjoyed designing the **registry + API architecture** and +integrating Redis queues and workers for asynchronous execution. This improved +scalability, responsiveness, and fault tolerance, ensuring the API never blocks +and multiple live evaluations can run in parallel. I also appreciated discussing +it with mentors and integrating it seamlessly across +**VulnerableCode, VulnTotal, and the browser extension**. + +This work lays the foundation for even richer interactivity +in the ecosystem and brings vulnerability evaluation closer +to developers' workflows. + +I appreciated the weekly status calls and the feedback I received from my +mentors and the amazing team. They were really helpful and supportive. +`Philippe Ombredanne `_, +`Ayan Sinha Mahapatra `_, +`Tushar Goel `_, +`Keshav Priyadarshi `_ diff --git a/docs/source/conf.py b/docs/source/conf.py index 04c6a2ae..58d67567 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -79,7 +79,7 @@ html_context = { "display_github": True, - "github_user": "nexB", + "github_user": "aboutcode-org", "github_repo": "aboutcode", "github_version": "master", # branch "conf_py_path": "/docs/source/", # path in the checkout to the docs root diff --git a/setup.cfg b/setup.cfg index 5623cf43..42ff897a 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,5 +1,6 @@ [metadata] name = aboutcode +version = 0.0.1 license = Apache-2.0 # description must be on ONE line https://github.com/pypa/setuptools/issues/1390 @@ -29,7 +30,7 @@ license_files = [options] zip_safe = false setup_requires = setuptools_scm[toml] >= 4 -python_requires = >=3.7 +python_requires = >=3.10 install_requires =